
MG-RAST version 4.0 released today

Tuesday, November 15th, 2016

Version 4.0 is here!

After several months of beta testing and significant user feedback, we have today released the new version at

The old version is still available at (note no new data will be available in the old version).

Follow us on Twitter @mg_rast

Highlights of the new version include:

myData – access all your data in one place

The new myData page now summarizes all information about your data. It shows you your running jobs, completed studies, news and more. It is an entry point to all main MG-RAST functionality. You can use the toggle bar at the top right to show and hide the different sections. Each section has links to a more detailed view or editor of the corresponding data items. You can also use the search bar at the top to quickly find the datasets you are looking for.

analysis revamped

The new analysis page was built to cope with the growing size of datasets. Instead of waiting for the results of each parameter change, the complete abundance profiles of your selected data sets will be downloaded to your machine. You can then filter, drill-down, slice-and-dice your data all you want in an instant. Many different interactive visualizations are at your fingertips to explore your data. The plugin section offers additional tools to view your data.

Note that the RAM on your computer limits the number of data sets you can include in your analysis.

The profiles you downloaded can be saved to your hard drive and reloaded in a later session. This gives you instant access without the need to download from our server again.
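As an illustration of the client-side approach described above, here is a minimal Python sketch of saving a downloaded profile, reloading it, and filtering it in memory. The row layout (`phylum`, `evalue_exp`, `abundance`) and all function names are assumptions made for this example; they are not the actual MG-RAST profile format or code.

```python
import json
from collections import defaultdict

# Hypothetical profile layout: one row per annotation hit.
profile = [
    {"phylum": "Proteobacteria", "evalue_exp": -12, "abundance": 40},
    {"phylum": "Proteobacteria", "evalue_exp": -3, "abundance": 5},
    {"phylum": "Firmicutes", "evalue_exp": -8, "abundance": 17},
]


def save_profile(rows, path):
    """Write the profile to disk so a later session can skip the download."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)


def load_profile(path):
    """Reload a profile that was saved with save_profile."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def abundance_by(rows, field, max_evalue_exp):
    """Sum abundance per value of `field`, keeping only hits whose
    e-value exponent is at or below the cutoff."""
    totals = defaultdict(int)
    for row in rows:
        if row["evalue_exp"] <= max_evalue_exp:
            totals[row[field]] += row["abundance"]
    return dict(totals)
```

Because the whole profile is held in memory, changing the cutoff only means calling `abundance_by` again, which is also why available RAM bounds how many data sets can be compared at once.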

MetaZen 2.0 — Improved metadata support with better feedback

Getting all set with the metadata for your project has just become a lot easier. The new version of MetaZen will give you direct feedback and input help when creating metadata spreadsheets. You can download the results in Excel format or just send them directly to your inbox.

metagenome overview and processing receipts

The metagenome overview page offers the data you are familiar with in a fresh look. Every bit of data now has provenance information, as well as multiple download options including the images and the data used to produce them. The technical details of the data progression through the MG-RAST pipeline are linked on the processing receipt page. It shows general information as well as details about every pipeline step.

MG-RAST newsletter November 2016

Wednesday, November 9th, 2016


  1. Switching to new MG-RAST web interface on November 15th 2016
  2. Update on processing backlog
  3. Feedback
  4. Miscellaneous


1. Switching to new MG-RAST web interface on November 15th 2016

After an open beta testing period and feedback from a significant number of users, we are now ready to transition away from the old MG-RAST web interface to a new more modern and more capable user interface.

–> will direct you towards the new style interface after November 15th. <– 

As many of you have tested the new interface already (at we feel confident that the new version will add significantly to the usability of the system. The old style interface will remain available for a while at (  The old style interface will not show any new data, but it will remain functional for a while.

2. Update on processing backlog

Due to both the massive demand for MG-RAST analysis and several unplanned power failures, we experienced problems with loading the results of the analysis pipeline (aka profiles) into the database that hosts said profiles. We have now transitioned the pipeline (as of November 9th 2016) to load into Cassandra instead of Postgres and are able to load data at a rate of 1 Gigabasepair / 60 seconds. We have already uploaded 30% of all public data to Cassandra and are continuing to transition all profiles to Cassandra in the coming weeks.
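To put the stated load rate in perspective, the short Python calculation below converts 1 Gigabasepair per 60 seconds into a daily figure and estimates the load time for a batch. The 120 Gbp batch in the usage note is a made-up number for illustration, and the estimate assumes the rate is sustained without interruption.

```python
GBP_PER_MINUTE = 1  # stated load rate: 1 Gigabasepair per 60 seconds


def hours_to_load(gigabasepairs, rate_gbp_per_min=GBP_PER_MINUTE):
    """Hours needed to load the given volume at a constant rate."""
    return gigabasepairs / rate_gbp_per_min / 60


# Volume that can be loaded in 24 hours at the stated rate.
gbp_per_day = GBP_PER_MINUTE * 60 * 24
```

At this rate `gbp_per_day` is 1440 Gbp, and a hypothetical 120 Gbp batch (`hours_to_load(120)`) would take 2 hours.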


Technical details below:

We are working with a 3 step plan that we present here in the interest of transparency:

– step 1:

November 9th: Switch the API and pipeline over to the new (v4 or “Cassandra”) version of the code base. Any new jobs created will automatically use the new code. The old style API will remain available for a while and we will provide automatic forwarding in the API. If the automatic forwarding does not work for you, you can reach the old style (“postgres” or v3) API at You should not need to use the temporary address, but some client code might not support HTTP redirects.

The following API resources are being forwarded to the v3 API:



Consequently they will not be available for any new data sets being added to the Cassandra database until we have retargeted them. This work is going on now.
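For client code that does not follow HTTP redirects on its own, the forwarding can be handled by hand. The Python sketch below shows one way to do that; it is an illustration, not MG-RAST code. The `fetch` callable, the function names and the hop limit are assumptions for this example, and the headers are assumed to arrive as a plain dict with a `Location` key.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url, status, headers):
    """Return the URL a redirect points to, or None if this is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the current URL.
    return urljoin(current_url, location)


def follow(fetch, url, max_hops=5):
    """Call fetch(url) -> (status, headers, body) and follow redirects.

    Returns (final_url, status, body). Raises RuntimeError if more than
    max_hops redirects are encountered.
    """
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        target = next_url(url, status, headers)
        if target is None:
            return url, status, body
        url = target
    raise RuntimeError("too many redirects")
```

Most HTTP libraries already do this by default, so a helper like this is only needed for minimal clients that issue raw requests.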

– step 2:

We are currently loading all public data sets into the v4 database and this process will continue for a while (15,000 data sets took 1 week to load).

A large number of data sets are currently being processed. All existing jobs (they are mostly waiting for the database load steps) are being rewritten to load into the v4 databases as well. This process will start November 10th 2016 and continue for a number of days.

– step 3:

On November 15th we will switch the CNAME to point to (The old style interface will remain available at

On or around November 15th we will deploy the code to retarget the /annotation API resource to the v4 database (Cassandra).

The /matrix API resource will be retargeted at a later date. NOTE: Until it is retargeted the /matrix API resource will only support data loaded prior to October 1st 2016.

3. Feedback

We will keep the survey for the new web interface (located at ) open at Please continue to provide us with feedback on the changed web interface. The more specific you are, the better we are able to build what you need.

4. Miscellaneous

  • We have updated the manual for version 4.0
  • We will remove all old content from the Wiki with the release of version 4.0

MG-RAST newsletter, September 2016

Monday, September 19th, 2016


  1. Use version 4.0 prototype for reduced wait times for job completion
  2. Recent technical issues
  3. New web user interface taking shape
  4. Workshop announcement
  5. Miscellaneous


1. Use version 4.0 prototype for reduced wait times for job completion

Wait times for MG-RAST pipeline completion have been steadily rising for the last year. We are introducing changes to the pipeline and the database technology to accelerate processing and loading. Currently there is a significant backlog of over 4 Terabases. To alleviate this situation, we will introduce a different database technology that speeds up loading dramatically. However, we will not be able to port the existing web interface to this new technology.

For a little while, MG-RAST will send out two notifications for each completed job: (1) a first email once the job is available to view, analyze and compare in the version 4.0 prototype (see below, item 3) and the API; (2) a second email once the loading into the old database technology for version 3 is completed. As the prototype for version 4.0 is nearly 100% complete and has 95% of the features of version 3.0, we think this solution will provide the shorter round trip times we are looking for. This situation will change once we migrate to the new version.

2. Recent technical issues

In addition, as some of you have certainly noticed, MG-RAST was down four times in this quarter. We experienced 2 planned and 2 unplanned power outages that caused significant problems. While the majority of the data storage systems that support the MG-RAST API handled those outages well, our SQL servers (Postgres) did not. This caused a massive amount of downstream work. Unfortunately the 2nd unplanned power loss interrupted the clean-up of the first, creating a total of 4409 data sets that were incorrectly or incompletely loaded into the SQL representation. Currently the SQL databases support the Analysis and Download pages, thus rendering those jobs inaccessible.

We have begun re-loading the affected data sets into the SQL databases. Affected users will be notified once their data sets are loaded correctly.

The SQL databases are a bad match for the volume of data that MG-RAST handles today. We are working to replace the SQL database with a more modern (“sharded”) database in a future release. Unfortunately we cannot make this change while we are still using the old web user interface.

3. New web user interface taking shape

As mentioned in a previous newsletter, the web user interface for MG-RAST is undergoing significant changes. Users commenting on significant wait times and timeouts as well as the extremely high load on our servers prompted us to pick a new approach for web interface design.

The new interface features:

  • a cleaned up fresh look with an extended set of features
  • compatibility with all modern browsers
  • fully API-based with faster response times
  • client side computing for next to instant analysis results
  • provenance information and export functionality on all data

As some of you might recall, the existing web user interface pre-dates the MG-RAST API. It accesses the underlying infrastructure directly, making changes next to impossible. Therefore we need to move off the old web interface and onto the modern interface as quickly as possible to increase our throughput and prevent future issues like those discussed in item #1.

A beta version of the new user interface is available at We appreciate your feedback!

Feedback URL:

Once the new version is fully tested, we will retire the old web interface within 4-5 weeks.


4. Workshop announcement

The MG-RAST team is accepting proposals for MG-RAST workshops for 2017. If you are interested in hosting a workshop and you can accommodate 20-30 students (who are expected to bring their own laptop or make other arrangements) and have decent WIFI and internet connectivity, contact us via the help-desk email that you are familiar with.

We are seeking to run at least 3 workshops, each aimed at users of various levels. The 3-day workshop increases the level of computational skill required with each day, starting with novice users and ending (on day 3) with an introduction to the API and other advanced topics.

Day 1: introduction and first steps

  • Gentle introduction to MG-RAST and shotgun metagenomics
  • The MG-RAST web interface
  • Performing analysis with the MG-RAST web user interface

Day 2: web based analytics and first steps on the command line

  • Advanced analytics with the web frontend
  • Using metadata
  • Using R, Perl and Python scripts to perform data uploads and analysis

Day 3: advanced topics

  • More details on using R, Perl and Python scripts to perform data uploads and analysis
  • How to change the MG-RAST pipeline and/or set up your own MG-RAST like pipeline (“Skyport, Shock and Awe”)

5. Miscellaneous

  • We will remove all old content from the Wiki with the release of version 4.0

MG-RAST newsletter, May 2016

Tuesday, April 12th, 2016


  1. MG-RAST hits 100 Terabasepairs
  2. Funding update
  3. Policy change: We will enforce data publication
  4. Beta version of next generation user interface

1. MG-RAST hits 100 Terabasepairs and 20,000 data submitters

MG-RAST crossed the 100 Tbp threshold (100 terabasepairs, or 100 x 10^12 basepairs) of processed data earlier in April 2016. A total of over 240,000 data sets containing over 800 billion (800 x 10^9) sequences have been processed.

The MG-RAST system continues to be popular with scientists. MG-RAST now has over 20,000 data submitters, and in 2015 the web site and API had over 45,000 individual data consumers.

2. Funding update

MG-RAST has been fortunate to receive two awards recently, allowing us to continue operating. To our delight one of the awards will strengthen our collaboration with our friends from the EBI’s MG-Portal team. Resources remain tight but we are working hard to provide updates and good service to our users.

3. Policy change: We will enforce data publication

You will no doubt be aware that priority for processing in MG-RAST depends on your stated intention to release the data to the public. Many of you have stated an intention to publish your data, subsequently received priority processing, and have not released the data. On top of that, we are receiving several help desk tickets per month for data that has been published in journals but is not actually public on MG-RAST.

Therefore the new user interface will include reminders if data is overdue for publication.

In addition we will have the technical means to disable future submission for individual users if data release commitments have not been met.

4. Beta version of next generation user interface

MG-RAST’s popularity has led to some major load issues with our backend server infrastructure. Fortunately, with the recent developments of web technologies and the progress in computing in general, we have the option of restructuring the user interaction. While retaining the graphical UI and adding a lot of functions, we are moving much of the load from the server to your desktop machines. This is done in a way that empowers the users and does not overwhelm current desktop machines. With the new MG-RAST client, you can download data sets from MG-RAST and prepare analyses using a graphical interface.


The beta version is available for testing at:

The new interface:

  • looks significantly better and is faster
  • works with any modern browser (goodbye Firefox only requirement)
  • provides submission receipts with details on all user provided parameters
  • has a project editor that allows among other things moving non public metagenomes between projects
  • contains a processing receipt with details on the processing
  • has significantly updated metadata dialogues

Upcoming features:

  • google spreadsheet editor for metadata
  • offline analyses of metagenomes including slice & dice
  • project page editor

Our plan is to collect feedback on the new user interface and update the production version after June 1st 2016.

MG-RAST newsletter, August 2015

Tuesday, August 11th, 2015

MG-RAST newsletter, August 2015 (and MG-RAST v3.6 Release Notes)

  1. Funding update
  2. Better normalization using DESeq
  3. More servers and back-end redundancy
  4. New upload, better metadata guidance
  5. Feature removal: removal of recruitment plot
  6. Change in policy: MG-RAST IDs and ACCESSION numbers only after making data public
  7. Change in policy: New algorithm for assigning job priority


1. Funding update

There are very few funding opportunities (in calls that we can write proposals for) to support the MG-RAST system. A system at the scale of MG-RAST is not static, but constantly evolving to address emerging issues like changing sequencing technologies, computer security issues or coping with the volume of data and submissions – which is continuously increasing.

The lack of funding opportunities is threatening to affect our ability to address future technical and scientific issues. The end of one recent grant meant that we lost our help-desk person. As a consequence, please be patient when sending requests to the help-desk. We are trying to share that work inside the team, but it is hard.

We are devoting time to updating our manual to assist with navigating and analyzing data in MG-RAST. Please check to see if the manual or our FAQs can answer your question(s) before contacting the help desk.

2. Improved normalization of abundance values using DESeq

We are now offering the DESeq package as the default choice for normalizing data.

Formerly, the MG-RAST pipeline normalization procedure employed a simple log transformation of the data, as a large number of biological variables exhibit a log-normal distribution. We have incorporated DESeq, as it has been shown to outperform other methods of normalization – in particular, those that use any sort of linear scaling (PMID: 24699258).
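To make the difference concrete, the Python sketch below contrasts a simple log transformation with the median-of-ratios size factors that DESeq is known for. It is a simplified illustration under stated assumptions, not the MG-RAST pipeline code or the DESeq package itself: the `log2(x + 1)` pseudocount, the function names and the toy matrix layout (rows are features, columns are samples) are all choices made for this example.

```python
import math
from statistics import median


def log_transform(counts):
    """Simple log transformation; the +1 pseudocount is an assumption."""
    return [math.log2(c + 1) for c in counts]


def size_factors(matrix):
    """Median-of-ratios size factors, one per sample (column).

    matrix[i][j] is the count of feature i in sample j. Features with a
    zero count in any sample are skipped, so at least one feature must be
    non-zero in every sample.
    """
    n_samples = len(matrix[0])
    usable = []
    for row in matrix:
        if all(c > 0 for c in row):
            # Log of the geometric mean of this feature across samples.
            log_gm = sum(math.log(c) for c in row) / n_samples
            usable.append((row, log_gm))
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - log_gm for row, log_gm in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(matrix):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(matrix)
    return [[c / factors[j] for j, c in enumerate(row)] for row in matrix]
```

For a toy matrix such as `[[10, 20], [100, 200], [5, 10]]`, where the second sample is simply sequenced twice as deeply, the size factors differ by a factor of two and the normalized counts of each feature become equal across the two samples.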

3. MG-RAST now has multiple back-end servers and will (mostly) fail over automatically

The team has spent a lot of time updating the MG-RAST backend to shiny new technologies like CoreOS, FleetD etc. As a result, we can now run multiples of most servers, providing more scalability and failover if and when things go wrong. These new changes are rolled into production with version 3.6 (at the time of this email). There are no pipeline changes, just backend infrastructure changes to help keep the system running smoothly.


4. New Upload, better metadata support via improved MetaZen

4.1 Upload Changes

MG-RAST can now handle de-multiplexing of SFF and FASTQ files. A new visual frontend provides easier access and faster processing. In addition to the updated web front-end, there is a downloadable script that lets users submit large projects ( and our RESTful API ( provides upload functions as well. We encourage users to take a look at the scripts before trying the native API.

We have also changed the backend for file uploading to make it faster. Data is now uploaded directly to the SHOCK data store (instead of a slower secondary store). Consequently, if you are using the command line upload option, some procedural changes are required.

Please note: The main upload API remains the same. However, the convenience scripts that we have made available to some users will cease to function. We provide new scripts for mass upload.

The new procedure for uploading is documented in the manual

4.2 Metadata: Latest ENVO support now available

We have updated to the most recent version of ENVO. Changes at a third party API provider had previously forced us to use an older ENVO version. We are now up-to-date and support multiple versions of ENVO on a per project basis. This will ensure that (meta-)data added later to a project can conform to the same standards as the existing data. URL:

5. Features removed: recruitment plot

MG-RAST web-based analysis tools use our API to retrieve data. Since the recruitment plot tool had not been updated to use the API, which is our standard method, we have removed it. Resources allowing, we will create a version that uses the API in the future.

6. Change in Operations: new procedure for obtaining MG-RAST Identifiers

MG-RAST IDs will continue to be assigned automatically, but at a different time point.

Some users are getting confused about the nature of their submissions, assuming them to be public when in fact they are not. As a consequence, we have altered our procedures a bit. We will no longer immediately provide the MG-RAST IDs; instead, the MG-RAST IDs will be made available when data is made public.

In addition, we are no longer displaying MG-RAST IDs until the data is public, instead displaying a link to make the data public. In several places we will no longer display the MG-RAST ID (e.g. 4447102.3) and instead display the job number.

7. Change in policy: New algorithm for assigning job priority

With the next update release of MG-RAST (3.6.1) we will introduce a change in the scheduling policy. Users with data in their accounts that is overdue for publication in MG-RAST will ALWAYS receive the lowest priority. We will provide tools to assist users with making data public.
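The rule that overdue data always means lowest priority can be pictured as a two-level sort. The Python sketch below is only an illustration of that idea; the `Job` fields, the `base_priority` value (smaller runs sooner) and the tie-breaking by job ID are assumptions for this example, not the actual MG-RAST scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    owner: str
    base_priority: int  # hypothetical: smaller values run sooner


def schedule(jobs, owners_with_overdue_data):
    """Order jobs so that owners with overdue data always come last."""
    def key(job):
        overdue = job.owner in owners_with_overdue_data
        # False sorts before True, so overdue owners are pushed to the end
        # regardless of their base priority.
        return (overdue, job.base_priority, job.job_id)
    return sorted(jobs, key=key)
```

In this sketch a job from an overdue owner is scheduled after every other job even if its base priority would otherwise put it first.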

As a service to users we will provide reminders for data sets that need to be made public.


Upcoming change to MG-RAST upload (early August 2015)

Tuesday, July 28th, 2015

In the week of August 3rd, we will change the upload mechanism for MG-RAST to a new, improved platform.

This will improve the end-user experience by providing:

  •  automatic MD5 checking on client and server side (for most files) to ensure that files are received correctly by MG-RAST
  •  faster upload and the ability to resume stopped uploads
  •  pre-upload file checking for content and naming scheme compliance
  •  de-multiplexing for Illumina barcoded data
  •  pre-upload metadata validation
  •  auto-decompression

In addition to the web browser based upload we will also provide a python based upload script that end users can either use as is or adapt to their needs.
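For readers who want to check files themselves, the Python sketch below shows how an MD5 checksum can be computed locally and compared with a checksum reported by a server. It is an illustration using the standard `hashlib` module, not the MG-RAST upload script; the function names and the idea of receiving the server checksum as a hex string are assumptions for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    large sequence files do not have to fit into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_intact(local_path, server_md5):
    """Compare the local checksum with the one reported by the server."""
    return md5_of_file(local_path) == server_md5.strip().lower()
```

A mismatch between the two checksums indicates that the file was altered or truncated in transit and should be uploaded again.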

We will be discontinuing and retiring the existing, slow and disk-space limited upload system. As a result we ask that users submit all the data in their Inbox ASAP and not delay submission. Immediately before the switch over we will put a warning up on the MG-RAST homepage to inform users.

For any data remaining in the old upload system, we will migrate that to the new system IF the files are less than 72 hours old at the time of transition.



MG-RAST API available

Thursday, January 15th, 2015

A long time in the making, the application programmers interface (API) for MG-RAST has now left beta status.

By opening MG-RAST up via a web services API (application programmers interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects.

Our paper describing the API in PLoS Computational Biology is here.

The API entry point is:

Before you embark on a journey exploring the new capabilities, we’d like to mention:

  • we are fully committed to this API; it is what powers the web interface and will be the basis of any future development.
  • we note that programming skills are required to utilize the API. Much as we would like to, we cannot write the code to implement your analysis or query. We will however provide examples via github.
  • the API is versioned
  • we expect users to adhere to our terms of service.
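As a rough illustration of working with a RESTful API that returns JSON objects, the Python sketch below builds a request URL and parses a response body. Everything specific in it is an assumption for this example: `API_BASE` is a placeholder and not the real entry point, and the path layout and the `type` query parameter are hypothetical rather than taken from the MG-RAST API documentation.

```python
import json
from urllib.parse import urlencode

# Placeholder only; substitute the real API entry point.
API_BASE = "https://api.example.org"


def resource_url(resource, object_id=None, **params):
    """Build a URL of the form BASE/resource[/id][?key=value...]."""
    parts = [API_BASE.rstrip("/"), resource.strip("/")]
    if object_id is not None:
        parts.append(str(object_id))
    url = "/".join(parts)
    if params:
        # Sort the parameters so the same call always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_response(body):
    """Turn a JSON response body into Python dicts and lists."""
    return json.loads(body)
```

The response body would typically be fetched with an HTTP client such as `urllib.request.urlopen` and then handed to `parse_response`.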

Finally we’d like to thank the dozens of beta testers for their patience and helpful comments.

MG-RAST newsletter, September 2014

Wednesday, September 17th, 2014

Request for letter of support/testimonials
The MG-RAST team is applying for funding to support continued development on MG-RAST and to improve several aspects of the system. In particular, we need to update and improve the underlying data integration which has not changed significantly in the last five years. The funding will also be used to develop and upgrade the user interface.

We are asking you, as a member of our user community, to help us by providing a letter of support that addresses the potential of the project, along with a testimonial as to the value MG-RAST has brought to your research. Your opinion is very important to us, and we would like to include testimonials from the entire spectrum of MG-RAST users.

Your letter of support should be addressed to Dr. Folker Meyer at folker at and include your name, position and organization. It must be received before September 30, 2014 for inclusion in our proposal.

Click here to start

Thank you for your help,
Folker Meyer and the MG-RAST team

Recent highlights

– More server stability (new backend technology)

You may or may not have noticed several hardware related outages recently. Several of our servers are getting rather old and instead of replacing in kind (which is hard at our budget levels) we chose to move to newer more flexible technologies. Over the past 12 months, we have re-written the entire MG-RAST storage subsystem to rely on an object management system [SHOCK (] rather than a traditional file system.

Used together with the AWE resource management software, SHOCK allows the execution of the MG-RAST (and other) pipelines on a wide array of computational platforms. We have already provided cloud (aka Amazon EC2) machine images to a number of groups interested in providing their own computational resources for their data while analyzing the data with MG-RAST.

– More security

MG-RAST now supports encryption for your passwords. We note that the MG-RAST system does not offer secure connections (due to the increased hardware cost that would create).

Analysis Pipeline slowdown

You may have noticed that jobs submitted since June are taking longer to be processed. There are many contributing reasons — multiple hardware problems, move to the new Magellan cluster configuration, and preparing for the move to the SHOCK/AWE pipeline. In addition, there were some kinks in the new pipeline which are being worked out. The net result has been a large backlog in the compute queue and an increase in analysis time. Most of these issues have been resolved and we are working on resolving the last technical hurdles as quickly as we can.

Metadata issue

MG-RAST uses a controlled vocabulary for the metadata entries biome, feature, and material. This ontology, which was created and is controlled by EnvO (Environment Ontology), is available on the BioPortal site. Some terms from the latest EnvO version do not validate correctly on the MG-RAST website, resulting in an error message. We are revising our internal metadata validation process and until this is completed, you will need to select from compatible metadata terms from the list at: (or label=feature or label=material)
The list is returned as a JSON structure; use a viewer, e.g. Firefox + JSONView, to format it in a human-readable manner.
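If no browser plugin is at hand, the same formatting can be done with a few lines of Python. This is a minimal sketch using the standard `json` module; the function name is hypothetical, and the shape of the term list returned by the server is not assumed here.

```python
import json


def pretty(raw_json):
    """Re-indent a JSON string so that it is easy to read."""
    return json.dumps(json.loads(raw_json), indent=2, sort_keys=True)
```

Passing the downloaded text to `pretty` and printing the result gives one key or list entry per line; running `python -m json.tool` on a saved file achieves the same from the command line.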

Publishing MG-RAST IDs

If you cite an MG-RAST ID in a publication you are responsible for making the data public on the MG-RAST site. All data in MG-RAST is private by default and has to be made public by explicit action of the owner. The declaration made during the submission process of the intention to make data public is only used to assign priority for the compute queue; your data will not be made public automatically based on the date entered.

The MG-RAST manual ( and linked from the front page) has more information in Section 4.11.

If you provide links to public datasets in a publication, use the linkin.cgi mechanism, not the URL displayed by the browser.
For example for the public dataset with MG-RAST ID 4440283.3 the linkin URL is:
and for the public project with project ID 128 the linkin URL is:
These URLs provide a stable method to link to public datasets in MG-RAST.

Citing MG-RAST

If you use our service for analysis or to make your data public, please cite:
The Metagenomics RAST server — A public resource for the automatic phylogenetic and functional analysis of metagenomes
F. Meyer, D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards
BMC Bioinformatics 2008, 9:386

UI Changes/Bug fixes

– Firefox v30 problems: upload broken, minor visual glitches
– metadata file validation error for metadata files with integer metagenome names
– collections removed from caching
– fix missing list for jobs in progress
– visual updates to job progress

Folker Meyer
and the MG-RAST team

MG-RAST data migration, September 2, 2014

Tuesday, September 2nd, 2014

MG-RAST is moving from using a traditional file system to store data to an object management system SHOCK ( The data involved includes but is not limited to sequence files, intermediate analysis pipeline outputs and annotation products.

The change will take place in the backend and the website should not be impacted; all webpages and analyses should work normally. However, if you do notice a problem with the website, please let us know as soon as possible with all relevant details.

This change will have an impact on the MG-RAST ftp site. While most public datasets will remain available in the short term, projects which have been made public recently will not be accessible at all. We are working on a permanent long-term solution to this problem and should have it in place shortly.

We thank you for your patience and understanding.

— the MG-RAST team

MG-RAST Newsletter, June 2014

Wednesday, June 18th, 2014


Recent highlight
MG-RAST has crossed the 400 billion sequences annotated threshold, that is
400×10^9 sequences. The estimated BLAST cost for this would have exceeded
100 million US dollars on Amazon’s EC2 cloud*.

* Following the calculations in Angiuoli et al, BMC Bioinformatics 2011
(DOI 10.1186/1471-2105-12-356) and Wilkening et al, IEEE Cluster 2009
(DOI: 10.1109/CLUSTR.2009.5289187).

User manual [NEW FEATURE]
We are transitioning our user help documentation from the blog system currently used
to a user manual in PDF format, mirrored by a “traditional” web site.
The PDF is ready today at:
We will keep updating and revising the user manual without changing this URL.

Dangers of account sharing [WARNING]
We are experiencing issues caused by multiple individuals sharing a single account.
Please note we cannot tell who the legitimate owner of an account is in cases like
this. If you share your password with another person, they can change it, thus taking
away access to your account and your data. The built-in sharing feature is a better way
to share data with someone. The metagenome overview and project pages have
‘share’ buttons, just type in a valid email address for the person you want to share with
(it does not matter if they have an MG-RAST account or not) to grant access to your
data. The user manual (linked above) has more details in section 4.11.

As part of the next version of MG-RAST we have created a REST application programmers’
interface. The API provides access to all public and private data in MG-RAST from many
programming languages; access to private data requires authentication.
Check out details at:

New version of MG-RAST [UPDATE]

As some of you may have heard, we are working hard on a new version of MG-RAST to
be released later this year. It will include a number of new features, including better
support for metatranscriptomics, a browser independent user interface, and updated
annotations — we will update old jobs AND archive existing results. With this new version
we will also start automatic rolling updates of annotations for all jobs.

UI Changes/Bug fixes [UPDATE]
– fixed database issues caused by sequence file names containing unacceptable characters by modifying the file name filtering step,
– removed size restrictions for the clustering stage, allowing more efficient handling of extremely large datasets,
– several minor bug fixes improving the stability of the existing pipeline
– resolved several minor issues to improve usability of the UI

Folker Meyer
and the MG-RAST team