MG-RAST-CLOUD (or MG-RAST version 3) is ready! [March 1, 2011]

March 1st, 2011 by paczian

It is done! After months of work and countless useful interactions with many of our users, we are finally releasing the latest version of MG-RAST on Tuesday, March 8, 2011.

The previous version (MG-RAST 2.0) was released in 2008 and has been used to analyze over 14,000 metagenomic data sets. More than 2,000 users from more than 20 countries have submitted data.

The new release of MG-RAST builds on v2’s capabilities and adds a number of new features, including:

  • scalability to Illumina-sized data sets with reads of 75 bp and longer, with robust quality control
  • comprehensive support for metadata and metadata-driven discovery of data sets
  • unparalleled data extraction capabilities
  • cloud support for back-end computing

What can you expect?

NEW DATA UPLOAD CAPABILITIES: Support for data sets in SFF, FASTQ and FASTA format. The new server has been designed to handle reads of 75 bp and longer, up to complete contig length. The server has been tested with individual data sets of up to 50 Gbp. We recommend uploading raw, unfiltered data, as MG-RAST will perform the QC steps required to clean up your data. The server also supports assembled data sets.


GSC COMPLIANT METADATA: With many thousands of data sets in the system (many of them publicly available or in the process of being released), metadata is becoming more and more important for navigating the server. Version 3.0 includes support for GSC metadata describing the sample. Users can enter metadata at the time of submission, or later, before sharing or publishing. MIMS (the minimal information about a metagenome) or MIMARKS metadata is required before sharing or publishing any data set on MG-RAST.


COMPREHENSIVE DATABASE. V3 includes annotation resources such as SEED, KEGG, GO, INSDC, COGs, eggNOGs and IMG. Databases are updated every 3 months. Each data set is annotated with the database versions used to analyze it. Searching all these databases makes it possible to produce abundance profiles for COG categories, SEED subsystems and KEGG pathways, all based on the same computational analysis.


FEATURE PREDICTION. Protein coding genes are identified using FragGeneScan, an ab-initio gene caller (Rho, M et al., NAR, 2010, PMID: 20805240), which determines the most likely reading frame and frame shifts for each sequence. The similarity comparisons are then performed on the translated sequences, making the comparison both evolutionarily sensitive and computationally efficient. FragGeneScan will identify multiple genes (called features) on lengthy fragments, permitting MG-RAST to annotate assembled contigs as well as short fragments.


CLUSTERING AND ASSEMBLY SUPPORT. V3 performs initial clustering of 90% identical protein fragments using uclust. During this operation we store the number of reads in each cluster to preserve abundances.
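
To illustrate why storing the number of reads per cluster preserves abundances, here is a minimal Python sketch. It is not the MG-RAST implementation: the function name, the read and cluster identifiers and the "unannotated" label are all hypothetical, and it assumes each cluster receives a single annotation from its representative sequence.

```python
from collections import Counter


def cluster_abundances(read_to_cluster, cluster_to_annotation):
    """Count reads per annotation, weighting each cluster by its size.

    read_to_cluster: maps every read ID to the cluster it was placed in.
    cluster_to_annotation: maps a cluster ID to the annotation assigned
        to that cluster (hypothetical; clusters may be missing).
    """
    # Number of reads that fell into each cluster.
    cluster_sizes = Counter(read_to_cluster.values())

    abundance = Counter()
    for cluster_id, size in cluster_sizes.items():
        annotation = cluster_to_annotation.get(cluster_id, "unannotated")
        # Each cluster contributes its full read count, not just 1.
        abundance[annotation] += size
    return dict(abundance)


# Hypothetical toy data: five reads collapsed into three clusters.
read_to_cluster = {
    "read1": "c1", "read2": "c1", "read3": "c1",
    "read4": "c2",
    "read5": "c3",
}
cluster_to_annotation = {
    "c1": "Lysine Biosynthesis",
    "c2": "Lysine Biosynthesis",
}

profile = cluster_abundances(read_to_cluster, cluster_to_annotation)
```

Only one representative per cluster needs to be compared against the databases, yet the counts in the resulting profile still add up to the total number of reads.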

While version 3.0 supports the upload of assemblies (in FASTA format), it does not perform assemblies itself (v4.0 will provide a web-based assembly environment).


NEW USER INTERFACE AND TOOLS. Analyze your data and compare it to over 590 public metagenomes using a multitude of data visualization tools that allow for drill-down, data-driven sub-selection of reads and data export. Users can, for example, download all reads for Lysine Biosynthesis from Actinobacteria from a specific data set.


CLOUD COMPUTING. Cloud support for back-end computing gives MG-RAST the capability not only to speed up the analysis of data from current sequencing platforms, but also to handle third-generation sequence data. Platforms are quickly moving from gigabytes to terabytes!

WHAT WILL HAPPEN TO MY EXISTING DATA: The MG-RAST team will migrate the public metagenomes first, followed by all private data sets. Existing sharing arrangements with other users will be retained.

THANK YOU BETA TESTERS: The MG-RAST team would like to thank the beta testers for their valuable feedback.