MG-RAST for the impatient (README 1st)

The MG-RAST pipeline performs quality control, protein prediction, clustering and similarity-based annotation on nucleic acid sequence datasets using a number of bioinformatics tools. MG-RAST was built to analyze large shotgun metagenomic datasets ranging in size from megabases to terabases. We also support amplicon (16S, 18S, ITS, …) sequence datasets and metatranscriptome (RNA-seq) sequence datasets. The current MG-RAST pipeline cannot predict coding regions from eukaryotes and thus will be of limited use for eukaryotic shotgun metagenomes and/or the eukaryotic subsets of shotgun metagenomes.

Data on MG-RAST is private to the submitting user unless shared with other users or made public by the user. We strongly encourage the eventual release of data and require metadata (“data describing data”) for data sharing or publication. Data submitted with metadata will be given priority for the computational queue.

You need to provide (raw or assembled) nucleotide sequence data and sample descriptions (“metadata”). The system accepts sequence data in FASTA, FASTQ and SFF format and metadata in the form of GSC standard-compliant checklists (see Yilmaz et al., Nature Biotech, 2011). Data can be uploaded via either the web interface or a command line tool. Data and metadata are validated after upload.
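
Before uploading, it can help to confirm that a file looks like one of the accepted formats. The following is a minimal sketch, not part of MG-RAST: the function name guess_sequence_format is made up for illustration, and the check only inspects the first bytes of the file (">" for FASTA, "@" for FASTQ, the ".sff" magic bytes for SFF). It does not replace the validation MG-RAST performs after upload.

```python
def guess_sequence_format(path):
    """Guess FASTA, FASTQ, or SFF from the first bytes of a file.

    Illustrative helper only; it does not validate the whole file.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == b".sff":        # SFF files begin with the magic bytes ".sff"
        return "sff"
    if head.startswith(b">"):  # FASTA header lines begin with ">"
        return "fasta"
    if head.startswith(b"@"):  # FASTQ header lines begin with "@"
        return "fastq"
    return "unknown"
```

Calling guess_sequence_format on a file returns one of "fasta", "fastq", "sff", or "unknown"; a result of "unknown" suggests the file may be compressed, empty, or in a format the system does not accept.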

There are three supported ways to upload data:

  • Use the graphical upload tool in the web interface if you are a new user and/or not comfortable with the command line.
  • If you are comfortable with the command line, you can try the scripts at https://github.com/MG-RAST/MG-RAST-Tools (note: they require installation).
  • If you are a programmer setting up your own system, try the native RESTful API at http://api.metagenomics.anl.gov
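
For programmers, a request to the RESTful API comes down to building a URL under the API base address and fetching it. The sketch below only builds the URL. The resource name "metagenome", the "verbosity" parameter, and the accession "mgmEXAMPLE" are assumptions or placeholders used for illustration; consult the API documentation for the resources and options that actually exist.

```python
from urllib.parse import quote, urlencode

API_BASE = "http://api.metagenomics.anl.gov"  # base address given in the text

def build_api_url(resource, identifier=None, **params):
    """Build a request URL of the form BASE/resource[/identifier][?key=value].

    The resource, identifier, and parameter names are supplied by the
    caller; this helper is hypothetical and does not know which ones exist.
    """
    parts = [API_BASE, quote(resource)]
    if identifier is not None:
        parts.append(quote(identifier))
    url = "/".join(parts)
    if params:
        # Sort parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url

# Placeholder accession and assumed parameter name, for illustration only.
print(build_api_url("metagenome", "mgmEXAMPLE", verbosity="metadata"))
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen from the Python standard library.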

If you are having upload issues, please bear in mind they might be related to your machine and note that hundreds of researchers are uploading into MG-RAST every week. So before you send a help-desk request you might want to double-check:

  • check the version of Firefox you are using (make sure it is very recent)
  • enable JavaScript and cookies
  • disable any additional add-ons (e.g. ad blockers)
  • restart your browser
  • in some cases restarting Windows has helped.

We have heard from researchers that some computers could not upload while the computer next to them worked fine. So as a final step please try your upload from another (recent and updated) computer.

 

You must choose quality control filtering options at the time you submit your job. MG-RAST provides several options for quality control (QC) filtering for nucleotide sequence data, including removal of artificial duplicate reads, quality-based read trimming, length-based read trimming, and screening for DNA of model organisms (or humans). These filters are applied before the data are submitted for annotation.
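
To give a feel for what two of these filters do, here is a simplified sketch of length-based filtering and duplicate removal. It is not the pipeline's implementation: the function names, the length bounds, and the rule of treating reads that share their first prefix_len bases as duplicates (with a default of 50) are all assumptions chosen for illustration.

```python
def length_filter(reads, min_len, max_len):
    """Keep (name, sequence) pairs whose length lies within the bounds."""
    return [(name, seq) for name, seq in reads
            if min_len <= len(seq) <= max_len]

def drop_duplicates(reads, prefix_len=50):
    """Keep the first read seen for each distinct sequence prefix.

    Assumption for illustration: two reads count as duplicates when
    their first `prefix_len` bases are identical (case-insensitive).
    """
    seen = set()
    kept = []
    for name, seq in reads:
        key = seq[:prefix_len].upper()
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept

# Toy reads, made up for this example.
toy_reads = [
    ("r1", "ACGTACGT"),
    ("r2", "ACGTACGT"),  # exact duplicate of r1
    ("r3", "ACG"),       # shorter than the minimum length used below
    ("r4", "TTTTGGGG"),
]
filtered = drop_duplicates(length_filter(toy_reads, min_len=5, max_len=100))
print([name for name, _ in filtered])  # ['r1', 'r4']
```

Running the length filter first keeps the duplicate check from spending time on reads that would be discarded anyway; the order used by MG-RAST itself may differ.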

The MG-RAST pipeline assigns an accession number and puts the data in a queue for computation. The similarity search step is computationally expensive. Small jobs can complete in a matter of hours, while large jobs can spend a week waiting in line for computational resources.

MG-RAST performs a protein similarity search between predicted proteins and database proteins (for shotgun) and a nucleic-acid similarity search (for reads similar to 16S and 18S sequences). These databases are searched.

MG-RAST presents the annotations via the tools on the analysis page which prepare, compare, display, and export the results on the website. The download page offers the input data, data at intermediate stages of filtering, the similarity search output, and summary tables of functions and organisms detected.

MG-RAST can compare thousands of datasets run through a consistent annotation pipeline. We also provide a means to view annotations in multiple namespaces (e.g. SEED functions, KO terms, COG classes, eggNOGs) via the M5NR.

Read the MG-RAST FAQ next. We also recommend “Metagenomics - a guide from sampling to data analysis” (PMID 22587947), Microbial Informatics and Experimentation, 2012, a review of best practices for experiment design.

Further reading and MG-RAST tutorial materials: