MG-RAST version 3.2 contains a number of usability updates. In addition, we have updated the database underlying our sequence similarity searches. All old jobs will be updated; new jobs are automatically run against the new database. The new version supports better handling of metadata and has a new uploader to help ease the data transfer process. A command line uploader is available for sites with low bandwidth. In addition to new quality control tools, the new version also includes support for metadata for the indoor environment (see Sloan Built Environment Program). Finally, MG-RAST is now maintained on GitHub under a BSD license.
Our upload process has been revamped to give the user more feedback and more control. You can upload and validate metadata via a template file and get instant feedback on the validity of your uploaded data. This allows the user to delete an uploaded file, upload a fixed version, and discover simple errors before submission. Users can also use command line uploads to a private (and secure) directory using tools like cURL. The files are uploaded to a private user inbox, where the user can delete and unpack them. The interface allows the user to pick and validate files for the different submission requirements: metadata, sequence file, and additional files. When all requirements are met, the user can perform the actual submission.
2. M5NR update
We have updated the underlying protein and DNA database (see m5nr). The newly incorporated databases include an updated GenBank, Phage (specs), fungal (specs), and others.
Note: we are using the final public release of KEGG. We cannot offer access to the current version, as it requires licensing.
The ability to calculate UniFrac and weighted UniFrac distances among MG-RAST 16S annotations will be available in all tools that use distance metrics: PCoA and Heatmap-dendrogram.
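To make the two distances concrete, here is a minimal Python sketch of unweighted and raw (non-normalized) weighted UniFrac on a toy tree. It is an illustration only and not MG-RAST's implementation: the tree, the sample counts, and all function names are hypothetical, and the tree is stored as a simple child-to-parent map with branch lengths.

```python
# Toy rooted tree (hypothetical): child -> (parent, branch length).
# The root has no entry of its own.
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 1.0),
    "D": ("n2", 3.0),
}


def branch_totals(tree, counts):
    """For each branch (keyed by its child node), sum the counts of all
    observed leaves that lie below it."""
    totals = {}
    for leaf, n in counts.items():
        node = leaf
        while node in tree:  # walk from the leaf up to the root
            totals[node] = totals.get(node, 0.0) + n
            node = tree[node][0]
    return totals


def unweighted_unifrac(tree, a, b):
    """Fraction of observed branch length that leads to only one sample."""
    ta, tb = branch_totals(tree, a), branch_totals(tree, b)
    unique = shared = 0.0
    for node, (_, length) in tree.items():
        in_a = ta.get(node, 0.0) > 0
        in_b = tb.get(node, 0.0) > 0
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    observed = unique + shared
    return unique / observed if observed else 0.0


def weighted_unifrac(tree, a, b):
    """Raw weighted UniFrac: branch lengths weighted by the difference in
    relative abundance below each branch. Both samples must be non-empty."""
    ta, tb = branch_totals(tree, a), branch_totals(tree, b)
    na, nb = sum(a.values()), sum(b.values())
    return sum(
        length * abs(ta.get(node, 0.0) / na - tb.get(node, 0.0) / nb)
        for node, (_, length) in tree.items()
    )


# Hypothetical 16S annotation counts per taxon for two samples.
sample_a = {"A": 10, "B": 5}
sample_b = {"B": 5, "C": 10}

print(unweighted_unifrac(TREE, sample_a, sample_b))
print(weighted_unifrac(TREE, sample_a, sample_b))
```

The unweighted distance only asks whether a branch is present in one sample or both, while the weighted distance also takes abundance into account, so two samples that share all taxa can still differ under the weighted metric.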
4. Quality visualizations
We have added additional data summarizing tools to the Overview page: a base-call visualization and a kmer spectrum summary. These permit the identification of certain classes of problems with datasets and provide annotation-free characterization of sequence diversity and potential for assembly. The kmer-rank-abundance visualization can be interpreted as coverage vs genome size, and will reveal if considerable amounts of the dataset are explained by small amounts of unique sequence.
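As a rough illustration of the idea behind a kmer rank-abundance summary, the following Python sketch counts kmers in a handful of reads, ranks them by abundance, and reports the cumulative fraction of the data explained at each rank. It is an assumption-laden toy rather than the MG-RAST pipeline: the function names, the example reads, and the choice of k are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring made only of A, C, G, T."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip kmers with N or other symbols
                counts[kmer] += 1
    return counts


def rank_abundance(counts):
    """Return (rank, abundance, cumulative fraction of all kmers) rows,
    ordered from the most to the least abundant kmer."""
    total = sum(counts.values())
    rows = []
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        rows.append((rank, n, running / total))
    return rows


# Hypothetical reads: two copies of a repetitive read plus one unique read.
reads = ["ACGTACGTACGT", "ACGTACGTACGT", "TTGACCA"]
rows = rank_abundance(kmer_counts(reads, k=4))

for rank, abundance, cumulative in rows:
    print(rank, abundance, round(cumulative, 3))
```

A cumulative fraction that climbs close to 1 within the first few ranks indicates that a small amount of unique sequence explains most of the dataset, which is the pattern the visualization is meant to reveal.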
5. Maintain MG-RAST code on GitHub in the future
We have switched all MG-RAST development to GitHub, creating a fully open version of MG-RAST at https://github.com/MG-RAST/
6. Incorporation of FungiDB
FungiDB, developed by Jason Stajich, has been incorporated into the analysis database of MG-RAST. FungiDB (http://FungiDB.org) is a functional genomic resource for pan-fungal genomes. This addition to the MG-RAST and MoBEDAC analysis servers provides valuable annotations for the classification and characterization of fungal sequences, which is important for the taxonomic and functional classification of microbial communities, especially in the built environment.
7. Built environment minimal metadata package accepted by the GSC and incorporated into MoBEDAC.
As the impact and prevalence of large-scale metagenomic surveys grows, so does the need for more complete and standards-compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. While environments such as outdoor and human have representation in the standards being developed, the built environment does not. This environment is extremely different from others, and only a limited number of existing terms are useful in its description, mostly those describing common elements of sample processing such as sequencing technology and library construction. The Sloan Foundation has established the Microbiology of the Built Environment (BE) program to uncover the complexity of microbial ecosystems of inside spaces. Bringing together researchers and architects, the Microbiomes of the Built Environment Data Analysis Core (MoBeDAC) is developing and coordinating a cohesive representation of the microbial community in built environments. MoBeDAC has established a working group to expand the GSC MIxS standard for microbial sequences collected from built environments. Samples collected, sequenced and annotated with MIxS-BE metadata from waste-water, air filters, air and surfaces of indoor spaces provide a rigorous and structured tool for analysis of microbial sequences and ecosystems of the indoor and outdoor environments.
The BE-MIxS core standard has been developed as a minimal metadata standard to establish a core set of terms to describe BE samples collected among the diverse BE projects. A core minimal standard provides a rich resource for comparative analysis across widely different built environments.