MG-RAST API Release of Version 1

September 17th, 2013 by folker

Hi everyone, we’ve updated our API to point to Version 1. If you need to access the previous version of our API, you can do so by going to:

http://api.metagenomics.anl.gov/beta/

It’s always a good idea to reference a version of the API in your code and in your publications, so you know where your data came from. Version 1 can now be reached at either the base API URL or by explicitly including the version number:

http://api.metagenomics.anl.gov/

http://api.metagenomics.anl.gov/1/
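One way to follow this advice in code is to keep the API version in a single constant and build every request URL from it, so the version your results came from is always explicit. The following is a minimal Python sketch. The helper name `api_url` is hypothetical; only the base URL and the version strings ("1", "beta") come from this post.

```python
# Pin the API version in one place so every request states which version it used.
API_BASE = "http://api.metagenomics.anl.gov/"
API_VERSION = "1"  # "beta" addresses the previous version of the API


def api_url(resource: str = "", version: str = API_VERSION) -> str:
    """Build an explicitly versioned URL: http://api.metagenomics.anl.gov/<version>/<resource>."""
    return f"{API_BASE}{version}/{resource.lstrip('/')}"


print(api_url("metadata/cv"))  # http://api.metagenomics.anl.gov/1/metadata/cv
```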

To help guide you when using the API, links to documentation are available at http://api.metagenomics.anl.gov/api.html

Thanks,
The MG-RAST Team

MG-RAST 3.3 release notes [December 12, 2012]

December 13th, 2012 by mark

New database:
The database underlying the MG-RAST analyses has been completely revamped and now uses a new schema running on new hardware.
The analysis results for all existing metagenomes in MG-RAST were ported to the new database, and all code was modified to use the new schema.

Upload/Submission:
Changed to remove confusing text and simplify the interface. Also, processing was moved off the Upload page and onto the compute cluster, so the Upload page should hang less and users will be informed more quickly about what is happening.

Overview page:
We added a ‘delete’ button on the overview page that deletes a metagenome from MG-RAST completely. This button is displayed only to some users, usually the owner of a metagenome. Datasets that have been made public cannot be deleted. Please be careful when using this function: once a dataset has been deleted, it cannot be recovered.

Miscellaneous:
Misc. changes and bug fixes

MG-RAST newsletter, September 2014

September 17th, 2014 by mark

************************************************
Request for letter of support/testimonials
************************************************
The MG-RAST team is applying for funding to support continued development on MG-RAST and to improve several aspects of the system. In particular, we need to update and improve the underlying data integration which has not changed significantly in the last five years. The funding will also be used to develop and upgrade the user interface.

We are asking you, as a member of our user community, to help us by providing a letter of support that addresses the potential of the project, along with a testimonial on the value MG-RAST has brought to your research. Your opinion is very important to us; we would like to include testimonials from the entire spectrum of MG-RAST users.

Your letter of support should be addressed to Dr. Folker Meyer at folker at anl.gov and include your name, position and organization. It must be received before September 30, 2014 for inclusion in our proposal.


Thank you for your help,
Folker Meyer and the MG-RAST team

************************************************
Recent highlights
************************************************

– More server stability (new backend technology)

You may or may not have noticed several hardware-related outages recently. Several of our servers are getting rather old, and instead of replacing them in kind (which is hard at our budget levels) we chose to move to newer, more flexible technologies. Over the past 12 months, we have rewritten the entire MG-RAST storage subsystem to rely on an object management system [SHOCK (https://github.com/MG-RAST/Shock)] rather than a traditional file system.

Used together with the AWE resource management software, SHOCK allows the execution of the MG-RAST (and other) pipelines on a wide array of computational platforms. We have already provided cloud (aka Amazon EC2) machine images to a number of groups interested in providing their own computational resources for their data while analyzing the data with MG-RAST.

– More security

MG-RAST now supports encryption for your passwords. We note that the MG-RAST system does not offer secure connections (due to the increased hardware cost that would create).

************************************************
Analysis Pipeline slowdown
************************************************

You may have noticed that jobs submitted since June are taking longer to be processed. There are many contributing reasons — multiple hardware problems, move to the new Magellan cluster configuration, and preparing for the move to the SHOCK/AWE pipeline. In addition, there were some kinks in the new pipeline which are being worked out. The net result has been a large backlog in the compute queue and an increase in analysis time. Most of these issues have been resolved and we are working on resolving the last technical hurdles as quickly as we can.

************************************************
Metadata issue
************************************************

MG-RAST uses a controlled vocabulary for the metadata entries biome, feature, and material. This ontology, created and maintained by the EnvO (Environment Ontology) project, is available on the BioPortal site. Some terms from the latest EnvO version do not validate correctly on the MG-RAST website, resulting in an error message. We are revising our internal metadata validation process; until this is completed, you will need to select compatible metadata terms from the list at:
http://api.metagenomics.anl.gov/1/metadata/cv?label=biome (or label=feature or label=material)
The list is returned as a JSON structure; use a viewer (e.g. Firefox with the JSONView extension) to format it in a human-readable manner.
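If you prefer a script to a browser viewer, the list can be fetched and re-indented with Python's standard library. This is a minimal sketch. The helper names (`cv_url`, `pretty`, `fetch_cv`) are hypothetical, and the code assumes only that the endpoint above returns JSON. It makes no assumption about how that JSON is structured, and the 2014-era address may no longer respond.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CV_ENDPOINT = "http://api.metagenomics.anl.gov/1/metadata/cv"
CV_LABELS = ("biome", "feature", "material")  # the three controlled fields named above


def cv_url(label: str) -> str:
    """Build the controlled-vocabulary URL for one of the supported labels."""
    if label not in CV_LABELS:
        raise ValueError(f"label must be one of {CV_LABELS}")
    return f"{CV_ENDPOINT}?{urlencode({'label': label})}"


def pretty(json_text: str) -> str:
    """Re-indent raw JSON so it is human-readable, as a browser JSON viewer would."""
    return json.dumps(json.loads(json_text), indent=2, sort_keys=True)


def fetch_cv(label: str) -> str:
    """Download the list for `label` and return it pretty-printed (needs network access)."""
    with urlopen(cv_url(label)) as resp:
        return pretty(resp.read().decode("utf-8"))
```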

************************************************
Publishing MG-RAST IDs
************************************************

If you cite an MG-RAST ID in a publication, you are responsible for making the data public on the MG-RAST site: all data in MG-RAST is private by default and has to be made public by explicit action of the owner. The declaration made during the submission process of the intention to make data public is only used to assign priority in the compute queue; your data will not be made public automatically based on the date entered.

The MG-RAST manual (ftp://ftp.metagenomics.anl.gov/data/manual/mg-rast-manual.pdf and linked from the front page) has more information in Section 4.11.

If you provide links to public datasets in a publication, use the linkin.cgi mechanism, not the URL displayed by the browser.
For example, for the public dataset with MG-RAST ID 4440283.3 the linkin URL is:

http://metagenomics.anl.gov/linkin.cgi?metagenome=4440283.3

and for the public project with project ID 128 the linkin URL is:

http://metagenomics.anl.gov/linkin.cgi?project=128

These URLs provide a stable method to link to public datasets in MG-RAST.
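When generating links for many datasets, a small helper keeps them in the linkin form shown above. This is a minimal Python sketch. The function name `linkin_url` is hypothetical; the base URL and the `metagenome`/`project` query parameters are the ones given in this post.

```python
from typing import Optional
from urllib.parse import urlencode

LINKIN_BASE = "http://metagenomics.anl.gov/linkin.cgi"


def linkin_url(metagenome: Optional[str] = None, project: Optional[int] = None) -> str:
    """Return the stable linkin URL for exactly one public metagenome or project."""
    if (metagenome is None) == (project is None):
        raise ValueError("pass exactly one of metagenome or project")
    query = {"metagenome": metagenome} if metagenome is not None else {"project": project}
    return f"{LINKIN_BASE}?{urlencode(query)}"


print(linkin_url(metagenome="4440283.3"))
print(linkin_url(project=128))
```

Metagenome IDs are passed as strings rather than numbers so that an ID such as 4440283.3 is reproduced exactly and never reformatted as a float.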

************************************************
Citing MG-RAST
************************************************

If you use our service for analysis or to make your data public, please cite:
The Metagenomics RAST server — A public resource for the automatic phylogenetic and functional analysis of metagenomes
F. Meyer, D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards
BMC Bioinformatics 2008, 9:386

http://www.biomedcentral.com/1471-2105/9/386

************************************************
UI Changes/Bug fixes
************************************************

Firefox v30 problems: upload broken, minor visual glitches
metadata file validation error for metadata files with integer metagenome names
collections removed from caching
fix missing list for jobs in progress
visual updates to job progress

Regards,
Folker Meyer
and the MG-RAST team

MG-RAST data migration, September 2, 2014

September 2nd, 2014 by mark

MG-RAST is moving its data storage from a traditional file system to an object management system, SHOCK (https://github.com/MG-RAST/Shock). The data involved includes, but is not limited to, sequence files, intermediate analysis pipeline outputs and annotation products.

The change will take place in the backend and the website should not be affected: all webpages and analyses should work normally. However, if you do notice a problem with the website, please let us know as soon as possible with all relevant details.

This change will have an impact on the MG-RAST ftp site. While most public datasets will remain available in the short term, projects which have been made public recently will not be accessible at all. We are working on a permanent long-term solution to this problem and should have it in place shortly.

We thank you for your patience and understanding.

— the MG-RAST team

MG-RAST Newsletter, June 2014

June 18th, 2014 by mark


Recent highlight
MG-RAST has crossed the threshold of 400 billion annotated sequences, that is
400×10^9 sequences. The estimated BLAST cost for this would have exceeded
100 million US dollars on Amazon’s EC2 cloud*.

* Following the calculations in Angiuoli et al, BMC Bioinformatics 2011
(DOI 10.1186/1471-2105-12-356) and Wilkening et al, IEEE Cluster 2009
(DOI: 10.1109/CLUSTR.2009.5289187).
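For scale, dividing the cost estimate by the number of annotated sequences gives the implied average BLAST cost per sequence. Because the total is stated as a lower bound ("exceeded"), the per-sequence figure is also a lower bound, i.e. more than 0.025 US cents per sequence:

```latex
\frac{100 \times 10^{6}\ \text{USD}}{400 \times 10^{9}\ \text{sequences}} = 2.5 \times 10^{-4}\ \text{USD per sequence}
```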


User manual [NEW FEATURE]
We are transitioning our user help documentation from the blog system currently used
to a user manual in PDF format, mirrored by a “traditional” web site.
The PDF is ready today at:
ftp://ftp.metagenomics.anl.gov/data/manual/mg-rast-manual.pdf
We will keep updating and revising the user manual without changing this URL.

Dangers of account sharing [WARNING]
We are experiencing issues caused by multiple individuals sharing a single account.
Please note we cannot tell who the legitimate owner of an account is in cases like
this. If you share your password with another person, they can change it, thus taking
away access to your account and your data. The built-in sharing feature is a better way
to share data with someone. The metagenome overview and project pages have
‘share’ buttons, just type in a valid email address for the person you want to share with
(it does not matter if they have an MG-RAST account or not) to grant access to your
data. The user manual (linked above) has more details in section 4.11.

API [NEW FEATURE]
As part of the next version of MG-RAST we have created a REST application programming
interface (API). The API can be used from many programming languages and provides access
to all public and private data in MG-RAST; access to private data requires authentication.
Check out details at: http://api.metagenomics.anl.gov/api.html
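The following is a minimal sketch of preparing such a request with Python's standard library. The resource path you pass in and the name of the authentication header are assumptions made only for illustration (`AUTH_HEADER` is hypothetical). Consult the documentation linked above for the actual resources and authentication scheme. As described above, a token is only needed for private data.

```python
from typing import Optional
from urllib.request import Request, urlopen

API_ROOT = "http://api.metagenomics.anl.gov/1/"
AUTH_HEADER = "auth"  # HYPOTHETICAL header name; check the API documentation


def build_request(resource: str, token: Optional[str] = None) -> Request:
    """Prepare a GET request; a token is attached only when one is given (private data)."""
    headers = {AUTH_HEADER: token} if token else {}
    return Request(API_ROOT + resource.lstrip("/"), headers=headers)


def fetch_text(resource: str, token: Optional[str] = None) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(resource, token)) as resp:
        return resp.read().decode("utf-8")
```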

New version of MG-RAST [UPDATE]

As some of you may have heard, we are working hard on a new version of MG-RAST to
be released later this year. It will include a number of new features, including better
support for metatranscriptomics, a browser-independent user interface, and updated
annotations: we will update old jobs AND archive existing results. With this new version
we will also start automatic rolling updates of annotations for all jobs.

UI Changes/Bug fixes [UPDATE]
– fixed database issues caused by sequence file names containing unacceptable characters, by modifying the file name filtering step
– removed size restrictions for the clustering stage, allowing more efficient handling of extremely large datasets
– several minor bug fixes improving the stability of the existing pipeline
– resolved several minor issues to improve usability of the UI

Regards,
Folker Meyer
and the MG-RAST team

MG-RAST 3.3.6 release notes (API changes and new Search implementation) [July 2013]

July 31st, 2013 by folker

User documentation and tech report available

We have made available a PDF containing a user manual and a tech report that describe MG-RAST in detail.

New search function

As many users have no doubt noticed, the search function prior to version 3.3.6 was overwhelmed by the amount of data in the system. We have implemented a new search function that replaces the old capabilities and is significantly faster.

Creation of collections from search results

To create a collection from search results, first select the metagenomes with the checkboxes and then click the “create collection” button. The collections created are displayed in the metagenome selection widget on the analysis page, where they can be used for comparison, either as individual metagenomes or as a group.

Anonymous reviewer access

To grant reviewers access to a project while preserving their anonymity we added a ‘Create Reviewer Access Token’ button on the project page which is visible when you click on the ‘Share Project’ link. This generates a token that can be sent to the publisher to pass on to reviewers who can use the included link to get anonymous access to the project. The number of reviewers who have accessed the project will be displayed to the owner in the list of users the project is shared with, but the identity of the reviewers is not disclosed. The owner of the project can revoke the token at any time to disable access.

Some changes to the programming interface (API) [still beta]

We have made some changes to the programming interface that will affect existing deployed clients and third-party tools. We will notify all known API users separately.

Misc. changes and bug fixes

Many miscellaneous changes and bug fixes have been implemented; see the GitHub pull request for details: https://github.com/MG-RAST/MG-RAST/pull/391

MG-RAST v3 tech-report and manual available

June 19th, 2013 by folker

The tech-report and manual for MG-RAST version 3.3 is available for download.

MG-RAST 3.2.5 release notes [November 2, 2012]

November 14th, 2012 by mark

Analysis page:
High-resolution images are now displayed scaled to 800×800 pixels; the raw image size has not been changed.
LCA: the tree display was modified for cases where multiple datasets are selected.

Upload/Submission:
Metadata template updated
MetaZen made available. MetaZen is a web-based tool for entering metadata into a spreadsheet and is an alternative to downloading and filling in the metadata template. You enter the metadata through a webpage, and MetaZen returns it formatted as a metadata spreadsheet, which can be edited further if necessary and then uploaded to MG-RAST.

Download:
The download page was modified to make it simpler to use and the download mechanism was changed to make downloads of large files more efficient.

Miscellaneous
Misc. changes and bug fixes

MG-RAST 3.2.4 release notes [October 2012]

October 23rd, 2012 by mark

Analysis page:
Barchart, tree, heatmap and PCoA visualizations have been added for the Lowest Common Ancestor analyses.

Upload page:
The upload page was modified to simplify the layout, grouping common functions together. The error handling was changed to make the messages displayed human readable and indicate the actions necessary to remedy the problems found.

Merging mate-pairs:

[Changed, see this FAQ entry for the current procedure]

The new ‘merge mate-pairs’ function on the Upload page allows users to upload and merge two separate fastq files that contain ordered paired-end reads from the same sequencing run. The fastq-join utility (http://code.google.com/p/ea-utils/wiki/FastqJoin) is used to merge mate-pairs with a minimum overlap of 8 bp and a maximum difference of 10% (parameters: -m 8 -p 10). Mate-pairs that could not be merged are then joined by appending 10 N’s to the first read, followed by the reverse complement of the paired read. The results are combined into a single output file, which can be submitted to MG-RAST for analysis.
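The joining rule for unmerged pairs can be sketched in a few lines of Python. This sketch only illustrates the rule described above: the first read, then 10 N's, then the reverse complement of the second read. It is not MG-RAST's implementation. The function names are hypothetical, and quality strings are left out because the post does not say how they are handled.

```python
# Complement for A/C/G/T and N in either case (other IUPAC codes pass through unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_unmerged(read1: str, read2: str, spacer: int = 10) -> str:
    """Join an unmerged mate pair: read1, then `spacer` N's, then reverse complement of read2."""
    return read1 + "N" * spacer + reverse_complement(read2)


print(join_unmerged("ACGT", "AACC"))  # ACGT + 10 N's + GGTT
```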

Preprocessing:
In the preprocessing pipeline options, Sus scrofa, NCBI v10.2 has been added to the list of species available for screening using bowtie.

Overview page:
The kmer profile and nucleotide position histogram are now displayed for amplicon datasets.

Bug fixes:
Miscellaneous bug fixes.

Announcing DRISEE our new tool to describe sequencing error [June 19, 2012]

June 19th, 2012 by folker

This is the first in a series of posts about DRISEE (duplicate read inferred sequence error estimation), our new tool to estimate the amount of “noise” in metagenomic data sets. As we find new things or the tool changes, we will blog about it here in the MG-RAST blog. Our manuscript in PLOS Computational Biology describes the procedure in detail. We note that DRISEE provides a vendor-independent quality score for your sequence libraries, provided that you are using shotgun metagenomic data, that you have not removed duplicate reads or assembled the data, and that the data do not contain adaptor contamination. The software is open source and available on GitHub. We are currently computing DRISEE scores for all data sets in MG-RAST; once computed, they will appear on the overview page for your metagenome. That page will also show how the quality of your data set compares with all other data sets in MG-RAST. We have already received some feedback on the paper and would like to clarify some statements. We’ve received a number of questions regarding figure 3 (reproduced below):

Questions center on the conspicuous difference in the distributions of average DRISEE scores between 454 and Illumina. We would not interpret this as an indication that Illumina is inherently more error prone than 454. The distribution of average DRISEE values for the data sets presented in the paper gives a clear indication that the overall quality of the Illumina samples is much lower, but this is only true for the small subset of data shown in the manuscript. The samples selected for the study were a random subset of those publicly available through MG-RAST at the time the method was developed (some 500 metagenomes at the time; more than 10,000 public data sets are available now). It is probably true that most early Illumina data sets had relatively low quality compared to later studies with Illumina technology. Below you’ll see examples of the Phred (blue) and DRISEE (red) error for four Illumina data sets, selected from the low and high ends of the average DRISEE errors observed across all publicly available WGS datasets in MG-RAST. These represent the two extremes (low: top two examples; high: bottom two examples):

There are many datasets that exhibit a DRISEE error that approaches the Phred/Q error for the same sample. In other cases, DRISEE error greatly exceeds Phred/Q values. We see similar patterns in 454 samples. At present, we would not say any one technology is more error prone than another. We can say that there are dramatic differences in DRISEE-based error rates from one sample to the next. So it appears that the error is not inherent to the technology but is instead induced in some way by the operator or the sample.
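To make the comparison concrete, the following Python sketch computes the two quantities side by side. The first is the Phred error probability implied by a quality score, p = 10^(-Q/10). The second is a heavily simplified duplicate-read error estimate in the spirit of DRISEE. The sketch is an illustration only, not the published DRISEE software, whose procedure is described in the manuscript and the code on GitHub. The function names, the prefix length and the minimum bin size are hypothetical, and the small default values are chosen only so the toy example works.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def duplicate_read_error(reads: List[str], prefix_len: int = 4, min_bin: int = 3) -> Dict[int, float]:
    """Simplified duplicate-read error estimate (illustration only, not DRISEE itself).

    Reads sharing an identical prefix are treated as copies of one template.
    At each position after the prefix, bases that disagree with the bin's
    majority (consensus) base are counted as errors. Returns the error rate
    per read position, pooled over all bins with at least `min_bin` reads.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            bins[read[:prefix_len]].append(read)

    mismatches: Counter = Counter()
    totals: Counter = Counter()
    for members in bins.values():
        if len(members) < min_bin:
            continue  # too few copies to call a reliable consensus
        for pos in range(prefix_len, max(len(m) for m in members)):
            column = [m[pos] for m in members if len(m) > pos]
            # Majority base; ties are broken by first occurrence.
            _, agree = Counter(column).most_common(1)[0]
            totals[pos] += len(column)
            mismatches[pos] += len(column) - agree
    return {pos: mismatches[pos] / totals[pos] for pos in sorted(totals)}


reads = ["AAAACGT", "AAAACGT", "AAAACTT", "AAAACGT", "CCCCAAA"]
print(duplicate_read_error(reads))  # {4: 0.0, 5: 0.25, 6: 0.0}
print(phred_to_error(30))
```

In this toy input, the single read starting with CCCC forms a bin that is too small and is ignored. The one disagreeing base at position 5 yields an error rate of 0.25 there, which could then be compared with the Phred-implied error at the same position.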