MG-RAST newsletter, August 2015

—————————————————————————————————————————-
MG-RAST newsletter, August 2015 (and MG-RAST v3.6 Release Notes)
—————————————————————————————————————————-

  1. Funding update
  2. Better normalization using DESeq
  3. More servers and back-end redundancy
  4. New upload, better metadata guidance
  5. Feature removal: removal of recruitment plot
  6. Change in policy: MG-RAST IDs and ACCESSION numbers only after making data public
  7. Change in policy: New algorithm for assigning job priority

 

1. Funding update
*******************

There are very few funding opportunities (calls for which we can write proposals) to support the MG-RAST system. A system at the scale of MG-RAST is not static; it evolves constantly to address emerging issues such as changing sequencing technologies, computer security, and the continuously increasing volume of data and submissions.

The lack of funding opportunities threatens our ability to address future technical and scientific issues. With the end of one recent grant, we have lost our help-desk person. As a consequence, please be patient when sending requests to the help desk: we are trying to crowdsource that work inside the team, but it is hard.

We are devoting time to updating our manual to assist with navigating and analyzing data in MG-RAST. Please check to see if the manual or our FAQs can answer your question(s) before contacting the help desk.

2. Improve normalization of abundance values using DESeq
****************************************************************

We are now offering the DESeq package as the default choice for normalizing data.

Formerly, the MG-RAST pipeline normalization procedure employed a simple log transformation of the data, as a large number of biological variables exhibit a log-normal distribution. We have incorporated DESeq, as it has been shown to outperform other methods of normalization – in particular, those that use any sort of linear scaling (PMID: 24699258).
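
To make the difference concrete, here is a minimal Python sketch that contrasts a simple log transformation with DESeq-style size-factor scaling (the median of each sample's ratios to the per-feature geometric means). This newsletter does not describe how the pipeline calls DESeq, so the function names, the log2(count + 1) form, and the handling of zero counts are assumptions made for illustration, not MG-RAST code.

```python
import math
from statistics import median


def log_transform(counts):
    """Simple log transformation; the +1 pseudocount is an assumption."""
    return [[math.log2(c + 1) for c in sample] for sample in counts]


def size_factors(counts):
    """DESeq-style size factors (median of ratios).

    counts: list of samples, each a list of raw counts per feature,
    with features in the same order in every sample.
    """
    n_features = len(counts[0])
    # Only features with non-zero counts in every sample have a defined log.
    usable = [j for j in range(n_features) if all(s[j] > 0 for s in counts)]
    if not usable:
        raise ValueError("no feature has non-zero counts in every sample")
    # Log of the geometric mean of each usable feature across samples.
    log_geo_mean = {
        j: sum(math.log(s[j]) for s in counts) / len(counts) for j in usable
    }
    # One factor per sample: median ratio of its counts to the geometric means.
    return [
        math.exp(median(math.log(s[j]) - log_geo_mean[j] for j in usable))
        for s in counts
    ]


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]
```

In this sketch, a sample that was sequenced twice as deeply as another (all counts doubled) receives a size factor twice as large, so both samples end up with the same normalized values.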

3. MG-RAST now has multiple back-end servers and will (mostly) fail over automatically
*******************************************************************************************

The team has spent a lot of time updating the MG-RAST backend to shiny new technologies like CoreOS, FleetD etc. As a result, we can now run multiples of most servers, providing more scalability and failover if and when things go wrong. These new changes are rolled into production with version 3.6 (at the time of this email). There are no pipeline changes, just backend infrastructure changes to help keep the system running smoothly.

 

4. New Upload, better metadata support via improved MetaZen
*****************************************************************

4.1 Upload Changes

MG-RAST can now handle de-multiplexing of SFF and FASTQ files. A new visual frontend provides easier access and faster processing. In addition to the updated web front-end, there is a downloadable script that lets users submit large projects (ftp://ftp.metagenomics.anl.gov/tools/uploader/) and our RESTful API (http://api.metagenomics.anl.gov/api.html#inbox) provides upload functions as well. We encourage users to take a look at the scripts before trying the native API.

We have also changed the backend for file uploading to make it faster. Data is now uploaded directly to the SHOCK data store (instead of a slower secondary store). Consequently, if you are using the command line upload option, some procedural changes are required.

Please note: The main upload API remains the same; however, the convenience scripts that we had made available to some users will cease to function. We provide new scripts for mass upload.

The new procedure for uploading is documented in the manual: ftp://ftp.metagenomics.anl.gov/manual.pdf

4.2 Metadata: Latest ENVO support now available

We have updated to the most recent version of ENVO. Previously, changes at a third-party API provider had forced us to use an older ENVO version. We are now up to date and support multiple versions of ENVO on a per-project basis. This ensures that (meta-)data added to a project later can conform to the same standards as the existing data. URL: https://bioportal.bioontology.org/ontologies/ENVO

5. Features removed: recruitment plot
***************************************

MG-RAST web-based analysis tools use our API to retrieve data. Since the recruitment plot tool had not been updated to use the API, which is our standard method, we have removed it. Resources allowing, we will create a version that uses the API in the future.

6. Change in Operations: new procedure for obtaining MG-RAST Identifiers
******************************************************************************

MG-RAST IDs will continue to be assigned automatically, but at a different time point.

Some users are confused about the status of their submissions, assuming them to be public when in fact they are not. As a consequence, we have altered our procedures a bit. We will no longer provide MG-RAST IDs immediately; instead, the MG-RAST IDs will be made available when the data is made public.

*************************************************************************************
*   NO MG-RAST IDs UNTIL THE DATA IS MADE PUBLICLY AVAILABLE.            *
*************************************************************************************

In addition, we are no longer displaying MG-RAST IDs until the data is public, instead displaying a link to make the data public. In several places we will no longer display the MG-RAST ID (e.g. 4447102.3) and will instead display the job number.

7. Change in policy: New algorithm for assigning job priority
**************************************************************

With the next update release of MG-RAST (3.6.1) we will introduce a change in the scheduling policy. Users with data in their accounts that is overdue for publication in MG-RAST will ALWAYS receive the lowest priority. We will provide tools to assist users with making data public.

As a service to users we will provide reminders for data sets that need to be made public.
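
The scheduler itself is not described in this newsletter. As a rough illustration of the stated rule only, the following Python sketch puts every job whose owner has overdue data behind all other jobs. The `Job` fields, the numeric priority scale, and the function names are hypothetical and chosen for this example.

```python
from dataclasses import dataclass

LOWEST_PRIORITY = 0  # hypothetical scale: a larger number is scheduled sooner


@dataclass
class Job:
    job_id: int
    owner: str
    base_priority: int  # hypothetical: priority the job would otherwise get (>= 1)


def effective_priority(job, owners_with_overdue_data):
    """Owners with data overdue for publication always get the lowest priority."""
    if job.owner in owners_with_overdue_data:
        return LOWEST_PRIORITY
    return job.base_priority


def schedule(jobs, owners_with_overdue_data):
    """Return jobs in run order; the sort is stable, so ties keep submission order."""
    return sorted(
        jobs,
        key=lambda job: effective_priority(job, owners_with_overdue_data),
        reverse=True,
    )
```

Under these assumptions, a job from a user with overdue data runs after every other job, regardless of the priority it would otherwise have had.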

 
