Data Deluge: Next Generation Sequencing for Microbiome Analysis

There are a number of online pipelines for analysing ribosomal RNA (rRNA) sequence data in microbiome studies. Two of the most frequently used are:

RDP (Ribosomal Database Project)
MG-RAST

Another option is to install a pipeline on a local machine. The drawback of this approach is that you must keep the rRNA reference databases up to date yourself to get the best results. Pipelines that can be installed locally include Pangaea and QIIME (pronounced "chime").

An initial challenge is that the files produced by Next Generation Sequencing may not be compatible with these pipelines. Either their quality scores use a different encoding from the one the pipeline expects, or the pipeline cannot read FASTQ files directly.
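To check which quality encoding a file uses before choosing a pipeline or a conversion option, a rough heuristic is to look at the range of quality characters. The sketch below is a minimal, standard-library-only Python illustration, not part of any of the pipelines mentioned; the function names are hypothetical, and it assumes unwrapped four-line FASTQ records. Characters below ';' (ASCII 59) only occur with the Phred+33 offset, and characters above 'J' (ASCII 74) fall outside the range typically seen in Phred+33 files, so anything in between is reported as ambiguous.

```python
def quality_lines(fastq_text):
    """Return the quality line of each record in unwrapped four-line FASTQ text."""
    lines = fastq_text.strip().splitlines()
    # Assumed record layout: header, sequence, '+' separator, quality
    return lines[3::4]


def guess_quality_offset(qualities):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Heuristic only: returns None when every character seen is plausible
    under both offsets.
    """
    codes = [ord(ch) for line in qualities for ch in line]
    if not codes:
        return None
    if min(codes) < 59:  # below ';' only occurs with the +33 offset
        return 33
    if max(codes) > 74:  # above 'J' is outside the usual Phred+33 range
        return 64
    return None


sample = "@read1\nACGT\n+\n!#5I\n"
print(guess_quality_offset(quality_lines(sample)))  # 33
```

For a real file, pass the text of the first few thousand records rather than the whole file; a short sample with only high-quality bases can easily come back as ambiguous.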

Transforming FASTQ Files

One tool for transforming FASTQ files is the FASTX-Toolkit. A simpler and faster method, though with much less functionality, is BioPython (it must be installed first, but that is relatively simple on Linux systems such as Ubuntu). If you use the FASTX-Toolkit with Ion Torrent data, do not forget to include the -Q 33 flag, which tells the tools that the quality scores use the Phred+33 (Sanger) encoding rather than the toolkit's default offset of 64.
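The -Q 33 flag concerns the ASCII offset used to store each Phred quality score as a single character. A minimal Python sketch (hypothetical helper names; Phred scores only, Solexa-style scores are not handled) shows how the same kind of quality string decodes differently depending on the offset, and how Phred+64 text can be shifted down to Phred+33:

```python
def decode_qualities(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        # A negative score means the offset is wrong for this data
        raise ValueError(f"offset {offset} gives negative scores for {quality!r}")
    return scores


def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Phred+33 (each char shifted down by 31)."""
    if any(ord(ch) < 64 for ch in quality):
        raise ValueError("not a Phred+64 quality string")
    return "".join(chr(ord(ch) - 31) for ch in quality)


print(decode_qualities("!+5?I"))             # [0, 10, 20, 30, 40]
print(decode_qualities("@JT^h", offset=64))  # [0, 10, 20, 30, 40]
print(phred64_to_phred33("@JT^h"))           # !+5?I
```

BioPython makes the same distinction through the format name passed to SeqIO: "fastq" (an alias of "fastq-sanger") expects Phred+33, while "fastq-illumina" expects Phred+64.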

The BioPython code for creating a FASTA file from a FASTQ file is:

from Bio import SeqIO
SeqIO.convert("filename.fastq", "fastq", "output.fasta", "fasta")
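To make clear what this conversion involves, the sketch below does the same job for simple input in plain Python: each FASTQ record's '@' header becomes a '>' header, the sequence is kept, and the '+' line and quality string are dropped. It is an illustration under stated assumptions (unwrapped four-line records, hypothetical function name), not BioPython's implementation; for instance, BioPython's FASTA writer wraps long sequences over several lines, which this sketch does not.

```python
def fastq_to_fasta(fastq_text):
    """Convert unwrapped four-line FASTQ text to FASTA text (one sequence line per record)."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, _quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        out.append(">" + header[1:])  # '@' header becomes '>' header
        out.append(sequence)          # the quality line is simply dropped
    return "".join(line + "\n" for line in out)


fastq = "@read1 sample A\nACGTAC\n+\nIIIII#\n@read2\nGGT\n+\n!5I\n"
print(fastq_to_fasta(fastq), end="")
```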

To create the accompanying quality file, the command is:

SeqIO.convert("filename.fastq", "fastq", "output.qual", "qual")
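A QUAL file holds the same records as the FASTA file, but with the quality scores written as space-separated integers instead of bases. The following minimal sketch (hypothetical function name, unwrapped four-line records, Phred+33 by default to match BioPython's "fastq" format) shows that mapping; it is an illustration, not BioPython's own writer.

```python
def fastq_to_qual(fastq_text, offset=33):
    """Convert unwrapped four-line FASTQ text to QUAL text with integer Phred scores."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ at line {i + 1}")
        scores = [ord(ch) - offset for ch in quality]
        out.append(">" + header[1:])  # same header as the FASTA record
        out.append(" ".join(str(score) for score in scores))
    return "".join(line + "\n" for line in out)


print(fastq_to_qual("@read1\nACGT\n+\n!+5I\n"), end="")
# >read1
# 0 10 20 40
```

Because the FASTA and QUAL outputs share the same headers and record order, a pipeline can pair each sequence with its scores.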


Thursday, 12 September 2013

Next Generation Sequencing for Microbiome Analysis - Getting Started
