Ready for Next-Gen Sequencing!
High-throughput sequencing is becoming increasingly popular. For less than 10 k€, a lab can sequence more than 25 million reads. This new and affordable tool has many applications: de novo sequencing, resequencing (to detect genomic diversity), RNA-Seq (to detect mRNAs or short non-coding RNAs), ChIP-Seq (chromatin immunoprecipitation followed by sequencing), RIP-Seq (the same principle, except that the RNA bound to the protein of interest is sequenced), Hi-C (chromosome conformation capture), etc.
Although the technology is appealing (see how many sequencers there are around the world), computers are now indispensable for analyzing sequencing data. The sheer size of the data produced by a sequencing run rules out any manual analysis. As a consequence, URGI is increasingly asked to analyze high-throughput sequencing data.
We are currently hosting four projects involving high-throughput sequencing. The first two projects, GrapeReSeq and Muscares, investigate genomic variability in Vitis vinifera and close relatives. The third project uses RNA-Seq data to assess the role of transposable elements in transcription in Drosophila melanogaster. The fourth project explores the epigenetic diversity of Arabidopsis thaliana with RNA-Seq. A further project has recently started, in which RNA-Seq, ChIP-Seq and RIP-Seq will help us understand epigenetic regulation in Arabidopsis thaliana.
Since URGI is a "dry lab" (there is no bench whatsoever here), we mainly work in collaboration with other labs to produce the material and interpret the data. The first two projects, which are the largest, are international joint efforts involving Spain, Italy, Germany and France, in which our lab is in charge of SNP detection. In these projects, more than 30 lanes of Solexa Genome Analyzer data will be produced by the EPGV and analyzed by our lab. The other projects are carried out in collaboration with other labs, mainly IJPB.
The data we currently handle are quite diverse: DNA as well as (long and short) RNA sequencing, with ChIP-Seq and RIP-Seq data arriving soon. So far, we have analyzed Solexa Genome Analyzer and Roche Genome Sequencer reads, single-end or paired-end, normalized or not, 5'-capped or not.
URGI is actively developing tools to analyze the deluge of data we are facing. Two heavily used tools are already available to the community: MapHits and S-MART. MapHits is a pipeline that finds reliable SNPs and selects those that can be used to design a chip. The pipeline is highly flexible and can be launched through the Galaxy Web interface that URGI hosts (the pipeline is currently available with restricted access only).
S-MART is a toolbox for the analysis of RNA-Seq data. It can be used on your personal computer or through Galaxy (work in progress). Do not hesitate to contact us if you think that you may need MapHits or S-MART!
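To give a concrete idea of what selecting "reliable" SNPs can involve, here is a minimal Python sketch. It is not the MapHits method, which is not described here: the `SnpCandidate` fields, the function names and both thresholds are assumptions chosen purely for illustration. A candidate is kept only if enough reads cover the position and a sufficient fraction of them support the alternative allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCandidate:
    """One candidate position (all field names are hypothetical)."""
    chrom: str
    pos: int
    depth: int      # total number of reads covering the position
    alt_reads: int  # reads supporting the alternative allele


def is_reliable(snp, min_depth=10, min_alt_fraction=0.2):
    """Return True if the candidate is well covered and well supported.

    Both thresholds are illustrative defaults, not recommended values.
    """
    # Reject uncovered or poorly covered positions (also avoids dividing by zero).
    if snp.depth == 0 or snp.depth < min_depth:
        return False
    return snp.alt_reads / snp.depth >= min_alt_fraction


def select_reliable(candidates, min_depth=10, min_alt_fraction=0.2):
    """Keep only the candidates that pass both filters, preserving order."""
    return [s for s in candidates if is_reliable(s, min_depth, min_alt_fraction)]


if __name__ == "__main__":
    candidates = [
        SnpCandidate("chr1", 100, depth=30, alt_reads=12),  # well supported
        SnpCandidate("chr1", 200, depth=4, alt_reads=4),    # too few reads
        SnpCandidate("chr2", 50, depth=40, alt_reads=2),    # weak support
    ]
    for snp in select_reliable(candidates):
        print(snp.chrom, snp.pos)
```

A real pipeline would typically also weigh base and mapping qualities, strand bias and the sequence context needed for chip design; those criteria are deliberately left out of this sketch.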
Another project is to develop a new module, codenamed GnpSeqNGS, inside our information system GnpIS, dedicated to storing high-throughput sequencing data. This new petal of the GnpIS flower will be a repository for high-throughput sequencing data, closely connected with the other modules, especially GnpSNP (for resequencing data). This new repository will host the data from our collaborations and make them available to the community through simple query interfaces.
Although the future is hard to predict, we expect the number of high-throughput sequencing projects hosted by URGI to increase. This raises two main questions: Will we have the computer hardware to do the work? Will we have enough human resources to do the analysis?
To handle this large amount of data, the lab has invested massively in computer hardware. We currently host a cluster of more than 700 Intel Xeons, with more than 60 terabytes available, a backup system and a high-throughput network between the nodes. And this capacity could increase very soon!
URGI is actively collaborating with the other platforms of the APLIBIO network, which brings together 8 platforms of the Île-de-France region. Within this network, we share our experience and our tools to stay up to date with the latest technologies. Relying on this network of experts, we will be able to face the future of next-generation sequencing.
Matthias Zytnicki