
2011

National,  COM (talks)

7-9 December, Institut Pasteur de Lille, France

09 Dec 2011   Efficient comparison of sets of intervals with NC-lists

Yufei Luo, Matthias Zytnicki, Hadi Quesneville

High-throughput sequencing very rapidly produces large amounts of data, which are usually difficult to analyze. This is especially true for RNA-Seq data when they are compared to other annotations. Because the volume of data produced is increasing dramatically, there is a demand for optimized algorithms that can quickly compare large amounts of sequencing data.

With the advent of high-throughput sequencing, bioinformatics is generating a very large amount of data every day. Modern sequencers can generate several hundred million sequences in a week. When a reference genome is available, the first task is to map the reads onto the genome. Many mapping tools are now available, and research on this topic is very active. For RNA-Seq, a common question is how the reads compare with available annotations, in order to estimate the expression of genes. A related task is to compare the expression of two different samples by comparing their reads. Comparing reads with annotations amounts to comparing sets of intervals. For example, comparing RNA-Seq reads to known transcripts reduces to comparing a set of query intervals (the reads) to a set of reference intervals (the exons of known transcripts). Much attention has been paid to comparing a single query interval to a set of reference intervals; R-trees, NC-lists [1], and other structures have been designed to do so. These structures can also be used to compare one set against another, by comparing each query interval in turn to the reference intervals. In this paper, we show how, using NC-lists, comparing the whole query set to the reference set at once can yield much better results.
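To make the baseline concrete, the following is a minimal Python sketch of the one-query-at-a-time approach described above: a simplified nested containment list is built from the reference intervals, and each query interval is then searched against it independently. It is an illustration only, not the authors' implementation, and it does not show the whole-set comparison that the talk presents. The names `Node`, `build_nclist`, `find_overlaps`, and `compare_sets` are hypothetical, and half-open integer intervals `[start, end)` are an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int
    end: int  # assumed convention: half-open interval [start, end)
    children: list = field(default_factory=list)


def build_nclist(intervals):
    """Nest each interval under the most recent interval that contains it."""
    # Sort by start, then by decreasing end, so containers come before contents.
    ordered = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    root = Node(float("-inf"), float("inf"))  # sentinel that contains everything
    stack = [root]
    for start, end in ordered:
        # Pop until the interval on top of the stack contains the current one.
        while not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()
        node = Node(start, end)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


def _first_candidate(nodes, q_start):
    """Binary search for the first node whose end lies beyond q_start."""
    # Valid because no sibling contains another, so ends increase with starts.
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end <= q_start:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_overlaps(nodes, q_start, q_end):
    """Return all reference intervals overlapping the query [q_start, q_end)."""
    hits = []
    i = _first_candidate(nodes, q_start)
    while i < len(nodes) and nodes[i].start < q_end:
        hits.append((nodes[i].start, nodes[i].end))
        # Only the children of an overlapping interval can overlap the query.
        hits.extend(find_overlaps(nodes[i].children, q_start, q_end))
        i += 1
    return hits


def compare_sets(queries, references):
    """Baseline set comparison: search each query interval independently."""
    nclist = build_nclist(references)
    return {q: find_overlaps(nclist, q[0], q[1]) for q in queries}
```

For example, `compare_sets(reads, exons)` with two lists of `(start, end)` tuples returns a dictionary that maps each read to the exons it overlaps. Every query restarts its search from the top-level list, which is the per-query cost that a comparison of the whole sets aims to reduce.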


Keywords: NCList, algorithm

Creation date: 11 May 2012