Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Journal article. Cellular Oncology, 2009

Abstract

Next-generation sequencing assays probe the transcriptome or the interactome at unprecedented depth. These techniques yield short sequence reads that are then mapped onto a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as false locations, sequence errors, and read length affect the mapping prediction capacity of these short reads. Here we propose a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new insights into both methodological and biological issues. First, we estimate that 4.6% of reads are affected by SNPs. Second, we show that the nucleotide error probability is low but increases significantly with position in the read. Third, by choosing a read length above 19 bp, we practically eliminate the risk of finding irrelevant positions. However, the number of uniquely mapped reads decreases with read lengths above 20 bp. Following our procedure, we obtain 0.6% of false positives among genomic locations. Therefore, even rare signatures, if they map to the genome, should identify biologically relevant regions. This indicates that digital transcriptomics may help to characterise the wealth of yet undiscovered, low-abundance transcripts.
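As a rough illustration of why read length governs the risk of matching irrelevant positions, the sketch below computes how many times a fixed read would be expected to occur by chance in a random genome. It is not the procedure used in the article. It assumes independent, equiprobable nucleotides, and the function name `expected_random_hits` and the genome size are hypothetical choices made for this example. Real genomes contain repeats and compositional bias, so actual figures will differ.

```python
# Illustrative only: uniform random-sequence model, NOT the authors' method.

def expected_random_hits(read_length: int, genome_size: int,
                         both_strands: bool = True) -> float:
    """Expected number of chance occurrences of one fixed read.

    Assumes every nucleotide is drawn independently with probability 1/4,
    so a given read matches any one position with probability 4**-read_length.
    """
    positions = genome_size - read_length + 1  # possible start positions
    if positions <= 0:
        return 0.0
    strands = 2 if both_strands else 1  # a read may match either strand
    return strands * positions / 4 ** read_length


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # assumed value, roughly human-sized
    for k in (14, 16, 18, 20, 22):
        print(f"{k} bp: {expected_random_hits(k, GENOME_SIZE):.4g}")
```

Under this simplified model, each additional base divides the expected number of chance matches by four, which is consistent in spirit with the abstract's observation that longer reads sharply reduce irrelevant positions.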

Dates and versions

lirmm-00416012, version 1 (11-09-2009)

Identifiers

  • HAL Id: lirmm-00416012, version 1

Cite

Nicolas Philippe, Anthony Boureux, Laurent Brehelin, Jorma Tarhio, Thérèse Commes, et al. Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers. Cellular Oncology, 2009, 31 (2), pp.145-146. ⟨lirmm-00416012⟩