Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers

Abstract : Next generation sequencers make it possible to analyse the transcriptome or the interactome at unprecedented depth. These techniques yield short sequence reads that are then mapped on a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as false locations, sequence errors, and read length affect the capacity of these short reads to predict such regions once mapped. Here we suggest a computational approach to measure those factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. First, we estimate that 4.6% of reads are affected by SNPs. Second, we show that the nucleotide error probability is low but increases significantly with the position in the sequence. Third, by choosing a read length above 19 bp, we practically eliminate the risk of finding irrelevant positions. However, the number of uniquely mapped reads decreases for sequences above 20 bp. Following our procedure, we obtain 0.6% of false positives among genomic locations. Therefore, even rare signatures, if they are mapped on the genome, should identify biologically relevant regions. This indicates that digital transcriptomics may help to characterise the wealth of yet undiscovered, low abundance transcripts.
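
The link between read length and irrelevant genomic positions can be illustrated with a back-of-the-envelope calculation. The sketch below is not the procedure used in the article: it assumes a uniform, independent base composition, a hypothetical genome size of 3 × 10^9 bp, and matching on both strands, and the function name `expected_chance_hits` is introduced here only for illustration. Under those assumptions, the expected number of positions where a random read of length k matches purely by chance is roughly 2G / 4^k.

```python
def expected_chance_hits(read_length, genome_size=3_000_000_000, both_strands=True):
    """Expected number of chance matches of a random read in a random genome.

    Assumes uniform, independent bases (a simplification; real genomes
    are repetitive and compositionally biased). The default genome size
    is an assumed order of magnitude, not a value taken from the article.
    """
    strands = 2 if both_strands else 1
    # Each of the ~genome_size start positions matches with probability 4**-k.
    return strands * genome_size / 4 ** read_length


if __name__ == "__main__":
    for k in (14, 16, 19, 20, 25):
        print(k, expected_chance_hits(k))
```

Under this simplified model the expected count falls well below one chance match as the read length approaches 19 to 20 bp, which is consistent in spirit with the threshold reported in the abstract; the model ignores repeats and sequence errors, so it cannot reproduce the reported false positive rate.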
Document type :
Journal articles

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00416012
Contributor : Nicolas Philippe
Submitted on : Friday, September 11, 2009 - 4:10:46 PM
Last modification on : Thursday, October 4, 2018 - 10:58:05 AM

Identifiers

  • HAL Id : lirmm-00416012, version 1

Citation

Nicolas Philippe, Anthony Boureux, Laurent Brehelin, Jorma Tarhio, Thérèse Commes, et al. Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers. Cellular Oncology, IOS Press, 2009, 31 (2), pp.145-146. ⟨lirmm-00416012⟩
