Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers

Next generation sequencers make it possible to assay the transcriptome or the interactome at unprecedented depth. These techniques yield short sequence reads that are then mapped onto a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as false locations, sequence errors, and read length affect how reliably these short reads predict genomic locations once mapped. Here we propose a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. First, we estimate that 4.6% of reads are affected by SNPs. Second, we show that the nucleotide error probability is low but increases significantly with the position in the read. Third, choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions. However, the number of uniquely mapped reads decreases for sequences longer than 20 bp. With our procedure, we obtain 0.6% of false positives among genomic locations. Therefore, even rare signatures, if they map to the genome, should identify biologically relevant regions. This indicates that digital transcriptomics may help to characterise the wealth of as yet undiscovered, low-abundance transcripts.
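The read-length trade-off described above can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, not the authors' procedure: it assumes a random genome with independent, equiprobable nucleotides, a hypothetical genome size of 3 × 10^9 bp (roughly human), and independent per-position sequencing errors following a hypothetical linear error model. The function names `expected_random_hits` and `prob_error_free` are illustrative only.

```python
from typing import Sequence


def expected_random_hits(read_length: int, genome_size: int,
                         both_strands: bool = True) -> float:
    """Expected number of chance exact matches of a fixed read in a random genome.

    Assumption: nucleotides are i.i.d. and uniform (probability 1/4 each).
    Real genomes contain repeats, so true false-location rates are higher.
    """
    # Number of start positions for a read of this length on one strand.
    positions = genome_size - read_length + 1
    if both_strands:
        positions *= 2  # a read can map to the forward or the reverse strand
    # Under the uniform model, a fixed read matches a given position
    # with probability (1/4) ** read_length.
    return positions / 4 ** read_length


def prob_error_free(error_rates: Sequence[float]) -> float:
    """Probability that a read contains no sequencing error.

    error_rates[i] is the assumed probability of a miscalled base at position i;
    errors are assumed independent across positions.
    """
    p = 1.0
    for e in error_rates:
        p *= 1.0 - e
    return p


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # hypothetical, roughly a human genome
    for length in (15, 17, 19, 21, 25):
        # Hypothetical error model: error rate grows linearly with position.
        rates = [0.001 + 0.0005 * i for i in range(length)]
        print(length,
              f"{expected_random_hits(length, GENOME_SIZE):.3g}",
              f"{prob_error_free(rates):.3f}")
```

Under these assumptions the expected number of chance hits falls below one between 16 and 17 bp and keeps shrinking with length, while the probability of an error-free read also falls as reads get longer. This is one way to see why longer reads can give fewer exact, unique matches even as spurious locations become rare.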

Domains

Bioinformatics [q-bio.QM]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00416012

Submitted on: Friday 11 September 2009, 16:10:46

Last modified on: Monday 22 April 2024, 11:32:03

Dates and versions

lirmm-00416012 , version 1 (11-09-2009)

Identifiers

HAL Id : lirmm-00416012 , version 1

Cite

Nicolas Philippe, Anthony Boureux, Laurent Brehelin, Jorma Tarhio, Thérèse Commes, et al. Estimation of sequence errors and capacity of genomic annotation in transcriptomic and DNA-protein interaction assays based on next generation sequencers. Cellular Oncology, 2009, 31 (2), pp. 145-146. ⟨lirmm-00416012⟩