Human Transcriptome Annotation using Tandem SAGE Tags

Abstract: Recently, experiments based on tiling arrays have revealed a huge difference between the number of active transcripts and the number of annotated genes. This discrepancy is not due only to differentially spliced variants, since many active transcriptional units were predicted far from known genes. We thus face the challenge of annotating and characterizing the human transcriptome more precisely. Approaches based on sequencing transcript signatures, such as SAGE, yield one tag per transcript expressed in the cell, but they do not provide enough sequence information per transcript to locate the unique corresponding locus on the genome. Indeed, on the human genome, conventional SAGE tags (14 bp) theoretically occur more than 3000 times on average, while long SAGE tags (21 bp) match a unique location in only 49% of cases. To address this annotation challenge, we designed a novel strategy in which two SAGE libraries are built in parallel from the same polyadenylated RNA sample, with tags anchored on two different restriction sites of the cDNAs. In other words, the experiment yields two tags (14 bp) per transcript, anchored at different positions on the transcript (the first NlaIII and the first Sau3A restriction sites starting from the 3' end), but the association between them is unknown: among the many tags of each kind obtained by sequencing the libraries, one does not know a priori which pair corresponds to which transcript. We developed a fast program to find these associated pairs, called tandem tags, on the genome sequence. We first tested our strategy using two small twin SAGE libraries prepared from a single RNA sample extracted from terminally differentiated macrophages. These two SAGE datasets were first used to assemble a set of 489 pairs mapping on well-annotated human transcript sequences.
This set enables us to evaluate the ability of our method to recover the genomic locus from which each known transcript originates. In the remaining SAGE sets, we selected the subsets of tags that are not linked in any way to known transcripts and searched the genome for compound pairs of these tags. This yielded a set of 250 tag-delimited genomic sequences (TDGS), which were then investigated in silico and, for a selection, tested in vitro to see whether they reveal unknown transcripts. When dealing with complex genomes, the results show that this strategy improves on classical tag-based approaches for the identification of novel transcript units.
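The tandem-tag idea can be illustrated with a minimal Python sketch. It is not the authors' program, whose algorithm is not described here; the function names (`virtual_tag`, `find_all`, `tandem_pairs`) and the `max_span` cut-off are hypothetical. It assumes exact string matching, the forward strand only (a real search would also scan the reverse complement), and genomic regions without introns. A tag is modelled as the 4-bp anchoring site (CATG for NlaIII, GATC for Sau3A) plus the 10 bp that follow it, taken at the 3'-most site of each enzyme in the transcript.

```python
from itertools import product
from typing import List, Optional, Tuple

NLAIII_SITE = "CATG"  # NlaIII anchoring-enzyme site
SAU3A_SITE = "GATC"   # Sau3A anchoring-enzyme site
TAG_LEN = 14          # 4-bp site plus the 10 bp that follow it


def virtual_tag(transcript: str, site: str) -> Optional[str]:
    """Tag at the 3'-most `site` of a sense-strand transcript, if complete."""
    pos = transcript.rfind(site)
    if pos == -1 or pos + TAG_LEN > len(transcript):
        return None  # no site, or too close to the 3' end (simplification)
    return transcript[pos:pos + TAG_LEN]


def find_all(seq: str, pattern: str) -> List[int]:
    """All start positions of `pattern` in `seq`, overlapping hits included."""
    hits, pos = [], seq.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(pattern, pos + 1)
    return hits


def tandem_pairs(genome: str, nla_tag: str, sau_tag: str,
                 max_span: int) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand regions delimited by both tags.

    Naive all-pairs scan; `max_span` is a hypothetical length cut-off.
    """
    pairs = []
    for i, j in product(find_all(genome, nla_tag), find_all(genome, sau_tag)):
        start, end = min(i, j), max(i, j) + TAG_LEN
        if end - start > max_span:
            continue
        # The upstream tag must sit at the 3'-most site of its kind, so its
        # site should not reappear further 3' inside the region (this only
        # holds if the region contains no intron).
        upstream_site = NLAIII_SITE if i < j else SAU3A_SITE
        if upstream_site in genome[start + 1:end]:
            continue
        pairs.append((start, end))
    return pairs


if __name__ == "__main__":
    # Toy transcript: one Sau3A site, then one NlaIII site nearer the 3' end.
    transcript = "AAAA" + "GATC" + "ACGTACGTAC" + "TTTT" + "CATG" + "TTGGCCAATT" + "AAAAA"
    nla = virtual_tag(transcript, NLAIII_SITE)  # "CATGTTGGCCAATT"
    sau = virtual_tag(transcript, SAU3A_SITE)   # "GATCACGTACGTAC"
    genome = "C" * 10 + transcript + "G" * 10
    print(tandem_pairs(genome, nla, sau, max_span=100))  # [(14, 46)]
```

The all-pairs scan is quadratic in the number of tag occurrences and is meant only to show the pairing constraint; a program built for speed on a whole genome would more likely rely on an index of the genome sequence.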
Document type:
Conference communication
Integrative Post-Genomics, Nov 2006, Lyon, pp.39, 2006

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00120164
Contributor: Eric Rivals
Submitted on: Wednesday, 13 December 2006, 14:50:38
Last modified on: Thursday, 20 September 2018, 15:40:02

Identifiers

HAL Id: lirmm-00120164, version 1

Citation

Eric Rivals, Anthony Boureux, Mireille Lejeune, Florence Ottones, Oscar Pecharromàn Pérez, et al. Human Transcriptome Annotation using Tandem SAGE Tags. Integrative Post-Genomics, Nov 2006, Lyon, pp.39, 2006. 〈lirmm-00120164〉