Human Transcriptome Annotation using Tandem SAGE Tags

Eric Rivals; Anthony Boureux; Mireille Lejeune; Florence Ottones; Oscar Pecharromàn Pérez; Jorma Tarhio; Fabien Pierrat; Florence Ruffle; Thérèse Commes; Jacques Marti

Conference paper, Year: 2006

Eric Rivals (1). PersonId: 2002; IdHAL: eric-rivals; ORCID: 0000-0003-3791-3973; IdRef: 118021850
Anthony Boureux (2). PersonId: 740617; IdHAL: anthony-boureux; IdRef: 253129192
Mireille Lejeune (2)
Florence Ottones (2)
Oscar Pecharromàn Pérez (3)
Jorma Tarhio (3)
Fabien Pierrat (4)
Florence Ruffle (2). PersonId: 21359; IdHAL: florence-ruffle; ORCID: 0009-0006-4443-1533; IdRef: 123707013
Thérèse Commes (2). PersonId: 743232; IdHAL: therese-commes; ORCID: 0000-0002-7918-0176; IdRef: 112246419
Jacques Marti (2)

(1) Méthodes et Algorithmes pour la Bioinformatique
(2) Institut de génétique humaine
(3) Laboratory of Software Technology
(4) Skuld-Tech, biotechnology company

Abstract

Recently, experiments based on tiling arrays have revealed a huge difference between the number of active transcripts and the number of annotated genes. This discrepancy is not only due to differentially spliced variants, since many active transcriptional units were predicted far from any gene. We are thus confronted with the challenge of annotating and characterizing the human transcriptome more precisely. For this purpose, approaches based on sequencing transcript signatures, such as SAGE, which yield one tag per transcript expressed in the cell, do not provide enough sequence information per transcript to find the unique corresponding locus on the genome. Indeed, on the human genome, conventional SAGE tags (14 bp) occur on average more than 3000 times (theoretically), while long SAGE tags (21 bp) match a unique location in only 49% of cases. To address this annotation challenge, we designed a novel strategy in which two SAGE libraries are built in parallel from the same polyadenylated RNA sample, with tags anchored on two different restriction sites of the cDNAs. In other words, the experiment yields two tags (14 bp) per transcript, anchored at different positions on the transcript (the first NlaIII and the first Sau3A restriction sites starting from the 3' end), but the association between them is unknown: among the many tags of each sort obtained by sequencing the libraries, one does not know a priori which pair corresponds to which transcript. We developed a fast program to find these associated pairs, called tandem tags, on the genome sequence. We first tested our strategy using two small twin SAGE libraries prepared from a single RNA sample extracted from terminally differentiated macrophages. These two SAGE datasets were first used to assemble a set of 489 pairs mapping to well-annotated human transcript sequences. This set enables us to evaluate the ability of our method to recover the genomic locus from which each known transcript originates.
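The two ideas in this paragraph, how ambiguous a 14 bp tag is and how a tag is anchored on a restriction site, can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes a haploid genome of roughly 3.2 × 10^9 bp, a uniform and independent base composition, and that the 14 bp of a tag include the 4-base recognition site (CATG for NlaIII, GATC for Sau3A), leaving 10 free bases. The function names and the toy cDNA are hypothetical.

```python
def expected_random_hits(genome_length, free_bases):
    """Expected number of occurrences of one fixed tag on one strand,
    assuming uniform, independent bases (a strong simplification)."""
    return genome_length / 4 ** free_bases


def extract_tag(cdna, site, tag_length=14):
    """Return the tag anchored on the 3'-most occurrence of `site`.

    `cdna` is read 5'->3'; the tag is the recognition site followed by
    the bases immediately downstream, `tag_length` bases in total.
    Returns None if the site is absent or too close to the 3' end.
    """
    start = cdna.rfind(site)  # first site when starting from the 3' end
    if start == -1 or start + tag_length > len(cdna):
        return None
    return cdna[start:start + tag_length]


if __name__ == "__main__":
    # Assumed genome length; 10 free bases after the 4-base anchor.
    print(expected_random_hits(3.2e9, 10))

    # Toy cDNA with two NlaIII sites and one Sau3A site.
    cdna = "GGCATGAAAAAAAAAACCGATCTTTTTTTTTTGGCATGCCCCCCCCCCAA"
    print(extract_tag(cdna, "CATG"))  # NlaIII-anchored tag
    print(extract_tag(cdna, "GATC"))  # Sau3A-anchored tag
```

Under these assumptions a 14 bp tag is expected a little over 3000 times on one strand, which is of the same order as the theoretical figure quoted above. The two calls to `extract_tag` mimic the twin libraries: the same transcript contributes one tag per enzyme, but nothing in the sequenced tags records that they belong together.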
In the remaining SAGE sets, we selected the subsets of tags that are not linked in any way to known transcripts, and searched the genome for compound pairs of tags. This yields a set of 250 tag-delimited genomic sequences (TDGS), which were then investigated in silico and, for a selection, tested in vitro to see whether they reveal unknown transcripts. The results show that, when dealing with complex genomes, this strategy improves on classical tag-based approaches for the identification of novel transcript units.
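The search for tandem tags on a genome sequence can be sketched as follows. This is only an illustration of the idea, not the fast program mentioned in the abstract, whose algorithm is not described here. It assumes that a candidate pair is one NlaIII tag and one Sau3A tag occurring on the same strand within a maximum distance; `max_dist`, the function names, and the toy sequence are hypothetical, and the naive window scan would be far too slow for a full genome.

```python
from bisect import bisect_left, bisect_right

TAG_LEN = 14
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tag_hits(seq, tags):
    """Sorted list of (position, tag) for each window of seq found in tags."""
    hits = []
    for i in range(len(seq) - TAG_LEN + 1):
        window = seq[i:i + TAG_LEN]
        if window in tags:
            hits.append((i, window))
    return hits


def tandem_pairs(seq, nla_tags, sau_tags, max_dist):
    """Candidate tandem tags: an NlaIII tag and a Sau3A tag on the same
    strand whose start positions differ by at most max_dist.

    Positions reported for strand "-" are coordinates on the
    reverse-complemented sequence.
    """
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        nla = tag_hits(s, nla_tags)
        sau = tag_hits(s, sau_tags)
        sau_pos = [p for p, _ in sau]
        for p, n_tag in nla:
            # Binary search for Sau3A hits inside [p - max_dist, p + max_dist].
            lo = bisect_left(sau_pos, p - max_dist)
            hi = bisect_right(sau_pos, p + max_dist)
            for q, s_tag in sau[lo:hi]:
                results.append((strand, p, n_tag, q, s_tag))
    return results


if __name__ == "__main__":
    genome = "TT" + "CATGAAAAAAAAAA" + "GGGGG" + "GATCTTTTTTTTTT" + "CC"
    pairs = tandem_pairs(genome, {"CATGAAAAAAAAAA"}, {"GATCTTTTTTTTTT"}, 50)
    for pair in pairs:
        print(pair)
```

The genomic interval between the two paired tags corresponds to what the abstract calls a tag-delimited genomic sequence. In a real setting the distance bound would have to allow for introns, and the tag sets would be indexed once rather than tested window by window.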

Keywords

SAGE pattern matching algorithm macrophage

Domains

Bioinformatics [q-bio.QM]; Genomics, Transcriptomics and Proteomics [q-bio.GN]; Cancer

Contributor: Eric Rivals

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00120164

Submitted on: Wednesday, 13 December 2006, 14:50:38

Last modified on: Friday, 24 March 2023, 14:52:48

Dates and versions

lirmm-00120164 , version 1 (13-12-2006)

Identifiers

HAL Id: lirmm-00120164, version 1

Cite

Eric Rivals, Anthony Boureux, Mireille Lejeune, Florence Ottones, Oscar Pecharromàn Pérez, et al. Human Transcriptome Annotation using Tandem SAGE Tags. Integrative Post-Genomics, Nov 2006, Lyon, France. pp.39. ⟨lirmm-00120164⟩

Collections

CNRS MAB LIRMM MIPS BS UNIV-MONTPELLIER
