Read indexing

The question of read indexing remains largely unexplored. However, the increase in sequencing throughput calls for new algorithmic solutions to query large read collections efficiently. We propose a solution, named Gk arrays, to index large collections of reads, together with an algorithm to build the structure and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer various types of queries. We compare our data structure to other possible solutions to investigate its scalability and computational efficiency. Gk arrays are implemented in a general-purpose library, which may prove useful for assembly, for evaluating expression levels in RNA-seq, and for other high-throughput sequencing applications.
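
To make the kind of query concrete, here is a minimal sketch of a k-mer index over a read collection. It is not the Gk arrays structure, whose internals the abstract does not describe; it is a plain hash-table baseline, assumed here only for illustration. The function names (`build_kmer_index`, `occurrences`, `reads_containing`, `count_occurrences`) and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map every k-mer to the list of (read_id, offset) pairs where it occurs.

    Naive hash-table baseline, NOT the Gk arrays structure from the paper.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        # Reads shorter than k yield an empty range and contribute nothing.
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return dict(index)


def occurrences(index, kmer):
    """All (read_id, offset) positions of the k-mer, in insertion order."""
    return index.get(kmer, [])


def reads_containing(index, kmer):
    """Sorted identifiers of the distinct reads that contain the k-mer."""
    return sorted({read_id for read_id, _ in index.get(kmer, [])})


def count_occurrences(index, kmer):
    """Total number of occurrences of the k-mer across all reads."""
    return len(index.get(kmer, []))


# Hypothetical toy collection; real read sets contain millions of reads.
reads = ["ACGTAC", "GTACGT", "TTACGA"]
index = build_kmer_index(reads, k=3)

print(reads_containing(index, "ACG"))
print(occurrences(index, "GTA"))
print(count_occurrences(index, "AAA"))
```

As in the abstract, the index is built once, held in memory, and then queried repeatedly. A dictionary of this kind stores every k-mer string explicitly, so its memory use grows quickly with the collection size, which is the sort of scalability concern a dedicated structure is meant to address.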

Keywords

Next Generation Sequencing, stringology, data structure, k-mer, text indexing, scalability, genome, transcriptome, bioinformatics

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

289-2729-1-PB.pdf (177.97 KB)

Origin: Publisher files allowed on an open archive

Contributor: Eric Rivals

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00757983

Submitted on: Tuesday, November 27, 2012, 18:14:57

Last modified on: Friday, May 17, 2024, 16:32:06

Long-term archiving on: Saturday, December 17, 2016, 16:12:35

Dates and versions

lirmm-00757983, version 1 (27-11-2012)

Identifiers

HAL Id: lirmm-00757983, version 1
DOI : 10.14806/ej.17.B.289

Cite

Nicolas Philippe, Mikael Salson, Thérèse Commes, Thierry Lecroq, Martine Léonard, et al. Read indexing. EMBnet.journal, 2011, 17 (Supplement B), pp.1. ⟨10.14806/ej.17.B.289⟩. ⟨lirmm-00757983⟩
