Journal articles

A Scalable Indexing Solution to Mine Huge Genomic Sequence Collections

Eric Rivals 1,*, Nicolas Philippe 1, Mikael Salson 2, Martine Léonard 3, Thérèse Commes 4, Thierry Lecroq 3
* Corresponding author
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 BONSAI - Bioinformatics and Sequence Analysis
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : With High-Throughput Sequencing (HTS) technologies, biology is experiencing a sequence data deluge. A single sequencing experiment currently yields 100 million short sequences, or reads, whose analysis demands efficient and scalable algorithms. Many kinds of applications repeatedly need to query the sequence collection for the occurrence positions of a subword. Building an index of all subwords present in the sequences before performing huge numbers of queries saves time. However, both the scalability and the memory requirement of the chosen data structure must suit the data volume. Here, we introduce a novel indexing data structure, called Gk arrays, and related algorithms that improve on classical indexes and state-of-the-art hash tables.
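To make the query described in the abstract concrete, the following is a minimal sketch of the kind of hash-table baseline the abstract compares against. It is not the Gk arrays structure, whose layout is not described on this page. The function names `build_kmer_index` and `occurrences`, the subword length `k`, and the toy reads are all assumptions introduced for illustration.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map every length-k subword to the (read_id, offset) pairs where it occurs.

    Hypothetical baseline: a plain hash table, NOT the Gk arrays structure.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        # Slide a window of width k over the read; reads shorter than k add nothing.
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def occurrences(index, kmer):
    """Return all occurrence positions of kmer, or an empty list if it is absent."""
    # .get avoids inserting an empty entry for subwords that were never indexed.
    return index.get(kmer, [])


# Toy collection of three reads (illustrative data only).
reads = ["ACGTAC", "GTACGT", "TTACGA"]
index = build_kmer_index(reads, 3)
print(occurrences(index, "ACG"))  # [(0, 0), (1, 2), (2, 2)]
```

The index is built once and each later query is a single dictionary lookup, which is the time saving the abstract refers to. The cost is memory: every occurrence of every subword is stored explicitly, which is the requirement the abstract says must suit the data volume.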

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00712653
Contributor : Eric Rivals
Submitted on : Wednesday, June 27, 2012 - 4:13:51 PM
Last modification on : Friday, September 17, 2021 - 3:27:13 AM
Long-term archiving on : Friday, September 28, 2012 - 2:41:41 AM

File

Rivals-etal-ERCIM-News-89-5p.p...
Publisher files allowed on an open archive

Identifiers

  • HAL Id : lirmm-00712653, version 1

Citation

Eric Rivals, Nicolas Philippe, Mikael Salson, Martine Léonard, Thérèse Commes, et al.. A Scalable Indexing Solution to Mine Huge Genomic Sequence Collections. ERCIM News, ERCIM, 2012, 2012 (89), pp.20-21. ⟨lirmm-00712653⟩
