High performance text indexing and applications in life sciences

Eric Rivals

Conference paper. Year: 2019


Eric Rivals

Role: Author
PersonId : 2002
IdHAL : eric-rivals
ORCID : 0000-0003-3791-3973
IdRef : 118021850

Institut de Biologie Computationnelle

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Large corpora of texts or of sequences serve as references and are interrogated through websites or programming interfaces. In life sciences, new sequencing technologies have revolutionised the acquisition of genomic sequences and triggered an exponential accumulation of reference sequences in international, public databases. Several kinds of text queries form the basic operations of programs that analyse genomic sequences. For instance, the web server of EMBL-EBI receives 27 million queries a day. A typical sequencing experiment yields a hundred million sequencing reads, each about 150 nucleotides long, and each read needs to be compared to a reference genome. Analysing such data or mining public sequence repositories demands very efficient programs and algorithms. Only the use of complex, specialised indexing data structures allows us to meet the needs of life sciences communities. I will present some indexing data structures that enable high performance computational analyses in genomics, and mention their practical applications. Beyond text data, such data structures can be adapted to index other types of discrete data like trees or graphs. This will be key for the development of computational pan-genomics.
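To make the idea of an indexing data structure concrete, here is a minimal sketch of one classical text index, the suffix array, used to locate every exact occurrence of a short read in a reference sequence. The abstract does not say which structures the talk covers, so the choice of a suffix array is an assumption made for illustration, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. The construction shown is the naive one and would not scale to a real genome; production tools rely on compressed indexes and linear-time construction algorithms.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction by direct sorting: fine for a demonstration,
    far too slow and memory-hungry for a genome-sized text.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of every exact occurrence of `pattern`.

    All suffixes that begin with `pattern` are contiguous in the suffix
    array, so two binary searches delimit the matching block.
    """
    m = len(pattern)

    # First index whose suffix prefix (length m) is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First index whose suffix prefix (length m) is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


# Toy reference and read (illustrative data, not from the talk).
reference = "GATTACAGATTACA"
index = build_suffix_array(reference)
print(find_occurrences(reference, index, "ATTA"))  # prints [1, 8]
```

Once the index is built, each query costs a number of string comparisons logarithmic in the reference length, instead of a scan of the whole reference. This is why building the index once pays off when a hundred million reads must be compared to the same genome.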

Keywords

Data structure; Indexing; Sequencing; Query; NGS; Efficiency; Bioinformatics; Services; Genomics

Domains

Bioinformatics [q-bio.QM]

Main file

Rivals-hpc-UK.pdf (330.06 KB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-02093450

Submitted on: Monday, 8 April 2019, 23:13:24

Last modified on: Friday, 24 March 2023, 14:53:10

Dates and versions

lirmm-02093450, version 1 (08-04-2019)

Identifiers

HAL Id: lirmm-02093450, version 1

Cite

Eric Rivals. High performance text indexing and applications in life sciences. UK-France Bilateral International Meeting on High Performance Computing and Biomathematics, Feb 2019, Chicheley, United Kingdom. ⟨lirmm-02093450⟩


Collections

CNRS INRIA INRA MAB LIRMM MIPS UNIV-MONTPELLIER INRAE ANR NUMEV

102 views

52 downloads
