Computational Pan-Genomics

Eric Rivals

Conference paper, Year: 2019

Enjeux computationnels de la pan-génomique (Computational challenges of pan-genomics)

Eric Rivals

Role: Author
PersonId: 2002
IdHAL: eric-rivals
ORCID: 0000-0003-3791-3973
IdRef: 118021850

Méthodes et Algorithmes pour la Bioinformatique

Institut de Biologie Computationnelle

Abstract

The notion of a pan-genome or genome graph, understood as the representation of all variant genomes of one or more closely related species, is gaining traction as a replacement for the classical, sequence-based, linear reference genome. Indeed, one of the main reasons why most analyses are based on the linear reference genome is simplicity: simplicity of processing, since most bioinformatic algorithms target a single linear sequence as reference, and simplicity of understanding, since our minds prefer linear structures. Due to evolution, related sequences may share common or highly similar regions, which should not be considered independent but currently are, for instance when one BLASTs a query sequence against a collection of genomes. A sequence graph representation summarises in a single reference the similarities and differences of the multiple collected variants. For instance, read mapping on a sequence graph directly tells in which variants a read maps and in which it does not, thereby revealing variant specificity (if any). Even within the class of graph-based representations, several options are available for the data structure to use. With increasing sequencing throughput, the number of sequences to represent grows and efficiency becomes an issue. Practical uses of sequence graphs and pan-genomes depend heavily on the underlying indexing data structures, especially on their memory footprint. I will present an overview of computational issues related to such sequence-graph-based approaches to pan-genome representation, and illustrate their complexity and advantages (which reach far beyond efficiency aspects).
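
To make the read-mapping remark above concrete, here is a minimal sketch in Python. It is not the method presented in the talk. It assumes a toy "colored" k-mer index, in which each k-mer is labeled with the set of variants containing it, as a stand-in for a real sequence graph index. The function names, the variant names, and the sequences are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k from seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(variants, k):
    """Map each k-mer to the set of variant names ("colors") containing it."""
    index = defaultdict(set)
    for name, seq in variants.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def variants_supporting(read, index, k):
    """Return the variants that contain every k-mer of the read.

    Sharing all k-mers is a necessary condition for the read to occur in a
    variant, not a sufficient one, so this is only a filter. An empty set
    means no single variant contains all k-mers (or the read is shorter
    than k).
    """
    supporting = None
    for kmer in kmers(read, k):
        colors = index.get(kmer, set())
        supporting = set(colors) if supporting is None else supporting & colors
        if not supporting:
            return set()
    return supporting or set()


# Hypothetical toy variants that differ by single substitutions.
VARIANTS = {
    "variant_A": "ACGTTGCATGCA",
    "variant_B": "ACGTTGGATGCA",
    "variant_C": "ACGTTGCATGGA",
}
K = 4
INDEX = build_colored_index(VARIANTS, K)

shared = variants_supporting("ACGTTG", INDEX, K)    # region common to all variants
specific = variants_supporting("TTGGAT", INDEX, K)  # region covering a substitution
```

In this toy setting, a read drawn from a region shared by all variants is reported for all of them, whereas a read spanning a substitution is reported only for the variant carrying it. Storing each shared k-mer once with a set of labels, rather than once per genome, is also a small-scale illustration of why the memory footprint of the index matters as the number of sequences grows.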

Keywords

Bioinformatics; Genomics; Variation graphs; Data structures; Indexing

Domains

Bioinformatics [q-bio.QM]

Rivals-gis-biotech-abs.pdf (551.2 KB)

Origin: Files produced by the author(s)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02387894

Submitted on: Wednesday, 15 July 2020, 10:47:03

Last modified on: Friday, 24 March 2023, 14:53:14

Dates and versions

lirmm-02387894, version 1 (15-07-2020)

Identifiers

HAL Id: lirmm-02387894, version 1

Cite

Eric Rivals. Computational Pan-Genomics. Pangénomique végétale, GIS Biotechnologies Vertes, INRA, Jul 2019, Paris, France. ⟨lirmm-02387894⟩

Collections

CNRS INRIA MAB LIRMM MIPS UNIV-MONTPELLIER INRAE ANR
