Computational Pan-Genomics

Eric Rivals 1, 2
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : The notion of a pan-genome or genome graph, understood as a representation of all variant genomes of one or more closely related species, is gaining traction as a replacement for the classical, sequence-based, linear reference genome. Indeed, one of the main reasons why most analyses rely on a linear reference genome is simplicity: simplicity of processing, since most bioinformatic algorithms target a single linear sequence as reference, and simplicity of understanding, since our minds prefer linear structures. Due to evolution, related sequences may share common or highly similar regions, which should not be considered independent but currently are, for instance when one blasts a query sequence against a collection of genomes. A sequence graph representation summarises in a single reference the similarities and differences of the multiple collected variants. For instance, read mapping on a sequence graph directly tells in which variants a read maps and in which it does not, thereby revealing variant specificity (if any). Even within the class of graph-based representations, several options are available for the data structure to use. With increasing sequencing throughput, the number of sequences to represent grows and efficiency becomes an issue. Practical uses of sequence graphs and pan-genomes depend heavily on the underlying indexing data structures, especially on their memory footprint. I will present an overview of computational issues related to such sequence-graph-based approaches to pan-genome representation, and illustrate their complexity and advantages (which reach far beyond efficiency aspects).
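As a rough illustration of the variant-specificity idea in the abstract, here is a minimal Python sketch. It is not taken from the talk: the function names, the toy sequences, and the choice of a coloured k-mer index (each k-mer annotated with the set of variants containing it) are assumptions made for the example, and real pan-genome indexes are far more compact. Note that sharing all k-mers with a variant is a necessary condition for an exact match of the read, not a sufficient one.

```python
from collections import defaultdict


def build_colored_kmer_index(variants, k):
    """Map each k-mer to the set of variant names ("colours") containing it."""
    index = defaultdict(set)
    for name, seq in variants.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def variants_supporting_read(index, read, k):
    """Return the variants that contain every k-mer of the read.

    This is a heuristic filter: a variant in the result shares all k-mers
    with the read, which does not by itself prove a contiguous match.
    """
    if len(read) < k:
        return set()
    colors = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        colors = set(hits) if colors is None else colors & hits
        if not colors:
            break  # no variant can contain the whole read
    return colors


# Hypothetical toy variants: B carries a substitution, C differs at the last base.
K = 4
VARIANTS = {
    "A": "ACGTTGCAAGT",
    "B": "ACGTTGCTAGT",
    "C": "ACGTTGCAAGG",
}
INDEX = build_colored_kmer_index(VARIANTS, K)

shared = variants_supporting_read(INDEX, "CGTTG", K)      # region common to all variants
specific = variants_supporting_read(INDEX, "GCTAG", K)    # region spanning B's substitution
partial = variants_supporting_read(INDEX, "TGCAAG", K)    # region shared by A and C only
absent = variants_supporting_read(INDEX, "TTTTT", K)      # read found in no variant
```

A single lookup per k-mer thus reports the set of variants compatible with a read, instead of aligning the read against each genome separately.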
Document type : Conference papers

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02387894
Contributor : Eric Rivals
Submitted on : Wednesday, July 15, 2020 - 10:47:03 AM
Last modification on : Monday, December 14, 2020 - 3:41:25 PM

Identifiers

  • HAL Id : lirmm-02387894, version 1

Citation

Eric Rivals. Computational Pan-Genomics. Pangénomique végétale, GIS Biotechnologies Vertes, INRA, Jul 2019, Paris, France. ⟨lirmm-02387894⟩
