Computational Pan-Genomics - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference Papers Year : 2019

Computational Pan-Genomics

Computational challenges of pan-genomics

Abstract

The notion of a pan-genome or genome graph, understood as the representation of all variant genomes of one or more closely related species, is gaining traction as a replacement for the classical, sequence-based, linear reference genome. Indeed, one of the main reasons why most analyses are based on the linear reference genome is simplicity: simplicity of processing, since most bioinformatic algorithms target a single linear sequence as reference, and simplicity of understanding, since our minds prefer linear structures. Due to evolution, related sequences may share common or highly similar regions, which should not be considered independent but currently are, for instance when one blasts a query sequence against a collection of genomes. A sequence graph representation summarises in a single reference the similarities and differences of the multiple collected variants. For instance, read mapping on a sequence graph directly tells which variants a read maps to and which it does not, thereby revealing variant specificity (if any). Even within the class of graph-based representations, several options are available for the data structure to use. With increasing sequencing throughput, the number of sequences to represent grows and efficiency becomes an issue. Practical uses of sequence graphs and pan-genomes depend heavily on the underlying indexing data structures, especially on their memory footprint. I will present an overview of computational issues related to such sequence-graph-based approaches to pan-genome representation, and illustrate their complexity and advantages (which reach far beyond efficiency aspects).
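As an illustration of the idea that mapping a read on a sequence graph reveals which variants contain it, here is a minimal sketch. It is not taken from the talk: the class `SequenceGraph`, its methods, and the two toy variants are hypothetical names chosen for this example. It also uses naive exact substring matching, whereas a practical tool would rely on an index over the graph.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical): nodes hold segments, variants are paths."""
    segments: dict[str, str] = field(default_factory=dict)    # node id -> sequence
    paths: dict[str, list[str]] = field(default_factory=dict)  # variant -> node ids

    def add_segment(self, node_id: str, sequence: str) -> None:
        self.segments[node_id] = sequence

    def add_path(self, variant: str, node_ids: list[str]) -> None:
        self.paths[variant] = node_ids

    def spell(self, variant: str) -> str:
        """Concatenate the segments along a variant's path."""
        return "".join(self.segments[n] for n in self.paths[variant])

    def variants_containing(self, read: str) -> set[str]:
        """Naive exact matching: variants whose spelled sequence contains the read."""
        return {v for v in self.paths if read in self.spell(v)}


# Two variants that differ by a single base; shared segments are stored once.
graph = SequenceGraph()
graph.add_segment("s1", "ACGT")  # shared by both variants
graph.add_segment("s2", "A")     # allele carried by variant_A only
graph.add_segment("s3", "G")     # allele carried by variant_B only
graph.add_segment("s4", "TTCA")  # shared by both variants
graph.add_path("variant_A", ["s1", "s2", "s4"])
graph.add_path("variant_B", ["s1", "s3", "s4"])

shared = graph.variants_containing("ACGT")     # read from a shared region
specific = graph.variants_containing("GTGTT")  # read spanning the variant site
```

In this toy setting, a read drawn from a shared region is reported for both variants, while a read spanning the variable site is reported only for the variant that carries the matching allele. Spelling out every path on each query is done here for clarity only; it discards the memory savings that motivate graph-based indexes.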
Rivals-gis-biotech-abs.pdf (551.2 KB)
Origin : Files produced by the author(s)

Dates and versions

lirmm-02387894 , version 1 (15-07-2020)

Identifiers

  • HAL Id : lirmm-02387894 , version 1

Cite

Eric Rivals. Computational Pan-Genomics. Pangénomique végétale, GIS Biotechnologies Vertes, INRA, Jul 2019, Paris, France. ⟨lirmm-02387894⟩