LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Preprint, Working Paper. Year: 2021

RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes

Abstract

Background: As the cost of DNA sequencing decreases, high-throughput sequencing technologies are becoming increasingly accessible to many laboratories. Consequently, new issues emerge that require new algorithms, including tools for indexing and compressing hundreds to thousands of complete genomes.

Results: This paper presents RedOak, a reference-free and alignment-free software package for indexing a large collection of similar genomes. RedOak can also be applied to reads from unassembled genomes, and it provides a nucleotide sequence query function. The software is based on a k-mer approach and has been designed to be heavily parallelized and distributed across several nodes of a cluster. The source code of our RedOak algorithm is available at https://gitlab.info-ufr.univ-montp2.fr/DoccY/RedOak.

Conclusions: RedOak may be useful to biologists and bioinformaticians who want to extract information from large sequence datasets.
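The k-mer approach mentioned in the abstract can be illustrated with a minimal Python sketch. This is not RedOak's actual data structure or interface: the function names (`kmers`, `build_index`, `query`) are hypothetical, the index is a plain in-memory dictionary, and reverse complements, compression, and parallelism are all omitted. It only shows the general idea of recording which genomes contain each k-mer and then scoring a query sequence against that record.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq that contains only A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_index(genomes, k):
    """Map each k-mer to the set of genome names that contain it.

    genomes: dict of genome name -> nucleotide sequence (assumed input shape).
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query(index, names, seq, k):
    """Return, per genome, the fraction of the query's k-mers found in it."""
    counts = {name: 0 for name in names}
    total = 0
    for kmer in kmers(seq, k):
        total += 1
        # .get avoids inserting empty entries into the defaultdict
        for name in index.get(kmer, ()):
            counts[name] += 1
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


# Toy example with two very short, made-up "genomes"
genomes = {"g1": "ACGTACGT", "g2": "ACGTTTTT"}
index = build_index(genomes, k=4)
result = query(index, genomes.keys(), "ACGTA", k=4)
print(result)
```

In this toy run the query has two 4-mers (`ACGT` and `CGTA`); both occur in `g1` and only the first occurs in `g2`. A real index over hundreds of genomes would need a far more compact representation than a dictionary of sets.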
Main file
2020.12.19.423583v1.full.pdf (1.12 MB)

Dates and versions

lirmm-03117453, version 1 (21-01-2021)

Cite

Clément Agret, Annie Chateau, Gaëtan Droc, Gautier Sarah, Alban Mancheron, et al. RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes. 2021. ⟨lirmm-03117453⟩