
RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes

Abstract : Background: As the cost of DNA sequencing decreases, high-throughput sequencing technologies are becoming accessible to many laboratories. Consequently, new issues emerge that require new algorithms, including tools for indexing and compressing hundreds to thousands of complete genomes. Results: This paper presents RedOak, a reference-free and alignment-free software package for indexing a large collection of similar genomes. RedOak can also be applied to reads from unassembled genomes, and it provides a nucleotide sequence query function. The software is based on a k-mer approach and has been designed to be heavily parallelized and distributed across several nodes of a cluster. The source code of RedOak is available at https://gitlab.info-ufr.univ-montp2.fr/DoccY/RedOak. Conclusions: RedOak may be useful to biologists and bioinformaticians who need to extract information from large sequence datasets.
Document type :
Preprints, Working Papers, ...

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03117453
Contributor : Alban Mancheron
Submitted on : Thursday, January 21, 2021 - 11:06:00 AM
Last modification on : Thursday, March 4, 2021 - 3:26:29 PM
Long-term archiving on : Thursday, April 22, 2021 - 6:50:02 PM

Citation

Clement Agret, Annie Chateau, Gaetan Droc, Gautier Sarah, Alban Mancheron, et al. RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes. 2021. ⟨lirmm-03117453⟩
