HAL will be down for maintenance from Friday, June 10 at 4pm through Monday, June 13 at 9am. More information
Skip to Main content Skip to Navigation
Conference papers

Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding

Bastien Cazaux 1, 2 Eric Rivals 2, 1
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Complete list of metadata

Cited literature [2 references]  Display  Hide  Download

Contributor : Eric Rivals Connect in order to contact the contributor
Submitted on : Wednesday, November 27, 2019 - 7:38:59 AM
Last modification on : Thursday, January 13, 2022 - 12:11:18 PM
Long-term archiving on: : Friday, February 28, 2020 - 1:22:50 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License




Bastien Cazaux, Eric Rivals. Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding. CPM 2019 - 30th Annual Symposium on Combinatorial Pattern Matching, University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩. ⟨lirmm-02382066⟩



Record views


Files downloads