CSAM: Compressed SAM format

Rodrigo Cánovas (1, 2), Alistair Moffat, Andrew Turpin
(2) MAB - Méthodes et Algorithmes pour la Bioinformatique, LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract:
Motivation: Next-generation sequencing machines produce vast amounts of genomic data. For these data to be useful, they must be stored and manipulated efficiently. This work responds to the combined challenge of compressing genomic data while providing fast access to regions of interest, without requiring decompression of whole files.
Results: We describe CSAM (Compressed SAM format), a compression approach offering both lossless and lossy compression for SAM files. The proposed structures and techniques are suitable for representing SAM files and also support fast access to the compressed information. They generate more compact lossless representations than BAM, currently the preferred lossless compressed SAM-equivalent format, and they are self-contained: they do not depend on any external resources to compress or decompress SAM files.
Availability and Implementation: An implementation is available at https://github.com/rcanovas/
Document type: Journal article

Cited literature: 22 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01951638
Contributor: Eric Rivals
Submitted on: Tuesday, December 11, 2018 - 3:27:20 PM
Last modification on: Wednesday, July 10, 2019 - 7:14:02 PM
Long-term archiving on: Tuesday, March 12, 2019 - 2:56:42 PM

File: Canovas-bioinformatics-2016-bt... (publisher files allowed on an open archive)



Citation:

Rodrigo Cánovas, Alistair Moffat, Andrew Turpin. CSAM: Compressed SAM format. Bioinformatics, Oxford University Press (OUP), 2016, 32 (24), pp.3709-3716. ⟨10.1093/bioinformatics/btw543⟩. ⟨lirmm-01951638⟩


Metrics: 119 record views, 116 file downloads