Skip to Main content Skip to Navigation
Conference papers

MPSCAN: Fast Localisation of Multiple Reads in Genomes

Eric Rivals 1, * Leena Salmela 2 Petri Kalsi 2 Petteri Kiiskinen 2 Jorma Tarhio 2
* Corresponding author
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : With Next Generation Sequencers, sequence based transcriptomic or epigenomic assays yield millions of short sequence reads that need to map back on a reference genome. The upcoming versions of these sequencers promise even higher sequencing capacities; this may turn the read mapping task into a bottleneck for which alternative pattern matching approaches must be experimented. We present an algorithm and its implementation, called MPSCAN, which uses a sophisticated filtration scheme to match a set patterns/reads exactly on a sequence. MPSCAN can search for millions of reads in a single pass through the genome without indexing its sequence. Moreover, we show that MPSCAN offers an optimal average time complexity, which is sublinear in the text length, meaning that it does not need to examine all sequence positions. Comparisons with BLAT-like tools and with specialised read mapping programs (like BOWTIE or ZOOM) demonstrate that MPSCAN also is the fastest algorithm in practice for exact matching. Our accuracy and scalability reveal that some tool are inappropriate for read mapping. Moreover, we provide evidence suggesting that exact matching may be a valuable solution in some read mapping applications. As most read mapping programs somehow rely on exact matching procedures to perform approximate pattern mapping, the filtration scheme we experimented may reveal useful in the design of future algorithms. The absence of genome index gives MPSCAN its low memory requirement and flexibility that let it to run on a desktop computer, and avoids a time-consuming genome preprocessing.
Complete list of metadata

Cited literature [27 references]  Display  Hide  Download

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00407173
Contributor : Eric Rivals <>
Submitted on : Thursday, July 23, 2009 - 7:03:47 PM
Last modification on : Monday, January 13, 2020 - 5:36:13 PM
Long-term archiving on: : Thursday, June 30, 2011 - 11:45:35 AM

File

mpscan-wabi-draft.pdf
Files produced by the author(s)

Identifiers

Collections

Citation

Eric Rivals, Leena Salmela, Petri Kalsi, Petteri Kiiskinen, Jorma Tarhio. MPSCAN: Fast Localisation of Multiple Reads in Genomes. WABI: Workshop on Algorithms in Bioinformatics, Sep 2009, Philadelphie, United States. pp.246-260, ⟨10.1007/978-3-642-04241-6_21⟩. ⟨lirmm-00407173⟩

Share

Metrics

Record views

1189

Files downloads

488