Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets

Oleksandra Levchenko 1, Djamel-Edine Yagoubi 1, Reza Akbarinia 1, Florent Masseglia 1, Boyan Kolev 1, Dennis Shasha 2
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Moreover, naive (quadratic) parallelization does not scale well, so both efficient indexing and parallelization are needed. We propose a demonstration of Spark-parSketch, a complete solution based on sketches / random projections that efficiently performs both the parallel indexing of large sets of time series and similarity search over them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration is available at http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.
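
The idea of a sketch built from random projections can be illustrated with a minimal single-machine example. This is not the Spark-parSketch implementation: the function names (make_projection, sketch, euclidean), the +1/-1 random vectors, the 1/sqrt(k) scaling, and the sizes (series of length 256, sketches of length 64) are all assumptions chosen for illustration. Each series is reduced to a short vector of dot products with random vectors, and distances between sketches approximate distances between the original series.

```python
import math
import random


def make_projection(series_len, sketch_len, seed=0):
    """Return sketch_len random vectors with +1/-1 entries (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(series_len)]
            for _ in range(sketch_len)]


def sketch(series, projection):
    """Dot product of the series with each random vector, scaled by 1/sqrt(k)."""
    scale = 1.0 / math.sqrt(len(projection))
    return [scale * sum(x * r for x, r in zip(series, row)) for row in projection]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic data (assumption): one base series, a noisy copy, an unrelated series.
rng = random.Random(1)
base = [rng.gauss(0, 1) for _ in range(256)]
near = [x + rng.gauss(0, 0.1) for x in base]
far = [rng.gauss(0, 1) for _ in range(256)]

proj = make_projection(series_len=256, sketch_len=64)
s_base, s_near, s_far = (sketch(s, proj) for s in (base, near, far))

# Sketch distances (64 values per series) versus exact distances (256 values).
print("near:", euclidean(s_base, s_near), "exact:", euclidean(base, near))
print("far: ", euclidean(s_base, s_far), "exact:", euclidean(base, far))
```

Because the sketch distance is only an estimate, a longer sketch would be expected to give better precision at a higher cost, which is the kind of time/precision tradeoff the abstract refers to.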

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01886760
Contributor : Reza Akbarinia
Submitted on : Wednesday, October 3, 2018 - 10:58:52 AM
Last modification on : Wednesday, August 14, 2019 - 10:46:03 AM
Long-term archiving on : Friday, January 4, 2019 - 1:20:56 PM

File

CIKM2018.pdf
Files produced by the author(s)

Citation

Oleksandra Levchenko, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Boyan Kolev, et al. Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets. CIKM: Conference on Information and Knowledge Management, Oct 2018, Turin, Italy. pp.1951-1954, ⟨10.1145/3269206.3269226⟩. ⟨lirmm-01886760⟩
