Conference papers

Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets

Oleksandra Levchenko 1 Djamel-Edine Yagoubi 1 Reza Akbarinia 1 Florent Masseglia 1 Boyan Kolev 1 Dennis Shasha 2
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to hundreds of millions or even billions, similarity queries become intractable on a single machine. Moreover, naive (quadratic) parallelization does not scale well. Both efficient indexing and parallelization are therefore needed. We propose a demonstration of Spark-parSketch, a complete solution based on sketches (random projections) that efficiently performs both the parallel indexing of large sets of time series and similarity search over them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration can be found at the link
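This record does not describe Spark-parSketch's actual pipeline (its index structures or how work is split across Spark), so the following is only a minimal single-machine sketch of the general random-projection idea named in the abstract. It assumes random vectors with entries in {-1, +1} and Euclidean distance, and all function names are hypothetical. Each series is projected onto a few random vectors, and the short "sketch" vectors are compared first. Only a small candidate set is then checked with the exact distance. The `k_candidates` parameter plays the role of the time/precision knob mentioned above.

```python
import math
import random


def make_random_vectors(n_vectors, length, seed=0):
    """Draw n_vectors random vectors with entries in {-1, +1} (an assumed choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)] for _ in range(n_vectors)]


def sketch(series, vectors):
    """Project a series onto each random vector; the sketch has len(vectors) entries."""
    return [sum(x * r for x, r in zip(series, vec)) for vec in vectors]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(dataset, vectors):
    """Hypothetical index: just the precomputed sketch of every series."""
    return [sketch(s, vectors) for s in dataset]


def approx_search(query, dataset, index, vectors, k_candidates):
    """Rank series by sketch distance, then verify only the top candidates exactly."""
    q_sketch = sketch(query, vectors)
    ranked = sorted(range(len(dataset)), key=lambda i: euclidean(q_sketch, index[i]))
    candidates = ranked[:k_candidates]
    # Exact distance is computed on the small candidate set only.
    return min(candidates, key=lambda i: euclidean(query, dataset[i]))


# Usage: 50 random walks of length 64; the query is a slightly perturbed copy of series 3.
rng = random.Random(42)


def random_walk(length):
    total, walk = 0.0, []
    for _ in range(length):
        total += rng.gauss(0.0, 1.0)
        walk.append(total)
    return walk


dataset = [random_walk(64) for _ in range(50)]
query = [x + rng.gauss(0.0, 0.01) for x in dataset[3]]
vectors = make_random_vectors(30, 64, seed=7)
index = build_index(dataset, vectors)
best = approx_search(query, dataset, index, vectors, k_candidates=5)
print(best)
```

With ±1 entries, the expected squared difference of one sketch coordinate equals the squared Euclidean distance between the two series. Distances between sketches therefore approximate true distances at a fraction of the cost. A larger `k_candidates` raises precision but also raises the exact-verification time.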

Cited literature: 18 references
Contributor: Reza Akbarinia
Submitted on: Wednesday, October 3, 2018 - 10:58:52 AM
Last modification on: Tuesday, October 19, 2021 - 11:05:59 AM
Long-term archiving on: Friday, January 4, 2019 - 1:20:56 PM





Oleksandra Levchenko, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Boyan Kolev, et al. Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets. CIKM: Conference on Information and Knowledge Management, Oct 2018, Turin, Italy. pp. 1951-1954, ⟨10.1145/3269206.3269226⟩. ⟨lirmm-01886760⟩




