DPiSAX: Massively Distributed Partitioned iSAX

Djamel-Edine Yagoubi (1), Reza Akbarinia (1), Florent Masseglia (1, 2), Themis Palpanas (3)
(1) ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, have become topics of high interest. For many applications across diverse domains, though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real-world data, illustrate that our index creation algorithm works on 1 billion time series in less than 2 hours, while the state-of-the-art centralized algorithms need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism.
Document type:
Conference paper
ICDM: International Conference on Data Mining, Nov 2017, New Orleans, United States. IEEE International Conference on Data Mining, pp.1-6, 2017, 〈http://www.ucs.louisiana.edu/~sxk6389/index.html〉. 〈10.1109/ICDM.2017.151〉
Cited literature [20 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620125
Contributor: Reza Akbarinia
Submitted on: Friday, October 20, 2017 - 11:04:55
Last modified on: Friday, March 15, 2019 - 01:15:10
Document(s) archived on: Sunday, January 21, 2018 - 12:30:38

File

DPiSAX__ICDM___short_paper_.pd...
Files produced by the author(s)

Citation

Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas. DPiSAX: Massively Distributed Partitioned iSAX. ICDM: International Conference on Data Mining, Nov 2017, New Orleans, United States. IEEE International Conference on Data Mining, pp.1-6, 2017, 〈http://www.ucs.louisiana.edu/~sxk6389/index.html〉. 〈10.1109/ICDM.2017.151〉. 〈lirmm-01620125〉
