Massively Distributed Time Series Indexing and Querying

Djamel-Edine Yagoubi; Reza Akbarinia; Florent Masseglia; Themis Palpanas

doi:10.1109/TKDE.2018.2880215

Journal article, IEEE Transactions on Knowledge and Data Engineering, Year: 2020

Djamel-Edine Yagoubi

Role: Author
PersonId: 1086292

Scientific Data Management

Reza Akbarinia

Role: Author
PersonId: 172647
IdHAL: reza-akbarinia
ORCID: 0000-0002-7098-0361
IdRef: 119863421

Scientific Data Management

Florent Masseglia

Role: Author
PersonId: 172896
IdHAL: florent-masseglia
ORCID: 0000-0002-1149-585X
IdRef: 120528681

Scientific Data Management

Themis Palpanas

Role: Author
PersonId: 974240
ORCID: 0000-0002-8031-0265

Université Paris Cité

Abstract

Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, have become topics of high interest. For many applications across diverse domains, though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series (or high-dimensional vectors, in general), and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real-world data, illustrate that our index creation algorithm indexes 4 billion time series in less than 5 hours, while state-of-the-art centralized algorithms do not scale: they reach their limit at 1 billion time series, for which they need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism.

Keywords

Time Series, Parallel Indexing, Distributed Querying

Domains

Information Retrieval [cs.IR]

Main file

DPiSAX_TKDE.pdf (591.39 KB)

Origin: Files produced by the author(s)

Contributor: Reza Akbarinia

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02197618

Submitted on: Tuesday, July 30, 2019, 14:58:36

Last modified on: Thursday, February 15, 2024, 03:30:53

Dates and versions

lirmm-02197618, version 1 (30-07-2019)

Identifiers

HAL Id: lirmm-02197618, version 1
DOI: 10.1109/TKDE.2018.2880215

Cite

Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas. Massively Distributed Time Series Indexing and Querying. IEEE Transactions on Knowledge and Data Engineering, 2020, 32 (1), pp.108-120. ⟨10.1109/TKDE.2018.2880215⟩. ⟨lirmm-02197618⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA ZENITH LIRMM INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC MIPS UNIV-MONTPELLIER UNIV-RENNES UR1-MATH-NUM
