Variable size segmentation for efficient representation and querying of non-uniform time series datasets

Lamia Djebour; Reza Akbarinia; Florent Masseglia

doi:10.1145/3477314.3507000

Conference paper, Year: 2022

Lamia Djebour

Role: Author
PersonId: 1119503

Scientific Data Management

Reza Akbarinia

Role: Author
PersonId: 172647
IdHAL: reza-akbarinia
ORCID: 0000-0002-7098-0361
IdRef: 119863421

Scientific Data Management

Florent Masseglia

Role: Author
PersonId: 172896
IdHAL: florent-masseglia
ORCID: 0000-0002-1149-585X
IdRef: 120528681

Scientific Data Management

Abstract

Time series similarity computation is at the core of many data analytics tasks. Given the data volumes involved, or simply the need for fast response times, existing approaches often rely on shorter representations, usually with information loss. This leads to approximate comparisons in which precision is a major issue. We present and experimentally evaluate ASAX, a new approach for segmenting time series before their transformation into symbolic representations. ASAX significantly reduces the information loss incurred by possible splittings at different steps of the representation calculation, particularly for datasets with unbalanced (non-uniform) distributions. We provide theoretical guarantees on the lower bound of similarity measures, and our experiments illustrate that our method outperforms the state of the art, with a significant gain in precision for datasets with unbalanced distributions.
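
To make the idea of segmenting a series before turning it into symbols concrete, here is a minimal sketch of a generic SAX-style symbolic representation in which the segment boundaries are a parameter. It is not the authors' ASAX algorithm: the abstract does not say how ASAX chooses its segments, so the boundaries below are hand-picked for illustration, and the names `znormalize`, `breakpoints`, `symbolize` and `boundaries` are assumptions of this sketch.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def breakpoints(alphabet_size):
    """Cut points splitting the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbolize(series, boundaries, alphabet_size=4):
    """Map each segment [boundaries[i], boundaries[i+1]) to one symbol.

    `boundaries` is a hypothetical parameter: equal spacing gives classic
    fixed-size segments, unequal spacing gives variable-size segments.
    """
    z = znormalize(series)
    cuts = breakpoints(alphabet_size)
    letters = "abcdefghijklmnopqrstuvwxyz"
    word = []
    for start, end in zip(boundaries, boundaries[1:]):
        segment_mean = mean(z[start:end])
        # The symbol is the index of the region containing the segment mean.
        word.append(letters[bisect_right(cuts, segment_mean)])
    return "".join(word)


# A series that is flat for most of its length and varies only at the end.
series = [0.0] * 12 + [1.0, 3.0, 1.0, 5.0]
fixed = symbolize(series, [0, 4, 8, 12, 16])       # four equal segments
variable = symbolize(series, [0, 12, 13, 14, 16])  # hand-picked boundaries
print(fixed, variable)
```

With the same number of segments, the equal-size split spends three symbols on the flat prefix and averages the whole varying tail into one, while the hand-picked split covers the flat prefix with a single segment and keeps more detail where the values change. This only illustrates why segment placement matters for non-uniform data; the selection criterion and the lower-bounding guarantees are in the paper itself.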

Keywords

Data mining; Spatial-temporal systems; Nearest-neighbor search; Time series representations; Information retrieval; Time series

Domains

Information Retrieval [cs.IR]

Main file

Entropy-SAC.pdf (3.9 MB)

Origin: Files produced by the author(s)

Contributor: Reza Akbarinia

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03806053

Submitted on: Friday, October 7, 2022, 16:23:07

Last modified on: Monday, November 6, 2023, 09:58:03

Long-term archiving on: Sunday, January 8, 2023, 19:24:38

Dates and versions

lirmm-03806053, version 1 (07-10-2022)

Identifiers

HAL Id: lirmm-03806053, version 1
DOI: 10.1145/3477314.3507000

Cite

Lamia Djebour, Reza Akbarinia, Florent Masseglia. Variable size segmentation for efficient representation and querying of non-uniform time series datasets. SAC 2022 - 37th ACM/SIGAPP Symposium on Applied Computing, Apr 2022, Virtual Event, United States. pp.395-402, ⟨10.1145/3477314.3507000⟩. ⟨lirmm-03806053⟩

Collections

CNRS INRIA ZENITH LIRMM INRIA2 UNIV-MONTPELLIER
