High Dimensional Data Clustering by means of Distributed Dirichlet Process Mixture Models - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper, Year: 2019

Abstract

Clustering is a data mining technique used intensively for data analytics, with applications in marketing, security, text and document analysis, and sciences such as biology and astronomy, among many others. The Dirichlet Process Mixture (DPM) is a model for multivariate clustering that discovers the number of clusters automatically and offers other favorable characteristics. With high dimensional data, however, DPM clustering becomes a significant challenge, with numerical and theoretical pitfalls. The advantages of DPM also come at the price of prohibitive running times, which impair its adoption and make centralized DPM approaches inefficient, especially with high dimensional data. We propose HD4C (High Dimensional Data Distributed Dirichlet Clustering), a parallel clustering solution that addresses the curse of dimensionality by two means. First, it scales gracefully to massive datasets through distributed computing while remaining DPM-compliant. Second, it clusters high dimensional data such as time series (as a function of time) and hyperspectral data (as a function of wavelength). Our experiments, on both synthetic and real-world data, illustrate the high performance of our approach.
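
The abstract notes that a DPM discovers the number of clusters automatically. As a minimal illustration of that property only (not the HD4C algorithm, which this page does not describe), the following Python sketch samples cluster assignments from a Chinese restaurant process, the prior over partitions induced by a Dirichlet process. The function name `crp_assignments`, the concentration value `alpha=2.0`, and the seed are assumptions chosen for illustration.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process prior.

    alpha is the (positive) concentration parameter: larger values make
    opening a new cluster more likely. Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_points=1000, alpha=2.0)
print(len(counts), "clusters; sizes:", sorted(counts, reverse=True))
```

The number of clusters is never passed in: it emerges from the sampling process and grows slowly with the number of points. A full DPM additionally combines this prior with a likelihood for the observations, which this sketch leaves out.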
Main file
IEEE_BigData_2019__HAL_.pdf (6.31 MB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-02364411, version 1 (16-11-2019)

Identifiers

  • HAL Id: lirmm-02364411, version 1
  • WOS: 000554828700105

Cite

Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. High Dimensional Data Clustering by means of Distributed Dirichlet Process Mixture Models. IEEE Big Data 2019 - IEEE International Conference on Big Data, Dec 2019, Los Angeles, United States. ⟨lirmm-02364411⟩
