Conference papers

Massively Distributed Clustering via Dirichlet Process Mixture

Khadidja Meguelati 1 Benedicte Fontez 2 Nadine Hilgert 2 Florent Masseglia 1 Isabelle Sanchez 2
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Dirichlet Process Mixture (DPM) is a model for multivariate clustering that discovers the number of clusters automatically and offers other favorable characteristics, but its prohibitive response times make centralized DPM approaches inefficient. We propose a demonstration of two parallel clustering solutions: i) DC-DPM, which scales gracefully to millions of data points while remaining DPM compliant, the central challenge in distributing this process; and ii) HD4C, which addresses the curse of dimensionality by performing distributed DPM clustering of high-dimensional data such as time series or hyperspectral data.
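As an illustration of how a DPM can discover the number of clusters automatically, the sketch below samples cluster labels from a Chinese restaurant process, the standard sequential view of the Dirichlet process prior. It is not the DC-DPM or HD4C algorithm from the paper: it covers only the prior over partitions, with no likelihood and no distribution across workers. The function name `crp_assignments` and the parameters `alpha` and `seed` are hypothetical, chosen for this example.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process prior.

    Illustrative only: alpha (> 0) is the concentration parameter;
    larger values tend to open more clusters.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for i in range(n_points):
        # Point i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # a new cluster was opened
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_points=1000, alpha=1.0)
    print("clusters opened:", len(counts))
```

The number of clusters is never passed in; it emerges from the sampling process. A full DPM sampler would additionally weight each choice by how well the point fits each cluster's data model.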
Document type :
Conference papers

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03036910
Contributor : Florent Masseglia
Submitted on : Wednesday, December 2, 2020 - 9:15:01 PM
Last modification on : Thursday, May 27, 2021 - 4:34:04 PM
Long-term archiving on : Wednesday, March 3, 2021 - 8:19:29 PM

File

ECML_PKDD_2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-03036910, version 1

Citation

Khadidja Meguelati, Benedicte Fontez, Nadine Hilgert, Florent Masseglia, Isabelle Sanchez. Massively Distributed Clustering via Dirichlet Process Mixture. ECML PKDD 2020 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2020, Ghent (virtual), Belgium. ⟨lirmm-03036910⟩
