Conference papers

Massively Distributed Clustering via Dirichlet Process Mixture

Khadidja Meguelati 1 Bénédicte Fontez 2 Nadine Hilgert 2 Florent Masseglia 1 Isabelle Sanchez 2
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : The Dirichlet Process Mixture (DPM) is a model for multivariate clustering that discovers the number of clusters automatically and offers other favorable characteristics, but its prohibitive response times make centralized DPM approaches inefficient. We propose a demonstration of two parallel clustering solutions: i) DC-DPM, which scales gracefully to millions of data points while remaining DPM compliant, the main challenge in distributing this process; and ii) HD4C, which addresses the curse of dimensionality by performing distributed DPM clustering of high-dimensional data such as time series or hyperspectral data.
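To illustrate how a Dirichlet process prior lets the number of clusters emerge from the data rather than being fixed in advance, here is a minimal sketch of the Chinese restaurant process, a standard sequential view of that prior. It is not the authors' DC-DPM or HD4C algorithm, and it samples from the prior only, with no likelihood term and no distribution across workers. The function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions made for this illustration.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels for n_points from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make it
    more likely that a point opens a new cluster.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_points=1000, alpha=1.0)
    print("clusters discovered:", len(counts))
```

The number of clusters is never passed in; it is simply `len(counts)` at the end. A full DPM sampler would additionally weight each choice by how well the point fits the cluster, which is the costly step the abstract refers to.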
Document type :
Conference papers
Contributor : Florent Masseglia
Submitted on : Wednesday, December 2, 2020 - 9:15:01 PM
Last modification on : Friday, September 17, 2021 - 4:46:06 PM
Long-term archiving on : Wednesday, March 3, 2021 - 8:19:29 PM


Files produced by the author(s)


  • HAL Id : lirmm-03036910, version 1


Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia, Isabelle Sanchez. Massively Distributed Clustering via Dirichlet Process Mixture. ECML PKDD 2020 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2020, Ghent / Virtual, Belgium. ⟨lirmm-03036910⟩


