Hierarchical Co-Clustering: Off-line and Incremental Approaches

Ruggero Pensa; Dino Ienco; Rosa Meo

doi:10.1007/s10618-012-0292-8

Journal article, Data Mining and Knowledge Discovery, Year: 2014


Ruggero Pensa

Role: Author

Università degli studi di Torino = University of Turin

Dino Ienco

Role: Author
PersonId : 6226
IdHAL : dino-ienco
ORCID : 0000-0002-8736-3132
IdRef : 172688183

ADVanced Analytics for data SciencE

Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture

Rosa Meo

Role: Author

Università degli studi di Torino = University of Turin

Abstract

Clustering data is challenging for two main reasons. First, the dimensionality of the data is often very high, which makes the clusters hard to interpret; moreover, with high-dimensional data the classic metrics fail to identify the real similarities between objects. Second, the observed phenomena evolve, so the datasets accumulate over time. In this paper we propose solutions to both problems. To tackle high dimensionality, we apply a co-clustering approach to the dataset that stores the occurrences of features in the observed objects. Co-clustering computes a partition of objects and a partition of features simultaneously. The novelty of our co-clustering solution is that it arranges the clusters hierarchically, producing two hierarchies: one on the objects and one on the features. The two hierarchies are coupled: the clusters at a given level of one hierarchy are paired with the clusters at the same level of the other hierarchy to form the co-clusters. Each cluster in one hierarchy thus provides insight into the clusters of the other. Another novelty of the proposed solution is that the number of clusters is potentially unlimited. Nevertheless, the resulting hierarchies remain compact, and therefore more readable, because our method allows a cluster to be split into multiple clusters at the next lower level.

As regards the second challenge, the accumulation of data makes the datasets intractably large over time. An incremental solution relieves this issue because it partitions the problem. We introduce an incremental version of our hierarchical co-clustering algorithm. It starts from an intermediate solution computed on the previous version of the data and updates the co-clustering results by considering only the added block of data. This speeds up the computation with respect to the original approach, which would recompute the result on the whole dataset. In addition, the incremental algorithm guarantees approximately the same answer as the original version while saving much of the computational load. We validate the incremental approach on several high-dimensional datasets and carry out an accurate comparison with both the original version of our algorithm and state-of-the-art competitors. The results open the way to a novel usage of co-clustering algorithms in which it is advantageous to partition the data into several blocks and process them incrementally, thus "incorporating" data gradually into an ongoing co-clustering solution.
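To make the two central ideas of the abstract concrete (partitioning objects and features at the same time, and absorbing a new block of data without recomputing everything), here is a minimal Python sketch. It is not the authors' algorithm: it swaps in a plain alternating block-mean reassignment on a flat (non-hierarchical) co-clustering, and the "incremental" step only assigns new rows to existing row clusters. All function names (`block_means`, `assign_row`, `assign_col`, `cocluster`, `add_rows`) and the toy data in the usage note are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = objects, columns = features (occurrence counts)


def block_means(data: Matrix, rows: List[int], cols: List[int],
                k: int, l: int) -> List[List[float]]:
    """Mean entry of every (row cluster, column cluster) block."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            sums[rows[i]][cols[j]] += value
            counts[rows[i]][cols[j]] += 1
    # Empty blocks get mean 0.0 (a simplification for this sketch).
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0
             for b in range(l)] for a in range(k)]


def assign_row(row: List[int], cols: List[int],
               means: List[List[float]]) -> int:
    """Row cluster whose block means reconstruct this row with least squared error."""
    def cost(a: int) -> float:
        return sum((v - means[a][cols[j]]) ** 2 for j, v in enumerate(row))
    return min(range(len(means)), key=cost)


def assign_col(data: Matrix, j: int, rows: List[int],
               means: List[List[float]], l: int) -> int:
    """Column cluster whose block means reconstruct column j with least squared error."""
    def cost(b: int) -> float:
        return sum((data[i][j] - means[rows[i]][b]) ** 2 for i in range(len(data)))
    return min(range(l), key=cost)


def cocluster(data: Matrix, rows: List[int], cols: List[int],
              k: int, l: int, iters: int = 10) -> Tuple[List[int], List[int]]:
    """Alternately refine the object partition and the feature partition."""
    rows, cols = list(rows), list(cols)
    for _ in range(iters):
        means = block_means(data, rows, cols, k, l)
        rows = [assign_row(r, cols, means) for r in data]
        means = block_means(data, rows, cols, k, l)
        cols = [assign_col(data, j, rows, means, l) for j in range(len(data[0]))]
    return rows, cols


def add_rows(data: Matrix, rows: List[int], cols: List[int],
             new_rows: Matrix, k: int, l: int) -> List[int]:
    """Assumed simplification of an incremental step: label only the new
    block of objects, reusing the co-clustering found on the earlier data."""
    means = block_means(data, rows, cols, k, l)
    return [assign_row(r, cols, means) for r in new_rows]
```

As a usage example, take the toy matrix `[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]` with the deliberately imperfect starting labels `[0, 0, 0, 1]` for both rows and columns. `cocluster(data, [0, 0, 0, 1], [0, 0, 0, 1], 2, 2)` recovers the two diagonal blocks, and `add_rows` then labels a later block of objects without touching the rows already clustered. The paper's method additionally builds coupled hierarchies and does not fix the number of clusters in advance, neither of which this sketch attempts.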

Keywords

Hierarchical Co-Clustering Incremental

Domains

Databases [cs.DB]

Main file

paper-2_4aperto_889279.pdf (359.01 KB)

Origin: Files produced by the author(s)

Contributor: Pascal Poncelet

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00798161

Submitted on: Wednesday, January 29, 2020, 16:43:19

Last modified on: Monday, July 29, 2024, 16:09:15

Long-term archiving on: Thursday, April 30, 2020, 17:45:40

Dates and versions

lirmm-00798161 , version 1 (29-01-2020)

License

Attribution

Identifiers

HAL Id : lirmm-00798161 , version 1
DOI : 10.1007/s10618-012-0292-8
IRSTEA : PUB00042222

Cite

Ruggero Pensa, Dino Ienco, Rosa Meo. Hierarchical Co-Clustering: Off-line and Incremental Approaches. Data Mining and Knowledge Discovery, 2014, 28 (1), pp.31-64. ⟨10.1007/s10618-012-0292-8⟩. ⟨lirmm-00798161⟩


Collections

CNRS IRSTEA ADVANSE LIRMM AGROPOLIS TETIS MIPS UNIV-MONTPELLIER INRAE

215 views

174 downloads