Generalized Graph Clustering: Recognizing (p,q)-Cluster Graphs
Abstract
Cluster Editing is a classical graph-theoretic approach to the problem of data set clustering: it consists of modifying a similarity graph into a disjoint union of cliques, i.e., clusters. As pointed out in a number of recent papers, the cluster editing model is too rigid to capture common features of real data sets. Several generalizations have therefore been proposed. In this paper, we introduce (p, q)-cluster graphs, where each cluster is missing at most p edges from being a clique, and there are at most q edges between a cluster and other clusters. Our generalization is the first one that allows a large number of false positives and negatives in total, while bounding the number of these locally for each cluster by p and q. We show that recognizing (p, q)-cluster graphs is NP-complete when p and q are part of the input. On the positive side, we show that (0, q)-cluster, (p, 1)-cluster, (p, 2)-cluster, and (1, 3)-cluster graphs can be recognized in polynomial time.
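To make the definition concrete, the following minimal Python sketch checks whether a given partition of the vertices satisfies the two local bounds. It is only a verifier for a candidate partition, not a recognition algorithm (finding such a partition is the hard part), and it is not taken from the paper. The function name `is_pq_clustering`, its parameters, and the edge-list representation are assumptions made for illustration; the graph is assumed to be simple and undirected.

```python
from itertools import combinations


def is_pq_clustering(vertices, edges, clusters, p, q):
    """Return True if `clusters` is a partition of `vertices` in which every
    cluster misses at most p internal edges and has at most q edges to the
    rest of the graph. Hypothetical helper; assumes a simple undirected graph."""
    edge_set = {frozenset(e) for e in edges}

    # The clusters must cover every vertex exactly once.
    listed = [v for cluster in clusters for v in cluster]
    if len(listed) != len(set(listed)) or set(listed) != set(vertices):
        return False

    for cluster in clusters:
        members = set(cluster)
        # Non-adjacent pairs inside the cluster (edges missing from a clique).
        missing = sum(
            1 for u, v in combinations(members, 2)
            if frozenset((u, v)) not in edge_set
        )
        # Edges with exactly one endpoint inside the cluster.
        outgoing = sum(1 for e in edge_set if len(e & members) == 1)
        if missing > p or outgoing > q:
            return False
    return True


# Two triangles joined by the single edge (3, 4).
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_triangles = [{1, 2, 3}, {4, 5, 6}]
one_cluster = [{1, 2, 3, 4, 5, 6}]
```

With this example, the partition into two triangles satisfies the bounds for (p, q) = (0, 1) but not for (0, 0), since each triangle has one edge leaving it. Taking the whole vertex set as a single cluster leaves 8 of the 15 possible pairs non-adjacent, so it is valid only when p is at least 8.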