Tree mining: Equivalence classes for candidate generation

With the rise of active research fields such as bioinformatics and taxonomies, and with the growing use of XML documents, tree data are playing an increasingly important role. Mining frequent subtrees from these data is thus an active research problem, and traditional methods (e.g., itemset mining from transactional databases) have to be extended to handle tree-based data. Several approaches have been proposed in the literature, mainly based on generate-and-prune methods. However, they generate a large volume of candidates before pruning them, whereas some candidates could be discarded beforehand because they contain infrequent subtrees. We thus propose a novel approach, called pivot, based on equivalence classes in order to decrease the number of candidates. Three equivalence classes are defined: the first relies on a right equivalence relation between two trees, the second on a left equivalence relation, and the third on a root equivalence relation. In this paper, we introduce this new method and show that it is complete (i.e., no frequent subtree is missed) and efficient, as illustrated by experiments conducted on synthetic and real datasets.

Keywords

Candidate generation, Tree mining, Data mining, Equivalent classes

Domains

Databases [cs.DB]

Main file

Treemining.pdf (453.64 KB)

Origin: Files produced by the author(s)

Contributor: Pascal Poncelet

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00798708

Submitted on: Friday, April 5, 2019, 19:11:38

Last modified on: Tuesday, March 12, 2024, 10:43:31

Long-term archiving on: Saturday, July 6, 2019, 16:16:11

Dates and versions

lirmm-00798708, version 1 (05-04-2019)

Identifiers

HAL Id: lirmm-00798708, version 1
DOI : 10.3233/IDA-2009-0381

Cite

Federico del Razo Lopez, Anne Laurent, Pascal Poncelet, Maguelonne Teisseire. Tree mining: Equivalence classes for candidate generation. Intelligent Data Analysis, 2009, 13 (4), pp.555-573. ⟨10.3233/IDA-2009-0381⟩. ⟨lirmm-00798708⟩

Collections

CIRAD AGROPARISTECH CNRS IRSTEA PARISTECH LIRMM AGROPOLIS TETIS MIPS UNIV-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

191 views

221 downloads