Data Structures for Efficient Tree Mining: From Crisp to Soft Embedding Constraints
Abstract
XML plays an increasing role in data exchange, and the volume of available resources is growing dramatically as a result. Because these resources are heterogeneous, they must be translated into a {\em mediator} schema before they can be queried. Automatic tools are required for this purpose, and they must be able to extract common data structures from tree-like XML data. In this paper, we present a novel approach based on a low-memory representation, which can be further improved by adopting a binary representation. We show that these representations have many properties that benefit subtree mining algorithms, especially under soft tree embedding constraints. Experiments highlight the value of our proposal.