Automatic Complex Schema Mapping Discovery and Validation by Structurally Coherent Frequent Mini-Taxonomies

Khalid Saleem 1, Zohra Bellahsene 2
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Match cardinality in schema matching is categorized as simple element-level matching and complex structural-level matching. Simple matching comprises 1:1, 1:n and n:1 match cardinalities, whereas n:m match cardinality is considered complex matching. Most existing approaches and tools give good results for 1:1 local and global match cardinality but lack the capability to handle complex cardinality. In this paper we demonstrate an automatic approach for the creation and validation of n:m schema mappings. Our technique is applicable to hierarchical structures such as XML Schema. The basic idea is to propose an n:m node mapping between the children (leaf nodes) of two matching non-leaf nodes of two schemas. The similarity computation for the two non-leaf nodes is based on the syntactic and linguistic similarity of node labels, supported by the similarity among the ancestral paths from the nodes to the root. The proposed n:m mapping is then verified with the help of mini-taxonomies extracted from a large set of schema trees from the same domain. The mini-taxonomies are automatically extracted using a frequent sub-tree mining approach; the higher the frequency, the higher the confidence in their reliability. The verification algorithm compares the mini-taxonomies with the subtrees rooted at the non-leaf nodes, which guides the system in judging the authenticity of the proposed n:m mapping.
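The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name, the weight `w_label`, and the toy corpus are assumptions made for illustration. Label similarity is approximated with a plain string ratio, with no linguistic resources. Frequent sub-tree mining is replaced by a much simpler count of (parent label, leaf label) pairs across schemas.

```python
from collections import Counter
from difflib import SequenceMatcher


def label_sim(a, b):
    """Syntactic similarity of two labels in [0, 1] (stand-in for the paper's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def node_sim(path_a, path_b, w_label=0.7):
    """Similarity of two non-leaf nodes, each given as its root-to-node label path.

    Combines the similarity of the node labels with the similarity of the
    ancestral paths; the weight w_label is an assumed value.
    """
    label = label_sim(path_a[-1], path_b[-1])
    ancestors = label_sim("/".join(path_a[:-1]), "/".join(path_b[:-1]))
    return w_label * label + (1 - w_label) * ancestors


def mine_mini_taxonomies(corpus, min_support):
    """Count (parent, leaf) pairs over a corpus of same-domain schemas.

    Each schema is a dict mapping a non-leaf label to a set of leaf labels.
    Pairs seen in at least min_support schemas are kept with their support.
    This is a simplified substitute for frequent sub-tree mining.
    """
    counts = Counter()
    for schema in corpus:
        for parent, leaves in schema.items():
            for leaf in leaves:
                counts[(parent.lower(), leaf.lower())] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def weakest_support(parents, leaves, frequent):
    """Support of the least frequent leaf in a proposed n:m mapping.

    parents: labels of the two matched non-leaf nodes.
    leaves: all leaf labels taking part in the proposed mapping.
    Returns 0 if some leaf never occurs under either parent in a frequent pair.
    """
    parents = [p.lower() for p in parents]
    supports = [
        max(frequent.get((p, leaf.lower()), 0) for p in parents)
        for leaf in leaves
    ]
    return min(supports) if supports else 0


# Hypothetical corpus of three same-domain schema fragments.
corpus = [
    {"author": {"firstname", "lastname"}},
    {"author": {"name"}},
    {"author": {"firstname", "lastname", "email"}},
]

# Proposed n:m mapping between leaves of "author" and "writer".
proposal = {"name", "firstname", "lastname"}
strict = mine_mini_taxonomies(corpus, min_support=2)
loose = mine_mini_taxonomies(corpus, min_support=1)
```

In this sketch, a higher value returned by `weakest_support` plays the role of higher confidence in the proposed mapping. A value of 0 means at least one leaf is not backed by any frequent pair, so the proposal would be rejected at that support threshold.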
Document type : Other publications

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00331358
Contributor : Khalid Saleem
Submitted on : Thursday, October 16, 2008 - 2:09:24 PM
Last modification on : Wednesday, November 14, 2018 - 2:56:02 PM
Long-term archiving on: Monday, June 7, 2010 - 8:17:20 PM

File

paper331.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-00331358, version 1

Citation

Khalid Saleem, Zohra Bellahsene. Automatic Complex Schema Mapping Discovery and Validation by Structurally Coherent Frequent Mini-Taxonomies. 2008. ⟨lirmm-00331358⟩