Automatic Complex Schema Mapping Discovery and Validation by Structurally Coherent Frequent Mini-Taxonomies

Khalid Saleem 1 Zohra Bellahsene 2
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Match cardinality aspect in schema matching is categorized as simple element level matching and complex structural level matching. Simple matching comprises of 1:1, 1:n and n:1 match cardinality, whereas n:m match cardinality is considered to be complex matching. Most of the existing approaches and tools give good 1:1 local and global match cardinality but lack the capabilities for handling the complex cardinality issue. In this paper we demonstrate an automatic approach for creation and validation of n:m schema mappings. Our technique is applicable to hierarchical structures like XML Schema. Basic idea is to propose an n:m nodes mapping between children (leaf nodes) of two matching non-leaf nodes of two schemas. The similarity computation of the two non-leaf nodes is based upon the syntactic and linguistic similarity of node labels; supported by similarity among the ancestral paths from nodes to the root. The n:m mapping proposition is then verified with the help of mini-taxonomies extracted from a large set of same domain schema trees. The mini-taxonomies are automatically extracted using frequent sub-tree mining approach; higher the frequency, higher the confidence of reliability. The verification algorithm performs comparison between the minitaxonomies and the subtrees rooted at non-leaf nodes which guide the system for authenticity of proposed n:m mapping.
Type de document :
Autre publication
2008
Liste complète des métadonnées

Littérature citée [24 références]  Voir  Masquer  Télécharger

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00331358
Contributeur : Khalid Saleem <>
Soumis le : jeudi 16 octobre 2008 - 14:09:24
Dernière modification le : jeudi 11 janvier 2018 - 16:59:01
Document(s) archivé(s) le : lundi 7 juin 2010 - 20:17:20

Fichier

paper331.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : lirmm-00331358, version 1

Citation

Khalid Saleem, Zohra Bellahsene. Automatic Complex Schema Mapping Discovery and Validation by Structurally Coherent Frequent Mini-Taxonomies. 2008. 〈lirmm-00331358〉

Partager

Métriques

Consultations de la notice

220

Téléchargements de fichiers

159