Performance Oriented Schema Matching

Khalid Saleem 1, Zohra Bellahsene 2, Ela Hunt 3
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Semantic matching of schemas in heterogeneous data sharing systems is time-consuming and error-prone. Existing mapping tools employ semi-automatic techniques for mapping two schemas at a time. In a large-scale scenario, where data sharing involves a large number of data sources, such techniques are not suitable. We present a new robust mapping method which creates a mediated schema tree from a large set of input XML schema trees and defines mappings from the contributing schemas to the mediated schema. The result is an almost automatic technique giving good performance with approximate semantic match quality. Our method uses node ranks calculated by pre-order traversal. It combines tree mining with semantic label clustering, which minimizes the target search space and improves performance, thus making the algorithm suitable for large-scale data sharing. We report on experiments with up to 80 schemas containing 83,770 nodes, with our prototype implementation taking 587 seconds to match and merge them to create a mediated schema and to return mappings from input schemas to the mediated schema.
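Illustration : the abstract mentions node ranks calculated by pre-order traversal. The following is a minimal sketch of what such a ranking could look like, not the authors' implementation. The names Node, assign_ranks and is_ancestor are hypothetical, and the use of a subtree range for ancestor tests is a common companion technique assumed here rather than taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A schema tree node (hypothetical structure, for illustration only)."""
    label: str
    children: list["Node"] = field(default_factory=list)
    rank: int = -1  # pre-order number, assigned by assign_ranks
    last: int = -1  # largest pre-order number inside this node's subtree


def assign_ranks(root: Node) -> int:
    """Number the nodes in pre-order (parent before children, left to right).

    Returns the total number of nodes visited.
    """
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        node.rank = counter
        counter += 1
        for child in node.children:
            visit(child)
        # Every descendant has been numbered, so the subtree ends here.
        node.last = counter - 1

    visit(root)
    return counter


def is_ancestor(a: Node, b: Node) -> bool:
    """True if a is a proper ancestor of b, decided from ranks alone."""
    return a.rank < b.rank <= a.last


# A toy schema tree: book(title, author(name))
name = Node("name")
author = Node("author", [name])
title = Node("title")
book = Node("book", [title, author])
total = assign_ranks(book)
```

With ranks of this kind, an ancestor test becomes two integer comparisons instead of a tree walk, which is one plausible reason rank-based encodings suit large schema sets. The recursive visit is kept for brevity; a very deep tree would call for an explicit stack.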
Document type :
Conference papers

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00171055
Contributor : Khalid Saleem
Submitted on : Tuesday, September 11, 2007 - 1:56:24 PM
Last modification on : Tuesday, March 5, 2019 - 9:32:52 AM
Long-term archiving on : Thursday, April 8, 2010 - 7:43:11 PM

Identifiers

  • HAL Id : lirmm-00171055, version 1

Citation

Khalid Saleem, Zohra Bellahsene, Ela Hunt. Performance Oriented Schema Matching. DEXA'07: 18th International Conference on Database and Expert Systems Applications, Sep 2007, pp.844-853. ⟨lirmm-00171055⟩
