PORSCHE: Performance ORiented SCHEma mediation

Khalid Saleem 1 Zohra Bellahsene 2 Ela Hunt 3
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Semantic matching of schemas in heterogeneous data sharing systems is time-consuming and error-prone. Existing mapping tools employ semi-automatic techniques for mapping two schemas at a time. In a large-scale scenario, where data sharing involves a large number of data sources, such techniques are not suitable. We present a new robust automatic method which discovers semantic schema matches in a large set of XML schemas, incrementally creates an integrated schema encompassing all schema trees, and defines mappings from the contributing schemas to the integrated schema. Our method, PORSCHE (Performance ORiented SCHEma mediation), utilises a holistic approach which first clusters the nodes based on linguistic label similarity. It then applies a tree mining technique using node ranks calculated during depth-first traversal. This minimises the target node search space and improves performance, which makes the technique suitable for large-scale data sharing. The PORSCHE framework is hybrid in nature and flexible enough to incorporate more matching techniques or algorithms. We report on experiments with up to 80 schemas containing 83,770 nodes, with our prototype implementation taking 587 s on average to match and merge them, resulting in an integrated schema and returning mappings from all input schemas to the integrated schema. The quality of matching in PORSCHE is shown using precision, recall and F-measure on randomly selected pairs of schemas from the same domain. We also discuss the integrity of the mediated schema in the light of completeness and minimality measures.
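The abstract mentions node ranks calculated during depth-first traversal but does not spell out the encoding. As a rough illustration only, the sketch below numbers the nodes of a schema tree in pre-order and records, for each node, the rank of its last descendant. The `Node` class, the `scope` field, and the `is_ancestor` helper are hypothetical names introduced here, and the rank/scope scheme is a common tree-mining convention assumed for the example, not necessarily the one used by PORSCHE.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A schema tree node (hypothetical structure for illustration)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    rank: int = -1   # pre-order position, set by assign_ranks
    scope: int = -1  # rank of the last descendant (assumed helper field)


def assign_ranks(root: Node) -> int:
    """Number nodes in depth-first (pre-order) order; return the node count."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        node.rank = counter
        counter += 1
        for child in node.children:
            visit(child)
        # After the whole subtree is visited, the last rank handed out
        # belongs to this node's final descendant (or to the node itself).
        node.scope = counter - 1

    visit(root)
    return counter


def is_ancestor(a: Node, d: Node) -> bool:
    """True if `a` is a proper ancestor of `d`, using ranks only."""
    return a.rank < d.rank <= a.scope


# Small example tree: book -> (title, author -> name)
book = Node("book", [Node("title"), Node("author", [Node("name")])])
assign_ranks(book)
```

With ranks of this kind, an ancestor test needs only two integer comparisons rather than a walk up the tree, which is one way a candidate target node search space could be narrowed quickly.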
Document type :
Journal articles
Cited literature [26 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00331330
Contributor : Khalid Saleem
Submitted on : Wednesday, March 20, 2019 - 3:02:34 PM
Last modification on : Wednesday, March 20, 2019 - 3:09:15 PM
Document(s) archived on : Friday, June 21, 2019 - 8:50:56 PM

File

Porsche.pdf
Files produced by the author(s)

Citation

Khalid Saleem, Zohra Bellahsene, Ela Hunt. PORSCHE: Performance ORiented SCHEma mediation. Information Systems, Elsevier, 2008, 33 (7-8), pp.637-657. ⟨10.1016/j.is.2008.01.010⟩. ⟨lirmm-00331330⟩
