BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching

Fabien Duchateau 1, Mathieu Roche 2, Zohra Bellahsene 3
2 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Schema matching is a crucial task to gather information of the same domain. This is even more the case when dealing with data warehouses, where a large number of data sources are available and require matching and integration. However, the matching process is still largely performed manually or semi-automatically, thus discouraging the use of large scale integration systems. Indeed, these large scale scenarios require a solution which ensures both an acceptable matching quality and good performance. In this article, we present an approach to efficiently match a large number of schemas. The quality aspect is based on the combination of terminological methods and cosine measure between context vectors. The performance aspect relies on a B-tree indexing structure to reduce the search space. Finally, our approach has been implemented and experiments with real sets of schemas show that it is both scalable and provides an acceptable quality of matches as compared to results obtained by the most referenced schema matching tools.
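As an illustration of the cosine measure between context vectors mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that the context of a schema element is simply the bag of labels of its neighboring elements; the function name `cosine_similarity` and the two sample contexts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(context_a, context_b):
    """Cosine of the angle between two bag-of-words context vectors."""
    vec_a = Counter(context_a)
    vec_b = Counter(context_b)
    # Dot product over the terms that both contexts share.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context shares nothing with any other context
    return dot / (norm_a * norm_b)


# Hypothetical contexts: labels of the elements surrounding two schema elements.
customer_context = ["name", "address", "phone", "order"]
client_context = ["name", "address", "email", "order"]
similarity = cosine_similarity(customer_context, client_context)
```

Under these assumptions, two elements whose neighbors largely overlap receive a score close to 1, while elements with disjoint contexts receive 0. How the report actually builds its context vectors and combines this score with the terminological measures is not specified on this page.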
Document type:
Report
RR-08023, 2008

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00326887
Contributor: Fabien Duchateau
Submitted on: Monday, 6 October 2008 - 12:16:06
Last modified on: Thursday, 24 May 2018 - 15:59:23
Document(s) archived on: Monday, 8 October 2012 - 14:01:48

Identifiers

  • HAL Id: lirmm-00326887, version 1

Citation

Fabien Duchateau, Mathieu Roche, Zohra Bellahsene. BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching. RR-08023, 2008. 〈lirmm-00326887〉
