BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Report, Year: 2008

BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching

Abstract

Schema matching is a crucial task for gathering information from the same domain. This is even more the case when dealing with data warehouses, where a large number of data sources are available and require matching and integration. However, the matching process is still largely performed manually or semi-automatically, which discourages the use of large scale integration systems. Indeed, these large scale scenarios require a solution that ensures both acceptable matching quality and good performance. In this article, we present an approach to efficiently match a large number of schemas. The quality aspect is based on the combination of terminological methods and the cosine measure between context vectors. The performance aspect relies on a B-tree indexing structure to reduce the search space. Finally, our approach has been implemented, and experiments with real sets of schemas show that it is scalable and provides an acceptable quality of matches compared to the results obtained by the most referenced schema matching tools.
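To illustrate the quality aspect described in the abstract, the following is a minimal sketch of how a terminological measure and a cosine measure between context vectors might be combined. It is not the implementation from the report: the choice of a Dice coefficient over character trigrams, the definition of a context as a bag of neighbouring element labels, the equal weighting, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def trigrams(label):
    """Character 3-grams of a lowercased label (assumed terminological unit)."""
    s = label.lower()
    # Labels shorter than three characters fall back to the label itself.
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def terminological_sim(a, b):
    """Dice coefficient over character trigrams, in [0, 1] (assumed measure)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def context_vector(neighbour_labels):
    """Bag of lowercased neighbour labels (assumed definition of 'context')."""
    return Counter(label.lower() for label in neighbour_labels)


def cosine(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_sim(label_a, ctx_a, label_b, ctx_b, weight=0.5):
    """Weighted mean of both measures; `weight` is an illustrative parameter."""
    term = terminological_sim(label_a, label_b)
    ctx = cosine(context_vector(ctx_a), context_vector(ctx_b))
    return weight * term + (1 - weight) * ctx
```

For example, `combined_sim("price", ["item", "order"], "unitPrice", ["order", "item"])` scores two elements whose labels partly overlap and whose neighbouring elements coincide. The B-tree index mentioned in the abstract is not sketched here, since the abstract does not say what is indexed.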
Main file
RR-revised2008.pdf (230.17 KB)
appendixCourbesROC.pdf (144.28 KB)
Origin: Files produced by the author(s)
Format: Other

Dates and versions

lirmm-00326887, version 1 (06-10-2008)

Identifiers

  • HAL Id: lirmm-00326887, version 1

Cite

Fabien Duchateau, Mathieu Roche, Zohra Bellahsene. BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching. RR-08023, 2008. ⟨lirmm-00326887⟩
156 views
284 downloads