BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching
Abstract
Schema matching is a crucial task for gathering information from the same domain. This is even more the case when dealing with data warehouses, where a large number of data sources are available and require matching and integration. However, the matching process is still largely performed manually or semi-automatically, which discourages the use of large scale integration systems. Indeed, these large scale scenarios require a solution that ensures both acceptable matching quality and good performance. In this article, we present an approach to efficiently match a large number of schemas. The quality aspect is based on the combination of terminological methods and the cosine measure between context vectors. The performance aspect relies on a B-tree indexing structure to reduce the search space. Finally, our approach has been implemented, and experiments with real sets of schemas show that it is scalable and provides an acceptable quality of matches compared to the results obtained by the most frequently referenced schema matching tools.
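To illustrate the cosine measure between context vectors mentioned in the abstract, here is a minimal Python sketch. The abstract does not say how BMatch builds its context vectors, so the term-frequency construction, the function names `context_vector` and `cosine`, and the sample labels are all assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def context_vector(neighbor_labels):
    """Build a term-frequency vector from the labels around a schema element.

    Assumption: the context is a bag of lowercase, whitespace-separated tokens.
    """
    tokens = []
    for label in neighbor_labels:
        tokens.extend(label.lower().split())
    return Counter(tokens)


def cosine(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty context shares nothing with any other context
    return dot / (norm_u * norm_v)


# Hypothetical contexts of two elements taken from different schemas.
a = context_vector(["customer name", "customer address"])
b = context_vector(["client name", "client address"])
score = cosine(a, b)
```

Here the two contexts share only the tokens `name` and `address`, so the score sits well below 1. That kind of gap is presumably why the abstract pairs the cosine measure with terminological methods rather than using it alone.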