Reports Year : 2008

BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching

Fabien Duchateau, Mathieu Roche, Zohra Bellahsene

Abstract

Schema matching is a crucial task to gather information of the same domain. This is even more the case when dealing with data warehouses, where a large number of data sources are available and require matching and integration. However, the matching process is still largely performed manually or semi-automatically, thus discouraging the use of large scale integration systems. Indeed, these large scale scenarios require a solution which ensures both an acceptable matching quality and good performance. In this article, we present an approach to efficiently match a large number of schemas. The quality aspect is based on the combination of terminological methods and cosine measure between context vectors. The performance aspect relies on a B-tree indexing structure to reduce the search space. Finally, our approach has been implemented and experiments with real sets of schemas show that it is both scalable and provides an acceptable quality of matches as compared to results obtained by the most referenced schema matching tools.
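
As a rough illustration of the cosine measure between context vectors mentioned in the abstract, the sketch below builds bag-of-words vectors from the labels surrounding two schema elements and compares them. It is not taken from the report: the function names (`context_vector`, `cosine`), the underscore tokenization, and the sample labels are all assumptions made for this example, and the report's actual construction of context vectors may differ.

```python
import math
from collections import Counter


def context_vector(labels):
    """Hypothetical helper: count the tokens found in an element's neighbouring labels."""
    tokens = []
    for label in labels:
        # Assumption: labels are split on underscores and lowercased.
        tokens.extend(label.lower().split("_"))
    return Counter(tokens)


def cosine(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty context shares nothing with any other context
    return dot / (norm_u * norm_v)


# Made-up contexts for two elements taken from different schemas.
a = context_vector(["customer_name", "customer_address"])
b = context_vector(["client_name", "client_address"])
score = cosine(a, b)
```

A score near 1 means the two elements appear in similar surroundings, and a score near 0 means their contexts share few or no terms. In the approach summarized above, such a score would be combined with terminological measures rather than used alone.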
Main file
RR-revised2008.pdf (230.17 KB)
appendixCourbesROC.pdf (144.28 KB)
Origin : Files produced by the author(s)
Format : Other

Dates and versions

lirmm-00326887 , version 1 (06-10-2008)

Identifiers

  • HAL Id : lirmm-00326887 , version 1

Cite

Fabien Duchateau, Mathieu Roche, Zohra Bellahsene. BMatch: A Quality/Performance Balanced Approach for Large Scale Schema Matching. RR-08023, 2008. ⟨lirmm-00326887⟩