An Ontology-Based Method for Duplicate Detection in Web Data Tables

Patrice Buche 1 Juliette Dibie 2 Rania Khefifi 3, 4 Fatiha Saïs 4, 5
1 GRAPHIK - Graphs for Inferences on Knowledge
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
5 LEO - Distributed and heterogeneous data and knowledge
UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: We present a method for detecting duplicates in semantically annotated Web data tables, driven by a domain Termino-Ontological Resource (TOR). The method relies on fuzzy semantic annotations, one of which is automatically associated with each row of a Web data table. Each annotation instantiates a composed concept of the domain TOR, which represents the semantic n-ary relationship between the columns of the table, and contains fuzzy values expressed as fuzzy sets. Our automatic method detects pairs of duplicate fuzzy semantic annotations, relying on (i) knowledge declared in the domain TOR and (ii) similarity measures between fuzzy sets. Two new similarity measures are defined to compare symbolic fuzzy values and numerical fuzzy values. The method has been tested on a real application in the domain of chemical risk in food.
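As a rough illustration of comparing fuzzy annotations pairwise, the sketch below uses the standard fuzzy Jaccard index (sum of minima over sum of maxima) on discrete fuzzy sets. This is not one of the two measures defined in the paper, which the abstract does not specify, and it ignores the TOR knowledge and the numerical fuzzy values. The function names, the attribute names, the sample rows, and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations


def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard index of two discrete fuzzy sets.

    Each set is a dict {element: membership degree in [0, 1]}.
    This is a standard measure used as a stand-in, not the paper's.
    """
    elements = set(a) | set(b)
    inter = sum(min(a.get(e, 0.0), b.get(e, 0.0)) for e in elements)
    union = sum(max(a.get(e, 0.0), b.get(e, 0.0)) for e in elements)
    # Two empty sets are treated as identical (an assumption).
    return inter / union if union > 0 else 1.0


def annotation_similarity(x, y):
    """Mean fuzzy Jaccard over all attributes of two annotations.

    An annotation is a dict {attribute: fuzzy set}; an attribute
    missing from one side is compared against the empty set.
    """
    attributes = set(x) | set(y)
    if not attributes:
        return 1.0
    scores = [fuzzy_jaccard(x.get(k, {}), y.get(k, {})) for k in attributes]
    return sum(scores) / len(scores)


def duplicate_pairs(annotations, threshold=0.8):
    """Return index pairs whose similarity reaches the (hypothetical) threshold."""
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(annotations), 2)
        if annotation_similarity(x, y) >= threshold
    ]


# Hypothetical annotated rows, one fuzzy set per column.
rows = [
    {"food": {"rice": 1.0, "cereal": 0.5}, "contaminant": {"ochratoxin A": 1.0}},
    {"food": {"rice": 1.0, "cereal": 0.5}, "contaminant": {"ochratoxin A": 1.0}},
    {"food": {"wheat": 1.0}, "contaminant": {"ochratoxin A": 1.0}},
]
print(duplicate_pairs(rows))
```

Averaging per-attribute scores weights every column equally; a method driven by an ontology would presumably use declared knowledge to decide which columns matter, which this sketch does not attempt.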
Document type:
Conference paper
A. Hameurlain. DEXA: Database and Expert Systems Applications, Aug 2011, Toulouse, France. 22nd International Conference on Database and Expert Systems Applications, LNCS (6860), pp.511-525, 2011, Database and Expert Systems Applications. 〈http://www.dexa.org/〉. 〈10.1007/978-3-642-23088-2_38〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00611944
Contributor: Patrice Buche
Submitted on: Wednesday, July 27, 2011 - 17:47:20
Last modified on: Thursday, June 28, 2018 - 11:06:59

Citation

Patrice Buche, Juliette Dibie, Rania Khefifi, Fatiha Saïs. An Ontology-Based Method for Duplicate Detection in Web Data Tables. A. Hameurlain. DEXA: Database and Expert Systems Applications, Aug 2011, Toulouse, France. 22nd International Conference on Database and Expert Systems Applications, LNCS (6860), pp.511-525, 2011, Database and Expert Systems Applications. 〈http://www.dexa.org/〉. 〈10.1007/978-3-642-23088-2_38〉. 〈lirmm-00611944〉