Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

Manuel Atencia 1 Jérôme David 1 François Scharffe 2
2 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better deal with web data of variable quality, the definition of a pseudo-key is presented. An RDF vocabulary for representing keys is also provided. An algorithm to discover keys and pseudo-keys is described. Experimental results show that the runtime of the algorithm remains reasonable even for a large dataset such as DBpedia. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.
Document type:
Conference paper
EKAW: Knowledge Engineering and Knowledge Management, Oct 2012, Galway City, Ireland. Springer, EKAW'2012: The 18th International Conference on Knowledge Engineering and Knowledge Management, pp.144-153, 2012

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00802171
Contributor: François Scharffe
Submitted on: Tuesday, 19 March 2013 - 11:47:07
Last modified on: Thursday, 11 January 2018 - 06:26:17

Identifiers

  • HAL Id : lirmm-00802171, version 1

Citation

Manuel Atencia, Jérôme David, François Scharffe. Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking. EKAW: Knowledge Engineering and Knowledge Management, Oct 2012, Galway City, Ireland. Springer, EKAW'2012: The 18th International Conference on Knowledge Engineering and Knowledge Management, pp.144-153, 2012. 〈lirmm-00802171〉
