Dataset Recommendation for Data Linking: An Intensional Approach

Abstract: With the growing quantity and diversity of publicly available web datasets, most notably Linked Open Data, recommending datasets that meet specific criteria has become an increasingly important, yet challenging, problem. This task is of particular interest when addressing issues such as entity retrieval, semantic search and data linking. Here, we focus on the last of these. We introduce a dataset recommendation approach that identifies linking candidates based on the presence of schema overlap between datasets. As an understanding of the nature of a dataset's content is a crucial prerequisite, we adopt the notion of dataset profiles, where a dataset is characterized by a set of schema concept labels that best describe it and that can potentially be enriched by retrieving their textual descriptions. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the Linked Open Data cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. As an additional contribution, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step.
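The ranking idea in the abstract can be illustrated with a minimal sketch of tf*idf cosine similarity over dataset profiles. It assumes each profile is simply a list of schema concept labels and that concepts match only when their labels are identical. That is a deliberate simplification: the paper's semantico-frequential concept similarity measure is not reproduced here. The function names (`tfidf_vectors`, `cosine`, `rank_candidates`) and the example datasets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Build a tf*idf weight vector for every dataset profile.

    Assumption: `profiles` maps a dataset id to a list of schema concept
    labels, and labels are compared by exact string equality.
    """
    n = len(profiles)
    # Document frequency: number of profiles in which each label occurs.
    df = Counter()
    for labels in profiles.values():
        df.update(set(labels))
    vectors = {}
    for ds, labels in profiles.items():
        tf = Counter(labels)
        total = len(labels)
        # Labels shared by every profile get idf = log(1) = 0 (no signal).
        vectors[ds] = {
            label: (count / total) * math.log(n / df[label])
            for label, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def rank_candidates(source, profiles):
    """Rank every other dataset as a linking candidate for `source`."""
    vectors = tfidf_vectors(profiles)
    scores = [
        (ds, cosine(vectors[source], vectors[ds]))
        for ds in profiles
        if ds != source
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical profiles, for illustration only.
profiles = {
    "geo_a": ["Place", "City", "Country", "River"],
    "geo_b": ["City", "Country", "Mountain"],
    "bib": ["Book", "Author", "Publisher"],
    "mixed": ["Person", "Author", "City"],
}

for ds, score in rank_candidates("geo_a", profiles):
    print(f"{ds}: {score:.3f}")
```

With these toy profiles, `geo_b` ranks first because it shares two weighted concepts (`City`, `Country`) with `geo_a`. `mixed` follows because it shares only `City`, and `bib` scores zero since it shares no concepts.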
Document type:
Conference paper
ESWC: European Semantic Web Conference, May 2016, Heraklion, Crete, Greece. 13th European Semantic Web Conference, LNCS (9678), pp.36-51, 2016, The Semantic Web. Latest Advances and New Domains. 〈http://2016.eswc-conferences.org〉. 〈10.1007/978-3-319-34129-3_3〉

Cited literature: 24 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01408036
Contributor: Mohamed Ben Ellefi
Submitted on: Wednesday, 7 December 2016, 14:06:32
Last modified on: Wednesday, 14 November 2018, 14:56:02
Document(s) archived on: Monday, 20 March 2017, 22:24:36

File

ESWC2016_Mohamed.pdf
Files produced by the author(s)


Citation

Mohamed Ben Ellefi, Zohra Bellahsene, Konstantin Todorov, Stefan Dietze. Dataset Recommendation for Data Linking: An Intensional Approach. ESWC: European Semantic Web Conference, May 2016, Heraklion, Crete, Greece. 13th European Semantic Web Conference, LNCS (9678), pp.36-51, 2016, The Semantic Web. Latest Advances and New Domains. 〈http://2016.eswc-conferences.org〉. 〈10.1007/978-3-319-34129-3_3〉. 〈lirmm-01408036〉
