Entity Resolution for Distributed Probabilistic Data

Naser Ayat (1), Reza Akbarinia (2), Hamideh Afsarmanesh (1), Patrick Valduriez (2, 3)
(2) ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: The problem of entity resolution over probabilistic data (ERPD) arises in many applications that have to deal with probabilistic data. In many of these applications, the probabilistic data is distributed among a number of nodes. The simple, centralized approach to the ERPD problem does not scale well, because large amounts of data must be sent to a central node. In this paper, we present FD, a fully distributed algorithm for the ERPD problem over distributed data, designed to minimize bandwidth usage and reduce processing time. FD is completely decentralized and does not depend on the existence of any particular node. We validated FD through an implementation over a 75-node cluster, using both synthetic and real-world data. Our performance evaluation shows that FD can achieve major performance gains in terms of bandwidth usage and response time.
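The abstract does not describe how FD works internally. Purely as an illustration of the ERPD setting, the sketch below assumes a common possible-worlds model: each probabilistic entity is an x-tuple of mutually exclusive alternatives with probabilities, and an entity is scored against a reference entity by its expected similarity. The `Alternative`/`ProbEntity` types, the token-set Jaccard similarity, and the `local_top_k`/`merge_top_k` helpers are hypothetical. The sketch also hints at why shipping every tuple to a central node is unnecessary for this particular score: because each entity's score is independent of the others, the global top-k is contained in the union of the per-node top-k lists, so each node needs to send only k (score, id) pairs. This coordinator-based merge is a simplification and is not FD, which the authors describe as fully distributed with no special nodes.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    value: str   # one possible value of the entity
    prob: float  # probability that this alternative is the true value


@dataclass(frozen=True)
class ProbEntity:
    entity_id: str
    # Mutually exclusive alternatives (an x-tuple); probabilities sum to at most 1.
    alternatives: tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a hypothetical choice of similarity function."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def expected_similarity(ref: str, e: ProbEntity) -> float:
    """Expected similarity of e to ref over possible worlds.

    Each alternative is e's value with probability alt.prob; any missing
    probability mass (e absent from the world) contributes similarity 0.
    """
    return sum(alt.prob * jaccard(ref, alt.value) for alt in e.alternatives)


def local_top_k(ref: str, entities: list, k: int) -> list:
    """Run on each node: score local entities and keep only the best k."""
    return heapq.nlargest(k, ((expected_similarity(ref, e), e.entity_id) for e in entities))


def merge_top_k(partials: list, k: int) -> list:
    """Merge per-node top-k lists; exact here because scores are per-entity."""
    return heapq.nlargest(k, (item for part in partials for item in part))


# Usage: two nodes, each holding part of the probabilistic data.
ref = "john smith"
node_a = [ProbEntity("e1", (Alternative("john smith", 0.7), Alternative("jon smyth", 0.3)))]
node_b = [
    ProbEntity("e2", (Alternative("john smith", 0.5),)),
    ProbEntity("e3", (Alternative("mary jones", 0.9),)),
]
partials = [local_top_k(ref, node, 2) for node in (node_a, node_b)]
print(merge_top_k(partials, 2))  # [(0.7, 'e1'), (0.5, 'e2')]
```

Ties are broken by the (score, id) tuple ordering both locally and globally, so the merged result matches a centralized top-k over all entities while each node transmits at most k pairs.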
Document type:
Journal article
Distributed and Parallel Databases, Springer, 2013, 31 (4), pp.509-542. 〈10.1007/s10619-013-7129-3〉

Cited literature: 26 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00879631
Contributor: Reza Akbarinia
Submitted on: Monday, 4 November 2013, 14:30:37
Last modified on: Friday, 12 January 2018, 01:50:29
Document(s) archived on: Friday, 7 April 2017, 20:32:16

File

D-ERUD.pdf
Files produced by the author(s)

Citation

Naser Ayat, Reza Akbarinia, Hamideh Afsarmanesh, Patrick Valduriez. Entity Resolution for Distributed Probabilistic Data. Distributed and Parallel Databases, Springer, 2013, 31 (4), pp.509-542. 〈10.1007/s10619-013-7129-3〉. 〈lirmm-00879631〉

Metrics

Record views: 311

File downloads: 368