Approaches of anonymisation of an SMS corpus - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper, Year: 2013

Approaches of anonymisation of an SMS corpus

Abstract

This paper presents two anonymisation methods for processing an SMS corpus. The first is based on an unsupervised approach called Seek&Hide. The implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second method is based on a supervised approach using machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
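The combination step described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: `dictionary_predict`, `classifier_predict`, `route`, and the `KNOWN_WORDS` vocabulary are hypothetical stand-ins invented here, and the two toy predictors do not reproduce Seek&Hide or the trained model from the paper.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical toy vocabulary; the paper's actual dictionaries are not given here.
KNOWN_WORDS = {"see", "you", "at", "the", "station", "call", "me"}


def dictionary_predict(sms: str) -> bool:
    """Toy stand-in for a dictionary/rule-based method (assumption):
    flag the SMS if any token is missing from the known vocabulary."""
    return any(tok not in KNOWN_WORDS for tok in sms.lower().split())


def classifier_predict(sms: str) -> bool:
    """Toy stand-in for a supervised classifier (assumption):
    flag the SMS if a non-initial token starts with a capital letter."""
    return any(tok[0].isupper() for tok in sms.split()[1:])


def route(
    messages: Iterable[str],
    first: Callable[[str], bool],
    second: Callable[[str], bool],
) -> Tuple[Dict[str, bool], List[str]]:
    """Accept the shared prediction when both methods agree;
    queue the SMS for a human expert when they disagree."""
    decided: Dict[str, bool] = {}
    to_review: List[str] = []
    for sms in messages:
        a, b = first(sms), second(sms)
        if a == b:
            decided[sms] = a       # both methods agree: no manual check
        else:
            to_review.append(sms)  # disagreement: send to the expert
    return decided, to_review


messages = ["see you at the station", "call me Pierre", "call me pierre"]
decided, to_review = route(messages, dictionary_predict, classifier_predict)
```

In this toy run only the third message reaches the reviewer, because the two stand-in predictors disagree on it. The expert's workload therefore depends on how often the two methods disagree, not on the size of the corpus.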
Main file
Approaches_of_anonymisation_of_an_SMS_co.pdf (140.59 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-00816285 , version 1 (24-02-2017)

Identifiers

HAL Id: lirmm-00816285
DOI: 10.1007/978-3-642-37247-6_7

Cite

Namrata Patel, Pierre Accorsi, Diana Inkpen, Cédric Lopez, Mathieu Roche. Approaches of anonymisation of an SMS corpus. CICLing: Conference on Intelligent Text Processing and Computational Linguistics, Mar 2013, Samos, Greece. pp.77-88, ⟨10.1007/978-3-642-37247-6_7⟩. ⟨lirmm-00816285⟩
320 views
543 downloads
