Approaches of anonymisation of an SMS corpus

Namrata Patel 1 Pierre Accorsi 2 Diana Inkpen 3 Cédric Lopez 4 Mathieu Roche 5
1 GRAPHIK - Graphs for Inferences on Knowledge
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
5 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : This paper presents two anonymisation methods for processing an SMS corpus. The first is an unsupervised approach called Seek&Hide: the implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second is a supervised approach based on machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
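The combination step described in the abstract can be illustrated with a minimal sketch. It assumes each method is exposed as a function that returns True when an SMS needs anonymisation. The two predictors below are toy stand-ins invented for illustration (a tiny name list and a capitalisation heuristic); they are not the Seek&Hide system or the paper's trained classifier, and the names route_messages, toy_dictionary_predictor and toy_learned_predictor are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# A predictor maps an SMS to True if it needs anonymisation (assumed interface).
Predictor = Callable[[str], bool]

# Hypothetical mini-dictionary of first names, for illustration only.
TOY_NAMES = {"marie", "paul"}


def toy_dictionary_predictor(sms: str) -> bool:
    """Stand-in for a dictionary/rule-based method: flags known names."""
    return any(tok.strip(".,!?").lower() in TOY_NAMES for tok in sms.split())


def toy_learned_predictor(sms: str) -> bool:
    """Stand-in for a supervised model: flags a capitalised non-initial token."""
    return any(tok[:1].isupper() for tok in sms.split()[1:])


def route_messages(
    messages: Iterable[str], first: Predictor, second: Predictor
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Keep the shared label when both methods agree; otherwise queue for review."""
    auto_labelled: List[Tuple[str, bool]] = []
    needs_review: List[str] = []
    for sms in messages:
        a, b = first(sms), second(sms)
        if a == b:
            auto_labelled.append((sms, a))
        else:
            needs_review.append(sms)  # disagreement: send to a human expert
    return auto_labelled, needs_review


if __name__ == "__main__":
    corpus = ["see you at 8 with Marie", "ok see you tomorrow", "call paul tonight"]
    auto, review = route_messages(corpus, toy_dictionary_predictor, toy_learned_predictor)
    print(auto)
    print(review)
```

In this toy run only the third message reaches the review queue, because the name list flags "paul" while the capitalisation heuristic does not. The cost saving claimed in the abstract comes from the same mechanism: the expert sees only the messages on which the two methods disagree.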

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00816285
Contributor : Mathieu Roche
Submitted on : Friday, February 24, 2017 - 6:22:43 PM
Last modification on : Thursday, February 7, 2019 - 2:50:12 PM
Long-term archiving on : Thursday, May 25, 2017 - 2:20:30 PM

Citation

Namrata Patel, Pierre Accorsi, Diana Inkpen, Cédric Lopez, Mathieu Roche. Approaches of anonymisation of an SMS corpus. CICLing: Conference on Intelligent Text Processing and Computational Linguistics, Mar 2013, Samos, Greece. pp.77-88, ⟨10.1007/978-3-642-37247-6_7⟩. ⟨lirmm-00816285⟩
