Conference papers

Approaches of anonymisation of an SMS corpus

Namrata Patel 1, Pierre Accorsi 2, Diana Inkpen 3, Cédric Lopez 4, Mathieu Roche 5
1 GRAPHIK - Graphs for Inferences on Knowledge
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
5 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : This paper presents two anonymisation methods for processing an SMS corpus. The first is an unsupervised approach called Seek&Hide; the implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second is a supervised approach using machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
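The combination rule described in the abstract can be illustrated with a minimal sketch. It assumes each method is reduced to a function that returns True when an SMS needs anonymisation. The names `triage`, `toy_dictionary_predictor`, `toy_learned_predictor` and the `FIRST_NAMES` set are hypothetical stand-ins chosen for illustration; they are not the Seek&Hide system or the trained classifier from the paper.

```python
from typing import Callable, Iterable, List, Tuple

# A predictor maps an SMS to True when it is judged to need anonymisation.
Predictor = Callable[[str], bool]


def triage(
    messages: Iterable[str], first: Predictor, second: Predictor
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Keep the shared label when both predictors agree; queue the rest for review."""
    agreed: List[Tuple[str, bool]] = []
    to_review: List[str] = []
    for sms in messages:
        a, b = first(sms), second(sms)
        if a == b:
            agreed.append((sms, a))
        else:
            # Disagreement: only these messages go to the human expert.
            to_review.append(sms)
    return agreed, to_review


# Hypothetical toy dictionary, standing in for the dictionaries and rules.
FIRST_NAMES = {"marie", "paul"}


def toy_dictionary_predictor(sms: str) -> bool:
    """Flag an SMS if any token matches the toy first-name dictionary."""
    return any(tok.strip(".,!?").lower() in FIRST_NAMES for tok in sms.split())


def toy_learned_predictor(sms: str) -> bool:
    """Placeholder for a trained classifier: flags capitalised non-initial tokens."""
    return any(tok[:1].isupper() for tok in sms.split()[1:])


if __name__ == "__main__":
    corpus = ["See you at 8 with Marie", "ok see you tmrw", "call paul now"]
    agreed, to_review = triage(corpus, toy_dictionary_predictor, toy_learned_predictor)
    print("auto-labelled:", agreed)
    print("needs expert:", to_review)
```

In this toy corpus the two predictors disagree only on the lower-cased name, so a single message out of three would reach the expert; the actual saving on a real corpus depends on how often the two real methods agree.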

Cited literature : 15 references
Contributor : Mathieu Roche
Submitted on : Friday, February 24, 2017 - 6:22:43 PM
Last modification on : Friday, August 5, 2022 - 3:03:22 PM
Long-term archiving on : Thursday, May 25, 2017 - 2:20:30 PM


Namrata Patel, Pierre Accorsi, Diana Inkpen, Cédric Lopez, Mathieu Roche. Approaches of anonymisation of an SMS corpus. CICLing: Conference on Intelligent Text Processing and Computational Linguistics, Mar 2013, Samos, Greece. pp.77-88, ⟨10.1007/978-3-642-37247-6_7⟩. ⟨lirmm-00816285⟩


