Journal articles

Seek&Hide: Anonymising a French SMS corpus using natural language processing techniques

Pierre Accorsi 1, 2 Namrata Patel 1 Cédric Lopez 3 Rachel Panckhurst 4 Mathieu Roche 5
2 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
5 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: This article presents Seek&Hide, a text message processing tool developed for the sud4science LR project to anonymise (de-identify) a corpus. So far, it has been used to anonymise the sud4science LR corpus of French text messages collected during the project. Anonymisation proceeds in two phases. In the first phase, the system automatically processes over 70% of the corpus. In the second phase, the rest of the corpus is processed with the help of an expert annotator, using a web interface specifically designed to simplify the task.
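The two-phase workflow described in the abstract can be illustrated with a minimal sketch. This is not the Seek&Hide implementation: the word lists, the `<PRE>` placeholder, and the function name `anonymise` are hypothetical, chosen only to show one way an automatic pass could replace unambiguous names while setting ambiguous tokens aside for a human annotator.

```python
import re

# Hypothetical word lists, NOT taken from the article.
FIRST_NAMES = {"rachel", "mathieu", "rose"}
COMMON_WORDS = {"rose"}  # also an ordinary French word, hence ambiguous

PLACEHOLDER = "<PRE>"  # hypothetical anonymisation tag


def anonymise(message):
    """Return (processed_text, needs_review).

    Phase 1: tokens that can only be first names are replaced automatically.
    Tokens that are both a first name and a common word are left untouched
    and collected so that an annotator can decide in phase 2.
    """
    needs_review = []

    def replace(match):
        token = match.group(0)
        key = token.lower()
        if key in FIRST_NAMES and key not in COMMON_WORDS:
            return PLACEHOLDER
        if key in FIRST_NAMES:
            needs_review.append(token)
        return token

    processed = re.sub(r"\w+", replace, message)
    return processed, needs_review
```

In this sketch, a message whose review list comes back empty would count as fully handled by the automatic phase; any other message would be routed to the annotator.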

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00816272
Contributor: Mathieu Roche
Submitted on: Sunday, April 21, 2013 - 11:36:52 AM
Last modification on: Wednesday, March 10, 2021 - 10:36:03 AM

Citation

Pierre Accorsi, Namrata Patel, Cédric Lopez, Rachel Panckhurst, Mathieu Roche. Seek&Hide: Anonymising a French SMS corpus using natural language processing techniques. Lingvisticae Investigationes, Philadelphia; Amsterdam: John Benjamins, 2012, 35 (2), pp.163-180. ⟨10.1075/li.35.2.03acc⟩. ⟨lirmm-00816272⟩
