Towards Electronic SMS Dictionary Construction: An Alignment-based Approach

Abstract: In this paper, we propose a method, entitled AlignSMS, for aligning text messages in order to automatically build an SMS dictionary. An extract of 100 text messages from the 88milSMS corpus (Panckhurst et al., 2013, 2014) was used as an initial test. More than 90,000 authentic text messages in French were collected from the general public by a group of academics in the south of France in the context of the sud4science project (http://www.sud4science.org). This project is itself part of a vast international SMS data collection project, entitled sms4science (http://www.sms4science.org, Fairon et al., 2006, Cougnon, 2014). After corpus collation, pre-processing and anonymisation (Accorsi et al., 2012, Patel et al., 2013), we discuss how "raw" anonymised text messages can be transcoded into normalised text messages, using a statistical alignment method. The future objective is to set up a hybrid (symbolic/statistical) approach based on both grammar rules and our statistical AlignSMS method.
Document type:
Conference paper
LREC: Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. International Conference on Language Resources and Evaluation, pp.2833-2838, 2014, 〈http://lrec2014.lrec-conf.org/en/〉

Cited literature: 18 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054899
Contributor: Mathieu Roche
Submitted on: Saturday, 9 August 2014 - 18:45:14
Last modified on: Thursday, 11 January 2018 - 06:27:21
Document(s) archived on: Wednesday, 26 November 2014 - 16:55:36

File

753_Paper.pdf
Publisher files permitted on an open archive

Identifiers

  • HAL Id: lirmm-01054899, version 1

Citation

Cédric Lopez, Reda Bestandji, Mathieu Roche, Rachel Panckhurst. Towards Electronic SMS Dictionary Construction: An Alignment-based Approach. LREC: Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. International Conference on Language Resources and Evaluation, pp.2833-2838, 2014, 〈http://lrec2014.lrec-conf.org/en/〉. 〈lirmm-01054899〉

Metrics

  • Record views: 245
  • File downloads: 232