Managing the Acronym/Expansion Identification Process for Text-Mining Applications

Mathieu Roche; Violaine Prince

Journal article, International Journal of Software and Informatics (IJSI), Year: 2008


Mathieu Roche

Role: Author
PersonId: 4967
IdHAL: mathieu-roche
ORCID: 0000-0003-3272-8568
IdRef: 09042087X

Affiliation: Exploration et exploitation de données textuelles (text data exploration and exploitation)

Violaine Prince

Role: Author
PersonId: 942907
ORCID: 0000-0002-5997-9677

Affiliation: Exploration et exploitation de données textuelles (text data exploration and exploitation)

Abstract

This paper deals with an approach for extracting acronyms and their definitions from textual data (corpora) and for disambiguating these definitions (or expansions). Both steps of our global process for acquiring and managing acronyms are described in detail. The first step uses markers such as brackets to identify expansion candidates; letter alignment is then used to select the acronym/definition pairs. The second step determines the relevant expansion of an acronym in a given context. Our method relies on statistical measures (Mutual Information, Cubic Mutual Information, Dice measure) and on the results returned by search engines. This paper presents an evaluation of the global process on real data from both general and specialized domains.
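The two steps summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `extract_pairs`, `align`, `rank_expansions`, the bracket pattern, the eight-word window and the alignment rule (first letter must start the candidate, remaining letters must appear in order) are all hypothetical choices. The three association measures use their common textbook forms, with the constant corpus size dropped from Mutual Information because it does not change a ranking; the hit counts they take are assumed to come from whatever search engine or corpus is queried.

```python
import math
import re

# Hypothetical marker: an acronym of 2-10 alphanumeric characters, starting with
# an uppercase letter, written inside brackets, e.g. "(WHO)".
ACRONYM_IN_BRACKETS = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")


def align(acronym: str, words: list[str]) -> bool:
    """Hypothetical alignment rule: the first acronym letter must start the
    candidate, and the remaining letters must appear in order (as a
    subsequence) in the concatenated candidate words, ignoring case."""
    text = "".join(words).lower()
    letters = acronym.lower()
    if not text or text[0] != letters[0]:
        return False
    pos = 1
    for ch in letters[1:]:
        pos = text.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True


def extract_pairs(sentence: str, max_words: int = 8) -> list[tuple[str, str]]:
    """Step 1 sketch: for each bracketed acronym, keep the shortest run of
    preceding words (at most max_words) that aligns with its letters."""
    pairs = []
    for match in ACRONYM_IN_BRACKETS.finditer(sentence):
        acronym = match.group(1)
        preceding = sentence[: match.start()].split()
        for k in range(1, min(max_words, len(preceding)) + 1):
            candidate = preceding[-k:]
            if align(acronym, candidate):
                pairs.append((acronym, " ".join(candidate)))
                break
    return pairs


# Step 2 sketch: association scores between an expansion x and a context word y,
# computed from hit counts n_xy (x and y together), n_x and n_y.
def mutual_information(n_xy: int, n_x: int, n_y: int) -> float:
    # log2(n_xy / (n_x * n_y)); the corpus size N is omitted because it adds
    # the same constant log2(N) to every candidate's score.
    if n_xy == 0:
        return float("-inf")
    return math.log2(n_xy / (n_x * n_y))


def cubic_mutual_information(n_xy: int, n_x: int, n_y: int) -> float:
    # MI3 cubes the joint count, which favors frequent co-occurrences.
    if n_xy == 0:
        return float("-inf")
    return math.log2(n_xy ** 3 / (n_x * n_y))


def dice(n_xy: int, n_x: int, n_y: int) -> float:
    # Dice = 2 * n_xy / (n_x + n_y)
    if n_x + n_y == 0:
        return 0.0
    return 2 * n_xy / (n_x + n_y)


def rank_expansions(counts: dict[str, tuple[int, int, int]], measure=dice) -> list[str]:
    """Order candidate expansions of one acronym, best first; counts maps each
    expansion to its (n_xy, n_x, n_y) hit counts for the given context."""
    return sorted(counts, key=lambda e: measure(*counts[e]), reverse=True)
```

For instance, `extract_pairs("The World Health Organization (WHO) published data.")` returns `[("WHO", "World Health Organization")]`. A subsequence test like this one also accepts expansions such as "deoxyribonucleic acid" for "DNA", where some letters come from inside words, at the cost of occasional false matches that a stricter word-initial rule would reject.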

Domains

Other; Document and Text Processing; Artificial Intelligence [cs.AI]; Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Web


https://hal-lirmm.ccsd.cnrs.fr/lirmm-00349235

Submitted on: Thursday, 25 December 2008, 20:58:09

Last modified on: Wednesday, 5 July 2023, 17:05:32

Dates and versions

lirmm-00349235 , version 1 (25-12-2008)

Identifiers

HAL Id: lirmm-00349235, version 1

Cite

Mathieu Roche, Violaine Prince. Managing the Acronym/Expansion Identification Process for Text-Mining Applications. International Journal of Software and Informatics (IJSI), 2008, Special issue on Data Mining, 2 (2), pp.163-179. ⟨lirmm-00349235⟩


Collections

CNRS TEXTE LIRMM MIPS UNIV-MONTPELLIER

