Integration of Linguistic and Web Information to Improve Biomedical Terminology Extraction

Juan Antonio Lossio-Ventura (1, *); Clement Jonquet (2, 3); Mathieu Roche (4, 1); Maguelonne Teisseire (4, 1)
* Corresponding author
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 SMILE - Système Multi-agent, Interaction, Langage, Evolution
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Comprehensive terminology is essential for a community to describe, exchange, and retrieve data. In many domains, the explosion of text data produced has reached a level at which automatic terminology extraction and enrichment is mandatory. Automatic Term Extraction (or Recognition) methods use natural language processing to do so. Methods combining linguistic and statistical aspects, as often proposed in the literature, solve some of the problems related to term extraction, such as low frequency, the complexity of multi-word term extraction, and the human effort required to validate candidate terms. In contrast, we present two new measures for extracting and ranking multi-word terms from domain-specific corpora that address all of the problems mentioned above. In addition, we demonstrate how using the Web to evaluate the significance of a multi-word term candidate helps us outperform the precision results obtained on the biomedical GENIA corpus with previously reported measures such as C-value.
Document type:
Conference paper
IDEAS: International Database Engineering & Applications Symposium, Jul 2014, Porto, Portugal. ACM, IDEAS'14: 18th International Database Engineering & Applications Symposium, pp.265-269, 2014, 〈http://confsys.encs.concordia.ca/IDEAS/ideas14/ideas14.php〉. 〈10.1145/2628194.2628208〉

Cited literature: 23 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01068547
Contributor: Juan Antonio Lossio Ventura
Submitted on: Tuesday, 30 September 2014 - 16:34:47
Last modified on: Monday, 22 October 2018 - 09:54:03
Document(s) archived on: Wednesday, 31 December 2014 - 11:15:31

File

IDEAS2014.pdf
Files produced by the author(s)

Citation

Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche, Maguelonne Teisseire. Integration of Linguistic and Web Information to Improve Biomedical Terminology Extraction. IDEAS: International Database Engineering & Applications Symposium, Jul 2014, Porto, Portugal. ACM, IDEAS'14: 18th International Database Engineering & Applications Symposium, pp.265-269, 2014, 〈http://confsys.encs.concordia.ca/IDEAS/ideas14/ideas14.php〉. 〈10.1145/2628194.2628208〉. 〈lirmm-01068547v2〉

Metrics

Record views: 603

File downloads: 339