Biomedical term extraction: overview and a new methodology

Juan Antonio Lossio-Ventura 1, 2, * Clement Jonquet 3, 2 Mathieu Roche 4, 1 Maguelonne Teisseire 4, 1
* Corresponding author
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 SMILE - Système Multi-agent, Interaction, Langage, Evolution
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Terminology extraction is an essential task in domain knowledge acquisition, as well as in Information Retrieval (IR). It is also a mandatory first step in building or enriching terminologies and ontologies. As often proposed in the literature, existing terminology extraction methods combine linguistic and statistical aspects and solve some, but not all, of the problems related to term extraction, e.g. noise, silence, low frequency, large corpora, and the complexity of the multi-word term extraction process. In contrast, we propose a cutting-edge methodology to extract and rank biomedical terms that covers all of the problems mentioned. This methodology offers several measures based on linguistic, statistical, graph-based and web aspects. These measures extract and rank candidate terms with excellent precision: we demonstrate that they outperform previously reported precision results for automatic term extraction and work with different languages (English, French, and Spanish). We also demonstrate how using graphs and the web to assess the significance of a term candidate enables us to improve on these precision results. We evaluated our methodology on the biomedical GENIA and LabTestsOnline corpora and compared it with previously reported measures.
Document type:
Journal article
Information Retrieval Journal, Springer, 2016, Medical Information Retrieval, 19 (1), pp.59-99. 〈10.1007/s10791-015-9262-2〉

Cited literature [63 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01274539
Contributor: Juan Antonio Lossio Ventura
Submitted on: Tuesday, 16 February 2016 - 03:32:50
Last modified on: Wednesday, 10 October 2018 - 14:28:11
Document(s) archived on: Tuesday, 17 May 2016 - 17:22:46

File

manley.pdf
Files produced by the author(s)

Citation

Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche, Maguelonne Teisseire. Biomedical term extraction: overview and a new methodology. Information Retrieval Journal, Springer, 2016, Medical Information Retrieval, 19 (1), pp.59-99. 〈10.1007/s10791-015-9262-2〉. 〈lirmm-01274539〉

Metrics

Record views

454

File downloads

430