Towards a mixed approach to extract biomedical terms from documents

Juan Antonio Lossio-Ventura; Clement Jonquet; Mathieu Roche; Maguelonne Teisseire

doi:10.4018/ijkdb.2014010101

Journal article, Knowledge Discovery in Bioinformatics, year: 2014



Juan Antonio Lossio-Ventura

Role: Corresponding author
PersonId : 958120


Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Clement Jonquet

Role: Author
PersonId : 3067
IdHAL : jonquet
ORCID : 0000-0002-2404-1582
IdRef : 076431215

Fouille de données environnementales (Environmental Data Mining)

Mathieu Roche

Role: Author
PersonId : 4967
IdHAL : mathieu-roche
ORCID : 0000-0003-3272-8568
IdRef : 09042087X

Fouille de données environnementales (Environmental Data Mining)

Maguelonne Teisseire

Role: Author
PersonId : 8645
IdHAL : maguelonne-teisseire
ORCID : 0000-0001-9313-6414
IdRef : 117436593

Fouille de données environnementales (Environmental Data Mining)

Abstract

The proposed work aims to automatically extract biomedical terms from free text. We present new extraction methods that take into account linguistic patterns specialized for the biomedical field, statistical term extraction measures such as C-value, and statistical keyword extraction measures such as Okapi BM25 and TF-IDF. These measures are combined to improve the extraction process, and we investigate which combinations are most relevant in different contexts. Experimental results show that an appropriate harmonic mean of C-value and keyword extraction measures offers better precision for both single-word and multi-word term extraction. The experiments cover the extraction of English and French biomedical terms from a corpus of laboratory tests available online. The results are validated using UMLS (for English) and MeSH alone (for French) as references.
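The abstract combines a term extraction measure (C-value) with keyword extraction measures (TF-IDF, Okapi BM25) through a harmonic mean. The Python sketch below illustrates that kind of combination; it is not the authors' implementation, and every function name in it is hypothetical. It assumes candidate terms (word tuples) have already been produced, for instance by biomedical linguistic patterns, which are not shown. In its usual form, C-value weights a candidate's frequency by log2 of its length in words and, when the candidate is nested inside longer candidates, subtracts the mean frequency of those longer candidates. Further assumptions, each also marked in the code: C-value uses log2(|a| + 1) so single-word candidates are not zeroed out; a term's TF-IDF or BM25 score is its maximum over documents; BM25 uses k1 = 1.2, b = 0.75 and a non-negative IDF; both measures are scaled by their maximum before the harmonic mean.

```python
import math


def occurrences(tokens, term):
    """Count contiguous occurrences of `term` (a tuple of words) in a token list."""
    n = len(term)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == term)


def _contains(longer, shorter):
    """True if the word tuple `shorter` appears contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """C-value for candidate terms (word tuples) given their corpus frequencies.

    Assumption: log2(|a| + 1) replaces log2(|a|) so that single-word
    candidates do not always score 0.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates in which `a` is nested.
        nesting = [b for b in freq if len(b) > len(a) and _contains(b, a)]
        if nesting:
            # Subtract the mean frequency of the longer candidates containing `a`.
            f_a = f_a - sum(freq[b] for b in nesting) / len(nesting)
        scores[a] = math.log2(len(a) + 1) * f_a
    return scores


def tfidf(term, docs):
    """TF-IDF of `term`; assumption: the corpus score is the best per-document score."""
    counts = [occurrences(d, term) for d in docs]
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    return max(c * idf for c in counts)


def bm25(term, docs, k1=1.2, b=0.75):
    """Okapi BM25 of `term` with a non-negative IDF; same max-over-documents assumption."""
    counts = [occurrences(d, term) for d in docs]
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return 0.0
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
    return max(
        idf * c * (k1 + 1) / (c + k1 * (1 - b + b * len(d) / avgdl))
        for c, d in zip(counts, docs)
    )


def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores (0 when both are 0)."""
    return 0.0 if x + y == 0 else 2 * x * y / (x + y)


def rank_terms(docs, candidates, keyword_measure=tfidf):
    """Rank candidate terms by the harmonic mean of C-value and a keyword measure."""
    if not candidates:
        return []
    freq = {t: sum(occurrences(d, t) for d in docs) for t in candidates}
    cv = c_value(freq)
    kw = {t: keyword_measure(t, docs) for t in candidates}
    # Assumption: scale each measure by its maximum before combining.
    cv_max = max(cv.values()) or 1.0
    kw_max = max(kw.values()) or 1.0
    combined = {t: harmonic_mean(cv[t] / cv_max, kw[t] / kw_max) for t in candidates}
    return sorted(candidates, key=combined.get, reverse=True)
```

Passing `keyword_measure=bm25` to `rank_terms` swaps BM25 in for TF-IDF. The harmonic mean rewards terms that score well on both measures: a candidate whose keyword score is zero gets a combined score of zero, however high its C-value.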

Keywords

Biomedical Natural Language Processing (BioNLP); Biomedical Term Extraction; Biomedical Thesaurus; Statistic Measure; Text Mining

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]; Document and Text Processing; Information Retrieval [cs.IR]

Main file

Juan_Lossio_Paper.pdf (3.35 MB)

Origin: Files produced by the author(s)

Contributor: Juan Antonio Lossio Ventura

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00859846

Submitted on: Monday, 9 September 2013, 15:36:33

Last modified on: Friday, 24 March 2023, 14:52:57

Long-term archiving on: Thursday, 12 December 2013, 10:10:48

Dates and versions

lirmm-00859846, version 1 (09-09-2013)

lirmm-00859846, version 2 (07-07-2014)

Identifiers

HAL Id: lirmm-00859846, version 1
DOI : 10.4018/ijkdb.2014010101

Cite

Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche, Maguelonne Teisseire. Towards a mixed approach to extract biomedical terms from documents. Knowledge Discovery in Bioinformatics, 2014, 4 (1), pp.15. ⟨10.4018/ijkdb.2014010101⟩. ⟨lirmm-00859846v1⟩


