ICD-10 coding of death certificates with the NCBO and SIFR Annotators at CLEF eHealth 2017

Andon Tchechmedjiev 1, 2; Amine Abdaoui 2; Vincent Emonet 1; Clement Jonquet 1, 3
1 SMILE - Système Multi-agent, Interaction, Langage, Evolution
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
2 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: The SIFR BioPortal is an open platform that hosts French biomedical ontologies and terminologies, based on the technology developed by the US National Center for Biomedical Ontology (NCBO). The portal facilitates the use and adoption of terminologies and ontologies by offering a set of services, including semantic annotation. The SIFR Annotator (http://bioportal.lirmm.fr/annotator) is a publicly accessible, easy-to-use, ontology-based annotation tool for processing French text data and facilitating semantic indexing. The web service relies on the ontology content (preferred labels and synonyms) as well as on the semantics of the ontologies (is-a hierarchies) and their mappings. The SIFR BioPortal also offers the possibility of querying the original NCBO Annotator for English text via a dedicated proxy that extends the original functionality. In this paper, we present a preliminary performance evaluation of the generic annotation web service (i.e., not specifically customized) for coding death certificates, i.e., annotating them with ICD-10 codes. This evaluation is done against the manually annotated CépiDC/CDC corpus of CLEF eHealth 2017 task 1. For this purpose, we built custom SKOS vocabularies from the CépiDC/CDC dictionaries as well as from the training and development corpora, for all three tasks, using a most-frequent-code heuristic to assign ambiguous labels. We then submitted the vocabularies to the NCBO and SIFR BioPortal and used the annotation services on the task 1 datasets. For our best runs on each corpus, we obtained the following results: English raw corpus (69.08% P, 51.37% R, 58.92% F1); French raw corpus (54.11% P, 48.00% R, 50.87% F1); French aligned corpus (50.63% P, 52.97% R, 51.77% F1).
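The abstract mentions a most-frequent-code heuristic for labels that map to more than one ICD-10 code. The following is a minimal sketch of what such a heuristic could look like, not the authors' implementation: the function name, the lowercase normalization, the tie-breaking rule, and the sample (label, code) pairs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def most_frequent_code(pairs):
    """Map each label to the code it is most often paired with.

    `pairs` is an iterable of (label, code) tuples, e.g. drawn from a
    dictionary or an annotated corpus. Labels are lowercased and stripped
    before counting (an assumption, not taken from the paper).
    """
    counts = defaultdict(Counter)
    for label, code in pairs:
        counts[label.strip().lower()][code] += 1
    # Highest count wins; ties go to the alphabetically smallest code,
    # which is an arbitrary but deterministic choice.
    return {
        label: min(codes, key=lambda c: (-codes[c], c))
        for label, codes in counts.items()
    }


# Hypothetical observations, invented for illustration only.
observations = [
    ("insuffisance cardiaque", "I509"),
    ("insuffisance cardiaque", "I509"),
    ("Insuffisance cardiaque", "I500"),
    ("choc septique", "A419"),
]
label_to_code = most_frequent_code(observations)
```

Under these assumptions, each ambiguous label ends up with exactly one code, so the resulting vocabulary can carry a single concept per label.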
Document type:
Conference paper
Working Notes of CLEF eHealth Evaluation Lab, Sep 2017, Dublin, Ireland. CEUR, CEUR Workshop Proceedings, 1866, 〈https://sites.google.com/site/clefehealth2017/〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01605359
Contributor: Clement Jonquet
Submitted on: Monday, 2 October 2017 - 21:39:08
Last modified on: Thursday, 24 May 2018 - 15:59:25

File

Article-CLEFeHealth-2017_SIFR_...
Files produced by the author(s)

Identifiers

  • HAL Id: lirmm-01605359, version 1

Citation

Andon Tchechmedjiev, Amine Abdaoui, Vincent Emonet, Clement Jonquet. ICD-10 coding of death certificates with the NCBO and SIFR Annotators at CLEF eHealth 2017. Working Notes of CLEF eHealth Evaluation Lab, Sep 2017, Dublin, Ireland. CEUR, CEUR Workshop Proceedings, 1866, 〈https://sites.google.com/site/clefehealth2017/〉. 〈lirmm-01605359〉
