Non-parametric Bayesian annotator combination

Maximilien Servajean; Romain Chailan; Alexis Joly

doi:10.1016/j.ins.2018.01.020

Journal article, Information Sciences, Year: 2018



Maximilien Servajean

Role: Author
PersonId : 169562
IdHAL : maximilien-servajean
ORCID : 0000-0002-9426-2583
IdRef : 196392160

Université Paul-Valéry - Montpellier 3

ADVanced Analytics for data SciencE

Scientific Data Management

Université de Montpellier

Romain Chailan

Role: Author
PersonId : 932119

Scientific Data Management

Alexis Joly

Role: Author
PersonId : 12088
IdHAL : alexis-joly
ORCID : 0000-0002-2161-9940
IdRef : 107969394

Scientific Data Management

Université de Montpellier

Abstract

Relying on a single imperfect human annotator is not recommended in real crowdsourced classification problems. In practice, the propositions of several annotators are generally aggregated to obtain better classification accuracy. Bayesian approaches, by modelling the relationship between each annotator's output and the possible true labels (classes), have been shown to outperform simpler models. Unfortunately, they assume that the total number of true labels is known. This is not the case in many realistic scenarios, such as open-world classification, where the number of possible labels is undetermined and may change over time. In this paper, we show how to set a non-parametric prior over the possible label set using the Dirichlet process in order to overcome this limitation. We illustrate this prior on the Bayesian annotator combination (BAC) model from the state of the art, resulting in the so-called non-parametric BAC (NPBAC). We show how to derive its variational equations to evaluate the model and how to assess it, using the Laplace method, when the Dirichlet process itself has a prior. We apply the model to several scenarios related to closed-world classification, open-world classification and novelty detection, on a previously published dataset and on two datasets related to plant classification. Our experiments show that NPBAC is able to determine the true number of labels and, surprisingly, that it largely outperforms the parametric annotator combination by modelling more complex confusions, in particular when little or no training data is available.
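To make the idea of a non-parametric prior over the label set more concrete, the following is a minimal sketch of the Chinese restaurant process, a standard sequential view of the Dirichlet process. It is not the NPBAC model described in the abstract and it involves no annotators or variational inference; it only illustrates how the number of labels can be left open and allowed to grow with the data. The function name `sample_crp_labels` and the concentration parameter name `alpha` are assumptions introduced for this illustration.

```python
import random


def sample_crp_labels(n_items, alpha, seed=0):
    """Draw label assignments for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical name): larger
    values make it more likely that a new, previously unseen label is opened.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to label k
    labels = []  # labels[i] = label index drawn for item i
    for i in range(n_items):
        # Existing label k is chosen with probability counts[k] / (i + alpha);
        # a brand-new label is opened with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new label
        for k, c in enumerate(counts):
            cumulative += c
            if u < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = sample_crp_labels(n_items=200, alpha=1.0)
    print("number of labels used:", len(counts))
```

The number of distinct labels is never fixed in advance: it is a by-product of the draws, which is the property the abstract relies on for open-world classification.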

Keywords

Dirichlet process, Combination, Classification, Laplace, Variational, Bayesian, Crowdsourcing

Domains

Computer Science [cs] / Machine Learning [cs.LG]

Main file

parametric-bayesian-annotator-without_highlighted_corrections.pdf (484.25 KB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-01703020

Submitted on: Wednesday, 7 February 2018, 13:54:30

Last modified on: Thursday, 16 May 2024, 13:30:45

Dates and versions

lirmm-01703020 , version 1 (07-02-2018)

Identifiers

HAL Id : lirmm-01703020 , version 1
DOI : 10.1016/j.ins.2018.01.020

Cite

Maximilien Servajean, Romain Chailan, Alexis Joly. Non-parametric Bayesian annotator combination. Information Sciences, 2018, 436-437, pp.131-145. ⟨10.1016/j.ins.2018.01.020⟩. ⟨lirmm-01703020⟩


Collections

CNRS INRIA UNIV-MONTP3 ADVANSE ZENITH LIRMM INRIA2 MIPS UNIV-MONTPELLIER AMIS

306 views

287 downloads
