United we stand: Using multiple strategies for topic labeling
Abstract
Topic labeling aims at providing a sound, possibly multi-word, label that depicts a topic drawn from a topic model. This is of the utmost practical interest for quickly grasping a topic's informational content, since the usual ranked list of words that maximize a topic has limitations for this task. In this paper, we introduce three new unsupervised n-gram topic labelers that achieve results comparable to existing unsupervised topic labelers while relying on different assumptions. We demonstrate that combining topic labelers, even only two of them, makes it possible to reach a 64% improvement over single-labeler approaches, and therefore opens research in that direction. Finally, we introduce a fourth topic labeler that extracts representative sentences, using Dirichlet smoothing to add contextual information. This sentence-based labeler provides strong surrogate candidates when n-gram topic labelers fail to provide relevant labels, leading to up to 94% topic coverage.
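As an illustration of the sentence-based idea, the following is a minimal sketch (not the paper's actual method) of scoring candidate sentences for a topic with a Dirichlet-smoothed unigram language model: a sentence's maximum-likelihood word probabilities are interpolated with background corpus probabilities, and sentences are ranked by the log-likelihood they assign to the topic's top words. All function names, the `mu` value, and the fallback background probability are illustrative assumptions.

```python
from collections import Counter
import math

def dirichlet_smoothed_prob(word, sent_counts, sent_len, corpus_probs, mu=1000.0):
    """Dirichlet-smoothed p(w|s) = (c(w,s) + mu * p(w|C)) / (|s| + mu).

    Unseen words fall back to a tiny background probability (assumption)."""
    background = corpus_probs.get(word, 1e-9)
    return (sent_counts.get(word, 0) + mu * background) / (sent_len + mu)

def score_sentence(sentence_tokens, topic_words, corpus_probs, mu=1000.0):
    """Rank a candidate sentence by the log-likelihood of the topic's top words."""
    counts = Counter(sentence_tokens)
    n = len(sentence_tokens)
    return sum(
        math.log(dirichlet_smoothed_prob(w, counts, n, corpus_probs, mu))
        for w in topic_words
    )
```

A sentence that actually contains the topic's top words receives a higher score than one that does not, while the smoothing keeps scores finite for sentences missing some topic words; the real labeler would add further contextual information on top of this base model.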
Domains
Databases [cs.DB]