GenDesc: A Partial Generalization of Linguistic Features For Text Classification

Guillaume Tisserant; Violaine Prince; Mathieu Roche

doi:10.1007/978-3-642-38824-8_35

Conference paper, year: 2013


Guillaume Tisserant

Role: Author
PersonId: 932279

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Violaine Prince

Role: Author
PersonId: 942907
ORCID: 0000-0002-5997-9677

Exploration et exploitation de données textuelles (Exploration and Exploitation of Textual Data)

Mathieu Roche

Role: Author
PersonId: 4967
IdHAL: mathieu-roche
ORCID: 0000-0003-3272-8568
IdRef: 09042087X

Exploration et exploitation de données textuelles (Exploration and Exploitation of Textual Data)

Abstract

This paper presents an application in the field of automatic classification of textual data with supervised learning algorithms. The aim is to study how a better representation of textual data can improve classification quality. Since a word's meaning depends on its context, we propose to use features that carry important information about word contexts. We present a method named GenDesc, which generalizes the words least relevant to the classification task by replacing them with their part-of-speech (POS) tags.
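The abstract does not state how relevance is measured, so the following is only a minimal sketch of the idea under explicit assumptions: every token already comes paired with a POS tag, relevance is approximated by inverse document frequency over the corpus (one of the listed keywords, used here purely as a stand-in ranking function, not necessarily the authors' choice), and a fixed threshold decides which words stay lexical. The names `idf_scores` and `generalize` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Inverse document frequency of each word over tokenized, POS-tagged documents.

    Assumption: IDF stands in for the paper's ranking function.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each word at most once per document.
        df.update({word for word, _tag in doc})
    return {word: math.log(n_docs / count) for word, count in df.items()}


def generalize(doc, scores, threshold):
    """Keep words scoring at or above `threshold`; replace the others by their POS tag.

    Words missing from `scores` get 0.0 and are therefore generalized
    unless the threshold is 0.0 or lower.
    """
    return [word if scores.get(word, 0.0) >= threshold else tag for word, tag in doc]


# Hypothetical toy corpus of (word, POS tag) pairs.
corpus = [
    [("the", "DT"), ("movie", "NN"), ("was", "VBD"), ("superb", "JJ")],
    [("the", "DT"), ("plot", "NN"), ("was", "VBD"), ("dull", "JJ")],
    [("the", "DT"), ("movie", "NN"), ("was", "VBD"), ("long", "JJ")],
]
scores = idf_scores(corpus)
print(generalize(corpus[0], scores, threshold=1.0))  # ['DT', 'NN', 'VBD', 'superb']
```

Here only "superb" (IDF = ln 3 ≈ 1.10) clears the threshold; the frequent words fall back to their tags, so the feature vector keeps the sentence's syntactic shape while retaining the lexical item that is most likely to discriminate between classes.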

Keywords

Inverse document frequency, Linguistic feature, Sentiment analysis, Textual data, Ranking function

Domains

Document and Text Processing

Contributor: Guillaume Tisserant

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00823476

Submitted on: Friday, 17 May 2013, 09:58:26

Last modified on: Friday, 24 March 2023, 14:52:57

Dates and versions

lirmm-00823476, version 1 (17-05-2013)

Identifiers

HAL Id: lirmm-00823476, version 1
DOI: 10.1007/978-3-642-38824-8_35

Cite

Guillaume Tisserant, Violaine Prince, Mathieu Roche. GenDesc: A Partial Generalization of Linguistic Features For Text Classification. NLDB: Natural Language Processing and Information Systems, Jun 2013, Salford, United Kingdom. pp.343-348, ⟨10.1007/978-3-642-38824-8_35⟩. ⟨lirmm-00823476⟩


Collections

CNRS, TEXTE, LIRMM, MIPS, UNIV-MONTPELLIER
