Mining Tweet Data - Statistic and semantic information for political tweet classification

Guillaume Tisserant; Mathieu Roche; Violaine Prince

doi:10.5220/0005170205230529

Conference paper, Year: 2014

Guillaume Tisserant

Role: Author
PersonId: 932279

Exploration et exploitation de données textuelles

Mathieu Roche

Role: Author
PersonId: 4967
IdHAL: mathieu-roche
ORCID: 0000-0003-3272-8568
IdRef: 09042087X

ADVanced Analytics for data SciencE

Territoires, Environnement, Télédétection et Information Spatiale

Violaine Prince

Role: Author
PersonId: 942907
ORCID: 0000-0002-5997-9677

Exploration et exploitation de données textuelles

Abstract

This paper examines how the quality of textual features affects tweet classification. The aim of our study is to show how improving the representation of textual data affects the performance of learning algorithms. We first introduce our method, GenDesc, which generalises the words that are less relevant for tweet classification. We then compare and discuss the types of textual features produced by different approaches. More precisely, we discuss the semantic specificity of textual features such as named entities and hashtags.
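
The abstract does not say how GenDesc decides which words to generalise, nor what a generalised word is replaced with. The following is therefore only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical relevance score (the share of a word's occurrences that fall in its most frequent class), a hypothetical threshold of 0.75, and a hypothetical placeholder token `<GEN>`; the function names and the toy tweets are likewise invented for illustration.

```python
from collections import Counter, defaultdict


def class_skew(tokenized_docs, labels):
    """Hypothetical relevance score: for each word, the fraction of its
    occurrences that fall in the class where it is most frequent."""
    per_word = defaultdict(Counter)
    for tokens, label in zip(tokenized_docs, labels):
        for tok in tokens:
            per_word[tok][label] += 1
    return {w: max(c.values()) / sum(c.values()) for w, c in per_word.items()}


def generalise(tokens, scores, threshold=0.75, placeholder="<GEN>"):
    """Keep words whose score reaches the threshold; replace the rest
    (including unseen words) with a generic placeholder token."""
    return [t if scores.get(t, 0.0) >= threshold else placeholder for t in tokens]


# Toy data, invented for this example.
docs = [
    "vote for the left".split(),
    "the left will win".split(),
    "vote for the right".split(),
    "the right will win".split(),
]
labels = ["L", "L", "R", "R"]

scores = class_skew(docs, labels)
generalised = [generalise(d, scores) for d in docs]
print(generalised[0])
```

In this toy setting only "left" and "right" occur in a single class, so they are kept while the evenly distributed words are generalised. A word seen only once would also score 1.0 under this assumed measure, so a real system would need a minimum-frequency condition or a more robust statistic.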

Keywords

Classification; Tweets; Text Mining

Domains

Document and Text Processing; Artificial Intelligence [cs.AI]; Information Retrieval [cs.IR]; Web

Main file

SSTM_2014_2 2.pdf (343.92 KB)

Origin: Files produced by the author(s)
Licence: Attribution - NonCommercial - NoDerivatives

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01054908

Submitted on: Wednesday, 29 January 2020, 17:09:36

Last modified on: Tuesday, 10 October 2023, 16:38:10

Long-term archiving on: Thursday, 30 April 2020, 18:47:55

Dates and versions

lirmm-01054908 , version 1 (29-01-2020)

Identifiers

HAL Id : lirmm-01054908 , version 1
DOI : 10.5220/0005170205230529

Cite

Guillaume Tisserant, Mathieu Roche, Violaine Prince. Mining Tweet Data - Statistic and semantic information for political tweet classification. KDIR 2014 - 6th International Conference on Knowledge Discovery and Information Retrieval, Oct 2014, Rome, Italy. pp.523-529, ⟨10.5220/0005170205230529⟩. ⟨lirmm-01054908⟩

Collections

CIRAD AGROPARISTECH CNRS IRSTEA ADVANSE TEXTE LIRMM AGROPOLIS TETIS MIPS UNIV-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

297 views

116 downloads
