Knowledge-Based Representation for Transductive Multilingual Document Classification

Abstract: Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which compounds the known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation step. We then use a state-of-the-art transductive learner to classify the documents. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model compared with the document representations usually employed in multilingual and cross-lingual analysis, as well as the robustness of the transductive setting for multilingual document classification.
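
The idea of a common conceptual space can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon below is a hypothetical toy stand-in for BabelNet lookups, the concept identifiers are invented, and the uniform split of weight across the senses of an ambiguous word is an assumption, since the abstract does not say how ambiguity is handled. The sketch only shows how documents in different languages can be compared without translation once their words are mapped to shared concepts; the transductive learner is not covered.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: (language, word) -> concept identifiers.
# The identifiers are invented for illustration; they are NOT real BabelNet IDs.
TOY_LEXICON = {
    ("en", "bank"): ["c:financial_institution", "c:river_side"],
    ("en", "loan"): ["c:loan"],
    ("en", "river"): ["c:river"],
    ("it", "banca"): ["c:financial_institution"],
    ("it", "prestito"): ["c:loan"],
    ("fr", "fleuve"): ["c:river"],
}


def bag_of_concepts(tokens, lang, lexicon=TOY_LEXICON):
    """Map a tokenized document to a language-independent concept vector."""
    counts = Counter()
    for tok in tokens:
        concepts = lexicon.get((lang, tok.lower()), [])
        for concept in concepts:
            # Assumption: an ambiguous word spreads its weight evenly over its senses.
            counts[concept] += 1.0 / len(concepts)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_en = bag_of_concepts(["bank", "loan"], "en")
doc_it = bag_of_concepts(["banca", "prestito"], "it")
doc_fr = bag_of_concepts(["fleuve"], "fr")

print(cosine(doc_en, doc_it))  # English and Italian documents share concepts
print(cosine(doc_en, doc_fr))  # no shared concepts with the French document
```

In this toy setting the English and Italian documents share no surface words, yet they overlap in the concept space, which is the property a term-based representation would miss.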
Document type:
Conference paper
ECIR: European Conference on Information Retrieval, Mar 2015, Vienna, Austria. 37th European Conference on IR Research, ECIR 2015, Vienna, Austria, March 29 - April 2, 2015. Proceedings, LNCS (9022), pp.92-103, 2015, Advances in Information Retrieval. 〈http://link.springer.com/book/10.1007%2F978-3-319-16354-3〉. 〈10.1007/978-3-319-16354-3_11〉

Cited literature: 21 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01239095
Contributor: Dino Ienco
Submitted on: Monday, December 7, 2015 - 14:00:48
Last modified on: Wednesday, April 18, 2018 - 14:24:05
Document(s) archived on: Saturday, April 29, 2017 - 09:44:37

File

paper_169.pdf
Files produced by the author(s)

Citation

Salvatore Romeo, Dino Ienco, Andrea Tagarelli. Knowledge-Based Representation for Transductive Multilingual Document Classification. ECIR: European Conference on Information Retrieval, Mar 2015, Vienna, Austria. 37th European Conference on IR Research, ECIR 2015, Vienna, Austria, March 29 - April 2, 2015. Proceedings, LNCS (9022), pp.92-103, 2015, Advances in Information Retrieval. 〈http://link.springer.com/book/10.1007%2F978-3-319-16354-3〉. 〈10.1007/978-3-319-16354-3_11〉. 〈lirmm-01239095〉
