Knowledge-Based Representation for Transductive Multilingual Document Classification

Abstract: Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, compounding the known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation process. We use a state-of-the-art transductive learner to produce the document classification. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model with respect to document representations commonly used in multilingual and cross-lingual analysis, and the robustness of the transductive setting for multilingual document classification.
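
The idea of a common conceptual space can be illustrated with a minimal sketch. It is not the authors' implementation and does not call BabelNet: the `TOY_LEXICON` table, its concept ids, and the helper names `bag_of_concepts` and `cosine` are all hypothetical stand-ins, and word-sense disambiguation and the transductive learner are left out. It only shows how documents in two languages might become directly comparable once their terms are mapped to shared concept identifiers, with no translation step.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for a multilingual knowledge base:
# (language, lemma) -> language-independent concept id. The ids are made up.
TOY_LEXICON = {
    ("en", "bank"): "c:bank",
    ("it", "banca"): "c:bank",
    ("en", "money"): "c:money",
    ("it", "denaro"): "c:money",
    ("en", "river"): "c:river",
    ("it", "fiume"): "c:river",
}


def bag_of_concepts(tokens, lang):
    """Count the concepts evoked by a document's tokens; unknown tokens are skipped."""
    concepts = (TOY_LEXICON.get((lang, tok.lower())) for tok in tokens)
    return Counter(c for c in concepts if c is not None)


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# An English and an Italian document about the same topic, plus an unrelated one.
doc_en = bag_of_concepts(["bank", "money", "money"], "en")
doc_it = bag_of_concepts(["banca", "denaro", "denaro"], "it")
doc_river = bag_of_concepts(["river"], "en")

print(cosine(doc_en, doc_it))     # same concepts, same counts
print(cosine(doc_en, doc_river))  # no shared concepts
```

In this toy setting the English and Italian documents map to identical concept vectors, while the unrelated document shares no concepts with either. A real pipeline would also need to resolve ambiguous terms to the right concept before counting.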

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01239095
Contributor: Dino Ienco
Submitted on: Monday, December 7, 2015 - 2:00:48 PM
Last modification on: Wednesday, September 18, 2019 - 4:04:04 PM
Long-term archiving on: Saturday, April 29, 2017 - 9:44:37 AM

Citation

Salvatore Romeo, Dino Ienco, Andrea Tagarelli. Knowledge-Based Representation for Transductive Multilingual Document Classification. ECIR: European Conference on Information Retrieval, Mar 2015, Vienna, Austria. pp.92-103, ⟨10.1007/978-3-319-16354-3_11⟩. ⟨lirmm-01239095⟩
