Ordre et Désordre dans la Catégorisation de Textes (Order and Disorder in Text Categorization)

Simon Jaillet 1, Maguelonne Teisseire 2, Anne Laurent 2, Jacques Chauché 3
2 TATOO - Environmental data mining
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 TEXTE - Exploration and exploitation of textual data
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : Text categorization is a well-known task, typically addressed with statistical approaches such as neural networks, Support Vector Machines and other machine learning algorithms. Texts are generally treated as bags of words, with no regard for word order. Although these approaches have proven efficient, they do not give users comprehensible and reusable rules about their data. Such rules matter to users who need to describe trends in the data they analyze. In this context, Bing Liu proposed an approach based on association rules (CBA). In this paper, we extend that approach by using sequential patterns in the SPaC method (Sequential Patterns for Classification). Taking order into account lets us represent the succession of words in a document without the complex and time-consuming representations and processing required by natural language and grammatical methods. Our experiments show that the proposal is relevant and compares well with other methods.
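
The contrast the abstract draws between an order-free bag of words and an order-aware pattern can be illustrated with a minimal sketch. This is not the SPaC method itself: the function names, the toy sentences, and the simplification of a sequential pattern to an ordered list of single words (gaps allowed) are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(tokens):
    """Order-free view of a text: maps each word to its count."""
    return Counter(tokens)


def contains_sequence(tokens, pattern):
    """Return True if the words of `pattern` occur in `tokens`
    in the same order, possibly with other words in between."""
    it = iter(tokens)
    # `word in it` consumes the iterator up to the first match,
    # so each later word must be found after the previous one.
    return all(word in it for word in pattern)


# Two hypothetical documents with identical words in a different order.
doc_a = "the bank approved the loan".split()
doc_b = "the loan approved the bank".split()

# Hypothetical ordered pattern, simplified to single words.
pattern = ["bank", "approved", "loan"]

same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
a_matches = contains_sequence(doc_a, pattern)
b_matches = contains_sequence(doc_b, pattern)
```

Here the two documents are indistinguishable as bags of words, while the ordered pattern matches only the first one. That difference is the kind of information an order-aware representation can feed into classification rules.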
Document type :
Conference papers

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00108889
Contributor : Christine Carvalho de Matos
Submitted on : Wednesday, October 9, 2019 - 10:12:29 AM
Last modification on : Wednesday, October 9, 2019 - 5:02:03 PM

File

lirmm-00108889v1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-00108889, version 1

Citation

Simon Jaillet, Maguelonne Teisseire, Anne Laurent, Jacques Chauché. Ordre et Désordre dans la Catégorisation de Textes. BDA: Bases de Données Avancées, Oct 2004, Montpellier, France. pp.555-573. ⟨lirmm-00108889⟩
