Order and Mess in Text Categorization: Why Using Sequential Patterns to Classify - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper, 2004


Abstract

Text categorization is a well-known task, mostly addressed with statistical approaches such as neural networks, Support Vector Machines (SVM), and other machine learning algorithms. Texts are generally treated as unordered bags of words. Although these approaches have proven efficient, they do not provide users with comprehensible, reusable rules about their data. Such rules matter to users, however, because they describe the trends in the data being analyzed. In this context, Bing Liu proposed an association-rule-based approach, CBA (Classification Based on Associations). In this paper, we extend this approach with sequential patterns in the SPaC method (Sequential Patterns for Classification). Taking word order into account lets us represent the succession of words through a document without the complex, time-consuming representations and processing performed by natural language and grammatical methods. The method we propose mines sequential patterns and uses them to build a classifier. Experiments show that the proposal is relevant and compares well with other methods. In particular, our method achieves better results than SVM in cases where SVM performs poorly. Moreover, the approach is well suited to very large volumes of data.
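The abstract describes SPaC only at a high level, so what follows is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes each document is a list of words, enumerates ordered word patterns (subsequences, gaps allowed) by brute force instead of using a dedicated sequential-pattern miner such as GSP or PrefixSpan, and turns each pattern into a CBA-style rule "pattern -> class" that is kept only if it clears support and confidence thresholds. The function names `is_subsequence`, `mine_rules`, and `classify`, the threshold values, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def is_subsequence(pattern, words):
    """True if the words of `pattern` appear in `words` in the same order (gaps allowed)."""
    remaining = iter(words)
    # `w in remaining` consumes the iterator up to the first match, enforcing order.
    return all(w in remaining for w in pattern)


def mine_rules(docs, labels, max_len=2, min_support=2, min_conf=0.6):
    """Brute-force mining of ordered word patterns and 'pattern -> class' rules.

    Hypothetical parameters: max_len bounds pattern length, min_support is the
    minimum number of documents of the predicted class containing the pattern,
    min_conf is the minimum fraction of matching documents that carry that class.
    """
    counts = defaultdict(Counter)  # pattern -> Counter(class -> number of documents)
    for doc, label in zip(docs, labels):
        patterns = set()
        for k in range(1, max_len + 1):
            # combinations() keeps positional order, so each tuple is a subsequence.
            patterns.update(combinations(doc, k))
        for p in patterns:
            counts[p][label] += 1

    rules = []
    for p, per_class in counts.items():
        label, hits = per_class.most_common(1)[0]
        conf = hits / sum(per_class.values())
        if hits >= min_support and conf >= min_conf:
            rules.append((p, label, conf, hits))
    # Rank rules by confidence, then support, loosely following CBA's precedence.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules


def classify(doc, rules, default):
    """Return the class of the first (highest-ranked) rule whose pattern occurs in doc."""
    for pattern, label, _conf, _support in rules:
        if is_subsequence(pattern, doc):
            return label
    return default


# Toy corpus: the first "pos" and the first "neg" documents contain exactly the
# same words, so only word order separates them.
DOCS = [
    ["bad", "then", "good"],
    ["bad", "start", "good", "end"],
    ["good", "then", "bad"],
    ["good", "start", "bad", "end"],
]
LABELS = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    rules = mine_rules(DOCS, LABELS)
    for pattern, label, conf, support in rules:
        print(pattern, "->", label, f"conf={conf:.2f}", f"support={support}")
    print(classify(["bad", "at", "first", "but", "good", "later"], rules, "unknown"))
```

With the default thresholds, only the ordered patterns ("bad", "good") -> pos and ("good", "bad") -> neg survive: every single word is split evenly between the two classes, which is exactly the case where an unordered bag-of-words representation has nothing to go on. Brute-force enumeration grows combinatorially with document length and `max_len`, which is why real systems rely on dedicated sequential-pattern mining algorithms.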
Main file: lirmm-00108887v1.pdf (148.42 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-00108887, version 1 (05-11-2019)

Identifiers

  • HAL Id: lirmm-00108887, version 1

Cite

Simon Jaillet, Anne Laurent, Maguelonne Teisseire, Jacques Chauché. Order and Mess in Text Categorization: Why Using Sequential Patterns to Classify. 3rd Workshop on Mining Temporal and Sequential Data (TDM), Aug 2004, Seattle, United States. pp. 121-128. ⟨lirmm-00108887⟩