Classifying Words: A Syllables-based Model

Pattaraporn Warintarawej; Anne Laurent; Pierre Pompidor; Armelle Cassanas; Bénédicte Laurent

doi:10.1109/DEXA.2011.21

Conference paper, Year: 2011


Pattaraporn Warintarawej

Role: Author
PersonId: 937892

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Anne Laurent

Role: Author
PersonId: 21743
IdHAL: anne-laurent
ORCID: 0000-0003-3708-6429
IdRef: 075173735

Fouille de données environnementales (Environmental Data Mining)

Pierre Pompidor

Role: Author
PersonId: 170558
IdHAL: pierre-pompidor
ORCID: 0000-0001-5466-5137

Fouille de données environnementales (Environmental Data Mining)

Armelle Cassanas

Role: Author
PersonId: 920972

Namae Concept

Bénédicte Laurent

Role: Author
PersonId: 920973

Namae Concept

Abstract

Text classification has been extensively studied by linguists and computer scientists. However, very few works address the classification of words into classes or concepts (e.g., a thesaurus). In this paper, we consider this topic, especially in the context of classifying names such as brand names or neologisms. The challenge is to provide automated tools that analyze new names by classifying them into concepts, so that, for example, a customer of a naming company can be told which concept a new name is closest to. As we argue that a word can belong to several concepts, we propose a top-k classification approach. Moreover, we rely on syllables to build the classification model. The word corpus is collected from a French thesaurus, and all labeled words are split into syllables. Feature selection techniques are used to select discriminative syllables: we use syllable frequency (SF) and mutual information (MI), combined with a Naive Bayes classifier and k-nearest neighbors (KNN). Instead of selecting only one class, the model selects the top-k classes, ranked by the classifier score. The results show that the top-k classification model helps to analyze a new word by showing that it can be related to more than one concept. Moreover, the set of discriminative syllables can be used to explain the classification results, which makes them more meaningful.
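The pipeline summarized above (syllabified words, syllable selection by mutual information, a Naive Bayes classifier, and top-k ranking of concepts) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: words are assumed to be already split into syllables (the paper's French syllabification is not reproduced), the toy corpus and the concept labels "sea" and "sweet" are hypothetical, MI uses the standard binary presence/class formulation maximised over classes, and the classifier is a multinomial Naive Bayes with add-one smoothing. The names `TRAIN`, `mi_scores`, `select_features`, `train_nb`, `top_k` and the parameter `m` are all hypothetical. The SF criterion and the KNN classifier are not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (syllables, concept) pairs. Words are assumed to be
# pre-split into syllables; the paper's French syllabifier is not reproduced here.
TRAIN = [
    (["a", "qua"], "sea"),
    (["ma", "rin"], "sea"),
    (["o", "cea", "no"], "sea"),
    (["ma", "ri", "na"], "sea"),
    (["ca", "ra", "mel"], "sweet"),
    (["su", "cre"], "sweet"),
    (["ca", "ca", "o"], "sweet"),
    (["me", "li", "na"], "sweet"),
]


def mi_scores(data):
    """MI between 'syllable occurs in word' and 'word has concept c',
    maximised over concepts (binary-event formulation)."""
    n = len(data)
    class_sizes = Counter(c for _, c in data)
    vocab = {s for sylls, _ in data for s in sylls}
    scores = {}
    for t in vocab:
        best = 0.0
        for c, n_c in class_sizes.items():
            n11 = sum(1 for s, y in data if t in s and y == c)  # t present, class c
            n10 = sum(1 for s, y in data if t in s and y != c)  # t present, other class
            n01 = n_c - n11                                      # t absent, class c
            n00 = n - n_c - n10                                  # t absent, other class
            mi = 0.0
            # Each cell: (joint count, marginal count for t, marginal count for c)
            for joint, row, col in (
                (n11, n11 + n10, n11 + n01),
                (n10, n11 + n10, n10 + n00),
                (n01, n01 + n00, n11 + n01),
                (n00, n01 + n00, n10 + n00),
            ):
                if joint > 0:  # cells with zero count contribute 0
                    mi += (joint / n) * math.log(n * joint / (row * col))
            best = max(best, mi)
        scores[t] = best
    return scores


def select_features(scores, m):
    """Keep the m highest-scoring syllables (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:m])


def train_nb(data, features):
    """Multinomial Naive Bayes with add-one smoothing over selected syllables."""
    n = len(data)
    priors = Counter(c for _, c in data)
    counts = defaultdict(Counter)
    for sylls, c in data:
        counts[c].update(s for s in sylls if s in features)
    model = {}
    for c, n_c in priors.items():
        total = sum(counts[c].values())
        loglik = {t: math.log((counts[c][t] + 1) / (total + len(features)))
                  for t in features}
        model[c] = (math.log(n_c / n), loglik)
    return model


def top_k(model, sylls, k=2):
    """Rank concepts by log prior + summed log likelihoods; return the k best.
    Syllables outside the selected feature set are ignored."""
    scored = []
    for c, (log_prior, loglik) in model.items():
        score = log_prior + sum(loglik[s] for s in sylls if s in loglik)
        scored.append((score, c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    scores = mi_scores(TRAIN)
    model = train_nb(TRAIN, select_features(scores, m=6))
    # A hypothetical new name mixing syllables seen under both concepts.
    print(top_k(model, ["ma", "ra", "mel"], k=2))
```

Returning the k best concepts instead of only the argmax is what lets a blended name be reported as related to several concepts, and the selected syllables with high MI double as a readable explanation of the ranking. A KNN scorer over syllable vectors could be dropped in behind the same `top_k`-style interface.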

Keywords

Classification; Words Classification; Feature Selection; Syllables; Discriminative Features

Domains

Databases [cs.DB]

Main file

11_DEXA.PDF (179.68 KB)

Origin: Files produced by the author(s)

Contributor: Isabelle Gouat

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00671499

Submitted on: Friday, February 17, 2012, 15:29:01

Last modified on: Saturday, March 23, 2024, 18:13:16

Long-term archiving on: Friday, November 23, 2012, 16:20:35

Dates and versions

lirmm-00671499, version 1 (17-02-2012)

Identifiers

HAL Id: lirmm-00671499, version 1
DOI: 10.1109/DEXA.2011.21

Cite

Pattaraporn Warintarawej, Anne Laurent, Pierre Pompidor, Armelle Cassanas, Bénédicte Laurent. Classifying Words: A Syllables-based Model. DEXA 2011 - 22nd International Conference on Database and Expert Systems Applications, Aug 2011, Toulouse, France. pp.208-212, ⟨10.1109/DEXA.2011.21⟩. ⟨lirmm-00671499⟩


Collections

CNRS LIRMM MIPS UNIV-MONTPELLIER
