Discovering Highly Informative Feature Set Over High Dimensions

Chongsheng Zhang 1, Florent Masseglia 2, Xiangliang Zhang 3
1 AxIS - Usage-centered design, analysis and improvement of information systems
CRISAM - Inria Sophia Antipolis - Méditerranée, Inria Paris-Rocquencourt
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: In many textual collections the number of features is very large, and many of those features are redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set from high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data, and we design data structures that allow the entropies of candidate feature sets to be computed efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient.
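To make the idea of forward selection by entropy concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes that "most informative" means "largest empirical joint entropy", recomputes every entropy from scratch, and includes neither the data structures nor the pruning strategy mentioned in the abstract. The function names joint_entropy and greedy_informative_set and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Naive forward selection (illustrative assumption, no pruning):
    at each step, add the feature that maximizes the joint entropy
    of the set selected so far."""
    selected = []
    remaining = list(range(len(rows[0])))
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda f: joint_entropy(rows, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: features 0 and 1 are identical (redundant),
# feature 2 varies independently of them.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(greedy_informative_set(rows, 2))
```

On this toy data the redundant copy (feature 1) adds no entropy once feature 0 is selected, so the second step prefers feature 2. Each step of this naive version rescans the data for every remaining candidate, which is the kind of cost that efficient entropy computation and candidate pruning aim to reduce.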
Document type:
Conference paper
ICTAI'2012: 24th International Conference on Tools with Artificial Intelligence, Nov 2012, Greece. IEEE, pp.6, 2012, 〈http://ictai12.unipi.gr/〉
Cited literature [10 references]

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00753807
Contributor: Florent Masseglia
Submitted on: Monday, November 19, 2012 - 16:23:33
Last modified on: Wednesday, November 21, 2018 - 19:48:03
Document(s) archived on: Thursday, February 21, 2013 - 11:36:26

File

ictai12.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: lirmm-00753807, version 1

Citation

Chongsheng Zhang, Florent Masseglia, Xiangliang Zhang. Discovering Highly Informative Feature Set Over High Dimensions. ICTAI'2012: 24th International Conference on Tools with Artificial Intelligence, Nov 2012, Greece. IEEE, pp.6, 2012, 〈http://ictai12.unipi.gr/〉. 〈lirmm-00753807〉

Metrics

Record views

449

File downloads

512