Discovering Highly Informative Feature Set Over High Dimensions

Chongsheng Zhang; Florent Masseglia; Xiangliang Zhang

doi:10.1109/ICTAI.2012.149

Conference paper. Year: 2013

Chongsheng Zhang

Role: Author

AxIS - Usage-centered design, analysis and improvement of information systems

Florent Masseglia

Role: Author
PersonId : 172896
IdHAL : florent-masseglia
ORCID : 0000-0002-1149-585X
IdRef : 120528681

ZENITH - Scientific Data Management

Xiangliang Zhang

Role: Author

KAUST - King Abdullah University of Science and Technology [Thuwal, Saudi Arabia]

Abstract

For many textual collections, the number of features is often overly large. These features can be highly redundant, so it is desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. We test our method through experiments on real-world data sets, showing that our proposal is very efficient.
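
As a rough illustration of the kind of procedure the abstract describes, the following is a minimal sketch of greedy forward selection that maximizes the empirical joint entropy of the chosen features. It is not the authors' algorithm: the function names (`joint_entropy`, `greedy_informative_set`), the row-tuple data layout, and the pruning rule are all assumptions made for this example. The pruning rule used here is the standard subadditivity bound H(S ∪ {f}) ≤ H(S) + H(f), which lets a forward step skip candidates that cannot beat the best one found so far; the paper's own data structures and pruning strategy are not reproduced.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Greedy forward selection of k features by joint entropy.

    Assumes 1 <= k <= number of columns. Hypothetical helper, not the
    method from the paper.
    """
    num_features = len(rows[0])
    # Single-feature entropies, computed once and reused as upper bounds.
    single = [joint_entropy(rows, [f]) for f in range(num_features)]
    selected, current = [], 0.0
    for _ in range(k):
        best_f, best_h = None, -1.0
        candidates = [f for f in range(num_features) if f not in selected]
        # Visit candidates by decreasing single-feature entropy.
        for f in sorted(candidates, key=lambda f: -single[f]):
            # Subadditivity: H(S + f) <= H(S) + H(f). If even this bound
            # cannot beat the best so far, no later candidate can either.
            if current + single[f] <= best_h:
                break
            h = joint_entropy(rows, selected + [f])
            if h > best_h:
                best_f, best_h = f, h
        selected.append(best_f)
        current = best_h
    return selected, current


# Toy data: column 1 duplicates column 0, column 2 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
selected, entropy_bits = greedy_informative_set(rows, 2)
print(selected, entropy_bits)
```

On this toy table the redundant column adds no entropy once its duplicate is selected, so the sketch prefers the independent column in the second step. Recomputing the joint entropy from scratch for every candidate is the expensive part on high-dimensional data, which is the cost the abstract says the paper's data structures are designed to reduce.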

Domains

Databases [cs.DB]

Main file

ictai12.pdf (150.29 KB)

Origin	Files produced by the author(s)
License	HAL authorization

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00753807

Submitted on: Monday, 19 November 2012, 16:23:33

Last modified on: Friday, 14 November 2025, 16:58:02

Long-term archiving on: Thursday, 21 February 2013, 11:36:26

Dates and versions

lirmm-00753807 , version 1 (19-11-2012)

License

HAL authorization

Identifiers

HAL Id : lirmm-00753807 , version 1
DOI : 10.1109/ICTAI.2012.149

Cite

Chongsheng Zhang, Florent Masseglia, Xiangliang Zhang. Discovering Highly Informative Feature Set Over High Dimensions. ICTAI: International Conference on Tools with Artificial Intelligence, Nov 2012, Athens, Greece. pp.1059-1064, ⟨10.1109/ICTAI.2012.149⟩. ⟨lirmm-00753807⟩
