Fast and Exact Mining of Probabilistic Data Streams

Reza Akbarinia 1 Florent Masseglia 1
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Discovering Probabilistic Frequent Itemsets (PFI) is very challenging, since algorithms designed for deterministic data are not applicable to probabilistic data. The problem is even more difficult for probabilistic data streams, where massive, frequent updates need to be taken into account while respecting data stream constraints. In this paper, we propose FEMP (Fast and Exact Mining of Probabilistic data streams), the first solution for exact PFI mining in data streams with sliding windows. FEMP allows updating the frequentness probability of an itemset whenever a transaction is added to or removed from the observation window. Using these update operations, we are able to extract PFI in sliding windows with very low response times. Furthermore, our method is exact, meaning that we are able to discover the exact probabilistic frequentness distribution function for any monitored itemset, at any time. We implemented FEMP and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results illustrate its very good performance.
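To make the idea of updating a frequentness probability on insertion and removal concrete, here is a minimal Python sketch. It is not taken from the paper and should not be read as FEMP itself. It assumes that transactions are independent and that each transaction contains the monitored itemset with a known probability p, so the support follows a Poisson binomial distribution. Adding a transaction is one convolution step and removing one is the inverse step. The class name `SupportDistribution` and its methods are hypothetical.

```python
from collections import deque


class SupportDistribution:
    """Exact support distribution of one itemset over a sliding window.

    Illustrative assumption: transactions are independent, and each one
    contains the itemset with a known probability p.
    """

    def __init__(self):
        self.dist = [1.0]      # dist[k] = P(support == k); empty window has support 0
        self.window = deque()  # probabilities of the transactions in the window

    def add(self, p):
        """Insert a transaction that contains the itemset with probability p."""
        old = self.dist
        new = [0.0] * (len(old) + 1)
        for k, mass in enumerate(old):
            new[k] += mass * (1.0 - p)   # itemset absent from the new transaction
            new[k + 1] += mass * p       # itemset present in the new transaction
        self.dist = new
        self.window.append(p)

    def remove_oldest(self):
        """Drop the oldest transaction by inverting the update done in add()."""
        p = self.window.popleft()
        cur = self.dist
        n = len(cur) - 1                 # window size before the removal
        old = [0.0] * n
        if p <= 0.5:
            # Solve cur[k] = old[k]*(1-p) + old[k-1]*p from k = 0 upwards.
            prev = 0.0
            for k in range(n):
                prev = (cur[k] - prev * p) / (1.0 - p)
                old[k] = prev
        else:
            # Same equation solved from k = n downwards, dividing by p instead.
            nxt = 0.0
            for k in range(n, 0, -1):
                nxt = (cur[k] - nxt * (1.0 - p)) / p
                old[k - 1] = nxt
        self.dist = old

    def frequentness(self, minsup):
        """Return P(support >= minsup) for the current window."""
        return sum(self.dist[minsup:])
```

Each update costs time linear in the window size. Choosing the recursion direction from p keeps the divisor at least 0.5, which limits the growth of rounding error over many removals; a long-running stream would still want periodic recomputation from scratch.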
Document type:
Conference paper
ECML-PKDD: Machine Learning and Knowledge Discovery in Databases, Sep 2013, Prague, Czech Republic. Springer, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, LNCS (8188), pp.493-508, 2013, 〈http://www.ecmlpkdd2013.org/〉. 〈10.1007/978-3-642-40988-2_32〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00838618
Contributor: Reza Akbarinia
Submitted on: Wednesday, 26 June 2013 - 10:14:23
Last modified on: Wednesday, 21 November 2018 - 20:41:01

Citation

Reza Akbarinia, Florent Masseglia. Fast and Exact Mining of Probabilistic Data Streams. ECML-PKDD: Machine Learning and Knowledge Discovery in Databases, Sep 2013, Prague, Czech Republic. Springer, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, LNCS (8188), pp.493-508, 2013, 〈http://www.ecmlpkdd2013.org/〉. 〈10.1007/978-3-642-40988-2_32〉. 〈lirmm-00838618〉
