The anti-bouncing data stream model for web usage streams with intralinkings

Chongsheng Zhang (1), Florent Masseglia (2), Yves Lechevallier (3)
(2) ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
(3) AxIS - Usage-centered design, analysis and improvement of information systems
CRISAM - Inria Sophia Antipolis - Méditerranée, Inria Paris-Rocquencourt
Abstract: Web usage mining is a significant research area with applications in various fields. However, Web usage data is usually treated as a stream, owing to its high volume and arrival rate. Because of these characteristics, only a small fraction of the stream is accessible at any point in time. When the data is observed through such a limited window, it is challenging to give a reliable description of recent usage. We show that data intralinkings, i.e. cases where a usage record (event) is associated with other records (events) in the same dataset, are common in Web usage streams. Therefore, to obtain a more faithful picture of Web usage behavior, data stream models for Web usage streams should be able to process such intralinkings. We study the important consequences of these constraints and of intralinkings through the “bounce rate” problem and the clustering of usage streams. We then propose the user-centric ABS (Anti-Bouncing Stream) model, which combines the advantages of previous models while avoiding their drawbacks. First, ABS is the first data stream model able to capture the intralinkings between Web usage records. It is also the first user-centric data stream model that can associate usage records with their users in Web usage streams. Second, owing to its simple but effective management principle, the data in ABS is available for analysis at any time. Under the same resource constraints as existing models in the literature, ABS can better model recent data. Third, ABS measures the bounce rates of Web usage streams more accurately. We demonstrate its superiority through a theoretical study and experiments on two real-world data sets.
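To make the “bounce rate” problem concrete, the sketch below shows how a plain record-level sliding window can distort the measure when a user's records are intralinked. It is not the ABS model, whose management principle is not described here. Assumptions, all hypothetical: records are `(user, page)` pairs, a bounce is a visit with exactly one page view, and all of a user's records inside the window count as one visit (no session timeout). The helper names `bounce_rate` and `sliding_window` and the sample stream are illustrative only.

```python
from collections import defaultdict, deque


def bounce_rate(records):
    """Fraction of users in `records` with exactly one page view.

    Simplifying assumption: all records of a user form a single visit.
    """
    views = defaultdict(int)
    for user, _page in records:
        views[user] += 1
    if not views:
        return 0.0
    bounces = sum(1 for n in views.values() if n == 1)
    return bounces / len(views)


def sliding_window(stream, size):
    """Record-level window: keep only the last `size` records,
    regardless of which user they belong to."""
    return list(deque(stream, maxlen=size))


# Hypothetical stream of (user, page) records with interleaved users.
stream = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/docs"),
    ("u3", "/home"), ("u2", "/faq"), ("u1", "/api"),
    ("u4", "/home"),
]

print(bounce_rate(stream))                     # 0.5 (u3 and u4 bounce)
print(bounce_rate(sliding_window(stream, 3)))  # 1.0 (u1, u2 look like bounces)
```

Over the full stream only u3 and u4 bounce, but once the window has evicted the earlier records of u1 and u2, their remaining single records are indistinguishable from bounces. This is the kind of error that motivates keeping a user's intralinked records together, as the user-centric approach described above aims to do.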
Document type: Journal article

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01091868
Contributor: Florent Masseglia
Submitted on: Sunday, 7 December 2014, 12:37:40
Last modified on: Friday, 25 May 2018, 12:02:04

Citation

Chongsheng Zhang, Florent Masseglia, Yves Lechevallier. The anti-bouncing data stream model for web usage streams with intralinkings. Information Sciences, Elsevier, 2014, 278, pp. 757-772. http://www.sciencedirect.com/science/article/pii/S0020025514003806. DOI: 10.1016/j.ins.2014.03.089. HAL: lirmm-01091868.
