Pre-Processing Time Constraints for Efficiently Mining Generalized Sequential Patterns
Abstract
In this paper, we consider the problem of discovering sequential patterns under time constraints. While sequential patterns can be seen as temporal relationships between facts recorded in the database, generalized sequential patterns give the end user more flexible handling of the transactions in the database. We propose a new, efficient algorithm, called GTC (Graph for Time Constraints), for mining such patterns in very large databases. GTC is based on the idea that handling time constraints early in the algorithm can be highly beneficial: preprocessing the data sequences reduces the overall computational cost. Our tests show that the proposed algorithm runs significantly faster than a state-of-the-art sequence mining algorithm.
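To illustrate what "preprocessing data sequences" for time constraints can look like, here is a minimal sketch. It is not the GTC algorithm itself. It assumes the usual generalized-sequential-pattern constraints: a sliding window (`window_size`), a minimum gap (`min_gap`) and a maximum gap (`max_gap`). The function names `build_time_graph` and `supports` are hypothetical. Each vertex merges the transactions whose timestamps span at most `window_size`. An edge u → v exists when v starts more than `min_gap` after u ends and ends at most `max_gap` after u starts. Once the graph is built, testing a candidate pattern reduces to a path search with no further timestamp checks.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical types: a transaction is (timestamp, items); a vertex is (start, end, items).
Transaction = Tuple[int, FrozenSet[str]]
Vertex = Tuple[int, int, FrozenSet[str]]


def build_time_graph(seq: Sequence[Transaction], window_size: int,
                     min_gap: int, max_gap: int) -> Tuple[List[Vertex], Dict[int, List[int]]]:
    """Precompute, for one data sequence, the itemsets allowed by the time constraints."""
    seq = sorted(seq, key=lambda t: t[0])
    vertices: List[Vertex] = []
    for i in range(len(seq)):
        items: Set[str] = set()
        for j in range(i, len(seq)):
            # Stop once the merged transactions would exceed the sliding window.
            if seq[j][0] - seq[i][0] > window_size:
                break
            items |= seq[j][1]
            vertices.append((seq[i][0], seq[j][0], frozenset(items)))
    # Edge a -> b when b may follow a in a pattern: gap > min_gap and total span <= max_gap.
    edges: Dict[int, List[int]] = {k: [] for k in range(len(vertices))}
    for a, (sa, ea, _) in enumerate(vertices):
        for b, (sb, eb, _) in enumerate(vertices):
            if sb - ea > min_gap and eb - sa <= max_gap:
                edges[a].append(b)
    return vertices, edges


def supports(vertices: List[Vertex], edges: Dict[int, List[int]],
             pattern: Sequence[Set[str]]) -> bool:
    """True if some path v1..vk satisfies pattern[i] being a subset of items(vi)."""
    if not pattern:
        return True

    def extend(v: int, k: int) -> bool:
        if k == len(pattern):
            return True
        return any(pattern[k] <= vertices[w][2] and extend(w, k + 1) for w in edges[v])

    return any(pattern[0] <= items and extend(v, 1)
               for v, (_, _, items) in enumerate(vertices))


if __name__ == "__main__":
    s = [(1, frozenset({"a"})), (2, frozenset({"b"})), (5, frozenset({"c"})), (20, frozenset({"d"}))]
    V, E = build_time_graph(s, window_size=1, min_gap=0, max_gap=10)
    print(supports(V, E, [{"a", "b"}, {"c"}]))  # a and b merge inside the window
    print(supports(V, E, [{"c"}, {"d"}]))       # rejected: gap exceeds max_gap
```

The graph is built once per data sequence, so the timestamp comparisons are paid up front rather than for every candidate pattern. The quadratic edge loop favors clarity over efficiency.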