Approximate Sequential Patterns for Incomplete Databases
Abstract
Industrial databases often contain a large amount of missing information. During the knowledge discovery process, a dedicated preprocessing step is often needed to handle these incomplete data, either by deleting them or by estimating them. When the data mining task is the mining of frequent sequences, incomplete data are most of the time deleted, which leads to a significant loss of information. As a result, the extracted knowledge is less representative of the whole database. We therefore propose a method that estimates the missing values contained in incomplete records while computing the frequency of sequences.
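The abstract does not specify how missing values are estimated. The following is only a rough illustration of the two strategies it contrasts: deleting incomplete records before counting, versus estimating their missing values while counting. It rests on several assumptions that are not taken from the source. Records are sequences of single items rather than itemsets, a missing value is written `None`, and each missing position independently takes item x with x's frequency among all known values. All function names (`item_distribution`, `containment_probability`, `support_by_deletion`, `estimated_support`) are hypothetical, and this is not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical model: a record is a sequence of single items, None marks a missing value.
Record = List[Optional[str]]


def item_distribution(db: Sequence[Record]) -> Dict[str, float]:
    """Assumed estimator: P(item) = frequency of the item among all known values."""
    counts = Counter(x for rec in db for x in rec if x is not None)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def containment_probability(pattern: Sequence[str], rec: Record,
                            dist: Dict[str, float]) -> float:
    """Probability that `pattern` is a subsequence of `rec`, assuming each missing
    position is an independent draw from `dist`. Greedy leftmost matching decides
    subsequence containment, so the state j is the number of pattern items matched."""
    m = len(pattern)
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for value in rec:
        new = [0.0] * (m + 1)
        new[m] = state[m]  # pattern already fully matched, stays matched
        for j in range(m):
            if state[j] == 0.0:
                continue
            # Probability that this position equals the next unmatched pattern item.
            if value is None:
                q = dist.get(pattern[j], 0.0)
            else:
                q = float(value == pattern[j])
            new[j + 1] += state[j] * q
            new[j] += state[j] * (1.0 - q)
        state = new
    return state[m]


def support_by_deletion(pattern: Sequence[str], db: Sequence[Record]) -> float:
    """Support computed only on complete records (incomplete ones are deleted)."""
    complete = [rec for rec in db if None not in rec]
    if not complete:
        return 0.0
    hits = sum(containment_probability(pattern, rec, {}) for rec in complete)
    return hits / len(complete)


def estimated_support(pattern: Sequence[str], db: Sequence[Record]) -> float:
    """Expected support over all records, using estimated values for missing items."""
    dist = item_distribution(db)
    total = sum(containment_probability(pattern, rec, dist) for rec in db)
    return total / len(db)


if __name__ == "__main__":
    db: List[Record] = [["a", "b", "c"], ["a", None, "c"], [None, "b"], ["b", "c"]]
    print(support_by_deletion(["b", "c"], db))  # 1.0
    print(estimated_support(["b", "c"], db))    # 0.59375
```

On this toy database, deleting the two incomplete records gives the pattern ⟨b, c⟩ a support of 1.0, because it is measured on only half of the data. Keeping those records with estimated values gives 0.59375. The independence assumption keeps the computation a simple dynamic program, but a real estimator could instead exploit correlations between items or similar records.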