R. Agrawal and R. Srikant, Mining sequential patterns, Eleventh International Conference on Data Engineering, pp.3-14, 1995.

H. Akaike, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp.267-281, 1973.

R. Ash, Information Theory. Interscience Publishers, 1965.

V. Barnett and T. Lewis, Outliers in Statistical Data, 1994.

A. Bateman, E. Birney, R. Durbin, S. R. Eddy, K. L. Howe et al., The Pfam protein families database, Nucleic Acids Res, vol.28, pp.263-266, 2000.
URL : https://hal.archives-ouvertes.fr/hal-01294685

G. Bejerano and G. Yona, Modeling protein families using probabilistic suffix trees, Proceedings of the 3rd Annual International Conference on Computational Molecular Biology (RECOMB), pp.15-24, 1999.

G. Bennett, Probability inequalities for the sum of independent random variables, Journal of the American Statistical Association, vol.57, pp.33-45, 1962.

K. P. Burnham and D. R. Anderson, Model Selection and Inference: A Practical Information-Theoretic Approach, 1998.

L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation, 2001.

P. G. Ferreira and P. J. Azevedo, Chapter VI: Deterministic motif mining in protein databases, Successes and New Directions in Data Mining, 2007.

B. Grant, A. Rodrigues, K. Elsawy, J. McCammon, and L. Caves, Bio3D: An R package for the comparative analysis of protein structures, Bioinformatics, vol.22, pp.2695-2696, 2006.

D. Hawkins, Identification of Outliers, 1980.

C. M. Hurvich and C. L. Tsai, Regression and time series model selection in small samples, Biometrika, vol.76, issue.2, pp.297-307, 1989.

E. M. Knorr and R. T. Ng, Algorithms for mining distance-based outliers in large datasets, Proc. 24th Int. Conf. Very Large Data Bases, VLDB, pp.24-27, 1998.

D. Ron, Y. Singer, and N. Tishby, The power of amnesia: Learning probabilistic automata with variable memory length, Machine Learning, vol.25, pp.117-149, 1996.

C. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol.27, pp.379-423, 1948.

N. Sugiura, Further analysis of the data by Akaike's information criterion and the finite corrections, Communications in Statistics: Theory and Methods, vol.7, pp.13-26, 1978.

P. Sun, S. Chawla, and B. Arunasalam, Mining for outliers in sequential databases, SDM, 2006.

R Development Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, 2006.