G. Salton, The SMART Retrieval System: Experiments in Automatic Document Processing, 1971.

C. Hori, Advances in Automatic Speech Summarization, Proc. EUROSPEECH2001, pp.1771-1774, 2001.

B. Lemaire, Limites de la lemmatisation pour l'extraction de significations, 9th International Conference on the Statistical Analysis of Textual Data, pp.725-732, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00385750

C. M. Tan, Y. F. Wang, and C. D. Lee, The use of bigrams to enhance text categorization, Information Processing & Management, vol.38, issue.4, pp.529-546, 2002.
DOI : 10.1016/S0306-4573(01)00045-0

D. Bourigault, LEXTER un Logiciel d'EXtraction de TERminologie Application à l'extraction des connaissances à partir de textes, Thèse en mathématiques, informatique appliquée aux sciences de l'Homme, École des Hautes Études en Sciences Sociales, 1994.

H. Schmid, Improvements in Part-of-Speech Tagging with an Application to German, Proceedings of the ACL SIGDAT-Workshop, 1995.
DOI : 10.1007/978-94-017-2390-9_2

I. Witten, E. Frank, L. Trigg, M. Hall, G. Holmes et al., Weka: Practical machine learning tools and techniques with java implementations, Proc ICONIP/ANZIIS/ANNES'99 Int. Workshop: Emerging Knowledge Engineering and Connectionist-Based Info. Systems, pp.192-196, 1999.

G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing & Management, vol.24, issue.5, pp.513-523, 1988.
DOI : 10.1016/0306-4573(88)90021-0

M. Junker and R. Hoch, Evaluating OCR and Non-OCR Text Representations for Document Classification, Proceedings ICDAR-97 Fourth International Conference on Document Analysis and Recognition, 1997.

F. Sebastiani, A Tutorial on Automated Text Categorisation, in Analia Amandi and Ricardo Zunino (eds.), Proceedings of the 1st Argentinian Symposium on Artificial Intelligence (ASAI'99), 1999.

T. Joachims, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, pp.143-151, 1997.

T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, Proceedings of ECML-98, 10th European Conference on Machine Learning, pp.137-142, 1998.
DOI : 10.1007/BFb0026683

Y. Yang and X. Liu, A Re-Examination of Text Categorization Methods, SIGIR, pp.42-49, 1999.

Y. Yang, An Evaluation of Statistical Approaches to Text Categorization, Information Retrieval, vol.1, issue.1/2, pp.69-90, 1999.
DOI : 10.1023/A:1009982220290

W. B. Cavnar and J. M. Trenkle, N-Gram-Based Text Categorization, Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp.161-175, 1994.

F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, vol.34, issue.1, pp.1-47, 2002.
DOI : 10.1145/505282.505283

M. Mansur, N. Uzzaman, and M. Khan, Analysis of Ngram based text categorization for Bangla in a newspaper corpus, Proc. of 9th International Conference on Computer and Information Technology, 2006.

B. V. Vardhan, L. P. Reddy, and A. Vinaybabu, Text categorization using trigram technique for Telugu script, Journal of Theoretical and Applied Information Technology, vol.3, pp.1-2, 2007.

D. D. Lewis and M. Ringuette, A comparison of two learning algorithms for text categorization, Third Annual Symposium on Document Analysis and Information Retrieval, pp.81-93, 1994.

S. S. Sterling, S. Argamon, and O. Frieder, The Effect of OCR Errors on Text Classification, Poster Proc. SIGIR, 2006.

U. S. Kohomban and W. S. Lee, Optimizing Classifier Performance in Word Sense Disambiguation by Redefining Sense Classes, IJCAI, Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp.1635-1640, 2007.

I. Bayoudh, N. Béchet, and M. Roche, Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm, Intelligent Information Processing, 5th IFIP International Conference on Intelligent Information Processing, pp.68-77, 2008.