Proposals for classification methods dedicated to biological data
Abstract
The number of available genomic sequences is growing rapidly owing to the development of massive sequencing techniques. Classifying these sequences helps assess the evolutionary relationships among genes and species, so classification methods that identify sequences both accurately and quickly are needed. We developed a classification method dedicated to databases of homologous sequence families, which assigns a new sequence to a cluster using similarity measures. We used this method to implement two applications, Homologous Sequence Identification (HoSeqI) and MultiHoSeqI. More recently, we developed a chimera detection method and implemented an application, Chimeric Sequence Identification (ChiSeqI), which automates both the classification of a specific type of biological data, bacterial 16S ribosomal RNA sequences, and the detection of chimeric sequences.