A Novel Approach for Comparative Genomics & Annotation Transfer
Abstract
With the rapid development of sequencing techniques, the situation in which a newly sequenced genome must be annotated using available genomes from closely related species should become more prevalent. However, because of the cost of genome finishing, we may have to handle incomplete or not fully assembled genomes. Undoubtedly, the need for comparative annotation will increase, but the genomics community still lacks computational solutions that are both efficient and sensitive under various conditions. Present approaches are mainly based on sequence similarity detected at the gene or protein level, and these similarities are mostly analysed independently of one another, despite the dependency implied by the genome. Hence, we propose a novel approach to genome comparison and use it to develop a system that transfers annotations between the compared genomes. Besides the sequence similarity of features, it accounts for the synteny it detects across multiple genomes. The approach is simple because it avoids solving the complex questions that make other approaches computationally hard. The underlying idea is to partition a focus genome according to its pairwise similarities with the other compared genomes. The question is formulated as a search for the intervals that are shared across all genomes under consideration and maximal in length (i.e., not extendible). If a genomic region is covered by at least one interval, it is conserved across all genomes, and the number of such intervals tells how many possibilities exist for aligning it with different regions of the other genomes. Hence, our algorithm partitions the genome into regions following two criteria: (1) being shared or unshared across all genomes, and (2) offering one or several alignment possibilities. The annotation transfer procedure crosses the focus genome's annotations with these regions and automatically derives the possible alignments for each feature.
All features falling entirely within a region that offers only one alignment possibility are declared potentially transferable, and the user may interactively select among them according to various criteria: percent identity of the alignment, feature class, etc. We implemented these procedures in an efficient and flexible tool, named QOD, equipped with a user-friendly graphical interface. Graphical and textual representations of the results make it possible both to grasp the overall genome similarity at a glance and to browse the conserved and unshared features in various ways. This enables the investigation of genome-specific genes, rearrangements, and copy number variations, for instance. Because it does not require the genome sequence to be completely assembled, our approach makes it possible to compare and pre-annotate unfinished genomes, as well as assemblies of Next Generation Sequencing data.
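The partitioning and transfer criteria described above can be illustrated with a minimal Python sketch. It is not QOD's implementation: it assumes the shared maximal intervals have already been computed and are given as half-open coordinates on the focus genome, and the names `partition_by_coverage` and `transferable_features` are hypothetical. Regions covered by zero intervals are treated as unshared, by exactly one as offering a unique alignment, and by two or more as offering several.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]      # half-open [start, end) on the focus genome
Region = Tuple[int, int, int]   # (start, end, number of covering intervals)
Feature = Tuple[str, int, int]  # (name, start, end), half-open


def partition_by_coverage(genome_length: int, intervals: List[Interval]) -> List[Region]:
    """Split [0, genome_length) at every interval endpoint and count coverage.

    Assumes every interval lies within [0, genome_length).
    """
    # Net change in coverage at each endpoint (+1 at a start, -1 at an end).
    events: Dict[int, int] = {}
    for start, end in intervals:
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    regions: List[Region] = []
    depth, pos = 0, 0
    for point in sorted(events):
        if point > pos:
            regions.append((pos, point, depth))
            pos = point
        depth += events[point]
    if pos < genome_length:
        regions.append((pos, genome_length, depth))
    return regions


def transferable_features(features: List[Feature], regions: List[Region]) -> List[str]:
    """Keep features lying entirely inside a region covered by exactly one interval."""
    return [
        name
        for name, f_start, f_end in features
        if any(depth == 1 and r_start <= f_start and f_end <= r_end
               for r_start, r_end, depth in regions)
    ]


# Toy data (invented for illustration only).
regions = partition_by_coverage(100, [(10, 50), (30, 70)])
features = [("geneA", 12, 28), ("geneB", 35, 45), ("geneC", 55, 65),
            ("geneD", 25, 35), ("geneE", 80, 90)]
print(regions)
print(transferable_features(features, regions))
```

In this toy case, `geneB` lies in a region with two alignment possibilities, `geneD` straddles a region boundary, and `geneE` lies in an unshared region, so only `geneA` and `geneC` are reported as potentially transferable.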
Main file
HGM2010_PosterPresentation_certification_EricR.pdf (1.02 MB)
Download file
qod-poster.pdf (5.39 MB)
Download file
Origin | Files produced by the author(s) |
---|