Ehrlichia Ruminantium Genome Segmentations Reveal Novel Homologous Genes
Abstract
Ehrlichia ruminantium, an intracellular bacterium, causes heartwater disease in wild and domestic ruminants in sub-Saharan Africa and on some African and Caribbean islands. Up to 90% of susceptible animals affected by this fatal tick-borne disease die within 3 weeks (Totte et al., 1997). The spread of E. ruminantium and heartwater severely impacts livestock production in Africa, with important economic consequences (Mukhebi et al., 1999). Since current diagnostic tools and vaccines show limited efficiency, partly due to genotypic variation, new targets need to be discovered (Frutos et al., 2006). To this end, a sequencing program was completed to determine and annotate the protein-coding repertoire (Collin et al., 2005), and a comparative genomic analysis of 3 phenotypically different strains then investigated the genomic evolutionary mechanisms of this rickettsia (Frutos et al., 2006). Both studies pointed out the surprisingly high proportion of tandem repeats (TRs) in non-coding genomic regions, and suspected that repeat variation contributes greatly to genome adaptation in E. ruminantium. Using a novel approach, we conducted a comparative genome analysis of the 3 strains studied by Frutos et al. (2006): two Welgevonden strains (pathogenic in mice), namely Erwo, isolated in South Africa in 1988, and Erwe, which derives from Erwo after passages in 11-13 different cell environments; and a Gardel strain, Erga, isolated from a Guadeloupean goat. Whole-genome alignment tools yield lists of evolutionarily related regions in several genomes together with their alignments, but these are difficult to browse graphically. Genome browsers allow one to navigate across a genome and view its local annotations, but do not offer a detailed evolutionary view in comparison to related genomes. We developed a genome comparison tool, QOD, that combines both advantages. It segments a reference genome into potentially overlapping regions according to their local DNA similarity to other genomes.
Every segment is either specific to the reference genome or common to all compared genomes; a common segment may have a single multiple alignment or several, and in the latter case it is termed ambiguous, since one cannot predict its orthologs. QOD lets the user browse the segmentation and cross it with known annotations. We segmented the 3 strains in turn to determine their common and specific regions, and compared our results with those of Frutos et al. Both analyses yield globally the same picture of very similar genomes: 98% of the genomes are covered by unambiguous common segments. However, our results make it possible to correct or refine some annotations of Frutos et al., especially those concerning specific CDSs. For example, they predicted 7 specific CDSs in Erwo, which could represent potential targets. Yet QOD finds for each of them a strong, unique alignment with an Erwe CDS and a homolog in Erga. All these CDSs are short, and their similarities may have escaped the approach of Frutos et al. Among 59 CDSs that were predicted as absent in at least one strain, we could find a significant homology for 52 CDSs. Altogether, this suggests that QOD is sensitive enough to predict novel homologies, and that TR variation also contributes to the evolution of coding regions.
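The segment categories described above (specific, common and unambiguous, common and ambiguous) can be illustrated with a minimal Python sketch. It is not QOD's implementation: the `Segment` class, the `classify` and `unambiguous_coverage` functions, and the toy coordinates are all hypothetical, and the sketch assumes that the local alignments have already been computed and counted per compared genome. It also assumes that a segment lacking an alignment in at least one compared genome is labeled specific.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A region of the reference genome (hypothetical structure, not QOD's)."""
    start: int  # 0-based inclusive start on the reference
    end: int    # exclusive end on the reference
    # Assumed input: genome name -> number of local alignments found there.
    hits: dict = field(default_factory=dict)


def classify(segment, compared):
    """Label a segment given the list of compared genome names."""
    counts = [segment.hits.get(genome, 0) for genome in compared]
    if any(c == 0 for c in counts):
        return "specific"            # not shared by all compared genomes
    if any(c > 1 for c in counts):
        return "common-ambiguous"    # several alignments: orthologs unclear
    return "common-unambiguous"      # exactly one alignment per genome


def unambiguous_coverage(segments, compared, genome_length):
    """Fraction of the reference covered by unambiguous common segments.

    Segments may overlap, so intervals are merged before summing lengths.
    """
    intervals = sorted(
        (s.start, s.end)
        for s in segments
        if classify(s, compared) == "common-unambiguous"
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Toy data, made up for illustration (reference of length 100).
compared_genomes = ["Erwe", "Erga"]
toy_segments = [
    Segment(0, 60, {"Erwe": 1, "Erga": 1}),
    Segment(50, 90, {"Erwe": 1, "Erga": 1}),   # overlaps the previous one
    Segment(90, 95, {"Erwe": 2, "Erga": 1}),   # two alignments in Erwe
    Segment(95, 100, {"Erwe": 1}),             # no alignment in Erga
]
labels = [classify(s, compared_genomes) for s in toy_segments]
coverage = unambiguous_coverage(toy_segments, compared_genomes, 100)
```

In this toy setting, a coverage figure comparable to the 98% reported above would be obtained by running the same computation on each strain taken in turn as the reference.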