Conference papers

A new type of Hidden Markov Models to predict complex domain architecture in protein sequences

Raluca Uricaru (1), Laurent Brehelin (1), Eric Rivals (1, *)
* Corresponding author
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Profile Hidden Markov Models (pHMMs) represent sequence regions, called domains or motifs, that are conserved among the proteins of a family. They are routinely used either (i) to recognize the presence of a domain in a protein, and thereby test its membership in a known family, or (ii) to tag the precise position of a domain in the sequence. However, most proteins are composed of several domains, and during evolution, events such as rearrangements or duplications may create different domain architectures in proteins of the same family. Because of their intrinsically linear structure, pHMMs cannot model several distinct domains whose number and relative order vary within a family. We lack efficient tools to perform recognition and tagging in the case of complex domain architectures. Here, we propose a generalized HMM to address exactly this problem. In our solution, called the cyclic profile HMM (CpHMM), specific transitions can model the repetition of units as well as different relative orders of domains. In a CpHMM, complete domains are modeled by nested pHMMs. We provide a program for constructing CpHMMs that takes pHMMs as input, thereby allowing users to capitalize on already developed pHMMs (PFAM). We adapted the recognition and tagging algorithms to CpHMMs and tested them on both the family of PentatricoPeptide Repeat (PPR) proteins and the superfamily of saposins. Our results demonstrate that CpHMMs improve on pHMMs for the recognition and tagging of proteins with complex domain architectures, while retaining their efficiency. The architecture of PPR proteins has been manually annotated for a subfamily in Arabidopsis; however, for rice and poplar, only recognition with the PFAM PPR motif had previously been performed. Comparing our results with the Arabidopsis PPR annotations, we show that more than 88% of the motifs are precisely recognized by the CpHMM.
Moreover, we completed the recognition of PPR proteins, as well as the determination of their architectures, for both the rice and poplar proteomes.
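The abstract does not spell out the CpHMM topology or its adapted algorithms, so what follows is only a minimal Python sketch, under simplifying assumptions of our own, of the idea it describes: domain profiles nested in a larger model and joined by cyclic transitions, then decoded with the Viterbi algorithm to tag domain positions. Each domain is reduced to a match-only profile built from a hypothetical consensus string (no insert or delete states), and the function names, the 0.8 match emission probability, and the transition parameters `p_stay` and `p_back` are illustrative assumptions. This is not the authors' CpHMM program, nor a PFAM model.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
BG_LOGP = math.log(1.0 / len(ALPHABET))  # uniform background (linker) emission


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def match_logp(consensus_residue, residue, p_match=0.8):
    """Match-state emission (assumed): the consensus residue gets p_match and
    the other 19 residues share the remaining mass uniformly."""
    if residue == consensus_residue:
        return math.log(p_match)
    return math.log((1.0 - p_match) / (len(ALPHABET) - 1))


def viterbi_tag(seq, domains, p_stay=0.9, p_back=0.5):
    """Tag domains in a non-empty seq with a toy cyclic profile HMM.

    domains maps a name to a consensus string used as a match-only profile
    (simplification: no insert/delete states). State None is the background.
    Cyclic transitions leave the last state of any domain for the background
    or for the first state of any domain, so units may repeat in any order.
    Returns (name, start, end) tuples, 0-based with end exclusive.
    """
    names = list(domains)
    n_dom = len(names)
    states = [None] + [(d, k) for d in names for k in range(len(domains[d]))]
    index = {s: i for i, s in enumerate(states)}
    bg = index[None]
    first = {d: index[(d, 0)] for d in names}
    last = {d: index[(d, len(domains[d]) - 1)] for d in names}

    # preds[j] lists (i, log P(i -> j)) for every allowed transition into j
    preds = [[] for _ in states]
    preds[bg].append((bg, _log(p_stay)))
    for d in names:
        preds[bg].append((last[d], _log(p_back)))
        preds[first[d]].append((bg, _log((1 - p_stay) / n_dom)))
        for e in names:  # cyclic transition: end of domain e -> start of d
            preds[first[d]].append((last[e], _log((1 - p_back) / n_dom)))
        for k in range(1, len(domains[d])):  # linear chain inside a domain
            preds[index[(d, k)]].append((index[(d, k - 1)], 0.0))

    def emit(state, residue):
        if state is None:
            return BG_LOGP
        d, k = state
        return match_logp(domains[d][k], residue)

    # initial distribution: half on background, half split over domain starts
    init = [-math.inf] * len(states)
    init[bg] = math.log(0.5)
    for d in names:
        init[first[d]] = math.log(0.5 / n_dom)
    V = [init[j] + emit(s, seq[0]) for j, s in enumerate(states)]
    back = []
    for residue in seq[1:]:
        row, ptr = [], []
        for j, s in enumerate(states):
            best_i, score = max(((i, V[i] + lp) for i, lp in preds[j]),
                                key=lambda x: x[1])
            row.append(score + emit(s, residue))
            ptr.append(best_i)
        V = row
        back.append(ptr)

    # a parse must end in the background or on a complete domain
    j = max([bg] + list(last.values()), key=lambda i: V[i])
    path = [j]
    for ptr in reversed(back):  # traceback from the last residue
        j = ptr[j]
        path.append(j)
    path.reverse()

    hits = []
    for t, j in enumerate(path):
        if states[j] is None:
            continue
        d, k = states[j]
        if k == 0:
            start = t
        if k == len(domains[d]) - 1:
            hits.append((d, start, t + 1))
    return hits
```

With two hypothetical three-residue domains, `alpha` ("WHW") and `beta` ("CYC"), calling `viterbi_tag("GGCYCGWHWWHWGG", {"alpha": "WHW", "beta": "CYC"})` returns `[("beta", 2, 5), ("alpha", 6, 9), ("alpha", 9, 12)]`: one `beta` followed by two tandem copies of `alpha`, an order and repeat count that a single linear profile of fixed domain order could not encode. Real profiles would add insert and delete states and position-specific emissions, as PFAM pHMMs do.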
Document type: Conference papers

Contributor: Eric Rivals
Submitted on: Tuesday, December 11, 2007 - 9:43:53 AM
Last modification on: Thursday, October 4, 2018 - 10:58:05 AM
Long-term archiving on: Thursday, September 27, 2012 - 11:11:09 AM


HAL Id: lirmm-00195493, version 1



Raluca Uricaru, Laurent Brehelin, Eric Rivals. A new type of Hidden Markov Models to predict complex domain architecture in protein sequences. JOBIM: Journées Ouvertes Biologie, Informatique, Mathématiques, Jul 2007, Marseille, France. pp.97-102. ⟨lirmm-00195493⟩


