A new type of Hidden Markov Models to predict complex domain architecture in protein sequences - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference Papers Year : 2007

A new type of Hidden Markov Models to predict complex domain architecture in protein sequences

Abstract

Profile Hidden Markov Models (pHMMs) represent sequence regions, called domains or motifs, that are conserved among the proteins of a family. They are routinely used either (i) to recognize the presence of a domain in a protein and thereby test its membership in a known family, or (ii) to tag the precise position of a domain in the sequence. However, a majority of proteins are composed of several domains, and during evolution, events such as rearrangements or duplications may create different domain architectures in proteins of the same family. Because of their intrinsically linear structure, pHMMs cannot model several distinct domains whose number and relative order may vary within a family. We lack efficient tools to perform recognition and tagging in the case of complex domain architectures. Here, we propose a generalized HMM to solve exactly this problem. In our solution, called cyclic profile HMM (CpHMM), specific transitions model the repetition of units as well as different relative orders of domains. In a CpHMM, complete domains are modeled by nested pHMMs. We provide a program for the construction of CpHMMs that takes pHMMs as input, thereby allowing the user to capitalize on already developed pHMMs (PFAM). We adapted recognition and tagging algorithms to CpHMMs and tested them on both the family of pentatricopeptide repeat (PPR) proteins and the superfamily of saposins. Our results demonstrate that CpHMMs improve on pHMMs for the recognition and tagging of proteins with complex domain architectures, while retaining their efficiency. The architecture of PPR proteins has been manually annotated for a subfamily in Arabidopsis; for rice and poplar, however, only recognition with the PFAM PPR motif had previously been performed. Comparing our results with the annotations of Arabidopsis PPR proteins, we show that more than 88% of the motifs are precisely recognized by the CpHMM. Moreover, we completed the recognition of PPR proteins, as well as the determination of their architecture, for both the rice and poplar proteomes.
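
The idea of cyclic transitions between nested domain models can be illustrated with a toy sketch. The code below is not the authors' program or the CpHMM construction from the paper: it is a minimal illustration under strong assumptions. Each "domain" is reduced to a chain of match states without insert or delete states, and the alphabet, the consensus strings in `DOMAINS`, the probabilities, and every function name are hypothetical. A background `hub` state can enter any domain chain, and the end of each chain can return to the hub or start any chain again, which allows repeated units and variable domain order. Standard Viterbi decoding then tags the sequence.

```python
import math

# Hypothetical toy data: two "domains", each a chain of match states.
DOMAINS = {"A": "ACD", "B": "EFG"}
ALPHABET = "ACDEFGK"
MATCH_P = 0.9  # assumed probability of emitting the consensus letter


def build_cyclic_model(domains):
    """Build a hub-and-chains model whose chain ends loop back (the cycle)."""
    n = len(domains)
    states = ["hub"]
    # Assumed start distribution: background, or the first state of a chain.
    init = {"hub": 0.5}
    trans = {("hub", "hub"): 0.5}
    emit = {"hub": {c: 1.0 / len(ALPHABET) for c in ALPHABET}}
    mismatch = (1.0 - MATCH_P) / (len(ALPHABET) - 1)
    for name, consensus in domains.items():
        chain = [f"{name}{i}" for i in range(len(consensus))]
        states.extend(chain)
        init[chain[0]] = 0.5 / n
        trans[("hub", chain[0])] = 0.5 / n
        for i, s in enumerate(chain):
            emit[s] = {c: MATCH_P if c == consensus[i] else mismatch
                       for c in ALPHABET}
            if i + 1 < len(chain):
                trans[(s, chain[i + 1])] = 1.0
        # Cyclic transitions: leave to background or enter any chain again.
        trans[(chain[-1], "hub")] = 0.5
        for other in domains:
            trans[(chain[-1], f"{other}0")] = 0.5 / n
    return states, init, trans, emit


def viterbi(seq, states, init, trans, emit):
    """Return the most probable state path for seq (log-space Viterbi)."""
    neg = float("-inf")
    score = {s: (math.log(init[s]) + math.log(emit[s][seq[0]])
                 if s in init else neg) for s in states}
    back = []
    for ch in seq[1:]:
        new, ptr = {}, {}
        for dst in states:
            best_src, best = None, neg
            for src in states:
                p = trans.get((src, dst), 0.0)
                if p > 0.0 and score[src] > neg:
                    cand = score[src] + math.log(p)
                    if cand > best:
                        best_src, best = src, cand
            ptr[dst] = best_src
            new[dst] = neg if best_src is None else best + math.log(emit[dst][ch])
        score = new
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def architecture(path):
    """List domain names in order, one entry per chain entered."""
    # Relies on the naming scheme above: the first state of chain X is "X0".
    return [s[:-1] for s in path if s != "hub" and s.endswith("0")]


if __name__ == "__main__":
    model = build_cyclic_model(DOMAINS)
    for seq in ("KACDACDEFGK", "EFGKACD"):
        print(seq, architecture(viterbi(seq, *model)))
```

In this sketch a single model handles both a repeated unit followed by a second domain and the reverse domain order, which a strictly linear chain of states could not do. A real profile HMM would also need insert and delete states and parameters estimated from alignments.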

Dates and versions

lirmm-00195493 , version 1 (11-12-2007)

Identifiers

  • HAL Id : lirmm-00195493 , version 1

Cite

Raluca Uricaru, Laurent Brehelin, Eric Rivals. A new type of Hidden Markov Models to predict complex domain architecture in protein sequences. JOBIM 2007 - 8es Journées Ouvertes en Biologie, Informatique et Mathématiques, Jul 2007, Marseille, France. pp.97-102. ⟨lirmm-00195493⟩