A new type of Hidden Markov Models to predict complex domain architecture in protein sequences

Raluca Uricaru 1, Laurent Brehelin 1, Eric Rivals 1, *
* Corresponding author
1 MAB - Méthodes et Algorithmes pour la Bioinformatique
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Profile Hidden Markov Models (pHMMs) represent sequence regions, called domains or motifs, that are conserved among the proteins of a family. They are routinely used either (i) to recognize the presence of a domain in a protein and thereby test its membership in a known family, or (ii) to tag the precise position of a domain in the sequence. However, a majority of proteins are composed of several domains, and during evolution, events such as rearrangements or duplications may create different domain architectures in proteins of the same family. Because of their intrinsically linear structure, pHMMs cannot model several distinct domains whose number and relative order may vary within a family. Efficient tools for recognition and tagging in the case of complex domain architectures are lacking. Here, we propose a generalized HMM to solve exactly this problem. In our solution, called cyclic profile HMM (CpHMM), specific transitions model the repetition of units as well as different relative orders of domains. In a CpHMM, complete domains are modeled by nested pHMMs. We provide a program for the construction of CpHMMs that takes pHMMs as input, thereby allowing the user to capitalize on already developed pHMMs (PFAM). We adapted recognition and tagging algorithms to CpHMMs and tested them on both the family of PentatricoPeptide Repeat (PPR) proteins and the superfamily of saposins. Our results demonstrate that CpHMMs improve on pHMMs for the recognition and tagging of proteins with complex domain architectures, while retaining their efficiency. The architecture of PPR proteins has been manually annotated for a subfamily in Arabidopsis; however, for rice and poplar tree, only recognition with the PFAM PPR motif had previously been performed. Comparing our results with the annotations of Arabidopsis PPR proteins, we show that more than 88% of the motifs are precisely recognized by the CpHMM. Moreover, we completed the recognition of PPR proteins, as well as the determination of their architecture, for both the rice and poplar tree proteomes.
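The cyclic idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' program or their CpHMM construction: each domain is collapsed here into a single hypothetical state (A or B) instead of a nested pHMM, the alphabet is reduced to three symbols, and every probability is an invented value chosen for illustration. The sketch only shows how transitions back to a shared linker state let standard Viterbi decoding tag domains in any order and any number.

```python
import math

# Toy cyclic model. All names and numbers are illustrative assumptions,
# not values taken from the paper.
STATES = ["linker", "A", "B"]
START = {"linker": 0.6, "A": 0.2, "B": 0.2}
TRANS = {
    "linker": {"linker": 0.6, "A": 0.2, "B": 0.2},
    "A": {"A": 0.7, "linker": 0.3},  # return to linker closes the cycle
    "B": {"B": 0.7, "linker": 0.3},
}
EMIT = {
    "linker": {"x": 0.8, "a": 0.1, "b": 0.1},
    "A": {"a": 0.8, "x": 0.1, "b": 0.1},
    "B": {"b": 0.8, "x": 0.1, "a": 0.1},
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Only predecessors with an explicit transition into s are allowed.
            cands = [(score[p] + math.log(TRANS[p][s]), p)
                     for p in STATES if s in TRANS[p]]
            best, prev = max(cands)
            new_score[s] = best + math.log(EMIT[s][symbol])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def architecture(path):
    """Collapse a state path into (domain, start, end) runs, end exclusive."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "linker":
                segments.append((path[start], start, i))
            start = i
    return segments


print(architecture(viterbi("xxaaaxxbbbxaaa")))
```

Because a single state stands in for a whole domain, two adjacent copies of the same domain with no linker between them would be merged into one run in this toy; a model with nested pHMMs, as described in the abstract, has explicit domain entry and exit points and does not share that limitation.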
Document type:
Conference paper
C. Brun; G Didier. JOBIM: Journées Ouvertes Biologie, Informatique, Mathématiques, Jul 2007, Marseille, France. pp.97-102, 2007, 〈http://crfb.univ-mrs.fr/jobim2007/〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00195493
Contributor: Eric Rivals
Submitted on: Tuesday, 11 December 2007 - 09:43:53
Last modified on: Thursday, 24 May 2018 - 15:59:22
Document(s) archived on: Thursday, 27 September 2012 - 11:11:09

Identifiers

  • HAL Id : lirmm-00195493, version 1

Citation

Raluca Uricaru, Laurent Brehelin, Eric Rivals. A new type of Hidden Markov Models to predict complex domain architecture in protein sequences. C. Brun; G Didier. JOBIM: Journées Ouvertes Biologie, Informatique, Mathématiques, Jul 2007, Marseille, France. pp.97-102, 2007, 〈http://crfb.univ-mrs.fr/jobim2007/〉. 〈lirmm-00195493〉
