Journal articles

A simple method to predict protein binding from aligned sequences - application to MHC superfamily and beta2-microglobulin

Abstract : Motivation: The MHC superfamily (MhcSF) consists of immune system MHC class I (MHC-I) proteins, along with proteins with an MHC-I-like structure that are involved in a large variety of biological processes. Noncovalent binding of beta2-microglobulin (B2M) to MHC-I proteins is required for their surface expression and function, whereas MHC-I-like proteins may or may not interact with B2M. This study was designed to predict B2M binding (or non-binding) of newly identified MhcSF proteins, in order to decipher their function, understand the molecular recognition mechanisms, and identify deleterious mutations. IMGT standardization of MhcSF protein domains provides a unique numbering of the multiple alignment positions, and with it the conditions needed to develop such a predictive tool.

Method: We combine a simple-Bayes classifier with IMGT unique numbering. Our method involves two steps: (1) selection of discriminant binary features, each of which associates an alignment position with an amino acid group; (2) learning of the classifier by estimating the frequencies of the selected features, conditional on the B2M binding property.

Results: Our dataset contains aligned sequences of 806 allelic forms of 47 MhcSF proteins, corresponding to 9 receptor types and 4 mammalian species. 18 discriminant features are selected; they either belong to B2M contact sites or stabilize the molecular structure required for this contact. Three leave-one-out procedures are used to assess classifier performance, corresponding to B2M binding prediction for: (1) new proteins, (2) species not represented in the dataset, and (3) new receptor types. The prediction accuracy is high, i.e. 98%, 94% and 70%, respectively. Application of our classifier to lower vertebrate MHC-I proteins indicates that these proteins bind to B2M and should therefore be expressed on the cell surface by a process similar to that of mammalian MHC-I proteins. These results demonstrate the usefulness and accuracy of our (simple) approach, which should apply to other function or interaction prediction problems.

Availability: Data and the MhcSF multiple alignment are available on the IMGT website, and supplementary material is available for download.
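The second step of the method, and the leave-one-out assessment, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a naive (simple) Bayes classifier with add-one smoothing over binary features of the form (alignment position, amino acid group). The group definitions, the 0-based positions (a stand-in for IMGT unique numbering), the toy sequences and the function names are all hypothetical, and the feature selection of step (1) is replaced by a hand-picked feature list.

```python
import math

# Hypothetical amino acid groups; the abstract does not list the groups used.
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKR"),
}


def featurize(seq, features):
    """Return one 0/1 value per (position, group) feature for an aligned sequence."""
    return [1 if seq[pos] in GROUPS[group] else 0 for pos, group in features]


def train(seqs, labels, features):
    """Estimate class priors and per-class feature frequencies (add-one smoothing)."""
    counts = {True: [0] * len(features), False: [0] * len(features)}
    totals = {True: 0, False: 0}
    for seq, label in zip(seqs, labels):
        totals[label] += 1
        for j, value in enumerate(featurize(seq, features)):
            counts[label][j] += value
    n = len(seqs)
    model = {}
    for label in (True, False):
        prior = (totals[label] + 1) / (n + 2)
        freqs = [(k + 1) / (totals[label] + 2) for k in counts[label]]
        model[label] = (prior, freqs)
    return model


def predict(model, seq, features):
    """Return True (binder) if the binder class has the higher log-posterior score."""
    x = featurize(seq, features)
    scores = {}
    for label, (prior, freqs) in model.items():
        score = math.log(prior)
        for value, p in zip(x, freqs):
            score += math.log(p if value else 1.0 - p)
        scores[label] = score
    return scores[True] > scores[False]


def leave_one_group_out(seqs, labels, groups, features):
    """Hold out every sequence of one group at a time; return overall accuracy."""
    correct = 0
    for held_out in set(groups):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        model = train([seqs[i] for i in train_idx],
                      [labels[i] for i in train_idx], features)
        correct += sum(predict(model, seqs[i], features) == labels[i]
                       for i in test_idx)
    return correct / len(seqs)


# Toy data (invented): four-column alignment, True = binds B2M.
FEATURES = [(0, "hydrophobic"), (2, "charged")]
SEQS = ["AGKT", "LGRT", "VGKS", "SGAT", "TGGT", "SGLS"]
LABELS = [True, True, True, False, False, False]
PROTEINS = ["p1", "p1", "p2", "p3", "p3", "p4"]  # hypothetical protein of each allele

if __name__ == "__main__":
    model = train(SEQS, LABELS, FEATURES)
    print(predict(model, "IGKT", FEATURES))
    print(leave_one_group_out(SEQS, LABELS, PROTEINS, FEATURES))
```

Passing a species or receptor type label per sequence as `groups` would mimic the other two leave-one-out procedures. In a faithful evaluation the feature selection step would presumably be repeated inside each fold, which this sketch omits.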
Contributor : Olivier Gascuel
Submitted on : Wednesday, March 14, 2007 - 7:42:41 PM
Last modification on : Wednesday, September 15, 2021 - 11:26:03 AM

Elodie Duprat, Marie-Paule Lefranc, Olivier Gascuel. A simple method to predict protein binding from aligned sequences - application to MHC superfamily and beta2-microglobulin. Bioinformatics, Oxford University Press (OUP), 2006, 22 (4), pp.453-459. ⟨10.1093/bioinformatics/bti826⟩. ⟨lirmm-00136659⟩


