Bias and Benefit Induced by Intra-Species Paralogy in Guilt by Association Methods to Predict Protein Function

Laurent Brehelin; Olivier Gascuel

Conference paper, Year: 2007

Laurent Brehelin

Role: Corresponding author
PersonId : 21572
IdHAL : laurent-brehelin
ORCID : 0000-0002-2582-2831
IdRef : 068625626

Méthodes et Algorithmes pour la Bioinformatique

Olivier Gascuel

Role: Author
PersonId : 938491
IdHAL : olivier-gascuel
ORCID : 0000-0002-9412-9723

Méthodes et Algorithmes pour la Bioinformatique

Abstract

Most genomes contain large numbers of orphan genes. For example, 60% of the genes of P. falciparum (the main causal agent of malaria) lack functional annotation. New approaches, commonly referred to as Guilt by Association (GBA) methods, have been proposed to functionally annotate orphan genes. These methods are based on classification (unsupervised or supervised) applied to post-genomic data. They typically use transcriptome and protein interaction data, but do not exploit paralogy. However, homologous genes from the same species (i.e. paralogues) tend to share similar function(s). Results: This article focuses on the effects of paralogy on GBA methods. We illustrate the strength of the dependence between functional annotations and paralogy, and show that applying any GBA method without accounting for this form of homology has two opposite effects: it leads us to overestimate the method's accuracy on orphan genes, and to lose the benefit of paralogy. We present and discuss a resampling algorithm to correctly estimate the performance of any GBA method, as well as a simple, general scheme that can be combined with any GBA method in order to benefit from paralogy. Both procedures are used to measure the bias and benefit on transcriptomic data from yeast and P. falciparum, two organisms that are well and poorly annotated, respectively. Our results show that both the bias and the benefit induced by paralogy may be substantial, depending on the GBA method considered, the data and the organism. Conclusions: The search for annotated paralogues should be incorporated into the design of any GBA method. Moreover, our resampling procedure should be used routinely to obtain predictions with unbiased performance estimates, which is of prime importance, e.g., to choose among contradictory predictions, or to select target genes for wet-lab experiments.
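The bias described in the abstract can be illustrated with a toy example. The sketch below is not the authors' resampling algorithm, which the abstract does not specify. It assumes a 1-nearest-neighbour GBA predictor on expression profiles and compares a naive leave-one-out estimate with one in which the paralogues of the test gene are hidden, so that the test gene is treated like an orphan. All gene names, profiles, labels and family assignments are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def loo_accuracy(profiles, labels, family, mask_paralogues):
    """Leave-one-out accuracy of a 1-nearest-neighbour GBA predictor.

    If mask_paralogues is True, genes from the same family as the test
    gene are removed from the candidate neighbours (hypothetical scheme).
    """
    genes = sorted(labels)
    hits = 0
    for g in genes:
        candidates = [c for c in genes
                      if c != g
                      and not (mask_paralogues and family[c] == family[g])]
        if not candidates:
            continue  # no neighbour available: counted as a miss
        nearest = max(candidates,
                      key=lambda c: pearson(profiles[g], profiles[c]))
        hits += labels[nearest] == labels[g]
    return hits / len(genes)

# Invented toy data: two paralogue pairs (families A and B) and one singleton.
profiles = {
    "a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8],
    "b1": [4, 3, 2, 1], "b2": [8, 6, 4, 2],
    "s1": [1, 3, 2, 4],
}
labels = {"a1": "X", "a2": "X", "b1": "Y", "b2": "Y", "s1": "Y"}
family = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "s1": "S"}

naive = loo_accuracy(profiles, labels, family, mask_paralogues=False)
masked = loo_accuracy(profiles, labels, family, mask_paralogues=True)
print(f"naive estimate: {naive:.1f}")             # 0.8
print(f"paralogue-masked estimate: {masked:.1f}")  # 0.4
```

On this toy data the naive estimate is twice the masked one, because each paralogue is predicted from its own near-identical copy. An orphan gene has no annotated paralogue to lean on, so the masked figure is the more relevant one for it.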

Subject areas

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

SMPGD07.pdf (71.72 KB)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00195262

Submitted on: Monday, 10 December 2007, 13:16:28

Last modified on: Friday, 24 March 2023, 14:52:49

Long-term archiving on: Monday, 12 April 2010, 06:45:21

Dates and versions

lirmm-00195262 , version 1 (10-12-2007)

Identifiers

HAL Id : lirmm-00195262 , version 1

Cite

Laurent Brehelin, Olivier Gascuel. Bias and Benefit Induced by Intra-Species Paralogy in Guilt by Association Methods to Predict Protein Function. SMPGD'07: Statistical Methods for Post-Genomic Data, Jan 2007, Paris, France. ⟨lirmm-00195262⟩

Collections

CNRS MAB LIRMM MIPS UNIV-MONTPELLIER
