Bias and benefit induced by intra-species homologies in guilt by association methods to predict protein function
Abstract
The guilt by association (GBA) principle underlies several supervised and unsupervised methods that functionally annotate uncharacterized genes from transcriptomic data or other information sources. However, these methods do not distinguish between genes that have intra-species homologues and genes that do not. We show here that functional annotation and intra-species homology are strongly dependent. We emphasize that applying a GBA method that does not account for this form of homology has two opposite effects: it overestimates the method's performance on genes with no intra-species homologues, and it loses the benefit of homology on the other genes. Both the bias and the benefit are measured on Plasmodium falciparum and yeast, and a general scheme for properly applying the GBA principle is given. Altogether, this scheme improves on previous standard applications of the GBA principle.