GONNA: a Gene Ontology Nearest Neighbor Approach for the Functional Prediction of P. falciparum Orphan Genes
Abstract
Plasmodium falciparum, the pathogenic agent responsible for malaria, causes close to 3 million human deaths worldwide each year. Its genome, published in 2002, remains poorly understood: of its ~5000 genes, ~3000 have a completely unknown biological function. Several reasons explain this situation, the first being a highly atypical genome composition, which renders ineffective the usual methods of functional annotation based on sequence comparison (alignment) with homologous genes in related organisms. Non-homology methods are thus needed to obtain functional clues for the orphan genes. Notably, transcriptomic analyses using DNA microarrays have been proposed. These approaches rest on the hypothesis that uncharacterized genes may share functional roles with annotated genes of similar transcriptomic profile, a principle known as "Guilt by Association" (GBA). Interestingly, this intra-species principle can also be applied to other types of available postgenomic data, such as proteomic data or protein interaction data. The GO Consortium (http://www.geneontology.org) has developed a systematic and standardized nomenclature for annotating genes in various organisms, including P. falciparum. The GO annotation is a hierarchical structure describing generalization relationships between hundreds of terms. Each gene can therefore have several associated GO terms, both because of the hierarchical structure of the ontology and because some genes are multi-functional. Here, we propose a method with low computing time that allows the GBA principle to be applied to the Gene Ontology very extensively, and to several types of postgenomic data. Notably, it allows intensive use of a "cross-validation" procedure to provide a high-quality assessment of the method's predictions for each GO term individually.
The result is an assessed functional draft of the P. falciparum orphan genes. This draft has been integrated into a database that is available on the web and can be accessed in different ways. A global view allows users to rapidly inspect the branches of the ontology that are most suitable for the GBA principle for each type of data considered (transcriptome, proteome, and protein interaction data). Moreover, users can query the database to search for the potential GO terms of a particular gene, or for all the genes that potentially belong to a given GO term.
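To make the GBA idea concrete, the following is a minimal sketch of how a nearest-neighbor prediction over GO terms, with a per-term leave-one-out assessment, might look. It is not the GONNA implementation: the similarity measure (Pearson correlation), the number of neighbors k, the voting rule, the decision threshold, the function names, and the toy genes and term labels are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def propagate(annotations, parents):
    """Expand each gene's terms with all ancestor terms in the hierarchy."""
    expanded = {}
    for gene, terms in annotations.items():
        seen, stack = set(), list(terms)
        while stack:
            term = stack.pop()
            if term not in seen:
                seen.add(term)
                stack.extend(parents.get(term, ()))
        expanded[gene] = seen
    return expanded


def knn_term_scores(gene, profiles, annotations, k):
    """Score each term as the fraction of the k most correlated annotated
    neighbors of `gene` (the gene itself excluded) that carry the term."""
    others = [g for g in annotations if g != gene]
    others.sort(key=lambda g: pearson(profiles[gene], profiles[g]), reverse=True)
    neighbors = others[:k]
    counts = {}
    for g in neighbors:
        for term in annotations[g]:
            counts[term] = counts.get(term, 0) + 1
    return {term: c / len(neighbors) for term, c in counts.items()}


def leave_one_out(profiles, annotations, term, k, threshold=0.5):
    """Assess one term: predict it for each annotated gene from the other
    annotated genes only, then return (precision, recall)."""
    tp = fp = fn = 0
    for gene in annotations:
        score = knn_term_scores(gene, profiles, annotations, k).get(term, 0.0)
        predicted = score >= threshold
        actual = term in annotations[gene]
        tp += predicted and actual
        fp += predicted and not actual
        fn += actual and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical genes, profiles, and term labels, not real GO IDs).
parents = {"term:glycolysis": ["term:metabolism"]}
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.1, 2.1, 2.9, 4.2],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [8.0, 6.5, 4.0, 2.0],
    "g6": [5.0, 4.2, 2.8, 1.1],
    "orphan": [0.9, 2.2, 3.1, 3.9],
}
annotations = propagate(
    {
        "g1": {"term:glycolysis"},
        "g2": {"term:glycolysis"},
        "g3": {"term:glycolysis"},
        "g4": {"term:invasion"},
        "g5": {"term:invasion"},
        "g6": {"term:invasion"},
    },
    parents,
)

orphan_scores = knn_term_scores("orphan", profiles, annotations, k=2)
glycolysis_quality = leave_one_out(profiles, annotations, "term:glycolysis", k=2)
print(sorted(orphan_scores.items()))
print(glycolysis_quality)
```

In this sketch, annotations are first propagated up the hierarchy so that a gene annotated with a specific term also counts for its more general ancestors. Because the leave-one-out loop reuses the same neighbor ranking for every term, assessing each term individually stays inexpensive, which is the kind of property that makes extensive per-term cross-validation practical.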