Functional annotation of polymorphisms identified by NGS approaches in P. falciparum
Abstract
Malaria is one of the most widespread parasitic infections in the world. The ongoing WHO malaria elimination program has reduced the number of cases. These encouraging results stem from public health policies and from the development of artemisinin-based therapies, approaches that are now threatened by the emergence of artemisinin-resistant parasites. The development of a resistance assay (the ring-stage survival assay, RSA [3]) and of genetic markers (the Kelch gene, K13 [1]) has made it possible to better evaluate the prevalence of artemisinin-resistant isolates in Cambodia, where Plasmodium falciparum is one of the major causative agents of malaria. The focus of this project is to identify drug resistance genes in P. falciparum from genome polymorphisms. We use a large dataset of NGS genome sequences available in the ENA database to analyse the distribution of the parasite population across the country, and we recover 167 genomes originating from four different localities in Cambodia. We describe a reliable SNP variant calling pipeline, applied to around 200 NGS genome sequences, that relies on the quantitative parameters provided in the Variant Call Format (VCF) files. SNPs were extracted and filtered after comparison with the 3D7 reference genome, using R, Perl and Artemis for the analysis. The major steps of the pipeline are:
a) the quantitative parameters provided in the VCF files were analysed to define thresholds for selecting good-quality SNPs;
b) SNPs were filtered on MQ, the mapping quality, and on DA = ΣALT / ΣDP4, the proportion of high-quality reads supporting the ALT allele (ALT being the forward and reverse ALT counts within DP4);
c) SNPs with low frequency and SNPs with uncertain ALT bases were discarded;
d) SNPs were mapped to different genome versions, and annotation information was provided for each SNP.
These SNPs were then classified into three categories: non-coding, synonymous and non-synonymous.
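Steps b) and c) can be illustrated with a minimal Python sketch. It assumes samtools/bcftools-style VCF records whose INFO column carries `MQ` and `DP4` (ref-forward, ref-reverse, alt-forward, alt-reverse high-quality read counts), computes DA as the ALT share of DP4, and treats an "uncertain ALT" as anything other than a single A, C, G or T base. The thresholds `MIN_MQ`, `MIN_DA` and `MIN_SAMPLES`, and the reading of "low frequency" as "passing in too few samples", are hypothetical placeholders, not the values defined in step a).

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical thresholds: in the pipeline they are chosen in step a) from
# the distributions of the VCF quantitative parameters.
MIN_MQ = 30.0     # minimum mapping quality (INFO/MQ)
MIN_DA = 0.8      # minimum fraction of high-quality reads supporting ALT
MIN_SAMPLES = 2   # SNPs passing in fewer samples are treated as low frequency

NUCLEOTIDES = {"A", "C", "G", "T"}


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO column such as 'DP=20;MQ=60;DP4=0,1,10,9' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flags such as INDEL map to ''
    return fields


def alt_fraction(dp4: str) -> float:
    """DA = (alt-fwd + alt-rev) / sum(DP4); DP4 = ref-fwd,ref-rev,alt-fwd,alt-rev."""
    counts = [int(x) for x in dp4.split(",")]
    total = sum(counts)
    return (counts[2] + counts[3]) / total if total else 0.0


def passes(line: str) -> bool:
    """Apply the ALT-certainty, MQ and DA filters to one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    ref, alt, info = cols[3], cols[4], parse_info(cols[7])
    # Keep single-base substitutions with one unambiguous ALT base only
    # (drops indels, multi-allelic sites, 'N' and '.').
    if len(ref) != 1 or alt not in NUCLEOTIDES:
        return False
    if "MQ" not in info or "DP4" not in info:
        return False
    return float(info["MQ"]) >= MIN_MQ and alt_fraction(info["DP4"]) >= MIN_DA


def filter_snps(vcf_by_sample: Dict[str, List[str]]) -> Set[Tuple[str, int, str]]:
    """Return (chrom, pos, alt) SNPs that pass in at least MIN_SAMPLES samples."""
    seen: Counter = Counter()
    for lines in vcf_by_sample.values():
        hits = set()
        for ln in lines:
            if ln.startswith("#") or not passes(ln):
                continue
            cols = ln.split("\t")
            hits.add((cols[0], int(cols[1]), cols[4]))
        seen.update(hits)  # each SNP counted at most once per sample
    return {snp for snp, n in seen.items() if n >= MIN_SAMPLES}
```

Counting each SNP once per sample keeps the cross-sample frequency filter independent of how many times a site appears within a single VCF file.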
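The three-way classification can be sketched for a forward-strand gene, assuming hypothetical inputs: the coding-segment coordinates (1-based, inclusive, sorted) and the spliced, upper-case coding sequence taken from the 3D7 annotation. A SNP outside every coding segment is labelled non-coding; otherwise the reference and mutated codons are translated with the standard genetic code and compared. Minus-strand genes would also require reverse-complementing the CDS and the ALT base, which this sketch omits.

```python
from typing import List, Optional, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def cds_offset(pos: int, cds_segments: List[Tuple[int, int]]) -> Optional[int]:
    """Map a 1-based genomic position to a 0-based offset in the spliced CDS.

    cds_segments: sorted (start, end) 1-based inclusive coordinates of a
    forward-strand gene's coding segments (hypothetical input format).
    """
    offset = 0
    for start, end in cds_segments:
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start + 1
    return None


def classify_snp(cds: str, offset: Optional[int], alt: str) -> str:
    """Label a SNP as non-coding, synonymous or non-synonymous."""
    if offset is None:
        return "non-coding"
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    i = offset % 3  # position of the SNP inside its codon
    alt_codon = ref_codon[:i] + alt + ref_codon[i + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


# Toy gene (not real data): segments give the CDS ATG | GCTGGATAA.
segments = [(101, 103), (201, 209)]
cds = "ATGGCTGGATAA"
print(classify_snp(cds, cds_offset(203, segments), "C"))  # synonymous (GCT -> GCC)
print(classify_snp(cds, cds_offset(205, segments), "C"))  # non-synonymous (GGA -> GCA)
print(classify_snp(cds, cds_offset(150, segments), "C"))  # non-coding
```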
We differentiate SNPs located in the coding core from those located in the sub-telomeric regions of the genome. The large number of samples improves the reliability of SNP extraction. The dataset obtained with the variant calling pipeline was compared with other published datasets and validated by the presence of known marker SNPs. Recent studies provide evidence that sub-populations of parasites are present in Cambodia [2]. We test this hypothesis using the SNP dataset extracted with the pipeline described above. Different sets of SNPs, including mutations in the Kelch gene that are associated with resistance to artemisinin derivatives, were tested to evaluate the robustness of these sub-populations. This genetic marker is found at high frequency in the Pailin region, where drug resistance was first described. We provide genetic evidence for the acquisition and transmission of artemisinin resistance in Cambodian parasite sub-populations. These results raise questions about the origin and the persistence of these sub-populations. The fragmentation of the P. falciparum population into sub-populations is important information that must be taken into account in further statistical analyses of SNP distribution. Different approaches using bioinformatics resources and SNP data will be established to identify features that provide functional annotation for proteins, pathways, isolates and sub-populations. These steps are essential to identify the parasite sub-populations that could be more prone to acquiring and transmitting drug resistance in Cambodia.
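Validation with marker SNPs and the comparison of localities can be sketched as a per-locality carrier frequency. The input format (a locality name paired with the set of kelch13 amino-acid changes called in one isolate) is hypothetical, and the marker set lists only a few kelch13 propeller mutations commonly reported in Cambodia; the markers actually used in this work may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Example kelch13 (PF3D7_1343700) propeller mutations; an assumption, not the
# study's own marker list.
K13_MARKERS = {"C580Y", "R539T", "Y493H", "I543T"}

Isolate = Tuple[str, Set[str]]  # (locality, kelch13 amino-acid changes)


def markers_detected(samples: Iterable[Isolate]) -> Set[str]:
    """Return the known markers observed in at least one isolate."""
    found: Set[str] = set()
    for _, mutations in samples:
        found |= mutations & K13_MARKERS
    return found


def marker_frequency(samples: Iterable[Isolate]) -> Dict[str, float]:
    """Fraction of isolates per locality carrying at least one K13 marker."""
    total: Dict[str, int] = defaultdict(int)
    carriers: Dict[str, int] = defaultdict(int)
    for locality, mutations in samples:
        total[locality] += 1
        if mutations & K13_MARKERS:
            carriers[locality] += 1
    return {loc: carriers[loc] / n for loc, n in total.items()}


# Toy isolates (hypothetical), not results from this study.
isolates: List[Isolate] = [
    ("Pailin", {"C580Y"}),
    ("Pailin", {"R539T"}),
    ("Pailin", set()),
    ("LocalityB", set()),
]
print(markers_detected(isolates))
print(marker_frequency(isolates))
```

A marker absent from `markers_detected` on a dataset where it is expected would flag a problem in the calling or filtering steps, while the per-locality frequencies give a first view of how resistance markers are distributed across sub-populations.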