Data integration in the agronomic domain : national and international data discovery system

Abstract : Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To deal with the Volume, Variety and Velocity of these data, candidate datasets must first be narrowed down through data discovery and then integrated using semantic web technologies. Data discovery is an approach that lets users easily search for datasets based on keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the ABiMS platform in Roscoff, the platform for Agroecosystems Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. These platforms integrate plant genomic, genetic and phenomic data, which they need to expose through data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) integration pipeline implemented on each platform. This ETL can either be developed from scratch or be built on existing tools such as KarmaWeb, Talend or OpenRefine. The pipeline is being developed at URGI and will be deployed on all IFB plant nodes. The data discovery system is based on Solr (implemented in the Transplant portal), which uses the Lucene search framework at its core for full-text indexing. Here, we present the architecture of the data discovery system and the evaluation and comparison of ETL solutions. Work funded by the IFB "Investments for the Future" infrastructure project, IFB_Plant node.
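The abstract describes the pipeline only at the level of its stages: an ETL step on each platform, followed by loading into a Solr/Lucene index that supports keyword and metadata search. The Python code below is a minimal sketch of that flow under stated assumptions, not the URGI pipeline. The two CSV exports, their column names, the shared schema (`id`, `name`, `description`, `species`, `entry_type`, `platform`) and all records are hypothetical. A toy in-memory inverted index stands in for Solr.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical raw exports: each platform uses its own column names.
URGI_CSV = """\
accession,label,desc,organism,kind
URGI_001,TaeCS_chr3B,Chromosome 3B reference sequence,Triticum aestivum,Genome assembly
URGI_002,Wheat_yield_trial,Field phenotyping of grain yield,Triticum aestivum,Phenotyping study
"""
SOUTH_GREEN_CSV = """\
id,title,summary,species,type
SG_010,Rice_GWAS,Genome-wide association panel,Oryza sativa,Genotyping study
"""
RAW_EXPORTS = {"URGI": URGI_CSV, "South Green": SOUTH_GREEN_CSV}

# Hypothetical mapping from each platform's columns onto one shared schema.
COLUMN_MAPS = {
    "URGI": {"id": "accession", "name": "label", "description": "desc",
             "species": "organism", "entry_type": "kind"},
    "South Green": {"id": "id", "name": "title", "description": "summary",
                    "species": "species", "entry_type": "type"},
}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract(raw_csv: str) -> list[dict[str, str]]:
    """Extract: parse one platform's CSV export into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(row: dict[str, str], platform: str) -> dict[str, str]:
    """Transform: rename platform-specific columns to the shared schema."""
    doc = {field: row[column].strip() for field, column in COLUMN_MAPS[platform].items()}
    doc["platform"] = platform
    return doc


def tokenize(text: str) -> list[str]:
    """Lowercase and split on any non-alphanumeric character."""
    return TOKEN_RE.findall(text.lower())


class DiscoveryIndex:
    """Load target: a toy inverted index standing in for a Solr core."""

    FULL_TEXT_FIELDS = ("name", "description", "species", "entry_type")

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, str]] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def load(self, doc: dict[str, str]) -> None:
        """Store the document and add its id to the posting list of every token."""
        self.docs[doc["id"]] = doc
        for field in self.FULL_TEXT_FIELDS:
            for token in tokenize(doc[field]):
                self.postings[token].add(doc["id"])

    def search(self, query: str, **filters: str) -> list[str]:
        """Ids of documents containing every query term and matching all metadata filters."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(d for d in hits
                      if all(self.docs[d].get(k) == v for k, v in filters.items()))


def build_index(exports: dict[str, str]) -> DiscoveryIndex:
    """Run extract -> transform -> load for every platform export."""
    index = DiscoveryIndex()
    for platform, raw_csv in exports.items():
        for row in extract(raw_csv):
            index.load(transform(row, platform))
    return index


index = build_index(RAW_EXPORTS)
print(index.search("genome"))                   # ['SG_010', 'URGI_001']
print(index.search("genome", platform="URGI"))  # ['URGI_001']
```

The per-platform column maps are where the heterogeneity is absorbed, so each platform can keep its own export format. In a real deployment, the transformed documents would instead be sent to a Solr core (for example, through its JSON update handler). Lucene would then handle analysis, ranking and faceting.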
Document type : Poster communications
Contributor : Pierre Larmande
Submitted on : Tuesday, February 16, 2016 - 10:56:06 AM
Last modification on : Friday, March 29, 2019 - 9:12:08 AM


HAL Id : lirmm-01274725, version 1


Florian Philippe, Aravind Venkatesan, Nordine El Hassouni, Cyril Pommier, Manuel Ruiz, et al. Data integration in the agronomic domain : national and international data discovery system. JOBIM: Journées Ouvertes Biologie Informatique Mathématiques, Jul 2015, Clermont-Ferrand, France. 2015. ⟨lirmm-01274725⟩


