Data integration in the agronomic domain : national and international data discovery system

Abstract: Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To cope with the Volume, Variety and Velocity of these data, it is necessary first to narrow them down to candidate datasets through data discovery, and then to integrate those datasets using semantic web technologies. Data discovery is an approach that lets users search for datasets easily by keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the Roscoff ABiMS platform, the platform for Agroecosystem Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. These platforms integrate a range of plant genomic, genetic and phenomic data, which they need to expose in data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) integration pipeline implemented on each platform. This ETL can either be developed from scratch or be based on existing technologies such as KarmaWeb, Talend or Open Refine. The pipeline is being developed at URGI and will be deployed on all IFB plant node platforms. The data discovery system is based on Solr (as implemented in the Transplant portal, http://www.transplantdb.eu), which uses the Lucene search framework at its core for full-text indexing. Here, we present the architecture of the data discovery system and the evaluation and comparison of ETL solutions. This work is funded by the IFB "Investment for the Future" infrastructure project, IFB_Plant node.
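As an illustration only, the extract, transform and load steps described in the abstract can be sketched in a few lines of Python. This is not the pipeline developed at URGI: the record fields, the target schema, the node label "urgi" and the `InMemoryIndex` class are all hypothetical, and the small inverted index merely stands in for the full-text indexing that Solr and Lucene provide.

```python
import re
from collections import defaultdict

# Hypothetical source records, standing in for one platform's export.
RAW_RECORDS = [
    {"ID": "ds-001", "Name": " Wheat SNP panel ",
     "Desc": "Genotyping data for bread wheat", "Kind": "genetic"},
    {"ID": "ds-002", "Name": "Rice phenotypes",
     "Desc": "Field phenomic measurements for rice", "Kind": "phenomic"},
]


def extract(records):
    """Extraction: yield raw records one by one (here from an in-memory list)."""
    yield from records


def transform(record, node):
    """Transformation: map platform-specific keys to a shared (assumed) schema."""
    return {
        "id": f"{node}:{record['ID']}",
        "name": record["Name"].strip(),
        "description": record["Desc"].strip(),
        "data_type": record["Kind"].lower(),
        "node": node,
    }


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InMemoryIndex:
    """Toy inverted index; a stand-in for a real full-text search server."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def load(self, doc):
        """Loading: store the document and index its searchable fields."""
        self.docs[doc["id"]] = doc
        for field in ("name", "description", "data_type"):
            for token in tokenize(doc[field]):
                self.postings[token].add(doc["id"])

    def search(self, query):
        """Return the sorted ids of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(ids)


def run_pipeline(records, node, index):
    """Run extract, transform and load for one platform's records."""
    for raw in extract(records):
        index.load(transform(raw, node))
    return index


index = run_pipeline(RAW_RECORDS, "urgi", InMemoryIndex())
print(index.search("wheat"))
print(index.search("phenomic rice"))
```

In a distributed setting, each platform would run its own extract and transform steps and push documents in the shared schema to a common index, which is what makes keyword search across platforms possible.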
Document type:
Poster
JOBIM: Journées Ouvertes Biologie Informatique Mathématiques, Jul 2015, Clermont-Ferrand, France. 2015, 〈https://www6.inra.fr/jobim2015/Programme/JOBIM-Scientific-Agenda〉

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01274725
Contributor: Pierre Larmande
Submitted on: Tuesday, 16 February 2016 - 10:56:06
Last modified on: Thursday, 24 May 2018 - 15:59:21

Identifiers

  • HAL Id : lirmm-01274725, version 1

Citation

Florian Philippe, Aravind Venkatesan, Nordine El Hassouni, Cyril Pommier, Manuel Ruiz, et al. Data integration in the agronomic domain : national and international data discovery system. JOBIM: Journées Ouvertes Biologie Informatique Mathématiques, Jul 2015, Clermont-Ferrand, France. 2015, 〈https://www6.inra.fr/jobim2015/Programme/JOBIM-Scientific-Agenda〉. 〈lirmm-01274725〉
