Semi-Automatic Loading of a Database on Microbial Risk in Food Using a Domain Ontology
Abstract
A preliminary step in assessing microbial risk in food is to gather and capitalize experimental data, that is, to collect them and store them for reuse. Within the predictive modeling platform Sym'Previus (www.symprevius.org), we have designed a relational database, MicroRisk-RDB, to store experimental data on microbial risk. These data serve as required input parameters for simulation models of pathogen growth. Data capitalization, however, faces a major obstacle. The original data are scattered across heterogeneous sources (scientific papers, technical reports and sheets, PhD theses, etc.) and are expressed in heterogeneous formats (mainly tables, text and graphics) and vocabularies. Manually entering them into a database such as MicroRisk-RDB is therefore time-consuming, and methods and tools that facilitate data capitalization are needed. In the French ANR project MAP'OPT (Equilibrium Gas Composition in Modified Atmosphere Packaging and Food Quality), we are designing a semi-automatic tool, @Web-Tool, to facilitate the loading of MicroRisk-RDB. Data stored in MicroRisk-RDB are indexed with a predefined vocabulary organized as a taxonomy, MicroRisk-Onto (hereafter called the ontology). @Web-Tool, still under development, reuses this ontology to help domain experts load MicroRisk-RDB from data tables found in scientific publications, especially those available on the Web. We focus on data tables because they often synthesize the experimental results reported in a publication. The user downloads a scientific document in HTML; data tables are then automatically identified and extracted from it, and a user-friendly graphical interface based on MicroRisk-Onto helps the user extract pertinent information from selected tables and load it into MicroRisk-RDB.
@Web-Tool is designed as a generic Semantic Web tool: it can load any database, provided that an ontology describing the application domain exists.
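To make the workflow above more concrete (identify the tables in an HTML page, then interpret their cells against a taxonomy), here is a minimal Python sketch using only the standard library. It is not @Web-Tool's implementation: the `TableExtractor` class, the `ONTOLOGY` dictionary and the `match_concept` and `ancestors` functions are hypothetical, the taxonomy fragment is a toy stand-in for MicroRisk-Onto, and the sample table values are invented for illustration. Matching is reduced here to exact comparison of normalized labels and synonyms, whereas in @Web-Tool the user validates the extraction through a graphical interface.

```python
from html.parser import HTMLParser

# Hypothetical taxonomy fragment in the spirit of MicroRisk-Onto (not the real ontology):
# concept label -> (parent concept or None, list of synonyms).
ONTOLOGY = {
    "Microorganism": (None, []),
    "Listeria monocytogenes": ("Microorganism", ["L. monocytogenes"]),
    "Food": (None, []),
    "Dairy product": ("Food", []),
    "Cheese": ("Dairy product", ["cheeses"]),
}


class TableExtractor(HTMLParser):
    """Collect every <table> of an HTML page as a list of rows of cell text.

    Nested tables are not handled by this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table being parsed
        self._row = None   # cells of the current <tr>
        self._cell = None  # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace of the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def normalize(text):
    """Lower-case and collapse whitespace so labels compare loosely."""
    return " ".join(text.lower().split())


def match_concept(term):
    """Return the concept whose label or a synonym equals term, else None."""
    key = normalize(term)
    for concept, (_, synonyms) in ONTOLOGY.items():
        if key == normalize(concept) or key in (normalize(s) for s in synonyms):
            return concept
    return None


def ancestors(concept):
    """List the ancestors of concept, from its parent up to the taxonomy root."""
    chain = []
    parent = ONTOLOGY[concept][0]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent][0]
    return chain


# Made-up sample page, not data from any publication.
sample_html = """<p>Results</p>
<table>
  <tr><th>Food</th><th>Germ</th><th>Growth rate (1/h)</th></tr>
  <tr><td>Cheeses</td><td>L. monocytogenes</td><td>0.12</td></tr>
</table>"""

parser = TableExtractor()
parser.feed(sample_html)
parser.close()
for row in parser.tables[0][1:]:  # skip the header row
    for cell in row:
        concept = match_concept(cell)
        if concept is not None:
            print(cell, "->", concept, ancestors(concept))
```

Cells that match no concept, such as the numeric value, yield `None`; in a semi-automatic setting these are exactly the cases left for the domain expert to interpret. Because the ontology is passed around as plain data, swapping in another taxonomy would reuse the same code, which mirrors the genericity claimed for @Web-Tool.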