Zenith: Scientific Data Management on a Large Scale
Abstract
Modern scientific disciplines such as environmental science and astronomy must deal with overwhelming amounts of experimental data. Such data must be processed (cleaned, transformed, analyzed) in many different ways in order to draw new conclusions and test scientific theories. Despite their differences, scientific data in all disciplines share certain features: they are massive in scale; they are manipulated through large, distributed workflows; they are complex, with uncertainty in the data values, e.g., to reflect data capture or observation; they carry important metadata about experiments and their provenance; and they are mostly append-only (with rare updates). Furthermore, modern scientific research is highly collaborative, involving scientists from different disciplines (e.g., biologists, soil scientists, and geologists working on an environmental project), in some cases from different organizations in different countries. Since each discipline or organization tends to produce and manage its own data in specific formats, with its own processes, integrating distributed data and processes becomes increasingly difficult as the amounts of heterogeneous data grow. To address these challenges, in 2011 we started Zenith (http://www-sop.inria.fr/teams/zenith/), a joint team between INRIA and University Montpellier 2. Zenith is located at LIRMM in Montpellier, a city that enjoys a very strong position in environmental science, with major labs and groups working on related topics such as agronomy, biodiversity, water hazard, land dynamics and biology. We are developing our solutions in close collaboration with scientific application partners such as CIRAD and INRA in agronomy.