Data and Machine Learning Model Management with Gypscie
Abstract
As predictive analytics using ML models (or models for short) becomes prevalent in different stages of scientific exploration, a new set of artifacts is produced during the models' life-cycle that needs to be managed [2]. In addition to the models and their evolving versions, ML life-cycle artifacts include the collected training data and pre-processing workflows, data labels and selected features, model training, tuning and monitoring statistics, and provenance information. To realize the full potential of data science, these artifacts must be built and combined, which can be very complex because there can be many to select from. Furthermore, they should be shared and reused, in particular across different execution environments such as HPC or Spark clusters. To support the complete ML life-cycle process and the artifacts it produces, we have been developing the Gypscie framework, which offers collaborating researchers a common software infrastructure to develop, share, improve and publish ML artifacts.
Origin: Files produced by the author(s)