Subset Modelling: A Domain Partitioning Strategy for Data-efficient Machine-Learning - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference Papers, Year: 2023

Subset Modelling: A Domain Partitioning Strategy for Data-efficient Machine-Learning

Abstract

The success of machine learning (ML) systems depends on data availability, volume, and quality, as well as on efficient computing resources. A challenge in this context is to reduce computational costs while maintaining adequate model accuracy. This paper presents a framework to address this challenge. The idea is to identify "subdomains" within the input space and, instead of training a single global model on the full dataset, to train local models that each produce better predictions for samples from their own subdomain. We experimentally evaluate our approach on two real-world datasets. Our results indicate that subset modelling (i) improves predictive performance compared to a single global model and (ii) allows data-efficient training.
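The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how subdomains are identified or which models are trained, so everything below is an assumption made for illustration: a tiny 1-D k-means stands in for the partitioning step, ordinary least-squares lines stand in for the local and global models, and the toy data (y = |x|) and all function names (`fit_line`, `partition`, `nearest`, `predict_local`) are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1-D input)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def nearest(centroids, x):
    """Index of the centroid closest to x (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

def partition(xs, k, iters=20):
    """Tiny 1-D k-means (a stand-in partitioner, not the paper's method)."""
    lo, hi = min(xs), max(xs)
    # Deterministic start: centroids spread evenly over the data range.
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        # Keep the old centroid if a group happens to be empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return mean((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Toy data (assumed): y = |x|, which no single straight line fits well.
xs = [i / 10 for i in range(-50, 51)]
ys = [abs(x) for x in xs]

# Baseline: one global model trained on the full dataset.
g_slope, g_icpt = fit_line(xs, ys)
global_mse = mse(lambda x: g_slope * x + g_icpt, xs, ys)

# Subset modelling: find subdomains, then train one local model per subdomain.
centroids = partition(xs, k=2)
local_models = []
for j in range(len(centroids)):
    sub = [(x, y) for x, y in zip(xs, ys) if nearest(centroids, x) == j]
    local_models.append(fit_line([x for x, _ in sub], [y for _, y in sub]))

def predict_local(x):
    """Route a sample to its subdomain and apply that subdomain's model."""
    slope, icpt = local_models[nearest(centroids, x)]
    return slope * x + icpt

local_mse = mse(predict_local, xs, ys)
print(f"global MSE = {global_mse:.3f}, local MSE = {local_mse:.3f}")
```

In this sketch each local model sees only the samples of its own subdomain, roughly half of the data, which loosely mirrors the data-efficiency argument. Errors are computed on the training samples purely for illustration; this is not the evaluation protocol of the paper.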
Main file
Data_clustering_for_training_in_domain_ML_Models__Falaah_edit_ (2).pdf (341.34 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-04264125, version 1 (30-10-2023)

Identifiers

HAL Id: lirmm-04264125
DOI: 10.5753/sbbd.2023.232829

Cite

Vitor Ribeiro, Eduardo Pena, Raphael Saldanha, Reza Akbarinia, Patrick Valduriez, et al. Subset Modelling: A Domain Partitioning Strategy for Data-efficient Machine-Learning. SBBD 2023 - Simpósio Brasileiro de Banco de Dados, SBC, Sep 2023, Belo Horizonte, Brazil. pp. 318-323, ⟨10.5753/sbbd.2023.232829⟩. ⟨lirmm-04264125⟩