Subset Modelling: A Domain Partitioning Strategy for Data-efficient Machine-Learning - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper, Year: 2023

Abstract

The success of machine learning (ML) systems depends on data availability, volume, and quality, and on efficient computing resources. A key challenge in this context is to reduce computational costs while maintaining adequate model accuracy. This paper presents a framework to address this challenge. The idea is to identify "subdomains" within the input space and, instead of training a single global model on the full dataset, train local models that produce better predictions for samples from their specific subdomain. We experimentally evaluate our approach on two real-world datasets. Our results indicate that subset modelling (i) improves predictive performance compared to a single global model and (ii) allows data-efficient training.
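The contrast between one global model and several local models can be illustrated with a minimal sketch. It is not the authors' framework: the abstract does not say how subdomains are identified or which learners are used, so the sketch assumes equal-width intervals over a single feature as the partitioning rule, ordinary least-squares lines as both the local and the global models, and a synthetic dataset (y = |x|). All function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def predict_line(model, x):
    a, b = model
    return a * x + b


def subdomain_of(x, lo, hi, k):
    """Index of the equal-width interval of [lo, hi] that contains x.

    Assumed partitioning rule; the paper may identify subdomains differently.
    """
    if hi == lo:
        return 0
    idx = int((x - lo) / (hi - lo) * k)
    return min(max(idx, 0), k - 1)


def fit_subset_models(xs, ys, k):
    """Split the samples into k subdomains and fit one local model per subdomain."""
    lo, hi = min(xs), max(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(subdomain_of(x, lo, hi, k), []).append((x, y))
    local = {
        i: fit_line([p[0] for p in pts], [p[1] for p in pts])
        for i, pts in groups.items()
    }
    # The global fit is kept only as a fallback for subdomains with no samples.
    return {"lo": lo, "hi": hi, "k": k, "local": local, "fallback": fit_line(xs, ys)}


def predict_subset(model, x):
    """Route x to the local model of its subdomain."""
    i = subdomain_of(x, model["lo"], model["hi"], model["k"])
    return predict_line(model["local"].get(i, model["fallback"]), x)


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


# Synthetic data (an assumption for illustration): y = |x| on [-5, 5].
xs = [i / 10 for i in range(-50, 51)]
ys = [abs(x) for x in xs]

global_model = fit_line(xs, ys)
subset_model = fit_subset_models(xs, ys, k=2)

global_mse = mse([predict_line(global_model, x) for x in xs], ys)
subset_mse = mse([predict_subset(subset_model, x) for x in xs], ys)
print(f"global MSE: {global_mse:.4f}, subset MSE: {subset_mse:.4f}")
```

On this toy dataset each local line fits its branch of |x| almost exactly, while a single global line cannot. Each local model is also fitted on roughly half of the samples, which loosely mirrors the data-efficiency argument of the abstract. Real datasets would need a data-driven way to choose the subdomains.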
Main file
Data_clustering_for_training_in_domain_ML_Models__Falaah_edit_ (2).pdf (341.34 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-04264125, version 1 (30-10-2023)

Identifiers

HAL Id: lirmm-04264125
DOI: 10.5753/sbbd.2023.232829

Cite

Vitor Ribeiro, Eduardo Pena, Raphael Saldanha, Reza Akbarinia, Patrick Valduriez, et al. Subset Modelling: A Domain Partitioning Strategy for Data-efficient Machine-Learning. SBBD 2023 - Simpósio Brasileiro de Banco de Dados, SBC, Sep 2023, Belo Horizonte, Brazil. pp. 318-323, ⟨10.5753/sbbd.2023.232829⟩. ⟨lirmm-04264125⟩