Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem
Abstract
Machine Learning (ML) frameworks are tools that facilitate the development and deployment of ML models. These tools are major catalysts of the recent explosion in ML models and hardware accelerators thanks to their high programming abstraction. However, such an abstraction also obfuscates the run-time execution of the model and complicates the understanding and identification of performance bottlenecks. In this paper, we demystify how a modern ML framework manages code execution from a high-level programming language. We focus our work on TensorFlow eager execution, which remains obscure to many users despite being the simplest mode of execution in TensorFlow. We describe in detail the process followed by the runtime to run code on a CPU-GPU tandem. We propose new metrics to analyze the framework's runtime performance overhead. We use our metrics to conduct an in-depth analysis of the inference process of two Convolutional Neural Networks (CNNs) (LeNet-5 and ResNet-50) and a transformer (BERT) for different batch sizes. Our results show that GPU kernel executions need to be long enough to exploit thread parallelism and to effectively hide the runtime overhead of the ML framework.
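For readers unfamiliar with the term, the following minimal sketch (assuming TensorFlow 2.x, where eager execution is enabled by default) illustrates what eager execution looks like from the user's perspective: each operation is dispatched by the runtime and executed immediately on the CPU or GPU, returning concrete values, rather than being recorded into a static graph for later execution.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is the default mode.
print(tf.executing_eagerly())  # True

# Each op is dispatched and run as soon as it is called,
# and the result is a concrete tensor, not a graph node.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)  # executed immediately on CPU or GPU
print(y.numpy())
```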