Conference Paper, 2022

Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem

Paul Delestrac
Lionel Torres
David Novo

Abstract

Machine Learning (ML) frameworks are tools that facilitate the development and deployment of ML models. Thanks to their high programming abstraction, these tools are major catalysts of the recent explosion in ML models and hardware accelerators. However, such abstraction also obfuscates the run-time execution of the model and complicates the understanding and identification of performance bottlenecks. In this paper, we demystify how a modern ML framework manages code execution from a high-level programming language. We focus our work on TensorFlow eager execution, which remains obscure to many users despite being the simplest mode of execution in TensorFlow. We describe in detail the process followed by the runtime to run code on a CPU-GPU tandem. We propose new metrics to analyze the framework's runtime performance overhead. We use our metrics to conduct an in-depth analysis of the inference process of two Convolutional Neural Networks (CNNs) (LeNet-5 and ResNet-50) and a transformer (BERT) for different batch sizes. Our results show that GPU kernel executions need to be long enough to exploit thread parallelism and effectively hide the runtime overhead of the ML framework.
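The overhead-hiding effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual metrics or measurements: it assumes a hypothetical fixed per-op framework dispatch cost and a kernel time that grows linearly with batch size, and shows how the fraction of time spent in framework overhead shrinks as kernels get longer.

```python
# Hedged sketch (hypothetical cost model, not the paper's measurements):
# each inference op pays a fixed framework dispatch overhead, plus a GPU
# kernel time proportional to the batch size.
DISPATCH_OVERHEAD_S = 1e-4        # assumed per-op runtime overhead (hypothetical)
KERNEL_TIME_PER_SAMPLE_S = 2e-5   # assumed per-sample kernel time (hypothetical)

def overhead_fraction(num_ops: int, batch_size: int) -> float:
    """Fraction of total inference time spent in framework overhead."""
    overhead = num_ops * DISPATCH_OVERHEAD_S
    kernel = num_ops * batch_size * KERNEL_TIME_PER_SAMPLE_S
    return overhead / (overhead + kernel)

# Larger batches make each kernel run longer, so the fixed dispatch cost
# is better hidden behind useful GPU work.
for bs in (1, 8, 64):
    print(f"batch={bs:3d}  overhead fraction={overhead_fraction(100, bs):.2f}")
```

Under these assumed constants, the overhead fraction drops sharply with batch size, mirroring the paper's qualitative conclusion that short kernels leave the framework's runtime overhead exposed.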

Dates and versions

lirmm-03775613 , version 1 (13-09-2022)
lirmm-03775613 , version 2 (28-10-2022)

Cite

Paul Delestrac, Lionel Torres, David Novo. Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem. DSD 2022 - 25th Euromicro Conference on Digital System Design, Aug 2022, Maspalomas, Spain. pp.446-455, ⟨10.1109/DSD57027.2022.00066⟩. ⟨lirmm-03775613v2⟩