Trade-offs in Neural Network Compression: Quantized and Binary Models for Keyword Spotting
Abstract
Enabling smart and independent IoT devices often requires running complex Machine Learning (ML) workloads at the edge. Such systems typically operate with memories on the order of tens of kilobytes and low processing power. To fit within these constraints, model designers commonly rely on low-precision integer representations of operations, down to 1 bit, i.e., Binary Neural Networks (BNNs). In this paper, we investigate the trade-offs available to model designers between memory footprint and accuracy, and the challenges to overcome for effective use of BNNs. We show that designing BNN architectures is not a straightforward process. To address this, we propose a methodology based on design guidelines and Neural Architecture Search (NAS) to adapt traditional model architectures into BNN variants. As a case study, we apply this methodology to a ResNet-based model for a keyword spotting (KWS) application. Our results demonstrate that, unlike 8-bit quantization, direct binarization significantly degrades accuracy. However, careful architecture redesign and hyperparameter tuning bring BNN performance on par with that of their quantized counterparts.
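To make the precision regimes discussed above concrete, the following minimal sketch (not from the paper; the helper names `quantize_int8` and `binarize` are illustrative, and the single scaling factor for binarization follows XNOR-Net-style schemes as an assumption) contrasts 8-bit quantization of a weight tensor with 1-bit binarization, where only the sign of each weight survives:

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric 8-bit quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    return np.round(w / scale).astype(np.int8), scale

def binarize(w):
    """1-bit binarization: keep only the sign of each weight, plus one
    per-tensor scaling factor (mean absolute value, XNOR-Net style)."""
    alpha = np.mean(np.abs(w))
    return np.where(w >= 0, 1, -1).astype(np.int8), alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3)).astype(np.float32)

q8, scale = quantize_int8(w)
b1, alpha = binarize(w)

# Reconstruction error grows sharply from 8-bit to 1-bit representations,
# mirroring the accuracy gap the abstract describes for direct binarization.
print("int8 reconstruction MSE :", np.mean((w - q8.astype(np.float32) * scale) ** 2))
print("1-bit reconstruction MSE:", np.mean((w - b1.astype(np.float32) * alpha) ** 2))
```

Running the sketch shows a reconstruction error orders of magnitude larger at 1 bit than at 8 bits, which is why binarization typically demands architecture redesign rather than drop-in replacement.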