Sensitivity Analysis and Compression Opportunities in DNNs Using Weight Sharing
Abstract
Deep artificial neural networks (DNNs) are currently among the most intensively and widely used predictive models in machine learning. However, the computational workload involved in DNNs is typically out of reach for low-power embedded devices. The approximate computing paradigm can be exploited to reduce DNN complexity: it improves performance and energy efficiency by relaxing the need for fully accurate operations. There is a large number of implementation options leveraging many approximation techniques (e.g., pruning, quantization, weight sharing, low-rank factorization, knowledge distillation). However, to the best of our knowledge, few or no automated approaches exist to explore, select, and generate the best approximate version of a given DNN according to design objectives. The goal of this paper is to demonstrate that the design space exploration phase can enable significant network compression without noticeable accuracy loss. We demonstrate this via an example based on weight sharing and show that our direct conversion method achieves a 4.85x compression rate with 0.14% accuracy loss on ResNet18 and a 4.91x compression rate with 0.44% accuracy loss on SqueezeNet, without requiring any retraining steps.
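To illustrate the weight-sharing idea referred to in the abstract, the following is a minimal sketch of one common implementation (k-means clustering of a layer's weights), not a description of the paper's exact direct conversion method; the function name share_weights and its parameters are hypothetical. Each weight is replaced by its cluster centroid, so only a small codebook plus per-weight indices need to be stored.

```python
import numpy as np
from sklearn.cluster import KMeans

def share_weights(weights, n_clusters=16):
    """Replace each weight by the centroid of its k-means cluster.

    weights: numpy array of layer weights (any shape).
    Returns (shared_weights, codebook, assignments).
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()   # the shared weight values
    assignments = km.labels_                 # per-weight index into the codebook
    shared = codebook[assignments]           # reconstructed (approximated) weights
    return shared.reshape(weights.shape), codebook, assignments

# Usage example: with 16 shared values, each weight can be stored as a
# 4-bit index instead of a 32-bit float (plus the small codebook).
rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 64)).astype(np.float32)
shared_layer, codebook, idx = share_weights(layer, n_clusters=16)
```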