Performance on SIMD architectures of auto-tuned programs for matrix multiplication
Abstract
The amount of numerical computation in scientific programs is ever increasing. In recent years, this has driven growing interest in dynamically adapting the precision of floating-point computations to balance performance and accuracy. We focus here on iterative routines. A recently introduced tool enables such precision adaptation at the iteration level: it leverages multiple-precision computations and delta-debugging techniques to produce a set of possible adaptations. The study presented in this article extends the exploration initiated by that tool. To this end, we developed a new tool that applies these adaptations to the input C program, which then makes it possible to examine the performance characteristics of the output program. Leveraging SIMD micro-architectures, we investigate the potential for improving the performance of precision-adapted iterative routines. By benchmarking against non-optimized versions, we assess the speedup achieved on a matrix multiplication program, and we show that our precision-adaptation framework yields significant speedups that vary with the accuracy threshold.
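
To make the idea of iteration-level precision adaptation concrete, here is a minimal sketch in C. It is not the output of the tools described above: the function name `matmul_mixed`, the parameter `switch_iter`, and the choice of applying the precision switch to the inner loop are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not the authors' tool output).
 * Computes C = A * B for n x n row-major matrices.
 * For each output element, the first `switch_iter` iterations of the
 * inner (k) loop accumulate in single precision; the remaining
 * iterations accumulate in double precision. */
void matmul_mixed(size_t n, const double *A, const double *B,
                  double *C, size_t switch_iter)
{
    assert(switch_iter <= n);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Low-precision phase: operands are rounded to float. */
            float acc_lo = 0.0f;
            for (size_t k = 0; k < switch_iter; k++)
                acc_lo += (float)A[i * n + k] * (float)B[k * n + j];

            /* High-precision phase: continue from the float partial sum. */
            double acc = (double)acc_lo;
            for (size_t k = switch_iter; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];

            C[i * n + j] = acc;
        }
    }
}
```

Calling `matmul_mixed(n, A, B, C, 0)` gives a plain double-precision product, while `switch_iter == n` runs every iteration in single precision. A SIMD register holds twice as many `float` lanes as `double` lanes, which is one reason lowering the precision of some iterations can pay off, provided the loop and data layout actually allow vectorization.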
| Origin | Files produced by the author(s) |
|---|---|
