Preprints, Working Papers, ...

Reproducible, Accurately Rounded and Efficient BLAS

Chemseddine Chohra (1), Philippe Langlois (1), David Parello (1)
(1) DALI - Digits, Architectures et Logiciels Informatiques
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, UPVD - Université de Perpignan Via Domitia
Abstract: Numerical reproducibility failures arise in parallel computation because floating-point summation is non-associative. Massively parallel and optimized executions dynamically change the order of floating-point operations, so numerical results may differ from one run to another. We propose to ensure reproducibility by extending the IEEE-754 correct rounding property, as far as possible, to larger operation sequences. We introduce RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS), which benefits from recent accurate and efficient summation algorithms. Solutions for level 1 (asum, dot and nrm2) and level 2 (gemv) routines are presented. Their performance is compared with the Intel MKL library and with other existing reproducible algorithms. For both shared and distributed memory parallel systems, we observe an extra cost of 2× in the worst case, which is acceptable for a wide range of applications. For the Intel Xeon Phi accelerator, a larger extra cost (4× to 6×) is observed, which remains useful at least for debugging and validation steps.
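The non-associativity mentioned in the abstract, and the way correct rounding removes the dependence on operation order, can be illustrated with a minimal Python sketch. This is not the RARE-BLAS implementation: it assumes Python's standard `math.fsum` as a stand-in for an accurately rounded summation, and the helper names `naive_sum` and `distinct_results` are hypothetical, introduced only for this illustration.

```python
import itertools
import math


def naive_sum(values):
    """Left-to-right floating-point accumulation; the result depends on order."""
    total = 0.0
    for v in values:
        total += v
    return total


def distinct_results(sum_fn, values):
    """Apply sum_fn to every ordering of values and collect the distinct results."""
    return {sum_fn(list(p)) for p in itertools.permutations(values)}


# Non-associativity: the same three operands, grouped differently.
a, b, c = 1e100, -1e100, 1.0
left = (a + b) + c    # the large terms cancel first, so c survives
right = a + (b + c)   # c is absorbed by rounding in b + c

# Reordering the operands mimics what a parallel reduction may do at run time.
data = [1e100, 1.0, -1e100, 1.0]
naive_outcomes = distinct_results(naive_sum, data)
rounded_outcomes = distinct_results(math.fsum, data)

if __name__ == "__main__":
    print("(a + b) + c =", left)
    print("a + (b + c) =", right)
    print("naive sum over all orderings:", sorted(naive_outcomes))
    print("math.fsum over all orderings:", sorted(rounded_outcomes))
```

The naive accumulation yields several different values depending on the ordering, while the accurately rounded sum yields a single value for every ordering, which is the property the abstract relies on to obtain reproducibility.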
Contributor: Philippe Langlois
Submitted on : Monday, February 29, 2016 - 1:09:58 PM
Last modification on : Friday, October 22, 2021 - 3:07:35 PM
Long-term archiving on : Sunday, November 13, 2016 - 5:25:56 AM


Files produced by the author(s)


  • HAL Id : lirmm-01280324, version 1


Chemseddine Chohra, Philippe Langlois, David Parello. Reproducible, Accurately Rounded and Efficient BLAS. 2016. ⟨lirmm-01280324v1⟩


