Level 1 Parallel RTN-BLAS: Implementation and Efficiency Analysis

Chemseddine Chohra 1,*, Philippe Langlois 1, David Parello 1
* Corresponding author
1 DALI - Digits, Architectures et Logiciels Informatiques
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, UPVD - Université de Perpignan Via Domitia
Abstract: Modern high-performance computing (HPC) performs a huge number of floating-point operations on massively multi-threaded systems. These systems interleave operations and rely on both dynamic scheduling and non-deterministic reductions, which prevent numerical reproducibility, i.e. getting identical results from multiple runs, even on a single machine. Floating-point addition is non-associative, so results depend on the order of computation. Numerical reproducibility matters for debugging, checking the correctness of programs, and validating results. Some solutions have been proposed, such as a parallel tree scheme [1] or Demmel and Nguyen's reproducible sums [2]. Reproducibility is not equivalent to accuracy: a reproducible result may be far from the exact result. Another way to guarantee numerical reproducibility is to compute the correctly rounded value of the exact result, i.e. to extend the IEEE-754 rounding properties to longer computing sequences. When such a computation is possible, it is certainly more costly. But is that cost unacceptable in practice? We are motivated by round-to-nearest parallel BLAS. Such an RTN-BLAS can be implemented thanks to recent algorithms that compute correctly rounded sums. This work is a first step, addressing level 1 of the BLAS.
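
The order dependence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: Python's standard `math.fsum`, which is designed to return the correctly rounded sum on typical IEEE-754 platforms, stands in here for the unspecified "recent algorithms", and the helper `naive_sum` and the test data are assumptions chosen for illustration.

```python
import math


def naive_sum(values):
    """Left-to-right summation in double precision (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative data: large cancelling terms mask the small ones.
# The exact sum is 2 whatever the order.
orderings = [
    [1e16, 1.0, -1e16, 1.0],
    [1e16, -1e16, 1.0, 1.0],
    [1.0, 1.0, 1e16, -1e16],
]

# Naive summation: the result depends on the order of the operands.
naive_results = [naive_sum(o) for o in orderings]

# Correctly rounded summation (stand-in: math.fsum): the result is the
# rounding of the exact sum, so it cannot depend on the order.
rtn_results = [math.fsum(o) for o in orderings]

print("naive:", naive_results)
print("fsum :", rtn_results)
```

With these inputs the naive loop does not return the same value for every ordering, while the correctly rounded sum does, which is the sense in which correct rounding implies reproducibility.
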
Document type:
Conference paper
SCAN: Scientific Computing, Computer Arithmetic and Validated Numerics, Sep 2014, Wurzburg, Germany. 16th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, 2014, <http://www.scan2014.uni-wuerzburg.de/talks/>

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01095172
Contributor: Philippe Langlois
Submitted on: Saturday, June 27, 2015 - 13:38:32
Last modified on: Friday, June 9, 2017 - 10:41:35
Document(s) archived on: Wednesday, September 16, 2015 - 01:37:26

License


Distributed under a Creative Commons Attribution - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : lirmm-01095172, version 1

Citation

Chemseddine Chohra, Philippe Langlois, David Parello. Level 1 Parallel RTN-BLAS: Implementation and Efficiency Analysis. SCAN: Scientific Computing, Computer Arithmetic and Validated Numerics, Sep 2014, Wurzburg, Germany. 16th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, 2014, <http://www.scan2014.uni-wuerzburg.de/talks/>. <lirmm-01095172>
