Reproducible, Accurately Rounded and Efficient BLAS

Chemseddine Chohra; Philippe Langlois; David Parello

doi:10.1007/978-3-319-58943-5_49

Conference paper, Year: 2016

Chemseddine Chohra

Role: Author
PersonId: 172814
IdHAL: chemseddine-chohra
ORCID: 0000-0003-3989-3857
IdRef: 233027009

Digits, Architectures et Logiciels Informatiques

Philippe Langlois

Role: Author
PersonId: 3635
IdHAL: philippe-langlois
IdRef: 104061731

Digits, Architectures et Logiciels Informatiques

David Parello

Role: Author
PersonId: 6914
IdHAL: david-parello
IdRef: 083867767

Digits, Architectures et Logiciels Informatiques

Abstract

Numerical reproducibility failures arise in parallel computation because floating-point summation is non-associative. Massively parallel and optimized executions dynamically change the order of floating-point operations, so numerical results may differ from one run to the next. We propose to ensure reproducibility by extending the IEEE-754 correct rounding property, as far as possible, to larger operation sequences. We introduce RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS), which benefits from recent accurate and efficient summation algorithms. Solutions for level 1 (asum, dot and nrm2) and level 2 (gemv) routines are presented. Their performance is compared with the Intel MKL library and with other existing reproducible algorithms. For both shared and distributed memory parallel systems, we observe an extra cost of 2× in the worst case, which is satisfactory for a wide range of applications. For the Intel Xeon Phi accelerator a larger extra cost (4× to 6×) is observed, which is still useful at least for debugging and validation steps.
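The link between non-associativity and reproducibility can be illustrated with a small Python sketch. It is not the RARE-BLAS implementation and does not use the paper's summation algorithms: Python's standard `math.fsum` stands in as an assumed example of an accurately rounded sum, and the names `naive_sum`, `data`, `naive_results` and `exact_results` are hypothetical. The sketch shows that a plain accumulation loop depends on the operation order, while a sum rounded once from the exact result does not.

```python
import itertools
import math


def naive_sum(values):
    """Left-to-right accumulation: every addition rounds, so order matters."""
    total = 0.0
    for v in values:
        total += v
    return total


# Floating-point addition is not associative: 1.0 is absorbed by 1e16
# in the first expression but survives in the second one.
left = (1e16 + 1.0) + -1e16
right = (1e16 + -1e16) + 1.0

# Emulate the changing operation order of a parallel run by summing
# every permutation of a small, ill-conditioned data set.
data = [1e16, 1.0, -1e16, 1.0]
naive_results = {naive_sum(p) for p in itertools.permutations(data)}

# math.fsum (illustrative stand-in) tracks exact partial sums and rounds
# only at the end, so the result is independent of the ordering.
exact_results = {math.fsum(p) for p in itertools.permutations(data)}

print("associativity check:", left, right)
print("naive sums over all orderings:", sorted(naive_results))
print("fsum over all orderings:", sorted(exact_results))
```

A result that is the correct rounding of the exact value is reproducible by construction, since the exact value does not depend on how the operations are scheduled. This is the property the abstract proposes to extend from single IEEE-754 operations to longer sequences.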

Domains

Computer Arithmetic; Distributed, Parallel, and Cluster Computing [cs.DC]; Mathematical Software [cs.MS]

Main file

REPPAR16.pdf (313.96 KB)

Origin: Files produced by the author(s)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01280324

Submitted on: Thursday, July 28, 2016, 13:22:03

Last modified on: Tuesday, January 16, 2024, 16:28:58

Dates and versions

lirmm-01280324, version 1 (29-02-2016)

lirmm-01280324, version 2 (28-07-2016)

Identifiers

HAL Id: lirmm-01280324, version 2
DOI: 10.1007/978-3-319-58943-5_49

Cite

Chemseddine Chohra, Philippe Langlois, David Parello. Reproducible, Accurately Rounded and Efficient BLAS. Euro-Par: Parallel Processing Workshops, Aug 2016, Grenoble, France. pp.609-620, ⟨10.1007/978-3-319-58943-5_49⟩. ⟨lirmm-01280324v2⟩

Collections

CNRS UNIV-PERP DALI LIRMM GENCI MIPS UNIV-MONTPELLIER
