Level 1 Parallel RTN-BLAS: Implementation and Efficiency Analysis

Chemseddine Chohra; Philippe Langlois; David Parello

Conference paper, Year: 2014

Chemseddine Chohra

Role: Corresponding author
PersonId : 172814
IdHAL : chemseddine-chohra
ORCID : 0000-0003-3989-3857
IdRef : 233027009

Digits, Architectures et Logiciels Informatiques

Philippe Langlois

Role: Author
PersonId : 3635
IdHAL : philippe-langlois
IdRef : 104061731

Digits, Architectures et Logiciels Informatiques

David Parello

Role: Author
PersonId : 6914
IdHAL : david-parello
IdRef : 083867767

Digits, Architectures et Logiciels Informatiques

Abstract

Modern high-performance computing (HPC) performs a huge number of floating-point operations on massively multi-threaded systems. These systems interleave operations and rely on both dynamic scheduling and non-deterministic reductions, which prevent numerical reproducibility, i.e., obtaining identical results from multiple runs, even on a single machine. Floating-point addition is non-associative, so the result depends on the order of computation. Numerical reproducibility is nevertheless important for debugging, for checking program correctness, and for validating results. Solutions have been proposed, such as the parallel tree scheme [1] or the recent reproducible sums of Demmel and Nguyen [2]. Reproducibility is not equivalent to accuracy: a reproducible result may be far from the exact result. Another way to guarantee numerical reproducibility is to compute the correctly rounded value of the exact result, i.e., to extend the IEEE-754 rounding properties to longer computation sequences. When such a computation is possible, it is certainly more costly. But is this extra cost unacceptable in practice? Our goal is a round-to-nearest parallel BLAS (RTN-BLAS). Such an RTN-BLAS can be implemented thanks to recent algorithms that compute correctly rounded sums. This work is a first step, covering level 1 of the BLAS.
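The two claims above, that a reduction's result depends on how its operations are ordered and that a correctly rounded sum removes this dependence, can be illustrated with a minimal C sketch. It assumes IEEE-754 binary64 doubles evaluated in double precision with round-to-nearest-even (e.g. x86-64 with SSE2, `FLT_EVAL_METHOD == 0`, no `-ffast-math`). Both function names are hypothetical: `chunked_sum` emulates a static parallel reduction sequentially (no threads), and `crsum` uses Shewchuk-style non-overlapping partials with the final rounding correction used by CPython's `math.fsum`. This is a swapped-in technique for illustration, not the algorithms or code of the RTN-BLAS described in this work.

```c
#include <math.h>    /* fabs, NAN */
#include <stdlib.h>  /* malloc, realloc, free */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: emulates a static parallel reduction with p workers
   (run sequentially). Each worker sums a contiguous chunk left to right with
   plain double additions; the p partial sums are then combined in worker
   order. Since floating-point addition is not associative, the result can
   change with p. */
double chunked_sum(const double *x, size_t n, size_t p) {
    double total = 0.0;
    for (size_t w = 0; w < p; w++) {
        double part = 0.0;
        for (size_t k = w * n / p; k < (w + 1) * n / p; k++)
            part += x[k];
        total += part;
    }
    return total;
}

/* Hypothetical helper: sum of n finite doubles, intended to be correctly
   rounded (round to nearest, ties to even). Maintains non-overlapping partial
   sums updated with error-free FastTwoSum steps (Shewchuk's approach, as in
   CPython's math.fsum). Assumes no intermediate overflow; returns NAN if
   memory allocation fails. */
double crsum(const double *x, size_t n) {
    size_t cap = 32, len = 0;
    double *p = malloc(cap * sizeof *p);
    if (p == NULL) return NAN;

    for (size_t k = 0; k < n; k++) {
        double v = x[k];
        size_t i = 0;
        for (size_t j = 0; j < len; j++) {
            double y = p[j];
            if (fabs(v) < fabs(y)) { double t = v; v = y; y = t; }
            double hi = v + y;
            double lo = y - (hi - v);   /* exact rounding error of v + y */
            if (lo != 0.0) p[i++] = lo; /* keep only nonzero errors */
            v = hi;
        }
        len = i;
        if (v != 0.0) {                 /* append the running total */
            if (len == cap) {
                double *q = realloc(p, 2 * cap * sizeof *p);
                if (q == NULL) { free(p); return NAN; }
                p = q;
                cap *= 2;
            }
            p[len++] = v;
        }
    }

    /* Add partials from the largest down; stop at the first inexact step. */
    double hi = 0.0, lo = 0.0;
    if (len > 0) {
        hi = p[--len];
        while (len > 0) {
            double v = hi, y = p[--len];
            hi = v + y;
            lo = y - (hi - v);
            if (lo != 0.0) break;
        }
        /* hi + lo may have looked like a tie that ties-to-even resolved
           toward hi, while the remaining partials (same sign as lo) push the
           exact sum past the midpoint: step hi by 2*lo when that is exact. */
        if (len > 0 && ((lo < 0.0 && p[len - 1] < 0.0) ||
                        (lo > 0.0 && p[len - 1] > 0.0))) {
            double y = lo * 2.0, v = hi + y;
            if (y == v - hi) hi = v;
        }
    }
    free(p);
    return hi;
}
```

For `x = {1.0, 1e16, -1e16, 1.0}`, `chunked_sum` returns 1.0 with one worker and 0.0 with two, whereas `crsum` returns 2.0, the exact sum. Because a correctly rounded result is unique, it cannot depend on the order of the inputs; the price is the extra work of maintaining the partials, which is the kind of overhead whose acceptability this work questions.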

Keywords

Floating-point arithmetic; numerical reproducibility; Round-To-Nearest BLAS; parallelism; summation algorithms

Domains

Mathematical Software [cs.MS]; Computer Arithmetic

Main file

scan2014.pdf (39.55 KB)

(2014-09-24) SCAN2014.pdf (3.26 MB)

Origin: Files produced by the author(s)

Format: Presentation
Origin: Files produced by the author(s)

Contributor: Philippe Langlois

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01095172

Submitted on: Saturday, June 27, 2015, 13:38:32

Last modified on: Friday, March 24, 2023, 14:53:00

Long-term archiving on: Wednesday, September 16, 2015, 01:37:26

Dates and versions

lirmm-01095172, version 1 (27-06-2015)

License

Attribution - NoDerivatives

Identifiers

HAL Id: lirmm-01095172, version 1

Cite

Chemseddine Chohra, Philippe Langlois, David Parello. Level 1 Parallel RTN-BLAS: Implementation and Efficiency Analysis. SCAN: Scientific Computing, Computer Arithmetic and Validated Numerics, Sep 2014, Wurzburg, Germany. ⟨lirmm-01095172⟩

Collections

CNRS UNIV-PERP DALI LIRMM MIPS UNIV-MONTPELLIER
