Numerical Reproducibility for the Parallel Reduction on Multi- and Many-Core Architectures
Journal article, Parallel Computing, 2015


Caroline Collange (1), David Defour (2), Stef Graillat (3), Roman Iakymchuk (3)

Abstract

On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and, therefore, non-reproducible mainly due to the non-associativity of floating-point operations. We introduce an approach to compute the correctly rounded sums of large floating-point vectors accurately and efficiently, achieving deterministic results by construction. Our multi-level algorithm consists of two main stages: first, a filtering stage that relies on fast vectorized floating-point expansion; second, an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, Intel Xeon Phi co-processors, and both AMD and NVIDIA GPUs. We show that numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
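The two stages described in the abstract can be illustrated with a toy model; the sketch below is not the paper's implementation, and all names (`two_sum`, `reproducible_sum`, the `SCALE` constant) are illustrative assumptions. The filtering stage builds on an error-free transformation such as Knuth's TwoSum; the superaccumulator stage is mimicked here with an arbitrary-precision integer accumulator, whose associative addition makes the result independent of summation order.

```python
import random
from fractions import Fraction

def two_sum(a: float, b: float):
    """Knuth's error-free transformation: returns (s, e) with
    s = fl(a + b) and s + e == a + b exactly.  Transformations of
    this kind are the building block of floating-point expansions."""
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e

def reproducible_sum(values):
    """Toy stand-in for a superaccumulator (illustrative, not the
    paper's high-radix carry-save design): every finite double is a
    dyadic rational, so scaling by 2**1074 turns it into an exact
    integer.  Integer addition is associative, hence any accumulation
    order yields the same, correctly rounded result."""
    SCALE = 1 << 1074          # smallest positive subnormal is 2**-1074
    acc = 0                    # arbitrary-precision integer accumulator
    for x in values:
        acc += int(Fraction(x) * SCALE)   # exact conversion, no rounding
    return float(Fraction(acc, SCALE))    # single correctly rounded step

# A naive left-to-right float sum of these values loses the 1.0 entirely;
# the exact accumulator recovers it regardless of ordering.
vals = [1e16, 1.0, -1e16, 2.0 ** -30]
random.shuffle(vals)
print(reproducible_sum(vals))
```

A GPU or vectorized implementation would of course not use `Fraction`; the sketch only shows why a fixed-point accumulator wide enough to cover the whole double-precision exponent range gives order-independent, correctly rounded sums.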
Main file: superaccumulator.pdf (632.55 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-01206348 , version 1 (06-06-2019)

Identifiers

HAL Id: lirmm-01206348
DOI: 10.1016/j.parco.2015.09.001

Cite

Caroline Collange, David Defour, Stef Graillat, Roman Iakymchuk. Numerical Reproducibility for the Parallel Reduction on Multi- and Many-Core Architectures. Parallel Computing, 2015, 49, pp.83-97. ⟨10.1016/j.parco.2015.09.001⟩. ⟨lirmm-01206348⟩
398 views, 357 downloads
