Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Rahul Bera; Konstantinos Kanellopoulos; Shankar Balachandran; David Novo; Ataberk Olgun; Mohammad Sadrosadati; Onur Mutlu

doi:10.1109/MICRO56248.2022.00015

Conference paper, Year: 2022

Rahul Bera

Role: Author

Department of Information Technology and Electrical Engineering [Zürich]

Konstantinos Kanellopoulos

Role: Author

Department of Information Technology and Electrical Engineering [Zürich]

Shankar Balachandran

Role: Author

Intel Research Laboratory

David Novo

Role: Author
PersonId : 170933
IdHAL : david-novo
ORCID : 0000-0002-5510-4152
IdRef : 244276455

ADAptive Computing

Ataberk Olgun

Role: Author

Department of Information Technology and Electrical Engineering [Zürich]

Mohammad Sadrosadati

Role: Author

Department of Information Technology and Electrical Engineering [Zürich]

Onur Mutlu

Role: Author

Department of Information Technology and Electrical Engineering [Zürich]

Abstract

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.
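The prediction step described in the abstract can be illustrated with a small hashed-perceptron sketch. This is not the authors' implementation: the three features, the table size, the activation and training thresholds, the weight range, and all names (`OffChipPredictor`, `record_load`, `demo`) are assumptions chosen for illustration. Each feature indexes its own weight table, the selected weights are summed, and the sum is compared with an activation threshold; weights are adjusted once the true outcome of the load is known.

```python
from collections import deque


class OffChipPredictor:
    """Toy hashed-perceptron predictor; every size and threshold is illustrative."""

    def __init__(self, table_size=1024, activation=2, train_margin=8,
                 weight_min=-16, weight_max=15, history_len=4):
        self.table_size = table_size
        self.activation = activation      # predict off-chip when sum >= activation
        self.train_margin = train_margin  # keep training while |sum| is below this
        self.weight_min = weight_min
        self.weight_max = weight_max
        self.pc_history = deque([0] * history_len, maxlen=history_len)
        # One weight table per feature (three hypothetical features below).
        self.tables = [[0] * table_size for _ in range(3)]

    def _indices(self, pc, addr):
        # Assumes 64-byte cache lines and 4 KiB pages.
        line_offset = (addr >> 6) & 0x3F
        pc_seq = 0
        for old_pc in self.pc_history:    # fold the recent load PCs into one value
            pc_seq = (pc_seq * 31 + old_pc) & 0xFFFFFFFF
        features = (pc, pc ^ line_offset, pc_seq)
        return [f % self.table_size for f in features]

    def predict(self, pc, addr):
        """Return (predicted_off_chip, weight_sum, table_indices)."""
        idx = self._indices(pc, addr)
        total = sum(table[i] for table, i in zip(self.tables, idx))
        return total >= self.activation, total, idx

    def train(self, idx, total, went_off_chip):
        """Update weights on a misprediction or a low-confidence prediction."""
        predicted = total >= self.activation
        if predicted != went_off_chip or abs(total) < self.train_margin:
            step = 1 if went_off_chip else -1
            for table, i in zip(self.tables, idx):
                table[i] = max(self.weight_min,
                               min(self.weight_max, table[i] + step))

    def record_load(self, pc):
        self.pc_history.append(pc)


def demo():
    """Train on one load PC that always misses and one that always hits."""
    predictor = OffChipPredictor()
    miss_pc, hit_pc = 0x401A37, 0x402B5D  # made-up program counters
    for i in range(200):
        for pc, off_chip in ((miss_pc, True), (hit_pc, False)):
            addr = 0x10000 + 64 * i
            _, total, idx = predictor.predict(pc, addr)
            predictor.train(idx, total, off_chip)
            predictor.record_load(pc)
    return (predictor.predict(miss_pc, 0x90000)[0],
            predictor.predict(hit_pc, 0x90000)[0])


if __name__ == "__main__":
    print(demo())
```

In this toy run, the always-missing PC ends up predicted off-chip and the always-hitting PC does not. In the scheme the abstract describes, a positive prediction would trigger a speculative request to the memory controller once the physical address is available; that part is not modeled here.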

Domains

Computer Science [cs]

Main file

2209.00188.pdf (1.65 MB)

Origin: Files produced by the author(s)

Contributor: Cathy Tuchming

https://hal-lirmm.ccsd.cnrs.fr/lirmm-03777161

Submitted on: Tuesday, October 17, 2023, 12:49:12

Last modified on: Thursday, November 7, 2024, 16:14:03

Dates and versions

lirmm-03777161 , version 1 (17-10-2023)

License

Attribution

Identifiers

HAL Id : lirmm-03777161 , version 1
ARXIV : 2209.00188
DOI : 10.1109/MICRO56248.2022.00015

Cite

Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, et al. Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. MICRO 2022 - 55th IEEE/ACM International Symposium on Microarchitecture, Oct 2022, Chicago, IL, United States. pp.1-18, ⟨10.1109/MICRO56248.2022.00015⟩. ⟨lirmm-03777161⟩

Collections

CNRS LIRMM ADAC UNIV-MONTPELLIER
