Exploring potentials of perpendicular magnetic anisotropy STT-MRAM for cache design
Xiaolong Zhang, Yuanqing Cheng, Weisheng Zhao, Youguang Zhang, Aida Todri-Sanial


HAL Id: lirmm-01248593
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01248593
Submitted on 17 Jul 2019


Xiaolong Zhang*, Yuanqing Cheng*, Weisheng Zhao†, Youguang Zhang* and Aida Todri-Sanial‡

*School of Electronic and Information Engineering, Beihang University, Beijing, China 100191
†IEF, Université Paris-Sud / CNRS, Orsay, 91405, France
‡LIRMM – Université Montpellier 2 / CNRS, Montpellier, 34095, France

Corresponding Author’s Email: yuanqing@ieee.org

Abstract—Traditional CMOS integrated circuits suffer from rising power consumption as the technology node advances. Several emerging technologies have been proposed to address this issue. Among them, STT-MRAM is one of the most important candidates for future on-chip cache design. However, most architecture-level evaluations of STT-MRAM focus on in-plane magnetic anisotropy devices. In this paper, we evaluate the most advanced perpendicular magnetic anisotropy (PMA) STT-MRAM for on-chip cache design from the perspectives of performance, area and power consumption. The experimental results show that PMA STT-MRAM achieves higher power efficiency than SRAM as well as desirable scalability as the technology node shrinks.

I. INTRODUCTION

As the technology node continuously shrinks, CMOS-based VLSI circuits face a severe static power challenge due to worsening sub-threshold leakage. To tackle this problem, several non-volatile memory technologies, such as STT-MRAM [1], PCRAM [2] and ReRAM [3], have emerged to reduce leakage power consumption. Among them, STT-MRAM (Spin Transfer Torque MRAM) is one of the most promising candidates for on-chip cache design because of its fast access speed, near-zero leakage and virtually unlimited read/write endurance [4].

STT-MRAM stores data in a magnetic tunnel junction (MTJ). When a spin-polarized current passes through the MTJ, the magnetization direction of the free layer can be switched to be parallel or antiparallel to that of the fixed layer, resulting in different MTJ resistances. Depending on whether the MTJ is in its low- or high-resistance state, a ‘0’ or ‘1’ is recorded. Above the 90nm technology node, in-plane magnetic anisotropy (i.e., the magnetization directions of both the free layer and the fixed layer lie within the MTJ plane) dominates MTJ switching. Faber et al. proposed a compact model of in-plane STT-MRAM that captures its electrical behavior accurately [5]. Based on such compact models, many STT-MRAM-based on-chip cache organizations have been proposed to exploit its low power, non-volatility and fast read speed from the architecture perspective [6].

However, when the MTJ feature size scales down to 40nm, perpendicular magnetic anisotropy (PMA) dominates MTJ switching and leads to distinct electrical characteristics. PMA STT-MRAM has a smaller switching current and a regular shape (circular instead of elliptical), and consumes less power. Therefore, it scales well as the technology node shrinks. Unfortunately, few studies have investigated PMA STT-MRAM, especially at the architecture level, because until recently no low-level device model could accurately describe the PMA MTJ switching mechanism. In this paper, we evaluate the effectiveness of PMA STT-MRAM as on-chip cache based on the PMA STT-MRAM compact model proposed by Zhang et al. [7]. Our main contributions are as follows:

- To the best of our knowledge, this is the first work to evaluate PMA STT-MRAM effectiveness for on-chip cache design;
- Extensive simulations are performed to compare PMA STT-MRAM with SRAM technologies in terms of area, power and performance;
- The design space of PMA STT-MRAM caches is explored to guide designers in taking full advantage of this technology.

The rest of the paper is organized as follows. Section II briefly introduces PMA STT-MRAM and the 40nm PMA STT-MRAM compact model used in this paper. Section III presents the architecture-level evaluation method and several important considerations for parameter settings. Experimental results and analysis are presented in Section IV. Section V concludes the paper.

II. PMA STT-MRAM COMPACT MODEL

Instead of storing data as charge, STT-MRAM stores information in the electron's spin. The core component of STT-MRAM is the magnetic tunnel junction (MTJ). It consists of two ferromagnetic layers (usually CoFeB) and one insulating layer (usually MgO) sandwiched between them, as shown in Fig. 1. The bottom ferromagnetic layer, whose magnetization is pinned to a fixed direction, is called the reference layer, while the magnetization direction of the top layer, called the free layer, can be switched by passing a spin-polarized current through the MTJ. When the magnetization directions of the reference layer and the free layer are the same, the MTJ exhibits low resistance; otherwise, it exhibits high resistance. The information stored in the MTJ can be sensed or changed through a CMOS access transistor. The MTJ of PMA STT-MRAM is circular, whereas that of in-plane STT-MRAM is elliptical.

Our work is based on a compact model characterizing the electromagnetic behavior of sub-50nm MTJs [7]. In this regime, the enhanced thermal stability (energy barrier) E can be calculated by the following formula,

\[ E = \frac{\mu_0 M_s \times V \times H_K}{2} \tag{1} \]
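To make Eq. (1) concrete, the sketch below evaluates the energy barrier for a circular free layer and normalizes it to the thermal stability factor Δ = E/(k_B T), a standard figure of merit that the text above does not itself define. The cylinder volume assumes the circular PMA shape described earlier; the diameter, free-layer thickness, M_s and H_K values are hypothetical placeholders, not the Table I parameters.

```python
import math

MU_0 = 4e-7 * math.pi      # vacuum permeability (T*m/A)
K_B = 1.380649e-23         # Boltzmann constant (J/K)


def free_layer_volume(diameter_m: float, thickness_m: float) -> float:
    """Volume of a circular PMA free layer: pi * (d / 2)**2 * t, in m^3."""
    return math.pi * (diameter_m / 2) ** 2 * thickness_m


def energy_barrier(ms: float, volume: float, hk: float) -> float:
    """Eq. (1): E = mu0 * Ms * V * H_K / 2, with Ms and H_K in A/m (result in J)."""
    return MU_0 * ms * volume * hk / 2


def stability_factor(e_joule: float, temperature_k: float = 300.0) -> float:
    """Dimensionless thermal stability factor Delta = E / (k_B * T)."""
    return e_joule / (K_B * temperature_k)


# Hypothetical illustrative inputs (NOT the Table I values):
# 40 nm diameter, 1.3 nm free layer, Ms = 1.0e6 A/m, H_K = 2.0e5 A/m.
v = free_layer_volume(40e-9, 1.3e-9)
e = energy_barrier(ms=1.0e6, volume=v, hk=2.0e5)
print(f"E = {e:.3e} J, Delta = {stability_factor(e):.1f}")
```

Because E is proportional to V, shrinking the diameter lowers Δ quadratically unless H_K or the thickness compensates.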

The intrinsic critical switching current can be derived as,

\[ I_{c0} = \alpha \frac{\gamma e}{\mu_B g} (\mu_0 M_s) H_K V \tag{2} \]

See Table I for the meanings of the symbols in the above two formulas. According to [7], the compact model agrees well with experimental results and captures PMA MTJ electrical behavior accurately. Our subsequent exploration of PMA STT-MRAM cache design is based on this compact model.
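Substituting Eq. (1) into Eq. (2) (since μ0 M_s H_K V = 2E) gives I_c0 = 2α(γe/(μ_B g))E: the intrinsic critical current grows linearly with the energy barrier, which is the basic trade-off between retention and write current. The sketch below evaluates both forms and checks that they agree. The physical constants are CODATA values; α, g, M_s, H_K and the free-layer dimensions are hypothetical placeholders rather than the Table I parameters.

```python
import math

MU_0 = 4e-7 * math.pi            # vacuum permeability (T*m/A)
GAMMA = 1.76085963e11            # electron gyromagnetic ratio (rad s^-1 T^-1)
Q_E = 1.602176634e-19            # elementary charge (C)
MU_B = 9.2740100783e-24          # Bohr magneton (J/T)


def critical_current(alpha: float, g: float, ms: float, hk: float, volume: float) -> float:
    """Eq. (2): I_c0 = alpha * (gamma * e / (mu_B * g)) * (mu0 * Ms) * H_K * V, in A."""
    return alpha * (GAMMA * Q_E / (MU_B * g)) * (MU_0 * ms) * hk * volume


def critical_current_from_barrier(alpha: float, g: float, e_barrier: float) -> float:
    """Eq. (2) rewritten with Eq. (1), using mu0 * Ms * H_K * V = 2E."""
    return alpha * (GAMMA * Q_E / (MU_B * g)) * 2 * e_barrier


# Hypothetical inputs (NOT Table I): damping 0.027, g = 0.5,
# 40 nm x 1.3 nm circular free layer, Ms = 1.0e6 A/m, H_K = 2.0e5 A/m.
volume = math.pi * (20e-9) ** 2 * 1.3e-9
i_c0 = critical_current(alpha=0.027, g=0.5, ms=1.0e6, hk=2.0e5, volume=volume)
print(f"I_c0 = {i_c0 * 1e6:.1f} uA")
```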

III. EVALUATION METHOD & CRITICAL PARAMETER SETTINGS

In this paper, we use NVSim [8] to explore the design space of PMA STT-MRAM-based caches. NVSim is an architecture-level simulator for optimizing emerging non-volatile memory designs. It takes technology node and memory cell parameters as inputs and outputs the optimal cache/memory organization for a specified optimization goal, such as performance, area or power consumption. The important parameters used to simulate PMA STT-MRAM-based on-chip caches are derived as follows.

**MTJ resistance** When the two ferromagnetic layers have the same magnetization direction, the MTJ is in its low-resistance (parallel) state.

[Figure panels: PMA MTJ; In-plane MTJ]

The parallel-state resistance \( R_P \) can be calculated by [7],

\[ R_P = \frac{t_{ox}}{F \times \phi^{-\frac{3}{2}}} \times e^{1.025 \times t_{ox} \times \phi^{-\frac{1}{2}}} \tag{3} \]

Please refer to Table I for the symbol meanings and values used in the formula.

Following [7], we assume a tunnel magnetoresistance (TMR) ratio of 120%. The antiparallel resistance \( R_{AP} \) can then be derived from TMR and \( R_P \) as \( R_{AP} = R_P (1 + \mathrm{TMR}) \). The insulator layer thickness is assumed to be 0.85nm, as shown in Table I.
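Deriving the antiparallel resistance from the TMR ratio is a one-line calculation, sketched below together with its inverse. The 5 kΩ value of R_P is a hypothetical placeholder standing in for the result of Eq. (3); only the 120% TMR comes from the text.

```python
def antiparallel_resistance(r_p: float, tmr: float) -> float:
    """R_AP = R_P * (1 + TMR), which follows from TMR = (R_AP - R_P) / R_P."""
    return r_p * (1.0 + tmr)


def tmr_ratio(r_p: float, r_ap: float) -> float:
    """Inverse relation: recover TMR from the two resistance states."""
    return (r_ap - r_p) / r_p


R_P_OHM = 5_000.0   # hypothetical parallel resistance, e.g. from Eq. (3)
TMR = 1.20          # 120 %, as assumed in the text

r_ap = antiparallel_resistance(R_P_OHM, TMR)
print(f"R_P = {R_P_OHM:.0f} ohm, R_AP = {r_ap:.0f} ohm")
```

The gap between R_P and R_AP sets the read sensing margin, so a higher TMR directly eases sense-amplifier design.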

**NMOS transistor aspect ratio** To determine the aspect ratio of the NMOS access transistor, we construct a PMA STT-MRAM cell in Cadence Virtuoso using our compact device model. We then run circuit simulations with Cadence Spectre to find the aspect ratio that gives the NMOS transistor enough driving capability for MTJ switching. The simulation results show that W/L should be larger than 2 for fast and reliable switching. Therefore, we choose W/L = 3 in the following evaluation.
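The W/L sweep above can be approximated to first order as sketched below. The access NMOS, with its gate at V_dd, is treated as a deep-triode resistor in series with the MTJ in its high-resistance (antiparallel) state, and the smallest candidate W/L whose current meets a required switching current is selected. This square-law series-resistor model is a hypothetical stand-in, not the compact-model Spectre simulation used in the paper. k', V_T, the MTJ resistance and the current target are illustrative values, so the result printed here (W/L = 2) does not reproduce the W/L > 2 finding.

```python
from typing import Iterable, Optional


def nmos_on_resistance(w_over_l: float, k_prime: float, v_gs: float, v_t: float) -> float:
    """Deep-triode on-resistance: R_on = 1 / (k' * (W/L) * (V_GS - V_T))."""
    overdrive = v_gs - v_t
    if overdrive <= 0:
        raise ValueError("V_GS must exceed V_T")
    return 1.0 / (k_prime * w_over_l * overdrive)


def write_current(w_over_l: float, v_dd: float, r_mtj: float,
                  k_prime: float, v_t: float) -> float:
    """Current through the NMOS (gate driven to V_dd) in series with the MTJ."""
    return v_dd / (nmos_on_resistance(w_over_l, k_prime, v_dd, v_t) + r_mtj)


def min_aspect_ratio(candidates: Iterable[float], i_required: float, v_dd: float,
                     r_mtj: float, k_prime: float, v_t: float) -> Optional[float]:
    """Smallest candidate W/L whose write current meets i_required, else None."""
    for w_over_l in sorted(candidates):
        if write_current(w_over_l, v_dd, r_mtj, k_prime, v_t) >= i_required:
            return w_over_l
    return None


# Hypothetical values: V_dd = 1.2 V, MTJ in the AP state at 11 kOhm,
# k' = 200 uA/V^2, V_T = 0.4 V, required switching current 80 uA.
best = min_aspect_ratio([1, 1.5, 2, 2.5, 3, 4], 80e-6, 1.2, 11e3, 200e-6, 0.4)
print("minimum W/L:", best)
```

Sizing against the antiparallel state is the conservative choice, because its higher resistance gives the smallest current for a given transistor.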

**Write energy of a single PMA STT-MRAM cell** The supply voltage of the PMA STT-MRAM cell is set to 1.2V to ensure MTJ switching. The MTJ write power and the NMOS access power can then be calculated from the NMOS \( V_{ds} \) and the MTJ current \( I_{np} \). As a result, the write energy of a single cell is 0.51pJ at the 40nm technology node and 0.15pJ at the 32nm technology node.
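The write-energy calculation above amounts to multiplying the power drawn by the series NMOS/MTJ path by the write pulse width. The sketch assumes a constant current over a fixed pulse, neither of which the text specifies. The voltage split, current and pulse width are hypothetical, and `i_write` stands in for the paper's I_np.

```python
def cell_write_energy(v_ds: float, v_mtj: float, i_write: float, t_pulse: float) -> float:
    """Write energy of one cell (J): (NMOS power + MTJ power) * pulse width.

    Both devices carry the same series current, so the total equals
    (v_ds + v_mtj) * i_write * t_pulse, i.e. the supply voltage times the charge.
    """
    p_nmos = v_ds * i_write   # access-transistor dissipation (W)
    p_mtj = v_mtj * i_write   # MTJ dissipation (W)
    return (p_nmos + p_mtj) * t_pulse


# Hypothetical operating point: 1.2 V split as 0.3 V across the NMOS and
# 0.9 V across the MTJ, 100 uA write current, 3 ns pulse.
energy = cell_write_energy(v_ds=0.3, v_mtj=0.9, i_write=100e-6, t_pulse=3e-9)
print(f"write energy = {energy * 1e12:.2f} pJ")
```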

IV. EXPERIMENTAL RESULTS

**A. Area**

We compare the cache area of PMA STT-MRAM and SRAM for different capacities at the 32nm and 40nm technology nodes. The results are plotted in Fig.2. As shown in the figure, PMA STT-MRAM occupies much less area than SRAM with the same capacity. The reason is that an STT-MRAM cell occupies 12F^2 while an SRAM cell occupies 146F^2 (F is the feature size of the fabrication process). However, since the peripheral circuitry also occupies area, the overall SRAM-to-STT-MRAM area ratio is smaller than the cell-level ratio of 146/12 (about 12). As the cache capacity increases, the cell array accounts for a larger share of the total area, so a PMA STT-MRAM-based cache saves more area relative to SRAM. For example, at the 40nm technology node, STT-MRAM saves nearly 80% of the area of an SRAM cache, indicating the high integration density of PMA STT-MRAM.
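A first-order area model shows why the measured ratio falls short of 146/12 and why the saving grows with capacity: the cell array scales with the number of bits, while peripheral circuitry adds overhead that does not shrink with the cell. The sketch assumes, purely for illustration, that both caches have the same fixed peripheral area (0.5 mm² here, a hypothetical value). The simulator models peripheral circuitry in far more detail, and the percentages printed below are not the Fig. 2 results.

```python
def array_area_um2(capacity_bytes: int, cell_area_f2: float, feature_nm: float) -> float:
    """Cell-array area in um^2: number of bits * cell area in F^2 * F^2."""
    f_um = feature_nm / 1000.0
    return capacity_bytes * 8 * cell_area_f2 * f_um ** 2


def area_saving(capacity_bytes: int, feature_nm: float, periphery_um2: float,
                sram_f2: float = 146.0, stt_f2: float = 12.0) -> float:
    """Fraction of SRAM area saved by STT-MRAM, assuming (hypothetically)
    the same fixed peripheral area for both caches."""
    sram = array_area_um2(capacity_bytes, sram_f2, feature_nm) + periphery_um2
    stt = array_area_um2(capacity_bytes, stt_f2, feature_nm) + periphery_um2
    return 1.0 - stt / sram


MB = 1 << 20
for cap_mb in (1, 4, 16):
    s = area_saving(cap_mb * MB, feature_nm=40, periphery_um2=500_000.0)
    print(f"{cap_mb:>2} MB: {s:.1%} saved")
print(f"cell-only bound: {1 - 12 / 146:.1%}")
```

As the capacity grows, the fixed overhead is amortized and the saving approaches the cell-only bound of 1 - 12/146.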

**B. Power Consumption**

As for in-plane STT-MRAM, write energy dominates the dynamic power consumption and is usually higher than that of SRAM. In contrast, PMA STT-MRAM shows a significant advantage in terms of write power, as shown in Fig. 3. Although writing a single STT-MRAM cell consumes more power than writing a single SRAM cell, PMA STT-MRAM occupies much less area and therefore consumes much less H-Tree interconnect power. Moreover, the write latency of a single PMA STT-MRAM cell is comparable to that of SRAM. Therefore, PMA STT-MRAM shows higher overall power efficiency than SRAM and desirable scalability as the technology node shrinks.

The static power comparison is shown in Fig. 4. Thanks to its non-volatility, PMA STT-MRAM, like its in-plane counterpart, has negligible leakage power compared to SRAM. As the technology node shrinks and the capacity increases, SRAM leakage power rises dramatically relative to PMA STT-MRAM, which indicates the large power benefit of PMA STT-MRAM as on-chip cache.
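The claim that a smaller area means lower H-Tree power can be made concrete with a first-order wire model. The root-to-leaf path of an H-tree over a square array is about as long as the array's side, so interconnect energy per access scales with the square root of the array area. The sketch ignores peripheral area, repeaters and the cell write energy itself; the bus width, wire capacitance, voltage swing and array areas are hypothetical values.

```python
import math


def htree_path_length_um(array_area_um2: float) -> float:
    """Root-to-leaf wire length of an H-tree over a square array of side s.

    Segments halve every two levels (s/4 + s/4 + s/8 + s/8 + ...), so the
    path length converges to about s = sqrt(area).
    """
    return math.sqrt(array_area_um2)


def htree_access_energy_j(array_area_um2: float, bus_bits: int,
                          c_per_um_f: float, v_dd: float) -> float:
    """First-order switching energy C * V^2 for bus_bits wires along one path."""
    return bus_bits * c_per_um_f * htree_path_length_um(array_area_um2) * v_dd ** 2


# Hypothetical: same capacity, array areas in the 146:12 cell-area ratio,
# 512-bit data path, 0.2 fF/um wire capacitance, 1.0 V swing.
sram_area = 146.0 * 10_000.0
stt_area = 12.0 * 10_000.0
e_sram = htree_access_energy_j(sram_area, 512, 0.2e-15, 1.0)
e_stt = htree_access_energy_j(stt_area, 512, 0.2e-15, 1.0)
print(f"STT-MRAM / SRAM H-tree energy = {e_stt / e_sram:.2f}")
```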

**C. Access Latency**

The read latency comparisons at the 32nm and 40nm technology nodes are shown in Fig. 5. SRAM has a smaller read latency than PMA STT-MRAM when the cache capacity is small (e.g., smaller than 4MB). As the capacity increases, PMA STT-MRAM achieves lower latency because its smaller area reduces the H-Tree interconnect delay. The write latency of PMA STT-MRAM is larger than that of SRAM, as shown in Fig. 6. However, writes usually do not lie on the critical path, so the longer write latency does not significantly impact cache performance.
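The crossover described above follows from a simple latency model. Each read pays an intrinsic array/sensing delay plus a wire delay proportional to the array side, as for repeated global wires. PMA STT-MRAM starts with a larger intrinsic delay but a smaller area, so it wins once the capacity is large enough. All parameters below are hypothetical and not fitted to Fig. 5; with these values, the crossover falls between 4MB and 8MB.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Tech(NamedTuple):
    t_array_ns: float      # intrinsic array/sensing delay (ns)
    mm2_per_mb: float      # array area per MB of capacity (mm^2)


def read_latency_ns(tech: Tech, capacity_mb: float, wire_ns_per_mm: float) -> float:
    """Array delay plus repeated-wire delay proportional to the array side."""
    side_mm = math.sqrt(tech.mm2_per_mb * capacity_mb)
    return tech.t_array_ns + wire_ns_per_mm * side_mm


def crossover_mb(sram: Tech, stt: Tech, wire_ns_per_mm: float) -> float:
    """Capacity where both latencies are equal; assumes STT-MRAM has the
    larger intrinsic delay and the smaller area per MB."""
    root_c = (stt.t_array_ns - sram.t_array_ns) / (
        wire_ns_per_mm * (math.sqrt(sram.mm2_per_mb) - math.sqrt(stt.mm2_per_mb)))
    return root_c ** 2


def first_faster_capacity(capacities: Sequence[float], sram: Tech, stt: Tech,
                          wire_ns_per_mm: float) -> Optional[float]:
    """Smallest swept capacity at which STT-MRAM reads faster than SRAM."""
    for c in sorted(capacities):
        if read_latency_ns(stt, c, wire_ns_per_mm) < read_latency_ns(sram, c, wire_ns_per_mm):
            return c
    return None


# Hypothetical parameters, NOT fitted to Fig. 5.
SRAM = Tech(t_array_ns=0.4, mm2_per_mb=2.0)
STT = Tech(t_array_ns=1.2, mm2_per_mb=0.4)
print(f"analytic crossover ~ {crossover_mb(SRAM, STT, 0.5):.2f} MB")
print("first sweep point where STT-MRAM is faster:",
      first_faster_capacity([1, 2, 4, 8, 16, 32], SRAM, STT, 0.5), "MB")
```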

In summary, although PMA STT-MRAM has a relatively larger write latency, it outperforms SRAM in area overhead, power consumption and, for large capacities, read latency. Therefore, PMA STT-MRAM is an advantageous choice for future on-chip cache design.

V. CONCLUSIONS

As integration density increases sharply with technology node shrinking, CMOS technology suffers from severe power consumption. To overcome this problem, several emerging technologies have been proposed, including PCRAM, ReRAM and STT-MRAM. STT-MRAM is a competitive candidate for cache design. However, existing research focuses on in-plane STT-MRAM. When the MTJ feature size scales down to 40nm and below, the perpendicular magnetic anisotropy effect becomes dominant, which requires a new compact model as well as architecture-level evaluations based on it. In this paper, we evaluate the benefits of PMA STT-MRAM compared to SRAM from the power, performance and area perspectives with the aid of a PMA STT-MRAM compact model. The simulation results show that PMA STT-MRAM achieves better performance (lower read latency for large caches), lower power consumption and smaller area overhead, paving the way for next-generation cache design.

REFERENCES