Neutron-Induced Effects on a Self-Refresh DRAM
Lucas Matana Luza, Daniel Söderström, Helmut Puchner, Rubén García Alía, Manon Letiche, Carlo Cazzaniga, Alberto Bosio, Luigi Dilillo

HAL Id: lirmm-03435635
https://hal-lirmm.ccsd.cnrs.fr/lirmm-03435635
Submitted on 18 Nov 2021

Lucas Matana Luza¹, Daniel Söderström², Helmut Puchner³, Rubén García Alía⁴, Manon Letiche⁵, Carlo Cazzaniga⁶, Alberto Bosio⁷ and Luigi Dilillo¹

¹ LIRMM, Univ Montpellier, CNRS, Montpellier, France. *{lucas.matana-luza, dilillo}@lirmm.fr
² Department of Physics, University of Jyväskylä, Jyväskylä, Finland.
³ Infineon Technologies, San Jose, CA, USA.
⁴ Engineering Department, CERN, Geneva, Switzerland.
⁵ Institut Laue–Langevin, Grenoble, France.
⁶ ISIS Neutron and Muon Source, Science and Technology Facilities Council, Didcot, UK.
⁷ Univ Lyon, ECL, INSA Lyon, CNRS, UCBL, CPE Lyon, INL, UMR5270, 69130 Ecully, France.

Abstract—The field of radiation effects in electronics research includes unknowns for every new device, node size, and technical development. In this study, static and dynamic test methods were used to define the response of a self-refresh DRAM under neutron irradiation. The neutron-induced effects were investigated and characterised by event cross sections, soft-error rate, and bitmaps evaluations, leading to an identification of permanent and temporary stuck cells, single-bit upsets, and block errors. Block errors were identified in different patterns with dependency in the addressing order, leading to up to two thousand faulty words per event, representing a real threat from a user perspective, especially in critical applications. An analysis of the damaged cells' retention time was performed, showing a difference in the efficiency of the self-refresh mechanism and a read operation. Also, a correlation of the fault mechanism that generates both single-bit upsets and stuck bits is proposed. Post-irradiation high-temperature annealing procedures were applied, showing a recovery behaviour on the damaged cells.

Index Terms—Neutron, radiation, Self-Refresh, DRAM, SEE, stuck bits, HyperRAM.

I. INTRODUCTION

Neutron-induced soft errors in electronic devices have been studied since the 1970s [1]. The interaction of cosmic rays with the terrestrial atmosphere generates high-energy and thermal neutrons [2], [3]. Neutrons interact with matter through elastic and inelastic scattering and nuclear reactions, producing charged nuclear recoils [4], [5]. The electron-hole pairs created by these neutron events may subsequently lead to Single-Event Effects (SEE) in the devices [5].

The reaction of thermal neutrons with boron-10 (¹⁰B) also generates byproducts (an alpha particle and a lithium-7 nucleus) that can cause Single-Event Upsets (SEUs) [2], [3], [6], [7]. These effects were a concern for static random access memories (SRAMs) and dynamic random access memories (DRAMs) fabricated with borophosphosilicate glass (BPSG) during the 90s. Nowadays, the BPSG is not present in these devices [3].

However, several works on sub-micron devices have shown that, even without the BPSG layer in advanced Si technologies, there is a high probability of contamination during the fabrication process [8], [9]. The ¹⁰B is still present as p-doping in the source/drain implementation [3], [10]–[12]. For instance, ¹⁰B can originate from the B₂H₆ etcher gas used to improve the adhesion of tungsten in the trench contacts, causing high concentrations of ¹⁰B close to the active regions of transistors [13], [14]. As a consequence of the scaling trend, a reduction of the ¹⁰B present in high-density devices is expected, since the contact size is also reduced. At the same time, the shrinking of the technology node may also decrease the critical charge needed to induce SEUs [10]. In addition, the details of the internal architecture are normally proprietary and not publicly available, so radiation experiments are an important method to evaluate the neutron sensitivity of modern devices. The impact of thermal neutrons is often neglected but should not be ignored [15]–[17]. As shown in [18], thermal neutrons contribute to the error rate in modern computing devices, such as double-data-rate (DDR3 and DDR4) DRAMs.

DRAM cells exhibit a variable retention time (VRT). This effect is intensified when the device is exposed to radiation environments, which enhance the cell leakage current and consequently reduce the retention time. Cells with a reduced retention capability may appear as stuck bits, which are known to be induced by radiation [19]–[21]. This behaviour has already been reported in several studies on DRAMs with different radiation sources, showing intermittent behaviour with a radiation-induced variation in the retention time of the memory cells [22]–[25]. A relation between the bias condition and the occurrence of stuck bits is discussed in [26], and a temperature dependency in [27].

Micro-dose and displacement damage are identified as causes of this effect in several works [19], [22]. In [22] and [27], for example, the stuck bits were attributed to single-particle displacement damage effects (SPDDE) induced by single high-energy neutrons and protons. Also, in [23], the authors state that the number of neutron-induced VRT cells is similar to that observed in the same TID range in $^{60}\text{Co}$ testing. However, that work also shows that intermittent stuck bits can be caused by neutron-induced displacement damage.

In our previous work [28], we presented the effects of thermal-neutron irradiation on a self-refresh DRAM, also known as Pseudo-Static RAM (PSRAM), where we identified the occurrence of Single-Bit Upsets (SBUs), stuck bits, and block errors in the memory. In this work, we extend the study by also analysing the effects under an atmospheric-like neutron spectrum, in which we identified the same types of faults. Moreover, an analysis of the damaged cells’ retention time was performed, showing a difference between the self-refresh mechanism and a read operation. Additionally, a correlation of the fault mechanism that generates both SBUs and stuck bits under neutron irradiation is proposed, in line with the results under electron irradiation in [24]. Furthermore, high-temperature annealing was observed in post-irradiation tests.

The rest of the paper is structured as follows: Section II presents the Device Under Test (DUT), the test facilities, and the experimental setup; Section III describes the applied test modes; Section IV presents and analyzes the results from the neutron irradiation; Section V concludes the work.

II. TEST SETUP

A. Device Under Test

The DUT is the S27KS0642GABHI020, a 64 Mib HyperRAM™ self-refresh DRAM manufactured by Cypress Semiconductor. It is a high-speed CMOS device with a HyperBus™ interface, which uses Double Data Rate (DDR) signalling to reach a data throughput of up to 400 MB/s at a maximum clock rate of 200 MHz. The memory is built on a 38 nm technology. The cell array is composed of 8192 rows, and each row contains 512 word addresses of 16 bits each. The self-refresh mechanism distributes single-row refresh operations with an array refresh interval of 64 ms [29].
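The array geometry above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the figures quoted in this subsection; the variable names are illustrative, and the mean spacing between row refreshes assumes that the single-row refreshes are spread evenly over the 64 ms interval, which the text does not state explicitly.

```python
# Figures quoted in the text for the DUT.
ROWS = 8192
WORDS_PER_ROW = 512
BITS_PER_WORD = 16
ARRAY_REFRESH_INTERVAL_S = 64e-3

# Total capacity in bits and in Mib (2**20 bits).
total_bits = ROWS * WORDS_PER_ROW * BITS_PER_WORD
total_mib = total_bits / 2**20

# Assumption: row refreshes are evenly distributed over the array refresh interval.
row_refresh_spacing_s = ARRAY_REFRESH_INTERVAL_S / ROWS

print(f"capacity: {total_bits} bits = {total_mib:.0f} Mib")
print(f"mean spacing between row refreshes: {row_refresh_spacing_s * 1e6:.2f} us")
```

The product of rows, words per row, and bits per word reproduces the stated 64 Mib capacity.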

B. Test Facilities

This work involves two separate test campaigns. The first test campaign was carried out at the Platform for Advanced Characterisation (PAC-G) facility, hosted by the Institut Laue–Langevin (ILL) in Grenoble, France, using the D50 instrument. This instrument provides thermal neutrons moderated by liquid deuterium at 20 K. The captured flux (i.e., the equivalent flux of 25 meV neutrons, which corresponds to a room temperature of 300 K) was $10^9$ n/cm$^2$/s, which is controlled by a $^3\text{He}$ detector and periodic gold-foil measurements [30].

The second test campaign [31] was carried out at the Rutherford Appleton Laboratory, UK, on the ChipIr beamline, which provides an atmospheric-like neutron spectrum with a flux of about $5\times10^6$ n/cm$^2$/s for energies above 10 MeV, together with a thermal component (energies below 0.5 eV) with a flux of $4\times10^5$ n/cm$^2$/s [32], [33]. The thermal neutron component is smaller than 10% [18], [34]. For comparison, the neutron flux in the beamline is approximately $10^9$ times larger than the atmospheric neutron flux at sea level, which is 13 n/cm$^2$/h ($3.6\times10^{-3}$ n/cm$^2$/s) [35].
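The acceleration factor quoted above follows directly from the two flux values. The short Python sketch below repeats that arithmetic; the constant names are illustrative, and the equivalent field exposure is a simple ratio of fluxes that ignores any spectral differences between the beam and the natural environment.

```python
CHIPIR_FLUX_ABOVE_10MEV = 5e6    # n/cm^2/s, value quoted in the text
SEA_LEVEL_FLUX_PER_HOUR = 13.0   # n/cm^2/h, value quoted in the text

# Convert the sea-level flux to n/cm^2/s.
sea_level_flux = SEA_LEVEL_FLUX_PER_HOUR / 3600.0

# Ratio of beam flux to natural flux (dimensionless acceleration factor).
acceleration = CHIPIR_FLUX_ABOVE_10MEV / sea_level_flux

# Hours of sea-level exposure that deliver the same fluence as one second of beam.
hours_equiv_per_beam_second = acceleration / 3600.0

print(f"sea-level flux: {sea_level_flux:.2e} n/cm^2/s")
print(f"acceleration factor: {acceleration:.2e}")
print(f"one beam second ~ {hours_equiv_per_beam_second:.0f} h at sea level")
```

The resulting factor is of the order of $10^9$, in agreement with the comparison given in the text.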

C. Test Setup

The test setup is composed of a control board based on the Xilinx Zynq-7000 SoC and a daughter board carrying the DUT. Fig. 1 depicts the top-level diagram of the controller system with the DUT. The control system uses the ARM Cortex™-A9 processor of the System-on-Chip (SoC) to run the test algorithms on the DUT through the HyperBus™ controller. This controller is an IP (Intellectual Property) core provided by Cypress and implemented in the SoC’s Programmable Logic; it manages the communication between the processor and the DUT.

During the tests, the power supply was monitored to identify Single-Event Latch-up (SEL). All performed tests were logged with the logical address of the observed errors, bit error data, and operation status. Functional tests were performed between the runs to check the full functionality of the device.

For the first test campaign, the DUT was tested at room temperature and nominal supply voltage, using a 25 meV thermal neutron equivalent flux of $10^9$ n/cm$^2$/s with a 30 mm diameter beam, reaching a cumulative fluence of $7.8 \times 10^{12}$ n/cm$^2$. In the second test campaign, for energies above 10 MeV, the average flux was about $4 \times 10^6$ n/cm$^2$/s, reaching a cumulative run fluence of $8.25 \times 10^{11}$ n/cm$^2$. In both cases, to avoid faulty behaviour in the system, the control board was positioned out of the beam and was also shielded with boron carbide [36].

To increase the reliability of the system, the SoC configuration memory (CRAM) was monitored by the commercial Xilinx scrubber, the Soft Error Mitigation (SEM) core. It reports detected SBUs, and, when possible, corrects them [37]. In the case of an uncorrectable error induced by a secondary particle hitting the SoC CRAM, the controller system was reprogrammed and relaunched.

III. TEST MODES

In this study, static and dynamic memory tests were applied to the DUT to evaluate the memory response during irradiation. Dynamic tests constantly access the memory with read and write operations in order to emulate real applications and detect functional faults [38], [39]. For the static test, a write operation is performed with a known data pattern (e.g., solid ‘0’, solid ‘1’, or checkerboard), and the memory is then irradiated for a given time interval. During irradiation, the memory only performs data refresh; afterwards, a readback operation is applied to identify the corrupted bits.

For dynamic tests, four different algorithms were used: March C-, Dynamic Stress, Dynamic Classic, and mMats+ [40], [41]. These algorithms were previously used on SRAM [42], FRAM [43], and MRAM [44] to evaluate the radiation impact on the devices. The algorithms are presented below in (1) March C-, (2) Dynamic Stress, (3) Dynamic Classic, and (4) mMats+. The arrow indicates the addressing order (‘↑’ up or ‘↓’ down), ‘w’ (write) and ‘r’ (read) indicate the operation, and the following Boolean digit indicates the data background. Each algorithm is composed of elements, each given by an arrow followed by the operations in parentheses. In our work, the operations enclosed in the parentheses are performed in sequence at each memory address. When the addressing order is ‘↑’, the operations are executed from address 0 to N, and when it is ‘↓’, from address N down to 0, where N is the highest memory address. Thus, for example, the element ‘↑(r0, w1)’ goes from the first address up to the last one, applying at each address a read operation (where a solid ‘0’ data background is expected) followed by a write operation (using the solid ‘1’). A complete dynamic test algorithm is delimited by a pair of braces [45]. For March C-, Dynamic Stress, and mMats+, the first element (an up write operation) is performed only once to initialise the memory.

\[
\begin{aligned}
&\{\uparrow (w0);\} \\
&\{\uparrow (r0, w1); \uparrow (r1, w0); \downarrow (r0, w1); \downarrow (r1, w0); \uparrow (r0);\}
\end{aligned}
\tag{1}
\]

\[
\begin{aligned}
&\{\uparrow (w1);\} \\
&\{\uparrow (r1, w0, r0, r0, r0, r0, r0); \uparrow (r0, w1, r1, r1, r1, r1, r1); \uparrow (r1, w0, r0, r0, r0, r0, r0); \\
&\;\;\downarrow (r0, w1, r1, r1, r1, r1, r1); \downarrow (r1, w0, r0, r0, r0, r0, r0); \uparrow (r0, w1, r1, r1, r1, r1, r1);\}
\end{aligned}
\tag{2}
\]

\[
\{\uparrow (w0); \uparrow (r0); \uparrow (w1); \downarrow (r1);\}
\tag{3}
\]

\[
\begin{aligned}
&\{\uparrow (w0);\} \\
&\{\uparrow (r0, w1); \uparrow (r1, w0);\}
\end{aligned}
\tag{4}
\]
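To make the element notation concrete, the following Python sketch interprets a list of elements against a toy memory model and runs March C- on it. The tuple encoding, the function `run_elements`, and the stuck-bit model are assumptions made for this illustration; they do not reproduce the test firmware used in the campaigns.

```python
from typing import Callable, List, Tuple

# An element is (addressing order, operations), e.g. ("up", ["r0", "w1"]).
Element = Tuple[str, List[str]]

MARCH_C_MINUS_INIT: List[Element] = [("up", ["w0"])]
MARCH_C_MINUS_LOOP: List[Element] = [
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]

def run_elements(elements: List[Element], n_words: int,
                 read: Callable[[int], int], write: Callable[[int, int], None],
                 width: int = 16) -> List[Tuple[int, str, int]]:
    """Apply each element to every address; return (address, op, observed) mismatches."""
    ones = (1 << width) - 1
    errors = []
    for order, ops in elements:
        addresses = range(n_words) if order == "up" else range(n_words - 1, -1, -1)
        for addr in addresses:
            # All operations of the element run in sequence at this address.
            for op in ops:
                background = ones if op[1] == "1" else 0
                if op[0] == "w":
                    write(addr, background)
                else:
                    observed = read(addr)
                    if observed != background:
                        errors.append((addr, op, observed))
    return errors

# Toy memory with one cell stuck at '1' (bit 3 of word 5); purely illustrative.
N_WORDS = 8
memory = [0] * N_WORDS
STUCK_ADDR, STUCK_MASK = 5, 1 << 3

def write(addr: int, value: int) -> None:
    memory[addr] = value

def read(addr: int) -> int:
    value = memory[addr]
    return value | STUCK_MASK if addr == STUCK_ADDR else value

run_elements(MARCH_C_MINUS_INIT, N_WORDS, read, write)            # initialise once
errors = run_elements(MARCH_C_MINUS_LOOP, N_WORDS, read, write)   # one dynamic cycle
for addr, op, observed in errors:
    print(f"addr={addr} op={op} observed=0x{observed:04X}")
```

In this toy run, the stuck-at-‘1’ cell is reported by every `r0` operation that reaches its address, while the `r1` operations pass.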

IV. RESULTS

The analysis of the test outputs led to the identification of different types of faults. First, in this section, the results related to SBUs and stuck bits are detailed, including retention capability analysis and high-temperature thermal annealing. Following this, a description and discussion of large events leading to clusters of errors defined in this work as block errors are presented. Finally, the overall event cross section and Soft Error Rate (SER) for the three different types of faults are given.

A. SBUs and Stuck Bits

The simplest observed fault is the SBU, which appears as a ‘0’ to ‘1’ or a ‘1’ to ‘0’ transition. Analysing the data from the full test campaign, for each DUT, we classified as SBUs the errors in bits that appeared only once, with no further occurrence at the same location.

Stuck bits were observed in two forms: permanent and temporary. The fault is defined as a memory cell whose retention time is affected by a particle interaction, resulting in a cell with a stuck value (‘0’ or ‘1’) independent of the written value. In this study, permanent stuck bits are those that, after their first appearance, return a faulty logic value on every subsequent read access to the faulty address. Temporary stuck bits return the error only during a certain time window, with an intermittent behaviour. During the test campaign, as described in Section III, the algorithms always accessed the full range of memory addresses. The time between write and read, for each address, was therefore 4.5 s on average, and this is the time used to determine a stuck bit, with the memory array constantly being self-refreshed every 64 ms unless stated otherwise.
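The definitions of SBU, permanent stuck bit, and temporary stuck bit given above can be expressed as a small classification rule. The Python sketch below is only an illustration of those definitions: the function name `classify_bit` and the per-bit list of read outcomes are hypothetical and do not describe the authors' analysis scripts.

```python
from typing import List

def classify_bit(read_results: List[bool]) -> str:
    """Classify one bit from its read history over a campaign.

    read_results[i] is True when read access i to this bit returned an error.
    """
    if not any(read_results):
        return "normal"
    first = read_results.index(True)
    later = read_results[first + 1:]
    if not any(later):
        # Error seen once and never again at this location.
        # A first error on the very last read also ends up here.
        return "SBU"
    if all(later):
        # Every read after the first appearance is faulty.
        return "permanent stuck"
    # Errors come and go after the first appearance.
    return "temporary stuck"

examples = {
    "a": [False, True, False, False],
    "b": [False, True, True, True],
    "c": [False, True, False, True],
    "d": [False, False, False, False],
}
labels = {name: classify_bit(history) for name, history in examples.items()}
print(labels)
```

A practical classifier would also need the time between write and read for each access, since the text ties the stuck-bit criterion to an average interval of 4.5 s.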

The stuck cell’s logic value was either ‘0’ or ‘1’, suggesting that each logic value is represented by a charged or a discharged capacitor depending on the memory region. The acquired results support this assumption: for atmospheric-like neutrons, where the stuck-at and SBU faults amounted to about two thousand events, 49% of the cells were stuck at, or flipped to, ‘0’, and 51% to ‘1’.

The number of stuck bits as a function of cumulative run fluence is presented in Fig. 2 for thermal neutron irradiation and in Fig. 3 for the atmospheric-like neutron beam. Each point represents the number of stuck-bit addresses (both permanent and temporary) identified within a test run, plotted at the total cumulative fluence on the DUT at the end of that run. The results show that the number of stuck bits grows with the cumulative run fluence.

For thermal neutrons, no significant difference can be observed between static and dynamic data. The atmospheric-like irradiation results showed more variation in the static test mode. These variations may be caused by the long exposure (long duration of the run). The arrow in the plot indicates the result of a 9-hour irradiation run with a static test (memory in retention mode, with only the self-refresh active during the whole run).

Moreover, Fig. 4 and Fig. 5 present the bit cross section for each type of algorithm and test mode. For this purpose, we first identified all the bit addresses where an SBU or a stuck bit was present. Then, for each type of test (each of the dynamic algorithms, and the static mode), the bit cross section is defined as:

$$\sigma_{\text{type(bit)}} = \frac{\sum N_{\text{type}}}{\sum F_{\text{type}} \times M} \tag{5}$$

where $\sum N_{\text{type}}$ is the sum of the number of bits that return an error (SBU and stuck are presented separately) within the test type, $\sum F_{\text{type}}$ is the cumulative run fluence of each test type, and $M$ is the memory size in bits.
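As a worked illustration of (5), the Python sketch below sums the error counts and run fluences of one test type and normalises by the memory size. The function name and the per-run numbers are hypothetical and chosen only to show the arithmetic; they are not measured data.

```python
from typing import Sequence

def bit_cross_section(error_bits_per_run: Sequence[int],
                      fluence_per_run: Sequence[float],
                      memory_bits: int) -> float:
    """Bit cross section in cm^2/bit: sum(N) / (sum(F) * M), as in (5)."""
    total_errors = sum(error_bits_per_run)
    total_fluence = sum(fluence_per_run)   # n/cm^2, summed over runs of this test type
    if total_fluence <= 0 or memory_bits <= 0:
        raise ValueError("fluence and memory size must be positive")
    return total_errors / (total_fluence * memory_bits)

M = 64 * 2**20  # 64 Mib device, in bits

# Hypothetical example: three runs of one test type.
sigma = bit_cross_section([3, 5, 2], [1e11, 2e11, 1e11], M)
print(f"sigma = {sigma:.3e} cm^2/bit")
```

The same function would be applied separately to the SBU counts and to the stuck-bit counts, since the two are reported separately.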

During the exposure to thermal neutrons, with a flux of $10^9$ n/cm²/s, SBUs appeared in both dynamic and static test runs, with only a few events. In the second test campaign, with an average flux of $4 \times 10^6$ n/cm²/s and an atmospheric-like neutron beam, SBUs appeared almost exclusively during the static mode, with over a thousand events; the only exception was a single event during a Dynamic Classic run. Even with a lower cumulative fluence, the number of SBU occurrences under the atmospheric-like neutron beam was about $62 \times$ higher. In the dynamic mode, the cells are continuously accessed for read or write operations, both of which refresh the cells’ content. In the static mode, the refresh is limited to the self-refresh mechanism of the memory. Thus, in the static mode, the charge stored in the cells is statistically lower (weaker cells), and, for this reason, the SBU occurrence is higher. Since SBUs were identified in dynamic test mode during the ILL test campaign but, apart from that single event, not at ChipIr, it is possible that the lower flux of the ChipIr beam plays a role in this different behaviour. Furthermore, the occurrence of stuck bits is very similar across the four dynamic algorithms.

B. Cells’ Retention Time

Temporary stuck bits present the same behaviour as the permanent ones. The only difference is that, in the first case, the fault is not permanent and occurs only during consecutive write and read operations performed within the dynamic and static test modes. Temporary stuck bits also presented different levels of damage, i.e., different levels of degradation of the cell’s retention capability. The duration of these temporary errors differed depending on the test run. During Dynamic Stress tests, none of the cells that presented the stuck-at phenomenon returned the faulty bit as an error in the five sequential read-backs performed just after a write operation. However, the error appeared in the first read operation performed in the next element of the algorithm. This behaviour can be explained by a particle-induced reduction of the retention time of the cell’s storage capacitor [22].

To analyse this effect in more depth, a post-irradiation test targeting the cells that present the stuck behaviour was performed on both DUTs. To identify the faulty cells, the entire memory is written with both solid ‘0’ and solid ‘1’ data patterns and, since the cells have different data retention times, the read operation is performed only after 60 seconds, so that enough time elapses for the stuck cells to lose their contents. The retention test then consists of writing a ‘0’ (or ‘1’) to the faulty cells’ addresses, waiting for a given duration, and finally performing a read-back operation to check whether the time elapsed between the write and read operations was enough to induce the fault. Since the memory self-refresh mechanism can be disabled, we considered two test scenarios: with the self-refresh enabled and with it disabled.
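The write-wait-read procedure can be emulated with a simple leakage model to show how the number of failing cells depends on the waiting time. The Python sketch below covers only the scenario with the self-refresh disabled and rests on assumptions that are not taken from the measurements: an exponential discharge of the cell, a read threshold at half of the full charge, and hypothetical leakage time constants for the damaged cells.

```python
import math
from typing import Dict, Sequence

READ_THRESHOLD = 0.5  # assumed fraction of full charge needed to read back the written value

def cell_fails(tau_s: float, wait_s: float) -> bool:
    """True when a cell with leakage time constant tau_s has lost its data after wait_s."""
    remaining = math.exp(-wait_s / tau_s)  # assumed exponential discharge, no refresh
    return remaining < READ_THRESHOLD

def retention_sweep(cell_taus: Sequence[float],
                    wait_times: Sequence[float]) -> Dict[float, int]:
    """Emulate write-wait-read for each waiting time; count the failing cells."""
    return {wait: sum(cell_fails(tau, wait) for tau in cell_taus) for wait in wait_times}

# Hypothetical time constants (seconds) for four damaged cells.
damaged_cells = [0.05, 0.2, 1.0, 5.0]
counts = retention_sweep(damaged_cells, [0.01, 0.1, 1.0, 10.0])
for wait, failing in counts.items():
    print(f"wait={wait:>5} s -> {failing} failing cell(s)")
```

In this model the count of failing cells can only grow with the waiting time, which is the qualitative trend the retention test is designed to expose.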

The results of these tests are presented in Fig. 6 and Fig. 7, where the bars present the number of bits that appeared as stuck for each elapsed time between the write and read operations, for both scenarios. A variation due to borderline cells was identified, leading to a maximum and a minimum number of stuck cells for each value of elapsed time (shown as error bars). The write-wait-read operations were executed $10 \times$ for each value of waiting time and scenario. The values of the time between write and read were chosen so that the differences between the two scenarios are easy to spot.

![Fig. 6. Retention time on faulty cells in post thermal neutron irradiation tests in the DUT used during the ILL test campaign. Error bars present the maximum and minimum values. The bars’ height presents the mode value.](image)

From both presented graphs, it is possible to spot that the refresh operation decreased the faulty cells’ discharge rate, which is expected behaviour. These results are in line with several works that relate the stuck fault with the refresh rate, as presented in [19] and [22], where the number of stuck bits decreases when the refresh rate increases. However, since the memory self-refresh mechanism distributes single row refresh operations, where the array should be fully refreshed in a maximal time interval of 64 ms [29], it is expected that if the time needed to discharge the bit cell is higher than the refresh interval, the self-refresh mechanism should be able to keep the capacitor with a charge above the threshold value used to identify the cells’ logic data. The results presented in Fig. 6 and Fig. 7 show that the expected behaviour is not achieved in those cells that presented radiation-induced errors since they have degraded retention capability.

A further analysis based on this behaviour is also proposed. After identifying the stuck bits in the post-radiation DUTs, we performed two different procedures:

1) A write operation is performed on each bit that appears as stuck post-radiation; the self-refresh mechanism is kept enabled for 10 minutes; the bits are then read.
2) A write operation is performed on each bit that appears as stuck post-radiation; the self-refresh mechanism is disabled; read operations are applied to the bits at intervals of 64 ms (the same as the self-refresh interval) for 10 minutes.

As a result, with the first procedure (1), the read operation returned the stuck bits as failing. With the second procedure (2), on the other hand, no failing bits were returned when reading the memory, which means that the “refresh” achieved by an actual read operation was able to keep the bit cells charged above the failure threshold, whereas the normal refresh operation was not. Experimentally, we thus observed that the self-refresh and the actual read (applied by the host) have different refresh efficiencies. Since the internal device design and architecture are not available, our hypothesis is that the self-refresh circuitry accesses the memory for a shorter period than an actual write/read access. The longer access time allows a larger equivalent charge to be stored in the cell capacitor.

C. Fault Mechanism of the Stuck Bits and SBUs

Neutron irradiation may induce different levels of damage in a cell, which is reflected in the appearance of permanent and temporary stuck bits. Further analysis of the results reveals a trend for the cells that experienced an SBU, whose fault mechanism appears to be very similar to that of the stuck bits. In this case, the cells’ retention time was measured by disabling the memory’s self-refresh mechanism and performing write and read operations with different time intervals. Note that in all observed cases, only one SBU or one stuck bit was identified within a 16-bit word, and the adopted test procedure is similar to the one used to obtain the results presented in Fig. 6 and Fig. 7. However, in this case, the target word addresses are the ones that presented either an SBU or a stuck-at fault during the irradiation tests. As a control procedure, the same test was also applied to random portions of the memory, with word addresses that did not present any fault during the radiation tests. Table I presents the number of addresses used for each case. Fig. 8 presents the acquired results for the thermal neutron irradiation, and Fig. 9 for the atmospheric-like neutron irradiation. The error bars represent the maximum and minimum values, and the dot represents the mean value. The dashed lines represent the increase in the number of failing bits from one measurement to the next. In this case, in order to show the behaviour of normal cells over refresh time, we considered a larger time window (with respect to Fig. 6 and Fig. 7) so that errors could appear.

<table>
<caption>Table I. Number of addresses used for each case</caption>
<thead>
<tr>
<th>Neutron beam</th>
<th>SBUs</th>
<th>Stuck bits</th>
<th>Normal</th>
</tr>
</thead>
<tbody>
<tr>
<td>Thermal</td>
<td>18</td>
<td>35</td>
<td>32</td>
</tr>
<tr>
<td>Atmospheric</td>
<td>1127</td>
<td>821</td>
<td>1057</td>
</tr>
</tbody>
</table>

The populations that contain the damaged cells had their nominal retention time decreased, showing similar behaviour for both SBUs and stuck bits. Also, the other cells (the remaining 15 bits) of the faulty addresses show normal behaviour. These results suggest a similar fault mechanism for SBUs and stuck bits, despite the fact that they lead to different fault models. The difference in terms of effect (fault model) lies in the level of damage caused in the cell by the particle hit, which is more severe in the case of a stuck bit. Thus, concerning the fault mechanism, the most probable interpretation is particle-induced displacement damage that leads to a variation (reduction) of the retention time of the affected cells (with a leakage current discharging the cells) [21]–[23]. The permanent degradation is observable when the refresh mechanism is disabled and a sequence of write-wait-read operations is performed. Intermittent bits show borderline behaviour between permanent stuck cells and normally working cells. The more time elapses between the write and read accesses, the smaller the degradation that will be detected as a fault.

In the case of SBUs, the degradation leads to a small reduction of the retention time of the cell. To observe the degradation of the cells that experienced SBUs, we need to relax the refresh frequency, or lengthen the time between two cell accesses when the self-refresh is disabled. This behaviour is also confirmed in our experiments by the different efficiency of the self-refresh and the actual read operation, as presented in the previous subsection (Section IV-B). In our tests, under atmospheric-like neutron irradiation, only a single SBU was detected in dynamic test mode, while several SBUs were observed in static test mode. In the first case, the dynamic test mode ensures frequent read and write accesses, while in static mode the data refresh is performed only by the self-refresh mechanism. Fig. 10 presents the percentage of bits failing when targeting the different addresses (SBUs, stuck bits, and “normal”) in post-radiation tests on the DUT used in the atmospheric-like neutron irradiation. Unlike in Fig. 8 and Fig. 9, the time interval between the write and read in Fig. 10 is shorter, enabling a better visualisation of the degradation of the retention time of the target addresses.

![Fig. 10. The percentage number of bits failing at a defined interval between the write and read operation with the self-refresh mechanism disabled. Error bars present the maximum and minimum value. Data acquired from post-radiation tests on the DUT used in the atmospheric-like neutron irradiation.](image)

D. Thermal Annealing Tests

A temperature dependency is shown in several works treating the stuck phenomena in DRAMs. During operation, an increase in temperature raises the leakage current, leading to the appearance of more stuck bits [22], [23], [27]. However, studies have shown that a high-temperature baking process can recover the memory cells’ retention capability. In [20], the damage induced by X-ray irradiation was recovered by a thermal annealing process, which presented a reduction in the cells’ retention time with the increase of the baking temperature. This behaviour is also observed in [19], where the number of stuck bits decreased as the annealing temperature increased.

To analyse the thermal annealing effect on the damaged cells, the DUTs were baked for 8 hours at each of four temperatures: 80°C, 100°C, 120°C, and 140°C. After each high-temperature exposure, the following tests were applied to the DUTs at room temperature: five runs of the static test with solid ‘0’ and solid ‘1’ data patterns, with an interval of 60 seconds between the write and read operations; four sequences of the dynamic March C- algorithm (with ten dynamic cycles per sequence); and retention time tests.

For the dynamic and static tests, the self-refresh mechanism was kept enabled. These two tests target the annealing of the cells that appear as stuck on the DUT after the irradiation. The results are presented in Fig. 11 and Fig. 12 for thermal and atmospheric-like neutron irradiation, respectively. The acquired results are compared with tests performed before the thermal annealing tests. Hereinafter, Pre-TA stands for pre thermal annealing.

![Fig. 11. The number of bits failing for pre and post thermal annealing tests using static with solid ‘0’ and solid ‘1’ data pattern, and the dynamic March C- algorithm. Error bars present the maximum and minimum value. Data acquired from the post-irradiation thermal annealing test on the DUT used in the thermal neutron irradiation.](image)

Further, an additional retention time test was performed on the addresses where a stuck bit or an SBU was identified during the irradiation tests. This test is similar to the one presented in the previous section. The addresses that contain a faulty cell were subjected to a sequence of write-wait-read operations while the self-refresh mechanism was disabled. This test identifies the cells’ retention time. Fig. 13 and Fig. 14 compare the pre-annealing and post-annealing results for both DUTs. The figures show that the high-temperature annealing decreases the number of stuck bits at all tested points, and that the retention time of the cells recovers with the annealing. The displacement damage induced by the neutron irradiation is thus annealed, and the annealing is more efficient at higher temperatures.

E. Block Errors

Besides the described faults (SBUs and stuck bits), block errors with vertical and horizontal shapes were observed in the memory bitmaps. To evaluate these events, we generated logical bitmaps by dividing the memory array into two parts, using the left side for odd rows and the right side for the even ones. This procedure generated 16384 columns. In a bitmap, each pixel represents a bit cell.
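One possible way to build such a logical bitmap is sketched below in Python. The text fixes only the split into odd rows (left) and even rows (right) and the resulting 16384 columns; the derivation of the row from the word address, the ordering of the bits within a word, and the function name `bitmap_pixel` are assumptions made for this illustration.

```python
from typing import Tuple

ROWS, WORDS_PER_ROW, BITS_PER_WORD = 8192, 512, 16
BITS_PER_ROW = WORDS_PER_ROW * BITS_PER_WORD  # 8192 bits per memory row

def bitmap_pixel(word_address: int, bit: int) -> Tuple[int, int]:
    """Map (word address, bit index) to (x, y) in a 16384 x 4096 logical bitmap."""
    # Assumption: consecutive word addresses fill one row before moving to the next.
    row, word = divmod(word_address, WORDS_PER_ROW)
    x_in_half = word * BITS_PER_WORD + bit
    # Odd rows are drawn on the left half, even rows on the right half.
    x = x_in_half if row % 2 == 1 else BITS_PER_ROW + x_in_half
    # Assumption: each pair of rows (even, odd) shares one bitmap line.
    y = row // 2
    return x, y

print(bitmap_pixel(0, 0))                          # first bit of row 0 (even row)
print(bitmap_pixel(WORDS_PER_ROW, 0))              # first bit of row 1 (odd row)
print(bitmap_pixel(ROWS * WORDS_PER_ROW - 1, 15))  # last bit of the last row
```

With this mapping, the bitmap has twice as many columns as a memory row has bits and half as many lines as the array has rows.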

An example of a horizontal block error can be seen in Fig. 15, which is the resulting bitmap of a static test with a checkerboard pattern as the data background. In the figure, two square zones are zoomed in to increase visibility. These events are characterised by errors occurring in all 512 word addresses of two consecutive even or odd rows, with most of the bits within each word in error. An exception to this behaviour is shown in the top-left zoomed-in square of Fig. 15, where, within the same address range, the bitmap shows a horizontal strip of errors in which most of the bits are not faulty, resulting in events with fewer than the expected 1024 word errors.

**Fig. 15.** Bitmap obtained after a static test using the checkerboard pattern “AAAAh” during a thermal neutron run. Each pixel represents a bit; bits that were identified with errors appear in black. The grey lines are used to limit the region. Zoom-ins are added to increase the visibility of the horizontal block events.

Block errors were also observed with a vertical shape, in which the same column is affected in subsequent even or odd rows. Fig. 16 shows such a block error, identified during a Dynamic Stress test in the second cycle, at the first “r1” operation of the fourth element of the algorithm. Notably, in all vertical lines of errors, the addresses with errors span the same range, returning a maximum of 2048 words with errors.

For both vertical and horizontal block errors, a write operation was able to restore access to the cells without the need for a power cycle. This error type is not due to a problem in the affected cells but rather in the control logic. In particular, a temporary malfunction of the sense amplifier or register that serves that column may lead to this behaviour.

A timeline plot is a useful way to see the impact of both fault types (stuck-at and block errors). Fig. 17 presents a run of the mMats+ test (4), where each dot represents a faulty word detected during a read element of the test. The faults detected during the $↑(r0, w1)$ element are depicted in blue, and those detected during the $↑(r1, w0)$ element are in orange. During this test run, stuck bits appear as both permanent and temporary, which can be seen in the horizontal sequences of dots on the graph. Also, a block error spanning 2048 addresses can be identified by the two vertical sequences of dots.

Two block errors spanning different address ranges occurred during the thermal-neutron test campaign. The first event is depicted in Fig. 18, whose bitmap was obtained during a Dynamic Stress test. The red arrows show the six error lines that appeared in the five “r1” operations performed in the last element of the Dynamic Stress algorithm. In this case, in three fixed columns in both even and odd rows, we identified twelve address ranges. In contrast to the first vertical event, all the addresses returned all bits in error, and a power cycle was performed on the DUT.

The second type of vertical line block error was observed during March C- test execution, with increasing addressing order, resulting in a sequence of more than 100 words with errors. The affected addresses depended on the execution order, ranging from “000000h” to “00006Ah” for an increasing order ($↑$), and from “3FFFFFFh” down to “3FFFF8Dh” for a decreasing order ($↓$). The effect persisted during several cycles of dynamic tests. After a dynamic execution, we performed static write and read operations, and the block error was recovered after two static writes, only to reappear during the next dynamic test. This event persisted over five runs using March C-, Dynamic Classic, and mMats+, and through a sequence of static tests between the irradiation runs. It was recovered only through a power cycle.

Experimentally, the vertical and horizontal block errors can be recovered by a write operation to the memory addresses. However, for the two specific cases above (Fig. 18 and the one described in the previous paragraph), a power cycle was required to re-establish the memory functionality. A micro latch-up occurring in the memory may have produced the malfunction, since we did not observe any relevant increase in the memory current of the kind that is typical of a large-scale latch-up.

Finally, the block error cross section is defined as

$$\sigma_{mode(device)} = \frac{\sum N_{mode}}{\sum F_{mode}}$$  \hspace{1cm} (6)

where $\sum N_{mode}$ is the total number of occurrences of block errors for each test mode (static or dynamic), and $\sum F_{mode}$ is the total cumulative fluence of each test mode. Table II and Table III present the values, which were calculated using a 95% confidence interval and a fluence uncertainty of 10% for both scenarios (thermal and atmospheric-like neutrons).
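As an illustration of how such limits can be obtained, the sketch below computes a device cross section with lower and upper limits. The text does not state how the confidence interval and the fluence uncertainty were combined, so the method here is an assumption: exact two-sided Poisson limits on the event count, combined in quadrature with the relative fluence uncertainty. All function names are hypothetical, and the event count and fluence in the usage lines are made-up values, not data from the test campaigns.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(X <= n) for X ~ Poisson(mu); intended for small event counts."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def _bisect_decreasing(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_limits(n: int, cl: float = 0.95) -> tuple[float, float]:
    """Exact two-sided limits on the Poisson mean for n observed events."""
    alpha = 1.0 - cl
    bracket = 10.0 * (n + 10)  # generous upper bracket for small n
    upper = _bisect_decreasing(
        lambda mu: poisson_cdf(n, mu) - alpha / 2, 0.0, bracket)
    if n == 0:
        return 0.0, upper
    lower = _bisect_decreasing(
        lambda mu: poisson_cdf(n - 1, mu) - (1 - alpha / 2), 0.0, bracket)
    return lower, upper


def cross_section_with_limits(n_events: int, fluence: float,
                              rel_fluence_unc: float = 0.10,
                              cl: float = 0.95) -> tuple[float, float, float]:
    """Return (sigma, lower, upper) in cm^2 for fluence given in n/cm^2.

    Assumption: counting and fluence uncertainties are added in quadrature.
    """
    if n_events < 1:
        raise ValueError("this sketch needs at least one event")
    sigma = n_events / fluence
    n_lo, n_up = poisson_limits(n_events, cl)
    rel_lo = math.hypot((n_events - n_lo) / n_events, rel_fluence_unc)
    rel_up = math.hypot((n_up - n_events) / n_events, rel_fluence_unc)
    return sigma, sigma * (1 - rel_lo), sigma * (1 + rel_up)


if __name__ == "__main__":
    # Hypothetical inputs: 6 events over a cumulative fluence of 1e12 n/cm^2.
    print(cross_section_with_limits(6, 1e12))
```

With few events the interval is strongly asymmetric, which is why the lower and upper limits are reported separately instead of as a single symmetric error bar.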

### Table II

<table>
<thead>
<tr>
<th>Test mode</th>
<th>$\sigma$</th>
<th>Lower limit</th>
<th>Upper limit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Static</td>
<td>$1.81 \times 10^{-12}$ cm$^{2}$</td>
<td>$6.52 \times 10^{-13}$ cm$^{2}$</td>
<td>$3.96 \times 10^{-12}$ cm$^{2}$</td>
</tr>
<tr>
<td>Dynamic</td>
<td>$1.77 \times 10^{-12}$ cm$^{2}$</td>
<td>$7.51 \times 10^{-13}$ cm$^{2}$</td>
<td>$3.51 \times 10^{-12}$ cm$^{2}$</td>
</tr>
</tbody>
</table>

### Table III

<table>
<thead>
<tr>
<th>Test mode</th>
<th>$\sigma$</th>
<th>Lower limit</th>
<th>Upper limit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Static</td>
<td>$1.87 \times 10^{-11}$ cm$^{2}$</td>
<td>$9.25 \times 10^{-12}$ cm$^{2}$</td>
<td>$3.35 \times 10^{-11}$ cm$^{2}$</td>
</tr>
<tr>
<td>Dynamic</td>
<td>$1.08 \times 10^{-10}$ cm$^{2}$</td>
<td>$6.98 \times 10^{-11}$ cm$^{2}$</td>
<td>$1.60 \times 10^{-10}$ cm$^{2}$</td>
</tr>
</tbody>
</table>

**F. Overall Event Cross Section and SER**

To evaluate the overall events’ cross sections of this memory, the fault types were divided into SBUs, stuck bits, and block errors. For these cases, we did not split the events occurring in dynamic and static modes.

The estimated event cross section (\(\sigma\)) is defined in two different ways. Since the stuck bits and the SBUs are cell-related, their cross section takes into account the size of the memory and is given by

$$\sigma_{bit} = \frac{N}{F \times M}$$  \hspace{1cm} (7)

where \(N\) is the number of events, \(F\) is the beam fluence in n/cm$^2$, and \(M\) is the number of bits [46]. For the block errors, since this fault is related to the control logic of the device, we may define the cross section as

$$\sigma_{device} = \frac{N}{F}$$  \hspace{1cm} (8)

where the memory size is removed from the equation, and the cross section is device-based.

**Fig. 17.** Errors during a mMats+ test run under the atmospheric-like neutron beam. The dots represent a faulty word detected during the different operations of the algorithm. Stuck bit appears as a horizontal sequence of dots; a block error appears as two vertical sequences of dots. (Plot axis label: Step; legend: ↑(r0,w1), ↑(r1,w0).)

**Fig. 18.** Bitmap obtained during a Dynamic Stress test after the fifth ‘r0’ of the algorithm’s sixth line in the thermal-neutron test campaign. Each pixel represents a bit; bits that were identified with errors appear in black. The grey lines are used to limit the region. Zoom-ins are added to increase the visibility of the horizontal block events. Red arrows indicate the six vertical lines.

From the calculated event cross sections, we define the SER expressed in FIT/Mb, where 1 FIT/Mb is equal to one failure per billion working hours per Mb [47], [48]. The equation is

\[
SER_{FIT/Mb} = \sigma_{bit} \times (1024 \times 1024) \times 10^9 \times j
\]

for the SBUs and stuck bits, where 1024 \(\times\) 1024 (bits) is the Mb coefficient, 10\(^9\) comes from the FIT definition, and \(j\) is the flux at New York (sea level) outdoors for a mean solar activity as defined in JEDEC JESD89A, being 6.5 n/cm\(^2\)/h for thermal energies (< 400 meV) and 13 n/cm\(^2\)/h for high-energy neutrons (> 10 MeV) [3], [35], [48]. The equation is slightly modified for the block error SER, where the Mb coefficient is removed, and becomes

\[
SER_{FIT} = \sigma_{device} \times 10^9 \times j
\]
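The cross section and SER definitions above translate directly into a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the event count, fluence, and memory size in the usage lines are made-up inputs, not values from Table IV or Table V. The flux constants are the JESD89A reference values quoted in the text.

```python
MB_BITS = 1024 * 1024      # Mb coefficient (bits per Mb)
FIT_HOURS = 1e9            # FIT: failures per 10^9 working hours

THERMAL_FLUX = 6.5         # n/cm^2/h, thermal energies
HIGH_ENERGY_FLUX = 13.0    # n/cm^2/h, neutrons above 10 MeV


def sigma_bit(n_events: float, fluence: float, n_bits: float) -> float:
    """Per-bit cross section in cm^2/bit (SBUs and stuck bits)."""
    return n_events / (fluence * n_bits)


def sigma_device(n_events: float, fluence: float) -> float:
    """Device cross section in cm^2/device (block errors)."""
    return n_events / fluence


def ser_fit_per_mb(sig_bit: float, flux: float) -> float:
    """SER in FIT/Mb from a per-bit cross section and a flux in n/cm^2/h."""
    return sig_bit * MB_BITS * FIT_HOURS * flux


def ser_fit(sig_device: float, flux: float) -> float:
    """SER in FIT (per device) from a device cross section."""
    return sig_device * FIT_HOURS * flux


if __name__ == "__main__":
    # Hypothetical example: 10 cell events, 1e11 n/cm^2, a 64 Mb array.
    s_bit = sigma_bit(10, 1e11, 64 * MB_BITS)
    print(ser_fit_per_mb(s_bit, HIGH_ENERGY_FLUX))
    # Hypothetical example: 6 block errors over 1e12 n/cm^2.
    print(ser_fit(sigma_device(6, 1e12), HIGH_ENERGY_FLUX))
```

Keeping the per-bit and per-device quantities in separate functions mirrors the distinction made in the text: cell-related faults scale with the memory size, while block errors are attributed to the control logic and do not.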

Table IV presents the estimated cross sections related to the thermal neutron test campaign, while Table V presents the estimated cross sections and SER for the atmospheric-like neutron test campaign. For the presented results, \(F\) is the total cumulative run fluence. However, for the specific case of SBUs under the atmospheric-like neutron beam, \(F\) is the cumulative run fluence for static tests, since no SBU was identified under dynamic mode (the single SBU observed under Dynamic Classic was not considered). The values were calculated using a 95% confidence interval and a fluence uncertainty of 10% in both scenarios.

V. CONCLUSION

The effects of neutron irradiation in a self-refresh DRAM were described. From static and dynamic test modes performed during two test campaigns, different kinds of faults were identified. Besides the occurrence of SBUs, the tests showed permanent and temporary stuck bits, which had already been reported in several studies that present different fault mechanisms, with the most probable cause being the impact of irradiation on the variable retention time phenomenon.

Tests targeting the retention time of the damaged cells show that the fault mechanisms of the stuck bits and SBUs present very similar behaviour, the main difference being the level of degradation of the cells’ retention time. The retention time tests also show that, experimentally, there is a difference between the self-refresh and read operations, which should lead to a difference in the equivalent stored charge in the cells’ capacitors. The damage induced in the cells with SBUs and in those with stuck bits was also found to anneal during high-temperature annealing tests. The higher the annealing temperature, the more the cells’ retention time was found to recover.

Furthermore, block errors were observed in four different patterns: intermittent word errors in vertical sequential logical addresses, intermittent word errors in horizontal sequential logical addresses, divided vertical lines with all bits within a word in error, and a sequential error that depends on the addressing order.

Cross sections for the different kinds of faults were estimated, showing that the memory is not very sensitive to thermal neutrons. However, vertical and horizontal block errors produce a significant number of word errors within a single event, which, from a user's point of view, could represent an issue in critical applications.

### REFERENCES
