Vertical Line Fault Mechanism Induced by Heavy Ions in an SLC NAND Flash
Viyas Gupta, Alexandre Louis Bosser, Lucas Matana Luza, Daniel Söderström, Arto Javanainen, Heikki Kettunen, Jaan Praks, Kay-Obbe Voss, Ari Virtanen, Luigi Dilillo

HAL Id: lirmm-03358989
https://hal-lirmm.ccsd.cnrs.fr/lirmm-03358989
Submitted on 29 Sep 2021


V. Gupta¹, A. Bosser², L. Matana Luza³, D. Söderström⁴, A. Javanainen⁴, H. Kettunen⁴, J. Praks², K.-O. Voss⁵, A. Virtanen⁴, L. Dilillo³

Abstract
The vertical line fault mechanism occurring in NAND flash devices under heavy-ion irradiation is described in detail. The location where the fault is generated as well as the recovery sequence are identified.

Index Terms
Radiation testing, heavy-ion, NAND Flash, SEE, SEFI, Vertical error, Vertical line, column error.

I. INTRODUCTION

Flash memories have been widely used for mass storage in spacecraft embedded systems in the last decade. When compared to other types of non-volatile memories, NAND Flash achieves high density in a small area. However, the Floating Gate (FG) technology and the peripheral circuitry used in the NAND Flash architecture are susceptible to radiation effects [1]. Besides, technology scaling to deep sub-micron levels has increased these memories’ susceptibility to Single-Event Effects (SEE).

NAND Flash devices have been extensively tested using different sources of radiation, such as in [2], [3]. Proton-induced effects and the sensitivity of SLC (Single-Level Cell) and MLC (Multiple-Level Cell) devices to them are addressed in [4]. In [5] the authors compare the effects of electron irradiation with results from Co-60 Total Ionizing Dose (TID) measurements. The fluence dependence of SEEs under heavy-ion irradiation is investigated in [6]. Studies have also been published on failure mechanisms in the peripheral circuitry, such as the charge pump [7], and on the contribution of the Page Buffer (PB), including the duration of data storage in that buffer, to the overall upset rate [8].

The present study was driven by the MTCube project, which calls for a 1-Unit CubeSat to expose several types of memories fabricated with both legacy and emerging technologies to space radiation, including the NAND Flash device studied in this paper. The device has been irradiated with heavy-ion beams in static (retention) and dynamic mode, after having been written with different test patterns. A previously seldom-reported failure mode [2] [9] is described in detail, along with the location of the radiation-induced fault within the device and a recovery sequence.

II. EXPERIMENTAL SETUP

The Device Under Test (DUT) is a 32 Gib asynchronous Single-Level Cell (SLC) NAND Flash memory manufactured by Micron Technology (MT29F32G08ABAAA). The DUT has a nominal operating voltage of 3.3 V and consists of one Logical Unit (LU), which is divided into two planes. Each plane has 2048 blocks (one plane stores the even-numbered blocks, and the other stores the odd-numbered blocks); each block has 128 pages; and each page stores one 8-bit word per column, with a total of 8192 columns per page.

All specimens utilized in our experiments were delidded before irradiation using chemical means; all of these passed functional tests and were fully operational before irradiation.

This study is based on data from three separate test campaigns. The first test campaign took place using a broad beam at GANIL (Grand Accélérateur National d’Ions Lourds) (Caen, France) and involved two specimens. The primary xenon beam was degraded in order to reach an LET (Linear Energy Transfer) in silicon of 26.75 MeV.cm²/mg at the DUT surface at normal incidence, and the tests were performed in air. GANIL provided the values displayed in Table I.

This study has been achieved thanks to the financial support of the Van Allen Foundation.

This work was also supported by the European Space Agency (ESA/ESTEC Contracts No. 4000111085/14/NL/PA and 4000111630/14/NL/PA) and the Academy of Finland under the Finnish Centre of Excellence Programme 2012-2017 (Project No. 2513553, Nuclear and Accelerator Based Physics).

¹ - ESA/ESEC-Galaxia, Transinne, Belgium.
² - School of Electrical Engineering, Aalto University, Espoo, Finland.
³ - LIRMM, University of Montpellier/CNRS, Montpellier, France.
⁴ - Department of Physics, University of Jyväskylä, Finland.
⁵ - Material Research Department, GSI Helmholtzzentrum für Schwerionenforschung, Darmstadt, Germany.

The authors acknowledge the support of Véronique Ferlet-Cavrois of ESA/ESTEC in putting together the proposal for the GSI test campaign.

TABLE I

FACILITIES AND BEAMS USED FOR THE IRRADIATIONS

<table>
<thead>
<tr>
<th>Facility</th>
<th>Ion</th>
<th>Energy (MeV)</th>
<th>Effective LET (MeV.cm²/mg)</th>
<th>Range to Bragg peak in Si (µm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>GANIL</td>
<td>Xe</td>
<td>6005</td>
<td>26.75</td>
<td>700</td>
</tr>
<tr>
<td>RADEF</td>
<td>N</td>
<td>139</td>
<td>1.8 (0°) and 2.1 (30°)</td>
<td>202</td>
</tr>
<tr>
<td>RADEF</td>
<td>Ne</td>
<td>186</td>
<td>3.6 (0°) and 4.2 (30°)</td>
<td>146</td>
</tr>
<tr>
<td>RADEF</td>
<td>Ar</td>
<td>372</td>
<td>10.1 (0°) and 11.7 (30°)</td>
<td>118</td>
</tr>
<tr>
<td>RADEF</td>
<td>Fe</td>
<td>523</td>
<td>18.5 (0°) and 21.4 (30°)</td>
<td>97</td>
</tr>
<tr>
<td>RADEF</td>
<td>Kr</td>
<td>768</td>
<td>32.1 (0°)</td>
<td>94</td>
</tr>
<tr>
<td>RADEF</td>
<td>Xe</td>
<td>1217</td>
<td>60.0 (0°) and 69.3 (30°)</td>
<td>89</td>
</tr>
<tr>
<td>GSI</td>
<td>Ca</td>
<td>230.4</td>
<td>15.6</td>
<td>55</td>
</tr>
</tbody>
</table>

The second test campaign was carried out using the broad beam of the RADiation Effects Facility (RADEF) at the University of Jyväskylä, Finland. Tests were performed on three specimens in vacuum, with effective LETs (LET accounting for beam incidence angle on the DUT, which varied from 0° to 30°) ranging from 1.8 to 69.3 MeV.cm²/mg. The values given in Table I were calculated using SRIM [10].
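The 30° values in Table I are consistent with the usual inverse-cosine definition of effective LET. The paper does not spell this formula out, so the sketch below is an assumption that happens to reproduce the tabulated numbers; the function name `effective_let` is illustrative.

```python
import math


def effective_let(let_normal: float, tilt_deg: float) -> float:
    """Effective LET for a beam tilted by tilt_deg from the die normal.

    Assumes the common 1/cos(theta) scaling (longer path through a thin
    sensitive layer); this formula is not quoted from the paper.
    """
    return let_normal / math.cos(math.radians(tilt_deg))


# Surface LET at normal incidence (MeV.cm²/mg), taken from Table I (RADEF rows).
normal_incidence_let = {"N": 1.8, "Ne": 3.6, "Ar": 10.1, "Fe": 18.5, "Xe": 60.0}

for ion, let in normal_incidence_let.items():
    print(f"{ion}: {effective_let(let, 30.0):.1f} MeV.cm²/mg at 30°")
```

Rounded to one decimal, the results match the 30° entries of Table I (2.1, 4.2, 11.7, 21.4 and 69.3 MeV.cm²/mg).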

The third test campaign was carried out on a single specimen at GSI, using the UNILAC microbeam. Only one species was used (calcium at 230.4 MeV at normal incidence), yielding a surface LET in silicon of 15.6 MeV.cm²/mg (calculated using SRIM). Areas of interest on the die were identified with the aid of a colinear microscope, then selectively irradiated.

In the GANIL and RADEF test campaigns, the devices were irradiated and tested using test sockets. Due to the geometry of the GSI microbeam facility, test sockets could not be used, so the DUT was directly soldered on a PCB during this test campaign. The specimen irradiated at GSI had previously been irradiated with a broad muon beam, accumulating a total ionizing dose of about 5.5 krad. Nevertheless, the device was fully functional at the start of the GSI campaign.

Whether mounted on test sockets or directly soldered to a PCB, the DUTs were connected to FPGA-based controllers. Although located in the test chamber, the FPGAs were not exposed to the beam to ensure reliable operation. During each test run, when a bit error was detected, the wrong word along with other information such as the address and the time-stamp was transmitted to a computer for storage and processing.

The memory devices were tested in three different modes: unbiased static mode, biased static mode, and dynamic read mode. The static modes used a solid ‘1’, a solid ‘0’ or a checkerboard pattern; the chosen pattern was stored in the memory before the irradiation. The pre-irradiation and post-irradiation data were then compared to detect bit flips. In dynamic read mode, the memory was written with a solid ‘0’ pattern, then irradiated while being read continuously.
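The static-mode comparison can be sketched as a word-by-word XOR of the expected pattern against the read-back data. This is only an illustration: the function names are hypothetical, the actual processing was done with the authors' Scilab scripts, and the 0x55/0xAA encoding of the checkerboard pattern is an assumption.

```python
def make_pattern(name: str, length: int) -> bytes:
    """Build a test pattern of `length` 8-bit words.

    The checkerboard encoding (alternating 0x55 / 0xAA) is an assumption;
    the paper does not specify it.
    """
    words = {
        "solid0": (0x00, 0x00),
        "solid1": (0xFF, 0xFF),
        "checkerboard": (0x55, 0xAA),
    }[name]
    return bytes(words[i % 2] for i in range(length))


def find_bit_errors(expected: bytes, readback: bytes):
    """Return (address, expected_word, read_word, flipped_bits) per wrong word."""
    errors = []
    for address, (exp, got) in enumerate(zip(expected, readback)):
        diff = exp ^ got  # bits that differ between written and read data
        if diff:
            errors.append((address, exp, got, bin(diff).count("1")))
    return errors


# Example: a solid '0' pattern in which one word reads back as 0x24.
expected = make_pattern("solid0", 8)
readback = bytearray(expected)
readback[3] = 0x24
print(find_bit_errors(expected, bytes(readback)))
```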

Due to the very large capacity of the memory, only 512 Mib out of 32 Gib (64 blocks) were considered for the static tests, and 8 Mib (one block) for the dynamic tests. Several erase, write, and read operations were performed between test runs, before and after Power Cycles (PC), to sensitize and observe the errors occurring during the test runs, and also to ensure that the memory was error-free and fully functional before the next run.
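The sizes quoted above follow directly from the device geometry given at the start of this section. A minimal check, using only the figures stated in the text (the constant and function names are illustrative):

```python
# Device geometry as described in Section II.
PLANES = 2
BLOCKS_PER_PLANE = 2048
PAGES_PER_BLOCK = 128
COLUMNS_PER_PAGE = 8192
BITS_PER_WORD = 8

MIB = 2**20  # bits per mebibit
GIB = 2**30  # bits per gibibit


def bits_in_blocks(n_blocks: int) -> int:
    """Number of data bits stored in n_blocks blocks."""
    return n_blocks * PAGES_PER_BLOCK * COLUMNS_PER_PAGE * BITS_PER_WORD


device_gib = bits_in_blocks(PLANES * BLOCKS_PER_PLANE) // GIB
static_mib = bits_in_blocks(64) // MIB
dynamic_mib = bits_in_blocks(1) // MIB

print(f"device: {device_gib} Gib, static region: {static_mib} Mib, "
      f"dynamic region: {dynamic_mib} Mib")
```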

### III. Results and Discussion

The test results were analyzed with in-house Scilab data processing scripts. Three different types of failures were identified: isolated word errors, small clusters of word errors, and vertical lines. Since the main subject of this study is the failure mode generating vertical lines, the other types of failures are only briefly introduced.

#### A. Single-bit upsets, clusters of SBUs and MBUs

The simplest failure mode observed consists of SBUs (Single-Bit Upsets), which were detected:

- with nitrogen (LET 1.8 to 2.1 MeV.cm²/mg), only in a few biased and unbiased static tests using a solid ‘0’ pattern;
- with neon and all heavier ions (LET 3.6 to 69.3 MeV.cm²/mg), in all tests (biased and unbiased static as well as dynamic) using a solid ‘0’ pattern.

An erase operation was able to correct these SBUs without the need to carry out a PC. SBUs were never observed when using a solid ‘1’ pattern. This behaviour is consistent with multiple previous reports in the literature, and is due to the fact that cells holding a data value of ‘1’ are in a discharged state [1], and therefore hold no stored charge that an ion strike could remove.

Small clusters of SBUs and MBUs were also observed, each affecting the same column position on a few consecutive pages.

#### B. Vertical Lines

The most dramatic failure mode observed was the vertical line of errors (VL). An individual VL occurs within a single memory plane; it spans all the blocks within the plane (meaning it affects either all the even blocks or all the odd blocks) and affects only the words of a single column. Several VLs affecting different columns may occur at once during a test, over one or both planes. VLs can be seen on the bitmap in Fig. 1, computed using data from a static test with xenon at GANIL.
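The definition above suggests a simple way to pick VLs out of an error log. The sketch below is a hypothetical criterion, not the authors' Scilab processing: a column is flagged when it shows at least one word error in every tested block of a plane. Using `block % 2` as the plane index is an assumption based on the even/odd block split described in Section II.

```python
from collections import defaultdict


def find_vertical_lines(errors, tested_blocks):
    """Return {plane: sorted list of columns flagged as vertical lines}.

    errors        -- iterable of (block, page, column) for each wrong word
    tested_blocks -- iterable of block numbers covered by the test
    """
    blocks_in_plane = {0: set(), 1: set()}
    for block in tested_blocks:
        blocks_in_plane[block % 2].add(block)

    # For each (plane, column), collect the blocks showing at least one error.
    hit_blocks = defaultdict(set)
    for block, _page, column in errors:
        hit_blocks[(block % 2, column)].add(block)

    lines = defaultdict(list)
    for (plane, column), blocks in hit_blocks.items():
        expected = blocks_in_plane[plane]
        if expected and blocks >= expected:  # every tested block of the plane is hit
            lines[plane].append(column)
    return {plane: sorted(columns) for plane, columns in lines.items()}


# Example: column 571 fails in all odd blocks; an isolated upset sits in block 2.
errors = [(block, 0, 571) for block in (1, 3, 5, 7)] + [(2, 5, 100)]
print(find_vertical_lines(errors, range(8)))
```

Requiring only one error per block, rather than one per page, lets the same test catch the discontinuous lines described below as well as the continuous ones.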

![Fig. 1. Bitmap obtained after a static irradiation at GANIL, using xenon and a solid ‘0’ pattern. Each pixel represents one word; words appear as a red pixel if they suffered at least one bit upset, and white otherwise. Black lines are overlaid on top of the image to indicate block limits; each block is made of 128 pages (lines) of 8192 columns each. Zoom-ins are added to enhance visibility of parts of an intermittent VL (left) and of a continuous VL (right). Both VLs span the whole height of the bitmap.](image)

When considering test runs carried out with a broad beam, VLs did not occur in unbiased static mode. In static tests (with both solid ‘1’ and ‘0’ patterns) and dynamic tests, VLs did not occur at an effective LET of 4.2 MeV.cm²/mg (neon irradiation at 30°), but they did occur at an effective LET of at least 10.1 MeV.cm²/mg (argon irradiation at normal incidence). Hence, the LET threshold for the appearance of those VLs is somewhere between these two values.

Vertical lines may either be continuous (with all words of the column exhibiting bit errors) or discontinuous (with sparse word errors along the column). Within a VL, the affected words tend to share the same data pattern. To be more precise, most data bit positions have identical values across all the words of a given VL, although the value of one or two bit positions may fluctuate randomly.

Unlike SBUs and MBUs, VLs persist when attempts are made to overwrite the erroneous data. Furthermore, the set of columns affected by VLs and the error patterns of the affected words are altered by write and erase operations, and by power cycles.

However, erase operations alone or power cycles alone are not sufficient to eliminate a VL. Indeed, in most cases, VLs persist after several erase operations or several power cycles, and only disappear when the device is overwritten after both an erase operation and a power cycle were applied, regardless of the order. (In a few rare instances, VLs were found to disappear spontaneously.)
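The recovery rule described above can be written down as a small state machine. This is a toy model that encodes only the observed behaviour (both an erase and a power cycle are needed, in either order, before an overwrite clears the VL); it says nothing about the underlying physics, ignores the rare spontaneous recoveries, and all names are hypothetical.

```python
class VerticalLineModel:
    """Toy model of the observed VL recovery rule (not of the device itself)."""

    def __init__(self):
        self.fault_active = False
        self.erased = False
        self.power_cycled = False

    def ion_strike(self):
        """A strike in the sensitive area creates the fault."""
        self.fault_active = True
        self.erased = False
        self.power_cycled = False

    def erase(self):
        self.erased = True

    def power_cycle(self):
        self.power_cycled = True

    def write_and_read(self) -> bool:
        """Overwrite the data, read it back; return True if a VL is still seen."""
        if self.erased and self.power_cycled:
            self.fault_active = False
        return self.fault_active


def vl_survives(operations) -> bool:
    """Apply the named operations after a strike, then overwrite and read."""
    device = VerticalLineModel()
    device.ion_strike()
    for name in operations:
        getattr(device, name)()
    return device.write_and_read()


for sequence in (["erase", "erase"], ["power_cycle", "power_cycle"],
                 ["erase", "power_cycle"], ["power_cycle", "erase"]):
    print(sequence, "-> VL persists" if vl_survives(sequence) else "-> VL cleared")
```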

Table II illustrates the behaviour of vertical lines observed after a biased static irradiation at RADEF, using xenon at normal incidence (LET 60 MeV.cm²/mg). This behaviour is typical of observations made on other test runs.

As can be seen from Table II, the device was written with a solid ‘0’ data pattern, then irradiated with xenon, and read back. In addition to SBUs, MBUs and clusters of MBUs affecting a total of 305 words (not represented in the table), memory plane #1 initially exhibited five VLs, at column positions 571, 1595, 3775, 4290, 6701. The error pattern was consistent within each VL (give or take one or two bit positions), and differed from VL to VL. The device was then erased, and the data was read. All SBUs, MBUs and clusters of MBUs were overwritten successfully. Among the previously observed VLs, only those at positions 3775 and 6701 remained, but their error patterns were completely modified. The remaining VLs had seemingly vanished. Following that, the device was again written with a solid ‘0’ pattern, and its data was read. The original set of five VLs reappeared, each with its original error pattern. Power supply to the device was then cycled, after which its data was read. A slightly different set of VLs was now observed, at positions 571, 1595, 3775, 6338, 6701, and every word of each VL had all bits set to ‘1’ (0xFF). Power

TABLE II

HEXADECIMAL DATA VALUES IN VERTICAL LINES OBSERVED ON PLANE 1 DURING A STATIC SOLID ‘0’ TEST. ‘-’ MEANS CORRECT DATA. VL COLUMN ADDRESSES: 571, 1595, 3775, 4290, 6338, 6701; VALUES ARE LISTED FOR COLUMN 571. W0: WRITE SOLID ‘0’; R0/R1: READ, EXPECTING ‘0’/‘1’; E: ERASE; PC: POWER CYCLE.

<table>
<thead>
<tr>
<th>OPERATION</th>
<th>DATA AT VL COLUMN 571</th>
</tr>
</thead>
<tbody>
<tr>
<td>W0, irrad, R0</td>
<td>24/26</td>
</tr>
<tr>
<td>E, R1</td>
<td>-</td>
</tr>
<tr>
<td>W0, R0</td>
<td>24/26</td>
</tr>
<tr>
<td>PC, R0</td>
<td>FF</td>
</tr>
<tr>
<td>E, R1</td>
<td>-</td>
</tr>
<tr>
<td>W0, R0</td>
<td>-</td>
</tr>
</tbody>
</table>

Fig. 2. Device cross section for vertical lines as a function of effective LET. The dynamic cross section is extrapolated from a single memory plane to the whole device. The error bars were calculated using Poisson statistics. Below 10 MeV.cm²/mg, no VLs were detected, so the cross section was arbitrarily set at $1.0 \times 10^{-7}$ cm² for this semi-log plot. Data taken from the RADEF test campaign.

Fig. 2 shows the device cross section for vertical lines as a function of effective ion LET. The static and dynamic cross sections are reported separately. While static tests were carried out on memory blocks from both memory planes, dynamic tests were carried out on only one block; since each VL affects only one memory plane, the dynamic device cross section was extrapolated by doubling the cross section observed on the single memory plane used during the dynamic tests.
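As a worked illustration of how such cross sections are obtained, the sketch below divides the number of observed VLs by the ion fluence and doubles the result for the dynamic tests. The event counts and fluences are made-up values, not data from the paper, and the square-root error bar is a simple approximation; the paper only states that Poisson statistics were used.

```python
import math


def cross_section(n_events: int, fluence: float, scale: float = 1.0):
    """Return (sigma, one_sigma_error) in cm².

    n_events -- number of vertical lines observed during the run
    fluence  -- ion fluence in ions/cm²
    scale    -- extrapolation factor (2.0 when only one of two planes is tested)
    """
    sigma = scale * n_events / fluence
    error = scale * math.sqrt(n_events) / fluence  # sqrt(N) approximation
    return sigma, error


# Hypothetical static run: 4 VLs over both planes at 1e6 ions/cm².
static_sigma, static_err = cross_section(4, 1.0e6)

# Hypothetical dynamic run: 1 VL on the single tested plane, doubled to
# extrapolate to the whole device.
dynamic_sigma, dynamic_err = cross_section(1, 1.0e6, scale=2.0)

print(f"static:  {static_sigma:.1e} +/- {static_err:.1e} cm²")
print(f"dynamic: {dynamic_sigma:.1e} +/- {dynamic_err:.1e} cm²")
```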

Once the VL failure mode was observed, the authors set out to identify its origin on the die using the GSI microprobe facility. The DUT was subjected to a dynamic read test, in which a solid ‘0’ data pattern was first written, then read constantly under irradiation. Fig. 3 shows a photograph of the DUT die; on two occasions, and only when irradiating the area identified by the frame, the DUT exhibited the VL failure mode. This area was identified by Gerardin et al., who studied the same device, as containing the page buffers and sense amplifiers.

The authors propose two basic failure mechanisms which can explain the occurrence of these vertical lines of errors:

1) A stuck bit in the data buffer: During a block read operation, one page at a time is loaded into the data buffer, which is then serially sent out of the memory. If one bit of the data buffer is stuck at a value opposite to the value stored in the memory (e.g. stuck at ‘1’ when ‘0’s are stored in the memory), the very same error will appear at the same location at each page read. Since the pages are represented by horizontal lines of pixels, the errors appear at the same position on each horizontal line of the bitmap, creating a VL of errors.
This failure mechanism can explain the shape and extent of the VLs, but on its own it cannot explain why an erase cycle is necessary for recovery, as the erase action does not affect the data buffer.

2) The control electronics of the failing bit line: If the faulty behavior is not due to the data buffer, it must stem from a malfunction at the bit line level and, in particular, in its control logic, since it is very unlikely that a single particle or multiple particles would upset hundreds of cells at once and in that specific arrangement. For this reason, the failure must concern one of the elements in the column which are involved in the sensing action of the read operation. For example, a particle hit which generates a large amount of charge could trigger a micro-latch-up and trap charge in the bit line access transistor [8]; the concurrent effects would result in partial or total inhibition of access to the bit line. Every access to the column would then be affected, generating a VL. To stop the failure, it is necessary to carry out both a PC, which removes the micro-latch-up, and an erase operation, which restores the access transistor.
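The first mechanism is easy to reproduce in a few lines. The sketch below models a read path in which every page passes through a buffer with one stuck bit; the array sizes are deliberately tiny and the function name and `stuck_at` format are illustrative assumptions, not a description of the actual device logic.

```python
def read_block(cells, stuck_at=None):
    """Read a block page by page through a page buffer with optional stuck bits.

    cells    -- list of pages, each a list of 8-bit words (one per column)
    stuck_at -- dict {(column, bit): value} describing stuck buffer bits
    """
    stuck_at = stuck_at or {}
    output = []
    for page in cells:
        buffer = list(page)  # the page is loaded into the buffer
        for (column, bit), value in stuck_at.items():
            if value:
                buffer[column] |= 1 << bit            # bit forced to '1'
            else:
                buffer[column] &= ~(1 << bit) & 0xFF  # bit forced to '0'
        output.append(buffer)
    return output


PAGES, COLUMNS = 4, 16  # toy sizes; the real block has 128 pages of 8192 columns
solid0 = [[0x00] * COLUMNS for _ in range(PAGES)]
solid1 = [[0xFF] * COLUMNS for _ in range(PAGES)]
fault = {(5, 2): 1}     # bit 2 of column 5 stuck at '1'

bad_columns = {col for page in read_block(solid0, fault)
               for col, word in enumerate(page) if word != 0x00}
print("columns in error with solid '0':", sorted(bad_columns))
print("solid '1' unaffected:", read_block(solid1, fault) == solid1)
```

With a solid ‘0’ pattern the same word error appears in column 5 of every page, which is a vertical line on the bitmap; with a solid ‘1’ pattern the stuck-at-‘1’ bit matches the stored data and no error is visible.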

IV. CONCLUSION

This paper reports on a failure mode in SLC NAND flash memories. Schmidt et al. and Oldham et al. reported comparable Vertical Errors in a study of SEE occurrences in Micron NAND flash [2][9]. However, to the best of the authors’ knowledge, no study has so far reported any detail on this failure mode (dependence on data pattern, persistence, resilience to erase operations or power cycles, and origin of the fault within the device). The origin of the radiation-induced fault leading to this failure mode lies in either the device’s page buffers or its sense amplifiers.

This failure mode has the potential to cause large-scale data corruption, because it affects word columns across entire memory planes, and cannot be resolved by simply overwriting erroneous data. The only possible mitigation strategy for the end user is to cycle power to the device and erase all blocks in the affected plane, which inevitably leads to data loss. While this strategy is impractical for applications where the device cannot be erased, other applications in high-radiation environments which use flash memories for temporary data storage could benefit from regularly “flushing” these devices (performing an erase and power cycle).

REFERENCES
