Single-Event Effects in the Peripheral Circuitry of a Commercial Ferroelectric Random Access Memory


DOI: [10.1109/TNS.2018.27975436](https://doi.org/10.1109/TNS.2018.27975436)

Published: 24 January 2021

Document version: Post-print version (Final draft)


Abstract – This study identifies the failure modes of a commercial 130 nm ferroelectric random-access memory (FRAM). The devices were irradiated with heavy-ion and pulsed focused X-ray beams. Various failure modes are observed, which generate characteristic error patterns, affecting isolated bits, words, groups of pages, and sometimes entire regions of the memory array. The underlying mechanisms are discussed.

Index Terms— Single-Event Effect, Single-Event Upset, SEFI, FRAM, X-ray, heavy ion, static test, dynamic test

I. INTRODUCTION

FERROELECTRIC Random-Access Memories (FRAMs) are a type of memory device, where the binary information is stored in the electric polarity of minute ferroelectric capacitors. When subjected to a sufficient electric field, the ferroelectric material retains electric polarization, until a sufficiently high reverse electric bias is applied. This bistable characteristic makes FRAM memory cells capable of retaining information for extended periods of time, even at high temperatures [1], [2]. This property makes FRAM interesting as an all-purpose technology, in some instances capable of replacing both traditional non-volatile storage memory (i.e. flash) as well as fast, volatile working memories such as static and dynamic random-access memories (SRAMs and DRAMs). Another advantage of this technology is its resilience to radiation. FRAM memory cells exhibit resilience up to total ionizing dose (TID) levels in the Mrad range (limited by the TID response of the access transistor) [3], [4], and are immune to single-event effects (SEEs) [5], [6].

Nevertheless, unlike the memory array, the peripheral circuitry of FRAMs is implemented with traditional complementary metal-oxide-semiconductor (CMOS) technology, and hence it can potentially suffer the same kind of radiation-induced effects that are known to affect CMOS circuits. In particular, CMOS buffers and registers can suffer from single-event upsets (SEUs), which in turn can lead to temporary read/write errors, and even to single-event functional interrupts (SEFIs) of the device. FRAM devices are thus not necessarily radiation-hard, and their radiation sensitivity must be studied before they can be considered safe for use in a radiation environment.

The present study aims at further investigating the radiation-related faults that are due to failures in the peripheral circuits of an FRAM. The chosen device is the FM22L16, a 4 Mbit parallel FRAM from Cypress Semiconductor (previously manufactured by Ramtron Intl.). This component has the largest memory capacity available on the FRAM market. It has been the object of several test campaigns focusing on dose effects as well as SEEs, when exposed to heavy-ion irradiation. Different types of SEEs have been identified from the test data, and their fault mechanisms have been investigated using a pulsed focused X-ray beam. Lastly, the impact of these faults on the device’s failure rate is discussed.

II. EXPERIMENTAL SETUP

The FM22L16 is organized following a two-transistor, two-capacitor-per-bit (2T2C) architecture, and was set in a 16-bit configuration. The FRAM array is organized in 8 blocks, each having 8192 pages, each holding 4 words of 16 bits. Data from five specimens have been used for these experiments; DUTs #1, #2 and #3 are from the same lot, while DUT #4 and DUT #5 come from a second and third lot. DUT #1 was irradiated with a xenon beam at the Grand Accélérateur National d’Ions Lourds (GANIL, Caen, France), and DUTs #2, #3 and #4 were irradiated with several heavy ion species at the Radiation Effects Facility (RADEF, University of Jyväskylä, Finland). In these tests, the DUTs were irradiated as a whole, as the beam profile was wider than the die.

Table 1: Summary of the irradiations performed on the DUTs.

Finally, DUT #5 was irradiated with pulsed focused X-rays, using beamline 20-ID-B at the Advanced Photon Source (APS) (Argonne National Laboratory, Chicago, IL, USA). The X-ray pulses delivered by the beam have a 100 ps full width at half maximum (FWHM) duration and a 1.77 μm × 1.81 μm FWHM spot size. The X-ray energy was set at 8 keV; the attenuation lengths for the most common materials used in IC manufacturing at this photon energy are presented in Table 2. The open-top DUT has about 5 μm of interconnecting and passivation layers above the active silicon region [7]. We can estimate the attenuation caused by these layers to be minor, since they are mainly composed of SiO₂, Al and Cu. Denser materials such as Sn and W are typically used only for the lowest, thinnest interconnect layers and connecting plugs, in small amounts, while TaN is used only as a thin barrier layer between Cu and insulators.

Throughout this campaign, the total pulse energy at the DUT surface was 87 pJ. In [9], a method has been developed to correlate the transients resulting from the collection of charge carriers generated by pulsed X-rays (using the same APS beam line) and heavy ions. Using the equivalence model described in [9], we obtain:

\[ LET_{eq} = \frac{1}{a} \times E_{pulse}(bE_{pulse} + c) \]

\[ LET_{eq} = \frac{1}{0.172} \times 87 \times \left(1.16 \times 10^{-4} \times 87 + 7.40 \times 10^{-2}\right) \approx 43 \text{ MeV} \cdot \text{cm}^2 \cdot \text{mg}^{-1} \]

i.e. an X-ray equivalent LET of 43 MeV cm² mg⁻¹ at the DUT surface. In the following discussion, we assume an overlayer profile of 1.5 μm of Al, 1.5 μm of Cu and 3 μm of SiO₂; this results in about 11% pulse energy absorption between the DUT surface and the active silicon region [8]. The formula from [9] then predicts a 37 MeV cm² mg⁻¹ equivalent LET at the sensitive volume depth.
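
The two equivalent-LET figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the pulse energy is expressed in pJ and that the constants a, b and c are the ones appearing in the numerical expression above; the function and variable names are hypothetical.

```python
# Constants of the equivalence model of [9], as quoted in the text.
A = 0.172
B = 1.16e-4
C = 7.40e-2


def equivalent_let(pulse_energy_pj: float) -> float:
    """Equivalent LET (MeV cm^2 mg^-1) for a pulse energy given in pJ (assumed unit)."""
    return pulse_energy_pj * (B * pulse_energy_pj + C) / A


surface_energy = 87.0      # pJ at the DUT surface (from the text)
absorbed_fraction = 0.11   # overlayer absorption assumed in the text

let_surface = equivalent_let(surface_energy)
let_active = equivalent_let(surface_energy * (1.0 - absorbed_fraction))
print(f"surface: {let_surface:.1f}, active region: {let_active:.1f}")
```

Under these assumptions the two values round to 43 and 37 MeV cm² mg⁻¹, in line with the figures given in the text.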

The attenuation length in silicon is so large (69.6 μm) compared to the typical dimensions of logic gates and register cells (a few square micrometers) that for our purposes, we can consider the beam unattenuated once it reaches the silicon, generating charge carriers in a long vertical column.

Several regions of the die have been selectively irradiated, to identify the failure modes triggered by specific circuits. These regions included either memory cells or parts of the central spine (a region of the die containing peripheral circuitry, running across the memory array).

The irradiations performed on the DUTs are summarized in Table 1.

All five DUTs were tested in both static and dynamic modes, with an FPGA-based memory controller developed in-house. In static mode, data is written to the memory, which is subsequently irradiated; during irradiation, the peripheral circuits are idle. The data is read back after the irradiation and checked for errors. In dynamic mode, the memory controller continuously performs March test algorithms on the DUT. A March algorithm includes several elements, and each element consists of one or more read and/or write operations. During execution, the first element is applied to each address in the array, one after the other. Then, the next element is performed on each address as well, and so on, until all elements have been applied. The whole process repeats indefinitely until the user stops the test run. Table 3 summarizes the March algorithms most commonly used during our tests; parentheses separate elements, commas separate operations, and arrows indicate the direction in which the address space is scanned by the element.
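
The element-by-element execution described above can be sketched as follows. This is a minimal illustration and not the authors' FPGA controller: the algorithm shown is the textbook MATS+ sequence, used as a stand-in because the definitions of Table 3 are not reproduced here, and the `StuckBitMemory` fault model is purely hypothetical.

```python
WORD_MASK = 0xFFFF  # 16-bit words

# Textbook MATS+ (stand-in, NOT the paper's Table 3 definitions).
# Each element is (direction, operations); +1 scans upwards, -1 downwards.
# An operation is ("w", bit) to write or ("r", bit) to read and compare.
MATS_PLUS = [
    (+1, [("w", 0)]),
    (+1, [("r", 0), ("w", 1)]),
    (-1, [("r", 1), ("w", 0)]),
]


def run_march(memory, elements, n_words):
    """Apply each element to every address in turn; log failed reads."""
    errors = []
    for element_index, (direction, operations) in enumerate(elements):
        if direction > 0:
            addresses = range(n_words)
        else:
            addresses = range(n_words - 1, -1, -1)
        for address in addresses:
            for kind, bit in operations:
                pattern = WORD_MASK if bit else 0
                if kind == "w":
                    memory[address] = pattern
                else:
                    difference = memory[address] ^ pattern
                    if difference:
                        errors.append((address, element_index, difference))
    return errors


class StuckBitMemory(list):
    """Hypothetical fault model: bit 3 of word 5 always reads as 1."""

    def __getitem__(self, address):
        value = super().__getitem__(address)
        return value | 0x0008 if address == 5 else value


healthy_errors = run_march([0] * 16, MATS_PLUS, 16)
faulty_errors = run_march(StuckBitMemory([0] * 16), MATS_PLUS, 16)
```

In this toy run the healthy memory reports no errors, while the faulty one reports a single failed r0 at address 5 during the second element; the r1 of the third element passes because the stuck bit matches the expected value.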
Dynamic tests were sometimes carried out in a natural order (the algorithm moved from address to address by simply increasing or decreasing the address vector), and sometimes in other, more complex modes. The addressing order can be determined in a pseudorandom mode with a Linear Feedback Shift Register (LFSR), or using a Gray code (one bit toggling at every address change) or an anti-Gray code counting pattern (like Gray, but every other address is complemented so that all bits but one toggle at every address change). This allowed different levels of stress to be induced on the device’s periphery (in particular on the address decoders and registers).
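
The Gray and anti-Gray orders can be illustrated with a short sketch. It follows the description above literally (complementing every other Gray codeword); the function names are hypothetical, and the controller's actual implementation is not known. Note that this construction visits every address exactly once only when the address width is even, which is the case for the 18-bit word address space implied by the array organization (8 blocks × 8192 pages × 4 words).

```python
def gray(i: int) -> int:
    """i-th codeword of the reflected binary Gray code."""
    return i ^ (i >> 1)


def anti_gray(i: int, n_bits: int) -> int:
    """Gray codeword, complemented on every other step (odd i)."""
    mask = (1 << n_bits) - 1
    codeword = gray(i)
    return codeword ^ mask if i & 1 else codeword


def toggled_bits(a: int, b: int) -> int:
    """Number of address bits that change between two accesses."""
    return bin(a ^ b).count("1")


N_BITS = 4  # small width for illustration
gray_sequence = [gray(i) for i in range(1 << N_BITS)]
anti_gray_sequence = [anti_gray(i, N_BITS) for i in range(1 << N_BITS)]

gray_toggles = {toggled_bits(a, b) for a, b in zip(gray_sequence, gray_sequence[1:])}
anti_toggles = {toggled_bits(a, b) for a, b in zip(anti_gray_sequence, anti_gray_sequence[1:])}
```

With a 4-bit address, `gray_toggles` contains only 1 and `anti_toggles` contains only 3, i.e. all bits but one toggle at every anti-Gray address change.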

III. DATA PROCESSING

The readback data was processed to generate logical bitmaps, which are images where every pixel represents a memory cell. If the cell has suffered no upset during the test, the pixel is black; otherwise it is colored. On the logical bitmaps, the words are arranged as a function of their logical address: the four words of the first page (addresses 0x0000 to 0x0003) are displayed next to each other (1 × 64 pixels), then the next four words (next page) below, and so on. The resulting 64 × 65,536 pixel image (one row per page) is rearranged as a square image for ease of display, with the first band on the left edge and the last band on the right edge. These logical bitmaps help the identification of the fault mechanisms: neighboring words have closely related addresses (their addresses share many identical bits), and so are likely to share peripheral resources (higher-level address decoders, buffers, bit/word lines, etc.).
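
The address-to-pixel mapping described above can be sketched as follows. Two points are assumptions rather than statements from the text: the band height of 2048 pages (inferred from a square 2048 × 2048 image for a 4 Mbit array of 65,536 pages of 64 bits), and the placement of bit 0 on the left of each word. The function name is hypothetical.

```python
WORDS_PER_PAGE = 4
BITS_PER_WORD = 16
PAGE_WIDTH = WORDS_PER_PAGE * BITS_PER_WORD  # 64 pixels per page row
BAND_HEIGHT = 2048                            # assumed: 2048 x 2048 square image


def logical_pixel(word_address: int, bit: int) -> tuple:
    """Return (x, y) of one memory cell on the square logical bitmap."""
    page, word_in_page = divmod(word_address, WORDS_PER_PAGE)
    band, y = divmod(page, BAND_HEIGHT)       # bands are laid out left to right
    x = band * PAGE_WIDTH + word_in_page * BITS_PER_WORD + bit
    return x, y


first_cell = logical_pixel(0x00000, 0)
second_page = logical_pixel(0x00004, 0)
second_band = logical_pixel(BAND_HEIGHT * WORDS_PER_PAGE, 0)
last_cell = logical_pixel(0x3FFFF, 15)
```

Under these assumptions, words whose addresses differ only in their low-order bits land on adjacent rows of the same band, which is what makes shared peripheral resources visible as geometric patterns.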

When the tests were not carried out in a natural order, the data were also arranged as chronological bitmaps, on which the words are placed in the order of access during the test. On chronological bitmaps, it is easy to identify SEFIs, which generate bursts of errors, because they appear as coherent colored blocks.

For static tests and dynamic tests carried out in a natural order, the logical and chronological bitmaps are equivalent.

All bitmaps only display one entry per word (16 pixels per 16-bit word). This means that on bitmaps from dynamic test data, each pixel contains the information from all the successive read operations performed on the corresponding cell. If a pixel is colored, it means that it suffered at least one upset during the test, but it is not possible to tell how many upsets occurred from the bitmap.
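
The loss of upset counts can be made concrete with a small sketch of how such a per-word error mask could be accumulated. The record format `(address, expected, read)` and the function name are hypothetical; the text does not describe the actual processing scripts.

```python
def accumulate_upsets(read_log, n_words):
    """OR together the bit errors of every read performed on each word."""
    error_mask = [0] * n_words
    for address, expected, read in read_log:
        error_mask[address] |= (expected ^ read) & 0xFFFF
    return error_mask


# Word 2 fails twice on the same bit and once not at all; word 3 fails once.
read_log = [
    (2, 0x0000, 0x0001),
    (2, 0xFFFF, 0xFFFE),
    (2, 0x0000, 0x0000),
    (3, 0x0000, 0x8000),
]
mask = accumulate_upsets(read_log, 4)
```

Here the two separate upsets on bit 0 of word 2 collapse into a single colored pixel, so the bitmap alone cannot tell them apart from one upset.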

Horizontal divisions (every 256 lines) and vertical divisions (every 64 columns, i.e. one page width) are displayed on the bitmaps to ease their interpretation. These separation lines divide a bitmap into 256 bitmap sectors; the height of these sectors matches the height of some error cluster types (e.g. type 4; see Figure 3).

The color code on the bitmaps visually associates errors which were detected on the same read cycle.

IV. EXPERIMENTAL RESULTS

The data presented in this section originates exclusively from test runs where no Single-Event Latch-up (SEL) occurred.

When irradiated with xenon ions (LET of 60 MeV cm² mg⁻¹) while not being powered, DUT #2 suffered no data corruption, which confirms that the FRAM memory cell itself is immune to SEE when not biased. The memory cells of DUT #5 did not suffer any upset, either in static or dynamic mode, when irradiated under bias with pulsed X-rays. Using this data and the conclusions of study [6], which was done on a closely related device from the same manufacturer, we can assume that the FRAM cells are immune to SEE. For the rest of this study, we therefore assume that the observed SEEs originate from the device’s peripheral circuitry.

Figures 1 and 2 exhibit logical/chronological bitmaps from the results of two dynamic heavy-ion irradiation tests, on which different failure types can be observed. They can be classified into several categories, which have been numbered by increasing order of importance:

- Type 1 (Figure 1): 1-bit failures. These events can be isolated (type 1a), but sometimes several 1-bit failures can occur at different times at related addresses (sharing many bits) or within the same page (type 1b). Type 1 events were observed on all test campaigns, during both static and dynamic tests.

- Type 2 (Figure 1): several bits in one word are upset at once. The word is either partially corrupted (type 2a) or completely corrupted (type 2b). Type 2 events were observed on all test campaigns, during both static and dynamic tests.

- Type 3 (Figure 3): several pages which have the same page number (and so appear at the same height within their logical bitmap sectors) exhibit large numbers of upsets affecting several words. Type 3 errors were only observed on heavy-ion campaigns, and only on dynamic tests.

- Type 4 (Figure 3): one particular bit of every page within a logic sector suffers either intermittent errors (type 4a) or continuous errors (type 4b), resulting in an interrupted or continuous vertical line on the logical bitmap, respectively. In addition, sparse single-bit upsets (SBUs) may occur randomly within the affected sector. Type 4 events were observed on all test campaigns, mostly on dynamic tests.

- Type 5: the chronological bitmap shown in Figure 4 was gathered during an anti-Gray Dynamic Stress test on DUT #4. It exhibits, among type 1 and 2 events, two small blocks of errors in the top left corner; each block is made of 37 completely upset words. A closer examination of the data logs reveals that each of these 76 addresses actually returned errors on several occasions, during two consecutive element scans of the Dynamic Stress algorithm. During the first element scan, after the w0 operation, the five consecutive r0 operations all failed on each of these addresses; then, on the next element scan, the first operation, r0, failed on all these addresses. Subsequent accesses to these memory locations returned no errors for the rest of the test.

The logical bitmap for this test run is available on Figure 5. This figure shows how all the errors visible on the chronological bitmap in Figure 4 have closely related addresses (they are close to each other on the logical bitmap).

Type 5 failures are rare: they were only reported once, during this heavy-ion test on DUT #4.

- Type 6 (Figures 1, 2 and 7): several hundred consecutively-accessed words are either completely upset (type 6a), or completely upset except for a few occasional bits (type 6b). The colored blocks appearing in Figures 1 and 2 are type 6a events. The number of words affected by type 6 failures seems to be directly influenced by the type of dynamic test, and more precisely, by the speed at which the algorithm scans across the address space. Figure 1 exhibits several type 6a events, each affecting about 350 words. The data for this figure was gathered during an mMats+ test; the elements of this algorithm contain two operations each. The data used for Figure 2 was gathered on the same DUT under identical conditions, except that the test algorithm was Dynamic Classic, whose elements only contain one operation, meaning that the Dynamic Classic algorithm scans addresses faster. Figure 2 also exhibits several type 6a events, but in this case each event affects about 770 words. This correlation between algorithm scanning speed and type 6 event severity was verified on tens of different test runs; it indicates that type 6 events last for a constant amount of time (or a constant number of I/O operations). Type 6 events were observed only on heavy-ion campaigns, and only on dynamic tests.

- Type 7 (Figure 6): several thousands to tens of thousands of consecutively-accessed words are affected with a high density of random upsets, generating hundreds of thousands to millions of upsets. The device may eventually recover from the condition spontaneously. The type 7 event visible in Figure 6 comes from the logical/chronological bitmap of a pulsed X-ray test on DUT #5 at APS. The beam scanned a region of the central peripheral spine, while a natural-order mMats+ dynamic test was performed. This type of SEFI also occurred during heavy-ion dynamic testing.

- Type 8 (Figure 7): several thousands to tens of thousands of words are either entirely, or almost entirely corrupted; these words all have a few address bits in common. This is evidenced by the fact that on a logical bitmap, the errors generated by type 8 events fill up entire binary subdivisions of the bitmap: either the whole bitmap, or one half, or one or more quarters or eighths, etc. An example is visible in Figure 7, where a type 8 event takes up a whole eighth of the bitmap. Since this is a logical bitmap from a natural-addressing test, it means that the type 8 event started as the third most-significant address bit toggled from 0 to 1, and ended as soon as it toggled back to 0. This type of event was recorded on all heavy-ion test campaigns, but only on dynamic tests. Type 8 events were also detected on dynamic tests where the addressing was not natural, for example anti-Gray. This means that the errors which appear during a type 8 event are not necessarily accessed consecutively.
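
The "binary subdivision" signature of type 8 events can be tested for programmatically. The sketch below is a simplified illustration, not the analysis used in this study: it only recognizes a single, completely filled, aligned power-of-two block of word addresses, whereas real type 8 events may be almost entirely corrupted or span several blocks. The function name and the 18-bit address width (derived from the array organization) are assumptions.

```python
def aligned_block(addresses, n_bits=18):
    """Return (number of fixed high address bits, their value) if the
    addresses exactly fill one aligned power-of-two block, else None."""
    unique = set(addresses)
    size = len(unique)
    if size == 0 or size & (size - 1):
        return None                      # not a power of two
    lowest = min(unique)
    if lowest % size or max(unique) != lowest + size - 1:
        return None                      # not aligned or not contiguous
    free_bits = size.bit_length() - 1    # low-order bits that vary freely
    return n_bits - free_bits, lowest >> free_bits


EIGHTH = (1 << 18) // 8                  # 32,768 words
one_eighth = aligned_block(range(5 * EIGHTH, 6 * EIGHTH))
burst = aligned_block(range(100, 450))   # 350 consecutive words
```

In this example `one_eighth` evaluates to `(3, 5)`: the three most-significant address bits are common to all failing words and equal to binary 101. An arbitrary burst such as `burst` is rejected.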

Several noteworthy events occurred during static heavy-ion tests on DUT #2. The memory was written with a known data pattern (every word contains the lower 16 bits of its address vector) and irradiated under bias, then read back. On a few occasions, with iron, krypton and xenon beams, the readback data contained a few words with erroneous data (type 2a errors). These errors can be considered permanent, since subsequent readbacks returned the same errors. However, they disappeared after power cycling the DUT.

Similar events occurred during a static test on DUT #5, with the X-ray beam aimed at the central spine. The device was written with a known data pattern, then irradiated. When read back after the irradiation, two type 2a events were detected at unrelated addresses. These two words were marked by overwriting them with a specific data pattern (0xABCD), after which the device was power cycled and read back again: the readback data were correct at all addresses, except for the two words which previously underwent a type 2a event (they did not contain 0xABCD anymore). These two words were written with 0xABCD again, the DUT was power cycled again, after which the memory performed as expected.

Table 4 gives the threshold equivalent LET and observed maximum device cross-section for each failure category, for dynamic mode and for static mode. Dashes indicate the failure categories which were not encountered in static tests.

<table>
<thead>
<tr>
<th>SEFI type</th>
<th>Static $\text{LET}_{\text{th}}$ (MeV cm$^2$ mg$^{-1}$)</th>
<th>Static max. XS (cm$^2$)</th>
<th>Dynamic $\text{LET}_{\text{th}}$ (MeV cm$^2$ mg$^{-1}$)</th>
<th>Dynamic max. XS (cm$^2$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>$\leq 1.8$</td>
<td>$5.9 \times 10^{-4}$</td>
<td>$\leq 1.8$</td>
<td>$1.4 \times 10^{-4}$</td>
</tr>
<tr>
<td>2</td>
<td>$\leq 1.8$</td>
<td>$8.2 \times 10^{-4}$</td>
<td>$\leq 1.8$</td>
<td>$4 \times 10^{-4}$</td>
</tr>
<tr>
<td>3</td>
<td>-</td>
<td>-</td>
<td>$\leq 1.8$</td>
<td>$6 \times 10^{-4}$</td>
</tr>
<tr>
<td>4</td>
<td>$2.5&lt;\text{LET}_{\text{th}} \leq 3.6$</td>
<td>$1 \times 10^{-7}$</td>
<td>$\leq 1.8$</td>
<td>$1 \times 10^{-5}$</td>
</tr>
<tr>
<td>5</td>
<td>-</td>
<td>-</td>
<td>$10.1&lt;\text{LET}_{\text{th}} \leq 11.7$</td>
<td>$2 \times 10^{-7}$</td>
</tr>
<tr>
<td>6</td>
<td>-</td>
<td>-</td>
<td>$\leq 1.8$</td>
<td>$6.3 \times 10^{-5}$</td>
</tr>
<tr>
<td>7</td>
<td>-</td>
<td>-</td>
<td>$\leq 1.8$</td>
<td>$3 \times 10^{-7}$</td>
</tr>
<tr>
<td>8</td>
<td>-</td>
<td>-</td>
<td>$11.7&lt;\text{LET}_{\text{th}} \leq 18.5$</td>
<td>$2 \times 10^{-6}$</td>
</tr>
</tbody>
</table>

Table 4: Threshold equivalent LET and maximum measured cross-sections for each failure category, in static and in dynamic mode.

V. DISCUSSION

These different failure modes suggest the occurrence of faults in several different elements of the peripheral circuitry.

Type 1 and 2 SEEs were detected both in static and dynamic modes, both during heavy-ion testing and X-ray periphery attacks, but never during X-ray FRAM cell attacks, thus their origin must lie in the peripheral circuitry. These events never occurred when the memory was irradiated in a powered-off state. It was observed that:

- If a word is corrupted by a type 2 SEE, and no write operations are carried out on it, then the error remains until the next power cycle, after which the error disappears;
- If a word is corrupted by a type 2 SEE, and is then overwritten with arbitrary data, then this word behaves normally until the next power cycle. At startup, this word will contain different data than the data it contained before the power cycle.

This suggests that the expected data is written at the wrong place after a type 2 failure, and that the element of the periphery which is upset by radiation is restored to its correct state during device power-on boot. The authors identified one potential cause of these SEEs to be upsets occurring in SRAM-based redundancy registers, whose purpose is the reallocation of faulty memory elements (rows, columns, blocks) to spare elements within the memory array. Upsets in such registers remain latched until the registers are reinitialized to their correct value. These registers are always reloaded with correct values at device power-on.
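
The redundancy-register hypothesis can be illustrated with a toy model that reproduces both observations listed above. Everything in this sketch is hypothetical (class name, spare contents, the way the upset is injected); it is not a description of the FM22L16's internal design, only a demonstration that a volatile remap latch reloaded at power-on would behave as observed.

```python
class ToyRemappedMemory:
    """Toy model: a volatile remap latch reloaded from a non-volatile map."""

    def __init__(self, n_words, n_spares):
        self.main = [0] * n_words
        self.spare = [0x5A5A] * n_spares    # spares hold arbitrary stale data
        self.nonvolatile_remap = {}         # factory repair map (empty here)
        self.remap = dict(self.nonvolatile_remap)

    def _locate(self, address):
        if address in self.remap:
            return self.spare, self.remap[address]
        return self.main, address

    def read(self, address):
        bank, index = self._locate(address)
        return bank[index]

    def write(self, address, value):
        bank, index = self._locate(address)
        bank[index] = value

    def upset_remap(self, address, spare_index):
        """Radiation-induced upset: the address now points to a spare."""
        self.remap[address] = spare_index

    def power_cycle(self):
        """Registers are reloaded with their correct values at power-on."""
        self.remap = dict(self.nonvolatile_remap)


memory = ToyRemappedMemory(n_words=16, n_spares=2)
for address in range(16):
    memory.write(address, address)          # address pattern, as in the static tests

# Observation 1: the error persists until the next power cycle, then vanishes.
memory.upset_remap(7, 0)
error_before = memory.read(7)
memory.power_cycle()
value_after_cycle = memory.read(7)

# Observation 2: an overwritten word behaves normally, then changes at startup.
memory.upset_remap(7, 0)
memory.write(7, 0xABCD)
value_while_upset = memory.read(7)
memory.power_cycle()
value_at_startup = memory.read(7)
```

In this model the marker 0xABCD is written to the spare element, so it disappears once the latch is reloaded and the address points to its original cells again.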

Type 4 SEE: the facts that this type of event occurred during X-ray periphery testing, and that most of the errors generated occur at the same bit of the same word within their page, suggest two possible fault mechanisms. The first hypothesis is an upset of a redundancy register. Such an upset could reallocate a functional column to a spare column (which is not correctly initialized), creating a continuous type 4b event; alternatively, it could reallocate a spare column to a malfunctioning column (not supposed to be used), or a functioning column to a malfunctioning spare (not supposed to be used), resulting in an intermittent type 4a error. The second hypothesis is the occurrence of a micro-latchup event or a stuck bit in a page buffer. Such events induce metastability in the buffer cells, which would explain the seemingly random errors occurring concurrently with the “vertical lines” in the rest of the page buffer positions during type 4 events (see Figure 3).

SEE type 3: these events could have similar origins to those of type 4 events. Since the affected pages share similar page numbers, they could all be part of a single memory row which was reallocated to a spare row. Another possibility would be that an element common to these pages (e.g. a low-level address decoder) was disturbed during the test.

SEE type 5: the addresses involved in this event started returning all-corrupted words after a w0 operation. For each of these addresses, several read operations spread over two scanning cycles returned the same result, until their cells were eventually rewritten. This failure can be explained by a temporary stuck address bit. Typically, during an access to the memory, the value input on the memory’s address pins is loaded into an address buffer. If, under the effect of radiation, one or more bits from the buffer get stuck, then the requested operation will be performed at the wrong memory location. This hypothesis is supported by the chronological bitmap, which indicates that during the event, in chronological order, every other address accessed failed. This is consistent with the fact that all address bits but one toggle from one access to the next in anti-Gray addressing mode: the stuck bit fault can only trigger errors on every other position accessed.
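
The "every other access" argument can be checked with a short simulation. This is a sketch under stated assumptions: the anti-Gray order is built as described in Section II (every other Gray codeword complemented), the address is 18 bits wide, and the choice of the stuck bit (bit 17) and its stuck value (0) is purely hypothetical. The sketch only shows which accesses are redirected to a wrong location; whether a redirected access produces a visible read error depends on the stored data and is not modelled.

```python
N_BITS = 18        # word address width (8 blocks x 8192 pages x 4 words)
STUCK_BIT = 17     # hypothetical: most-significant address bit
STUCK_VALUE = 0    # hypothetical: stuck at 0


def gray(i: int) -> int:
    return i ^ (i >> 1)


def anti_gray(i: int, n_bits: int = N_BITS) -> int:
    """Gray codeword, complemented on every other step (odd i)."""
    mask = (1 << n_bits) - 1
    codeword = gray(i)
    return codeword ^ mask if i & 1 else codeword


def effective_address(requested: int) -> int:
    """Address actually decoded when one buffer bit is stuck."""
    bit = 1 << STUCK_BIT
    return (requested | bit) if STUCK_VALUE else (requested & ~bit)


requested = [anti_gray(i) for i in range(20)]
misdirected = [effective_address(a) != a for a in requested]
```

For this window of accesses `misdirected` alternates strictly between False and True, matching the pattern reported on the chronological bitmap. For a lower-order stuck bit the alternation would occasionally change phase, each time the underlying Gray code toggles that bit.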

Another explanation for this event could be a failure of the write operation of the first element of the algorithm. As indicated by Figure 5, all the words involved in this event have related addresses, which means that there is a high probability that they share common read/write control circuits. It is possible that locally, the peripheral elements required for write operations were temporarily disabled by an ion strike. This hypothesis is supported by the fact that no other large group of errors is visible on the logical bitmap.

Type 7 events are large-scale functional interrupts which do not affect an “even” amount of words (a power of 2), seemingly start and stop at random address positions, and trigger a pseudorandom output. They could originate in an upset of device configuration registers, or in a micro-latch-up affecting peripheral elements. Micro-latch-up conditions have been shown to disappear spontaneously in CMOS devices, when the high voltage lines sustaining them are switched off as part of normal device activity [11].

Type 8 events are large-scale failures which affect an “even” amount of words (a power of 2). They begin and end when certain address bits toggle; since each address bit controls one level of address decoding, type 8 events must be “mapped” on the memory array. For example, the type 8 event visible on Figure 7 affected exactly one eighth of the memory array; since the memory array is organized in eight blocks, one possibility is that a radiation-induced upset in a configuration register disabled a critical element in one of the eight memory blocks – and that subsequent accesses to this memory block returned an erroneous value. Since three address bits are used to select blocks, the type 8 event started when the lowest-level block-selecting bit toggled, and ended when another block was selected at the next toggle. Possible origins for these events could be upsets in configuration registers (e.g. controlling power switches feeding memory blocks).

VI. CONCLUSION

This study shows that the SEEs occurring in the FM22L16 FRAM device come in several types, with different root causes, magnitudes and severity. The detected failure types may involve either individual bits, isolated words, groups of pages, 1-bit-wide columns, entire regions of the memory array, or a variety of SEFIs generating errors for an arbitrary duration. All these SEEs can be considered to originate in the peripheral circuitry, as also suggested by previous studies [7], [12]; possible origins for some of these failures include internal redundancy and control registers. However, experimental data show that at least some categories of SEE, notably single-word (type 2) errors, can be avoided by forcing a reset of the involved peripheral elements via power cycling the DUT before access (and possibly by bringing the device out of sleep mode). This has major implications regarding the device’s radiation sensitivity, since type 2 events are by far the most frequently encountered. Many applications using the device as a storage memory could easily implement systematic power cycling before device access as an error mitigation technique.

The results of this study suggest that hardening key elements of the peripheral circuitry of a memory device (e.g. implementing the registers with additional transistors [13] or a dual-interlocked cell architecture [14]) could effectively mitigate the most common failure modes. This would dramatically reduce the failure rate of the device, at the expense of a small increase in the area of the peripheral circuitry.

ACKNOWLEDGEMENTS

The authors would like to thank Stephen Buchner (U.S. Naval Research Laboratory) and Erik C. Dillingham (Aerospace Corp.) for their assistance in organizing and conducting the test campaign at Argonne National Laboratory.

REFERENCES


DOI: 10.1109/REDW.2015.7336734


DOI: 10.1109/REDW.2008.10
