

## Multiple-Cell-Upsets on a commercial 90nm SRAM in Dynamic Mode

Georgios Tsiligiannis, Luigi Dilillo, Alberto Bosio, Patrick Girard, Serge Pravossoudovitch, Aida Todri-Sanial, Arnaud Virazel, Christopher Frost, Frédéric Wrobel, Frédéric Saigné

### To cite this version:

Georgios Tsiligiannis, Luigi Dilillo, Alberto Bosio, Patrick Girard, Serge Pravossoudovitch, et al. Multiple-Cell-Upsets on a commercial 90nm SRAM in Dynamic Mode. RADECS: Radiation and Its Effects on Components and Systems, Sep 2013, Oxford, United Kingdom. pp.1-4, 10.1109/RADECS.2013.6937429. lirmm-00839062

## HAL Id: lirmm-00839062 https://hal-lirmm.ccsd.cnrs.fr/lirmm-00839062v1

Submitted on 26 Jun 2016


# Multiple-Cell-Upsets on a commercial 90nm SRAM in Dynamic Mode\*

#### G. Tsiligiannis, L. Dilillo, A. Bosio, P. Girard, S. Pravossoudovitch, A. Todri, A. Virazel, C. Frost, F. Wrobel and F. Saigné

Abstract— Device downscaling increases the Multiple-Cell-Upset (MCU) cross section of SRAMs, making MCUs an important threat to system robustness. In this paper we present different types of MCUs recorded during atmospheric-neutron irradiation experiments in which a commercial 90nm SRAM was tested in dynamic mode. This study shows that, when the memory is in dynamic mode, not only typical MCUs involving a few flipped cells may appear, but also large clusters of upsets affecting hundreds of cells. We identify different MCU patterns and categorize them according to their shapes and sizes.

*Index Terms*—Multiple Cell Upsets (MCU), neutron irradiation, SRAM, dynamic mode

#### I. INTRODUCTION

The effects of neutron-induced upsets on ICs have been studied extensively for over two decades. Until recently, Single Event Upsets (SEUs), in which a single bit cell of the memory is affected, were considered the main source of neutron-induced errors. However, with device downscaling, Multiple Cell Upsets (MCUs) have started to appear more often, significantly affecting IC robustness. An MCU occurs when a single impinging particle upsets not only the struck cell but also neighboring cells. MCUs are important because they may result in Multiple Bit Upsets (MBUs). Depending on the mapping of the memory bit cells, an MCU may include upset cells belonging to the same word, a case that is extremely difficult to mitigate with Error Correction Codes (ECC) at the system level.

Many studies investigating MCUs exist in the literature, most of them focusing on MCUs in SRAMs at both the simulation and the experimental level.

G. Tsiligiannis, L. Dilillo, A. Bosio, P. Girard, S. Pravossoudovitch, A. Todri and A. Virazel are with the Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier (LIRMM) Universite de Montpellier II/CNRS, 161 rue Ada - 34095 Montpellier Cedex 5, France, (phone: +33(0)467418526, fax: +33 (0)467418500, email: {tsiligiann, dilillo, bosio, girard, Serge.Pravossoudovitch, todri, virazel}@lirmm.fr)

C. Frost is with the Rutherford Appleton Laboratory, Harwell Oxford Didcot OX11 0QX, tel: +44 (0) 1235 445296, email: christopher.frost@stfc.ac.uk

F. Wrobel and F. Saigné are with the Institut d'Electronique du Sud, Universite Montpellier II / CNRS, UMR-CNRS 5214, Place Eugene Bataillon - 34095 Montpellier Cedex 5, France (email: {frederic.wrobel, frederic.saigne}@ies.univ-montp2.fr).

The failure mechanism of MCUs is described as follows: a particle strike generates a parasitic current in a memory cell, inducing an upset (SEU). If it is large enough, the current can propagate to neighboring cells and upset them as well, resulting in an MCU. When more than one of the affected cells belongs to the same word, the result is an MBU. At the simulation level, [1] shows that the proton-induced MCU cross section is driven, among other factors, by the downscaling of the sensitive nodes (cells). In [2], the SEU and MCU cross sections are calculated as a function of the charge deposited on the SRAM cells. [3] explains the role of the triple well in the increase of MCU frequency, due to the amplification of the collected charge.
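The SEU/MCU/MBU distinction above can be stated as a simple rule on the cells upset by one particle. The following Python sketch assumes each upset is already expressed as a logical `(word_address, bit_index)` pair; the function name and input format are hypothetical, and the physical-to-logical mapping of a real device is not known here. MBU is reported as the more specific sub-case of an MCU.

```python
from collections import Counter
from typing import Iterable, Tuple

Upset = Tuple[int, int]  # hypothetical logical form: (word_address, bit_index)


def classify_event(upsets: Iterable[Upset]) -> str:
    """Label the cells upset by ONE particle as 'SEU', 'MCU' or 'MBU'.

    'MBU' is reported for the MCU sub-case in which at least two upset
    cells belong to the same word.
    """
    cells = set(upsets)
    if not cells:
        raise ValueError("an event must contain at least one upset cell")
    if len(cells) == 1:
        return "SEU"
    bits_per_word = Counter(word for word, _bit in cells)
    return "MBU" if max(bits_per_word.values()) >= 2 else "MCU"


print(classify_event([(10, 3)]))                    # SEU
print(classify_event([(10, 3), (11, 3)]))           # MCU
print(classify_event([(10, 3), (10, 4), (12, 0)]))  # MBU
```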

Besides the work done at the simulation level, several studies have confirmed the existence of MCUs at the experimental level. [4] presents an extensive study in which MCUs are categorized according to their shape and size. The reported MCUs were obtained by irradiating an SRAM with different neutron energies while the memory was in retention (static) mode. In [5], micro Single Event Latchups (micro-SELs) were recorded in the form of large clusters of upsets. In [6], MCUs in the form of large clusters of upsets were also observed while a 90nm SRAM was irradiated with neutrons and read back during the irradiation run.

This work presents an analytical study of MCU clusters observed during neutron irradiation experiments on an SRAM; to the best of our knowledge, similar clusters have not been observed before. The MCUs are categorized according to their shapes, sizes and source of occurrence. The experiments were performed under an atmospheric-like neutron beam at the ISIS facility in the UK [7]. The memories were tested in dynamic mode during the irradiation run and, as will be shown, this is the main reason such large clusters of upsets were observed.

The rest of the paper is organized as follows: Section II briefly presents the experimental setup and conditions. Section III presents the MCUs recorded during the irradiation experiments and explains their sources, and Section IV concludes the paper.

#### II. EXPERIMENTAL SETUP AND CONDITIONS

The memory used during the experiments is a commercial 8-word, 32Mbit 90nm SRAM. A Finite State Machine (FSM) implemented in an FPGA executes the different tests applied to the memory. We applied tests with the memory in both static and dynamic mode and under different operating conditions.

<sup>\*</sup>This work has been funded by the French "Agence National pour la Recherche" (ANR) under the framework of the HAMLET project (n° ANR-09-BLAN-0155-01)

For dynamic-mode testing, different March algorithms [8],[9],[10] were used, as described in our previous work [11], while static-mode testing was performed using a checkerboard sequence. When the memories operate in dynamic mode, the sequences of read and write operations defined by each March test are applied repeatedly. During each read operation, the value read from each cell is compared with the expected value. When a faulty value is detected, the FPGA transmits the word containing the corrupted cell to a computer, along with its address and the timestamp of the event. In static mode, the memory is read back every 30 seconds and all recorded upsets are transmitted.
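The dynamic-mode procedure can be modeled in software as follows. This is a minimal sketch, not the FPGA implementation: it assumes a word-organized memory held in a Python list, an 8-bit word width and solid all-0/all-1 data backgrounds, and it encodes the textbook MATS+ and March C- element sequences. The Dynamic Stress test [8] is not encoded because its sequence is not given here. `run_march`, `ErrorRecord` and the `upset_hook` used to inject an upset are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A March element is (address order, operations); each operation is ("r" | "w", data bit).
# "up" = ascending addresses, "down" = descending, "any" = either (ascending is used here).
MarchElement = Tuple[str, Sequence[Tuple[str, int]]]

MATS_PLUS: List[MarchElement] = [
    ("any", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
]

MARCH_C_MINUS: List[MarchElement] = [
    ("any", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("any", [("r", 0)]),
]


@dataclass
class ErrorRecord:
    address: int      # word address of the faulty read
    observed: int     # word actually read
    expected: int     # word the March element expected
    timestamp: float  # time of detection


def run_march(
    memory: List[int],
    elements: Sequence[MarchElement],
    word_bits: int = 8,
    upset_hook: Optional[Callable[[int, List[int]], None]] = None,
    clock: Callable[[], float] = time.monotonic,
) -> List[ErrorRecord]:
    """Apply one pass of a March test to a simulated memory and log every faulty read."""
    all_ones = (1 << word_bits) - 1
    log: List[ErrorRecord] = []
    n = len(memory)
    for index, (order, ops) in enumerate(elements):
        addresses = range(n - 1, -1, -1) if order == "down" else range(n)
        for addr in addresses:
            for op, bit in ops:
                pattern = all_ones if bit else 0  # solid 0 or solid 1 data background
                if op == "w":
                    memory[addr] = pattern
                elif op == "r" and memory[addr] != pattern:
                    # Mismatch: keep the word, its address and a timestamp.
                    log.append(ErrorRecord(addr, memory[addr], pattern, clock()))
        if upset_hook is not None:
            upset_hook(index, memory)  # lets a test inject upsets between elements
    return log


def inject(index: int, memory: List[int]) -> None:
    if index == 0:            # right after the initial w0 element
        memory[5] ^= 1 << 3   # hypothetical single-bit upset in word 5


errors = run_march([0] * 16, MATS_PLUS, upset_hook=inject, clock=lambda: 0.0)
print(errors)  # [ErrorRecord(address=5, observed=8, expected=0, timestamp=0.0)]
```

The upset is caught by the first read of the next element and then overwritten by the following write, which mirrors how a repeated March test keeps detecting new upsets during the irradiation run.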

The memories were irradiated under the atmospheric-like neutron beam of the ISIS facility in the UK. The emitted neutrons had energies in the range of 10-800MeV. Experiments were performed on different devices, under different operating conditions and in different irradiation runs, so that the results would reflect the technology and architecture of the SRAM rather than a specific device. TABLE I shows the total number of single events recorded for each applied test, the corresponding Soft Error Rates (SER) and the conditions under which each test took place. The SER was calculated by counting each MCU as a single event caused by a single particle (i.e., faulty cells belonging to an MCU cluster were not counted separately). Finally, according to the log files provided by the facility, a relative error of 10% should be considered for the SER.
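The SER normalization can be sketched as follows, assuming the usual definition SER = N_events / (Φ · M) · φ_ref · 10^9, where Φ is the beam fluence (n/cm²), M the memory size in Mbit and φ_ref a reference terrestrial flux. The paper states neither the fluence nor the reference flux; the 13 n/cm²/h default (the JEDEC JESD89A sea-level New York reference for E > 10 MeV) and the example inputs are assumptions, and the sketch does not reproduce TABLE I.

```python
from typing import Tuple

# Assumed reference flux: JEDEC JESD89A sea-level New York value for E > 10 MeV.
REF_FLUX_N_PER_CM2_H = 13.0


def ser_fit_per_mbit(events: int, fluence_n_per_cm2: float, size_mbit: float,
                     ref_flux: float = REF_FLUX_N_PER_CM2_H) -> float:
    """SER in FIT/Mbit, i.e. failures per 1e9 hours per Mbit.

    `events` must already count each MCU cluster as ONE event, as done for TABLE I.
    """
    if fluence_n_per_cm2 <= 0 or size_mbit <= 0:
        raise ValueError("fluence and memory size must be positive")
    sigma_per_mbit = events / (fluence_n_per_cm2 * size_mbit)  # cm^2 per Mbit
    return sigma_per_mbit * ref_flux * 1e9


def ser_interval(ser: float, relative_error: float = 0.10) -> Tuple[float, float]:
    """Bounds implied by the 10% relative error quoted from the facility logs."""
    return ser * (1.0 - relative_error), ser * (1.0 + relative_error)


# Hypothetical inputs: 100 events, 1e10 n/cm^2 fluence, 32 Mbit device.
ser = ser_fit_per_mbit(100, 1e10, 32)
low, high = ser_interval(ser)
```

With these hypothetical inputs the function gives about 4.06 FIT/Mbit, with a 10% interval of roughly 3.66 to 4.47 FIT/Mbit.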

TABLE I Applied Test Data

| Test           | #Events | SER (FIT/Mbit) | Conditions |
|----------------|---------|----------------|------------|
| static         | 1562    | 1996           | 50°C       |
| dynamic stress | 664     | 1140           | 50°C       |
| dynamic stress | 2987    | 1149           | nominal    |
| C-             | 1374    | 781            | nominal    |
| Mats +         | 468     | 1343           | 110°C      |

#### III. MCU ANALYSIS

To facilitate the presentation of our results, we categorize the recorded MCUs according to their shapes and sizes. Fig. 1 depicts an example of the results obtained by applying the Dynamic Stress March test. Each white pixel represents a bit cell, while black pixels represent recorded upsets. This visual representation provides a global view of the memory, with all the upsets accumulated during the irradiation experiment.

As Fig. 1 shows, besides SEUs, different types of upset clusters were recorded when the memory was tested with the Dynamic Stress March test. Similar results were obtained for all the tests applied in dynamic mode. However, when the memory was tested in static mode, we never observed such large clusters. From the results displayed in Fig. 1 and from several other radiation tests, we distinguish four categories, detailed in the following subsections. Separating the events into these four categories was made possible by the upset timestamps.
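One way to turn the per-word error log into events is to group upsets by timestamp. The sketch below assumes each logged upset carries a row, a column and a timestamp in seconds; the `window` tolerance is a hypothetical parameter, since the timestamp resolution of the setup is not stated, and in dynamic mode the cells of one cluster are read at slightly different times during a March pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Upset:
    row: int
    col: int
    timestamp: float  # seconds, as logged by the test computer


def group_by_timestamp(upsets: List[Upset], window: float = 0.0) -> List[List[Upset]]:
    """Split upsets into events: an upset joins the current event if it was logged
    at most `window` seconds after the previous upset; otherwise it starts a new one."""
    events: List[List[Upset]] = []
    for upset in sorted(upsets, key=lambda u: u.timestamp):
        if events and upset.timestamp - events[-1][-1].timestamp <= window:
            events[-1].append(upset)
        else:
            events.append([upset])
    return events


log = [Upset(0, 0, 1.0), Upset(0, 8, 1.0), Upset(900, 3, 5.0), Upset(901, 3, 5.3)]
strict = group_by_timestamp(log)                 # 3 events: {1.0, 1.0}, {5.0}, {5.3}
tolerant = group_by_timestamp(log, window=0.5)   # 2 events: {1.0, 1.0}, {5.0, 5.3}
```

Chaining on the previous upset (rather than on the first upset of the event) lets a long cluster that is read out progressively stay together, at the cost of possibly merging two independent events that happen to be close in time.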



Fig. 1. 32 Mbit SRAM bitmap with all the upsets accumulated during two hours of atmospheric neutron irradiation. The applied test was Dynamic Stress and the memory was at a temperature of 50°C. Pixels in black represent the upset cells.

#### A. Typical MCU: Type A

Type A corresponds to all the typical cases of MCUs that have been observed and presented in most previous studies [1],[2],[4]. Such MCUs were observed in all the tests we have applied so far, in both static and dynamic mode. Fig. 2 shows a representative example.



Fig. 2. Type A MCU: typical case of MCU occurring while the memory was tested in static mode at 50°C.

According to [1],[2], this type of upset occurs when a particle strike generates a transient current large and long enough to propagate and upset neighboring cells besides the cell that was initially struck. Alternatively, the impinging particle may generate several transient currents traveling in different directions, which also upset neighboring cells. Both mechanisms have been verified in several studies at the simulation level. ECC can mitigate these MCUs as long as the bits of a word are physically separated by cells belonging to other words; for this reason, memories are designed with a distance of 8 to 16 cells between two consecutive bits of the same word. Type A events were the only type of MCU observed when the memory was in static mode.
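The protective effect of this interleaving can be illustrated with a hypothetical column-interleaved layout in which bit b of word w is stored at column b·D + w within a row, so that consecutive bits of the same word are D cells apart (D = 8 to 16 in the text). The sketch below, with hypothetical function names, counts how many upset bits of a horizontal run fall into the same word; a run of k adjacent cells hits at most ⌈k/D⌉ bits of any one word.

```python
from collections import Counter
from typing import Iterable, Tuple


def physical_to_logical(col: int, interleave: int) -> Tuple[int, int]:
    """Map a physical column within one row to (word_in_row, bit_index).

    Hypothetical layout: bit b of word w is at column b * interleave + w.
    """
    return col % interleave, col // interleave


def max_bits_per_word(cols: Iterable[int], interleave: int) -> int:
    """Largest number of upset bits that land in a single word for upsets in one row."""
    counts = Counter(physical_to_logical(c, interleave)[0] for c in set(cols))
    return max(counts.values(), default=0)


# 8 adjacent upset cells with D = 8: every bit belongs to a different word.
print(max_bits_per_word(range(100, 108), interleave=8))   # 1
# 20 adjacent upset cells with D = 8: some word receives ceil(20 / 8) = 3 bits.
print(max_bits_per_word(range(100, 120), interleave=8))   # 3
```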

#### B. Rectangular horizontal MCU: Type B

Continuing our analysis, we now present a type of MCU that occurred in all the memories when tested in dynamic mode. Fig. 3 displays some random occurrences of these upsets when the memory was tested with the C-, Mats+ and Dynamic Stress March tests under different operating conditions. Each cluster is identified as one event because its upsets occur concurrently.



Fig. 3. Type B MCU occurring when tested with: (b1) March C- algorithm – 337 cells upset, (b2) Dynamic Stress algorithm at 50°C – 582 cells upset, (b3) Mats+ algorithm – 553 cells upset, and (b4) Dynamic Stress algorithm at nominal conditions – 261 cells upset. The area affected is usually 16x128 cells.

This type of error appears rather frequently and, given the number of cells affected, must be taken into consideration. The major problem is that common ECC schemes are not effective at correcting more than two upset bits belonging to the same word. In our experiments, the number of upset bit cells per event is on the order of 100-700, and often more than two of them belong to the same word, making such errors difficult to mitigate.
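The impact of such a cluster on ECC can be checked per event by counting upset bits per word. The sketch below assumes a per-word SEC-DED code (such as an extended Hamming code), which corrects one erroneous bit and detects two; the function name and return labels are hypothetical, and the memory under test is not stated to use ECC.

```python
from collections import Counter
from typing import Iterable, Tuple


def secded_outcome(upsets: Iterable[Tuple[int, int]]) -> str:
    """Worst-case SEC-DED outcome for one event given (word_address, bit_index) upsets.

    'corrected'              : every word has at most 1 upset bit
    'detected_uncorrectable' : some word has exactly 2 upset bits
    'beyond_guarantee'       : some word has 3 or more (may be missed or miscorrected)
    """
    per_word = Counter(word for word, _bit in set(upsets))
    worst = max(per_word.values(), default=0)
    if worst <= 1:
        return "corrected"
    if worst == 2:
        return "detected_uncorrectable"
    return "beyond_guarantee"


# Hypothetical cluster in which one word collects three upset bits.
cluster = [(0x1A0, 0), (0x1A0, 3), (0x1A0, 6), (0x1A1, 2)]
print(secded_outcome(cluster))  # beyond_guarantee
```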

These events cannot be characterized as MCUs under the typical definition, since a single particle cannot induce a transient current that would affect such a large number of cells. Similar errors have been reported in [5], but with different shapes. The difference in shape is probably due to the different architecture, in particular the different placement of the p- and n-well taps, which stop the propagation of latchup events and reduce them to localized micro-latchups. In the technology of the memory tested in that study, a block in retention mode is not fully powered: the voltage levels are lowered so that the cells retain their data while power consumption is kept low. When an operation is applied to a cell, the block it belongs to is fully powered for the operation cycle. As reported in [5], if a micro-latchup occurs in that block during an operation, the lowering of the voltage levels limits the micro-latchup to one operation cycle. The memories used in our experiments may employ a similar architectural scheme, which would explain the micro-latchups that we observe: these latchups do not expand to the rest of the device, and such upsets occur during dynamic-mode testing but not during static-mode testing. Although the details of the well-tap scheme are not known, the shapes of the recorded micro-latchups lead us to presume that the well taps are located every 16 cells vertically and every 64 cells horizontally, and help block the propagation of the latchup as explained in [4].

#### C. Rectangular horizontal MCU: Type C

The third type of observed event, Type C, affects a larger area than Type B. To the best of our knowledge, Type C events have not been observed in the past. Fig. 4 displays these events as recorded in two different devices tested with different algorithms.


Fig. 4. Type C MCU: (c1), (c2) Dynamic Stress algorithm at 50°C – 32112 cells upset; (c3), (c4) March C- algorithm – 46504 cells upset.

Type C clusters are significantly larger than Type B clusters: horizontally, they extend from one edge of the memory to the other, as in Fig. 4(c3),(c4), or in some cases from the middle of the memory to one edge, as in Fig. 4(c1),(c2). Vertically, they cover approximately 20-60 bit cells (i.e., lines). All the presented cases appear to contain a repeated sequence. Although this sequence differed between test algorithms, we do not have enough statistical data to correlate the error sequence with the test. Importantly, the sequences of Fig. 4(c1) and (c2) were recorded concurrently, as were those of Fig. 4(c3) and (c4); each pair is therefore considered a single event. The two cases differ in size, shape and placement in the memory, as observed in Fig. 1. Notably, Type C events appeared in different devices and in different irradiation experiments held one year apart. Another important observation is that, in all observed cases, the corrupted bits of the words were all '0' instead of '1'.

The particular shapes of these clusters may be related to the addressing scheme of this memory; combined with a Single Event Functional Interrupt (SEFI) in the memory periphery or in the sense amplifiers, this is a possible explanation. We reach this conclusion because Type C events are symmetric and cannot be attributed to the same source as Type B events. Another possible explanation is a malfunction of the address decoder (if it is placed in the middle of the array, as in the common butterfly architecture), since within the same cluster a different sequence is observed in the first half (left part of the cluster) and in the second half (right part of the cluster), as clearly seen in Fig. 4(c3) and (c4). Also, in Fig. 4(c1) and (c2), the corrupted cells cover exactly half of the width of the memory. Such events were not observed in all the tests we applied, which indicates that their frequency of occurrence is rather low. However, they should be taken into account when evaluating the sensitivity of the memory, since the number of affected cells is large (30k to 50k bit cells).

#### D. Rectangular vertical MCU: Type D

Type D clusters were observed with all the test algorithms applied in dynamic mode. Fig. 5 shows a few examples from the different test runs, isolated from the rest of the recorded upsets.



Fig. 5. Type D MCU: (d1) Dynamic Stress at 50°C – 38246 cells upset, (d2) March C- algorithm – 89934 cells upset, (d3) Mats+ at 110°C – 20176 cells upset, (d4) Dynamic Stress nominal – 41022 cells upset.

Such MCUs can cover a large part of a memory region, as in Fig. 5(d2), or smaller but still significant parts, as in Fig. 5(d3). According to our experimental results, Type D events involve 20,000-90,000 corrupted cells within a vertically oriented rectangular area of 64x2048 cells. A possible cause for such events could be micro-latchups localized in the failing areas, as for Type B events, but on a larger scale, such that the anti-latchup mechanism does not restrict the event to a smaller area. Type D events cannot be correlated to Type C, since no specific sequence was observed in the upset cells. It is also interesting to note that Type D events always appear at the vertical edges of the memory array (left and/or right). This may be explained by micro-latchups occurring in these peripheral locations combined with larger delays of the control signals (such as word line selection), which are more significant at positions far from the address decoder. Other possible explanations include the topology of the power grid or the lower effectiveness of the well taps in these areas at limiting latchup propagation, but we do not have evidence to confirm these possible causes. Finally, events of this type contain regions of 64x64 cells that are not affected by errors and form discontinuities in the cluster. These regions were observed in different devices and experiments, so they cannot be attributed to a faulty device.
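The four categories can be approximated by a shape and size heuristic on each event's bounding box. This is a minimal sketch with illustrative thresholds loosely derived from the sizes reported above (a few cells for Type A, about 16x128 for Type B, half to full array width for Type C, vertical rectangles up to 64x2048 for Type D); the thresholds, the `array_cols` parameter and the function names are assumptions, and the sketch ignores the repeated sequences and timing that the text also uses to distinguish Type C.

```python
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) in the physical bitmap


def bounding_box(cells: Iterable[Cell]) -> Tuple[int, int]:
    """Return (height, width) of the smallest rectangle containing all cells."""
    rows, cols = zip(*cells)
    return max(rows) - min(rows) + 1, max(cols) - min(cols) + 1


def classify_cluster(cells: Iterable[Cell], array_cols: int, few_cells: int = 16) -> str:
    """Heuristic Type A-D label for the upset cells of one event (illustrative thresholds)."""
    unique = set(cells)
    if not unique:
        raise ValueError("an event must contain at least one upset cell")
    if len(unique) <= few_cells:
        return "A"  # typical MCU: only a few cells
    height, width = bounding_box(unique)
    if height > width:
        return "D"  # vertical rectangle, e.g. up to 64 x 2048 cells
    if width >= array_cols // 2:
        return "C"  # horizontal band spanning half or all of the array width
    return "B"      # horizontal rectangle, typically about 16 x 128 cells


ARRAY_COLS = 4096  # hypothetical bitmap width
type_b = {(r, c) for r in range(16) for c in range(0, 128, 4)}
type_d = {(r, c) for r in range(0, 2048, 8) for c in range(0, 64, 4)}
print(classify_cluster(type_b, ARRAY_COLS), classify_cluster(type_d, ARRAY_COLS))  # B D
```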

#### IV. CONCLUSIONS

Apart from Type A MCUs, in which a few cells are affected, the reviewed cases impact a large number of cells, which can be of great importance at the system level. It should be stressed that such events were observed only when the memories were tested in dynamic mode. The memory architecture appears to prevent these micro-latchups from propagating to the entire memory, limiting them to small areas and to the duration of one memory cycle. For this reason, the device does not need to be power cycled when a latchup occurs. This work reveals some of the issues that may occur during dynamic-mode operation of the memory. Although static-mode testing is usually considered the standard for neutron testing, the events presented here may occur in high-reliability systems and should be taken into account. TABLE II summarizes the results as device cross sections for each type of MCU.

TABLE II MCU Cross Section (XS) per MCU Type

| Dynamic Test      | Type A XS | Type B XS | Type C XS | Type D XS |
|-------------------|-----------|-----------|-----------|-----------|
| Dyn. Str. 50°C    | 2.8E-07   | 1.9E-07   | 3.3E-09   | 6.6E-09   |
| March C-          | 2.1E-07   | 1.3E-07   | 2.1E-09   | 5.4E-09   |
| Dyn. Str. nominal | 3.9E-07   | 7.7E-08   | 0         | 2.9E-09   |
| Mats+ 110°C       | 3.1E-07   | 5.1E-07   | 0         | 1.6E-08   |

#### REFERENCES

- [1] A. D. Tipton, J. A. Pellish, R. A. Reed, R. D. Schrimpf, R. A. Weller, M. H. Mendenhall, B. Sierawski, A. K. Sutton, R. M. Diestelhorst, G. Espinel, J. D. Cressler, P. W. Marshall, G. Vizkelethy, "Multiple-Bit Upset in 130 nm CMOS Technology," *IEEE Trans. Nucl. Sci.*, vol. 53, no. 6, pp. 3259-3264, Dec. 2006.
- [2] F. Wrobel, J.-M. Palau, M.-C. Calvet, O. Bersillon, H. Duarte, "Simulation of nucleon-induced nuclear reactions in a simplified SRAM structure: scaling effects on SEU and MBU cross sections," *IEEE Trans. Nucl. Sci.*, vol. 48, no. 6, pp. 1946-1952, Dec. 2001.
- [3] G. Gasiot, D. Giot, P. Roche, "Multiple Cell Upsets as the Key Contribution to the Total SER of 65 nm CMOS SRAMs and Its Dependence on Well Engineering," *IEEE Trans. Nucl. Sci.*, vol. 54, no. 6, pp. 2468-2473, Dec. 2007.
- [4] D. Radaelli, H. Puchner, Skip Wong, S. Daniel, "Investigation of multibit upsets in a 150 nm technology SRAM device," *IEEE Trans. Nucl. Sci.*, vol. 52, no. 6, pp. 2433-2437, Dec. 2005.
- [5] J. Tausch, D. Sleeter, D. Radaelli, H. Puchner, "Neutron Induced Micro SEL Events in COTS SRAM Devices," in *Proc. IEEE Radiation Effects Data Workshop*, pp. 185-188, 23-27 July 2007.
- [6] A. Hands, P. Morris, K. Ryden, C. Dyer, "Large-Scale Multiple Cell Upsets in 90 nm Commercial SRAMs During Neutron Irradiation," *IEEE Trans. Nucl. Sci.*, vol. 59, no. 6, pp. 2824-2830, Dec. 2012.
- [7] ISIS, Rutherford Appleton Laboratory, http://www.isis.stfc.ac.uk/
- [8] P. Rech, J.-M. Galliere, P. Girard, F. Wrobel, F. Saigné, L. Dilillo, "Dynamic-Stress Test for Efficient Evaluation of Neutron Impact on Commercial SRAMs," in *Proc. IEEE Nuclear and Space Radiation Effects Conference*, 2011.
- [9] M. Marinescu, "Simple and Efficient Algorithms for Functional RAM Testing," in *Proc. IEEE Int. Test Conf.*, pp. 236-239, 1982.
- [10] D. Niggemeyer, M. Redeker, J. Otterstedt, "Integration of non-classical faults in standard March tests," in *Proc. IEEE Int. Test Conf.*, pp. 53-62, 1998.
- [11] G. Tsiligiannis, L. Dilillo, A. Bosio, P. Girard, A. Todri, A. Virazel, A. D. Touboul, F. Wrobel, F. Saigné, "Evaluation of test algorithms stress effect on SRAMs under neutron radiation," in *Proc. IEEE International On-Line Testing Symposium (IOLTS)*, pp. 121-122, 2012.