Yield Improvement, Fault-Tolerance to the Rescue?
Julien Vial, Alberto Bosio, Patrick Girard, Christian Landrault, Serge Pravossoudovitch, Arnaud Virazel

HAL Id: lirmm-00303400
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00303400
Submitted on 22 Jul 2008

Abstract

As technology enters the nanometer scale, manufacturing processes become less and less reliable, which drastically reduces the yield. A possible future solution is to use fault-tolerant architectures to tolerate manufacturing defects. In this paper, we analyze the conditions under which a classical Triple Modular Redundancy (TMR) architecture is worthwhile for yield improvement.

1. Introduction

Fault-tolerant architectures have been proposed as a potential way to increase the yield of future VLSI systems [1]. They are commonly used to tolerate on-line faults, i.e. faults that appear during normal operation, whether transient or permanent [2]. In the near future, they could also be used to tolerate permanent defects caused by an imperfect manufacturing process.

Fault-tolerant architectures rely on redundancy, i.e. the property of having more resources than needed to perform a given function. They are generally classified by the type of redundant resource. Four basic types of redundancy are considered: hardware, software, information and time [3].

Since we target manufacturing process defects, this paper considers hardware redundancy as the means to achieve yield benefits. Hardware redundancy consists in modifying the design by adding extra hardware. For example, instead of a single processor, we can use three processors, each performing the same operation. The failure of one processor is tolerated thanks to a voter that selects the majority output. This is the basic principle of TMR architectures [3].

In the rest of this paper, we study the potential use of a TMR architecture for tolerating manufacturing defects and hence improving the yield.

2. The TMR approach

A TMR structure is a fault-tolerant architecture based on three identical modules performing the same function. Their inputs are tied together so that they receive the same data, and their outputs feed a majority voter (V) circuit, as shown in Figure 1. As a result, the TMR architecture significantly reduces the error probability at the primary outputs of the system. A defective module propagating an erroneous value is masked by the two other fault-free modules.

![Figure 1: TMR principle](image)
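The masking principle can be illustrated with a minimal Python sketch. The module functions (`good`, `stuck_bit0`, `stuck_bit1`) and the 4-bit function they compute are hypothetical and chosen only for illustration; the voter is modelled as a bitwise 2-out-of-3 majority, which is one common way to describe it and not necessarily the implementation assumed by the authors.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(module_a, module_b, module_c, x: int) -> int:
    """Feed the same input to three modules and vote on their outputs."""
    return majority(module_a(x), module_b(x), module_c(x))


def good(x: int) -> int:
    """Hypothetical fault-free module (arbitrary 4-bit function)."""
    return (x ^ 0b1010) & 0b1111


def stuck_bit0(x: int) -> int:
    """Same module with output bit 0 stuck at 1 (illustrative defect)."""
    return good(x) | 0b0001


def stuck_bit1(x: int) -> int:
    """Same module with output bit 1 stuck at 1 (illustrative defect)."""
    return good(x) | 0b0010


inputs = range(16)

# One defective module: always masked by the two fault-free modules.
one_defect_masked = all(tmr(stuck_bit0, good, good, x) == good(x) for x in inputs)

# Two defects on different output bits of different modules: still masked,
# because each output bit keeps two correct votes.
two_defects_masked = all(tmr(stuck_bit0, stuck_bit1, good, x) == good(x) for x in inputs)

# The same defect in two modules: the voter is outvoted on some inputs.
same_defect_masked = all(tmr(stuck_bit0, stuck_bit0, good, x) == good(x) for x in inputs)
```

In this toy setting, whether a pair of defects is tolerated depends on which modules and which outputs they affect.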

![Figure 2: Tolerance of two stuck-at faults](image)
defects can be handled by considering all the possible fault couples between them.

3. A TMR structure for yield improvement

In this section, we investigate whether it is worthwhile to produce the TMR version of a circuit, instead of its single version, in order to tolerate manufacturing defects and consequently improve the yield. We therefore analyze the conditions that must be fulfilled for a TMR architecture to be beneficial.

For our analysis, we assume that the voter area is negligible. Thus, on a given silicon area S, we can implement either N TMR circuits with a yield \( \eta_{\text{TMR}} \) (i.e. \( \eta_{\text{TMR}} \times N \) fault-free circuits) or 3N circuits without redundancy with a yield \( \eta_c \) (i.e. \( \eta_c \times 3N \) fault-free circuits). A TMR architecture is worthwhile only if:

\[
\eta_{\text{TMR}} \times N > \eta_c \times 3N \quad \text{with} \quad \eta_{\text{TMR}} \leq 1 \quad \text{and} \quad \eta_c \leq 1
\]

Consequently:

\[
\eta_{\text{TMR}} > 3 \eta_c \quad \text{with} \quad \eta_c < 1/3 \quad (1)
\]

First of all, since \( \eta_{\text{TMR}} \leq 1 \), Eq. 1 can hold only if \( 3 \eta_c < 1 \). A TMR architecture can therefore be profitable only if \( \eta_c < 1/3 \), i.e. when a low manufacturing yield is expected due to the use of aggressive nanotechnologies.
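As a quick numerical illustration of this break-even condition, the following Python sketch compares the number of fault-free circuits obtained from the same area. The function names and the yield values are hypothetical and chosen only for the example.

```python
def fault_free_counts(n_tmr: int, eta_tmr: float, eta_c: float):
    """Fault-free circuits on an area holding n_tmr TMR circuits
    or, alternatively, 3 * n_tmr non-redundant circuits."""
    good_tmr = eta_tmr * n_tmr
    good_single = eta_c * 3 * n_tmr
    return good_tmr, good_single


def tmr_is_worthwhile(eta_tmr: float, eta_c: float) -> bool:
    """Condition of Eq. 1: eta_TMR > 3 * eta_c."""
    return eta_tmr > 3 * eta_c


# Illustrative values only: a 20% single-circuit yield and a 70% TMR yield.
good_tmr, good_single = fault_free_counts(100, eta_tmr=0.7, eta_c=0.2)
low_yield_case = tmr_is_worthwhile(0.7, 0.2)

# With a 40% single-circuit yield, even a perfect TMR yield cannot win.
high_yield_case = tmr_is_worthwhile(1.0, 0.4)
```

In the first case about 70 fault-free TMR circuits compete with about 60 fault-free non-redundant circuits. In the second case the non-redundant option wins whatever the TMR yield.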

Let us now compute \( \eta_{\text{TMR}} \) and \( \eta_c \) using the Poisson distribution. The Poisson distribution is not completely accurate for large circuits because of clustering effects on defects [4], but it is reasonable for a first, rough evaluation. In our case, it gives the probability that a given number of manufacturing defects occurs in a fixed area, assuming that these defects occur independently with a known average rate.

Let \( X \) be the number of manufacturing defects and \( \lambda \) be the average number of expected defects for a given silicon area. Then, \( \lambda = n \lambda_p \) with \( n \) being the number of logic gates (or transistors) and \( \lambda_p \) the average defect rate of a gate (or a transistor). Let \( P[X=k] \) be the probability that the structure has \( k \) manufacturing defects. If \( n \) is high and \( \lambda_p \) is low, the binomial distribution is well approximated by the Poisson distribution:

\[
P[X=k] = e^{-\lambda} \left( \frac{\lambda^k}{k!} \right) \quad (2)
\]

In a non-redundant circuit, the presence of a single defect makes the entire circuit faulty. So, \( \eta_c \) is the probability that there is no defect inside the circuit:

\[
\eta_c = P[X=0] = e^{-\lambda} \left( \frac{\lambda^0}{0!} \right) \quad \Leftrightarrow \quad \eta_c = e^{-\lambda} \quad (3)
\]
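The quality of the Poisson approximation for the zero-defect case can be checked numerically. The sketch below is only an illustration: the gate count `n` and the per-gate defect rate `lam_p` are hypothetical values, not figures from this paper.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k defects for an average of lam defects."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_zero_defects(n: int, lam_p: float) -> float:
    """Exact binomial probability that none of n gates is defective."""
    return (1.0 - lam_p) ** n


# Hypothetical circuit: one million gates, one expected defect on average.
n = 1_000_000
lam_p = 1e-6
lam = n * lam_p

yield_binomial = binomial_zero_defects(n, lam_p)
yield_poisson = poisson_pmf(0, lam)  # equals exp(-lam)

# The probabilities over all defect counts should sum to 1.
total_probability = sum(poisson_pmf(k, lam) for k in range(50))
```

For large `n` and small `lam_p`, the two yield values agree to several decimal places, which is why the simpler exponential form is convenient for a first evaluation.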

Let \( R \) be the probability that a couple of defects is tolerated. A single defect is always tolerated. Two defects are equivalent to one couple of defects, tolerated with probability \( R \). Three defects are equivalent to three couples of defects, all of which must be tolerated, which, treating the couples as independent, happens with probability \( R^3 \). More generally, \( i \) defects are equivalent to \( i(i-1)/2 \) couples. The yield of the TMR structure \( \eta_{\text{TMR}} \) is thus given by:

\[
\eta_{\text{TMR}} = P[X=0] + P[X=1] + R \times P[X=2] + R^3 \times P[X=3] + \ldots
\]

\[
\Leftrightarrow \eta_{\text{TMR}} = e^{-\lambda_{\text{TMR}}} \times \left( 1 + \lambda_{\text{TMR}} + R \frac{\lambda_{\text{TMR}}^2}{2!} + R^3 \frac{\lambda_{\text{TMR}}^3}{3!} + \ldots \right)
\]

There are three times as many gates (or transistors) in a TMR architecture as in a non-redundant circuit. So, by substituting \( 3 \lambda_c \) for \( \lambda_{\text{TMR}} \), we obtain:

\[
\eta_{\text{TMR}} = e^{-3\lambda_c} \times \left( 1 + 3 \lambda_c + \sum_{i=2}^{\infty} R^{i(i-1)/2} \frac{(3 \lambda_c)^i}{i!} \right)
\]

and with \( \eta_c = e^{-\lambda_c} \Rightarrow \lambda_c = -\ln \eta_c \):

\[
\eta_{\text{TMR}} = e^{3 \ln \eta_c} \times \left( 1 - 3 \ln \eta_c + \sum_{i=2}^{\infty} R^{i(i-1)/2} \frac{(-3 \ln \eta_c)^i}{i!} \right) \quad (4)
\]
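Eq. 4 can be evaluated numerically by truncating the infinite sum. The Python sketch below assumes that \( i \) defects form \( i(i-1)/2 \) couples, each tolerated independently with probability \( R \); the function name `eta_tmr` and the truncation parameter `max_defects` are introduced here for illustration only.

```python
import math


def eta_tmr(eta_c: float, r: float, max_defects: int = 200) -> float:
    """TMR yield as a function of the non-redundant yield eta_c and of the
    probability r that a couple of defects is tolerated.

    Assumption: i defects form i*(i-1)/2 couples, all tolerated with
    probability r ** (i*(i-1)/2). The sum is truncated at max_defects.
    """
    lam_tmr = -3.0 * math.log(eta_c)  # lambda_TMR = 3 * lambda_c
    total = 0.0
    term = 1.0  # holds lam_tmr**i / i!, updated incrementally
    for i in range(max_defects + 1):
        if i > 0:
            term *= lam_tmr / i
        couples = i * (i - 1) // 2
        total += (r ** couples) * term
    return math.exp(-lam_tmr) * total


# Limit cases for a 20% non-redundant yield.
all_pairs_tolerated = eta_tmr(0.2, 1.0)  # every defect pattern is tolerated
no_pair_tolerated = eta_tmr(0.2, 0.0)    # only 0 or 1 defect is tolerated
```

Two limit cases give a sanity check: with \( R = 1 \) the sum reduces to \( e^{3\lambda_c} \) and the yield is 1, while with \( R = 0 \) only the terms for zero and one defect remain, giving \( \eta_c^3 (1 - 3 \ln \eta_c) \).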

A TMR architecture will improve the resulting yield if \( \eta_{\text{TMR}} > 3 \eta_c \). Figure 3 gives \( \eta_{\text{TMR}} \) as a function of \( \eta_c \) for different values of the probability \( R \). The bold dotted line represents the limit \( \eta_{\text{TMR}} = 3 \eta_c \) of this condition. From Figure 3, it appears that the condition can be satisfied only for values of \( R \) greater than 92.58%.

![Figure 3: \( \eta_{\text{TMR}} \) as a function of \( \eta_c \)](image)
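The threshold on \( R \) can also be estimated numerically instead of being read from the figure. The sketch below is one possible way to do it, not the authors' procedure. It assumes that \( i \) defects form \( i(i-1)/2 \) couples, each tolerated independently with probability \( R \), scans a grid of \( \eta_c \) values below 1/3, and bisects on \( R \). The grid resolution, the truncation of the sum, and all function names are choices made here for illustration.

```python
import math


def eta_tmr(eta_c: float, r: float, max_defects: int = 200) -> float:
    """TMR yield with the infinite sum truncated at max_defects terms."""
    lam = -3.0 * math.log(eta_c)
    total, term = 0.0, 1.0
    for i in range(max_defects + 1):
        if i:
            term *= lam / i
        total += r ** (i * (i - 1) // 2) * term
    return math.exp(-lam) * total


# Candidate non-redundant yields: 0.001, 0.002, ..., 0.333.
ETA_C_GRID = [k / 1000 for k in range(1, 334)]


def best_margin(r: float) -> float:
    """Largest value of eta_TMR - 3 * eta_c over the grid for a given r."""
    return max(eta_tmr(e, r) - 3 * e for e in ETA_C_GRID)


def threshold_r(lo: float = 0.5, hi: float = 1.0, iterations: int = 30) -> float:
    """Bisection on r: smallest r for which some eta_c satisfies Eq. 1.

    Relies on eta_TMR being non-decreasing in r.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if best_margin(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


r_star = threshold_r()
```

Under these assumptions, `r_star` is expected to land close to the 92.58% value read from Figure 3; a coarser grid or a different model of multiple defects would shift it.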

To summarize, implementing a TMR architecture for a given circuit can improve the yield if i) the yield of the non-tolerant circuit is lower than 33.33% and ii) the percentage of fault pairs tolerated by the TMR architecture is greater than 92.58%. It remains an open question what this 92.58% value of \( R \) means in practice. To answer this question, we will investigate test issues related to TMR architectures.

Consequently, our future work will consist in proposing a test method able to deal with TMR architectures, in order to evaluate the percentage of tolerated fault pairs (the probability R) on a set of benchmark circuits. These evaluations are required to demonstrate the interest of using TMR architectures for yield improvement.

### References


