A Novel Dummy Bitline Driver for Read Margin Improvement in an eSRAM
Michael Yap San Min, Philippe Maurine, Magali Bastian Hage-Hassan, Michel Robert

HAL Id: lirmm-00243966
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00243966
Submitted on 27 Jun 2022


M. Yap San Min$^{1,2}$, P. Maurine$^{1}$, M. Bastian$^{2}$ and M. Robert$^{1}$
$^{1}$LIRMM, Montpellier, France
$^{2}$INFINEON TECHNOLOGIES, Sophia Antipolis, France
{michael.yapsanmin, pmaurine, michel.robert}@lirmm.fr, {michael.yap, magali.bastian}@infineon.com

Abstract
Aggressive transistor scaling is often accompanied by an increase in the variability of transistor intrinsic parameters. In this paper, we point out the importance of considering the sensitivity of performance to process variations during SRAM design. We propose a novel dummy bitline driver, an essential component of a self-timed memory, that is less sensitive to process variations. A statistical sizing method for this dummy bitline driver is introduced to improve the read timing margin while ensuring a high timing yield. The memory considered is a 256 kb SRAM designed in a 90 nm technology node.

Keywords: dummy bitline driver, low power, self-timed memory, SRAM, statistical design

1 Introduction
Advances in fabrication technology have enabled systems on chip in which many functional blocks coexist, including embedded memories that can occupy up to 80% of the chip area. Hence, the overall performance and fabrication yield of such chips rely heavily on memory yield. Alongside the rapid growth of the memory content of chips, technology evolution is accompanied by an increase in variability owing to process variations that appear during the manufacturing steps.

Generally, process variability can be classified into two distinct groups: global and local variations. Global variations originate from numerous factors, such as non-uniform chemical mechanical polishing [1], lens aberrations [2] and temperature non-uniformity [1], whereas local variations stem from factors such as random dopant fluctuations [3] and line edge roughness [4]. Transistor scaling has exacerbated the impact of both local and global variations, affecting integrated circuit performance metrics such as maximum operating frequency and static power consumption.

To handle the impact of process variations in circuit design, a corner-based methodology is used, in which the circuit is characterized across process corners. However, as manufacturing variability increases, this approach increasingly underestimates the achievable operating frequency of an integrated circuit, which can hinder the convergence of the design flow. In this paper, we highlight the importance of considering process variations in SRAM design. We propose a novel dummy bitline driver that tracks the discharge time of the bitline during a read operation and triggers the sense amplifier at the right time. This structure is less sensitive to process variations. A statistical sizing method for the driver is also introduced to improve the read timing margin while guaranteeing a high timing yield.

The paper is organized as follows. Section 2 introduces a simple way of computing the required read timing margin without being overly pessimistic, and of calculating the probability of fulfilling this constraint. Section 3 presents a new Dummy Bitline Driver (DBD) structure whose timing performance is less sensitive to manufacturing process variations than that of a more classic DBD [5]. Section 4 introduces a statistical sizing method for the DBD that is independent of process corners. Section 5 compares the results obtained with the proposed and classic structures.

2 Modelling Approach
Conventionally, the characterization of a circuit involves running simulations at best-case and worst-case corners to verify whether its performance and timing constraints are met under all conditions. For example, the worst-case delay is defined by setting the principal parameters $p_i$ of the transistors to $\mu_{p_i} \pm m_i \sigma_{p_i}$ ($m_i \in \mathbb{N}$), where $\mu_{p_i}$ and $\sigma_{p_i}$ are the mean value and the standard deviation of the statistical distribution of parameter $p_i$. Through a proper choice of the $m_i$ values, this simple approach allows the worst case to be defined at $n \sigma_D$, with $\sigma_D$ being the standard deviation of the delay distribution.

Failing to account for local variations in process corners is not a serious problem for simple data paths. The issue is far more complex for data paths with race conditions, for which the corner approach can lead to optimistic or pessimistic estimates of the worst and best cases.
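To illustrate why race conditions are the delicate case, the short Python sketch below evaluates the 3σ lower bound of the read timing margin D = B − A for three assumed correlations between the two paths. Treating both paths as moving together (ρ = 1, as when local mismatch is ignored) gives an optimistic bound, while placing them at opposite corners (ρ = −1) gives a pessimistic one; a statistical value such as ρ = 0.9 lies in between. The standard deviation of D is the one given in (1); the function names and delay values are hypothetical and not taken from the paper.

```python
import math

def sigma_diff(sigma_a: float, sigma_b: float, rho: float) -> float:
    """Standard deviation of D = B - A for correlated Gaussian delays A and B."""
    return math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * sigma_a * sigma_b * rho)

def lower_bound(mu_a, sigma_a, mu_b, sigma_b, rho, n=3):
    """n-sigma lower bound of the race margin D = B - A."""
    return (mu_b - mu_a) - n * sigma_diff(sigma_a, sigma_b, rho)

# Hypothetical path delays in ps (illustrative only, not from the paper).
mu_a, sigma_a = 500.0, 25.0
mu_b, sigma_b = 600.0, 30.0

for label, rho in [("same corner (rho=+1)", 1.0),
                   ("statistical (rho=0.9)", 0.9),
                   ("opposite corners (rho=-1)", -1.0)]:
    margin = lower_bound(mu_a, sigma_a, mu_b, sigma_b, rho)
    print(f"{label:28s} margin at 3 sigma = {margin:7.2f} ps")
```

With these numbers the same 100 ps mean margin looks comfortable under ρ = 1 and fails under ρ = −1, which is the spread the statistical treatment below is meant to narrow.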
3 Dummy Bitline Driver with Reduced Variance

In advanced technologies, the corner method no longer seems sufficient to satisfy timing constraints without resorting to ever-larger timing margins, owing to the increase in local variations. This raises a question: is it possible to maintain, or even reduce, design timing margins through design?

To this end, we have defined a Dummy Bitline Driver (DBD) structure (Fig. 3a) that is less sensitive to process variations than a more classic structure (Fig. 3b). The DBD is an essential component of a self-timed SRAM.

![Figure 1: Signal race between paths A and B in an SRAM](image)

We first introduce a way of computing the required read timing margin without being overly pessimistic or optimistic. Consider the signal race during a read operation between signals A and B, issued from the same control block. Signal A activates a selected memory cell (denoted CC in Fig. 1), which discharges bitline BL, whereas signal B triggers the sense amplifier during the discharge of BL (Fig. 1). Let us also assume that, for a proper read operation of the selected SRAM cell, signal A must arrive no later than signal B, i.e. at most 0 ps after it. Let \( \mu_A, \mu_B, \sigma_A, \sigma_B \) be the mean values and standard deviations of the propagation delay distributions of signals A and B, and let \( \mu_D, \sigma_D \) be the mean and standard deviation of the path delay difference \( D = B - A \) (the read timing margin).

Let us now evaluate the probability of meeting the timing constraint. Assuming that all distributions are normal and that the delays of A and B are correlated with correlation coefficient \( \rho \), the mean value and the standard deviation of distribution D are given by:

\[
\begin{aligned}
\mu_D &= \mu_B - \mu_A \\
\sigma_D &= \sqrt{\sigma_A^2 + \sigma_B^2 - 2 \cdot \sigma_A \cdot \sigma_B \cdot \rho}
\end{aligned}
\tag{1}
\]
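Equation (1) can be checked numerically. The Python sketch below computes μD and σD analytically and compares them with a Monte Carlo estimate obtained by drawing correlated Gaussian delays for A and B; the sampling scheme (a shared normal component z1 plus an independent one z2) is a standard way of generating correlation ρ and is an assumption of this sketch, as are the function names and delay values.

```python
import math
import random
import statistics

def diff_stats(mu_a, sigma_a, mu_b, sigma_b, rho):
    """Mean and standard deviation of D = B - A, as in (1)."""
    mu_d = mu_b - mu_a
    sigma_d = math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * sigma_a * sigma_b * rho)
    return mu_d, sigma_d

def sample_diff(mu_a, sigma_a, mu_b, sigma_b, rho, runs=100_000, seed=1):
    """Monte Carlo check: draw correlated Gaussian delays and return samples of D."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = mu_a + sigma_a * z1
        # B shares the component z1 with A, which gives a correlation of rho.
        b = mu_b + sigma_b * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        samples.append(b - a)
    return samples

# Hypothetical delays in ps.
mu_d, sigma_d = diff_stats(500.0, 25.0, 600.0, 30.0, 0.9)
d = sample_diff(500.0, 25.0, 600.0, 30.0, 0.9)
print(f"analytic:    mu_D = {mu_d:.2f} ps, sigma_D = {sigma_d:.2f} ps")
print(f"Monte Carlo: mu_D = {statistics.fmean(d):.2f} ps, sigma_D = {statistics.stdev(d):.2f} ps")
```

The two lines should agree to within sampling noise; note how a high ρ makes σD much smaller than σA + σB.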

Using the Galton approximation, with the hypothesis that \( \mu_D > 0 \), the probability \( P_D \) of satisfying the timing constraint \( D \geq 0 \), which holds for any value of \( \rho \) through \( \sigma_D \), is computed as follows:

\[
P_D = \frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{\mu_D}{\sqrt{2} \, \sigma_D} \right) \right] \tag{2}
\]
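Equation (2) is the normal cumulative distribution evaluated at μD/σD, so it can be computed directly with the error function from the Python standard library. The sketch below (function name hypothetical) also shows that a margin of exactly 3σD gives the 99.87% yield used as the target in Section 5.

```python
import math

def timing_yield(mu_d: float, sigma_d: float) -> float:
    """Probability that D = B - A is positive for D ~ N(mu_d, sigma_d**2), as in (2)."""
    return 0.5 * (1.0 + math.erf(mu_d / (math.sqrt(2.0) * sigma_d)))

print(f"{timing_yield(3.0, 1.0):.5f}")  # 0.99865, i.e. the 99.87 % (3 sigma) target
```

Since P_D depends only on the ratio μD/σD, fixing μD = nσD fixes the yield, which is what the margin computation in (3) exploits.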

As the sensitivities of the delays to process variations, \( V_A = \sigma_A / \mu_A \) and \( V_B = \sigma_B / \mu_B \), are known and were found to be relatively constant over a wide range \((\pm20\%)\) of \( \mu_A \) and \( \mu_B \) values, the value of \( \mu_B \), and subsequently the read timing margin \( \mu_D^{\text{Yield}} \) (Appendix A.1) that guarantees a proper read operation at \( n \sigma \), can be computed as follows:

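\[
\mu_D^{\text{Yield}} = -\frac{a}{b} \left( 1 + \sqrt{1 - \frac{b \, c}{a^2}} \right) - \mu_A \tag{3}
\]

with

\[
a = n^2 \cdot V_B \cdot \sigma_A \cdot \rho - \mu_A, \qquad
b = 1 - n^2 \cdot V_B^2, \qquad
c = \mu_A^2 - n^2 \cdot \sigma_A^2
\]

where the larger root of the underlying quadratic in \( \mu_B \) has been retained so that \( \mu_B > \mu_A \) (Appendix A.1).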
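The following Python sketch evaluates (3) and then checks the result against the defining condition μD = nσD. It assumes, as stated above, that path B keeps a constant relative variability V_B while its mean delay is tuned by sizing the DBD, so that σB = V_B·μB; the function name and the numerical values are hypothetical.

```python
import math

def required_margin(mu_a, sigma_a, v_b, rho, n=3):
    """Read timing margin mu_D^Yield such that mu_D = n * sigma_D, following (3).

    Assumes sigma_B = v_b * mu_B (constant relative variability of path B).
    """
    a = n**2 * v_b * sigma_a * rho - mu_a
    b = 1.0 - n**2 * v_b**2
    c = mu_a**2 - n**2 * sigma_a**2
    if a >= 0.0 or b <= 0.0:
        raise ValueError("expected a < 0 and b > 0 (moderate variability)")
    disc = 1.0 - b * c / a**2
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    mu_b = -(a / b) * (1.0 + math.sqrt(disc))  # larger root, so that mu_B > mu_A
    return mu_b - mu_a

# Hypothetical values (ps): mu_A, sigma_A, V_B, rho, n.
mu_a, sigma_a, v_b, rho, n = 100.0, 5.0, 0.05, 0.9, 3
margin = required_margin(mu_a, sigma_a, v_b, rho, n)

# Check: with mu_B = mu_A + margin and sigma_B = V_B * mu_B, (1) gives mu_D = n * sigma_D.
mu_b = mu_a + margin
sigma_b = v_b * mu_b
sigma_d = math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * sigma_a * sigma_b * rho)
print(f"mu_D^Yield = {margin:.2f} ps, n * sigma_D = {n * sigma_d:.2f} ps")
```

Both printed values should coincide; the smaller root of the quadratic is discarded because it corresponds to B arriving before A.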

![Figure 3: (a) Proposed DBD (b) Reference DBD (c) 6T SRAM cell](image)

In the absence of an internal clock signal, the DBD coupled with the dummy bitline acts as a metronome that fires the sense amplifier at the appropriate time during a read operation. As shown in Fig. 1, it ensures that the sense amplifier is triggered once the differential voltage between its inputs BL and BLB has reached the required level \((10\% \text{ of Vdd})\).
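As a rough first-order illustration, which is an assumption of this sketch and not a model used in the paper, suppose the selected cell sinks a constant read current I_read from a bitline of capacitance C_BL. The time needed to develop the 10% Vdd swing is then t ≈ C_BL · 0.1 · Vdd / I_read. The function name and the capacitance and current values below are hypothetical.

```python
def sense_delay(c_bl: float, i_read: float, vdd: float, swing_ratio: float = 0.1) -> float:
    """First-order time (s) for a constant current i_read (A) to discharge a
    bitline of capacitance c_bl (F) by swing_ratio * vdd (V)."""
    return c_bl * swing_ratio * vdd / i_read

# Hypothetical values: 100 fF bitline, 20 uA cell read current, Vdd = 1.2 V.
t = sense_delay(100e-15, 20e-6, 1.2)
print(f"{t * 1e12:.0f} ps")  # 600 ps
```

Because t scales as 1/I_read, a driver whose discharge current varies like the cell read current keeps the firing time of the sense amplifier aligned with the actual bitline swing, which is the motivation for mimicking the cell transistors in the DBD.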

The topology of the proposed DBD has been designed so that the discharge characteristics of the dummy bitline (Fig. 1), discharged by the DBD, match those of a bitline discharged by the SRAM cell shown in Fig. 3c. As shown in Fig. 3a, transistors PD and PGi \((i=1 \text{ to } 4)\) of the proposed DBD are akin to transistors PDcci \((i=1, 2)\) and PGcci \((i=1, 2)\) of the SRAM cell. Moreover, logic gates g1 and g2 mimic the signal WEN, which controls pass gates PGcci. Transistor Pr precharges the dummy bitline, connected to pin 'out', to Vdd before any read operation. Transistor N1 sets node Z to 0 V at the beginning of a read cycle. When the internal signal WLSDUM is at '1' during a read, input pins Iadji (i = 1 to 4) are activated by individually tying them to Vdd. They can thus be used to adjust the discharge current of the DBD to the actual supply voltage of the memory, so that the read timing margin is adapted to the applied supply voltage. Note that the reference DBD offers the same functionality as the proposed DBD. The main difference lies in its use of stacked transistors to represent the pass-gate and pull-down transistors of the SRAM cell. As a result, the sensitivity of the read current flowing through PGi and PDi (i = 1..4) in the reference DBD is less representative of that of the read current flowing through PGcc1 and PDcc1 in the 6T SRAM cell.

4 Statistical Sizing Method

To compare the reference and proposed DBDs at constant timing yield, we have developed the following sizing methodology.

Step 1 (identification of the most critical condition): Starting from an initial sizing, the first step identifies the voltage and temperature condition $(V, T)_{crit}$ with the poorest timing yield. To do so, transient simulations of the timing performance of the critical paths A and B of the memory are run under temperature and voltage conditions covering the whole operating range, to obtain μA and μB. The critical condition corresponds to the highest numerical value of the following expression:

\[
\text{Yield } = \frac{\mu_A \cdot \rho - \sigma_A}{\rho - 1}
\]
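We could not verify the expression above against the rest of the derivation, so the Python sketch below uses a related criterion instead: following the argument of Appendix A.2 that the timing yield of (2) is minimum at $(V, T)_{crit}$, it selects the (V, T) point with the smallest μD/σD. It computes σA and σB from the nominal delays using constant sensitivities V_A and V_B (as assumed in Section 2). This substitution, the function names and all numbers are assumptions for illustration only.

```python
import math

def yield_ratio(mu_a, mu_b, v_a, v_b, rho):
    """mu_D / sigma_D for one (V, T) point, with sigma = V * mu for each path."""
    sigma_a, sigma_b = v_a * mu_a, v_b * mu_b
    sigma_d = math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * sigma_a * sigma_b * rho)
    return (mu_b - mu_a) / sigma_d

def critical_condition(corners, v_a, v_b, rho):
    """Return the (V, T) key whose nominal delays give the smallest mu_D / sigma_D,
    i.e. the lowest timing yield according to (2)."""
    return min(corners, key=lambda vt: yield_ratio(*corners[vt], v_a, v_b, rho))

# Hypothetical nominal delays (mu_A, mu_B) in ps from transient simulations.
corners = {
    (1.08, -40): (820.0, 905.0),
    (1.08, 125): (760.0, 850.0),
    (1.32, 25): (430.0, 470.0),
}
print(critical_condition(corners, v_a=0.05, v_b=0.06, rho=0.9))
```

Here the point with the largest absolute margin is not the critical one; the ranking depends on the margin relative to σD.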

Step 2 (variability estimation): The second step estimates the variability of the paths A and B involved in the signal race. To do so, Monte Carlo simulations of the critical path are performed at the critical condition $(V, T)_{crit}$ found in step 1. Once the values of μA, μB, σA, σB and ρ have been obtained, the timing margin $\mu_D^{\text{Yield}}$ required for the target timing yield is computed using (3).

Step 3 (sizing for a given timing yield): The third step consists of sizing the DBD at the typical process corner and under $(V, T)_{crit}$ so as to obtain the margin $\mu_D^{\text{Yield}}$ computed in step 2.

Step 4 (first verification of the timing yield): Once the sizing is done, Monte Carlo simulations of the critical path are performed at $(V, T)_{crit}$ to obtain the values of μA, μB, σA, σB and ρ. The timing yield is then evaluated using (2). If it fulfills the predefined constraint, we proceed to the second verification step; otherwise, step 3 is repeated with the new values of μA, μB, σA, σB and ρ.

Step 5 (second verification of the timing yield): This step verifies that the timing yield constraint is satisfied under all temperature and supply voltage conditions. Monte Carlo simulations are run to estimate μA, μB, σA, σB and ρ for the different values of V and T, and the corresponding timing yields are computed. If the yields obtained for all (V, T) pairs are at least equal to the predefined constraint set at $(V, T)_{crit}$, the verification is complete. Otherwise, step 1 is repeated with the new sizing.
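The checks of steps 4 and 5 reduce to evaluating (1) and (2) for each simulated condition and comparing the result with the n-sigma target. The Python sketch below returns the conditions that fail; step 4 corresponds to a dictionary holding only $(V, T)_{crit}$, step 5 to all (V, T) pairs. The function names, the tolerance and the statistics are hypothetical.

```python
import math

def timing_yield(mu_a, sigma_a, mu_b, sigma_b, rho):
    """Timing yield P_D from (1) and (2) for one (V, T) condition."""
    mu_d = mu_b - mu_a
    sigma_d = math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * sigma_a * sigma_b * rho)
    return 0.5 * (1.0 + math.erf(mu_d / (math.sqrt(2.0) * sigma_d)))

def verify_sizing(stats_by_vt, n=3, tol=1e-9):
    """Return the (V, T) conditions whose yield falls below the n-sigma target;
    an empty list means the sizing is accepted."""
    target = 0.5 * (1.0 + math.erf(n / math.sqrt(2.0)))
    return [vt for vt, s in stats_by_vt.items() if timing_yield(*s) < target - tol]

# Hypothetical Monte Carlo results: (mu_A, sigma_A, mu_B, sigma_B, rho), delays in ps.
stats_by_vt = {
    (1.0, 125): (900.0, 40.0, 1000.0, 45.0, 0.9),
    (1.32, -40): (400.0, 20.0, 430.0, 23.0, 0.9),
}
print(verify_sizing(stats_by_vt))
```

A non-empty result sends the flow back to step 3 (single condition) or step 1 (several conditions), as described above.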

5 Performance Comparisons

For these comparisons, both the reference and the proposed DBDs have been placed in the critical path of a 256 kb SRAM. The model card used in the HSPICE simulations is BSIM4.3.0, which accounts for both local and global variations. The sizing methodology of Section 4 has been applied to pass-gate transistors PG1 to PG4 and pull-down transistors PD, PD1 to PD4 (Fig. 3a and 3b) for the four operating voltages considered, i.e. 1.0 V, 1.08 V, 1.2 V and 1.32 V. The timing yield was set to 99.87%, i.e. n = 3, and the correlation ρ was taken equal to 0.9. At each operating voltage, the pins Iadji were adjusted accordingly.

Once the statistical sizing had been performed, we ran 2000 Monte Carlo simulations to obtain the mean values (μA and μB) and standard deviations (σA and σB) of the characteristic delays of paths A and B over the whole voltage and temperature ranges considered. These results were used to compute, in Table 1, the reduction in the delay variability of path B (ΔVB) between the proposed (prop) and reference (ref) DBDs and, in Table 2, the probability $P_D$ (2) of meeting the timing constraint, the read timing margins (1) $\mu_{D_{ref}}$ and $\mu_{D_{prop}}$ of the reference and proposed DBDs, and the resulting reduction in read timing margin $\Delta \mu_D$ between them.

Table 1 shows the reduction in variability obtained. The first column, Iadji, indicates the branches of transistors selected for each supply voltage; for instance, Iadji = 1, 2 means that branches Iadj1 and Iadj2 are selected at Vdd = 1.08 V. The reduction in variability ($\Delta V_B/V_{B_{ref}}$) is significant, ranging from 5.8% to 24.7%. It is achieved by using pass-gate transistors PGi in the proposed DBD that are 2 to 3 times larger than the PGi of the reference DBD.

Table 2 lists the computed probability $P_D$ of fulfilling the read timing constraint. As expected, $P_D$ is very close to the required 99.87% (3σ) for both the reference and the proposed structures. At the same time, the read timing margin is reduced, with $\Delta \mu_D/\mu_{D_{ref}}$ ranging from 14.5% to 25.2%.
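Both relative reductions quoted above follow the same arithmetic, sketched below in Python: the delay sensitivity is V_B = σB/μB for each DBD, and each reduction is (ref − prop)/ref. The function names and all numbers are hypothetical placeholders, not the values of Tables 1 and 2.

```python
def relative_reduction(ref: float, prop: float) -> float:
    """Relative reduction (ref - prop) / ref, in percent."""
    return 100.0 * (ref - prop) / ref

def variability(mu: float, sigma: float) -> float:
    """Delay sensitivity V = sigma / mu."""
    return sigma / mu

# Hypothetical path-B statistics (ps) for the reference and proposed DBDs.
v_ref = variability(mu=600.0, sigma=36.0)
v_prop = variability(mu=600.0, sigma=30.0)
print(f"Delta V_B / V_B,ref = {relative_reduction(v_ref, v_prop):.1f} %")

# Hypothetical read timing margins (ps) for the reference and proposed DBDs.
print(f"Delta mu_D / mu_D,ref = {relative_reduction(120.0, 96.0):.1f} %")
```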

### Table 1: Variability reduction

<table>
<thead>
<tr>
<th>$\text{Vol}$(V)</th>
<th>$\text{T}(T)$</th>
<th>$\mu_{D}$(ps)</th>
<th>$\sigma_{D}$(ps)</th>
<th>$F_{D_{ref}}(V)$</th>
<th>$F_{D}(V)$</th>
<th>$P_{D_{ref}}$</th>
<th>$P_{D}$</th>
<th>$\Delta P_{D}$</th>
<th>$\Delta F_{D_{ref}}(V)$</th>
<th>$\Delta F_{D}(V)$</th>
<th>$\Delta P_{D}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.00</td>
<td>-0.40</td>
<td>8.00</td>
<td>60</td>
<td>62.00</td>
<td>80.00</td>
<td>68.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
<tr>
<td>1.2</td>
<td>1.00</td>
<td>-0.15</td>
<td>1.00</td>
<td>60</td>
<td>60.00</td>
<td>60.00</td>
<td>60.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
<tr>
<td>1.2, 1.3, 1.4</td>
<td>1.00</td>
<td>-0.01</td>
<td>0.00</td>
<td>60</td>
<td>60.00</td>
<td>60.00</td>
<td>60.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
</tbody>
</table>

### Table 2: Reduction of read timing margin

<table>
<thead>
<tr>
<th>$\text{Vol}$(V)</th>
<th>$\text{T}(T)$</th>
<th>$\mu_{D}$(ps)</th>
<th>$\sigma_{D}$(ps)</th>
<th>$F_{D_{ref}}(V)$</th>
<th>$F_{D}(V)$</th>
<th>$P_{D_{ref}}$</th>
<th>$P_{D}$</th>
<th>$\Delta P_{D}$</th>
<th>$\Delta F_{D_{ref}}(V)$</th>
<th>$\Delta F_{D}(V)$</th>
<th>$\Delta P_{D}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.00</td>
<td>-0.40</td>
<td>8.00</td>
<td>60</td>
<td>62.00</td>
<td>80.00</td>
<td>68.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
<tr>
<td>1.2</td>
<td>1.00</td>
<td>-0.15</td>
<td>1.00</td>
<td>60</td>
<td>60.00</td>
<td>60.00</td>
<td>60.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
<tr>
<td>1.2, 1.3, 1.4</td>
<td>1.00</td>
<td>-0.01</td>
<td>0.00</td>
<td>60</td>
<td>60.00</td>
<td>60.00</td>
<td>60.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
<td>-800.00</td>
</tr>
</tbody>
</table>

6 Conclusion

Given the pessimism of the corner analysis method, we have proposed a simple way of computing the required read timing margin and the probability of meeting this constraint. A statistical sizing method has also been developed to ensure a predefined timing yield. This design approach has been applied to reduce the critical path delay of the SRAM, in which the dummy bitline driver has been replaced by a structure that is more robust to manufacturing process variations. Results show that combining the sizing method with the proposed dummy bitline driver significantly reduces the design timing margins while ensuring a given timing yield.

Appendix

A.1) Estimation of design margin $\mu_D$ at $n \sigma_D$

Suppose that we want the read margin $\mu_D$ to be equal to $n \sigma_D$, i.e.:

$$\mu_D - n \, \sigma_D = 0 \tag{A.1}$$

As seen in Section 2, $\mu_D$ and $\sigma_D$ are given by (1). Expression (A.1) can therefore be written as:

$$\mu_B - \mu_A - n \sqrt{\sigma_A^2 + \sigma_B^2 - 2 \, \sigma_A \cdot \sigma_B \cdot \rho} = 0 \tag{A.2}$$

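Let $V_A = \frac{\sigma_A}{\mu_A}$ and $V_B = \frac{\sigma_B}{\mu_B}$. Substituting $\sigma_B = V_B \, \mu_B$ into (A.2), isolating the square root and squaring both sides gives a quadratic equation in $\mu_B$:

$$b \, \mu_B^2 + 2a \, \mu_B + c = 0, \qquad a = n^2 V_B \, \sigma_A \, \rho - \mu_A, \quad b = 1 - n^2 V_B^2, \quad c = \mu_A^2 - n^2 \sigma_A^2$$

As the delay of signal B must be greater than that of signal A, the larger root is retained (with $a < 0$ and $b > 0$), and $\mu_B$ becomes:

$$\mu_B = -\frac{a}{b} \left( 1 + \sqrt{1 - \frac{b \, c}{a^2}} \right) \tag{A.3}$$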

Once $\mu_B$ is computed, the required design margin $\mu_D^{\text{Yield}}$ follows from (1):

$$\mu_D^{\text{Yield}} = \mu_B - \mu_A = -\frac{a}{b} \left( 1 + \sqrt{1 - \frac{b \, c}{a^2}} \right) - \mu_A \tag{A.4}$$

A.2) Identification of critical condition $(V, T)_{crit}$

The probability $P_D$ of fulfilling a timing constraint is given by (2). Since $(V, T)_{crit}$ is the condition with the highest probability of a timing constraint violation, $P_D$ is minimum at this condition. Under the normal hypothesis, $P_D$ increases with the ratio $\mu_D/\sigma_D$; $P_D$ is therefore minimum when $\mu_D/\sigma_D$, or equivalently $\mu_D^2/\sigma_D^2$ for $\mu_D > 0$, is minimum.

$$\alpha = \frac{\sigma_A - \sigma_B}{\mu_A - \mu_B} \tag{A.5}$$

By substituting both $\sigma_B$ by (1) and $\sigma_A$ by (A.7) in (A.6), expression (A.6) can be represented by:

$$\frac{\mu_B^2}{\alpha^2 \left(1 + \frac{2 \mu_B}{\mu_A - \mu_B} \right)} \tag{A.6}$$

It can be clearly seen that expression (A.8) is minimum if expression $\frac{\mu_B^2}{\mu_A - \mu_B} \sigma_B$ is the largest.

7 References


