# Design and Selection of Buffers for Minimum Power-Delay Product

Sandra Turgis, Nadine Azemard, Daniel Auvergne

To cite this version: Sandra Turgis, Nadine Azemard, Daniel Auvergne. Design and Selection of Buffers for Minimum Power-Delay Product. ED&TC: European Design and Test Conference, Mar 1996, Paris, France. pp.224-228, 10.1109/EDTC.1996.494153. lirmm-00239400

HAL Id: lirmm-00239400 https://hal-lirmm.ccsd.cnrs.fr/lirmm-00239400 Submitted on 5 Sep 2019

S. Turgis, N. Azemard, D. Auvergne

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), UMR CNRS 9928, Université de Montpellier II, 161 Rue Ada, 34392 Montpellier, France

#### **Abstract**

Using explicit modeling of delays, we present and discuss real design conditions of CMOS buffers from the viewpoint of power dissipation. The efficiency of buffer implementation is first studied through the definition of a limit for buffer insertion. Closed form alternatives to the design for minimum power-delay product are then proposed in terms of this limit. Validations are obtained through SPICE simulations on two stage inverter arrays. Applications to a standard cell library are given by comparing implementations for different selection alternatives.

#### 1. Introduction

Driving buffers have been extensively used to control delays on combinatorial paths.
Values of tapering factors were determined depending on the performance modeling level and on the physical representation of the cells involved, with a common objective: minimizing the delay of paths. In an initial simple theory, Lin and Lindholm [1] introduced the fixed tapered buffer, where the minimum propagation delay is achieved when the ratio of output current drive to output capacitance remains constant in each buffer stage. Jaeger [2] showed that the optimum tapering factor is the constant e = 2.72. This result was then popularized in the books of Mead and Conway [3] and Weste and Eshraghian [4]. Improvements to this simple model were obtained by including the inertial delay of gates [5,6,7] in the so-called split capacitor model. Sakurai [8] also extended the model, using standard cell characterization laws, to define optimal tapering factors.

The first considerations on power minimization for buffer design were given by Veendrick [9] in his widely recognized attempt to model the influence of input rise and fall times on short circuit power dissipation. Vemuru [10] then introduced a variable tapered approach to save area (and dynamic power). Using calibrated analytical delay equations [11], Sha Ma [12] obtained 20-30% energy savings by designing multistage variable tapered buffers through multivariable Kuhn-Tucker optimization. CAD tools have been proposed to size gates [13,14,15]; however, they use numerical programming techniques that are far from intuitive for designers. An exception is offered in [16], where a local optimization is used to define explicit sizing equations for equal rise and fall times on individual gates, resulting in a variable taper implementation with smaller area than that obtained with global methods. The most recent published work addresses the problem of buffer design with local interconnect capacitance [17] by keeping a constant tapering ratio of load capacitance to current drive.
A complete treatment of buffers must use an accurate explicit modeling of delay and power that clearly exposes design and process parameters, in order to be useful in a full custom design approach as well as with a standard cell methodology. In the work presented here, we address this important problem of designing tapered buffers with low power-delay product, using analytical means.

This paper is organized as follows. The delay modeling is described in section 2. In section 3 we specify the power component to be considered in the objective function. Using the results obtained in section 2, it is then possible to define a general sizing strategy for buffers; this is given in section 4. Section 5 introduces the concept of load limit for buffer insertion. Design conditions for a minimum power-delay product are examined in section 6 and applied to a standard cell library in section 7. In section 8, we draw a conclusion and discuss a speed up strategy based on template selection and fanout limit evaluation.

The main contributions of this paper include: the general formulation of a buffer sizing rule, the definition of buffer insertion limits, and their application to buffer design under minimum power-delay product constraints.

### 2 - Delay Modeling

An accurate delay analysis must take into account real cell parameters such as technology, structural complexity, and cell environment, including total output load and input slope effects. To obtain better accuracy than the RC model [18], we used an explicit formulation based on the physical modeling of the switching operations of individual transistors [11,19].
It allows a real delay evaluation of inverters as a linear combination of the step responses of the driving (i-1) and the controlled (i) cell (Fig. 1) as follows:

$$t_{HL,LH}(i) = \frac{A \cdot t_{LHS,HLS}(i-1) + t_{HLS,LHS}(i)}{1 + \alpha \cdot A \frac{t_{LHS,HLS}(i-1)}{t_{HLS,LHS}(i)}} \tag{1}$$

where A specifies the voltage range to be considered in evaluating the input slope effect ($A = 1 - 2V_T/V_{DD}$), $\alpha$ is a slow input ramp correcting factor [19], and $t_{HLS}$, $t_{LHS}$ are the fall and rise step responses of inverters, evaluated for a variation of the output voltage from the output static level to half the supply voltage ($V_{DD}/2$).

Figure 1: General configuration for a real delay evaluation.

These responses can be directly obtained from the mean charge transfer evaluated across the node under consideration and produced by the imbalance current developed in the cell under evaluation, as follows:

$$t_{HLS}(i) = \tau_{ST} \cdot \frac{C_{load}}{2C_N(i)}, \qquad t_{LHS}(i) = \tau_{ST} \cdot \frac{\mu_N}{\mu_P} \cdot \frac{C_{load}}{2C_P(i)} \tag{2}$$

for an inverter made up of N ($C_N$) and P ($C_P$) transistors, loaded by a capacitance $C_{load}$, where:

$$\tau_{ST} = \frac{2 C_{ox} W_N L_{min}^2}{\mu_N C_{ox} (V_{cc} - V_{tN})} \cdot \frac{8V_{cc}(V_{cc} - V_{tN})}{7V_{cc}^2 + 4V_{tN}^2 - 12V_{cc}V_{tN}} \tag{3}$$

is the elementary fall time characteristic of the technology, which can easily be found to be the fall time of a sequence of ideal inverters ($W_N = W_P$ and no parasitic capacitance) with minimum length transistors, each loaded by an identical one. The ratios $C_{load}/C_N$ and $C_{load}/C_P$ can be understood as the ratio of load capacitance to the current drive of each switching structure. Note that $\tau_{ST}$ and $\mu_N/\mu_P$ (eq. 2) represent speed characteristic parameters of the considered process, and are defined through average values of mobilities and transistor lengths across the full voltage excursion.
Their values can be extracted directly from the characterization of specific oscillators or calibrated from simulation of inverters [20].

#### 3 - Power Dissipation in CMOS Buffers

The main component of power to be considered arises during the input signal transition and consists of two parts: a dynamic dissipation ($P_d = C_{load} \cdot V_{DD}^2 \cdot f$) and a short circuit one ($P_{SC} = I_{average} \cdot V_{DD}$). The dynamic power is the power dissipated to modify the charge content of the capacitive load; it is proportional to the switching frequency and to the load capacitance, and thus depends on the area of the buffers. The short circuit power dissipation is produced by the simultaneous conduction of the P and N transistors during the transition of the input signal. As shown in [9,21], the ratio of input to output transition times is a good indicator of short circuit power dissipation. For ratio values comparable to or lower than unity, which is the objective of buffer design, the short circuit power dissipation can be neglected. Buffer designs with fixed or variable tapering factors have been obtained [10] with a short circuit power content lower than 10% of the total power dissipation. As a result, in the definition of our optimization objectives, we will only consider the dynamic component:

$$P = f \cdot V_{DD}^2 \cdot C_{REF} \cdot \frac{C_{tot}}{C_{REF}} \tag{4}$$

where $C_{tot}$ is the total value of the load capacitances involved in the design, including layout parasitics (diffusion and interconnect) and active loads ($C_{IN}(i) = C_N(i) + C_P(i)$), and $C_{REF}$ is the minimum capacitance considered in the design methodology or in the library under consideration.

### 4 - Buffer Design

An interesting analysis of the power dissipation of buffers has been published recently [22]. It gives a comparison of uniform and nonuniform tapered buffers for minimum switching energy, using a simple RC model.
For an ideal array made up of N stages, the design rules can be summarized as follows:

$$T^{N} = \frac{C_{load}}{C_{IN}}, \qquad N = \frac{\ln \left( C_{load}/C_{IN} \right)}{\ln T} \tag{5}$$

where $C_{load}$ is the load capacitance, $C_{IN}$ the minimum input capacitance and T the tapering factor between stages. The output ratio of PMOS to NMOS transistors is assumed to be the same for all stages, and no consideration is given to the symmetry of the responses. As expected for an ideal array, the minimum delay time is obtained for T = 2.72, while the minimum power-delay product is reached for higher values (T = 4.25). Considering real structures with layout parasitic capacitances, these factors are significantly increased, depending on the relative parasitic content. A non uniform tapering factor is only of interest for minimizing the power-delay product, and results in an improvement of only a few percent [22] with respect to uniform implementations. Minimization of the power-delay product can then be reduced (for regular structures, but this can be solved at the layout level) to the search for a minimum of the product of active area and delay.

Given the tapering factor values involved in real buffer arrays, it appears reasonable to limit the study to a maximum of N = 3 stages. In order to take into account the input controlling ramp [19], the minimum structure to be considered for design optimization must be made up of two inverters, which guarantees convexity of the delays. Let us now consider a real two stage buffer made up of inverters with identical internal configuration ratios ($k = C_{P_i}/C_{N_i}$). The delay expressions become:

$$\theta_{HL} = \frac{\mu_{N}}{\mu_{P}} \cdot \frac{1}{k} \left[ \frac{(1+k)C_{N2} + C_{par1}}{C_{N1}} \right] + \left[ \frac{(1+k)C_{N3} + C_{par2}}{C_{N2}} \right] \tag{6}$$

with an equivalent expression for the rising edge.
In this equation the delay is expressed in reduced units ($\theta = 2t/((1+A)\tau_{ST})$), the subscripts (1, 2, 3) specify the rank of the buffer, from the input of the array to the output, and $C_{par}(i)$ represents the parasitic output capacitance of each inverter. As deduced from the posynomial variation of the delay equations, the constraint of equal delays on falling and rising edges results in the equality of the loading factors (load to drive ratios):

$$\frac{(1+k)C_{N2} + C_{par1}}{C_{N1}} = \frac{(1+k)C_{N3} + C_{par2}}{C_{N2}} \tag{7}$$

which gives:

$$C_{N2} = \frac{C_{par1}}{2(1+k)} \left[ -1 + \sqrt{1 + \frac{4(1+k)C_{N1} \left[ n(1+k)C_{N1} + C_{par2} \right]}{C_{par1}^2}} \right] \tag{8}$$

where it is assumed that $C_{N3}$ is a tapered value of $C_{N1}$ ($C_{N3} = n \, C_{N1}$). Cancellation of the derivative of $\theta$ with respect to $C_{N2}$ cannot be obtained in closed form. Replacing $C_{N2}$ in eq. 6 by the value given in eq. 8, we plot in Figure 2 the variation of $t_{HL} = t_{LH}$ versus the value of the internal configuration ratio, for different values of the node capacitive load. As shown, the minimum delay occurs for k values belonging to the interval 1.5-3.8, for pure active or pure parasitic load conditions, as can be obtained directly from the cancellation of the derivative of $\theta$ for these special cases.

Figure 2: Evolution of the delay of two stage inverters with the content of the capacitive load versus the internal configuration ratio value (k).

For the total range of considered loads, sizing with k = 1 results in a delay degradation ranging from 36% for a purely parasitic load to 5% for a purely active one.
However, for buffer arrays where the parasitic content is not higher than the gate capacitance of the corresponding transistors [24], the performance degradation obtained by selecting identical N and P type transistors to size the inverters is not much higher than 20% of the minimum available value, for a 75% reduction of the transistor area (dynamic power).

#### 5 - Buffer Insertion Limit

The next point of interest is to characterize the performance of inverters in terms of load and design parameters, and to define the load limit at which any cell can be sped up by the insertion of wider inverters. This corresponds to the definition of a buffer insertion limit for each cell. As previously discussed, the general performance equation for inverters can be written as:

$$Delay = t_0 + K \cdot \frac{C_L}{C_{IN}} \tag{9}$$

where the meaning of the different terms can be found from eq. 2: $t_0$ corresponds to the delay introduced by the layout parasitic capacitance of the inverter (usually specified as inertial delay), and $K$ is characteristic of the gate strength and is defined (eqs. 8-14) by the sizing strategy of each inverter. Figure 3 illustrates the load limit of an inverter, defined as the fanout limit at which buffer insertion improves the speed of the cell. Using eq. 6 (Fig. 4), we have to satisfy the condition t(a) > t(b). This results in the straightforward condition:

$$t_{01} + K_{1} \cdot \frac{C_{load}}{C_{IN1}} > t_{01} + K_{1} \cdot \frac{C_{IN2}}{C_{IN1}} + t_{02} + K_{2} \cdot \frac{C_{load}}{C_{IN2}} \tag{10}$$

In this inequality $C_{load}$ and $C_{IN1}$ are well identified, and $C_{IN2}$ is the input capacitance of the buffer to be selected. A direct determination of the optimal buffer ($C_{IN2}$) is obtained from the cancellation of the derivative of the right-hand side of inequality 10 with respect to $C_{IN2}$.
Figure 3: Illustration of buffer insertion limits: $t_1$ ($t_{1,4}$) represents the variation of the delay with the load for one inverter (INV1) or an array of two inverters (INV1, INV4), respectively.

Figure 4: Arrangement for the evaluation of the buffer insertion limit, with delays t(a) and t(b).

This gives the general selection condition:

$$C_{IN2} = \sqrt{\frac{K_2}{K_1} C_{load} \cdot C_{IN1}} \tag{11}$$

This equation defines the best template satisfying the preceding inequality with an optimal buffer insertion; it also defines the maximum load to be used for a given selection of templates. Solving inequality 10 with eq. 11 gives a direct evaluation of the INV1 load limit beyond which buffer insertion will speed up the corresponding path. This is obtained from:

$$T_{limit} - 2\sqrt{\frac{K_2}{K_1}} \cdot \sqrt{T_{limit}} - \frac{t_{02}}{K_1} \ge 0 \tag{12}$$

where $T_{limit}$ is the limit value of the fanout factor $T = C_{load}/C_{IN1}$, and $K_2$ and $K_1$ are the derating coefficients of eq. 9, characteristic of the gate structure. This value of $T_{limit}$ is completely general and can be defined for inverters as well as for gates, as long as performance characterization laws are available. For real inverters with identical configuration ratios ($K_1 = K_2$), the solution of eq. 12 is:

$$T_{limit} = \left[ 1 + \sqrt{1 + \frac{t_{02}}{K_1}} \right]^2 \tag{13}$$

For real structures, evaluated on full custom cells, $t_{02}/K_1$ can be easily evaluated (with $C_{IN2}$ taken from eq. 11) as:

$$\frac{t_{02}}{K_1} = \frac{C_{par2}}{C_{IN2}} = \frac{C_{par2}}{C_{IN1}} \sqrt{\frac{1}{T_{limit}}} \tag{14}$$

which clearly exposes the design parameters.
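Since eq. 14 itself depends on $T_{limit}$, eqs. 13 and 14 together define the limit only implicitly. The following is a minimal Python sketch, not taken from the paper, that solves them by fixed-point iteration for the case $K_1 = K_2$ and also evaluates the selection rule of eq. 11. The function names, the starting guess and the tolerance are illustrative assumptions.

```python
import math

def optimal_cin2(c_load, c_in1, k2_over_k1=1.0):
    """Eq. 11: input capacitance of the inserted buffer that minimizes
    the two-stage delay (same capacitance unit as the arguments)."""
    return math.sqrt(k2_over_k1 * c_load * c_in1)

def t_limit(cpar2_over_cin1, tol=1e-9, max_iter=200):
    """Fixed-point solution of eqs. 13 and 14, assuming K1 = K2.

    cpar2_over_cin1 is the parasitic content C_par2 / C_IN1.
    """
    t = 4.0  # ideal value without parasitics, used as starting guess (assumption)
    for _ in range(max_iter):
        t02_over_k1 = cpar2_over_cin1 / math.sqrt(t)          # eq. 14
        t_next = (1.0 + math.sqrt(1.0 + t02_over_k1)) ** 2    # eq. 13
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t

for ratio in (0, 1, 2, 4, 10):
    print(f"C_par2/C_IN1 = {ratio:2d}  ->  T_limit ~ {t_limit(ratio):.2f}")
```

Under these assumptions the iteration returns 4 for a parasitic-free inverter and grows slowly with the parasitic content, staying within about 0.1 of the values reported in Table 1.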
The buffer insertion condition is then obtained from:

$$T_{limit} - 2\sqrt{T_{limit}} - \frac{C_{par2}}{C_{IN1}} \cdot \sqrt{\frac{1}{T_{limit}}} \ge 0 \tag{15}$$

Note that, since average values of delays are considered, this result holds for individual inverters whether or not their internal configuration ratios balance fall and rise delay times. From layout modeling considerations it can easily be shown that the ratio $C_{par2}/C_{IN1}$ is nearly constant for transistor widths greater than approximately 10 times the minimum available length [23,24]. Table 1 summarizes the values of $T_{limit}$ deduced from eq. 15 (using eq. 14) for different parasitic content values.

| $C_{par2}/C_{IN1}$ | 0 | 1 | 2 | 4 | 10 |
|--------------------|---|-----|-----|-----|-----|
| $T_{limit}$ | 4 | 4.9 | 5.6 | 6.8 | 9.4 |

Table 1: Buffer insertion limit of a single inverter for different output parasitic capacitance contents.

As shown, the value of $T_{limit}$ varies from 4 (which corresponds to the result previously obtained by Mead [3]) for an ideal structure or one with negligible parasitic content, to about 10 for a poorly designed inverter. $C_{par2}/C_{IN1} = 4$ is a typical value for a 1 µm standard cell library. The limit for buffer insertion defined here corresponds to the maximum acceptable load on an inverter implementing the lowest delay configuration. This structure then appears as the best candidate to be selected for a minimum power-delay product implementation.

#### 6 - Design for Minimum Power Delay Product

In this part we define an explicit expression for the power-delay product of inverters and compare design solutions for the minimum of this product to the alternatives investigated in the preceding parts. We have shown that delays in CMOS structures depend on the ratio of the total load to the drive capacitance of each cell, and that this ratio can be expressed ($\theta$) as a fraction of a reference capacitance ($C_{REF}$) used as a unit of measure for the load.
In the same way, we expressed in eq. 4 the total dynamic power as the product of a reference power, $P_{REF} = f \cdot V_{DD}^2 \cdot C_{REF}$, by the total load capacitance normalized with respect to the reference capacitance $C_{REF}$:

$$P_{Dyn} = P_{REF} \cdot \frac{C_{Tot}}{C_{REF}} \tag{16}$$

For an array of 2 inverters this results in:

$$\frac{P_{Dyn}}{P_{REF}} = \left[ \frac{C_{load} + C_{par2}}{C_{REF}} + \frac{(1+k)C_{N2} + C_{par1}}{C_{REF}} + \frac{(1+k)C_{N1}}{C_{REF}} \right] \tag{17}$$

The power-delay product can then be obtained as:

$$\Sigma \theta \cdot \frac{P_{Dyn}}{P_{REF}} = (1+k)^2 \left[ \left(1+\frac{\mu_N}{\mu_P} \cdot \frac{1}{k}\right) \cdot \left(\frac{C_{IN2} + C_{par1}}{C_{IN1}} + \frac{C_{load} + C_{par2}}{C_{IN2}}\right) \right] \cdot \left[ \frac{C_{load} + C_{par2} + C_{par1}}{C_{IN1}} + \frac{C_{IN2}}{C_{IN1}} + 1 \right] \tag{18}$$

where $C_{IN1}=(1+k)C_{N1}$ defines the reference capacitance. The derivative of this expression with respect to $C_{IN2}$ gives the optimum tapering factor minimizing the power-delay product. However, no closed form solution is available for this equation.
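In the absence of a closed form, the minimum of eq. 18 can be located numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it evaluates eq. 18 with capacitances expressed in units of $C_{IN1}$ and finds the minimizing $C_{IN2}$ with a golden-section search. The value used for $\mu_N/\mu_P$ and the way the $(1+k)^2$ prefactor is split between the delay and power terms are assumptions; both are constant factors and do not move the location of the minimum.

```python
import math

def power_delay(c_in2, c_load, c_par1, c_par2, k=1.0, mu_ratio=2.5, c_in1=1.0):
    """Normalized power-delay product of a two-inverter array (eq. 18).

    mu_ratio (mu_N / mu_P) is an assumed illustrative value.
    """
    sum_theta = (1 + k) * (1 + mu_ratio / k) * (
        (c_in2 + c_par1) / c_in1 + (c_load + c_par2) / c_in2
    )
    power = (1 + k) * (
        (c_load + c_par2 + c_par1) / c_in1 + c_in2 / c_in1 + 1
    )
    return sum_theta * power

def argmin_golden(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2

def best_cin2(c_par, c_load=10.0):
    """C_IN2 (in units of C_IN1) minimizing eq. 18 for C_par1 = C_par2 = c_par."""
    return argmin_golden(lambda x: power_delay(x, c_load, c_par, c_par), 0.5, 10.0)

for c_par in (0, 1, 2, 4, 6):
    print(f"C_par = {c_par}: C_IN2 at minimum P.theta ~ {best_cin2(c_par):.2f}")
```

With $C_{load} = 10\,C_{IN1}$ and k = 1, the optimum $C_{IN2}$ found this way moves only from roughly 2.6 to 3.2 $C_{IN1}$ as the parasitic content goes from 0 to 6, which illustrates how flat the dependence on the parasitic load is.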
| $C_{par1} = C_{par2}$ | 0 | 1 | 2 | 4 | 6 |
|--------------------------------------|-----|-----|-----|-----|-----|
| $C_{IN2}$ (min P.θ) | 2.6 | 2.7 | 2.8 | 3 | 3.2 |
| T = $C_{load}/C_{IN2}$ | 3.8 | 3.7 | 3.6 | 3.3 | 3.1 |
| θ | 45 | 54 | 63 | 82 | 99 |
| P | 27 | 31 | 37 | 44 | 52 |
| P.θ | 1.2 | 1.5 | 2.3 | 3.6 | 5.2 |
| $C_{IN2}$ (min θ) | 3.2 | 2.7 | 2.5 | 1.9 | 1.2 |
| T = $C_{load}/C_{IN2}$ | 3.2 | 3.7 | 3.9 | 5.3 | 8.5 |
| θ | 44 | 53 | 62 | 80 | 98 |
| P | 28 | 32 | 37 | 456 | 54 |
| P.θ | 1.3 | 1.5 | 2.3 | 3.9 | 7 |
| $C_{IN2}$ ($T_{limit}$) | 2.5 | 2 | 1.8 | 1.5 | 1.3 |
| T = $C_{load}/C_{IN2}$ | 4 | 5 | 5.6 | 6.8 | 7.7 |
| θ | 46 | 60 | 74 | 105 | 137 |
| P | 27 | 30 | 34 | 41 | 49 |
| P.θ | 1.2 | 1.6 | 2.5 | 4.3 | 6.7 |

Table 2: Comparison of the power-delay product minima obtained for a two stage array with the values obtained by sizing the second stage for minimum θ or at $T_{limit}$.

For illustration, Table 2 gives the variation with $C_{IN2}$ of the power-delay product (eq. 18) for a configuration of 2 real inverters. The load has been set to 10 $C_{IN1}$ and the configuration ratio to k = 1. As shown, the tapering factors for minimum power-delay implementations exhibit only a small variation with the parasitic content of the array implementation. The alternatives with explicit solutions allow interesting trade-offs between speed and power, with a small penalty on the power-delay product. As shown, an explicit sizing at $T_{limit}$ constitutes a good solution for low power implementations.

# 7 - Application to Standard Cells

From the previously shown results it appears that, if the inverter is the most efficient gate in terms of load derating factor and fanout limit, the cost of each gate selection can be evaluated on line in terms of transistor size and of the fanout factor compared to its limit value ($T/T_{limit}$).
| | RISE (ns) | FALL (ns) | $T_{limit}$ | $C_{limit}$ (pF) | $C_{IN}$ (pF) |
|------|--------------------|--------------------|-----|------|------|
| INV1 | $0.29 + 2.95C_{L}$ | $0.29 + 2.68C_{L}$ | 5.9 | 0.34 | 0.06 |
| INV2 | $0.22 + 1.44C_{L}$ | $0.20 + 1.21C_{L}$ | 5.9 | 0.65 | 0.11 |
| INV3 | $0.25 + 0.95C_{L}$ | $0.22 + 0.76C_{L}$ | 5.9 | 0.96 | 0.16 |
| INV4 | $0.21 + 0.74C_{L}$ | $0.19 + 0.63C_{L}$ | 5.9 | 1.20 | 0.2 |
| INV5 | $0.22 + 0.59C_{L}$ | $0.21 + 0.50C_{L}$ | 5.9 | 1.50 | 0.25 |

Table 3: Performance equations of a 1 µm standard cell library.

We applied these results to the different inverter cells of an industrial 1 µm library, whose performance equations are given in Table 3 together with the values of $T_{limit}$, the maximum load allowed and the input capacitance of each template. These values constitute the first determination, at the cell level, of real buffer insertion limits. By direct inspection of the loading factors on the output nodes, they allow the definition of speed up or reconfiguration strategies, as discussed later.

In Table 4 we compare delay, power and power-delay product values for different selections of inverters needed to drive different loads. Three design choices are investigated: implementation with the optimal number of stages (eq. 5); the minimum number of stages defined at the minimum delay under the condition $T < T_{limit}$, with regular tapering factor values; and selection of templates at $T_{limit}$, resulting in irregular tapering factor values. As shown in the table, the last condition always results in the minimum power-delay product with a very low penalty in delay. Sizing the loaded stage at $T_{limit}$ always results in non uniform tapering factors with a lower $P\theta$ product.
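The kind of direct inspection described above can be sketched in a few lines of Python. The data below are transcribed from Table 3; the function names, the tuple layout and the way a two-stage delay is composed (rise delay of the first inverter followed by fall delay of the second) are illustrative assumptions, not definitions given in the paper.

```python
# Performance data transcribed from Table 3: delay (ns) = t0 + K * C_L, C_L in pF.
# Tuple layout (an assumption of this sketch):
# (rise_t0, rise_K, fall_t0, fall_K, C_IN in pF)
LIBRARY = {
    "INV1": (0.29, 2.95, 0.29, 2.68, 0.06),
    "INV2": (0.22, 1.44, 0.20, 1.21, 0.11),
    "INV3": (0.25, 0.95, 0.22, 0.76, 0.16),
    "INV4": (0.21, 0.74, 0.19, 0.63, 0.20),
    "INV5": (0.22, 0.59, 0.21, 0.50, 0.25),
}
T_LIMIT = 5.9  # buffer insertion limit listed in Table 3 for every template

def rise_delay(cell, c_load_pf):
    """Rise delay in ns of `cell` driving c_load_pf."""
    t0, k, _, _, _ = LIBRARY[cell]
    return t0 + k * c_load_pf

def fall_delay(cell, c_load_pf):
    """Fall delay in ns of `cell` driving c_load_pf."""
    _, _, t0, k, _ = LIBRARY[cell]
    return t0 + k * c_load_pf

def fanout_ratio(cell, c_load_pf):
    """T / T_limit; a value above 1 suggests that buffer insertion helps."""
    c_in = LIBRARY[cell][4]
    return (c_load_pf / c_in) / T_LIMIT

def two_stage_delay(first, second, c_load_pf):
    """Assumed composition: `first` rises into the input of `second`,
    then `second` falls into the external load."""
    c_in_second = LIBRARY[second][4]
    return rise_delay(first, c_in_second) + fall_delay(second, c_load_pf)

load = 1.0  # pF
print("INV1 alone      :", round(rise_delay("INV1", load), 3), "ns,",
      "T/T_limit =", round(fanout_ratio("INV1", load), 2))
print("INV1 then INV3  :", round(two_stage_delay("INV1", "INV3", load), 3), "ns")
```

For a 1 pF load, INV1 alone operates far above its fanout limit, and inserting INV3 behind it shortens the path considerably under these assumptions. The two-stage figure obtained this way is close to the delay listed in Table 4 for the same cell selection, although the paper does not state how those delays were composed.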
Note that, due to the limited choice in the library (discrete variation of the drive capabilities of the cells), some alternatives are equivalent for particular loading conditions.

| Load (pF) | Design strategy | Number of stages | Cell selection | Delay (r. unit) | P (r. unit) | P.θ |
|-----|---|---|----------------------------|------|------|-----|
| 1.9 | 1 | 3 | 2INV5+INV2<br>INV3<br>INV1 | 2.1 | 2.74 | 5.7 |
| 1.9 | 2 | 2 | 2INV3<br>INV1 | 2.19 | 2.28 | 5 |
| 1.9 | 3 | 2 | 2INV3<br>INV1 | 2.19 | 2.28 | 5 |
| 1.5 | 1 | 3 | 2INV5<br>INV3<br>INV1 | 2.03 | 2.22 | 4.5 |
| 1.5 | 2 | 2 | INV5+INV1<br>INV1 | 2 | 1.87 | 3.7 |
| 1.5 | 3 | 2 | INV5<br>INV1 | 2 | 1.51 | 3.0 |
| 1 | 1 | 3 | INV3+INV4<br>INV3<br>INV1 | 1.8 | 1.59 | 2.8 |
| 1 | 2 | 2 | INV5<br>INV1 | 1.73 | 1.31 | 2.3 |
| 1 | 3 | 1 | INV3<br>INV1 | 1.74 | 1.22 | 2.1 |

Table 4: Comparison of design alternatives for a low power-delay product implementation: 1 is the solution with the optimal number of stages, 2 is the minimum delay implementation and 3 the selection at $T_{limit}$.

# 8 - Conclusion

The design and sizing of tapered buffers has been considered with the aim of minimizing the power-delay product. Rules for buffer insertion have been defined as a function of physical design parameters. They have been shown to constitute a good alternative for an efficient use of inverter templates, as well as for finding heavily loaded nodes on which to apply speed up alternatives. From a general definition of the power-delay product, it has been shown that sizing at the buffer insertion limit constitutes an efficient solution for buffer implementation with a minimum power-delay product. Moreover, this allows closed form solutions of the problem, which can be completely determined from the performance characterization laws of library cells. An application to a 1 µm standard cell library has been given.
Comparisons of implementations realized for an optimum number of buffer stages, for a minimum delay and for a selection of templates at $T_{limit}$ give evidence of the benefit of this last solution for low power design.

#### References

- [1] H.C. Lin and L.W. Lindholm, "An optimized output stage for MOS integrated circuits", IEEE J. Solid State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
- [2] R.C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'", IEEE J. Solid State Circuits, vol. SC-10, no. 3, pp. 185-186, June 1975.
- [3] C. Mead, L. Conway, "Introduction to VLSI systems", Addison Wesley, 1980.
- [4] N. Weste and K. Eshraghian, "Principles of CMOS VLSI design", Addison Wesley, 1985.
- [5] N. Hedenstierna and K.O. Jeppson, "CMOS circuit speed and buffer optimization", IEEE Trans. on CAD, vol. CAD-6, no. 2, pp. 270-281, March 1987.
- [6] N.C. Li, G.L. Haviland, and A.A. Tuszinsky, "CMOS tapered buffer", IEEE J. Solid State Circuits, vol. 25, pp. 1005-1008, Aug. 1990.
- [7] C. Prunty and L. Gal, "Optimum tapered buffer", IEEE J. Solid State Circuits, vol. 27, no. 1, pp. 118-119, Jan. 1992.
- [8] T. Sakurai, "A unified theory for mixed CMOS/BiCMOS buffer optimization", IEEE J. Solid State Circuits, vol. 27, no. 7, pp. 1014-1019, July 1992.
- [9] H.J.M. Veendrick, "Short circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits", IEEE J. Solid State Circuits, vol. 19, pp. 468-473, Aug. 1984.
- [10] S.R. Vemuru, A.R. Thorbjornsen, "Variable taper CMOS buffer", IEEE J. Solid State Circuits, vol. 26, no. 9, pp. 1265-1269, 1991.
- [11] D. Deschacht, M. Robert, D. Auvergne, "Explicit formulation of delays on CMOS data path", IEEE J. Solid State Circuits, vol. 23, pp. 1257-1264, Oct. 1988.
- [12] Sha Ma and P. Francon, "Energy control and accurate delay estimation in the design of CMOS buffers", IEEE J. Solid State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
- [13] J. Fishburn and A.
Dunlop, "TILOS: A polynomial programming approach to transistor sizing", Proc. ICCAD, Nov. 1985, pp. 326-328.
- [14] A.J. Al-Khalili, Y. Zhu and D. Al-Khalili, "A module generator for optimized CMOS buffers", Proc. 26th Design Automation Conference, pp. 245-250, 1989.
- [15] H.Y. Chen, S.M. Kang, "iCOACH: a circuit optimization aid for CMOS high performance circuits", Integration, the VLSI Journal, vol. 10, pp. 185-212, 1991.
- [16] D. Auvergne, N. Azemard, V. Bonzom, D. Deschacht, M. Robert, "Formal sizing rules of CMOS circuits", EDAC, European Design Automation Conference, pp. 96-100, Amsterdam, 25-28 February 1991.
- [17] B.S. Cherkauer and E.B. Friedman, "Design of tapered buffers with local interconnect capacitance", IEEE J. Solid State Circuits, vol. 30, no. 2, pp. 151-155, Feb. 1995.
- [18] J. Rubinstein, P. Penfield, M. Horowitz, "Signal delay in RC tree networks", IEEE Trans. on Computer Aided Design, vol. CAD-6, pp. 202-
- [19] D. Auvergne, N. Azemard, D. Deschacht, M. Robert, "Input waveform slope effects in CMOS delays", IEEE J. Solid State Circuits, vol. 25, no. 6, pp. 1588-1590, 1990.
- [20] P. Coll, M. Robert, X. Regnier and D. Auvergne, "Process characterization with dynamic test structures", Electronics Letters, vol. 29, no. 20, pp. 1764-1766, Sept. 1993.
- [21] S. Turgis, N. Azemard and D. Auvergne, "Explicit evaluation of short circuit power dissipation for CMOS logic structures", 1995 International Symposium on Low Power Design, April 23-26, Dana Point Resort, California.
- [22] Joo-Sun Choi and K. Lee, "Design of CMOS tapered buffer for minimum power delay product", IEEE J. Solid State Circuits, vol. 29, no. 9, pp. 1142-1145, 1994.
- [23] F. Moraes, N. Azemard, M. Robert, D. Auvergne, "Flexible macrocell layout generator", 4th ACM/SIGDA Physical Design Workshop, pp. 105-116, Lake Arrow CA, April 19-21, 1993.
- [24] M. Mellah, N. Azemard, D. Auvergne, "Standard cell performance modeling", Proc. PATMOS 94, pp. 158-169, Barcelona, 1994.