

# Static Implementation of QDI Asynchronous Primitives

Philippe Maurine, Jean-Baptiste Rigaud, Ghislain Fraidy Bouesse, Gilles Sicard, Marc Renaudin

To cite this version:

Philippe Maurine, Jean-Baptiste Rigaud, Ghislain Fraidy Bouesse, Gilles Sicard, Marc Renaudin. Static Implementation of QDI Asynchronous Primitives. PATMOS: Power And Timing Modeling, Optimization and Simulation, Sep 2003, Turin, Italy. pp.181-191, 10.1007/978-3-540-39762-5\_20. lirmm-00269567

HAL Id: lirmm-00269567, https://hal-lirmm.ccsd.cnrs.fr/lirmm-00269567

Submitted on 14 Sep 2019


# **Static Implementation of QDI Asynchronous Primitives**

P. Maurine, J.B. Rigaud, F. Bouesse, G. Sicard, and M. Renaudin

LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5, France

TIMA Laboratory, 46 avenue Félix Viallet, 38031 Grenoble Cedex, France

**Abstract.** Fairly comparing the performance of an asynchronous ASIC with that of its synchronous counterpart requires a dedicated asynchronous library. In this paper we present TAL\_130nm, a standard cell library dedicated to the design of QDI asynchronous circuits. The cell selection and sizing rules applied to develop TAL\_130nm are detailed. It is shown that significant area and power savings, as well as speed improvements, can be obtained.

## 1 Introduction

Although asynchronous circuits can outperform synchronous ICs in many application domains such as security and automotive [12], integrated circuit design remains essentially limited to the realization of synchronous chips. One reason explains this fact: the EDA industry has not proposed any CAD suite providing a useful and tractable design framework. However, several academic tools have been developed or are under development [1,2,3].

Among them, TAST [3] is dedicated to the design of micropipeline ($\mu$P) and Quasi Delay Insensitive (QDI) circuits [12]. Its main characteristic is that it targets a standard cell approach. Unfortunately, basic asynchronous primitives such as C-elements are rarely found in typical libraries (dedicated to synchronous circuit design). Consequently, designers of QDI asynchronous ICs who adopt a standard cell approach must implement the required Boolean functions with AO222 gates [1,9]. This results in suboptimal physical implementations, as illustrated in Fig. 1, which gives evidence of the power and area savings that can be obtained by developing a library dedicated to the design of asynchronous circuits. This motivated the development of TAL, a standard cell library dedicated to the design of QDI asynchronous circuits.

This paper introduces the methods we used and the choices we made to design TAL. It is organized as follows. Section 2 introduces the structural specificities of QDI gates and the library sizing strategy. In Section 3, we deduce from the first order delay models of both static and ratioed CMOS structures two sizing criteria that reduce the area cost of any QDI gate while maintaining its throughput. Finally, Section 4 reports the performance of the gates designed following our sizing strategy and compares them to gates implemented using basic AO222 gates borrowed from a standard synchronous library.

NB: the meaning of the different notations used throughout the paper is given in Table 1.

## 2 QDI Element Specificities and Library Sizing Strategy

### 2.1 QDI Element Specificities

Depending on the desired robustness to process, voltage and temperature variations, handshake technology offers a large variety of asynchronous circuit styles and communication protocols. Our aim here is not to give an exhaustive list of all the possible alternatives, but to introduce the main specificities of the primitives required to design 4-phase QDI circuits.

For such circuits, a data transfer through a channel starts with the emission of a request signal encoded into the data and ends with the emission of an acknowledge signal. During this time interval, whose duration is a priori unknown, the incoming data must be held in order to guarantee the quasi-delay-insensitivity property. This implies the intensive use of logic gates that include a state-holding element (usually a latch) or a feedback loop.
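The 4-phase transfer described above can be sketched as follows. This is a minimal illustration that assumes a dual-rail code (one instance of the multi-rail styles used with QDI circuits): each bit travels on two wires, exactly one of which goes high, and the all-zero spacer marks the return-to-zero phase. The names `encode`, `decode` and `four_phase_transfer` are hypothetical and are not part of TAL or TAST.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]  # (rail0, rail1)

# Spacer: the all-zero state separating two data tokens (return-to-zero).
SPACER: DualRail = (0, 0)


def encode(bit: int) -> DualRail:
    """Encode a Boolean value on two rails: exactly one rail goes high."""
    return (0, 1) if bit else (1, 0)


def decode(token: DualRail) -> Optional[int]:
    """Return the bit carried by a valid token, or None for the spacer."""
    if token == SPACER:
        return None
    if token == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    return token[1]


def four_phase_transfer(bit: int) -> List[Tuple[DualRail, int]]:
    """List the (data, ack) channel states of one 4-phase transfer.

    1. The sender emits a valid codeword: the request is the data itself.
    2. The receiver raises ack; the data must be held until this point.
    3. The sender returns the rails to the spacer.
    4. The receiver lowers ack; the channel is idle again.
    """
    data = encode(bit)
    return [(data, 0), (data, 1), (SPACER, 1), (SPACER, 0)]
```

Because validity is read from the rails themselves, no timing assumption is needed to know when the data has arrived; the receiver only has to hold it until the acknowledge is raised.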

Since we target a CMOS implementation, it follows that most of the required primitives are composite or complex positive gates: they can be decomposed into one or more simple dynamic logic gates and a state-holding element. Fig. 1 gives possible decompositions of a 3-input Muller gate and of a COR222 gate, both widely used to implement basic logic functions such as "And", "Or" and "Xor" in a multi-rail design style.
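The state-holding behaviour of a Muller gate, and the AO222-based alternative mentioned in the introduction, can be checked with a small behavioural model. The sketch below assumes the usual AO222 connection for a 2-input Muller gate, with the output fed back into two of the AND terms (Z = a·b + a·Z + b·Z); the exact wiring of Fig. 1 may differ, and the function names are illustrative only.

```python
from itertools import product


def muller(inputs, state):
    """n-input Muller C-element.

    The output follows the inputs when they all agree and otherwise keeps
    its previous value: this is the state-holding behaviour.
    """
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return state


def muller2_ao222(a, b, z):
    """2-input Muller gate built from one AO222 gate whose output z is fed
    back: Z+ = a.b + a.Z + b.Z (majority of a, b and the previous output)."""
    return (a & b) | (a & z) | (b & z)


# Exhaustive check: the AO222 feedback form matches the C-element for
# every input combination and every previous output value.
for a, b, z in product((0, 1), repeat=3):
    assert muller2_ao222(a, b, z) == muller((a, b), z)
```

Functionally the two are identical; the difference the library targets is physical, since the AO222 form spends a full complex gate and a feedback loop per Muller2, and more loops for wider Muller gates.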

### 2.2 Library Sizing Strategy

Owing to their composite structure, different sizing strategies can be applied to the library cells. The one we adopted is based on the following five design rules:

①: to first order, balance the amplitudes of the currents flowing through the N and P arrays in order to balance the active and RTZ (return-to-zero) phases.

②: design at least the drives X0, X1, X2 and X4 for each functionality in order to accommodate a wide range of loads (many gates have been designed in drives X0, X1, X2, X4, X8 and X12).

③: design each drive so that, independently of the logic function, its output driver has the same current capability as the equivalent inverter. For example, the last stage of the logic decomposition of the 3-input Muller gate (M3) of drive Xj is sized to deliver the same switching current as the inverter of drive Xj.

④: minimize the area by designing each cell so that it can drive both weak and heavy loads with two functional stages. This means that only the last two stages of the COR222 decomposition are sized to accommodate the output load, the preceding stage being designed for minimum area cost. This is equivalent to targeting implementations with low input capacitance values. Such a strategy allows weak drives to be used as often as possible without significantly compromising speed.

⑤: avoid, whenever possible, logic decompositions in which the state-holding element drives the output node. In Fig. 1f, the output inverter and the latch can be interchanged, but, as suggested in [10], it is preferable to place the latch first and to let the inverter drive the output node in order to minimize the cell area according to ③.

## 3 First Order Models and Sizing Criteria

In order to achieve high speed performance and to ensure correct behaviour of the state-holding element, we need first order delay models of both static and ratioed CMOS structures. This section briefly presents the models we adopted and the gate sizing strategy we deduced from them. We first introduce the first order drain-source current model on which they are based.

### 3.1 Drain Source Current Model

We adopt the $\alpha$-power law drain-source current model proposed in [4], with $\alpha = 1$ for simplicity. This choice models transistors whose current saturation is caused by carrier velocity saturation. The drain-source current expressions used in the following are therefore:

$$I_{DS,N/P} = \begin{cases} 0 & \text{if } V_{GS,N/P} < V_{TN/P} \\ \dfrac{\mu_{0,N/P} \cdot C_{OX}}{L_{GEO}} \cdot W_{N/P} \cdot (V_{GS,N/P} - V_{TN/P}) \cdot V_{DS,N/P} & \text{(linear region)} \\ K_{N/P} \cdot W_{N/P} \cdot (V_{GS,N/P} - V_{TN/P}) & \text{(saturation)} \end{cases} \tag{1}$$
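To make the three regions of (1) concrete, here is a hedged Python sketch of the $\alpha = 1$ current model. It assumes that the linear and saturated expressions meet at the saturation boundary, so the current is simply the smaller of the two; the parameter names and the numbers in the usage note are placeholders, not 130nm process data.

```python
def drain_current(vgs, vds, w, vt, mu0_cox_over_l, k_sat):
    """First-order alpha-power law drain current with alpha = 1, eq. (1).

    vgs, vds        gate-source / drain-source voltages in V (magnitudes,
                    so the same function serves N and P transistors)
    w               transistor width (um)
    vt              threshold voltage magnitude (V)
    mu0_cox_over_l  mu0 * Cox / L_GEO, linear-region factor (mA / (um V^2))
    k_sat           conduction factor K in saturation (mA / (um V))
    """
    vov = vgs - vt
    if vov <= 0:
        return 0.0  # cut-off region
    i_lin = mu0_cox_over_l * w * vov * vds  # linear region
    i_sat = k_sat * w * vov  # velocity-saturated region
    # The transistor saturates once the linear expression reaches the
    # saturated current, i.e. for vds >= k_sat / mu0_cox_over_l.
    return min(i_lin, i_sat)
```

For example, `drain_current(1.2, 0.1, 1.0, 0.3, 1.0, 0.3)` stays in the linear region, while raising `vds` to 1.2 V caps the current at `k_sat * w * (vgs - vt)`. The products $\alpha_{N/P}$ and $\alpha'_{N/P}$ of Table 1 are these two factors evaluated at $V_{GS} = V_{DD}$.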

### 3.2 Step Response of a Static CMOS Structure

As a first order delay model for all the static structures, we use the generalization of the inverter step response proposed by Mead [11]:

$$t_{HLS(LHS)} = \frac{C_L \cdot \Delta V}{I_{N(P)}} = \tau \cdot \frac{C_L}{C_{N(P)}} \tag{2}$$

The inverter step response can be generalized to all static gates by reducing each gate to an equivalent inverter [3, 5, 6]. To do so, one can estimate the ratio $\Delta W_{N,P}$ between the current that a single transistor can deliver and the current that the associated series array of transistors can deliver. Then, following (2), the step responses of a logic gate can be expressed as:

$$t_{HLS} = \tau \cdot \Delta W_N \cdot \frac{C_L}{C_N} = \tau \cdot \Delta W_N \cdot F_O \cdot \frac{(1+k)}{2}$$

$$t_{LHS} = \tau \cdot R \cdot \Delta W_P \cdot F_O \cdot \frac{(1+k)}{2 \cdot k}$$
(3)

Defining the global switching current reduction factors $SD_N = \Delta W_N \cdot (1+k)/2$ and $SD_P = R \cdot \Delta W_P \cdot (1+k)/(2 \cdot k)$, (3) reduces to:

$$t_{HLS(LHS)} = \tau \cdot SD_{N(P)} \cdot F_O \tag{4}$$
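Equation (4) reduces every static gate to $\tau$ times a current reduction factor times a fan-out. The sketch below evaluates $SD_N$ and $SD_P$ as read off (3) and the resulting step responses; all numeric inputs are illustrative assumptions, not TAL library values.

```python
def sd_factors(dw_n, dw_p, k, r):
    """Global switching current reduction factors SD_N and SD_P, from eq. (3).

    dw_n, dw_p  switching current reduction factors of the N and P arrays
    k           internal configuration ratio W_P / W_N
    r           switching current asymmetry factor R
    """
    sd_n = dw_n * (1 + k) / 2
    sd_p = r * dw_p * (1 + k) / (2 * k)
    return sd_n, sd_p


def step_responses(tau, fo, dw_n, dw_p, k, r):
    """Falling (t_HLS) and rising (t_LHS) step responses of eq. (4)."""
    sd_n, sd_p = sd_factors(dw_n, dw_p, k, r)
    return tau * sd_n * fo, tau * sd_p * fo
```

Setting `k = r`, i.e. the internal configuration ratio equal to R as in rule ①, gives $SD_P = \Delta W_P \cdot (1+k)/2$, so an inverter ($\Delta W_N = \Delta W_P = 1$) has equal rising and falling step responses, while a series N array ($\Delta W_N > 1$) slows only the falling edge.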

### 3.3 Step Response of a Ratioed Structure

Let us consider a ratioed structure driving an infinite load, as represented in Fig. 2. This is the worst-case configuration, since the output driver cannot significantly discharge the output node Z before node $Z_{INT}$ has stabilized.

The step response of a MOS gate being defined as the time needed for the structure to charge (discharge) its output node up to (down to) $V_{DD}/2$, we solved the differential equation:

$$C_{L} \cdot \frac{dV_{OUT}}{dt} = I_{P}^{L}(t) - I_{N}^{H}(t) \tag{5}$$

to obtain the output voltage evolution:

$$V_{OUT}(t) = \frac{\alpha_P \cdot m_n \cdot W_P^L}{\alpha'_N \cdot \Delta W_P \cdot W_N^H} \cdot \left(1 - e^{-\frac{\alpha'_N \cdot W_N^H}{m_n \cdot C_L} \cdot t}\right)$$
(6)

This expression rises quickly and then asymptotically approaches a limit. Let $\Delta V_{LH}$ denote the corresponding voltage variation of node $Z_{INT}$:

$$\Delta V_{LH} = \lim_{t \to \infty} V_{OUT}(t) = \frac{\alpha_P \cdot m_n \cdot W_P^L}{\alpha'_N \cdot \Delta W_P \cdot W_N^H}$$
(7)
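The closed form (6) can be cross-checked numerically. The sketch below uses two lumped placeholder quantities: `i_p`, the constant saturated current of the $P^L$ array ($\alpha_P \cdot W_P^L / \Delta W_P$), and `g_n`, the linear-mode conductance of the $N^H$ array ($\alpha'_N \cdot W_N^H / m_n$). It integrates the charge balance of node $Z_{INT}$, $C_L \cdot dV/dt = I_P^L - I_N^H$, with a forward-Euler loop and compares the result with (6); the normalized units are an assumption made for readability.

```python
import math


def z_int_closed_form(t, i_p, g_n, c_l):
    """Eq. (6) written with lumped terms.

    i_p  alpha_P * W_P^L / dW_P   (saturated pull-up current, constant)
    g_n  alpha'_N * W_N^H / m_n   (conductance of the linear-mode N^H array)
    """
    v_inf = i_p / g_n  # asymptotic swing Delta V_LH, eq. (7)
    return v_inf * (1 - math.exp(-g_n * t / c_l))


def z_int_euler(t_end, i_p, g_n, c_l, steps=100_000):
    """Forward-Euler integration of C_L dV/dt = I_P^L - g_n * V from V = 0."""
    dt = t_end / steps
    v = 0.0
    for _ in range(steps):
        v += dt * (i_p - g_n * v) / c_l
    return v
```

The two agree to within the Euler discretization error, and for large `t` the closed form settles at `i_p / g_n`, which is exactly the ratio of widths that (9) constrains.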

To ensure correct behaviour of the latch, this limit must be at least equal to the inversion voltage $V_{INV}$ of the output driver. However, since the limit is only reached asymptotically, designing high speed latches requires it to strictly exceed $V_{INV}$:

$$\Delta V_{LH} > V_{INV} \tag{8}$$

To ensure maximum operating safety as well as good switching speed, we adopt $\Delta V_{LH} = \Delta V_{HL} = V_{DD}$ as our design standard. When designing the latch, this standard leads to the following sizing ratios,

$$K_{P^{L}(N^{L}) \to N^{H}(P^{H})} = \frac{W_{P(N)}^{L}}{W_{N(P)}^{H}} = \frac{\alpha'_{N(P)}}{m_{n(p)}} \cdot \frac{\Delta W_{P(N)}}{\alpha_{P(N)}} \cdot \Delta V_{LH(HL)} \tag{9}$$

reported in Table 2 for the considered 130nm process. These results clearly show that a single $N^L$ transistor delivers enough current to control the latch. Therefore, we set $m_p$ to one. On the contrary, the current capability of the $N^H$ transistor array must be reduced in order to avoid an area-expensive oversizing of the $P^L$ transistor array. This is why we set $m_n$ to two.
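Equation (9) predicts that each sizing ratio is inversely proportional to the number $m$ of transistors in the opposing H array, which is what makes $m_n = 2$ attractive: it halves the required $P^L$ width. The short check below encodes the values of Table 2 (rows keyed by the index printed there) and verifies this $1/m$ trend; the 2% tolerance is our assumption, chosen to absorb the two-digit rounding of the table.

```python
# Table 2 values; each list runs over m = 1, 2, 3, 4
# (m_n for K_{P^L -> N^H}, m_p for K_{N^L -> P^H}).
K_PL_TO_NH = {
    2: [12.2, 6.1, 4.1, 3.1],
    3: [18.0, 9.0, 6.0, 4.5],
    4: [23.8, 11.9, 7.9, 6.0],
}
K_NL_TO_PH = {
    2: [1.3, 0.65, 0.43, 0.32],
    3: [1.7, 0.85, 0.56, 0.42],
    4: [2.1, 1.05, 0.7, 0.53],
}


def follows_inverse_m(table, rel_tol=0.02):
    """Check that every row satisfies K(m) ~ K(1) / m, as eq. (9) predicts."""
    for row in table.values():
        for m, value in enumerate(row, start=1):
            predicted = row[0] / m
            if abs(value - predicted) > rel_tol * predicted:
                return False
    return True


print(follows_inverse_m(K_PL_TO_NH), follows_inverse_m(K_NL_TO_PH))
```

Both tables pass, so the numbers can be read as a single column ($m = 1$) scaled by $1/m$: for instance the first row drops from 12.2 to 6.1 when $m_n$ goes from one to two.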

Knowing how to size the latches to ensure correct behaviour, we can evaluate from (6) and its dual expression the step responses $t_{HLS,LHS}^{R}$ of the ratioed structure represented in Fig. 2:

$$t_{HLS}^{R} = -\frac{m_{p} \cdot C_{L}}{\alpha'_{p} \cdot W_{p}^{H}} \cdot \ln \left( 1 - \frac{V_{DD} \cdot \Delta W_{N} \cdot \alpha'_{p} \cdot W_{p}^{H}}{2 \cdot m_{p} \cdot \alpha_{N} \cdot W_{N}^{L}} \right)$$
(10a)

$$t_{LHS}^{R} = -\frac{m_{n} \cdot C_{L}}{\alpha_{N}^{'} \cdot W_{N}^{H}} \cdot \ln \left( 1 - \frac{V_{DD} \cdot \Delta W_{P} \cdot \alpha_{N}^{'} \cdot W_{N}^{H}}{2 \cdot m_{n} \cdot \alpha_{P} \cdot W_{P}^{L}} \right)$$
(10b)

With $m_n$ and $m_p$ set to two and one respectively, and using the second-order expansion $-\ln(1-x) \approx x \cdot (1 + x/2)$ of the logarithm in (10a) and (10b), these expressions become:

$$t_{HLS}^{R} = t_{HLS} \left( 1 + \frac{V_{DD}}{4} \cdot \frac{\Delta W_{N} \cdot \alpha_{P}^{'} \cdot W_{P}^{H}}{m_{P} \cdot \alpha_{N} \cdot W_{N}^{L}} \right) = t_{HLS} \cdot \beta_{HLS}$$

$$t_{LHS}^{R} = t_{LHS} \left( 1 + \frac{V_{DD}}{4} \cdot \frac{\Delta W_{P} \cdot \alpha_{N}^{'} \cdot W_{N}^{H}}{m_{n} \cdot \alpha_{P} \cdot W_{P}^{L}} \right) = t_{LHS} \cdot \beta_{LHS}$$
(11a,b)

where $t_{LHS}$ ($t_{HLS}$) is the step response of the pull-up (pull-down) of the structure of Fig. 2 in the absence of transistor $N^{H}$ ($P^{H}$), and $\beta_{LHS}$ ($\beta_{HLS}$) is the slowdown factor induced by transistor $N^{H}$ ($P^{H}$). Thus, according to (11), the ratioed structure of Fig. 2 behaves as a static structure whose transistors have the widths $W_{P}^{L}/\beta_{LHS}$ and $W_{N}^{L}/\beta_{HLS}$.
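To see how far the expansion behind (11) departs from the exact logarithmic form (10), note that (10b) can be written $t_{LHS}^{R} = t_{LHS} \cdot (-\ln(1-x))/x$, where $x$ is the argument of the logarithm and, by (7), $x = V_{DD}/(2 \cdot \Delta V_{LH})$. The hedged sketch below compares the exact and expanded slowdown factors; it is a consistency check of the first order model, not a simulation of the 130nm cells.

```python
import math


def slowdown_exact(x):
    """Exact ratio t^R / t implied by (10): -ln(1 - x) / x, for 0 < x < 1."""
    return -math.log(1 - x) / x


def slowdown_first_order(x):
    """Slowdown factor beta of (11), from -ln(1 - x) ~ x * (1 + x / 2)."""
    return 1 + x / 2


# x = V_DD / (2 * Delta V_LH); the design standard Delta V_LH = V_DD gives 0.5.
x = 0.5
print(slowdown_first_order(x), slowdown_exact(x))
```

With the design standard $\Delta V_{LH} = V_{DD}$ adopted above, $x = 1/2$: the expansion gives $\beta_{LHS} = 1.25$ while the exact ratio is $2 \ln 2 \approx 1.39$, so (11) underestimates the slowdown by roughly 10% at this operating point. Since $-\ln(1-x)/x = 1 + x/2 + x^2/3 + \dots$, the expansion always errs on the optimistic side.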

### 3.4 Gate Sizing

As explained in Section 2.2, where the library sizing strategy is defined, we want to take advantage of the composite structure of the QDI primitives to minimize the area. Applying this strategy means that only the last two stages are sized for the load, while the preceding stages are sized for minimum area. This leads us to consider the structure of Fig. 3 to determine the tapering factor [7,8] that must be applied to the logic decomposition to minimize the propagation delays. To respect rule ⑤, all functionalities are decomposed so that the state-holding element does not drive the output node. However, this proves too area-expensive for Muller gates. Consequently, two cases have been studied.

#### Case 1: the last stage is a static CMOS structure (usually an inverter)

Let us express, to first order, the propagation delay $\Theta$ ($\Theta \approx \Theta_{HL} \approx \Theta_{LH}$) of the three last stages of the generic structure of Fig. 3. Using the generalized step response and considering that the internal configuration ratio $k$ is equal to R (see rule ①), we get:

$$\frac{\Theta}{\tau} = SD_N(q-2) \cdot F_O(q-2) + SD_P(q-1) \cdot F_O(q-1) + SD_N(q) \cdot F_O(q)$$
(12)

with:

$$F_{O}(k) = \frac{C_{k+1}}{C_{k}} \tag{13}$$

Setting $d\Theta/dC_{q-1} = 0$ gives the optimal input capacitance $C_{q-1}^{opt}$ of stage (q-1):

$$C_{q-1}^{opt} = \sqrt{\frac{SD_{P}(q-1)}{SD_{N}(q-2)} \cdot C_{q} \cdot C_{q-2}} \tag{14}$$

To estimate the quality of this sizing criterion, we applied a derating factor $\eta$ to $C_{q-1}^{opt}$ ($C_{q-1} = \eta \cdot C_{q-1}^{opt}$) and simulated the propagation delay of several implementations. As an example, Fig. 4 illustrates the variation of the propagation delay with respect to $\eta$ for a Muller2 gate under two loading conditions (Foo = 2 and 5). As shown, the structure sized according to (14) is close to the optimal one.
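Criterion (14) can also be checked against the first order model itself. In the sketch below the stage parameters are normalized placeholders ($C_{q-2} = 1$, $C_q = 16$, $SD_N(q-2) = SD_P(q-1) = 1.5$), and the last-stage term of (12) is dropped because it does not depend on $C_{q-1}$. The derating loop only mimics the $\eta$ sweep of Fig. 4 qualitatively; it is not a reproduction of the simulated curves.

```python
import math


def theta_over_tau(c_qm1, c_qm2, c_q, sd_n_qm2, sd_p_qm1):
    """C_{q-1}-dependent part of eq. (12), in units of tau.

    F_O(q-2) = C_{q-1} / C_{q-2} and F_O(q-1) = C_q / C_{q-1}; the
    last-stage term SD_N(q) * F_O(q) is constant here and omitted.
    """
    return sd_n_qm2 * c_qm1 / c_qm2 + sd_p_qm1 * c_q / c_qm1


def c_qm1_opt(c_qm2, c_q, sd_n_qm2, sd_p_qm1):
    """Closed-form optimum of eq. (14)."""
    return math.sqrt(sd_p_qm1 / sd_n_qm2 * c_q * c_qm2)


params = dict(c_qm2=1.0, c_q=16.0, sd_n_qm2=1.5, sd_p_qm1=1.5)
c_opt = c_qm1_opt(**params)
theta_opt = theta_over_tau(c_opt, **params)

# Derating sweep: relative delay penalty when C_{q-1} = eta * C_{q-1}^opt.
for eta in (0.5, 0.8, 1.0, 1.25, 2.0):
    penalty = theta_over_tau(eta * c_opt, **params) / theta_opt - 1
    print(f"eta = {eta:4.2f}  delay increase = {100 * penalty:5.1f} %")
```

Within this model the penalty equals $(\eta + 1/\eta)/2 - 1$ whatever the stage parameters, so it stays near 2.5% for $\eta = 0.8$ or $1.25$ and only reaches 25% at $\eta = 0.5$ or $2$; this flat optimum is consistent with the observation that structures sized with (14) sit close to the optimal one.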

#### Case 2: the last stage is the state-holding element (Muller gates only)

Let us again express the first order propagation delay $\Theta$ of the three last stages of the generic structure of Fig. 3. We get:

$$\frac{\Theta}{\tau} = SD_N(q-2) \cdot F_O(q-2) + \beta_{LHS}(q-1) \cdot SD_P(q-1) \cdot F_O(q-1) + SD_N(q) \cdot F_O(q)$$
(15)

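Expressing $\beta_{LHS}(q-1)$ from (11b) as a function of the input capacitance $C_{q-1}$ of stage (q-1), with $\varphi$ defined in (18), it can be rewritten as: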

$$\frac{\Theta}{\tau} = SD_{N(q-2)} \cdot F_{O(q-2)} + \left(1 + \frac{\varphi}{C_{q-1}}\right) \cdot SD_{P(q-1)} \cdot F_{O(q-1)} + SD_{N(q)} \cdot F_{O(q)}$$
(16)

Minimizing (16) with respect to $C_{q-1}$ leads to the following cubic equation, solved with the Ferro-Cardano formula:

$$C_{q-1}^{3} - A^{-1} \cdot C_{q-1} - 2\varphi \cdot A^{-1} = 0 \tag{17}$$

with:

$$A = \frac{SD_{N}(q-2)}{SD_{P}(q-1)} \cdot \frac{1}{C_{q-2} \cdot C_{q}}, \qquad \varphi = \frac{V_{DD}}{4} \cdot \frac{\Delta W_{P}(q-1)}{m_{n}} \cdot \frac{\alpha_{N}'}{\alpha_{P}} \cdot \frac{1+k}{k} \cdot C_{OX} \cdot L_{GEO} \tag{18}$$

Two sub-cases have to be considered.

#### Case 2.a: $\beta_{LHS}$ and $\beta_{HLS}$ are close to 1

This corresponds to high values of $C_{q-1}$, or equivalently to strong drives. In that case $\varphi / C_{q-1} \ll 1$, (15) reduces to (12), and the solution of (17) tends to (14).

#### Case 2.b: $\beta_{LHS}$ and $\beta_{HLS}$ are greater than 1

It corresponds to the case of weak drives. The resolution leads then to the solution:

$$C_{q-1}^{opt} = \sqrt[3]{\frac{\varphi}{A} \cdot \left(1 + \sqrt{1 - \frac{1}{27 \cdot A \cdot \varphi^2}}\right)} + \sqrt[3]{\frac{\varphi}{A} \cdot \left(1 - \sqrt{1 - \frac{1}{27 \cdot A \cdot \varphi^2}}\right)} \tag{19}$$
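Formula (19) can be evaluated directly. The sketch below treats $A$ and $\varphi$ as given positive numbers (normalized placeholders). It uses (19) when its inner square root is real ($27 \cdot A \cdot \varphi^2 \ge 1$) and, as an addition of ours not taken from the paper, the standard trigonometric form of the cubic otherwise. By Descartes' rule of signs, (17) has exactly one positive root, so the returned value is the optimum.

```python
import math


def c_qm1_opt_latch(A, phi):
    """Positive root of eq. (17): x**3 - x / A - 2 * phi / A = 0.

    A > 0 and phi >= 0 are the quantities defined in eq. (18).
    """
    s = 1.0 - 1.0 / (27.0 * A * phi ** 2) if phi > 0 else -1.0
    if s >= 0:
        # One real root: Ferro-Cardano formula, eq. (19).
        # Both cube-root arguments are >= 0 because sqrt(s) <= 1.
        u = phi / A
        r = math.sqrt(s)
        return (u * (1 + r)) ** (1 / 3) + (u * (1 - r)) ** (1 / 3)
    # Three real roots (small phi, strong drives): trigonometric form.
    # The largest root is the only positive one.
    arg = phi * math.sqrt(27.0 * A)  # lies in [0, 1) on this branch
    return 2.0 / math.sqrt(3.0 * A) * math.cos(math.acos(arg) / 3.0)
```

As $\varphi \to 0$ the function returns $1/\sqrt{A}$, which is exactly criterion (14), so the strong-drive sub-case 2.a is recovered continuously from the weak-drive solution of sub-case 2.b.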

## 4 Results


We designed thirty frequently used functionalities in the 130nm process from STMicroelectronics, using an industrial automatic layout generator. Table 3 reports typical values of the area reduction factor with respect to AO222-based implementations. The average area reduction factor obtained over all gates is 1.9.

Since the speed and power performance of every gate cannot be detailed here, we report the results obtained, with respect to their AO222-based implementations, for three representative gates: Muller2\_X2, Muller3\_X2 and COR222\_X2.

These gates are representative of many others, since the electrical paths involved in their switching process are the same, or practically the same, as in various other implemented gates such as Muller4, COR211, COR221, COR22, COR21, COR222\_Ackin\_Set, ...

The simulation protocol used to compare the proposed implementations with the AO222-based implementations is described in Fig. 3. Varying the Foi and Foo values in this protocol makes it possible to analyze the effect of the input ramp and of the load on performance.

Fig. 5 reports the speed improvement and the power saving obtained with respect to AO222-based implementations. In the typical design range (Foi and Foo ranging from 1 to 10), for almost identical speed performance (speed improvement between -15% and 15%), the cells designed using our strategy are significantly smaller and consume less power, except for the Muller2 gate.

This exception has one main cause: the output driver of the Muller gates is a latch, and such a structure dissipates a large amount of power when driving a large capacitance. However, for Muller3 and Muller4, this extra power consumption is easily offset by the fact that only one latch is needed to implement them, whereas the corresponding AO222-based implementations require two and three feedback loops, respectively.

## 5 Conclusion

Taking advantage of the composite structure of QDI gates, we designed a complete library. Using a generalized step delay model, we obtained gates that are about two times smaller (average area reduction factor of 1.9) while maintaining their power and speed performance compared to AO222-based implementations. These results clearly demonstrate that a fair comparison between asynchronous and synchronous ASICs requires dedicated libraries. Indeed, the area reduction of the cells strongly impacts routing, and thus the global performance of a given circuit, both in speed and in power. Future work will focus on quantifying the gains brought by TAL in speed and power through the design of significant asynchronous prototype chips.



**Fig. 1.** (a,d) Symbols of the Muller3 and COR222 gates, (b,e) Muller3 and COR222 decompositions in the AO222-based "design style", (c,f) Muller3 and COR222 schematics requiring a minimal number of transistors.







**Fig. 3.** Generic CMOS structure used to size our gates and considered in the comparison protocol.



**Fig. 4.** Quality evaluation of the sizing criterion (14) in the case of a Muller implementation ($F_O = C_L/C_{i+1}$).



**Fig. 5.** Speed improvement and power saving obtained with respect to AO222-based implementations for various loading and controlling design conditions.

| Notation | Definition, meaning | Unit |
|---|---|---|
| $\mu_{0N,P}$ | Electron/hole mobility | |
| $C_{OX}$ | Gate oxide capacitance | |
| $V_{DD}$ | Supply voltage | |
| $K_{N,P}$ | Conduction factor of N and P transistors in the strong inversion region | |
| $L_{GEO}$ | Geometrical channel length | |
| $V_{TN,P}$ | Threshold voltages of N/P transistors | |
| $\alpha_{N/P}$ | $K_{N,P}(V_{DD} - V_{TN,P})$ | mA/µm |
| $\alpha'_{N/P}$ | $\mu_{0N,P} \frac{C_{OX}}{L_{GEO}} (V_{DD} - V_{TN,P})$ | |
| $\tau$ | Process parameter time metric | |
| $R$ | Switching current asymmetry factor | none |
| $V_{INV}$ | Inversion voltage of a CMOS structure | V |
| $W_{N,P}$ | N/P transistor width | µm |
| $C_{N/P}$ | N/P transistor gate capacitance | fF |
| $C_L$ | Output capacitance | fF |
| $C_i$ | Input capacitance of stage i | |
| $F_O$ | Output load to input capacitance ratio of a given stage | none |
| $\Delta W_{N,P}$ | Switching current reduction factor: ratio of the current delivered by a single N/P transistor to that delivered by the N/P transistor array built from identically sized transistors | |
| $SD_{N,P}$ | Global switching current reduction factor | |
| $k$ | Internal configuration ratio $W_P/W_N$ | none |

**Table 1.** The meaning of the different notations used throughout the paper

**Table 2.** Sizing ratios $K_{P^L \to N^H}$ and $K_{N^L \to P^H}$ for the considered 130nm process

| $K_{P^L \to N^H}$ | $m_n = 1$ | $m_n = 2$ | $m_n = 3$ | $m_n = 4$ |
|---|---|---|---|---|
| $n = 2$ | 12.2 | 6.1 | 4.1 | 3.1 |
| $n = 3$ | 18.0 | 9.0 | 6.0 | 4.5 |
| $n = 4$ | 23.8 | 11.9 | 7.9 | 6.0 |

| $K_{N^L \to P^H}$ | $m_p = 1$ | $m_p = 2$ | $m_p = 3$ | $m_p = 4$ |
|---|---|---|---|---|
| $n_n = 2$ | 1.3 | 0.65 | 0.43 | 0.32 |
| $n_n = 3$ | 1.7 | 0.85 | 0.56 | 0.42 |
| $n_n = 4$ | 2.1 | 1.05 | 0.7 | 0.53 |

**Table 3.** Area reduction factors obtained with respect to AO222 based implementation (Ack means that there is an input acknowledgement signal, and R an input reset signal)

| Gate | X1 | X2 | X4 | Gate | X1 | X2 | X4 |
|---|---|---|---|---|---|---|---|
| M2 | 1.0 | 1.4 | 1.5 | M2_Ack_R | 2.5 | 2.3 | 2.3 |
| M3 | 2.6 | 2.7 | 2.4 | M3_Ack_R | 2.3 | 2.3 | 2.2 |
| M4 | 3.4 | 3.5 | 2.3 | M4_Ack_R | 2.4 | 2.4 | 2.4 |
| COR222 | 1.8 | 1.8 | 2.0 | COR222_Ack_R | 2.5 | 2.5 | 2.1 |
| COR221 | 1.3 | 1.3 | 1.5 | COR221_Ack_R | 2.2 | 2.2 | 2.1 |
| COR211 | 0.9 | 0.9 | 1.0 | COR211_Ack_R | 1.7 | 1.5 | 1.9 |

## References

- [1] T. Chelcea, A. Bardsley, D. A. Edwards, S. M. Nowick "A Burst-Mode Oriented Back-End for the Balsa Synthesis System" Proceedings of DATE '02, pp. 330–337, Paris, March 2002
- [2] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno and A. Yakovlev "Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers" IEICE Transactions on Information and Systems, Vol. E80-D, No. 3, March 1997, pages 315–325.
- [3] M. Renaudin et al, "TAST", Tutorial given at the 8<sup>th</sup> international Symposium on Advanced Research in Asynchronous Circuits and Systems, Manchester, UK, Apr. 8–11, 2002
- [4] T. Sakurai and A. R. Newton "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas" IEEE J. Solid-State Circuits, vol. 25, pp. 584–594, April 1990.
- [5] P. Maurine, M. Rezzoug, N. Azemard, D. Auvergne, "Transition Time modelling in Deep Submicron CMOS" IEEE trans. on CAD of IC and Systems, vol. 21, n°11, November 2002
- [6] A. Chatzigeorgiou, S. Nikolaidis, I. Tsoukalas "A modelling technique for CMOS gates" IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 18, n°5, May 1999
- [7] B. S. Cherkauer, E. G. Friedman "A Unified Design Methodology for CMOS Tapered Buffers" IEEE Transactions on VLSI Systems, vol. 3, n°1, pp. 99–111, March 1995
- [8] N. Hedenstierna, K. O. Jeppson "CMOS Circuit Speed and Buffer Optimization" IEEE transactions on CAD, vol. CAD-6, n°2, pp. 270-281, March 1987
- [9] C. Piguet, J. Zhand "Electrical Design of Dynamic and Static Speed Independent CMOS Circuits from Signal Transition Graphs" PATMOS '98, pp. 357–366, 1998.
- [10] I. Sutherland, B. Sproull, D. Harris, "Logical Effort: Designing Fast CMOS Circuits", Morgan Kaufmann Publishers, INC., San Francisco, California, 1999.
- [11] C. Mead N. Conway "Introduction to VLSI systems" Reading MA: Addison Wesley 1980
- [12] M. Renaudin, "Asynchronous Circuits and Systems : a promising design alternative", in "Microelectronics for Telecommunications : managing high complexity and mobility" (MIGAS 2000), special issue of the Microelectronics-Engineering Journal, Elsevier Science, Guest Editors : P. Senn, M. Renaudin, J. Boussey, Vol. 54, N° 1–2, December 2000, pp. 133–149.