

# Alleviating Through-Silicon-Via Electromigration for 3-D Integrated Circuits Taking Advantage of Self-Healing Effect

Yuanqing Cheng, Aida Todri-Sanial, Jianlei Yang, Weisheng Zhao

# To cite this version:

Yuanqing Cheng, Aida Todri-Sanial, Jianlei Yang, Weisheng Zhao. Alleviating Through-Silicon-Via Electromigration for 3-D Integrated Circuits Taking Advantage of Self-Healing Effect. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016, 24 (11), pp.3310-3322. 10.1109/TVLSI.2016.2543260. lirmm-01446125

# HAL Id: lirmm-01446125 https://hal-lirmm.ccsd.cnrs.fr/lirmm-01446125

Submitted on 25 Jan 2017


# A Study of 3-D Power Delivery Networks With Multiple Clock Domains

Aida Todri-Sanial, Member, IEEE, and Yuanqing Cheng, Member, IEEE

Abstract—Ongoing advancements in 3-D manufacturing are enabling 3-D ICs to contain several processing cores, hardware accelerators, and dedicated peripherals. Most of these functional units operate with independent clock frequencies, either for power management reasons or simply because they are hard intellectual property (IP) blocks. Thus, implementing diverse and heterogeneous circuits on a 3-D IC also leads to the use of multiple clock domains. While these domains allow many functional units to run in parallel to exploit the potential of 3-D integration, they also introduce power delivery challenges. This paper proposes an efficient analysis for assessing the worst case power supply noise on 3-D power delivery networks (PDNs) with multiple clock domains. This paper discusses the power and thermal integrity issues that arise when multiple clock domains share the same 3-D global PDN. We first examine the power supply noise distribution on each tier and investigate scenarios that lead to worst case noise. Thermal analyses are also performed, and the heat distribution among clock domains and tiers is examined. In addition, the impact of clock domain structure and frequency on the overall power supply noise and temperature distribution is quantified. Experiments show that multiple clock domains can induce excessive noise and that through-silicon-vias (TSVs) contribute to power supply noise and heat transfer among tiers. This paper presents a summary of guidelines for modeling, analyzing, and exploring the design of reliable 3-D PDNs with multiple clock domains.

*Index Terms*—3-D ICs, clock networks, power and thermal integrity, power delivery networks (PDNs), through-silicon-vias (TSVs).

#### I. INTRODUCTION

Long global interconnects are the principal performance bottleneck for high-performance systems-on-chip (SoCs) [1]. Such long interconnects are becoming the main obstacle in terms of communication bandwidth, latency, and power consumption [2]. Network-on-chip (NoC) architectures have emerged as an efficient methodology to offset these drawbacks and outperform mainstream bus architectures [2], [3]. Today, SoC designs are becoming more complex, and a variety of intellectual-property (IP) blocks can be integrated in them. These designs often employ several processors, memories, and special function units. Each of these blocks can operate at a different frequency and often requires its own clock domain.

Manuscript received September 4, 2015; revised January 9, 2016 and February 26, 2016; accepted March 27, 2016.

A. Todri-Sanial is with the Department of Microelectronics, Laboratoire Informatique Robotique Microelectronic de Montpellier and French National Center of Scientific Research (CNRS), Montpellier 34095, France (e-mail: aida.todri@lirmm.fr).

Y. Cheng is with the School of Electrical and Information Engineering, Beihang University, Beijing 100191, China (e-mail: yuanqing@ieee.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2016.2549275

To further exploit the use of various functional blocks in the same package, 3-D integration offers a path toward higher circuit density and bandwidth in a small form factor. 3-D integration leverages the vertical direction to minimize communication distances, increase the number of cores/IPs, and provide more connectivity among the blocks. In addition, 3-D technology enables heterogeneous integration and new classes of applications and systems through significantly improved performance and energy efficiency of complex architectures [3], [4]. These heterogeneous systems often require multiple clock domains.

Current manufacturing advancements have led to the development of fine-pitch through-silicon-vias (TSVs) and thinned dies that accommodate the stacking of several circuit layers. Many research centers are investigating 3-D integration and designing complex systems, such as several memory layers stacked on top of logic, sensors and actuators stacked on top of logic, and multicore systems connected through TSV arrays [5]. The synergy between 3-D integration and NoCs makes it possible to shorten communication distances, thus improving performance and ultimately reducing overall power consumption.

The continuing drive for higher performance systems, further amplified by the growing need for smart and connected electronic devices and systems, has led to an unprecedented increase in clock frequency. One of the challenging tasks for an efficient 3-D design is distributing the clock across a large system with multiple tiers while meeting the clock skew and slew budgets. A widely applied and effective method for 3-D and even 2-D systems is to abandon single-clock synchronous design in favor of globally asynchronous clocking. This paradigm introduces multiple clock domains and allows cores and/or IPs to operate at different frequencies within a single system. Multiple clock domains are effective not only for implementing different functional blocks but also for power management and energy-efficient system design [6].

However, the use of multiple clock domains poses challenges for the reliable delivery of the supply voltage to each tier of a 3-D stack. Multiple clock domains introduce several challenges to 3-D power delivery networks (PDNs). Assume that there are two clock domains on tier 1 (T1), as shown in Fig. 1. Block 1 with freq<sub>1</sub> contains fast-switching, power-hungry circuits, while block 2 with freq<sub>2</sub> contains slow-switching, low-power circuits. Both blocks operate with the same supply voltage and share the global power grid. The fast and frequent switching of block 1 introduces voltage fluctuations (power supply noise) on the power grid, which consequently impact the performance of block 2. Furthermore, TSVs connect the power networks of different tiers, but they also undesirably transfer power supply noise between them. For example, block 3 in tier 2 (T2) will also experience excessive voltage fluctuations due to noise transferred from T1 to T2. The worst case noise may occur when a block experiences both self-induced noise and noise transferred from neighboring clock domains or tiers, which can lead to critical performance issues. We elaborate on the derivation and quantification of these two types of noise later in this paper.

Fig. 1. Illustration of a two-tier 3-D PDN with several blocks representing multiple clock domains.

1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Heat dissipation and transfer among tiers exhibit a similar trend but in the reverse direction. Circuit blocks near the heat sink benefit from immediate cooling through what is referred to as the primary heat flow path, while circuits farther away experience higher temperatures. The tiers next to the package have some of their heat alleviated by the secondary heat flow path from die to package, whereas the middle tiers tend to experience higher temperature gradients. TSVs provide the electrical connections between tiers but also transfer heat from hotter to cooler tiers, hence contributing to nonuniform thermal gradients within and between tiers [7], [8]. The heat transfer between tiers creates thermal wave zones around the TSVs, and hot spots can form where these waves overlap; such hot spots can degrade device performance and interconnect parasitics and eventually lead to Joule heating [8]. Hence, multiple clock domains facilitate power management and improve communication latencies, but they also introduce critical reliability concerns on 3-D PDNs due to nonuniform power supply and temperature distributions.

The objective of this paper is to investigate the worst case power supply noise and temperature distribution on 3-D PDNs in heterogeneous 3-D systems with multiple clock domains. We quantify the impact of clock domain characteristics, such as topology, operating frequency, and granularity, on the reliability and robustness of 3-D PDNs.

In addition, we outline a comprehensive review of various design guidelines for exploring the design knobs (i.e., power track sizing, TSV sizing and placement, decoupling capacitance insertion, workload assignment, and stacking order) to obtain reliable 3-D PDNs with multiple clock domains.

#### A. Contributions of This Paper

Despite these efforts, research on the resiliency of 3-D PDNs is still in its early stages. While the aforementioned works have started to investigate the reliability issues of 3-D PDNs, the impact of multiple clock domains on the power and thermal integrity of 3-D PDNs remains to be investigated. The contributions of this paper are summarized as follows.

- 1) We provide a summary of guidelines for electrothermal modeling and simulation of 3-D PDNs with multiple clock domains.
- 2) We perform in-depth electrical and thermal analyses to understand the voltage droop and temperature distribution in 3-D ICs with multiple clock domains.
- 3) We investigate different clock domain frequencies and quantify their impact on power supply noise and temperature distribution on 3-D PDNs.
- 4) We identify potential power and thermal integrity issues by inspecting workload assignment and stacking order.
- 5) We evaluate possible design knobs, such as decoupling capacitance insertion, TSV sizing and placement, and core placement, for achieving reliable 3-D PDNs.
- 6) We perform several case studies and draw practical design guidelines that can be especially important during the design phase of 3-D PDNs to mitigate the excessive heat and voltage droop introduced by multiple clock domains.

The rest of this paper is organized as follows. Section II covers the related and recent works. Section III explains the preliminaries and the models utilized in this paper. Section IV provides the analyses of 3-D PDNs with multiple clock domains. In Section V, we provide our experimental results and case studies. Section VI presents a summary of practical design guidelines for reliable 3-D PDNs with multiple clock domains. Section VII concludes this paper.

#### II. RELATED WORKS

Investigations of 3-D stacked systems have attracted a lot of attention in recent years from both academia and industry. There is a rich body of research relevant to this paper, which can be grouped into two main categories: papers focused on the design of 3-D clock networks and papers focused on the design of 3-D PDNs.

On the design of 3-D clock networks, several topological approaches have been investigated, such as clock trees, meshes, and hybrids. Zhao *et al.* [9], [10] proposed a 3-D clock tree synthesis algorithm for low-power and reliable clock networks while taking into account the TSV count and clock buffer insertion. Recently, Bang *et al.* [11] proposed a novel clock tree synthesis and synchronization scheme based on the concept of clustered clocking for 3-D ICs. A TSV fault-tolerant 3-D clock tree synthesis method that minimizes the number of clock TSVs was presented in [12]. Kim and Kim [13] proposed a 3-D clock tree embedding algorithm for 3-D ICs with a zero clock skew routing constraint. In [14], thermal effects were included when modeling the 3-D clock tree, and adaptive clock buffers were proposed for designing thermally robust clock trees. Savidis *et al.* [15] investigated various clock distribution topologies and developed models to accurately determine the clock delay. Lung *et al.* [16] investigated TSV fault-tolerant clock network design. A comparison of different 3-D clock topologies, such as H-trees and H-trees combined with local and global meshes, is provided in [17]. In this paper, we build on these developments to explore the impact of different 3-D clock topologies on the reliability of 3-D PDNs.

TODRI-SANIAL AND CHENG: STUDY OF 3-D PDNs WITH MULTIPLE CLOCK DOMAINS

Fig. 2. Illustration of PDN and ground delivery network for a three-tier system.

On the design of 3-D PDNs, several groups of papers can be identified. In the first group, the impact of 3-D integration technology on PDNs is investigated. In [18], 3-D PDNs are modeled and analyzed by inspecting the TSV array model. Moreover, He *et al.* [19] investigated the impact of TSV dimensions and placement on 3-D PDNs. Further analyses of different TSV densities and aspect ratios and their impact on 3-D PDNs were performed in [20].

In the second group, various topological approaches were investigated for reliable 3-D PDNs. Healy and Lim [21] proposed a novel topology for 3-D PDNs while taking into account TSV placement and package parasitics. Khan *et al.* [22] explored different 3-D PDN topologies. Pavlidis and De Micheli [23] investigated different power distribution and return paths that impact the overall voltage droop on 3-D PDNs. Floorplanning and optimization of PDN and ground delivery network are explored in [24].

In the third group, optimization methods and design knobs are explored for alleviating excessive voltage droop. Todri *et al.* [25] and Todri-Sanial *et al.* [26] provide the electrothermal modeling and analysis of 3-D PDNs and explore the optimization of 3-D PDNs under electrothermal constraints. In [27]–[29], power supply noise issues are investigated and decoupling capacitance insertion is explored. Decoupling capacitance insertion for 3-D power grid optimization is also explored in [30]. Planning and placement of power and ground TSVs were investigated in [31].

In retrospect, the issue of power supply noise due to multiple clock domains is also critical for 2-D systems. Several works have looked into this problem, investigated the timing issues it introduces, and presented power grid optimization methods, such as [32] and [33]. Power supply noise issues are also present on 3-D PDNs with multiple clock domains; however, new challenges arise due to vertical integration, stacking order, TSV sizing and placement, and the overall number of TSVs. TSVs allow power delivery among many vertical tiers, but they also contribute to power supply noise transfer among cores and tiers. As our experiments demonstrate, mitigating voltage droop on 3-D PDNs is not as straightforward as in 2-D systems; it requires several design knobs to be applied simultaneously because of the inherent lateral and vertical noise distribution in 3-D systems.

#### III. PRELIMINARIES

For 2-D ICs, PDNs are usually structured as regular meshes with metal tracks running perpendicular to each other. Other works have explored nonregular power meshes for reducing voltage droop [34]. Regardless of their structure, PDNs are designed to deliver the supply voltage reliably so that the underlying circuits can function properly. Any deviation from the supply voltage can impede the optimal performance of devices and consequently lead to increased circuit delay and reduced throughput.

In this paper, we investigate power supply noise and temperature distribution on a three-tier PDN with various clock domains. An illustration of the 3-D system is shown in Fig. 2. We include the parasitics of the package C4 bumps, the global PDN, TSVs, switching circuits, decoupling capacitance, and the heat sink. Our analyses are based on the *RLC* model representation of the power grid, package, and switching circuits. We utilize existing physical models for studying power supply noise on 3-D power grids and further enhance them to include thermal effects [32], [35]–[37].

The granularity of the switching circuits can be represented at the die, core, block, gate, or transistor level. In this paper, as we are interested in exploring multiple clock domains, we utilize core-level modeling, where each core operates within a certain frequency range that represents its clock domain. There is a lot of work dedicated to the optimal design of clock networks [9], [10], [13]–[17], and this is also a challenging topic for 3-D ICs. In this paper, we do not focus on the structure or the design of clock domains, but make the assumption that each core already has an optimal clock network, which allows it to operate at a certain frequency.

#### A. Package Model

On-chip supply voltage is delivered through the package balls to the global PDN of the tier next to the package. In this paper, controlled collapse chip connection (C4) bumps, an industry standard for flip-chip technology [38], are used for the package. The bumps have a width of 100 $\mu$m and a pitch of 200 $\mu$m, and their *RL* parasitics are extracted as 10 m$\Omega$ and 60 pH [38]. We assume an array of C4 bumps in which half are used for power (*V*<sub>DD</sub>) and the other half for ground (GND).

#### B. TSV Model

In this paper, we study high-density TSVs with 15 $\mu$m length, 3 $\mu$m diameter, and 60 $\mu$m pitch [39]. Note that these TSV dimensions serve only as a case study; our analysis is general enough to handle TSVs with different geometries. TSV *RLC* parasitics have been extracted as $R_{tsv} = 100\ \text{m}\Omega$, $L_{tsv} = 10\ \text{pH}$, and $C_{tsv} = 40\ \text{fF}$ [39]. We assume that TSVs are inserted in an array and uniformly distributed. TSVs can serve for both signaling and power/ground connections. As this paper focuses on the analysis of PDNs, we only consider power and ground TSVs, where half of the TSVs are used for power and the other half for ground.
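The TSV and C4 parasitics above are per-element values. A rough way to see what a uniform array contributes to the vertical supply path is to treat the power vias as identical elements in parallel. The sketch below is a first-order estimate under stated assumptions, not the extraction used in this paper: it assumes all VDD TSVs share current equally, ignores mutual inductance between neighboring vias, and places TSVs on a regular grid over the 1 mm × 1 mm mesh listed in Table I. The helper names `vias_in_square_array` and `array_parasitics` are hypothetical.

```python
def vias_in_square_array(side_um: float, pitch_um: float) -> int:
    """Number of vias on a regular square grid of the given pitch (hypothetical helper)."""
    per_side = int(side_um // pitch_um)
    return per_side * per_side


def array_parasitics(r_unit: float, l_unit: float, c_unit: float, n_parallel: int):
    """Equivalent R, L, C of n identical vias in parallel (first-order estimate).

    Mutual inductance between neighbouring vias is ignored, so R and L divide by n
    while the via capacitances add up.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    return r_unit / n_parallel, l_unit / n_parallel, c_unit * n_parallel


# Per-TSV values from Section III-B; 1 mm x 1 mm region and 60 um pitch (assumed layout).
n_total = vias_in_square_array(1000.0, 60.0)  # 16 per side -> 256 TSVs
n_vdd = n_total // 2                          # half for VDD, half for GND
r_eq, l_eq, c_eq = array_parasitics(100e-3, 10e-12, 40e-15, n_vdd)
print(f"{n_vdd} VDD TSVs: R = {r_eq * 1e3:.3f} mOhm, L = {l_eq * 1e12:.4f} pH")
```

The same helper applies to the C4 array (10 mΩ and 60 pH per bump). In practice, mutual coupling makes the equivalent inductance of a dense array larger than $L/n$, so this estimate is optimistic.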

#### C. PDN Model

RLC models for 3-D PDNs have been presented in [27] as regular mesh networks on each tier that are connected together using TSVs. In this paper, we assume that the power grids are uniform, meaning that each power track has a uniform width along its length, although widths can differ from track to track, as shown in Figs. 1 and 2. In addition, we consider that the PDN topology can vary from tier to tier due to heterogeneous die stacking. The RLC parasitics $R_{\text{grid}}$, $L_{\text{grid}}$, and $C_{\text{grid}}$ of each power track can be extracted based on its dimensions. Furthermore, we investigate power grids that are shared among all cores on the same die, even though each core has its own clock domain. From the power supply noise analysis perspective, shared PDNs are of interest because they expose the noise transfer between cores. In this paper, we consider power tracks of 1000 $\mu$m length and 10–30 $\mu$m width.
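Because each track has a uniform width, its extracted parasitics can be distributed evenly over the mesh segments between adjacent grid nodes. The sketch below assumes that R and L scale linearly with segment length and that, for tracks of equal length on the same metal layer, resistance scales inversely with width. The text does not state which width the Table I value $R_{grid} = 45$ mΩ corresponds to, so the 10 $\mu$m reference width used below is a hypothetical choice; the function names are also illustrative.

```python
def split_track(r_track: float, l_track: float, n_segments: int):
    """Per-segment R and L of a uniform track cut into n equal mesh segments."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    return r_track / n_segments, l_track / n_segments


def scale_resistance_by_width(r_ref: float, w_ref_um: float, w_um: float) -> float:
    """Resistance of a track of width w, given a reference track of the same length,
    metal layer, and thickness (R is proportional to 1 / width)."""
    return r_ref * w_ref_um / w_um


# Table I: 1000 um track, R_grid = 45 mOhm, L_grid = 3.5 pH.
r_seg, l_seg = split_track(45e-3, 3.5e-12, n_segments=20)             # 50 um segments
r_wide = scale_resistance_by_width(45e-3, w_ref_um=10.0, w_um=30.0)  # assumed 10 um reference
print(f"segment: {r_seg * 1e3:.2f} mOhm, {l_seg * 1e12:.3f} pH; 30 um track: {r_wide * 1e3:.1f} mOhm")
```

Inductance is deliberately not rescaled with width here, since its width dependence is not a simple ratio.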

#### D. Switching Circuit Model

Each working core or functional block can represent circuits of different functionalities, power demands, and operating frequencies. From the power grid analysis perspective, these switching circuits draw current from the power grid and are commonly modeled as switching current sources whose parameters represent the characteristics of the underlying circuits [18], [21]–[25], [27]. Switching circuits are commonly modeled as triangular current sources characterized by peak current, leakage current, peak time, rise time, and fall time, denoted $I_{\text{peak}}$, $I_{\text{leak}}$, $t_{\text{peak}}$, $t_{\text{rise}}$, and $t_{\text{fall}}$. To represent multiple clock domains, the operating frequencies of the functional blocks are varied. The switching clock period of block *i* can be represented as $t_{\text{period}_i} = 2(t_{\text{rise}_i} + t_{\text{fall}_i})$ and its clock frequency as $\text{freq}_i = 1/t_{\text{period}_i}$. The switching current source and its frequency are based on previous work on PDN analysis [40].
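The triangular source and the period definition above can be sketched as a function of time. The waveform assumed here (a linear ramp from $I_{\text{leak}}$ to $I_{\text{peak}}$ over $t_{\text{rise}}$, a linear return over $t_{\text{fall}}$, then leakage only for the rest of the period, so the peak occurs at $t_{\text{rise}}$) is one reading of the model that is consistent with $t_{\text{period}} = 2(t_{\text{rise}} + t_{\text{fall}})$; it is not necessarily the exact waveform of [40]. The edge times and current values below are hypothetical.

```python
def clock_frequency(t_rise: float, t_fall: float) -> float:
    """freq = 1 / t_period with t_period = 2 (t_rise + t_fall), as defined in the text."""
    return 1.0 / (2.0 * (t_rise + t_fall))


def triangular_current(t: float, i_peak: float, i_leak: float,
                       t_rise: float, t_fall: float) -> float:
    """Current drawn at time t by a triangular switching source (sketch).

    Assumed shape per period: ramp from i_leak to i_peak over t_rise, back to
    i_leak over t_fall, then leakage only for the remaining half period.
    """
    period = 2.0 * (t_rise + t_fall)
    tau = t % period                      # time since the start of the current period
    if tau < t_rise:                      # rising edge
        return i_leak + (i_peak - i_leak) * tau / t_rise
    if tau < t_rise + t_fall:             # falling edge
        return i_peak - (i_peak - i_leak) * (tau - t_rise) / t_fall
    return i_leak                         # idle part of the period


# Hypothetical edges giving 500 MHz (the core A frequency used in Section IV-C).
t_r, t_f = 0.4e-9, 0.6e-9
freq = clock_frequency(t_r, t_f)
samples = [triangular_current(k * 0.1e-9, 1.0, 0.1, t_r, t_f) for k in range(20)]
```

Blocks in different clock domains would use different $(t_{\text{rise}}, t_{\text{fall}})$ pairs, which is how the model varies the switching frequency.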

#### E. Decoupling Capacitance

Nonswitching circuits also provide some intrinsic decoupling capacitance that helps sustain the supply voltage against fluctuations. In this paper, we model the intrinsic and the intentionally inserted decoupling capacitance with a density of 5 fF/$\mu$m<sup>2</sup>. On-chip decoupling capacitors can be implemented in ICs in a number of ways. Typically, MIM capacitors are utilized, and capacitance densities between 1 and 10 fF/$\mu$m<sup>2</sup> have been reported [41]. Note that this model is flexible enough to represent various amounts of inserted decoupling capacitance within each tier.
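A quick way to relate the 5 fF/$\mu$m<sup>2</sup> density to voltage droop is a charge-balance estimate: the total decap is density times area, and if that decap alone supplied one current pulse, the droop would be roughly $\Delta V = Q/C$. The sketch below is pessimistic because it ignores the charge delivered through the grid, TSVs, and package during the pulse. The core area and the pulse currents are hypothetical, and the pulse charge assumes the triangular source shape described in Section III-D.

```python
def decap_total(density_ff_per_um2: float, area_um2: float) -> float:
    """Total decoupling capacitance (F) from a density in fF/um^2 and an area in um^2."""
    return density_ff_per_um2 * 1e-15 * area_um2


def triangle_charge(i_peak: float, i_leak: float, t_rise: float, t_fall: float) -> float:
    """Charge (C) above leakage in one triangular current pulse: area of the triangle."""
    return 0.5 * (i_peak - i_leak) * (t_rise + t_fall)


def first_order_droop(charge: float, cap: float) -> float:
    """dV = Q / C if the local decap alone supplied the pulse charge (pessimistic)."""
    return charge / cap


# 5 fF/um^2 over a hypothetical 500 um x 500 um core.
c_core = decap_total(5.0, 500.0 * 500.0)        # 1.25 nF
q = triangle_charge(0.2, 0.02, 0.4e-9, 0.6e-9)  # hypothetical pulse: 90 pC
dv = first_order_droop(q, c_core)
print(f"C = {c_core * 1e9:.2f} nF, Q = {q * 1e12:.0f} pC, dV = {dv * 1e3:.0f} mV")
```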

#### F. Thermal Model

Heat is generated by switching circuits and dissipated primarily through conduction and convection and, to a lesser degree, by radiation. Frequently switching circuits consume dynamic power, which generates heat and worsens the voltage droop. Simultaneously switching circuits on a tier and on neighboring tiers lead to nonuniform power and heat distributions. The parasitics of the power and ground networks may vary with the nonuniform thermal distribution, as resistivity is a function of temperature. Thus, voltage droop worsens at nodes with high temperature. In addition, large currents flowing through the power and ground networks elevate the temperature even further, which over long periods of time can cause Joule heating and/or electromigration [42], [43]. Hence, voltage droop and thermal distributions are correlated, and we consider them simultaneously.
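The temperature dependence of resistance that couples the thermal and electrical analyses can be illustrated with the common linear model $R(T) = R_{\text{ref}}[1 + \alpha (T - T_{\text{ref}})]$. In the sketch below, the copper coefficient $\alpha \approx 3.9 \times 10^{-3}$ K<sup>-1</sup>, the assumption that the Table I track resistance is specified at 27 °C, and the 1 A track current are all our own illustrative choices; the point is only that a hotter track yields a proportionally larger IR drop for the same current.

```python
ALPHA_CU = 3.9e-3  # approx. temperature coefficient of resistance of copper, 1/K (assumed)


def resistance_at(r_ref: float, temp_c: float, temp_ref_c: float = 27.0,
                  alpha: float = ALPHA_CU) -> float:
    """Linear model R(T) = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref * (1.0 + alpha * (temp_c - temp_ref_c))


# Table I track resistance, assumed to be specified at the 27 C heat-sink temperature.
r_cold = resistance_at(45e-3, 27.0)
r_hot = resistance_at(45e-3, 85.0)
i_track = 1.0  # hypothetical track current, A
print(f"IR drop: {i_track * r_cold * 1e3:.1f} mV at 27 C, {i_track * r_hot * 1e3:.1f} mV at 85 C")
```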

We perform simultaneous electrical and thermal analyses to determine the voltage droop and temperature distribution on each tier. To do so, a thermal model of the three-tier system is built by exploiting the principle of electrical-thermal duality. Duality is based on the equivalence between electrical current through an electrical resistor and heat flow through a thermal resistor, and between voltage difference and temperature difference. Thermal resistance models from [25], [36], [37], and [43] are applied to represent PDNs, TSVs, and C4s. Heat generated by switching circuits is modeled by current sources representing their power consumption. The heat sink temperature is assumed to be 27 °C. The thermal analysis is a 1-D static analysis; combining it with the electrical analysis, we recompute the voltage droop on each tier. Table I lists the parameter values of the models used in this paper.

#### IV. ANALYSIS OF 3-D POWER DELIVERY NETWORK WITH MULTIPLE CLOCK DOMAINS

Here, we summarize the electrothermal analysis technique used for computing the voltage droop and temperature on each tier of 3-D PDNs. In addition, we apply the presented analytical method to a sample circuit to illustrate the electrothermal analysis of 3-D PDNs with multiple clock domains.

 TABLE I

 Detailed Summary of the Parameter Values Used in This Paper

| Element                         | Geometry / Level                                         | Parasitics / Value                                                                   |
|---------------------------------|----------------------------------------------------------|--------------------------------------------------------------------------------------|
| Power grid mesh                 | 1 mm × 1 mm                                              |                                                                                      |
| Package C4 bumps                | 100 $\mu$m width, 200 $\mu$m pitch                       | $R = 10\ \text{m}\Omega$, $L = 60\ \text{pH}$                                        |
| TSV                             | 3 $\mu$m diameter, 15 $\mu$m length, 60 $\mu$m pitch     | $R_{tsv} = 100\ \text{m}\Omega$, $L_{tsv} = 10\ \text{pH}$, $C_{tsv} = 40\ \text{fF}$ |
| Power grid track                | 1000 $\mu$m length, 10 $\mu$m to 30 $\mu$m width         | $R_{grid} = 45\ \text{m}\Omega$, $L_{grid} = 3.5\ \text{pH}$                         |
| Inserted decoupling capacitance |                                                          | 5 fF/$\mu$m<sup>2</sup>                                                              |
| V<sub>DD</sub>                  |                                                          | 1 V                                                                                  |
| Power density                   | High                                                     | 3 $\mu$W/$\mu$m<sup>2</sup>                                                          |
|                                 | Mid                                                      | 1.5 $\mu$W/$\mu$m<sup>2</sup>                                                        |
|                                 | Low                                                      | 0.7 $\mu$W/$\mu$m<sup>2</sup>                                                        |

#### A. Electrical Analysis

To perform multiple clock domain power grid analysis [32], we first describe how power grid node voltages are computed by applying modified nodal analysis (MNA). Branch currents and node voltages on the power grid obey Kirchhoff's current and voltage laws (KCL and KVL). Thus, the node voltage equations can be written in matrix form as

$$V = AU = G^{-1}U \tag{1}$$

where $G$ is the modified conductance matrix of the power grid, $U$ is the vector of current sources, and $V$ is the vector of node voltages. A node voltage $v_i$ can be expressed as

$$v_i = \sum_{j=1}^{n} a_{ij} g_{ij} V_{\text{DD}} - \sum_{k=1}^{n} a_{ik} I_k \tag{2}$$

where *n* is the number of voltage nodes on the PDN,  $g_{ij}V_{DD}$  is the conductance term, where *j* is the neighboring node to node *i*,  $I_k$  is the current source at node *k*, and  $a_{ij}$  and  $a_{ik}$  are the elements of matrix *A*.

Note that node voltage $v_i$ can represent both power and ground node voltages. Solving the node equations (2) directly relies on computing the inverse matrix, which can be computationally expensive. Several efficient and accurate linear algebra-based methods exist for MNA. A discussion of MNA solvers is beyond the scope of this paper; we refer the interested reader to [44], [45].
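The node equations (1) and (2) can be made concrete with a small nodal-analysis sketch. The code below assembles the conductance matrix of a hypothetical single-tier $n \times n$ mesh, stamps each segment conductance, folds pad connections to an ideal VDD source into the right-hand side (the $g_{ij}V_{\text{DD}}$ terms of (2)), subtracts load currents (the $I_k$ terms), and solves the system with `numpy.linalg.solve` instead of forming $G^{-1}$ explicitly. It is a DC sketch only: inductance, capacitance, TSVs, and the ground network are omitted, and reusing the Table I track and bump resistances as per-segment and per-pad values is an assumption made for illustration. The function name `solve_power_mesh` is hypothetical.

```python
import numpy as np


def solve_power_mesh(n, g_seg, g_pad, vdd, pad_nodes, loads):
    """DC node voltages of an n x n power mesh via nodal analysis (G v = u).

    g_seg     : conductance of every mesh segment between adjacent nodes (S)
    g_pad     : conductance from a pad node to the ideal supply (S)
    pad_nodes : (row, col) nodes fed from VDD, e.g. through C4 bumps or TSVs
    loads     : {(row, col): current in A} drawn by switching circuits
    """
    def idx(r, c):
        return r * n + c

    size = n * n
    G = np.zeros((size, size))
    u = np.zeros(size)

    # Stamp each segment between a node and its right / lower neighbour.
    for r in range(n):
        for c in range(n):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n and cc < n:
                    i, j = idx(r, c), idx(rr, cc)
                    G[i, i] += g_seg
                    G[j, j] += g_seg
                    G[i, j] -= g_seg
                    G[j, i] -= g_seg

    # A pad adds g_pad to the diagonal and injects g_pad * VDD (the g*VDD terms of (2)).
    for r, c in pad_nodes:
        G[idx(r, c), idx(r, c)] += g_pad
        u[idx(r, c)] += g_pad * vdd

    # Switching circuits draw current out of their node (the -I_k terms of (2)).
    for (r, c), i_load in loads.items():
        u[idx(r, c)] -= i_load

    # Solve the linear system directly instead of forming G^{-1} explicitly.
    return np.linalg.solve(G, u).reshape(n, n)


v = solve_power_mesh(
    n=5,
    g_seg=1 / 45e-3,                      # Table I track resistance reused per segment (assumption)
    g_pad=1 / 10e-3,                      # Table I C4 resistance, one bump per pad (assumption)
    vdd=1.0,
    pad_nodes=[(0, 0), (0, 4), (4, 0), (4, 4)],
    loads={(2, 2): 0.5},                  # hypothetical 0.5 A load at the mesh centre
)
print(f"worst-case IR drop: {1.0 - v.min():.4f} V")
```

Because the system is linear, solving once for each clock domain's load currents and adding the resulting drops gives the same answer as solving with all loads at once, which is the superposition used next. For large 3-D grids, a sparse solver would replace the dense solve.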

Because the power grid is a linear network, the final node voltage can be computed from the node voltages of each frequency using the superposition principle, combining the responses to each clock frequency as

$$V_i = V_i^{\text{freq}_1} + V_i^{\text{freq}_2} + \dots + V_i^{\text{freq}_m} + V_o = \sum_{r=1}^{m} V_i^{\text{freq}_r} + V_o \tag{3}$$

where $V_i^{\text{freq}_r}$ is the node voltage response due to clock domain $r$, $m$ is the number of different clock frequency domains, and $V_o$ is the node voltage due to initial conditions. Based on the node voltages for each frequency, the amount of power supply noise can be derived by computing the voltage droop over a given time period as [40]

$$PSN_i^{\text{freq}_j} = \int_{t_s}^{t_e} \left( V_{DD} - V_i^{\text{freq}_j} \right) dt \tag{4}$$

where $t_s$ and $t_e$ are the start and end times over which the voltage droop is accumulated; they can be user-specified, e.g., the beginning and end of a single clock cycle or of several cycles. Similarly, the total amount of power supply noise induced by different clock domains can be computed as

$$PSN_i = \sum_{j=1}^{m} PSN_i^{\text{freq}_j} + PSN_i^o \tag{5}$$

where  $PSN_i^o$  is the initial condition of power supply noise. Here, we also introduce the notion of self-noise and transferred noise.

*Definition:* Self-noise is the amount of noise that a block $i$ with clock frequency freq<sub>k</sub> induces on itself as

$$PSN_i^{\text{self}} = PSN_i^{\text{freq}_k} + PSN_i^o. \tag{6}$$

*Definition:* Transferred noise is the amount of noise that block *i* experiences from other switching blocks with different clock frequencies (different from  $\text{freq}_k$ ) as

$$PSN_i^{\text{transferred}} = \sum_{r=1,\, r \neq k}^{m} PSN_i^{\text{freq}_r}. \tag{7}$$
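Equations (4)–(7) reduce to a time integral plus some bookkeeping. The sketch below integrates a sampled node waveform with the trapezoidal rule and splits per-domain contributions into self-noise and transferred noise. The waveform samples and the per-domain values passed to `split_noise` are hypothetical; only the case totals in the final loop come from Section IV-C, and that loop merely reproduces the subtractions in (13)–(15) without assuming that the case results add up. The function names are illustrative.

```python
def psn(times, volts, vdd=1.0):
    """Eq. (4): integral of (VDD - v(t)) over [t_s, t_e], trapezoidal rule, in V*s.

    times/volts are samples of one node's waveform (e.g. from a transient run);
    the window runs from the first to the last sample.
    """
    if len(times) != len(volts) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for k in range(1, len(times)):
        droop_prev = vdd - volts[k - 1]
        droop_curr = vdd - volts[k]
        total += 0.5 * (droop_prev + droop_curr) * (times[k] - times[k - 1])
    return total


def split_noise(psn_by_domain, own_domain, psn_init=0.0):
    """Eqs. (6) and (7): self-noise and transferred noise seen by one block.

    psn_by_domain maps a clock-domain label to PSN_i^{freq_r} at the block's node.
    """
    self_noise = psn_by_domain[own_domain] + psn_init
    transferred = sum(val for dom, val in psn_by_domain.items() if dom != own_domain)
    return self_noise, transferred


# Constant 0.1 V droop over 1 ns integrates to 0.1 V * 1 ns = 1e-10 V*s.
flat = psn([0.0, 0.5e-9, 1.0e-9], [0.9, 0.9, 0.9])

# Hypothetical per-domain contributions (V*s) at a node of a block in domain "A".
self_n, transferred_n = split_noise({"A": 150e-12, "B": 40e-12, "C": 20e-12}, "A", psn_init=17e-12)

# Reported worst-case totals at core A (Section IV-C, V*s): transferred = total - self.
self_a = 167e-12
for case, total in {"II": 230e-12, "III": 261e-12, "IV": 272e-12}.items():
    print(f"case {case}: transferred {(total - self_a) * 1e12:.0f}e-12 V*s")
```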

#### B. Thermal Analysis

Similarly, the temperature distribution on 3-D PDNs can be computed by applying the superposition principle. Several works discuss PDN thermal modeling in depth [25], [32], [36], [37]. In this paper, we utilize these models to perform simultaneous electrical and thermal analysis of 3-D PDNs. By exploiting electrothermal duality, 3-D PDN thermal networks can be represented as electrical networks and solved for node temperatures in the same way as for node voltages. Based on a 1-D treatment of the temperature distribution, the temperature on PDN segments can be derived as

$$T = G_{\rm th}^{-1} Q_{\rm th} = H_{\rm th} Q_{\rm th} \tag{8}$$



Fig. 3. Comparison of HSPICE and our model voltage transient response on a power network node.



Fig. 4. Illustration of sample circuit.

where $H_{\text{th}} = G_{\text{th}}^{-1}$ is the thermal impedance matrix and $Q_{\text{th}}$ is the heat source vector derived from the power consumption computed by the electrical network. The temperature of node $i$ can be expressed as

$$t_i = \sum_{j=1}^{n} h_{\mathrm{th}_{ij}} q_{\mathrm{th}_j}. \tag{9}$$

Overall, node temperature due to multiple clock domains can be derived as

$$T_i = \sum_{j=1}^{m} T_i^{\text{freq}_j} + T_i^{o} \tag{10}$$

where $T_i^o$ is the initial temperature. The accuracy of the mathematical formulations for power supply noise and temperature is validated against HSPICE simulations. Fig. 3 compares the voltage response at a node on the power grid obtained with HSPICE and with our analytical model; the results show up to 92% accuracy of the proposed method for deriving the power supply noise distribution on 3-D PDNs with multiple clock domains. Moreover, experiments indicate that excessive noise and temperatures can be induced by various clock frequencies; results for a sample circuit are given in Section IV-C. Overall, such analyses motivate deeper investigation into the optimal design of 3-D PDNs with multiple clock domains.
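The thermal formulation (8)–(10) follows the same pattern as the electrical one, with conductances replaced by thermal conductances and current sources replaced by dissipated power. The sketch below solves a hypothetical 1-D chain of three tiers in which heat flows only toward a heat sink held at 27 °C; the secondary path through the package is ignored, and the thermal resistances are illustrative values rather than numbers from this paper. Per-tier powers are obtained by applying the Table I power densities to a 1 mm × 1 mm tier (e.g., 3 $\mu$W/$\mu$m<sup>2</sup> × 10<sup>6</sup> $\mu$m<sup>2</sup> = 3 W), which is itself an assumption about which tier carries which density. `tier_temperatures` is a hypothetical name.

```python
import numpy as np


def tier_temperatures(powers_w, r_layer, r_sink, t_sink_c=27.0):
    """Steady-state 1-D tier temperatures via T = G_th^{-1} Q_th (eq. (8)).

    powers_w[0] is the tier adjacent to the heat sink; heat flows only toward the
    sink (secondary path through the package ignored, an assumption).
    r_sink  : thermal resistance from tier 0 to the heat sink (K/W)
    r_layer : thermal resistance between adjacent tiers (K/W)
    """
    n = len(powers_w)
    G = np.zeros((n, n))
    G[0, 0] += 1.0 / r_sink                 # tier 0 to the heat sink (held at t_sink_c)
    g = 1.0 / r_layer
    for k in range(n - 1):                  # tier k to tier k + 1
        G[k, k] += g
        G[k + 1, k + 1] += g
        G[k, k + 1] -= g
        G[k + 1, k] -= g
    # Temperature rise above the sink, then shift by the sink temperature.
    rise = np.linalg.solve(G, np.asarray(powers_w, dtype=float))
    return t_sink_c + rise


powers = [1e6 * 3e-6, 1e6 * 1.5e-6, 1e6 * 0.7e-6]            # W: high, mid, low density tiers
temps = tier_temperatures(powers, r_layer=1.0, r_sink=0.5)  # hypothetical K/W values
print(", ".join(f"tier {k + 1}: {t:.1f} C" for k, t in enumerate(temps)))
```

Adding the temperature rises obtained separately for each clock domain's power map reproduces the combined result, which is the superposition expressed in (10).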

#### C. Sample Circuit Analysis

Here, we provide a sample circuit to which we apply all the aforementioned mathematical steps in order to illustrate the analysis flow. Fig. 4 shows the sample three-tier multicore system used in this paper. All cores are identical; however, their switching frequencies vary. For this sample circuit, we set the switching frequencies to 500, 750, and 600 MHz for cores A, B, and C, respectively. Note that these values are simply taken as a case study to show the analysis flow. We consider four working cases. In case I, only core A is active, and we can compute the power supply noise that core A generates on itself, i.e., its self-induced noise. In case II, cores A and B are active, and we can compute the amount of noise that working core B transfers to core A, i.e., the transferred noise. Similarly, in case III, cores A and C are active, and we can derive the amount of noise transferred through the TSVs from core C to core A. In the last case, case IV, cores A, B, and C are all active, and we derive the total amount of noise on core A due to neighboring cores B and C. We also derive the worst case temperature distribution on each tier for each case in order to understand the impact of working neighboring cores and TSVs.

In Fig. 5, we show both the worst case power supply noise (i.e., maximum dc voltage droop) and temperature (i.e., highest temperature based on the largest dc voltage droop) distributions for all cases. The worst case supply noise map is shown for each tier for each case to visualize the evolution of noise between tiers. We notice that for case I, the worst case noise peaks at core A as

$$PSN_A^{self} = PSN_A^{freq_A} + PSN_A^o = 167 \times 10^{-12} \text{ V} \cdot \text{s} \quad (11)$$

where V · s (volt-second) is the unit of the supply noise metric. In case II, we derive the worst case noise at core A as

$$PSN_{A}^{caseII} = PSN_{A}^{freq_{A}} + PSN_{A}^{freq_{B}} + PSN_{A}^{o} = 230 \times 10^{-12} \text{ V} \cdot \text{s} \quad (12)$$

where the transferred noise can be derived as

$$PSN_{A}^{caseII} - PSN_{A}^{self} = (230 - 167) \times 10^{-12} \text{ V} \cdot \text{s} = 63 \times 10^{-12} \text{ V} \cdot \text{s} \quad (13)$$

which represents the additional amount of supply noise induced on core A by the working neighbor core B. In case III, cores A and C are active, the supply noise at core A is derived as  $PSN_A^{caseIII} = 261 \times 10^{-12}$  V · s, and the transferred noise is derived as

$$PSN_{A}^{caseIII} - PSN_{A}^{self} = (261 - 167) \times 10^{-12} \text{ V} \cdot \text{s} = 94 \times 10^{-12} \text{ V} \cdot \text{s} \quad (14)$$

which is the additional amount of supply noise induced from core C to A through the TSVs. We notice that the amount of supply noise transferred from core C to A is larger than noise transferred from core B to A, which is due to the switching frequency difference between cores B and C. In case IV, the total amount of noise when all cores A, B, and C are active is derived as  $PSN_A^{caseIV} = 272 \times 10^{-12}$  V · s, and the total amount of noise transferred is derived as

$$PSN_{A}^{caseIV} - PSN_{A}^{self} = (272 - 167) \times 10^{-12} \text{ V} \cdot \text{s} = 105 \times 10^{-12} \text{ V} \cdot \text{s} \quad (15)$$

which is the additional supply noise induced by cores B and C. We note that the total amount of noise from cores B and C is not the sum of their individual contributions ( $105 \times 10^{-12}$  V · s versus  $(63 + 94) \times 10^{-12} = 157 \times 10^{-12}$  V · s). This is due to the decoupling capacitance available in the system, which absorbs part of the supply noise when several cores are active. Previous works, notably [46] and [47], have shown that decoupling capacitors located close (in terms of effective distance) to switching circuits help suppress power supply noise. The effective distance in our test case is the Manhattan distance from core to core. Thus, for each case, the amount of effective decoupling capacitance varies depending on the location of the switching core and on the idle cores around it, which can serve as decoupling capacitance. Fig. 6 shows the voltage droop derived on each tier for each case.

Fig. 5. Power supply noise and temperature distribution for all cases on all tiers. We note that there is a strong relation between worst case voltage droop and temperature maps.

Temperature distributions for each tier and each case are shown in Fig. 5 (right-hand side). Only tiers 1 and 2 are shown, as the temperature on tier 3 does not vary significantly from the heat sink temperature ( $\sim$ 300 K). In case I, where only core A is active, a worst case temperature of 341 K is obtained. In case II, we derive a worst case temperature of 372 K in tier 1, i.e., a temperature increase of 372 K - 341 K = 31 K on core A due to the working neighbor core B. Similarly, in case III, the temperature on core A rises to 382 K, so the thermal impact of core C on core A is 382 K - 341 K = 41 K. This impact also demonstrates heat transfer through the TSVs from one tier to the next. When all three cores are active, the temperature on core A reaches 384 K, and the overall impact of cores B and C on core A is 384 K - 341 K = 43 K, which is less than the sum of the individual impacts, 72 K (31 K from core B plus 41 K from core C).

Fig. 6. Power supply noise distribution for all cases and each tier.
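The self-induced versus transferred decomposition above amounts to subtracting the case I value from each multicore case. As a minimal sketch, the Python snippet below recomputes the transferred noise and the temperature rise on core A from the values reported for cases I to IV and checks whether the combined impact of cores B and C is smaller than the sum of their individual impacts. The dictionary layout and the `transferred` helper are illustrative choices, not part of the paper's analysis flow.

```python
# Worst case values on core A reported for the sample circuit (Section IV-C).
# Noise is in units of 1e-12 V*s; temperature is in K.
NOISE = {"I": 167, "II": 230, "III": 261, "IV": 272}
TEMP = {"I": 341, "II": 372, "III": 382, "IV": 384}


def transferred(values: dict[str, int]) -> dict[str, int]:
    """Impact of neighboring cores on core A, relative to case I (core A alone)."""
    baseline = values["I"]
    return {case: v - baseline for case, v in values.items() if case != "I"}


for name, values in (("noise [1e-12 V*s]", NOISE), ("temperature [K]", TEMP)):
    delta = transferred(values)
    individual_sum = delta["II"] + delta["III"]  # core B alone + core C alone
    print(f"{name}: B->A={delta['II']}, C->A={delta['III']}, "
          f"B+C->A={delta['IV']}, sum of individual={individual_sum}, "
          f"sub-additive={delta['IV'] < individual_sum}")
```

For both metrics the combined impact (105 and 43) is below the sum of the individual impacts (157 and 72), so the script reports `sub-additive=True` for each, matching the discussion of shared decoupling capacitance.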

Fig. 7 shows the maximum temperature distributions on each tier for each case. Overall, the highest temperatures occur on tier 1, due to the activity of cores A and B. In addition, in tier 1 the highest temperature is obtained for case IV, when all cores are active and some of the heat is transferred through the TSVs. This observation reveals that TSVs play an important role in the overall thermal distribution; Sections V-D and V-E investigate how their sizing and placement further impact voltage droop and temperature.

Fig. 7. Temperature distribution for all cases and each tier. Note that Tier 3 is next to heat sink and benefits from immediate cooling.

#### V. EXPERIMENTS

For a meaningful electrothermal analysis, representing every individual switching device would make a system model containing a 3-D PDN too complex to implement and simulate. For this reason, we devise several extensive 3-D power delivery benchmarks with many switching cores and various stacking orders. The objective is to have a wide representation of 3-D power delivery benchmarks with multiple cores that is meaningful for voltage droop and temperature analysis. Our benchmarks are three-tier systems in which each tier has nine cores. Tier 1 (T1) is the closest to the package bumps, whereas tier 3 (T3) is next to the heat sink and benefits from immediate cooling. From a wide range of operating frequencies, we select three to represent low (i.e., L = 500 MHz, a frequency < 1 GHz), mid (i.e., M = 1 GHz), and high (i.e., H = 3 GHz) clock frequencies. The total number of TSVs is 288, where 16 TSVs are used for each core for power and ground, respectively. We apply various clock frequencies and investigate the voltage droop and temperature on the 3-D PDN. In Sections V-A through V-E, we perform the following experiments.

#### A. Impact of Identical Dies With Single Clock Domain

For this study, we apply the same clock frequency to all tiers and their cores. We study three setups (i.e., s1, s2, and s3). In setup s1, all cores operate in the low-frequency clock domain (*L*); in setup s2, all cores operate at the mid clock frequency (*M*); and in setup s3, all cores operate at the high clock frequency (*H*), as shown in Fig. 8. For each setup, we compute the worst case power supply noise (voltage droop on power and ground tracks) and temperature on each tier, as listed in Table II.

Fig. 8. Experimental setup for *s*1, *s*2, and *s*3.

TABLE II Worst Case Voltage Droop and Temperature With Single Clock Domain on Identical Tiers

| Setup | Droop T1 (mV) | Droop T2 (mV) | Droop T3 (mV) | Temp. T1 (°C) | Temp. T2 (°C) | Temp. T3 (°C) |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|
| s1    | 74.52         | 95.50         | 98.53         | 73.50         | 75.80         | 32.00         |
| s2    | 75.18         | 97.42         | 101.30        | 73.90         | 76.60         | 32.20         |
| s3    | 78.05         | 108.58        | 119.03        | 76.40         | 81.60         | 33.10         |

Fig. 9. Experimental setup for s4, s5, and s6.

We note that, in general, the worst voltage droop is measured on tier 3 (furthest from the package), and the worst temperature on tier 2 (middle of the stack). We also note that the high-frequency clock domain (setup s3) introduces the largest voltage droop and temperature rise. This is due to the inductive parasitic impedance ( $j\omega L$ ), which increases with frequency, and to the reduced effectiveness of decoupling capacitances, which cannot be recharged before the next transition. Tier 2, located in the middle of the stack, receives heat from both the top and bottom tiers; hence, it experiences the highest temperature.
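To make the frequency argument concrete, the sketch below evaluates the magnitude of the inductive impedance $|j\omega L| = 2\pi f L$ at the three benchmark frequencies. The inductance value `L_EFF` is a hypothetical placeholder, not a parameter extracted from the paper; only the ratios between frequencies are meaningful.

```python
import math

# Hypothetical effective inductance of the supply path (NOT from the paper).
L_EFF = 10e-12  # 10 pH, illustrative only

# Frequency classes used in the benchmarks: L = 500 MHz, M = 1 GHz, H = 3 GHz.
FREQS_HZ = {"L": 500e6, "M": 1e9, "H": 3e9}


def inductive_reactance(freq_hz: float, inductance_h: float) -> float:
    """Magnitude of j*omega*L in ohms, with omega = 2*pi*f."""
    return 2 * math.pi * freq_hz * inductance_h


for label, f in FREQS_HZ.items():
    x = inductive_reactance(f, L_EFF)
    print(f"{label}: |jwL| = {x * 1e3:.1f} mOhm")
```

Because the reactance is linear in frequency, the H domain sees six times the reactance of the L domain for any choice of `L_EFF`, which is one reason the s3 setup shows the largest droop.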

These results are interesting, as they indicate that a system can satisfy the power and thermal constraints when it runs applications or tasks below a given frequency (i.e., 1 GHz), but can suffer from larger voltage droop and temperature when its cores run applications or tasks at a higher frequency (i.e., 3 GHz). Hence, the characteristics of the applications to be run on a 3-D multicore system should be carefully analyzed so that no critical power and thermal integrity issues are introduced.

#### B. Impact of Identical Dies With Multiple Clock Domains

To investigate the impact of multiple clock domains, we devise three setups, s4, s5, and s6. For the sake of clarity, we choose these three cases; however, more cases can be envisioned. Each setup is shown in Fig. 9: setup s4 has operating cores with M and H clock domains, setup s5 has cores with L and M clock domains, and setup s6 has cores with H and L clock domains.

TABLE III Worst Case Voltage Droop and Temperature With Multiple Clock Domains on Identical Tiers

| Setup | Droop T1 (mV) | Droop T2 (mV) | Droop T3 (mV) | Temp. T1 (°C) | Temp. T2 (°C) | Temp. T3 (°C) |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|
| s4    | 62.35         | 87.63         | 97.02         | 63.00         | 68.00         | 32.00         |
| s5    | 60.08         | 77.70         | 80.90         | 60.20         | 62.70         | 31.10         |
| s6    | 72.76         | 95.68         | 101.50        | 71.64         | 74.48         | 32.20         |

We note that the assignment of clock domains on each tier results in different voltage droops and temperatures, as listed in Table III. The worst voltage droop and temperature are measured on setup s6, with up to 10 mV and 12 °C difference from setup s4. The voltage droop is worse on s6 than on s4 mainly because of the chip-package resonance effect. The resonance frequency is computed as in [48] with the formula  $f = 1/(2\pi (L_{pkg}C_{decap})^{1/2})$  and is 290 MHz, which is close to the low-frequency (L) workload of 500 MHz. When the switching frequency of the circuit is close to the chip-package resonance frequency, additional fluctuations appear on the supply voltage, which further worsen the voltage droop. The package inductance and the on-chip decoupling capacitance form an LC tank, which worsens the voltage droop for s6 compared with s4. To avoid resonance issues, designers should carefully choose the package and the amount of inserted decoupling capacitance. Previous work [41] has shown that inserting too much decoupling capacitance lowers the resonance frequency, which worsens on-chip power supply noise.
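The resonance formula quoted from [48] can be sketched in a few lines of Python. The reported 290 MHz fixes only the product $L_{pkg}C_{decap}$, so the snippet first back-solves that product and then uses a hypothetical split (0.1 nH package inductance and 3 nF decoupling capacitance, neither taken from the paper) to show how adding decoupling capacitance lowers the resonance frequency, in line with [41].

```python
import math


def resonance_hz(l_pkg: float, c_decap: float) -> float:
    """Chip-package resonance f = 1 / (2*pi*sqrt(L_pkg * C_decap))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_pkg * c_decap))


def lc_product_for(freq_hz: float) -> float:
    """Invert the formula: the L*C product (in s^2) that resonates at freq_hz."""
    return 1.0 / (2.0 * math.pi * freq_hz) ** 2


# The paper reports f ~ 290 MHz; that value only determines the L*C product.
print(f"L*C for 290 MHz: {lc_product_for(290e6):.3e} s^2")

# Hypothetical split of that product (NOT reported in the paper).
L_PKG = 0.1e-9   # 0.1 nH package inductance
C_DECAP = 3e-9   # 3 nF on-chip decoupling capacitance
print(f"f with 1x decap: {resonance_hz(L_PKG, C_DECAP) / 1e6:.0f} MHz")
print(f"f with 2x decap: {resonance_hz(L_PKG, 2 * C_DECAP) / 1e6:.0f} MHz")
```

Doubling `C_DECAP` scales the resonance by $1/\sqrt{2}$, from roughly 291 MHz to roughly 205 MHz with these assumed values, so extra decoupling capacitance moves the resonance downward rather than away from low-frequency workloads.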

These results indicate the importance of stacking order for identical dies with different clock domains. In addition, they demonstrate that stacking identical dies might not be beneficial when the operating cores (i.e., the applications running on them) have frequencies above the 1-GHz range. For example, in setup s6, 3-D PDNs can experience a large voltage droop and high temperatures, which can further degrade the performance of the underlying cores.

#### C. Impact of Heterogeneous Dies With Multiple Clock Domains

In this scenario, we study the impact of heterogeneous dies with different clock domains and devise three setups, s7, s8, and s9. We assume that each tier can have cores operating with the clock domain pairs 1) low-mid (*LM*); 2) mid-high (*MH*); and 3) high-low (*HL*), as shown in Fig. 10. Please note that more clock domain combinations can be derived. The results for each setup are listed in Table IV.

Results indicate that setup s7, which places clock domains *HL* on tier 3, has the worst voltage droop and temperature. When clock domains *HL* are placed on tier 1 as in setup s9 (or on tier 2 as in setup s8), up to 12 mV less voltage droop and 6 °C lower temperature are observed. This reveals the importance of workload assignment and die stacking order. Although the same set of workloads is applied to all three setups, only their stacking order varies; we observe up to 5% and 7.5% difference in worst case voltage droop and temperature, respectively, due to this stacking order difference. Such variations in voltage droop and temperature could be critical for the performance and reliability of 3-D multicore systems, where voltage droop margins are already very tight. These results further motivate investigating workload assignment while taking into account the stacking order and the cores' operating frequencies.

Fig. 10. Experimental setup for s7, s8, and s9.

TABLE IV Worst Case Voltage Droop and Temperature With Multiple Clock Domains on Heterogeneous Tiers

| Setup | Droop T1 (mV) | Droop T2 (mV) | Droop T3 (mV) | Temp. T1 (°C) | Temp. T2 (°C) | Temp. T3 (°C) |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|
| s7    | 72.78         | 95.70         | 101.52        | 71.65         | 74.50         | 32.20         |
| s8    | 62.35         | 87.64         | 97.02         | 62.90         | 67.95         | 31.90         |
| s9    | 65.57         | 85.80         | 89.70         | 66.10         | 68.86         | 31.60         |

Fig. 11. Experimental setup for s10 and s11.

TABLE V Worst Case Voltage Droop and Temperature With Multiple Clock Domains on Heterogeneous Tiers

| Setup | Droop T1 (mV) | Droop T2 (mV) | Droop T3 (mV) | Temp. T1 (°C) | Temp. T2 (°C) | Temp. T3 (°C) |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|
| s10   | 120.07        | 121.06        | 122.12        | 65.42         | 61.87         | 56.12         |
| s11   | 126.10        | 122.62        | 123.68        | 68.26         | 62.51         | 55.07         |

In practical cases, each tier would have a single clock domain that could vary from tier to tier. We therefore also investigate scenarios where heterogeneous dies, each with a single clock domain, are stacked together. In setup s10, workloads with clock frequency H are in tier 1, M in tier 2, and L in tier 3, whereas setup s11 has workloads with clock frequency L on tier 1, M on tier 2, and H on tier 3. Both setups are shown in Fig. 11. Please note that more setups can be envisioned. We aim to capture the dependence between workload frequency and stacking order for heterogeneous dies with a single clock domain.

We observe that the voltage droop on T1 is up to 4.7% higher in benchmark *s*11 than in *s*10, as shown in Table V. The main difference between these two benchmarks is the stacking order of workloads *H* and *L*. We observe that allocating tier T1 (next to the package) to high-frequency workloads gives slightly less voltage droop and temperature increase. Workloads on tier T1 benefit from the proximity of the package, hence less voltage droop, whereas allocating low-frequency workloads to tier T3 weakens chip-package resonance effects, due to the longer distance from the package to the low-frequency switching cores, and also benefits from the immediate cooling of the heat sink. In benchmark s11, locating high-frequency workloads on T3 helps to mitigate elevated temperatures; however, it worsens their voltage droop. Thus, benchmark s10 provides less voltage droop overall, although its T3 temperature is higher than in s11, where the H workload sits next to the heat sink. Tier T1 in s11 has more voltage droop due to the resonance effect; the additional switching caused by resonance increases power consumption and results in a higher temperature than on T1 in s10. This experiment highlights the importance of stacking order and clock domain frequency, and suggests that faster switching cores, owing to their large current demand, are better located on the tier next to the package. These observations also reveal the complexity of workload assignment, as voltage droop and temperature increase in opposite directions across the 3-D stack.

#### D. Impact of TSV Sizing

Here, we investigate the impact of TSV sizing on power supply noise and temperature distribution. We consider TSVs with three diameters: 3, 5, and 10  $\mu$ m. As we are investigating global interconnects for power and ground delivery, we use the TSV parameters for global interconnects from the 3-D-stacked ICs/3-D-SoC roadmap of ITRS 2013 [49]. We evaluate the power supply noise for each TSV diameter while applying the same switching activity and frequency to the 3-D system. All cores on each tier are active and switch at the same clock frequency. To capture the impact of TSV sizing, we perform two experiments: first with all cores switching at 1 GHz, and second at 3 GHz.

In Fig. 12(a), the maximum voltage droop on each tier with a clock frequency of 1 GHz is shown. Increasing the TSV diameter affects the voltage droop on each tier by up to 6.25%. A larger TSV diameter reduces the resistive parasitic, whereas the TSV inductive and capacitive parasitics increase sharply, and together they determine its impedance; hence, we observe only a small reduction in voltage droop as the TSV diameter increases. The voltage droop decreases by 6.25% when the TSV diameter increases from 3 to 5  $\mu$ m, but only by 4% when it increases from 5 to 10  $\mu$ m. In Fig. 12(b), the maximum temperatures are shown for each tier. We observe almost the same temperature levels on each tier, indicating the TSV contribution to vertical heat transfer. Note that tier 3 shows almost no temperature change (i.e.,  $\sim 300$  K), as it is next to the heat sink and benefits from immediate cooling. The results show that voltage droop is dominated by lateral effects (i.e., small droop differences between tiers), whereas vertical heat transfer is more crucial for temperature (i.e., large temperature differences between tiers). We perform similar experiments with the switching clock frequency increased to 3 GHz. We observe that the voltage droop worsens with increasing clock frequency regardless of the TSV diameter, whereas larger-diameter TSVs reduce the temperature by up to 4%, as shown in Figs. 12(b) and 13(b).

Fig. 12. Maximum voltage droop and temperature on each tier for different diameter TSVs for the clock frequency of 1 GHz. All cores are active on all tiers. Note that Tier 3 is next to heat sink and benefits from immediate cooling. (a) Maximum voltage droop on each tier. (b) Maximum temperature on each tier.
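For intuition on the resistive part of the argument, the sketch below computes the DC resistance of a solid copper TSV, $R = \rho h / (\pi r^2)$, for the three diameters studied. The 50 μm TSV height is a hypothetical value (the section does not state it), and the model ignores the liner, skin effect, and the inductive and capacitive parasitics that the text identifies as competing effects.

```python
import math

RHO_CU = 1.68e-8    # bulk copper resistivity near room temperature, ohm*m
TSV_HEIGHT = 50e-6  # hypothetical TSV height (not given in the paper), m


def tsv_dc_resistance(diameter_m: float, height_m: float = TSV_HEIGHT) -> float:
    """DC resistance of a solid cylindrical copper TSV: R = rho * h / (pi * r^2)."""
    radius = diameter_m / 2
    return RHO_CU * height_m / (math.pi * radius ** 2)


for d_um in (3, 5, 10):
    r = tsv_dc_resistance(d_um * 1e-6)
    print(f"d = {d_um:>2} um: R = {r * 1e3:.1f} mOhm per TSV")
```

Resistance falls by about 64% from 3 to 5 μm and by 75% from 5 to 10 μm, far more than the 6.25% and 4% droop reductions reported above, which is consistent with the text's point that TSV resistance alone does not set the droop.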

#### E. Impact of TSV Placement

Here, we investigate the impact of TSV placement on power supply noise and temperature distribution. We use the same three TSV diameters as in the previous experiment. In addition, we investigate uniform and clustered TSV placement. Uniform TSVs are evenly distributed among the cores, with 16 TSVs (both power and ground) used for each core, for a total of 288 TSVs in the 3-D system, whereas clustered TSVs are grouped together in the center of the chip area using the same number of TSVs. Thus, the total number of TSVs (288) remains unchanged for both the uniform and clustered test cases; only their placement changes. The objective is to investigate the impact of TSV placement on worst case voltage droop and temperature.

Fig. 13. Maximum voltage droop and temperature on each tier for different diameter TSVs for the clock frequency of 3 GHz. Note that Tier 3 is next to heat sink and benefits from immediate cooling. (a) Maximum voltage droop on each tier. (b) Maximum temperature on each tier.

As in the previous experiment, we evaluate the worst case power supply noise for each TSV diameter while applying the same switching activity and frequency to the 3-D system, with all cores active. We apply two clock frequencies, 1 and 3 GHz, and compute the worst case voltage droop and temperature on each tier.

In Fig. 14(a), the maximum voltage droop is given for each tier for both uniform and clustered TSV placements. Clustered TSVs increase the voltage droop on every tier compared with uniformly distributed TSVs; this increase is partly mitigated as the TSV diameter grows. Results show that the distribution and sizing of TSVs can affect the worst case voltage droop by up to 32%, even when the same number of TSVs is used.

In Fig. 14(b), the maximum temperature on each tier is shown for both uniform and clustered TSVs. Similarly, clustered TSVs raise the temperature by up to 9.6% compared with uniformly distributed TSVs, although some of this heat is alleviated as the TSV diameter increases. This is due to the thermal blockage effect that clustered TSVs create between cores, a phenomenon also observed in [50]. The local thermal build-up around the TSVs increases lateral heat spreading, which consequently increases intratier voltage droop. We perform similar experiments with the switching clock frequency set to 3 GHz and observe a similar trend for both voltage droop and temperature, as shown in Fig. 15(a) and (b).

Fig. 14. Maximum voltage droop and temperature on each tier for different diameter TSVs for the clock frequency of 1 GHz. We investigate uniformly and clustered TSVs. Note that there is almost no temperature variation on Tier 3 as it is next to heat sink and benefits from immediate cooling. (a) Maximum voltage droop on each tier. (b) Maximum temperature on each tier.

Overall, these experiments demonstrate the critical impact of multiple clock domains on voltage droop and temperature in 3-D PDNs. They further motivate dedicated analysis and optimization methods for addressing power supply noise and excessive temperatures on 3-D PDNs due to multiple clock domains.

Fig. 15. Maximum voltage droop and temperature on each tier for different diameter TSVs for the clock frequency of 3 GHz. We investigate uniformly and clustered TSVs. Note that there is almost no temperature variation on Tier 3 as it is next to heat sink and benefits from immediate cooling. (a) Maximum voltage droop on each tier. (b) Maximum temperature on each tier.

#### VI. OBSERVATIONS AND PRACTICAL DESIGN GUIDELINES

Based on the results obtained from the above experiments, we deduce several observations that can be utilized during the early design phase to ensure power and thermal integrity of 3-D PDNs with multiple clock domains.

We draw the following observations.

#### A. Observation 1: Clock Frequency and Supply Noise

Core frequency has a significant effect on the amount of supply noise induced. We observe that when the 2-D power delivery for a multicore system (i.e., a single tier with nine cores)

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

is utilized to build a 3-D PDN, as shown in Fig. 8, it might no longer be optimal (i.e., voltage droop  $\geq 10\% V_{DD}$ ). In other words, PDNs initially designed for a 2-D multicore system might not be optimal when reused for 3-D PDNs. For low-frequency switching cores (i.e., below 1 GHz, as in benchmark *s*1), simply reusing 2-D PDNs in 3-D PDNs might be sufficient. However, if the cores run at 1 GHz or higher (i.e., as in benchmarks *s*2 and *s*3), the 3-D PDN needs to be redesigned to minimize voltage droop and high temperatures. Depending on the core operating frequencies, identical dies can thus be stacked for more processing power with minor design effort.
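The $10\% V_{DD}$ criterion of Observation 1 can be applied mechanically to the Table II droops. The sketch below does so under an assumed supply of $V_{DD} = 1.0$ V, a hypothetical value not stated in this section; with a different supply the verdicts change, so the output illustrates the check rather than reproducing the paper's analysis.

```python
# Worst case voltage droop from Table II (mV), for tiers T1, T2, T3.
TABLE_II_DROOP_MV = {
    "s1": (74.52, 95.50, 98.53),
    "s2": (75.18, 97.42, 101.30),
    "s3": (78.05, 108.58, 119.03),
}

VDD_V = 1.0  # hypothetical supply voltage; NOT stated in this section


def violating_tiers(droops_mv, vdd_v=VDD_V, budget=0.10):
    """Return the tiers whose droop reaches the budget (a fraction of VDD)."""
    limit_mv = budget * vdd_v * 1e3
    return [f"T{i}" for i, d in enumerate(droops_mv, start=1) if d >= limit_mv]


for setup, droops in TABLE_II_DROOP_MV.items():
    bad = violating_tiers(droops)
    verdict = "reuse of 2-D PDN may suffice" if not bad else "redesign: " + ", ".join(bad)
    print(f"{setup}: {verdict}")
```

With this assumed supply, s1 stays below the 100 mV limit on every tier, s2 exceeds it on T3, and s3 exceeds it on T2 and T3.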

#### B. Observation 2: Multiple Clock Domains and Supply Noise

Depending on their frequencies, multiple clock domains can adversely affect supply noise. For example, for identical dies whose cores operate with two different clock frequencies, *MH* or *LM*, as in benchmarks *s*4 and *s*5, simple stacking of PDNs does not lead to excessive voltage droop or temperatures, whereas identical dies whose cores operate with high-low (*HL*) clock frequencies (i.e., benchmark *s*6) exhibit exacerbated noise and temperature, requiring a redesign of the 3-D PDN. These results also show that the worst case noise occurs when high- and low-frequency cores operate simultaneously. This observation can guide runtime workload management in multicore systems: to avoid excessive noise and heat, high- and low-frequency workloads should not be assigned to run in parallel.

#### C. Observation 3: Multiple Clock Domains and Stacking Order

Benchmarks s7, s8, s9, s10, and s11 represent the same circuit with different stacking orders of cores. We observe that clock domain frequency and stacking order significantly affect the voltage droop and temperature on each tier. Minimal voltage droop is obtained when high-frequency switching cores are located next to the heat sink or in the middle tier, as in benchmark s8, whereas large voltage droop and temperatures are obtained when high-frequency switching cores are located next to the package (T1). This can be explained by high-frequency switching noise due to the coupling of decoupling capacitance with package inductance, which creates an *LC* resonance effect at high frequencies, as observed in [41]. In addition, high-frequency cores lead to excessive temperatures on T1, which is farthest from the heat sink, further exacerbating voltage droop.

#### D. Observation 4: Multiple Clock Domains and Workload Assignment Order

Another way to view the combined results from benchmark sets (s4, s5, and s6) and (s7, s8, and s9) is through the effect of workload frequency and assignment order on the voltage droop and temperature of 3-D PDNs. From benchmarks s4, s5, and s6, we deduce that 3-D PDNs remain reliable as long as the cores operating simultaneously have *MH* or *LM* frequencies. From benchmarks *s*7, *s*8, and *s*9, we deduce that the location of switching cores plays a crucial role in the reliability of 3-D PDNs. Overall, excessive voltage droop and temperature issues can be mitigated by taking into account core switching frequency, location, and assignment order. For example, for a given set of workloads, choosing to operate *MH* or *LM* workloads simultaneously and selecting cores next to the heat sink for *HL* workloads can be crucial for alleviating power supply noise and thermal issues; alternatively, assigning high-frequency workloads to the middle tier or to the tier next to the heat sink can reduce noise.
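Observations 2 and 4 can be read as a simple scheduling rule. The sketch below encodes it as a hypothetical runtime check: it flags any concurrently active mix containing both H and L workloads and, following Observation 4, steers such mixes to the tier next to the heat sink. The function names and tier label are illustrative; this is not a scheduler proposed in the paper.

```python
from itertools import combinations

# Frequency classes as defined in Section V.
CLASSES = ("L", "M", "H")


def mix_is_risky(active_classes: set[str]) -> bool:
    """Observation 2: flag concurrent high- and low-frequency (HL) workloads."""
    return {"H", "L"} <= active_classes


def preferred_tier(active_classes: set[str]) -> str:
    """Observation 4 (illustrative): steer HL mixes to the heat-sink tier."""
    return "T3 (next to heat sink)" if mix_is_risky(active_classes) else "any tier"


for pair in combinations(CLASSES, 2):
    mix = set(pair)
    status = "risky" if mix_is_risky(mix) else "ok"
    print(f"{''.join(pair)}: {status}, {preferred_tier(mix)}")
```

Of the three two-class mixes, only the one pairing L and H is flagged, mirroring the contrast between benchmark s6 and benchmarks s4 and s5.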

#### E. Observation 5: Multiple Clock Domains and TSV Sizing and Placement

We observe that dense clustered TSV regions around the cores alter the voltage droop and thermal profile of each tier. Moreover, this effect worsens with increasing core frequency and with smaller TSV diameters. Experiments show that clustered TSV structures can affect the thermal profile of tiers even when all cores on a tier operate at the same switching frequency. Thus, heat transfer through TSVs (both intratier and intertier) affects the total voltage droop per tier and can also lead to thermal blockage effects [50]. This observation calls for thermal-aware placement and sizing of TSVs that takes into account workload assignment, core switching frequency, and stacking order.

#### VII. CONCLUSION

In this paper, electrothermal simulations are performed to derive the worst case power supply noise and temperature distributions on 3-D PDNs with multiple clock domains. This is the first work that targets 3-D PDN resiliency in the presence of various switching frequencies. We provide guidelines for modeling 3-D PDNs with multiple clock domains and describe mathematical formulations to derive the self-induced and transferred power supply noise and temperature on each tier. We conduct experiments on several 3-D multicore benchmarks to investigate power supply noise and thermal issues. Our experimental results show that TSVs contribute to intratier and intertier noise and heat transfer, and that certain combinations of clock domains cause severe voltage droop and elevated temperatures. We explore several mitigation techniques, such as workload assignment, stacking order, and TSV sizing and placement. Based on the experimental results, we present a list of observations and practical guidelines for ensuring the power and thermal integrity of 3-D PDNs with multiple clock domains.

#### REFERENCES

- [1] L. Benini and G. De Micheli, "Networks on chips: A new SoC paradigm," *Computer*, vol. 35, no. 1, pp. 70–78, Jan. 2002.
- [2] P. Magarshack and P. G. Paulin, "System-on-chip beyond the nanometer wall," in *Proc. IEEE/ACM Design Autom. Conf. (DAC)*, Crolles, France, Jun. 2003, pp. 419–424.
- [3] B. Feero and P. P. Pande, "Performance evaluation for three-dimensional networks-on-chip," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Porto Alegre, Brazil, Mar. 2007, pp. 305–310.

- [4] I. Loi, F. Angiolini, S. Fujita, S. Mitra, and L. Benini, "Characterization and implementation of fault-tolerant vertical links for 3-D networks-on-chip," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 30, no. 1, pp. 124–134, Jan. 2011.
- [5] F. Clermidy, F. Darve, D. Dutoit, W. Lafi, and P. Vivet, "3D embedded multi-core: Some perspectives," in *Proc. IEEE/ACM Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Grenoble, France, Mar. 2011, pp. 1–6.
- [6] N. AbouGhazaleh, B. Childers, D. Mossé, and R. Melhem, "Integrated CPU cache power management in multiple clock domain processors," in *High Performance Embedded Architectures and Compilers*. Berlin, Germany: Springer-Verlag, 2008.
- [7] A. Todri-Sanial, A. Bosio, L. Dilillo, P. Girard, and A. Virazel, "Fast and accurate electro-thermal analysis of three-dimensional power delivery networks," in *Proc. 14th Int. Conf. Thermal, Mech. Multi-Phys. Simulation Experim. Microelectron. Microsyst. (EuroSimE)*, Wrocław, Poland, Apr. 2013, pp. 1–4.
- [8] A. Todri-Sanial, "Electro-thermal characterization of through-silicon vias," in *Proc. 15th Int. Conf. Thermal, Mech. Multi-Phys. Simulation Experim. Microelectron. Microsyst. (EuroSimE)*, Ghent, Belgium, Apr. 2014, pp. 1–6.
- [9] X. Zhao, J. Minz, and S. K. Lim, "Low-power and reliable clock network design for through-silicon via (TSV) based 3D ICs," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 1, no. 2, pp. 247–259, Feb. 2011.
- [10] X. Zhao and S. K. Lim, "Power and slew-aware clock network design for through-silicon-via (TSV) based 3D ICs," in *Proc. IEEE/ACM 15th Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Taipei, Taiwan, Jan. 2010, pp. 175–180.
- [11] S. Bang, K. Han, A. B. Kahng, and V. Srinivas, "Clock clustering and IO optimization for 3D integration," in *Proc. IEEE/ACM Int. Workshop Syst. Level Interconnect Predict. (SLIP)*, San Francisco, CA, USA, Jun. 2015, pp. 1–8.
- [12] H. Park and T. Kim, "Synthesis of TSV fault-tolerant 3-D clock trees," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 34, no. 2, pp. 266–279, Feb. 2015.
- [13] T.-Y. Kim and T. Kim, "Clock network design techniques for 3D ICs," in *Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Seoul, South Korea, Aug. 2011, pp. 1–4.
- [14] M. Mondal et al., "Thermally robust clocking schemes for 3D integrated circuits," in *Proc. IEEE/ACM Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Nice, France, Apr. 2007, pp. 1–6.
- [15] I. Savidis, V. Pavlidis, and E. G. Friedman, "Clock distribution models of 3-D integrated systems," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Rio de Janeiro, Brazil, May 2011, pp. 2225–2228.
- [16] C.-L. Lung, Y.-S. Su, H.-H. Huang, Y. Shi, and S.-C. Chang, "Through-silicon via fault-tolerant clock networks for 3-D ICs," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 32, no. 7, pp. 1100–1109, Jul. 2013.
- [17] V. F. Pavlidis, I. Savidis, and E. G. Friedman, "Clock distribution networks in 3-D integrated systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 12, pp. 2256–2266, Dec. 2011.
- [18] J. S. Pak *et al.*, "PDN impedance modeling and analysis of 3D TSV IC by using proposed P/G TSV array model based on separated P/G TSV and chip-PDN models," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 1, no. 2, pp. 208–219, Feb. 2011.
- [19] H. He, J. J.-Q. Lu, and X. Gu, "Analysis of TSV geometric parameter impact on switching noise in 3D power distribution network," in *Proc. 25th Annu. SEMI Adv. Semiconductor Manuf. Conf. (ASMC)*, San Jose, CA, USA, May 2014, pp. 67–72.
- [20] H. He, J. J.-Q. Lu, Z. Xu, and X. Gu, "TSV density impact on 3D power delivery with high aspect ratio TSVs," in *Proc. 24th Annu. SEMI Adv. Semiconductor Manuf. Conf. (ASMC)*, San Jose, CA, USA, May 2013, pp. 70–74.
- [21] M. B. Healy and S. K. Lim, "A novel TSV topology for many-tier 3D power-delivery networks," in *Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Grenoble, France, Mar. 2011, pp. 1–4.
- [22] N. H. Khan, S. M. Alam, and S. Hassoun, "Power delivery design for 3-D ICs using different through-silicon via (TSV) technologies," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 4, pp. 647–658, Apr. 2011.
- [23] V. F. Pavlidis and G. De Micheli, "Power distribution paths in 3-D ICs," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Tampa, FL, USA, May 2009, pp. 263–268.
- [24] P. Falkenstern, Y. Xie, Y.-W. Chang, and Y. Wang, "Three-dimensional integrated circuits (3D IC) floorplan and power/ground network co-synthesis," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Taipei, Taiwan, Jan. 2010, pp. 169–174.
- [25] A. Todri, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, "A study of tapered 3-D TSVs for power and thermal integrity," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 2, pp. 306–319, Feb. 2013.
- [26] A. Todri-Sanial, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, "Globally constrained locally optimized 3-D power delivery networks," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 10, pp. 2131–2144, Oct. 2014.
- [27] G. Huang, M. S. Bakir, A. Naeemi, and J. D. Meindl, "Power delivery for 3-D chip stacks: Physical modeling and design implication," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 2, no. 5, pp. 852–859, May 2012.
- [28] H. He and J. J.-Q. Lu, "Modeling and analysis of PDN impedance and switching noise in TSV-based 3-D integration," *IEEE Trans. Electron Devices*, vol. 62, no. 4, pp. 1241–1247, Apr. 2015.
- [29] H. He, X. Gu, and J. Q. Lu, "Modeling of switching noise and coupling in multiple chips of 3D TSV-based systems," in *Proc. IEEE 64th Electron. Compon. Technol. Conf. (ECTC)*, San Jose, CA, USA, May 2014, pp. 548–553.
- [30] P. Zhou, K. Sridharan, and S. S. Sapatnekar, "Congestion-aware power grid optimization for 3D circuits using MIM and CMOS decoupling capacitors," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Yokohama, Japan, Jan. 2009, pp. 179–184.
- [31] S. Yao, X. Chen, Y. Wang, Y. Ma, Y. Xie, and H. Yang, "Efficient region-aware P/G TSV planning for 3D ICs," in *Proc. IEEE 15th Int. Symp. Quality Electron. Design (ISQED)*, Santa Clara, CA, USA, Mar. 2014, pp. 171–178.
- [32] A. Todri and M. Marek-Sadowska, "Power delivery for multicore systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 12, pp. 2243–2255, Dec. 2011.
- [33] M. Saint-Laurent and M. Swaminathan, "Impact of power-supply noise on timing in high-frequency microprocessors," *IEEE Trans. Adv. Packag.*, vol. 27, no. 1, pp. 135–144, Feb. 2004.
- [34] J. Singh and S. S. Sapatnekar, "Congestion-aware topology optimization of structured power/ground networks," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 24, no. 5, pp. 683–695, May 2005.
- [35] X. Hu, Y. Xu, Y. Hu, and Y. Xie, "SwimmingLane: A composite approach to mitigate voltage droop effects in 3D power delivery network," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Singapore, Jan. 2014, pp. 550–555.
- [36] J. Xie *et al.*, "Electrical-thermal co-analysis for power delivery networks in 3D system integration," in *Proc. IEEE Int. Conf. 3D Syst. Integr. (3DIC)*, San Francisco, CA, USA, Sep. 2009, pp. 1–4.
- [37] J. Xie and M. Swaminathan, "Simulation of power delivery networks with joule heating effects for 3D integration," in *Proc. IEEE 3rd Electron. Syst.-Integr. Technol. Conf. (ESTC)*, Berlin, Germany, Sep. 2010, pp. 1–6.
- [38] S. L. Wright *et al.*, "Characterization of micro-bump C4 interconnects for Si-carrier SOP applications," in *Proc. IEEE 56th Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, May 2006, pp. 633–640.
- [39] C. Fuchs *et al.*, "Process and RF modelling of TSV last approach for 3D RF interposer," in *Proc. IEEE Int. Technol. Conf. Mater. Adv. Metallization (IITC/MAM)*, Dresden, Germany, May 2011, pp. 1–3.
- [40] A. Todri, M. Marek-Sadowska, and J. Kozhaya, "Power supply noise aware workload assignment for multi-core systems," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, San Jose, CA, USA, Nov. 2008, pp. 330–337.
- [41] R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Köse, and E. G. Friedman, *Power Distribution Networks With On-Chip Decoupling Capacitors.* New York, NY, USA: Springer Science & Business Media, 2011.
- [42] Y. Cheng *et al.*, "A novel method to mitigate TSV electromigration for 3D ICs," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Natal, Brazil, Aug. 2013, pp. 121–126.
- [43] A. Jain, R. E. Jones, R. Chatterjee, S. Pozder, and Z. Huang, "Thermal modeling and design of 3D integrated circuits," in *Proc. IEEE 11th Intersoc. Conf. Thermal Thermomech. Phenomena Electron. Syst. (ITHERM)*, Orlando, FL, USA, May 2008, pp. 1139–1145.
- [44] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, "The modified nodal approach to network analysis," *IEEE Trans. Circuits Syst.*, vol. 22, no. 6, pp. 504–509, Jun. 1975.
- [45] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, "Hierarchical analysis of power distribution networks," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 21, no. 2, pp. 159–168, Feb. 2002.
- [46] M. Sotman, A. Kolodny, M. Popovich, and E. G. Friedman, "On-die decoupling capacitance: Frequency domain analysis of activity radius," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2006, pp. 492–496.
- [47] M. Popovich, M. Sotman, A. Kolodny, and E. Friedman, "Effective radii of on-chip decoupling capacitors," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 7, pp. 894–907, Jul. 2008.
- [48] L. D. Smith, R. E. Anderson, and T. Roy, "Chip-package resonance in core power supply structures for a high power microprocessor," in *Proc. IPACK*, 2001, pp. 1–6.
- [49] (2013). International Technology Roadmap for Semiconductors, accessed on May 12, 2015. [Online]. Available: http://www.itrs.net/LINKS/2013ITRS/Home2013.htm
- [50] Y. Chen, E. Kursun, D. Motschman, C. Johnson, and Y. Xie, "Analysis and mitigation of lateral thermal blockage effect of through-silicon-via in 3D IC designs," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2011, pp. 397–402.



Aida Todri-Sanial (M'03) received the B.S. degree in electrical engineering from Bradley University, Peoria, IL, USA, in 2001, the M.S. degree in electrical engineering from Long Beach State University, Long Beach, CA, USA, in 2003, and the Ph.D. degree in electrical and computer engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2009.

She was an R&D Engineer with the Fermi National Accelerator Laboratory, Batavia, IL, USA. She has held visiting positions with Mentor Graphics, Wilsonville, OR, USA, Cadence Design Systems, San Jose, CA, USA, STMicroelectronics, Geneva, Switzerland, and the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. She is currently a Research Scientist with the French National Center of Scientific Research, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Montpellier, France. She has authored over 100 papers in the VLSI design area.

Dr. Todri-Sanial is a member of the Association for Computing Machinery. She was a recipient of the John Bardeen Fellowship in Engineering at the Fermi National Accelerator Laboratory in 2009. She served as the General Chair of ISVLSI 2015 and participates on the Technical Program Committees of the IEEE Design, Automation and Test in Europe Conference and Exhibition, the IEEE International Symposium on Quality Electronic Design, the IEEE International NEW Circuits and Systems Conference, the IEEE ISVLSI, and the IEEE Great Lakes Symposium on VLSI. She also serves as a Technical Reviewer of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II, the IEEE TRANSACTIONS ON NUCLEAR SCIENCE, and the Institution of Engineering and Technology. She is an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. She is also engaged with European agencies, such as the European Platform of Women Scientists and the European Association for Women in Science, Engineering and Technology.



Yuanqing Cheng (S'11–M'13) received the Ph.D. degree from the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

He spent one year in post-doctoral studies with the Centre National de la Recherche Scientifique, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Montpellier, France. He joined Beihang University, Beijing, China, in 2013, as an Assistant Professor. His current research interests include VLSI design for 3-D integrated circuits and spintronics computing system architecture.

Dr. Cheng is a member of the Association for Computing Machinery.