Learning-Based Cell-Aware Defect Diagnosis of Customer Returns
Safa Mhamdi, Patrick Girard, Arnaud Virazel, Alberto Bosio, Aymen Ladhar

HAL Id: lirmm-03035669
https://hal-lirmm.ccsd.cnrs.fr/lirmm-03035669
Submitted on 2 Dec 2020


S. Mhamdi  P. Girard  A. Virazel
LIRMM, Univ. of Montpellier / CNRS
Montpellier, France
lastname@lirmm.fr

A. Bosio
INL, École Centrale de Lyon
France
alberto.bosio@ee-lyon.fr

A. Ladhar
STMicroelectronics
Crolles, France
aymen.ladhar@st.com

Abstract—In this paper, we propose a new framework for cell-aware defect diagnosis of customer returns based on supervised learning. The proposed method comprehensively deals with static and dynamic defects that may occur in real circuits. A Naive Bayes classifier is used to precisely identify defect candidates. Results obtained on benchmark circuits, and comparison with a commercial cell-aware diagnosis tool, demonstrate the efficiency of the proposed approach in terms of accuracy and resolution.

Keywords—Diagnosis, Customer Returns, Machine Learning

### I. INTRODUCTION

The ultimate goal for high-quality products, such as those used in the automotive market, is zero customer returns. A customer return is a circuit that passed a comprehensive test flow after manufacturing but failed on the customer’s side during its lifetime [1]. The main root causes of a customer return are often test escapes during manufacturing test or latent defect mechanisms. The first step when a customer return occurs is therefore to reproduce the failure mechanism with any original individual test and appropriate test conditions. In case of a test escape, this task will abort, and efforts have to be made to find new test patterns and test conditions (i.e. temperature and voltage) that reveal the failure. In case of a latent defect, the task will often succeed, and a diagnosis program made of several routines is used to identify, step by step, the failing part and, finally, the suspected defects [2]. Each routine corresponds to the application of a diagnosis algorithm at a given hierarchy level. SoC-level diagnosis is the first routine, used to identify the core(s) in the SoC that can explain the failure [3]. Core-level diagnosis (inter-cell diagnosis) is then used to identify the possible failing cells within the core(s) [4]. Cell-Aware (CA) diagnosis is finally used to pinpoint the possible defect candidates within the failing cell(s) [5].

Diagnosis is usually followed by Physical Failure Analysis (PFA), a time-consuming and destructive process that physically exposes the defect in order to characterize the failure mechanism. Due to the high cost and destructive nature of PFA, diagnosis resolution and accuracy are critical. Unfortunately, diagnostic resolution is typically far from ideal due to SoC complexity. As a result, many efforts have recently been dedicated to improving resolution with machine learning techniques, primarily through the derivation of characteristics that enable correct candidates (candidates that correctly represent defect locations) to be distinguished from incorrect ones [6]–[8]. Though efficient, these techniques all address volume diagnosis for yield improvement, which is a different problem from fault diagnosis of customer returns [2]. Indeed, during volume diagnosis, numerous data collected during manufacturing test and subsequent diagnosis phases are available, e.g. hundreds of similar failed chips with candidates correctly labeled (good, bad) in a previous stage. These data can therefore be used for failure diagnosis of a new failed chip. Conversely, during fault diagnosis of a customer return, only one failed chip is investigated, with no information about the defective behavior of other similar chips used in the same conditions (application, environment, workload). For this reason, learning-guided approaches used for volume diagnosis cannot be reused for fault diagnosis of customer returns.

In [2], we proposed a learning-guided approach for CA diagnosis of mission mode failures in customer returns. Several supervised learning algorithms were considered in that work, with various levels of efficiency. Results obtained on benchmark circuits showed the feasibility and accuracy of the approach. However, only static defects modeled by stuck-at faults were assumed in that preliminary work. More recently, we extended it and proposed a new CA diagnosis method [9]. We assumed more sophisticated (i.e. dynamic) defects and used a Bayesian classification method for predicting the nature (likelihood to be a good candidate) of each new data instance (defect) to be evaluated. We used cell-aware delay test sequences generated by a cell-aware ATPG assuming a Launch-On-Capture (LOC) testing scheme. Again, the efficiency of the learning-based method for diagnosis of CA dynamic defects was demonstrated through comparison with a commercial tool.

In this paper, we propose a comprehensive method able to deal with all types of defects, i.e. static and dynamic, that may occur in customer returns. Though it may look like a simple combination of the two previous methods, such a comprehensive method raised new problems and required a new framework with specific rules to achieve the same high level of efficiency in terms of diagnosis accuracy and resolution. The proposed method relies on a trained Gaussian Naive Bayes (NB) model to predict good defect candidates. Results are compared with those of a commercial CA diagnosis tool to show the superiority of our approach.

### II. PROPOSED CELL-AWARE DIAGNOSIS FRAMEWORK

The proposed approach is based on supervised learning: a known set of input data and known responses (labeled data) is used as training data to train a model, and a classifier based on this model then makes predictions (inferences) for the response to new data. Fig. 1 depicts the two main steps of the supervised learning process used for CA dynamic defect diagnosis. As indicated in the introduction, we use a Bayesian classification method for predicting the nature (likelihood to be a good candidate) of each new data instance. This choice comes from the results obtained in our previous study after experimenting with several learning algorithms and observing their prediction accuracies [2]. The first main step consists in generating a Naive Bayes (NB) model and training it with the training dataset. The second main step consists in constructing the NB classifier, using a Gaussian distribution to model the likelihood probability functions, and using this classifier to make a probabilistic prediction (or inference) when a new data instance has to be evaluated. These two main steps are detailed in [9].
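The two steps can be illustrated with a minimal Gaussian Naive Bayes sketch in plain Python. It is not the implementation used in this work (which is detailed in [9]); the function names, the `var_floor` smoothing term, and the toy data are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def train_gaussian_nb(X, y, var_floor=1e-3):
    """Step 1: estimate per-class priors, feature means and variances."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # var_floor (an assumption of this sketch) avoids zero variances.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, features):
    """Step 2: posterior probability of each class for one new instance."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for x, m, var in zip(features, means, variances):
            # Log of the Gaussian likelihood of feature value x.
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        log_scores[label] = log_p
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Toy data: each row is a pass/fail (0/1) signature over three patterns.
X_train = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
y_train = ["good", "good", "bad", "bad"]
nb_model = train_gaussian_nb(X_train, y_train)
print(predict_proba(nb_model, [1, 0, 1]))
```

The per-class posterior returned by `predict_proba` plays the role of the likelihood of being a good candidate mentioned above.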

![Figure 1: Generic view of the diagnosis flow](image)

Training data are generated for each type of cell existing in the Circuit Under Diagnosis (CUD) during an off-line characterization process done only once for a given cell library. They are extracted from cell-aware views provided by a commercial CAD tool that contain all characterization results for a given cell type. These characterization results are provided in the form of a fault dictionary containing, for each defect within a cell, the cell input patterns detecting (or not) this defect.
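As a rough illustration of how such a fault dictionary could be turned into labeled training data, the sketch below encodes each defect as a 0/1 vector over the cell input patterns. The cell, defect names (`D1`, `D2`, `D3`), pattern strings, and the choice of using the defect name as the label are all hypothetical; the actual cell-aware view format of the commercial tool is not described here.

```python
def build_training_set(fault_dictionary, patterns):
    """Convert {defect: set of detecting patterns} into (rows, labels).

    Each row holds one feature per cell input pattern:
    1 if the pattern detects the defect, 0 otherwise.
    """
    rows, labels = [], []
    for defect, detecting in sorted(fault_dictionary.items()):
        rows.append([1 if p in detecting else 0 for p in patterns])
        labels.append(defect)
    return rows, labels


# Hypothetical static characterization of a 2-input cell.
cell_patterns = ["00", "01", "10", "11"]
cell_dictionary = {
    "D1": {"11"},          # detected by a single pattern
    "D2": {"01", "10"},    # detected by two patterns
    "D3": set(),           # not detected by any static pattern
}

rows, labels = build_training_set(cell_dictionary, cell_patterns)
for label, row in zip(labels, rows):
    print(label, row)
```

For dynamic defects, the same encoding could be applied with pattern pairs (e.g. `"01->11"`) as features, again as an assumption of this sketch.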

![Figure 2: Generation flows of static and dynamic instance tables](image)

New data are composed of various instances, each associated with one suspected cell in the CUD (customer return) and representing a feature vector that characterizes the real behavior of the cell during test application. From each feature vector, we can further extract one or more defect candidates that have to be classified as good or bad candidates, each with a corresponding probability of being the root cause of the failure. New data are generated by post-processing so-called instance tables, which describe the behavior (pass / fail) of each suspected cell in the presence of a real intra-cell defect (in one of the suspected cells) when a test pattern is applied to the cell. In order to deal with both static and dynamic defects, we need to generate and use static as well as dynamic instance tables to generate a new data instance for each suspected cell. This process is illustrated in Fig. 2.
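One possible way to merge a static and a dynamic instance table into a single feature vector is sketched below. The table layout (a mapping from cell input pattern to `"pass"` or `"fail"`), the pattern notation, and the handling of patterns never applied to the cell are assumptions of this sketch, not the specific rules of the proposed framework.

```python
def build_instance(static_table, dynamic_table, static_patterns, dynamic_patterns):
    """Concatenate static and dynamic observations into one feature vector.

    Each table maps a cell input pattern to "pass" or "fail".
    Patterns that were never applied to the cell are encoded as 0,
    like passing patterns (a simplifying assumption).
    """
    def encode(table, patterns):
        return [1 if table.get(p) == "fail" else 0 for p in patterns]

    return encode(static_table, static_patterns) + encode(dynamic_table, dynamic_patterns)


# Hypothetical observations for one suspected 2-input cell.
static_patterns = ["00", "01", "10", "11"]
dynamic_patterns = ["01->11", "10->11"]
static_table = {"01": "pass", "11": "fail"}
dynamic_table = {"01->11": "fail"}

instance = build_instance(static_table, dynamic_table, static_patterns, dynamic_patterns)
print(instance)
```

Keeping a fixed pattern order for every cell type ensures that a new data instance lines up, feature by feature, with the training data of that cell type.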

### III. EXPERIMENTAL RESULTS

We implemented our framework as a Python program. We conducted experiments on ITC’99 circuits synthesized as full-scan designs using a 28nm FDSOI technology from ST. A commercial CA ATPG tool was used to generate static and dynamic CA test sequences targeting maximum fault coverage for each circuit. For each circuit and the corresponding test set, we simulated the behavior of the tester by performing a defect injection campaign (about 2000 random injections per circuit) into a number of randomly selected cells and collecting test information to build the tester data log. For the defect injection campaign, we considered each transistor of the selected cells and targeted all possible defects affecting that transistor.
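The injection plan described above can be sketched as follows: cells are drawn at random, then every transistor of each selected cell is paired with every defect type. The netlist layout, cell and transistor names, and defect type labels are hypothetical placeholders, and the fixed seed is only there to make the sketch reproducible.

```python
import random


def plan_injections(cells, defect_types, n_cells, seed=0):
    """Return (cell, transistor, defect) triples for a campaign.

    cells: {cell instance name: list of transistor names} (hypothetical layout).
    defect_types: defect labels applied to every transistor (hypothetical).
    """
    rng = random.Random(seed)
    selected = rng.sample(sorted(cells), n_cells)
    return [
        (cell, transistor, defect)
        for cell in selected
        for transistor in cells[cell]
        for defect in defect_types
    ]


# Hypothetical circuit with three cell instances of four transistors each.
netlist = {
    "U1": ["M1", "M2", "M3", "M4"],
    "U2": ["M1", "M2", "M3", "M4"],
    "U3": ["M1", "M2", "M3", "M4"],
}
plan = plan_injections(netlist, ["defect_a", "defect_b"], n_cells=2)
print(len(plan))
```

Each triple would then be simulated separately against the test set to produce one entry of the tester data log.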

#### Table I. OVERALL DIAGNOSIS RESULTS

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Accuracy Proposed</th>
<th>Accuracy Com. Tool</th>
<th>Resolution Proposed</th>
<th>Resolution Com. Tool</th>
</tr>
</thead>
<tbody>
<tr>
<td>b15</td>
<td>100%</td>
<td>100%</td>
<td>2.047</td>
<td>3.79</td>
</tr>
<tr>
<td>b17</td>
<td>100%</td>
<td>100%</td>
<td>7.20</td>
<td>11.146</td>
</tr>
<tr>
<td>b18</td>
<td>97.70%</td>
<td>98.90%</td>
<td>4.129</td>
<td>5.733</td>
</tr>
<tr>
<td>b19</td>
<td>98.90%</td>
<td>98.90%</td>
<td>1.818</td>
<td>2.857</td>
</tr>
<tr>
<td>b20</td>
<td>99.11%</td>
<td>99.11%</td>
<td>2.299</td>
<td>2.947</td>
</tr>
<tr>
<td>b22</td>
<td>99.12%</td>
<td>99.12%</td>
<td>3.746</td>
<td>4.823</td>
</tr>
</tbody>
</table>

Table I summarizes the results obtained on a set of ITC’99 benchmark circuits. The first part of the table is about accuracy and gives, for each circuit, the percentage of cases in which the injected defect was reported in the list of suspects provided by the proposed CA diagnosis and the commercial CA diagnosis tool respectively. As can be seen, for 4 out of 6 circuits, the commercial tool is unable to achieve 100% (achieved with our technique). The second part of the table is about resolution and gives, for each circuit and considering all injection campaigns, the average number of suspects reported by the proposed method and the commercial tool respectively. As can be seen, the resolution achieved with our method is always better. So, overall, these results confirm the superiority of our approach in terms of accuracy and resolution.
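The two metrics follow directly from their definitions above: accuracy is the percentage of injections whose defect appears in the reported suspect list, and resolution is the average number of reported suspects. The sketch below computes both; the campaign data and defect names are made up for illustration and do not correspond to the results in Table I.

```python
def accuracy_and_resolution(campaign):
    """campaign: list of (injected_defect, suspect_list), one per injection.

    Returns (accuracy in percent, average number of suspects).
    """
    hits = sum(1 for injected, suspects in campaign if injected in suspects)
    accuracy = 100.0 * hits / len(campaign)
    resolution = sum(len(suspects) for _, suspects in campaign) / len(campaign)
    return accuracy, resolution


# Hypothetical campaign of three injections.
campaign = [
    ("D1", ["D1", "D4"]),        # injected defect reported among 2 suspects
    ("D2", ["D2"]),              # injected defect reported alone
    ("D3", ["D5", "D6", "D7"]),  # injected defect missed
]
accuracy, resolution = accuracy_and_resolution(campaign)
print(f"accuracy = {accuracy:.2f}%, resolution = {resolution:.3f}")
```

A lower resolution value is better, since fewer suspects have to be examined during PFA.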

### ACKNOWLEDGEMENTS

This work has been funded by the French National Research Agency (ANR) under the framework of the ANR-17-CE24-0014-01 EDITSoC (Electrical Diagnosis for IoT SoCs in automotive) project.

### REFERENCES


