ADAPTATION OF THE FUZZY K-NEAREST NEIGHBOR
CLASSIFIER FOR MANUFACTURING AUTOMATION
Kenneth W. Tobin a, Shaun S. Gleason, Thomas P. Karnowski
Oak Ridge National Laboratory b, P.O. Box 2008, Oak Ridge, Tennessee 37831-6011
ABSTRACT
The use of supervised pattern recognition technologies for automation in the manufacturing environment requires the
development of systems that are easy to train and use. In general, these systems attempt to emulate an inspection or
measurement function typically performed by a manufacturing engineer or technician. This paper describes a self-optimizing
classification system for automatic decision making in the manufacturing environment. This classification system identifies
and labels unique distributions of product defects denoted as “signatures”. The technique relies on encapsulating human
experience through a teaching method to emulate the human response to various manufacturing situations. This has been
successfully accomplished through the adaptation and extension of a feature-based, fuzzy k-nearest neighbor (k-NN)
classifier that has been implemented in a pair-wise fashion. The classifier works with pair-wise combinations of the user-
defined classes so that a significant reduction in feature space and problem complexity can be achieved. This k-NN
implementation makes extensive use of hold-one-out results and fuzzy ambiguity information to optimize its performance. A
semiconductor manufacturing case study will be presented. The technique uses data collected from in-line optical inspection
tools to interpret and rapidly identify characteristic signatures that are uniquely associated with the manufacturing process.
The system then alerts engineers to probable yield-limiting conditions that require attention.
Keywords: fuzzy classifier, fuzzy ambiguity, k-nearest neighbor, parametric optimization, industrial automation, semiconductor
1. INTRODUCTION
Image processing technologies are being applied more frequently in industrial applications today than ever before. The
availability of high-speed, robust, and relatively low cost computing platforms and dedicated hardware devices makes
computer vision technology an attractive automation approach to many applications. Once a computer vision system begins
generating data for a manufactured product, the available options for data analysis, reduction, and decision making are
generally limited to those tools developed in-house. In this paper we will describe a general classification approach to
analyze this data based on the fuzzy k-nearest neighbor (k-NN) classifier 1 implemented in a pair-wise fashion.
Pattern recognition methods are playing an increasingly important role in automation by reducing large amounts of
data to provide a useful decision for process control and/or product finishing. This system was developed for automating the
analysis and interpretation of wafer defect and electrical-test data collected in a semiconductor-manufacturing
environment. 2,3,4 Although the system has been developed specifically for the automated analysis of semiconductor wafer
product data, the application of the approach to other feature-based data measurements will be described as well.
1.1. Non-parametric Approach
As mentioned above, the fuzzy k-NN classifier system is feature-based. In the manufacturing environment, image data is
collected from in-line inspection tools and defect signatures are segmented for analysis. For the native semiconductor
application, a signature object is defined as a unique pattern of individual optical defects or electrical bin codes that were
generated by an errant process. Features are extracted from the segmented object and are sent to the classifier where a user-
defined class label is assigned to the result. The user-defined class result then indicates the specific tool or process that must
be corrected 5 .
a K.W.T. (Correspondence): Email: tobinkwjr@ornl.gov; WWW: http://www-ismv.ic.ornl.gov; Telephone: 423-574-8521; Fax: 423-574-
6663
b Work Performed for SEMATECH, Austin, Texas, under Contract No. ERD-95-1340 and prepared by OAK RIDGE NATIONAL
LABORATORY, Oak Ridge, Tennessee, 37831-6285, managed by LOCKHEED MARTIN ENERGY RESEARCH CORP. for the U.S.
DEPARTMENT OF ENERGY under contract DE-AC05-96OR22464.
755596581.018.png
For industrial pattern recognition problems, non-parametric classifiers such as the classical k-NN 6 apply well since
information about the shape of the distribution of features in the multi-dimensional space of the classifier is not required. It is
difficult to ascertain a statistical parameterization for the large variety of class types encountered. Also, in an industrial
setting it is often required that the classifier system begin to classify new data with few training examples while providing
reasonable accuracy. Bayesian classifiers 7 and neural networks 8 generally require large sample populations to estimate the
appropriate statistics and are therefore difficult to implement in general for industrial applications. This is primarily due to
the diverse nature of the patterns that arise for different manufacturing processes and facilities, coupled with the length of time required to collect large sample populations. Also, over the period of time required to collect large sample sets,
acceptable process variations can occur that confuse the boundaries between classes. The fuzzy k-NN classifier training set
can readily be maintained over time (e.g., by including and excluding examples based on time and date), can be modified
often, and can operate with relatively few examples for each class.
1.2. Fuzzy Classification
The need to accommodate ambiguity necessitates a fuzzy approach for managing the classifier training and test data. The fuzzy k-NN classifier assigns a membership value to the unlabeled signature that provides the
system with information suitable for estimating the confidence of the decision. The fuzzy membership describes what
fraction of an unlabeled signature resides in each of the defined classes. If the membership is relatively high for two classes
and low for three others, then there is a clear delineation between the first two classes and the other three, but there is
confusion within the first two classes. This data becomes important when ultimately assigning a crisp label to the signature.
For example, the classifier might assign a signature a membership of 0.8 to C1, 0.75 to C2, 0.2 to C3, and 0.01 to C4. For this
situation the signature would likely belong to C1 or C2 , but the ambiguity between the two would be high making a crisp
assignment difficult.
One of the benefits of using a fuzzy system is that an object can be assigned to the category unknown, which in certain situations is far preferable to crisply assigning the example to the wrong category. The
fuzzy k-NN classifier uses the training data and the subsequent fuzzy information derived from it, to dynamically set a
defuzzification threshold that accommodates labeling data as unknown . The defuzzification level is indirectly controlled by
the user specifying how much an incorrect decision is “worth” to the process. In many situations it may be advantageous to
set the defuzzification level such that more defects are classified as unknown than are classified incorrectly, i.e., if confidence
in a decision is low then a decision is not made. This is especially true in instances where classifying a defect incorrectly
could lead to greater economic impact or yield loss than requiring more manual re-inspection of unknown classifications.
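As a minimal illustration of this crisp-versus-unknown decision, the membership example above can be defuzzified in a few lines of Python. This is a sketch only; the function name and the fixed threshold are assumptions of the sketch, not part of the original system.

import numpy as np

def crisp_label(memberships, classes, f=0.5):
    """Assign the class of highest membership, or 'unknown' if the
    highest membership falls below the defuzzification threshold f."""
    i = int(np.argmax(memberships))
    return classes[i] if memberships[i] >= f else "unknown"

# The C1..C4 example from the text: C1 wins, but ambiguity with C2 is high
mu = np.array([0.8, 0.75, 0.2, 0.01])
print(crisp_label(mu, ["C1", "C2", "C3", "C4"]))   # -> 'C1'
print(crisp_label(np.array([0.4, 0.35, 0.2, 0.1]),
                  ["C1", "C2", "C3", "C4"]))       # -> 'unknown'

Section 2.2 describes how the threshold is set automatically from hold-one-out results rather than fixed by hand, as it is here.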
2. THE FUZZY k-NN CLASSIFIER
The form of the fuzzy k-NN classifier adapted to this work is given by the following expression. The classifier assigns a
membership to a test vector according to the relationship, 1
$$
\mu_i(\mathbf{x}) \;=\; \frac{\displaystyle\sum_{j=0}^{k-1} \mu_{ij}\,\Bigl( 1 / \lVert \mathbf{x} - \mathbf{x}_j \rVert^{2/(m-1)} \Bigr)}{\displaystyle\sum_{j=0}^{k-1} \Bigl( 1 / \lVert \mathbf{x} - \mathbf{x}_j \rVert^{2/(m-1)} \Bigr)}\,,
\qquad (1)
$$
where μ i ( x ) is the membership of the test vector x to class i , || x - x j || is the L-norm distance between the test vector x and the j -th nearest-neighbor vector x j , and m is a real number greater than 1.0 that sets the “strength” of the fuzzy distance function. For the purposes of description in this paper, m will be defined as the fuzzy strength parameter. The value μ ij is the membership of the j -th neighbor to the i -th class and is determined a-priori from the training data. 1
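For concreteness, a minimal Python/NumPy sketch of Eq. (1) follows. It assumes a small labeled training set with an a-priori membership matrix train_mu; the function and variable names are illustrative, not taken from the original implementation.

import numpy as np

def fuzzy_knn_membership(x, train_X, train_mu, k=5, m=2.0, eps=1e-12):
    """Fuzzy k-NN membership of test vector x to each class, per Eq. (1).

    train_X  : (n, Q) array of training vectors
    train_mu : (n, C) array of a-priori memberships mu_ij of each
               training vector to each of the C classes
    m        : fuzzy strength parameter (must exceed 1.0)
    """
    d = np.linalg.norm(train_X - x, axis=1)         # L2 distances to all vectors
    nn = np.argsort(d)[:k]                          # the k nearest neighbors
    w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + eps)    # inverse-distance weights
    return (train_mu[nn].T @ w) / w.sum()           # weighted, normalized sum

The eps term is a small numerical guard added by this sketch so that the weights stay finite when x coincides with a training vector.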
The fuzzy k-NN classifier as defined by Eq. (1) is applied to the training data to determine an optimal subset of
features and an optimal value of the fuzzy strength parameter, m . The feature selection problem can be defined as one of
determining the values of a weight vector, w , whose members have values of 1 corresponding to “on” features and 0
corresponding to “off” features. The L-norm metric in Eq. (1) can be expressed as,
$$
d(\mathbf{x}, \mathbf{x}_i) \;=\; \lVert \mathbf{x} - \mathbf{x}_i \rVert \;\equiv\; \left( \sum_{q=0}^{Q-1} w_q \, \bigl| x_q - x_{qi} \bigr|^{p} \right)^{1/p},
\qquad (2)
$$
where the index q selects the q -th feature of vector x , and p defines the L-norm distance metric used, e.g., p = 1 is the Manhattan distance, p = 2 is the Euclidean distance, etc.
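The weighted L-norm of Eq. (2) is equally compact in code. In the sketch below (illustrative names, NumPy assumed), the binary weight vector w simply removes “off” features from the distance.

import numpy as np

def weighted_lnorm(x, xi, w, p=2):
    """Weighted L-norm distance of Eq. (2).
    w : binary (0/1) feature-selection weight vector
    p : 1 = Manhattan distance, 2 = Euclidean distance, etc."""
    return float((w * np.abs(x - xi) ** p).sum() ** (1.0 / p))

# Feature 1 is switched "off" and contributes nothing to the distance
x, xi = np.array([1.0, 5.0, 2.0]), np.array([1.5, -4.0, 2.5])
print(weighted_lnorm(x, xi, w=np.array([1, 0, 1]), p=2))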
The fuzzy k-NN classifier is implemented in a pair-wise fashion for all defined classes in the training set so that a
reduction in the dimensionality of the feature-space can be achieved. The motivation for achieving a significant
dimensionality reduction in feature space is to improve classifier performance by discarding “noisy” features, i.e., those
features with little or no discriminatory power. By using only features intrinsic to the classification problem, a significant
performance improvement can be realized. From the point of view of a Bayesian classifier, performance cannot be degraded
through the addition of superfluous features. For real-world applications, though, the assumptions associated with a Bayes
classifier are rarely valid 9 . This effect can be explained for distance-based classifiers like k-NN by considering the L-norm
distance terms in Eq. (1), which, for the point of demonstration, can also be expressed as,
$$
d(\mathbf{x}, \mathbf{x}_i) \;=\; \lVert \mathbf{x} - \mathbf{x}_i \rVert \;=\; \Bigl[\, \underbrace{\sum (x - x_i)^p}_{\mathrm{discriminating}} \;+\; \underbrace{\sum (y - y_i)^p}_{\mathrm{noise}} \,\Bigr]^{1/p}.
\qquad (3)
$$
It has been assumed in this equation that the class features can be segmented into two distinct groups: those that contribute to
the discrimination of classes, and those that do not. The discriminating terms in Eq. (3) will provide a statistically significant
contribution to a distance measure such as that in Eq. (1), while the noise terms will add a positive bias to the summation in a random fashion, i.e., they will tend to wash out the discriminatory power of those features that best represent the class. By
reducing the dimension of the feature space, it is intended that the noise term in Eq. (3) be mitigated. This discrimination is
achieved by determining the optimal elements of the weight vector, w , shown in Eq. (2), for each pair-wise comparison of the
training classes. The details of this feature selection process are given in Ref. [4].
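The wash-out effect is easy to reproduce numerically. The following sketch (synthetic data and illustrative names, not results from the paper) measures hold-one-out 1-NN accuracy on two well-separated classes, before and after appending non-discriminating noise features:

import numpy as np

rng = np.random.default_rng(0)

def hoo_1nn_accuracy(X, y):
    """Hold-one-out accuracy of a plain (crisp) 1-NN classifier."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # hold the test vector out
        correct += y[np.argmin(d)] == y[i]
    return correct / len(X)

n = 40
f0 = np.concatenate([rng.normal(0, 0.5, n), rng.normal(3, 0.5, n)])
y = np.array([0] * n + [1] * n)
X = f0[:, None]                              # one discriminating feature
print("1 feature  :", hoo_1nn_accuracy(X, y))

X_noisy = np.hstack([X, rng.normal(0, 3, (2 * n, 20))])
print("21 features:", hoo_1nn_accuracy(X_noisy, y))

With the single discriminating feature the two classes separate almost perfectly; the added noise dimensions randomly bias the distances and the accuracy drops, which is exactly the degradation the pair-wise feature selection is designed to avoid.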
2.1. Pair-wise Implementation
The pair-wise implementation of the fuzzy k-NN classifier has been designed to maximize the reduction in the number of
features required to correctly classify a test vector within a given pair of classes. A pair-wise comparison is performed for all
unique class pairs. There are N(N-1)/2 unique pair-wise combinations for N classes. The pair-wise fuzzy membership (a
matrix of values),
μ ij , for a comparison of the test vector x , to the ij -th class, can be expressed by the following,
$$
\boldsymbol{\mu}(\mathbf{x})_{ij} \;=\;
\begin{bmatrix}
\binom{\mu_0}{\mu_1} & \binom{\mu_0}{\mu_2} & \binom{\mu_0}{\mu_3} & \binom{\mu_0}{\mu_4} \\[4pt]
 & \binom{\mu_1}{\mu_2} & \binom{\mu_1}{\mu_3} & \binom{\mu_1}{\mu_4} \\[4pt]
 & & \binom{\mu_2}{\mu_3} & \binom{\mu_2}{\mu_4} \\[4pt]
 & & & \binom{\mu_3}{\mu_4}
\end{bmatrix}
\qquad \forall\, i = 0, \ldots, N-1,\;\; j = i+1, \ldots, N-1,
\qquad (4)
$$
where each element of μ ij is determined by applying Eq. (1) to the class pair ij (the matrix is shown here for an example with N = 5 classes). The fuzzy membership of the test vector for each membership pair is then combined to give a composite membership of x to each class in the set, M i . This operation is performed by summing and normalizing the membership pairs as follows,
$$
M_i(\mathbf{x}) \;=\; \frac{1}{N-1}
\begin{bmatrix}
\sum_j \mu_{0j}^{0}\,, & \forall\, j \neq 0 \\[2pt]
\sum_j \mu_{1j}^{0}\,, & \forall\, j \neq 1 \\[2pt]
\vdots & \\[2pt]
\sum_j \mu_{(N-1)j}^{0}\,, & \forall\, j \neq N-1
\end{bmatrix}
\qquad \forall\, i = 0, \ldots, N-1,
\qquad (5)
$$

where the superscript 0 on μ ij selects the first membership element of μ ij (note that this corresponds to the membership to the current class, i , under consideration). The range for each membership value, M i , is [0,1], but the sum of M i over all
classes, i , does not necessarily equal 1. It should also be noted that a crisp classification is assigned to the test vector, x , after
defuzzification of the membership values given in Eq. (5). The method of defuzzification is described in the next section.
Although significant feature dimension reduction can be achieved by use of the pair-wise method described above,
an increase in computational expense is incurred. Whereas the fuzzy k-NN algorithm of Eq. (1) must be applied N times for an
N-class problem, the pair-wise fuzzy k-NN implementation requires N(N-1) applications to achieve the resulting
memberships given by Eq. (5). This increased computational burden has proven to be worth the benefit of improved
classifier performance as demonstrated in Ref. [4].
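To sketch Eqs. (4) and (5) in code: the routine below (illustrative, reusing the fuzzy_knn_membership sketch given after Eq. (1)) runs Eq. (1) once per unique class pair, restricted to that pair's training vectors and "on" features, then sums and normalizes the pair-wise memberships into the composite M i. Here weights and strengths are assumed to be dicts keyed by class pair; that is a convention of the sketch, not the paper.

import numpy as np
from itertools import combinations

def pairwise_composite_membership(x, train_X, train_y, n_classes,
                                  weights, strengths, k=5):
    """Composite membership M_i of Eq. (5) from pair-wise runs of Eq. (1).

    weights[i, j]   : binary feature-selection vector w_ij for pair ij
    strengths[i, j] : fuzzy strength m_ij for pair ij
    """
    M = np.zeros(n_classes)
    for i, j in combinations(range(n_classes), 2):
        mask = (train_y == i) | (train_y == j)   # this pair's training data
        w = weights[i, j]
        # Crisp a-priori memberships: column 0 <-> class i, column 1 <-> class j
        mu_p = np.stack([train_y[mask] == i, train_y[mask] == j],
                        axis=1).astype(float)
        mu = fuzzy_knn_membership(x * w, train_X[mask] * w, mu_p,
                                  k=k, m=strengths[i, j])
        M[i] += mu[0]        # the superscript-0 element for class i in Eq. (5)
        M[j] += mu[1]        # and its counterpart for class j
    return M / (n_classes - 1)

Each class appears in N-1 pairs, hence the 1/(N-1) normalization; as the text notes, the resulting M i lie in [0,1] but need not sum to 1.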
2.2. Defuzzification
For this application of the fuzzy k-NN classifier, extensive use is made of the hold-one-out (HOO) method 10 of estimating
system performance based on the training data. A training set composed of representative examples of all the user-defined
classes is used to optimize the classifier parameters, i.e., to determine the feature selection weight vector, w ij , and the fuzzy
strength parameter, m ij for each class pair, ij . Once determined, these values define the “trained” system that is put to work
classifying previously unseen data. In a fuzzy system, the classifier returns a membership value of each of the training
vectors to each of the defined classes. To calculate the HOO performance based on this training data, the fuzzy values are
made crisp by setting a defuzzification threshold. The HOO feature vector is crisply assigned to the class with the highest
membership, or, if the highest class membership is below the defuzzification threshold, the feature vector is assigned to the
unknown category.
The approach we have taken to determining the defuzzification value is to let the user define the relative “worth” of
assigning a feature vector to the category unknown versus a crisply defined, but incorrect class. This ultimately motivates the
system to favor unknown assignments over potentially wrong assignments if the economics of the decision warrants such
actions. This is implemented by having the user define what fraction of an unknown feature vector should count towards a
correct answer. This fraction is used in the determination of the system performance and can be expressed by the following
equation,
$$
P(f, \lambda) \;\equiv\; \frac{\sum X_{\mathrm{correct}}(f) \;+\; \lambda \cdot \sum X_{\mathrm{unknown}}(f)}{\sum X_{\mathrm{total}}(f)}\,,
\qquad (6)
$$
where P is the estimate of total system performance, f is the unknown defuzzification threshold, and λ is the value (i.e., “worth”) assigned to an unknown classification. X in this equation simply indicates a correct , unknown , or incorrect class label. The defuzzification threshold is determined by adjusting f to find the maximum performance, P , for all possible defuzzification values. A simulation of the typical performance curve is shown in Figure 1, where it is observed that as more credit is given for unknown classifications, the HOO performance increases by having unknown classifications add to the overall performance (as noted by the hump in the curve). As the value of λ (specified by the user) is increased, the system has a higher propensity to classify vectors as unknown , causing the peak in the curve, P(f max ), to reach a higher maximum value. As λ moves towards zero, the unknown classifications add little or nothing to the performance and the curve falls off as the defuzzification level increases.

Figure 1 – Graphical representation of performance, P, versus defuzzification threshold, f.
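A minimal sketch of the threshold search implied by Eq. (6) follows, assuming the HOO composite memberships M and true labels are already computed; the threshold grid and names are assumptions of the sketch.

import numpy as np

def best_defuzzification_threshold(M, y_true, lam,
                                   grid=np.linspace(0.0, 1.0, 101)):
    """Scan thresholds f and return (f_max, P(f_max)) per Eq. (6).

    M      : (n, C) HOO composite memberships for the n training vectors
    y_true : (n,) true class labels
    lam    : credit ("worth") lambda given to an unknown classification
    """
    top = M.max(axis=1)                    # highest class membership per vector
    pred = M.argmax(axis=1)
    best_f, best_P = 0.0, -1.0
    for f in grid:
        unknown = top < f                  # below threshold -> labeled unknown
        correct = (~unknown) & (pred == y_true)
        P = (correct.sum() + lam * unknown.sum()) / len(y_true)
        if P > best_P:
            best_f, best_P = f, P
    return best_f, best_P

With lam = 0 the curve can only fall as f grows; with lam closer to 1 the unknown credit produces the hump of Figure 1 and an interior maximum.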
2.3. Optimization
Optimization of the classifier algorithm is achieved by iterating through the process of selecting a locally optimal feature
selection vector, w ij , and fuzzy strength parameter, m ij , for each class pair, ij . Figure 2 schematically shows the optimization
process.
The process begins by ranking the features that correspond to the training data input by the user and by specifying a
constant initial fuzzy strength value for all class pairs. The ranking process is deterministic for the feature set and is not
repeated as the algorithm loops through the training process. After feature-ranking, an estimate of classifier HOO
performance is generated for the ranked features beginning with the best feature, the best two features, best three features,
etc., until a list of performance versus weight vectors is derived for every pair-wise class comparison. From this list the best
weight vector, w ij , is determined for each class pair, ij . Once the best features have been selected, the HOO performance is
generated again as the system increments the fuzzy strength parameter, m ij , for each class pair, ij . The optimally performing
values for m ij are then selected to represent each class pair. This completes the first iteration through the training procedure.
The overall system HOO performance at iteration n is then compared to the performance at iteration n-1. If the change in performance is greater than a prescribed limit, ε, then another iteration is initiated through the algorithm. If the change is less than ε, then the classifier is considered trained and the values of w ij and m ij are maintained.
A trained classifier therefore consists of the training vectors, the optimal weight vectors, and the optimal fuzzy strength parameters. If new vectors are later added to the training set, the weight vectors and fuzzy strength parameters need to be updated.

Figure 2 – Schematic representation of the pair-wise classifier optimization process. The flow is: rank features for each class pair, ij , and set n = 0; generate performance versus weight vector, w ij , for each class pair and select the optimal weight; generate performance versus fuzzy strength, m ij , for each class pair and select the optimal fuzzy strength; calculate total classifier performance, P n , for iteration n ; if (P n - P n-1 )/P n < ε the classifier is trained ( w ij , m ij , P n ), otherwise increment n and repeat.
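In outline, the loop of Figure 2 can be sketched as below. This is an illustrative reconstruction, not the original implementation: it reuses the fuzzy_knn_membership sketch from Section 2, ranks features with a simple Fisher-style separation score, sweeps a small grid of fuzzy strengths, and scores everything with per-pair HOO accuracy; the paper's exact ranking method, strength schedule, and total-performance measure may differ.

import numpy as np
from itertools import combinations

def pair_hoo(X, y, i, j, w, m, k=3):
    """HOO accuracy of the Eq. (1) classifier on the i/j class pair alone."""
    mask = (y == i) | (y == j)
    Xp, yp = X[mask] * w, y[mask]
    mu_p = np.stack([yp == i, yp == j], axis=1).astype(float)
    hits = 0
    for t in range(len(Xp)):
        keep = np.arange(len(Xp)) != t                  # hold vector t out
        mu = fuzzy_knn_membership(Xp[t], Xp[keep], mu_p[keep], k=k, m=m)
        hits += (i if mu[0] >= mu[1] else j) == yp[t]
    return hits / len(Xp)

def train_pairwise_classifier(X, y, n_classes, k=3, eps=0.01,
                              m_grid=(1.5, 2.0, 3.0, 4.0)):
    """Iterate feature selection and fuzzy strength per Figure 2 (sketch)."""
    Q = X.shape[1]
    pairs = list(combinations(range(n_classes), 2))
    W = {p: np.ones(Q) for p in pairs}       # all features "on" initially
    M = {p: 2.0 for p in pairs}              # constant initial fuzzy strength
    rank = {}                                # deterministic ranking, done once
    for i, j in pairs:
        a, b = X[y == i], X[y == j]
        s = np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)
        rank[i, j] = np.argsort(s)[::-1]
    P_prev, n = 0.0, 0
    while True:
        for i, j in pairs:
            # Performance versus weight vector: best 1, best 2, ... features
            scored = []
            for q in range(1, Q + 1):
                w = np.zeros(Q)
                w[rank[i, j][:q]] = 1.0
                scored.append((pair_hoo(X, y, i, j, w, M[i, j], k), -q))
            best_q = -max(scored)[1]         # ties favor fewer features
            W[i, j] = np.zeros(Q)
            W[i, j][rank[i, j][:best_q]] = 1.0
            # Performance versus fuzzy strength for the selected features
            M[i, j] = max(m_grid,
                          key=lambda m: pair_hoo(X, y, i, j, W[i, j], m, k))
        P_n = np.mean([pair_hoo(X, y, i, j, W[i, j], M[i, j], k)
                       for i, j in pairs])
        if n > 0 and (P_n - P_prev) / max(P_n, 1e-12) < eps:
            return W, M, P_n                 # trained: w_ij, m_ij, P_n
        P_prev, n = P_n, n + 1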
3. RESULTS
The results of applying the fuzzy k-NN classifier
system will be described for two separate testing
environments. In Section 3.1, results are
described for a system that collected data in a
semiconductor-manufacturing site over a period of
several months. In Section 3.2, results are
presented for application of the classifier system
to several documented classification data sets.
3.1. Semiconductor Wafermap Analysis
As mentioned in the introduction, the fuzzy k-NN
classifier system was developed for application to
automating wafermap analysis in the
semiconductor industry. The system has been
tested at several manufacturing sites on wafermap
data collected from in-line optical microscopy and
laser-scattering inspection tools. The classifier system, as designed for semiconductor wafermap analysis, performs a
segmentation of the various wafermap signatures that may occur during manufacturing. These signatures are grouped into
four elemental categories depending on their distribution characteristics and clustering morphology. The description of these
categories and of the architecture of the overall spatial signature analysis system is outside the scope of this paper but can be
found in Ref. [2-5]. The classifier operates on each of these elemental categories individually, i.e., there are four separate
classifiers within the system along with four sets of training data. The approach is hierarchical and provides the analysis
system with an ability to describe a broad variety of signature events using one of several sets of feature descriptors, i.e., there
are different features defined for each elemental category. The system responds by assigning a user-defined label (or a
classification of unknown ) for each signature analyzed. The results presented herein are for application of the wafermap
analysis system to one manufacturing environment for field testing over a period of several months.
The system software was installed on-site at a manufacturer and run as a background process over a period of three
months. During this time, the system analyzed all wafermaps associated with a particular product. The statistics for the
training and test data sets are shown in Table 1 for two separate test cases. Over the time period the system analyzed and
classified 926 signatures with a correct classification rate of 79.1% as shown for Test Set No. 1 in the table. The wafermap
analyzer also builds a composite map for each lot (group of wafers) prior to segmentation and classification so that subtle
defect distributions can be detected and classified. The results of analyzing the composite wafermaps are shown as Test Set
No. 2 with a correct classification rate of 85.1%. These performance estimates are based on a comparison of the classifier
results to results generated manually by human experts who work in the field.