ADAPTATION OF THE FUZZY K-NEAREST NEIGHBOR
CLASSIFIER FOR MANUFACTURING AUTOMATION
Kenneth W. Tobin a, Shaun S. Gleason, Thomas P. Karnowski
Oak Ridge National Laboratory b, P.O. Box 2008, Oak Ridge, Tennessee 37831-6011
ABSTRACT
The use of supervised pattern recognition technologies for automation in the manufacturing environment requires the
development of systems that are easy to train and use. In general, these systems attempt to emulate an inspection or
measurement function typically performed by a manufacturing engineer or technician. This paper describes a self-optimizing
classification system for automatic decision making in the manufacturing environment. This classification system identifies
and labels unique distributions of product defects denoted as “signatures”. The technique relies on encapsulating human
experience through a teaching method to emulate the human response to various manufacturing situations. This has been
successfully accomplished through the adaptation and extension of a feature-based, fuzzy k-nearest neighbor (k-NN)
classifier that has been implemented in a pair-wise fashion. The classifier works with pair-wise combinations of the user-
defined classes so that a significant reduction in feature space and problem complexity can be achieved. This k-NN
implementation makes extensive use of hold-one-out results and fuzzy ambiguity information to optimize its performance. A
semiconductor manufacturing case study will be presented. The technique uses data collected from in-line optical inspection
tools to interpret and rapidly identify characteristic signatures that are uniquely associated with the manufacturing process.
The system then alerts engineers to probable yield-limiting conditions that require attention.
Keywords: fuzzy classifier, fuzzy ambiguity, k-nearest neighbor, parametric optimization, industrial automation, semiconductor
1. INTRODUCTION
Image processing technologies are being applied more frequently in industrial applications today than ever before. The
availability of high-speed, robust, and relatively low cost computing platforms and dedicated hardware devices makes
computer vision technology an attractive automation approach to many applications. Once a computer vision system begins
generating data for a manufactured product, the available options for data analysis, reduction, and decision making are
generally limited to those tools developed in-house. In this paper we will describe a general classification approach to
analyze this data based on the fuzzy k-nearest neighbor (k-NN) classifier 1 implemented in a pair-wise fashion.
Pattern recognition methods are playing an increasingly important role in automation by reducing large amounts of
data to provide a useful decision for process control and/or product finishing. This system was developed for automating the
analysis and interpretation of wafer defect and electrical-test data collected in a semiconductor-manufacturing
environment. 2,3,4 Although the system has been developed specifically for the automated analysis of semiconductor wafer
product data, the application of the approach to other feature-based data measurements will be described as well.
1.1. Non-parametric Approach
As mentioned above, the fuzzy k-NN classifier system is feature-based. In the manufacturing environment, image data is
collected from in-line inspection tools and defect signatures are segmented for analysis. For the native semiconductor
application, a signature object is defined as a unique pattern of individual optical defects or electrical bin codes that were
generated by an errant process. Features are extracted from the segmented object and are sent to the classifier where a user-
defined class label is assigned to the result. The user-defined class result then indicates the specific tool or process that must
be corrected 5 .
a K.W.T. (Correspondence): Email: tobinkwjr@ornl.gov; WWW: http://www-ismv.ic.ornl.gov; Telephone: 423-574-8521; Fax: 423-574-
6663
b Work Performed for SEMATECH, Austin, Texas, under Contract No. ERD-95-1340 and prepared by OAK RIDGE NATIONAL
LABORATORY, Oak Ridge, Tennessee, 37831-6285, managed by LOCKHEED MARTIN ENERGY RESEARCH CORP. for the U.S.
DEPARTMENT OF ENERGY under contract DE-AC05-96OR22464.
755596581.018.png
For industrial pattern recognition problems, non-parametric classifiers such as the classical k-NN 6 apply well since
information about the shape of the distribution of features in the multi-dimensional space of the classifier is not required. It is
difficult to ascertain a statistical parameterization for the large variety of class types encountered. Also, in an industrial
setting it is often required that the classifier system begin to classify new data with few training examples while providing
reasonable accuracy. Bayesian classifiers 7 and neural networks 8 generally require large sample populations to estimate the
appropriate statistics and are therefore difficult to implement in general for industrial applications. This is primarily due to
the diverse nature of the patterns that arise for different manufacturing processes and facilities, coupled with the length of time required to collect large sample populations. Also, over the period of time required to collect large sample sets,
acceptable process variations can occur that confuse the boundaries between classes. The fuzzy k-NN classifier training set
can readily be maintained over time (e.g., by including and excluding examples based on time and date), can be modified
often, and can operate with relatively few examples for each class.
1.2. Fuzzy Classification
The need to accommodate ambiguity necessitates a fuzzy approach for managing the classifier training and test data. The fuzzy k-NN classifier assigns a membership value to the unlabeled signature that provides the
system with information suitable for estimating the confidence of the decision. The fuzzy membership describes what
fraction of an unlabeled signature resides in each of the defined classes. If the membership is relatively high for two classes
and low for three others, then there is a clear delineation between the first two classes and the other three, but there is
confusion within the first two classes. This data becomes important when ultimately assigning a crisp label to the signature.
For example, the classifier might assign a signature a membership of 0.8 to C1, 0.75 to C2, 0.2 to C3, and 0.01 to C4. For this
situation the signature would likely belong to C1 or C2 , but the ambiguity between the two would be high making a crisp
assignment difficult.
One of the benefits of using a fuzzy system is that an object can be assigned to the category unknown, which in certain situations is far preferable to crisply assigning the example to the wrong category. The
fuzzy k-NN classifier uses the training data and the subsequent fuzzy information derived from it, to dynamically set a
defuzzification threshold that accommodates labeling data as unknown . The defuzzification level is indirectly controlled by
the user specifying how much an incorrect decision is “worth” to the process. In many situations it may be advantageous to
set the defuzzification level such that more defects are classified as unknown than are classified incorrectly, i.e., if confidence
in a decision is low then a decision is not made. This is especially true in instances where classifying a defect incorrectly
could lead to greater economic impact or yield loss than requiring more manual re-inspection of unknown classifications.
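As a minimal illustration of this crisp-versus-unknown decision, the membership example above can be defuzzified in a few lines of Python. This is a sketch only; the function name and the fixed threshold are assumptions of the sketch, not part of the original system.

import numpy as np

def crisp_label(memberships, classes, f=0.5):
    """Assign the class of highest membership, or 'unknown' if the
    highest membership falls below the defuzzification threshold f."""
    i = int(np.argmax(memberships))
    return classes[i] if memberships[i] >= f else "unknown"

# The C1..C4 example from the text: C1 wins, but ambiguity with C2 is high
mu = np.array([0.8, 0.75, 0.2, 0.01])
print(crisp_label(mu, ["C1", "C2", "C3", "C4"]))   # -> 'C1'
print(crisp_label(np.array([0.4, 0.35, 0.2, 0.1]),
                  ["C1", "C2", "C3", "C4"]))       # -> 'unknown'

Section 2.2 describes how the threshold is set automatically from hold-one-out results rather than fixed by hand, as it is here.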
2. THE FUZZY k-NN CLASSIFIER
The form of the fuzzy k-NN classifier adapted to this work is given by the following expression. The classifier assigns a
membership to a test vector according to the relationship, 1
$$
\mu_i(\mathbf{x}) \;=\; \frac{\displaystyle\sum_{j=0}^{k-1} \mu_{ij}\,\Bigl( 1 / \lVert \mathbf{x} - \mathbf{x}_j \rVert^{2/(m-1)} \Bigr)}{\displaystyle\sum_{j=0}^{k-1} \Bigl( 1 / \lVert \mathbf{x} - \mathbf{x}_j \rVert^{2/(m-1)} \Bigr)}\,,
\qquad (1)
$$
where μ i ( x ) is the membership of the test vector x to class i , || x - x j || is the L-norm distance between the test vector x and the j -th nearest-neighbor vector x j , and m is a real number greater than 1.0 that sets the “strength” of the fuzzy distance function. For the purposes of description in this paper, m will be defined as the fuzzy strength parameter. The value μ ij is the membership of the j -th neighbor to the i -th class and is determined a-priori from the training data. 1
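For concreteness, a minimal Python/NumPy sketch of Eq. (1) follows. It assumes a small labeled training set with an a-priori membership matrix train_mu; the function and variable names are illustrative, not taken from the original implementation.

import numpy as np

def fuzzy_knn_membership(x, train_X, train_mu, k=5, m=2.0, eps=1e-12):
    """Fuzzy k-NN membership of test vector x to each class, per Eq. (1).

    train_X  : (n, Q) array of training vectors
    train_mu : (n, C) array of a-priori memberships mu_ij of each
               training vector to each of the C classes
    m        : fuzzy strength parameter (must exceed 1.0)
    """
    d = np.linalg.norm(train_X - x, axis=1)         # L2 distances to all vectors
    nn = np.argsort(d)[:k]                          # the k nearest neighbors
    w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + eps)    # inverse-distance weights
    return (train_mu[nn].T @ w) / w.sum()           # weighted, normalized sum

The eps term is a small numerical guard added by this sketch so that the weights stay finite when x coincides with a training vector.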
The fuzzy k-NN classifier as defined by Eq. (1) is applied to the training data to determine an optimal subset of
features and an optimal value of the fuzzy strength parameter, m . The feature selection problem can be defined as one of
determining the values of a weight vector, w , whose members have values of 1 corresponding to “on” features and 0
corresponding to “off” features. The L-norm metric in Eq. (1) can be expressed as,
$$
d(\mathbf{x}, \mathbf{x}_i) \;=\; \lVert \mathbf{x} - \mathbf{x}_i \rVert \;\equiv\; \left( \sum_{q=0}^{Q-1} w_q \, \bigl| x_q - x_{qi} \bigr|^{p} \right)^{1/p},
\qquad (2)
$$
where the index q selects the q -th feature of vector x , and p defines the L-norm distance metric used, e.g., p = 1 is the Manhattan distance, p = 2 is the Euclidean distance, etc.
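The weighted L-norm of Eq. (2) is equally compact in code. In the sketch below (illustrative names, NumPy assumed), the binary weight vector w simply removes “off” features from the distance.

import numpy as np

def weighted_lnorm(x, xi, w, p=2):
    """Weighted L-norm distance of Eq. (2).
    w : binary (0/1) feature-selection weight vector
    p : 1 = Manhattan distance, 2 = Euclidean distance, etc."""
    return float((w * np.abs(x - xi) ** p).sum() ** (1.0 / p))

# Feature 1 is switched "off" and contributes nothing to the distance
x, xi = np.array([1.0, 5.0, 2.0]), np.array([1.5, -4.0, 2.5])
print(weighted_lnorm(x, xi, w=np.array([1, 0, 1]), p=2))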
The fuzzy k-NN classifier is implemented in a pair-wise fashion for all defined classes in the training set so that a
reduction in the dimensionality of the feature-space can be achieved. The motivation for achieving a significant
dimensionality reduction in feature space is to improve classifier performance by discarding “noisy” features, i.e., those
features with little or no discriminatory power. By using only features intrinsic to the classification problem, a significant
performance improvement can be realized. From the point of view of a Bayesian classifier, performance cannot be degraded
through the addition of superfluous features. For real-world applications, though, the assumptions associated with a Bayes
classifier are rarely valid 9 . This effect can be explained for distance-based classifiers like k-NN by considering the L-norm
distance terms in Eq. (1), which, for the point of demonstration, can also be expressed as,
$$
d(\mathbf{x}, \mathbf{x}_i) \;=\; \lVert \mathbf{x} - \mathbf{x}_i \rVert \;=\; \Bigl[\, \underbrace{\sum (x - x_i)^p}_{\mathrm{discriminating}} \;+\; \underbrace{\sum (y - y_i)^p}_{\mathrm{noise}} \,\Bigr]^{1/p}.
\qquad (3)
$$
It has been assumed in this equation that the class features can be segmented into two distinct groups: those that contribute to
the discrimination of classes, and those that do not. The discriminating terms in Eq. (3) will provide a statistically significant
contribution to a distance measure such as that in Eq. (1), while the noise terms will add a positive bias to the summation in a random fashion, i.e., they will tend to wash out the discriminatory power of those features that best represent the class. By
reducing the dimension of the feature space, it is intended that the noise term in Eq. (3) be mitigated. This discrimination is
achieved by determining the optimal elements of the weight vector, w , shown in Eq. (2), for each pair-wise comparison of the
training classes. The details of this feature selection process are given in Ref. [4].
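The wash-out effect is easy to reproduce numerically. The following sketch (synthetic data and illustrative names, not results from the paper) measures hold-one-out 1-NN accuracy on two well-separated classes, before and after appending non-discriminating noise features:

import numpy as np

rng = np.random.default_rng(0)

def hoo_1nn_accuracy(X, y):
    """Hold-one-out accuracy of a plain (crisp) 1-NN classifier."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # hold the test vector out
        correct += y[np.argmin(d)] == y[i]
    return correct / len(X)

n = 40
f0 = np.concatenate([rng.normal(0, 0.5, n), rng.normal(3, 0.5, n)])
y = np.array([0] * n + [1] * n)
X = f0[:, None]                              # one discriminating feature
print("1 feature  :", hoo_1nn_accuracy(X, y))

X_noisy = np.hstack([X, rng.normal(0, 3, (2 * n, 20))])
print("21 features:", hoo_1nn_accuracy(X_noisy, y))

With the single discriminating feature the two classes separate almost perfectly; the added noise dimensions randomly bias the distances and the accuracy drops, which is exactly the degradation the pair-wise feature selection is designed to avoid.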
2.1. Pair-wise Implementation
The pair-wise implementation of the fuzzy k-NN classifier has been designed to maximize the reduction in the number of
features required to correctly classify a test vector within a given pair of classes. A pair-wise comparison is performed for all
unique class pairs. There are N(N-1)/2 unique pair-wise combinations for N classes. The pair-wise fuzzy membership (a
matrix of values),
μ ij , for a comparison of the test vector x , to the ij -th class, can be expressed by the following,
$$
\boldsymbol{\mu}(\mathbf{x})_{ij} \;=\;
\begin{bmatrix}
\binom{\mu_0}{\mu_1} & \binom{\mu_0}{\mu_2} & \binom{\mu_0}{\mu_3} & \binom{\mu_0}{\mu_4} \\[4pt]
 & \binom{\mu_1}{\mu_2} & \binom{\mu_1}{\mu_3} & \binom{\mu_1}{\mu_4} \\[4pt]
 & & \binom{\mu_2}{\mu_3} & \binom{\mu_2}{\mu_4} \\[4pt]
 & & & \binom{\mu_3}{\mu_4}
\end{bmatrix}
\qquad \forall\, i = 0, \ldots, N-1,\;\; j = i+1, \ldots, N-1,
\qquad (4)
$$
where each element of μ ij is determined by applying Eq. (1) to the class pair ij (the matrix is shown here for an example with N = 5 classes). The fuzzy membership of the test vector for each membership pair is then combined to give a composite membership of x to each class in the set, M i . This operation is performed by summing and normalizing the membership pairs as follows,
$$
M_i(\mathbf{x}) \;=\; \frac{1}{N-1}
\begin{bmatrix}
\sum_j \mu_{0j}^{0}\,, & \forall\, j \neq 0 \\[2pt]
\sum_j \mu_{1j}^{0}\,, & \forall\, j \neq 1 \\[2pt]
\vdots & \\[2pt]
\sum_j \mu_{(N-1)j}^{0}\,, & \forall\, j \neq N-1
\end{bmatrix}
\qquad \forall\, i = 0, \ldots, N-1,
\qquad (5)
$$

where the superscript 0 on μ ij selects the first membership element of μ ij (note that this corresponds to the membership to the current class, i , under consideration). The range for each membership value, M i , is [0,1], but the sum of M i over all
classes, i , does not necessarily equal 1. It should also be noted that a crisp classification is assigned to the test vector, x , after
defuzzification of the membership values given in Eq. (5). The method of defuzzification is described in the next section.
Although significant feature dimension reduction can be achieved by use of the pair-wise method described above,
an increase in computational expense is incurred. Whereas the fuzzy k-NN algorithm of Eq. (1) must be applied N times for an
N-class problem, the pair-wise fuzzy k-NN implementation requires N(N-1) applications to achieve the resulting
memberships given by Eq. (5). This increased computational burden has proven to be worth the benefit of improved
classifier performance as demonstrated in Ref. [4].
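To sketch Eqs. (4) and (5) in code: the routine below (illustrative, reusing the fuzzy_knn_membership sketch given after Eq. (1)) runs Eq. (1) once per unique class pair, restricted to that pair's training vectors and "on" features, then sums and normalizes the pair-wise memberships into the composite M i. Here weights and strengths are assumed to be dicts keyed by class pair; that is a convention of the sketch, not the paper.

import numpy as np
from itertools import combinations

def pairwise_composite_membership(x, train_X, train_y, n_classes,
                                  weights, strengths, k=5):
    """Composite membership M_i of Eq. (5) from pair-wise runs of Eq. (1).

    weights[i, j]   : binary feature-selection vector w_ij for pair ij
    strengths[i, j] : fuzzy strength m_ij for pair ij
    """
    M = np.zeros(n_classes)
    for i, j in combinations(range(n_classes), 2):
        mask = (train_y == i) | (train_y == j)   # this pair's training data
        w = weights[i, j]
        # Crisp a-priori memberships: column 0 <-> class i, column 1 <-> class j
        mu_p = np.stack([train_y[mask] == i, train_y[mask] == j],
                        axis=1).astype(float)
        mu = fuzzy_knn_membership(x * w, train_X[mask] * w, mu_p,
                                  k=k, m=strengths[i, j])
        M[i] += mu[0]        # the superscript-0 element for class i in Eq. (5)
        M[j] += mu[1]        # and its counterpart for class j
    return M / (n_classes - 1)

Each class appears in N-1 pairs, hence the 1/(N-1) normalization; as the text notes, the resulting M i lie in [0,1] but need not sum to 1.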
2.2. Defuzzification
For this application of the fuzzy k-NN classifier, extensive use is made of the hold-one-out (HOO) method 10 of estimating
system performance based on the training data. A training set composed of representative examples of all the user-defined
classes is used to optimize the classifier parameters, i.e., to determine the feature selection weight vector, w ij , and the fuzzy
strength parameter, m ij for each class pair, ij . Once determined, these values define the “trained” system that is put to work
classifying previously unseen data. In a fuzzy system, the classifier returns a membership value of each of the training
vectors to each of the defined classes. To calculate the HOO performance based on this training data, the fuzzy values are
made crisp by setting a defuzzification threshold. The HOO feature vector is crisply assigned to the class with the highest
membership, or, if the highest class membership is below the defuzzification threshold, the feature vector is assigned to the
unknown category.
The approach we have taken to determining the defuzzification value is to let the user define the relative “worth” of
assigning a feature vector to the category unknown versus a crisply defined, but incorrect class. This ultimately motivates the
system to favor unknown assignments over potentially wrong assignments if the economics of the decision warrants such
actions. This is implemented by having the user define what fraction of an unknown feature vector should count towards a
correct answer. This fraction is used in the determination of the system performance and can be expressed by the following
equation,
$$
P(f, \lambda) \;\equiv\; \frac{\sum X_{\mathrm{correct}}(f) \;+\; \lambda \cdot \sum X_{\mathrm{unknown}}(f)}{\sum X_{\mathrm{total}}(f)}\,,
\qquad (6)
$$
where P is the estimate of total system performance, f is the unknown defuzzification threshold, and λ is the value (i.e., “worth”) assigned to an unknown classification. X in this equation simply indicates a correct , unknown , or incorrect class label. The defuzzification threshold is determined by adjusting f to find the maximum performance, P , for all possible defuzzification values. A simulation of the typical performance curve is shown in Figure 1, where it is observed that as more credit is given for unknown classifications, the HOO performance increases by having unknown classifications add to the overall performance (as noted by the hump in the curve). As the value of λ (specified by the user) is increased, the system has a higher propensity to classify vectors as unknown , causing the peak in the curve, P(f max ), to reach a higher maximum value. As λ moves towards zero, the unknown classifications add little or nothing to the performance and the curve falls off as the defuzzification level increases.

Figure 1 – Graphical representation of performance, P, versus defuzzification threshold, f.
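A minimal sketch of the threshold search implied by Eq. (6) follows, assuming the HOO composite memberships M and true labels are already computed; the threshold grid and names are assumptions of the sketch.

import numpy as np

def best_defuzzification_threshold(M, y_true, lam,
                                   grid=np.linspace(0.0, 1.0, 101)):
    """Scan thresholds f and return (f_max, P(f_max)) per Eq. (6).

    M      : (n, C) HOO composite memberships for the n training vectors
    y_true : (n,) true class labels
    lam    : credit ("worth") lambda given to an unknown classification
    """
    top = M.max(axis=1)                    # highest class membership per vector
    pred = M.argmax(axis=1)
    best_f, best_P = 0.0, -1.0
    for f in grid:
        unknown = top < f                  # below threshold -> labeled unknown
        correct = (~unknown) & (pred == y_true)
        P = (correct.sum() + lam * unknown.sum()) / len(y_true)
        if P > best_P:
            best_f, best_P = f, P
    return best_f, best_P

With lam = 0 the curve can only fall as f grows; with lam closer to 1 the unknown credit produces the hump of Figure 1 and an interior maximum.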
2.3. Optimization
Optimization of the classifier algorithm is achieved by iterating through the process of selecting a locally optimal feature
selection vector, w ij , and fuzzy strength parameter, m ij , for each class pair, ij . Figure 2 schematically shows the optimization
process.
The process begins by ranking the features that correspond to the training data input by the user and by specifying a
constant initial fuzzy strength value for all class pairs. The ranking process is deterministic for the feature set and is not
repeated as the algorithm loops through the training process. After feature-ranking, an estimate of classifier HOO
performance is generated for the ranked features beginning with the best feature, the best two features, best three features,
etc., until a list of performance versus weight vectors is derived for every pair-wise class comparison. From this list the best
weight vector, w ij , is determined for each class pair, ij . Once the best features have been selected, the HOO performance is
generated again as the system increments the fuzzy strength parameter, m ij , for each class pair, ij . The optimally performing
values for m ij are then selected to represent each class pair. This completes the first iteration through the training procedure.
The overall system HOO performance at iteration n is then compared to the performance at iteration n-1. If the change in performance is greater than a prescribed limit, ε, then another iteration is initiated through the algorithm. If the change is less than ε, then the classifier is considered trained and the values of w ij and m ij are maintained.
A trained classifier therefore consists of the training vectors, the optimal weight vectors, and the optimal fuzzy strength parameters. If new vectors are later added to the training set, the weight vectors and fuzzy strength parameters need to be updated.

Figure 2 – Schematic representation of the pair-wise classifier optimization process. The flow is: rank features for each class pair, ij , and set n = 0; generate performance versus weight vector, w ij , for each class pair and select the optimal weight; generate performance versus fuzzy strength, m ij , for each class pair and select the optimal fuzzy strength; calculate total classifier performance, P n , for iteration n ; if (P n - P n-1 )/P n < ε the classifier is trained ( w ij , m ij , P n ), otherwise increment n and repeat.
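In outline, the loop of Figure 2 can be sketched as below. This is an illustrative reconstruction, not the original implementation: it reuses the fuzzy_knn_membership sketch from Section 2, ranks features with a simple Fisher-style separation score, sweeps a small grid of fuzzy strengths, and scores everything with per-pair HOO accuracy; the paper's exact ranking method, strength schedule, and total-performance measure may differ.

import numpy as np
from itertools import combinations

def pair_hoo(X, y, i, j, w, m, k=3):
    """HOO accuracy of the Eq. (1) classifier on the i/j class pair alone."""
    mask = (y == i) | (y == j)
    Xp, yp = X[mask] * w, y[mask]
    mu_p = np.stack([yp == i, yp == j], axis=1).astype(float)
    hits = 0
    for t in range(len(Xp)):
        keep = np.arange(len(Xp)) != t                  # hold vector t out
        mu = fuzzy_knn_membership(Xp[t], Xp[keep], mu_p[keep], k=k, m=m)
        hits += (i if mu[0] >= mu[1] else j) == yp[t]
    return hits / len(Xp)

def train_pairwise_classifier(X, y, n_classes, k=3, eps=0.01,
                              m_grid=(1.5, 2.0, 3.0, 4.0)):
    """Iterate feature selection and fuzzy strength per Figure 2 (sketch)."""
    Q = X.shape[1]
    pairs = list(combinations(range(n_classes), 2))
    W = {p: np.ones(Q) for p in pairs}       # all features "on" initially
    M = {p: 2.0 for p in pairs}              # constant initial fuzzy strength
    rank = {}                                # deterministic ranking, done once
    for i, j in pairs:
        a, b = X[y == i], X[y == j]
        s = np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)
        rank[i, j] = np.argsort(s)[::-1]
    P_prev, n = 0.0, 0
    while True:
        for i, j in pairs:
            # Performance versus weight vector: best 1, best 2, ... features
            scored = []
            for q in range(1, Q + 1):
                w = np.zeros(Q)
                w[rank[i, j][:q]] = 1.0
                scored.append((pair_hoo(X, y, i, j, w, M[i, j], k), -q))
            best_q = -max(scored)[1]         # ties favor fewer features
            W[i, j] = np.zeros(Q)
            W[i, j][rank[i, j][:best_q]] = 1.0
            # Performance versus fuzzy strength for the selected features
            M[i, j] = max(m_grid,
                          key=lambda m: pair_hoo(X, y, i, j, W[i, j], m, k))
        P_n = np.mean([pair_hoo(X, y, i, j, W[i, j], M[i, j], k)
                       for i, j in pairs])
        if n > 0 and (P_n - P_prev) / max(P_n, 1e-12) < eps:
            return W, M, P_n                 # trained: w_ij, m_ij, P_n
        P_prev, n = P_n, n + 1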
3. RESULTS
The results of applying the fuzzy k-NN classifier
system will be described for two separate testing
environments. In Section 3.1, results are
described for a system that collected data in a
semiconductor-manufacturing site over a period of
several months. In Section 3.2, results are
presented for application of the classifier system
to several documented classification data sets.
3.1. Semiconductor Wafermap Analysis
As mentioned in the introduction, the fuzzy k-NN
classifier system was developed for application to
automating wafermap analysis in the
semiconductor industry. The system has been
tested at several manufacturing sites on wafermap
data collected from in-line optical microscopy and
laser-scattering inspection tools. The classifier system, as designed for semiconductor wafermap analysis, performs a
segmentation of the various wafermap signatures that may occur during manufacturing. These signatures are grouped into
four elemental categories depending on their distribution characteristics and clustering morphology. The description of these
categories and of the architecture of the overall spatial signature analysis system is outside the scope of this paper but can be
found in Ref. [2-5]. The classifier operates on each of these elemental categories individually, i.e., there are four separate
classifiers within the system along with four sets of training data. The approach is hierarchical and provides the analysis
system with an ability to describe a broad variety of signature events using one of several sets of feature descriptors, i.e., there
are different features defined for each elemental category. The system responds by assigning a user-defined label (or a
classification of unknown ) for each signature analyzed. The results presented herein are for application of the wafermap
analysis system to one manufacturing environment for field testing over a period of several months.
The system software was installed on-site at a manufacturer and run as a background process over a period of three
months. During this time, the system analyzed all wafermaps associated with a particular product. The statistics for the
training and test data sets are shown in Table 1 for two separate test cases. Over the time period the system analyzed and
classified 926 signatures with a correct classification rate of 79.1% as shown for Test Set No. 1 in the table. The wafermap
analyzer also builds a composite map for each lot (group of wafers) prior to segmentation and classification so that subtle
defect distributions can be detected and classified. The results of analyzing the composite wafermaps are shown as Test Set
No. 2 with a correct classification rate of 85.1%. These performance estimates are based on a comparison of the classifier
results to results generated manually by human experts who work in the field.