COMPUTING

Practical Neural Networks (2)
Part 2: Back Propagation Neural Nets
By Chris MacLeod and Grant Maxwell
Back Propagation (BP) Networks are the quintessential Neural Nets.
Probably eighty percent of nets used today are of this type. Actually
though, Back Propagation is the learning or training method, rather than
the network structure itself.
The network operates in the same way as the type we've looked at in Part 1 — you apply the inputs and calculate an output exactly as described. What the Back Propagation part does is allow you to change the weights, so that the network learns and gives you the output you want. The weights the network starts off with are simply set to small random numbers (say between –1 and +1).
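As a reminder of Part 1, here is a minimal sketch (mine, not the article's code) of that starting point: weights set to small random numbers between –1 and +1, and an output calculated with a forward pass through a sigmoid neuron.

import math
import random

def make_weights(n_inputs):
    # Initial, untrained weights: small random numbers between -1 and +1.
    return [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]

def neuron_output(weights, inputs):
    # Forward pass for one neuron: weighted sum of the inputs, squashed by the sigmoid.
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-activation))

print(neuron_output(make_weights(4), [0, 1, 1, 0]))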
What is BP good for?

Back Propagation is excellent for simple pattern recognition and mapping tasks. It learns by example.

To give a typical application, we can train a BP network for character recognition. All you need to do is give it examples of the characters and the output you would like the network to produce for each one, and it will learn from them, see Figure 1.
The algorithm works by calculating an error — the amount by which the output differs from an ideal value (chosen by you, and called the Target) — and then changing the weights to minimise this error. Once the network is trained, it will correctly give the output when a character is applied, even if the character is distorted, imperfect or noisy. In this case, because the Target has two bits, we need two output neurons (one for each bit). Each input and its associated Target is called a Training Pair.
Figure 1. Use of a BP network for image recognition.
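As a sketch (not from the article) of how such Training Pairs might be held in code, each pair is simply an input pattern (a flattened pixel grid) together with its two-bit target. The 2 x 2 patterns and target codes below are invented purely for illustration.

training_pairs = [
    # (inputs: flattened pixel grid, target: two output bits)
    ([0, 1,
      1, 0], [0, 1]),
    ([1, 0,
      0, 1], [1, 0]),
    ([1, 1,
      0, 0], [1, 1]),
]

for inputs, target in training_pairs:
    print("inputs:", inputs, "-> target:", target)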
What does a BP network look like?

Figure 2 shows a BP network being used for Pattern Recognition. A common question is: how big should the network be? We can see from Figure 2 that the number of inputs is fixed by the pattern we are trying to process. In the case of four pixels, there must be four inputs.

Likewise, the number of output neurons is fixed by the number of patterns we want to recognise. If we had nine patterns we could either use three output neurons and binary code their outputs, or we could use nine and assign them so that, for example, when pattern 2 appears, output neuron 2 gives a '1' (and the rest are zero).

This only really leaves the number of neurons in the hidden layer to be decided on. Fortunately, networks are quite flexible about this parameter and will operate over a wide range of hidden layer sizes; although, the more patterns the network needs to remember, the more neurons you will need. In a network designed to recognise all 26 letters of the alphabet (26 output neurons) on a 5×7 grid (35 inputs), the network will function with anywhere between about 6 and 22 hidden neurons. If you have too few, then the network hasn't got enough weights to store all the information in; if there are too many, it becomes inefficient and prone to a problem called local minima (discussed later).

Figure 2. A network wired for recognising patterns.
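A small calculation (mine, not the article's) shows how the hidden layer size sets the number of weights available to store the patterns, for the 35-input, 26-output letter-recognition example above. Bias weights, if your network uses them, are not counted here.

n_inputs, n_outputs = 35, 26

for n_hidden in (6, 14, 22):
    # Fully connected: input-to-hidden weights plus hidden-to-output weights.
    n_weights = n_inputs * n_hidden + n_hidden * n_outputs
    print(n_hidden, "hidden neurons ->", n_weights, "weights")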
The BP algorithm

Now let's have a look at the training algorithm itself. To do this, we'll refer to three neurons labelled A, B and C in Figure 3. The weight that we'll train is the one between neuron A and neuron B, labelled W AB in the diagram. The diagram also shows another weight — W AC — and we'll return to that one in a moment.

Figure 3. Three neurons which are part of a larger network.

The algorithm works like this:

1. First, apply the inputs to the network and calculate its outputs as described last month in Part 1 (this is the forward pass).

2. Next, calculate the output error for neuron B. The error is basically: What you want – What you get. What you want is your target and what you get is your output. Mathematically:

Error B = Output B * (1 – Output B) * (Target B – Output B)

The term Output B * (1 – Output B) is present because of the effect of the sigmoid function — if we were just using a binary threshold, we would omit it.

3. Change the weight. Let W+ AB be the new (trained) weight and W AB be the original (untrained) weight:

W+ AB = W AB + η (Error B × Output A)

Notice that we use the error of the second neuron (B), but the output of the feeding neuron (A). The constant η (called the learning rate, and nominally equal to one) is put in to speed up or slow down the learning if required.

4. Change all the other weights in the output layer in this manner.

5. To change the weights of the hidden layers you need to calculate an error for the hidden neurons. We do this by Back Propagating the errors of the output neurons back. For example, suppose we want to calculate the error for neuron A. We use the errors calculated for all the output neurons attached to it, in this case B and C, and propagate them back — hence the name of the algorithm:

Error A = Output A * (1 – Output A) * (Error B * W AB + Error C * W AC)

Again, the Output A * (1 – Output A) term serves the purpose noted in step 2.

6. Having obtained the errors for the hidden layer neurons, we now proceed back to stage 3 and change their weights.
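The rules in steps 2, 3 and 5 can be captured in a few lines of code. The sketch below is in Python and is mine, not the article's; the learning rate η is taken as 1, and the numerical values at the end are invented purely for illustration.

eta = 1.0

def output_error(output, target):
    # Step 2: error of an output neuron (the out * (1 - out) term comes from the sigmoid).
    return output * (1.0 - output) * (target - output)

def trained_weight(old_weight, error_of_fed_neuron, output_of_feeding_neuron):
    # Step 3: W+ = W + eta * (Error of the fed neuron x Output of the feeding neuron).
    return old_weight + eta * error_of_fed_neuron * output_of_feeding_neuron

def hidden_error(output_a, error_b, w_ab, error_c, w_ac):
    # Step 5: back-propagate the output errors through the connecting weights.
    return output_a * (1.0 - output_a) * (error_b * w_ab + error_c * w_ac)

out_A, out_B, out_C = 0.6, 0.8, 0.3     # outputs from a forward pass (invented)
target_B, target_C = 0.0, 1.0           # targets for the two output neurons (invented)
W_AB, W_AC = 0.5, -0.2                  # weights from A to B and from A to C (invented)

err_B = output_error(out_B, target_B)                     # step 2
err_C = output_error(out_C, target_C)
W_AB = trained_weight(W_AB, err_B, out_A)                 # step 3
W_AC = trained_weight(W_AC, err_C, out_A)                 # step 4: same rule, other weight
err_A = hidden_error(out_A, err_B, W_AB, err_C, W_AC)     # step 5
print(err_B, err_C, W_AB, W_AC, err_A)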
Now this might be a little confusing, so let's show a full example, Figure 4.

Using BP to train a network

Now that we've seen the algorithm in detail, let's look at how to use it. One of the most common mistakes made when programming a BP network for the first time is the order in which you apply the patterns to the network. Let us take an example: suppose you wanted to teach the network to recognise the first four letters of the alphabet, placed on a 5×7 grid.

The correct way to train the network is to apply the first letter, and then change all the weights of the network once (i.e., do all the calculations in Figure 4, once only). Then apply the second pattern and do the same again, then the third and finally the fourth. Once you've gone through this cycle once, start all over again with pattern 1. Figure 5 shows the idea.

Figure 5. How a network learns four patterns.
We stop the network when the total error is low enough — that is, when the sum of all the errors (the positive error from every neuron, summed over every pattern) is below a threshold. This threshold is usually set by the user to be some arbitrary low number, like 0.1. In the example above the total error of the network would be:

(Errors of all neurons in pattern 1) + (Pattern 2 errors) + (Pattern 3 errors) + (Pattern 4 errors)
Before doing this, it is necessary to make all the errors positive — we can do this by squaring them. The learning process is shown in the algorithm below:
1. Apply first pattern, perform forward pass, perform reverse pass.
2. Apply second pattern, perform forward pass, perform reverse pass.
3. Apply third pattern, perform forward pass, perform reverse pass.
4. Apply fourth pattern, perform forward pass, perform reverse pass.
5. Test: is the total error small enough? If yes, then go to 7.
6. Go to 1.
7. Stop, network has trained.
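As a concrete illustration of this cycle, here is a short sketch in Python (mine, not the article's code). forward_pass(inputs) and reverse_pass(inputs, targets, outputs) stand for your own routines; reverse_pass is assumed to change all the weights once and return the output errors.

def train_until_done(training_pairs, forward_pass, reverse_pass, threshold=0.1):
    while True:
        total_error = 0.0
        for inputs, targets in training_pairs:               # every pattern, once per cycle
            outputs = forward_pass(inputs)                   # forward pass
            errors = reverse_pass(inputs, targets, outputs)  # reverse pass, once only
            total_error += sum(e * e for e in errors)        # squared, so every term is positive
        if total_error < threshold:
            break                                            # total error low enough: stop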
Figure 4. All the calculations for a complete reverse pass in a network. The example network has two inputs (λ and Ω), three hidden neurons (A, B and C) and two output neurons (α and β).

1. Calculate errors of output neurons:
Error α = out α (1 – out α) (Target α – out α)
Error β = out β (1 – out β) (Target β – out β)

2. Change output layer weights:
W+ Aα = W Aα + η Error α out A
W+ Aβ = W Aβ + η Error β out A
W+ Bα = W Bα + η Error α out B
W+ Bβ = W Bβ + η Error β out B
W+ Cα = W Cα + η Error α out C
W+ Cβ = W Cβ + η Error β out C

3. Calculate (back-propagate) hidden layer errors:
Error A = out A (1 – out A) (Error α W Aα + Error β W Aβ)
Error B = out B (1 – out B) (Error α W Bα + Error β W Bβ)
Error C = out C (1 – out C) (Error α W Cα + Error β W Cβ)

4. Change hidden layer weights:
W+ λA = W λA + η Error A in λ
W+ ΩA = W ΩA + η Error A in Ω
W+ λB = W λB + η Error B in λ
W+ ΩB = W ΩB + η Error B in Ω
W+ λC = W λC + η Error C in λ
W+ ΩC = W ΩC + η Error C in Ω
A common mistake to make is running the program on pattern one until the error is low, then on pattern two and then on pattern three. If you do this, then the network will only learn the last pattern you've presented it with.

Once the network has learned, you can apply any of the inputs to it (just apply the input and run a forward pass with the trained weights) and it should recognise them. We can then use the network to recognise patterns in a real system.
A more accurate way to train the network is to use a validation set. This is similar to the set of patterns which you are training the network with — but with noise or other imperfections added. After the training set has been applied, the validation set is run through the network to check its performance (we don't use the validation set to change the network weights). When the net has fully trained, both the validation set and the training set will give a low error. If you're training the network too much, then the validation set error will increase, as shown in Figure 6.

Figure 6. Use of a validation set.
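One way to wire the validation check into training is sketched below. train_one_cycle and total_error are hypothetical helpers (apply every training pattern once and update the weights; run forward passes only and return the summed squared error), and the threshold and cycle limit are arbitrary choices of mine.

def train_with_validation(training_set, validation_set, train_one_cycle, total_error,
                          threshold=0.1, max_cycles=10000):
    for cycle in range(max_cycles):
        train_one_cycle(training_set)                      # weights change on the training set only
        training_error = total_error(training_set)
        validation_error = total_error(validation_set)     # checked, but never trained on
        if training_error < threshold and validation_error < threshold:
            return cycle                                   # both sets give a low error: fully trained
        # If validation_error starts rising while training_error keeps falling,
        # the network is being trained too much (see Figure 6).
    return max_cycles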
Algorithms in software

In Part 1 we discussed various ways of coding the network. One way was to store the weights in a three-dimensional array, with indexes denoting the layer number, the neuron number and the connection number. A suitable algorithm for a Back Propagation reverse pass in such a network might be:

1. Initialise all unused weights, targets, errors and outputs to zero.
2. Calculate output errors, see Listing 1, first part.
3. Change weights, see Listing 1, second part.
4. Calculate error of hidden layers, see Listing 1, third part.

Listing 1

FOR x = first_output_neuron TO final_output_neuron_number
  E(output_layer, x) = O(output_layer, x) * (1 - O(output_layer, x)) * (T(output_layer, x) - O(output_layer, x))
NEXT x

FOR L = number_of_layers TO 1 STEP -1

  FOR n = 1 TO max_number_of_neurons
    FOR c = 1 TO max_number_of_weights
      W(L, n, c) = W(L, n, c) + E(L + 1, n) * O(L, c)
    NEXT c
  NEXT n

  FOR n = 1 TO max_number_of_neurons
    FOR c = 1 TO max_number_of_weights
      E(L, n) = E(L, n) + E(L + 1, c) * W(L, c, n)
    NEXT c
    E(L, n) = E(L, n) * O(L, n) * (1 - O(L, n))
  NEXT n

NEXT L

Where, in addition to the variables explained in Part 1 of this course, E(L, n) and T(L, n) are the errors and targets respectively of layer L, neuron n.

Putting it all together

Now that we have algorithms for both the forward and reverse pass of the network, we can put them together into a coherent whole. Given below is a suggestion, showing how this can be done:

1. Set up inputs and targets for the network (either in a file, or in arrays).
2. Randomise the weights being used.
3. Apply the first pattern, calculate the network output (forward pass) and error, and use the error to change the weights (reverse pass) — once only. Keep a note of the error.
4. Do the same for the second pattern. Add its error to the running total from pattern one.
5. Repeat for all subsequent patterns, keeping a running total of the error.
6. If the error is too great (network still not fully trained), zero the running total and go to 3, else go to 7.
7. The network is trained and ready to be used; either use it directly or store the trained weights in a file for future use.
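The same scheme can be written out as a complete, runnable sketch in Python rather than the BASIC-style pseudocode above. The data layout here is my own choice (a list of weight layers, with weights[L][n][c] being the weight from neuron or input c of layer L into neuron n of layer L + 1), as are the function names and the example patterns; Part 1's exact array conventions are not reproduced.

import math
import random

def make_network(layer_sizes):
    # layer_sizes such as [4, 3, 2]: 4 inputs, 3 hidden neurons, 2 output neurons.
    # Weights start as small random numbers between -1 and +1.
    return [[[random.uniform(-1.0, 1.0) for _ in range(layer_sizes[L])]
             for _ in range(layer_sizes[L + 1])]
            for L in range(len(layer_sizes) - 1)]

def forward_pass(weights, inputs):
    # Returns the outputs of every layer (the inputs first), needed by the reverse pass.
    outputs = [list(inputs)]
    for layer in weights:
        prev = outputs[-1]
        outputs.append([1.0 / (1.0 + math.exp(-sum(w * o for w, o in zip(neuron, prev))))
                        for neuron in layer])
    return outputs

def reverse_pass(weights, outputs, targets, eta=1.0):
    # Output-layer errors: out * (1 - out) * (target - out).
    output_errors = [o * (1.0 - o) * (t - o) for o, t in zip(outputs[-1], targets)]
    errors = output_errors
    # Work back through the weight layers: change the weights, then back-propagate the errors.
    for L in range(len(weights) - 1, -1, -1):
        below = outputs[L]
        for n, neuron in enumerate(weights[L]):
            for c in range(len(neuron)):
                neuron[c] += eta * errors[n] * below[c]
        errors = [below[c] * (1.0 - below[c]) *
                  sum(errors[n] * weights[L][n][c] for n in range(len(weights[L])))
                  for c in range(len(below))]
    return output_errors

def train(weights, training_pairs, threshold=0.1, max_cycles=10000):
    # Apply every pattern once per cycle, keep a running squared-error total,
    # and stop once that total is small enough.
    for cycle in range(max_cycles):
        total = 0.0
        for inputs, targets in training_pairs:
            outputs = forward_pass(weights, inputs)
            errs = reverse_pass(weights, outputs, targets)
            total += sum(e * e for e in errs)
        if total < threshold:
            return cycle
    return max_cycles

# Usage: train a tiny network on three invented 4-pixel patterns and show its
# response to the first one.
pairs = [([0, 1, 1, 0], [0, 1]), ([1, 0, 0, 1], [1, 0]), ([1, 1, 0, 0], [1, 1])]
net = make_network([4, 3, 2])
train(net, pairs)
print([round(o, 2) for o in forward_pass(net, [0, 1, 1, 0])[-1]])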
Problems and additions

Although BP is a very useful and simple algorithm, it does have some problems and limitations. Let's start with its limitations.

BP is excellent for the sort of simple pattern recognition and mapping tasks explained above and in the first article. However, it only works well when the image it is to recognise is the correct size and placed in a central position on the grid. It's no good at, say, recognising a face in a crowd — unless you can centre the face or make the network 'scan' the picture until it falls onto the face (and even then you still have to make the face the correct size). In other words, many problems need to be 'pre-processed' before being presented to the network.

So these networks need to operate in a controlled environment, which means that applications such as Optical Character Recognition (OCR) are more suitable. They have problems dealing with the crowded and confusing real world.

Incidentally, the human brain solves this problem by first identifying 'features' in an image, for example horizontal or vertical lines, and then integrating these progressively into a whole image in a layered structure. So if you can identify a horizontal line along the top of an image and a vertical line down the middle, you can integrate these to find the letter T. This approach is more tolerant because these features (the two lines) are always present in a T, no matter where it's placed in the image or what size it is.

When running your network, you may run into problems with its training. The most common is known as 'local minima'. This occurs because the algorithm always follows the error downwards (it can't make a change of weights which causes the error to increase). But sometimes, as part of a downwards trend, the error must go up, as shown in Figure 7. In this case the training gets stuck and the weights can't move out of the local minimum.

Figure 7. Local minima: network error plotted against a weight value. The global minimum is the lowest error — the weight value you really want to find.

This problem doesn't really affect small networks, but becomes a problem as the network size increases. One solution is to add 'momentum' to the network. This involves allowing the change of weight to continue for some time in a particular direction, as shown below:

New_weight = Old_weight + weight_change + Weight_change_from_previous_iteration
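In code, the momentum rule might look like the sketch below (mine, not the article's). Each weight then needs a second stored value alongside it: the change that was applied to it on the previous iteration.

def momentum_update(old_weight, weight_change, weight_change_from_previous_iteration):
    # New_weight = Old_weight + weight_change + Weight_change_from_previous_iteration
    new_weight = old_weight + weight_change + weight_change_from_previous_iteration
    return new_weight, weight_change   # remember this change for the next iteration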
However, a simpler way to overcome this problem (and several others which affect training) is simply to monitor the training progress of the network and, if the error gets 'stuck' (does not decrease for some time), reset the initial weights of the network to different random values and start training again.
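This monitor-and-restart idea can be sketched as below. make_network and run_one_cycle stand for your own routines (create a freshly randomised network; run one training cycle over all the patterns and return the squared-error total), and the 'patience' of 200 cycles and the cycle limit are arbitrary choices of mine.

def train_with_restarts(make_network, run_one_cycle, layer_sizes, training_pairs,
                        threshold=0.1, patience=200, max_cycles=100000):
    net = make_network(layer_sizes)
    best_error, cycles_stuck = float("inf"), 0
    for _ in range(max_cycles):
        total = run_one_cycle(net, training_pairs)
        if total < threshold:
            return net                                   # trained
        if total < best_error:
            best_error, cycles_stuck = total, 0          # error still falling
        else:
            cycles_stuck += 1
            if cycles_stuck >= patience:                 # stuck: new random weights, start again
                net = make_network(layer_sizes)
                best_error, cycles_stuck = float("inf"), 0
    return net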
In next month’s instalment of this course,
we’ll have a look at networks which have
recurrent connections including the famous
‘Hopfield’ network.
(020324-2)