Neural Networks
- Processing Elements
- An array of inputs, with a weight associated with each input.
- An intermediate value computed from the input values and weights.
The computation performed depends on the architecture of the network, often the sum of
products.
- An output which is some (activation) function of the intermediate value.
Often a sigmoid, step, or sign function (a PE is sketched below).
- Inputs come either from other PEs, or from an external source.
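As a concrete illustration, the following is a minimal Python sketch of a single PE, assuming a sum-of-products computation and a sigmoid activation (the function names are illustrative, not from these notes):

    import math

    def sigmoid(x):
        # A common activation function; squashes any real value into (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def processing_element(inputs, weights, activation=sigmoid):
        # Intermediate value: the sum of products of inputs and weights.
        intermediate = sum(w * x for w, x in zip(weights, inputs))
        # Output: some (activation) function of the intermediate value.
        return activation(intermediate)

    # Example: a PE with three inputs.
    print(processing_element([0.5, -1.0, 2.0], [0.1, 0.4, 0.3]))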
- Layers or slabs
- The PEs are arranged into layers, or slabs.
- The input layer accepts input from an external source.
- The output layer produces output for external consumption.
- Hidden layers lie between the input and output layers.
- There may or may not be a spatial relationship between the PEs in a layer.
- If each PE in a layer has an input from every PE in the previous layer, then the
network is fully feedforward connected.
- Connections between non-adjacent layers may be used.
- Learning
- Supervised learning architectures
- Learn functions from input to output, based on example input and output vector
pairs.
- Once trained can approximate the function for inputs not previously encountered.
- Unsupervised learning architectures
- Learn to differentiate between different input vectors.
- Kohonen/Counter-Propagation Networks
- There is an input layer, a hidden layer called the Kohonen layer, and an output layer
called the Grossberg layer.
- The network is fully feedforward connected.
- Firstly the Kohonen layer is trained in an unsupervised manner.
This trains the PEs in the layer to differentiate between different input vectors.
- The second phase trains the Grossberg layer in a supervised manner.
This trains the Grossberg layer to associate an output vector with each recognised input
vector.
- Once trained, the network will output an appropriate output vector for any given
input vector.
- Kohonen layer
- The PEs in the Kohonen layer may have a spatial relationship, e.g. rectangular
lattice, triangular lattice, hexagonal lattice.
- Intermediate value =
sqrt( SUM_inputs (weight - input)^2 )
This is the Euclidean distance from the weight vector to the input vector (the layer's operation is sketched below).
- Activation function is
f(Intermediate) = 1 if Intermediate is minimum over PEs,
f(Intermediate) = 0 otherwise
This is a form of step function.
- The weights are initially set randomly.
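A minimal Python sketch of the Kohonen layer's operation, assuming the weights are held as one weight vector per PE (names illustrative):

    import math

    def kohonen_outputs(weights, inputs):
        # Intermediate value per PE: Euclidean distance from its weight
        # vector to the input vector.
        distances = [math.sqrt(sum((w - x) ** 2 for w, x in zip(pe, inputs)))
                     for pe in weights]
        # Step-style activation: 1 for the PE with the minimum distance
        # (the "winner"), 0 for every other PE.
        winner = distances.index(min(distances))
        return [1 if i == winner else 0 for i in range(len(weights))]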
- Training the Kohonen Layer
- A sequence of typical input vectors is presented to the input layer, which distributes
these values to the Kohonen layer PEs.
- Each Kohonen layer PE works out the intermediate value (distance) from its weights
to the input vector.
- The activation function determines which PE is closest to the input; that PE is
declared the "winner".
- The winning PE (and those close by in the spatial relationship if any) have their
weights moved towards the input vector:
weight[kohonen, winner, i] +=
alpha * (input[kohonen, winner, i] - weight[kohonen, winner, i])
where alpha is the learning rate for winners.
- The other PEs (losers) have their weights moved towards the input vector:
weight[kohonen, loser, i] +=
beta * (input[kohonen, loser, i] - weight[kohonen, loser, i])
where beta is the learning rate for losers.
Beta is typically very small, often 0.
- After sufficient training, the PEs' weight vectors will be distributed over the input
space, with higher density in the areas of frequent input.
Each PE recognises a piece of the input space (a training loop is sketched below).
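A minimal training loop under these rules, assuming randomly initialised weights and ignoring any spatial neighbourhood around the winner (all names illustrative):

    def train_kohonen(weights, input_vectors, alpha=0.3, beta=0.0, epochs=100):
        # weights: one weight vector per Kohonen PE, initially random.
        for _ in range(epochs):
            for inputs in input_vectors:
                # The winner is the PE whose weight vector is closest
                # to the input vector.
                distances = [sum((w - x) ** 2 for w, x in zip(pe, inputs))
                             for pe in weights]
                winner = distances.index(min(distances))
                for n, pe in enumerate(weights):
                    # Winners move towards the input at rate alpha,
                    # losers at rate beta (typically very small, often 0).
                    rate = alpha if n == winner else beta
                    for i in range(len(pe)):
                        pe[i] += rate * (inputs[i] - pe[i])
        return weights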
- The Grossberg layer
- Intermediate value = SUM_inputs (weight * input).
This is the sum of products.
- Activation function is f(Intermediate) = Intermediate
- The weights are initially set randomly.
- Training the Grossberg layer
- A sequence of typical input vectors and expected output vectors are used.
- The input values are distributed to the Kohonen layer as usual, and the winner
calculated.
- The output values from the Kohonen layer are transmitted to the Grossberg layer.
- Since only the winning Kohonen PE outputs 1, the output from each Grossberg layer PE
is simply its weight on the connection from the winning Kohonen PE.
- The weights in Grossberg layer PEs are then modified:
weight[grossberg, pe, i] +=
input[grossberg, pe, i] * gamma * (expected_output[pe] - output[grossberg, pe])
where gamma is the learning rate for the Grossberg layer.
- After sufficient training, the vector formed from the Grossberg layer PEs' outputs
will approximate the expected output vector for the piece of input space recognised by the
winning Kohonen layer PE (a training loop is sketched below).
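A minimal sketch of this second training phase, assuming the Kohonen layer is already trained and grossberg_weights[pe][k] holds the weight from Kohonen PE k to Grossberg PE pe (names illustrative):

    def train_grossberg(kohonen_weights, grossberg_weights, pairs,
                        gamma=0.1, epochs=100):
        for _ in range(epochs):
            for inputs, expected in pairs:
                # Kohonen layer: winner-take-all, as in the first phase.
                distances = [sum((w - x) ** 2 for w, x in zip(pe, inputs))
                             for pe in kohonen_weights]
                winner = distances.index(min(distances))
                for pe, pe_weights in enumerate(grossberg_weights):
                    # Only the winner's output is 1, so only the weight on
                    # that connection is updated; the PE's output is simply
                    # that weight.
                    output = pe_weights[winner]
                    pe_weights[winner] += gamma * (expected[pe] - output)
        return grossberg_weights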
- Example application: Weather types and intelligent reactions
- Backpropagation
- There is an input layer, a hidden layer, and an output layer.
- Intermediate value = SUM_inputs (weight * input).
This is the sum of products.
- Activation function is often the sigmoid: f(x) = 1/(1 + exp(-x))
- The network might or might not be fully feedforward connected.
- The weights are initially set randomly.
- The network is trained in a supervised manner, to learn a function from input vectors to
output vectors (the output vectors are formed from the outputs from the output layer
PEs).
This is done by adjusting the weights in the network.
- Once trained, the network is used by presenting an input vector and reading the
resulting output vector.
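Note (a standard calculus identity, used by the training rules below): the sigmoid's derivative can be computed from the sigmoid's own value, since
f'(x) = exp(-x) / (1 + exp(-x))^2
= (1 / (1 + exp(-x))) * (exp(-x) / (1 + exp(-x)))
= f(x) * (1 - f(x))
so during training, f'(Intermediate) can be evaluated directly from a PE's output.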
- Training
- A sequence of typical input and output vector pairs is presented to the network,
and the PE outputs calculated.
- The output layer weights are then updated:
delta[output, pe] =
f'(Intermediate[output, pe]) * (expected_output[pe] - output[output, pe])
weight[output, pe, i] +=
alpha * delta[output, pe] * input[output, pe, i]
where the first derivative of the sigmoid function is used to train the network:
f'(x) = f(x) * (1 - f(x)), and
alpha is the learning rate for the output layer.
- The hidden layer weights are then updated, by propagating the errors from the output
layer back to the hidden layer:
delta[hidden, pe] =
f'(Intermediate[hidden, pe]) * SUM_i (delta[output, i] * weight[output, i, pe])
weight[hidden, pe, i] +=
beta * delta[hidden, pe] * input[hidden, pe, i]
where beta is the learning rate for the hidden layer.
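Putting both updates together, a minimal sketch of one training step, assuming a single hidden layer, sigmoid activations throughout, and one weight list per PE (all names illustrative):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def backprop_step(w_hidden, w_output, inputs, expected, alpha=0.5, beta=0.5):
        # Forward pass: sum of products, then sigmoid, layer by layer.
        hidden = [sigmoid(sum(w * x for w, x in zip(pe, inputs)))
                  for pe in w_hidden]
        outputs = [sigmoid(sum(w * h for w, h in zip(pe, hidden)))
                   for pe in w_output]

        # Output layer deltas; f'(Intermediate) = output * (1 - output).
        d_out = [o * (1 - o) * (e - o) for o, e in zip(outputs, expected)]

        # Hidden layer deltas: output-layer errors propagated back
        # through the output-layer weights.
        d_hid = [h * (1 - h) * sum(d * w_output[j][k]
                                   for j, d in enumerate(d_out))
                 for k, h in enumerate(hidden)]

        # Weight updates for both layers.
        for j, pe in enumerate(w_output):
            for k in range(len(pe)):
                pe[k] += alpha * d_out[j] * hidden[k]
        for k, pe in enumerate(w_hidden):
            for i in range(len(pe)):
                pe[i] += beta * d_hid[k] * inputs[i]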
- Example application: Stock market analysis
Exam Style Questions
- Describe the structure and operation of a processing element.
- Explain what is meant by a neural network being "fully feedforward connected".
- Explain the difference between supervised and unsupervised learning in a neural network.
- Describe the architecture of a Kohonen/counter-propagation network.
What (in general terms) are the tasks of each of the layers?
- Give details of how the output and hidden layers of a backpropagation network are trained.
After sufficient training, what has a backpropagation network learned?