Neural Networks
- Processing Elements
- An array of inputs, with a weight associated with each input.
- An intermediate value computed from the input values and weights.
The computation performed depends on the architecture of the network, often the sum of
products.
- An output which is some (activation) function of the intermediate value.
Often a sigmoid, step, or sign function (a PE is sketched below).
- Inputs come either from other PEs, or from an external source.
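As a concrete illustration, the following is a minimal Python sketch of a single PE, assuming a sum-of-products computation and a sigmoid activation (the function names are illustrative, not from these notes):

    import math

    def sigmoid(x):
        # A common activation function; squashes any real value into (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def processing_element(inputs, weights, activation=sigmoid):
        # Intermediate value: the sum of products of inputs and weights.
        intermediate = sum(w * x for w, x in zip(weights, inputs))
        # Output: some (activation) function of the intermediate value.
        return activation(intermediate)

    # Example: a PE with three inputs.
    print(processing_element([0.5, -1.0, 2.0], [0.1, 0.4, 0.3]))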
- Layers or slabs
- The PEs are arranged into layers, or slabs.
- The input layer accepts input from an external source.
- The output layer produces output for external consumption.
- Hidden layers lie between the input and output layers.
- There may or may not be a spatial relationship between the PEs in a layer.
- If each PE in a layer has an input from every PE in the previous layer, then the
network is fully feedforward connected.
- Connections between non-adjacent layers may be used.
- Learning
- Supervised learning architectures
- Learn functions from input to output, based on example input and output vector
pairs.
- Once trained can approximate the function for inputs not previously encountered.
- Unsupervised learning architectures
- Learn to differentiate between different input vectors.
- Kohonen/Counter-Propagation Networks
- There is an input layer, a hidden layer called the Kohonen layer, and an output layer
called the Grossberg layer.
- The network is fully feedforward connected.
- Firstly the Kohonen layer is trained in an unsupervised manner.
This trains the PEs in the layer to differentiate between different input vectors.
- The second phase trains the Grossberg layer in a supervised manner.
This trains the Grossberg layer to associate an output vector with each recognised input
vector.
- Once trained, the network will output an appropriate output vector for any given
input vector.
- Kohonen layer
- The PEs in the Kohonen layer may have a spatial relationship, e.g. rectangular
lattice, triangular lattice, hexagonal lattice.
- Intermediate value =
sqrt( SUM_inputs (weight - input)^2 )
This is the Euclidean distance from the weight vector to the input vector (the layer's operation is sketched below).
- Activation function is
f(Intermediate) = 1 if Intermediate is minimum over PEs,
f(Intermediate) = 0 otherwise
This is a form of step function.
- The weights are initially set randomly.
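A minimal Python sketch of the Kohonen layer's operation, assuming the weights are held as one weight vector per PE (names illustrative):

    import math

    def kohonen_outputs(weights, inputs):
        # Intermediate value per PE: Euclidean distance from its weight
        # vector to the input vector.
        distances = [math.sqrt(sum((w - x) ** 2 for w, x in zip(pe, inputs)))
                     for pe in weights]
        # Step-style activation: 1 for the PE with the minimum distance
        # (the "winner"), 0 for every other PE.
        winner = distances.index(min(distances))
        return [1 if i == winner else 0 for i in range(len(weights))]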
- Training the Kohonen Layer
- A sequence of typical input vectors is presented to the input layer, which distributes
these values to the Kohonen layer PEs.
- Each Kohonen layer PE works out the intermediate value (distance) from its weights
to the input vector.
- The activation function determines which PE is closest to the input; that PE is
declared the "winner".
- The winning PE (and those close by in the spatial relationship if any) have their
weights moved towards the input vector:
weight[kohonen, winner, i] +=
alpha * (input[kohonen, winner, i] - weight[kohonen, winner, i])
where alpha is the learning rate for winners.
- The other PEs (losers) have their weights moved towards the input vector:
weight[kohonen, loser, i] +=
beta * (input[kohonen, loser, i] - weight[kohonen, loser, i])
where beta is the learning rate for losers.
Beta is typically very small, often 0.
- After sufficient training, the PEs' weight vectors will be distributed over the input
space, with higher density in the areas of frequent input.
Each PE recognises a piece of the input space (a training loop is sketched below).
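A minimal training loop under these rules, assuming randomly initialised weights and ignoring any spatial neighbourhood around the winner (all names illustrative):

    def train_kohonen(weights, input_vectors, alpha=0.3, beta=0.0, epochs=100):
        # weights: one weight vector per Kohonen PE, initially random.
        for _ in range(epochs):
            for inputs in input_vectors:
                # The winner is the PE whose weight vector is closest
                # to the input vector.
                distances = [sum((w - x) ** 2 for w, x in zip(pe, inputs))
                             for pe in weights]
                winner = distances.index(min(distances))
                for n, pe in enumerate(weights):
                    # Winners move towards the input at rate alpha,
                    # losers at rate beta (typically very small, often 0).
                    rate = alpha if n == winner else beta
                    for i in range(len(pe)):
                        pe[i] += rate * (inputs[i] - pe[i])
        return weights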
- The Grossberg layer
- Intermediate value = SUM_inputs (weight * input).
This is the sum of products.
- Activation function is f(Intermediate) = Intermediate
- The weights are initially set randomly.
- Training the Grossberg layer
- A sequence of typical input vectors and expected output vectors are used.
- The input values are distributed to the Kohonen layer as usual, and the winner
calculated.
- The output values from the Kohonen layer are transmitted to the Grossberg layer.
- Since only the winning Kohonen PE outputs 1, the output from each Grossberg layer PE
is simply its weight on the connection from the winning Kohonen PE.
- The weights in Grossberg layer PEs are then modified:
weight[grossberg, pe, i] +=
input[grossberg, pe, i] * gamma * (expected_output[pe] - output[grossberg, pe])
where gamma is the learning rate for the Grossberg layer.
- After sufficient training, the vector formed from the Grossberg layer PEs' outputs
will approximate the expected output vector for the piece of input space recognised by the
winning Kohonen layer PE (a training loop is sketched below).
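A minimal sketch of this second training phase, assuming the Kohonen layer is already trained and grossberg_weights[pe][k] holds the weight from Kohonen PE k to Grossberg PE pe (names illustrative):

    def train_grossberg(kohonen_weights, grossberg_weights, pairs,
                        gamma=0.1, epochs=100):
        for _ in range(epochs):
            for inputs, expected in pairs:
                # Kohonen layer: winner-take-all, as in the first phase.
                distances = [sum((w - x) ** 2 for w, x in zip(pe, inputs))
                             for pe in kohonen_weights]
                winner = distances.index(min(distances))
                for pe, pe_weights in enumerate(grossberg_weights):
                    # Only the winner's output is 1, so only the weight on
                    # that connection is updated; the PE's output is simply
                    # that weight.
                    output = pe_weights[winner]
                    pe_weights[winner] += gamma * (expected[pe] - output)
        return grossberg_weights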
- Example application: Weather types and intelligent reactions
- Backpropagation
- There is an input layer, a hidden layer, and an output layer.
- Intermediate value = SUM_inputs (weight * input).
This is the sum of products.
- Activation function is often the sigmoid: f(x) = 1/(1 + exp(-x))
- The network might or might not be fully feedforward connected.
- The weights are initially set randomly.
- The network is trained in a supervised manner, to learn a function from input vectors to
output vectors (the output vectors are formed from the outputs from the output layer
PEs).
This is done by adjusting the weights in the network.
- Once trained, the network is used by presenting an input vector and reading the
resulting output vector.
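Note (a standard calculus identity, used by the training rules below): the sigmoid's derivative can be computed from the sigmoid's own value, since
f'(x) = exp(-x) / (1 + exp(-x))^2
= (1 / (1 + exp(-x))) * (exp(-x) / (1 + exp(-x)))
= f(x) * (1 - f(x))
so during training, f'(Intermediate) can be evaluated directly from a PE's output.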
- Training
- A sequence of typical input and output vector pairs is presented to the network,
and the PE outputs calculated.
- The output layer weights are then updated:
delta[output, pe] =
f'(Intermediate[output, pe]) * (expected_output[pe] - output[output, pe])
weight[output, pe, i] +=
alpha * delta[output, pe] * input[output, pe, i]
where the first derivative of the sigmoid function is used to train the network:
f'(x) = f(x) * (1 - f(x)), and
alpha is the learning rate for the output layer.
- The hidden layer weights are then updated, by propagating the errors from the output
layer back to the hidden layer:
delta[hidden, pe] =
f'(Intermediate[hidden, pe]) * SUM_i (delta[output, i] * weight[output, i, pe])
weight[hidden, pe, i] +=
beta * delta[hidden, pe] * input[hidden, pe, i]
where beta is the learning rate for the hidden layer.
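Putting both updates together, a minimal sketch of one training step, assuming a single hidden layer, sigmoid activations throughout, and one weight list per PE (all names illustrative):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def backprop_step(w_hidden, w_output, inputs, expected, alpha=0.5, beta=0.5):
        # Forward pass: sum of products, then sigmoid, layer by layer.
        hidden = [sigmoid(sum(w * x for w, x in zip(pe, inputs)))
                  for pe in w_hidden]
        outputs = [sigmoid(sum(w * h for w, h in zip(pe, hidden)))
                   for pe in w_output]

        # Output layer deltas; f'(Intermediate) = output * (1 - output).
        d_out = [o * (1 - o) * (e - o) for o, e in zip(outputs, expected)]

        # Hidden layer deltas: output-layer errors propagated back
        # through the output-layer weights.
        d_hid = [h * (1 - h) * sum(d * w_output[j][k]
                                   for j, d in enumerate(d_out))
                 for k, h in enumerate(hidden)]

        # Weight updates for both layers.
        for j, pe in enumerate(w_output):
            for k in range(len(pe)):
                pe[k] += alpha * d_out[j] * hidden[k]
        for k, pe in enumerate(w_hidden):
            for i in range(len(pe)):
                pe[i] += beta * d_hid[k] * inputs[i]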
- Example application: Stock market analysis
Exam Style Questions
- Describe the structure and operation of a processing element.
- Explain what is meant by a neural network being "fully feedforward connected".
- Explain the difference between supervised and unsupervised learning in a neural network.
- Describe the architecture of a Kohonen/counter-propagation network.
What (in general terms) are the tasks of each of the layers?
- Give details of how the output and hidden layers of a backpropagation network are trained.
After sufficient training, what has a backpropagation network learned?