Lab 9 (final lab): Dynamics and Recurrent Neural Networks
A Simple Recurrent Network deals with sequences, not simple input-output mappings. Its state at any given time is a function of the present input and of its own state at the previous timestep, which in turn reflects earlier inputs. This introduces a wrinkle into the way we update weights. Where we previously chose to update weights either after every pattern, or after presentation of all patterns (one epoch), a new and more sensible choice now arises: updating after each sequence.
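To make that state dependence concrete, here is a minimal sketch in Python/NumPy. The layer sizes, weight names, and activation function are illustrative assumptions, not taken from the lab simulator; the point is simply that the new hidden state is computed from the current input and the previous hidden state, which itself summarises earlier inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative 1-n-1 network: 1 input unit, n hidden units, 1 output unit.
n = 5
rng = np.random.default_rng(0)
W_in  = rng.normal(scale=0.5, size=(n, 1))   # input -> hidden
W_rec = rng.normal(scale=0.5, size=(n, n))   # hidden(t-1) -> hidden(t)
W_out = rng.normal(scale=0.5, size=(1, n))   # hidden -> output

h = np.zeros((n, 1))                   # the "context": starts with no history
for x in [1.0, 0.0, 1.0]:              # an arbitrary input sequence
    h = sigmoid(W_in * x + W_rec @ h)  # new state = f(current input, previous state)
    y = sigmoid(W_out @ h)             # output is read from the current state
    print(x, y.item())
```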
Recall from the last lab that the network learnt the internal structure of sequences (words), but there was nothing to learn in the transitions from one word to the next. When we have a problem like that, it makes sense to concentrate on that which can be learned (sequence structure, or letters-within-words in the last example) and not to waste time trying to learn that which is inherently unpredictable (sequences of sequences, or sequences of words, in the last example).
We now tackle a problem that can be understood in two ways: as a computational solution to the parity checking problem, or as a dynamical system in which we induce a limit cycle attractor.
We will start with a simple 1-n-1 Simple Recurrent Network. This will be trained with a set of short sequences. In each case, the input will be a random sequence of 1s and 0s. The output should be 1 if we have seen an odd number of 1s so far, and 0 if we have seen an even number of 1s so far. Thus the network should ignore all 0s, but every time a new 1 is presented, it should flip its output from 1 to 0 or from 0 to 1. Computer scientists will recognize this simple task as a parity checking problem.
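For concreteness, here is one way such input/target pairs could be generated (a sketch only; the training pattern file supplied with the lab has its own format, and `make_parity_sequence` is just an illustrative helper):

```python
import numpy as np

def make_parity_sequence(length, rng):
    """Random 0/1 inputs; the target at each step is 1 if an odd number
    of 1s has been seen so far, and 0 otherwise."""
    inputs = rng.integers(0, 2, size=length)
    targets = np.cumsum(inputs) % 2   # running parity of the 1s seen so far
    return inputs, targets

rng = np.random.default_rng(1)
inputs, targets = make_parity_sequence(6, rng)
# For example, inputs [0 1 1 0 1 0] would give targets [0 1 0 0 1 1].
```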
Here is the interesting bit. We will train the network only on short sequences. If you look at the training pattern file you will see that the network is instructed to "reset" after every short sequence. What this does is twofold (see the sketch after this list):
- It forces the network to make its weight updates. Weight updates are thus made per sequence, not per pattern and not per epoch.
- The network's hidden unit activations are set to 0, so that all past state is lost.
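Here is a rough sketch of what that training regime amounts to, assuming a tiny NumPy implementation of the network. The `SRN` class, its weight names, and the simple per-timestep backprop used here are illustrative assumptions; your simulator's learning algorithm and parameters will differ. The key points are that the hidden state is zeroed at the start of each sequence, and that the accumulated weight changes are applied once, at the end of the sequence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Minimal 1-n-1 Simple Recurrent Network (illustrative only)."""
    def __init__(self, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden
        self.W_in  = rng.normal(scale=0.5, size=(n_hidden, 1))
        self.W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.W_out = rng.normal(scale=0.5, size=(1, n_hidden))

    def reset(self):
        # The "reset" instruction: hidden activations set to 0, past state lost.
        return np.zeros((self.n_hidden, 1))

    def step(self, x_t, h_prev):
        h_t = sigmoid(self.W_in * x_t + self.W_rec @ h_prev)
        y_t = sigmoid(self.W_out @ h_t)
        return h_t, y_t

def train_on_sequence(net, inputs, targets, lr=0.2):
    """Present one sequence, accumulate weight changes, update once at the end."""
    h_prev = net.reset()                 # start of sequence: no history
    gW_in  = np.zeros_like(net.W_in)
    gW_rec = np.zeros_like(net.W_rec)
    gW_out = np.zeros_like(net.W_out)
    for x_t, target in zip(inputs, targets):
        h_t, y_t = net.step(x_t, h_prev)
        # Squared-error gradients, backpropagated only through the current
        # timestep (the previous state is treated as a fixed input).
        delta_o = (y_t - target) * y_t * (1.0 - y_t)
        gW_out += delta_o @ h_t.T
        delta_h = (net.W_out.T @ delta_o) * h_t * (1.0 - h_t)
        gW_in  += delta_h * x_t
        gW_rec += delta_h @ h_prev.T
        h_prev = h_t
    # One weight update per sequence, not per pattern or per epoch.
    net.W_in  -= lr * gW_in
    net.W_rec -= lr * gW_rec
    net.W_out -= lr * gW_out
```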
Although we train the network on short sequences (at most 6 patterns per sequence), we wish to induce a general parity checker. The test pattern file is just a continuous stream of 1s. The desired output is thus an oscillation: 1, 0, 1, 0, and so on.
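Continuing the sketch above (it reuses the illustrative `SRN` class and `train_on_sequence`, plus the `make_parity_sequence` helper from earlier), training on many short random sequences and then driving the network with constant 1s would look roughly like this. Whether this toy version actually converges depends on the illustrative settings; the point is the shape of the procedure.

```python
rng = np.random.default_rng(2)
net = SRN(n_hidden=5, seed=0)

# Train on many short random sequences (at most 6 patterns each).
for _ in range(20000):
    length = int(rng.integers(1, 7))
    inputs, targets = make_parity_sequence(length, rng)
    train_on_sequence(net, inputs, targets)

# Test: a continuous input of 1s.  A fully trained network should give
# 1, 0, 1, 0, ...; an under-trained one shows a damped oscillation instead.
h = net.reset()
outputs = []
for _ in range(20):
    h, y = net.step(1.0, h)
    outputs.append(y.item())
print(np.round(outputs, 2))
```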
The parity checking interpretation is a timeless, computational way of viewing this network. Viewed instead as a dynamical system, we are inducing a periodic attractor for the network. It should learn to oscillate in the presence of continuous and unchanging input.
Most systems tend towards equilibrium in the presence of constant, unchanging input. In training, you are here inducing a bifurcation, whereby the behavior exhibits a qualitative shift from steady state to oscillation. You can track this by training for a while, then loading the test patterns and looking at the output over time. Here, for example, is a snapshot of the output for the test series in an incompletely trained network. You can see that we are approaching the bifurcation, after which the oscillation will persist indefinitely. What you see here is known as a damped oscillation.
And here is the same network, after some more training:
See if you can reproduce this behavior. If you load the test patterns to examine the performance of a partially trained network, remember to reload the training patterns before you continue with training.