Lab 8: Training a Simple Recurrent Network

Now it is time to turn our attention to the simple recurrent network.

In this lab, you will train a network to predict the next letter in a simple artificial language. The network is a so-called 'predictive SRN', which is to say that its task consists of learning to predict the next element in a sequence. There are three "words" in this language: ba, dii, and guuu. Letters will be presented one at a time to the network. The individual letters will be encoded in the following fashion:

b  1 1 0 0
d  1 0 1 0
g  1 0 0 1
a  0 1 0 0
i  0 0 1 0
u  0 0 0 1

Before you try any simulations, study this language and its encoding. You should realize that a) this is a distributed coding (i.e., a letter may be represented by more than one active unit) and b) there is some structure to it: the first bit (input) codes whether the letter is a consonant (1) or a vowel (0), and the next three bits encode the identity of that consonant or vowel.
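
If you want to play with the encoding before running the simulator, here is a minimal Python sketch (the dictionary and helper names are my own, not part of the lab materials):

    # The letter encoding described above. First bit: consonant (1) or
    # vowel (0); the remaining three bits encode identity.
    ENCODING = {
        "b": "1100", "d": "1010", "g": "1001",
        "a": "0100", "i": "0010", "u": "0001",
    }

    def is_consonant(letter):
        # The first input unit carries the consonant/vowel distinction.
        return ENCODING[letter][0] == "1"

    # Note how the identity bits pair up: b/a, d/i, g/u share bits 2-4.
    for letter, bits in ENCODING.items():
        print(letter, bits, "consonant" if is_consonant(letter) else "vowel")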

Question: How many bits (input units) would be required for a fully localist coding?

Question: Give an example of a distributed coding that uses 3 bits (inputs) only.

The pattern file for this simulation is here: badiiguuu.pat. It was generated by creating an entirely random sequence of 500 words, like this:

b
a
b
a
d
i
i
g
u
u
u
d
i
i

Note: this is a random series of words, not a random series of letters. What is the difference?

The words were translated into their pattern-based representations, as above, and the target for each letter was provided by simply copying the input pattern from the next line.
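
For the curious, here is a sketch of how such a file could be generated. The .pat layout assumed here (the input bits followed by the target bits on each line) is a guess; the real badiiguuu.pat may be laid out differently.

    # Hypothetical regeneration of the pattern file. Uses the ENCODING
    # dictionary from the sketch above; the file layout is an assumption.
    import random

    WORDS = ["ba", "dii", "guuu"]

    # A random sequence of 500 words -- not a random sequence of letters.
    letters = "".join(random.choice(WORDS) for _ in range(500))

    with open("badiiguuu.pat", "w") as f:
        # The target for each letter is the encoding of the next letter;
        # the final letter has no successor, so it is dropped.
        for current, nxt in zip(letters, letters[1:]):
            f.write(" ".join(ENCODING[current]) + "  " +
                    " ".join(ENCODING[nxt]) + "\n")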

When you are training an SRN, it is very important that the patterns be presented in the correct sequence. With a feed-forward network, we can present the patterns in any order at all, but if we do that with an SRN, we will make the task impossible: the information required to do the task lies in the sequence, not in the individual patterns considered in isolation. The "Batch Update" choice is not to be used here. If you have built an SRN, the simulator will always present the patterns in the order in which they are listed in your file.

Fire up your simulator. Choose "Network->Configure Network". Select the "SRN" tab, and build a 4*10*4 network. You will see that the hidden units have a blue border around them. This is your visual clue that the network is a simple recurrent network, and not a feed-forward network. (Displaying the context units themselves would make the display unnecessarily complex.)
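
To see roughly what the simulator is doing under the hood, here is a minimal Elman-style SRN sketch in Python/numpy. This is not the simulator's own code: the class name, the weight initialization range, and the one-step gradient truncation (the context is treated as a fixed extra input, as in Elman's original scheme) are all my choices.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class SRN:
        """4-10-4 Elman network: context units hold the previous hidden state."""

        def __init__(self, n_in=4, n_hid=10, n_out=4, seed=0):
            rng = np.random.default_rng(seed)
            # The hidden layer sees the input units plus the context units.
            self.W_ih = rng.uniform(-0.5, 0.5, (n_hid, n_in + n_hid))
            self.b_h = np.zeros(n_hid)
            self.W_ho = rng.uniform(-0.5, 0.5, (n_out, n_hid))
            self.b_o = np.zeros(n_out)
            self.context = np.zeros(n_hid)
            self.v_ih = np.zeros_like(self.W_ih)   # momentum buffers
            self.v_ho = np.zeros_like(self.W_ho)

        def forward(self, x):
            self.z = np.concatenate([x, self.context])
            self.h = sigmoid(self.W_ih @ self.z + self.b_h)
            self.y = sigmoid(self.W_ho @ self.h + self.b_o)
            self.context = self.h.copy()           # copy hidden -> context
            return self.y

        def backward(self, target, lr=0.1, mom=0.3):
            # Plain backprop with momentum, truncated at the current step.
            d_out = (self.y - target) * self.y * (1.0 - self.y)
            d_hid = (self.W_ho.T @ d_out) * self.h * (1.0 - self.h)
            self.v_ho = mom * self.v_ho - lr * np.outer(d_out, self.h)
            self.v_ih = mom * self.v_ih - lr * np.outer(d_hid, self.z)
            self.W_ho += self.v_ho
            self.W_ih += self.v_ih
            self.b_o -= lr * d_out
            self.b_h -= lr * d_hid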

Train your network. You might try a learning rate of about 0.1 and a momentum of about 0.3. I find these work well, but you may find better values.
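
A sketch of the corresponding training loop, using the SRN class and the ENCODING and letters variables from the sketches above, with the suggested learning rate and momentum:

    import numpy as np

    def to_vec(letter):
        # ENCODING maps letters to bit strings like "1100".
        return np.array([float(b) for b in ENCODING[letter]])

    net = SRN()                        # the SRN class sketched above
    for epoch in range(50):
        net.context[:] = 0.0           # reset context at the start of each pass
        total_error = 0.0
        # The patterns must be presented in sequence -- never shuffled.
        for current, nxt in zip(letters, letters[1:]):
            y = net.forward(to_vec(current))
            t = to_vec(nxt)
            total_error += float(np.sum((y - t) ** 2))
            net.backward(t, lr=0.1, mom=0.3)
        print(f"epoch {epoch}: summed squared error = {total_error:.1f}")

Resetting the context between passes is just one design choice; letting it carry over would also be defensible, since the word sequence has no privileged starting point.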

Here is the most important thing to understand about this task (and the final exercise): you cannot simply rely on the error plot to evaluate whether or not the network has learned. The error here can never go anywhere near zero; there will always be a lot of error in predicting any given sequence. Why? If you do not understand why, do not proceed; discuss this question with others, or with Fred. Once you understand that, look at the pattern inputs, targets, and outputs ("Patterns -> Show Patterns and Outputs"). How do you make sense of the error distribution? You should be able to understand the errors you are seeing. There is a big difference in the network's ability to predict vowels and consonants. Why? When it has seen the second /u/ of /guuu/ and has to predict the third, why is the error (possibly) greater than when it has to predict the first or second /u/?
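
One way to make this concrete is to tally, over a sequence generated the same way, which letters actually follow each short history. This sketch is my own diagnostic, not part of the lab:

    import random
    from collections import Counter, defaultdict

    # Regenerate a random word sequence (same recipe as the pattern file).
    letters = "".join(random.choice(["ba", "dii", "guuu"]) for _ in range(500))

    def next_letter_distribution(seq, k=2):
        """Tally which letter follows each k-letter history in the sequence."""
        tally = defaultdict(Counter)
        for i in range(k, len(seq)):
            tally[seq[i - k:i]][seq[i]] += 1
        return tally

    for context, counts in sorted(next_letter_distribution(letters).items()):
        total = sum(counts.values())
        print(context, {l: round(n / total, 2) for l, n in counts.items()})
    # Histories like "gu" are always followed by "u", while histories like
    # "uu" or "ii" are not: the irreducible error lives in the ambiguous
    # histories.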

If "show patterns and outputs" is misbehaving (as it does on windows machines) you may find it useful to use this file for testing: badiiguuTest.pat.

What would optimal performance of this network be, if it were doing the very best it can do?

The final exercise is similar in spirit to this problem, so make sure you have spent some time understanding the errors that this network makes. You cannot understand the errors if you do not understand the problem the network is being asked to solve.