85-419/719 Intro PDP: Homework 2

Due Tuesday, Feb 14, 10:30pm

Each part of the homework has multiple questions to be answered and/or other things to hand in. Be careful not to leave out anything in preparing your responses. Note that, although it will be scored out of 100 points, the homework constitutes 15% of your course grade.

1. Hebb and Delta Rules [50 points]

This part of the homework requires you to demonstrate an understanding of both the Hebb and Delta learning rules in feedforward pattern associator networks, and how these procedures extract regularities from (possibly noisy) examples.

You will need to download the file http://www.cnbc.cmu.edu/~plaut/IntroPDP/networks/8x8.zip and unzip it into your Lens Examples subdirectory (or wherever you'd like to run the simulations from). This should give you the following six files: 8x8-imposs.ex, 8x8-li.ex, 8x8-orth.ex, 8x8.tcl, 8x8lin.in, and 8x8sig.in. These files define two versions of a pattern associator with 8 inputs and 8 outputs: one with linear units (8x8lin.in) and one with sigmoid units (8x8sig.in). Each of the networks loads the three example files (8x8-orth.ex, 8x8-li.ex, and 8x8-imposs.ex), which provide the "orth", "li", and "imposs" example sets used below.

First, start up lens, click on "Run Script", and select 8x8lin.in to load the linear version of the pattern associator. The Link Viewer and a graph that plots the network error will open automatically. Note that the training set is initially the orthogonal examples ("orth") and that the weights in the network are all initialized to 0.0. Thus, if you open the Unit Viewer and click on each of the three examples, you'll see that the output of each unit for each example is 0.0. This is because, for linear units, the activation of a unit is equal to its net input, which is 0.0 if all of its weights are 0.0. Note that, if you move your mouse over the units, the input activations range from -1 to 1 whereas the targets of the output units are 0 or 1. You can reset the weights back to all zeros at any time by clicking on "Reset Network" on the main panel.
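As a minimal sketch of why every output starts at 0.0, the following Python snippet (not part of Lens; the function name and the input pattern are made up for illustration) computes the activation of each linear output unit as its net input, that is, the weighted sum of the input activations.

```python
def linear_outputs(weights, inputs):
    """Activation of each linear output unit equals its net input."""
    # weights[j][i] is the weight from input i to output j
    return [sum(w * a for w, a in zip(row, inputs)) for row in weights]

# 8 inputs and 8 outputs, with all weights initialized to 0.0
zero_weights = [[0.0] * 8 for _ in range(8)]

# Hypothetical input pattern with values in the range -1 to 1
pattern = [1, -1, 1, -1, 1, 1, -1, -1]

outputs = linear_outputs(zero_weights, pattern)
print(outputs)
```

Whatever input pattern is used, every term in each weighted sum is multiplied by a zero weight, so all eight outputs are 0.0.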

Now click on "Train Network". Because "Weight Updates" is set to 1, this will train the network on one presentation of each of the three training examples (called an "epoch"). For the purposes of this homework, you can consider this specific situation to be equivalent to applying the Hebb rule using each example. [In actuality, the network is training with the Delta rule but, as will be discussed in class, the result is identical to that for the Hebb rule when training for one presentation of orthogonal patterns starting with zero weights.] Note that, after training, the weights now have a range of values.
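The equivalence described in the bracketed note can be sketched in Python. This is an illustration only, not Lens code: the three patterns (4 inputs, 2 outputs), the learning rate of 0.1, and the function names are all assumptions chosen to keep the example small, and they are not the contents of 8x8-orth.ex. The factor of 2.0 mirrors the extra factor that Lens applies to the weight changes.

```python
def hebb_epoch(patterns, lr, n_in, n_out):
    """One pass of the Hebb rule from zero weights:
    each example adds 2 * lr * input_i * target_j to weight w[j][i]."""
    w = [[0.0] * n_in for _ in range(n_out)]
    for a, t in patterns:
        for j in range(n_out):
            for i in range(n_in):
                w[j][i] += 2.0 * lr * a[i] * t[j]
    return w

def delta_epoch_batch(w, patterns, lr):
    """One pass of the Delta rule for linear units, with the weight
    changes accumulated and applied only after all examples are run."""
    n_out, n_in = len(w), len(w[0])
    dw = [[0.0] * n_in for _ in range(n_out)]
    for a, t in patterns:
        for j in range(n_out):
            out_j = sum(w[j][i] * a[i] for i in range(n_in))  # linear unit
            for i in range(n_in):
                dw[j][i] += 2.0 * lr * (t[j] - out_j) * a[i]
    return [[w[j][i] + dw[j][i] for i in range(n_in)] for j in range(n_out)]

# Hypothetical, mutually orthogonal input patterns with 0/1 targets
PATTERNS = [
    ([1, 1, 1, 1], [1, 0]),
    ([1, -1, 1, -1], [0, 1]),
    ([1, 1, -1, -1], [1, 1]),
]
LR = 0.1  # assumed learning rate, not the value set in the Lens scripts

hebb = hebb_epoch(PATTERNS, LR, n_in=4, n_out=2)
zeros = [[0.0] * 4 for _ in range(2)]
delta = delta_epoch_batch(zeros, PATTERNS, LR)
print(hebb)
print(delta)
```

Because every output is 0.0 when the weights are zero, the error term (target minus output) is simply the target, so the two functions produce the same weights after this first pass.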

1A [5 pts.] Explain why the weight from input:0 to output:0 is equal to 0.375, and why the weight from input:1 to output:4 is equal to -0.25. To do this, you will have to consider the Hebb rule equation, the training patterns, the learning rate, and the fact that (as will be discussed in class) Lens applies an extra factor of 2.0 to the weight changes.

Save these weights by clicking on "Save Weights" and replacing the Selection field with hebb-orth.wt. The file will be saved into the current directory (shown in the top-left panel). [Note: Recall that Lens under Windows doesn't handle paths or filenames with spaces well. If you get an error when attempting to save a weight file, it may be because you didn't remove the full path from the Selection field and it contains one or more spaces (e.g., C:\Documents and Settings\Name\Lens\hebb-orth.wt).] You will also need to save the Link Viewer display to a file (to hand in). First, switch to "Hinton Diagram" under the "Palette" menu of the Link Viewer. Then (to print) select "Print..." under the "Viewer" menu. You should save to a file (ending in ".ps") and then import this file into your write-up directly.

1B [5 pts.] If, at this point, you apply the Delta rule by clicking on "Train Network", the weights remain unchanged. Why? What would have happened if the Hebb rule had been applied instead?

Now click on "Reset Network" to reset the weights to zero, and switch to training on the linearly independent patterns by running "useTrainingSet li" from the command-line interface. Then train for 1 epoch, save the weights as hebb-li.wt, and print a display of them (using the Hinton Diagram palette). (Note that, even though the patterns are not all orthogonal, the weights after 1 epoch are equivalent to Hebbian learning in this case because Lens is not updating the weights after each example, but only after all three examples are run.) Now continue training for 9 more epochs (total of 10), save the weights as delta-li.wt, and save a display of them (by "printing" to a ".ps" file).
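The parenthetical point about when the weights are updated can be illustrated with a small Python sketch. It is not Lens code: the three non-orthogonal patterns, the single output unit, the learning rate of 0.1, and the function names are assumptions made for illustration. The sketch compares one epoch of the Delta rule with the changes applied once at the end of the epoch against one epoch with the changes applied after each example.

```python
def hebb_sum(examples, lr):
    """Hebbian weights from zero: sum of 2 * lr * input * target."""
    w = [0.0] * len(examples[0][0])
    for a, t in examples:
        w = [wi + 2.0 * lr * ai * t for wi, ai in zip(w, a)]
    return w

def delta_epoch(weights, examples, lr, batch=True):
    """One epoch of the Delta rule for a single linear output unit.
    batch=True applies the summed changes once, after all examples;
    batch=False applies each change right after its example."""
    w = list(weights)
    pending = [0.0] * len(w)
    for a, t in examples:
        out = sum(wi * ai for wi, ai in zip(w, a))
        change = [2.0 * lr * (t - out) * ai for ai in a]
        if batch:
            pending = [p + c for p, c in zip(pending, change)]
        else:
            w = [wi + c for wi, c in zip(w, change)]
    if batch:
        w = [wi + p for wi, p in zip(w, pending)]
    return w

# Hypothetical linearly independent, non-orthogonal inputs with targets
EXAMPLES = [([1, 0, 0], 1), ([1, 1, 0], 0), ([1, 1, 1], 1)]
LR = 0.1  # assumed learning rate

hebb = hebb_sum(EXAMPLES, LR)
batch = delta_epoch([0.0, 0.0, 0.0], EXAMPLES, LR, batch=True)
online = delta_epoch([0.0, 0.0, 0.0], EXAMPLES, LR, batch=False)
print(hebb, batch, online)
```

In the batch case the outputs stay at 0.0 for the whole first epoch, so the result matches the Hebbian sum; in the per-example case the second and third examples see nonzero outputs, and the resulting weights differ.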

1C [10 pts.] Describe and explain the similarities and differences among the weights produced by the Hebb rule (hebb-li.wt) and those produced by the Delta rule (delta-li.wt) when training on the linearly independent set.

Reset the network and run "noiseOn" from the command line. This adds noise to both the inputs and targets when each example is presented. Also reset the error graph ("Graph 0") by selecting "Clear" under the "Graph" menu of the graph. Then train for 30 epochs. [You can do this by clicking "Train Network" 30 times, or by setting "Weight Updates" to 30 and clicking "Train Network" once. Also, be sure to hit "Enter" after changing the value in the Lens interface, so the field turns from pink back to gray; otherwise the value won't actually be changed.] You'll see (in the error graph) that the error jumps around wildly over the course of training. Now reset the network, set the learning rate to 0.005, and retrain the network for 30 epochs. Note that learning is now much more effective. Finally, reset the network one more time, set the learning rate to 0.001, and retrain for 30 epochs.

1D [10 pts.] Why was training with the intermediate learning rate more effective than either the higher or lower rate? Save the error graph (by selecting "Print..." from the "Graph" menu; choose "Grayscale" unless you have a color printer).

Now load the version of the pattern associator with sigmoid units by clicking on "Run Script" and selecting 8x8sig.in. This creates a new network with zero weights and "li" as the training set. Train this network for 40 epochs. Save the resulting weights as delta-li-sig.wt and save a display of them.
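For comparison with the linear case, here is a minimal Python sketch of a sigmoid output unit. It assumes the standard logistic function with a gain of 1; the function names and the input pattern are made up for illustration, and this is not Lens code.

```python
import math

def logistic(net):
    """Standard logistic (sigmoid) function, assumed gain of 1."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_outputs(weights, inputs):
    """Each output is the logistic function of the unit's net input."""
    # weights[j][i] is the weight from input i to output j
    return [logistic(sum(w * a for w, a in zip(row, inputs)))
            for row in weights]

# 8 inputs and 8 outputs, with all weights initialized to 0.0
zero_weights = [[0.0] * 8 for _ in range(8)]

# Hypothetical input pattern
pattern = [1, 0, 1, 0, 0, 1, 1, 0]

outputs = sigmoid_outputs(zero_weights, pattern)
print(outputs)
```

Under this assumption, a unit with zero net input produces an output of 0.5 rather than 0.0, and its output always stays strictly between 0 and 1.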

1E [10 pts.] Why is learning so much slower using sigmoid units than when using linear units? Describe and explain the similarities and differences in the resulting sets of weights.

Finally, reset the network and switch to training on the "imposs" set. Train for 40 epochs, print the weights, and save them as delta-imposs-sig.wt.

1F [10 pts.] Explain why learning fails here, even with the Delta rule. To answer this, you will have to examine the patterns carefully, noting that some sets of inputs are "redundant" with each other; that is, they provide no additional information (and similarly for some outputs).

In addition to your written answers to the questions above, please include an image of the Link Viewer diagrams (using the "Hinton Diagram" palette) for hebb-orth.wt, hebb-li.wt, delta-li.wt, delta-li-sig.wt, and delta-imposs-sig.wt, as well as the error graph from 1D.

2. Learning and Generalization [50 points]

In this part of the homework, you are asked to apply what you have learned to some domain of your own choosing. Design a set of input-output pattern pairs representing two types of information about some set of entities. For example, if the entities were musical instruments, the inputs might specify features of the shape and appearance of the instrument and the outputs might specify features of the sounds that the instrument makes. Or, for a set of words, the inputs might represent the spelling of the word and the outputs might represent its pronunciation. Note that the input and output units should code features of entities (i.e., a distributed representation), rather than individual entities directly (i.e., a localist representation).

Make up a set of about 10-12 examples, each consisting of an 8-element input vector and an 8-element output vector. (It can be interesting to use even more examples, although this is not necessary for the assignment.) In the case of instruments, one object might be a violin:

name: violin
I: 1 0 0 1 0 1 1 0
T: 1 0 0 1 1 0 0 1;

The first line lists the name of the example, which will appear in the left panel of the Unit Viewer. Names with spaces must be surrounded by curly braces, as in

name: {french horn}

The input values are preceded by an I:, the output or target values are preceded by a T:, all values are separated by spaces, and the entire example is ended by a semicolon. Use 1's and 0's in the input patterns rather than +1's and -1's, and make sure each input pattern has at least one 1 in it (and generally more than one). For the elements of the vectors, try to identify features (e.g., has-strings, has-frets, is-long-and-thin) that allow you to distinguish each item, both in terms of its input characteristics and its output characteristics. You can use fewer than 8 features by just using all 0's for the missing features (although try to avoid this). Try to design your patterns so that they capture both the strengths and the weaknesses of pattern associator models, with attention to both learnability and generalization. That is, pay attention to the similarities (overlap) among the input patterns and the extent to which these correspond to similarities among the output patterns. Also, make sure that no two input patterns are identical, as it is impossible for a deterministic network to learn to map identical inputs to different outputs. The interestingness of the results you achieve with learning will depend to a large extent on the properties of the patterns you use, so take some care in designing them.
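One way to inspect the overlap among your patterns before training is sketched below in Python. This is only an illustration: the violin pattern is the one shown above, while the cello and flute patterns and all function names are hypothetical. Overlap is measured here as the number of features that are 1 in both patterns, which is one reasonable choice among several.

```python
from itertools import combinations

def overlap(p, q):
    """Number of features that are 1 in both 0/1 patterns (dot product)."""
    return sum(x * y for x, y in zip(p, q))

def overlap_table(examples):
    """For each pair of examples: (name1, name2, input overlap, output overlap)."""
    rows = []
    for (n1, (i1, t1)), (n2, (i2, t2)) in combinations(examples.items(), 2):
        rows.append((n1, n2, overlap(i1, i2), overlap(t1, t2)))
    return rows

def duplicate_inputs(examples):
    """Pairs of example names whose input patterns are identical."""
    seen, dups = {}, []
    for name, (inp, _) in examples.items():
        key = tuple(inp)
        if key in seen:
            dups.append((seen[key], name))
        else:
            seen[key] = name
    return dups

examples = {
    "violin": ([1, 0, 0, 1, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0, 0, 1]),  # from the text
    "cello": ([1, 0, 0, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 1, 0, 1]),   # hypothetical
    "flute": ([0, 1, 1, 0, 1, 0, 0, 0], [0, 1, 1, 0, 0, 0, 1, 0]),   # hypothetical
}

table = overlap_table(examples)
for row in table:
    print(row)
print("duplicate inputs:", duplicate_inputs(examples))
```

Pairs with high input overlap but low output overlap (or the reverse) are the ones most likely to be informative about what the network can and cannot learn.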

2A [5 pts.] Hand in a table displaying the set of patterns you have constructed, and explain briefly how you designed them.

Choose two of your examples to be withheld from training and to be used to test generalization. Generally, test cases should be selected randomly so as not to bias the results, but for the purposes of the homework you should choose two cases that are likely to show some degree of generalization based on the nature of the remaining (training) examples. Put these two examples in a plain text file named test.ex and the remaining 8-10 examples in a plain text file named train.ex. In each file, the examples should be formatted as shown above for violin. Be sure that you're using a text editor (e.g., WordPad) that saves text files as plain text (no formatting whatsoever).
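If you would rather generate the two files than type them by hand, the following Python sketch formats examples as described above. It is a convenience sketch, not part of Lens: the function names and the french horn values are hypothetical, and placing the semicolon at the end of the T: line is an assumption based on the statement that each example is ended by a semicolon.

```python
def format_example(name, inputs, targets):
    """Format one example: name line, I: line, and T: line ended by ';'."""
    # Names containing spaces must be surrounded by curly braces
    label = "{" + name + "}" if " " in name else name
    in_line = "I: " + " ".join(str(v) for v in inputs)
    out_line = "T: " + " ".join(str(v) for v in targets) + ";"
    return "\n".join(["name: " + label, in_line, out_line])

def format_examples(examples):
    """Join several (name, inputs, targets) examples with blank lines."""
    return "\n\n".join(format_example(*ex) for ex in examples) + "\n"

def write_examples(path, examples):
    """Write the examples to a plain text file (e.g., train.ex or test.ex)."""
    with open(path, "w") as f:
        f.write(format_examples(examples))

training = [
    ("violin", [1, 0, 0, 1, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0, 0, 1]),       # from the text
    ("french horn", [0, 1, 1, 0, 1, 0, 0, 1], [0, 1, 1, 0, 0, 1, 1, 0]),  # hypothetical
]

text = format_examples(training)
print(text)
```

Calling write_examples("train.ex", training) would then save the training set, and the same call with your two withheld examples and "test.ex" would save the test set.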

Train an 8x8 pattern associator with the Delta rule on your training patterns in each of two conditions: (1) using linear units, and (2) using sigmoid units. To do this, load either 8x8lin.in or 8x8sig.in (from "Run Script") and run the following commands from the command-line interface:

loadExamples train.ex -s train
useTrainingSet train
loadExamples test.ex -s test
useTestingSet test

[NOTE: if you started Lens by double-clicking it, you will need to run either "cd ../Examples" (for Windows) or "cd ../../Examples" (for OSX or Linux) from within the Lens console before running the above commands, or Lens won't find the files (assuming they're in the Examples folder).] Now train each network using a small enough learning rate, for enough epochs, so that you reach a stable configuration of weights. (Note that it will take many more epochs to achieve this using sigmoid units compared to linear units.)

2B [10 pts.] Examine how well the network does at the end of training in learning the training examples using each unit activation function (linear and sigmoid), and explain the successes and failures (based on the relationships among the trained patterns). Are there any differences between the two functions in how well the patterns can be learned?

2C [15 pts.] Choose one of the two activation functions and examine the time course of learning. To do this, reset the appropriate network and then retrain it only a few epochs at a time (using the "Weight Updates" setting on the main console). Try to identify what aspects of your patterns the network learns first and what aspects it learns only later. Describe what you observe and try to explain why it happens.

2D [20 pts.] Now, for both the linear and sigmoid network, consider how well the trained network generalizes to the two patterns that you set aside. (To do this, open the Unit Viewer and select "Testing Set" from the "Example Set" menu, and then click on the examples.) Report what happens when you test with these, and explain the results in terms of the relationship between the test patterns and the trained patterns.