85-419/719 Intro to PDP: Homework 1 Feedback

This material is intended to give you some sense of what I was looking for in grading the homework---the sample answers that are provided are not the only acceptable ones. For some questions, it was necessary to base the grading on a somewhat subjective sense of the depth of your understanding of the relevant issues.


1. Jets and Sharks

1A [10 points]: Why are some of Ken's properties more strongly activated than others?
Ken's name unit activates the Ken_in instance unit which, in turn, activates the rest of Ken's properties (Sharks, 20s, HS, Single, Burglar). These property units, in turn, partially activate other instance units that are similar to Ken in that they share many of his properties (e.g., Nick, Neal). These partially activated instance units then provide additional support to their own property units, some of which support Ken's properties and some of which compete with his properties, causing differing levels of final activation.

People generally did well on this question, although some made only general statements about the network without reasoning specifically about interactions among particular sets of units (e.g., instance units of people who share properties with Ken).
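
As a rough illustration of the interactions described in 1A, here is a minimal sketch of an interactive activation and competition update on a toy network. This is not the Jets and Sharks network: the seven units, the weights of +1 and -1, and the parameter values are all assumptions chosen for illustration. Instance A stands in for the probed individual and receives the external input directly (there is no separate name unit), and instance B shares three of A's four properties.

```python
# Toy IAC sketch. The network layout and every parameter value below are
# illustrative assumptions, not taken from the Jets and Sharks simulation.
MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, -0.1, 0.1
ALPHA, GAMMA, ESTR = 0.1, 0.1, 0.4  # excitation, inhibition, external input scales

N = 7
A, B, S1, S2, S3, A_ONLY, B_ONLY = range(N)

# W[i][j] is the (symmetric) connection between units i and j.
W = [[0.0] * N for _ in range(N)]

def connect(i, j, w):
    W[i][j] = W[j][i] = w

for p in (S1, S2, S3, A_ONLY):   # properties of instance A
    connect(A, p, 1.0)
for p in (S1, S2, S3, B_ONLY):   # properties of instance B
    connect(B, p, 1.0)
connect(A, B, -1.0)              # instance units compete
connect(A_ONLY, B_ONLY, -1.0)    # alternatives within one property pool compete

def iac_step(act, weights, ext):
    """One synchronous update of all units."""
    new_act = []
    for i, a in enumerate(act):
        excit = inhib = 0.0
        for j, w in enumerate(weights[i]):
            out = max(act[j], 0.0)  # only positive activations are sent
            if w > 0:
                excit += w * out
            elif w < 0:
                inhib += w * out
        net = ESTR * ext[i] + ALPHA * excit + GAMMA * inhib
        if net > 0:
            delta = (MAX_A - a) * net - DECAY * (a - REST)
        else:
            delta = (a - MIN_A) * net - DECAY * (a - REST)
        new_act.append(min(MAX_A, max(MIN_A, a + delta)))
    return new_act

def run(cycles=200):
    act = [REST] * N
    ext = [0.0] * N
    ext[A] = 1.0  # probe instance A
    for _ in range(cycles):
        act = iac_step(act, W, ext)
    return act

final = run()
for name, idx in [("A", A), ("B", B), ("shared", S1),
                  ("A only", A_ONLY), ("B only", B_ONLY)]:
    print(f"{name:8s} {final[idx]: .3f}")
```

In this toy case the shared properties should end up more active than the property that only A has, because the partially active instance B feeds activation back to the properties it shares with A. That is the kind of unit-level reasoning I was looking for.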

1B [10 points]: Why are some of the instance units more active than others?
Individuals differ in the degree to which they share properties with Ken (and with each other), and hence in the degree to which activation of Ken's properties (via his instance unit) supports their instance units. In particular, although Nick, Neal, Rick, Earl, Pete and Fred each share three properties with Ken, Nick and Neal share four properties with each other but only three with Rick and two with Earl, Pete and Fred. Nick and Neal therefore support each other (via the property units) more than either supports Rick, Earl, Pete or Fred. Pete and Fred are only very weakly active because most people who are similar to Ken (and, hence, partially active) are Sharks rather than Jets.

Some people just said that certain instance units received more input from the property units without explaining why (i.e., not mentioning that this depended on the level of overlap between the individuals' properties and those of Ken and other individuals similar to Ken). Very few people mentioned that the relative degree of similarity among the partially activated individuals also mattered.

1C [10 points]: [after providing input to Sharks and 20s units] Explain why the occupation units show partial activations of units other than Ken's occupation, which is Burglar.
The Sharks and 20s input causes a greater degree of partial activation among a number of instance units other than Ken (e.g., Pete, Fred, Nick, Neal) compared with the case in which the Ken name unit alone is provided as input. Two of these instances support Bookie and two support Pusher, so these alternative occupations receive significant support. Also, when the Ken name unit is input, the Burglar unit is activated much earlier than when Sharks and 20s are input, and this early activation provides an additional advantage in the competition with the Bookie and Pusher units.

Most people got the basic idea here, but many forgot to contrast the "Sharks and 20s" case with the "Ken" case.

1D [10 points]: [after removing the Lance-Burglar connections] Describe how the model was able to fill in the correct occupation for Lance. Also, explain why the model tends to activate the Div. (divorced) unit as well as the Mar. (married) unit.
Lance's correct occupation (Burglar) is still activated because the instance units that share several of Lance's properties (Al, Jim, John, George) also happen to share the same occupation (i.e., the network generalizes on the basis of similarity). The divorced unit is partially activated because two of the instances that are similar to Lance (Jim, George) are divorced rather than married.

Most people did well here too, although some forgot to address the partial activation of the divorced unit.


2. Schemas

2A [10 points]: What does each of the notions of variable, value, and default value correspond to within Rumelhart et al.'s PDP formulation of a schema?
This is more-or-less straight from the text (p. 33): ``The variables of a schema correspond to those parts of the pattern that are not completely determined by the remainder of the structure of the pattern itself'' and, thus, ``vary from one situation to another.'' The specific pattern these portions take in a given situation corresponds to the value assigned to the variable. ``Default Values represent variable subpatterns that tend to get filled-in in the absence of any specific input.''

People had a little trouble with the notion of a variable in terms of connectionist networks. Most were good with "value", but many seemed to think that the default was the initial activation. Also, some definitions were overly restricted. For example, a value need not be the activation of a single unit among a set of mutually exclusive alternatives---it could be a particular pattern of activity over the units.

2B [10 points]: Summarize briefly how, according to Rumelhart and colleagues, schema embedding is instantiated in a constraint satisfaction network. That is, what does it mean for a particular collection of descriptors to be a subschema?
On p. 35, Rumelhart et al. state ``Under our interpretation, subschemata correspond to small configurations of units which cohere and which may be a part of many different stable patterns (and therefore constitute a schema on their own right).'' So the critical property is that the descriptors within a subschema have to cohere (come and go together) across many different contexts. Rumelhart et al. use relative goodness across contexts as a measure of coherence.

Students generally did well on this question, though a few did not grasp that an embedded schema needs to be stable across multiple contexts.

2C [20 points]: Based on your answer to 2B, critically evaluate the claim that the "windows" and "drapes" features do, in fact, form a subschema. Be sure to provide evidence from running the simulation (e.g., goodness values) to support your argument.
The text (see Figure 13) operationalizes a subschema as a set of units for which the goodness is higher when they occur together or not at all than when they occur separately. This is clearly true for ``windows'' and ``drapes'' only in the office schema (and maybe weakly in the bathroom); the effect is not found in the bedroom and living room schemas. (The natural way to provide evidence for this is to give the goodness values of the four combinations in the context of each of the five room prototype patterns.) Thus it could be argued that ``windows'' and ``drapes'' do not cohere within ``many different stable patterns'' and so don't satisfy the criteria in 2B. (If, instead, you argued that they did form a subschema, I didn't necessarily take points off as long as you provided valid evidence for your claim.)

Many people had difficulty with this question. They had the idea that goodness needed to be higher when both features were on than when either was on by itself. However, they didn't always show that having neither on is better than having only one on. Also, they tended to give very little evidence to test whether this was true of windows and drapes. Moreover---and this was perhaps the main difficulty---most people didn't seek to evaluate goodness in different contexts. In fact, many people believed that a subschema ``belonged'' to a particular schema---that, for example, ``windows and drapes'' was a subschema of the office schema.

As a general rule, it is important to support your claims with evidence---in this case, that means listing the actual goodness values of configurations in different contexts.
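
To make the form of this evidence concrete, here is a minimal sketch that tabulates goodness for the four on/off combinations of a candidate pair in each of several contexts, then applies the criterion from the sample answer (both on or both off must beat either one on alone). The goodness formula used here, a sum of weight times activation products over pairs of units plus a bias term, is an assumed common form. The unit names, weights, biases and contexts are hypothetical placeholders, not values from the room network, so for the homework you would substitute the values from the simulation.

```python
from itertools import product

def goodness(act, weights, bias):
    """Assumed form: sum over pairs of w_ij * a_i * a_j, plus sum of bias_i * a_i."""
    units = list(act)
    g = sum(bias.get(u, 0.0) * act[u] for u in units)
    for idx, u in enumerate(units):
        for v in units[idx + 1:]:
            w = weights.get((u, v), weights.get((v, u), 0.0))
            g += w * act[u] * act[v]
    return g

def pair_table(pair, context, weights, bias):
    """Goodness of the four on/off combinations of `pair` within one context."""
    table = {}
    for a, b in product((1, 0), repeat=2):
        act = dict(context)
        act[pair[0]], act[pair[1]] = a, b
        table[(a, b)] = goodness(act, weights, bias)
    return table

def coheres(table):
    """True if both-on and both-off each beat either unit on alone."""
    together = min(table[(1, 1)], table[(0, 0)])
    apart = max(table[(1, 0)], table[(0, 1)])
    return together > apart

# Hypothetical toy values: "x" and "y" are the candidate pair, and
# "ctx_b" is a context unit that strongly supports "x" by itself.
WEIGHTS = {("x", "y"): 1.0, ("x", "ctx_b"): 1.0}
BIAS = {"x": -0.5, "y": -0.5}
CONTEXTS = {
    "context_a": {"ctx_a": 1, "ctx_b": 0},
    "context_b": {"ctx_a": 0, "ctx_b": 1},
}

results = {}
for name, context in CONTEXTS.items():
    table = pair_table(("x", "y"), context, WEIGHTS, BIAS)
    results[name] = coheres(table)
    print(name, table, "coheres" if results[name] else "does not cohere")
```

With these made-up values the pair coheres in one context but not in the other, where the context supports one member on its own. A pair that behaves this way does not cohere across contexts, which is exactly why evidence from a single context is not enough.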

2D [20 points]: Find another combination of features that, based on your intuitions of the contents of the various room types, ought to form a subschema but do not. Explain your choice of features and provide evidence for your claims (as in 2C). Does the pattern of weights among units...provide any insight into why the network behaves as it does?
Many different candidate subschemas are possible; common ones were ``book'' and ``bookshelf'', or ``desk'' and ``desk-chair''. As mentioned above, I was looking for you to argue that these (or other) features do not form a subschema using evidence gathered by running simulations. A natural way to do this would be to repeat 2C, determining goodness values for each standard schema with various combinations of your proposed features. For some choices of units, the weights between them might not be strongly positive (or might even be negative), or the units might have very strong weights to other units that are strongly associated with a particular schema. Either would help to explain why the two units didn't "cohere" across a range of contexts, as a proper subschema would.

This question was also problematic for many students. Common problems were: 1) not considering the case where both members of the subschema are off; 2) only discussing the subschema in a single context; 3) not considering that individual items might occur independently---for example, ``bed'' and ``clock'' were a frequent choice, because they often occur together, even though it's evident that clock occurs without bed in many contexts; and 4) not providing evidence from simulations.