85-419/719 Intro PDP: Homework 1 Feedback

This material is intended to give you some sense of what we were looking for in grading the homework---the sample answers that are provided are not the only acceptable ones. For some questions, it was necessary to base the grading on a somewhat subjective sense of the depth of your understanding of the relevant issues.

1. Jets and Sharks

1A [10 points]: Why are some of the instance units (other than Ken) more active than others? Explain the differences.

Individuals differ in how many properties they share with Ken, and hence in how much support their instance units receive when Ken's properties become active via his instance unit (as described in the answer to the example question given in the homework). But individuals also differ in how similar they are to each other, and this also influences their final activation levels. For example, Nick, Neal, Rick, Earl, Pete and Fred all share three properties with Ken, yet Nick and Neal are the most active (apart from Ken). They share four properties with each other but only three with Rick and two with Earl, Pete and Fred, so they support each other (via the property units) more than either of them supports Rick, Earl, Pete or Fred. Pete and Fred are only very weakly active because most of the people who are similar to Ken (and hence partially active) are Sharks, whereas Pete and Fred are Jets.
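The overlap argument above can be made concrete with a small count. This is a sketch under stated assumptions: the individuals X1, X2, X3 and their property labels are hypothetical (not the actual Jets and Sharks database), and the "mutual support" score is only a crude proxy for the feedback an instance unit receives through shared property units, not the IAC dynamics themselves.

```python
# Hypothetical toy data (NOT the real Jets and Sharks database): each
# individual has five property labels, like the real instances.
KEN = {"p1", "p2", "p3", "p4", "p5"}
OTHERS = {
    "X1": {"p1", "p2", "p3", "q1", "q2"},
    "X2": {"p1", "p2", "p3", "q1", "q3"},
    "X3": {"p1", "p4", "p5", "q1", "q4"},
}


def overlap(a, b):
    """Number of property units two individuals share."""
    return len(a & b)


def overlap_with_ken(others, ken):
    """Direct support: properties each individual shares with Ken."""
    return {name: overlap(props, ken) for name, props in others.items()}


def mutual_support(others):
    """Crude proxy for indirect support: summed overlap with the other
    partially active candidates (excluding the individual itself)."""
    return {
        name: sum(overlap(props, other_props)
                  for other, other_props in others.items() if other != name)
        for name, props in others.items()
    }


if __name__ == "__main__":
    print(overlap_with_ken(OTHERS, KEN))  # {'X1': 3, 'X2': 3, 'X3': 3}
    print(mutual_support(OTHERS))         # {'X1': 6, 'X2': 6, 'X3': 4}
```

All three toy individuals tie on overlap with Ken, but X1 and X2, which share four properties with each other, get more mutual support than X3, which shares only two with each of them. This mirrors the Nick/Neal versus Earl/Pete/Fred contrast.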

Some people just said that certain instance units received more input from the property units without explaining why (i.e., without mentioning that this depended on the degree of overlap between those individuals' properties and those of Ken and of other individuals similar to Ken). And again, some people were not specific about which instance units had partial activation (and why those and not others). Many people mentioned that activation depended on the number of properties shared with Ken, but didn't notice or explain the variation in activation among individuals sharing the same number of properties, or considered only one or two individuals rather than all of the relevant ones. Only a few people mentioned that the degree of similarity among the partially activated individuals themselves also mattered.

1B [15 points]: [after providing input to Sharks and 20s units] Explain why the occupation units show partial activations of units other than Ken's occupation, which is Burglar. Be sure to contrast the current case with the one with Ken as input.

The Sharks and 20s input causes a greater degree of partial activation among a number of instance units other than Ken (e.g., Pete, Fred, Nick, Neal) compared with the case in which the Ken name unit alone is provided as input, because the Sharks and 20s property units provide direct, immediate support for these alternative instance units. In the case when only the Ken name unit is presented, only Ken-in receives direct support; the partial activation of other instance units happens only in response to Ken's additional properties becoming active. Among the active alternative instance units, two support Bookie and two support Pusher, so these alternative occupations receive significant support. Also, for the case in which the Ken name unit is input, the Burglar unit is activated much earlier than when Sharks and 20s are input, and this early activation provides an additional advantage in the competition with the Bookie and Pusher units (along the lines of a "winner-take-all" dynamic).
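The head-start effect can be sketched with two mutually inhibitory units using an IAC-style update rule: activation moves toward the maximum when net input is positive, toward the minimum when it is negative, and decays toward rest. The parameter values below are assumptions for illustration; in particular, GAMMA is set much higher than a typical default so that the competition is clearly winner-take-all. The two units stand in for competing occupation units (e.g., Burglar versus Bookie), and only the timing of their external input differs.

```python
# Assumed IAC-style parameters (illustrative, not the homework's settings).
MAX, MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1
ESTR = 0.4   # scaling of external input
GAMMA = 1.0  # mutual inhibition, deliberately strong for this demo


def iac_step(a, net):
    """One IAC activation update for a single unit, clipped to [MIN, MAX]."""
    if net > 0:
        delta = (MAX - a) * net - DECAY * (a - REST)
    else:
        delta = (a - MIN) * net - DECAY * (a - REST)
    return min(MAX, max(MIN, a + delta))


def compete(head_start, cycles=100, ext=1.0):
    """Two units inhibit each other and get equal external input, but unit A
    starts receiving it `head_start` cycles before unit B."""
    a = b = REST
    for t in range(cycles):
        ext_b = ext if t >= head_start else 0.0
        # Only positive activations are transmitted to other units.
        net_a = ESTR * ext - GAMMA * max(0.0, b)
        net_b = ESTR * ext_b - GAMMA * max(0.0, a)
        a, b = iac_step(a, net_a), iac_step(b, net_b)  # synchronous update
    return a, b


if __name__ == "__main__":
    print(compete(head_start=0))  # perfectly symmetric: the two units tie
    print(compete(head_start=2))  # A settles high, B is pushed below rest
```

With identical timing the two units tie; a two-cycle head start is enough for A to suppress B. This is the sense in which Burglar's early activation in the Ken case gives it an advantage over Bookie and Pusher.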

Most people got the basic idea here, but some forgot to contrast the "Sharks and 20s" case with the "Ken" case, and (again) some people were not specific about which individuals were most involved in activating Burglar.

1C [15 points]: [after removing the Lance-Burglar connections] Explain how the model was able to fill in the correct occupation for Lance. Also, explain why the model tends to activate the Div. (divorced) unit as well as the Mar. (married) unit.

Lance's correct occupation (Burglar) is still activated because the instance units that share several of Lance's properties (Al, Jim, John, George) also happen to share the same occupation (i.e., the network generalizes on the basis of similarity). The divorced unit is partially activated because two of the instances that are similar to Lance (Jim, George) are divorced rather than married.
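The similarity-based fill-in can be traced along the network's actual pathway (active property units to instance units, then back to the occupation and marital-status units) with a single feedforward pass. This is a deliberate simplification under stated assumptions: the individuals Y1 to Y4 and their property values are hypothetical, the probe stands for Lance's remaining properties once his own instance unit has activated them, each instance unit's input is just the count of probe properties it matches, and within-pool inhibition and settling are ignored.

```python
from collections import defaultdict

# Hypothetical toy individuals (NOT the real database). "J1"/"J2" stand for
# occupations; the probe plays the role of Lance with his occupation link cut.
PEOPLE = {
    "Y1": {"gang": "G1", "age": "A1", "edu": "E1", "marital": "Mar", "job": "J1"},
    "Y2": {"gang": "G1", "age": "A1", "edu": "E1", "marital": "Div", "job": "J1"},
    "Y3": {"gang": "G1", "age": "A2", "edu": "E1", "marital": "Mar", "job": "J1"},
    "Y4": {"gang": "G2", "age": "A2", "edu": "E2", "marital": "Sing", "job": "J2"},
}
PROBE = {"gang": "G1", "age": "A1", "edu": "E1", "marital": "Mar"}  # no "job"


def instance_support(probe, people):
    """Input to each instance unit: number of probe properties it shares."""
    return {
        name: sum(1 for slot, value in probe.items() if props[slot] == value)
        for name, props in people.items()
    }


def property_support(probe, people, slot):
    """Input each value in `slot` receives back from the instance units."""
    inst = instance_support(probe, people)
    support = defaultdict(int)
    for name, props in people.items():
        support[props[slot]] += inst[name]
    return dict(support)


if __name__ == "__main__":
    print(property_support(PROBE, PEOPLE, "job"))      # {'J1': 10, 'J2': 0}
    print(property_support(PROBE, PEOPLE, "marital"))  # {'Mar': 7, 'Div': 3, 'Sing': 0}
```

The missing occupation is recovered because the most similar instances all share J1. Div picks up partial support from the one close match (Y2) who is divorced. In the real network, Lance's own instance unit also supports Married directly, and within-pool inhibition keeps Married ahead.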

Many people did well here too, although some forgot to address the partial activation of the divorced unit. Some described the network in general terms (the network "looked for" similar individuals or "understood" Lance's properties) rather than describing how the network actually worked.

2. Schemas

2A [10 points]: What does each of the notions of variable, value, and default value correspond to within Rumelhart et al.'s PDP formulation of a schema?

This is more-or-less straight from the text (p. 33): "The variables of a schema correspond to those parts of the pattern that are not completely determined by the remainder of the structure of the pattern itself" and, thus, "vary from one situation to another." The specific pattern these portions take in a given situation corresponds to the value assigned to the variable. "Default values represent the subpatterns over variables that tend to get filled-in in the absence of any specific input."

People had a little trouble with the notion of a variable in terms of connectionist networks. Most were fine with "value", but many seemed to think that the default value was the initial activation. Also, some definitions of a variable (and of its values) were overly restricted. For example, a value need not be the activation of a single unit among a set of mutually exclusive alternatives; it could be a particular pattern of activity over the units. Finally, whenever you quote from a text, it is important to cite the source.

2B [10 points]: Summarize briefly how, according to Rumelhart et al., schema embedding is instantiated in a constraint satisfaction network. That is, what are the conditions under which a particular collection of descriptors should be considered to constitute a subschema?

On p. 35, Rumelhart et al. state: "Under our interpretation, subschemata correspond to small configurations of units which cohere and which may be a part of many different stable patterns (and therefore constitute a schema on their own right)." So the critical property is that the descriptors within a subschema must cohere (come and go together) across many different contexts. Rumelhart et al. use relative goodness across contexts as a measure of coherence.
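Relative goodness can be computed directly from the weights. The sketch below assumes one common form of the goodness measure: the sum of w_ij * a_i * a_j over distinct pairs of units, plus each unit's bias times its activation. Some formulations sum over ordered pairs, which doubles the weight term, or add an external-input term. The three-unit network is hypothetical: units u and v form a candidate subschema, and c is a context unit held on.

```python
from itertools import combinations, product

# Hypothetical 3-unit network: u and v are the candidate subschema, c is context.
UNITS = ["u", "v", "c"]
W = {("u", "v"): 2.0, ("u", "c"): 0.5, ("v", "c"): 0.5}  # symmetric weights
BIAS = {"u": -1.5, "v": -1.5, "c": 0.0}


def goodness(state):
    """Goodness of a state: sum over distinct pairs of w_ij*a_i*a_j
    plus sum over units of bias_i*a_i (assumed form; no external input)."""
    pair_term = sum(W.get((i, j), 0.0) * state[i] * state[j]
                    for i, j in combinations(UNITS, 2))
    bias_term = sum(BIAS[i] * state[i] for i in UNITS)
    return pair_term + bias_term


def four_configurations(context=1.0):
    """Goodness of u/v on or off together and separately, with c fixed."""
    labels = {(1, 1): "both", (0, 0): "neither", (1, 0): "u only", (0, 1): "v only"}
    return {
        labels[(u, v)]: goodness({"u": u, "v": v, "c": context})
        for u, v in product((0, 1), repeat=2)
    }


if __name__ == "__main__":
    print(four_configurations())
```

In this toy case, "both" and "neither" (goodness 0.0) each beat "u only" and "v only" (goodness -1.0). The point of 2B is that a genuine subschema should show this ordering across many different contexts, not just one.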

Students generally did well on this question though there were a few who did not grasp that an embedded schema needed to be stable across multiple contexts; in fact, some suggested that a subschema was part of or "belonged to" a specific schema. Also, some answers were more-or-less restatements of the properties of traditional schemas rather than the reformulations of these concepts in constraint-satisfaction networks.

2C [20 points]: Critically evaluate whether or not the "windows" and "drapes" features do, in fact, form a subschema. Be sure to provide evidence from running the simulation (e.g., goodness values) to support your argument.

The text (see Figure 13) operationalizes a subschema as a set of units for which, across many contexts, the goodness is higher when they occur together or not at all than when they occur separately. For window and drapes this is clearly true only in the office prototype (and perhaps weakly in the bathroom prototype); the effect is not found in the kitchen, bedroom, or living room prototypes. The natural way to provide evidence for this is to list the goodness values of the four combinations in the context of each of the five room prototype patterns, as in the following table:

configuration     office   kitchen   living room   bedroom   bathroom
both               19.62     17.20         17.56     15.05       4.54
neither            19.78     16.05         10.74      9.27       4.09
window only        19.35     16.36         14.04     12.20       4.04
drapes only        19.11     15.94         13.28     11.18       3.64

Thus, it could be argued that window and drapes do not cohere within "many different stable patterns" and so do not satisfy the criterion described in 2B. (If, instead, you argued that they did form a subschema, we didn't necessarily take points off as long as you provided valid evidence for your claim, for example by also examining the "empty" prototype.)
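The comparison in the table can be checked mechanically. The sketch below encodes the goodness values from the table and, for each room prototype, computes the margin by which the worse of the "together" configurations (both or neither) beats the better of the "separate" ones; a positive margin means the subschema criterion holds in that context. The function name and the margin definition are illustrative choices, not part of the simulation software.

```python
# Goodness values copied from the table above (window/drapes configurations
# evaluated in the context of each room prototype).
GOODNESS = {
    "office":      {"both": 19.62, "neither": 19.78, "window only": 19.35, "drapes only": 19.11},
    "kitchen":     {"both": 17.20, "neither": 16.05, "window only": 16.36, "drapes only": 15.94},
    "living room": {"both": 17.56, "neither": 10.74, "window only": 14.04, "drapes only": 13.28},
    "bedroom":     {"both": 15.05, "neither": 9.27,  "window only": 12.20, "drapes only": 11.18},
    "bathroom":    {"both": 4.54,  "neither": 4.09,  "window only": 4.04,  "drapes only": 3.64},
}


def coherence_margin(g):
    """min(together) - max(apart): positive iff both 'both' and 'neither'
    have higher goodness than either single-feature configuration."""
    together = min(g["both"], g["neither"])
    apart = max(g["window only"], g["drapes only"])
    return together - apart


if __name__ == "__main__":
    for room, g in GOODNESS.items():
        margin = coherence_margin(g)
        verdict = "coheres" if margin > 0 else "does not cohere"
        print(f"{room:12s} margin {margin:+.2f}  {verdict}")
```

Only the office (+0.27) and, barely, the bathroom (+0.05) contexts give positive margins. In the kitchen, living room and bedroom, having neither feature on scores below having the window alone.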

Many people had difficulty with this question. They had the idea that goodness needed to be higher when both features were on than when either was on by itself. However, they didn't always show that having neither on is better than having only one on. Also, they tended to give very little evidence as to whether this was actually true of window and drapes. Perhaps the main difficulty was that most people didn't evaluate goodness in enough different contexts. Some people made the mistake of using the standard examples (e.g., "desk (office)", which just clamps the desk unit on) instead of the prototype examples (e.g., "office (prototype)", which forces all of the units into their states in the prototype, except for those you've turned on or off explicitly), which leads to somewhat strange results.

As a general rule, it is important in this course to support your claims with evidence: the relevant data from running the simulation. In this case, that means listing the actual goodness values of configurations in different contexts, and perhaps aspects of the weight values (e.g., strong positive weights between the units in a subschema, and relatively weaker weights between them and other units).

Also, as a side comment, many people referred to the prototype patterns (the patterns evoked by "oven", "bed", etc.) as the "office schema", the "bedroom schema", etc. The "office schema" is not a specific pattern of activity. Rather, if it corresponds to anything, it corresponds to the knowledge the network has about the collection of features that tend to occur in offices. A particular activation pattern, such as the prototype office pattern, is more like a specific office (and can be thought of as an "instantiation" of the office schema).

2D [20 points]: [after combining two incompatible features] Identify the ways in which the combined final pattern differs from each of the single-feature patterns, and try to explain those differences. Does the pattern produced by one of the features predominate, or is the mix fairly even, and why? How does the goodness value of the combined pattern compare with those of the single-feature patterns, and why?

Many different choices of incompatible units are possible. In general, the pattern produced by one of the units tends to dominate, in that the combined or "hybrid" pattern that results from clamping both units on is much more similar to the pattern produced by the dominating unit alone than to the pattern produced by the other unit alone. (This is especially true if one of the descriptors comes from the bathroom schema, such as bathtub or toilet, because the bathroom pattern contains very few active descriptors and so is particularly weak.) Typically, the dominant pattern is the one with higher goodness when its feature is presented in isolation. Even so, the hybrid pattern will show evidence of contamination by the other, non-dominant pattern, particularly in the strongest aspects of that pattern (i.e., the strongest positive or negative weights from the incompatible unit), which will increase or decrease the activation of some descriptors relative to the dominant pattern. These changes also give the hybrid pattern lower goodness than the dominant pattern (and often lower than the non-dominant pattern as well), because the altered activation pattern violates some of the constraints reflected in the original (dominant) pattern. Descriptors that tend to occur in more than one room type (e.g., television, which occurs in both the bedroom and the living room) produce only weak interference because they are not strongly committed to any particular set of co-occurring descriptors.

Some choices of features lead to a more balanced mix of the two patterns, in which the strongest aspects of each are preserved and the hybrid has lower overall goodness than either pattern produced by clamping only one of the units. The same basic reasoning about the causes of the changes still applies.
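One way to quantify "dominance" and "contamination" is to compare the hybrid pattern, unit by unit, with each single-feature pattern. This sketch assumes activations in [0, 1] and uses mean absolute difference as the distance; the eight-unit patterns and the 0.2 threshold for flagging a changed descriptor are hypothetical choices for illustration.

```python
def distance(p, q):
    """Mean absolute difference between two activation patterns."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def compare_hybrid(hybrid, pattern_a, pattern_b, threshold=0.2):
    """Return which single-feature pattern the hybrid is closer to, both
    distances, and the indices of descriptors that moved by more than
    `threshold` away from the dominant pattern."""
    da, db = distance(hybrid, pattern_a), distance(hybrid, pattern_b)
    label = "A" if da <= db else "B"
    dominant = pattern_a if label == "A" else pattern_b
    changed = [i for i, (h, d) in enumerate(zip(hybrid, dominant))
               if abs(h - d) > threshold]
    return label, da, db, changed


# Hypothetical 8-descriptor patterns: unit 0 is A's clamped feature,
# unit 4 is B's clamped feature.
PATTERN_A = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
PATTERN_B = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HYBRID = [1.0, 1.0, 0.75, 0.5, 1.0, 0.25, 0.0, 0.0]

if __name__ == "__main__":
    print(compare_hybrid(HYBRID, PATTERN_A, PATTERN_B))
    # ('A', 0.25, 0.75, [2, 3, 4, 5])
```

Here the hybrid is much closer to pattern A. The flagged units (B's clamped unit 4, plus units 2, 3 and 5) are where B's constraints have pulled activations away from A; in the real network these changes would be traced back to the largest positive and negative weights from B's clamped unit.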

As in 2C, in answering this question it was important to make careful observations about the behavior of the network under the various conditions, and then to explain those differences in terms of the underlying operation of the network---in this case, the patterns of strong positive and negative weights from the two units in question.

This question was also difficult for many students. Common problems included not relating goodness differences to which pattern dominated, not examining the activation differences carefully enough to figure out what was going on, and not trying to explain those differences in terms of the weights from each of the units. A couple of people used "bed" and "oven" even though the question explicitly said not to.