85-419/719 Intro to PDP: Homework 1 Feedback
This material is intended to give you some sense of what I was looking
for in grading the homework---the sample answers that are provided are not the
only acceptable ones. For some questions, it was necessary to base the
grading on a somewhat subjective sense of the depth of your understanding of
the relevant issues.
1. Jets and Sharks
- 1A [10 points]: Why are some of Ken's properties more strongly
activated than others?
Ken's name unit activates the Ken_in instance unit which, in turn, activates
the rest of Ken's properties (Sharks, 20s, HS, Single, Burglar). These
property units, in turn, partially activate the instance units of other
individuals who are similar to Ken in that they share many of his properties
(e.g., Nick, Neal).
These partially activated instance units then provide additional support to
their own property units, some of which support Ken's properties and some of
which compete with his properties, causing differing levels of final
activation.
People generally did well on this question, although some made only general
statements about the network without reasoning specifically about interactions
among particular sets of units (e.g., instance units of people who share
properties with Ken).
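
As a rough illustration of the dynamics described in the answer to 1A, here is a minimal sketch of an interactive activation and competition (IAC) update on a toy network. It is not the Jets and Sharks network: the unit names (name_A, inst_A, inst_B, shared_1..3, only_A, only_B), the connectivity, and the function run_iac are all hypothetical, and the parameter values are assumed to match the usual IAC defaults, so check them against the course software before relying on them.

```python
# Toy IAC network: two instance units that share three properties and
# differ on a fourth (only_A vs. only_B, which compete in one pool).
# All names and connections are invented for illustration.
MAX_ACT, MIN_ACT, REST, DECAY = 1.0, -0.2, -0.1, 0.1  # assumed defaults
ALPHA, GAMMA, ESTR = 0.1, 0.1, 0.4                    # assumed defaults

UNITS = ["name_A", "inst_A", "inst_B",
         "shared_1", "shared_2", "shared_3", "only_A", "only_B"]

# Bidirectional excitatory links between an instance and its own units.
EXCITATORY = [("name_A", "inst_A"), ("inst_A", "only_A"), ("inst_B", "only_B")]
EXCITATORY += [(inst, f"shared_{k}")
               for inst in ("inst_A", "inst_B") for k in (1, 2, 3)]

# Bidirectional inhibitory links between units in the same pool.
INHIBITORY = [("inst_A", "inst_B"), ("only_A", "only_B")]


def run_iac(external, cycles=200):
    """Run synchronous IAC updates; `external` maps unit name -> input."""
    act = {u: REST for u in UNITS}
    for _ in range(cycles):
        # Only positive activations are passed along connections.
        out = {u: max(a, 0.0) for u, a in act.items()}
        net = {u: ESTR * external.get(u, 0.0) for u in UNITS}
        for a, b in EXCITATORY:
            net[a] += ALPHA * out[b]
            net[b] += ALPHA * out[a]
        for a, b in INHIBITORY:
            net[a] -= GAMMA * out[b]
            net[b] -= GAMMA * out[a]
        for u in UNITS:
            if net[u] > 0:
                delta = (MAX_ACT - act[u]) * net[u]   # drive toward max
            else:
                delta = (act[u] - MIN_ACT) * net[u]   # drive toward min
            delta -= DECAY * (act[u] - REST)          # decay toward rest
            act[u] = min(MAX_ACT, max(MIN_ACT, act[u] + delta))
    return act


if __name__ == "__main__":
    for unit, a in run_iac({"name_A": 1.0}).items():
        print(f"{unit:9s} {a:+.3f}")
```

In this toy setup the shared property units receive input from both instance units, whereas only_A receives input from inst_A alone, so the shared properties should settle at a higher activation. That is the same kind of asymmetry the sample answer attributes to partially activated instance units.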
- 1B [10 points]: Why are some of the instance units more active
than others?
Individuals differ in the degree to which they share properties with Ken (and
with each other), and hence the degree to which activation of Ken's properties
(via his instance unit) provides support for the instance units of those
individuals. In particular, although Nick, Neal, Rick, Earl, Pete and Fred
all share three features with Ken, Nick and Neal share four properties with
each other but only three with Rick and two with Earl, Pete and Fred, so they
support each other (via the property units) more than either supports Rick,
Earl, Pete or Fred. Pete and Fred are only very weakly active because most
people who are similar to Ken (and, hence, partially active) are Sharks rather
than Jets.
Some people just said that certain instance units received more input from
the property units without explaining why (i.e., not mentioning that this
depended on the level of overlap between the individuals' properties and those
of Ken and other individuals similar to Ken). Very few people mentioned that
the relative degree of similarity among the partially activated individuals
also mattered.
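
The two kinds of overlap discussed for 1B (overlap with Ken, and overlap among the partially activated individuals themselves) can be tabulated with a few lines of code. The sketch below is only illustrative: Ken's properties are taken from the answer to 1A, but person_x, person_y and person_z are hypothetical individuals invented for the example, not entries from the actual Jets and Sharks table.

```python
from itertools import combinations

# Ken's properties as listed in the answer to 1A.
KEN = frozenset({"Sharks", "20s", "HS", "Single", "Burglar"})

# Hypothetical individuals, invented for illustration only.
OTHERS = {
    "person_x": frozenset({"Sharks", "30s", "HS", "Single", "Pusher"}),
    "person_y": frozenset({"Sharks", "30s", "HS", "Single", "Bookie"}),
    "person_z": frozenset({"Jets", "20s", "HS", "Married", "Burglar"}),
}


def overlap(a, b):
    """Number of properties two individuals have in common."""
    return len(a & b)


# Overlap of each individual with Ken.
with_ken = {name: overlap(KEN, props) for name, props in OTHERS.items()}

# Overlap between each pair of the other individuals.
mutual = {
    (m, n): overlap(OTHERS[m], OTHERS[n])
    for m, n in combinations(sorted(OTHERS), 2)
}

if __name__ == "__main__":
    print("shared with Ken:", with_ken)
    print("shared with each other:", mutual)
```

Here all three hypothetical individuals share three properties with Ken, but person_x and person_y share four with each other and only one with person_z. By the argument in the sample answer, they would be expected to support each other through the property units more than either supports person_z.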
- 1C [10 points]: [after providing input to Sharks and 20s units]
Explain why the occupation units show partial activations of units other
than Ken's occupation, which is Burglar.
The Sharks and 20s input causes a greater degree of partial activation among a
number of instance units other than Ken (e.g., Pete, Fred, Nick, Neal)
compared with the case in which the Ken name unit alone is provided as input.
Two of these instances support Bookie and two support Pusher, so these
alternative occupations receive significant support. Also, when the Ken name
unit is input, the Burglar unit is activated much earlier than when Sharks and
20s are input, and this early activation provides an additional advantage in
the competition with the Bookie and Pusher units.
Most people got the basic idea here, but many forgot to contrast the
"Sharks and 20s" case with the "Ken" case.
- 1D [10 points]: [after removing the Lance-Burglar connections]
Describe how the model was able to fill in
the correct occupation for Lance. Also, explain why the model tends to
activate the Div. (divorced) unit as well as the Mar.
(married) unit.
Lance's correct occupation (Burglar) is still activated because the
individuals who share several of Lance's properties (Al, Jim, John, George)
also happen to share his occupation, so their instance units support the
Burglar unit (i.e., the network generalizes on the
basis of similarity). The divorced unit is partially activated because two of
the instances that are similar to Lance (Jim, George) are divorced rather than
married.
Most people did well here too, although some forgot to address the partial
activation of the divorced unit.
2. Schemas
- 2A [10 points]: What does each of the notions of
variable, value, and default value correspond
to within Rumelhart et al.'s PDP formulation of a schema?
This is more-or-less straight from the text (p. 33): ``The variables
of a schema correspond to those parts of the pattern that are not completely
determined by the remainder of the structure of the pattern itself'' and, thus,
``vary from one situation to another.'' The specific pattern these portions
take in a given situation corresponds to the value assigned to the
variable. ``Default Values represent variable subpatterns that tend
to get filled-in in the absence of any specific input.''
People had a little trouble with the notion of a variable in terms of
connectionist networks. Most were good with "value", but many seemed to think
that the default was the initial activation. Also, some definitions of a
variable were overly restricted. For example, a value need not be the
activation of a single unit among a set of mutually exclusive
alternatives---it could be a particular pattern of activity over the units.
- 2B [10 points]: Summarize briefly how, according to Rumelhart and
colleagues, schema embedding is instantiated in a constraint satisfaction
network. That is, what does it mean for a particular collection of
descriptors to be a subschema?
On p. 35, Rumelhart et al. state ``Under our interpretation, subschemata
correspond to small configurations of units which cohere and which may be a
part of many different stable patterns (and therefore constitute a schema on
their own right).'' So the critical property is that the descriptors within a
subschema have to cohere (come and go together) across many different
contexts. Rumelhart et al. use relative goodness across contexts as a measure
of coherence.
Students generally did well on this question, though a few
did not grasp that an embedded schema needs to be stable across multiple
contexts.
- 2C [20 points]: Based on your answer to 2B, critically
evaluate the claim that the "windows" and "drapes" features do, in
fact, form a subschema. Be sure to provide evidence from running the
simulation (e.g., goodness values) to support your argument.
The text (see Figure 13) operationalizes a subschema as a set of units for
which the goodness is higher when they occur together or not at all than when
they occur separately. This is clearly true for ``windows'' and ``drapes''
only in the office schema (and maybe weakly in the bathroom); the effect is
not found in the bedroom and living room schemas. (The natural way to provide
evidence for this is to give the goodness values of the four combinations in
the context of each of the five room prototype patterns.) Thus it could be
argued that ``windows'' and ``drapes'' do not cohere within ``many different
stable patterns'' and so don't satisfy the criteria in 2B. (If, instead, you
argued that they did form a subschema, I didn't necessarily take points off as
long as you provided valid evidence for your claim.)
Many people had difficulty with this question. They had the idea that
goodness needed to be higher when both features were on than when either
was on by itself. However, they didn't always show that having
neither on is better than having only one on. Also, they tended to give
very little evidence to test whether this was true of windows and drapes.
Moreover---and this was perhaps the main difficulty---most people didn't seek
to evaluate goodness in different contexts. In fact, many people believed that
a subschema ``belonged'' to a particular schema (that, for example, ``windows
and drapes'' was a subschema of the office schema).
As a general rule, it is important to support your claims with
evidence---in this case, that means listing the actual goodness values of
configurations in different contexts.
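
The four-way comparison described for 2C can be sketched in code. The example below is a toy, not the room network: the units (context_1, context_2, feature_a, feature_b), the weights and the biases are made up, each context is reduced to a single clamped unit standing in for a whole room prototype, and goodness is assumed to be the sum of the weights between co-active units plus the biases of the active units. The function names goodness and coheres are hypothetical, and the test in coheres is one reading of "together or not at all beats separately".

```python
from itertools import product

# Symmetric weights between pairs of units (made-up values).
WEIGHTS = {
    frozenset(("feature_a", "feature_b")): 1.0,
    frozenset(("context_1", "feature_a")): -0.25,
    frozenset(("context_1", "feature_b")): -0.25,
    frozenset(("context_2", "feature_a")): 1.5,
    frozenset(("context_2", "feature_b")): -0.5,
}
BIAS = {"feature_a": -0.5, "feature_b": -0.5}  # made-up values


def goodness(active):
    """Goodness of a binary state, given the set of units that are on."""
    units = sorted(active)
    total = sum(BIAS.get(u, 0.0) for u in units)
    for i, u in enumerate(units):
        for v in units[i + 1:]:
            total += WEIGHTS.get(frozenset((u, v)), 0.0)
    return total


def coheres(context, a="feature_a", b="feature_b"):
    """Compare the four on/off combinations of a and b in one context.

    Returns (flag, table): flag is True when both-on and both-off each
    have higher goodness than either feature on alone.
    """
    table = {}
    for a_on, b_on in product((0, 1), repeat=2):
        state = {context}
        if a_on:
            state.add(a)
        if b_on:
            state.add(b)
        table[(a_on, b_on)] = goodness(state)
    together = min(table[(1, 1)], table[(0, 0)])
    apart = max(table[(1, 0)], table[(0, 1)])
    return together > apart, table


if __name__ == "__main__":
    for ctx in ("context_1", "context_2"):
        flag, table = coheres(ctx)
        print(ctx, "coheres:", flag, table)
```

With these made-up weights the pair passes the test in context_1 but fails it in context_2, because feature_a alone is strongly supported there. A pair that passes in only one context would not count as cohering across ``many different stable patterns''.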
- 2D [20 points]: Find another combination of features that, based
on your intuitions of the contents of the various room types, ought to
form a subschema but do not. Explain your choice of features and
provide evidence for your claims (as in 2C). Does the pattern of
weights among units...provide any insight into why the network behaves
as it does?
Many choices of candidate subschema are reasonable; common ones were
``book'' and ``bookshelf'', or ``desk'' and ``desk-chair''. As mentioned
above, I was looking for you to argue that these (or other) features do not
form a subschema using evidence gathered by running simulations. A
natural way to do this would be to repeat 2C, determining goodness values for
each standard schema with various combinations of your proposed features. For
some choices of units, the weights between them might not be strongly positive
(or might even be negative), or they might have very strong weights to other
units that were strongly associated with a particular schema; either pattern
would help to explain why the two units didn't "cohere" across a range of
contexts, as a proper subschema would.
This question was also problematic for many students. Common problems
were: 1) not considering the case where both members of the subschema are off;
2) only discussing the subschema in a single context; 3) not considering that
individual items might occur independently---for example, ``bed'' and
``clock'' were a frequent choice, because they often occur together, even
though it's evident that clock occurs without bed in many contexts; and 4) not
providing evidence from simulations.