Early attractor networks used step activation functions, allowing only two activation values for each processing element. Contemporary networks, however, often use sigmoidal activation functions. Continuous outputs allow for continuous target patterns, and these have at least two apparent advantages: (1) they allow for richer representational schemes, such as mapping activations to probabilities, and (2) they allow for a larger number of attractors. Attractor network models, like those for lexical semantics (Clouse and Cottrell, 1995) and consciousness (Mathis and Mozer, 1995), would seem to benefit from continuous patterns, and other models using continuous vectors, like the CHARM memory model (Metcalfe Eich, 1982), would benefit from attractor dynamics.
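The contrast between the two kinds of activation function can be made concrete with a minimal sketch. The function names, the 0/1 output coding for the step unit, and the choice of the logistic function as the sigmoid are assumptions made for illustration; the text does not specify them.

```python
import math

def step(net, threshold=0.0):
    """Step activation: only two output values are possible (0/1 coding assumed)."""
    return 1.0 if net >= threshold else 0.0

def sigmoid(net):
    """Logistic sigmoid (assumed form): output varies continuously within (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

if __name__ == "__main__":
    # Compare the two activations on a few net-input values.
    for net in (-2.0, 0.0, 2.0):
        print(net, step(net), sigmoid(net))
```

With the step unit, every target component must be one of the two extreme values; with the sigmoid unit, any value strictly between 0 and 1 is reachable in principle, which is what makes continuous targets look attractive.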
In this work we argue that these apparent advantages of continuous targets are illusory, at least when attractors are being learned using standard methods. Our simulation results indicate that attractor networks, even those with continuous activation functions, are best suited for use with target vectors consisting of polarized discrete elements.
Rigorous experiments on small networks, ranging in size from 2 units
to 16 units, revealed that the largest number of attractors was learned
when targets were placed in the extreme corners of activation space. An
example of this result, for 16-unit networks, is shown in
Figure 1. Furthermore, attractors were learned
faster (i.e., after fewer presentations) with corner targets.
Lastly, corner target patterns resulted in attractors that were
closer, in Euclidean distance, to their corresponding targets
than those obtained with the other target distributions.
Figure 1 - Fraction of the training set of target attractors actually learned as a function of the training set size.
The same results held for larger networks of 100 processing elements trained for shorter durations, and for sparse target patterns.
In short, attractor networks with sigmoidal units show higher capacity, faster learning, and greater accuracy when targets are placed in the extreme corners of activation space.
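The accuracy measure mentioned above, the Euclidean distance between a settled attractor and its target, can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the procedure used in the experiments: the synchronous update rule, the logistic sigmoid, the helper names `settle` and `is_learned`, the tolerance of 0.1, and the hand-picked weights in the usage example.

```python
import math

def sigmoid(net):
    """Logistic sigmoid (assumed activation function)."""
    return 1.0 / (1.0 + math.exp(-net))

def settle(weights, biases, state, steps=50):
    """Synchronously update every unit `steps` times (assumed dynamics)."""
    for _ in range(steps):
        state = [
            sigmoid(sum(w * s for w, s in zip(row, state)) + b)
            for row, b in zip(weights, biases)
        ]
    return state

def euclidean_distance(a, b):
    """Euclidean distance between two activation vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_learned(weights, biases, target, tolerance=0.1):
    """Hypothetical criterion: start at the target, settle, and check
    that the final state is still within `tolerance` of the target."""
    final = settle(weights, biases, list(target))
    return euclidean_distance(final, target) <= tolerance

if __name__ == "__main__":
    # Hand-picked (not learned) weights: two self-exciting units.
    weights = [[10.0, 0.0], [0.0, 10.0]]
    biases = [-5.0, -5.0]
    print(is_learned(weights, biases, [1.0, 1.0]))  # corner target
    print(is_learned(weights, biases, [0.7, 0.7]))  # interior target
```

In this toy network the state started at the corner target stays close to it, while the state started at the interior target drifts toward the corner. A criterion of this kind is only a stand-in; a fuller test would also start from perturbed states to confirm that the fixed point is stable.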
Mathis, D. W. and Mozer, M. C. (1995). On the computational utility of consciousness. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 11-18, Denver. MIT Press.
Metcalfe Eich, J. (1982). A composite holographic associative recall model. Psychological Review, 89(6):627-661.
Noelle, D. C., Cottrell, G. W., and Wilms, F. R. (1997). Extreme attraction: The benefits of corner attractors. Technical Report CS97-536, Department of Computer Science & Engineering, UCSD.