Toward a Universal Law of Generalization for Psychological Science. Shepard. Science 1987.

  1. “A psychological space is established for any set of stimuli by determining metric distances between the stimuli such that the probability that a response learned to any stimulus will generalize to any other is an invariant monotonic function of the distance between them.  To a good approximation, this probability of generalization (i) decays exponentially with this distance, and (ii) does so in accordance with one of two metrics, depending on the relation between the dimensions along which the stimuli vary.  These empirical principles are mathematically derivable from universal principles of natural kinds and probabilistic geometry that may, through evolutionary internalization, tend to govern the behaviors of all sentient organisms.”
  2. Psychology is about generalization, because nothing happens exactly the same way twice
    1. This idea, though, is often treated as a secondary topic in psychology.
    2. This generalization occurs according to some sort of metric
  3. Aristotle’s principle of association by resemblance goes back 2000 years, but it was only studied more formally starting in the early 1900s with Pavlov (a response was conditioned to an original whistle or bell, and he then tested other bells and whistles of differing levels of similarity)
  4. Since Pavlov, experiments have commonly centered on “‘gradients of stimulus generalization’ relating the strength, probability, or speed of a learned response to some measure of difference between each test stimulus and the original training stimulus.”
    1. Accurate measurement of these gradients began in 1956, when Guttman and Kalish built on Skinner’s operant conditioning work
    2. The author (Shepard) then expanded on this by testing people on a passive, noisy n-to-n association task; gradients appeared where the items’ distributions under the mapping were similar
  5. These gradients were originally defined in terms of hand-designed features (such as the wavelength of light emitted by each button in a set of buttons), but in some cases generalization was nonmonotonic (such as for tones separated by an octave) or varied across individuals, species, and stimuli in differing ways
  6. Lashley, along with others such as Robert R. Bush and Frederick Mosteller, doubted that there would ever be an invariant law of generalization
  7. The idea was that, instead of measuring stimuli by objective physical properties (such as the wavelength of the light), one should measure them according to how that physical parameter space maps onto the individual’s psychological space.
  8. More specifically, consider whether there is “… an invariant monotonic function whose inverse will uniquely transform those data into numbers interpretable as distances in some appropriate metric space? … Thus, in a K-dimensional space, the distances between points within each subset of K+2 points must satisfy definite conditions…”
  9. The function must be unique given the constraints this sets up: “Provided that the number, n, of points in a space is not too small relative to the number of dimensions of the space, the rank order of the n(n-1)/2 distances among those n points permits a close approximation to the distances themselves, up to multiplication by an arbitrary scale factor.”
  10. This unknown function can be determined by “nonmetric” multidimensional scaling.  “The plot of the generalization measures gij against the distances dij between points in the resulting configuration is interpreted as the gradient of generalization.  It is a psychological rather than psychophysical function because it can be determined in the absence of any physical measurements on the stimuli.”
  11. The input is a matrix P whose entries record how confusable each pair of stimuli is, and MDS is commonly run on a normalized version of that matrix
    1. Applying this to data from all sorts of experiments, even on different animals, yields basically the same exponential decay function.  MDS does not force this shape; it is a regularity in the data itself that MDS picks up on
  12. Nonmetric MDS assumes only that the function is monotonic, not that it has any particular shape, so when a low-dimensional configuration could not give a monotonic gradient, going up to higher dimensional representations has done the trick.
    1. There is an interesting discussion of what exactly this yields for colors (for example, hues should be 2D so a circle can be formed connecting red and violet instead of putting them at opposite ends of a line) and for tones
  13. When reasonable dimensions can be defined (such as lightness and saturation for color), those usually come closest to the MDS results.  Sometimes different metrics are needed though, such as Euclidean or Manhattan (city-block)
  14. “Are these regularities of the decay of generalization in psychological space and of the implied metric of that space reflections of no more than arbitrary design features… Or do they have a deeper, more pervasive source?  I now outline a theory of generalization based on the idea that these regularities may be evolutionary accommodations to universal properties of the world.”
  15. Different organisms have different things they have to attend to in order to survive, and how finely they need to distinguish between particular stimuli varies.  This holds from both evolutionary and individual perspectives.
  16. Assume psychological space has some dimension K.  Color, for example, might be 3D (lightness, hue, saturation)
  17. The exponential law is derived from a set of assumptions about how an organism considers this feature space.
    1. All locations of the consequential region (the set of stimuli that share the training stimulus’s consequence) are equally probable
    2. The probability that the region has size s is given by a density function p(s).  The exact form of p(s) doesn’t make much of a difference in practice, for reasonable distributions
    3. The region is convex and “centrally symmetric” (it has a center point such that reflecting any point of the region through that center lands on another point of the region).  As with the probability distribution, the results are for the most part quite robust to the particular shape of the region
  18. The theory of generalization described “applies only to the highly idealized experiment in which generalization is tested immediately after a single learning trial with a novel stimulus.”  In other settings, such as very long training on very similar stimuli or delayed testing, the empirical results deviate from what is discussed here, which may happen in a few ways:
    1. Instead of exponential, an inflected Gaussian function
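    2. “deviation away from rhombic and toward elliptical curves of equal generalization” (equal-distance contours are diamond-shaped under the L1 / city-block metric and circular or elliptical under the L2 / Euclidean metric, so this is a shift toward Euclidean behavior)
  19. Brief discussion of how to extend the theory to deal with these cases (such as how to deal with sharply bounded “consequential regions”)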
  20. “We generalize from one situation to another not because we cannot tell the difference between the two situations but because we judge that they are likely to belong to a set of situations having the same consequence.”
  21. “probability of generalization approximates an exponential decay function of distance in psychological space”
  22. “to the degree that the spreads of consequential stimuli along orthogonal dimensions of that space tend to be correlated [or uncorrelated], psychological distances in that space approximate the Euclidean or non-Euclidean metrics associated, respectively, with the L2- and L1-norms for that space.”
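
As a rough numerical check on the derivation in item 17, here is a minimal sketch in one dimension. It averages, over region sizes s, the chance that a randomly placed interval of length s that contains the training stimulus also contains a test stimulus at distance d. Two things are my own assumptions rather than something stated in the notes above: the size prior p(s) is taken to be an Erlang density with shape 2, and the function names (`erlang_density`, `overlap_fraction`, `generalization`) are made up for illustration. Under that particular prior the integral works out to exp(-2d / mean_size), so the numeric and closed-form columns should agree closely.

```python
import math


def erlang_density(s, mean_size):
    """Assumed prior p(s) over region size: Erlang, shape 2, given mean."""
    rate = 2.0 / mean_size
    return rate * rate * s * math.exp(-rate * s)


def overlap_fraction(s, d):
    """1-D case: among all placements of an interval of length s that contain
    the training stimulus, the fraction that also contain a point at distance d.
    This is (s - d) / s when s > d, and 0 otherwise."""
    return max(0.0, 1.0 - d / s)


def generalization(d, mean_size=1.0, upper=40.0, steps=40000):
    """Midpoint-rule estimate of g(d) = integral of p(s) * overlap(s, d) ds.

    `upper` truncates the integral; it should be many times `mean_size`.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width  # midpoints are strictly positive, so no division by zero
        total += erlang_density(s, mean_size) * overlap_fraction(s, d)
    return total * width


if __name__ == "__main__":
    # Compare the numerical integral with the exponential exp(-2 d / mean_size).
    for d in (0.0, 0.5, 1.0, 2.0):
        numeric = generalization(d)
        closed_form = math.exp(-2.0 * d)
        print(f"d={d:.1f}  numeric={numeric:.4f}  exponential={closed_form:.4f}")
```

Swapping in a different reasonable p(s) is a one-function change, which makes it easy to see for yourself how robust the roughly exponential shape is; the exact match to an exponential is specific to the Erlang choice assumed here.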