Untangling Invariant Object Recognition. DiCarlo, Cox. TRENDS in Cognitive Sciences 2007.

I read this because the paper is often pointed to as evidence of slow features doing object recognition, but there is actually very little mention of SFA.

  1. Considers object recognition, and “…show that the primate ventral visual processing stream achieves a particularly effective solution in which single-neuron invariance is not the goal.”
  2. “Because brains compute with neurons, the subject must have neurons somewhere in its nervous system — ‘read-out’ neurons — that can successfully report if object A was present […].”
  3. “The central issues are: what is the format of the representation used to support the decision (the substrate on which the decision functions operate); and what kinds of decision functions (i.e. read-out tools) are applied to that representation?”
  4. “… we treat object recognition fundamentally as a problem of data representation and re-representation, and we use simple decision functions (linear classifiers) to examine those representations.”
  5. Object recognition is fundamentally difficult, because “… each image projected into the eye is one point in an ~1 million dimensional retinal ganglion cell representation (…).”
  6. Manifolds: “… all the possible retinal images that face could ever produce… form a continuous, low-dimensional, curved surface inside the retinal image space called an object ‘manifold’…”
  7. Consider two potential image manifolds that “… do not cross or superimpose — they are like two sheets of paper crumpled together… We argue that this describes the computational crux of ‘everyday’ recognition: the problem is typically not a lack of information or noisy information, but that the information is badly formatted in the retinal representation — it is tangled (…).”
  8. “One way of viewing the overarching goal of the brain’s object recognition machinery, then, is as a transformation from visual representations that are easy to build (e.g. center-surround filters in the retina), but are not easily decoded … into representations that we do not yet know how to build (e.g. representations in IT), but are easily decoded…”
  9. “… single neurons at the highest level of the monkey ventral visual stream — the IT cortex — display spiking responses that are probably useful for object recognition.  Specifically, many individual IT neurons respond selectively to particular classes of objects, such as faces or other complex shapes, yet show some tolerance to changes in object position, size, pose, and illumination, and low-level shape cues.”
  10. Using <what I suppose are actual> neural responses from ~200 neurons in IT (the highest part of the ventral visual stream), simple linear classifiers on population activity were robust to variation in object position and size.  More sophisticated classifiers didn’t improve performance much, and were ineffective when applied directly to V1 activity.
    1. V1 doesn’t allow for simple separation and identification of objects but IT does.
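The contrast in item 10 can be sketched with toy data. Below, a simple perceptron-style linear read-out is trained on two synthetic populations: an “IT-like” one where identity is linearly decodable, and a “V1-like” one where one object manifold surrounds the other (tangled). The data, the classifier, and all names here are my own illustration, not the paper’s actual analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_readout_accuracy(X, y, epochs=200, lr=0.1):
    """Train a perceptron-style linear classifier; its training accuracy
    is a rough proxy for how linearly separable (untangled) X is."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> update
                w += lr * yi * xi
                b += lr * yi
    return np.mean(np.sign(X @ w + b) == y)

n = 200
# "IT-like": two well-separated clusters (identity is linearly decodable
# despite within-class variation)
X_it = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
# "V1-like": same two objects, but tangled -- one manifold (a ring)
# surrounds the other (a blob), so no hyperplane separates them
theta = rng.uniform(0, 2 * np.pi, n)
inner = rng.normal(0, 0.5, (n, 2))
outer = np.c_[3 * np.cos(theta), 3 * np.sin(theta)] + rng.normal(0, 0.3, (n, 2))
X_v1 = np.vstack([inner, outer])
y = np.r_[-np.ones(n), np.ones(n)]

print(linear_readout_accuracy(X_it, y))   # high: untangled representation
print(linear_readout_accuracy(X_v1, y))   # much lower: tangled representation
```

The point is not the classifier itself (the paper also finds fancier decision functions add little) but that the same simple read-out succeeds or fails depending on the format of the representation.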
  11. The classical approach to understanding visual processing is to consider the action of individual neurons — here they instead consider the activity of entire populations in the representation of visual images.
  12. According to this perspective, the best way to build object representations is to find transformations that separate (untangle) the manifolds.
  13. Because populations are considered, no single neuron has to represent things on its own (although if a single neuron can split the manifolds, so much the better).  This matters because “… real IT neurons are not, for example, position and size invariant, in that they have limited spatial receptive fields <as opposed to wide spatial receptive fields, as would be required if single neurons were actually responsible for representing invariances>[…].  It is now easy to see that this ‘limitation’ <of relying on populations of neurons as opposed to single neurons> is an advantage.”
  14. Manifold flattening may be hard-wired, but the means to do it may also be learned from the statistics of natural images. <The latter must play at least part of the role>
  15. Progressive flattening by increasing depth in visual system
    1. “… we believe that the most fruitful computational algorithms will be those that a visual system (natural or artificial) could apply locally and iteratively at each cortical processing stage (…) in a largely unsupervised manner (…) and that achieve some local object manifold flattening.”
  16. In the actual visual system, each layer projects information into a space of increasing dimension (“… e.g. ~100 times more V1 neurons than retinal ganglion neurons …”).  If the projection is good, moving the data to higher dimensions makes simpler methods of analysis and classification more effective.
    1. Additionally, at each stage the distribution of activity is “… allocated in a way that matches the distribution of visual information encountered in the real world …”, which means the projection into higher dimensions actually uses the additional space reasonably.
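The dimensionality-expansion point in item 16 is easy to demonstrate: data that no hyperplane can separate in the raw space can become linearly separable after a fixed nonlinear projection into more dimensions. The quadratic feature expansion below is my stand-in for a V1-like expansion, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tangled 2-D data: object A is a blob at the center, object B is a ring
# around it -- no separating hyperplane exists in the raw space.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
inner = rng.normal(0, 0.5, (n, 2))
outer = np.c_[3 * np.cos(theta), 3 * np.sin(theta)] + rng.normal(0, 0.3, (n, 2))
X = np.vstack([inner, outer])
y = np.r_[-np.ones(n), np.ones(n)]

def lstsq_readout_accuracy(F, y):
    """Least-squares linear read-out; accuracy measures linear separability."""
    F1 = np.c_[F, np.ones(len(F))]           # append a bias column
    w, *_ = np.linalg.lstsq(F1, y, rcond=None)
    return np.mean(np.sign(F1 @ w) == y)

def expand(X):
    """Fixed nonlinear projection from 2 to 5 dimensions
    (quadratic features; a toy stand-in for an expanding cortical stage)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.c_[x1, x2, x1**2, x2**2, x1 * x2]

print(lstsq_readout_accuracy(X, y))          # low: tangled in 2-D
print(lstsq_readout_accuracy(expand(X), y))  # high: separable after expansion
```

In the expanded space the squared coordinates carry the radius, so a plain linear read-out can threshold on it — the extra dimensions were “used reasonably” in exactly the sense of item 16.1.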
  17. They add that temporal information helps untangle manifolds: “In the language of object tangling, this is equivalent to saying that temporal image evolution spells out the degrees of freedom of object manifolds.  The ventral stream might use this temporal evolution to achieve progressive flattening of object manifolds across neuronal processing stages.  Indeed, recent studies in our laboratory … have begun to connect this computational idea with biological vision, showing that invariant object recognition can be predictably manipulated by the temporal statistics of the environment.”
    1. This is where SFA is cited
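Since SFA only gets this one citation, here is a minimal linear SFA sketch of the idea in item 17: whiten the observed signals, then take the direction whose temporal differences have the least variance. The toy setup (one slow latent mixed with one fast nuisance signal into “pixel” channels) is my own assumption, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

T = 2000
t = np.arange(T)
slow = np.sin(2 * np.pi * t / 500)   # slow latent (e.g. object pose/identity)
fast = rng.normal(size=T)            # fast nuisance signal
# Observed channels: a random linear mixture of both sources plus noise.
A = rng.normal(size=(2, 5))
X = np.c_[slow, fast] @ A + 0.01 * rng.normal(size=(T, 5))

# Linear SFA: whiten, then diagonalize the covariance of the temporal
# differences; the eigenvector with the smallest eigenvalue is the
# direction that changes most slowly.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = U * np.sqrt(T)                    # whitened: unit variance, decorrelated
dZ = np.diff(Z, axis=0)
evals, evecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))
slow_est = Z @ evecs[:, 0]            # smallest eigenvalue = slowest feature

corr = abs(np.corrcoef(slow_est, slow)[0, 1])
print(corr)
```

The recovered feature tracks the slow latent almost perfectly: temporal continuity alone, with no labels, pulls out the variable that “spells out the degrees of freedom” of the stimulus — which is the computational link the paper gestures at.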
