AutoIncSFA and Vision-based Developmental Learning for Humanoid Robots. Kompella, Pape, Masci, Frank, Schmidhuber. International Conference on Humanoid Robotics, 2011.

Related to the other Incremental SFA paper here. There seems to be a bit of overlap between the two, and parts of this work are included in the IncSFA software package.

  1. Learns from high-dimensional raw inputs and maps them to a low-dimensional space representing slowly changing features
  2. When doing RL on robots, working in the raw input space will generally lead to prohibitively long learning periods
  3. AutoIncSFA is different from the original Incremental SFA in that it also employs an autoencoder (AE)
    1. “The AE performs spatial compression while IncSFA extracts spatiotemporal features that change slowly over time.”
  4. There is previous work that is somewhat similar <definitely not the same>, but it only works in batch mode
  5. “Sometimes, however, the slowest component is not the most intuitive one; for example when observing an object that moves in front of a camera and occasionally leaves the field of view, the slowest feature is the presence/absence of the object, not its position.”
  6. Gives the example of a robot watching a board move toward and away from it in its visual field. Batch SFA has a few problems in this setting (IncSFA overcomes the first two <and sort of obviates the last>):
    1. Covariance estimates done in batch SFA (but not in incremental SFA) are very expensive on high-dimensional data
    2. If for some reason the signal doesn’t change between time steps, this will lead to “… singularity errors, since the matrix won’t have full rank.  This is an important problem in humanoid robot applications where often only a small part of the input image changes.”
    3. In many cases there may be some features that change very slowly but aren’t relevant to the task <I’m not sure how IncSFA doesn’t suffer from this as well though>
    4. “Generalization properties of Hierarchical SFA are limited.  For example, training H-SFA on one human interactor, but testing on another, may yield erroneous output.”
      1. <H-SFA is commonly used because it’s too expensive to work in the raw data space (primarily because of the covariance estimates); the hierarchy allows, for example, a higher-order polynomial representation that is somewhat sparsified. One of the arguments for IncSFA elsewhere is that, because it doesn’t compute covariance matrices, hierarchical representations are unnecessary: it’s cheap enough to use directly on the full input.>
  7. “SFA uses principal component analysis (PCA) […] twice.  In the first stage, PCA whitens the signal to decorrelate it with unit variance along each PC direction.  In the second stage, PCA on the derivative of the whitened signal yields slow features.”
    1. IncSFA replaces both instances of PCA use with incremental methods.  Each time a different PCA algorithm is used.
      1. For the first PCA, Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA) is used.
      2. For the second PCA, a different method is used because the slow features are the least significant components. That method is Minor Component Analysis (MCA). “To extract multiple minor components in parallel, it uses MCA with sequential addition […].”
  8. For hierarchical IncSFA, the lower levels must converge before the upper levels can be trained, so learning takes a long time
    1. <This is in addition to the fact that (admittedly by the authors) IncSFA already uses each sample less efficiently than batch SFA>
  9. Autoencoders (ANNs used for dimensionality reduction) can speed up convergence by shrinking the input that IncSFA has to learn from
    1. <Maybe the method used in the paper is hierarchical after all? Still not clear. It’s interesting because other IncSFA papers from this lab say that one of the major benefits of IncSFA is that its improved computational complexity (as a function of problem dimension) obviates the need for hierarchical methods.>
    2. Autoencoders are basically like neural multiplexors: there is a narrow internal (bottleneck) layer, and the network is symmetric top to bottom. The network is trained to produce outputs that are the same as its inputs. Once it does, the network is cut in half (so the output is now the narrow bottleneck layer), and the reduced-dimensionality signals from there are used instead of the raw input
    3. “Applied to image patches […], AEs learn biologically plausible Gabor-like filters resembling the responses of the striate mammalian cortex […].” They can also be used to perform nonlinear PCA
  10. The resulting system isn’t really hierarchical in the sense of having repeated layers of ANNs or SFAs, but the autoencoder sits “below” the SFA, since its output is the input to SFA
  11. In the experiment, a person walks toward and away from a <stationary?> robot
    1. The image is 83 x 100 pixels, and the AE has 100 hidden units
    2. 3,000 images used
  12. <Ah,> they compare hierarchical SFA (H-SFA) to AutoIncSFA
  13. They allowed for some “distractor” events to occur during training, such as the opening and closing of a door, and other people walking by
  14. They check the output of the first slow feature as the person in the image moves away (or toward?)
    1. The output of HSFA is basically flat with a few spikes in between
    2. The output of AutoIncSFA increases roughly linearly with distance
    3. <They say> “We see that H-SFA completely fails…” <but that’s not really fair, because it’s an unsupervised method, and the slow features aren’t created in a way that is necessarily related intuitively to what is going on in the scene. On the other hand, it is mentioned in a number of places that batch SFA will overfit to spurious events, since their rarity alone can make them serve as very slowly varying features.>
  15. “Analyzing the output features of H-SFA, we found that most features code for slowly occurring noisy elements, such as the door opening or closing and people passing by.  Only subsequent units encode the interactor position, which usually gets mixed with higher frequencies of the first units.”
  16. AutoIncSFA also generalizes well: training with one person on camera and testing with another results in essentially the same output. As mentioned earlier, H-SFA often has problems with this
  17. In the next experiment the robot is looking down at a table with a cup on it. The robot moves its hand somewhat randomly until the cup is knocked over, so the task is episodic
  18. AutoIncSFA basically learns an on/off threshold feature for whether the cup is standing or fallen
  19. Then in a second experiment they introduce a bottle as well as a cup: the first slow feature codes exclusively for whether the cup is standing or fallen (as a step function), and the second feature codes exclusively for whether the bottle is standing or fallen
  20. Moves on to a discussion of intrinsically motivated agents, citing the “Formal Theory of Fun and Creativity” <not going to read this part carefully.>
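The two-stage PCA description of batch SFA quoted in the notes (whiten, then take the minor components of the derivative) is compact enough to sketch directly. This is a minimal linear version (no nonlinear expansion), assuming NumPy, a signal stored as a (T, D) array, and first differences standing in for the time derivative; `batch_sfa`, the `eps` cutoff, and the synthetic sine signals are my own illustration, not code from the paper. The cutoff shows one common way to dodge the singularity problem mentioned for batch SFA: directions with (near-)zero variance are simply dropped before whitening.

```python
import numpy as np

def batch_sfa(x, n_features=1, eps=1e-8):
    """Hypothetical linear batch SFA on a signal x of shape (T, D)."""
    x = x - x.mean(axis=0)
    # Stage 1: PCA whitening. Directions whose variance is (near) zero are
    # dropped; otherwise dividing by sqrt(eigenvalue) blows up when parts of
    # the input never change (rank-deficient covariance).
    evals, evecs = np.linalg.eigh(np.atleast_2d(np.cov(x, rowvar=False)))
    keep = evals > eps
    z = x @ (evecs[:, keep] / np.sqrt(evals[keep]))
    # Stage 2: PCA on the time derivative (first differences) of z.
    dz = np.diff(z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.atleast_2d(np.cov(dz, rowvar=False)))
    # eigh sorts eigenvalues in ascending order, so the first columns are the
    # directions of smallest derivative variance, i.e. the slowest features.
    return z @ d_evecs[:, :n_features]

# Synthetic example: a slow and a fast sine, linearly mixed, plus one
# channel that never changes.
t = np.linspace(0, 4 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(11 * t)])
x = sources @ np.array([[1.0, 0.5], [0.3, 1.0]]).T
x = np.column_stack([x, np.ones_like(t)])
slow = batch_sfa(x, n_features=1)[:, 0]
print(abs(np.corrcoef(slow, np.sin(t))[0, 1]))  # should be close to 1
```

The constant third channel is there on purpose: without the cutoff, the whitening step would divide by the square root of a zero eigenvalue.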
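For the first (whitening) PCA, IncSFA uses CCIPCA. Below is a minimal sketch of the standard CCIPCA update (Weng, Zhang and Hwang, 2003), assuming NumPy and zero-mean samples; IncSFA would also track the mean incrementally, which is left out here. Each component estimate is a running average of "votes" u u^T v / ||v||, and each sample is deflated before it updates the next component, so no covariance matrix is ever formed. That is the property that keeps it cheap on high-dimensional input. The function name and the test data are hypothetical.

```python
import numpy as np

def ccipca(samples, n_components, amnesic=0.0):
    """Sketch of CCIPCA on zero-mean samples of shape (N, D).

    Returns an (n_components, D) array whose rows estimate lambda_i * e_i:
    the row norm estimates the eigenvalue, the row direction the eigenvector.
    amnesic=0.0 is a plain running average; a positive value weights recent
    samples more (only sensible once n is well above it).
    """
    v = np.zeros((n_components, samples.shape[1]))
    for n, x in enumerate(samples, start=1):
        u = np.asarray(x, dtype=float).copy()
        for i in range(min(n_components, n)):
            if n == i + 1:
                # Component i is initialized with the (deflated) i-th sample.
                v[i] = u
            else:
                # Weighted average of the old estimate and the new vote.
                w_old = (n - 1 - amnesic) / n
                w_new = (1 + amnesic) / n
                v[i] = w_old * v[i] + w_new * u * (u @ v[i]) / np.linalg.norm(v[i])
            # Deflation: remove the projection of u onto component i before
            # u is used to update component i + 1.
            e = v[i] / np.linalg.norm(v[i])
            u = u - (u @ e) * e
    return v

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.5])  # variances 9, 1, 0.25
v = ccipca(data, n_components=2)
print(np.linalg.norm(v, axis=1))                      # eigenvalue estimates
print(v / np.linalg.norm(v, axis=1, keepdims=True))   # eigenvector estimates
```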
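For the second PCA, IncSFA needs the least significant components of the derivative signal. The paper's exact MCA-with-sequential-addition rule is not reproduced here. Instead, this is a simplified stand-in I am substituting for illustration: normalized stochastic gradient descent on w^T C w (C being the derivative covariance). Component i gets an extra penalty on its overlap with components 0..i-1 (all updated in parallel), so it settles on the next-slowest direction. It assumes NumPy, already-whitened input, and a `penalty` larger than the largest derivative-covariance eigenvalue; the function name, learning rate, and test signal are hypothetical.

```python
import numpy as np

def incremental_minor_components(deriv_samples, n_components, lr=0.5,
                                 penalty=1.0, epochs=1, seed=0):
    """Estimate minor eigenvectors of E[dz dz^T], one sample at a time.

    Simplified stand-in for the MCA step (NOT the exact IncSFA update).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_components, deriv_samples.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(epochs):
        for dz in deriv_samples:
            for i in range(n_components):
                # Instantaneous gradient of w^T (dz dz^T) w (up to a factor 2).
                grad = (W[i] @ dz) * dz
                # Sequential addition: C_i = C + penalty * sum_{j<i} w_j w_j^T,
                # which lifts the directions already found out of the way.
                for j in range(i):
                    grad = grad + penalty * (W[j] @ W[i]) * W[j]
                W[i] -= lr * grad
                W[i] /= np.linalg.norm(W[i])  # project back onto the unit sphere
    return W

# Hypothetical already-whitened input: a slow and a fast unit-variance signal.
t = np.linspace(0, 40 * np.pi, 4000)
z = np.sqrt(2) * np.column_stack([np.sin(t), np.sin(11 * t)])
W = incremental_minor_components(np.diff(z, axis=0), n_components=2)
print(np.round(np.abs(W), 2))  # rows should be close to [1, 0] and [0, 1]
```

The renormalization step is what keeps this stable; plain anti-Hebbian minor-component rules without it are known to drift in norm.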
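The autoencoder idea (train a network with a narrow middle layer to reproduce its input, then keep only the encoder half) can be sketched in a few lines. This is a toy version under my own assumptions: NumPy, one tanh hidden layer, a linear output layer, and full-batch gradient descent on the mean squared reconstruction error. The paper's network (83 x 100 image inputs, 100 hidden units) and its training details are not reproduced; `train_autoencoder`, the sizes, and the synthetic 20-D data below are hypothetical.

```python
import numpy as np

def train_autoencoder(x, n_hidden, lr=0.1, steps=2000, seed=0):
    """One-hidden-layer autoencoder trained to reproduce its input.

    Returns the encoder half (input -> bottleneck code) and the loss history.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)       # encoder: narrow (bottleneck) layer
        err = (h @ W2 + b2) - x        # decoder output minus target (= input)
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def encode(data):
        # "Cutting the network in half": keep only the encoder.
        return np.tanh(data @ W1 + b1)

    return encode, losses

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 3))
x = latent @ rng.normal(size=(3, 20)) / np.sqrt(3)  # 20-D data on a 3-D subspace
encode, losses = train_autoencoder(x, n_hidden=3)
codes = encode(x)  # shape (500, 3)
```

Here `codes` plays the role of the reduced-dimensionality signal that AutoIncSFA passes to IncSFA instead of the raw pixels.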
