Autonomous Learning of Abstractions Using Curiosity-Driven Modular Incremental Slow Feature Analysis. Kompella, Luciw, Stollenga, Pape, Schmidhuber. Development and Learning and Epigenetic Robotics (ICDL), 2012

  1. Presents “…a modular, curiosity-driven learning system that autonomously learns multiple abstract representations.” from vision data
  2. Based on RL and Incremental Slow Feature Analysis
  3. In order to do RL successfully in high-dimensional domains, feature reduction is important
  4. “Slow-features depend on how the inputs change over time […] and can be used to learn behavior-specific <my emphasis> abstractions[…]”
  5. Call their framework curiosity-driven modular incremental slow feature analysis (Curious Dr. MISFA) <you have to be kidding me>.  This enables learning multiple slow feature modules for behavior specific abstractions
    1. Good – this helps deal with one of the 2 fundamental problems I have with SFA
  6. Based on hierarchical RL where each option also has an associated feature abstraction component
    1. Makes a nod to Konidaris’ skill trees
    2. The key here though is that it is designed to work in high dimensional spaces without expert training
  7. No extrinsic motivation <so probably a pure exploratory setting>
    1. Reward occurs when error of model it builds is low
    2. Once accuracy reaches a certain level, the abstraction is crystallized and added to the abstraction library
  8. “Slow features are useful for RL.  SFs approximate proto-value functions […] from sampled observations of a … MDP.  They are approximations of the low-order eigenvectors of the graph Laplacian matrix representing the MDP.”
  9. The high-level architecture is similar to one proposed by Andy and Ozgur also based on intrinsic motivation <I don’t think I’ve read that paper though>
    1. Environment consists of internal and external environments
    2. The internal one generates the rewards for intrinsic motivations
  10. The internal environment also takes care of keeping track of options
    1. Options have <pre defined?> initialization and termination states
    2. Also have pre-defined exploration policies <seems like here it's just a random policy>
    3. There is also a feature reduction/abstraction component related to each option
  11. Termination can only occur in a state where there is at least one option that initializes there (no controllers on the primitive level)
  12. The goal of the algorithm is to build the abstraction library
  13. Options are referred to here as internal states, since they are determined by what occurs in the internal environment (which in turn depends on what happens in the external environment)
    1. Basically the state the agent cares about is simply the option that is being executed
  14. When the option reaches a terminal state, the agent can either select the same option again, or randomly switch to another policy that initializes from that external state
    1. <So it's doing a random walk in option-space>
  15. The model is based on the slow features, but it is nonstationary because the slow features change as data is obtained <especially because the incremental, as opposed to batch, algorithm is used>, so a supervised learning algorithm that can deal with nonstationarity is needed
  16. Agent builds a model of the option dynamics
  17. And then there is an epsilon-greedy policy based on LSPI
  18. Once error reaches a threshold, the module is frozen and epsilon is reset to learn a new module
  19. <Seems like once a state can be covered by a crystallized option no reward is generated, but once a region is reached that isn’t covered by an option, reward resumes?>
  20. Empirical results on states that have dynamics that look like sine waves <skipping>
  21. Next set of empirical results is for their simulated humanoid robot
  22. Two options and therefore 2 states, which involve moving one shoulder joint or the other.
    1. Moving either joint will knock over one of two cups stacked on a table in front of the robot
  23. It learns one option (eventually encoding whether the cup is standing or knocked over), then that stops generating intrinsic reward, and then moves onto the other option, until that is learned as well
  24. The algorithm “… can prevent abstraction duplication and therefore permit generalization.  It does not need to know the number of useful abstractions in advance.  New ones are created once existing ones are insufficient to encode/predict input signals well.”
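
To make the "slow feature" idea in the notes above concrete, here is a minimal sketch of plain batch, linear SFA. This is not the incremental algorithm (IncSFA) the paper uses, and the function name `linear_sfa` and the toy sine data are my own assumptions for illustration. The idea: whiten the input, then find the directions along which the whitened signal changes least from one time step to the next.

```python
import numpy as np

def linear_sfa(x, n_features=1):
    """Batch linear SFA sketch. x has shape (T, D), rows ordered in time.

    Returns (W, mean) such that y = (x - mean) @ W holds the slowest features.
    """
    mean = x.mean(axis=0)
    xc = x - mean
    # Step 1: whiten the data so every direction has unit variance.
    cov = xc.T @ xc / (len(xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10                      # drop degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    z = xc @ whiten
    # Step 2: covariance of the temporal differences of the whitened signal.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / (len(dz) - 1)
    # eigh returns eigenvalues in ascending order, so the first
    # eigenvectors are the directions that change most slowly.
    _, dvecs = np.linalg.eigh(dcov)
    W = whiten @ dvecs[:, :n_features]
    return W, mean

# Toy input (an assumption, not the paper's data): two linear mixtures
# of a slow and a fast sine wave.
t = np.linspace(0, 2 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37 * t)
x = np.column_stack([slow + fast, slow - 2 * fast])

W, mean = linear_sfa(x, n_features=1)
y = (x - mean) @ W
print(abs(np.corrcoef(y[:, 0], slow)[0, 1]))  # close to 1 if the slow source was recovered
```

The sign and scale of a slow feature are arbitrary, which is why the check uses the absolute correlation with the slow source rather than comparing values directly.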

Followup paper: An Intrinsic Value System for Developing Multiple Invariant Representations with Incremental Slowness Learning

Goes more into computational properties and “neurophysiological analogs fulfilling similar functional roles.”  Will probably be taking notes less thoroughly here.

  1. “Experimental results on synthetic observations and on the iCub robot show that the intrinsic value system is essential for representation learning.  Representations are typically explored and learned in order from least to most costly, as predicted by the theory of curiosity.”
  2. <Asking experts whether worthwhile to read neuro parts before I invest more time.>
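
The freeze-on-threshold loop from the first paper's notes, and the least-to-most-costly learning order quoted above, can be illustrated with a toy sketch. Everything here is my own assumption rather than the authors' method: each module's model error simply shrinks by a fixed factor (`learn_rates`) each time its option is run, the intrinsic reward is taken to be the drop in error (learning progress), a running average stands in for LSPI, epsilon decays over time, and only options whose module is not yet frozen can be chosen.

```python
import random

def run_toy_curiosity(learn_rates, threshold=0.05, eps0=0.5,
                      eps_decay=0.99, alpha=0.5, seed=0):
    """Toy curiosity loop (hypothetical, not the paper's algorithm).

    learn_rates[i] in (0, 1): fraction of module i's error removed per visit.
    Returns (order in which modules were frozen, final errors).
    """
    rng = random.Random(seed)
    n = len(learn_rates)
    error = [1.0] * n        # model error of each module
    value = [0.0] * n        # running estimate of intrinsic reward per option
    frozen_order = []        # stands in for the abstraction library
    eps = eps0
    while len(frozen_order) < n:
        active = [i for i in range(n) if i not in frozen_order]
        # Epsilon-greedy choice among options that are not yet frozen.
        if rng.random() < eps:
            i = rng.choice(active)
        else:
            i = max(active, key=lambda j: value[j])
        # Running the option improves its module's model a little.
        new_error = error[i] * (1.0 - learn_rates[i])
        reward = error[i] - new_error      # assumed reward: learning progress
        error[i] = new_error
        value[i] += alpha * (reward - value[i])
        eps *= eps_decay
        if error[i] < threshold:
            frozen_order.append(i)         # freeze ("crystallize") the module
            eps = eps0                     # reset exploration for the next one
    return frozen_order, error

order, final_error = run_toy_curiosity([0.2, 0.05, 0.01])
print(order)
```

In this toy one would expect the cheapest module (largest `learn_rates` entry) to freeze first, since it yields the most learning progress early on, but the sketch does not guarantee that order.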
