Structure in the Space of Value Functions. Foster, Dayan. Machine Learning 2002.

  1. Considers the case where the dynamics are fixed but the rewards may change – the method needs optimal value functions for at least a few instantiations of the rewards
  2. Standard hierarchical RL does temporal abstraction.
    1. The approach here instead performs structural abstraction, which is state aggregation
  3. A common method is to group states together, which is often lossy (Michael has a paper on when this isn’t lossy)
  4. Here they propose augmenting state information with structural abstraction, as opposed to throwing away the original state and only using the aggregated state
    1. This avoids the sub-optimality and partial-observability problems that pure aggregation can introduce
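A minimal sketch of the distinction (all names here are hypothetical, not from the paper): replacing a state with its aggregate label is lossy, while keeping the original state alongside the label throws nothing away.

```python
def aggregate_label(state):
    # Hypothetical state-aggregation map: which 2x2 "room" a cell of a
    # 4x4 grid belongs to.
    row, col = state
    return (row // 2, col // 2)

# Lossy: two distinct states collapse to the same representation.
assert aggregate_label((0, 0)) == aggregate_label((1, 1))

# Lossless augmentation: keep the original state next to the label, so no
# information is discarded and no partial observability is introduced.
def augmented_state(state):
    return (state, aggregate_label(state))

assert augmented_state((0, 0)) != augmented_state((1, 1))
```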
  5. Generally structural abstraction is done top-down, but here a bottom-up approach is proposed
    1. Do unsupervised learning on collections of optimal value functions <how is that unsupervised then?>
  6. “…value functions, particularly for a few different goals, contain substantial information about the functional texture of the underlying MDP, amounting, in many cases, to an appropriate structural decomposition of the problem.”
  7. When doing unsupervised learning on the structure, some parameters to the classifier are goal-independent and some are goal-dependent
  8. <Not really focusing on how they do their unsupervised learning, as I think it can be treated mostly as a black-box for conceptual purposes; based on model fitting via EM of Gaussian mixture model>
  9. <Their boundaries are surprisingly sharp given that they are using mixtures of Gaussians; not sure how they got that to work.  There are only a couple of cells that are incorrectly labelled>
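Treating the paper's EM/Gaussian-mixture fit as a black box, the core idea can still be sketched: give each state a feature vector of its optimal values under a few different goals, then cluster those vectors. This toy stand-in (my own construction, with a plain k-means instead of the paper's mixture model) uses two 4-state "rooms" joined by a costly doorway.

```python
gamma = 0.8
n_states = 8

def steps(s, g):
    # Hypothetical distance: crossing the doorway between states 3 and 4
    # adds 3 extra steps.
    d = abs(s - g)
    if (s <= 3) != (g <= 3):
        d += 3
    return d

# Each state's feature vector: its (toy) optimal value under each goal.
goals = [0, 7]
features = [[gamma ** steps(s, g) for g in goals] for s in range(n_states)]

def kmeans(points, k=2, iters=20):
    # Plain k-means, seeded with the first and last points.
    centers = [list(points[0]), list(points[-1])]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = []
        for pt in points:
            dists = [sum((p - q) ** 2 for p, q in zip(pt, c)) for c in centers]
            labels.append(dists.index(min(dists)))
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

labels = kmeans(features)
# States in the same room have similar value profiles across the two goals,
# so they land in the same fragment.
assert labels[0] == labels[1] == labels[2] == labels[3]
assert labels[4] == labels[5] == labels[6] == labels[7]
assert labels[0] != labels[4]
```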
  10. When increasing resolution (same world topology with a finer grid) basically the same results fall out
  11. RL approach used here is actor-critic
  12. For the critic, both the state and the chunk/fragmentation label are used (each component gets its own learning rate)
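One way to read that (a sketch of my own, not the paper's exact parameterisation): the critic's estimate sums a per-state weight and a per-fragment weight, and the same TD error updates both, each with its own learning rate.

```python
alpha_state = 0.1     # assumed learning rate for the fine per-state weights
alpha_fragment = 0.5  # assumed (faster) rate for the coarse fragment weights

v_state = {}     # per-state component of the critic
v_fragment = {}  # per-fragment component of the critic

def value(state, frag):
    return v_state.get(state, 0.0) + v_fragment.get(frag, 0.0)

def td_update(state, frag, reward, next_state, next_frag, gamma=0.9):
    # One TD(0) error drives both components, at different rates.
    delta = reward + gamma * value(next_state, next_frag) - value(state, frag)
    v_state[state] = v_state.get(state, 0.0) + alpha_state * delta
    v_fragment[frag] = v_fragment.get(frag, 0.0) + alpha_fragment * delta
    return delta

# One toy transition with reward 1: delta = 1, so the state weight moves by
# 0.1 and the fragment weight by 0.5.
td_update("s0", "roomA", 1.0, "s1", "roomA")
```

The appeal of the faster fragment-level rate is that every state in a fragment shares that weight, so value information generalises across the fragment quickly while the per-state weights refine it slowly.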
  13. They find a fragmentation from just 2 states set as goal, and then evaluate performance where any other state can be the goal (so the training and test goals are disjoint)
  14. The learned fragmentation helps: it outperforms both no fragmentation and a random fragmentation
  15. So this example had just 1 level of fragmentation; it’s possible to have multiple levels of fragmentation: “A natural generalisation is to a hierarchical fragmentation in which states are free to belong simultaneously to many fragments at different levels, reflecting, for example, their position in rooms, in particular parts of rooms, and the like.”
  16. Here they consider a 2-level hierarchy
  17. “The solution to the problem of fitting a hierarchy of contributing experts is under-constrained and so, … we first train at the higher level and then train lower levels on the residual error from the higher level.  The hierarchical model incorporates prior information that the flat model did not: that there should be a certain number of low-level fragments tied to each high-level fragment.”
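The residual idea in that quote can be sketched on a toy value function (all names and numbers hypothetical): fit coarse per-high-fragment means first, then fit finer per-low-fragment means to what the coarse level leaves unexplained.

```python
values = {0: 1.0, 1: 0.9, 2: 0.3, 3: 0.2}           # toy V(s)
high = {0: "left", 1: "left", 2: "right", 3: "right"}  # high-level fragments
low = {0: "l-a", 1: "l-b", 2: "r-a", 3: "r-b"}         # low-level fragments

def fit_means(targets, assignment):
    # Least-squares fit of one constant per fragment = the group mean.
    groups = {}
    for s, v in targets.items():
        groups.setdefault(assignment[s], []).append(v)
    return {g: sum(vs) / len(vs) for g, vs in groups.items()}

# Stage 1: the high level captures the coarse structure.
high_means = fit_means(values, high)   # left: 0.95, right: 0.25
# Stage 2: the low level is trained on the residual error from the high level.
residual = {s: v - high_means[high[s]] for s, v in values.items()}
low_means = fit_means(residual, low)

# Together the two levels reconstruct V exactly here, since each low-level
# fragment contains a single state.
for s, v in values.items():
    assert abs(high_means[high[s]] + low_means[low[s]] - v) < 1e-9
```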
  18. The fragmentation found by the hierarchical decomposition seems sensible.  As you would expect, learning is considerably sped up
  19. <I still don’t grok how it’s unsupervised.  Too sleep deprived to go back and figure that out; I think the basic points are what matter for right now anyway.>
  20. Mentions a few issues with the unsupervised learning they are doing
  21. Here there is no risk of partial observability, but the hypothesis space searched while learning any particular instantiation of the problem is larger than in the flat problem
    1. <This may not be a problem as it seems to help speed up learning even though it is more complex>
