Learning Methods for Sequential Decision Making with Imperfect Representations. Thesis by Shivaram Kalyanakrishnan.


The basic idea: how to do RL in continuous spaces when the available representation can't capture what we would really like it to.

Chapter 1:

  1. Compares value-based and policy search (PS) methods (Sarsa(λ) and CMA-ES, which seems not too far from the cross-entropy method)
  2. Discusses using Sarsa to seed policies for PS methods (a rough sketch of that hand-off is below)
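
To make that comparison concrete for myself, here is a minimal sketch of cross-entropy-style policy search (standing in for CMA-ES), seeded with a weight vector that could come from a Sarsa learner. The evaluate_policy stub and every parameter below are my own placeholders, not anything from the thesis.

```python
import numpy as np

def evaluate_policy(theta):
    # Stand-in objective so the sketch runs; a real evaluator would roll out
    # episodes of a policy parameterized by `theta` and return average reward.
    return -np.sum((theta - 1.0) ** 2)

def cross_entropy_search(theta_init, n_iters=50, pop_size=30, elite_frac=0.2, sigma=1.0):
    """Cross-entropy-style policy search; theta_init could come from Sarsa."""
    mean = np.array(theta_init, dtype=float)
    std = np.full(len(mean), sigma)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample a population of candidate policies around the current mean.
        population = mean + std * np.random.randn(pop_size, len(mean))
        returns = np.array([evaluate_policy(theta) for theta in population])
        elites = population[np.argsort(returns)[-n_elite:]]
        # Refit the sampling distribution to the elite candidates.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

# Seeding: start the search from weights learned by Sarsa instead of from scratch.
sarsa_weights = np.zeros(4)   # placeholder for a Sarsa-learned weight vector
best = cross_entropy_search(sarsa_weights)
```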

Chapter 2:

  1. State aliasing is when two actually distinct states are perceived as the same (as in POMDPs).  This can actually be helpful when generalization between the two states is useful
  2. Really solving POMDPs isn’t possible in large domains, but in some domains the partial observability doesn’t kill you and you might as well act based on the observations directly
    1. Belief states are basically impossible to deal with in continuous state spaces anyway
  3. Discusses PS on Keepaway, but a version that uses an FA for each action (probably estimating a value for each) and then picking the action with the highest output (see the sketch after this list)
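
My reading of that policy representation, as a minimal sketch: one linear function approximator per action over the observation features, with the action chosen by the highest output, and the concatenated weights forming the parameter vector a PS method would optimize directly. The class name, feature count, and action count are all made up by me.

```python
import numpy as np

class PerActionFAPolicy:
    """One linear FA per action; act greedily on the per-action outputs."""

    def __init__(self, n_features, n_actions):
        self.n_features = n_features
        self.n_actions = n_actions
        self.weights = np.zeros((n_actions, n_features))

    def set_params(self, flat_params):
        # A policy search method hands us one flat vector; reshape it into
        # per-action weight rows.
        self.weights = np.asarray(flat_params, dtype=float).reshape(
            self.n_actions, self.n_features)

    def act(self, features):
        # One scalar output per action; pick the action with the highest output.
        outputs = self.weights @ features
        return int(np.argmax(outputs))

# Usage with made-up sizes (e.g. 13 state variables, 3 actions in 3v2 Keepaway):
policy = PerActionFAPolicy(n_features=13, n_actions=3)
policy.set_params(np.random.randn(3 * 13))
action = policy.act(np.random.randn(13))
```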

Chapter 3:

  1. Sometimes value-based methods are more effective than PS and vice versa; it depends on the setting.  This chapter investigates those settings experimentally
  2. Points out that in supervised learning we have fairly good ideas about which off-the-shelf methods work well on which “type” of problem, but in RL this body of knowledge is less well developed
  3. Part of the goal of the empirical results here is to tease apart which qualities of a problem favor which RL algorithms
  4. They tune each algorithm to each particular instance of each domain
  5. Experiments are in gridworlds, but the point is that they add a layer of POMDP-ness, where (nearby) states can get conflated
    1. I don’t exactly grasp their method of making it partially observable
    2. But there is discussion of applications to larger domains such as Tetris and Keepaway
  6. Their Sarsa(λ) implementation uses CMACs (tile coding) with linear value estimators (a rough sketch of this kind of learner follows this list)
  7. They admit that exploration isn’t very important in this experimental setup
  8. The agent always has only two actions (north or east)
  9. …dropping it mid-Chapter 3 (Section 3.3.2), as it looks a bit orthogonal to my research.  I’m a bit interested in the parts about Tetris and Keepaway, which I might look at later.
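
Before dropping the chapter, a minimal sketch of the kind of learner described in items 6-8: Sarsa(λ) with a CMAC-style tile coder and linear value estimates, on a toy two-action (north/east) world. The tile counts, step dynamics, rewards, and learning parameters are all placeholders I invented, not the thesis's setup.

```python
import numpy as np

N_TILINGS, TILES_PER_DIM = 8, 8      # CMAC: several offset tilings over (x, y)
N_ACTIONS = 2                        # 0 = north, 1 = east
N_FEATURES = N_TILINGS * TILES_PER_DIM * TILES_PER_DIM

def active_tiles(x, y):
    """Indices of the one active tile per tiling for a position in [0, 1)^2."""
    idx = []
    for t in range(N_TILINGS):
        off = t / (N_TILINGS * TILES_PER_DIM)   # each tiling is slightly offset
        cx = min(int((x + off) * TILES_PER_DIM), TILES_PER_DIM - 1)
        cy = min(int((y + off) * TILES_PER_DIM), TILES_PER_DIM - 1)
        idx.append(t * TILES_PER_DIM * TILES_PER_DIM + cy * TILES_PER_DIM + cx)
    return idx

def q_value(w, tiles, a):
    # Linear value estimate: sum of the weights of the active tiles.
    return w[a, tiles].sum()

def sarsa_lambda(episodes=200, alpha=0.1, gamma=1.0, lam=0.9, eps=0.1):
    w = np.zeros((N_ACTIONS, N_FEATURES))
    for _ in range(episodes):
        x, y = 0.0, 0.0                       # start in the bottom-left corner
        z = np.zeros_like(w)                  # eligibility traces
        tiles = active_tiles(x, y)
        a = np.random.randint(N_ACTIONS) if np.random.rand() < eps else \
            int(np.argmax([q_value(w, tiles, b) for b in range(N_ACTIONS)]))
        while x < 1.0 and y < 1.0:
            # North increases y, east increases x; -1 per step until an edge is reached.
            if a == 0:
                y += 0.125
            else:
                x += 0.125
            r, done = -1.0, (x >= 1.0 or y >= 1.0)
            delta = r - q_value(w, tiles, a)
            z[a, tiles] += 1.0                # accumulating traces on the active tiles
            if not done:
                tiles2 = active_tiles(x, y)
                a2 = np.random.randint(N_ACTIONS) if np.random.rand() < eps else \
                     int(np.argmax([q_value(w, tiles2, b) for b in range(N_ACTIONS)]))
                delta += gamma * q_value(w, tiles2, a2)
                tiles, a = tiles2, a2
            w += (alpha / N_TILINGS) * delta * z
            z *= gamma * lam
    return w

weights = sarsa_lambda()
```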