Towards a Unified Theory of State Abstraction for MDPs. Li, Walsh, Littman. Ninth International Symposium on Artificial Intelligence and Mathematics. 2006

  1. Considers a large number of abstraction algorithms, and discusses exactness and some other characteristic of each method
    1. Bisimulation
    2. Homomorphisms
    3. Approximate Bisimulation
    4. Bisimulation Metrics
    5. MaxQ
    6. Stochastic Dynamic Programming
    7. G Algorithm
    8. Utile Distinction
    9. Policy Irrelevance
    10. Adaptive Aggregation
  2. Their definition of an abstraction is “a mapping from one problem representation to a new representation, while preserving some properties.”
    1. Here, in particular they are interested in preserving properties that allow representation of optimal policy
  3. Concerned only with state, and not action abstraction
  4. Φ is defined as the mapping from raw to abstracted state; a state corresponds to one abstract state, but an abstract state maps to a set of raw states, these mappings are mutually exclusive
  5. New reward and transition functions are needed for abstracted states that perform weighted average over T and R in the raw space
    1. They don’t really go into how much each state weighs
  6. Define fineness and coarseness; a mapping is as fine as another if a pair of raw states that map into a single abstract state in both mappings
    1. This relation satisfies reflexibility, antisymmetry, and transitivity
    2. The >= finer mapping is a partial ordering
  7. A model-irrelevance abstraction is one where 2 raw states map to the same abstract states, and have the same reward function and transition function in the original MDP
  8. A Qπ-irrelevance abstraction is one where for any action, if the two raw states map to the same abstract state, they have the same Q function.  There is a similar rule for Q*
    1. Similarly, for a*-irrelevance, if a particular action is a* for any raw state mapped to the abstract state, it must also be for all other states that map to the same abstract state
  9. Different irrelevance rules maintain particular aspects that other algorithms listed in 1 maintain
  10. There is an ordering of fineness according to the different forms of model irrelevance defined above
    1. “As a consequence, any one of the five abstractions is a special case of other finer abstractions”
    2. This means that proofs about a particular level of abstraction will automatically apply to some other abstractions
  11. With the abstractions defined above, optimality is maintained in the abstract MDP
  12. On the other hand, Q-learning does not converge in all abstractions, or may converge to a suboptimal policy
    1. The proof for this is too sketchy to understand though
    2. “chattering” of q values arises if weights are defined as the frequency of visitation.  As the policy varies the weights do as well which can cause issues
  13. Results here generalize Gordon’s results about issues with VFA
  14. Results hold not just for Q-learning, but all “algorithm[s] based on on-trajectory sampling, combined with a non-stationary policy.”
  15. There are similar results for model-based RL.  Weighing must be fixed and the coarsest form of abstraction in general will not converge to optimal
  16. Different levels of coarseness allow for different operations to be performed/recovery of different information
    1. With the coarsest abstraction, optimal planning is no longer possible, but if the optimal policy was presented, it could be maintained
  17. “Therefore, there is a tradeoff between minimizing information loss and maximizing state space abstraction when selecting abstractions.”
  18. Ultimately for abstraction they say the most useful is the one where all states that map to an abstract state have the same optimal action; it still allows convergence of most algorithms but also allows for a fairly coarse representation
  19. The way to actually discover abstractions is still mostly open work, although some algorithms attempt it
  20. Discuss some interesting future work
    1. Definitions discussed here are based on exact representations, but ones that tolerate some error are probably also useful
    2. Use homomorphims? (I dont know what that is.)
    3. Applications to HRL
    4. Other operations on abstractions like intersection and composition
    5. Abstraction and factored representations
    6. State-space partitioning in continuous MDPs

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: