Symbolic Dynamic Programming for Continuous State and Action MDPs. Zamani, Sanner, Fang. AAAI 2012

  1. Paper presents a way to optimally (and in closed-form) solve continuous MDPs, does Symbolic Dynamic Programming (SPD)
  2. Functions in multivariate continuous state and action, discrete noise (whats that mean in this case?), piecewise linear reward.
  3. There is limited work that deals with exact solutions.  In control, there is LQR/LQG (linear quadratic with gaussian noise), but in LQR transition dynamics or reward cannot be piecewise linear
  4. The paper deals with 2 domains: Mars Rover, Inventory Control, and Resivior Management
  5. States can be composed of discrete and continuous factors
  6. (not surprisingly) requires a complete model of the MDP
  7. Because CSA-MDPS (continuous state action mdps) are naturally factored (there is a reference for this) , they can be dealt with as a dynamic bayes net (DBN)
    1. Assume no synchronic arcs (all dependencies are strictly forward in time), which allows for a relatively simple description of the transition dynamics
  8. The solutions here are optimal discounted finite-horizon
  9. The conditional probability functions are described in terms of piecewise linear equations with the following properties:
    1. Conditioned only on action, current state, and prev state vars (this is weird, why current state?)
    2. The are deterministic (can be encoded with dirac deltas), this means gaussian noise in the dynamics, for example is no good.
    3. They are piecewise linear
  10. Can sort of handle quadratic rewards, but this is restricted a bit (has to be univariate quadratic?)
  11. Describe value iteration for continuous MDPs
  12. Dynamics get represented as a bunch of disjoint linear and quadratic functions
  13. <its so cold in this room that I can’t even concentrate on what I’m reading, probably missing stuff here>
  14. Integration is required, but it turns out that in this setting it is solvable in a particular closed form, so it isn’t an issue
  15. This is very cool and important, but a general point to make is that this does exact solving, so it is brittle in a way.  A region that isn’t linear or quadratic just ends up being unsolvable.  Methods that are less exact can deal with that sort of problem
  16. The max operator is commutative? What is the casemax operator?
  17. It is claimed that the solver is “efficient”, but I’d like to know what that means in particular
  18. Solutions require linear checking/pruning in the XADD (I think thats the structure that represents the solution) or problems blow up extremely quickly
  19. Claims here that this approach produces the first exact solution for a couple of canonical OR problems.  A very neat paper, I wish I understood it better AAAI 2012 seems to have the very prominent talks on videolectures but thats it.
Tagged ,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: