- Paper presents a way to optimally (and in closed-form) solve continuous MDPs, does Symbolic Dynamic Programming (SPD)
- Functions in multivariate continuous state and action, discrete noise (whats that mean in this case?), piecewise linear reward.
- There is limited work that deals with exact solutions. In control, there is LQR/LQG (linear quadratic with gaussian noise), but in LQR transition dynamics or reward cannot be piecewise linear
- The paper deals with 2 domains:
**Mars Rover**,**Inventory Control**, and**Resivior Management** - States can be composed of discrete and continuous factors
- (not surprisingly) requires a complete model of the MDP
- Because CSA-MDPS (continuous state action mdps) are naturally factored (there is a reference for this) , they can be dealt with as a dynamic bayes net (DBN)
- Assume no synchronic arcs (all dependencies are strictly forward in time), which allows for a relatively simple description of the transition dynamics

- The solutions here are
**optimal discounted finite-horizon** - The conditional probability functions are described in terms of piecewise linear equations with the following properties:
- Conditioned only on action, current state, and prev state vars (this is weird, why current state?)
- The are deterministic (can be encoded with dirac deltas), this means gaussian noise in the dynamics, for example is no good.
- They are piecewise linear

- Can sort of handle quadratic rewards, but this is restricted a bit (has to be univariate quadratic?)
- Describe value iteration for continuous MDPs
- Dynamics get represented as a bunch of disjoint linear and quadratic functions
- <its so cold in this room that I can’t even concentrate on what I’m reading, probably missing stuff here>
- Integration is required, but it turns out that in this setting it is solvable in a particular closed form, so it isn’t an issue
**This is very cool and important, but a general point to make is that this does exact solving, so it is brittle in a way. A region that isn’t linear or quadratic just ends up being unsolvable. Methods that are less exact can deal with that sort of problem****The max operator is commutative? What is the casemax operator?**- It is claimed that the solver is “efficient”, but I’d like to know what that means in particular
- Solutions require linear checking/pruning in the XADD (I think thats the structure that represents the solution) or problems blow up extremely quickly
- Claims here that this approach produces the first exact solution for a couple of canonical OR problems. A very neat paper, I wish I understood it better AAAI 2012 seems to have the very prominent talks on videolectures but thats it.

Advertisements