Dynamic Programming for Structured Continuous MDPs


Feng, Dearden, Meuleau, Washington, UAI 03.

  1. Interested in doing efficient dynamic programming in continuous state space environments
  2. Algorithms SPUDD and SPI work by identifying regions of the state space that have a similar value under optimal policy and treating them as one large state
  3. This algorithm does a similar thing in continuous state spaces.  They motivate the approach by saying it is useful for the type of tasks rovers encounter when trying to optimize over time and energy
    1. Discuss its use in domains that have piecewise linear value functions
  4. Actually requires a discrete transition function, so I don’t know why they say its for continuous state domains; its not
    1. They say well you can just discretize the continuous space, but I don’t understand why they posed this as a continuous state algorihtm.
    2. How is this now different from SPUDD/SPI?
    3. Argue that discretizing and solving is better than using other FAs because it is an exact solution of an approximation of the domain, as opposed to an approximate solution of the value function of the actual model
    4. Seems like they are getting back into the same mess by choosing their partition; I don’t follow their arguments
  5. Uses computational geometry to store state space partitions and POMDP to do Bellman backups…
  6. Claim solving of a rover problem went from a day of computation to a few minutes
  7. Also does an experiment in another rover domain which is planning oriented, may be worth checking out
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: