## Planning for Markov Decision Processes with Sparse Stochasticity. Likhachev, Gordon, Thrun. NIPS 2004

1. Algorithms for deterministic planning problems (Dijkstra’s) run faster than those for deterministic problems (primarily because in deterministic problems, optimal plans are acyclic).   This work tries to split the difference for domains that are only sparsely stochastic
2. Algorithm is called MCP (MDP compression planning)
3. Combines A* with VI for solving stochastic-shortest-path problems
4. Empirical results involve integration and mapping of two actual robots (one helicopter, and one ground robot) to do mapping and then planning
5. In their definition of sparse stochasticity, most states have actions with deterministic outcomes, and only a few have stochastic outcomes (this is different from the more common definition of sparse stochasticity where all transitions may be stochastic, but that distribution is only across a small number of states)
1. The trick with this assumption is that since only a few of the states are stochastic, they are the only ones that can lead to cycles in an optimal policy
6. The idea then is to build an MDP that consists only of the stochastic states from the original domain.
1. So all the deterministic states are compressed out
2. The states included (the stochastic ones) are called distinguished states.  In addition, start and goal states are included to the MDP
7. Based on this differentiation, a fast algorithm for planning through determinstic states (such as A*) and then a different one for dealing with the stochastic states (like VI)
8. Figuring out how to do this decomposition isn’t straightforward (for example, the transition diagram in the MDP of distinguished states depends on reachability between deterministic states)
1. So the decomposition is done incrementally
9. In the problem considered “The ground robot needs to plan a path from its current location to a goal, but
has only partial knowledge of the surrounding terrain. The helicopter can aid the ground
robot by flying to and sensing places in the map.”
10. Map is discretized and described in 2d
11. In areas still uncovered, there is a belief as to the probability that state will be open (and therefore passable).  The ground robot can try and plan through those areas as if they were open, but runs the risk of having to replan if one of the cells turns out to be occupied.  The other option is it can request that the helicopter uncover that cell
12. The overall planning goal is to minimize the cost of running both robots, with the episode terminating when the ground robot reaches its goal and the helicopter goes back to home
13. This forms a POMDP, (which are in general intractable) <but this particular class of POMDPs is almost certainly tractable because it is possible to get to ground truth just by observing each corresponding cell>
1. <Ah not exactly> Sensors are imperfect and may contradict each other.  But it is assumed that most of the path planning is deterministic and only rarely noise in sensing will arise
14. Planning is done in belief space; they discretize this though and plan in discretized belief space
15. The compressed MDP is built by running deterministic searches <I suppose a deterministic policy?  Which part in particular is deterministic is a little unclear.  Seems like the policy is deterministic as>
1. Actions that produce different outcomes are put in the set of distinguished states
2. The best deterministic paths are memoized and are used to produce better cost estimates (in a manner similar to Bellman backups)
3. These deterministic searches can use an admissible heuristic, which means that much of the state space may be (correctly) unexplored, which leads to cheaper planning
16. Because the MDP of distinguished states is small, planning there is cheap
17. Planning done over the stochastic states is used as a heuristic for the deterministic planning part
18. Each action in the MDP actually corresponds to a series of actions in the domain; only the final action in this sequence may produce a stochastic outcome
1. “Among all compressed actions starting at s and ending at (s’, a) there is (at least) one with lowest expected cost; we will call such an action an optimal compression”
2. If all compressions are optimal for all states in the MDP, then the MDP is said to be fully compressed
19. Solving the full compressed MDP will produce the optimal policy, but its possible to get away with less work, in particular because the planning problem considered is goal-directed, so many states simply may not have to be considered
20. State selection on each iteration is based on something in the form of the Bellman residual
21. <I wonder how this relates to prioritized sweeping>
22. <Not taking much  notes on specifics of algorithm, but seems totally reasonable for the current amount of attention I am paying>
23. The top level of the algorithm repeatedly tries to find trajectories from start to goal
1. Values produced by the MDP used as a heurisitic in this search should be admissible
24. Need a modified version of A* that can factor in stochasticity in the domain; they propose such a modified A* planner
25. Has theoretical guarantees, PACy
26. Compare against RTDP, LAO*, and VI – VI was restricted to problems with perfect sensing because otherwise the size of the state space is prohibitive
27. 5-10x the speed of competitors.  The slowest part is <surprisingly> the deterministic search component <I suppose the stochastic states are very sparse>.  Mention another search algorithm may help reduce the cost of that particular component
28. As stochasticity becomes less sparse, its time advantage reduces, but is still better than competitors