- Deals with finding hierarchies – not just using them
- Covers two main contributions
- Compressing /homogenizing MDPs
- Identification of possibilities for transfer

- “Regardless of how the partitioning is accomplished, the statespace is divided into a multiscale collection of “clusters” connected via a small set of “bottleneck” states. A key function of the partitioning step is to tie computational complexity to problem complexity. It is problem complexity, controlled for instance by the inherent geometry of a problem, and its amenability to being partitioned, that should determine the computational complexity, rather than the choice of statespace representation or sampling. The decomposition we suggest attempts to rectify this difficulty. “
- Multiscale compression is local, and recursive
- Present a form of asynchronous policy iteration, which as O(n log n) cost per iteration
- In terms of transfer, they don’t only consider the common case of sharing low-level options, but also the other side of the coin that the high level dynamics
*between*options may be the same, even if the dynamics in the low-level policies are different - Consider stochastic policies, as well as discount factors that are distinct for state action pairs
- 3 Step process:
- Partition states into clusters connected by bottleneck states
- Compress MDP so each cluster becomes a state
- This may be done a number of times

- Solve hierarchy top-down

- Use diffusion map embeddings to cluster domain (but mention there are other possibilities)
- Uses Chung’s method for finding the Laplacian of directed graphs, <but this can only describe a subset of all possible directed graphs>
- Does “low conductance cuts”
- Briefly mention a number of other possible metrics

- Describes
**two types of bottlenecks****Geometric**, which are based on graph structure alone**Problem**, which is based on the geometric bottlenecks as well as the goal structure of the MDP- “If the policy is already strongly directed according to the goals defined by the rewards, then the bottlenecks can be interpreted as choke points for a random walker in the presence of a strong potential.”

- Have rewards, transitition funcitons, states, actions for each MDP at different levels of coarseness
- “One of the important consequences of these definitions is that the optimal fine scale value function on the bottlenecks is a good solution to the coarse MDP, compressed with respect to the optimal fine scale policy, and vice versa. It is this type of consistency across scales that will allow us to take advantage of the construction above and design efficient multiscale algorithms. “

- Mention similarities of the approach to other classes of problems, where the basic approach is to make a small problem of a large one by locally averaging parts of the problem
- Coarsening: PDEs and meshes <?>
- Homogenization: PDEs
- Lumping: Markov chains

- Assume that the policy has at least some degree of chance action selection
- Basically solving the MDP at all hierarchies results in an iterative process
- <skimming the rest>

Advertisements