Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning. Bouvrie, Maggioni. Arxiv 2012Use

  1. Deals with finding hierarchies – not just using them
  2. Covers two main contributions
    1. Compressing /homogenizing MDPs
    2. Identification of possibilities for transfer
  3. “Regardless of how the partitioning is accomplished, the statespace is divided into a multiscale collection of “clusters” connected via a small set of “bottleneck” states. A key function of the partitioning step is to tie computational complexity to problem complexity. It is problem complexity, controlled for instance by the inherent geometry of a problem, and its amenability to being partitioned, that should determine the computational complexity, rather than the choice of statespace representation or sampling. The decomposition we suggest attempts to rectify this difficulty. “
  4. Multiscale compression is local, and recursive
  5. Present a form of asynchronous policy iteration, which as O(n log n) cost per iteration
  6. In terms of transfer, they don’t only consider the common case of sharing low-level options, but also the other side of the coin that the high level dynamics between options may be the same, even if the dynamics in the low-level policies are different
  7. Consider stochastic policies, as well as discount factors that are distinct for state action pairs
  8. 3 Step process:
    1. Partition states into clusters connected by bottleneck states
    2. Compress MDP so each cluster becomes a state
      1. This may be done a number of times
    3. Solve hierarchy top-down
  9. Use diffusion map embeddings to cluster domain (but mention there are other possibilities)
  10.  Uses Chung’s method for finding the Laplacian of directed graphs, <but this can only describe a subset of all possible directed graphs>
    1. Does “low conductance cuts”
    2. Briefly mention a number of other possible metrics
  11. Describes two types of bottlenecks
    1. Geometric, which are based on graph structure alone
    2. Problem, which is based on the geometric bottlenecks as well as the goal structure of the MDP
    3. “If the policy is already strongly directed according to the goals defined by the rewards, then the bottlenecks can be interpreted as choke points for a random walker in the presence of a strong potential.”
  12. Have rewards, transitition funcitons, states, actions for each MDP at different levels of coarseness
    1. “One of the important consequences of these definitions is that the optimal fine scale value function on the bottlenecks is a good solution to the coarse MDP, compressed with respect to the optimal fine scale policy, and vice versa. It is this type of consistency across scales that will allow us to take advantage of the construction above and design efficient multiscale algorithms. “
  13. Mention similarities of the approach to other classes of problems, where the basic approach is to make a small problem of a large one by locally averaging parts of the problem
    1. Coarsening: PDEs  and meshes <?>
    2. Homogenization: PDEs
    3. Lumping: Markov chains
  14. Assume that the policy has at least some degree of chance action selection
  15. Basically solving the MDP at all hierarchies results in an iterative process
  16. <skimming the rest>

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: