Learning Representation and Control in Markov Decision Processes: New Frontiers. Mahadevan. Foundations and Trends in ML 2008.

  1. Approach is to solve MDPs by iteratively considering low-D reps and approx optimal policies
  2. “Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse.”
  3. Algorithm is called representation policy iteration (RPI)
    1. Model-free and model-based versions are given
  4. “Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators.”
  5. “… the aim is to solve MDPs by automatically finding ‘lowdimensional’ descriptions of ‘high-dimensional’ functions on a state (action) space.”
    1. The relevant functions are the policy, transitions, and value
  6. Method tries to find representations based on the underlying manifold dynamics
    1. Means method is usable in continuous (state) problems
  7. In the Laplacian off-diagonal elements are nonpositive, and row sums are 0
    1. Means they are singular and therefore in general don’t have a direct inverse
    2. Moore-Penrose pseudoinverse is the most common method when direct inverse isn’t possible
    3. But something called the Drazin inverse is for spectral cases and is very important in study of Markov chains
  8. In particular, for an MDP M with transition matrix T, care about
    1. A=I-T
    2. As well as the Drazin inverse of A
    3. But then the go on to call AL, and T, P, so L=I-P
  9. Getting a good low-dimensional representation depends on finding the right basis functions that span the underlying manifold
  10. Deciding what basis to use is fundamentally about making tradeoffs – for example, in a discrete domain, using tabular methods will yield optimal results but will be expensive and noncompact; using a sparser representation is cheaper in some ways <although with my critical CS hat on, I think the big-O costs are really the same, as there are order-3 space and computational costs in the linear algebra that don’t go away even when using fewer bases>
  11. “… the basis construction problem may require as much or more computational effort than required to solve the original MDP.”
    1. On the other hand, in the case where transfer can be utilized, this extra cost can me amortized across instances
  12. Two main classes of methods are considered for basis construction in MDPs:
    1. Diagonalization (of the graph Laplacian)
  13. <Only got to here>
Tagged ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: