- Approach is to solve MDPs by iteratively considering low-D reps and approx optimal policies
- “Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the *Drazin inverse*.”
- The algorithm is called representation policy iteration (RPI)
- Model-free and model-based versions are given

- “Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators.”
- “… the aim is to solve MDPs by automatically finding ‘low-dimensional’ descriptions of ‘high-dimensional’ functions on a state (action) space.”
- The relevant functions are the policy, transitions, and value

- Method tries to find representations based on the underlying manifold dynamics
- Means method is usable in continuous (state) problems

- In the Laplacian, off-diagonal elements are nonpositive and row sums are 0
- Means Laplacians are singular and therefore in general have no direct inverse
- The Moore-Penrose pseudoinverse is the most common generalized inverse when a direct inverse doesn’t exist
- But a different generalized inverse, the Drazin inverse, is spectral in nature and is very important in the study of Markov chains
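A minimal NumPy sketch of this, assuming an ergodic chain (the matrix `P` below is an arbitrary illustrative example, not from the paper): for index-1 matrices like *I − P*, the Drazin inverse coincides with the group inverse and can be computed from the Kemeny–Snell fundamental matrix as *(A + W)⁻¹ − W*, where *W = 1πᵀ* is the limiting matrix.

```python
import numpy as np

# A small ergodic Markov chain (illustrative, not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
n = P.shape[0]
A = np.eye(n) - P  # singular: rows of P sum to 1, so A @ ones = 0

# Stationary distribution pi (left eigenvector of P for eigenvalue 1).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Limiting matrix W = 1 pi^T and fundamental matrix Z = (A + W)^{-1}.
W = np.outer(np.ones(n), pi)
Z = np.linalg.inv(A + W)

# Ergodic case: the Drazin inverse equals the group inverse Z - W.
AD = Z - W

# Defining properties of the Drazin/group inverse:
assert np.allclose(A @ AD @ A, A)     # A A^D A = A
assert np.allclose(AD @ A @ AD, AD)   # A^D A A^D = A^D
assert np.allclose(A @ AD, AD @ A)    # A^D commutes with A
```

Note the contrast with the Moore-Penrose pseudoinverse, which satisfies the first two properties but is not required to commute with *A*.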

- In particular, for an MDP *M* with transition matrix *T*, care about *A* = *I* − *T*
- As well as the Drazin inverse of *A*
- But then they go on to call *A*, *L*, and *T*, *P*, so *L* = *I* − *P*
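To make the quoted claim about exact average-reward solutions concrete, here is a standard instance from Markov chain theory (numbers are illustrative, not from the paper): for a fixed policy with transition matrix *P* and reward vector *r*, the gain is *g = Wr* and the bias is *h = A^D r*, and together they satisfy the average-reward evaluation equation *g + h = r + Ph*.

```python
import numpy as np

# Fixed-policy Markov reward process (illustrative numbers).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, 2.0])
n = P.shape[0]
A = np.eye(n) - P

# Stationary distribution and limiting matrix W = 1 pi^T.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
W = np.outer(np.ones(n), pi)

# Drazin (group) inverse of A = I - P via the fundamental matrix.
AD = np.linalg.inv(A + W) - W

# Average-reward policy evaluation: gain g = W r, bias h = A^D r.
g = W @ r
h = AD @ r

# They solve the average-reward Bellman evaluation equation.
assert np.allclose(g + h, r + P @ h)
```

The check works because *A A^D = I − W*, so *(I − P)h = (I − W)r = r − g*.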

- Getting a good low-dimensional representation depends on finding the right basis functions that span the underlying manifold
- Deciding what basis to use is fundamentally about making tradeoffs – for example, in a discrete domain, tabular methods yield optimal results but are expensive and noncompact, while a sparser representation is cheaper in some ways <although with my critical CS hat on, I think the big-O costs are really the same, as there are cubic space and computational costs in the linear algebra that don’t go away even when using fewer bases>
- “… the basis construction problem may require as much or more computational effort than required to solve the original MDP.”
- On the other hand, in the case where transfer can be utilized, this extra cost can be amortized across instances

- Two main classes of methods are considered for basis construction in MDPs:
- Diagonalization (of the graph Laplacian)
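The diagonalization approach can be sketched as follows (my own toy example, not from the paper): the Laplacian eigenvectors with the smallest eigenvalues are the smoothest functions on the state graph, and a smooth value function can be well approximated by least-squares projection onto just a few of them.

```python
import numpy as np

# Chain graph on 20 states (illustrative); Wadj is the adjacency matrix.
n = 20
Wadj = np.zeros((n, n))
for i in range(n - 1):
    Wadj[i, i + 1] = Wadj[i + 1, i] = 1.0
D = np.diag(Wadj.sum(axis=1))
L = D - Wadj  # combinatorial graph Laplacian

# Diagonalize: smallest-eigenvalue eigenvectors are the smoothest
# functions on the graph.
eigvals, eigvecs = np.linalg.eigh(L)
k = 5
Phi = eigvecs[:, :k]  # low-dimensional basis: first k eigenvectors

# Project a target value function onto the basis by least squares.
V = np.cos(np.linspace(0, np.pi, n))  # a smooth function on the chain
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
V_hat = Phi @ w  # smooth V is captured with k << n basis functions
```

In practice the smooth target here stands in for a value function; the point is that *k* = 5 coefficients suffice where the tabular representation needs *n* = 20.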

- <Only got to here>

## Learning Representation and Control in Markov Decision Processes: New Frontiers. Mahadevan. Foundations and Trends in ML 2008.

**Tagged** Drazin inverse, Laplacian