Learning Representation and Control in Markov Decision Processes: New Frontiers. Mahadevan. Foundations and Trends in ML 2008.

Approach is to solve MDPs by alternating between building low-dimensional representations and computing approximately optimal policies

“Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse.”

Algorithm is called representation policy iteration (RPI)

Model-free and model-based versions are given

“Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators.”

“… the aim is to solve MDPs by automatically finding ‘low-dimensional’ descriptions of ‘high-dimensional’ functions on a state (action) space.”

The relevant functions are the policy, transitions, and value

Method tries to find representations based on the underlying manifold dynamics

Means method is usable in continuous (state) problems

In the Laplacian, off-diagonal elements are nonpositive and row sums are 0

Means Laplacians are singular and therefore don’t have an ordinary inverse

The Moore-Penrose pseudoinverse is the most common generalized inverse when a matrix is singular
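A quick numerical check of the two points above, using a made-up 3-state random-walk matrix (the numbers are illustrative, not from the monograph): the Laplacian L = I - P has zero row sums, is singular, but still has a Moore-Penrose pseudoinverse satisfying the Penrose condition L L+ L = L.

```python
import numpy as np

# Hypothetical 3-state random-walk transition matrix (rows sum to 1).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

L = np.eye(3) - P  # random-walk Laplacian L = I - P

# Off-diagonal entries are nonpositive and row sums are 0,
# so L is singular and has no ordinary inverse.
print(np.allclose(L.sum(axis=1), 0.0))    # True
print(np.isclose(np.linalg.det(L), 0.0))  # True

# The Moore-Penrose pseudoinverse exists anyway and satisfies
# the Penrose conditions, e.g. L L+ L = L.
L_pinv = np.linalg.pinv(L)
print(np.allclose(L @ L_pinv @ L, L))     # True
```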

But the Drazin inverse, a spectral generalized inverse, is the one that matters most in the study of Markov chains

In particular, for an MDP M with transition matrix T, care about

A = I - T

As well as the Drazin inverse of A

But then they go on to call A “L” and T “P”, so L = I - P
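A sketch of computing the Drazin inverse of L = I - P for a small ergodic chain (the transition matrix is made up for illustration). For an ergodic chain I - P has index 1, so its Drazin inverse coincides with the group inverse, and a standard identity from Markov chain theory gives it as (L + W)^{-1} - W, where W = 1 pi^T is the limiting matrix of P^n:

```python
import numpy as np

# Hypothetical 2-state ergodic chain.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
L = np.eye(2) - P

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# W = 1 pi^T (limiting matrix); Drazin/group inverse L^D = (L + W)^{-1} - W.
W = np.outer(np.ones(2), pi)
LD = np.linalg.inv(L + W) - W

# Defining properties of the group (Drazin, index-1) inverse:
print(np.allclose(L @ LD @ L, L))    # True
print(np.allclose(LD @ L @ LD, LD))  # True
print(np.allclose(L @ LD, LD @ L))   # True
```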

Getting a good low-dimensional representation depends on finding the right basis functions that span the underlying manifold

Deciding what basis to use is fundamentally about making tradeoffs – for example, in a discrete domain, tabular methods will yield optimal results but will be expensive and noncompact; using a sparser representation is cheaper in some ways <although with my critical CS hat on, I think the big-O costs are really the same, as there are cubic (order-3) space and computational costs in the linear algebra that don’t go away even when using fewer bases>

“… the basis construction problem may require as much or more computational effort than required to solve the original MDP.”

On the other hand, in the case where transfer can be utilized, this extra cost can be amortized across instances

Two main classes of methods are considered for basis construction in MDPs: diagonalization of Laplacian operators (geometric) and dilation (reward-sensitive), per the quote above
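A minimal sketch of what each class of basis looks like, with made-up transitions and rewards: diagonalization takes eigenvectors of the Laplacian and ignores the reward, while dilation builds a Krylov-style basis by repeatedly applying P to the reward vector.

```python
import numpy as np

# Toy 4-state chain with hypothetical transitions and rewards.
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)  # normalize rows into a stochastic matrix
r = np.array([0.0, 0.0, 0.0, 1.0])
L = np.eye(4) - P

# Diagonalization (geometric, reward-independent): Laplacian eigenvectors.
_, diag_basis = np.linalg.eig(L)

# Dilation (reward-sensitive): a Krylov basis {r, Pr, P^2 r}.
krylov_basis = np.column_stack([np.linalg.matrix_power(P, i) @ r
                                for i in range(3)])

print(diag_basis.shape, krylov_basis.shape)  # (4, 4) (4, 3)
```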
