- Approach is to solve MDPs by iteratively considering low-D reps and approx optimal policies
- “Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the
*Drazin inverse*.” - Algorithm is called representation policy iteration (RPI)
- Model-free and model-based versions are given

- “Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators.”
- “… the aim is to solve MDPs by automatically finding ‘lowdimensional’ descriptions of ‘high-dimensional’ functions on a state (action) space.”
- The relevant functions are the policy, transitions, and value

- Method tries to find representations based on the underlying manifold dynamics
- Means method is usable in continuous (state) problems

- In the Laplacian off-diagonal elements are nonpositive, and row sums are 0
- Means they are singular and therefore in general don’t have a direct inverse
- Moore-Penrose pseudoinverse is the most common method when direct inverse isn’t possible
- But something called the Drazin inverse is for spectral cases and is very important in study of Markov chains

- In particular, for an MDP
*M*with transition matrix*T*, care about*A*=*I**-T*- As well as the Drazin inverse of
*A* - But then the go on to call
*A*,*L*, and*T*, P, so*L=I-P*

- Getting a good low-dimensional representation depends on finding the right basis functions that span the underlying manifold
- Deciding what basis to use is fundamentally about making tradeoffs – for example, in a discrete domain, using tabular methods will yield optimal results but will be expensive and noncompact; using a sparser representation is cheaper in some ways <although with my critical CS hat on, I think the big-O costs are really the same, as there are order-3 space and computational costs in the linear algebra that don’t go away even when using fewer bases>
- “… the basis construction problem may require as much or more computational effort than required to solve the original MDP.”
- On the other hand, in the case where transfer can be utilized, this extra cost can me amortized across instances

- Two main classes of methods are considered for basis construction in MDPs:
- Diagonalization (of the graph Laplacian)

- <Only got to here>

Advertisements
(function(g,$){if("undefined"!=typeof g.__ATA){
g.__ATA.initAd({collapseEmpty:'after', sectionId:26942, width:300, height:250});
g.__ATA.initAd({collapseEmpty:'after', sectionId:114160, width:300, height:250});
}})(window,jQuery);
var o = document.getElementById('crt-1586655647');
if ("undefined"!=typeof Criteo) {
var p = o.parentNode;
p.style.setProperty('display', 'inline-block', 'important');
o.style.setProperty('display', 'block', 'important');
Criteo.DisplayAcceptableAdIfAdblocked({zoneid:388248,containerid:"crt-1586655647",collapseContainerIfNotAdblocked:true,"callifnotadblocked": function () {var o = document.getElementById('crt-1586655647'); o.style.setProperty('display','none','important');o.style.setProperty('visbility','hidden','important'); } });
} else {
o.style.setProperty('display', 'none', 'important');
o.style.setProperty('visibility', 'hidden', 'important');
}
var o = document.getElementById('crt-1533258655');
if ("undefined"!=typeof Criteo) {
var p = o.parentNode;
p.style.setProperty('display', 'inline-block', 'important');
o.style.setProperty('display', 'block', 'important');
Criteo.DisplayAcceptableAdIfAdblocked({zoneid:837497,containerid:"crt-1533258655",collapseContainerIfNotAdblocked:true,"callifnotadblocked": function () {var o = document.getElementById('crt-1533258655'); o.style.setProperty('display','none','important');o.style.setProperty('visbility','hidden','important'); } });
} else {
o.style.setProperty('display', 'none', 'important');
o.style.setProperty('visibility', 'hidden', 'important');
}