- Approach is to solve MDPs by iteratively considering low-D reps and approx optimal policies
- “Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the
*Drazin inverse*.” - Algorithm is called representation policy iteration (RPI)
- Model-free and model-based versions are given

- “Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators.”
- “… the aim is to solve MDPs by automatically finding ‘lowdimensional’ descriptions of ‘high-dimensional’ functions on a state (action) space.”
- The relevant functions are the policy, transitions, and value

- Method tries to find representations based on the underlying manifold dynamics
- Means method is usable in continuous (state) problems

- In the Laplacian off-diagonal elements are nonpositive, and row sums are 0
- Means they are singular and therefore in general don’t have a direct inverse
- Moore-Penrose pseudoinverse is the most common method when direct inverse isn’t possible
- But something called the Drazin inverse is for spectral cases and is very important in study of Markov chains

- In particular, for an MDP
*M*with transition matrix*T*, care about*A*=*I**-T*- As well as the Drazin inverse of
*A* - But then the go on to call
*A*,*L*, and*T*, P, so*L=I-P*

- Getting a good low-dimensional representation depends on finding the right basis functions that span the underlying manifold
- Deciding what basis to use is fundamentally about making tradeoffs – for example, in a discrete domain, using tabular methods will yield optimal results but will be expensive and noncompact; using a sparser representation is cheaper in some ways <although with my critical CS hat on, I think the big-O costs are really the same, as there are order-3 space and computational costs in the linear algebra that don’t go away even when using fewer bases>
- “… the basis construction problem may require as much or more computational effort than required to solve the original MDP.”
- On the other hand, in the case where transfer can be utilized, this extra cost can me amortized across instances

- Two main classes of methods are considered for basis construction in MDPs:
- Diagonalization (of the graph Laplacian)

- <Only got to here>

Advertisements
(function(){var c=function(){var a=document.getElementById("crt-106888647");window.Criteo?(a.parentNode.style.setProperty("display","inline-block","important"),a.style.setProperty("display","block","important"),window.Criteo.DisplayAcceptableAdIfAdblocked({zoneid:388248,containerid:"crt-106888647",collapseContainerIfNotAdblocked:!0,callifnotadblocked:function(){a.style.setProperty("display","none","important");a.style.setProperty("visbility","hidden","important")}})):(a.style.setProperty("display","none","important"),a.style.setProperty("visibility","hidden","important"))};if(window.Criteo)c();else{if(!__ATA.criteo.script){var b=document.createElement("script");b.src="//static.criteo.net/js/ld/publishertag.js";b.onload=function(){for(var a=0;a<__ATA.criteo.cmd.length;a++){var b=__ATA.criteo.cmd[a];"function"===typeof b&&b()}};(document.head||document.getElementsByTagName("head")[0]).appendChild(b);__ATA.criteo.script=b}__ATA.criteo.cmd.push(c)}})();
(function(){var c=function(){var a=document.getElementById("crt-476261992");window.Criteo?(a.parentNode.style.setProperty("display","inline-block","important"),a.style.setProperty("display","block","important"),window.Criteo.DisplayAcceptableAdIfAdblocked({zoneid:837497,containerid:"crt-476261992",collapseContainerIfNotAdblocked:!0,callifnotadblocked:function(){a.style.setProperty("display","none","important");a.style.setProperty("visbility","hidden","important")}})):(a.style.setProperty("display","none","important"),a.style.setProperty("visibility","hidden","important"))};if(window.Criteo)c();else{if(!__ATA.criteo.script){var b=document.createElement("script");b.src="//static.criteo.net/js/ld/publishertag.js";b.onload=function(){for(var a=0;a<__ATA.criteo.cmd.length;a++){var b=__ATA.criteo.cmd[a];"function"===typeof b&&b()}};(document.head||document.getElementsByTagName("head")[0]).appendChild(b);__ATA.criteo.script=b}__ATA.criteo.cmd.push(c)}})();