- Covers VFA for high-dimensional state and action spaces
- Does the approximation as “… Negative free energies in an undirected graphical model called a product of experts”
- Action selection is done by MCMC
- In one experimental result the action space has 2^40 actions
- VFA and action selection are two separate issues discussed here
- “Our approach is to borrow techniques from the graphical modeling literature and apply them to the problems of value estimation and action selection.”
- Looks like it considers POMDPs.
- Oh, looks like they use inference methods for POMDPs to infer state value

- In their system, computing the “free energy” (whose negative, I think, is the value estimate) is tractable while action selection isn’t, so MCMC is used
- MDPs considered here are very large but finite
- Looks like they do VFA according to TD updates
- For value approximation use a “particular kind of product of experts, called a restricted Boltzmann machine (…).”
- “Boltzmann machines are undirected models. That means that the model satisfies joint probabilities, rather than conditional probabilities.”
- DBNs on the other hand are directed
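
To make the "negative free energy as value" idea above concrete, here is a minimal sketch, assuming a binary restricted Boltzmann machine whose visible units are the concatenated state and action bits. The names `W`, `b`, `c`, `q_value` and `td_update`, and the SARSA-style form of the TD update, are my own assumptions for illustration, not taken from the paper.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))

def hidden_inputs(W, c, v):
    # Total input to each hidden unit for the visible vector v.
    return [c[j] + sum(W[i][j] * v[i] for i in range(len(v)))
            for j in range(len(c))]

def q_value(W, b, c, state, action):
    # Q(s, a) = -F(s, a), where F is the RBM free energy of the visibles.
    v = state + action  # list concatenation of state bits and action bits
    bias_term = sum(b[i] * v[i] for i in range(len(v)))
    return bias_term + sum(softplus(x) for x in hidden_inputs(W, c, v))

def td_update(W, b, c, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # One SARSA-style step (assumed form): move Q(s, a) toward r + gamma * Q(s2, a2).
    delta = r + gamma * q_value(W, b, c, s2, a2) - q_value(W, b, c, s, a)
    v = s + a
    h = [sigmoid(x) for x in hidden_inputs(W, c, v)]
    for i in range(len(v)):
        b[i] += alpha * delta * v[i]
        for j in range(len(c)):
            # dQ/dW_ij = v_i * sigmoid(input to hidden unit j)
            W[i][j] += alpha * delta * v[i] * h[j]
    for j in range(len(c)):
        c[j] += alpha * delta * h[j]
    return delta
```

The point of the sketch is that the value and its gradient need only one pass over the hidden units, with no sampling, which matches the note that value estimation is the tractable part.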

- The reason for using undirected models is that value estimation is tractable in that case, which isn’t so for directed models (where inference is hard)
- A directed graph also restricts use to an actor-critic model

- In Boltzmann machines there are visible and hidden nodes with symmetric weights pairwise between all vertices
- Hidden nodes don’t have a fixed value so the approach is to consider all possible settings of hidden variables
- The probability of settings of hidden nodes is according to Boltzmann distribution
- Not taking extensive notes on this though
- Finding the equilibrium of the Boltzmann machine is done by MCMC.
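
As a rough illustration of the MCMC step, the sketch below runs Gibbs sampling in a binary restricted Boltzmann machine with the state bits clamped, so only the hidden units and the action bits are resampled. The function name `sample_action`, the parameter layout (`W`, `b`, `c`, with state units first among the visibles) and the fixed number of sweeps are assumptions of mine, not details from the paper.

```python
import math
import random

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))

def sample_action(W, b, c, state, n_action_bits, n_sweeps=50, rng=random):
    # Gibbs sampling with the state clamped. W[i][j] links visible unit i
    # (state bits first, then action bits) to hidden unit j.
    n_state = len(state)
    action = [rng.randint(0, 1) for _ in range(n_action_bits)]
    for _ in range(n_sweeps):
        v = state + action
        # Sample every hidden unit given all visible units.
        h = []
        for j in range(len(c)):
            x = c[j] + sum(W[i][j] * v[i] for i in range(len(v)))
            h.append(1 if rng.random() < sigmoid(x) else 0)
        # Sample every action bit given the hidden units; the state stays fixed.
        for k in range(n_action_bits):
            i = n_state + k
            x = b[i] + sum(W[i][j] * h[j] for j in range(len(c)))
            action[k] = 1 if rng.random() < sigmoid(x) else 0
    return action
```

If the chain is run long enough, actions are drawn with probability proportional to exp(-F(s, a)), so actions with lower free energy (higher estimated value) are sampled more often without ever enumerating the full action space.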

- Boltzmann machines are called a product of experts because each hidden node is treated as an expert and the values are simply products between nodes
- Exploitation with the Boltzmann rule, I think, works out simply from the Boltzmann machine itself
- The large action task they mentioned in the beginning is very smooth and not sure it has a sequential component
- There is another multiagent task they test in
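
For reference, the "product of experts" name noted above can be read off the standard algebra for a restricted Boltzmann machine with binary hidden units. The symbols here (visible vector v, hidden vector h, weights W, biases b and c) are my own notation and may not match the paper's.

```latex
\begin{align}
E(v, h) &= -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j \\
\sum_{h} e^{-E(v, h)} &= e^{\sum_i b_i v_i} \prod_j \left(1 + e^{c_j + \sum_i W_{ij} v_i}\right) \\
F(v) &= -\log \sum_{h} e^{-E(v, h)}
      = -\sum_i b_i v_i - \sum_j \log\left(1 + e^{c_j + \sum_i W_{ij} v_i}\right)
\end{align}
```

Summing over all hidden settings factorizes into one term per hidden unit, so each hidden node contributes an independent multiplicative "expert", and the free energy (hence the value estimate) costs one pass over the hidden units.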
