Ideas from Kernel-Based Reinforcement Learning: Dirk Ormoneit, Saunak Sen

Approximate VI converges to a finite fixed point
There are some assumptions: Lipschitz continuity, finite covariance of T(y|x, a), i.i.d. transitions, and a kernel built from a Lipschitz “mother kernel” (don’t know what that is, but my guess is almost anything reasonable will satisfy it)
Classic bias/variance tradeoff that comes along with a varying bandwidth (high width = low variance, high bias)
Requirement that the bandwidth shrinks to zero over time, but not so quickly that the variance blows up while it shrinks (so the bandwidth should be at least partly a function of the amount of data)
If a reasonable shrinking rate is chosen, the estimated value function converges to the optimal value function
A formula for the optimal shrinkage rate is given; not surprisingly, it is exponential in the dimension
Using the max operator over a random estimate of the Q function in the Bellman equation leads to a biased estimator (an observed maximum state-action value may come from a suboptimal action)
Unless priors are used, the curse of dimensionality can’t be broken
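
A small sketch of two of the points above (the bandwidth tradeoff and the bias from taking a max over noisy Q estimates). This is not the paper's algorithm, just a toy I'd use to convince myself: every action has true value 0 everywhere, samples are noisy, and Q is estimated at one query point by a normalized kernel-weighted average. The triangular profile 2(1 - z) is only one assumed choice that fits the description of a mother kernel given in the comments (Lipschitz on [0,1], nonnegative, integrates to 1). The names mother_kernel, kernel_average, and mean_max_estimate, and all parameter values, are made up for illustration.

```python
import random


def mother_kernel(z):
    """Assumed triangular profile on [0, 1]: Lipschitz, nonnegative, integrates to 1."""
    return 2.0 * (1.0 - z) if 0.0 <= z <= 1.0 else 0.0


def kernel_average(x, xs, ys, bandwidth):
    """Normalized kernel-weighted average of ys at query point x."""
    weights = [mother_kernel(abs(xi - x) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no samples within one bandwidth of x")
    return sum(w * y for w, y in zip(weights, ys)) / total


def mean_max_estimate(bandwidth, n_actions=5, n_samples=51, n_trials=1000, seed=0):
    """Average over trials of max_a Q_hat(0.5, a) when every true Q value is 0.

    An unbiased estimate of max_a Q(0.5, a) would average to 0; the max over
    noisy per-action estimates averages to something positive instead.
    """
    rng = random.Random(seed)
    # Fixed grid of sample locations on [0, 1]; 0.5 is always one of them.
    xs = [i / (n_samples - 1) for i in range(n_samples)]
    total = 0.0
    for _ in range(n_trials):
        q_hats = []
        for _ in range(n_actions):
            # Noisy observations of a value function that is 0 everywhere.
            ys = [rng.gauss(0.0, 1.0) for _ in xs]
            q_hats.append(kernel_average(0.5, xs, ys, bandwidth))
        total += max(q_hats)
    return total / n_trials


if __name__ == "__main__":
    for b in (0.1, 0.5):
        print("bandwidth", b, "mean of max_a Q_hat:", mean_max_estimate(b))
```

Both printed averages should come out positive even though the true maximum is 0, and the narrower bandwidth should give the larger number, since it averages over fewer samples and so has higher variance. In this toy the true values are flat, so a wide bandwidth costs nothing; with a value function that actually varies over the state space, the wide bandwidth would pay for its lower variance with smoothing bias.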

5 thoughts on “Ideas from Kernel-Based Reinforcement Learning: Dirk Ormoneit, Saunak Sen”

Michael Littman says:

August 13, 2009 at 2:16 am

the “mother kernel” thing is interesting: several of us have tried to find out what it could mean with no success. very odd that they would use obscure terminology and not define it!

Anonymous says:

October 1, 2013 at 1:39 pm

I am the second author of the paper, and it’s been over a decade since we worked on it, so pardon the rust. The mother kernel is a Lipschitz continuous function from [0,1] -> R+ that integrates to 1. See page 172.

Srinivasan says:

March 1, 2014 at 10:11 am

I am sorry.. Page 172 in what? Any book?

Ari Weinstein says:

March 1, 2014 at 7:19 pm

Not sure – I found versions starting at page 1 or page 757 in Intelligent Computing

Anonymous says:

July 16, 2014 at 5:59 pm

Page 172 from the Ormoneit and Sen paper (in the appendix).

Ari Weinstein's Research