Just a note that I read this paper while I didn't have the ability to take notes, so the notes here are pretty sparse. If I go down the road of dimensionality reduction / feature selection in RL I should cite this, but the Parr and Ng papers on L1 regularization in TD are probably more relevant at the moment.
- Addresses exploration in high-dimensional domains
- Uses a continuous value of knownness <C-KWIK?> to drive exploration smoothly, which is closer in spirit to MBIE; this is better than the standard binary known/unknown labels that R-Max uses (first sketch after this list)
- Does learning via KNN with RBFs, and learns the RBF parameters (σ) independently for each component of the output vector, which allows for more compact representations (second sketch below)
- Performance degrades very slowly as a function of problem size; generalization and accuracy are very good relative to methods that do not attempt dimension reduction
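
The continuous-knownness idea is easy to sketch. Below is a minimal Python sketch, not the paper's exact formula: I'm assuming knownness is computed as average Gaussian similarity to the k nearest stored samples, and that unknownness is filled in with the optimistic value V_max/(1-γ), MBIE-style. The names `V_MAX`, `GAMMA`, `q_model`, and the particular kernel are all my assumptions for illustration.

```python
import numpy as np

V_MAX = 1.0   # assumed reward bound; defines the optimistic value
GAMMA = 0.95  # assumed discount factor

def knownness(x, X_seen, k=5, sigma=0.5):
    """Continuous knownness in [0, 1]: average Gaussian similarity to the
    k nearest stored samples. ~1 in densely sampled regions, ~0 in novel ones.
    (Assumed form; my notes don't record the paper's exact definition.)"""
    if len(X_seen) == 0:
        return 0.0
    d2 = np.sum((np.asarray(X_seen) - np.asarray(x)) ** 2, axis=1)
    nearest = np.sort(d2)[:k]
    return float(np.mean(np.exp(-nearest / (2.0 * sigma ** 2))))

def blended_value(x, X_seen, q_model):
    """Blend the learned model's value estimate with the optimistic bound,
    weighted by knownness -- a smooth interpolation instead of R-Max's
    binary known/unknown switch."""
    kappa = knownness(x, X_seen)
    return kappa * q_model(x) + (1.0 - kappa) * V_MAX / (1.0 - GAMMA)
```

The point of the smooth blend is that exploration bonuses fade gradually as data accumulates, rather than switching off all at once when a count threshold is crossed.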
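For the KNN-with-RBFs learner, here is a minimal sketch of what per-output-component bandwidths look like, assuming Nadaraya-Watson-style weighting over the k nearest neighbors. How the paper actually fits each σ isn't captured in my notes, so `sigmas` is just taken as given here.

```python
import numpy as np

class KNNRBFRegressor:
    """k-nearest-neighbor regression with Gaussian (RBF) weighting.

    Hypothetical sketch: each output dimension j has its own bandwidth
    sigmas[j], so smooth components of the output can average widely
    while sharp ones stay local -- the per-component sigma idea above.
    """

    def __init__(self, k=5, sigmas=None):
        self.k = k
        self.sigmas = sigmas  # shape (output_dim,), assumed already fit
        self.X = None
        self.Y = None

    def fit(self, X, Y):
        self.X = np.asarray(X, dtype=float)  # (n, input_dim)
        self.Y = np.asarray(Y, dtype=float)  # (n, output_dim)
        if self.sigmas is None:
            self.sigmas = np.ones(self.Y.shape[1])  # default unit bandwidths
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        d2 = np.sum((self.X - x) ** 2, axis=1)  # squared distances to all samples
        nn = np.argsort(d2)[: self.k]           # indices of k nearest neighbors
        out = np.empty(self.Y.shape[1])
        for j, sigma in enumerate(self.sigmas):
            w = np.exp(-d2[nn] / (2.0 * sigma ** 2))  # RBF weights, per-dim sigma
            w /= w.sum() + 1e-12                      # normalize, guard against 0/0
            out[j] = w @ self.Y[nn, j]                # weighted average for dim j
        return out
```

Compactness comes from the fact that a single stored sample set serves every output dimension; only the σ vector differs per component.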