Ideas from Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, Alessandro Lazaric, Andrea Bonarini, Marcello Restelli

The paper addresses reinforcement learning in continuous action spaces via an actor-critic architecture: the actor holds the policy, while the critic holds a value function independent of the policy

The goal is not to learn the optimal value function perfectly everywhere, but to find the optimal policy

The actor takes an action and is then criticized by the critic. Based on this feedback, the actor modifies its policy via a stochastic gradient method on the policy space

The proposed method uses sequential Monte Carlo (SMC) methods to approximate the sequence of probability distributions implemented by the actor, which they call SMC-Learning

Actions are initially selected at random, then resampled via importance sampling, with importance weights derived from the values computed by the critic

Because of Monte Carlo sampling and Boltzmann exploration, an accurate model can be built in the limit
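
Boltzmann exploration turns the critic's value estimates into sampling probabilities via a softmax. A minimal sketch (function and parameter names are my own, not the paper's):

```python
import numpy as np

def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over value estimates; higher temperature -> closer to uniform,
    lower temperature -> greedier selection."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

As the temperature is annealed toward zero, selection concentrates on the highest-valued actions, which is what lets the method sharpen around the optimum over time.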

The computational cost of action selection is logarithmic in the number of samples
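
The logarithmic cost presumably comes from a binary search over cumulative weights (or an equivalent tree structure). One way to realize O(log n) weighted selection, purely illustrative and not the paper's data structure:

```python
import bisect
import random
from itertools import accumulate

def select_action(actions, weights, rng=random):
    """Draw one action with probability proportional to its weight.
    The draw itself is O(log n): binary-search a uniform variate
    against the cumulative weights."""
    cum = list(accumulate(weights))        # O(n) prefix sums
    u = rng.random() * cum[-1]             # uniform in [0, total weight)
    return actions[bisect.bisect_right(cum, u)]
```

In practice the prefix sums would be cached (or kept in a Fenwick/segment tree so weight updates are also logarithmic); rebuilding them per draw, as above, costs O(n).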

There is a set of possible actions for each state; this set is adjusted via sampling so that it eventually contains the optimal action for that state
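
Putting the pieces together, here is a toy sketch of a per-state action set refined by value-weighted importance resampling. All names and constants are hypothetical, and the small Gaussian jitter after resampling is a common SMC move to keep duplicated particles distinct; the paper's actual update and resampling rules differ in detail:

```python
import numpy as np

class StateActionParticles:
    """Per-state set of candidate actions ("particles") with running value
    estimates; resampling concentrates the set on high-value actions."""

    def __init__(self, low, high, n=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.actions = self.rng.uniform(low, high, n)  # initial random actions
        self.q = np.zeros(n)                           # per-action value estimates

    def update(self, i, reward, alpha=0.1):
        # Running-average update of action i's value estimate (critic feedback)
        self.q[i] += alpha * (reward - self.q[i])

    def resample(self, temperature=0.1, jitter=0.05):
        # Boltzmann weights from the value estimates...
        w = np.exp((self.q - self.q.max()) / temperature)
        w /= w.sum()
        # ...then multinomial resampling plus a small Gaussian jitter so that
        # duplicated particles explore nearby actions
        idx = self.rng.choice(len(self.actions), size=len(self.actions), p=w)
        self.actions = self.actions[idx] + self.rng.normal(0.0, jitter, len(idx))
        self.q = self.q[idx]
```

Repeated rounds of update-then-resample are what drive the action set toward containing the optimal action for its state.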

One thought on “Ideas from Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, Alessandro Lazaric, Andrea Bonarini, Marcello Restelli”

Again, author(s)? Also, apart from the contents of the paper, how do you see these ideas being useful in your work? What are the limitations that could lead to follow up papers?