- Paper presents a method for RL in continuous action spaces via an actor-critic architecture: the actor holds the policy, the critic holds a value function independent of the policy
- Goal is not to learn the optimal value function perfectly everywhere, but to find the optimal policy
- Actor takes an action and is then criticized; based on this feedback, the actor updates its policy by a stochastic gradient method on the policy space
- Method proposed is to use sequential Monte Carlo (SMC) methods to approximate the sequence of probability distributions implemented by the actor, which they call SMC-Learning
- Actions are initially selected at random, then resampled via importance sampling, with weights derived from the values computed by the critic
- Because of Monte Carlo sampling and Boltzmann exploration, an accurate model can be built in the limit
- Computational cost of action selection is logarithmic in the number of samples
- There is a set of candidate actions for each state, adjusted via sampling so that the set eventually contains the optimal action for that state
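
The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name, parameters, and Gaussian perturbation on resampling are all assumptions, and the paper's logarithmic selection cost comes from a tree structure over the weights, whereas this flat-CDF sketch only keeps the final draw logarithmic via binary search.

```python
import bisect
import math
import random

class SMCActionSampler:
    """Per-state action set in the spirit of SMC-Learning (hypothetical API).

    Each state keeps a set of sampled actions; the critic's value estimates
    drive Boltzmann importance weights, and resampling concentrates the set
    around high-value actions.
    """

    def __init__(self, low, high, n_samples=20, temperature=1.0):
        # Initial actions are drawn uniformly at random from the action range.
        self.actions = [random.uniform(low, high) for _ in range(n_samples)]
        self.temperature = temperature

    def _weights(self, q_values):
        # Boltzmann (softmax) exploration weights from critic estimates;
        # subtract the max for numerical stability.
        m = max(q_values)
        exps = [math.exp((q - m) / self.temperature) for q in q_values]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, q_values):
        # Draw one action by inverting the CDF of the Boltzmann weights.
        # Building the CDF is O(n); the draw itself is O(log n).
        weights = self._weights(q_values)
        cdf, acc = [], 0.0
        for w in weights:
            acc += w
            cdf.append(acc)
        u = random.random() * cdf[-1]
        return self.actions[bisect.bisect_left(cdf, u)]

    def resample(self, q_values, noise=0.05):
        # Importance resampling: replicate high-weight actions, then perturb
        # slightly (an assumed kernel) so the set keeps exploring near them.
        weights = self._weights(q_values)
        self.actions = [
            a + random.gauss(0.0, noise)
            for a in random.choices(self.actions, weights=weights,
                                    k=len(self.actions))
        ]
```

Repeated resample calls shrink the action set toward high-value regions, which matches the note that the set should eventually contain the optimal action for that state.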

## Ideas from Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, Alessandro Lazaric, Andrea Bonarini, Marcello Restelli

**Tagged** 2007, Alessandro Lazaric, Andrea Bonarini, Marcello Restelli, NIPS, Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

Again, author(s)? Also, apart from the contents of the paper, how do you see these ideas being useful in your work? What are the limitations that could lead to follow-up papers?