This looked familiar. I already saw this talk online a while back:

- Mentions motion planning exploration and exploitation:
- Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics, 2007.
- M. Rickert, O. Brock, and A. Knoll. Balancing exploration and exploitation in motion planning. In Proc. IEEE Int. Conf. Robotics and Automation ICRA 2008, pages 2812–2817, 2008. doi: 10.1109/ROBOT.2008.4543636

- Citations given for CE are:
- Reuven Y. Rubinstein and Dirk P. Kroese. The cross-entropy method: a unified approach to combinatorial optimization. Springer, 2004.
- Dirk Kroese, Sergey Porotsky, and Reuven Rubinstein. The cross-entropy method for continuous multi-extremal optimization. Methodology and Computing in Applied Probability, 8:383–407, 2006. ISSN 1387-5841.

- “The CE method is widely applicable and is used to successfully solve complex combinatorial problems such as the minimum graph cut or the traveling salesman problems”
- “The scheme is general and converges to an optimum assuming that enough feasible trajectories can be sampled. As with most randomized methods there is no guarantee that this would be the global optimum but with high likelihood the global optimum can be approached if enough samples are generated”
- This is contrary to what I remember reading somewhere else (that it may not converge), but not sure

- These authors relate CE to MCMC (not MCTS) methods, via importance sampling. Citation for the Monte Carlo side:
- Reuven Y. Rubinstein and Dirk P. Kroese. Simulation and the Monte Carlo Method. Wiley, 2008.
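For reference, a minimal sketch of importance sampling on a rare-event probability, which is the setting CE originally came from. This is my own toy example, not anything from the paper: the target is a standard normal, the proposal is a normal shifted to the threshold, and the names `estimate_tail` and `exact_tail` are made up for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def estimate_tail(threshold=4.0, n_samples=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples come from the proposal N(threshold, 1) (an assumed, hand-picked
    proposal) and are reweighted by target density / proposal density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += normal_pdf(x, 0.0, 1.0) / normal_pdf(x, threshold, 1.0)
    return total / n_samples


def exact_tail(threshold=4.0):
    """Closed-form tail probability of N(0, 1), for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    print("importance sampling:", estimate_tail())
    print("exact:              ", exact_tail())
```

Sampling directly from N(0, 1) would almost never land past the threshold, so the shifted proposal is what makes the estimate usable with this few samples.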

- When doing importance sampling you sample from another distribution to calculate an integral. The assumption is you can’t sample from the distribution you care about, so you draw samples from a distribution that is easy to sample from and reweight them.
- The best sampling distribution to use is one that minimizes the variance of the estimate of the integral you are trying to find.
- A good way to find that distribution is to pick, from a parametric family, the one that minimizes the KL divergence to the optimal distribution
- There is a good explanation of CE along with its relationship to rare-event probabilities, but I am hungry right now and should re-read it in the future.
- Says initial distribution doesn’t have to be very good, it just needs to get good coverage of the space (these claims are a little hand wavy without citations, so not sure what to make of it)
- He discusses doing uniform sampling for the first generation, which is how my implementation does it as well
- Distribution parameter update is done according to EM
- Also says it is good to inject noise into samples to prevent premature convergence
- Looks like the approach here is continuous time and looks for parameters of optimal controllers (at least the first domain looks like LQR)
- Searching trajectories through the state space, as opposed to policies?
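The loop described in these notes (uniform first generation, refit the distribution to the best samples, add noise to avoid collapsing too early) can be sketched roughly as below. This is my own simplification, not the paper's implementation: it uses a single diagonal Gaussian, for which the EM update reduces to the mean and variance of the elite samples, and the noise is added as a decaying term on the standard deviation. The function name and all parameter values are assumptions.

```python
import math
import random


def cross_entropy_minimize(cost, lower, upper, n_samples=100, elite_frac=0.1,
                           n_iters=40, extra_std=0.05, seed=0):
    """Minimize `cost` over a box with a basic cross-entropy loop."""
    rng = random.Random(seed)
    dim = len(lower)
    n_elite = max(2, int(elite_frac * n_samples))
    mean, std = None, None
    for it in range(n_iters):
        if mean is None:
            # First generation: uniform coverage of the search box.
            samples = [[rng.uniform(lower[i], upper[i]) for i in range(dim)]
                       for _ in range(n_samples)]
        else:
            # Later generations: sample from the fitted Gaussian, widened by
            # a decaying noise term (assumed schedule) to delay convergence.
            noise = extra_std * (1.0 - it / n_iters)
            samples = [[rng.gauss(mean[i], std[i] + noise) for i in range(dim)]
                       for _ in range(n_samples)]
        # Keep the lowest-cost samples and refit mean and std to them.
        elites = sorted(samples, key=cost)[:n_elite]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        std = [math.sqrt(sum((e[i] - mean[i]) ** 2 for e in elites) / n_elite)
               for i in range(dim)]
    return mean


if __name__ == "__main__":
    # Toy quadratic cost with its minimum at (1, -2).
    def toy_cost(p):
        return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

    print(cross_entropy_minimize(toy_cost, [-5.0, -5.0], [5.0, 5.0]))
```

In a planning setting the sampled vector would encode a trajectory and `cost` would roll it out and score it; a single Gaussian also cannot represent several separate good regions, which is where a mixture model fit by full EM would come in.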

- “The Dubins car fails the small-time local controllability (STLC) test which creates interesting non-trivial control problems”
- Cites LaValle

- For the Dubins car example he uses the encoding of saying what action to take, and for how long (so each action is 2D and variable time)
- Also does a helicopter domain – the description of the dynamics is pretty detailed
- “The paper addresses the motion planning problem through a randomized optimization in the continuous space of feasible trajectories”
**Domains here are planned by selecting sequences of states, and using inverse kinematics to stitch them together.**
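To make the (action, duration) encoding for the Dubins car concrete, here is a small rollout sketch. It assumes the standard Dubins model (constant forward speed, turn rate as the only control) and integrates each constant-turn-rate segment in closed form. The function name `dubins_rollout` and the unit speed are my own choices, not taken from the paper.

```python
import math


def dubins_rollout(start, primitives, speed=1.0):
    """Roll out a Dubins car through a list of (turn_rate, duration) pairs.

    `start` is (x, y, theta). Each primitive holds the turn rate constant for
    its duration, so each segment is a straight line or a circular arc.
    """
    x, y, theta = start
    for turn_rate, duration in primitives:
        if abs(turn_rate) < 1e-9:
            # Straight segment.
            x += speed * duration * math.cos(theta)
            y += speed * duration * math.sin(theta)
        else:
            # Circular arc: exact integral of the unicycle equations.
            new_theta = theta + turn_rate * duration
            x += speed / turn_rate * (math.sin(new_theta) - math.sin(theta))
            y -= speed / turn_rate * (math.cos(new_theta) - math.cos(theta))
            theta = new_theta
    return x, y, theta


if __name__ == "__main__":
    # Drive straight for 2 s, then turn left at 1 rad/s for a quarter circle.
    print(dubins_rollout((0.0, 0.0, 0.0), [(0.0, 2.0), (1.0, math.pi / 2)]))
```

A CE planner over this encoding would sample the flat vector of turn rates and durations, roll it out like this, and score the result by distance to the goal plus any obstacle penalty.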