Rollout Algorithms for Constrained Dynamic Programming. Bertsekas. LIDS 2005

<This is a tech report?>

Based on the abstract, seems similar to http://www.mit.edu/~jnt/Papers/J066-97-rollout.pdf, and his other work that deals with heuristics and rollouts

“Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm also produces a feasible solution, whose cost is no worse than the cost correspoding to the base heuristic.”

<Important!> “In the base heuristic is a DP policy, i.e., it generates state/control sequences via feedback functions μ_{k}, k=0,1,…,N-1, so that at any state x_{k}, it applies the control μ_{k}(x_{k}), then the algorithm can be viewed as a policy improvement step of the policy iteration method. Based on the generic results for policy iteration, it then follows that the rollout algorithm’s cost <doing cost minimization in this setting> is no worse than the one of the base heuristic. This cost improvement property has been extended in [BTW97 <Bertsekas, Tsitsiklis, Wu. Rollout algorithms for combinatorial optimiziation>] to base heuristics that are not DP policies, but rather satisfy an assumption called sequential improvement, which will be discussed shortly within the constrained context of this paper.”