- Seems similar in spirit to (same 1st author): Inferring bounds on the performance of a control policy from a sample of trajectories.
- Attempts to do value approximation using a corpus of SARS tuples (s, a, r, s'), which is made up of a number of “broken trajectories”
- Assumes everything is Lipschitz
- Strength is that the method doesn’t use function approximators (FAs)
- Stochasticity is tolerated
- Gives equations for the bias and variance of Monte Carlo estimators
- Algorithm is set up to minimize the discrepancy between rollouts built from the corpus and those the true model would produce if it were available
- Each sample in the corpus is used at most once (not sure why that’s necessary)
- The sample used for each transition is the one that minimizes a particular distance metric in the S×A space
- Gives bias and variance of their evaluator
- “When the sample sparsity becomes small, the bias of the estimator decreases to zero and its variance converges to the variance of the Monte Carlo estimator.”
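
The rollout construction described in the bullets above can be sketched roughly as follows. This is my own minimal reading of the notes, not the paper's code: the function name `mfmc_estimate`, the undiscounted finite-horizon return, the time-dependent `policy(t, state)` signature, and the toy 1-D system in the usage part are all assumptions made for illustration.

```python
def mfmc_estimate(samples, policy, x0, horizon, n_traj, dist):
    """Average return of n_traj rollouts rebuilt from one-step samples.

    samples: list of (state, action, reward, next_state) tuples.
    policy:  function (t, state) -> action.
    dist:    distance between two (state, action) pairs.
    """
    if n_traj * horizon > len(samples):
        raise ValueError("need at least n_traj * horizon samples")
    unused = list(range(len(samples)))  # each sample is consumed at most once
    returns = []
    for _ in range(n_traj):
        state, total = x0, 0.0
        for t in range(horizon):
            action = policy(t, state)
            # Pick the unused sample closest to (state, action) in S x A.
            best = min(unused, key=lambda i: dist((state, action), samples[i][:2]))
            unused.remove(best)
            _, _, reward, next_state = samples[best]
            total += reward
            state = next_state  # continue from the sample's own next state
        returns.append(total)
    return sum(returns) / n_traj


# Toy usage (hypothetical system): next state = x + a, reward = x.
samples = [(x, 1.0, x, x + 1.0) for x in (0.0, 1.0, 2.0) for _ in range(2)]
samples += [(5.0, -1.0, 5.0, 4.0), (0.0, -1.0, 0.0, -1.0)]  # distractors


def l1_dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


estimate = mfmc_estimate(samples, lambda t, x: 1.0, 0.0, 3, 2, l1_dist)
print(estimate)
```

In this toy corpus every needed transition appears exactly twice, so both rebuilt rollouts match the true trajectory 0 → 1 → 2 → 3. With sparser samples the chosen transitions would only be nearby ones, which is where the bias terms in the notes come from.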

“Seems similar in spirit to (same 1st author): Inferring bounds on the performance of a control policy from a sample of trajectories.”: I assume that’s the paper you shared with me recently.

“Stochasticity is tolerated”: Do they do anything interesting to extend the other work to handle stochastic transitions?

“Each sample in the corpus is used at most once (not sure why that’s necessary)”: I assume it’s to keep the statistical estimates from getting biased.