(I accidentally had the wrong title here before, fixed)

- Seems similar in spirit to (same 1st author): Inferring bounds on the performance of a control policy from a sample of trajectories.
- Attempts value approximation from a corpus of SARS tuples (one-step transitions), which amounts to a number of “broken trajectories”
- Assumes everything is Lipschitz
- Strength is that the method doesn’t use function approximators (FAs)
- Stochasticity is tolerated
- Gives equations for bias and variance of Monte Carlo estimators
- Algorithm is set up to minimize the discrepancy between a rollout rebuilt from the corpus and the rollout the true model would produce if it were available
- Each sample in the corpus is used at most once (not sure why that’s necessary)
- The sample used for each transition is the one that minimizes a particular distance metric in the S×A space
- Gives bias and variance of their evaluator
- “When the sample sparsity becomes small, the bias of the estimator decreases to zero and its variance converges to the variance of the Monte Carlo estimator.”
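
The trajectory-rebuilding idea in the notes above can be sketched roughly as follows. This is a minimal illustration, not the paper's pseudocode: the function name `mfmc_estimate`, the `policy(t, state)` signature, the finite undiscounted horizon, and the caller-supplied `distance` function are all assumptions made for the example. At each step it picks the not-yet-used sample closest to the current (state, action) pair, adds that sample's reward, and jumps to that sample's next state.

```python
def mfmc_estimate(samples, policy, x0, horizon, n_trajectories, distance):
    """Average return over trajectories rebuilt from one-step samples.

    samples        -- list of (state, action, reward, next_state) tuples
    policy         -- hypothetical callable (t, state) -> action
    x0             -- initial state shared by every rebuilt trajectory
    distance       -- callable on two (state, action) pairs, returns a float
    """
    if n_trajectories * horizon > len(samples):
        raise ValueError("need at least n_trajectories * horizon samples")

    unused = list(range(len(samples)))  # indices of samples not consumed yet
    returns = []
    for _ in range(n_trajectories):
        state, total = x0, 0.0
        for t in range(horizon):
            action = policy(t, state)
            # Nearest unused sample in the state-action space.
            best = min(
                unused,
                key=lambda i: distance((state, action), samples[i][:2]),
            )
            unused.remove(best)  # each sample is used at most once
            _, _, reward, next_state = samples[best]
            total += reward      # undiscounted sum (an assumption here)
            state = next_state   # continue from the sample's successor
        returns.append(total)
    return sum(returns) / len(returns)


# Toy usage with scalar states and actions and an L1 distance.
toy_samples = [
    (0.0, 1.0, 1.0, 1.0),
    (1.0, 1.0, 2.0, 2.0),
    (0.1, 1.0, 1.5, 1.1),
    (1.2, 1.0, 2.5, 2.2),
]
l1 = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
estimate = mfmc_estimate(toy_samples, lambda t, s: 1.0, 0.0, 2, 2, l1)
```

Because samples are removed once used, later trajectories have to settle for more distant matches, which is where the dependence on sample sparsity shows up.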

“Seems similar in spirit to (same 1st author): Inferring bounds on the performance of a control policy from a sample of trajectories.”: I assume that’s the paper you shared with me recently.

“Stochasticity is tolerated”: Do they do anything interesting to extend the other work to handle stochastic transitions?

“Each sample in the corpus is used at most once (not sure why thats necessary)”: I assume it’s to keep the statistical estimates from getting biased.