This is not a full paper but an extended abstract

- Is concerned with analysis of fitted-VI with options
- For pessimistic VF estimates, options can improve the convergence of FVI with options as opposed to vanilla FVI
- Options can improve convergence even when they are suboptimal
- Says that other papers consider options to exploration, while this paper deals with options as related to planning
- “We derive bounds that help us reason about when approx DP with extended actions [options] may converge faster than approx DP with only primitive actions.”
- VI, policy iteration, and many other methods still converge when options are used, so from that standpoint there is little reason to use them
- Introduces a generalization of FVI that incorporates samples generated by options
- When the set of options contains primitive actions, it converges at least as quickly as vanilla FVI
- Also proof for when their generalized FVI converges faster with options than with only primitive actions
- <The distinction between 1 and 2 isn’t totally clear to me, but perhaps #2 also contains the case where primitive actions aren’t available?>

- “…the improvement in convergence only occurs when the iterates of our algorithm underestimate the optimal value function, which can be controlled in practice by setting the initial estimate of the optimal value function pessimistically.”
- Has an equation by Munos, Szepesvari that bounds the error of estimated and optimal VFs under vanilla VFI
- They call this the “approximation error”
- Its quite messy, but it exists
- Some terms are the error of the VFA to the true VF, and the smoothness of the transition distribution
- Total error consists of:
- Inability of the VFA to capture the VF at each iteration
- Finite number of samples are being used to approximate an infinitely sized (continuous) space
- Finite number of iterations (by increasing # of iteration this term shrinks exponentially fast), convergence is also faster with smaller γ

- Says that “because options execute for multiple timesteps, an option can have an effective discount factor that is smaller than γ”
- <I don’t understand what that means / why its true>

- Earlier analytical work on options doesn’t apply to continuous state problems, and doesn’t compare performance relative to primitive-action only (basically they prove that using options doesn’t prevent convergence)
- Seems like in their version the actual actions taken during the option aren’t fed into VI – just the beginning and end of the option?
- Say that for their algorithm a similar bound can be obtained to that by Munos Szepesvari
- “Faster convergence depends critically on the optimism or pessimism of the sequence of iterates produced by OFVI [their version of VI].”
- Speed requires pessimism

- <I’m a little confused by this. They define a different γ for options and then basically say that because this new γ is smaller (because the option is temporally extended) then their thing works better. The stuff that I’ve read on options though doesn’t redifine γ, and that transitions have to be discounted as well which I don’t see here either. The whole paper seems to depend on this so it would be important to understand.>
- There are empirical results too, but I think this mainly has to do with the theoretical results.

Advertisements