- Proves convergence of an approximate DP algorithm for high-dimensional control problems linked by a scalar storage device
- Problem is formulated as a stochastic DP where the value of stored energy is represented with a piecewise linear value function approximator
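A minimal sketch of what a piecewise linear storage value function might look like, assuming a uniform grid of segments with one slope per segment (the class name, grid representation, and segment widths are illustrative assumptions, not the paper's exact construction). Concavity of the storage value corresponds to non-increasing slopes:

```python
import numpy as np

class PiecewiseLinearValue:
    """Illustrative piecewise linear approximation of the value of
    stored energy R over [0, r_max], stored as per-segment slopes."""

    def __init__(self, r_max, n_segments):
        self.delta = r_max / n_segments      # width of each segment
        self.slopes = np.zeros(n_segments)   # marginal value of each segment

    def value(self, r):
        """Value of holding r units in storage (piecewise linear in r)."""
        full = int(r // self.delta)          # fully covered segments
        frac = r - full * self.delta         # remainder in the next segment
        v = self.slopes[:full].sum() * self.delta
        if full < len(self.slopes):
            v += self.slopes[full] * frac
        return v

    def is_concave(self):
        # Concave in r iff the segment slopes are non-increasing
        return bool(np.all(np.diff(self.slopes) <= 1e-12))
```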
- Problem is concave, so exploration isn’t needed; a convergence proof is provided
- Discrete state, vector valued, but low-dimensional
- Actions are also discrete
- “For our problem, xt (action at time t) can easily have hundreds or thousands of dimensions, depending on the level of aggregation. In addition, we are unable to compute the expectation analytically, introducing the additional complication of maximizing an expectation that is itself computationally intractable.”
- Because the action space is so large, thorough exploration is not possible; they instead find optimal solutions by exploiting concavity
- They say there are three possible curses of dimensionality: finding a mapping over the state space S, selecting from the action space A, and computing the expectation over next states S’
- “Our strategy represents a form of approximate value iteration where we focus on finding the slopes of a function rather than the value”
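A hedged sketch of what a slope-based update could look like: smooth a sampled marginal value into one segment's slope, then project back onto concavity (non-increasing slopes) by pooling adjacent violating segments. The pooling rule is a SPAR-style assumption about the details, not necessarily the paper's exact projection:

```python
import numpy as np

def update_slope(slopes, i, observed_marginal, stepsize):
    """Smooth an observed marginal value into slope i, then restore
    concavity (non-increasing slopes) by averaging adjacent segments
    that violate the ordering (pool-adjacent-violators style).
    Illustrative sketch; the projection rule is an assumption."""
    v = slopes.astype(float).copy()
    v[i] = (1 - stepsize) * v[i] + stepsize * observed_marginal

    # Pool adjacent violators: merge neighboring pools whenever an
    # earlier pool's mean falls below a later pool's mean.
    merged = []
    for s in v:
        merged.append([s])
        while len(merged) > 1 and np.mean(merged[-2]) < np.mean(merged[-1]):
            last = merged.pop()
            merged[-1].extend(last)
    out = []
    for pool in merged:
        out.extend([np.mean(pool)] * len(pool))
    return np.array(out)
```

With stepsize 1 and an observation that breaks monotonicity, the violating segments get averaged together, keeping the approximation concave after every update.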
- Although approximate value iteration can converge poorly or even diverge in general, concavity enables a method that works well in practice, and in this application
- This paper, however, gives the first convergence proof (for approximate VI on concave value functions?)
- Assumptions made here are also weaker than in many similar papers
- Although some state variables (such as energy in storage) are continuous, the method restricts those continuous variables to a discrete set of values
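The discretization above might be realized by snapping continuous quantities to a fixed grid; a tiny sketch, where the rounding rule and grid spacing are illustrative assumptions:

```python
def snap_to_grid(r, delta):
    """Round a continuous storage level r to the nearest multiple of
    delta, so the resource state only ever takes values on a discrete
    grid (illustrative; the paper's exact rule may differ)."""
    return round(r / delta) * delta
```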