Ideas from: On the Role of Tracking in Stationary Environments. Sutton, Koop, Silver

  • Basically an argument for a TD method whose learning rate (step size) is controlled by meta-learning (I think meta-learning here means a method for learning hyperparameters of sorts, but not sure)
  • I guess the one interesting thing here is that TD methods may be better for nonstationary domains than a traditional ML estimate.
  • Fundamental idea here is similar to that in Dyna-2: sometimes you want to forget what you know, because actions that are often bad can sometimes be good. Both papers discuss this in terms of Go.
  • Propose that focusing on convergence to a single solution (traditional in RL) may not ultimately lead to the best results in practice. Propose tracking a solution instead.
    • Is it unreasonable to hope they would define what tracking means or give a reference?
  • Assume domain may not be stationary/iid
  • Argue that even in stationary MDPs with very large state spaces, the world can appear nonstationary, because neighborhoods that are far apart can behave very differently and can only be reached after many transitions (a long time). Suggest it may just be better to adapt to the local neighborhood.
  • The first example used seems to be more of a strange supervised learning task than an RL task.
  • Is meta-learning orthodox statistics trying to be Bayesian?
  • Utilize a previously existing algorithm, incremental delta-bar-delta (IDBD), an online meta-learning algorithm that uses gradient descent to learn step-size parameters.
  • “Tracking becomes important in stationary domains which have temporal coherence and are too large to be solved exactly”
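
To make the IDBD bullet above concrete, here is a minimal sketch of IDBD for a linear predictor in a supervised setting, following the standard update rules from Sutton's original IDBD formulation (not necessarily the exact variant used in this paper). The class name, the parameter names (`meta_step`, `init_step`), and the sign-flipping demo task are illustrative assumptions, not taken from the paper.

```python
import math
import random


class IDBD:
    """Linear learner with one step size per weight, adapted online by IDBD."""

    def __init__(self, n_features, meta_step=0.01, init_step=0.05):
        self.w = [0.0] * n_features                      # weights
        self.beta = [math.log(init_step)] * n_features   # log step sizes
        self.h = [0.0] * n_features                      # trace of recent weight changes
        self.meta_step = meta_step                       # meta learning rate (theta)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, target):
        """Apply one IDBD update and return the prediction error."""
        delta = target - self.predict(x)
        for i, xi in enumerate(x):
            # Meta step: raise the step size when the current error correlates
            # with recent weight changes, lower it when they disagree.
            self.beta[i] += self.meta_step * delta * xi * self.h[i]
            alpha = math.exp(self.beta[i])
            # Base step: ordinary LMS update with a per-weight step size.
            self.w[i] += alpha * delta * xi
            # Decay the trace (clipped at zero) and add the new weight change.
            self.h[i] = self.h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
        return delta


def track_flipping_target(steps=5000, flip_prob=0.01, seed=0):
    """Hypothetical demo task: the target is +/- x[0]; x[1] is irrelevant."""
    rng = random.Random(seed)
    learner = IDBD(n_features=2)
    sign = 1.0
    for _ in range(steps):
        if rng.random() < flip_prob:
            sign = -sign  # the target drifts, so there is no fixed solution
        x = [rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0))]
        learner.update(x, sign * x[0])
    return learner
```

One would expect the learned step size for the relevant input to end up larger than the one for the irrelevant input; this can be checked by comparing `math.exp(b)` for each `b` in `learner.beta` after calling `track_flipping_target()`.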

