TD Models: Modeling the World at a Mixture of Time Scales. Sutton. ICML 1995

  1. Allows model-building at multiple time scales within a single structure
  2. Uses TD for building these models
  3. Says the approach may be relevant to hierarchical planning, which (usually, anyway) attacks a problem at multiple time scales
  4. Discusses n-step as well as beta-models (the latter are learned TD-style)
  5. Ignores actions and is concerned only with the sequence of states and rewards, so it seems to be on-policy
  6. Multiple n-step models can be combined to estimate the value function
    1. The issue is that we would need many of these, because each one works for exactly one value of n
    2. They may also be expensive to learn for large n
  7. Instead of using the exact model for step n, it may be better to average over a range of values, say n-5 to n+5
  8. “The predictions of the different time scales are linearly mixed and yet still they can be used in backup operations without altering convergence to V”
  9. In simple beta-models, the weight of predictions falls off exponentially with delay
  10. In full beta-models a different scheme is used that can have an arbitrary weighting profile and depends on the particular sequence of states (not sure how this has bought us anything yet)
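To make points 6 through 9 concrete, here is a minimal sketch on a made-up 3-state Markov chain. It assumes a model is a pair (expected discounted reward, discounted transition matrix), with the discount folded into the matrix, which is my reading of the setup rather than code from the paper. The chain, the function names (`mixture_model`, `is_consistent`), and the truncation of the exponential weights are all illustrative assumptions. The sketch builds an exact n-step model, a uniform average over n-5 to n+5, and an exponentially weighted mixture in the spirit of a simple beta-model, then checks that backing up V through each one returns V.

```python
import numpy as np

# Hypothetical 3-state chain, made up for illustration (not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])      # one-step transition probabilities
r = np.array([1.0, 0.0, 2.0])        # expected one-step reward per state
gamma = 0.9

# True value function: V = r + gamma * P V.
V = np.linalg.solve(np.eye(3) - gamma * P, r)


def mixture_model(P, r, gamma, weights):
    """Mix n-step models; weights[n-1] is the weight on the n-step model.

    Returns (r_mix, P_mix), where the discount is folded into P_mix.
    The weights must sum to 1 for the mixture to stay consistent with V.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    r_n = np.zeros(len(r))           # reward part of the current n-step model
    P_n = np.eye(len(r))             # transition part of the current n-step model
    r_mix = np.zeros(len(r))
    P_mix = np.zeros_like(P)
    for w in weights:
        r_n = r_n + P_n @ r          # add the discounted reward of one more step
        P_n = gamma * (P_n @ P)      # extend the discounted transition by one step
        r_mix += w * r_n
        P_mix += w * P_n
    return r_mix, P_mix


def is_consistent(r_m, P_m, V):
    """A model is usable for backups if V = r_m + P_m V."""
    return np.allclose(r_m + P_m @ V, V)


# Exact n-step model: all weight on a single n (here n = 10).
exact_10 = mixture_model(P, r, gamma, np.eye(10)[9])

# Uniform average over n-5 .. n+5 with n = 10, i.e. n = 5 .. 15.
window = np.zeros(15)
window[4:15] = 1.0 / 11.0
windowed = mixture_model(P, r, gamma, window)

# Exponentially decaying weights (1 - beta) * beta**(n - 1), truncated at
# n = 200 and renormalised; the truncation is an assumption of this sketch.
beta = 0.7
expo = (1.0 - beta) * beta ** np.arange(200)
expo /= expo.sum()
beta_like = mixture_model(P, r, gamma, expo)

for name, (r_m, P_m) in [("exact n=10", exact_10),
                         ("window 5..15", windowed),
                         ("exponential", beta_like)]:
    print(name, "consistent with V:", is_consistent(r_m, P_m, V))
```

The consistency follows because every n-step model satisfies V = r_n + P_n V on its own, so any mixture whose weights sum to 1 does too. This sketch computes the models from a known P; the paper's point is that such models can be learned TD-style from experience, which is not shown here.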
