TD Models: Modeling the World at a Mixture of Time Scales. Sutton. ICML 1995

Allows model-building at multiple time scales within a single structure

Uses TD for building these models

Says the approach may be relevant to hierarchical planning, which attacks a problem at multiple scales (usually time scales)

Discusses n-step as well as beta-models (the latter are learned TD-style)

Ignores actions; it is concerned only with the sequence of states and rewards, so it seems to be on-policy

Multiple n-step models can be combined to estimate the value function

The issue is that we would need many of these, because each one works for exactly one value of n

Also may be expensive to learn for large n
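To make the n-step notes concrete, here is a minimal sketch of building an n-step model for a small Markov chain with no actions and using it in a backup. The 3-state chain, the reward vector, the discount of 0.9 and all function names are my own assumptions for illustration, not taken from the paper. The model is computed from a known transition matrix rather than learned, which sidesteps the learning cost mentioned above.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def n_step_model(P, r, gamma, n):
    """Return (R_n, P_n) for a chain with transition matrix P and expected
    one-step rewards r: R_n is the expected discounted reward over the next
    n steps, P_n is gamma**n times the n-step transition matrix."""
    size = len(P)
    P_k = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    R_k = [0.0] * size
    for _ in range(n):
        # Add the reward collected on the next step, then extend the horizon.
        R_k = [a + b for a, b in zip(R_k, mat_vec(P_k, r))]
        P_k = [[gamma * x for x in row] for row in mat_mul(P_k, P)]
    return R_k, P_k


def evaluate(R_n, P_n, sweeps=2000):
    """Repeat the backup v <- R_n + P_n v until it has settled."""
    v = [0.0] * len(R_n)
    for _ in range(sweeps):
        v = [a + b for a, b in zip(R_n, mat_vec(P_n, v))]
    return v


# Hypothetical 3-state chain used only for this illustration.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
r = [1.0, 0.0, 2.0]

v_1 = evaluate(*n_step_model(P, r, 0.9, 1))  # ordinary one-step backups
v_3 = evaluate(*n_step_model(P, r, 0.9, 3))  # backups with the 3-step model
```

Under these assumptions `v_1` and `v_3` should agree up to floating-point error, since a backup with a correct n-step model has the same fixed point for every n. Each model is still tied to its own n, which is the limitation noted above.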

Instead of using the exact model for step n, it may be better to average over a range of values, say n-5 to n+5

“The predictions of the different time scales are linearly mixed and yet still they can be used in backup operations without altering convergence to V”
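A small numerical check of the quoted claim, as a sketch under my own assumptions: a deterministic 4-state ring (state i always moves to state i+1), made-up rewards, a discount of 0.9, and a uniform average over n = 1 to 11 (that is, n-5 to n+5 around n = 6). None of these specifics come from the paper. The ring is chosen because its n-step model is trivial to write down and its true values have a closed form to compare against.

```python
GAMMA = 0.9
REWARDS = [1.0, 0.0, 2.0, 0.0]  # reward received on leaving each ring state


def n_step_prediction(state, n, values):
    """n-step model on the ring: discounted reward over the next n steps
    plus the discounted value of the state reached n steps ahead."""
    size = len(REWARDS)
    reward = sum(GAMMA ** k * REWARDS[(state + k) % size] for k in range(n))
    return reward + GAMMA ** n * values[(state + n) % size]


def mixed_backup(values, centre=6, spread=5):
    """One sweep in which every state is backed up with the uniform average
    of the n-step predictions for n in [centre - spread, centre + spread]."""
    ns = range(centre - spread, centre + spread + 1)
    weight = 1.0 / len(ns)
    return [sum(weight * n_step_prediction(s, n, values) for n in ns)
            for s in range(len(values))]


def true_values():
    """Closed-form values on the ring: one discounted lap, repeated forever."""
    size = len(REWARDS)
    one_lap = [sum(GAMMA ** k * REWARDS[(s + k) % size] for k in range(size))
               for s in range(size)]
    return [lap / (1.0 - GAMMA ** size) for lap in one_lap]


values = [0.0] * len(REWARDS)
for _ in range(200):
    values = mixed_backup(values)
```

In this toy setting the mixed backups settle on the same numbers as `true_values()`, which is consistent with the claim that linearly mixing time scales does not change the fixed point as long as the weights sum to one.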

In simple beta-models, the weight of predictions falls off exponentially with delay
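As a rough illustration of that exponential fall-off, the sketch below assumes the n-step prediction gets weight (1 - beta) * beta^(n-1), which is my reading of the simple beta-model; the function name, the value beta = 0.8 and the truncation at 50 steps are all hypothetical.

```python
def simple_beta_weights(beta, max_n):
    """Weight assumed for the n-step prediction, n = 1..max_n (truncated)."""
    return [(1.0 - beta) * beta ** (n - 1) for n in range(1, max_n + 1)]


weights = simple_beta_weights(beta=0.8, max_n=50)
total = sum(weights)              # approaches 1 as max_n grows
ratio = weights[1] / weights[0]   # each extra step of delay scales the weight by beta
```

With this weighting a single parameter beta sets the effective time scale: a beta near 0 puts almost all the weight on the one-step prediction, and a beta near 1 spreads it over long delays.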

In full beta-models a different scheme is used that can have an arbitrary weighting profile and is dependent on the particular sequence of states (not sure how this has bought us anything yet)
