Issues in Using FAs for RL. Thrun, Schwartz

  • Discusses failure cases w/FAs in QL
  • Generalization error in the FA can lead to systematic overestimation of values, which in turn can cause the agent to fail to learn the optimal policy
  • Derives bounds on the necessary accuracy of the FA and on the TD factor used
  • Driven by empirical difficulties encountered when using FAs
  • Even if the error introduced by the FA is zero-mean, the max operator makes the expected error positive, so the value function tends to overestimate; the paper gives the expected size of that overestimate
  • Empirically, using high gammas (over 0.9) leads to more severe failures
  • They say memory-based methods may be more reliable than other forms of regression, and behave better when the amount of experience is large
  • They also say TD methods should help mitigate this because of the eligibility trace
  • FAs biased toward worst-case estimates can help with the issue… how many of these exist?
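
The overestimation point above can be checked numerically. The following is a minimal sketch, not the paper's own experiment. It assumes that all actions in a state have the same true value and that the FA error on each action is independent and uniform on [-eps, eps] (zero-mean). Under those assumptions the expected value of the max over n actions is eps * (n - 1) / (n + 1), a standard order-statistics result. The function names and parameters are made up for illustration.

```python
import random


def expected_max_error(n_actions, eps, trials=20000, seed=0):
    """Monte Carlo estimate of E[max_a e_a] for zero-mean uniform noise e_a.

    Assumption: errors are i.i.d. uniform on [-eps, eps] and all true
    action values are equal, so the max picks out the largest error.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(-eps, eps) for _ in range(n_actions))
    return total / trials


def closed_form(n_actions, eps):
    """Expected max of n i.i.d. uniform[-eps, eps] variables."""
    return eps * (n_actions - 1) / (n_actions + 1)


if __name__ == "__main__":
    # Compare the simulated bias with the closed form for a few action counts.
    for n in (1, 2, 5, 10):
        print(n, round(expected_max_error(n, 1.0), 3), round(closed_form(n, 1.0), 3))
```

With a single action the bias is zero, and it grows toward eps as the number of actions increases. In a Q-learning target this bias would be scaled by gamma and fed back through later updates, which fits the note that high gammas made failures worse.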
