Reinforcement Learning with High-Dimensional, Continuous Actions. Baird, Klopf

  • I don’t think this was peer reviewed, but it was cited in Kaelbling’s RL survey
  • The basic idea seems similar to CACLA (although this came first, in 1993); it uses wire fitting and function approximators (FAs)
  • Wire fitting works by having a discrete number of “wires” that represent control points on the function. Areas between control points are estimated by interpolating between them.
    • Maxima must be at the wires (I think?)
  • Wire control points are initialized randomly and moved around by training
  • The authors say it makes more sense with a gradient-based system (I guess online updates), but it can also be adapted to a memory-based system
  • I still don’t really understand how the wire fitting is done. The interpolation I get, but I can never find a good reference on the algorithm itself, and I haven’t seen it explained very well
    • “Any general FA system can be used to learn the function… This function generates a set of control points based upon the value of x.  A function is then fitted to the set of control points, and the value of f(.) is then calculated from u (the actions).”
  • So you really need an FA that has two outputs for each wire (presumably the wire’s location in action space and its value, for a one-dimensional action), so if there are w wires, the FA has an output vector of size 2w for a query x.  I think this is what’s doing most of the work, and I don’t get how the location of the wires is selected
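
To make the interpolation step concrete, here is a minimal sketch for a one-dimensional action. It is not the authors' code. It assumes the interpolator is an inverse-distance-weighted average of the wire values with a small penalty on wires whose value is below the best one, which is how I recall the wire-fitting function. The names `toy_fa`, `as_wires`, `wire_fit`, and `greedy_action`, and the constants `c` and `eps`, are all made up for illustration. `toy_fa` is a hand-written stand-in for the learned FA, so it says nothing about how training actually places the wires.

```python
def toy_fa(x, n_wires=3):
    """Hypothetical stand-in for a learned FA: 2 * n_wires outputs for state x."""
    out = []
    for i in range(n_wires):
        u_i = float(i - 1)        # wire location in action space: -1, 0, 1
        q_i = -(u_i - x) ** 2     # wire value (made-up function of x)
        out += [u_i, q_i]
    return out


def as_wires(flat):
    """Regroup the flat 2w output vector into (location, value) pairs."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def wire_fit(wires, u, c=0.1, eps=1e-9):
    """Interpolated value at action u (assumed inverse-distance weighting)."""
    q_max = max(q for _, q in wires)
    num = 0.0
    den = 0.0
    for u_i, q_i in wires:
        # Squared distance to the wire, plus a penalty for low-valued wires.
        dist = (u - u_i) ** 2 + c * (q_max - q_i) + eps
        num += q_i / dist
        den += 1.0 / dist
    return num / den


def greedy_action(wires):
    """The action of the highest-valued wire."""
    return max(wires, key=lambda w: w[1])[0]


wires = as_wires(toy_fa(0.2))
best_u = greedy_action(wires)
print(best_u, wire_fit(wires, best_u))
```

Under this assumed form the output is a weighted average of the wire values with positive weights, so it can never exceed the largest wire value, and it approaches that value at the best wire as `eps` shrinks. That would explain why the maximum sits on a wire and why picking the greedy action needs no search over u.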
