State Abstraction for Programmable Reinforcement Learning Agents. Andre, Russell. AAAI 2002

  1. Considers safe state abstraction in HRL
  2. Maintains optimality
  3. Something LISP-based?
  4. Run on Taxi
  5. Dietterich showed that a variable can be irrelevant to the optimal decision in a state even if it affects the value of that state
    1. Ex: in Taxi the final destination affects the value of a state, but it is irrelevant to the decision during the part of the task when driving to pick up the passenger
  6. This idea is a central part of HRL – subtasks in non-pathological domains don't need to consider all state features
  7. Their creation, ALisp, is Lisp with nondeterministic constructs (what's that?). Subsumes options
  8. “Given a partial program, a HRL alg. finds a policy that is consistent with the program.”
  9. A value function decomposition splits the value of an <s,a> pair into multiple components. Hierarchy allows this to be done at the boundaries of options
  10. There are decomposed versions of VI and policy iteration and they converge to optimal (a paper from 2002)
  11. Has constraints similar to the RL3 paper on aggregation
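
A rough sketch of how items 5 and 9 fit together, not the paper's actual implementation. It assumes a three-part split of Q(s,a) into reward during the action ("r"), reward for completing the current subtask ("c"), and reward after the subtask exits ("e"). The class `DecomposedQ`, the helper functions, and the state layout `(taxi_pos, passenger_loc, destination)` are all made up for illustration. The point is that each component can be stored under its own abstraction of the state, so a component that does not depend on the destination is shared across all destinations.

```python
# Hypothetical Taxi-like state: (taxi_pos, passenger_loc, destination).

def drop_destination(state):
    """Abstraction for components that ignore the destination."""
    taxi_pos, passenger_loc, _destination = state
    return (taxi_pos, passenger_loc)


def keep_all(state):
    """No abstraction: the component sees the full state."""
    return state


class DecomposedQ:
    """Q(s, a) stored as a sum of components, each with its own abstraction."""

    def __init__(self, abstractions):
        # abstractions: component name -> function mapping state to abstract state
        self.abstractions = abstractions
        self.tables = {name: {} for name in abstractions}

    def _key(self, name, state, action):
        return (self.abstractions[name](state), action)

    def set(self, name, state, action, value):
        self.tables[name][self._key(name, state, action)] = value

    def value(self, state, action):
        # Unseen entries default to 0.0; the total is the sum of the components.
        return sum(
            table.get(self._key(name, state, action), 0.0)
            for name, table in self.tables.items()
        )


# While driving to the pickup, "r" and "c" ignore the destination; "e" does not.
q = DecomposedQ({"r": drop_destination, "c": drop_destination, "e": keep_all})
to_G = ((0, 0), "R", "G")
to_B = ((0, 0), "R", "B")
q.set("r", to_G, "north", -1.0)  # one entry covers every destination
q.set("e", to_G, "north", 5.0)   # exit value depends on the destination
print(q.value(to_G, "north"), q.value(to_B, "north"))
```

The two states get different total values (the destination affects the value), yet they share the single "r" entry, which is all the pickup subtask needs to choose its action.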

