Learning Options in Reinforcement Learning. Stolle, Precup. Lecture Notes in Computer Science. 2002.

  1. About creating options
  2. “The underlying assumption is that the agent will be asked to perform different goal-achievement tasks in an environment that is otherwise the same over time.”
  3. Based on identification of bottleneck states
    1. Domains with bottleneck states suit option models because the states after a bottleneck have either already been solved, or the current solution will help the next time the bottleneck is hit
  4. Lists a number of Semi-MDP learning methods
  5. Concerned with goal-achievement tasks
  6. Basically the algorithm works by considering some number of instantiations of the domain (with start and goal states changing per instantiation). For each instantiation:
    1. Solve the domain
    2. Once the solution is obtained, run a number of trajectories from start to goal using the solution
    3. Count which states are visited most
    4. Then compute which states are most likely to lead to the most-visited state (the max-state), by counting how often each state appears on trajectories that pass through it. States with an above-average count (perhaps some other threshold would be better?) become the initiation states for that option
    5. After this, some other form of generalization over the initiation states may be desirable (domain specific)
  7. Cites another paper (McGovern, Barto; in reading list) that tries a similar method but also tries to distinguish good from bad trajectories
  8. The empirical results are actually a little less convincing than I would expect: using options is better, but the difference isn’t really huge
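
To make the visit-counting procedure in item 6 concrete, here is a minimal sketch of how I read it, not the authors' implementation. Everything in it is an assumption for illustration: the two-room `GRID`, the eight-directional moves, and the helper names `shortest_path`, `find_bottleneck` and `initiation_set` are all hypothetical. A breadth-first search stands in for "solve the domain" (the paper learns a policy instead), every start/goal pair is enumerated rather than sampled, and visit counts are pooled across all tasks. Because the stand-in solution is deterministic, one trajectory per task is enough.

```python
from collections import Counter, deque
from itertools import permutations

# Hypothetical two-room gridworld: '#' is a wall, '.' is a free cell.
# The single gap in the middle wall, at (2, 4), is the doorway.
GRID = [
    "#########",
    "#...#...#",
    "#.......#",
    "#...#...#",
    "#########",
]
FREE = [(r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch == "."]
# Assumption: the agent can move in all eight directions.
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def shortest_path(start, goal):
    """BFS stand-in for 'solve the domain'; returns the states from start to goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for dr, dc in MOVES:
            nxt = (state[0] + dr, state[1] + dc)
            if GRID[nxt[0]][nxt[1]] == "." and nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    # Walk the parent links back from the goal, then reverse.
    path = []
    state = goal
    while state is not None:
        path.append(state)
        state = parent[state]
    return path[::-1]


def find_bottleneck(tasks):
    """Solve every (start, goal) task and return the most-visited state and all paths."""
    paths = [shortest_path(start, goal) for start, goal in tasks]
    visits = Counter(state for path in paths for state in path)
    bottleneck, _ = visits.most_common(1)[0]
    return bottleneck, paths


def initiation_set(bottleneck, paths):
    """States that appear before the bottleneck more often than average."""
    leads_to = Counter()
    for path in paths:
        if bottleneck in path:
            leads_to.update(path[: path.index(bottleneck)])
    mean = sum(leads_to.values()) / len(leads_to)
    return {state for state, count in leads_to.items() if count > mean}


# Every ordered (start, goal) pair of distinct free cells is one task.
tasks = list(permutations(FREE, 2))
bottleneck, paths = find_bottleneck(tasks)
init_states = initiation_set(bottleneck, paths)
print("bottleneck:", bottleneck)  # (2, 4), the doorway
print("initiation set:", sorted(init_states))
```

In this toy grid the doorway is the only cell that every room-to-room path must cross, so it collects the most visits. Cells in the outer column of each room only ever appear before the doorway as start states, so they fall below the average and are left out of the initiation set. Which of the nearer cells make the cut depends on how the search breaks ties between equally short paths, which is one reason the extra generalization step in 6.5 seems useful.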
