Basis Function Construction for Hierarchical Reinforcement Learning. Osentoski, Mahadevan. AAMAS 2010.

Setting: semi-Markov decision processes (SMDPs)

“By combining methods for automatic basis construction with HRL methods, such as MAXQ, we show how a hierarchical function approximator can be automatically formed based on a specific task hierarchy.”

Spectral (graph Laplacian) approach that works from trajectory information: nodes are states and edges come from the observed action transitions between them (does this assume determinism?)
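
A minimal sketch of the first step, not the paper's implementation: building a symmetric adjacency matrix from trajectory samples, with states and the `build_graph` helper name being illustrative assumptions.

```python
import numpy as np

def build_graph(trajectories):
    """Build a symmetric adjacency matrix from observed (s, a, s') samples.
    States can be any hashable objects; the graph is undirected, so the
    direction of each transition (and the action label) is discarded."""
    states = sorted({s for traj in trajectories for (s, a, s2) in traj} |
                    {s2 for traj in trajectories for (s, a, s2) in traj})
    idx = {s: i for i, s in enumerate(states)}
    W = np.zeros((len(states), len(states)))
    for traj in trajectories:
        for s, a, s2 in traj:
            # Symmetrize: an observed transition in either direction
            # contributes the same undirected edge.
            W[idx[s], idx[s2]] = 1.0
            W[idx[s2], idx[s]] = 1.0
    return states, W

# Toy usage: two short trajectories over three states.
trajs = [[(0, 'a', 1), (1, 'b', 2)], [(2, 'a', 0)]]
states, W = build_graph(trajs)
```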

In Taxi, value functions can be constructed by conditioning on particular elements of the feature vector (like a decision tree)
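
A toy illustration of that idea, assumptions throughout (the `in_taxi` feature, the weight-vector names, and the linear form are all hypothetical, not from the paper): the value estimate branches on one feature, like a one-level decision tree, with a separate linear approximator per branch.

```python
import numpy as np

def value(state_features, in_taxi, w_pickup, w_dropoff):
    """Branch on the 'passenger in taxi' flag; each branch uses its own
    weight vector over the remaining state features."""
    w = w_dropoff if in_taxi else w_pickup
    return float(np.dot(w, state_features))

# Toy usage: the same features get different values depending on the branch.
v0 = value([2.0, 3.0], False, w_pickup=[1.0, 0.0], w_dropoff=[0.0, 1.0])
v1 = value([2.0, 3.0], True,  w_pickup=[1.0, 0.0], w_dropoff=[0.0, 1.0])
```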

A task hierarchy decomposes an SMDP into smaller, simpler subtasks

“Our approach is divided into three steps. The first step constructs a graph for each subtask from the agent’s experience. We show how graph construction can leverage the abstractions provided with the task hierarchy. The second step constructs a reduced graph based upon the graph structure. The third step recursively constructs basis functions using basis functions from child subtasks as well as using the eigenvectors of the graph Laplacian from the reduced graph.”

Because these spectral methods operate on undirected graphs, they only keep track of which state transitions occurred and discard the action labels

There is a simple method of merging vertices, producing what they call a reduced graph.

The eigenvectors of the graph Laplacian of this reduced graph are then used as basis functions, combined recursively with the child subtasks' basis functions
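
A sketch of the last two steps under my own simplifying assumptions (vertex groups given explicitly, edge weights summed on merge, normalized Laplacian): merge vertices into a reduced graph, then take the smoothest Laplacian eigenvectors as basis functions.

```python
import numpy as np

def merge_vertices(W, groups):
    """Collapse each group of vertices into a single vertex of a reduced
    graph; edge weights between groups are summed, self-loops dropped."""
    M = np.zeros((len(groups), W.shape[0]))
    for g, members in enumerate(groups):
        for v in members:
            M[g, v] = 1.0
    Wr = M @ W @ M.T
    np.fill_diagonal(Wr, 0.0)  # merging creates self-loops; remove them
    return Wr

def laplacian_eigvecs(W, k):
    """Return the k smoothest eigenvectors of the normalized graph
    Laplacian L = I - D^{-1/2} W D^{-1/2}, usable as basis functions."""
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(d)) - Dinv @ W @ Dinv
    vals, vecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    return vecs[:, :k]

# Toy usage: a 4-node path graph, merged into 2 super-vertices.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
Wr = merge_vertices(W, [[0, 1], [2, 3]])
phi = laplacian_eigvecs(Wr, 1)  # smoothest basis function on reduced graph
```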

Consider 3 types of abstraction:

Subtask irrelevance: eliminate state variables that are irrelevant to the subtask. This falls out of the reduced-graph construction: “If the state variables in Y_i have no bearing on the probability transition function, they will be irrelevant in terms of connectivity on the graph and only X_i will be used to represent the state variables.”

Shielding: if a state can only be reached along paths on which the option has already terminated <prematurely?>, it won't be represented

Result distribution irrelevance: if the distribution over states in which an option terminates depends only on a particular subset of features, the remaining features can be ignored when representing the option's result <?>

Empirical results are reported in the Taxi domain and a manufacturing domain