- Considers a large number of abstraction algorithms, and discusses exactness and some other characteristic of each method
- Bisimulation
- Homomorphisms
- Approximate Bisimulation
- Bisimulation Metrics
- MaxQ
- Stochastic Dynamic Programming
- G Algorithm
- Utile Distinction
- Policy Irrelevance
- Adaptive Aggregation

- Their definition of an abstraction is “a mapping from one problem representation to a new representation, while preserving some properties.”
- Here, in particular they are interested in preserving properties that allow representation of optimal policy

- Concerned only with state, and not action abstraction
- Φ is defined as the mapping from raw to abstracted state; a state corresponds to one abstract state, but an abstract state maps to a set of raw states, these mappings are mutually exclusive
- New reward and transition functions are needed for abstracted states that perform weighted average over T and R in the raw space
- They don’t really go into how much each state weighs

- Define fineness and coarseness; a mapping is as fine as another if a pair of raw states that map into a single abstract state in both mappings
- This relation satisfies reflexibility, antisymmetry, and transitivity
- The >= finer mapping is a partial ordering

- A model-irrelevance abstraction is one where 2 raw states map to the same abstract states, and have the same reward function and transition function in the original MDP
- A Qπ-irrelevance abstraction is one where for any action, if the two raw states map to the same abstract state, they have the same Q function. There is a similar rule for Q*
- Similarly, for a*-irrelevance, if a particular action is a* for any raw state mapped to the abstract state, it must also be for all other states that map to the same abstract state

- Different irrelevance rules maintain particular aspects that other algorithms listed in 1 maintain
- There is an ordering of fineness according to the different forms of model irrelevance defined above
- “As a consequence, any one of the five abstractions is a special case of other finer abstractions”
- This means that proofs about a particular level of abstraction will automatically apply to some other abstractions

- With the abstractions defined above, optimality is maintained in the abstract MDP
- On the other hand, Q-learning does not converge in all abstractions, or may converge to a suboptimal policy
- The proof for this is too sketchy to understand though
- “chattering” of q values arises if weights are defined as the frequency of visitation. As the policy varies the weights do as well which can cause issues

- Results here generalize Gordon’s results about issues with VFA
- Results hold not just for Q-learning, but all “algorithm[s] based on on-trajectory sampling, combined with a non-stationary policy.”
- There are similar results for model-based RL. Weighing must be fixed and the coarsest form of abstraction in general will not converge to optimal
- Different levels of coarseness allow for different operations to be performed/recovery of different information
- With the coarsest abstraction, optimal planning is no longer possible, but if the optimal policy was presented, it could be maintained

- “Therefore, there is a tradeoff between
*minimizing information loss*and*maximizing state space abstraction*when selecting abstractions.” - Ultimately for abstraction they say the most useful is the one where all states that map to an abstract state have the same optimal action; it still allows convergence of most algorithms but also allows for a fairly coarse representation
- The way to actually discover abstractions is still mostly open work, although some algorithms attempt it
- Discuss some interesting future work
- Definitions discussed here are based on exact representations, but ones that tolerate some error are probably also useful
- Use homomorphims? (I dont know what that is.)
- Applications to HRL
- Other operations on abstractions like intersection and composition
- Abstraction and factored representations
- State-space partitioning in continuous MDPs

Advertisements