- A* searches for a solution in the form of a simple path, but a solution can also take the form of a tree. AO* finds a solution in the form of an acyclic graph/tree (the paper specifically discusses AND/OR trees). This is a conditional structure
- LAO* can find solutions in the form of graphs *with* loops
- Claims it “shares the advantage heuristic search has over dynamic programming… Given a start state, it can find an optimal solution w/o evaluating the entire state space”
- Loops are relevant for finding policies in stochastic MDPs where an action can take you back to a previously visited state. They are concerned with stochastic shortest-path problems (which have terminal states)
- Under certain conditions A* evaluates the minimum number of states, same for AO*
- Discusses trial-based RTDP: since it only performs computation on encountered states, it can avoid doing computation over the entire MDP.
- Check out convergence theorem from LRTA* (learning real-time A*)
- AO* can be made more efficient by adding “solve-labeling.” A state is labeled solved if it is a goal state, or if all children are solved. This is similar to what FS^3 does.
- Proofs of admissibility, correctness
- They call doing backwards Bellman backups from the goal to the start “pruning by dominance.” They call using bounds to prune the search tree/graph in the forward direction “pruning by bounds” (branch and bound)
- AO* uses branch and bound in its forward expansion step and dynamic programming for cost revision
- The trick in allowing AO* to work in cyclic domains (making it LAO*) is to note that since a value function is being computed, other methods such as VI can be used in place of pure backwards induction
- It must be the case that when they query for an (s,a) they get all of T(s,a) and not just a single s’, because they don't really discuss resampling. Not sure.
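The note above about swapping backwards induction for VI when the solution graph has loops can be illustrated with a small sketch. This is not the paper's code: the two-state problem, the `transitions` layout, and the names `backup` and `value_iteration` are all hypothetical, and it assumes the full outcome distribution T(s,a) is available, as guessed in the last bullet.

```python
# Hypothetical toy stochastic shortest-path problem with a loop (not from the paper).
# transitions[state][action] = (cost, [(probability, next_state), ...])
transitions = {
    "s0": {"go": (1.0, [(1.0, "s1")])},
    "s1": {"try": (1.0, [(0.5, "goal"), (0.5, "s0")])},  # may loop back to s0
}
GOAL = "goal"


def backup(state, values):
    """One Bellman backup: minimum expected cost-to-go over the actions of `state`."""
    return min(
        cost + sum(p * values[nxt] for p, nxt in outcomes)
        for cost, outcomes in transitions[state].values()
    )


def value_iteration(states, values, epsilon=1e-9, max_sweeps=10_000):
    """Revise the costs of `states` in place until the largest change is below epsilon."""
    for _ in range(max_sweeps):
        residual = 0.0
        for s in states:
            new_value = backup(s, values)
            residual = max(residual, abs(new_value - values[s]))
            values[s] = new_value
        if residual < epsilon:
            break
    return values


# Start from a zero (admissible) heuristic; the goal state costs nothing.
values = value_iteration(["s0", "s1"], {"s0": 0.0, "s1": 0.0, GOAL: 0.0})
print(values)
```

Because s1 can return to s0, there is no ordering of the states that lets a single backwards pass finish the job; repeated sweeps settle on costs of about 4 for s0 and 3 for s1, which solve V(s0) = 1 + V(s1) and V(s1) = 1 + 0.5 V(s0).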

- Pretty simple proofs of admissibility and correctness
- RTDP was proved to converge to optimal, but there is no convergence test or error bound for the algorithm. They state that the LAO* proofs for these can be adapted to RTDP
- Discuss consistency, h(i) <= ci(a)+h(j) for next state j, as desirable because it ensures state costs increase monotonically as the algorithm converges. If a nonconsistent (but admissible) heuristic is used, a *pathmax* trick can make it behave in a consistent manner (works in A*, AO*, LAO*).
- Discuss weighting the heuristic, which weights the cost observed versus the heuristic cost (in the original formulation, both weights are 1). It's possible to bound by how much the found solution differs from optimal.
- In A*, if two heuristic functions are used s.t. h1(i)<=h2(i)<=f*(i), the set of nodes examined when following h2 is a subset of those examined when following h1. This isn’t necessarily true in AO*/LAO*, but it holds in terms of worst-case sets of states
- LAO* can be used in MDPs without terminal states by adding discount factor
- Analyze LAO* performance on racetrack and a supply-chain management problem.
- In racetrack, performance of LAO* in the zero-heuristic case is basically the same as VI; PI takes too long to finish. Giving a good heuristic allows LAO* to finish in 1/3 the time, and using weighting helps even more
- They use tricks to improve performance of LAO* because otherwise it is quite poor. One is analyzing all vertices from an edge at the same time instead of one at a time. Most of the cost is from running VI
- Show convergence as faster than RTDP.
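The consistency condition and the pathmax idea noted above can be sketched as follows. This is an illustration under assumptions, not the paper's formulation: it uses a deterministic chain, and the helper names `is_consistent` and `pathmax_backup` are hypothetical. The pathmax step is written in one common form, keeping the larger of the stored cost and the newly backed-up cost so the estimate never decreases.

```python
def is_consistent(h, edges):
    """Check h(i) <= c + h(j) for every deterministic edge (i, j, c)."""
    return all(h[i] <= c + h[j] for i, j, c in edges)


def pathmax_backup(current, backed_up):
    """Keep the larger of the stored cost and the freshly backed-up cost."""
    return max(current, backed_up)


# Hypothetical deterministic chain a -> b -> goal with unit costs.
# True costs-to-go are 2 for a and 1 for b, so h below is admissible.
edges = [("a", "b", 1.0), ("b", "goal", 1.0)]
h = {"a": 2.0, "b": 0.0, "goal": 0.0}  # but h(a) > 1 + h(b), so not consistent

print(is_consistent(h, edges))

# A plain backup of a would lower its cost from 2.0 to 1.0 + h(b) = 1.0;
# the pathmax version keeps 2.0.
plain = 1.0 + h["b"]
print(plain, pathmax_backup(h["a"], plain))
```

With a consistent heuristic the `max` never changes the result, since every backup already returns a value at least as large as the stored one.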

- It would be interesting to see performance of LAO* vs prioritized sweeping
