Category Archives: A*

In Transit to Constant Time Shortest-Path Queries in Road Networks. Bast, Funke, Matijevic, Sanders, Schultes. Algorithm Engineering and Experiments (ALENEX)

  1. Beats previous best results by two orders of magnitude <was also published in Science, but the article is so short I’m not taking notes on it>
  2. In worst-case Dijkstras cannot be below linear because it may have to examine all nodes and vertices
    1. The only way to beat this is by preprocessing
  3. Constant query time can be achieved by superlinear storage <Upon looking up that reference, it really looks linear.  The response may be slightly suboptimal (but this amount is constant in the size of the graph)>
  4. The improvements in quality and speed here exploit the fact that roads have particular properties, like low branching factors, fact that highways are designed to be used for long distance travel…
  5. Idea of transit node is “that for every pair of nodes that are ‘not too close’ to each other, the shortest path between them passes through at least one of these transit nodes.”
    1. Additionally, even when travelling very far, the number of transit nodes that must be used is very small (like 10)
  6. In the US road data they consider, 24 million nodes, and 10,000 transit nodes
  7. Example (paper has figure): “Finding the optimal travel time between two points (flags) somewhere between Saarbr¨ucken and Karlsruhe amounts to retrieving the 2 × 4 access nodes (diamonds), performing 16 table lookups between all pairs of access nodes, and checking that the two disks defining the locality filter do not overlap. Transit nodes that are not relevant for the depicted query are drawn as small squares.”
  8. Mentions that there are special classes of algorithms for working on planar graphs, which roads (basically) are
  9. They have a previous paper that hierarchically builds paths based on increasingly long paths <should check out>
    1. This paper is by the same authors, and ends up doing something very similar to transit nodes (they are just nodes high in the hierarchy).  In the earlier paper access nodes were computed on the fly whereas here they are precomputed
    2. It also doesn’t deal with how to figure out the distance in the distance table is indeed the shortest distance
  10. A* is actually one of the best methods for real world planning (when done around landmarks and the the triangle inequality).  Here, access nodes can be used as the landmarks
  11. The hierarchy approach also needs to be grounded in actual geographic information to work <at small scales?>
  12. Nodes are labeled either close/not close to individual access nodes
  13. Access nodes are found by backwards search by Dijkstra’s, and do this until all paths converge on another transit node, then take all nodes found through this search that do not reach another transit node
  14. There are a number of ways to implement a locality filter; the one they recommend is to examine everything that is reachable within a cluster in the hierarchy
  15. This information allows for what is effectively constant time queries for shortest paths
  16. There is another way to implement the algorithm that is grid based – uses 2 grids of different resolution
    1. Needs some optimization to function efficiently
    2. Computing access nodes in grid-based implementation is very simple
  17. With the optimizations, long-distance queries via transit nodes is actually faster (1 million x) than local queries.  Although on average local queries that dont go through access nodes are about 1% of queries <not sure how this is measured>  it still dominates the running time
    1. There is a scheme also that allows speedups for local queries as well
  18. In the grid setting, there are tradeoffs between space, number of transit nodes <both increasing with resolution> and #local queries that don’t go through transit nodes <decreasing with resolution>
  19. Ah, to get the best of both worlds they use a hierarchy of grids
  20. The exact algorithm used to compute access nodes and transit nodes is confusing, and their diagram doesn’t help either.  Its written in English but would be much more understandable written in math (or at least pseudocode)
  21. Queries are super-easy to run (if nodes are more than 4 cells apart, path must go through a transit node, which already have pairwise distances computed)
  22. Can do hierarchical <seems to be just multiple griddings of different resolutions but I think there is another paper that may go into it more thoroughly>
  23. Do a greedy set cover to optimize storage for access nodes
  24. Implemented on data with 24m nodes, 60m edges
  25. Run on a ’07 dual-proc machine, 99% of queries have an avg time of 12 microseconds (adding the extra 1% moves it to 63 msec – are those the ones that don’t go through transit nodes?)
  26. Computation times are not “Dijkstra rank dependent”
  27. <Ok that’s not the end of the paper – just for the grid part, now moving on to a more fully hierarchical method>
  28. Highway hierarchies
  29. An edge is a highway edge is an edge that is in the shortest path between two vertices, but not in the local neighborhood of those vertices
  30. They also may contract edges based on a bypassbility criterion, but that isn’t yet discussed at length
  31. The hierarchy is then constructed by recursively building highway networks on top of each preceeding layer
  32. With increasing path distance, paths will go increasingly higher in the highway hierarchy
  33. So the nodes existing at some level in the hierarchy can be used as the transit nodes
  34. #transit nodes controlled by neighborhood size and level of hierarchy
  35. “Note that there is a difference between the level of the highway hierarchy and the layer of transit node search” <on this read through the distinction is actually not clear to me – looks like they are basically equivalent but in reverse order?>
  36. There is a gloss over how description of how access nodes are found – another paper is referenced <which I will try and get to> and point #20 is referenced
  37. Present 2 methods of the algorithm, which have different tradeoffs in terms of space, preprocessing, and query time
  38. Search is done top-down
  39. “For a given node pair (s, t), in order to get a complete description of the shortest s-t-path, we first perform a transit node query and determine the layer i that is used to obtain the shortest path distance. Then, we have to determine the path from s to the forward access node u to layer i, the path from the backward access node v to t, and the path from u to v”
  40. There is still room for improvement for the locality filters
  41. For US data, 99% of cases are handled at the top layer
  42. Search times start low and increase as searches are still local but further away, and then drop extremely low once nodes are far enough that transit nodes can be leveraged
  43. <On to conclusion>
  44. “Building on highway hierarchies, this can be achieved using a moderate amount of additional storage and precomputation but with an extremely low query time. The geometric grid approach on the other hand allows for very low space consumption at the cost of  slightly higher preprocessing and query times.”
  45. “There are many interesting ways to choose transit nodes. For example nodes with high node reach [9, 6] could be a good starting point. Here, we can directly influence |T |, and the resulting reach bound might help defining a simple locality filter. However, it seems thatgeometric reach or travel time reach do not reflect the inhomogeneous density of real world road networks. Hence, it would be interesting if we could efficiently approximate reach based on the Dijkstra rank.
    Another interesting approach might be to start with some locality filter that guarantees uniformly small local searches and to view it as an optimisation problem to choose a small set of transit nodes that cover all the local search spaces.”

Motion Planning and Optimal Control of Robotic Systems. Marin Kobilarov. Talk

Read through of the actual paper is here:

  1. Seems to be strongly motivated by real problems, lots of examples on real robots, or realistic simulations
  2. Continuous vs discrete mechanics (one involves integration, the other summation, Euler Lagrange vs discrete Lagrange)
    1. Discrete mechanics uses discrete trajectories, based on snapshots across time, as opposed to continuous time in trajectories
    2. Gives an example of the continuous and discrete dynamics in modelling pendulum motion
      1. In this case, demonstrates that the Runge-Kutta approximation dissipates energy over time, so the pendulum stops moving, the discrete doesn’t
  3. He is interested in discrete geometric mechanics
    1. Uses vector fields, flow maps, and covariant differentials(?)
    2. Transformations make the space linear
  4. Gives an example of what he calls the snakeboard (skateboard where axles rotate) and shows RK integration is very poor compared to the discrete mechanical equations
  5. Computational costs of these methods are very low
  6. Discrete Geometric Optimal Control.   
    1. Equations for this are very simple, and very similar to the equations for the original dynamics.
    2. The difference is the Lagrangian
  7. When working with nonlinear systems, must use iterative methods, like Newton-type root finding
    1. Usually takes only a few (ten or so) iterations, can be done real-time
    2. Regularity of the Lagrangian for control means there is a optimal control solution
  8. Normal methods Don’t deal well with collisions or other nonsmooth dynamics
    1. This is still an area of active research
  9. The state-of-the-art method of planning in nonsmooth domains (such as with obstacles) are RRTs or PRMs (probabilistic roadmaps)
    1. A drawback is these don’t really converge to optimal, they are good at only finding some path to goal
  10. Proposes random-sampling in trajectory space (adaptive sampling and stochastic optimization, Cross-Entropy for rare-event estimation)
  11. Trajectories can be parameterized in two ways: by piecewise constant control, or by specific states visited.
    1. For control sequences, trajectory estimated by discrete mechanics, for state parametrization, need optimal control
  12. Reformulates the problem as optimization, instead of points try to find small regions around those points
  13. Try to estimate volume of low-cost trajectories via MCMC
  14. When estimating densities can work with combinations of Gaussians, for example.
  15. SCE-RRT* algorithm
  16. Combining sensing and control: how do you plan and try to gain as much information as possible about the environment within a restricted amount of time?
    1. Implemented this on a model helicopter for slam-style learning/planning.  Planned to end up on full-sized helicopters later.
  17. Conclusions:
    1. Discrete mechanics leads to provably good numerical algs
    2. Traj optimization should exploit state space geometry
    3. <missed the rest>
  18. Future Work:
    1. Numerical geom dynamics and control (robust real-time numerics)
    2. planning and control, uncertainty and robustness
    3. info-theoretic bounds on performance/efficiency/complexity
    4. He cares about applications to autonomous robots (surface, sea, air, space)

Anytime Motion Planning using the RRT*. Karaman, Walter, Perez, Frazzoli, Teller. ICRA 2011

  1. Paper presents a variant of RRT that interleaves execution of plans and replanning, as opposed to RRT which monolithically plans and then executes
  2. Execution starts with a short planning period, of which at least one path should reach the goal.
  3. The algorithm then chooses some initial segment of this plan to execute, and executes this segment in its entirety.
  4. While executing this segment, it begins replanning with the end of the segment as the next start point
  5. Planning leverages a branch and bound method; esssentially A* down to the fact that it requires admissible heuristics
  6. A major criticism of the paper is the planning domains they use are very simple and could be solved optimally with dead reckoning and very basic obstacle avoidance
  7. They say that regular RRT often will end up finding a suboptimal path and then only performing local optimizations of that bad path instead of finding a whole different, better path
    1. I think the reason RRT* ends up finding less suboptimal paths is because while RRT is doing mostly hillclimbing with few restarts, RRT* has a complete restart at the end of each path segment, so it gets itself out of local optima. They do not present this interpretation but I’m pretty sure it is what is going on
    2. You can see (and they point this out as well) when it starts to do something suboptimal but then catches itself and performs the correction
  8. The algorithm has asymptotic convergence to optimal behavior

Iterative Bounding LAO*. Hakan Warnquist, Jonas Kvarnstrom, Patrick Doherty. ECAI 2010

  1. Seems similar to FS3 because it uses an upper and lower heuristic to prune extensively
  2. An epsilon-optimal planning algorithm for probabilistic domains where objective is to reach a goal state with minimum cost
  3.  Based on LAO*, comparison to RTDP
  4. Is an on-line method, which maintains provably good bounds over time
  5. They mention other algs that use upper and lower bounds, mainly RTDP and variants.  I forgot RTDP used bounds in that manner.
  6. One issue with RTDP is that it will continue to attempt value function updates for states that have converged values, which is wasteful. Its later variants use upper and lower bounds to prevent this from happening.
  7. the AO in LAO* comes from And/Or graphs
  8. LAO* fixes AO* so that it functions in acyclic graphs
    1. Does so by transforming graph into a tree, it seems
  9. LAO* functions in cyclic domains by performing value iteration instead of regular backups from leaf to root
  10. IB-LAO* doesn’t perform full VI, but instead does one backup of all ancestors of states that are newly expanded, so its like one iteration of VI
  11. LAO* is not rollout-based (like RTDP) and therefore keeps track of a fringe that must be expanded.  Since the fringe can get very large, it is best to expand only a subset of the fringe, prioritized based on which nodes can maximally influence the root.  The computation is done with other fringe operations and doesn’t increase the computational cost.
  12. Discussion includes addition of weighted heuristics, this makes heuristics inadmissible, but helps to find a solution in most real-world cases
    1. Looks like it is possible to adjust the weights while maintaining epsilon-optimality?  Thats interesting.
  13. Use two comparison domains because one has low, one high branching factor

Best-First Fixed-Depth Minimax Algorithms. Aske Plaat, Jonathan Schaeffer, Wim Pijls, Arie de Bruin. Journal of AI 1996.

  1. “Interesting alternatives to depth-first searching, such as breadth-first and best-first strategies, have been largely ignored in practice”
  2. SSS* is best-fist, while A-B is depth-first
  3. Here they dont use the name AB-SSS*, they call it MT-SSS*?
  4. “MTD can be used to construct a variety of fixed-depth best-first search algorithms using depth-first search.
  5. They say that the assumptions for SSS*’s dominance of A-B aren’t met in practice (due to move reordering and use of iterative deepening).  Empirical results have A-B actually beating MT-SSS* in checkers at some plies for this reason
  6. Talk about significant differences between real-world, and synthetic game trees
  7. MTD(f) is the newest as of the time of writing.  It beats a state of the art implementation of Negascout in terms of total nodes, leaves, and clock time.   I think this is AB-SSS* or very very similar.
  8. They say the connection between A-B and SSS* starts with an upper bound on the minimax value, and the max-solution tree, which is the minimal search tree that proves an upper bound
  9. Most game playing programs use iterative deepening “It is based on the assumption that a shallow search is a good approximation of a deeper search.”  “Among the benefits of iterative deepening (ID) in game-playing programs are better move ordering…, and advantages for tournament time control information.”
  10. “Transposition tables are often used in conjunction with iterative deepening to achieve a partial move ordering.  The search value and the branch leading to the highest score (and best move) are saved for each node.  When iterative deepening searches on level deeper and revisits nodes, the move suggested by the transposition table (if available) is searched first”
  11. Transposition tables are also used to recycle computation when it exists in a graph structure
  12. In emprical results they use very strong programs for comparisons, and modify them as little as possible
  13. The states from games used are states that acutally came up from tournament play; they mention sometimes states used are contrived
  14. They are concerned with node/leaf re-expansions, but in pure trees this isn’t an issue
  15. They say that in practice, memory actually isn’t an issue – and this was written a long time ago
  16. They keep referring to MT-SSS*, but I don’t see where its described explicitly…
  17. The MT (Memory-enhanced Test) framework allows depth-first procedures to implement best-first algorithms

Memory-enhanced Test: a Framework

  1. MT(n,gamma) = A-B(n, gamma-1, gamma) when using transposition tables
  2. One method is to start with gamma = inf and move down from there
  3. Talk about “drivers” for MT, which specify:
    1. n: the root of the tree
    2. first: the starting bound for MT
    3. next: after the completion of a search, how the bounds are updated. This is a function.
  4. MTD would be an algorithm using the MT framework.  They list a number of instantiations of algorithms in the framework.
    1. AB-SSS* = MT-SSS* = MTD(n, inf, bound = g)
    2. Another searches from -inf, then one tries to do a binary search on the true minimax value, they call this MTD(bi)
      1. Mention an equivalent algorithm C* that is also guaranteed to dominate A-B
    3. Can also specify a specific step size to move the bounds by, called MTD(step)
    4. Say MTD(bi) and step seem inferior
  5. Transposition tables prevent waste of reexpansion
  6. MTD(f) is consistently the best algorithm.  They say in practice in these agents, it actually uses less memory than A-B
    1. Says SSS*/MTD(inf) is optimistic (paper seems to be very inconsistent with names, which is even worse across papers
    2. DUAL*/MTD(-inf) is pessimistic
    3. and MTD(f) is realistic
  7. Difference between the algorithms is just a few percent, though (mainly because of the excellent move ordering and other tricks the game programs do aside from the searching itself)
  8. Number of nodes expanded in experiments depends on whether the number of plies is odd or even, which we didn’t see in our synthetic results
  9. If you have a guess as to what the minimax value actually will be, initializing the null window to around that area seems to help (for MTD(f))
  10. “…if MTD(f) is to be effective, the f obtained from the previous iteration must be a good indicator of the next iteration’s value”
  11. Keep in mind the empirical results are relative to very effective game programs that do excellent move ordering and other tricks
  12. Argue against reporting results on synthetic trees:
    1. Real trees have variable branching factor
    2. Value interdependence from parents to children – usually parent values are good predictors for children (a shallow search is a good approximation of a deeper one).  This is probably why UCT can perform so well at Go.
    3. Existence of graph structure, allows for use of transposition table

Appendix on MT-SSS*

  1. Every top-level call to MT fails low and builds a max-solution tree.  The last call to MT (stops the search) fails high and builds a min-solution tree
  2. In the first pass, the left-most solution tree with finite g-value (value of the tree rooted at specified node n) is constructed
  3. During following passes, the following conditions hold:
    1. Before each pass, the tree exposed is the left-most max solution tree with upper bound = gamma.  The children to the left of the min node (child of the root) has already been searched and the lower bound on that subtree is > gamma
    2. During each pass, every node visited belongs to the max solution tree with upper bound = gamma.  For min nodes, children to the left of the child of n (there is only one in the max-solution tree) has a lower bound > gamma and will never be revisited
    3. Whenever nested calls to MT(n, gamma) fail low, a max-solution tree is created where children of min-nodes have property 1

Dyna-H: a heuristic planning RL algorithm applied to RPG strategy decision systems. Santos, Martin H., Lopez, Botella

  1. Paper makes a strange point in abstract, so perhaps a red-flag.  On the other hand I think I saw it because it was referenced by Remi on the RL newsgroup.  “We suggest also, a functional analogy between the proposed sampling from worst trajectories heuristic and the role of drems (e.g. nightmares)…” Whaaaa?
    1. Not sure its refereed as-is, paper is from arxiv, says preprint submitted to knowledge-based systems
  2. Basic idea is to do cheap planning for cases when A* is too expensive, or when there is uncertainty
  3. Dyna-H is basically dyna with a heuristic added (uses epsilon-greedy exploration as described)
  4. Does an approximate tree search and refines results iteratively, can work off of incomplete (i suppose also incorrect?) models of the environment, unlike A* which requires a correct model
  5. Algorithm tries to generate runs through the MDP from the worst (according to the heuristic(shaping)) trajectories
  6. Beats Dyna-Q handily, which is to be expected.
  7. The class of trajectories considered by the heuristic (cartesian) is a different class than from what is actually possible (manhattan), in this case its ok.

New Advances in Alpha-Beta Searching. Schaeffer, Plaat

  1. Ah.   Move ordering means the order in which nodes are expanded in AB search.  Optimally, the best node is selected to be expanded and no other nodes have to be expanded, so getting a good approximation of this is extremely helpful.
  2. Also, this explains windows, which are the difference between the lower and upper bounds used (alpha, beta, respectively).  Narrowing the window (bounds) increases the likelihood of a cutoff.  The narrowest window occurs when alpha = beta-1 when int valued
  3. Early depth cutoffs of actions that quickly seem bad helps
  4. Aspiration search means calling alpha-beta with bounds centered close to the expected (optimal?) value, it can be called with the loosest bounds of +/- infinity.  Using aspiration search is ok as long as the value is in the specified values or if a search value is better than expected, but is bad if it fails low (this is opposite of my expectations, need to think about it more)
  5. If the search fails low, the search window can be adapted and then search can happen again
  6. A series of minimax searches with minimal windows allows convergence on the optimal value.  The search starts with an upper bound of inf and lowers it iteratively, which results in the same performance as SSS*.  Another algo searches from -inf.  MTD(f) searches in both directions simultaneously
  7. The memoization thats used is the depth to (from?) which the node was searched, the best move and its score.  When a node is visited, the best move recorded is searched first before others
  8. They also talk about something called enhanced transportation cutoff which is another optimization
  9. “This unsatisfactory state of affairs [problems with SSS*] has left many researchers in the field with a nagging feeling. Although Alpha-Beta-based programs achieve good results, it could be that depth-first strategies are missing out on some fundamental notion, and that best0first is a better way”

A Minimax Algorithm Better than Alpha-Beta? No and Yes. Plaat, Schaeffer, Pijls, de Bruin

(Note: This is a tech report from Alberta, so unrefereed, I wont be going through it thoroughly.  Perhaps a better thing to read is New Advances in Alpha-Beta Searching)

(Schaeffer is the guy that did Chinook, and solved checkers)

  1. Need to find out what a null-window is
  2. 2 main contributions are:
    1. New formulation of SSS* based on alpha beta (AB) is presented, which is basically AB+transposition tables/memoization.
    2. Present a framework that helps classify existing several best-first fixed-depth search algorithms, and uncovers some new methods as well
  3. SSS* provably expands less nodes than AB, and empirically does so as well
  4. But is complex/difficult to implement and is also generally slower than AB due to the very large priority queue that it builds/maintains (including purging), which has size exponential in depth of the search tree explored.
  5. SSS* functions quite differently from AB, but works only in the fixed-depth setting.  It does a best-first search (like A*) visiting the most promising nodes first, whereas AB uses left to right traversal of the tree.
  6. Claimed in practice, the conditions for the proof of SSS* don’t hold, seems like iterative deepening is sometimes attempted but with this addition it is no longer guaranteed to expand less nodes than AB
  7. AB-SSS* seems to be much simpler than SSS* but is equivalent.  It basically involves using memoization and lying to AB about the min value that can be achieved; an extremely simple loop on top of AB.  This outer loop is a way of transforming depth-first to best-first searches. Mentions other algorithms have been presented that are SSS* but easier to understand.
  8. Transposition tables a generally used to:
    1. improve the quality of the move ordering
    2. detect when different paths through the search space transpose into the same state, to prevent the re-expansion of that node.
  9. In the case of an algorithm in which each ID iteration performs multiple passes over the search tree, like AB-SSS* and AB-DUAL*, there is an additional use for the TT:prevent the re-search of a node that has been searched in a previous pass, in the current ID iteration.
  10. Some operations on the priority queue of SSS* are superpolynomial (purge of exponential sized queue, sometimes up to 90% of running time)
  11. Good “move ordering” achieved by using iterative deepening and memoization increase the pruning power of AB and SSS*.  I think the move ordering refers to the estimated values of moves from the previous iteration of iterative deepening
  12. Using transposition tables is helpful because it turns the tree search into a graph search, gets rid of need for priority queue and uses hashing instead.  Also, using transition tables makes the algorithm able to use set amounts of memory (using more just makes it more efficient), whereas SSS* can’t run on limited memory
  13. Seems like size of transposition tables in common games (othello, chess, checkers) isn’t a huge issue (at least based on the processing that was done when the paper was written in the mid 90s).
  14. Says AB-SSS* is only a few percent faster than just running AB
  15. Show MTD framework that can represent a number of search algorithms
  16. AB-SSS* refines a max solution tree, AB-DUAL* refines a min solution tree
  17. If iterative deepening is used with SSS*, it doesn’t necessarily dominate AB with iterative deepening.  In this case the overall performance of AB is better
  18. MTD(f) does 5-10% better than the most commonly used negascout.
  19. Whereas AB-SSS*, AB-Dual search from opposite extremes of the tree, MTD(f) searches around the middle, is more efficient, early guesses are closer to true minimax value
  20. They discuss some differences between real-life game trees vs those that are commonly simulated:
    1. Variable branching factor in real life; some algoritms leverage this in a least-work-first manner
    2. In real applications, evaluation of a shallow search is often a very good estimate of the value of a deeper search
    3. Global move ordering: moves that are good in one position tend to be good in another as well
    4. Trees remerging to graphs come up often in real life, so memoizing is useful
  21. AB-SSS* uses null window searches only

Anytime Heuristic Search. Hansen, Zhou

  1. Uses weighted heuristic search to quickly find some solution and then continues to search in the same manner to find improved solutions.  Considers generally how to turn recursive search into anytime search
  2. Previous work studied how to bound the error weighted search can introduce in terms of found policy vs optimal.  Generally it is assumed search is terminated after weighted search finds a result, but this paper proposes to keep searching after that point.
  3. The paper claims that as long as the heuristic is admissible, the first goal node selected for expansion is guaranteed to be optimal, but someone told me A* can lead to suboptimal results in the mult-goal setting…
    1. Whenever a node is expanded, it optimal value has already been computed
  4. Weighting the heuristic function makes the value estimate non-admissible, so the result found may not be optimal.  On the other hand, the result is generally found more quickly than in the case where an admissible function is computed.
  5. For f'(n)=g(n)+w*h(n), the cost of the solution can’t be more than a factor of w worse than optimal.  This solution is said to be epsilon-admissible for epsilon
  6. “Weighted heuristic search is most effective for search problems with close-to-optimal solutions, and can often find a close-to-optimal solution in a small fraction of the time it takes to find an optimal solution”  Basic idea of kappa in Bubeck’s stuff
  7. Other approaches of biasing search are adjusting weights according to depth in search, using weighted heuristics to identify a subset of open nodes that can be opened w/o loss of epsilon-optimality
  8. Can check convergence to optimal by computing costs with unweighted admissible heuristic (or just actual costs)
  9. Basic idea can be used with non-admissible heuristics as well, but I guess in that case the epsilon-optimality is gone
  10. They also discuss using heuristic to avoid pushing nodes that must be suboptimal onto the queue
  11. They also mention it may be good to apply pruning to items in queue that can’t be optimal can be helpful
  12. A node can be reinserted into the queue if its cost is later revised to be lower than originally computed, seems like the dynamic programming in LAO* is a better method than reprocessing the graph like this
  13. Proofs for convergence to optimality and bound of suboptimality of anytime weighted A*
  14. The upper bounds on the cost of the best path found are strictly monotonically decreasing
  15. The empirical performance of the algorithm is evaluated on 8 benchmark problems from biennial planning competitions.
    1. In some cases blew through 2 gigs of ram and couldn’t complete search
  16. Also look into sequence alignment, discusses method that doesn’t expand all vertices from each node
    1. A domain like this might serve FS^3 better than A* variants
  17. A big issue seems to be maintaining a small open list in terms of memory requirements
  18. Vanilla weighted A* is often faster than A* because in cases where there are many near-optimal paths, it doesn’t spend much time trying to differentiate between them.
  19. Anytime weighted A* sometimes reaches the optimal solution using less memory and time than A*.  A* is optimally efficient in terms of the #of nodes expanded, but not optimal in terms of these other metrics
  20. Instead of weights, could use any other non-admissible heuristics
  21. Show how to convert Recursive Best-First Search (has linear memory in the depth of the search) into an anytime method.
  22. RBFS is a heuristic algorithm that expands frontier nodes in a best-first manner, but does so with a DFS-based search
    1. Build a stack containing all nodes visited along the current path and all the siblings of those nodes, has memory requirements of O(depth*branching factor)
    2. Instead of going fully depth-first, RBFS keeps track of f-cost of best alternative path available from any ancestor of current node.  If f-cost of current path exceeds the next best cost over the nodes currently recorded by a certain threshold, backtrack and start search from cheaper nodes
    3. <Skipping rest of that as it doesn’t seem particularly relevant>
  23. Previous algorithm called Anytime Repairing A*(Likachev, Gordon, Thrun) is pretty similar to AWA* except it decreases the weight after each search and has a technique of limiting the node reexpansions.  It was used on real time robot planning.
  24. Reducing the weight after each step brings results closer to optimal each time, at the cost of potential longer search times
  25. Show an example where ARA* seems to have nodes stored exponential in that of AWA* (as weight increases)
    1. This is due to the changing weights.  Since using weights makes the algorithm inconsistent(a node can be expanded before its cheapest cost is found), a node may have to be put on the queue again so the new value can propagate through.  Changing the weights incrementally can make this happen over and over again.
    2. Also argue that the method used to prevent nodes from being readded to the queue causes more distinct nodes to be expanded and can prevent good solutions from being found.
  26. Discuss Real-Time A* which assumes search is interleaved with execution, also something called DTA*
  27. Depth-first branch and bound (DFBnB) are good for problems that have many solutions a the same depth (traveling salesperson).  Finds a suboptimal solution and then prunes anything that has to be worst than the best found so far.
    1. Ideas similar to DFBnB was brought into A* with Iterative-Deepening A* (IDA*), but this is bad in cases were all nodes have distinct edge costs (like if they are reals), empirically good a tree-searching
  28. Bidirectional A* starts one search from the beginning and one from the goal
  29. Some methods that do local-search by finding a suboptimal path and then exploring segments around the path.
  30. Also k-group A* that focuses on groups of sequences as opposed to one large sequence

LAO*: A heuristic search algorithm that finds solutions with loops. Hansen, Zilberstein

  1. A* tries to simply find a path, another solution could be in the form of a tree.  AO* finds a solution in the form of an acyclic graph/tree (specifically discuss and/or trees).  This is a conditional structure
  2. LAO* can find solutions in the form of graphs with with loops
  3. Claims it “shares the advantage heuristic search has over dynamic programming… Given a start state, it can find an optimal solution w/o evaluating the entire state space”
  4. Loops are relevant for finding policies in stochastic MDPs where an action can take you back to a state visited previously.  They are concerned with stochastic shortest-path problems (has terminal states)
  5. Under certain conditions A* evaluates the minimum number of states, same for AO*
  6. Discusses Trial-based RTDP, since it only performs computation on encountered states, it can avoid doing computation of entire MDP.
  7. Check out convergence theorem from LRTA* (learning real-time A*)
  8. AO* can be made more efficient by adding “solve-labeling.”  A state is labeled solved if it is a goal state, or if all children are solved.  This is similar to what FS^3 does.
  9. Proofs of admissibility, correctness
  10. They call doing backwards bellman backups from the goal to the start pruning by dominance.  They also call using bounds to prune the search tree/graph forward “pruning by bounds” (branch and bound)
  11. AO* uses branch and bound in its forward expansion step and dynamic programing for cost-revision
  12. The trick in allowing AO* to work in cyclic domains (making it LAO*) is to note that since a value function is being computed, other methods such as VI can be used in place of pure backwards induction
    1. It must be the case that when they query for an s,a they get all of T(s,a) and not just s’ because they dont really discuss resampling – not sure.
  13. Pretty simple proofs of admissibility and correctness
  14. RTDP was proved to converge to optimal, but there is no convergence test  or error bound for the algorithm, they state the proofs for LAO* for these can be adapted to RTDP
  15. Discuss consistency h(i) <= ci(a)+h(j) for next state j, as desirable because it ensures state costs increas monotonically as the algorithm converges.  If a nonconsistent (but admissible) heuristic is used, a pathmax trick can make it behave in a consistent manner (works in A*, AO*, LAO*).
  16. Discuss weighting heuristic, which weights the cost observed versus the heuristic cost (in the original formulation, both costs are 1).  Its possible to bound by how much the found solution is different from optimal.
  17. In A* if two heuristic functions are used s.t. h1(i)<=h2(i)<=f*(i), the set of nodes examined when following h2 is a subset of those followed when exploring h1.  This isn’t true necessarily in AO*/LAO*, but holds in terms of worst-case sets of states
  18. LAO* can be used in MDPs without terminal states by adding discount factor
  19. Analyze LAO* performance on racetrack and a supply-chain management problem.
    1. In racetrack, performance of LAO* in zero-heuristic case is basically the same as VI, PI takes too long to finish.  Giving a good heuristic allows LAO* to finish in 1/3 the time, using weighting helps even more
    2. They use tricks to improve performance of LAO* because otherwise it is quite poor, one thing is analyzing all vertices from an edge at the same time instead of one at a time, most of the cost is from running VI
    3. Show convergence as faster than RTDP.
  20. It would be interesting to see performance of LAO* vs prioritized sweeping