Planning for Markov Decision Processes with Sparse Stochasticity. Likhachev, Gordon, Thrun. NIPS 2004

  1. Algorithms for deterministic planning problems (e.g., Dijkstra’s) run faster than those for stochastic problems (primarily because in deterministic problems, optimal plans are acyclic).   This work tries to split the difference for domains that are only sparsely stochastic
  2. Algorithm is called MCP (MDP compression planning)
  3. Combines A* with VI for solving stochastic-shortest-path problems
  4. Empirical results involve integration and mapping of two actual robots (one helicopter, and one ground robot) to do mapping and then planning
  5. In their definition of sparse stochasticity, most states have actions with deterministic outcomes, and only a few have stochastic outcomes (this is different from the more common definition of sparse stochasticity where all transitions may be stochastic, but that distribution is only across a small number of states)
    1. The trick with this assumption is that since only a few of the states are stochastic, they are the only ones that can lead to cycles in an optimal policy
  6. The idea then is to build an MDP that consists only of the stochastic states from the original domain.
    1. So all the deterministic states are compressed out
    2. The states included (the stochastic ones) are called distinguished states.  In addition, start and goal states are included in the MDP
  7. Based on this differentiation, a fast algorithm can be used for planning through deterministic states (such as A*), and a different one for dealing with the stochastic states (like VI)
  8. Figuring out how to do this decomposition isn’t straightforward (for example, the transition diagram in the MDP of distinguished states depends on reachability between deterministic states)
    1. So the decomposition is done incrementally
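The split in points 5–8 can be sketched roughly as follows – a toy version (my own naming, not the paper’s pseudocode): Dijkstra over the deterministic edges to price the compressed actions, then value iteration over the distinguished (stochastic) states only.

```python
import heapq

def dijkstra(graph, start, goals):
    """Cheapest deterministic cost from `start` to each state in `goals`.
    graph: {state: [(next_state, cost), ...]} with deterministic edges only."""
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, s = heapq.heappop(pq)
        if d > dist.get(s, float("inf")):
            continue
        for t, c in graph.get(s, []):
            nd = d + c
            if nd < dist.get(t, float("inf")):
                dist[t] = nd
                heapq.heappush(pq, (nd, t))
    return {g: dist.get(g, float("inf")) for g in goals}

def solve_compressed_mdp(distinguished, compressed_actions, goal, iters=200):
    """Value iteration over distinguished states only.
    compressed_actions: {s: [(det_cost, [(prob, s_next), ...]), ...]} where
    det_cost is the memoized deterministic path cost to the stochastic action,
    and the outcome distribution is over distinguished states."""
    V = {s: 0.0 for s in distinguished}
    for _ in range(iters):
        for s in distinguished:
            if s == goal:
                continue
            V[s] = min(c + sum(p * V[t] for p, t in outs)
                       for c, outs in compressed_actions[s])
    return V
```

This is only meant to show why the compressed MDP can be tiny: the deterministic bulk of the state space is folded into the `det_cost` terms.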
  9. In the problem considered “The ground robot needs to plan a path from its current location to a goal, but
    has only partial knowledge of the surrounding terrain. The helicopter can aid the ground
    robot by flying to and sensing places in the map.”
  10. Map is discretized and described in 2d
  11. In areas still uncovered, there is a belief as to the probability that state will be open (and therefore passable).  The ground robot can try and plan through those areas as if they were open, but runs the risk of having to replan if one of the cells turns out to be occupied.  The other option is it can request that the helicopter uncover that cell
  12. The overall planning goal is to minimize the cost of running both robots, with the episode terminating when the ground robot reaches its goal and the helicopter goes back to home
  13. This forms a POMDP, (which are in general intractable) <but this particular class of POMDPs is almost certainly tractable because it is possible to get to ground truth just by observing each corresponding cell>
    1. <Ah not exactly> Sensors are imperfect and may contradict each other.  But it is assumed that most of the path planning is deterministic and only rarely noise in sensing will arise
  14. Planning is done in belief space; they discretize this though and plan in discretized belief space
  15. The compressed MDP is built by running deterministic searches <I suppose with a deterministic policy?  Which part in particular is deterministic is a little unclear.  Seems like the policy is deterministic>
    1. Actions that produce different outcomes are put in the set of distinguished states
    2. The best deterministic paths are memoized and are used to produce better cost estimates (in a manner similar to Bellman backups)
    3. These deterministic searches can use an admissible heuristic, which means that much of the state space may be (correctly) unexplored, which leads to cheaper planning
  16. Because the MDP of distinguished states is small, planning there is cheap
  17. Planning done over the stochastic states is used as a heuristic for the deterministic planning part
  18. Each action in the MDP actually corresponds to a series of actions in the domain; only the final action in this sequence may produce a stochastic outcome
    1. “Among all compressed actions starting at s and ending at (s’, a) there is (at least) one with lowest expected cost; we will call such an action an optimal compression”
    2. If all compressions are optimal for all states in the MDP, then the MDP is said to be fully compressed
  19. Solving the fully compressed MDP will produce the optimal policy, but it’s possible to get away with less work, in particular because the planning problem considered is goal-directed, so many states simply may not have to be considered
  20. State selection on each iteration is based on something in the form of the Bellman residual
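A minimal sketch of the kind of residual-based priority I understand this to be (my own simplification, not the paper’s exact selection rule):

```python
def bellman_residual(V, s, actions):
    """Residual |V(s) - min_a E[cost + V(s')]| for a distinguished state s;
    large residual = state most worth (re)expanding next.
    actions: [(cost, [(prob, s_next), ...]), ...]"""
    backed_up = min(c + sum(p * V[t] for p, t in outs) for c, outs in actions)
    return abs(V[s] - backed_up)
```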
  21. <I wonder how this relates to prioritized sweeping>
  22. <Not taking many notes on the specifics of the algorithm, but it seems totally reasonable for the current amount of attention I am paying>
  23. The top level of the algorithm repeatedly tries to find trajectories from start to goal
    1. Values produced by the MDP used as a heuristic in this search should be admissible
  24. Need a modified version of A* that can factor in stochasticity in the domain; they propose such a modified A* planner
  25. Has theoretical guarantees, PACy
  26. Compare against RTDP, LAO*, and VI – VI was restricted to problems with perfect sensing because otherwise the size of the state space is prohibitive
  27. 5-10x the speed of competitors.  The slowest part is <surprisingly> the deterministic search component <I suppose the stochastic states are very sparse>.  Mention another search algorithm may help reduce the cost of that particular component
  28. As stochasticity becomes less sparse, its time advantage reduces, but is still better than competitors

Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning. Bouvrie, Maggioni. Arxiv 2012

  1. Deals with finding hierarchies – not just using them
  2. Covers two main contributions
    1. Compressing /homogenizing MDPs
    2. Identification of possibilities for transfer
  3. “Regardless of how the partitioning is accomplished, the statespace is divided into a multiscale collection of “clusters” connected via a small set of “bottleneck” states. A key function of the partitioning step is to tie computational complexity to problem complexity. It is problem complexity, controlled for instance by the inherent geometry of a problem, and its amenability to being partitioned, that should determine the computational complexity, rather than the choice of statespace representation or sampling. The decomposition we suggest attempts to rectify this difficulty. “
  4. Multiscale compression is local, and recursive
  5. Present a form of asynchronous policy iteration, which has O(n log n) cost per iteration
  6. In terms of transfer, they don’t only consider the common case of sharing low-level options, but also the other side of the coin that the high level dynamics between options may be the same, even if the dynamics in the low-level policies are different
  7. Consider stochastic policies, as well as discount factors that are distinct for state action pairs
  8. 3 Step process:
    1. Partition states into clusters connected by bottleneck states
    2. Compress MDP so each cluster becomes a state
      1. This may be done a number of times
    3. Solve hierarchy top-down
  9. Use diffusion map embeddings to cluster domain (but mention there are other possibilities)
  10.  Uses Chung’s method for finding the Laplacian of directed graphs, <but this can only describe a subset of all possible directed graphs>
    1. Does “low conductance cuts”
    2. Briefly mention a number of other possible metrics
  11. Describes two types of bottlenecks
    1. Geometric, which are based on graph structure alone
    2. Problem, which is based on the geometric bottlenecks as well as the goal structure of the MDP
    3. “If the policy is already strongly directed according to the goals defined by the rewards, then the bottlenecks can be interpreted as choke points for a random walker in the presence of a strong potential.”
  12. Have rewards, transition functions, states, actions for each MDP at different levels of coarseness
    1. “One of the important consequences of these definitions is that the optimal fine scale value function on the bottlenecks is a good solution to the coarse MDP, compressed with respect to the optimal fine scale policy, and vice versa. It is this type of consistency across scales that will allow us to take advantage of the construction above and design efficient multiscale algorithms. “
  13. Mention similarities of the approach to other classes of problems, where the basic approach is to make a small problem of a large one by locally averaging parts of the problem
    1. Coarsening: PDEs  and meshes <?>
    2. Homogenization: PDEs
    3. Lumping: Markov chains
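A toy example of lumping in the Markov-chain sense (my own illustration; the paper’s compression is policy-dependent and more involved): average the transition matrix over clusters to get a coarse chain over clusters.

```python
# Fine-scale transition matrix over 4 states, two clusters joined
# through a bottleneck between states 1 and 2.
P = [[0.8, 0.2, 0.0, 0.0],
     [0.3, 0.6, 0.1, 0.0],
     [0.0, 0.1, 0.6, 0.3],
     [0.0, 0.0, 0.2, 0.8]]
clusters = [[0, 1], [2, 3]]

# Coarse transition: probability that a uniformly chosen state of
# cluster i jumps into cluster j in one step.
coarse = [[sum(P[s][t] for s in ci for t in cj) / len(ci)
           for cj in clusters]
          for ci in clusters]
# `coarse` is again a stochastic matrix, now over the two clusters.
```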
  14. Assume that the policy has at least some degree of randomness in action selection
  15. Basically solving the MDP at all hierarchies results in an iterative process
  16. <skimming the rest>

Intrinsically Motivated Hierarchical Skill Learning in Structured Environments. Vigorito, Barto. IEEE Transactions on Autonomous Mental Development 2010

  1. Covers intrinsic motivation for learning in hierarchical environments
  2. “Using Bayesian network structure-learning techniques and structured dynamic programming algorithms, we show that reinforcement learning agents can learn incrementally and autonomously both the causal structure of their environment and a hierarchy of skills that exploit this structure.”
  3. Motivation is “ensemble” <what I think of as transfer> learning
  4. Mentions structured value iteration for VI in domains with compact representations
  5. Factored representations lead to sparseness
  6. Naturally, mention VISA by Jonsson, Barto
  7. But VISA needs a model of the domain to work from (in the form of a DBN).  The goal here is to do model building so that assumption can be removed
  8. Produces results that are recursively optimal but may not be hierarchically optimal – full solution may be suboptimal, even though policy for each option is optimal
  9. Optimize DBN according to Bayes Information Criterion (BIC)
    1. Doing this optimally is intractable, they use a greedy heuristic
    2. Basically involves building decision trees using BIC (use a chi-squared test)
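Roughly what a BIC-scored split decision looks like (my own sketch; the paper pairs this with a chi-squared test inside a greedy tree search):

```python
import math
from collections import Counter

def log_likelihood(labels):
    """Multinomial log-likelihood of a leaf's observed outcome counts."""
    n = len(labels)
    return sum(c * math.log(c / n) for c in Counter(labels).values())

def bic_gain(labels, left, right, n_total):
    """BIC improvement from splitting one leaf into two. One extra leaf adds
    (k-1) free multinomial parameters for k outcome values; the penalty is
    (params/2) * log(n_total). A greedy learner accepts the split when the
    gain is positive (a rough sketch of the structure search)."""
    k = len(set(labels))
    extra_params = k - 1
    gain = (log_likelihood(left) + log_likelihood(right)
            - log_likelihood(labels))
    return gain - 0.5 * extra_params * math.log(n_total)
```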
  10. Actions are purely explorative; are selected s.t. leaf (leaves) of <s,a> maximize change in entropy in leaf
  11. In original VISA, the algorithm <ironically> doesn’t explore thoroughly enough, because action selection is myopic: it improves knowledge of the DBN only as far as the current state is concerned, so there is no real directed exploration the way RMAX does it.  They mention a few possible fixes <although it’s not clear if any of them are “smart” in an RMAX sense>
    1. Mention work by Schmidhuber for intrinsic motivation (and its problems) <but why not RMAX as it does things correctly?>
    2. Then mention stuff by Barto that is better, but isn’t designed for factored domains
    3. <Oh, they will go into KWIK, Carlos’ stuff later in related work>
  12. To do learning, maintain a set C of what are called controllable variables, which are variables which the agent knows how to set to any possible value that feature can take
  13. When choosing an action, look for features that may be changed if more information is achieved.  Then make sure that the feature’s ancestors in the DBN are controllable.  If so, set up a plan to try and change the feature
  14. So how do you know when to stop refining your model for a feature in stochastic environments?
  15. <It’s a little unclear to me how this happens.  There is some expectation for the probability that the agent will be able to change a feature to a particular value.  If it fails to do so at a rate some amount below that expectation, it is abandoned?  This doesn’t make sense so I must not understand it correctly>
  16. Looks like they do the RMAX trick, but the feature they pick to assign value to is the one that has controllable sources and the highest potential change in entropy <I thought Bayes info – is it the same?>
  17. “When options happen to be created prematurely and are malformed, their lack of utility is discovered fairly quickly by the agent when it attempts to use those  options in its experimental plans and they fail repeatedly. These options will be removed from the agent’s skill set until the agent performs more experiments relevant to discovering their structure, at which point they will be re-created and tested in further experiments. Once a correct option is learned, its empirical success rate will on average match its expected success rate, and the option will remain in the agent’s skill set to be used in
    all further experiments.”
  18. Experimental results
  19. The light box domain has 20 “lights” or binary variables: ~1 million raw states, 20 million <s,a> pairs
    1. Lights are separated into categories
      1. Circular, which are controlled directly by their switch
      2. Triangular, which are turned on if a particular set of circular lights are on (with own respective switch)
      3. Rectangular which depend on triangular (with own respective switch)
      4. Diamond which depend on rectangular (with own respective switch)
    2. Stochastic; actions “work” with p=0.9
    3. “However, if an action is taken to toggle a light whose dependencies are not currently satisfied, the entire domain is reset to all lights being off.”
      1. No way random actions can get anywhere
    4. Learning must be active so that exploration can be done in an intelligent directed manner
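A minimal mock-up of the reset dynamics as I understand them (light names, dependency layout, and `toggle` are my own simplification, not the paper’s exact 20-light configuration):

```python
import random

# Hypothetical mini light box: a triangular light needs circular lights on,
# a rectangular light needs the triangular one on.
DEPS = {"t1": {"c1", "c2"},
        "r1": {"t1"}}

def toggle(state, light, p_success=0.9, rng=random):
    """Toggling a light whose dependencies are unmet resets the whole domain;
    otherwise the toggle succeeds with probability p_success."""
    on = {l for l, v in state.items() if v}
    if not DEPS.get(light, set()) <= on:
        return {l: False for l in state}      # global reset
    if rng.random() < p_success:
        state = dict(state)
        state[light] = not state[light]
    return state
```

The reset rule is why random action selection gets nowhere: any premature toggle wipes out all progress.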
  20. <Seems to learn effective, but not “correct” policies, dependencies that are flat (1,2,3 needed on to turn on 4, but 1,2,3, independent of each other) end up being constructed serially so it looks like they are dependent (1, then 2, then 3 – even though they could be done in any order)>
  21. Not surprisingly, global (directed) exploration is the only thing that works in this domain
  22. Planning time with options+primitives stays flat with increasing problem complexity (level in the hierarchy — circular, triangular, rectangular, diamond), while primitives-only planning cost increases exponentially with complexity
  23. Mention KWIK algorithms for DBNs, knock them for having limited (and exponential cost in) in-degree, but you must have that in general problems to get an optimal solution – the greedy approach here only works in submodular domains

In Transit to Constant Time Shortest-Path Queries in Road Networks. Bast, Funke, Matijevic, Sanders, Schultes. Algorithm Engineering and Experiments (ALENEX)

  1. Beats previous best results by two orders of magnitude <was also published in Science, but the article is so short I’m not taking notes on it>
  2. In the worst case Dijkstra’s cannot be below linear because it may have to examine all nodes and edges
    1. The only way to beat this is by preprocessing
  3. Constant query time can be achieved by superlinear storage <Upon looking up that reference, it really looks linear.  The response may be slightly suboptimal (but this amount is constant in the size of the graph)>
  4. The improvements in quality and speed here exploit the fact that roads have particular properties, like low branching factors, fact that highways are designed to be used for long distance travel…
  5. Idea of transit node is “that for every pair of nodes that are ‘not too close’ to each other, the shortest path between them passes through at least one of these transit nodes.”
    1. Additionally, even when travelling very far, the number of transit nodes that must be used is very small (like 10)
  6. In the US road data they consider, 24 million nodes, and 10,000 transit nodes
  7. Example (paper has figure): “Finding the optimal travel time between two points (flags) somewhere between Saarbrücken and Karlsruhe amounts to retrieving the 2 × 4 access nodes (diamonds), performing 16 table lookups between all pairs of access nodes, and checking that the two disks defining the locality filter do not overlap. Transit nodes that are not relevant for the depicted query are drawn as small squares.”
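The query pattern in that example amounts to something like this (hypothetical data structures; `locality_filter` and `local_search` are stand-ins, not the paper’s names):

```python
def tnr_query(s, t, access, d_to_access, table, locality_filter, local_search):
    """Sketch of a transit node query: minimum over access-node pairs of
    d(s,u) + table[u][v] + d(v,t), falling back to a local search when the
    locality filter says the endpoints may be too close for transit nodes."""
    if locality_filter(s, t):
        return local_search(s, t)
    return min(d_to_access[s][u] + table[u][v] + d_to_access[t][v]
               for u in access[s] for v in access[t])
```

With 2 × 4 access nodes this is the “16 table lookups” of the figure; here a 2 × 1 toy case.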
  8. Mentions that there are special classes of algorithms for working on planar graphs, which roads (basically) are
  9. They have a previous paper that hierarchically builds paths based on increasingly long paths <should check out>
    1. This paper is by the same authors, and ends up doing something very similar to transit nodes (they are just nodes high in the hierarchy).  In the earlier paper access nodes were computed on the fly whereas here they are precomputed
    2. It also doesn’t deal with how to figure out the distance in the distance table is indeed the shortest distance
  10. A* is actually one of the best methods for real world planning (when done around landmarks and the triangle inequality).  Here, access nodes can be used as the landmarks
  11. The hierarchy approach also needs to be grounded in actual geographic information to work <at small scales?>
  12. Nodes are labeled either close/not close to individual access nodes
  13. Access nodes are found by a backwards Dijkstra search, continuing until all paths converge on another transit node; then take all nodes found through this search that do not reach another transit node
  14. There are a number of ways to implement a locality filter; the one they recommend is to examine everything that is reachable within a cluster in the hierarchy
  15. This information allows for what is effectively constant time queries for shortest paths
  16. There is another way to implement the algorithm that is grid based – uses 2 grids of different resolution
    1. Needs some optimization to function efficiently
    2. Computing access nodes in grid-based implementation is very simple
  17. With the optimizations, long-distance queries via transit nodes are actually faster (1 million x) than local queries.  Although local queries that don’t go through access nodes are only about 1% of queries on average <not sure how this is measured>, they still dominate the running time
    1. There is a scheme also that allows speedups for local queries as well
  18. In the grid setting, there are tradeoffs between space, number of transit nodes <both increasing with resolution> and #local queries that don’t go through transit nodes <decreasing with resolution>
  19. Ah, to get the best of both worlds they use a hierarchy of grids
  20. The exact algorithm used to compute access nodes and transit nodes is confusing, and their diagram doesn’t help either.  It’s written in English but would be much more understandable written in math (or at least pseudocode)
  21. Queries are super-easy to run (if nodes are more than 4 cells apart, path must go through a transit node, which already have pairwise distances computed)
  22. Can do hierarchical <seems to be just multiple griddings of different resolutions but I think there is another paper that may go into it more thoroughly>
  23. Do a greedy set cover to optimize storage for access nodes
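The standard greedy set cover they presumably use looks like this (my sketch, with the usual ln(n)+1 guarantee):

```python
def greedy_set_cover(universe, candidates):
    """Repeatedly pick the candidate set covering the most still-uncovered
    elements. candidates: {name: set_of_elements}."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda s: len(uncovered & candidates[s]))
        if not uncovered & candidates[best]:
            break                      # remaining elements are uncoverable
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen
```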
  24. Implemented on data with 24m nodes, 60m edges
  25. Run on a ’07 dual-proc machine, 99% of queries have an avg time of 12 microseconds (adding the extra 1% moves it to 63 msec – are those the ones that don’t go through transit nodes?)
  26. Computation times are not “Dijkstra rank dependent”
  27. <Ok that’s not the end of the paper – just for the grid part, now moving on to a more fully hierarchical method>
  28. Highway hierarchies
  29. A highway edge is an edge that is on the shortest path between two vertices but not in the local neighborhood of either of those vertices
  30. They also may contract edges based on a bypassability criterion, but that isn’t yet discussed at length
  31. The hierarchy is then constructed by recursively building highway networks on top of each preceding layer
  32. With increasing path distance, paths will go increasingly higher in the highway hierarchy
  33. So the nodes existing at some level in the hierarchy can be used as the transit nodes
  34. #transit nodes controlled by neighborhood size and level of hierarchy
  35. “Note that there is a difference between the level of the highway hierarchy and the layer of transit node search” <on this read through the distinction is actually not clear to me – looks like they are basically equivalent but in reverse order?>
  36. There is a glossed-over description of how access nodes are found – another paper is referenced <which I will try and get to>, and point #20 is relevant
  37. Present 2 methods of the algorithm, which have different tradeoffs in terms of space, preprocessing, and query time
  38. Search is done top-down
  39. “For a given node pair (s, t), in order to get a complete description of the shortest s-t-path, we first perform a transit node query and determine the layer i that is used to obtain the shortest path distance. Then, we have to determine the path from s to the forward access node u to layer i, the path from the backward access node v to t, and the path from u to v”
  40. There is still room for improvement for the locality filters
  41. For US data, 99% of cases are handled at the top layer
  42. Search times start low and increase as searches are still local but further away, and then drop extremely low once nodes are far enough that transit nodes can be leveraged
  43. <On to conclusion>
  44. “Building on highway hierarchies, this can be achieved using a moderate amount of additional storage and precomputation but with an extremely low query time. The geometric grid approach on the other hand allows for very low space consumption at the cost of  slightly higher preprocessing and query times.”
  45. “There are many interesting ways to choose transit nodes. For example nodes with high node reach [9, 6] could be a good starting point. Here, we can directly influence |T|, and the resulting reach bound might help defining a simple locality filter. However, it seems that geometric reach or travel time reach do not reflect the inhomogeneous density of real world road networks. Hence, it would be interesting if we could efficiently approximate reach based on the Dijkstra rank.
    Another interesting approach might be to start with some locality filter that guarantees uniformly small local searches and to view it as an optimisation problem to choose a small set of transit nodes that cover all the local search spaces.”

How to Use Spanning Trees to Navigate in Graphs. Dragan, Xiang. FOCS 2009.

  1. Investigates how to navigate graphs efficiently using the spanning tree T of a graph G
  2. Navigation is done through knowledge of a node’s neighbors (or k-neighborhood) and T
  3. In wireless networks, a common heuristic is to simply move packets to the neighbor that is geographically closest to the destination.  Of course this scheme can run into problems so there are various methods to try and escape local optima
    1. If geographic distances aren’t applicable or are unknown, other similar methods measure location based on proximity to a number of “beacon” nodes.  This usually works well in practice, but does not have theoretical guarantees
  4. If however, an embedding of the spanning tree is used instead of beacons, delivery is guaranteed
  5. It’s possible to encode node labels compactly s.t. node distances can be found simply by comparing labels (of size log^2 n)
    1. So minimum spanning tree routing can be done just with these particularly encoded labels and an adjacency list for each node
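A simplified version of label-based tree distance (using full root-to-node paths as labels, which costs O(depth) rather than the paper’s log^2 n bits, just to show the mechanics):

```python
def make_labels(tree, root):
    """Label each node with its root-to-node path.
    tree: {node: [children, ...]}."""
    labels = {root: (root,)}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in tree.get(u, []):
            labels[v] = labels[u] + (v,)
            stack.append(v)
    return labels

def tree_distance(la, lb):
    """Distance in the tree computed from the two labels alone:
    hops from each node up to the deepest common ancestor."""
    common = 0
    for x, y in zip(la, lb):
        if x != y:
            break
        common += 1
    return (len(la) - common) + (len(lb) - common)
```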
  6. In some classes of graphs routing according to the MST will be optimal
  7. There are two additional tree routing strategies (aside from the obvious one) that need a small amount of information for each node (logarithmic) that encodes how far up the spanning tree it is
    1. These two strategies are “simpler and more compact” than the vanilla method, and in some cases the paths created will be shorter
  8. The results are all for particular classes of graphs which I’m not going to investigate in particular

How Optimized Environmental Sensing helps address Information Overload. Guestrin. Talk 2009

  1. When working in non-submodular functions there is <usually?> a trick that allows the problem to be solved.
    1. That is SATURATE
    2. Basically it’s as good as you can do
    3. Works by artificially allowing more sensors than allowed and then truncating the actual selection
    4. Of course this isn’t for all problems, but a subset of problems that are not natively submodular

Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization. Golovin, Krause, Ray. Talk (NIPS 2010)


  1. Basic idea: consider submodular functions, but ones that are stochastic and unfold over time.  So instead of making all decisions in one shot (before execution begins), it’s better to wait to see how the first decision unfolds before making the second, etc.
    1. Gives the example of influence in social networks – if you want to get people to plug a product you give to a few people for free, see how influence is spreading from the first person before deciding the second
  2. The setting described here is a generalization of standard submodularity
  3. The analysis here carries many results from standard submodular optimization over to the adaptive setting
  4. Instead of maximizing margin, maximizes expected margin based on current state, but does this only for the first selection, and the results of the first selection on state then become conditions for the next choice
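The selection loop in point 4 can be sketched as follows (my own naming; `expected_gain` and `observe` are stand-ins for the problem-specific pieces):

```python
def adaptive_greedy(items, k, expected_gain, observe):
    """Adaptive greedy policy sketch: pick the item with largest expected
    marginal gain given everything observed so far, observe its realized
    outcome, and condition the next choice on it."""
    history = {}                       # item -> observed outcome
    for _ in range(k):
        remaining = [i for i in items if i not in history]
        if not remaining:
            break
        best = max(remaining, key=lambda i: expected_gain(i, history))
        history[best] = observe(best)  # realization revealed after choosing
    return history
```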
  5. Gets standard bounds of 1-1/e
  6. Because it’s adaptive, can now do things like “take the minimal number of actions needed to achieve X”, which you couldn’t do with standard submodularity (there, you can just say “maximize the benefit of N choices”)
    1. This is a log-approximation ln(n)+1 for stochastic set cover
  7. Another application optimal decision trees
    1. “Generalized binary search” which is equivalent to max info gain
    2. This problem is adaptive submodular <I thought things like xor in decision trees aren’t submodular, but maybe because we are talking w.r.t. information gain it is?>
    3. Give bounds for this approximation
    4. “Results require that tests are exact (no noise)!”
  8. If trying to do decision trees in noisy setting, can do a Bayesian version
    1. But generalized binary search, maxing info gain, and maxing value of info aren’t adaptive submodular in noisy setting.  In noisy settings they require exponentially more tests than optimum <Should really read this paper>
  9. Trick is to transform noisy problem into normal problem by making outcome part of the hypothesis
    1. Only need to distinguish between noisy hypotheses that lead to different decisions
  10. Gives bounds for this Bayesian active learning for noisy domains
  11. Give an example of testing different psychological/economic models of how people perform decision making
    1. For this setting, turns out many common approaches actually do much worse than random

Near-optimal Nonmyopic Value of Information in Graphical Models. Krause, Guestrin. UAI 2005.

<Working off the arxiv version, although it looks the same as the UAI one>

  1. For submodular functions, gives proof of a tractable approach for solving combinatoric problems that is within a factor of (1-1/e-ε) for ε > 0.
    1. Shows this is pretty tight, and that nothing can be better than (1-1/e) unless P=NP
  2. In terms of sensor networks, the goal is to maximize information gain, or equivalently, “decrease uncertainty about unobserved variables”
  3. A common criterion for node selection in sensor placement is to pick the location that has maximal entropy, given the previously selected locations.  This method has a fault, however, in that it will push sensed locations to the edges of the space.  The problem with this, in particular, is that sensors then have about half of their possible observation area (if its spherical) clipped by the boundaries
    1. That’s why the approach here, which is to minimize the entropy of unsensed locations, is better; it ensures more even coverage by sensors
  4. A set function is submodular if <among other equivalent definitions> it satisfies the diminishing returns property: the gain in F from adding a new element to a set is at least as large as the gain from adding it to a superset of that set
    1. The marginal increase of X with respect to A is F(A ∪ {X}) - F(A)
  5. Exact maximization of submodular functions is NP hard (by reduction from max-cover)
  6. F also must be non-decreasing (like information, per the “information never hurts” principle: adding more sensors can never reduce the amount of information you have)
    1. So dealing with entropy is submodular.  Information gain is as well
  7. Basic theoretical results are from G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 1978.
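The Nemhauser–Wolsey–Fisher greedy algorithm these results build on, sketched with a coverage-style F (the coverage function here is my own toy example):

```python
def greedy_max(ground, k, F):
    """Greedy maximization of a set function F with cardinality budget k.
    For monotone submodular F with F(empty set) = 0, the greedy set A_k
    satisfies F(A_k) >= (1 - 1/e) * OPT."""
    A = set()
    for _ in range(k):
        best = max((x for x in ground if x not in A),
                   key=lambda x: F(A | {x}) - F(A),  # marginal gain
                   default=None)
        if best is None:
            break
        A.add(best)
    return A
```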
  8. Information gain is not always submodular <xor pattern in decision trees>, but given independence assumptions it is
  9. “A convenient property of submodular functions is that they are closed under positive linear combinations. This allows us, for example, to have temperature models for different times of the day, and select sensors that are most informative on average for all models.”
  10. Versions with budgets, individual unique costs per item
  11. Unit-cost case has quadratic cost, the budgeted case quintic, but if willing to take another hit on accuracy this can drop down to cubic
  12. Has links to simulated annealing
  13. <Skipping the proof that the greedy bound is tight with any non-exponential solution>
  14. The ε in the proof comes from the fact that accurately computing the necessary conditional entropies is #P-hard <I’m actually not familiar with what that means offhand, but based on context it must mean more than poly>, so a method of estimating entropies is used instead.
  15. <on to experimental results>
  16. In the first experimental result they place temperature sensors around their lab.  The quality of predictions with their method using 5 sensors is better than 20 sensors placed with the standard entropy approach (which pushes sensors to boundaries and wastes part of their field)
  17. They had to lay down tiles to segment the continuous space of the lab <although in some segments there is more than one sensor, so I’m not sure exactly what they are doing.  Oh, it looks like there are candidate positions pre-selected, which have line-of-sight connectivity, so the hierarchy doesn’t impact that graph but sits on it.  Still not totally sure what’s going on>
  18. Even when the assumptions of conditional independence are violated, the results are still good, worse than standard entropy with less than 5 sensors, but better with more than that
  19. Graph that shows linear run time <I thought it was poly, but they mention a simple and complex model, so one may be linear and the other poly?>
  20. Next results are on highway data.  Like in the temp results used a hierarchical model <although I’m still not sure what that means exactly>
  21. Here, they compared placement based on information gain, entropy, and random choice.  Random actually yielded more information gain than the entropy-based approach: “This indicates that maximizing entropy and information gain appear to be conflicting goals, resulting in qualitatively different behaviors.”

Transit Node Routing Reconsidered. Arz, Luxen, Sanders. Experimental Algorithms 2013.

<These notes are from the arxiv version>

  1. “Transit Node Routing (TNR) is a fast and exact distance oracle for road networks.”
  2. This paper
    1. Provides a means of speeding up preprocessing by an order of magnitude
  3. The basic idea in contraction hierarchies is to do some preprocessing to allow for speedups of Dijkstra’s
  4. TNR is one of the best methods, and can get queries down to almost constant time (a small number of table lookups)
  5. The basic assumption behind TNR is that queries that span long geographical distances almost always go through centralized connections <highways> called transit nodes

    1. Generally, there are only a few entrances to each transit node, so the distances to these entrances, as well as the pairwise distances between transit nodes, are stored
    2. This information allows for a locality filter, which “… indicates whether the shortest path
      might not cross any transit nodes, requiring an additional local path search.”
  6. A couple of traditional issues with TNR are very expensive preprocessing, as well as the need for geographic information about the nodes (which is a bit of a poor fit when dealing theoretically with graphs; the hope is that would be abstracted away)
  7. On a rough level, contraction hierarchies order nodes by importance and contract them one at a time: each node is temporarily removed from the graph, and shortcut edges are inserted between its neighbors only when necessary to maintain shortest-path distances
  8. When doing search, all original edges are considered, as well as the additionally processed shortcut edges (although only those that lead to more important nodes).  <It’s not clear if the original edges are also filtered this way, but it seems to be the case, as the resulting graph forms a DAG.>
    1. Leads to a graph with particular structure (all shortest paths now have a midpoint), which allows for a Dijkstra’s with a minor extension to be used
  9. Basically, these tricks allow for linear as opposed to quadratic all-pair shortest paths to be computed
  10. “TNR in itself is not a complete algorithm, but a framework. A concrete instantiation has to find solutions to the following degrees of freedom: It has to identify a set of transit nodes. It has to find access nodes for all nodes.”
  11. “CHs order the nodes in such a way that nodes occurring in many shortest paths are moved to the upper part of the hierarchy. Hence, CH is a natural choice to identify a small node set which covers many shortest paths in the road network.”
    1. <This is a loose definition of course, and for a real-world mapping problem you can’t hope to tackle it with an exact solution, so the trick must be what sort of heuristic to use>
  12. The algorithm presented here does not require geometric information about the nodes, resolving that wrinkle
  13. <skimming some parts of the papers, like the lemmas>
  14. <the remainder of the paper is empirical results – actually on a DIMACS dataset of Europe with about 20M nodes/edges>
  15. <discussion of application of other common graph-search extensions/heuristics>
  16. <discussion>
  17. “In particular, at the price of twice the (quite fast) preprocessing time of contraction hierarchies, we get two orders of magnitude faster query time. Our purely graph theoretical locality filter outperforms previously used geometric filters. To the best of our knowledge, this eliminates the last remnant of geometric techniques in competitive speedup techniques for route planning.”
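The almost-constant-time query described in items 4–5 boils down to table lookups plus the locality-filter fallback.  A minimal sketch, with hypothetical data structures (`access` mapping each node to its access nodes with stored distances, `dist_table` holding pairwise transit-node distances, `is_local` as the locality filter, and `local_search` as the fallback, e.g. a neighborhood-restricted Dijkstra):

```python
def tnr_query(s, t, access, dist_table, is_local, local_search):
    """Distance query via table lookups.  When the locality filter says
    the shortest path might not cross any transit node, fall back to a
    plain local search; otherwise combine the three stored pieces:
    d(s, a) + d(a, b) + d(b, t) over access nodes a of s and b of t."""
    if is_local(s, t):
        return local_search(s, t)
    return min(d_s + dist_table[a][b] + d_t
               for a, d_s in access[s].items()
               for b, d_t in access[t].items())
```

With a handful of access nodes per endpoint, the non-local case is a small constant number of additions and lookups, which is where the two-orders-of-magnitude query speedup comes from.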

A Local Algorithm for Finding Well-Connected Clusters. Lattanzi, Mirrokni, Zhu. ICML 2013 Talk.


  1. A good cluster should have few edges leaving it relative to the size of the cluster
  2. This criterion is called the conductance (clustering by cutting according to it)
  3. Google research, so Google scale; they don’t want costs even linear in the size of the data
  4. Algorithm is initialized on some vertex, moves around locally, and then finally produces some set S, hopefully with a low conductance
  5. Has theoretical bounds; can’t do better than that due to Cheeger’s inequality
  6. The algorithm’s runtime is linear in the size of the cluster found from the starting point
  7. Algorithm is quite simple
  8. Proposes a different metric than standard conductance: connectivity
  9. Want small conductance and high connectivity (imposed by a relative combination); want better connection inside than outside
  10. This metric improves the theoretical bounds, beating other random-walk-based spectral algorithms
  11. Use of this extra data allows for tighter bounds
  12. <Proofs>
  13. Claim a followup work that is local but is big-O optimal based on flow analysis
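The conductance criterion from items 1–2 is easy to state concretely: φ(S) = (# edges leaving S) / min(vol(S), vol(V∖S)), where vol is the sum of degrees.  A minimal sketch, assuming an adjacency-set representation (node → set of neighbors) of an undirected graph:

```python
def conductance(adj, S):
    """phi(S) = (edges crossing the cut) / min(vol(S), vol(V \ S)),
    where vol(X) is the sum of degrees of nodes in X."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 0.0
```

For two triangles joined by a single edge, cutting between the triangles gives φ = 1/7, while peeling off a single vertex gives φ = 1; low conductance correctly identifies the natural cluster boundary.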
