<Working off the arxiv version, although it looks the same as the UAI one>

- For submodular functions, gives proof of a tractable approach for solving combinatoric problems that is within a factor of (1-1/e-ε) for ε > 0.
- Shows this is pretty tight, and that nothing can be better than (1-1/e) unless P=NP

- In terms of sensor networks, the goal is to maximize information gain, or equivalently, “decrease uncertainty about unobserved variables”
- A common criterion for node selection in sensor placement is to pick the location that has maximal entropy, given the previously selected locations. This method has a fault, however, in that it will push sensed locations to the edges of the space. The problem with this, in particular, is that sensors then have about half of their possible observation area (if its spherical) clipped by the boundaries
- Thats why the approach here, which is to minimize entropy of
*unsensed *locations is better; it ensures more even coverage by sensors

- A set function
*F *is submodular if <among other possible things> it satisfies the diminishing returns property, that is, if a new piece of information is added to a set, or its superset, the gain in *F* will be larger with a set than its superset
- The
*marginal increase* of *X* to *A* is the increase in *F* between *A* and *A *∪* X*

- Exact maximization of submodular functions is NP hard (by reduction from max-cover)
*F* also must be non-decreasing (like information “the information never hurts principle”; adding more sensors can never reduce the amount of information you have)
- So dealing with entropy is submodular. Information gain is as well

- Basic theoretical results are from G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 1978.
- Information gain is not always submodular <xor pattern in decision trees>, but given independence assumptions it is
- “A convenient property of submodular functions is that they are closed under positive linear combinations. This allows us, for example, to have temperature models for different times of the day, and select sensors that are most informative on average for all models.”
- Versions with budgets, individual unique costs per item
- Unit-cost case is squared cost, budgetedis O^5, but if willing to take another hit on accuracy this can drop down to cubic
- Has links to simulated annealing
- <Skipping the proof that the greedy bound is tight with any non-exponential solution>
- The ε in the proof comes from the fact that accurately computing conditional entropies as necessary is #P <I’m actually not familiar with what that means offhand, but based on context it must mean more than poly>, so a method of estimating entropies is used instead.
- <on to experimental results>
- In the first experimental result they place temperature sensors around their lab. The quality of predictions with their method using 5 sensors is better than 20 sensors placed the standard entropy approach (that pushes sensors to boundaries and wastes part of their field)
- They had to lay down tiles to segment the continuous space of the lab <although in some segments there is more than one sensor, so I’m not sure exactly what they are doing. Oh it looks there are candidate positions pre-selected, which have line-of-sight connectivity, so the hierarchy doesn’t impact that graph but sits on it. Still not totally sure whats going on>
- Even when the assumptions of conditional independence are violated, the results are still good, worse than standard entropy with less than 5 sensors, but better with more than that
- Graph that shows linear run time <I thought it was poly, but they mention a simple and complex model, so one may be linear and the other poly?>
- Next results are on highway data. Like in the temp results used a hierarchical model <although I’m still not sure what that means exactly>
- Here, they compared placement based on information gain, entropy, and random choice. Random actually yielded more information gain than the entropy-based approach: “This indicates that maximizing entropy and information gain appear to be conflicting goals, resulting in qualitatively different behaviors.”