From http://videolectures.net/nipsworkshops2010_golovin_asm/

- Basic idea is consider submodular functions, they they are stochastic and unfold over time. So instead of making all decisions in one shot (before execution begins), better to wait to see how the first decision unfolds before making the second, etc.
- Gives the example of influence in social networks – if you want to get people to plug a product you give to a few people for free, see how influence is spreading from the first person before deciding the second

- The setting described here is a generalization of standard suboptimality
- Analysis here allows many results from standard submodular optimization to the adaptive setting
- Instead of maximizing margin, maximizes expected margin based on current state, but does this only for the first selection, and the results of the first selection on state then become conditions for the next choice
- Gets standard bounds of 1-1/e
- Because adaptive, can now do things like “take the minimal number of actions needed to achieve X”, which you couldn’t do with standard submodularity (there, you can just say “maximize the benefit of N choices”
- This is a log-approximation ln(n)+1 for stochastic set cover

- Another application
**optimal decision trees**- “Generalized binary search” which is equivalent to max info gain
- This problem is adaptive submodular <I thought things like xor in decision trees arent submodular, but maybe because we are talking w.r.t. information gain it is?>
- Give bounds for this approximation
- “Results require that tests are exact (no noise)!”

- If trying to do decision trees in noisy setting, can do a Bayesian version
- But generalized binary search, maxing info gain, and maxing value of info aren’t adaptive submodular in noisy setting. In noisy settings they require exponentially more tests than optimum <Should really read this paper>

- Trick is to transform noisy problem into normal problem by making outcome part of the hypothesis
- Only need to distinguish between noisy hypotheses that lead to different decisions

- Gives bounds for this Bayesian active learning for noisy domains
- Give an example of testing different psyhological/economic models of how people perform decision making
- For this setting, turns out many common approaches actually do much worse than random