**I took bad notes, didn’t take notes on things I didn’t care about or when I was tired, and spent a lot of time doing volunteer stuff, so this isn’t a great guide.**

**Boosting for Regression Transfer (Pardoe):**

- Inspired by issues found in the Trading Agent Competition
- Adapts a classification boosting algorithm for regression
- [couldn’t see any of the screen here, so lots missing]
- Uses AdaBoost.R2, which weights samples based on relative error
- Also has a weighting of source instances? So if one corpus seems to be crappy, it’s used less?
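The AdaBoost.R2 update the talk builds on is easy to sketch. Below is a minimal version of one boosting round, my own reconstruction of the standard algorithm rather than Pardoe's transfer variant; `fit_regressor` is a hypothetical helper that trains a weak learner on weighted data:

```python
def adaboost_r2_round(X, y, weights, fit_regressor):
    """One round of the standard AdaBoost.R2 update (sketch).

    fit_regressor(X, y, weights) -> a predict(x) function (hypothetical helper).
    Returns (beta, new_weights); beta < 1 means the round was useful.
    """
    predict = fit_regressor(X, y, weights)
    abs_err = [abs(predict(x) - t) for x, t in zip(X, y)]
    max_err = max(abs_err) or 1.0
    loss = [e / max_err for e in abs_err]          # relative error in [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, loss))
    if avg_loss >= 0.5:                            # AdaBoost.R2 stopping condition
        return None, weights
    beta = avg_loss / (1.0 - avg_loss)
    # samples the learner already fits well are down-weighted by beta**(1 - loss)
    new_w = [w * beta ** (1.0 - l) for w, l in zip(weights, loss)]
    z = sum(new_w)
    return beta, [w / z for w in new_w]
```

The transfer version, as I understood it, additionally re-weights whole source corpora so an unhelpful corpus contributes less.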

**Should One Compute the TD Fixed Pt or Bellman Resid? (Scherrer)**

- Case of a fixed policy: compute its value, trying to fit a linear value function
- Can minimize either the TD(0) fixed-point error or the Bellman residual
- Some examples where TD is better and some where Bellman is
- The Bellman residual is an upper bound on the TD error
- Empirically, TD is often better, but when it fails it fails badly, so Bellman is more robust
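For a fixed policy with linear features the two objectives can be compared directly. A toy numpy sketch (my own example, not from the talk):

```python
import numpy as np

# Toy 3-state chain under a fixed policy (my own example, not from the talk)
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 1.]])            # state 2 is absorbing
R = np.array([0., 0., 1.])
gamma = 0.9
Phi = np.array([[1., 0.],
                [0., 1.],
                [1., 1.]])              # linear features
D = np.eye(3) / 3.0                     # uniform state weighting
M = Phi - gamma * P @ Phi

# TD(0) fixed point: solve Phi^T D (Phi - gamma P Phi) w = Phi^T D R
w_td = np.linalg.solve(Phi.T @ D @ M, Phi.T @ D @ R)

# Bellman residual minimization: argmin_w || Phi w - (R + gamma P Phi w) ||_D
w_br = np.linalg.solve(M.T @ D @ M, M.T @ D @ R)

def bellman_residual(w):
    err = M @ w - R
    return float(np.sqrt(err @ D @ err))
```

By construction the BR solution always has the smaller Bellman residual; the TD fixed point can be made arbitrarily worse on adversarial examples, which matches the “fails badly” point.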

**Active Learning for Multi-Task Adaptive Filtering (Harpale?)**

- Basically looking at the issue of spam classification
- Uses a Dirichlet process for filtering
- Tries to tune params for the DP based on feedback, and also tunes params more directly
- Interested in risk minimization

**Constructing States for RL (Mahmud)**

- POMDP stuff
- Concerned w/learning and optimal control of Deterministic Markov Models (DMMs)
- In regular RL you know what you don’t know, but in POMDPs you don’t know what you don’t know
- Each DMM is a finite state machine
- In the limit it finds the right model and policy
- Samples models from all possible FSMs
- Empirical results were in domains that actually violate some assumptions

**Analysis of Classification-Based Policy Iteration (Lazaric)**

- Classifies the optimal action for states
- Estimates the return distribution instead of just the expected return
- Uses a recursive formula for distribution estimation, the *Distributional Bellman Operator* (also a version with lambda)
- Does exploration/exploitation based on the tail of the CDF of the estimated distribution?
- Not impressive empirical results; not significantly different performance
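I didn’t catch the actual algorithm, but the general idea of a distributional Bellman backup can be illustrated by representing each state’s return distribution as a bag of samples (my own illustration, not the paper’s method):

```python
import random

def distributional_backup(Z, P, R, gamma, n_samples=500, rng=random):
    """One sample-based distributional Bellman backup (illustration only).

    Z[s] : list of return samples for state s
    P[s] : list of (prob, next_state) pairs
    R[s] : immediate reward in state s
    """
    Z_new = {}
    for s in Z:
        samples = []
        for _ in range(n_samples):
            u, acc, s2 = rng.random(), 0.0, P[s][-1][1]
            for p, nxt in P[s]:          # sample a successor state
                acc += p
                if u <= acc:
                    s2 = nxt
                    break
            # return sample = reward + discounted sample of successor's returns
            samples.append(R[s] + gamma * rng.choice(Z[s2]))
        Z_new[s] = samples
    return Z_new
```

The tail of the resulting empirical CDF (e.g. the fraction of samples above a threshold) is then directly available for an exploration heuristic.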

**Inverse Optimal Control w/Linearly Solvable MDPs (Dvijotham)**

- This was given by one of Emo’s students, so that style of mix between control theory and RL
- Related to apprenticeship learning; assumes the teacher is optimal
- Linearly solvable MDPs:
    - Have passive dynamics
    - Control inputs change the dynamics
- Closed-form computation, I think
- MaxEnt RL is related (never heard of it, but sounds interesting)
- Has good sample complexity
- Have to deal with integration, but some assumptions can get around that?
- Need to know the process dynamics

**Feature Selection Using Regularization… (Petrik)**

- Concerned w/lin value func approx
- Does feature selection
- Approximate linear programming
- Uses L1 regularization
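My rough understanding of the idea: take the approximate-LP formulation of value-function fitting and add an L1 constraint on the weights, which induces sparsity and hence feature selection. A toy sketch of my own construction (fixed-policy case, assuming `scipy` is available; the L1 norm is handled by splitting w into positive and negative parts):

```python
import numpy as np
from scipy.optimize import linprog

# Toy fixed-policy chain (my own example)
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 1.]])
R = np.array([0., 0., 1.])
gamma = 0.9
Phi = np.array([[1., 0.],
                [1., 1.],
                [1., 2.]])              # constant + linear feature
c = np.full(3, 1.0 / 3.0)               # state-relevance weights
psi = 12.0                              # L1 budget on the weights

M = Phi - gamma * P @ Phi
k = Phi.shape[1]

# Variables x = [w_plus, w_minus], w = w_plus - w_minus, both parts >= 0.
# ALP: minimize c^T Phi w  s.t.  M w >= R  and  ||w||_1 <= psi.
obj = np.concatenate([Phi.T @ c, -(Phi.T @ c)])
A_ub = np.vstack([np.hstack([-M, M]),            # -M w <= -R
                  np.ones((1, 2 * k))])          # sum(w+ + w-) <= psi
b_ub = np.concatenate([-R, [psi]])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * k),
              method="highs")
w = res.x[:k] - res.x[k:]
```

Shrinking `psi` forces coefficients toward zero, which is where the feature selection comes from.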

**Efficient Selection of Bandit Arms (Kalyanakrishnan)**

- PAC bounds for finding the m best bandit arms
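The simplest PAC-style baseline for this problem is uniform sampling with a Hoeffding bound; the paper’s algorithm (which I didn’t note) is presumably much more sample-efficient. A naive sketch:

```python
import math

def top_m_arms_naive(pull, k, m, eps=0.1, delta=0.1):
    """Naive PAC selection of the m best of k arms with rewards in [0, 1].

    Samples each arm enough that every empirical mean is within eps/2 of the
    truth w.p. >= 1 - delta (Hoeffding + union bound), then picks the top m.
    """
    n = math.ceil((2.0 / eps ** 2) * math.log(2.0 * k / delta))
    means = [sum(pull(a) for _ in range(n)) / n for a in range(k)]
    return sorted(range(k), key=lambda a: -means[a])[:m]
```

The point of the paper, as I took it, is beating this kind of uniform budget by adapting samples to the arms that matter.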

**Model Based RL w/Near-Tight Exploration (Szita)**

- An improvement of the PAC bounds for R-max, with slight changes to the algorithm
- Investigation prompted by the fact that Delayed Q-learning had better bounds w.r.t. state size than R-max, which didn’t make much sense; this work gets R-max’s bounds back down

**Gaussian Process Optimization in Bandits (Srinivas)**

- I think he said he was dealing with global continuous **noisy** optimization
- Basic idea works like Lipschitz optimization, except using the GP confidence bounds instead of Lipschitz bounds
- This is nice because it can use diff distance metrics?

- Regret depends on how quickly information can be gained
- Function doesn’t actually have to be a GP
- Finding the maximizer of the upper confidence bound is itself hard (another optimization problem)?
- I’m not sure how they deal with stochasticity – that piece seemed missing, if they deal with that case

**Epilepsy Prediction (Deselaers) (runner up?)**

- Doing binary classification, spectral processing on EEG data
- Tough problem because diff patients have diff patterns for baseline and seizure, but both are abnormal, so it really has to be customized to the patient (one that tries to work off the shelf is available, but has a very high error rate, about once an hour?)
- Uses SVMs w/RBF kernels
- About 96% detection, but about 5 false positives a day – I imagine they would prefer false positives to false negatives, so that seems OK.
- 90+% detection in less than 10 seconds

**Workshop: RL and Search in Large Spaces:**

**Search Engines as Bandits (Thorsten)**

- Tried search engines on arXiv w/diff known levels of goodness (started with a good one, and then added diff levels of noise)
- The only way to test search engines is to present results from both on the same page and check which one’s results are preferred; the other way (testing just one at a time and checking click-through) didn’t really produce significant results
- Bandit regret was in terms of how bad the presented results are relative to the best possible engine
- Came up with an alg that does one-vs-rest with elimination; has a regret bound but also a chance of failure; requires some assumptions, which didn’t seem unreasonable
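The side-by-side comparison is, I believe, the team-draft interleaving idea from Joachims’ group: merge both engines’ rankings so each contributes alternately, then credit clicks to the engine that contributed the clicked result. A sketch of my own implementation of the standard scheme, which may differ in details from the talk’s:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings; each 'team' alternately picks its best unused doc.
    Returns the merged list and which team contributed each position."""
    docs = set(ranking_a) | set(ranking_b)
    merged, team = [], []
    picks_a = picks_b = 0
    while len(merged) < len(docs):
        a_first = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        source = ranking_a if a_first else ranking_b
        for doc in source:
            if doc not in merged:       # take the team's highest unused doc
                merged.append(doc)
                team.append('A' if a_first else 'B')
                break
        if a_first:
            picks_a += 1
        else:
            picks_b += 1
    return merged, team
```

Per query, the team whose documents get more clicks wins that impression, and the win rates drive the bandit comparison.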

**Sarsa(lambda) in RKHS (Richards)**

- A method of making features that allow the value func to be linear in the features?
- Also tried to use regression to estimate the eligibility trace; I think this was really the focus of what was going on; was able to use a very small # of traces to do this, saving the memory burden
- Online kernel TD
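The “online kernel TD” part can be sketched as a value function that is a kernel expansion over visited states, with each TD step adding a new center. This is a minimal TD(0) version of my own; the talk’s Sarsa(lambda) with regression-estimated traces is fancier:

```python
import math

def rbf(a, b, width=0.5):
    return math.exp(-(a - b) ** 2 / (2 * width ** 2))

class KernelTD:
    """Online kernel TD(0): V(s) = sum_i alpha_i * k(center_i, s)."""

    def __init__(self, gamma=0.9, lr=0.1):
        self.centers, self.alphas = [], []
        self.gamma, self.lr = gamma, lr

    def value(self, s):
        return sum(a * rbf(c, s) for c, a in zip(self.centers, self.alphas))

    def update(self, s, r, s_next):
        # TD error, then a functional-gradient step: add a new center at s
        delta = r + self.gamma * self.value(s_next) - self.value(s)
        self.centers.append(s)
        self.alphas.append(self.lr * delta)
        return delta
```

One center per transition is exactly the memory burden the trace-regression trick was apparently attacking; practical kernel TD methods sparsify the dictionary.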

Thanks, Ari. I’m checking out Istvan’s paper, which seems like a very cool result.