Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. Lihong Li, Wei Chu, John Langford, Xuanhui Wang. WSDM 2011.

On videolectures: http://videolectures.net/wsdm2011_li_uoe/

  1. Clearly, this is basically about contextual bandits
  2. Introduces a replay methodology for evaluating contextual bandit algorithms.
    1. Totally data-driven (the alternative is to construct a simulator)
  3. The method is provably unbiased
  4. Motivation is similar to what we did in Pitfall: it can be costly to run each candidate method in real life, so the goal is to log enough data that we can effectively simulate each method from the logged data
  5. “…data in bandit-style applications only contain user feedback for recommendations that were actually displayed to the user, but not all candidates.  This ‘partial-label’ nature raises a difficulty that is the key difference between evaluation of bandit algorithms and supervised learning ones.”
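  6. They treat RL from batch data as an “off-policy evaluation problem”
  7. If I read this again, skip everything before Section 3.
  8. I don’t fully get why, but the claim is that each history has the same probability in the real world as in the policy evaluator (the argument relies on the logged actions having been chosen uniformly at random, independently of the context).  Therefore, the replay estimate is unbiased.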
  9. Basically the process is:
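    1. Go through the data corpus sample by sample
    2. For a logged sample <s_i,a_i,r_i>, give the algorithm the context/state s_i
    3. The algorithm returns its desired action a
    4. If a_i = a, give the sample (including the reward r_i) to the algorithm to update its history, and count r_i toward the estimated payoff.  Otherwise, just ignore this sample and move on
    5. The estimated payoff of the policy is the average of r_i over the samples that were kept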
  10. Estimation error approaches 0 with increasing data.
    1. Error is O(sqrt(K/|D|)), where K is the number of actions (arms) and |D| is the size of the data set, so it drops off like 1/sqrt(|D|)
  11. Empirical results on a corpus of Yahoo! click-through data; the actual results fit the bounds very nicely
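The replay procedure in item 9 can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not the authors’ code: the `Policy` interface with `choose`/`update`, the `replay_evaluate` function, and the synthetic click model in `simulate_log` are hypothetical names made up for this note. It assumes the log was collected by picking actions uniformly at random, independently of the context, which is what lets the kept samples stand in for a live run of the policy.

```python
import random
from typing import Iterable, Protocol, Sequence, Tuple

# A logged event <s_i, a_i, r_i>: context, logged action, observed reward.
Event = Tuple[Sequence[float], int, float]


class Policy(Protocol):
    """Hypothetical interface for a (possibly learning) bandit policy."""

    def choose(self, context: Sequence[float]) -> int: ...

    def update(self, context: Sequence[float], action: int, reward: float) -> None: ...


def replay_evaluate(policy: Policy, events: Iterable[Event]) -> Tuple[float, int]:
    """Replay estimate of a policy's average per-event reward.

    Assumes actions in the log were chosen uniformly at random.
    Returns (estimate, number of matched events).
    """
    total_reward = 0.0
    matched = 0
    for context, logged_action, reward in events:
        action = policy.choose(context)
        if action == logged_action:
            # Only matching events are revealed to the policy and counted.
            policy.update(context, logged_action, reward)
            total_reward += reward
            matched += 1
        # Non-matching events are skipped: the reward for `action` is unknown.
    estimate = total_reward / matched if matched else float("nan")
    return estimate, matched


class GreedyFeaturePolicy:
    """Hypothetical fixed policy: pick the arm with the largest context feature."""

    def choose(self, context: Sequence[float]) -> int:
        return max(range(len(context)), key=lambda a: context[a])

    def update(self, context: Sequence[float], action: int, reward: float) -> None:
        pass  # a fixed policy ignores feedback


def simulate_log(n: int, k: int, seed: int = 0) -> list:
    """Hypothetical synthetic log: arm a is clicked with prob 0.5 * context[a]."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        context = [rng.random() for _ in range(k)]
        logged_action = rng.randrange(k)  # uniform-random logging policy
        reward = 1.0 if rng.random() < 0.5 * context[logged_action] else 0.0
        events.append((context, logged_action, reward))
    return events


K = 5
est, matched = replay_evaluate(GreedyFeaturePolicy(), simulate_log(100_000, K))
true_value = 0.5 * K / (K + 1)  # E[max of K Uniform(0,1)] = K/(K+1)
print(f"replay estimate {est:.3f} from {matched} matched events; true value {true_value:.3f}")
```

Because the synthetic world is known, the greedy policy’s true value is 0.5 · K/(K+1), so the estimate can be checked against it. Only about 1/K of the log is kept (roughly 20,000 of 100,000 events here), which is where the K in the O(sqrt(K/|D|)) error comes from.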
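A back-of-the-envelope reading of the O(sqrt(K/|D|)) rate, for a fixed policy π and under the uniform-logging assumption (my own sketch of the intuition, not the paper’s theorem or its constants): each logged action matches π with probability 1/K, so roughly |D|/K samples survive, and averaging that many rewards in [0, 1] gives the usual square-root deviation.

```latex
\hat{V}(\pi) = \frac{\sum_{i=1}^{|D|} r_i \, \mathbb{1}[\pi(s_i) = a_i]}{\sum_{i=1}^{|D|} \mathbb{1}[\pi(s_i) = a_i]},
\qquad
\Pr[\pi(s_i) = a_i] = \frac{1}{K}
\;\Rightarrow\;
\mathbb{E}\Big[\sum_{i=1}^{|D|} \mathbb{1}[\pi(s_i) = a_i]\Big] = \frac{|D|}{K}

% averaging about |D|/K independent rewards in [0, 1] (e.g. via Hoeffding):
\big|\hat{V}(\pi) - V(\pi)\big| = O\!\left(\sqrt{\frac{1}{|D|/K}}\right) = O\!\left(\sqrt{\frac{K}{|D|}}\right)
```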
