Thoughts on KWIK Learning the Stock Game


In my previous post, I wrote about using entropy to find the structure of the stock game. In discussions afterward, the question came up of whether the method described there is provably KWIK (Knows What It Knows – Li, Littman, Walsh – ICML 2008).

István pointed out that this might be difficult due to the log factor in the equation for entropy, but I think I found a way around it: the basic answer is to exponentiate the entropy, working with 2^{H(X)} instead of H(X), which removes the log factor. I wrote a quick breakdown of the math steps; it's available here: Removing the log from the equation for entropy [note – Binomial should read Bernoulli].
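
For a Bernoulli(p) variable the exponentiated quantity works out neatly (a condensed sketch of the kind of step in the linked write-up, using base-2 logs):

    H(X) = -p log2(p) - (1-p) log2(1-p)

    2^{H(X)} = 2^{-p log2(p)} * 2^{-(1-p) log2(1-p)} = p^{-p} * (1-p)^{-(1-p)}

so the exponentiated entropy is just a product of powers of p and (1-p), with no explicit logarithm remaining.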

From there I was unsure what to do next; there didn't seem to be anything simple to try in that line of thinking. Rereading the KWIK paper, though, gave me exactly what I was looking for: estimating the probability of a Bernoulli distribution is KWIK learnable.
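
To make that concrete, here is a minimal sketch of the kind of learner I mean; the Hoeffding-style sample bound and the names are mine, not lifted from the paper:

```python
import math

class KWIKBernoulli:
    """KWIK-style estimator of the parameter p of a Bernoulli variable.

    Outputs None ("I don't know") until enough samples have been seen for
    the empirical mean to be within epsilon of p with probability at least
    1 - delta (via Hoeffding's inequality); after that it outputs the
    empirical mean.
    """

    def __init__(self, epsilon, delta):
        # Hoeffding: m >= ln(2/delta) / (2 * epsilon^2) samples suffice.
        self.m = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
        self.count = 0
        self.ones = 0

    def observe(self, outcome):
        """Record one 0/1 outcome."""
        self.count += 1
        self.ones += outcome

    def predict(self):
        """Return the estimate of p, or None while still unsure."""
        if self.count < self.m:
            return None
        return self.ones / self.count
```

The number of "I don't know" responses is then at most m, which is polynomial in 1/epsilon and log(1/delta), which is what makes this kind of estimator KWIK.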

The algorithm I used simply plugged the estimated conditional Bernoulli distributions into the entropy computation. The question is whether anything weird happens in the calculations done with those estimated probabilities that breaks the KWIK bound. Basically, does an accurate estimate of the probability imply an accurate estimate of the entropy? More specifically, can the product of a number of KWIK-learned values fail to be KWIK for some reason?
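
To pin down exactly what calculation I mean, here is a sketch of the plug-in estimate, reusing the KWIKBernoulli sketch above; the function names, and the assumption that the weights P(x) of each conditioning context are available, are mine:

```python
import math

def bernoulli_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def plug_in_conditional_entropy(learners, context_weights):
    """Plug-in estimate of H(y | x) from per-context KWIK estimates.

    learners: dict mapping each conditioning context x to a KWIKBernoulli
        estimating P(y = 1 | x).
    context_weights: dict mapping each context x to its weight P(x),
        assumed known or estimated separately.
    Returns None if any per-context estimate is still "I don't know".
    """
    total = 0.0
    for context, learner in learners.items():
        p_hat = learner.predict()
        if p_hat is None:
            return None  # propagate "I don't know" upward
        total += context_weights[context] * bernoulli_entropy(p_hat)
    return total
```

The KWIK question above is then whether a small error in each p_hat translates into a correspondingly small error in this weighted sum.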


2 thoughts on “Thoughts on KWIK Learning the Stock Game”

  1. Michael Littman says:

    Ok, so it seems like there are some helpful pieces here. I’d suggest that you take a step back and formalize the question. In mathematical notation, what are you trying to show? The log stuff ought to come in somewhere in the middle of the argument, but you should be clear on the problem statement and what you want to show. I suspect it seems tedious and redundant, but restating the question is really helpful in keeping the ideas clean and easy to understand.

  2. hut3 says:

    Ok – that’s good advice.
    – I suppose the first step is: can we show that we can get epsilon-close bounds on the entropy of a Bernoulli distribution with a number of samples polynomial in (|S|, |A|, 1/delta, 1/epsilon)?
    – After that, we can introduce the bias in the algorithm: that conditioning on any variable (x) that is actually predictive of another variable (y) in the process reduces the entropy of y to at most some fraction of its unconditioned value. Specifically, the assumption is that if in the *actual* process x is an input to the value of y at the next step, then r H(y_t) > H(y_t|x_{t-1}) for some 0 < r < 1.
    – From there, if the entropy estimate is KWIK and we have a tight bound, and the assumption is valid, what is the probability of incorrectly building the structure of the DBN (a rough sketch of that test is below)? We should also be able to show that this is KWIK, poly in (|S|, |A|, 1/epsilon_1, 1/delta_1).

    So those are the main steps as I see it, does that seem reasonable?
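
    To make steps 2 and 3 concrete, the structure test I have in mind looks roughly like this; r and the names are placeholders, not a worked-out algorithm:

    ```python
    def is_parent(h_y, h_y_given_x, r):
        """Decide whether x should be a parent of y in the DBN structure.

        Uses the assumption above: if x truly feeds into y, then
        H(y_t | x_{t-1}) < r * H(y_t) for some fixed 0 < r < 1.
        Returns None if either entropy estimate is still "I don't know".
        """
        if h_y is None or h_y_given_x is None:
            return None
        return h_y_given_x < r * h_y
    ```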
