Using decision trees for modelling Taxi, 2

> I’m not sure I follow what you mean.  Do you mean using features
> from both the previous and current time step to predict the current
> time step?  Could you give an example?

Right!  In dynamic Bayes net terminology, they are called synchronic
links (links within a single time slice).
Take a look at

If you search for synchronic, you can find some discussions of the
importance of this kind of representation in stochastic domains.  There
isn’t much on deterministic domains, so here’s a quick example.

Imagine data that looks like this (three binary variables):

000 -> 001
001 -> 010
010 -> 011
011 -> 100
100 -> 101
101 -> 110
110 -> 111
111 -> 000

(It’s a binary counter with wraparound.)
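For reference, the table above can be generated mechanically (a quick
sketch; the function name is mine):

```python
# Generate the 3-bit binary counter transitions shown above.
def step(state):
    """Map a 3-bit string to its successor (increment with wraparound)."""
    return format((int(state, 2) + 1) % 8, "03b")

for i in range(8):
    s = format(i, "03b")
    print(s, "->", step(s))
```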

Let’s say x,y,z are the values before the transition and x’,y’,z’
are the values afterwards.  Consider learning x’ from x,y,z.  The
decision tree looks like

if x = 0,
    if y = 1,
        if z = 1, x’ = 1
        else, x’ = 0
    else, x’ = 0
else,
    if y = 1,
        if z = 1, x’ = 0
        else, x’ = 1
    else, x’ = 1
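This tree is just computing the carry into the high bit.  As a sanity
check (a sketch; function and variable names are mine), it agrees with
x’ = x XOR (y AND z) on all eight states:

```python
def x_next_tree(x, y, z):
    # Direct transcription of the decision tree above.
    if x == 0:
        if y == 1:
            return 1 if z == 1 else 0
        return 0
    else:
        if y == 1:
            return 0 if z == 1 else 1
        return 1

# Compare against the closed form on every state.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            assert x_next_tree(x, y, z) == x ^ (y & z)
```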

But, what if y’ is part of the input as well?  Then, we can ignore z
and make a tree like:

if x = 0,
    if y = 1,
        if y’ = 0, x’ = 1
        else, x’ = 0
    else, x’ = 0
else,
    if y = 1,
        if y’ = 0, x’ = 0
        else, x’ = 1
    else, x’ = 1
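Again as a sketch (names mine), the synchronic version checks out: once
y’ is computed from y and z, this tree makes the same predictions as
the first one:

```python
def x_next_sync(x, y, y2):
    # Transcription of the synchronic tree; y2 stands for y'
    # (the value of y *after* the transition).
    if x == 0:
        if y == 1:
            return 1 if y2 == 0 else 0
        return 0
    else:
        if y == 1:
            return 0 if y2 == 0 else 1
        return 1

# Check against the direct computation on all eight states.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            y2 = y ^ z              # y' in the binary counter
            assert x_next_sync(x, y, y2) == x ^ (y & z)
```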

Admittedly, that’s not much better, but a longer bit-counting example
would show an advantage, because every bit would only need to look at
its own value plus its neighbor’s previous and current values.  There’s
also some more advantage if we’re predicting delta x:

if y = 1,
    if y’ = 0, delta x = 1
    else, delta x = 0
else, delta x = 0
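Reading “delta x” as “does x flip” (i.e., x XOR x’), this three-leaf
tree is exact for the counter (a sketch, names mine):

```python
def x_flips(y, y2):
    # The delta-x tree: x flips exactly when the carry out of y fires,
    # i.e. when y goes from 1 to 0.
    return 1 if (y == 1 and y2 == 0) else 0

# Verify against the counter dynamics on all eight states.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            x2 = x ^ (y & z)        # next value of x
            y2 = y ^ z              # next value of y
            assert x_flips(y, y2) == (x ^ x2)
```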

It was pretty easy to rerun the experiments with synchronic links.  The decision trees for all variables are uploaded here.  I fixed the mistake where fuel didn’t change on a failed location change, and changed the rewards from before (as far as the rewards go, it’s not totally clear to me which is right from the paper, but this seems better).

Note that there are some instances where a prediction doesn’t make any sense (isn’t possible given the environment) and is followed by (0.0) for the confidence.  This is just because when Weka has no data on a given state, it makes a prediction up anyway, so that there is a leaf for every possible value in the tree when using a nominal (class) rather than numeric target, which is how I had to set it up here.

Each variable has a suffix -X, where X is the number of time steps before the present step.  So foo-1 means attribute foo on the previous time step, and foo-0 means attribute foo at the current time step.
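Concretely, turning a trajectory into training rows with -1 and -0
suffixed attributes looks something like this (a sketch; the function
and the toy attribute names are mine):

```python
# Build instances pairing each step's attributes (suffix -1) with the
# following step's attributes (suffix -0), as described above.
def make_instances(trajectory):
    rows = []
    for prev, cur in zip(trajectory, trajectory[1:]):
        row = {}
        for name, value in prev.items():
            row[name + "-1"] = value    # previous time step
        for name, value in cur.items():
            row[name + "-0"] = value    # current time step
        rows.append(row)
    return rows

# Hypothetical two-step trajectory:
traj = [{"foo": 3, "bar": "a"}, {"foo": 4, "bar": "b"}]
print(make_instances(traj))
# One row: {'foo-1': 3, 'bar-1': 'a', 'foo-0': 4, 'bar-0': 'b'}
```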

Some excerpts:

Decision tree for @attribute taxi_delta_x-0 {-1,0,1}
------------------
action-1 = U: 0 (3740.0)
action-1 = D: 0 (3744.0)
action-1 = W
| wallW-1 = true: 0 (1814.0)
| wallW-1 = false
| | wallE-0 = true: 0 (211.0)
| | wallE-0 = false: -1 (1726.0)
action-1 = E
| wallE-1 = true: 0 (1864.0)
| wallE-1 = false
| | wallW-0 = true: 0 (210.0)
| | wallW-0 = false: 1 (1747.0)
action-1 = PickUp: 0 (3657.0)
action-1 = DropOff: 0 (3716.0)
action-1 = Refuel: 0 (3651.0)
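Transcribed as code (a sketch; the function signature is mine), the
taxi_delta_x-0 tree reads:

```python
def taxi_delta_x(action, wallW_1, wallE_1, wallW_0, wallE_0):
    """Predicted change in the taxi's x position.  The _1 arguments are
    wall flags at the previous step; the _0 arguments are the synchronic
    (current-step) flags the tree also gets to condition on."""
    if action == "W":
        if wallW_1:                 # wall to the west: the move fails
            return 0
        return 0 if wallE_0 else -1
    if action == "E":
        if wallE_1:                 # wall to the east: the move fails
            return 0
        return 0 if wallW_0 else 1
    return 0    # U, D, PickUp, DropOff, Refuel never change x

print(taxi_delta_x("E", False, False, False, False))  # 1
```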

Decision tree for @attribute fuelDelta-0 {-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13}
J48 pruned tree
------------------
action-1 = U: -1 (3740.0)
action-1 = D: -1 (3744.0)
action-1 = W: -1 (3751.0)
action-1 = E: -1 (3821.0)
action-1 = PickUp: 0 (3657.0)
action-1 = DropOff: 0 (3716.0)
action-1 = Refuel: 0 (3651.0/135.0)

Decision tree for @attribute reward-0 {20,-1,-10,-20}
------------------
terminal-0 = true: -20 (1500.0/13.0)
terminal-0 = false
| fuelDelta-0 = -1: -1 (13569.0)
| fuelDelta-0 = 0
| | action-1 = U: -10 (0.0)   [note these are wrong, but have
| | action-1 = D: -10 (0.0)    no supporting evidence either way]
| | action-1 = W: -10 (0.0)
| | action-1 = E: -10 (0.0)
| | action-1 = PickUp
| | | hasPassenger-0 = true
| | | | hasPassenger-1 = true: -10 (109.0)
| | | | hasPassenger-1 = false: -1 (86.0)
| | | hasPassenger-0 = false: -10 (3462.0)
| | action-1 = DropOff: -10 (3703.0)
| | action-1 = Refuel: -1 (3516.0)
| fuelDelta-0 = 1: -1 (9.0)
| fuelDelta-0 = 2: -1 (7.0)
...
| fuelDelta-0 = 12: -1 (10.0)
| fuelDelta-0 = 13: -1 (0.0)

Decision tree for @attribute terminal-0 {true,false}
------------------
reward-0 = 20: true (13.0)
reward-0 = -1: false (17306.0)
reward-0 = -10: false (7274.0)
reward-0 = -20: true (1487.0)