Using decision trees for modelling Taxi


In an attempt to construct a model for the non-fickle Taxi (The MAXQ Method for Hierarchical Reinforcement Learning, Dietterich), I made a rich feature vector and then attempted to train decision trees based on the traces of random actions.

The feature vector I used was as follows:

  1. taxi (x,y)
  2. taxi delta (x,y)
  3. passenger (x,y)
  4. destination (x,y)
  5. action
  6. fuel
  7. fuel delta
  8. reward
  9. has passenger?
  10. terminal?
  11. wall to North?
  12. wall to South?
  13. wall to West?
  14. wall to East?

From there I trained a J48/C4.5 decision tree which was the result of 1,500 trajectories through taxi when using a (uniform) random policy.

Some of the results are below.  When there is some value listed as <attribute name>-1, it means that it happened one step before the attribute that is being predicted.  For all integer values attributes, I had to have weka treat them as classes instead of numbers so that it could predict those values as well (which means there are no generalizations between numbers, a different decision tree classifier that performs regression and not just classification wouldn’t have this limitation).  Even still things seem to be quite good.  Here some of them are:

Decision tree for @attribute taxi_delta_x-0 {-1,0,1}
------------------
action-1 = U: 0 (3129.0)
action-1 = D: 0 (3133.0)
action-1 = W
|   wallW-1 = true: 0 (1569.0)
|   wallW-1 = false: -1 (1607.0)
action-1 = E
|   wallE-1 = true: 0 (1595.0)
|   wallE-1 = false: 1 (1642.0)
action-1 = PickUp: 0 (3351.0)
action-1 = DropOff: 0 (3547.0)
action-1 = Refuel: 0 (3426.0)

Decision tree for @attribute taxi_delta_y-0 {-1,0,1} ------------------ action-1 = U | wallN-1 = true: 0 (692.0) | wallN-1 = false: -1 (2437.0) action-1 = D | wallS-1 = true: 0 (780.0) | wallS-1 = false: 1 (2353.0) action-1 = W: 0 (3176.0) action-1 = E: 0 (3237.0) action-1 = PickUp: 0 (3351.0) action-1 = DropOff: 0 (3547.0) action-1 = Refuel: 0 (3426.0)
Decision tree for @attribute fuel-0 {0,1,2,3,4,5,6,7,8,9,10,11,12,13} ------------------ ... fuel-1 = 1: 1 (1548.0/3.0) fuel-1 = 2 | action-1 = U | | wallN-1 = true: 2 (74.0) | | wallN-1 = false: 1 (303.0) | action-1 = D | | wallS-1 = true: 2 (86.0) | | wallS-1 = false: 1 (267.0) | action-1 = W | | wallW-1 = true: 2 (214.0) | | wallW-1 = false: 1 (195.0) | action-1 = E | | wallE-1 = true: 2 (147.0) | | wallE-1 = false: 1 (229.0) | action-1 = PickUp: 2 (361.0) | action-1 = DropOff: 2 (384.0) | action-1 = Refuel: 2 (349.0/6.0) [although the fuel = 1 is weird, the rest of them are fine, but replicate for each value] ...
Decision tree for @attribute fuelDelta-0 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,-1} ------------------ action-1 = U | wallN-1 = true: 0 (692.0) | wallN-1 = false: -1 (2437.0) action-1 = D | wallS-1 = true: 0 (780.0) | wallS-1 = false: -1 (2353.0) action-1 = W | wallW-1 = true: 0 (1569.0) | wallW-1 = false: -1 (1607.0) action-1 = E | wallE-1 = true: 0 (1595.0) | wallE-1 = false: -1 (1642.0) action-1 = PickUp: 0 (3351.0) action-1 = DropOff: 0 (3547.0) action-1 = Refuel: 0 (3426.0/96.0)
Decision tree for @attribute terminal-0 {true,false} ------------------ fuel-1 = 0: false (0.0) fuel-1 = 1 | action-1 = U | | wallN-1 = true: false (127.0) | | wallN-1 = false: true (440.0) | action-1 = D | | wallS-1 = true: false (124.0) | | wallS-1 = false: true (453.0) | action-1 = W | | wallW-1 = true: false (241.0) | | wallW-1 = false: true (291.0) | action-1 = E | | wallE-1 = true: false (235.0) | | wallE-1 = false: true (307.0) | action-1 = PickUp: false (560.0) | action-1 = DropOff: false (559.0/1.0) | action-1 = Refuel: false (515.0) fuel-1 = 2: false (3769.0) fuel-1 = 3: false (3835.0/1.0) fuel-1 = 4: false (4127.0/1.0) fuel-1 = 5: false (4286.0/1.0) fuel-1 = 6: false (3600.0/2.0) fuel-1 = 7: false (3241.0/1.0) fuel-1 = 8: false (2810.0) fuel-1 = 9: false (2196.0) fuel-1 = 10: false (1693.0/1.0) fuel-1 = 11: false (1250.0) fuel-1 = 12: false (808.0/1.0) fuel-1 = 13: false (0.0)
Decision tree for @attribute reward-0 {19,-1,-11,-21} ------------------ ... [Dynamics for each direction looks basically same except for the wall adjacency test is different of course] action-1 = E | fuel-1 = 0: -1 (0.0) | fuel-1 = 1 | | wallE-1 = true: -1 (235.0) | | wallE-1 = false: -21 (307.0) | fuel-1 = 2: -1 (559.0) | fuel-1 = 3: -1 (586.0) | fuel-1 = 4: -1 (603.0) | fuel-1 = 5: -1 (582.0) | fuel-1 = 6: -1 (481.0) | fuel-1 = 7: -1 (452.0) | fuel-1 = 8: -1 (415.0) | fuel-1 = 9: -1 (302.0) | fuel-1 = 10: -1 (251.0) | fuel-1 = 11: -1 (180.0) | fuel-1 = 12: -1 (118.0) | fuel-1 = 13: -1 (0.0) action-1 = PickUp: -11 (5057.0/103.0) action-1 = DropOff: -11 (5173.0/9.0) action-1 = Refuel: -1 (5048.0) ...

So those are the most interesting ones, others are less pretty.  For full disclosure I tried uploading all of the resulting trees, but something is wrong with wordpress right now.

Anyway, I think thats a good/interesting start to the feature construction discussion.  I did force feed a number of variables to make this all easier, but its tough for the decision trees to function without them.  Other regression/classification algorithms may not need then, but the nice thing about the decision tree approach is that it is very clear which variables are found to contribute and which are not.

Advertisements
Tagged ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: