Reinforcement Learning in Multidimensional Continuous Action Spaces

  1. Uses a Binary Action Search approach for action selection when the set of actions is large
  2. Goes to pains to emphasize that most papers only discuss why existing methods are inefficient from a storage or computational perspective
  3. The modified MDP is larger by a factor of log(|A|) over the original MDP, which makes the learning problem larger, although they also claim the representation complexity of the transformed MDP is within a factor of 2 of the original MDP
  4. LSPI and Fitted-Q iteration with Binary Action Search still needed hundreds of samples to perform well on the inverted pendulum task
  5. They used PCA to find a good set of features for inverted pendulum
  6. Mention that packing action information into the state causes representation problems, and that further feature selection is worth investigating to reduce the problems this introduces
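
The core idea behind Binary Action Search is to replace a single choice among |A| actions with a sequence of log(|A|) binary refinements of an interval. Below is a minimal sketch of that interval-halving idea; it uses direct Q-value comparisons at candidate points to decide which half to keep, in place of the learned binary policy over the transformed MDP described in the paper, and the toy Q-function is purely illustrative.

```python
def binary_action_search(q, state, low, high, n_steps):
    """Pick a continuous action in [low, high] via binary refinement.

    Each step halves the interval by comparing the Q-values of the
    midpoints of the two halves and keeping the more promising half.
    After n_steps the interval has shrunk by a factor of 2**n_steps,
    so selection costs O(log |A|) Q-evaluations rather than |A|.
    """
    for _ in range(n_steps):
        mid = (low + high) / 2.0
        left = (low + mid) / 2.0    # midpoint of lower half
        right = (mid + high) / 2.0  # midpoint of upper half
        if q(state, left) >= q(state, right):
            high = mid  # lower half looks better
        else:
            low = mid   # upper half looks better
    return (low + high) / 2.0

# Toy unimodal Q-function peaking at a = 0.3 (illustrative only).
q = lambda s, a: -(a - 0.3) ** 2

a = binary_action_search(q, state=None, low=-1.0, high=1.0, n_steps=20)
print(round(a, 3))  # converges near 0.3
```

For a unimodal Q-function in the action this converges to the maximizer; with a learned, noisy Q-function the comparisons can of course go the wrong way, which is part of why the transformed learning problem is harder.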
