Reinforcement Learning in Multidimensional Continuous Action Spaces


  1. Uses the Binary Action Search approach for action selection when the set of actions is large (see the sketch after this list)
  2. Takes pains to emphasize that most papers discuss why existing methods are inefficient from a storage or computational perspective
  3. The modified MDP is larger by a factor of log(|A|) over the original MDP, which makes the learning problem larger, although they also claim the representation complexity of the transformed MDP is within a factor of 2 of the original MDP
  4. LSPI and Fitted-Q iteration with Binary Action Search still needed hundreds of samples to do well on the inverted-pendulum task
  5. They used PCA to find a good set of features for the inverted pendulum (a rough sketch of this idea also follows the list)
  6. Mentions that packing action information into the state causes representation problems, and that further feature selection is worth investigating to reduce the problems this introduces
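
To make item 1 concrete, here is a minimal Python sketch of the Binary Action Search idea: continuous action selection is reduced to a sequence of binary increase/decrease decisions driven by a learned Q-function over the augmented (state, current action) space. The `q_function` callable, the action bounds, and the halving step schedule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def binary_action_search(state, q_function, a_min=-1.0, a_max=1.0, n_steps=10):
    """Refine a continuous action via binary increase/decrease decisions.

    q_function(augmented_state, binary_action) is a hypothetical learned
    Q-function over the augmented state (original state + current action
    estimate), with binary actions 0 = decrease and 1 = increase.
    n_steps = ceil(log2(|A|)) decisions resolve |A| discretized levels.
    """
    action = (a_min + a_max) / 2.0   # start at the midpoint of the range
    step = (a_max - a_min) / 4.0     # first move lands on a half's midpoint
    for _ in range(n_steps):
        # Pack the current action estimate into the state -- this is the
        # representation issue flagged in item 6.
        augmented = np.append(state, action)
        if q_function(augmented, 1) > q_function(augmented, 0):
            action += step           # binary policy says "increase"
        else:
            action -= step           # binary policy says "decrease"
        step /= 2.0                  # halve the step, as in binary search
    return action

# Toy Q-function (purely illustrative): prefer "increase" while the
# current action estimate is below a target of 0.3.
q = lambda aug, a: float((aug[-1] < 0.3) == (a == 1))
print(binary_action_search(np.array([0.0, 0.0]), q))  # converges near 0.3
```

Each original time step now costs n_steps binary decisions, e.g. resolving 1024 discretized levels takes log2(1024) = 10 decisions per step, which is where the log(|A|) blow-up in item 3 comes from.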
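Item 5's feature-selection step might look something like the following sketch: expand the raw pendulum state into a redundant candidate basis and let PCA keep the highest-variance directions. The specific trigonometric/polynomial features, sample ranges, and component count are my assumptions; the paper does not commit to this exact recipe.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder pendulum states (angle, angular velocity) standing in for
# samples collected by a behavior policy.
rng = np.random.default_rng(0)
states = rng.uniform([-np.pi, -8.0], [np.pi, 8.0], size=(1000, 2))

# Redundant candidate features (assumed, for illustration).
raw_features = np.column_stack([
    states[:, 0], states[:, 1],
    np.sin(states[:, 0]), np.cos(states[:, 0]),
    states[:, 0] * states[:, 1],
    states[:, 0] ** 2, states[:, 1] ** 2,
])

# Keep a small orthogonal basis for LSPI / Fitted-Q iteration to work with.
pca = PCA(n_components=3)
compact_features = pca.fit_transform(raw_features)
print(pca.explained_variance_ratio_)  # variance retained per component
```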