Stats on RL domains tested in ICML 2012


“Learning Force Control Policies for Compliant Robotic Manipulation”
Kalakrishnan et al.
7 degrees of freedom, 6-DOF force sensor, actual robot

“Off-Policy Actor-Critic”
Degris, White, Sutton
3 domains, each with 2 continuous state dimensions (Hillcar, Pendulum, Plane-world)
Discrete actions: 3, 3, and 5 respectively

“A Dantzig Selector Approach to Temporal Difference Learning”
Geist et al.
20-state, 2-action chain (a minimal sketch of this kind of chain domain is given below)
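Since plain chain domains recur in several of these papers (the Dantzig selector paper above, and the optimistic-local-transitions and sparse-RL papers below), here is a minimal, purely illustrative sketch of what an N-state, 2-action chain MDP looks like. The class name, reward placement, and reward value are my own assumptions and are not taken from any of the papers:

# Illustrative sketch of an N-state, 2-action chain MDP.
# Action 0 steps toward state 0, action 1 steps toward state N-1;
# reward is given only on reaching the far end (an assumption, not
# the exact reward structure used in the cited papers).
class ChainMDP:
    def __init__(self, n_states=20, goal_reward=1.0):
        self.n_states = n_states
        self.goal_reward = goal_reward
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, action 1 = move right
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = (self.state == self.n_states - 1)
        reward = self.goal_reward if done else 0.0
        return self.state, reward, done

The point is only that “chain” in these entries means a one-dimensional discrete state space with two move actions, which is why these domains sit at the small end of the complexity range surveyed here.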

“Continuous Inverse Optimal Control with Locally Optimal Examples”
Levine, Koltun
Continuous arm-control task: 8 action dimensions, 16 state dimensions

“Near-Optimal BRL using Optimistic Local Transitions”
Araya-López, Thomas, Buffet
5-state chain.
Paint-Polish
gridworld

“Greedy Algorithms for Sparse RL”
Painter-Wakefield, Parr
50-state chain
Pendulum
Blackjack, 203 states
Mountain Car
2 forms of plane-worlds

“Reinforcement Learning Approach to Automatic Stroke Generation”
Xie, Hachiya, Sugiyama
Sumi-e painting: 1-d continuous action, 6d state

“Learning Parameterized Skills”
da Silva, Konidaris, Barto
Dart throwing: 1 continuous action, 1 discrete action, 6 cont. state, 1
discrete state

“Safe Exploration in MDPs”
Moldovan, Abbeel
2 forms of gridworld

“Path Integral Policy Improvement with Covariance Matrix Adaptation”
Stulp, Sigaud
10-DOF arm

“Conditional mean embeddings as regressors”
Grünewälder et al.
Swing-up, discrete action

“Agnostic System Identification for Model-Based Reinforcement Learning”
Ross, Bagnell
Helicopter, 4d action, 21d state

ICML 2011

“Integrating Partial Model Knowledge in Model Free RL Algorithms”

Tamar, Di Castro, Meir

Randomly generated MDPs with up to 30 states

 

“PILCO: A Model-Based and Data-Efficient Approach to Policy Search”

Deisenroth, Rasmussen

(real) Swing-up, double pendulum swing up, simulated unicycle driving (2 action dimensions)

 

“Generalized Value Functions for Large Action Sets”

Pazis, Parr

Inverted Pendulum, Bicycle, Double Integrator

 

“Classification-based Policy Iteration with a Critic”

Gabillon et al

Inverted Pendulum, Hillcar

 

“Apprenticeship Learning About Multiple Intentions”

Babes-Vroman, Marivate, Subramanian, Littman

Gridworld, Highway Car

 

“Incremental Basis Construction from Temporal Difference Error”

Sun et al

Trees with 500 states

ICML 2010

“Bayesian Multi-Task Reinforcement Learning”

Lazaric, Ghavamzadeh

Inverted Pendulum

 

“Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda”

Downey, Sanner

Racetrack, mail robot (like taxi, millions of states)

 

“Generalizing Apprenticeship Learning across Hypothesis Classes”

Walsh, Subramanian, Littman

Taxi, Noisy blocks world
