“Learning Force Control Policies for Compliant Robotic Manipulation”
Kalakrishnan et al.
7 Degrees of freedom, 6-DOF force sensor, actual robot
“Off-Policy Actor-Critic”
Degris, White, Sutton
3 domains, each with 2 continuous state dimensions (Hillcar, Pendulum, Plane-world)
Discrete actions: 3, 3, and 5 actions respectively
“A Dantzig Selector Approach to Temporal Difference Learning”
Geist et al.
20-state, 2-action chain
“Continuous Inverse Optimal Control with Locally Optimal Examples”
Levine, Koltun
Continuous arm control task: 8 action dimensions, 16 state dimensions
“Near-Optimal BRL using Optimistic Local Transitions”
Araya-López, Thomas, Buffet
5-state chain.
Paint-Polish
gridworld
“Greedy Algorithms for Sparse RL”
Painter-Wakefield, Parr
Chain, 50 states
Pendulum
Blackjack, 203 states
Mountain Car
2 forms of plane-worlds
“Reinforcement Learning Approach to Automatic Stroke Generation”
Xie, Hachiya, Sugiyama
Sumi-e painting: 1-D continuous action, 6-D state
“Learning Parameterized Skills”
da Silva, Konidaris, Barto
Dart throwing: 1 continuous action, 1 discrete action, 6 continuous state dimensions, 1 discrete state dimension
“Safe Exploration in MDPs”
Moldovan, Abbeel
2 forms of gridworld
“Path Integral Policy Improvement with Covariance Matrix Adaptation”
Stulp, Sigaud
10-DOF arm
“Conditional mean embeddings as regressors”
Grünewälder et al.
Swing-up, discrete action
“Agnostic System Identification for Model-Based Reinforcement Learning”
Ross, Bagnell
Helicopter, 4-D action, 21-D state
ICML 2011
“Integrating Partial Model Knowledge in Model Free RL Algorithms”
Tamar, Di Castro, Meir
Randomly generated MDPs with up to 30 states
“PILCO: A Model-Based and Data-Efficient Approach to Policy Search”
Deisenroth, Rasmussen
Swing-up (real robot), double pendulum swing-up, simulated unicycle driving (2 action dimensions)
“Generalized Value Functions for Large Action Sets”
Pazis, Parr
Inverted Pendulum, Bicycle, Double Integrator
“Classification-based Policy Iteration with a Critic”
Gabillon et al.
Inverted Pendulum, Hillcar
“Apprenticeship Learning About Multiple Intentions”
Babes-Vroman, Marivate, Subramanian, Littman
Gridworld, Highway Car
“Incremental Basis Construction from Temporal Difference Error”
Sun et al.
Trees with 500 states
ICML 2010
“Bayesian Multi-Task Reinforcement Learning”
Lazaric, Ghavamzadeh
Inverted Pendulum
“Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda”
Downey, Sanner
Racetrack, mail robot (like taxi, millions of states)
“Generalizing Apprenticeship Learning across Hypothesis Classes”
Walsh, Subramanian, Littman
Taxi, Noisy blocks world