Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis. Luciw, Schmidhuber. EWRL 2012.

  1. Paper uses incremental (as opposed to batch) slow feature analysis (IncSFA) for learning proto-value functions
  2. POMDP paper
  3. “Since they [proto-value functions] provide a good basis for learning different reward functions, they are good building blocks for general reinforcement learning.”
  4. Uses TD to do VFA on top of the features developed by ISFA
  5. A previous work (Sprekeler 2011, in my reading list) “… showed how SFA can be considered a function approximation to learning PVFs [proto-value functions] (…), so slow features (SFs) can have the same set of beneficial properties for representation learning for general RL .”
  6. “In this paper, we show how IncSFA can be used as a low complexity method for an autonomous RL agent to learn features of a spectral (graph)-embedding of its environment, which it experiences indirectly through high-dimensional observations (images).”
  7. Goal is to find a set of basis functions that can represent the value function well over a wide class of reward functions
  8. <I am confused as it says phi makes up the basis functions for estimating value, but then t says phi maps the observation vector x to y – I guess there is another dot product somewhere that turns y into a scalar?>
  9. Proto-value functions are intended to “… capture global dynamic characteristics of the MDP in a low dimensional space.”  The idea is to find some basis that preserves distance between states.
  10. In Proto-reinforcement learning, the basis comes from the graph Laplacian
  11. SFA has a slightly different approach which is to look for ways of weighing the basis functions so that the output signal changes slowly
  12. Equivalence of PVFs and SFs. [their emphasis] Sprekeler (2011) showed that two objectives, of SFA (slowness) and LEM [Laplacian EigenMap] (nearness), are mathematically equivalent under some reasonable (i.e., when we collect data from a random walk on an MDP) assumptions.  For intuition, note that saying two observations have a high probability of being successive in time is another way of saying that the two underlying states have a high probability of being connected  In the LEM formulation, the neighbor relationships are explicit (through the adjacency matrix), but in SFA’s they are implicit (through temporal succession).”
  13. [Immediately next paragraph] “Note that SFA additionally depends on function space F.  The main reason the slow features (SFs) are approximations of the PVFs is due to this function space, and what information it captures and what information it lacks.”
  14. IncSFA doesn’t build a covariance matrix, even for temporary use between data points
  15. The math ends up approximating the Laplacian according to a random walk
  16. IncSFA is much more computationally efficient than Laplacian EigenMap or batch SFA – in these two the computation is dominated by the Eigen decomposition, which is cubic
  17. On the other hand, as a batch method it uses each data point less efficiently
  18. The first experiment is a 20-state chain where the state representation is a zero-vector of length 20, with the current state changing value to a 1.
    1. Features extracted after training is very similar to Laplacian EigenMap (basically sinusoids that are slightly time-shifted but of very similar wavelength)
    2. The learning times, are very long, however – on the order of 10,000 or 20,000 samples
    3. <I don’t think they mention the basis functions used here>
  19. The second experiment is a 2D task with nonstationary reward.
  20. <In both experiments actions are taken randomly, so would exponential experience to get adequate coverage in some domains.>
  21. The third experiment is another 2d navigation task. Again embedding of Laplacian EigenMap looks essentially the same as IncSFA
  22. “The slow features are approximately proto-value functions, therefore graphical embedding can be done with linear complexity via IncSFA without constructing a graph (or a covariance matrix).”
  23. Linear time and space requirements, and that its online add to biological plausibility of the method.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: