- Paper uses incremental (as opposed to batch) slow feature analysis (IncSFA) for learning proto-value functions
- POMDP paper
- “Since they [proto-value functions] provide a good basis for learning different reward functions, they are good building blocks for
*general*reinforcement learning.” - Uses TD to do VFA on top of the features developed by ISFA
- A previous work (Sprekeler 2011, in my reading list) “… showed how SFA can be considered a function approximation to learning PVFs [proto-value functions] (…), so slow features (SFs) can have the same set of beneficial properties for representation learning for general RL .”
- “In this paper, we show how IncSFA can be used as a low complexity method for an autonomous RL agent to learn features of a spectral (graph)-embedding of its environment, which it experiences indirectly through high-dimensional observations (images).”
- Goal is to find a set of basis functions that can represent the value function well over a wide class of reward functions
- <I am confused as it says phi makes up the basis functions for estimating value, but then t says phi maps the observation vector
**x**to**y**– I guess there is another dot product somewhere that turns**y**into a scalar?> - Proto-value functions are intended to “… capture global dynamic characteristics of the MDP in a low dimensional space.” The idea is to find some basis that preserves distance between states.
- In Proto-reinforcement learning, the basis comes from the graph Laplacian
- SFA has a slightly different approach which is to look for ways of weighing the basis functions so that the output signal changes slowly
- “
**Equivalence of PVFs and SFs.**[their emphasis] Sprekeler (2011) showed that two objectives, of SFA (slowness) and LEM [Laplacian EigenMap] (nearness), are mathematically equivalent under some reasonable (i.e., when we collect data from a random walk on an MDP) assumptions. For intuition, note that saying two observations have a high probability of being successive in time is another way of saying that the two underlying states have a high probability of being connected In the LEM formulation, the neighbor relationships are explicit (through the adjacency matrix), but in SFA’s they are implicit (through temporal succession).” - [Immediately next paragraph] “Note that SFA additionally depends on function space
*F**.*The main reason the slow features (SFs) are*approximations*of the PVFs is due to this function space, and what information it captures and what information it lacks.” - IncSFA doesn’t build a covariance matrix, even for temporary use between data points
- The math ends up approximating the Laplacian according to a random walk
- IncSFA is much more computationally efficient than Laplacian EigenMap or batch SFA – in these two the computation is dominated by the Eigen decomposition, which is cubic
- On the other hand, as a batch method it uses each data point less efficiently
- The first experiment is a 20-state chain where the state representation is a zero-vector of length 20, with the current state changing value to a 1.
- Features extracted after training is very similar to Laplacian EigenMap (basically sinusoids that are slightly time-shifted but of very similar wavelength)
- The learning times, are very long, however – on the order of 10,000 or 20,000 samples
- <I don’t think they mention the basis functions used here>

- The second experiment is a 2D task with nonstationary reward.
- <In both experiments actions are taken randomly, so would exponential experience to get adequate coverage in some domains.>
- The third experiment is another 2d navigation task. Again embedding of Laplacian EigenMap looks essentially the same as IncSFA
- “The slow features are approximately proto-value functions, therefore graphical embedding can be done with linear complexity via IncSFA without constructing a graph (or a covariance matrix).”
- Linear time and space requirements, and that its online add to biological plausibility of the method.

Advertisements