Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning. Gustafson, Daw. PLOS Computational Biology. 2011.

  1. How does representation make learning possible in spatial navigation?
    1. In particular, how can the activity of (pyramidal) hippocampal place cells and of grid cells contribute to estimates of a value function?
  2. As opposed to localization, here transfer between earlier and present experience is considered
  3. “Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences.”
  4. As opposed to Euclidean distances, which may not respect the manifold of the domain (and cause problems when there are boundaries), geodesic distance is the useful measure
  5. The brain has at least 2 representations of location:
    1. “Hippocampal place cells fire when a rat passes through a confined roughly concentric, region of space…”
    2. “… grid cells of the dorsomedial entorhinal cortex (dMEC) discharge at vertices of regular triangular lattices […].”
  6. Most studies of place and grid cells consider what type of stimulus/environment makes them active. Here, the question is how the brain uses that activity to organize behavior
  7. “Importantly, this exercise views the brain’s spatial codes less as a representation for location per se, and instead as basis sets for approximating other functions across space.”
    1. That is, the activity of place cells alone would be capable of producing value functions and policies. But is there something about the combination of place and grid cells that makes path planning even easier? In particular, does it help produce value functions that respect topography, given that discontinuities in dynamics lead to discontinuities in value?
  8. The activity of place and grid cells depends somewhat on characteristics of the domain
  9. In simulated experiments, there were grid worlds with different numbers of boundaries. Start positions were randomized, but the goal position always remained the same
    1. To simulate place cells, Gaussian basis functions were used
    2. To simulate grid cells, sine waves were used
  10. Naturally, the tabular approach was the worst (no generalization). In the simplest domain, place cells are clearly better than grid cells, which are clearly better than tabular. In the hardest domain, grid cells and place cells perform equivalently, and both are still better than tabular
  11. But, looking at the value functions, it is clear that value from the goal is “bleeding” across boundaries in a way that is not appropriate
  12. Because of the overgeneralization, in another set of more complex tasks, the tabular representation does better
  13. To fix this, and so that the basis functions would respect topology, points were assigned new x-y coordinates, basically by running the connectivity through ISOMAP
    1. After doing this, there wasn’t spillage across boundaries.
  14. <Although they use shortest geodesic distance, there is no reason why that would be the only method that would produce these results. Basically, you just need something that respects the fact that you can’t cross walls (for example, a random walk also respects this)>
  15. <I’m not sure why, but they keep comparing tabular vs grid vs place cells. The brain has the latter two together, so why not show the results of their combined activity? Maybe there is also some prevention of spillage across boundaries in the naive case when they are used together, or something else interesting…>
  16. There are results from place cell activity that respect domain topology w.r.t. doors/boundaries
    1. This is true also when the domain is nonstationary/changes during activity.  This can even cause new place cell activity
  17. “One of the hallmarks of model-based planning (and the behavioral phenomena that Tolman [67] used to argue for it, albeit not subsequently reliably demonstrated in the spatial domain), is the ability to plan novel routes without relearning, e.g. to make appropriate choices immediately when favored routes are blocked or new shortcuts are opened. Interestingly, rather than by explicit replanning, some such behaviors could instead be produced more implicitly by updating the basis functions to reflect the new maze, while maintaining the weights connecting them to value. This is easy to demonstrate in the successor representation [16], a model closely related to ours.”
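
The point in items 11 to 13 about value “bleeding” across boundaries can be illustrated with a small sketch. This is not the paper’s code: the 5x5 maze, the field width, and the function names (`geodesic_distances`, `place_field`) are made up for illustration, and it plugs breadth-first shortest-path distances directly into the Gaussian rather than first embedding them into new coordinates with ISOMAP as the paper does. It only shows why a Gaussian “place cell” defined on Euclidean distance is active on the far side of a wall, while one defined on geodesic distance is not.

```python
import math
from collections import deque

# Hypothetical 5x5 grid world: '#' is a wall cell, '.' is free space.
# The wall splits the top of the maze; the only passage is the bottom row.
GRID = [
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    ".....",
]

def free_cells(grid):
    """All (row, col) cells that are not walls."""
    return [(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch != "#"]

def geodesic_distances(grid, source):
    """Shortest-path length in 4-connected steps from source to each reachable free cell."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def place_field(grid, center, width, geodesic):
    """Activation of one Gaussian basis function centred at `center` over the free cells."""
    if geodesic:
        dist = geodesic_distances(grid, center)
    else:
        dist = {cell: math.hypot(cell[0] - center[0], cell[1] - center[1])
                for cell in free_cells(grid)}
    return {cell: math.exp(-(d * d) / (2.0 * width * width))
            for cell, d in dist.items()}

center = (0, 1)  # field centre just left of the wall
across = (0, 3)  # cell just right of the wall: 2 units away in a straight line
euclid = place_field(GRID, center, width=1.5, geodesic=False)
geo = place_field(GRID, center, width=1.5, geodesic=True)

print(f"Euclidean activation across the wall: {euclid[across]:.3f}")
print(f"Geodesic activation across the wall:  {geo[across]:.3g}")
```

If the value estimate is a weighted sum of such fields, any weight learned near the goal with the Euclidean field also raises the estimate on the other side of the wall, whereas the geodesic field confines it to cells that are actually close along a walkable path.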