Behavioral and neurophysiological correlates of regret in rat decision-making on a neuroeconomic task. Steiner, Redish. Nature Neuroscience.

  1. Deals with regret as described in the RL literature as rats make decisions (they make a choice they can’t go back on, but may have gotten something better with a different choice)
  2. “In humans, the orbitofrontal cortex is active during expressions of regret, and humans with damage to the orbitofrontal cortex do not express regret.  In rats and nonhuman primates, both the orbitofrontal cortex and the ventral striatum have been implicated in reward computations.”
  3. In situations of high regret <presumably, the reward of the other option is presented, but not accessible> “… rats looked backwards toward the lost option, cells within orbitofrontal cortex and ventral striatum represented the missed action, rats were more likely to wait for the long delay, and rats rushed through eating the food after that delay.”
  4. “Disappointment is the realization that a realized outcome is worse than expected; regret is the realization that the worse than expected outcome is due to one’s own mistaken action… that the option taken resulted in a worse outcome than an alternative option or action would have.”
  5. “Orbitofrontal cortical neurons represent the chosen value of an expected future reward… [and] has been hypothesized to be critical for learning and decision making, particularly in the evaluation of expected outcomes.”
  6. Ventral striatum is also implicated in evaluation of outcomes
  7. In rats, neural recordings show VS and OFC deal with reward, reward/value predictions.  “In the rat, lesion studies suggest that orbitofrontal cortex is necessary for recognition of reward-related changes that require inference, such as flavor and kind, while vStr is necessary for recognition of any changes that affect value.  In rats deliberating at choice points, vStr reward representations are transiently active before and during the reorientation process, but reward representations in OFC are only active after the reorientation process is complete.”
  8. The experiments are somewhat along the lines of the secretary problem – rats run around a loop and can stop at one of a number of places to get food.  Each time a rat entered a zone, a random delay (until food was released) was drawn, and a tone on entry indicated how long the wait would be.  Rats could either wait for the food or move on.  Delays were i.i.d. and uniformly distributed
  9. Experiment run on 4 different rats – they all basically took a threshold approach where if the wait was less than a certain value they would wait, and otherwise they would move on.
    1. If they decided to skip the reward and move on, the time they took to leave was independent of the offered delay.  The data indicate they made the decision up front rather than waiting for a while and then giving up (they either left after a short time or stayed until food was delivered)
    2. The threshold between waiting and skipping differed by flavor; there were 4 possible flavors of food, one for each quadrant
    3. Upon delivery of reward, rats usually waited 20-25s to leave to get next rewards
  10. In variation of the task where one zone provided 3x amount of food, rats chose to wait longer for that larger amount
  11. 81% of OFC neurons and 86% of VS neurons responded to reward.  “Responses in both OFC and vStr often differentiated among the four reward sites (…).”
    1. Used Bayesian classification to determine the food quadrant from the activity pattern (they added an extra class to the 4 food quadrants, corresponding to locations on the track between quadrants where food could not be obtained)
    2. Results of classification between OFC and VS are overall qualitatively quite similar
  12. Both OFC and VS signals distinguish between zones both at time of entering zones, as well as at time reward was produced
  13. Both areas responded when the food delay was below threshold, but not when the delay was above threshold
    1. “This suggests that these structures were indicating expected value, and predicting future actions.”
    2. This was tested specifically when presenting delays right around threshold (so the difference in possible delays was small, but rats would either choose to stay or go).  The same results were found, that activity was related to the decision and not the environment
  14. Now moving on to regret specifically
  15. For regret to be induced, the agent needs to know what outcome occurred, as well as what the expected outcomes of all actions are – these conditions exist in this domain
  16. “Because the rats were time-limited on the Restaurant Row task, encountering a high-cost delay after not waiting through a low-cost delay means that skipping the low-cost delay was a particularly expensive missed opportunity.”
    1. These conditions did occur in the experiment, in some cases the rats would skip a below-threshold reward and then be faced with a high delay
  17. “Theoretically, the key to regret is a representation of the action not taken.”
    1. <Their interpretation of this is “… that there should be representations of the previous choice during the regret-inducing situations, particularly in contrast to control conditions that are merely disappointing.”  Usually in the RL literature regret is defined w.r.t. the expectation, but I guess this is reasonable here because the rat knows exactly what it passed up on – the randomly drawn delay is revealed to the rat by an audio cue.>
  18. To tease apart disappointment from regret they took sequences where the waits were the same, but the rat behavior differed.
    1. In the first control, the rat acted optimally (by taking the short wait) but may be disappointed by the long wait that follows, as it may want to eat again
    2. The second control is when cue 1 + 2 are above threshold (as opposed to just #2) and where the rat skipped both options.  Again, in this case, the options presented weren’t optimal, but the rat behaved optimally given the circumstances
    3. Experimental set-up had regret and control instances evenly distributed
  19. “Behaviorally, rats paused and looked backwards toward the previous option upon encountering a potentially regret-inducing sequence, but they did not do so in either control condition (…).”  In both the control instances, (where the rat acted correctly), the rat did not look back.
  20. “During potential regret instances, individual reward-responsive neurons in OFC and vStr showed activity patterns more consistent with the previous reward than the current one (…). Neural activity peaked immediately after the start of the look back toward the previously skipped, low-cost reward.”
    1. That is for individual neurons.  For the population as a whole, representation of the previous reward was weak, whereas representation related to the previous zone was stronger
    2. The representation of the previous zone (where the regret-inducing decision was made), did not occur in non-regret situations
  21. Regret didn’t only manifest through immediate behavioral <looking back> and neural responses but also in terms of future decision making.
    1. The data bear this out: rats tended to take the subsequent long delay (that they normally would reject) after rejecting the previous short delay (that they normally accept, and should accept to behave optimally)
    2. They also ate *much* more quickly than normal in the regret case where they accepted the bad offer – the average case and both controls are basically the same as each other and markedly different from the regret case
  22. In regret cases, increased representation of the previous zone was correlated with accepting the bad offer; this wasn’t the case in the controls, both of which had a high-cost second choice
  23. There was a clear representation of the previous zone, but not other zones
  24. Earlier work implicates OFC and VS in calculating expectation of reward.
    1. “Our data indicate that violation of an expectation initiates a retrospective calculation of expectation, this retrospective calculation of expectation influences future behavior: rats are more willing to wait for reward after a regret instance.”
  25. “While some evidence suggests that OFC represents economic value, the representation of regret is more consistent with the hypothesis that OFC encodes the outcome parameters of the current, expected, or imagined state.  The data presented here are also consistent with the essential role of OFC in proper credit assignment.  Previous studies have identified potential representations of the counterfactual could-have-been-chosen option in rats, monkeys, and humans.”
  26. “The connectivity between OFC and vStr remains highly controversial, with some evidence pointing to connectivity and other analyses suggesting a lack of connectivity.”
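
To keep the three conditions from items 16 and 18 straight, here is a minimal sketch of how consecutive zone visits could be labeled. It is my own illustration, not the authors' analysis code: the names `Visit` and `label_sequence` are made up, and it assumes the simple threshold rule from item 9 (an offer counts as low-cost when its delay is below the rat's threshold for that zone).

```python
from dataclasses import dataclass


@dataclass
class Visit:
    delay: float      # offered delay in seconds, signaled by the tone (hypothetical units)
    threshold: float  # the rat's wait/skip threshold for this zone's flavor
    stayed: bool      # True if the rat waited for the food


def label_sequence(prev: Visit, cur: Visit) -> str:
    """Label a pair of consecutive visits by the condition it falls into."""
    prev_low_cost = prev.delay < prev.threshold
    cur_high_cost = cur.delay >= cur.threshold
    if not cur_high_cost:
        return "other"      # all three conditions end on a high-cost offer
    if prev_low_cost and not prev.stayed:
        return "regret"     # skipped a good offer, now facing a bad one
    if prev_low_cost and prev.stayed:
        return "control_1"  # took the good offer; bad one is only disappointing
    if not prev_low_cost and not prev.stayed:
        return "control_2"  # two bad offers in a row, first one correctly skipped
    return "other"
```

For example, with an assumed threshold of 15 s, `label_sequence(Visit(5, 15, False), Visit(25, 15, True))` comes out as `"regret"`: the rat passed on a 5 s wait and then ran into a 25 s one. The delay values here are invented for illustration.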

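The Bayesian classification in item 11 can be pictured with a small decoder like the one below. This is only a sketch under my own assumptions: the notes above don't record which likelihood model the paper used, so I assume independent Poisson spike counts and a uniform prior over zones. The function name `decode_zone` and the rate table format are hypothetical.

```python
import math


def decode_zone(spike_counts, mean_rates, window=1.0):
    """Posterior over zones given one spike count per neuron.

    mean_rates maps zone name -> list of mean firing rates (Hz), one per neuron.
    Assumes independent Poisson neurons and a uniform prior (my assumption).
    """
    log_like = {}
    for zone, rates in mean_rates.items():
        total = 0.0
        for n, rate in zip(spike_counts, rates):
            lam = max(rate * window, 1e-9)  # expected count; avoid log(0)
            total += n * math.log(lam) - lam - math.lgamma(n + 1)
        log_like[zone] = total
    # Normalize in a numerically stable way (subtract the max log-likelihood).
    peak = max(log_like.values())
    unnorm = {z: math.exp(v - peak) for z, v in log_like.items()}
    norm = sum(unnorm.values())
    return {z: v / norm for z, v in unnorm.items()}
```

With a made-up rate table such as `{"zone1": [10, 1], "zone2": [1, 10], "between": [1, 1]}`, the counts `[9, 0]` put most of the posterior on `"zone1"`. In the regret analysis, the quantity of interest would be how much posterior mass lands on the previous zone rather than the current one.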