Optimal Indolence: A Normative Microscopic Approach to Work and Leisure. Niyogi, Breton, Solomon, Conover, Shizgal, Dayan. Interface 2013

  1. Deals with how decisions between working and engaging in leisure are made
  2. Previous work has generally considered only the gross average amount of time spent on each activity; this study considers these decisions at a fine time scale
  3. Studies rodents, where work is holding down a lever, and leisure is any other behavior
  4. Reward is BSR (brain stimulation reward)
    1. “Does not suffer satiation and allows precise psychophysically stable data to be collected over many months.”
  5. The strength of a BSR is set by the frequency of the stimulation pulses
    1. The subjective worth of a BSR is called its reward intensity (RI)
  6. Leisure is assumed to have intrinsic subjective worth
  7. The ratio of the RI to the price (cost?) is called the pay-off
  8. During a trial, the RI and amount of work required to achieve it are held constant
  9. Subject could work for at most 25 times the price (being the amount of time the lever is held down?)
  10. Experiments are broken into three trials (inside a trial reward is constant).  Reward intensity declines between trials.  Price is “shortest” during first and last trial?  <This last bit seems unclear based on wording.>  First and last trials are intended for calibration
  11. Seems like there are multiple test trials in the middle, with different costs and rewards
  12. “Subjects repeatedly experience each test RI and price over many months, and so can readily appreciate them after minimal experience on a given trial without uncertainty.”
  13. “The key molar statistic [what's a molar statistic??] is the TA, namely the proportion of the available time for working in a test trial that the subject spends pressing the lever.”
  14. Qualitative behavior is:
    1. For high payoff (reward/cost), work is conducted almost continuously
    2. For low payoff, leisure and work are both done in large chunks, not interleaved
    3. Work is done during the entire “price duration” (the amount of time the lever needs to be pressed?) as long as this isn’t a very long amount of time
    4. Duration of leisure bouts varies
  15. Problem is formulated as a semi-MDP.  They assume the duration of the trial does not influence behavior, as trials are fairly long
  16. TA (proportion of time working) increases with RI and decreases with cost
  17. The detailed activity sequence they call ethograms (when and how long each period of work/leisure was)
  18. <Oops re-read this bit, not sure what I took notes on already>
  19. Assume that the effort involved in depressing the switch is negligible; the cost of work is the amount of time it takes
  20. Discuss benefit of leisure as a function of duration, which would lead to different behaviors
    1. For example, with a sigmoidal benefit function, long breaks would be preferable to short ones
    2. They also consider linear, and a linear/sigmoid combination
    3. Note that infinitely many different functions are possible
  21. <It seems like they treat cost as linear in time, so why do they consider different functions for the benefit of leisure but not effort?>
  22. <A paragraph about Pavlovian and instrumental influences which I guess I don’t have the background to understand at the moment.>
  23. The state definition is a bit unintuitive at least until further explained – seems to be defined completely in the amount of work performed
    1. Start at some state <pre,w>
    2. Choose either to engage in leisure or work for some period of duration as decided by the subject
    3. If leisure is selected, the agent “enjoys a benefit-of-leisure” for the amount of time chosen, and then returns to the same state <pre,w>
    4. If work is done but the period of work leading to a reward is not completed, <pre,w> is updated, with w becoming w + the amount of work just done
    5. Once enough work is done to provide reward, state is updated to the post-reward state <post> (they cannot work longer than this because at this point the lever they press is retracted)
    6. “In the post-reward state s’=<post>, the subject can add instrumental leisure for time τL to the mandatory Pavlovian leisure τPav discussed above.  It receives utility CL(τL + τPav), and then transitions to state s’=<pre,0>.”  Is the Pavlovian leisure the inability to do work?
  24. Assume stochastic policy, evaluated according to average reward (details are in the supplementary material, which I’ll get to later).
  25. “The average reward rate is the ratio of the expected total microscopic utility accumulated during a cycle to the expected total time that a cycle takes.  The former comprises RI from the reward and the expected microscopic utilities of leisure; the latter includes the price P and the expected duration engaged in leisure.”
  26. The value of leisure during <post> depends on:
    1. The benefit-of-leisure function (linear, sigmoidal) of the time spent in instrumental leisure plus the Pavlovian leisure,
    2. The opportunity cost of that same amount of time (determined by the average reward rate),
    3. And the value of <pre,0>, which is the resulting state
  27. As it's a semi-MDP where the agent chooses the action durations, the value function has an integral over time
  28. The value of <pre,w> is the same thing, without the enforced Pavlovian leisure period, because that only occurs during <post>.  It also depends on whether (if work was selected) the duration was long enough to lead to <post>
  29. Policy is stochastic, based on softmax; in addition, there is a distribution over durations so that durations selected by the model aren’t extremely long (in order to fit actual data)
  30. The distribution in the policy also requires the computation of an integral to solve – wonder how they took care of these integrals
  31. All this defined, now look to see how the model can describe characteristics of behavior
  32. For now, they consider linear benefit of leisure based on duration, that work always completes the task, and that arbitrarily long leisure durations are possible
    1. In this case, they come up with an analytical solution – again this is in the appendix I will get to
  33. As the softmax inverse temperature -> inf (equivalently, temperature -> 0), the result is deterministic optimal behavior.  In this case, the agent would either only engage in leisure, or only work, depending on the relative rewards and costs
    1. <This is perhaps a weakness of the model that they need to inject noise in this manner to prevent this sort of behavior>
    2. They set the temperature to 1 – not a weird value, so it's probably fairly robust to this parameter
  34. For high payoffs, work is done exclusively, so the average reward rate is just reward/(work duration + Pavlovian delay).  In very high reward settings, the Pavlovian delay is also short, so it's basically just reward/duration.  In this case, the opportunity cost of leisure is linear in its duration, with slope RI/P
    1. Rest is “…very rare, short, lagged-exponentially distributed…” at the end of each work session, which agrees well with actual results
  35. For low payoffs, work is barely ever done, and leisure is generally done at once in large blocks: “This accounts for [a?] key feature of the data.”  In fact, these leisure bouts are so long that they can run past the end of the trial
  36. <There are discussions of the graphs, but there are so many of them I don’t really have resources to consider them in-depth.>
  37. With medium payoff, the model similarly (to the low payoff case) predicts one short period of leisure followed by a long period that is so long it runs past the end of the episode
  38. <Again, I think> discussion of what effect different benefit-of-leisure curves would have on the policy (many frequent stops, or longer periods of rest)
  39. “Stochasticity in choices had a further unexpected effect in tending to make subjects pre-commit to a single long work bout rather than dividing work up into multiple short bouts following on from each other.  The more bouts the subject used for a single overall work duration, the more probably [wc?] stochasticity would lead to a choice in favor of leisure, and thus lower the overall reward rate.  Pre-commitment to a single long duration avoids this.”  Benefit to pre-commitment is increased if things like switch costs are included
  40. “Even at very high pay-offs, subjects are observed still to engage in short leisure bouts after receiving a reward – the so-called post-reinforcement pause (PRP).  This is apparently not instrumentally appropriate, and so we consider PRPs to be Pavlovian.”
  41. Findings of behavioral economists contradict current labor supply theory – cabbies will quit for the day once they earn a target income, even if there are many passengers around (easier money still to be made).  <Seems like as-is their current model doesn’t jibe with this behavior either, but additional factors could be included that may allow for such things>
  42. Also mentions continuous-time Markov chains, and standard MDPs with simply a very fine temporal resolution <but here the treatment is as a semi-MDP, which seems more appropriate given the experimental design>.
  43. “Fatigue would lead to runs of work bouts interspersed with short leisure bouts, followed by a long leisure bout to reset or diminish the degree of fatigue.  Note, however, that fatigue would make the benefit-of-leisure depend on the recent history of work.”
  44. The experiments here were based on repetitions of similar task settings over months, with the idea that the subjects <rats> would be able to very quickly adapt to the particular experimental setting <parameterization> used in the trial recorded.
    1. On the other hand, in episodes before the task structure has been learned, the subject would have a poorer understanding of the task, which would lead to more exploratory behavior
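
The TA statistic from item 13 and the ethograms from item 17 can be tied together with a small calculation. This is only a sketch: the encoding of an ethogram as a list of (activity, duration) pairs is my own assumption, and it assumes every listed bout falls within the time available for working.

```python
def time_allocation(ethogram):
    """TA = time spent working / total time available for working.

    `ethogram` is a list of (activity, duration) bouts, where activity is
    "work" or "leisure" (a hypothetical encoding chosen for this sketch).
    """
    work_time = sum(d for activity, d in ethogram if activity == "work")
    total_time = sum(d for _, d in ethogram)
    # Guard against an empty ethogram rather than dividing by zero.
    return work_time / total_time if total_time > 0 else 0.0


# Made-up example: two 4 s work bouts, a short rest, then one long rest.
example = [("work", 4.0), ("leisure", 1.0), ("work", 4.0), ("leisure", 11.0)]
ta = time_allocation(example)  # 8 s of work out of 20 s
```

Two ethograms with very different bout structure can give the same TA, which is why the paper looks at the fine scale as well.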
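
The reward-rate definition quoted in item 25, together with the cycle <pre,0> -> <post> -> <pre,0> from item 23, can be sketched numerically. This is not the authors' code: it assumes a linear benefit-of-leisure with a hypothetical slope `k_l`, a subject that completes the whole price in one uninterrupted work bout, and made-up parameter values.

```python
def leisure_utility(tau, k_l=1.0):
    """Linear benefit-of-leisure C_L(tau) = k_l * tau (assumed form and slope)."""
    return k_l * tau


def reward_rate(ri, price, tau_pav, tau_l=0.0, k_l=1.0):
    """Average reward rate over one <pre,0> -> <post> -> <pre,0> cycle.

    Numerator: RI plus the utility of post-reward leisure, C_L(tau_l + tau_pav).
    Denominator: the price P plus the total time spent in leisure.
    Assumes no leisure is taken in the pre-reward states.
    """
    total_utility = ri + leisure_utility(tau_l + tau_pav, k_l)
    total_time = price + tau_l + tau_pav
    return total_utility / total_time


# High payoff (RI/P = 5 > k_l): extra leisure lowers the reward rate.
high_no_rest = reward_rate(ri=10.0, price=2.0, tau_pav=0.0)           # 5.0
high_rest = reward_rate(ri=10.0, price=2.0, tau_pav=0.0, tau_l=2.0)   # 3.0

# Low payoff (RI/P = 0.5 < k_l): extra leisure raises the reward rate.
low_no_rest = reward_rate(ri=1.0, price=2.0, tau_pav=0.0)             # 0.5
low_rest = reward_rate(ri=1.0, price=2.0, tau_pav=0.0, tau_l=2.0)     # 0.75
```

Under these assumptions, with a linear benefit the comparison comes down to `k_l` versus RI/P, which fits the all-work-or-all-leisure behavior of the deterministic case in item 33.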
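
The softmax choice from items 29 and 33 and the pre-commitment argument quoted in item 39 can be illustrated with a two-action softmax. This is a simplified sketch with made-up action values: it leaves out the distribution over durations, and it treats each bout as an independent choice with the same probability of picking work, which the full model does not assume.

```python
import math


def p_work(q_work, q_leisure, beta=1.0):
    """Softmax probability of choosing work; beta is the inverse temperature."""
    x = beta * (q_work - q_leisure)
    # Two-action softmax in logistic form, written to avoid overflow in exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def p_uninterrupted(p, n_bouts):
    """Chance that n consecutive independent choices all select work."""
    return p ** n_bouts


uniform = p_work(1.0, 0.0, beta=0.0)        # 0.5: beta = 0 ignores the values
greedy = p_work(1.0, 0.0, beta=1000.0)      # 1.0: large beta is near-deterministic
one_bout = p_uninterrupted(0.9, 1)          # 0.9
five_bouts = p_uninterrupted(0.9, 5)        # about 0.59
```

Splitting the same work into more bouts gives stochasticity more chances to pick leisure, which is the intuition behind committing to a single long bout.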
