- Deals with the analysis of computational models against data from actual experiments, with an emphasis on RL experiments
- Tutorial on the relevant statistical methods, illustrated on synthetic data, flagging potential pitfalls
- Original publications on the dopamine/RL connection dealt with activity averaged over many trials, but when examining prediction error, the change from trial to trial is particularly important
- Means that data should be treated as a timeseries, and goodness of fit should be assessed against the trial-by-trial data directly, as opposed to time-averaged results

- Direct experimental tests are applicable and should be used in this setting
- In a bandit task, for example, there is one action choice and one reward per trial
- Models allow for exact estimation of quantities (such as prediction error) that would otherwise be subjective, which then allows the search for neural correlates to these values predicted by the model
- A first issue is the estimation of free parameters (such as the learning rate, and presumably the discount factor gamma as well)
- The second issue is model comparison; different models produce different predictions about what results should look like
- A model can be separated into 2 parts:
- Learning model: what is going on in terms of the internal representation
- Ex/ estimated average reward for each slot machine

- Observation model: how the internal components of learning model manifest themselves in observed data
- This could be either actual choices made or BOLD, depending on the observation model

- Learning models are generally deterministic, but observation models are often noisy (at least in terms of the measurements made on them)
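The two-part decomposition can be sketched in code. A minimal example, assuming a two-armed bandit with a delta-rule learning model and a softmax observation model (the function name and parameters are my own illustration, not from the paper):

```python
import numpy as np

def q_learning_softmax(choices, rewards, alpha, beta, n_arms=2):
    """Run the (deterministic) learning model over a choice/reward
    sequence and return the probability that the (stochastic)
    observation model assigns to each observed choice."""
    Q = np.zeros(n_arms)            # learning model: estimated value per arm
    probs = np.empty(len(choices))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        # observation model: softmax over current value estimates
        p = np.exp(beta * Q) / np.sum(np.exp(beta * Q))
        probs[t] = p[c]
        # learning model: delta-rule update driven by the prediction error
        Q[c] += alpha * (r - Q[c])
    return probs
```

Note the split: the loop body updates `Q` deterministically, while all the stochasticity lives in the softmax probabilities that get compared to the observed choices.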
- How to do model parameter estimation from data:
- Model should define a probability distribution over data sets
- Then you can use Bayes rule to flip the conditional in the probability to get probability of model parameters conditioned on the data and the model
- Commonly seek the maximum likelihood estimate of the parameters
- First example is specific to softmax selection in a bandit task
- Discuss use of Hessian to generate CIs around parameter estimates
- Discusses use of local blackbox optimization and common libraries to do so. Some will also produce Hessians for CIs as well
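A hedged sketch of that whole pipeline: simulate a subject from a Q-learning/softmax model, minimize the negative log-likelihood with a local optimizer, and read rough confidence intervals off the approximate inverse Hessian the optimizer returns. All parameter values and the simulated task are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_log_lik(params, choices, rewards, n_arms=2):
    """Negative log-likelihood of the choices under Q-learning with
    softmax selection (alpha = learning rate, beta = inverse temperature)."""
    alpha, beta = params
    Q = np.zeros(n_arms)
    nll = 0.0
    for c, r in zip(choices, rewards):
        nll -= beta * Q[c] - logsumexp(beta * Q)  # stable log-softmax
        Q[c] += alpha * (r - Q[c])                # delta-rule update
    return nll

# simulate one subject so the example is self-contained
rng = np.random.default_rng(0)
true_alpha, true_beta, p_reward = 0.3, 3.0, (0.8, 0.2)
Q, choices, rewards = np.zeros(2), [], []
for _ in range(500):
    p = np.exp(true_beta * Q - logsumexp(true_beta * Q))
    c = int(rng.choice(2, p=p))
    r = float(rng.random() < p_reward[c])
    Q[c] += true_alpha * (r - Q[c])
    choices.append(c)
    rewards.append(r)

# local optimization with box bounds on the parameters
res = minimize(neg_log_lik, x0=[0.5, 1.0], args=(choices, rewards),
               method="L-BFGS-B", bounds=[(1e-3, 1.0), (1e-3, 20.0)])

# rough standard errors from the optimizer's approximate inverse Hessian
se = np.sqrt(np.diag(res.hess_inv.todense()))
ci = [(m - 1.96 * s, m + 1.96 * s) for m, s in zip(res.x, se)]
```

Caveat: L-BFGS-B's `hess_inv` is an approximation accumulated during optimization, not an exact Hessian, so these CIs are only a quick sanity check.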
- If parameter estimation yields parameters that are enormous, tiny, or stuck at set boundaries, something is wrong: either with the programming, or the model is simply not appropriate for predicting the data
- In Bayesian approaches, introducing priors over the parameters can help regularize the estimates
- Don't average across subjects either; the variability between them is important
- More appropriate is to generate MLEs for each subject
- Then compare across groups with something like a t-test

- In some cases Bayes' rule simply expresses the equation that maximizes the parameter likelihoods, but in the general case it involves integration, which can be problematic
- If a Gaussian assumption is made (this would be reasonable for a number of families of distributions), this becomes easy to deal with (although it can lead to inflated estimates of variance)
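One standard version of this Gaussian trick is the Laplace approximation to the evidence integral, which replaces the integrand with a Gaussian centred on the MAP estimate $\hat\theta$:

```latex
p(\mathcal{D} \mid M)
  = \int p(\mathcal{D} \mid \theta, M)\, p(\theta \mid M)\, d\theta
  \;\approx\;
  p(\mathcal{D} \mid \hat\theta, M)\, p(\hat\theta \mid M)\,
  (2\pi)^{d/2}\, \lvert H \rvert^{-1/2}
```

where $d$ is the number of parameters and $H$ is the Hessian of the negative log posterior evaluated at $\hat\theta$.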

- Some stuff about fMRI I’m skipping for now, but worth coming back to
- Another issue is parametric nonstationarity – the possibility that parameters may change over time hasn't been considered so far here
- Example: does the learning rate decrease over time? That would allow for fast initial acquisition as well as eventual convergence
- This is a fundamentally tough issue; one approach is simply to design tasks that minimize the existence of nonstationarity

- So far, the assumption is a fixed model and an attempt to find its parameters.
- Another important question is whether one model fits data better than another
- In terms of RL, this could be used to test between MB+MF models, for example

- In models with many parameters, be cautious of overfitting
- For the most part, the tools used for model comparison just involve parameter fitting
- In order to mitigate issues related to overfitting (and therefore selecting a model that is actually a poorer predictor) cross validation should be used
- In RL, however, cross validation is difficult, as the data isn't IID, so it's difficult to pull apart independent datasets
- There must be a way around this, though?
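One common workaround (my sketch, not from the paper): trials within a subject are dependent, but subjects are approximately independent, so cross-validate at the subject level – fit on all-but-one subject and score the held-out subject's log-likelihood:

```python
def cross_validated_ll(subjects, fit, log_lik):
    """Leave-one-subject-out cross-validation.

    subjects : list of per-subject datasets (treated as independent)
    fit      : function(list of datasets) -> fitted parameters
    log_lik  : function(params, dataset) -> log-likelihood
    """
    total = 0.0
    for i, held_out in enumerate(subjects):
        train = subjects[:i] + subjects[i + 1:]
        params = fit(train)                 # fit on the other subjects
        total += log_lik(params, held_out)  # score the held-out subject
    return total
```

The `fit` and `log_lik` callables are placeholders for whatever model is being evaluated; the point is only that the train/test split respects the dependence structure.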

- With nested models (where the simpler model is a special case of the more complex one, e.g. obtained by fixing some of its parameters) can use likelihood ratio tests to determine whether to accept the simpler or the more complex model
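The test itself is simple to compute: by Wilks' theorem, twice the log-likelihood difference is asymptotically chi-squared with degrees of freedom equal to the number of extra parameters. The log-likelihood values below are made up for illustration:

```python
from scipy import stats

def likelihood_ratio_test(ll_simple, ll_complex, df_diff):
    """Wilks' likelihood ratio test for nested models.
    ll_* are maximized log-likelihoods; df_diff is the number of
    extra free parameters in the complex model."""
    lr = 2.0 * (ll_complex - ll_simple)
    p = stats.chi2.sf(lr, df_diff)   # survival function = 1 - CDF
    return lr, p

# hypothetical fitted log-likelihoods, for illustration only
lr, p = likelihood_ratio_test(ll_simple=-420.3, ll_complex=-412.8, df_diff=1)
# small p -> reject the simpler model in favour of the complex one
```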
- Bayesian methods naturally guard against overly general models, not only through priors over models, but also because a more flexible model spreads its predictive probability over many possible datasets and so gives weaker support to the data actually observed (an automatic Occam's razor)
- Can use something called Bayes factor to compare two models using Bayes rule
- These aren’t the same as classical p values, but can be interpreted in a similar manner
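A common rough route to a Bayes factor goes through the Bayesian information criterion, BIC = −2 ln L + k ln n; the difference in BIC between two models then gives an approximate Bayes factor. The log-likelihoods below are made up for illustration:

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

# hypothetical maximized log-likelihoods for two models
bic1 = bic(log_lik=-412.8, n_params=3, n_obs=300)
bic2 = bic(log_lik=-420.3, n_params=2, n_obs=300)

# approximate Bayes factor in favour of model 1
bf_12 = np.exp((bic2 - bic1) / 2.0)
```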

- With Bayesian comparison, again, the problems are the priors and the integration; as usual, assuming Gaussian or other convenient forms helps
- Previously it was assumed that parameters are constant across subjects; another method is to use the same model across subjects but fit parameters to each subject, yielding a distribution of parameters across subjects
- Math for how to do this Bayesian

- Don’t evaluate models based on how many correct decisions they predict – bad for a number of reasons, including being an improper way of dealing with noise
- Be careful when measuring differences between groups – additional free parameters may absorb real differences into parameters that should be the same across groups; fix some parameters across groups and fit others if appropriate
- Absolute goodness of fit (“how well” does the model fit?) is still a tough question to answer scientifically, though there are some reasonable means of doing so. These aren’t heavily adopted in the field yet, so how to interpret them is still somewhat open
- Remember that it’s only easy to do relative comparisons; confirming that one model is “right” is difficult to justify