This document is intended to give a more detailed description (beyond what’s described in the “Getting Started” vignette) of model calibration with MacPan.

The `fix_pars`

function takes an initial/baseline set of parameters and user-specified values of summary “moments” of the parameters (one or more of the intrinsic growth rate \(r\), the intrinsic reproductive number \(R_0\), and the generation interval \(\bar G\)). For each summary metric, the user also gives a set of parameters that are to be adjusted (all by the same multiplier) to achieve the target. The function returns an adjusted set of parameters that correspond to the requested metrics.

This step is useful to get the initial values into a reasonable ballpark, *or* to adjust parameters that will not be tuned in the full calibration step.

In general, time-varying parameters are specified by a data frame (`params_timevar`

argument) of the following form:

```
Date Symbol Relative_value
1 2020-03-23 beta0 0.9
2 2020-03-30 beta0 NA
3 2020-04-01 beta0 1.2
```

where `Date`

is the date of a breakpoint `Symbol`

refers to the name of a value in the parameter vector. If numeric, `Relative_value`

is the relative value (i.e., relative to the value in the baseline `params`

vector) to switch to at the specified date. If `NA`

, the relative value will be a calibrated parameter.

The `opt_pars`

list specifies which parameters to calibrate, and their starting value:

- the
`params`

element specifies starting calibratation values corresponding to elements of the (unlisted?)`params`

vector - the
`time_params`

(or`[link_]time_params`

) element specifies starting values for the calibrated values in`params_timevar`

(it should be the same length and for now should match the missing elements by position) - the
`[link_]nb_disp`

element specifies calibrated dispersion parameters

Prepending `log_`

or `logit_`

to a parameter name in `opt_pars`

will automatically specify that it is to be fitted on the corresponding “link” scale (this is the scale on which starting values are specified).

Describe model calibration here: by (1) MLE fitting or (2) log-linear regression to exponential-phase data.

- fit an appropriate statistical model to time-series data (e.g. hospitalization, death, or ICU counts), e.g. negative binomial GLM or (for multiple regions) a GLMM
- the log slope is an estimate of \(r\), the log intercept will provide an estimate of initial conditions
- the first step of the
`calibrate()`

function takes a given set of baseline parameters and adjusts a specified subset of them (at the moment this is fixed to be (1) the baseline transmission rate and (2) the latent period and infection periods for all but the presymptomatic period) to achieve the observed value of \(r\) and one or more other epidemiological characteristics (at present \(\bar G\), the mean generation interval) - the second step first projects the observed intercept (e.g. predicted number of hospitalizations at the beginning of the observation time period) back to the beginning of the simulation time period, then uses the dominant eigenvector of the linearized system to estimate the numbers of other states at that time.

The top-level function is `calibrate()`

: the machinery is in `R/calibrate.R`

Possible calibration issues:

- effects of nonlinear slopes?
- what to do when different data streams have different regression slopes?
- if we use a quadratic fit to allow for time-varying beta, how do we feed this back into the simulation?

Brain dump from e-mail:

Our calibration is/will be based on

- taking reasonable baseline values of all epi parameters (transmission rate, residence time in various compartments, relative transmission of different compartments, aspects of severity and health utilization …) [right now these are taken from the Stanford covid-interventions model and some conversations from our organizational contact about e.g. fraction ICU, hospital residence times etc. They could easily be adjusted based on regional variation.]
- adjusting these parameters to get a mean generation interval and a shape (squared coef of var) that are a match for reasonable values from the literature
- doing a log-linear (negative binomial) fit to one or more observed time series (cases, hospitalization, death) to get a value of ‘r’; adjust base transmission rate to match this r
**JD: I still don’t know how we can adjust beta0 without screwing up Gbar?**, if necessary using numerical optimization to get the same desired values of G etc. at the same time - use the log-slope and log-intercept of the fit in previous step to set initial conditions, seting the
*relative*numbers in compartments according to the dominant eigenvector of the Jacobian. This is where underreporting comes in: e.g. if you’re calibrating from confirmed cases, you need to guess the ratio between cases and true I. If you’re calibrating from reported COVID deaths, you should scale your true initial conditions to take this into account.

Note that we could fake a testing lag (for now) by simple post-hoc adjustment of case times vs. other times. Don’t yet have a good solution for dependence of case numbers on testing intensity though (see `testing_flow.md`

).