McMasterPandemic is a modelling tool that was rapidly developed to provide timely insights into the Covid-19 Pandemic in Canada. We are currently refactoring this tool so that it is faster and more general. This refactoring will be done incrementally by implementing a series of versioned specifications that define each step. This document outlines these specifications, and provides a roadmap for planning future specifications.

We are using semantic versioning https://semver.org/ to provide versions for the specs. All versions of a spec that have not been implemented in a released version of the McMasterPandemic R package have major version 0 (e.g. 0.x.y, https://semver.org#spec-item-4) – currently all of them. Although none of these specs are released, they are being experimentally implemented on the TMB branch of McMasterPandemic, with R-side code here. There is a global option, `MP_flex_spec_version`

, that is used by this `R`

code to control what spec version is being assumed by the `TMB`

/`C++`

code. This helps in testing as well. `C++`

files that implement various spec versions are currently being placed here.

Spec versions have three life cycle stages: frozen, experimental, recommended, and archived. If a frozen version needs clarification or updating, this clarification must be done by linking to a subsequent version that addresses the issue(s). Each frozen version of the spec must contain the following five sections.

*New Capabilities*– very short summary; refer to previous versions for background*Assumptions*– known assumptions that can help users/developers understand if this version will suite their needs*Mathematical Description*– precise definition of the new or changed features of the model; refer to previous versions for definitions being reused*User Interface*– short description and example of the user interface in`R`

for model specification*Data Structure*– definition of the data structure to be passed from`R`

to`TMB`

/`C++`

that stores the model structure

The specs for the following versions are frozen, which means that they will not change.

- 0.0.1 – Simple rate matrix updates
- 0.0.2 – Simulation (iterated state variable updates)
- 0.0.4 – Piece-wise constant exogenous time-variation of parameters
- 0.0.5 – Hazard simulation
- 0.0.6 – Updates of time-varying parameters from the initial parameters on the
`C++`

side - 0.1.0 – Model structure
- 0.1.1 – Make state and calibrate time-varying multipliers
- 0.1.2 – Common factors

The specs for the following versions are being actively developed and modified. Note that version numbers of experimental versions can change without warning.

The specs for archived versions are frozen but no longer respected by the `McMasterPandemic`

`R`

package and/or its test framework. Archived versions cannot be ignored because Frozen and Experimental versions may still reference them.

Log-linear time-variation of parameters. This feature generalizes the piece-wise time-variation of parameters to allow for the following.

- Process error
- Semi-parametric time-variation estimation

This is a

Such parameters have their own parameters that determine a model of time-variation. These latter parameters get transformed to produce a time series of parameter values of length equal to the number of time steps. One simple thing to do would be to put these time series into the parameter vector, and when parameters are accessed use the index `spi + t`

where `spi`

is the index into the `state_param`

vector that picks out the first parameter in the time series and `t`

is the current time step.

\[ \log(u) = \log(u_0) + Xv \] where \(u\) is a vector of length \(T\), the number of time steps, \(u_0\) is a scalar parameter, \(v\) is a vector of length \(D < T\) that parameterizes the model of time variation, and \(X\) is a \(T\)-by-\(D\) matrix of constant columns (e.g. indicator variables, spline bases). Note that logs of vectors are taken element-wise.

\[ \log(u_t) = \log(u_0) + \sum_{i=1}^Dx_{ti}v_i \] \[ u_t = u_0 \prod_{i=1}^D\exp\left(x_{ti}v_i\right) \] For example:

- \(D=2\) breakpoints
- \(v_1 = v_2 = \log\left(\frac{1}{2}\right) \approx -0.69\)
- \(x_{ti} = \begin{cases} 1, & t \ge t_i \\ 0, & \text{otherwise} \end{cases}\)

In this case we have the following time variation of \(u\). \[ u_t = \begin{cases} u_0, & t < t_1 \\ \frac{u_0}{2}, & t_1 \le t < t_2 \\ \frac{u_0}{4}, & t \ge t_2 \end{cases} \]

Lag-n differencing with variable lag. This will allow us to difference between time points with observed data, which is necessary when the frequency of data collection changes and other cases with uneven time-series.

One way to frame this problem is to convert the scalar integer, \(n\), into a vector of length equal to the number of simuluation iterations. Let \(n_t\) be the differencing lag at time \(t\), and \(x_t\) and \(y_t\) be the input and output variables of the differencing at this time.

\[ y_t = x_t - x_{t-n_t} \]

If \(n_t = n\) for all \(t\), then we have the special case of lag-n differencing in spec version 0.2.0.

For all times where \(n_t = 0\), we will get \(y_t = 0\) as well. This corresponds for example to delays in the reporting of incidence, where there are periods of several days over which we do not have any new reports – typically not because there are no new cases but because they have not yet been reported.

Consider a case where the frequency of reporting changes from \(n = 1\) to \(n = 7\) (daily to weekly) at a particular time, \(\tau\). The following example takes \(\tau = 101\)

```
## t n x y
## 1 98 1 295 NA
## 2 99 1 300 5
## 3 100 1 310 10
## 4 101 0 320 0
## 5 102 0 330 0
## 6 103 0 340 0
## 7 104 0 350 0
## 8 105 0 360 0
## 9 106 0 370 0
## 10 107 7 400 90
## 11 108 0 410 0
## 12 109 0 420 0
## 13 110 0 430 0
## 14 111 0 440 0
## 15 112 0 450 0
## 16 113 0 460 0
## 17 114 0 500 100
```

`lag_diff_sri`

remains unchanged from previous versions.

`lag_diff_delay_n`

is changed from a vector to a matrix, with rows associated with the elements of `lag_diff_sri`

one column for each iterations.

Condensation and calibration on the `C++`

side.

Condensation is the process of summarizing the `concatenated_state_vector`

so that it can be compared with available data. This comparison is done quantitatively through a negative log-likelihood function that measures how different the condensed `concatenated_state_vector`

is from the data. The user can add any of the following variables to the condensed dataset.

- Any state variable – already computed in
`state`

and stored in`concatenated_state_vector`

- Element-wise sum of any set of state variables – already computed in
`sp`

but not stored - Any time-varying rate matrix element – already computed in
`ratemat`

and stored in`concatenated_ratemat_nonzeros`

- Element-wise product of any two variables of type 1-3 – not computed
- Element-wise sum of any variable of type 4 – not computed
- Lag-n differences of any variables of type 1-5 – not computed
- Convolutions of any variables of type 1-5 with a gamma-density kernel (see section below on computing convolutions) – not computed

The following are computed in classic MacPan.

- All the S boxes
- The sum over all the S boxes
- The sum over all the E boxes
- The sum over all the I boxes
- The sum over all the H and H2 boxes
- The sum over all the ICUs and ICUd boxes
- The sum over all R boxes
- The lag-1 difference of the total of all X boxes (with NA at the beginning) – called
`hosp`

- The sum over all X boxes
- The lag-1 difference of the total of all D boxes (with NA at the beginning) – called
`death`

- The sum over all D boxes
- All the foi columns
- Incidence:
- compute foi times S box for each epi category
- compute row sums over the resulting columns
- compute the convolution (see section below on computing convolutions) of all of these columns (including the row sums column) – called
`report`

The convolution is based on a gamma-density kernel.

The shape parameter of this gamma density depends on `params['c_delay_cv']`

. \[
\text{shape} = \frac{1}{\text{c_delay_cv}^2}
\] The scale parameter depends on `params['c_delay_mean']`

and the shape \[
\text{scale} = \text{(c_delay_mean)} \text{(c_delay_cv)}^2
\] Compute the vector \(\gamma\) using the TMB `pgamma`

function, which takes a vector `q`

and scalar `shape`

and `scale`

arguments. \[
\gamma = \text{pgamma}(q, \text{shape}, \text{scale})
\]

where \(q\) is a constant integer vector that gets created in the following way in R.

We will avoid computing `qmax`

in C++ because it could introduce a discontinuity that could cause automatic differentiation to fail. This difference between refactored and classic MacPan only has the potential to cause numerical differences whenever `params['c_delay_cv']`

and `params['c_delay_mean']`

are varying – typically because they are in `opt_pars`

– but I have not yet seen any examples where this is the case. To guarantee matched results at least for the initial parameter vector we will set `qmax`

as follows on the R-side and pass it through a `DATA_`

macro.

`qmax = ceiling(qgamma(0.95, 1/params[["c_delay_cv"]]^2, params[["c_delay_mean"]] * params[["c_delay_cv"]]^2))``

The delay kernel for the convolution is a numeric vector given by the following expression. \[ \kappa_i = (\text{c_prop})\frac{\delta_i}{\sum_j\delta_j} \] where \(\delta_j = \gamma_{j+1} - \gamma_j\) and \(\text{c_prop}\) is a scalar in the parameter vector.

The convolution, \(z\), of the time series of any state variable, \(x\), is given by the following. \[ z_{i+m-1} = \sum_{j = 1}^m \kappa_{m-j+1} x_{i+j-1} \] where \(i = 1,...,n-m\), \(m\) is the length of \(\kappa\), and \(n\) is the number of time steps (length of the input vector \(x\)). Note that the first \(m-1\) elements of \(z\) remain undefined and so they do not need to be stored on the C++-side because these \(m-1\) missing values can be added on the R-side.

In Progress: in collaboration with Weiguang

Condensation is the process of summarizing the history of the simulations. Before solving the condensation problem we should refactor how this history is currently summarized. Currently the `concatenated_state_vector`

and `concatenated_ratemat_nonzeros`

contain some history information, but it will be awkward to continue with this strategy. Instead we will introduce a matrix called `simulation_history`

with rows corresponding to simulation steps and columns corresponding to the following (in the following order).

- State variables
- Changing rate matrix elements (there is one such element in this model)
- Sums
- Factrs
- Element-wise expressions involving the first four items
- Lag-n differences of the first five items
- Convolutions of the first five items

Items 1-4 are computed at simulation time, and simply need to be added to the `simulation_history`

matrix at each iteration.

Items 5-7 should be computed at the end of the simulation. These columns should be computed in the same order as they will appear in the `simulation_history`

matrix. The input for these computations (5-7) will be stored in the following new data structures.

- Item 5 – element-wise expressions:
`sr_count`

– vector the same length as the number of columns added in step 3a giving the number of`simulation_history`

columns to use as input for each new column (note that input columns can be used more than once – see`count`

vector for a similar structure)`sri`

– indices into the columns of`simulation_history`

to use as input (see`spi`

vector for a similar structure)`sr_modifier`

– bitwise integer encoding of the parse tree (see`modifier`

for a similar data structure)

- Item 6 – lag-n differences:
`lag_diff_sri`

– indices into the columns of`simulation_history`

to use as input to lag-n difference operations`lag_diff_delay_n`

– vector same length as`lag_diff_sri`

containing the delay (i.e. the “n” in lag-n) of the lag for each lag-n difference operation

- Item 7 – convolutions:
`conv_sri`

– indices into the columns of`simulation_history`

to use as input to convolution operations`conv_c_prop_idx`

,`conv_c_delay_cv_idx`

,`conv_c_mean_cv_idx`

– three vectors each the same length as`conv_sri`

giving indices into`params`

for extracting the parameters of the convolutions`conv_qmax`

– vector the same length as`conv_sri`

giving the maximum integer in the`q`

vector for each convolution

Add `simulation_history`

to the report. On the R-side we could start using `simulation_history`

where we currently use `concatenated_*`

vectors, and eventually we could drop them.

We compare the negative binomial negative log likelihood with `observed_vector`

as data and `condensed_state_vector`

as predictions. Apologies that the word ‘negative’ is being used in two different ways – the first refers to the negative binomial distribution, and the second to the fact that we are multiplying the log likelihood by negative one. In R we would write this as follows.

In TMB, this negative binomial density function should be used for this (I think … how does `size`

on the R-side correspond to `var`

in TMB?).

The first three arguments to this function are vectors of the same size.

`observed_vector`

is an observed data stream that is provided by the user`condensed_state_vector`

is described in detail above`nb_disp`

is a vector of dispersion parameters, which is composed of values passed in through`PARAMETER`

macros (see below for a description of how`nb_disp`

is constructed)

The sum of these negative log likelihood values gives a loss function to be returned to the user.

TODO: how to handle clamping when we leave the support of the density? concerned about discontinuities.

We require an additional set of parameters to model negative binomial dispersion, `nb_disp`

. The user has two options.

- one dispersion parameter to be used in every element of
`nb_disp`

- one dispersion parameter is passed in for every variable in
`condensed_state_vector`

The user is allowed to specify a prior distribution for any element of `params`

, `tv_mult`

, or `nb_disp`

. Initially only univariate normal prior distributions will be allowed, but this should be built to be extensible allowing for other distributions to be added in the future.

For each parameter with a prior, the user specifies the distribution (initially `normal`

will be the only choice), and a parameter vector for the arguments of the corresponding density function. At each evaluation of the optimizer the log of the prior densities are subtracted from the negative log-likelihood function.

The TMB normal density function should be used. We should be extensible to allow future developers to add other distributions from here.

The user is allowed to express any value of `params`

or `tv_mult`

on one of several transformed scales. The corresponding inverse transformation is applied immediately as the parameter is read through the appropriate macro on the TMB-side. The user should be allowed to specify one of the following transformations.

`log10`

`log`

`logit`

The corresponding inverse transformations that would be applied after the macros are the following.

`10^x`

`exp(x)`

`1/(1+exp(-x))`

Let \(l_{\psi_i}\left(x_i(\theta); y_i\right)\) be a loss function where \(x_i(\theta)\) is a simulated element of the history matrix, \(y_i\) is an associated observed data point, and \(\psi_i\) is a vector of loss function parameters. Note that we write the simulation values as functions of \(\theta\) to indicate their dependence on parameters. Every element \(i\) that belongs to the same column of the history matrix has the same value for \(\psi_i\), but to simplify notation \(\psi\) is sub-scripted by \(i\). Let \(r_{\phi_j}(\theta_j)\) be an optional regularization function for each parameter \(\theta_j\) being optimized, where \(\phi_j\) is a vector of regularization function parameters associated with parameter \(j\). The objective function is then given by the following.

\[ L_{\psi, \phi}(\theta; y) = \sum_i l_{\psi_i}\left(x_i(\theta); y_i\right) + \sum_j r_{\phi_j}(\theta_j) \] Currently the only objective function that is offered is the negative log negative binomial density, but the infrastructure we build assumes that other loss functions will be allowed in the future. Similarly the only regularizing function that is available is the negative log normal density.

Each parameter, \(\theta_j\), can be optionally passed into the objective function on a transformed scale. These transformations allow users to enforce limits on the domain of the parameter space (e.g. a logit transformation will keep parameter values between 0 and 1). The regularization functions are applied on the transformed scale, but the elements of the rate matrix (and therefore the simulations, \(x\)) are computed using the parameters on their original scale. Therefore, we have the following ordering of operations.

- Parameter, \(\theta_j\), is passed in to the objective function on the transformed scale, \(f(\theta_j)\)
- The regularization function, \(r_{\phi_j}(\theta_j)\), is computed and saved
- The inverse transformation is applied to the parameter, \(\theta_j\), and the result is used to overwrite the previous value in the
`c++`

`params`

vector - The simulations are made
- The objective function, \(L\), is computed, and this computation includes adding the saved values of the regularization functions
- The objective function, \(L\), is returned by the
`c++`

function`objective_function`

- Negative log negative binomial density
- Loss function parameters: dispersion
- In TMB use
`dnbinom2`

to compute the log negative binomial density - The arguments for
`dnbinom2`

can be computed as follows:`x`

– observed data`mu`

– simulated data`var`

–`mu + mu^2 / disp`

, where`disp`

is the dispersion parameter`give_log`

–`1`

- Lower bound on the simulated values
- Before passing the simulated data to the
`mu`

argument, the user may optionally choose to smoothly constrain the simulated data to be greater than a lower bound,`eps`

, by applying the function`mu_constrained = mu + eps*exp(-mu/eps)`

- Before passing the simulated data to the

- No regularization
- Name to use in
`C++`

:`flat`

- Regularization function parameters: initial value (\(\theta_\text{init}\))
- \(r_{\theta_\text{init}}(\theta) = 0\)
- The fact that the regularization parameter, \(\theta_\text{init}\), has no effect on the regularization function is an unusual special case. When regularization is applied, the ‘location’ of the regularization function is used as the initial value for the optimization. In the case of no regularization, we are using the regularization parameters to contain the initial value. One interpretation of this regularization parameter is as the ‘peak’ of a flat prior.

- Name to use in
- Negative log of the normal density
- Name to use in
`C++`

:`normal`

- Regularization function parameters: mean (\(\bar{\theta}\)), standard deviation (\(\sigma_\theta\))
- \(r_{\mu_f, \sigma_f}(\theta) = -\log\left[\mathcal{N}(f(\theta), \mu_f, \sigma_f)\right]\), where \(\mathcal{N}\) is the normal density
- In TMB the dnorm function should be used with the following arguments
`x`

– \(f(\theta)\) – value of the parameter on the transformed scale`mean`

– \(\mu_{f(\theta)}\) – first regularization parameter`sd`

– \(\sigma_{f(\theta)}\) – second regularization parameter`give_log`

` – 1

- Name to use in

The first regularization parameter for every regularization function is used as the starting value for the optimizer on the scale of the transformed parameter.

`identity`

- index = 1
- trans = \(f(x) = x\)
- inverse = \(f^{-1}(x) = x\)

`log`

- index = 2
- trans = \(f(x) = \log(x)\)
- inverse = \(f^{-1}(x) = e^x\)

`log10`

- index = 3
- trans = \(f(x) = log_{10}(x)\)
- inverse = \(f^{-1}(x) = 10^x\)

`logit`

- index = 4
- trans = \(f(x) = \log\left[\frac{x}{1-x}\right]\)
- inverse = \(f^{-1}(x) = \frac{1}{1 + e^{-x}}\)

`cloglog`

- index = 5
- trans = \(f(x) = \log\left[-\log(1-x)\right]\)
- inverse = \(f^{-1}(x) = 1 - \exp\left[-\exp(x)\right]\)

`inverse`

- index = 6
- trans = \(f(x) = \frac{1}{x}\)
- inverse = \(f^{-1}(x) = \frac{1}{x}\)

We will pass the following additional data through to TMB to specify the objective function.

**Variables**: the following three integer vectors have one element per variable in the objective function and identify what loss function is used for each variable, as well as how many parameters those loss functions require`obs_var_id`

– id giving the column in the history matrix associated with each variable in the objective function`obs_loss_id`

– identifier of the loss function being used for each variable (currently only one – negative log negative binomial density)`obs_loss_param_count`

– number of loss function parameters associated with each variable

**Loss Function Parameters**: the following vector has one element per loss function parameter (these are the \(\psi_i\) parameters described above in the general objective function section)`obs_spi_loss_param`

– index into sp containing the loss function parameters. the number of such parameters is sum(obs_loss_param_count)

**Observed Data**the following three vectors have one element for each observed data point, and identify what elements in the history matrix should be compared with each observed value (each element of these vectors corresponds to each \(i\) in the description of the general objective function above)`obs_time_step`

– row in the history matrix`obs_history_col_id`

– column in the history matrix (match with`obs_var_id`

above to determine the`obs_loss_id`

and`obs_loss_param_count`

)`obs_value`

– value observed in data to compare with the associated element in the history matrix`obs_do_sim_constraint`

– Boolean determining if the lower bound in`obs_sim_lower_bound`

should be applied to the simulated data with smooth constraint function`mu_constrained = mu + eps*exp(-mu/eps)`

`obs_sim_lower_bound`

– lower bound applied if`obs_do_sim_constraint`

is true

**Optimized Parameters**: the following vectors have one element per parameter that is being optimized`opt_param_id`

– indices into the`params`

vector giving the parameters to be optimized`opt_trans_id`

– index identifying a transformation – see above for correspondence between indices and transformations`opt_count_reg_params`

– number of regularization parameters associated with each parameter to be optimized`opt_reg_family_id`

– index identifying a family of regularization functions – currently only two are available (`flat`

,`normal`

)

**Optimized Time-Varying Parameters (not yet in scope)**: similar to optimized parameters but for`tv_mult`

`opt_tv_param_id`

– indices into the`tv_mult`

vector giving the time varying multipliers to be optimized`opt_tv_trans_id`

– index identifying a transformation – see above for correspondence between indices and transformations`opt_tv_count_reg_params`

– number of regularization parameters associated with each parameter to be optimized`opt_tv_reg_family_id`

– index identifying a family of regularization functions – currently only two are available (`flat`

,`normal`

)

**Regularization Parameters**:`opt_reg_params`

– regularization parameters – the size of this vector is`opt_count_reg_params.sum()`

`opt_tv_reg_params`

– regularization parameters for`tv_mult`

– the size of this vector is`opt_tv_count_reg_params.sum()`

Save common factors when building rate expressions

Expressions involving sums and products of state variables and parameters, as well as their complements and inverses, can be saved in a vector, \(\phi\). The elements of \(\phi\) can then be used in expressions that define elements of the rate matrix.

Four new integer vectors:

`factr_spi`

– 1-based index into the elements of the`sp`

vector that will contain the common factor`factr_count`

– vector the same length as`factr_spi`

containing the number of elements in`sp`

used to compute the common factor`factr_spi_compute`

– vector with`sum(factr_count)`

elements giving the indices into the elements of the`sp`

vector for extracting the elements used to compute the new factor`factr_modifier`

– vector with`sum(factr_count)`

elements describing the operations to be performed for generating the common factor (see v0.0.1 for details on modifier vectors)

This is a very big ‘patch’. The motivation for it is to get a first release that is potentially useful to PHAC. This requires three enhancements.

- Generalization of the definition of the outflow vector
- Construct the state vector from the parameter vector at the beginning of each simulation run
- Move
`tv_mult`

to the parameters so that we can calibrate them if we need to do so

Let \(f_i^{(out)}\) be the \(i\)th element of the outflow vector, which is computed as follows. \[ f_i^{(out)} = \sum_{j \in \Omega_i}F_{ij} \]

Where \(\Omega_i\) is a subset of the indices into the state vector. Typically there are only a small number of unique index sets, with one \(\Omega_i\) for many values of \(i\) – for this reason the data structure for these indices only need to track each unique index set.

We can express the previous definition of the outflow vector (TODO: link to previous definition) in the terms of this generalization as a constant \(\Omega = \Omega_i, \forall i\).

In previous spec versions, the `C++`

side needs to get the initial state vector directly from R. The difficulty with this strategy is that the initial state vector depends on the parameter vector, so when a new parameter vector gets updated in an optimization or ensemble run it becomes necessary to drop back to R causing a substantial performance bottleneck. So here we add the ability to create the initial state vector from the parameter vector in `C++`

.

What follows is a detailed description of the procedure, which makes reference to new items in the data structure (see Data Structure below).

- We first check to see if a state vector has been passed – if not, do the following steps to create a new state vector
- Initialize two state vectors with all zeros
`state`

– initializing state vector for the full non-linearized model`lin_state`

– initializing state vector for the linearized model

- Make a copy (
`lin_params`

) of the parameter vector (`params`

) - Replace some user-specified elements of
`lin_params`

with user supplied constant values (`lin_param_vals`

,`lin_param_count`

,`lin_param_idx`

– these can be found in`model$tmb_indices$linearized_params`

) - Replace some user-specified elements of
`lin_state`

with elements of`lin_params`

(`df_state_par_idx`

,`df_state_count`

,`df_state_idx`

– these can be found in`model$tmb_indices$disease_free`

) - Compute the Jacobian matrix (
`jac`

), which requires the following steps in a function that takes`lin_state`

and returns an updated`lin_state`

- Create a rate matrix (
`lin_ratemat`

) from`lin_state`

and`lin_params`

using the existing`make_ratemat_indices`

- Create a flow matrix (
`lin_flow`

) from`lin_state`

and`lin_ratemat`

- Create inflow vector (
`lin_inflow`

) from`lin_flow`

by taking column sums - Create outflow vector (
`lin_outflow`

) from`lin_flow`

by the method above – Generalization of Outflow Vector (`linearized_outflow_row_count`

,`linearized_outflow_col_count`

,`linearized_outflow_rows`

,`linearized_outflow_cols`

) - Update
`lin_state`

using`lin_inflow`

and`lin_outflow`

- Create a rate matrix (
- Remove certain user-defined rows, columns, and elements from
`jac`

and`lin_state`

to create a new vector`eigvec`

(`all_drop_eigen_idx`

– can be found in`model$tmb_indices$initialization_mapping`

) - Compute the dominant eigenvector,
`eigvec`

of`jac`

- Remove elements of
`eigvec`

to create`eig_infected`

(`eigen_drop_infected_idx`

– can be found in`model$tmb_indices$initialization_mapping`

) - Normalize
`eig_infected`

to have elements that sum to one - Update
`state[all_to_infected_idx] = eig_infected * param[infected_idx]`

(`infected_idx`

can be found in`model$tmb_indices$initial_population`

) - Update
`state[susceptible_idx] = (1/susceptible_idx.size()) * (param[total_idx] - param[infected_idx])`

(`infected_idx`

and`total_idx`

can be found in`model$tmb_indices$initial_population`

)

Objects of class `flexmodel`

contains a new element called `tmb_indices$outflow`

, with the following elements

`row_count`

–`col_count`

–`rows`

–`cols`

–

These indices encode information in the \(\omega_i\) index sets (see Mathematical Description above).

This `outflow`

item means that `par_accum_indices`

is no longer required. Given that we do not need to worry about back compatibility yet, it will be dropped from this and all future spec versions.

Objects of class `flexmodel`

contains the following new elements.

`linearized_params`

– indices to update some elements of`lin_param`

before computing the Jacobian of the linearized model`lin_param_vals`

– values used to fill the elements of`lin_param`

`lin_param_count`

– the number of elements of`lin_param`

that will be updated with each value in`lin_param_vals`

`lin_param_idx`

– indices of`lin_param`

to update

`disease_free`

– indices to initialize`lin_state`

to be nearly disease-free, by setting some of its elements equal to some elements of`lin_param`

`df_state_par_idx`

– indices of`lin_param`

used to fill the elements of`lin_state`

`df_state_count`

– the number of elements of`lin_state`

that will be updated with each parameter identified by`df_state_par_ix`

`df_state_idx`

– indices of`lin_state`

to update

`linearized_outflow`

– same as`outflow`

but for the linearized model`initialization_mapping`

– indices into various subsets of the state vector and eigenvector that could be useful for initializing the state – see justification below this list`all_to_eigen_idx`

– indices into`state`

and`lin_state`

for identifying states in`eigvec`

`all_to_infected_idx`

– indices into`state`

and`lin_state`

for identifying states in`eig_infected`

`eigen_to_infected_idx`

– indices into`eigvec`

for identifying states in`eig_infected`

`all_drop_eigen_idx`

– indices into`state`

and`lin_state`

for identifying states not in`eigvec`

`all_drop_infected_idx`

– indices into`state`

and`lin_state`

for identifying states not in`eig_infected`

`eigen_drop_infected_idx`

– indices into`eigvec`

for identifying states not in`eig_infected`

`susceptible_idx`

– indices into`state`

and`lin_state`

for identifying states associated with the initial susceptible population

`initial_population`

– indices into the parameter vector for identifying the total number of individuals in the population and the initial total number of infected individuals in the population`total_idx`

`infected_idx`

The justification for the `initialization_mapping`

is given here. One potential source of confusion in the above process is the need to keep track of the different subsets of states. To help clarify this point, there are conceptually three types of state vectors containing nested subsets of states. 1. `all_states`

– vectors containing all states 2. `eigen_states`

– vectors containing all states of the linearized system that are included in the dominant eigenvector 3. `infected_states`

– vectors containing all infected states, which are `eigen_states`

that will be initialized using corresponding elements in the eigenvector

To make life a little easier for development on the `C++`

side we provide index vectors for converting between the three nested sets of states. 1. `all_to_eigen_idx`

2. `all_to_infected_idx`

3. `eigen_to_infected_idx`

And for completeness, in case it is useful, we also provide index vectors for dropping states from each kind of state vector. 1. `all_drop_eigen_idx`

2. `all_drop_infected_idx`

3. `eigen_drop_infected_idx`

We bump major versions here to start working on model structure (e.g. vaccination, age, testing). We make two enhancements:

- Introduce a new
`struc`

object class that allows for symbolic manipulation of matrices and vectors of parameters and state variables when defining rate matrix elements - The ability to refer to sums of state variables and parameters when defining expressions for rate matrix elements.

Within each simulation iteration, a user-defined number of sums of state variables and parameters are computed and stored immediately before the rate matrix is updated.

As an example, the force of infection under the two-dose vaccination model can be given by the following `struc`

objects.

Start by getting a set of vaccination parameters and state variables.

Create a symbolic vector of I-state variables, with 4 base states by 5 vaccination categories = 20 variables.

```
base_params <- read_params("PHAC.csv")
vax_params <- expand_params_vax(
params = base_params,
model_type = "twodose"
)
Istate = (McMasterPandemic:::expand_names(
c('Ia', 'Ip', 'Im', 'Is'), # = Icats
attr(vax_params, 'vax_cat')) # = vax_cats
%>% struc
)
as.matrix(Istate)
```

```
## [,1]
## [1,] "(Ia_unvax)"
## [2,] "(Ip_unvax)"
## [3,] "(Im_unvax)"
## [4,] "(Is_unvax)"
## [5,] "(Ia_vaxdose1)"
## [6,] "(Ip_vaxdose1)"
## [7,] "(Im_vaxdose1)"
## [8,] "(Is_vaxdose1)"
## [9,] "(Ia_vaxprotect1)"
## [10,] "(Ip_vaxprotect1)"
## [11,] "(Im_vaxprotect1)"
## [12,] "(Is_vaxprotect1)"
## [13,] "(Ia_vaxdose2)"
## [14,] "(Ip_vaxdose2)"
## [15,] "(Im_vaxdose2)"
## [16,] "(Is_vaxdose2)"
## [17,] "(Ia_vaxprotect2)"
## [18,] "(Ip_vaxprotect2)"
## [19,] "(Im_vaxprotect2)"
## [20,] "(Is_vaxprotect2)"
```

Create a vector of base transmission rates associated with each of the 4 base I-states.

```
baseline_trans_rates =
struc(
'Ca',
'Cp',
'(1 - iso_m) * (Cm)',
'(1 - iso_s) * (Cs)') *
struc('(beta0) * (1/N)')
as.matrix(baseline_trans_rates)
```

```
## [,1]
## [1,] "(Ca) * (beta0) * (1/N)"
## [2,] "(Cp) * (beta0) * (1/N)"
## [3,] "(1 - iso_m) * (Cm) * (beta0) * (1/N)"
## [4,] "(1 - iso_s) * (Cs) * (beta0) * (1/N)"
```

Create a 5-by-5 block matrix of vaccine efficacies

```
vax_trans_red = struc_block(struc(
'1',
'1',
'(1 - vax_efficacy_dose1)',
'(1 - vax_efficacy_dose1)',
'(1 - vax_efficacy_dose2)'),
row_times = 1, col_times = 5)
as.matrix(vax_trans_red)
```

```
## [,1] [,2]
## [1,] "(1)" "(1)"
## [2,] "(1)" "(1)"
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
## [,3] [,4]
## [1,] "(1)" "(1)"
## [2,] "(1)" "(1)"
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
## [,5]
## [1,] "(1)"
## [2,] "(1)"
## [3,] "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)"
```

And finally the force of infection vector, which looks like we really need to introduce the capability for common factors.

```
## struc object with 5 rows and 1 column:
##
## Row 1
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 2
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 3
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 4
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 5
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
```

Start with a basic example model.

```
params <- read_params("ICU1.csv")
state <- make_state(params = params)
M <- McMasterPandemic::make_ratemat(state, params, sparse = TRUE)
test_model <- flexmodel(
params, state,
start_date = "2021-09-09", end_date = "2021-10-09"
)
```

Then we define sums of state variables and parameter vectors. Here we add the sum of the `I`

states and of the `ICU`

states using regular expressions and just a list of state names.

```
test_model = (test_model
%>% add_state_param_sum(sum = "Isum", summands = "^I[a-z]")
%>% add_state_param_sum(sum = "ICUsum", summands = c("ICUs", "ICUd"))
)
```

Then we can just proceed as we normally would, but with the option to refer to these sums (this is not meant to reflect real epidemiology, but to illustrate the interface).

```
options(MP_flex_spec_version = "0.0.2")
test_model = (test_model
%>% add_rate("E", "Ia", ~ (alpha) * (sigma) * (1 / Isum))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~ (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~ (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho) / (1 / ICUsum))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
(Ia) * (beta0) * (1 / N) * (Ca) +
(Ip) * (beta0) * (1 / N) * (Cp) +
(Im) * (beta0) * (1 / N) * (Cm) * (1 - iso_m) +
(Is) * (beta0) * (1 / N) * (Cs) * (1 - iso_s))
%>% add_parallel_accumulators(c("X", "N", "P", "V"))
%>% update_tmb_indices()
)
```

And here are the indices that are relevant to the tracking of sums of state variables and parameters.

```
## $sumidx
## integer(0)
##
## $sumcount
## integer(0)
##
## $summandidx
## integer(0)
```

We describe these below.

We introduce three new index/count vectors.

`sumidx`

– indices, locating the variable to store each sum, into the`state_param`

vector defined in version 0.0.1`sumcount`

– vector containing the number of summands required for each sum`summandidx`

– indices, locating the summands, into the`state_param`

vector

Updates of time-varying parameters from the initial parameters on the `C++`

side. This is really a patch of 0.0.4, which didn’t work for calibration because it required updating the objective function (or its environment?) at every iteration of the optimizer. It is just easier to do it on the `C++`

and we need to do this anyways.

- Hazard simulation
- Returning the time-varying rate matrix elements from
`C++`

to`R`

so that they can be included in the results of`run_sim`

Define the sums of the rows of the rate matrix. \[ r_i = \sum_{j=1}^n M_{ij} \] Define the elements of a vector of exponentiated row sums \[ \rho_i = \exp(-r_i) \]

Define the elements of a normalized state vector. \[ \tilde{s}_i = \begin{cases} 0 & r_i = 0 \\ \frac{s_i}{r_i} & \text{otherwise} \end{cases} \] With these definitions we can define the modified flow matrix. \[ F_{ij} = \begin{cases} M_{ij}\tilde{s}_i(1-\rho_i) & i \ne j \\ 0 & \text{otherwise} \end{cases} \] This modified flow matrix can now be used in the same way as the unmodified flow matrix to produce state variable updates following spec version 0.0.2.

A Boolean argument needs to be added to the interface, indicating whether or not the hazard-based flow matrix is used rather than the simple flow matrix.

The data structure is equivalent to 0.0.4.

We begin with simple piece-wise constant time-variation. *Piece-wise constant* means that each time-varying parameter is associated with a sequence of break-points at which the value of the parameter changes – at all other times the parameter is constant. We plan to relax the *piece-wise constant* restriction in later versions.

Patch of the spec from 0.0.3 on piece-wise constant time-varying parameters.

*update_indices*(**removed**): a second set of*from**to*, etc for need-to-update elements proposed in spec 0.0.2 is removed in this proposal because the newly introduced*upateidx*provides the information needed.*updateidx*(**added**): a vector of indices into vectors*from*,*to*, and*count*of those elements in the rate matrix that need to be updated. It includes the indices of the elements that depend on either the state vector (0.0.2), or time-varying parameters (0.0.3 or 0.0.4), or both. At each simulation step, we will update only those elements specified by*updateidx*. The drawback of this design is that elements that only depends on piece-wise parameters (0.0.3) are updated at each step although they only need to be updated at certain breaks.*breaks*(**added**): vector of all the breaks.- *count_of_tv_at_breaks (
**added**): vector of number of time-varying parameters that change at each break. *tv_spi*(**added**): vector of indices into vector*sp*of those time-varying parameters in the order of breaks as major and parameters as minor.*tv_val*(**added**): vector of new values corresponding to*tv_spi*.

This version was an early attempt to introduce parameters that can vary at each simulation step. It was never implemented on the `C++`

-side, and has been superseded by version 0.0.4

In the non-time-varying case of version 0.0.2, we have scalar valued parameters. These parameters are constant in simulation time. We now consider parameters that can vary at any particular simulation step.

Consider a focal scalar-valued time-varying parameter \(u\). In the non-time-varying case this scalar value, \(u\), is an element of the parameter vector, \(\theta\).

\[ \theta = [..., u, ...] \]

In the time-varying case, there is a series of values, \(u_0, u_1, ..., u_n\) associated with an initial value \(u = u_0\) and the values, \(u_1, ..., u_n\), at \(n_u\) break-points, \(t_1, ..., t_{n_u}\). We store these additional values by expanding the parameter vector to make room for them.

\[ \theta = [..., u_0, u_1, ..., u_n, ...] \]

Each simulation step is indexed \(t = 1, ..., T\), where \(T\) is the number of simulation steps. If the time-varying parameter, \(u\), is required at time \(t\) to update an element of the rate-matrix, the value of \(u\) that is used is \(u_i\) such that \(t \in [t_i, t_{i+1})\).

We follow the `params_timevar`

argument to `run_sim`

in `McMasterPandemic`

, which allows the user to specify a data frame with the following columns.

- Date
- Symbol
- Value
- Type

The definition of these columns at the time of writing is given here.

The user must also specify the starting and ending calendar dates of the simulation.

An example model with time variation can be defined as follows.

```
options(MP_flex_spec_version = "0.0.3")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
flexmodel(
params, state,
start_date = '2021-08-26', end_date = "2021-09-26",
timevar_piece_wise = data.frame(
Date = c("2021-09-01", "2021-09-15", "2021-09-10", "2021-08-28"),
Symbol = c("mu", "beta0", "beta0", "mu"),
Value = c(0.5, 0.1, 2, 0.2),
Type = c("rel_prev", "rel_orig", "rel_prev", "rel_orig")
))
%>% add_rate("E", "Ia", ~ (alpha) * (sigma))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~
(1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~
(1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
(Ia) * (beta0) * (1/N) * (Ca) +
(Ip) * (beta0) * (1/N) * (Cp) +
(Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
(Is) * (beta0) * (1/N) * (Cs) * (1-iso_s))
%>% add_parallel_accumulators(c("X", "N", "P", "V"))
%>% update_tmb_indices()
)
```

Here we make a distinction between indices and time-indices. An index is a 1-based integer identifying a value within a vector. A time-index is a 1-based integer identifying the simulation step.

`tvi_spi`

: vector of indices for each time varying factor, identifying the elements of the`spi`

vector from version 0.0.1`n_breaks`

: vector of the number of time breaks for each time varying factor`t_breaks`

: vector of time-indices locating all time breaks for all time varying factors, organized such that time-indices associated with the same factor are grouped together and time-indices are all ascending

These indices and counts can be used to update the `spi`

vector at each simulation step. Here is an example in `R`

.

```
options(MP_flex_spec_version = "0.0.3")
tvi_spi = c(1L, 3L, 6L, 10L, 14L, 19L)
n_breaks = c(3L, 3L, 2L, 2L, 2L, 2L)
t_breaks = c(
2L, 6L, 17L, # factor 1 has 3 break-points
2L, 6L, 17L, # factor 2 has 3 break-points
15L, 20L, # factor 3 has 2 break-points
15L, 20L, # factor 4 has 2 break-points
15L, 20L, # factor 5 has 2 break-points
15L, 20L # factor 6 has 2 break-points
)
# number of time varying factors
n = length(tvi_spi)
# initialize a break point counter. each element of t_breaks_t
# is associated with a time varying factor. each time a break
# point is encountered, increment the corresponding element in
# this counter
t_breaks_t = rep(0, n)
# indices into the state param vector for all factors, including
# those that are not time varying
spi = c(30L, 27L, 30L, 27L, 3L, 15L, 33L, 18L, 4L, 15L, 33L, 19L, 5L,
15L, 33L, 20L, 36L, 6L, 15L, 33L, 21L, 36L)
print(spi)
```

`## [1] 30 27 30 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36`

```
# loop over time steps
for(t in 1:30) {
# loop over time-varying factors
for(i in 1:n) {
# only do something if factor i has a break-point at time t
break_now = t == t_breaks[1 + t_breaks_t[i] + sum(n_breaks[1:(i-1)])]
if(isTRUE(break_now)) {
# increment the break-point counter
t_breaks_t[i] = t_breaks_t[i] + 1
# increment the spi vector for the ith time varying factor
spi[tvi_spi[i]] = spi[tvi_spi[i]] + 1
}
}
print(spi)
# do the rest of what needs to be done in this simulation step
# (e.g. update the rate matrix, update the state vector)
# ...
}
```

```
## [1] 30 27 30 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 32 27 32 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
```

Each row in this output is a time step and each column is the `spi`

index for a particular factor. Notice that only six columns change through time, and these are associated with the six time-varying factors. Columns 1 and 3 are associated with the same time-varying parameter, and so they change at the same break-points. The same is true of columns 6, 10, 14, and 19, which are associated with another time-varying parameter.

Iterated updates of the state vector.

- No parameter varies throughout a simulation
- Simple \(dt = 1\) simulation (i.e. unit time step,
`do_exponential == FALSE`

, and`do_hazard == FALSE`

) - No stochasticity
`MP_badsum_action = 'ignore'`

- No updates optimization
- No model structure (e.g. vaccination, age-structure, testing)
- Assumptions 2-4 from 0.0.1

Let \(s\) be the state vector and \(M\) be the rate matrix. Define the flows matrix, \(F\), to have the same dimensions as \(M\). The elements of \(F\) are given by the following.

\[ F_{ij} = M_{ij} s_i \]

Let \(f_{\text{in}}\) and \(f_{\text{out}}\) be the inflow and outflow vectors, given by the column sums and row sums of \(F\) respectively. If the use specifies some states as being parallel accumulators, then the columns associated with those special states are removed from \(F\) before the row sums are taken to compute \(f_{\text{out}}\).

With these definitions in place the state vector update at each iteration is given by the following.

\[ s \to s - f_{\text{out}} + f_{\text{in}} \] Here is a summary of the stages of a single simulation step under this model.

- Update the rate matrix following version 0.0.1
- Compute the flow matrix
- Compute the inflow vector
- Compute the outflow vector, making sure to ignore the flow matrix columns associated with parallel accumulators
- Produce a new state by subtracting the outflows and adding the inflows
- Save every updated state vector

In addition to the interface elements specified in 0.0.1, the user needs to provide some additional information. We need to provide a character vector of regex patterns with which to find the indices of the parallel accumulators amongst the state names (e.g. X, N, P, V). For example, one way to provide such a vector to an existing model would be the following.

```
options(MP_flex_spec_version = "0.0.2")
model = add_parallel_accumulators(model, c("X", "N", "P", "V"))
```

The user must specify the number of state vector updates.

The following pieces of information need to be added to those described in Data Structure (0.0.1) in order to make state vector updates.

Modified copies of the following integer vectors described in 0.0.1:

`spi`

,`modifier`

,`from`

,`to`

,`count`

- These copies will only contain indices necessary for updating the rate matrix elements that could potentially change at each simulation step
- Currently this will include non-zero rate matrix elements that depend on at least one state variable (those that depend on parameters only will not need to be updated given that this version of the spec does not allow for time-varying parameters)

A vector,

`par_accum_indices`

of indices into the columns of the flows matrix, identifying the states that are parallel accumulatorsThe number of state vector updates to simulate

Update the non-zero elements of a rate matrix on the `TMB`

/`C++`

side using a restricted set of operations (complements, inverses, sums, and products). The update formula must be specified separately for each non-zero element.

- The state vector is not actually updated, making this a more-or-less useless spec (but it provides a good ‘warm-up’ and sanity check)
- A
`param_pansim`

and a`state_pansim`

object exists or can be constructed - If
`make_state`

is used to create the`state_pansim`

object,`vaxify`

,`ageify`

, and`testify`

are all set to`FALSE`

- There is no model structure (e.g. vaccination, age-structure, testing)

Any element, \(x\), of either the parameter or state vector can be used to define a *factor* in one of the following three forms.

- Identity: \(x\)
- Complement: \(1-x\)
- Inverse: \(1/x\)

We collect these user-defined factors into a factor vector, \(y\). Factors can be repeated in \(y\) if required. Any number of factors can be multiplied together using `*`

to produce a *product*. Any number of factors and products can be added together using `+`

.

There is a higher level nested structure associated with the factor vector, \(y\).

- All factors associated with the, \(i\)th, non-zero rate matrix element, \(M_{(i)}\), are grouped together in a contiguous block
- Within the \(i\)th block, all factors associated with the \(j\)th product (\(j = 1 ... n_i\)) in that block are grouped together in a contiguous sub-block
- Within the \(i,j\)th sub-block, all factors are given an index, \(k = 1 ... m_{ij}\)

With these definitions, the dependence of any non-zero rate matrix element on the parameters and state variables is given by the following expression.

\[ M_{(i)} = \sum_{j=1}^{n_i} \prod_{k=1}^{m_{ij}} y_{ijk} \]

where \(y_{ijk}\) is the \(k\)th factor associated with the \(j\)th product associated with the \(i\)th non-zero rate matrix element.

Users can define the structure of a rate matrix with a list of expressions, each determining the parameter-dependence and/or state-dependence of a single non-zero rate matrix element. The standard `McMasterPandemic`

introductory example model can be defined as follows.

```
options(MP_flex_spec_version = "0.0.1")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
flexmodel(params, state)
%>% add_rate("E", "Ia", ~ (alpha) * (sigma))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~
(1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~
(1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
(Ia) * (beta0) * (1/N) * (Ca) +
(Ip) * (beta0) * (1/N) * (Cp) +
(Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
(Is) * (beta0) * (1/N) * (Cs) * (1-iso_m))
%>% update_tmb_indices()
)
```

The first step, `flexmodel`

, in this pipeline initializes the compartmental model by specifying objects containing the parameters and state variables respectively.

Each subsequent call to `add_rate`

defines the dependence of a single non-zero rate matrix element on parameters and state variables. When used in a pipeline, the `add_rate`

function takes three arguments.

`from`

– character string describing the state associated with the row of the rate matrix`to`

– character string describing the state associated with the column of the rate matrix`formula`

– one-sided model formula defining the dependence based on symbols associated named parameter and state vectorsAny single parameter or state variable name,

`x`

, can be placed in parentheses to produce a*factor*in the following three ways- Identity:
`(x)`

- Complement:
`(1-x)`

- Inverse:
`(1/x)`

- Identity:
Factors can be multiplied together with the

`*`

operator to produce a*product*Products and factors can be added together with the

`+`

operator to produce a non-zero rate matrix element

The final step of the pipeline adds the indices to be passed to `TMB`

/`C++`

that are described in the Data Structure section below.

The object created by the pipeline in the User Interface section above must contain at a minimum the following seven objects.

The initial value of a

`state_pansim`

object representing the vector,`state`

, containing the state variablesThe initial value of a

`param_pansim`

object representing the vector,`params`

, containing the parametersAn integer vector,

`spi`

, of 1-based indices into a concatenation,`state_param`

, of`state`

and`param`

- The ordering of the elements of
`state_param`

is arbitrary because several integer vectors defined below contain indices into these elements for book-keeping purposes on the`TMB`

/`C++`

side - However, typically it will make sense to at least keep the state variables together and the parameters together
- On the
`R`

side,`spi`

has the property that`state_param[spi]`

returns a vector of state variables and parameters in the order in which they occur in the factor vector defined above in the Mathematical Description section - Note that
`state_param[spi]`

does not return the factor vector itself – to get the factor vector from`state_param[spi]`

you would need to modify some of the elements by taking their complements and inverses

- The ordering of the elements of
An integer vector,

`modifier`

, of the same length as`spi`

Each element of

`modifier`

corresponds to a factor, as defined above in the Mathematical DescriptionThese elements bit-wise encode several pieces of information

- What elements of
`state_param[spi]`

need to transformed by complements or inverses - Whether complements or inverses should be used for those elements that need to be transformed
- What elements of
`state_param[spi]`

need to be multiplied together to take products

- What elements of
The binary expansion of these elements can each be represented with three bits, with each bit encoding different information

- The left-most bit is
`1`

if the corresponding factor is the first in a product that needs to be added to a previous product, and`0`

otherwise - The middle bit is
`1`

if the corresponding factor requires a complement and`0`

otherwise - The right-most bit is
`1`

if the corresponding factor requires an inverse and`0`

otherwise

- The left-most bit is

An integer vector,

`from`

, of 1-based indices, the \(i\)th element of which points to the row of the \(i\)th non-zero rate matrix elementAn integer vector,

`to`

, of 1-based indices, the \(i\)th element of which points to the column of the \(i\)th non-zero rate matrix elementAn integer vector,

`count`

, the \(i\)th element of which gives the number of factors used to calculate the \(i\)th non-zero rate matrix element- In terms of the Mathematical Description above, the elements of
`count`

store these sums \(\sum_{j=1}^{n_i}m_{ij}\)

- In terms of the Mathematical Description above, the elements of