McMasterPandemic is a modelling tool that was rapidly developed to provide timely insights into the Covid-19 Pandemic in Canada. We are currently refactoring this tool so that it is faster and more general. This refactoring will be done incrementally by implementing a series of versioned specifications that define each step. This document outlines these specifications, and provides a roadmap for planning future specifications.

Version Descriptions

v0.2.2

New Capabilities (0.2.2)

Log-linear time-variation of parameters. This feature generalizes the piece-wise time-variation of parameters to allow for the following.

Process error
Semi-parametric time-variation estimation

Mathematical Description (0.2.2)

This is a

Such parameters have their own parameters that determine a model of time-variation. These latter parameters get transformed to produce a time series of parameter values of length equal to the number of time steps. One simple thing to do would be to put these time series into the parameter vector, and when parameters are accessed use the index spi + t where spi is the index into the state_param vector that picks out the first parameter in the time series and t is the current time step.

\[ \log(u) = \log(u_0) + Xv \] where $u$ is a vector of length $T$, the number of time steps, $u_0$ is a scalar parameter, $v$ is a vector of length $D < T$ that parameterizes the model of time variation, and $X$ is a $T$-by-$D$ matrix of constant columns (e.g. indicator variables, spline bases). Note that logs of vectors are taken element-wise.

\[ \log(u_t) = \log(u_0) + \sum_{i=1}^Dx_{ti}v_i \] \[ u_t = u_0 \prod_{i=1}^D\exp\left(x_{ti}v_i\right) \] For example:

$D=2$ breakpoints
$v_1 = v_2 = \log\left(\frac{1}{2}\right) \approx -0.69$
$x_{ti} = \begin{cases} 1, & t \ge t_i \\ 0, & \text{otherwise} \end{cases}$

In this case we have the following time variation of $u$. \[ u_t = \begin{cases} u_0, & t < t_1 \\ \frac{u_0}{2}, & t_1 \le t < t_2 \\ \frac{u_0}{4}, & t \ge t_2 \end{cases} \]

v0.2.1

New Capabilities (0.2.1)

Lag-n differencing with variable lag. This will allow us to difference between time points with observed data, which is necessary when the frequency of data collection changes and other cases with uneven time-series.

Mathematical Description (0.2.1)

One way to frame this problem is to convert the scalar integer, $n$, into a vector of length equal to the number of simuluation iterations. Let $n_t$ be the differencing lag at time $t$, and $x_t$ and $y_t$ be the input and output variables of the differencing at this time.

\[ y_t = x_t - x_{t-n_t} \]

If $n_t = n$ for all $t$, then we have the special case of lag-n differencing in spec version 0.2.0.

For all times where $n_t = 0$, we will get $y_t = 0$ as well. This corresponds for example to delays in the reporting of incidence, where there are periods of several days over which we do not have any new reports – typically not because there are no new cases but because they have not yet been reported.

Consider a case where the frequency of reporting changes from $n = 1$ to $n = 7$ (daily to weekly) at a particular time, $\tau$. The following example takes $\tau = 101$

##      t n   x   y
## 1   98 1 295  NA
## 2   99 1 300   5
## 3  100 1 310  10
## 4  101 0 320   0
## 5  102 0 330   0
## 6  103 0 340   0
## 7  104 0 350   0
## 8  105 0 360   0
## 9  106 0 370   0
## 10 107 7 400  90
## 11 108 0 410   0
## 12 109 0 420   0
## 13 110 0 430   0
## 14 111 0 440   0
## 15 112 0 450   0
## 16 113 0 460   0
## 17 114 0 500 100

Data Structure (0.2.1)

lag_diff_sri remains unchanged from previous versions.

lag_diff_delay_n is changed from a vector to a matrix, with rows associated with the elements of lag_diff_sri one column for each iterations.

v0.2.0

New Capabilities (0.2.0)

Condensation and calibration on the C++ side.

Mathematical Description (0.2.0)

Condensation

Condensation is the process of summarizing the concatenated_state_vector so that it can be compared with available data. This comparison is done quantitatively through a negative log-likelihood function that measures how different the condensed concatenated_state_vector is from the data. The user can add any of the following variables to the condensed dataset.

Any state variable – already computed in state and stored in concatenated_state_vector
Element-wise sum of any set of state variables – already computed in sp but not stored
Any time-varying rate matrix element – already computed in ratemat and stored in concatenated_ratemat_nonzeros
Element-wise product of any two variables of type 1-3 – not computed
Element-wise sum of any variable of type 4 – not computed
Lag-n differences of any variables of type 1-5 – not computed
Convolutions of any variables of type 1-5 with a gamma-density kernel (see section below on computing convolutions) – not computed

The following are computed in classic MacPan.

All the S boxes
The sum over all the S boxes
The sum over all the E boxes
The sum over all the I boxes
The sum over all the H and H2 boxes
The sum over all the ICUs and ICUd boxes
The sum over all R boxes
The lag-1 difference of the total of all X boxes (with NA at the beginning) – called hosp
The sum over all X boxes
The lag-1 difference of the total of all D boxes (with NA at the beginning) – called death
The sum over all D boxes
All the foi columns
Incidence:
- compute foi times S box for each epi category
- compute row sums over the resulting columns
- compute the convolution (see section below on computing convolutions) of all of these columns (including the row sums column) – called report

Computing Convolutions

The convolution is based on a gamma-density kernel.

The shape parameter of this gamma density depends on params['c_delay_cv']. \[ \text{shape} = \frac{1}{\text{c_delay_cv}^2} \] The scale parameter depends on params['c_delay_mean'] and the shape \[ \text{scale} = \text{(c_delay_mean)} \text{(c_delay_cv)}^2 \] Compute the vector $\gamma$ using the TMB pgamma function, which takes a vector q and scalar shape and scale arguments. \[ \gamma = \text{pgamma}(q, \text{shape}, \text{scale}) \]

where $q$ is a constant integer vector that gets created in the following way in R.

qmax = ceiling(qgamma(0.95, shape, scale))
q = 1:qmax

We will avoid computing qmax in C++ because it could introduce a discontinuity that could cause automatic differentiation to fail. This difference between refactored and classic MacPan only has the potential to cause numerical differences whenever params['c_delay_cv'] and params['c_delay_mean'] are varying – typically because they are in opt_pars – but I have not yet seen any examples where this is the case. To guarantee matched results at least for the initial parameter vector we will set qmax as follows on the R-side and pass it through a DATA_ macro.

qmax = ceiling(qgamma(0.95, 1/params[["c_delay_cv"]]^2, params[["c_delay_mean"]] * params[["c_delay_cv"]]^2))`

The delay kernel for the convolution is a numeric vector given by the following expression. \[ \kappa_i = (\text{c_prop})\frac{\delta_i}{\sum_j\delta_j} \] where $\delta_j = \gamma_{j+1} - \gamma_j$ and $\text{c_prop}$ is a scalar in the parameter vector.

The convolution, $z$, of the time series of any state variable, $x$, is given by the following. \[ z_{i+m-1} = \sum_{j = 1}^m \kappa_{m-j+1} x_{i+j-1} \] where $i = 1,...,n-m$, $m$ is the length of $\kappa$, and $n$ is the number of time steps (length of the input vector $x$). Note that the first $m-1$ elements of $z$ remain undefined and so they do not need to be stored on the C++-side because these $m-1$ missing values can be added on the R-side.

Condensation Algorithm

In Progress: in collaboration with Weiguang

Condensation is the process of summarizing the history of the simulations. Before solving the condensation problem we should refactor how this history is currently summarized. Currently the concatenated_state_vector and concatenated_ratemat_nonzeros contain some history information, but it will be awkward to continue with this strategy. Instead we will introduce a matrix called simulation_history with rows corresponding to simulation steps and columns corresponding to the following (in the following order).

State variables
Changing rate matrix elements (there is one such element in this model)
Sums
Factrs
Element-wise expressions involving the first four items
Lag-n differences of the first five items
Convolutions of the first five items

Items 1-4 are computed at simulation time, and simply need to be added to the simulation_history matrix at each iteration.

Items 5-7 should be computed at the end of the simulation. These columns should be computed in the same order as they will appear in the simulation_history matrix. The input for these computations (5-7) will be stored in the following new data structures.

Item 5 – element-wise expressions:
- sr_count – vector the same length as the number of columns added in step 3a giving the number of simulation_history columns to use as input for each new column (note that input columns can be used more than once – see count vector for a similar structure)
- sri – indices into the columns of simulation_history to use as input (see spi vector for a similar structure)
- sr_modifier – bitwise integer encoding of the parse tree (see modifier for a similar data structure)
Item 6 – lag-n differences:
- lag_diff_sri – indices into the columns of simulation_history to use as input to lag-n difference operations
- lag_diff_delay_n – vector same length as lag_diff_sri containing the delay (i.e. the “n” in lag-n) of the lag for each lag-n difference operation
Item 7 – convolutions:
- conv_sri – indices into the columns of simulation_history to use as input to convolution operations
- conv_c_prop_idx, conv_c_delay_cv_idx, conv_c_mean_cv_idx – three vectors each the same length as conv_sri giving indices into params for extracting the parameters of the convolutions
- conv_qmax – vector the same length as conv_sri giving the maximum integer in the q vector for each convolution

Add simulation_history to the report. On the R-side we could start using simulation_history where we currently use concatenated_* vectors, and eventually we could drop them.

Negative Binomial Negative Log Likelihood

We compare the negative binomial negative log likelihood with observed_vector as data and condensed_state_vector as predictions. Apologies that the word ‘negative’ is being used in two different ways – the first refers to the negative binomial distribution, and the second to the fact that we are multiplying the log likelihood by negative one. In R we would write this as follows.

-1 * dnbinom(observed_vector, mu = condensed_state_vector, size = nb_disp, log = TRUE)

In TMB, this negative binomial density function should be used for this (I think … how does size on the R-side correspond to var in TMB?).

The first three arguments to this function are vectors of the same size.

observed_vector is an observed data stream that is provided by the user
condensed_state_vector is described in detail above
nb_disp is a vector of dispersion parameters, which is composed of values passed in through PARAMETER macros (see below for a description of how nb_disp is constructed)

The sum of these negative log likelihood values gives a loss function to be returned to the user.

TODO: how to handle clamping when we leave the support of the density? concerned about discontinuities.

Dispersion Parameters

We require an additional set of parameters to model negative binomial dispersion, nb_disp. The user has two options.

one dispersion parameter to be used in every element of nb_disp
one dispersion parameter is passed in for every variable in condensed_state_vector

Optional Prior Distribution Penalties

The user is allowed to specify a prior distribution for any element of params, tv_mult, or nb_disp. Initially only univariate normal prior distributions will be allowed, but this should be built to be extensible allowing for other distributions to be added in the future.

For each parameter with a prior, the user specifies the distribution (initially normal will be the only choice), and a parameter vector for the arguments of the corresponding density function. At each evaluation of the optimizer the log of the prior densities are subtracted from the negative log-likelihood function.

The TMB normal density function should be used. We should be extensible to allow future developers to add other distributions from here.

Optional Inverse Parameter Transformations

The user is allowed to express any value of params or tv_mult on one of several transformed scales. The corresponding inverse transformation is applied immediately as the parameter is read through the appropriate macro on the TMB-side. The user should be allowed to specify one of the following transformations.

log10
log
logit

The corresponding inverse transformations that would be applied after the macros are the following.

10^x
exp(x)
1/(1+exp(-x))

General Objective Function

Let $l_{\psi_i}\left(x_i(\theta); y_i\right)$ be a loss function where $x_i(\theta)$ is a simulated element of the history matrix, $y_i$ is an associated observed data point, and $\psi_i$ is a vector of loss function parameters. Note that we write the simulation values as functions of $\theta$ to indicate their dependence on parameters. Every element $i$ that belongs to the same column of the history matrix has the same value for $\psi_i$, but to simplify notation $\psi$ is sub-scripted by $i$. Let $r_{\phi_j}(\theta_j)$ be an optional regularization function for each parameter $\theta_j$ being optimized, where $\phi_j$ is a vector of regularization function parameters associated with parameter $j$. The objective function is then given by the following.

\[ L_{\psi, \phi}(\theta; y) = \sum_i l_{\psi_i}\left(x_i(\theta); y_i\right) + \sum_j r_{\phi_j}(\theta_j) \] Currently the only objective function that is offered is the negative log negative binomial density, but the infrastructure we build assumes that other loss functions will be allowed in the future. Similarly the only regularizing function that is available is the negative log normal density.

Each parameter, $\theta_j$, can be optionally passed into the objective function on a transformed scale. These transformations allow users to enforce limits on the domain of the parameter space (e.g. a logit transformation will keep parameter values between 0 and 1). The regularization functions are applied on the transformed scale, but the elements of the rate matrix (and therefore the simulations, $x$) are computed using the parameters on their original scale. Therefore, we have the following ordering of operations.

Parameter, $\theta_j$, is passed in to the objective function on the transformed scale, $f(\theta_j)$
The regularization function, $r_{\phi_j}(\theta_j)$, is computed and saved
The inverse transformation is applied to the parameter, $\theta_j$, and the result is used to overwrite the previous value in the c++ params vector
The simulations are made
The objective function, $L$, is computed, and this computation includes adding the saved values of the regularization functions
The objective function, $L$, is returned by the c++ function objective_function

Available Loss Functions

Negative log negative binomial density
- Loss function parameters: dispersion
- In TMB use dnbinom2 to compute the log negative binomial density
- The arguments for dnbinom2 can be computed as follows:
  - x – observed data
  - mu – simulated data
  - var – mu + mu^2 / disp, where disp is the dispersion parameter
  - give_log – 1
- Lower bound on the simulated values
  - Before passing the simulated data to the mu argument, the user may optionally choose to smoothly constrain the simulated data to be greater than a lower bound, eps, by applying the function mu_constrained = mu + eps*exp(-mu/eps)

Available Regularization Functions

No regularization
- Name to use in C++: flat
- Regularization function parameters: initial value ($\theta_\text{init}$)
- $r_{\theta_\text{init}}(\theta) = 0$
- The fact that the regularization parameter, $\theta_\text{init}$, has no effect on the regularization function is an unusual special case. When regularization is applied, the ‘location’ of the regularization function is used as the initial value for the optimization. In the case of no regularization, we are using the regularization parameters to contain the initial value. One interpretation of this regularization parameter is as the ‘peak’ of a flat prior.
Negative log of the normal density
- Name to use in C++: normal
- Regularization function parameters: mean ($\bar{\theta}$), standard deviation ($\sigma_\theta$)
- $r_{\mu_f, \sigma_f}(\theta) = -\log\left[\mathcal{N}(f(\theta), \mu_f, \sigma_f)\right]$, where $\mathcal{N}$ is the normal density
- In TMB the dnorm function should be used with the following arguments
  - x – $f(\theta)$ – value of the parameter on the transformed scale
  - mean – $\mu_{f(\theta)}$ – first regularization parameter
  - sd – $\sigma_{f(\theta)}$ – second regularization parameter
  - give_log ` – 1

The first regularization parameter for every regularization function is used as the starting value for the optimizer on the scale of the transformed parameter.

Available Parameter Transformations

identity
- index = 1
- trans = $f(x) = x$
- inverse = $f^{-1}(x) = x$
log
- index = 2
- trans = $f(x) = \log(x)$
- inverse = $f^{-1}(x) = e^x$
log10
- index = 3
- trans = $f(x) = log_{10}(x)$
- inverse = $f^{-1}(x) = 10^x$
logit
- index = 4
- trans = $f(x) = \log\left[\frac{x}{1-x}\right]$
- inverse = $f^{-1}(x) = \frac{1}{1 + e^{-x}}$
cloglog
- index = 5
- trans = $f(x) = \log\left[-\log(1-x)\right]$
- inverse = $f^{-1}(x) = 1 - \exp\left[-\exp(x)\right]$
inverse
- index = 6
- trans = $f(x) = \frac{1}{x}$
- inverse = $f^{-1}(x) = \frac{1}{x}$

Data Structure (0.2.0)

General Objective Function

We will pass the following additional data through to TMB to specify the objective function.

Variables: the following three integer vectors have one element per variable in the objective function and identify what loss function is used for each variable, as well as how many parameters those loss functions require
- obs_var_id – id giving the column in the history matrix associated with each variable in the objective function
- obs_loss_id – identifier of the loss function being used for each variable (currently only one – negative log negative binomial density)
- obs_loss_param_count – number of loss function parameters associated with each variable
Loss Function Parameters: the following vector has one element per loss function parameter (these are the $\psi_i$ parameters described above in the general objective function section)
- obs_spi_loss_param – index into sp containing the loss function parameters. the number of such parameters is sum(obs_loss_param_count)
Observed Data the following three vectors have one element for each observed data point, and identify what elements in the history matrix should be compared with each observed value (each element of these vectors corresponds to each $i$ in the description of the general objective function above)
- obs_time_step – row in the history matrix
- obs_history_col_id – column in the history matrix (match with obs_var_id above to determine the obs_loss_id and obs_loss_param_count)
- obs_value – value observed in data to compare with the associated element in the history matrix
- obs_do_sim_constraint – Boolean determining if the lower bound in obs_sim_lower_bound should be applied to the simulated data with smooth constraint function mu_constrained = mu + eps*exp(-mu/eps)
- obs_sim_lower_bound – lower bound applied if obs_do_sim_constraint is true
Optimized Parameters: the following vectors have one element per parameter that is being optimized
- opt_param_id – indices into the params vector giving the parameters to be optimized
- opt_trans_id – index identifying a transformation – see above for correspondence between indices and transformations
- opt_count_reg_params – number of regularization parameters associated with each parameter to be optimized
- opt_reg_family_id – index identifying a family of regularization functions – currently only two are available (flat, normal)
Optimized Time-Varying Parameters (not yet in scope): similar to optimized parameters but for tv_mult
- opt_tv_param_id – indices into the tv_mult vector giving the time varying multipliers to be optimized
- opt_tv_trans_id – index identifying a transformation – see above for correspondence between indices and transformations
- opt_tv_count_reg_params – number of regularization parameters associated with each parameter to be optimized
- opt_tv_reg_family_id – index identifying a family of regularization functions – currently only two are available (flat, normal)
Regularization Parameters:
- opt_reg_params – regularization parameters – the size of this vector is opt_count_reg_params.sum()
- opt_tv_reg_params – regularization parameters for tv_mult – the size of this vector is opt_tv_count_reg_params.sum()

Interface (0.2.0)

{opt-trans}_{parameter} ~ {reg-trans}_{baseline-prior-family}({hyperparameter1}, ...)

{opt-trans}_{parameter} ~ {reg-trans}_{time-varying-prior-family}({hyperparameter-vector1}, ...)

v0.1.2

New Capabilities (0.1.2)

Save common factors when building rate expressions

Mathematical Description (0.1.2)

Expressions involving sums and products of state variables and parameters, as well as their complements and inverses, can be saved in a vector, $\phi$. The elements of $\phi$ can then be used in expressions that define elements of the rate matrix.

Data Structure (0.1.2)

Four new integer vectors:

factr_spi – 1-based index into the elements of the sp vector that will contain the common factor
factr_count – vector the same length as factr_spi containing the number of elements in sp used to compute the common factor
factr_spi_compute – vector with sum(factr_count) elements giving the indices into the elements of the sp vector for extracting the elements used to compute the new factor
factr_modifier – vector with sum(factr_count) elements describing the operations to be performed for generating the common factor (see v0.0.1 for details on modifier vectors)

v0.1.1

New Capabilities (0.1.1)

This is a very big ‘patch’. The motivation for it is to get a first release that is potentially useful to PHAC. This requires three enhancements.

Generalization of the definition of the outflow vector
Construct the state vector from the parameter vector at the beginning of each simulation run
Move tv_mult to the parameters so that we can calibrate them if we need to do so

Mathematical Description (0.1.1)

Generalization of Outflow Vector

Let $f_i^{(out)}$ be the $i$th element of the outflow vector, which is computed as follows. \[ f_i^{(out)} = \sum_{j \in \Omega_i}F_{ij} \]

Where $\Omega_i$ is a subset of the indices into the state vector. Typically there are only a small number of unique index sets, with one $\Omega_i$ for many values of $i$ – for this reason the data structure for these indices only need to track each unique index set.

We can express the previous definition of the outflow vector (TODO: link to previous definition) in the terms of this generalization as a constant $\Omega = \Omega_i, \forall i$.

Make State Vector

In previous spec versions, the C++ side needs to get the initial state vector directly from R. The difficulty with this strategy is that the initial state vector depends on the parameter vector, so when a new parameter vector gets updated in an optimization or ensemble run it becomes necessary to drop back to R causing a substantial performance bottleneck. So here we add the ability to create the initial state vector from the parameter vector in C++.

What follows is a detailed description of the procedure, which makes reference to new items in the data structure (see Data Structure below).

We first check to see if a state vector has been passed – if not, do the following steps to create a new state vector
Initialize two state vectors with all zeros
- state – initializing state vector for the full non-linearized model
- lin_state – initializing state vector for the linearized model
Make a copy (lin_params) of the parameter vector (params)
Replace some user-specified elements of lin_params with user supplied constant values (lin_param_vals, lin_param_count, lin_param_idx – these can be found in model$tmb_indices$linearized_params)
Replace some user-specified elements of lin_state with elements of lin_params (df_state_par_idx, df_state_count, df_state_idx – these can be found in model$tmb_indices$disease_free)
Compute the Jacobian matrix (jac), which requires the following steps in a function that takes lin_state and returns an updated lin_state
- Create a rate matrix (lin_ratemat) from lin_state and lin_params using the existing make_ratemat_indices
- Create a flow matrix (lin_flow) from lin_state and lin_ratemat
- Create inflow vector (lin_inflow) from lin_flow by taking column sums
- Create outflow vector (lin_outflow) from lin_flow by the method above – Generalization of Outflow Vector (linearized_outflow_row_count, linearized_outflow_col_count, linearized_outflow_rows, linearized_outflow_cols)
- Update lin_state using lin_inflow and lin_outflow
Remove certain user-defined rows, columns, and elements from jac and lin_state to create a new vector eigvec (all_drop_eigen_idx – can be found in model$tmb_indices$initialization_mapping)
Compute the dominant eigenvector, eigvec of jac
Remove elements of eigvec to create eig_infected (eigen_drop_infected_idx – can be found in model$tmb_indices$initialization_mapping)
Normalize eig_infected to have elements that sum to one
Update state[all_to_infected_idx] = eig_infected * param[infected_idx] (infected_idx can be found in model$tmb_indices$initial_population)
Update state[susceptible_idx] = (1/susceptible_idx.size()) * (param[total_idx] - param[infected_idx]) (infected_idx and total_idx can be found in model$tmb_indices$initial_population)

Data Structure (0.1.1)

Generalization of Outflow Vector

Objects of class flexmodel contains a new element called tmb_indices$outflow, with the following elements

row_count –
col_count –
rows –
cols –

These indices encode information in the $\omega_i$ index sets (see Mathematical Description above).

This outflow item means that par_accum_indices is no longer required. Given that we do not need to worry about back compatibility yet, it will be dropped from this and all future spec versions.

Make State Vector

Objects of class flexmodel contains the following new elements.

linearized_params – indices to update some elements of lin_param before computing the Jacobian of the linearized model
- lin_param_vals – values used to fill the elements of lin_param
- lin_param_count – the number of elements of lin_param that will be updated with each value in lin_param_vals
- lin_param_idx – indices of lin_param to update
disease_free – indices to initialize lin_state to be nearly disease-free, by setting some of its elements equal to some elements of lin_param
- df_state_par_idx – indices of lin_param used to fill the elements of lin_state
- df_state_count – the number of elements of lin_state that will be updated with each parameter identified by df_state_par_ix
- df_state_idx – indices of lin_state to update
linearized_outflow – same as outflow but for the linearized model
initialization_mapping – indices into various subsets of the state vector and eigenvector that could be useful for initializing the state – see justification below this list
- all_to_eigen_idx – indices into state and lin_state for identifying states in eigvec
- all_to_infected_idx – indices into state and lin_state for identifying states in eig_infected
- eigen_to_infected_idx – indices into eigvec for identifying states in eig_infected
- all_drop_eigen_idx – indices into state and lin_state for identifying states not in eigvec
- all_drop_infected_idx – indices into state and lin_state for identifying states not in eig_infected
- eigen_drop_infected_idx – indices into eigvec for identifying states not in eig_infected
- susceptible_idx – indices into state and lin_state for identifying states associated with the initial susceptible population
initial_population – indices into the parameter vector for identifying the total number of individuals in the population and the initial total number of infected individuals in the population
- total_idx
- infected_idx

The justification for the initialization_mapping is given here. One potential source of confusion in the above process is the need to keep track of the different subsets of states. To help clarify this point, there are conceptually three types of state vectors containing nested subsets of states. 1. all_states – vectors containing all states 2. eigen_states – vectors containing all states of the linearized system that are included in the dominant eigenvector 3. infected_states – vectors containing all infected states, which are eigen_states that will be initialized using corresponding elements in the eigenvector

To make life a little easier for development on the C++ side we provide index vectors for converting between the three nested sets of states. 1. all_to_eigen_idx 2. all_to_infected_idx 3. eigen_to_infected_idx

And for completeness, in case it is useful, we also provide index vectors for dropping states from each kind of state vector. 1. all_drop_eigen_idx 2. all_drop_infected_idx 3. eigen_drop_infected_idx

v0.1.0

New Capabilities (0.1.0)

We bump major versions here to start working on model structure (e.g. vaccination, age, testing). We make two enhancements:

Introduce a new struc object class that allows for symbolic manipulation of matrices and vectors of parameters and state variables when defining rate matrix elements
The ability to refer to sums of state variables and parameters when defining expressions for rate matrix elements.

Mathematical Description (0.1.0)

struc objects

Sums of state variables and parameters

Within each simulation iteration, a user-defined number of sums of state variables and parameters are computed and stored immediately before the rate matrix is updated.

User Interface (0.1.0)

struc objects

As an example, the force of infection under the two-dose vaccination model can be given by the following struc objects.

Start by getting a set of vaccination parameters and state variables.

Create a symbolic vector of I-state variables, with 4 base states by 5 vaccination categories = 20 variables.

base_params <- read_params("PHAC.csv")
vax_params <- expand_params_vax(
  params = base_params,
  model_type = "twodose"
)
Istate = (McMasterPandemic:::expand_names(
  c('Ia', 'Ip', 'Im', 'Is'),   # = Icats
  attr(vax_params, 'vax_cat')) # = vax_cats
  %>% struc
)
as.matrix(Istate)

##       [,1]              
##  [1,] "(Ia_unvax)"      
##  [2,] "(Ip_unvax)"      
##  [3,] "(Im_unvax)"      
##  [4,] "(Is_unvax)"      
##  [5,] "(Ia_vaxdose1)"   
##  [6,] "(Ip_vaxdose1)"   
##  [7,] "(Im_vaxdose1)"   
##  [8,] "(Is_vaxdose1)"   
##  [9,] "(Ia_vaxprotect1)"
## [10,] "(Ip_vaxprotect1)"
## [11,] "(Im_vaxprotect1)"
## [12,] "(Is_vaxprotect1)"
## [13,] "(Ia_vaxdose2)"   
## [14,] "(Ip_vaxdose2)"   
## [15,] "(Im_vaxdose2)"   
## [16,] "(Is_vaxdose2)"   
## [17,] "(Ia_vaxprotect2)"
## [18,] "(Ip_vaxprotect2)"
## [19,] "(Im_vaxprotect2)"
## [20,] "(Is_vaxprotect2)"

Create a vector of base transmission rates associated with each of the 4 base I-states.

baseline_trans_rates =
  struc(
  'Ca',
  'Cp',
  '(1 - iso_m) * (Cm)',
  '(1 - iso_s) * (Cs)') *
  struc('(beta0) * (1/N)')
as.matrix(baseline_trans_rates)

##      [,1]                                  
## [1,] "(Ca) * (beta0) * (1/N)"              
## [2,] "(Cp) * (beta0) * (1/N)"              
## [3,] "(1 - iso_m) * (Cm) * (beta0) * (1/N)"
## [4,] "(1 - iso_s) * (Cs) * (beta0) * (1/N)"

Create a 5-by-5 block matrix of vaccine efficacies

vax_trans_red = struc_block(struc(
  '1',
  '1',
  '(1 - vax_efficacy_dose1)',
  '(1 - vax_efficacy_dose1)',
  '(1 - vax_efficacy_dose2)'),
  row_times = 1, col_times = 5)
as.matrix(vax_trans_red)

##      [,1]                       [,2]                      
## [1,] "(1)"                      "(1)"                     
## [2,] "(1)"                      "(1)"                     
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
##      [,3]                       [,4]                      
## [1,] "(1)"                      "(1)"                     
## [2,] "(1)"                      "(1)"                     
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
##      [,5]                      
## [1,] "(1)"                     
## [2,] "(1)"                     
## [3,] "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)"

And finally the force of infection vector, which looks like we really need to introduce the capability for common factors.

foi = kronecker(vax_trans_red, t(baseline_trans_rates)) %*% Istate
foi

## struc object with 5 rows and 1 column:
## 
## Row 1 
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
##  (Cp) * (beta0) * (1/N) * (Ip_unvax) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 2 
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
##  (Cp) * (beta0) * (1/N) * (Ip_unvax) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
##  (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
##  (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
##  (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
##  (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 3 
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 4 
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
##  (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
##  (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 5 
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
##  (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
##  (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
##  (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
##  (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
##  (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
##  (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
##  (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
##  (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
##  (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
##  (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
##  (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
##  (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
##  (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
##  (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
##  (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
##  (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
##  (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
##  (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
##  (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)

Sums of state variables and parameters

Start with a basic example model.

params <- read_params("ICU1.csv")
state <- make_state(params = params)
M <- McMasterPandemic::make_ratemat(state, params, sparse = TRUE)
test_model <- flexmodel(
    params, state,
    start_date = "2021-09-09", end_date = "2021-10-09"
)

Then we define sums of state variables and parameter vectors. Here we add the sum of the I states and of the ICU states using regular expressions and just a list of state names.

test_model = (test_model
  %>% add_state_param_sum(sum = "Isum", summands = "^I[a-z]")
  %>% add_state_param_sum(sum = "ICUsum", summands = c("ICUs", "ICUd"))
)

Then we can just proceed as we normally would, but with the option to refer to these sums (this is not meant to reflect real epidemiology, but to illustrate the interface).

options(MP_flex_spec_version = "0.0.2")
test_model = (test_model
  %>% add_rate("E", "Ia", ~ (alpha) * (sigma) * (1 / Isum))
  %>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
  %>% add_rate("Ia", "R", ~ (gamma_a))
  %>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
  %>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
  %>% add_rate("Im", "R", ~ (gamma_m))
  %>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("Is", "ICUs", ~ (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
  %>% add_rate("Is", "ICUd", ~ (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
  %>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
  %>% add_rate("ICUs", "H2", ~ (psi1))
  %>% add_rate("ICUd", "D", ~ (psi2))
  %>% add_rate("H2", "R", ~ (psi3))
  %>% add_rate("H", "R", ~ (rho) / (1 / ICUsum))
  %>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("S", "E", ~
                 (Ia) * (beta0) * (1 / N) * (Ca) +
                 (Ip) * (beta0) * (1 / N) * (Cp) +
                 (Im) * (beta0) * (1 / N) * (Cm) * (1 - iso_m) +
                 (Is) * (beta0) * (1 / N) * (Cs) * (1 - iso_s))
  %>% add_parallel_accumulators(c("X", "N", "P", "V"))
  %>% update_tmb_indices()
)

And here are the indices that are relevant to the tracking of sums of state variables and parameters.

test_model$tmb_indices$sum_indices

## $sumidx
## integer(0)
## 
## $sumcount
## integer(0)
## 
## $summandidx
## integer(0)

We describe these below.

Data Structure (0.1.0)

struc objects

Sums of state variables and parameters

We introduce three new index/count vectors.

sumidx – indices, locating the variable to store each sum, into the state_param vector defined in version 0.0.1
sumcount – vector containing the number of summands required for each sum
summandidx – indices, locating the summands, into the state_param vector

v0.0.6

New Capabilities (0.0.6)

Updates of time-varying parameters from the initial parameters on the C++ side. This is really a patch of 0.0.4, which didn’t work for calibration because it required updating the objective function (or its environment?) at every iteration of the optimizer. It is just easier to do it on the C++ and we need to do this anyways.

v0.0.5

New Capabilities (0.0.5)

Hazard simulation
Returning the time-varying rate matrix elements from C++ to R so that they can be included in the results of run_sim

Mathematical Description (0.0.5)

Define the sums of the rows of the rate matrix. \[ r_i = \sum_{j=1}^n M_{ij} \] Define the elements of a vector of exponentiated row sums \[ \rho_i = \exp(-r_i) \]

Define the elements of a normalized state vector. \[ \tilde{s}_i = \begin{cases} 0 & r_i = 0 \\ \frac{s_i}{r_i} & \text{otherwise} \end{cases} \] With these definitions we can define the modified flow matrix. \[ F_{ij} = \begin{cases} M_{ij}\tilde{s}_i(1-\rho_i) & i \ne j \\ 0 & \text{otherwise} \end{cases} \] This modified flow matrix can now be used in the same way as the unmodified flow matrix to produce state variable updates following spec version 0.0.2.

User Interface (0.0.5)

A Boolean argument needs to be added to the interface, indicating whether or not the hazard-based flow matrix is used rather than the simple flow matrix.

Data Structure (0.0.5)

The data structure is equivalent to 0.0.4.

v0.0.4

We begin with simple piece-wise constant time-variation. Piece-wise constant means that each time-varying parameter is associated with a sequence of break-points at which the value of the parameter changes – at all other times the parameter is constant. We plan to relax the piece-wise constant restriction in later versions.

New Capabilities (0.0.4)

Patch of the spec from 0.0.3 on piece-wise constant time-varying parameters.

Data Structure (0.0.4)

update_indices (removed): a second set of from to, etc for need-to-update elements proposed in spec 0.0.2 is removed in this proposal because the newly introduced upateidx provides the information needed.
updateidx (added): a vector of indices into vectors from, to, and count of those elements in the rate matrix that need to be updated. It includes the indices of the elements that depend on either the state vector (0.0.2), or time-varying parameters (0.0.3 or 0.0.4), or both. At each simulation step, we will update only those elements specified by updateidx. The drawback of this design is that elements that only depends on piece-wise parameters (0.0.3) are updated at each step although they only need to be updated at certain breaks.
breaks (added): vector of all the breaks.
*count_of_tv_at_breaks (added): vector of number of time-varying parameters that change at each break.
tv_spi (added): vector of indices into vector sp of those time-varying parameters in the order of breaks as major and parameters as minor.
tv_val (added): vector of new values corresponding to tv_spi.

v0.0.3

New Capabilities (0.0.3)

This version was an early attempt to introduce parameters that can vary at each simulation step. It was never implemented on the C++-side, and has been superseded by version 0.0.4

Assumptions (0.0.3)

Mathematical Description (0.0.3)

In the non-time-varying case of version 0.0.2, we have scalar valued parameters. These parameters are constant in simulation time. We now consider parameters that can vary at any particular simulation step.

Consider a focal scalar-valued time-varying parameter $u$. In the non-time-varying case this scalar value, $u$, is an element of the parameter vector, $\theta$.

\[ \theta = [..., u, ...] \]

In the time-varying case, there is a series of values, $u_0, u_1, ..., u_n$ associated with an initial value $u = u_0$ and the values, $u_1, ..., u_n$, at $n_u$ break-points, $t_1, ..., t_{n_u}$. We store these additional values by expanding the parameter vector to make room for them.

\[ \theta = [..., u_0, u_1, ..., u_n, ...] \]

Each simulation step is indexed $t = 1, ..., T$, where $T$ is the number of simulation steps. If the time-varying parameter, $u$, is required at time $t$ to update an element of the rate-matrix, the value of $u$ that is used is $u_i$ such that $t \in [t_i, t_{i+1})$.

User Interface (0.0.3)

We follow the params_timevar argument to run_sim in McMasterPandemic, which allows the user to specify a data frame with the following columns.

Date
Symbol
Value
Type

The definition of these columns at the time of writing is given here.

The user must also specify the starting and ending calendar dates of the simulation.

An example model with time variation can be defined as follows.

options(MP_flex_spec_version = "0.0.3")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
  flexmodel(
    params, state,
    start_date = '2021-08-26', end_date = "2021-09-26",
    timevar_piece_wise = data.frame(
      Date = c("2021-09-01", "2021-09-15", "2021-09-10", "2021-08-28"),
      Symbol = c("mu", "beta0", "beta0", "mu"),
      Value = c(0.5, 0.1, 2, 0.2),
      Type = c("rel_prev", "rel_orig", "rel_prev", "rel_orig")
    ))
  %>% add_rate("E", "Ia", ~ (alpha) * (sigma))
  %>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
  %>% add_rate("Ia", "R", ~ (gamma_a))
  %>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
  %>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
  %>% add_rate("Im", "R", ~ (gamma_m))
  %>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("Is", "ICUs", ~ 
                 (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
  %>% add_rate("Is", "ICUd", ~ 
                 (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
  %>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
  %>% add_rate("ICUs", "H2", ~ (psi1))
  %>% add_rate("ICUd", "D", ~ (psi2))
  %>% add_rate("H2", "R", ~ (psi3))
  %>% add_rate("H", "R", ~ (rho))
  %>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("S",  "E", ~
                 (Ia) * (beta0) * (1/N) * (Ca) +
                 (Ip) * (beta0) * (1/N) * (Cp) +
                 (Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
                 (Is) * (beta0) * (1/N) * (Cs) * (1-iso_s))
  %>% add_parallel_accumulators(c("X", "N", "P", "V"))
  %>% update_tmb_indices()
)

Data Structure (0.0.3)

Here we make a distinction between indices and time-indices. An index is a 1-based integer identifying a value within a vector. A time-index is a 1-based integer identifying the simulation step.

tvi_spi: vector of indices for each time varying factor, identifying the elements of the spi vector from version 0.0.1
n_breaks: vector of the number of time breaks for each time varying factor
t_breaks: vector of time-indices locating all time breaks for all time varying factors, organized such that time-indices associated with the same factor are grouped together and time-indices are all ascending

These indices and counts can be used to update the spi vector at each simulation step. Here is an example in R.

options(MP_flex_spec_version = "0.0.3")
tvi_spi = c(1L, 3L, 6L, 10L, 14L, 19L)
n_breaks = c(3L, 3L, 2L, 2L, 2L, 2L)
t_breaks = c(
  2L, 6L, 17L, # factor 1 has 3 break-points 
  2L, 6L, 17L, # factor 2 has 3 break-points 
  15L, 20L,    # factor 3 has 2 break-points 
  15L, 20L,    # factor 4 has 2 break-points 
  15L, 20L,    # factor 5 has 2 break-points 
  15L, 20L     # factor 6 has 2 break-points 
)

# number of time varying factors
n = length(tvi_spi)

# initialize a break point counter. each element of t_breaks_t
# is associated with a time varying factor. each time a break
# point is encountered, increment the corresponding element in 
# this counter
t_breaks_t = rep(0, n)

# indices into the state param vector for all factors, including
# those that are not time varying
spi = c(30L, 27L, 30L, 27L, 3L, 15L, 33L, 18L, 4L, 15L, 33L, 19L, 5L,
  15L, 33L, 20L, 36L, 6L, 15L, 33L, 21L, 36L)
print(spi)

##  [1] 30 27 30 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36

# loop over time steps
for(t in 1:30) {
  
  # loop over time-varying factors
  for(i in 1:n) {
    
    # only do something if factor i has a break-point at time t
    break_now = t == t_breaks[1 + t_breaks_t[i] + sum(n_breaks[1:(i-1)])]
    if(isTRUE(break_now)) {
      
      # increment the break-point counter
      t_breaks_t[i] = t_breaks_t[i] + 1
      
      # increment the spi vector for the ith time varying factor
      spi[tvi_spi[i]] = spi[tvi_spi[i]] + 1
      
    }
  }
  print(spi)
  
  # do the rest of what needs to be done in this simulation step
  # (e.g. update the rate matrix, update the state vector)
  # ...
}

##  [1] 30 27 30 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 31 27 31 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 31 27 31 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 31 27 31 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 31 27 31 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 15 33 18  4 15 33 19  5 15 33 20 36  6 15 33 21 36
##  [1] 32 27 32 27  3 16 33 18  4 16 33 19  5 16 33 20 36  6 16 33 21 36
##  [1] 32 27 32 27  3 16 33 18  4 16 33 19  5 16 33 20 36  6 16 33 21 36
##  [1] 33 27 33 27  3 16 33 18  4 16 33 19  5 16 33 20 36  6 16 33 21 36
##  [1] 33 27 33 27  3 16 33 18  4 16 33 19  5 16 33 20 36  6 16 33 21 36
##  [1] 33 27 33 27  3 16 33 18  4 16 33 19  5 16 33 20 36  6 16 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36
##  [1] 33 27 33 27  3 17 33 18  4 17 33 19  5 17 33 20 36  6 17 33 21 36

Each row in this output is a time step and each column is the spi index for a particular factor. Notice that only six columns change through time, and these are associated with the six time-varying factors. Columns 1 and 3 are associated with the same time-varying parameter, and so they change at the same break-points. The same is true of columns 6, 10, 14, and 19, which are associated with another time-varying parameter.

v0.0.2

New Capabilities (0.0.2)

Iterated updates of the state vector.

Assumptions (0.0.2)

No parameter varies throughout a simulation
Simple $dt = 1$ simulation (i.e. unit time step, do_exponential == FALSE, and do_hazard == FALSE)
No stochasticity
MP_badsum_action = 'ignore'
No updates optimization
No model structure (e.g. vaccination, age-structure, testing)
Assumptions 2-4 from 0.0.1

Mathematical Description (0.0.2)

Let $s$ be the state vector and $M$ be the rate matrix. Define the flows matrix, $F$, to have the same dimensions as $M$. The elements of $F$ are given by the following.

\[ F_{ij} = M_{ij} s_i \]

Let $f_{\text{in}}$ and $f_{\text{out}}$ be the inflow and outflow vectors, given by the column sums and row sums of $F$ respectively. If the use specifies some states as being parallel accumulators, then the columns associated with those special states are removed from $F$ before the row sums are taken to compute $f_{\text{out}}$.

With these definitions in place the state vector update at each iteration is given by the following.

\[ s \to s - f_{\text{out}} + f_{\text{in}} \] Here is a summary of the stages of a single simulation step under this model.

Update the rate matrix following version 0.0.1
Compute the flow matrix
Compute the inflow vector
Compute the outflow vector, making sure to ignore the flow matrix columns associated with parallel accumulators
Produce a new state by subtracting the outflows and adding the inflows
Save every updated state vector

User Interface (0.0.2)

In addition to the interface elements specified in 0.0.1, the user needs to provide some additional information. We need to provide a character vector of regex patterns with which to find the indices of the parallel accumulators amongst the state names (e.g. X, N, P, V). For example, one way to provide such a vector to an existing model would be the following.

options(MP_flex_spec_version = "0.0.2")
model = add_parallel_accumulators(model, c("X", "N", "P", "V"))

The user must specify the number of state vector updates.

Data Structure (0.0.2)

The following pieces of information need to be added to those described in Data Structure (0.0.1) in order to make state vector updates.

Modified copies of the following integer vectors described in 0.0.1: spi, modifier, from, to, count
- These copies will only contain indices necessary for updating the rate matrix elements that could potentially change at each simulation step
- Currently this will include non-zero rate matrix elements that depend on at least one state variable (those that depend on parameters only will not need to be updated given that this version of the spec does not allow for time-varying parameters)
A vector, par_accum_indices of indices into the columns of the flows matrix, identifying the states that are parallel accumulators
The number of state vector updates to simulate

v0.0.1

New Capabilities (0.0.1)

Update the non-zero elements of a rate matrix on the TMB/C++ side using a restricted set of operations (complements, inverses, sums, and products). The update formula must be specified separately for each non-zero element.

Assumptions (0.0.1)

The state vector is not actually updated, making this a more-or-less useless spec (but it provides a good ‘warm-up’ and sanity check)
A param_pansim and a state_pansim object exists or can be constructed
If make_state is used to create the state_pansim object, vaxify, ageify, and testify are all set to FALSE
There is no model structure (e.g. vaccination, age-structure, testing)

Mathematical Description (0.0.1)

Any element, $x$, of either the parameter or state vector can be used to define a factor in one of the following three forms.

Identity: $x$
Complement: $1-x$
Inverse: $1/x$

We collect these user-defined factors into a factor vector, $y$. Factors can be repeated in $y$ if required. Any number of factors can be multiplied together using * to produce a product. Any number of factors and products can be added together using +.

There is a higher level nested structure associated with the factor vector, $y$.

All factors associated with the, $i$th, non-zero rate matrix element, $M_{(i)}$, are grouped together in a contiguous block
Within the $i$th block, all factors associated with the $j$th product ($j = 1 ... n_i$) in that block are grouped together in a contiguous sub-block
Within the $i,j$th sub-block, all factors are given an index, $k = 1 ... m_{ij}$

With these definitions, the dependence of any non-zero rate matrix element on the parameters and state variables is given by the following expression.

\[ M_{(i)} = \sum_{j=1}^{n_i} \prod_{k=1}^{m_{ij}} y_{ijk} \]

where $y_{ijk}$ is the $k$th factor associated with the $j$th product associated with the $i$th non-zero rate matrix element.

User Interface (0.0.1)

Users can define the structure of a rate matrix with a list of expressions, each determining the parameter-dependence and/or state-dependence of a single non-zero rate matrix element. The standard McMasterPandemic introductory example model can be defined as follows.

options(MP_flex_spec_version = "0.0.1")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
  flexmodel(params, state)
  %>% add_rate("E", "Ia", ~ (alpha) * (sigma))
  %>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
  %>% add_rate("Ia", "R", ~ (gamma_a))
  %>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
  %>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
  %>% add_rate("Im", "R", ~ (gamma_m))
  %>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("Is", "ICUs", ~ 
                 (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
  %>% add_rate("Is", "ICUd", ~ 
                 (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
  %>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
  %>% add_rate("ICUs", "H2", ~ (psi1))
  %>% add_rate("ICUd", "D", ~ (psi2))
  %>% add_rate("H2", "R", ~ (psi3))
  %>% add_rate("H", "R", ~ (rho))
  %>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
  %>% add_rate("S",  "E", ~
                 (Ia) * (beta0) * (1/N) * (Ca) +
                 (Ip) * (beta0) * (1/N) * (Cp) +
                 (Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
                 (Is) * (beta0) * (1/N) * (Cs) * (1-iso_m))
  %>% update_tmb_indices()
)

The first step, flexmodel, in this pipeline initializes the compartmental model by specifying objects containing the parameters and state variables respectively.

Each subsequent call to add_rate defines the dependence of a single non-zero rate matrix element on parameters and state variables. When used in a pipeline, the add_rate function takes three arguments.

from – character string describing the state associated with the row of the rate matrix
to – character string describing the state associated with the column of the rate matrix
formula – one-sided model formula defining the dependence based on symbols associated named parameter and state vectors
- Any single parameter or state variable name, x, can be placed in parentheses to produce a factor in the following three ways
  - Identity: (x)
  - Complement: (1-x)
  - Inverse: (1/x)
- Factors can be multiplied together with the * operator to produce a product
- Products and factors can be added together with the + operator to produce a non-zero rate matrix element

The final step of the pipeline adds the indices to be passed to TMB/C++ that are described in the Data Structure section below.

Data Structure (0.0.1)

The object created by the pipeline in the User Interface section above must contain at a minimum the following seven objects.

The initial value of a state_pansim object representing the vector, state, containing the state variables
The initial value of a param_pansim object representing the vector, params, containing the parameters
An integer vector, spi, of 1-based indices into a concatenation, state_param, of state and param
- The ordering of the elements of state_param is arbitrary because several integer vectors defined below contain indices into these elements for book-keeping purposes on the TMB/C++ side
- However, typically it will make sense to at least keep the state variables together and the parameters together
- On the R side, spi has the property that state_param[spi] returns a vector of state variables and parameters in the order in which they occur in the factor vector defined above in the Mathematical Description section
- Note that state_param[spi] does not return the factor vector itself – to get the factor vector from state_param[spi] you would need to modify some of the elements by taking their complements and inverses
An integer vector, modifier, of the same length as spi
- Each element of modifier corresponds to a factor, as defined above in the Mathematical Description
- These elements bit-wise encode several pieces of information
  - What elements of state_param[spi] need to transformed by complements or inverses
  - Whether complements or inverses should be used for those elements that need to be transformed
  - What elements of state_param[spi] need to be multiplied together to take products
- The binary expansion of these elements can each be represented with three bits, with each bit encoding different information
  - The left-most bit is 1 if the corresponding factor is the first in a product that needs to be added to a previous product, and 0 otherwise
  - The middle bit is 1 if the corresponding factor requires a complement and 0 otherwise
  - The right-most bit is 1 if the corresponding factor requires an inverse and 0 otherwise
An integer vector, from, of 1-based indices, the $i$th element of which points to the row of the $i$th non-zero rate matrix element
An integer vector, to, of 1-based indices, the $i$th element of which points to the column of the $i$th non-zero rate matrix element
An integer vector, count, the $i$th element of which gives the number of factors used to calculate the $i$th non-zero rate matrix element
- In terms of the Mathematical Description above, the elements of count store these sums $\sum_{j=1}^{n_i}m_{ij}$

Flexible Compartmental Modelling Specifications

Steve Walker, Weiguang Guan, Ben Bolker

Table of Contents

Versioning and Lifecycle

Version Summary

Frozen Versions

Recommended

Experimental Versions

Archived Versions

Version Descriptions

v0.2.2

New Capabilities (0.2.2)

Mathematical Description (0.2.2)

v0.2.1

New Capabilities (0.2.1)

Mathematical Description (0.2.1)

Data Structure (0.2.1)

v0.2.0

New Capabilities (0.2.0)

Mathematical Description (0.2.0)

Condensation

Computing Convolutions

Condensation Algorithm

Negative Binomial Negative Log Likelihood

Dispersion Parameters

Optional Prior Distribution Penalties

Optional Inverse Parameter Transformations

General Objective Function

Available Loss Functions

Available Regularization Functions

Available Parameter Transformations

Data Structure (0.2.0)

General Objective Function

Interface (0.2.0)

v0.1.2

New Capabilities (0.1.2)

Mathematical Description (0.1.2)

Data Structure (0.1.2)

v0.1.1

New Capabilities (0.1.1)

Mathematical Description (0.1.1)

Generalization of Outflow Vector

Make State Vector

Data Structure (0.1.1)

Generalization of Outflow Vector

Make State Vector

v0.1.0

New Capabilities (0.1.0)

Mathematical Description (0.1.0)

struc objects

Sums of state variables and parameters

User Interface (0.1.0)

struc objects

Sums of state variables and parameters

Data Structure (0.1.0)

struc objects

Sums of state variables and parameters

v0.0.6

New Capabilities (0.0.6)

v0.0.5

New Capabilities (0.0.5)

Mathematical Description (0.0.5)

User Interface (0.0.5)

Data Structure (0.0.5)

v0.0.4

New Capabilities (0.0.4)

Data Structure (0.0.4)

v0.0.3

New Capabilities (0.0.3)

Assumptions (0.0.3)

Mathematical Description (0.0.3)

User Interface (0.0.3)

Data Structure (0.0.3)

v0.0.2

New Capabilities (0.0.2)

Assumptions (0.0.2)

Mathematical Description (0.0.2)

User Interface (0.0.2)

Data Structure (0.0.2)

v0.0.1