5  Adding a new model

If you have read Section 3 and Section 4, you should have a pretty good idea of how the bmm package functions. Now it’s time to add your new model.

The good news are, you don’t have to add any of the files manually. The bmm package includes a function use_model_template() that generates all the files with templates for all necessary functions. Thus, you can focus on the important work of filling these templates with the relevant information without worrying about missing something critical. If you type ?use_model_template you will see the following description:

Function use_model_template()

Description

Create a file with a template for adding a new model (for developers)

Usage

use_model_template(
  model_name,
  custom_family = FALSE,
  stanvar_blocks = c("data", "tdata", "parameters", 
                     "tparameters", "model", "likelihood",
                     "genquant", "functions"),
  open_files = TRUE,
  testing = FALSE
)

Arguments

Argument Description
model_name A string with the name of the model. The file will be named bmm_model_model_name.R and all necessary functions will be created with the appropriate names and structure. The file will be saved in the ⁠R/⁠ directory
custom_family Logical; Do you plan to define a brms::custom_family()? If TRUE the function will add a section for the custom family, placeholders for the stan_vars and corresponding empty .stan files in ⁠inst/stan_chunks/⁠, that you can fill For an example, see the sdmSimple model in ⁠/R/bmm_model_sdmSimple.R⁠. If FALSE (default) the function will not add the custom family section nor stan files.
stanvar_blocks A character vector with the names of the blocks that will be added to the custom family section. See brms::stanvar() for more details. The default lists all the possible blocks, but it is unlikely that you will need all of them. You can specify a vector of only those that you need. The function will add a section for each block in the list
open_files Logical; If TRUE (default), the function will open the template files that were created in RStudio
testing Logical; If TRUE, the function will return the file content but will not save the file. If FALSE (default), the function will save the file

5.1 Example

Let’s add a new model called gcm. Let’s assume that you have tested the model in Stan and you have the Stan code ready. We want to define a custom family for the gcm model, and we want to define the following blocks: likelihood, functions (see ?brms::stanvar for an explanation of the blocks).

First you set up your system and git environment as described in Section 1. Then you can run the following code from RStudio in the root directory of the bmm package:

use_model_template("gcm", custom_family = TRUE, stanvar_blocks = c("likelihood", "functions"))

This will create the file bmm_model_gcm.R in the R/ directory and the files gcm_likelihood.stan and gcm_functions.stan in the inst/stan_chunks/ directory. The function will also open the files in RStudio. You will see the following output in the console:

• Modify 'inst/stan_chunks/gcm_likelihood.stan'
• Modify 'inst/stan_chunks/gcm_functions.stan'
• Modify 'R/model_gcm.R'

Now you can fill the files with the appropriate code. The stan files will be empty, but the R file will have the following structure:

#############################################################################!
# MODELS                                                                 ####
#############################################################################!
# see file 'R/bmm_model_mixture3p.R' for an example

.model_gcm <- function(resp_var1 = NULL, required_arg1 = NULL, required_arg2 = NULL, 
                       links = NULL, version = NULL, call = NULL, ...) {
   out <- structure(
     list(
       resp_vars = nlist(resp_var1),
       other_vars = nlist(required_arg1, required_arg2),
       domain = '',
       task = '',
       name = '',
       citation = '',
       version = version,
       requirements = '',
       parameters = list(),
       links = list(),
       fixed_parameters = list(),
       default_priors = list(par1 = list(), par2 = list()),
       void_mu = FALSE
     ),
     class = c('bmmodel', 'gcm'),
     call = call
   )
   if(!is.null(version)) class(out) <- c(class(out), paste0("gcm_",version))
   out$links[names(links)] <- links
   out
}
# user facing alias
# information in the title and details sections will be filled in
# automatically based on the information in the .model_gcm()$info
 
#' @title `r .model_gcm()$name`
#' @name Model Name#' @details `r model_info(.model_gcm())`
#' @param resp_var1 A description of the response variable
#' @param required_arg1 A description of the required argument
#' @param required_arg2 A description of the required argument
#' @param links A list of links for the parameters.
#' @param version A character label for the version of the model. Can be empty or NULL if there is only one version. 
#' @param ... used internally for testing, ignore it
#' @return An object of class `bmmodel`
#' @export
#' @examples
#' \dontrun{
#' # put a full example here (see 'R/model_mixture3p.R' for an example)
#' }
gcm <- function(resp_var1, required_arg1, required_arg2, links = NULL, version = NULL, ...) {
   call <- match.call()
   stop_missing_args()
   .model_gcm(resp_var1 = resp_var1, required_arg1 = required_arg1, 
              required_arg2 = required_arg2, links = links, version = version, 
              call = call, ...)
}


#############################################################################!
# CHECK_DATA S3 methods                                                  ####
#############################################################################!
# A check_data.* function should be defined for each class of the model.
# If a model shares methods with other models, the shared methods should be
# defined in helpers-data.R. Put here only the methods that are specific to
# the model. See ?check_data for details.
# (YOU CAN DELETE THIS SECTION IF YOU DO NOT REQUIRE ADDITIONAL DATA CHECKS)

#' @export
check_data.gcm <- function(model, data, formula) {
   # retrieve required arguments
   required_arg1 <- model$other_vars$required_arg1
   required_arg2 <- model$other_vars$required_arg2

   # check the data (required)

   # compute any necessary transformations (optional)

   # save some variables as attributes of the data for later use (optional)

   NextMethod('check_data')
}


#############################################################################!
# Convert bmmformula to brmsformla methods                               ####
#############################################################################!
# A bmf2bf.* function should be defined if the default method for consructing
# the brmsformula from the bmmformula does not apply (e.g if aterms are required).
# The shared method for all `bmmodels` is defined in bmmformula.R.
# See ?bmf2bf for details.
# (YOU CAN DELETE THIS SECTION IF YOUR MODEL USES A STANDARD FORMULA WITH 1 RESPONSE VARIABLE)

#' @export
bmf2bf.gcm <- function(model, formula) {
   # retrieve required response arguments
   resp_var1 <- model$resp_vars$resp_var1
   resp_var2 <- model$resp_vars$resp_arg2

   # set the base brmsformula based 
   brms_formula <- brms::bf(paste0(resp_var1," | ", vreal(resp_var2), " ~ 1" ),)

   # return the brms_formula to add the remaining bmmformulas to it.
   brms_formula
}


#############################################################################!
# CONFIGURE_MODEL S3 METHODS                                             ####
#############################################################################!
# Each model should have a corresponding configure_model.* function. See
# ?configure_model for more information.

#' @export
configure_model.gcm <- function(model, data, formula) {
   # retrieve required arguments
   required_arg1 <- model$other_vars$required_arg1
   required_arg2 <- model$other_vars$required_arg2

   # retrieve arguments from the data check
   my_precomputed_var <- attr(data, 'my_precomputed_var')

   # construct brms formula from the bmm formula
   formula <- bmf2bf(model, formula)

   # construct the family & add to formula object 
   gcm_family <- brms::custom_family(
     'gcm',
     dpars = c(),
     links = c(),
     lb = c(), # upper bounds for parameters
     ub = c(), # lower bounds for parameters
     type = '', # real for continous dv, int for discrete dv
     loop = TRUE, # is the likelihood vectorized
   )
   formula$family <- gcm_family

   # prepare initial stanvars to pass to brms, model formula and priors
   sc_path <- system.file('stan_chunks', package='bmm')
   stan_likelihood <- read_lines2(paste0(sc_path, '/gcm_likelihood.stan'))
   stan_functions <- read_lines2(paste0(sc_path, '/gcm_functions.stan'))

   stanvars <- stanvar(scode = stan_likelihood, block = 'likelihood') +
      stanvar(scode = stan_functions, block = 'functions')

   # return the list
   nlist(formula, data, stanvars)
}


#############################################################################!
# POSTPROCESS METHODS                                                    ####
#############################################################################!
# A postprocess_brm.* function should be defined for the model class. See 
# ?postprocess_brm for details

#' @export
postprocess_brm.gcm <- function(model, fit) {
   # any required postprocessing (if none, delete this section)
   fit
}

Now you have to:

  1. Fill the .model_gcm function with the appropriate code. This function should return a list with all the variables specified above. The class of the list should be c('bmmmodel', 'gcm'). Rename the response arguments and the other required arguments, or delete the other arguments if you do not have any. You can see an example in the model_sdm.R file. Specify the parameters of the model, the link functions, what if any parameters are fixed (and to what value). It’s crucial that you set default priors for every parameter of the model, which should be informed by knowledge in the field.

  2. Adjust the user-facing alias. Here you should only rename the required arguments and fill in the @examples section with a full example. Everything else will be filled in automatically based on the information in the .model_gcm function.

  3. Fill the check_data.gcm function with the appropriate code. This function should check the data and return the data. You may or may not need to compute any transformations or save some variables as attributes of the data.

  4. If necessary define the bmf2bf.gcm method to convert the bmmformula to a brmsformula. The first step for this is always to specify the response variable and additional response information. Keep in mind that brms automatically interprets this formula as the linear model formula for the mu parameter of your custom family. Currently, brms requires all custom families to have a mu parameter. However, we recommend to code this parameter as a void_mu, and fix the intercept of this parameter to zero using constant priors. This way, the bmmformula can be used to only specify the linear or non-linear model for the parameters of a bmmmodel. if your model has a single response variable, you can delete this section.

  5. Fill the configure_model.gcm function with the appropriate code. This function should construct the formula, the family, the stanvars. You can also retrieve any arguments you saved from the data check. Depending on your model, some of these parts might not be necessary. For example, for the mixture models (e.g. mixture3p), we construct a new formula, because we want to rename the arguments to make it easier for the user. For the sdmSimple model, we define the family ourselves, so we don’t need to change the formula.

    You need to fill information about your custom family, and then fill the STAN files with your STAN code. Conveniently, you don’t have to edit lines 137-145: loading the STAN files and setting up the stanvars is set up automatically when calling the use_model_template function. Should you need to add more STAN files after you created the template, you can add the files in init/stan_chunks/ manually and edit those lines to additionally load the manually added files.

  6. Fill the postprocess_brm.gcm function with the appropriate code. By postprocessing, we mean changes to the fitted brms model - like renaming parameters, etc. If you don’t need any postprocessing, you can delete this section.

5.2 Testing

Unit testing is extremely important. You should test your model with the testthat package. You can use the use_test() function to create a test file for your model. See file tests/testthat/test-bmm.R for an example of how we test the existing models. BRMS models take a long time to fit, so we don’t test the actual fitting process. fit_model() provides an argument backend="mock", which will return a mock object instead of fitting the model. This ensures that the entire pipeline works without errors. For example, here’s a test of the IMMfull model:

test_that('Available mock models run without errors',{
  withr::local_options('bmm.silent'=2)
  skip_on_cran()
  dat <- data.frame(
    resp_err = rIMM(n = 5),
    Item2_rel = 2,
    Item3_rel = -1.5,
    spaD2 = 0.5,
    spaD3 = 2
  )

  # two-parameter model mock fit
  f <- bmmformula(kappa ~ 1, c ~ 1, a ~ 1, s ~ 1)
  mock_fit <- bmm(f, dat, 
                  imm(resp_err = "resp_err", 
                      setsize = 3, 
                      nt_features = paste0('Item',2:3,'_rel'), 
                      nt_distance=paste0('spaD',2:3)), 
                  backend = "mock", mock_fit = 1, rename=FALSE)
  expect_equal(mock_fit$fit, 1)
  expect_type(mock_fit$bmm$fit_args, "list")
  expect_equal(names(mock_fit$fit_args[1:4]), c("formula", "data"))
})

The tests based on the testthat package are run every time you call the check() command. Before you submit your changes, make sure that all tests pass.

Important

Additionally, you should perform a full test of the model by running it in a separate script and ensuring it gives meaningful results. At the very least, you should perform basic parameter recovery simulations for hyper-parameters (i.e. means and standard deviations) as well as subject-level parameters to give users an idea of how much data they need to adequately estimate the model.

We are in the process of establishing guidelines for that.

5.3 Add an example dataset

All new models should come with an example dataset, that can be loaded by users and that can be used in the examples section. This should be one of:

  • A new dataset that you add to the package
  • A dataset that already exists in the package but that can be used with the new model
  • A dataset that exists in another package that you can load with data() and use with the new model

For example, the vignettes for the mixture2p and mixture3p use an external dataset from the mixtur package that can be loaded with data('bays2009_full', package='mixtur'). The IMM models use a dataset included in the current package. For instructions on how to add a new dataset see here.

5.4 Add a vignette

All new models should come with a vignette that explains some basic information about the model and how to estimate it with bmm. You can use the use_vignette() function to create a new vignette. See here for more information. The vignettes will be published automatically on the package website under “Articles” when the pull request is approved. You can browse the source code for the existing vignettes in the vignettes/ directory. You can see the published version of the existing vignettes here.

And that’s it! You have added a new model to the bmm package. You can now submit your changes to the bmm package repository.