5 Adding a new model
If you have read Section 3 and Section 4, you should have a pretty good idea of how the bmm
package functions. Now it’s time to add your new model.
The good news are, you don’t have to add any of the files manually. The bmm
package includes a function use_model_template()
that generates all the files with templates for all necessary functions. Thus, you can focus on the important work of filling these templates with the relevant information without worrying about missing something critical. If you type ?use_model_template
you will see the following description:
Function use_model_template()
Description
Create a file with a template for adding a new model (for developers)
Usage
Arguments
Argument | Description |
---|---|
model_name |
A string with the name of the model. The file will be named bmm_model_model_name.R and all necessary functions will be created with the appropriate names and structure. The file will be saved in the R/ directory |
custom_family |
Logical; Do you plan to define a brms::custom_family()? If TRUE the function will add a section for the custom family, placeholders for the stan_vars and corresponding empty .stan files in inst/stan_chunks/, that you can fill For an example, see the sdmSimple model in /R/bmm_model_sdmSimple.R. If FALSE (default) the function will not add the custom family section nor stan files. |
stanvar_blocks |
A character vector with the names of the blocks that will be added to the custom family section. See brms::stanvar() for more details. The default lists all the possible blocks, but it is unlikely that you will need all of them. You can specify a vector of only those that you need. The function will add a section for each block in the list |
open_files |
Logical; If TRUE (default), the function will open the template files that were created in RStudio |
testing |
Logical; If TRUE, the function will return the file content but will not save the file. If FALSE (default), the function will save the file |
5.1 Example
Let’s add a new model called gcm
. Let’s assume that you have tested the model in Stan and you have the Stan code ready. We want to define a custom family for the gcm
model, and we want to define the following blocks: likelihood
, functions
(see ?brms::stanvar
for an explanation of the blocks).
First you set up your system and git environment as described in Section 1. Then you can run the following code from RStudio in the root directory of the bmm
package:
This will create the file bmm_model_gcm.R
in the R/
directory and the files gcm_likelihood.stan
and gcm_functions.stan
in the inst/stan_chunks/
directory. The function will also open the files in RStudio. You will see the following output in the console:
• Modify 'inst/stan_chunks/gcm_likelihood.stan'
• Modify 'inst/stan_chunks/gcm_functions.stan'
• Modify 'R/model_gcm.R'
Now you can fill the files with the appropriate code. The Stan files will be empty, but the R file will have the following structure:
#############################################################################!
# MODELS ####
#############################################################################!
# see file 'R/model_mixture3p.R' for an example
.model_gcm <- function(resp_var1 = NULL, required_arg1 = NULL, required_arg2 = NULL, links = NULL, version = NULL, call = NULL, ...) {
out <- structure(
list(
resp_vars = nlist(resp_var1),
other_vars = nlist(required_arg1, required_arg2),
domain = '',
task = '',
name = '',
citation = '',
version = version,
requirements = '',
parameters = list(),
links = list(),
fixed_parameters = list(),
default_priors = list(par1 = list(), par2 = list()),
void_mu = FALSE
),
class = c('bmmodel', 'gcm'),
call = call
)
if(!is.null(version)) class(out) <- c(class(out), paste0("gcm_",version))
out$links[names(links)] <- links
out
}
# user facing alias
# information in the title and details sections will be filled in
# automatically based on the information in the .model_gcm()$info
#' @title `r .model_gcm()$name`
#' @name Model Name#' @details `r model_info(.model_gcm())`
#' @param resp_var1 A description of the response variable
#' @param required_arg1 A description of the required argument
#' @param required_arg2 A description of the required argument
#' @param links A list of links for the parameters.
#' @param version A character label for the version of the model. Can be empty or NULL if there is only one version.
#' @param ... used internally for testing, ignore it
#' @return An object of class `bmmodel`
#' @export
#' @examples
#' \dontrun{
#' # put a full example here (see 'R/model_mixture3p.R' for an example)
#' }
gcm <- function(resp_var1, required_arg1, required_arg2, links = NULL, version = NULL, ...) {
call <- match.call()
stop_missing_args()
.model_gcm(resp_var1 = resp_var1, required_arg1 = required_arg1, required_arg2 = required_arg2,
links = links, version = version,call = call, ...)
}
#############################################################################!
# CHECK_DATA S3 methods ####
#############################################################################!
# A check_data.* function should be defined for each class of the model.
# If a model shares methods with other models, the shared methods should be
# defined in helpers-data.R. Put here only the methods that are specific to
# the model. See ?check_data for details.
# (YOU CAN DELETE THIS SECTION IF YOU DO NOT REQUIRE ADDITIONAL DATA CHECKS)
#' @export
check_data.gcm <- function(model, data, formula) {
# retrieve required arguments
required_arg1 <- model$other_vars$required_arg1
required_arg2 <- model$other_vars$required_arg2
# check the data (required)
# compute any necessary transformations (optional)
# save some variables as attributes of the data for later use (optional)
NextMethod('check_data')
}
#############################################################################!
# Convert bmmformula to brmsformla methods ####
#############################################################################!
# A bmf2bf.* function should be defined if the default method for consructing
# the brmsformula from the bmmformula does not apply (e.g if aterms are required).
# The shared method for all `bmmodels` is defined in bmmformula.R.
# See ?bmf2bf for details.
# (YOU CAN DELETE THIS SECTION IF YOUR MODEL USES A STANDARD FORMULA WITH 1 RESPONSE VARIABLE)
#' @export
bmf2bf.gcm <- function(model, formula) {
# retrieve required response arguments
resp_var1 <- model$resp_vars$resp_var1
resp_var2 <- model$resp_vars$resp_arg2
# set the base brmsformula based
brms_formula <- brms::bf(paste0(resp_var1," | ", vreal(resp_var2), " ~ 1" ),)
# return the brms_formula to add the remaining bmmformulas to it.
brms_formula
}
#############################################################################!
# CONFIGURE_MODEL S3 METHODS ####
#############################################################################!
# Each model should have a corresponding configure_model.* function. See
# ?configure_model for more information.
#' @export
configure_model.gcm <- function(model, data, formula) {
# retrieve required arguments
required_arg1 <- model$other_vars$required_arg1
required_arg2 <- model$other_vars$required_arg2
# retrieve arguments from the data check
my_precomputed_var <- attr(data, 'my_precomputed_var')
# construct brms formula from the bmm formula
formula <- bmf2bf(model, formula)
# construct the family & add to formula object
gcm_family <- brms::custom_family(
'gcm',
dpars = c(),
links = c(),
lb = c(), # upper bounds for parameters
ub = c(), # lower bounds for parameters
type = '', # real for continous dv, int for discrete dv
loop = TRUE, # is the likelihood vectorized
)
formula$family <- gcm_family
# prepare initial stanvars to pass to brms, model formula and priors
sc_path <- system.file('stan_chunks', package='bmm')
stan_likelihood <- read_lines2(paste0(sc_path, '/gcm_likelihood.stan'))
stan_functions <- read_lines2(paste0(sc_path, '/gcm_functions.stan'))
stanvars <- stanvar(scode = stan_likelihood, block = 'likelihood') +
stanvar(scode = stan_functions, block = 'functions')
# return the list
nlist(formula, data, stanvars)
}
#############################################################################!
# POSTPROCESS METHODS ####
#############################################################################!
# A postprocess_brm.* function should be defined for the model class. See
# ?postprocess_brm for details
#' @export
postprocess_brm.gcm <- function(model, fit) {
# any required postprocessing (if none, delete this section)
fit
}
Now you have to:
Fill the
.model_gcm
function with the appropriate code. This function should return a list with all the variables specified above. The class of the list should bec('bmmmodel', 'gcm')
. Rename the response arguments and the other required arguments, or delete the other arguments if you do not have any. You can see an example in themodel_sdm.R
file. Specify the parameters of the model, the link functions, what if any parameters are fixed (and to what value). It’s crucial that you set default priors for every parameter of the model, which should be informed by knowledge in the field.Adjust the user-facing alias. Here you should only rename the required arguments and fill in the
@examples
section with a full example. Everything else will be filled in automatically based on the information in the.model_gcm
function.Fill the
check_data.gcm
function with the appropriate code. This function should check the data and return the data. You may or may not need to compute any transformations or save some variables as attributes of the data.If necessary define the
bmf2bf.gcm
method to convert thebmmformula
to abrmsformula
. The first step for this is always to specify the response variable and additional response information. Keep in mind thatbrms
automatically interprets this formula as the linear model formula for themu
parameter of your custom family. Currently,brms
requires all custom families to have amu
parameter. However, we recommend to code this parameter as avoid_mu
, and fix the intercept of this parameter to zero using constant priors. This way, thebmmformula
can be used to only specify the linear or non-linear model for the parameters of abmmmodel
. If your model has a single response variable, you can delete this section.Fill the
configure_model.gcm
function with the appropriate code. This function should construct the formula, the family, the stanvars. You can also retrieve any arguments you saved from the data check. Depending on your model, some of these parts might not be necessary. For example, for the mixture models (e.g.mixture3p
), we construct a new formula, because we want to rename the arguments to make it easier for the user. For thesdmSimple
model, we define the family ourselves, so we don’t need to change the formula.You need to fill information about your custom family, and then fill the STAN files with your STAN code. Conveniently, you don’t have to edit lines 134-140: loading the STAN files and setting up the stanvars is set up automatically when calling the
use_model_template
function. Should you need to add more STAN files after you created the template, you can add the files ininit/stan_chunks/
manually and edit those lines to additionally load the manually added files.Fill the
postprocess_brm.gcm
function with the appropriate code. By post-processing, we mean changes to the fitted brms model - like renaming parameters, etc. If you don’t need any post-processing, you can delete this section.
5.2 Testing
Unit testing is extremely important. You should test your model with the testthat
package. You can use the use_test()
function to create a test file for your model. See file tests/testthat/test-bmm.R
for an example of how we test the existing models. BRMS models take a long time to fit, so we don’t test the actual fitting process. The bmm()
function provides an argument backend="mock"
, which will return a mock object instead of fitting the model. This ensures that the entire pipeline works without errors. For example, here’s a test of the IMMfull
model:
test_that('Available mock models run without errors',{
withr::local_options('bmm.silent'=2)
skip_on_cran()
dat <- data.frame(
resp_err = rIMM(n = 5),
Item2_rel = 2,
Item3_rel = -1.5,
spaD2 = 0.5,
spaD3 = 2
)
# two-parameter model mock fit
f <- bmmformula(kappa ~ 1, c ~ 1, a ~ 1, s ~ 1)
mock_fit <- bmm(f, dat,
imm(resp_err = "resp_err",
setsize = 3,
nt_features = paste0('Item',2:3,'_rel'),
nt_distance=paste0('spaD',2:3)),
backend = "mock", mock_fit = 1, rename=FALSE)
expect_equal(mock_fit$fit, 1)
expect_type(mock_fit$bmm$fit_args, "list")
expect_equal(names(mock_fit$fit_args[1:4]), c("formula", "data"))
})
The tests based on the testthat
package are run every time you call the check()
command. Before you submit your changes, make sure that all tests pass.
Additionally, you should perform a full test of the model by running it in a separate script and ensuring it gives meaningful results. At the very least, you should perform basic parameter recovery simulations for hyper-parameters (i.e. means and standard deviations) as well as subject-level parameters to give users an idea of how much data they need to adequately estimate the model.
We are in the process of establishing guidelines for that.
5.3 Add an example dataset
All new models should come with an example dataset, that can be loaded by users and can be used in the examples section. This should be either:
- A new dataset that you add to the package
- A dataset that already exists in the package but that can be used with the new model
- A dataset that exists in another package that you can load with
data()
and use with the new model
For example, the vignettes for the mixture2p
and mixture3p
use an external dataset from the mixtur
package that can be loaded with data('bays2009_full', package='mixtur')
. The IMM
models use a dataset included in the current package. For instructions on how to add a new dataset see here.
5.4 Add an article
All new models should come with an article that explains some basic information about the model and how to estimate it with bmm
. You can use the use_article()
function to create a new article. See here for more information. The articles will be published automatically on the package website under “Articles” when the pull request is approved. You can browse the source code for the existing articles in the vignettes/articles/
directory. You can see the published version of the existing vignettes here.
And that’s it! You have added a new model to the bmm
package. You can now submit your changes to the bmm
package repository.