Psychophysical model recovery using Bayesian brms

Will generate fake data that includes differences between participants, but doesn’t model them.

To get started, we load the required packages.

rm(list = ls())

library(tidyverse)
library(brms)
library(here) #For finding root directory

source( here("R","simulate_data_no_underscores.R") )  #Load my needed custom function
source( here("R","psychometric_function.R") ) #Load my needed custom function

set.seed(999) #ensures reproducibility for testing

break_brms_by_making_numTargets_numeric<- FALSE #Change factor to numeric, right before doing formula fit, so 
   #I know that it's brms that's the problem rather than the rest of my code

Create simulated trials

Set up simulated experiment design.

numTargetsConds<- c("two","three")
numSubjects<- 20#25
trialsPerCondition<- 20
laboratories<- c("Holcombe","Roudaia")
#Array of speeds (not very realistic because mostly controlled by a staircase in actual experiment)
speeds<-seq(.02,1.8, length.out = 12) # trials at 12 different speeds between .02 and 1.8

In order to build and test our model in brms, we must first create a simulated data set that is similar to our actual experiment data. This allows us to confirm the brms model is working and successfully recovers the parameters we set before applying it to our real experimental data that has unknown parameter values. In the actual data, there will be many group-wise differences in location and scale parameters. The following simulated data only has explicit differences between the \(\eta\) (location) of the two age groups (older vs younger).

# A tibble: 8 × 2
  name            value
  <chr>           <int>
1 lab                 2
2 gender              2
3 ageGroup            2
4 subjWithinGroup    20
5 numTargets          2
6 obj_per_ring        4
7 speed              12
8 trialThisCond      20

Create between-subject variability within each group, to create a potential multilevel modeling advantage. Can do this by passing the participant number to the location-parameter calculating function. Because that function will be called over and over, separately, the location parameter needs to be a deterministic function of the participant number. So it needs to calculate a hash or something to determine the location parameter. E.g. it could calculate the remainder, but then the location parameters wouldn’t be centered on the intended value for that condition. To do that, I think I need to know the number of participants in the condition, n, and number them 1..n. Then I can give e.g. participant 1 the extreme value on one side of the condition-determined value and give participant n the extreme value on the other side. So I number participants separately even within age*gender to maintain the age and gender penalties, but not within speed, of course. Also not within targetLoad or obj_per_ring.

Because the present purpose of numbering participants is to inject the right amount of between-participant variability, I will number the participants with a range of numbers that has unit standard deviation. Will call this column “subjStandardized”.

Choose values for psychometric function for younger and older

lapse <- 0.05
sigma <- 0.2

location_param_base<-1.6 #location_param_young_123targets <- c(1.7,1.0,0.8)
target_penalty<-0.15 #penalty for additional target
age_penalty <- 0.2#0.4 #Old people have worse limit by this much
gender_penalty <- 0.12 #Female worse by this much
Holcombe_lab_penalty <- 0.2
#Set parameters for differences between Ss in a group
eta_between_subject_sd <- 0.2 

#Using above parameters, need function to calculate a participant's location parameter
#Include optional between_subject_variance
location_param_calculate<- function(numTargets,ageGroup,gender,lab,
                                    subjStandardized,eta_between_subject_sd) {
  
  base_location_param <- location_param_base#location_param_young_123targets[targetLoad]
  #calculate offset for this participant based on desired sigma_between_participant_variance
  location_param <- base_location_param + 
                    subjStandardized * eta_between_subject_sd
  
  after_penalties <-location_param - 
                      if_else(lab=="Holcombe",1,0) * Holcombe_lab_penalty -
                      if_else(ageGroup=="older",1,0) * age_penalty -
                      if_else(numTargets=="three",1,0) * target_penalty -
                      if_else(gender=="F",1,0) * gender_penalty
  return (after_penalties)
}

#Need version with fixed eta_between_subject_sd for use in mutate
location_param_calc_for_mutate<- function(numTargets,ageGroup,gender,lab,
                                               subjStandardized) {
  location_param_calculate(numTargets,ageGroup,gender,lab,
                           subjStandardized, eta_between_subject_sd)
}

Using the psychometric function, simulate whether participant is correct on each trial or not, and add that to the simulated data.

Plot data

upper_bound = 1 - L*(1-C)

Setting up our Model in brms

Setting a model formula in brms allows the use of multilevel models, where there is a hierarchical structure in the data. But at this point we haven’t made the model multi-level as we have been concentrating on the basics of brms.

The bf() function of brms allows the specification of a formula. The parameter can be defined by population effects, where the parameter’s effect is fixed, or group level effects where the parameter varies with a variable such as age. The “family” argument is a description of the response distribution and link function that the model uses. For more detailed information on setting up a formula and the different arguments in BRMS seehttps://paulbuerkner.com/brms/reference/brmsformula.html

The model we used is based off our psychometric function used to generate the data mentioned previously. The only explicitly-coded difference in our simulated data is in the location parameter of older vs younger. Thus, in addition to the psychometric function, we allowed \(\eta\) and \(\log(\sigma)\) to vary by age group in the model. Because the psychometric function doesn’t map onto a canonical link function, we use the non-linear estimation capability of brms rather than linear regression with a link function.

Alex’s note: Using the nonlinear option is also what allowed us to set a prior on the thresholds \(\eta\), because we could then parametrize the function in terms of the x-intercept, whereas with the link-function approach, we are stuck with the conventional parameterization of a line, which has a term for the y-intercept but not the x-intercept

my_formula <- brms::bf( 
   correct ~ chance_rate + (1-chance_rate - lapseRate*(1-chance_rate)) * Phi(-(speed-eta)/exp(logSigma))
  ) 
my_formula <- my_formula$formula
  
my_brms_formula <- brms::bf(
  correct ~ chance_rate + (1-chance_rate - lapseRate * (1-chance_rate))*Phi(-(speed-eta)/exp(logSigma)), 
  eta ~ lab + ageGroup, #+ numTargets,# + gender,
  lapseRate ~ 1, #~1 estimates intercept only
  logSigma ~ 1,#ageGroup,
  family = bernoulli(link="identity"), #Otherwise the default link 'logit' would be applied
  nl = TRUE #non-linear model
)

Set priors

See visualize_and_select_priors.html for some motivation and visualisation.

my_priors <- c(
  brms::set_prior("beta(2,33.33)", class = "b", nlpar = "lapseRate", lb = 0, ub = 1),
  brms::set_prior("uniform(0, 2.5)", class = "b", nlpar = "eta", lb = 0, ub = 2.5),
  brms::set_prior("uniform(-3, 1.6)", class = "b", nlpar = "logSigma", lb = -2, ub = 1.6) 
)

Set priors

See visualize_and_select_priors.html for motivation.

Visualize priors, by plotting them all together (with arbitrary height).

Fitting Model to Simulated Data

Fitting the model gives an estimation of the average parameter value of the participants. The brm() function is used to fit the model based on the given formula, data and priors. Other arguments of brm can adjust the model fitting in various ways, for more information on each of the arguments see https://paulbuerkner.com/brms/reference/brm.html

if (break_brms_by_making_numTargets_numeric) {
  #Make numeric version of numTargets
  data_simulated <- data_simulated |>
    mutate(targets = case_when(
      numTargets == "two" ~ 2,
      numTargets == "three" ~ 3,
      TRUE ~ 0
    ))
  #Delete old column
  data_simulated$numTargets <- NULL #delete column
  data_simulated <- data_simulated %>%
    rename(numTargets = targets)
}

Compiling Stan program...

Trying to compile a simple C file

Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 16.0.0 (clang-1600.0.26.6)’
using SDK: ‘’
clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1

Start sampling

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess

Time taken (min) = 13.1on_model' NOW (CHAIN 2).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 2: 
Chain 2: Gradient evaluation took 0.040546 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 405.46 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 1: 
Chain 1: Gradient evaluation took 0.041512 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 415.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:   1 / 400 [  0%]  (Warmup)
Chain 2: Iteration:   1 / 400 [  0%]  (Warmup)
Chain 1: Iteration:  40 / 400 [ 10%]  (Warmup)
Chain 1: Iteration:  80 / 400 [ 20%]  (Warmup)
Chain 2: Iteration:  40 / 400 [ 10%]  (Warmup)
Chain 1: Iteration: 120 / 400 [ 30%]  (Warmup)
Chain 1: Iteration: 160 / 400 [ 40%]  (Warmup)
Chain 1: Iteration: 200 / 400 [ 50%]  (Warmup)
Chain 1: Iteration: 201 / 400 [ 50%]  (Sampling)
Chain 1: Iteration: 240 / 400 [ 60%]  (Sampling)
Chain 1: Iteration: 280 / 400 [ 70%]  (Sampling)
Chain 1: Iteration: 320 / 400 [ 80%]  (Sampling)
Chain 2: Iteration:  80 / 400 [ 20%]  (Warmup)
Chain 1: Iteration: 360 / 400 [ 90%]  (Sampling)
Chain 1: Iteration: 400 / 400 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 182.891 seconds (Warm-up)
Chain 1:                79.291 seconds (Sampling)
Chain 1:                262.182 seconds (Total)
Chain 1: 
Chain 2: Iteration: 120 / 400 [ 30%]  (Warmup)
Chain 2: Iteration: 160 / 400 [ 40%]  (Warmup)
Chain 2: Iteration: 200 / 400 [ 50%]  (Warmup)
Chain 2: Iteration: 201 / 400 [ 50%]  (Sampling)
Chain 2: Iteration: 240 / 400 [ 60%]  (Sampling)
Chain 2: Iteration: 280 / 400 [ 70%]  (Sampling)
Chain 2: Iteration: 320 / 400 [ 80%]  (Sampling)
Chain 2: Iteration: 360 / 400 [ 90%]  (Sampling)
Chain 2: Iteration: 400 / 400 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 687.555 seconds (Warm-up)
Chain 2:                71.67 seconds (Sampling)
Chain 2:                759.225 seconds (Total)
Chain 2:

eta_intercept is the eta for the older group

eta_ageGroupyounger represents the eta advantage for the younger age group.

The logSigma estimate tends to have wide confidence intervals.

eta_intercept is the eta for the older group

eta_ageGroupyounger represents the eta advantage for the younger age group.

The logSigma estimate tends to have wide confidence intervals.

Check how close the fit estimates are to the true parameters.

First try to calculate the reference group location parameter.

[1] "Success. Looks like I/brms calculated/estimated the reference group location parameter correctly."

Check brms’ estimate of sigma.

Discrepancy of 0.46 between true logSigma, -1.61 and what brms estimated, -1.15

            Row.names trueVal Estimate discrepancy Est.Error Q2.5 Q97.5
1      eta_labRoudaia    0.20     0.20         0.0         0 0.19  0.21
2   eta_ageGroupolder    0.20     0.00         0.2         0 0.00  0.00
3 lapseRate_Intercept    0.05     0.05         0.0         0 0.05  0.05
  outside_CI
1        0.0
2        0.2
3        0.0

Estimates falling outside CI:          Row.names trueVal     Estimate discrepancy    Est.Error         Q2.5
1 eta_ageGroupolder     0.2 9.037908e-05   0.1999096 8.156578e-05 4.771587e-06
         Q97.5 outside_CI
1 0.0003025123  0.1996975

why are gender and ageGroup estimates zero

Look at what is happening during the chain

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

Plot possibly- problematic ones (gender and age) specifically.