Testing Reliability with Random Slopes Extracted from Linear Mixed-Effects Modeling (LMM)

Testing Case 1

Zhiyi Wu

When you have reaction time data from a psycholinguistics study, you can learn more about each individual by extracting the random slopes from a linear mixed-effects model. Not only can you use those slopes to examine the reliability of the task as an outcome measure, you can also use them in other analyses as a predictor measure. For example, you can use this measure as a predictor to examine how the construct measured by the task (e.g., inhibitory control) relates to another construct (e.g., phonological processing). If you have multiple tasks that target a similar construct (e.g., inhibitory control), you can examine inter-task correlations (e.g., Stroop task versus Simon task).
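
As a concrete, entirely hypothetical sketch of that predictor use: once you have one slope per participant (Step 3 below), relating it to another measure takes two lines of base R. The data frame phono and its column phono_score are invented for illustration, and person_slopes stands in for the slope data frame built in Step 3.

# Hypothetical follow-up analysis: extracted slopes as a predictor
# person_slopes: one row per Subject with a column "slope" (see Step 3)
# phono: invented data frame with columns Subject and phono_score
merged = merge(person_slopes, phono, by = "Subject")
summary(lm(phono_score ~ slope, data = merged)) # does the priming effect predict phonological processing?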

This dataset is a subset of the masked-repetition priming data from Hui et al. (2022b), available on the Open Science Framework (osf.io/uyfh5/). Read more about the original study here: osf.io/preprints/osf/dwjmn_v1.

# Set the working directory to the location of this R Markdown file
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))


# Import dataset
d = read.csv("H-data.csv")

# Load necessary libraries for data manipulation
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Check the descriptive data
library(rstatix)
## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## Get the descriptives (means & standard deviations) for the two conditions
d %>% group_by(Condition) %>% get_summary_stats(Reaction.Time, type = "mean_sd")
## # A tibble: 2 × 5
##   Condition variable          n  mean    sd
##   <chr>     <fct>         <dbl> <dbl> <dbl>
## 1 related   Reaction.Time  4219  654.  232.
## 2 unrelated Reaction.Time  4227  686.  231.
# Mean RT for the Related condition is shorter than for the Unrelated condition

Step 1: Split Data into Two Halves

The first step in calculating split-half reliability is to divide our data set into two comparable halves. We typically split by even and odd item numbers.

We start by ensuring our item identifiers are numeric using parse_number() from the readr package, which extracts numeric portions from strings. This is helpful if your item codes include text like “item1” or “I-001.”

Then, we use dplyr’s filter() function to create two separate datasets: one containing all even-numbered items and one containing all odd-numbered items. The modulo operator %% with a value of 2 returns the remainder after division by 2, so ItemNo %% 2 == 0 selects all even numbers.

This splitting approach ensures that both halves contain a representative sample of items from throughout your experiment, which is important for a valid reliability assessment.
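
Before applying these helpers to the real data, here is a quick stand-alone demonstration of what each one does (the item labels are invented):

# parse_number() pulls the digits out of mixed strings
readr::parse_number(c("item1", "item24")) # returns 1 24
# %% gives the remainder after division by 2: 0 = even, 1 = odd
c(1, 2, 3, 4) %% 2 # returns 1 0 1 0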

# Split the data into two halves for split-half reliability
d$ItemNo = parse_number(d$ItemNo) # extracts numeric portions from strings

de = d %>% filter(ItemNo %% 2 == 0)
do = d %>% filter(ItemNo %% 2 == 1)

Step 2: Fit Models to the Two Halves

After splitting our data, we fit separate mixed-effects models to each half. We’re using the blme package, which implements Bayesian linear mixed-effects models. These can be more stable than standard lme4 models when dealing with complex random effects structures.

For our outcome variable, we’re using the inverse of reaction time (-1/RT) rather than raw RT. This transformation often results in more normally distributed residuals, which is an assumption of linear mixed models.
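
If you want to check this on your own data before modeling, a quick (if rough) way is to compare the two distributions directly, using the data frame d loaded above:

# Compare the raw and inverse-transformed RT distributions
hist(d$Reaction.Time, main = "Raw RT", xlab = "RT (ms)")
hist(-1/d$Reaction.Time, main = "Inverse-transformed RT", xlab = "-1/RT")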

Our model formula includes:

  • A fixed effect for our experimental condition (“Condition”)

  • By-subject random intercepts and slopes for condition (“(1+Condition|Subject)”)

  • By-item random intercepts and slopes for condition (“(1+Condition|ItemNo)”)

This is what’s often called the “maximal” random effects structure justified by our design, following Barr et al.’s (2013) recommendations.

For optimization, we’re using “nlminb” through the optimx package. If you encounter convergence warnings, you might need to try a different optimizer or simplify your random effects structure; a sketch follows the model-fitting code below.

library(optimx)
library(blme)
## Loading required package: lme4
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## fit models for the even- and odd-numbered halves (Bayesian LMM via blme)

m_de = blmer(-1/Reaction.Time ~ Condition + 
               (1+Condition|Subject) + 
               (1+Condition|ItemNo), 
             data = de, 
             control=lmerControl(optimizer="optimx",
                                 optCtrl=list(method="nlminb")))

m_do = blmer(-1/Reaction.Time ~ Condition + 
               (1+Condition|Subject) + 
               (1+Condition|ItemNo), 
             data = do, 
             control=lmerControl(optimizer="optimx",
                                 optCtrl=list(method="nlminb")))
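
As noted above, if either fit throws a convergence warning, common fallbacks are switching optimizers or simplifying the random-effects structure. A minimal sketch of the latter, dropping the by-item random slope (whether that simplification is justified depends on your design; m_de_simple is not used later):

# Fallback sketch: keep the by-subject slope (we need it later),
# but reduce the item term to a random intercept only
m_de_simple = blmer(-1/Reaction.Time ~ Condition +
                      (1 + Condition | Subject) +
                      (1 | ItemNo),
                    data = de,
                    control = lmerControl(optimizer = "optimx",
                                          optCtrl = list(method = "nlminb")))

It is also worth inspecting summary(m_de) and summary(m_do) before moving on.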

Step 3: Extract Random Slopes

Now we come to the key step in our approach: extracting the random slopes for each participant from both halves of our data. These random slopes represent our individual difference measure.

We use the ranef() function to extract the random effects from each model. This returns a list, and we specifically want the Subject-level random effects, which we convert to a data frame. The column containing our condition effect is named according to how the condition variable is coded in our data set; in our case, it’s “Conditionunrelated”.

We then add a column with the subject identifiers, which we get from the row names of the random effects data frame.

### get the slopes for all the individuals
# even-numbered half
person_rand_e = data.frame(ranef(m_de)$Subject) # extract the intercepts and slopes for each subject
person_rand_e$Subject = row.names(person_rand_e) # create a column with subjects' IDs
head(person_rand_e) # check what this data frame looks like: there are intercepts and slopes
##         X.Intercept. Conditionunrelated Subject
## adk208 -9.430854e-05       6.532975e-06  adk208
## aiy093 -8.927030e-05       1.724309e-05  aiy093
## atl218  4.076901e-05      -1.149842e-05  atl218
## aus204 -1.636037e-04       2.364253e-05  aus204
## axt253  2.773919e-04      -3.753603e-05  axt253
## bbs259  6.679708e-05      -1.578168e-05  bbs259
# odd-numbered half
person_rand_o = data.frame(ranef(m_do)$Subject)
person_rand_o$Subject = row.names(person_rand_o)
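
As an aside, the same extraction works at the item level, since the models also include by-item slopes:

# Optional: item-level random effects, one intercept and slope per item
item_rand_e = data.frame(ranef(m_de)$ItemNo)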

Step 4: Combine Even-Numbered and Odd-Numbered Sets

After extracting random slopes from both models, we join the two data sets using an inner join, with Subject as the key. The suffix parameter adds “_even” and “_odd” to distinguish columns from each data set.

The resulting data frame has one row per participant, with columns for their random intercepts and slopes from both the even and odd halves of the data.

Take a moment to examine this data frame. Each participant now has two values representing their condition effect: one estimated from even-numbered items and one from odd-numbered items. If our measure is reliable, these two values should be strongly correlated.

# combine them into one data frame
person = inner_join(person_rand_e, person_rand_o, 
                    by = "Subject", suffix = c("_even", "_odd"))

head(person) # Take a quick look at what this data frame looks like
##   X.Intercept._even Conditionunrelated_even Subject X.Intercept._odd
## 1     -9.430854e-05            6.532975e-06  adk208    -1.113374e-04
## 2     -8.927030e-05            1.724309e-05  aiy093    -6.523755e-05
## 3      4.076901e-05           -1.149842e-05  atl218    -6.501898e-05
## 4     -1.636037e-04            2.364253e-05  aus204    -8.186569e-05
## 5      2.773919e-04           -3.753603e-05  axt253    -6.129647e-05
## 6      6.679708e-05           -1.578168e-05  bbs259    -4.950781e-05
##   Conditionunrelated_odd
## 1          -6.288882e-07
## 2          -2.176201e-06
## 3           2.825726e-06
## 4           1.945004e-06
## 5           2.697453e-06
## 6           4.327795e-06
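
Before computing the correlation, it is worth eyeballing the agreement between the two halves. A quick base-R sketch using the columns created above:

# Scatterplot of even-half versus odd-half slopes, one point per participant
plot(person$Conditionunrelated_even, person$Conditionunrelated_odd,
     xlab = "Condition slope (even items)",
     ylab = "Condition slope (odd items)")
abline(lm(Conditionunrelated_odd ~ Conditionunrelated_even, data = person)) # best-fit line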

Step 5: Correlation between Even-Numbered and Odd-Numbered Sets

The final step is to calculate the reliability as the correlation between the random slopes from the two halves of our data. We use two methods:

  1. Pearson correlation using the corr.test() function from the psych package, which is the standard approach for normally distributed data.

  2. Robust correlation using the pbcor() function from the WRS2 package, which is less affected by outliers or violations of normality assumptions.

In our data, these methods give correlations of .85 and .81, respectively (see the output below). Generally, reliability coefficients above .80 are considered good, while those above .70 are acceptable. If your results are substantially lower, you might want to reconsider your experimental design or increase the number of items.

# check the correlation between the even- and odd-numbered sets
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
### Pearson
print(corr.test(person$Conditionunrelated_even, 
                person$Conditionunrelated_odd), 
      short = FALSE) #.85
## Call:corr.test(x = person$Conditionunrelated_even, y = person$Conditionunrelated_odd)
## Correlation matrix 
## [1] 0.85
## Sample Size 
## [1] 127
## These are the unadjusted probability values.
##   The probability values  adjusted for multiple tests are in the p.adj object. 
## [1] 0
## 
##  Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
##       raw.lower raw.r raw.upper raw.p lower.adj upper.adj
## NA-NA      0.79  0.85      0.89     0      0.79      0.89
### Robust
WRS2::pbcor(person$Conditionunrelated_even, 
            person$Conditionunrelated_odd, 
            ci = T) # .81
## Call:
## WRS2::pbcor(x = person$Conditionunrelated_even, y = person$Conditionunrelated_odd, 
##     ci = T)
## 
## Robust correlation coefficient: 0.8135
## Test statistic: 15.6386
## p-value: 0 
## 
## Bootstrap CI: [0.7423; 0.8683]
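
One caveat to keep in mind: a split-half correlation estimates the reliability of a half-length test. If you want a projection for the full-length task, the Spearman-Brown correction is commonly applied; this step is an addition to the analysis above, shown here for the Pearson value:

# Spearman-Brown correction: projected reliability of the full-length task
r_half = 0.85
r_full = (2 * r_half) / (1 + r_half)
r_full # approximately 0.92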