When you have reaction time data from a psycholinguistics study, you can learn more about each individual participant by extracting the by-subject random slopes from a linear mixed-effects model. Not only can you use those slopes to examine the reliability of the task as an outcome measure, you can also use them as a predictor in other analyses. For example, you can use this measure as a predictor to examine how the construct measured by the task (e.g., inhibitory control) relates to another construct (e.g., phonological processing). If you have multiple tasks that target a similar construct (e.g., inhibitory control), you can also examine inter-task correlations (e.g., Stroop task versus Simon task).
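To make the “slope as a predictor” idea concrete, here is a minimal sketch with simulated data (person_scores, slope, and phon_score are hypothetical names; in a real analysis the slopes would come from ranef(), as shown later in this tutorial):
# Hypothetical illustration: regress a phonological processing score on
# by-subject random slopes (both simulated here for demonstration)
set.seed(1)
person_scores = data.frame(slope = rnorm(50), phon_score = rnorm(50))
summary(lm(phon_score ~ slope, data = person_scores))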
This dataset is a subset of the masked-repetition priming data from Hui et al. (2022b), available on the Open Science Framework (osf.io/uyfh5/). Read more about the original study here: osf.io/preprints/osf/dwjmn_v1.
# Set the working directory to the location of this R Markdown file
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
# Import dataset
d = read.csv("H-data.csv")
# Load necessary libraries for data manipulation
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Check the descriptive data
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
## Get the descriptives (means & standard deviations) of the two conditions in this dataset
d %>% group_by(Condition) %>% get_summary_stats(Reaction.Time, type = "mean_sd")
## # A tibble: 2 × 5
## Condition variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 related Reaction.Time 4219 654. 232.
## 2 unrelated Reaction.Time 4227 686. 231.
# Mean RT in the Related condition is shorter than in the Unrelated condition (the priming effect)
The first step in calculating split-half reliability is to divide our data set into two comparable halves. We typically split by even and odd item numbers.
We start by ensuring our item identifiers are numeric using parse_number() from the readr package, which extracts the numeric portion of a string. This is helpful if your item codes include text like “item1” or “I-001.” Then, we use dplyr’s filter() function to create two separate datasets: one containing all even-numbered items and one containing all odd-numbered items. The modulo operator %% with a value of 2 returns the remainder after division by 2, so ItemNo %% 2 == 0 selects all even numbers.
This splitting approach ensures that both halves contain a representative sample of items from throughout your experiment, which is important for a valid reliability assessment.
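As a quick standalone illustration of these two building blocks (the item codes here are made up):
# parse_number() strips the text and keeps the digits
readr::parse_number(c("item1", "item2", "item13")) # returns 1 2 13
# %% 2 gives the remainder after dividing by 2: 0 = even, 1 = odd
c(1, 2, 13) %% 2 # returns 1 0 1, so only item2 is even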
# Split the data into two halves for split-half reliability
d$ItemNo = parse_number(d$ItemNo) # extracts numeric portions from strings
de = d %>% filter(ItemNo %% 2 == 0)
do = d %>% filter(ItemNo %% 2 == 1)
After splitting our data, we fit separate mixed-effects models to each half. We’re using the blme package, which implements Bayesian linear mixed-effects models. These can be more stable than standard lme4 models when dealing with complex random effects structures.
For our outcome variable, we’re using the inverse of reaction time (-1/RT) rather than raw RT. This transformation often results in more normally distributed residuals, which is an assumption of linear mixed models.
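To see why this transformation helps, a quick optional check is to compare the two distributions with histograms; the transformed values are typically far less right-skewed than raw RTs:
# Compare the shape of raw RTs with the -1/RT transform
hist(d$Reaction.Time, main = "Raw RT", xlab = "RT (ms)")
hist(-1/d$Reaction.Time, main = "-1/RT", xlab = "-1/RT")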
Our model formula includes:
- A fixed effect for our experimental condition (Condition)
- By-subject random intercepts and slopes for condition: (1+Condition|Subject)
- By-item random intercepts and slopes for condition: (1+Condition|ItemNo)
This is what’s often called the “maximal” random effects structure justified by our design, following Barr et al.’s (2013) recommendations.
For optimization, we’re using “nlminb” through the optimx package. If you encounter convergence warnings, you might need to try different optimizers or simplify your random effects structure; a fallback sketch follows the model-fitting code below.
library(optimx)
library(blme)
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Fit models for the even- and odd-numbered items (Bayesian LMMs via blme)
m_de = blmer(-1/Reaction.Time ~ Condition +
(1+Condition|Subject) +
(1+Condition|ItemNo),
data = de,
control=lmerControl(optimizer="optimx",
optCtrl=list(method="nlminb")))
m_do = blmer(-1/Reaction.Time ~ Condition +
(1+Condition|Subject) +
(1+Condition|ItemNo),
data = do,
control=lmerControl(optimizer="optimx",
optCtrl=list(method="nlminb")))
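If either model produces convergence warnings, one simple fallback (a sketch, not part of the original analysis) is to swap in lme4’s bobyqa optimizer before resorting to a simpler random effects structure:
# Refit the even-items model with the bobyqa optimizer (fallback sketch)
m_de_alt = blmer(-1/Reaction.Time ~ Condition +
                   (1+Condition|Subject) +
                   (1+Condition|ItemNo),
                 data = de,
                 control=lmerControl(optimizer="bobyqa"))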
Now we come to the key step in our approach: extracting the random slopes for each participant from both halves of our data. These random slopes represent our individual difference measure.
We use the ranef() function to extract the random effects from each model. This returns a list, and we specifically want the Subject-level random effects, which we convert to a data frame. The column containing our condition effect will be named according to how the condition variable is coded in our data set; in our case, it’s “Conditionunrelated”.
We then add a column with the subject identifiers, which we get from the row names of the random effects data frame.
### Get the slopes for all the individuals
#even-numbered
person_rand_e=data.frame(ranef(m_de)$Subject) # extract the intercepts and slopes for each subject
person_rand_e$Subject= row.names(person_rand_e) # create a column with subjects' ID
head(person_rand_e) # check what this data frame looks like: there are intercepts and slopes
## X.Intercept. Conditionunrelated Subject
## adk208 -9.430854e-05 6.532975e-06 adk208
## aiy093 -8.927030e-05 1.724309e-05 aiy093
## atl218 4.076901e-05 -1.149842e-05 atl218
## aus204 -1.636037e-04 2.364253e-05 aus204
## axt253 2.773919e-04 -3.753603e-05 axt253
## bbs259 6.679708e-05 -1.578168e-05 bbs259
#odd-numbered
person_rand_o=data.frame(ranef(m_do)$Subject)
person_rand_o$Subject= row.names(person_rand_o)
After extracting random slopes from both models, we join the two data sets using an inner join, with Subject as the key. The suffix parameter adds “_even” and “_odd” to distinguish columns from each data set.
The resulting data frame has one row per participant, with columns for their random intercepts and slopes from both the even and odd halves of the data.
Take a moment to examine this data. Each participant now has two values representing their condition effect - one estimated from even-numbered items and one from odd-numbered items. If our measure is reliable, these two values should be strongly correlated.
#combine them into one data frame
person = inner_join(person_rand_e, person_rand_o,
by = "Subject", suffix = c("_even", "_odd"))
head(person) # Take a quick look at what this data frame looks like
## X.Intercept._even Conditionunrelated_even Subject X.Intercept._odd
## 1 -9.430854e-05 6.532975e-06 adk208 -1.113374e-04
## 2 -8.927030e-05 1.724309e-05 aiy093 -6.523755e-05
## 3 4.076901e-05 -1.149842e-05 atl218 -6.501898e-05
## 4 -1.636037e-04 2.364253e-05 aus204 -8.186569e-05
## 5 2.773919e-04 -3.753603e-05 axt253 -6.129647e-05
## 6 6.679708e-05 -1.578168e-05 bbs259 -4.950781e-05
## Conditionunrelated_odd
## 1 -6.288882e-07
## 2 -2.176201e-06
## 3 2.825726e-06
## 4 1.945004e-06
## 5 2.697453e-06
## 6 4.327795e-06
The final step is to calculate the reliability as the correlation between the random slopes from the two halves of our data. We use two methods:
- Pearson correlation, using the corr.test() function from the psych package, which is the standard approach for normally distributed data.
- Robust (percentage bend) correlation, using the pbcor() function from the WRS2 package, which is less affected by outliers or violations of normality assumptions.
Here, these two methods give correlations of .85 and .81, respectively (see the output below). Generally, reliability coefficients above .80 are considered good, while those above .70 are acceptable. If your results are substantially lower, you might want to reconsider your experimental design or increase the number of items.
#check the correlation between even- and odd-numbered sets
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
### Pearson
print(corr.test(person$Conditionunrelated_even,
person$Conditionunrelated_odd),
short = FALSE) #.85
## Call:corr.test(x = person$Conditionunrelated_even, y = person$Conditionunrelated_odd)
## Correlation matrix
## [1] 0.85
## Sample Size
## [1] 127
## These are the unadjusted probability values.
## The probability values adjusted for multiple tests are in the p.adj object.
## [1] 0
##
## Confidence intervals based upon normal theory. To get bootstrapped values, try cor.ci
## raw.lower raw.r raw.upper raw.p lower.adj upper.adj
## NA-NA 0.79 0.85 0.89 0 0.79 0.89
### Robust
WRS2::pbcor(person$Conditionunrelated_even,
person$Conditionunrelated_odd,
ci = T) # .81
## Call:
## WRS2::pbcor(x = person$Conditionunrelated_even, y = person$Conditionunrelated_odd,
## ci = T)
##
## Robust correlation coefficient: 0.8135
## Test statistic: 15.6386
## p-value: 0
##
## Bootstrap CI: [0.7423; 0.8683]
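As a final optional check, you can visualize the split-half agreement by plotting one set of slopes against the other; since ggplot2 is already loaded via the tidyverse, a quick sketch is:
# Each point is one participant; a tight, roughly linear cloud reflects
# good split-half reliability
ggplot(person, aes(x = Conditionunrelated_even, y = Conditionunrelated_odd)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Random slope (even items)", y = "Random slope (odd items)")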