<- sleepstudy
dat <- get_Z_matrix(~ Days + (1|Subject), dat)
Z rownames(Z) <- NULL
colnames(Z) <- NULL
::melt(Z) |>
reshape2ggplot(aes(x = Var1, y = Var2, fill = factor(value), color = factor(value))) +
geom_tile(show.legend = FALSE) +
scale_fill_manual(values = c("transparent", scales::alpha("black", 0.5))) +
scale_color_manual(values = c("transparent", "black")) +
theme_bw(20) +
theme(panel.grid = element_blank(),
aspect.ratio = 1) +
ylab(latex2exp::TeX("$Cluster_q$")) +
xlab(latex2exp::TeX("$Observation_i$"))
Generalized Linear Mixed-Effects Models
Notation
\[ \mathbf{y}_{N \times 1} = \mathbf{X}_{N \times p} \boldsymbol{\beta}_{p \times 1} + \sum_{i=1}^{m} \mathbf{Z}_i^{(N \times q_i)} \mathbf{b}_i^{(q_i \times 1)} + \boldsymbol{\varepsilon}_{N \times 1} \]
Where \(N\) is the number of observations, \(p\) is the number of predictors, \(q\) is the number of clusters (e.g., participants) and \(m\) is the number of random effects (e.g., nested or crossed).
Visualizing the \(\mathbf{Z}\) matrix
Why clustered data in Psychology?
In Psychology and Neuroscience we (almost) always have clustered data. For example:
- Childrens nested within classrooms (maybe nested within schools)
- Trials of a cognitive experiments nested within participants
- …
The main point is that, clustered observations are not independent and we want to take into account the correlation.
Example with lme4::sleepstudy
A very simple example is the lme4::sleepstudy
where participants reaction times where evaluated under sleep deprivation.
<- lme4::sleepstudy
dat head(dat)
#> Reaction Days Subject
#> 1 249.5600 0 308
#> 2 258.7047 1 308
#> 3 250.8006 2 308
#> 4 321.4398 3 308
#> 5 356.8519 4 308
#> 6 414.6901 5 308
Overall model
|>
dat ggplot(aes(x = Days, y = Reaction)) +
geom_point(position = position_jitter(width = 0.1)) +
scale_x_continuous(breaks = unique(dat$Days)) +
geom_smooth(method = "lm", se = FALSE) +
ggtitle("Linear Model (ignore dependency)")
By-participant model
|>
dat ggplot(aes(x = Days, y = Reaction)) +
geom_point(position = position_jitter(width = 0.1)) +
facet_wrap(~Subject) +
geom_smooth(method = "lm", se = FALSE)
By-participant model
From the by-participant models, we see a clear dependency. Observations within the same participant are more similar compared to observations across participant.
In addition, at Day 0, some participants have higher/lower reaction times compared to the overall trend. Similarly, some participants have higher/lower slopes.
Individual differences are the core of Psychology and we want to explictly model them!
Individual differences
|>
dat ggplot(aes(x = Days, y = Reaction, group = Subject)) +
geom_point(position = position_jitter(width = 0.1)) +
geom_smooth(method = "lm", se = FALSE)
Are the observations clustered?
We can start assessing the clustering structure by fitting a mixed-model with only the random-intercepts and calculating the intraclass-correlation.
\[ y_{ij} = \beta_0 + \beta_{0_i} + \epsilon_{ij} \]
Are the observations clustered?
The model can be fitted with the lme4::lmer()
function:
<- lmer(Reaction ~ 1 + (1|Subject), data = dat)
fit0 summary(fit0)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ 1 + (1 | Subject)
#> Data: dat
#>
#> REML criterion at convergence: 1904.3
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.4983 -0.5501 -0.1476 0.5123 3.3446
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> Subject (Intercept) 1278 35.75
#> Residual 1959 44.26
#> Number of obs: 180, groups: Subject, 18
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 298.51 9.05 32.98
Are the observations clustered?
# using the insight::get_variance() function
<- insight::get_variance(fit0)
vv $var.intercept / (vv$var.intercept + vv$var.residual) vv
#> Subject
#> 0.3948896
# or directly with performance::icc()
::icc(fit0) performance
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.395
#> Unadjusted ICC: 0.395
Thus roughly 39% of the variance is explained by the clustering structure.
Are the observations clustered?
|>
dat ggplot(aes(x = Subject, y = Reaction)) +
#geom_point(position = position_jitter(width = 0.1))
geom_boxplot(fill = "dodgerblue") +
ggtitle("ICC = 39%")
Are the observations clustered
We can remove the subject-specific effect.
|>
dat mutate(Reaction_cmc = cmc(Reaction, Subject)) |>
ggplot(aes(x = Subject, y = Reaction_cmc)) +
#geom_point(position = position_jitter(width = 0.1))
geom_boxplot(fill = "dodgerblue") +
ylab("Reaction (cluster-mean centered)")
How different ICCs appear…
We can simulate some datasets with different ICC:
<- c(0, 0.3, 0.8)
icc <- sqrt(icc)
sb0
<- 0
b0
<- 10
ns <- 10
nt
<- rep(1:ns, each = nt)
id
<- vector(mode = "list", length = length(icc))
y
for(i in 1:length(icc)){
<- rnorm(ns, 0, sb0[i])
b0i <- b0 + b0i[id] + rnorm(ns * nt, 0, 1 - sb0[i])
y[[i]]
}
<- data.frame(
dd id = rep(id, length(sb0)),
sb0 = rep(sb0, each = ns * nt),
icc = rep(icc, each = ns * nt),
y = unlist(y)
)
|>
dd mutate(icc = sprintf("ICC = %s", icc),
icc = factor(icc, levels = c("ICC = 0", "ICC = 0.3","ICC = 0.8"))) |>
ggplot(aes(x = factor(id), y = y)) +
geom_boxplot() +
facet_wrap(~icc) +
xlab("Cluster")
Psychological interpretation of random intercepts
The random-intercepts are intepreted as baseline variation in the experiment. For example:
- variability at time 0
- variability pre treatment
- variability for the reference condition
- …
Furthemore, the ICC (that is related to the random-intercepts variance) affects the statistical power of the model.
Statistical power and ICC
We will see how to simulate data in a meaningful way, but here just an example on the impact of ICC on the statistical power.
I estimate the statistical power (for an intercept-only model) using the analytical method by Hedges and Pigott (2001).
<- function(nc, ns, d, icc, alpha = 0.05){
powerICC <- icc
tau2 <- 1/ns + 1/ns
vi <- (vi + tau2)/nc
v <- d / sqrt(v)
z <- abs(qnorm(alpha/2))
zc 1 - pnorm(zc - z) + pnorm(-zc - z)
}
Statistical power and ICC
<- expand.grid(
sim nc = c(1, seq(5, 50, 10)),
ns = 20,
d = 0.3,
icc = c(0, 0.3, 0.5, 0.8)
)
$power <- with(sim, powerICC(nc, ns, d, icc))
sim
|>
sim ggplot(aes(x = icc, y = power, color = factor(nc))) +
geom_line() +
labs(color = "Clusters") +
xlab("ICC") +
ylab("Power") +
ggtitle("N = 20 (per cluster), d = 0.3")
ICC in Psychology1
In Psychology it is common to collect clustered data and the data collection is usually time expensive. In addition, sample sizes are usually lower than the optimal level according to power calculation. Rao and Scott (1992) defined the concept of effective sample size for the reduction in the sample size (and thus power) according to the ICC in clustered data.
\[ N_{\text{eff}} = \frac{N}{1 + (\bar k - 1) \rho} \] Where \(\bar k\) is the average number of observations per cluster.
<- function(N, khat, icc){
neff / (1 + (khat - 1) * icc)
N }
ICC in Psychology
<- 100
N <- 10
khat <- seq(0, 1, 0.01)
icc
qplot(icc, neff(N, khat, icc), type = "l", lwd = 1)
Adding the fixed effect
Then we can add the fixed effect of Days
. The variable is numeric starting with 0 up to 5 days. We can just add the variable as it is.
<- lmer(Reaction ~ Days + (1|Subject), data = dat)
fit1 # equivalent to
# fit1 <- update(fit0, . ~ . + Days)
summary(fit1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (1 | Subject)
#> Data: dat
#>
#> REML criterion at convergence: 1786.5
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.2257 -0.5529 0.0109 0.5188 4.2506
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> Subject (Intercept) 1378.2 37.12
#> Residual 960.5 30.99
#> Number of obs: 180, groups: Subject, 18
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 251.4051 9.7467 25.79
#> Days 10.4673 0.8042 13.02
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> Days -0.371
Adding the fixed effect
This means that for each day we have an expected increase in reaction times of 10.47 milliseconds (or 0.01 seconds).
We have only the random intercept for subjects, thus we are assuming that each subject has the same sleep deprivation effect but can have different baseline reaction times.
head(coef(fit1)$Subject)
#> (Intercept) Days
#> 308 292.1888 10.46729
#> 309 173.5556 10.46729
#> 310 188.2965 10.46729
#> 330 255.8115 10.46729
#> 331 261.6213 10.46729
#> 332 259.6263 10.46729
# equivalent to fixef(fit1)["(Intercept)"] + ranef(fit1)$Subject for the random intercept
Adding the fixed effect
coef(fit1)$Subject |>
mutate(Subject = 1:n()) |>
mutate(b0 = fixef(fit1)[1],
b1 = fixef(fit1)[2]) |>
rename("b0i" = `(Intercept)`,
"b1i" = `Days`) |>
expand_grid(Days = unique(dat$Days)) |>
mutate(pi = b0i + b1i * Days,
p = b0 + b1 * Days) |>
ggplot(aes(x = Days, y = pi)) +
geom_line(aes(group = Subject),
alpha = 0.5) +
geom_line(aes(x = Days, y = p),
lwd = 1.5,
col = "firebrick") +
ylab("Reaction")
Adding the fixed effect
coef(fit1)$Subject |>
mutate(Subject = 1:n()) |>
mutate(b0 = fixef(fit1)[1],
b1 = fixef(fit1)[2]) |>
rename("b0i" = `(Intercept)`,
"b1i" = `Days`) |>
expand_grid(Days = unique(dat$Days)) |>
mutate(pi = b0i + b1i * Days,
p = b0 + b1 * Days) |>
ggplot(aes(x = Days, y = pi)) +
geom_line(aes(group = Subject),
alpha = 0.5) +
geom_line(aes(x = Days, y = p),
lwd = 1.5,
col = "firebrick") +
ylab("Reaction")
Adding the fixed effect
The same can be achieved using:
<- expand_grid(
DD Subject = unique(dat$Subject),
Days = unique(dat$Days)
)
$pi <- predict(fit1, newdata = DD)
DD
head(DD)
#> # A tibble: 6 × 3
#> Subject Days pi
#> <fct> <dbl> <dbl>
#> 1 308 0 292.
#> 2 308 1 303.
#> 3 308 2 313.
#> 4 308 3 324.
#> 5 308 4 334.
#> 6 308 5 345.
Is the fixed effect enough?
A big problem with mixed models is that effects can be included both as fixed and random and the choice is not always easy. From the plots at the beginning there is a clear variability in slopes that the model is ignoring.
<- lmer(Reaction ~ Days + (Days|Subject), data = dat)
fit2 summary(fit2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ Days + (Days | Subject)
#> Data: dat
#>
#> REML criterion at convergence: 1743.6
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.9536 -0.4634 0.0231 0.4634 5.1793
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> Subject (Intercept) 612.10 24.741
#> Days 35.07 5.922 0.07
#> Residual 654.94 25.592
#> Number of obs: 180, groups: Subject, 18
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 251.405 6.825 36.838
#> Days 10.467 1.546 6.771
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> Days -0.138
Is the fixed effect enough?
The first important part is the estimation of the random part. Clearly the random slopes variance is not zero.
filter_output(summary(fit2), c("^Random effects|^Number of obs"))
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> Subject (Intercept) 612.10 24.741
#> Days 35.07 5.922 0.07
#> Residual 654.94 25.592
#> Number of obs: 180, groups: Subject, 18
Is the fixed effect enough?
One of the most important part is that the standard error of the fixed coefficients is affected by the inclusion of the random slopes. Omitting the slopes was underestimating the standard error.
qualcosa su barr, keep it maximal, etc.
::compareCoefs(fit1, fit2) car
#> Calls:
#> 1: lmer(formula = Reaction ~ Days + (1 | Subject), data = dat)
#> 2: lmer(formula = Reaction ~ Days + (Days | Subject), data = dat)
#>
#> Model 1 Model 2
#> (Intercept) 251.41 251.41
#> SE 9.75 6.82
#>
#> Days 10.467 10.467
#> SE 0.804 1.546
#>
Is the fixed effect enough?
We can formally compare the models using a Likelihood Ratio Test (LRT)2:
anova(fit1, fit2)
#> Data: dat
#> Models:
#> fit1: Reaction ~ Days + (1 | Subject)
#> fit2: Reaction ~ Days + (Days | Subject)
#> npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
#> fit1 4 1802.1 1814.8 -897.04 1794.1
#> fit2 6 1763.9 1783.1 -875.97 1751.9 42.139 2 7.072e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# or ranova(refitML(fit2))
Plotting the random slopes
<- expand_grid(
DD Subject = unique(dat$Subject),
Days = unique(dat$Days)
)
$pi <- predict(fit2, newdata = DD)
DD
|>
DD ggplot(aes(x = Days, y = pi, group = Subject)) +
geom_line() +
xlab("Days") +
ylab("Reaction")
Shrinkage!
We can compare the fit of the multilevel model with linear models for each cluster. We can use the fit_by_cluster()
function tha takes a formula, a grouping factor and fit a model for each cluster.
<- fit_by_cluster(Reaction ~ Days | Subject,
fitl
dat,model = lm)
Shrinkage!
<- fit_by_cluster(Reaction ~ Days | Subject,
fitl
dat,model = lm)
<- expand_grid(
DD Subject = unique(dat$Subject),
Days = unique(dat$Days)
)
$lmer <- predict(fit2, DD)
DD$Subject <- as.numeric(as.character(DD$Subject))
DD
<- unique(DD$Subject)
subjs $lm <- NA
DD
for(i in 1:length(fitl)){
<- predict(fitl[[i]], DD[DD$Subject == subjs[i], ])
pp $lm[DD$Subject == subjs[i]] <- pp
DD
}
|>
DD pivot_longer(c(lmer, lm)) |>
ggplot(aes(x = Days, y = value, color = name)) +
geom_line(lwd = 1) +
facet_wrap(~Subject) +
ylab("Reaction") +
labs(color = "Method")
Let’s simulate a multilevel model!
Let’s try to simulate the previous model
<- 10
N <- 10 # max number of days
DAYS <- 300 # grand-mean of reaction times at time 0
b00 <- 20 # increase in reaction time for each day
b1 <- 30 # standard deviation intercepts
sb0 <- 10 # standard deviaton slopes
sb1 <- 100 # residual standard deviation
s
# we are simulating an ICC of
^2 / (sb0^2 + s^2)
sb0#> [1] 0.08256881
<- expand_grid(
sim id = 1:N,
days = 0:(DAYS - 1)
)
# random intercepts and slopes, rho = 0
<- 0 + diag(1 - 0, 2)
R <- diag(c(sb0, sb1)) %*% R %*% diag(c(sb0, sb1))
VCOV
<- MASS::mvrnorm(N, c(0, 0), VCOV)
RE
<- RE[, 1]
b0i <- RE[, 2]
b1i
# linear predictor
$lp <- with(sim, b00 + b0i[id] + (b1 + b1i[id]) * days)
sim$rt <- rnorm(nrow(sim), sim$lp, s) sim
Let’s try to simulate the previous model
|>
sim ggplot(aes(x = rt)) +
geom_histogram(color = "black",
fill = "dodgerblue") +
xlab("Reaction Times")
Let’s try to simulate the previous model
|>
sim ggplot(aes(x = days, y = rt)) +
geom_point() +
scale_x_continuous(breaks = 0:DAYS) +
xlab("Days") +
ylab("Reaction Times") +
geom_smooth(method = "lm",
se = FALSE,
aes(group = id))
Let’s try to simulate the previous model
|>
sim ggplot(aes(x = days, y = rt)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = 0:DAYS) +
xlab("Days") +
ylab("Reaction Times") +
facet_wrap(~id, ncol = 4) +
geom_smooth(method = "lm",
se = FALSE)
Let’s try to simulate the previous model
A more realistic scenario could be to have an heterogeneous number of observations for each cluster.
# minimum 3 observations
<- sample(3:(DAYS - 1), size = N, replace = TRUE)
ndays_per_subject <- split(sim, sim$id)
siml
for(i in 1:length(siml)){
<- siml[[i]][1:ndays_per_subject[i], ]
siml[[i]]
}
<- do.call(rbind, siml)
sim_missing
# number of observations
tapply(sim_missing$rt, sim_missing$id, length)
#> 1 2 3 4 5 6 7 8 9 10
#> 8 3 3 6 9 5 4 5 7 4
Let’s try to simulate the previous model
|>
sim_missing ggplot(aes(x = days, y = rt)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = 0:DAYS) +
xlab("Days") +
ylab("Reaction Times") +
facet_wrap(~id, ncol = 4) +
geom_smooth(method = "lm",
se = FALSE)
Let’s fit the model
Here the random-intercepts and slopes model:
<- lmer(rt ~ days + (days|id), data = sim_missing)
fit summary(fit)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: rt ~ days + (days | id)
#> Data: sim_missing
#>
#> REML criterion at convergence: 650
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.10251 -0.66077 0.00732 0.63516 2.01985
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> id (Intercept) 3543.5 59.53
#> days 451.5 21.25 -0.77
#> Residual 10994.3 104.85
#> Number of obs: 54, groups: id, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 313.18 30.20 10.369
#> days 12.14 11.03 1.101
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> days -0.761
Can the multilevel model be misleading?
What do you see in this plot?
<- 10 # number of clusters/units
k <- 100 # number of participants within each unit
n <- n * k # total sample size
N <- rep(1:k, each = n)
unit
<- rnorm(N, 60 - 3 * (unit - 1), 5) # age equation
age <- 0 + rnorm(k, 0, 0.1)[unit] + 0.01 * age + 0.1 * (unit-1) + rnorm(N, 0, 0.1) # response equation
y <- data.frame(unit, age, y) dat
|>
dat ggplot(aes(x = age, y = y)) +
geom_point()
. . .
There is a clear negative relationship between age
and the response variable y
!
What do you see in this plot?
Let’s add the cluster information. What about now?
|>
dat ggplot(aes(x = age, y = y)) +
geom_point(aes(color = factor(unit))) +
labs(color = "Cluster")
Simpson’s paradox
- We have clustered data, and the relationship
y ~ age
seems to be different within and between clusters. - This phenomenon is called Simpson’s paradox and can be a serious problem in multilevel models.
- Clearly this is a problem only for variables at the observation level (not at the cluster level).
- For example, if clusters are schools and the observations are children.
age
is a variable at the children level (or aggregated at the school level). On the other side, the prestige of the school is a variable at the school level (the same for each child)
Simpson’s paradox
|>
dat ggplot(aes(x = age, y = y)) +
geom_point(aes(color = factor(unit))) +
labs(color = "Cluster") +
geom_smooth(method = "lm",
se = FALSE,
col = "black") +
geom_smooth(method = "lm",
aes(group = unit,
color = factor(unit)),
se = FALSE)
Simpson’s paradox, what to do?
The main strategy to deal with the Simpson’s paradox is centering the variables. Enders and Tofighi (2007) provide a clear overview of the strategy.
Simpson’s paradox, what to do?
We can define some centering functions and see how we can model the multilevel structure.
::print_fun(c(funs$cm, funs$cmc, funs$gmc)) filor
<- function(x, cluster){
cm <- tapply(x, cluster, mean)
cm
cm[cluster]
}<- function(x, cluster){
cmc - cm(x, cluster)
x
}<- function(x){
gmc - mean(x)
x }
Simpson’s paradox, between-clusters effect
Firsly we can calculate the clusters mean. This remove the within-effect and a linear model will capture only the between-effect.
|>
dat mutate(age_cm = cm(age, unit),
y_cm = cm(y, unit)) |>
ggplot(aes(x = age, y = y)) +
geom_point(alpha = 0.2,
aes(color = factor(unit))) +
geom_point(aes(x = age_cm, y = y_cm,
color = factor(unit)),
alpha = 1,
size = 4) +
labs(color = "Unit")
Simpson’s paradox, between-clusters effect
We have a negative age
effect between clusters, as expected.
<- dat |>
dat_cm group_by(unit) |>
summarise(y_cm = mean(y),
age_cm = mean(age))
lm(y_cm ~ age_cm, data = dat_cm)
#>
#> Call:
#> lm(formula = y_cm ~ age_cm, data = dat_cm)
#>
#> Coefficients:
#> (Intercept) age_cm
#> 1.92527 -0.02175
Simpson’s paradox, within-clusters effect
We can substract (i.e., centering) from each observation, the cluster mean.
|>
dat mutate(age_cm = cm(age, unit),
age_cmc = cmc(age, unit),
y_cm = cm(y, unit)) |>
ggplot(aes(x = age_cmc, y = y)) +
geom_point(alpha = 0.2,
aes(color = factor(unit))) +
xlab(latex2exp::TeX("$age - \\; \\bar{age_k}$")) +
geom_point(aes(x = 0, y = y_cm,
color = factor(unit))) +
geom_smooth(method = "lm",
aes(group = unit,
color = factor(unit)),
se = FALSE) +
labs(color = "Unit")
Simpson’s paradox, within-clusters effect
$age_cmc <- cmc(dat$age, dat$unit)
dat
<- fit_by_cluster(
fitl_cmc ~ age_cmc | unit,
y data = dat,
model = lm
)
Simpson’s paradox, within-clusters effect
All slopes are similar but positive (compared to the between-effect). Thus estimating only the between (or within) effect is completely misleading.
|>
fitl_cmc lapply(broom::tidy, conf.int = TRUE) |>
bind_rows(.id = "unit") |>
mutate(unit = as.numeric(unit)) |>
filter(term == "age_cmc") |>
ggplot(aes(x = estimate, y = unit)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 0,
lty = "dashed",
col = "firebrick") +
xlab(latex2exp::TeX("$\\beta_{age}$"))
What about the multilevel model?
We can fit a multilevel model on the full dataset. What is the age
slope? within or between?
<- lmer(y ~ age + (1|unit), data = dat)
fit summary(fit)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ age + (1 | unit)
#> Data: dat
#>
#> REML criterion at convergence: -1710.7
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.7439 -0.6848 -0.0165 0.7191 3.0376
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> unit (Intercept) 0.091429 0.30237
#> Residual 0.009747 0.09872
#> Number of obs: 1000, groups: unit, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 0.4471843 0.0997411 4.483
#> age 0.0100761 0.0006074 16.590
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> age -0.283
What about the multilevel model?
The estimated effect is a sort of weighted average of the within and between effect. This is usually not interesting, especially when the two effects are different. We want to isolate the between and within effects.
We need to include two version of the age
variable, one centered on the clusters and the other representing the clusters means.
$age_cmc <- cmc(dat$age, dat$unit)
dat$age_cm <- cm(dat$age, dat$unit)
dat
<- lmer(y ~ age_cmc + age_cm + (1|unit), data = dat) fit_bw
What about the multilevel model?
summary(fit_bw)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ age_cmc + age_cm + (1 | unit)
#> Data: dat
#>
#> REML criterion at convergence: -1724.4
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.7518 -0.6848 -0.0136 0.7126 3.0452
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> unit (Intercept) 0.007530 0.08678
#> Residual 0.009746 0.09872
#> Number of obs: 1000, groups: unit, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 1.9252710 0.1503998 12.801
#> age_cmc 0.0101730 0.0006083 16.724
#> age_cm -0.0217498 0.0031833 -6.832
#>
#> Correlation of Fixed Effects:
#> (Intr) ag_cmc
#> age_cmc 0.000
#> age_cm -0.983 0.000
What about the multilevel model?
The within-clusters effect can be also included as random slope3. Basically we allows not only the within effect to be different compared to the between effect but also that each cluster has a different sloope.
<- lmer(y ~ age_cmc + age_cm + (age_cmc|unit), data = dat)
fit_bw2 summary(fit_bw2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ age_cmc + age_cm + (age_cmc | unit)
#> Data: dat
#>
#> REML criterion at convergence: -1724.5
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.7516 -0.6911 -0.0114 0.7116 3.0458
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> unit (Intercept) 7.528e-03 0.0867634
#> age_cmc 2.053e-08 0.0001433 1.00
#> Residual 9.746e-03 0.0987213
#> Number of obs: 1000, groups: unit, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 1.933188 0.149989 12.889
#> age_cmc 0.010176 0.000610 16.683
#> age_cm -0.021920 0.003174 -6.905
#>
#> Correlation of Fixed Effects:
#> (Intr) ag_cmc
#> age_cmc 0.020
#> age_cm -0.983 -0.006
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
More on the Simpson’s Paradox
Kievit et al. (2013) describe the SP problem in Psychology with methods to detect it.
What do you think?
Practical session
- simulate clustered data with some predictors
- simulate the same dataset but with a some between-within differences
Diagnostics
The performance
package
The performance
package has a series of nice functions to visualize and assess the models fit.
Let’s go back to our reaction times:
<- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
fit
::check_model(fit) performance
The performance
package
<- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
fit
::check_model(fit) performance
Influence measures
Nieuwenhuis et al. (2012) describe the Influence.ME
package that computes the standard influence measures (Cook’s distance, DFbeta, etc.) for lme4
models.
Also the lme4:::influence.merMod()
compute all the measures. You can provide the groups =
argument to specify at which level performing the leave-one-out procedure.
Influence measures, small exercise
Both lme4:::influence.merMod()
and Influence.ME
do not provide (good enough) plotting tools. Write a set of functions that takes a lme4
model in input, calculate the influence measures, organize everything in a data.frame and plot the influence measures results with ggplot2
.
Effect sizes
\(R^2\) for multilevel models
The \(R^2\) for multilevel models is not computed as for standard regression models. Nakagawa, Johnson, and Schielzeth (2017) describe how to calculate the \(R^2\). They described the marginal (only fixed-effects) and the conditional (fixed and random-effects).
# or performance::r2()
::r2_nakagawa(fit) performance
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.799
#> Marginal R2: 0.279
Clearly the conditional is always greater or equal to the marginal one.
References
Footnotes
This is just an approximation to estimate the impact of the ICC↩︎
Note that the
anova()
function is refitting models using Maximum Likelihood (and not REML). This is required to compare models using LRT↩︎Of course, including the clusters means as random-slopes is not possible. Note that data are simulated without random slopes, thus the model estimate parameters at the boundaries↩︎