Power analysis

Filippo Gambarota

Gianmarco Altoè

University of Padova

08 February 2024

Statistical Power

Power in a nutshell

Statistical power is defined as the probability of correctly rejecting the null hypothesis \(H_0\), i.e., rejecting it when it is false.
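
In formula (a standard way of writing the same definition):

\[ \text{Power} = P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta \]

where \(\beta\) is the probability of a Type II error.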

Power in a nutshell

For simple designs such as t-tests, ANOVAs, etc., the power can be computed analytically. For example, let’s find the power of detecting an effect size of \(d = 0.5\) with \(n_1 = n_2 = 30\).

d <- 0.5
alpha <- 0.05
n1 <- n2 <- 30
sp <- 1

# Calculate non-centrality parameter (delta)
delta <- d * sqrt(n1 * n2 / (n1 + n2))

# Calculate degrees of freedom
df <- n1 + n2 - 2

# Calculate critical t-value
critical_t <- qt(1 - alpha / 2, df)

# Shift by the non-centrality parameter (scaled by the pooled SD, sp = 1)
non_central_t <- delta / sp

# Approximate power (shifted central t approximation)
1 - pt(critical_t - non_central_t, df)
#> [1] 0.4741093
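
The last step uses a shifted central \(t\) approximation. The exact power can be obtained from the non-central \(t\) distribution via the ncp argument of pt() (this is also what the pwr package computes in the next slide):

# Exact power using the non-central t-distribution
1 - pt(critical_t, df, ncp = delta) + pt(-critical_t, df, ncp = delta)
#> ~0.478, matching pwr::pwr.t.test() below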

Power in a nutshell

The same can be done using the pwr package:

power <- pwr::pwr.t.test(n = n1, d = 0.5)
power
#> 
#>      Two-sample t test power calculation 
#> 
#>               n = 30
#>               d = 0.5
#>       sig.level = 0.05
#>           power = 0.4778965
#>     alternative = two.sided
#> 
#> NOTE: n is number in *each* group

Power by simulations

Sometimes the analytical solution is not available, or we want to estimate the power for complex scenarios (missing data, unequal variances, etc.). The general workflow is:

  1. Generate data under the parametric assumptions
  2. Fit the appropriate model
  3. Extract the relevant metric (e.g., p-value)
  4. Repeat 1-3 several times (1000, 10000 or more)
  5. Summarise the results

For example, the power is the proportion of p-values lower than \(\alpha\), i.e., the number of significant results over the total number of simulations.

Power by simulations

Let’s see the previous example using simulations:

nsim <- 5000
p <- rep(NA, nsim)
for(i in 1:nsim){
    g1 <- rnorm(n1, 0, 1)
    g2 <- rnorm(n2, d, 1)
    p[i] <- t.test(g1, g2)$p.value
}
mean(p <= alpha)
#> [1] 0.4738

The estimated value is pretty close to the analytical value.
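
Since the estimated power is a proportion, we can also quantify its Monte Carlo uncertainty with the usual binomial standard error (a small addition to the slide):

# Monte Carlo standard error of the simulated power
phat <- mean(p <= alpha)
sqrt(phat * (1 - phat) / nsim)
#> with nsim = 5000 and power ~0.47 this is roughly 0.007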

What about meta-analysis?

Also for meta-analysis we have two approaches: analytical and simulation-based.

  • The analytical approach for (intercept-only) random-effects and equal-effects models can be found in Borenstein et al. (2009) (Chapter 29). See also Valentine, Pigott, and Rothstein (2010) and L. V. Hedges and Pigott (2001)
  • Jackson and Turner (2017) proposed a similar but improved approach

Analytical approach

For the analytical approach we need to make some assumptions:

  • \(\tau^2\) and \(\mu_{\theta}\) (or \(\theta\)) are estimated without error
  • The sampling variance \(\sigma^2_i\) (thus the sample size) is the same for each of the \(k\) studies

Under these assumptions the power is:

\[ (1 - \Phi(c_{\alpha} - \lambda)) + \Phi(-c_{\alpha} - \lambda) \]

Where \(\Phi\) is the standard normal cumulative distribution function, \(c_{\alpha}\) is the critical \(z\) value and \(\lambda\) is the non-centrality parameter (the expected value of the test statistic under the alternative hypothesis).

Analytical approach - EE model

For an EE model the only source of variability is the sampling variability, thus \(\lambda\):

\[ \lambda_{EE} = \frac{\theta}{\sqrt{\sigma^2_{\theta}}} \]

And recalling the previous assumptions, where \(\sigma^2_1 = \dots = \sigma^2_k\):

\[ \sigma^2_{\theta} = \frac{\sigma^2}{k} \]

Analytical approach - EE model

For example, consider a meta-analysis of \(k = 10\) studies where each study has a sample size of \(n_1 = n_2 = 25\) (assuming again an unstandardized mean difference as the effect size):

k <- 10
theta <- 0.3
n1 <- n2 <- 25
vt <- 1/n1 + 1/n2
vtheta <- vt/k
lambda <- theta/sqrt(vtheta)
zcrit <- abs(qnorm(0.05/2))
(1 - pnorm(zcrit - lambda)) + pnorm(-zcrit - lambda)
#> [1] 0.9183621

Be careful: the EE model assumes \(\tau^2 = 0\), thus it is like having a single huge study with \(k \times n_1\) participants per group.
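
We can check this equivalence directly (a small addition): since \(\frac{\sigma^2}{k} = \frac{1/n_1 + 1/n_2}{k} = \frac{1}{k n_1} + \frac{1}{k n_2}\), a single study with \(k \times n_1\) participants per group has the same \(\lambda\):

# Power of a single study with k * n1 (and k * n2) participants per group
vt_big <- 1/(k * n1) + 1/(k * n2)
lambda_big <- theta / sqrt(vt_big)
(1 - pnorm(zcrit - lambda_big)) + pnorm(-zcrit - lambda_big)
#> identical to the value above (0.9183621)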

Analytical approach - RE model

For the RE model we just need to include \(\tau^2\) in the \(\lambda\) calculation, thus:

\[ \sigma^{2\star}_{\theta} = \frac{\sigma^2 + \tau^2}{k} \]

The other calculations are the same as the EE model.

Analytical approach - RE model

k <- 10
mu <- 0.3
tau2 <- 0.1
n1 <- n2 <- 25
vt <- 1/n1 + 1/n2
vtheta <- (vt + tau2)/k
lambda <- mu/sqrt(vtheta)
zcrit <- abs(qnorm(0.05/2))
(1 - pnorm(zcrit - lambda)) + pnorm(-zcrit - lambda)
#> [1] 0.6087795

The power is reduced because we are taking into account an extra source of variability (the between-study heterogeneity). Clearly, for a given \(k\), the maximal power is achieved when \(\tau^2 = 0\). We can increase the power either by increasing \(k\) (the number of studies) or by reducing \(\sigma^2_i\) (increasing the number of participants in each study).

Analytical approach - Power curves

The most informative approach is plotting the power curves for different values of \(\tau^2\), \(\sigma^2_i\) and \(\theta\) (or \(\mu_{\theta}\)).

You can use the power_meta() function:

power_meta <- function(es, k, tau2 = 0, n1, n2 = NULL, alpha = 0.05){
  if(is.null(n2)) n2 <- n1
  zc <- qnorm(1 - alpha/2)
  vt <- 1/n1 + 1/n2
  ves <- (vt + tau2)/k
  lambda <- es/sqrt(ves)
  (1 - pnorm(zc - lambda)) + pnorm(-zc - lambda)
}
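
As a quick check, the two analytical examples above (EE and RE) can be reproduced with a single call:

power_meta(es = 0.3, k = 10, tau2 = 0, n1 = 25)   # EE example, ~0.918
power_meta(es = 0.3, k = 10, tau2 = 0.1, n1 = 25) # RE example, ~0.609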

Analytical approach - Power curves

k <- c(5, 10, 30, 50, 100)
es <- c(0.1, 0.3)
tau2 <- c(0, 0.05, 0.1, 0.2)
n <- c(10, 30, 50, 100, 1000)

power <- expand_grid(es, k, tau2, n1 = n)
power$power <- power_meta(power$es, power$k, power$tau2, power$n1)

power$es <- factor(power$es, labels = latex2exp::TeX(sprintf("$\\mu_{\\theta} = %s$", es)))
power$tau2 <- factor(power$tau2, labels = latex2exp::TeX(sprintf("$\\tau^2 = %s$", tau2)))

ggplot(power, aes(x = factor(k), y = power, color = factor(n1))) +
  geom_point() +
  geom_line(aes(group = factor(n1))) +
  facet_grid(es~tau2, labeller = label_parsed) +
  xlab("Number of Studies (k)") +
  ylab("Power") +
  labs(
    color = latex2exp::TeX("$n_1 = n_2$")
  )

Analytical approach - Power curves

With the analytical approach we can (quickly) do interesting things. For example, we can fix the total sample size \(N = k \times (n_1 + n_2)\), vary \(k\), and check the power.

# average meta k = 20, n = 30
kavg <- 20
navg <- 30
N <- kavg * (navg*2)
es <- 0.3
tau2 <- c(0, 0.05, 0.1, 0.2)
k <- seq(10, 100, 10)
n1 <- n2 <- round((N/k)/ 2)

sim <- data.frame(es, k, n1, n2)
sim <- expand_grid(sim, tau2 = tau2)
sim$power <- power_meta(sim$es, sim$k, sim$tau2, sim$n1, sim$n2)
sim$N <- with(sim, k * (n1 + n2))

ggplot(sim, aes(x = k, y = power, color = factor(tau2))) +
  geom_line() +
  ggtitle(latex2exp::TeX("Total N ($k (n_1 + n_2)$) = 1200")) +
  labs(x = "Number of Studies (k)",
       y = "Power",
       color = latex2exp::TeX("$\\tau^2$"))

As long as \(\tau^2 \neq 0\) we need more studies (even if the total sample size is the same).

Simulation-based power

With simulations we can fix or relax the previous assumptions. For example, let’s compute the power for an EE model:

k <- 10
es <- 0.3
tau2 <- 0
n1 <- n2 <- rep(25, k)
nsim <- 1000 # more is better
pval <- rep(NA, nsim)

for(i in 1:nsim){
  dat <- sim_studies(k, es, tau2, n1, n2)
  fit <- rma(yi, vi, data = dat, method = "EE")
  pval[i] <- fit$pval
}

mean(pval <= 0.05)
#> [1] 0.933

The estimated value is similar to the analytical one. With simulations we can also go further, e.g., generating heterogeneous sample sizes across studies.
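
The sim_studies() helper is part of the course material and not shown in these slides. A minimal sketch of what it might look like, based on how it is used here (unstandardized mean differences with between-study heterogeneity \(\tau^2\)), is:

# Sketch of sim_studies(): k studies with true effects drawn from
# N(es, tau2), two groups per study, unstandardized mean difference
sim_studies <- function(k, es, tau2, n1, n2 = NULL){
  if(is.null(n2)) n2 <- n1
  n1 <- rep(n1, length.out = k)
  n2 <- rep(n2, length.out = k)
  thetai <- rnorm(k, es, sqrt(tau2)) # true study-level effects (es can be a vector)
  yi <- vi <- rep(NA, k)
  for(i in 1:k){
    g1 <- rnorm(n1[i], 0, 1)         # control group
    g2 <- rnorm(n2[i], thetai[i], 1) # treatment group
    yi[i] <- mean(g2) - mean(g1)             # observed effect size
    vi[i] <- var(g1)/n1[i] + var(g2)/n2[i]   # estimated sampling variance
  }
  data.frame(yi = yi, vi = vi, n1 = n1, n2 = n2)
}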

Simulation-based power curve

By repeating the previous approach for a series of parameters we can easily draw a power curve:

k <- c(5, 10, 50, 100)
es <- 0.1
tau2 <- c(0, 0.05, 0.1, 0.2)
nsim <- 1000

grid <- expand_grid(k, es, tau2)
power <- rep(NA, nrow(grid))

for(i in 1:nrow(grid)){
  pval <- rep(NA, nsim)
  for(j in 1:nsim){
    n <- rpois(grid$k[i], 40)
    dat <- sim_studies(grid$k[i], grid$es[i], grid$tau2[i], n)
    fit <- rma(yi, vi, data = dat)
    pval[j] <- fit$pval
  }
  power[i] <- mean(pval <= 0.05)
}
grid$power <- power

Simulation-based power curve

ggplot(grid, aes(x = factor(k), y = power, color = factor(tau2))) +
  geom_point() +
  geom_line(aes(group = factor(tau2))) +
  labs(
    y = "Power",
    x = "Number of studies (k)",
    color = latex2exp::TeX("$\\tau^2$")
  )

Power analysis for meta-regression

The power for a meta-regression can be easily computed by simulating the moderator effect. For example, let’s simulate the effect of a binary predictor \(x\).

k <- seq(10, 100, 10)
b0 <- 0.2 # average of group 1
b1 <- 0.1 # difference between group 1 and 2
tau2r <- 0.2 # residual tau2
nsim <- 1000
power <- rep(NA, length(k))

for(i in 1:length(k)){
  x <- rep(0:1, each = k[i]/2)   # binary moderator (half of the studies per level)
  es <- b0 + b1 * x              # true effects: b0 for x = 0, b0 + b1 for x = 1
  pval <- rep(NA, nsim)
  for(j in 1:nsim){
    n <- round(runif(k[i], 10, 100))
    dat <- sim_studies(k[i], es, tau2r, n)
    dat$x <- x
    fit <- rma(yi, vi, mods = ~ x, data = dat) # meta-regression with the moderator
    pval[j] <- fit$pval[2]                     # p-value of the moderator (b1)
  }
  power[i] <- mean(pval <= 0.05)
}

Power analysis for meta-regression

Then we can plot the results:

power <- data.frame(k = k, power = power)
ggplot(power, aes(x = k, y = power)) +
  geom_line()

Multilab Studies

Multilab studies

Multilab studies can be seen as a planned (i.e., prospective) meta-analysis, as opposed to a standard retrospective meta-analysis.

The statistical approach is (roughly) the same, with the difference that we have control over both \(k\) (the number of experimental units) and \(n\) (the sample size within each unit).

In multilab studies we also have the raw data (i.e., participant-level data), thus we can fit more complex multilevel models.

Meta-analysis as multilevel model

Assuming we have \(k\) studies with raw data available, there is no need to aggregate the data, compute effect sizes and variances, and then fit an EE or RE model.

k <- 50
es <- 0.4
tau2 <- 0.1
n <- round(runif(k, 10, 100))
dat <- vector(mode = "list", k)
thetai <- rnorm(k, 0, sqrt(tau2))

for(i in 1:k){
  g1 <- rnorm(n[i], 0, 1)
  g2 <- rnorm(n[i], es + thetai[i], 1)
  d <- data.frame(id = 1:(n[i]*2), unit = i, y = c(g1, g2), group = rep(c(0, 1), each = n[i]))
  dat[[i]] <- d
}

dat <- do.call(rbind, dat)
ht(dat)
#>      id unit          y group
#> 1     1    1  0.2893323     0
#> 2     2    1 -1.1718120     0
#> 3     3    1 -0.8300565     0
#> 4     4    1  0.1406313     0
#> 5     5    1  1.0616239     0
#> 5473 65   50  0.3235104     1
#> 5474 66   50  0.4872725     1
#> 5475 67   50 -0.5355670     1
#> 5476 68   50  2.4656131     1
#> 5477 69   50 -0.6286605     1
#> 5478 70   50  2.4208509     1

Meta-analysis as multilevel model

This is a simple multilevel structure (participants nested within units, just like pupils within classrooms or trials within participants). We can fit the model using lme4::lmer():

library(lme4)
fit_lme <- lmer(y ~ group + (1|unit), data = dat)
summary(fit_lme)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ group + (1 | unit)
#>    Data: dat
#> 
#> REML criterion at convergence: 15712.6
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -3.4455 -0.6876  0.0186  0.6977  3.2117 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  unit     (Intercept) 0.04006  0.2002  
#>  Residual             1.01432  1.0071  
#> Number of obs: 5478, groups:  unit, 50
#> 
#> Fixed effects:
#>              Estimate Std. Error t value
#> (Intercept) -0.003784   0.034715  -0.109
#> group        0.417518   0.027215  15.342
#> 
#> Correlation of Fixed Effects:
#>       (Intr)
#> group -0.392

Meta-analysis as multilevel model

Let’s do the same as a meta-analysis. Firstly we compute the effect sizes for each unit:

datagg <- dat |> 
  group_by(unit, group) |> 
  summarise(m = mean(y),
            sd = sd(y),
            n = n()) |> 
  pivot_wider(names_from = group, values_from = c(m, sd, n), names_sep = "")

datagg <- escalc("MD", m1i = m1, m2i = m0, sd1i = sd1, sd2i = sd0, n1i = n1, n2i = n0, data = datagg)

Meta-analysis as multilevel model

#> 
#>    unit           m0          m1       sd0       sd1 n0 n1      yi     vi 
#> 1     1 -0.182281099  0.67740970 0.9313053 0.8599736 32 32  0.8597 0.0502 
#> 2     2 -0.080684001  0.09258349 0.8352305 1.0518596 59 59  0.1733 0.0306 
#> 3     3 -0.194020177  0.04229138 1.0409077 0.9633482 80 80  0.2363 0.0251 
#> 4     4 -0.122323160  0.61829855 0.9724882 0.8976187 36 36  0.7406 0.0487 
#> 5     5  0.240849696  0.27411106 0.9550909 0.8178345 36 36  0.0333 0.0439 
#> 45   45  0.014118034 -0.16639800 1.1398257 0.9029696 81 81 -0.1805 0.0261 
#> 46   46  0.258466056  0.70023676 0.9593704 1.0152041 31 31  0.4418 0.0629 
#> 47   47 -0.108808858 -0.06605484 1.0886507 1.0221369 47 47  0.0428 0.0474 
#> 48   48 -0.101801658  1.06380067 0.9214148 1.0912795 27 27  1.1656 0.0756 
#> 49   49 -0.004267983  0.06425461 1.0686583 0.9415882 58 58  0.0685 0.0350 
#> 50   50 -0.108280467  0.36343002 0.9505833 1.2846007 35 35  0.4717 0.0730

Meta-analysis as multilevel model

Then we can fit the model:

fit_rma <- rma(yi, vi, data = datagg)
fit_rma
#> 
#> Random-Effects Model (k = 50; tau^2 estimator: REML)
#> 
#> tau^2 (estimated amount of total heterogeneity): 0.1112 (SE = 0.0306)
#> tau (square root of estimated tau^2 value):      0.3335
#> I^2 (total heterogeneity / total variability):   75.80%
#> H^2 (total variability / sampling variability):  4.13
#> 
#> Test for Heterogeneity:
#> Q(df = 49) = 207.6604, p-val < .0001
#> 
#> Model Results:
#> 
#> estimate      se    zval    pval   ci.lb   ci.ub      
#>   0.4653  0.0553  8.4168  <.0001  0.3570  0.5737  *** 
#> 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Meta-analysis as multilevel modeling

The results are fairly similar: the standard deviation of the random intercepts in the lme4 model corresponds to \(\tau\), and the group effect of the lme4 model corresponds to the intercept of the rma model.

data.frame(
  b = c(fixef(fit_lme)[2], fit_rma$b),
  se = c(summary(fit_lme)$coefficients[2, 2], fit_rma$se),
  tau2 = c(as.numeric(VarCorr(fit_lme)[[1]]), fit_rma$tau2),
  model = c("lme4", "metafor")
)
#>               b         se       tau2   model
#> group 0.4175182 0.02721483 0.04006087    lme4
#>       0.4653258 0.05528509 0.11120165 metafor

Actually the two models are not exactly the same, especially when using only the aggregated data. See https://www.metafor-project.org/doku.php/tips:rma_vs_lm_lme_lmer.
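
Part of the discrepancy in the heterogeneity estimates is that the lmer model above only includes a random intercept per unit, while the heterogeneity was simulated on the group effect. A random-slope specification (a sketch, not in the original slides) is closer to the RE meta-analysis, since the variance of the group slope is the multilevel analogue of \(\tau^2\):

# Random intercept and slope for group within units
fit_lme2 <- lmer(y ~ group + (1 + group | unit), data = dat)
VarCorr(fit_lme2)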

Meta-analysis as multilevel modeling

Note that aggregating the data and then fitting a standard (non-weighted) model (as is sometimes done with trial-level data) is wrong and should be avoided. From the meta-analysis perspective it is clear that aggregating without taking into account the precision of each cluster (e.g., study or subject) is misleading.

dataggl <- datagg |> 
  select(unit, m0, m1) |>
  pivot_longer(c(m0, m1), values_to = "y", names_to = "group")

summary(lmer(y ~ group + (1|unit), data = dataggl))
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ group + (1 | unit)
#>    Data: dataggl
#> 
#> REML criterion at convergence: 47.9
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.31836 -0.54276 -0.03706  0.61381  2.36896 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  unit     (Intercept) 0.01407  0.1186  
#>  Residual             0.07520  0.2742  
#> Number of obs: 100, groups:  unit, 50
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept) -0.02872    0.04226  -0.680
#> groupm1      0.48411    0.05485   8.827
#> 
#> Correlation of Fixed Effects:
#>         (Intr)
#> groupm1 -0.649

Multilab sample size vs units

When planning a multilab study there is an important trade-off between increasing the sample size within each unit (more effort for each lab) and recruiting more units with fewer participants per unit (more effort for the organization).

We could have the situation where the number of units \(k\) is fixed and we can only increase the sample size.

We can also simulate scenarios where some units collect all the planned data while others do not complete the data collection.
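
For example, a minimal sketch of the incomplete-collection scenario (assuming the sim_studies() helper sketched above; the 70% completion rate and the halved sample size are illustrative assumptions):

k <- 10
es <- 0.3
tau2 <- 0.05
n_planned <- 50
nsim <- 1000
pval <- rep(NA, nsim)

for(i in 1:nsim){
  completed <- rbinom(k, 1, 0.7) == 1                     # ~70% of labs finish
  n <- ifelse(completed, n_planned, round(n_planned / 2)) # others stop halfway
  dat <- sim_studies(k, es, tau2, n)
  fit <- rma(yi, vi, data = dat)
  pval[i] <- fit$pval
}
mean(pval <= 0.05)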

Fixed \(k\), increasing \(n\)

Let’s assume that the maximum number of labs is \(10\). How many participants are required assuming a certain amount of heterogeneity?

es <- 0.2
k <- 10
n1 <- n2 <- seq(10, 500, 10)
tau2 <- c(0.01, 0.05, 0.1, 0.2)
sim <- expand_grid(k, es, tau2, n1)
sim$n2 <- sim$n1
sim$vt <- with(sim, 1/n1 + 1/n2)
sim$I2 <- round(with(sim, tau2 / (tau2 + vt)) * 100, 3)
sim$power <- power_meta(sim$es, sim$k, sim$tau2, sim$n1, sim$n2)

ht(sim)
#> # A tibble: 11 × 8
#>        k    es  tau2    n1    n2      vt    I2 power
#>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
#>  1    10   0.2  0.01    10    10 0.2      4.76 0.281
#>  2    10   0.2  0.01    20    20 0.1      9.09 0.479
#>  3    10   0.2  0.01    30    30 0.0667  13.0  0.627
#>  4    10   0.2  0.01    40    40 0.05    16.7  0.733
#>  5    10   0.2  0.01    50    50 0.04    20    0.807
#>  6    10   0.2  0.2    450   450 0.00444 97.8  0.288
#>  7    10   0.2  0.2    460   460 0.00435 97.9  0.288
#>  8    10   0.2  0.2    470   470 0.00426 97.9  0.288
#>  9    10   0.2  0.2    480   480 0.00417 98.0  0.288
#> 10    10   0.2  0.2    490   490 0.00408 98    0.288
#> 11    10   0.2  0.2    500   500 0.004   98.0  0.288

Fixed \(k\), increasing \(n\)

With a fixed \(k\), we may reach a power plateau even when increasing \(n\). Where the plateau lies depends also on \(\mu_{\theta}\) and \(\tau^2\).

Multilab replication studies

A special type of multilab study is the replication project. There are several papers discussing how to view replication studies as meta-analyses and how to plan them:

  • Larry V. Hedges and Schauer (2021)
  • Jacob M. Schauer (2022)
  • Jacob M. Schauer and Hedges (2020)
  • J. M. Schauer and Hedges (2021)
  • Jacob M. Schauer (2023)
  • Larry V. Hedges and Schauer (2019)

References

Borenstein, Michael, Larry V Hedges, Julian P T Higgins, and Hannah R Rothstein. 2009. “Introduction to Meta-Analysis.” https://doi.org/10.1002/9780470743386.
Hedges, L V, and T D Pigott. 2001. “The Power of Statistical Tests in Meta-Analysis.” Psychological Methods 6 (September): 203–17. https://www.ncbi.nlm.nih.gov/pubmed/11570228.
Hedges, Larry V, and Jacob M Schauer. 2019. “Statistical Analyses for Studying Replication: Meta-Analytic Perspectives.” Psychological Methods 24 (October): 557–70. https://doi.org/10.1037/met0000189.
———. 2021. “The Design of Replication Studies.” Journal of the Royal Statistical Society. Series A, 184 (March): 868–86. https://doi.org/10.1111/rssa.12688.
Jackson, Dan, and Rebecca Turner. 2017. “Power Analysis for Random-Effects Meta-Analysis.” Research Synthesis Methods 8 (September): 290–302. https://doi.org/10.1002/jrsm.1240.
Schauer, J M, and L V Hedges. 2021. “Reconsidering Statistical Methods for Assessing Replication.” Psychological Methods 26 (February): 127–39. https://doi.org/10.1037/met0000302.
Schauer, Jacob M. 2022. “Replicability and Meta-Analysis.” In Avoiding Questionable Research Practices in Applied Psychology, edited by William O’Donohue, Akihiko Masuda, and Scott Lilienfeld, 301–42. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-04968-2_14.
———. 2023. “On the Accuracy of Replication Failure Rates.” Multivariate Behavioral Research 58: 598–615. https://doi.org/10.1080/00273171.2022.2066500.
Schauer, Jacob M, and Larry V Hedges. 2020. “Assessing Heterogeneity and Power in Replications of Psychological Experiments.” Psychological Bulletin 146 (August): 701–19. https://doi.org/10.1037/bul0000232.
Valentine, Jeffrey C, Therese D Pigott, and Hannah R Rothstein. 2010. “How Many Studies Do You Need?: A Primer on Statistical Power for Meta-Analysis.” Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association 35 (April): 215–47. https://doi.org/10.3102/1076998609346961.