Introduction to Meta-Analysis

Filippo Gambarota

Gianmarco Altoè

University of Padova

08 February 2024

Meta-analysis

Meta-analysis

  • Meta-analysis is a statistical procedure for combining evidence from a group of studies.
  • The idea is to “switch” the statistical unit from, e.g., participants to studies.
  • The motto could be that (appropriately) combining similar studies with a similar aim is the best way to understand a phenomenon.

Meta-analysis and Systematic Review

A meta-analysis usually follows these steps:

  1. Identify the research question: Is the treatment x effective? Does the experimental effect y exist?
  2. Define inclusion/exclusion criteria: From the research question (1), keep only e.g., randomized controlled trials, studies with healthy participants, etc.
  3. Systematically search for studies: Analyze the literature to find all relevant studies
  4. Extract relevant information: Read, extract and organize relevant information e.g., sample size, treatment type, age, etc.
  5. Summarize the results: Create a narrative summary (flowcharts, tables, etc.) of the included studies. This is the systematic review part.
  6. Choose an effect size: Choose a way to standardize the effect across included studies
  7. Meta-analysis model: Choose and implement a meta-analysis model
  8. Interpret and report results

Before the fun part…

  • We are dealing only with the statistical part. Study selection, data extraction, study evaluation, etc. are another story
  • The quality of a meta-analysis is the quality of the included studies

Unstandardized effect sizes

The basic idea of an effect size is to use the raw measure. For example, for studies using reaction times, we can compute the difference between two conditions as \(\overline X_1 - \overline X_2\):

Unstandardized effect sizes

But another study (with the same research question) could use another measure, e.g., accuracy. We can still compute the difference between the group means (not the best strategy, but possible).

Unstandardized effect sizes

Clearly we cannot directly compare the two effects; we need to standardize the measure.

Standardized effect sizes

To compare results from different studies, we need a common metric. Meta-analysts frequently use standardized effect sizes, for example the Pearson correlation or Cohen’s \(d\).

\[ r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2}\sum{(y_i - \bar{y})^2}}} \tag{1}\]

\[ d = \frac{\bar{x_1} - \bar{x_2}}{s_p} \]

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
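As a minimal sketch (with made-up simulated data), the two formulas above translate directly into R:

x1 <- rnorm(30, mean = 0.5, sd = 1) # group 1
x2 <- rnorm(30, mean = 0, sd = 1)   # group 2
n1 <- length(x1); n2 <- length(x2)

# pooled standard deviation
sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

# Cohen's d
(mean(x1) - mean(x2)) / sp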

Standardized effect sizes

The advantage of standardized effect sizes is that, regardless of the original variable, the interpretation and the scale are the same. For example, the Pearson correlation ranges between -1 and 1, while Cohen’s \(d\) ranges between \(- \infty\) and \(\infty\) and is interpreted as the number of standard deviations by which the two groups/conditions differ.

Code
S <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)              # correlation matrix (r = 0.7)
X <- MASS::mvrnorm(100, c(0, 2), S, empirical = TRUE) # two correlated variables

par(mfrow = c(1,2))
plot(X, xlab = "x", ylab = "y", cex = 1.3, pch = 19,
     cex.lab = 1.2, cex.axis = 1.2,
     main = latex2exp::TeX(sprintf("$r = %.2f$", cor(X[, 1], X[, 2]))))
abline(lm(X[, 2] ~ X[, 1]), col = "firebrick", lwd = 2)


plot(density(X[, 1]), xlim = c(-5, 7), ylim = c(0, 0.5), col = "dodgerblue", lwd = 2,
     main = latex2exp::TeX(sprintf("$d = %.2f$", lsr::cohensD(X[, 1], X[, 2]))),
     xlab = "")
lines(density(X[, 2]), col = "firebrick", lwd = 2)

Standardized vs unstandardized

The main difference is (usually) the absence of an effect size-variance relationship for unstandardized effects. For example, the variance of the unstandardized difference between two group means is:

\[ V_{\Delta} = \frac{s^2_1}{n_1} + \frac{s^2_2}{n_2} \tag{2}\]

While the variance of Cohen’s \(d\) can be calculated as:

\[ V_d = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)} \]

Standardized vs unstandardized

In this amazing blog post, James Pustejovsky explains where these equations come from. Basically, the \(\frac{n_1 + n_2}{n_1 n_2}\) term is the same as \(\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}\) with unit variances, while the extra \(\frac{d^2}{2(n_1 + n_2)}\) accounts for the non-centrality induced by standardizing the difference.
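As a quick numeric check (arbitrary sample sizes), the first term is exactly the usual variance of a mean difference with unit variances:

n1 <- 20; n2 <- 30
all.equal((n1 + n2) / (n1 * n2), 1/n1 + 1/n2)
#> [1] TRUE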

Code
library(ggplot2)

n <- c(10, 50, 100)
d <- seq(0, 2, 0.001)

dd <- expand.grid(n = n, d = d)

dd$vumd <- with(dd, 1/n + 1/n)                             # UMD variance (unit variances)
dd$vd   <- with(dd, (n + n) / (n * n) + d^2/(2 * (n + n))) # Cohen's d variance

tidyr::pivot_longer(dd, 3:4) |> 
  ggplot(aes(x = d, y = value, color = name, linetype = factor(n))) +
  geom_line() +
  labs(linetype = "Sample Size",
       color = NULL)

Effect size sampling variability

Crucially, we can also calculate the sampling variability of each effect size. The sampling variability quantifies the uncertainty of the estimated value.

For example, there are multiple methods to estimate the sampling variability of Cohen’s \(d\), such as:

\[ V_d = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)} \]

Each effect size has a specific formula for the sampling variability. The sample size is usually the most important component: studies with a large sample size have low sampling variability.
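For instance, a small illustrative helper (values chosen arbitrarily) shows how the sample size drives the sampling variance:

vd <- function(d, n1, n2 = n1) (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))

vd(d = 0.5, n1 = 20)  # small study
#> [1] 0.103125
vd(d = 0.5, n1 = 200) # large study, much smaller variance
#> [1] 0.0103125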

Effect size sampling variability

As the sample size grows and tends to infinity, the sampling variability approaches zero.
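A minimal sketch (for an unstandardized mean difference with unit variances) makes this visible:

n <- seq(10, 1000, 10) # per-group sample size
plot(n, 1/n + 1/n, type = "l",
     xlab = "Sample size per group", ylab = "Sampling variance")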

Unstandardized effect sizes

For the examples and plots I am going to use simulated data. We simulate unstandardized mean differences (UMD) because the computations are easier and the estimator is unbiased (e.g., Viechtbauer 2005).

More specifically we simulate hypothetical studies where two independent groups are compared:

\[ \Delta = \overline{X_1} - \overline{X_2} \tag{3}\]

\[ SE_{\Delta} = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}} \]

With \(X_{1_i} \sim \mathcal{N}(0, 1)\) and \(X_{2_i} \sim \mathcal{N}(\Delta, 1)\)

The main advantage is that, compared to standardized effect sizes, the sampling variability does not depend on the effect size itself, simplifying the computations.
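For example, a minimal sketch computing \(\Delta\) and its standard error from made-up summary statistics:

m1 <- 520; s1 <- 110; n1 <- 40 # group 1: mean, sd, n (made-up values)
m2 <- 480; s2 <- 100; n2 <- 45 # group 2

delta <- m1 - m2                 # unstandardized mean difference
se    <- sqrt(s1^2/n1 + s2^2/n2) # standard error of the difference
c(delta = delta, se = se)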

Simulating a single study

Simulating a single study - UMD

To simulate a single study using a UMD we need to generate data according to the appropriate model. Here we have a difference between two groups. We can assume that the two groups come from normal distributions, with group 1 \(g_1 \sim \mathcal{N}(0, 1)\) and group 2 \(g_2 \sim \mathcal{N}(D, 1)\), where \(D\) is the effect size. Then, using Equations 2 and 3, we can estimate the effect size and its variance.

D <- 1  # effect size
n <- 50 # sample size
g1 <- rnorm(n, mean = 0, sd = 1)
g2 <- rnorm(n, mean = D, sd = 1)

# effect size
mean(g2) - mean(g1)
#> [1] 0.9756925

# variance
var(g1)/n + var(g2)/n
#> [1] 0.04502048

Simulating a single study - UMD

For simplicity we can wrap everything within a function:

# default sd = 1
sim_umd <- function(n1, n2 = NULL, D, sd = 1){
  if(is.null(n2)) n2 <- n1 # same to n1 if null 
  g1 <- rnorm(n1, mean = 0, sd = sd)
  g2 <- rnorm(n2, mean = D, sd = sd)
  yi <- mean(g2) - mean(g1)
  vi <- var(g1)/n1 + var(g2)/n2
  data.frame(yi, vi)
}

sim_umd(100, D = 0.5)
#>          yi         vi
#> 1 0.5241969 0.02044682
sim_umd(50, D = 0.1)
#>         yi         vi
#> 1 0.400809 0.03576154

Simulating a single study - UMD

We can also generate a large number of studies and check the distributions of the effect sizes and sampling variances. Note that the true \(D = 1\) and the true variance is \(V_D = 1/50 + 1/50 = 0.04\).

studies <- replicate(1000, sim_umd(n1 = 50, D = 1), simplify = FALSE) # simplify = FALSE returns a list
studies <- do.call(rbind, studies) # combine into a data frame
head(studies)
#>          yi         vi
#> 1 0.8652181 0.03581586
#> 2 0.7999635 0.03695626
#> 3 0.8494439 0.04321523
#> 4 1.1619902 0.04550560
#> 5 0.5047229 0.05476030
#> 6 0.8489028 0.04442915

Simulating a single study - UMD

Then we can plot the sampling distributions:
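A minimal sketch of such a plot, using the studies data frame simulated above:

par(mfrow = c(1, 2))
hist(studies$yi, main = "Effect sizes", xlab = "yi")
abline(v = 1, col = "firebrick", lwd = 2)    # true D = 1
hist(studies$vi, main = "Sampling variances", xlab = "vi")
abline(v = 0.04, col = "firebrick", lwd = 2) # true variance = 0.04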

Simulating a single study - SMD

The idea is the same when simulating an SMD, but we need some extra steps. Let’s adjust the previous function:

sim_smd <- function(n1, n2 = NULL, D){
  if(is.null(n2)) n2 <- n1 # same to n1 if null 
  g1 <- rnorm(n1, mean = 0, sd = 1)
  g2 <- rnorm(n2, mean = D, sd = 1)
  
  v1 <- var(g1)
  v2 <- var(g2)
  
  # pooled standard deviation
  sp <- sqrt((v1 * (n1 - 1) + v2 * (n2 - 1)) / (n1 + n2 - 2))
  
  yi <- (mean(g2) - mean(g1)) / sp
  vi <- (n1 + n2) / (n1 * n2) + yi^2/(2*(n1 + n2))
  data.frame(yi, vi)
}

Simulating a single study - SMD

When working with SMDs, calculating the sampling variance can be challenging. Veroniki et al. (2016) identified 16 different estimators with different properties. Furthermore, it is common practice to correct the SMD effect size and variance using Hedges’ correction (Hedges 1989).

You can implement another equation for the sampling variance or the Hedges’ correction directly in the simulation function.
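For example, a sketch of a commonly used approximation of Hedges’ correction factor, applied on top of the sim_smd() output (the helper hedges_correct() is just illustrative):

hedges_correct <- function(yi, vi, n1, n2){
  J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1) # approximate small-sample correction factor
  data.frame(yi = J * yi, vi = J^2 * vi)
}

res <- sim_smd(n1 = 20, D = 0.5)
hedges_correct(res$yi, res$vi, n1 = 20, n2 = 20)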

Simulating a single study - Pearson \(\rho\)

Another common effect size is the Pearson correlation coefficient \(\rho\) (and its estimate \(r\), see Equation 1). The variance of the correlation is calculated as:

\[ V_{r} = \frac{(1 - r^2)^2}{n - 1} \]

Simulating a single study - Pearson \(\rho\)

There is a strong dependency between \(r\) and its sampling variance (similar to Cohen’s \(d\)):

Code
n <- 50
r <- seq(0, 1, 0.01)
v <- (1 - r^2)^2 / (n - 1) 

plot(r, v, type = "l", main = "N = 50", xlab = "r", ylab = latex2exp::TeX("$V_r$"))

Simulating a single study - Pearson \(\rho\)

For this reason the so-called Fisher’s \(z\) transformation is used to stabilize the relationship.

\[ z = \frac{\log{\frac{1 + r}{1 - r}}}{2} \]

\[ V_z = \frac{1}{n - 3} \]

Now the variance is completely independent of the correlation value.
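The transformation can also be inverted to report results back on the correlation scale; a minimal sketch of both directions:

r_to_z <- function(r) log((1 + r) / (1 - r)) / 2          # Fisher's z (atanh)
z_to_r <- function(z) (exp(2 * z) - 1) / (exp(2 * z) + 1) # back-transform (tanh)

z_to_r(r_to_z(0.7))
#> [1] 0.7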

Simulating a single study - Pearson \(\rho\)

This is the relationship between \(r\) and \(z\):

Code
n <- 50
r <- seq(-1, 1, 0.01)
v <- (1 - r^2)^2 / (n - 1) 
z <- log((1 + r)/(1 - r))/2

plot(z, r, type = "l", xlab = "Fisher's z", ylab = "Correlation", main = "Correlation to Fisher's z")

Simulating a single study - Pearson \(\rho\)

To simulate a study using correlations we can use the MASS::mvrnorm() function, which generates correlated data from a multivariate normal distribution.

sim_r <- function(n, r){
  R <- r + diag(1 - r, nrow = 2) # 2 x 2 correlation matrix
  X <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = R) # the means are not relevant here
  r <- cor(X)[1, 2] # extract correlation
  vr <- (1 - r^2)^2 / (n - 1)  # variance of r
  yi <- log((1 + r)/(1 - r))/2 # fisher z
  vi <- 1 / (n - 3) # fisher z variance
  data.frame(yi, vi, r, vr) # including also the pearson correlation and variance
}

Simulating a single study - Pearson \(\rho\)

sim_r(100, 0.5)
#>         yi         vi         r          vr
#> 1 0.347177 0.01030928 0.3338696 0.007974622
sim_r(50, 0.8)
#>         yi        vi         r          vr
#> 1 1.037175 0.0212766 0.7767701 0.003210488

# also here the sampling distributions
studies <- replicate(1000, sim_r(50, 0.7), simplify = FALSE)
studies <- do.call(rbind, studies)
summary(studies)
#>        yi               vi                r                vr          
#>  Min.   :0.3577   Min.   :0.02128   Min.   :0.3432   Min.   :0.001104  
#>  1st Qu.:0.7863   1st Qu.:0.02128   1st Qu.:0.6563   1st Qu.:0.003913  
#>  Median :0.8812   Median :0.02128   Median :0.7070   Median :0.005104  
#>  Mean   :0.8789   Mean   :0.02128   Mean   :0.6987   Mean   :0.005435  
#>  3rd Qu.:0.9724   3rd Qu.:0.02128   3rd Qu.:0.7497   3rd Qu.:0.006614  
#>  Max.   :1.3583   Max.   :0.02128   Max.   :0.8760   Max.   :0.015883

More on effect sizes

The same logic can be applied to any situation. Just understand the data generation process, find the effect size equations and generate data.

  • Borenstein et al. (2009) for all effect size equations, including equations to convert among effect sizes (useful in real-world meta-analyses)
  • the metafor::escalc() function implements basically any effect size (see the sketch below). You can also check the source code to see the actual R implementation.
  • Guide to effect sizes: a modern and complete overview of effect sizes
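For instance, a sketch of computing a standardized mean difference with metafor::escalc() from made-up summary statistics:

library(metafor)

escalc(measure = "SMD", # standardized mean difference
       m1i = 25, sd1i = 8, n1i = 30,
       m2i = 20, sd2i = 7, n2i = 32)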

Simulating from sampling distributions [#extra]

The previous simulation examples are participant-level simulations: we simulated \(n\) observations per group and then aggregated them by computing the effect sizes.

This is the most flexible and general data simulation strategy, but it is computationally inefficient.

Another strategy is to identify the exact sampling distribution of the effect size and sample directly from it. The downside is that we need to derive (or find) the corresponding equation.

Simulating from sampling distributions [#extra]

For example, when generating UMD we can simulate from the sampling distribution presented in Slide 2.4.

\[ y_i \sim \mathcal{N}(\theta, \sqrt{\sigma^2_i}) \]

\[ \sigma^2_i \sim \frac{\chi^2_{n_1 + n_2 - 2}}{n_1 + n_2 - 2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right) \]

In this way we can sample \(k\) effects and sampling variances directly from the sampling distributions, without generating participant-level data and then aggregating.

Simulating from sampling distributions [#extra]

We can again put everything within a function:

sim_k_umd <- function(k, D, n1, n2 = NULL){
  if(is.null(n2)) n2 <- n1
  yi <- rnorm(k, D, sqrt(1/n1 + 1/n2))
  vi <- (rchisq(k, n1 + n2 - 2) / (n1 + n2 - 2)) * (1/n1 + 1/n2)
  data.frame(yi, vi)
}

Simulating from sampling distributions [#extra]

sim_k_umd(k = 10, D = 0.5, n1 = 50)
#>             yi         vi
#> 1   0.59332986 0.04004863
#> 2   0.25204033 0.03552118
#> 3   0.42781903 0.04540351
#> 4   0.49591426 0.03711613
#> 5   0.44485786 0.03705320
#> 6   0.55863880 0.03915509
#> 7   0.65454652 0.03982319
#> 8  -0.07582759 0.03391864
#> 9   0.52141545 0.03686712
#> 10  0.33727350 0.04003404

Simulating from sampling distributions [#extra]

We can compare the two methods and see that we are sampling from the same data generation process.

Code
k <- 1e4
s_umd <- sim_k_umd(k, D = 1, n1 = 50)
ip_umd <- replicate(k, sim_umd(n1 = 50, D = 1), simplify = FALSE)
ip_umd <- do.call(rbind, ip_umd)
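A quick comparison (a sketch; any summary of the two sets of effects would do):

# means and standard deviations of the simulated effects should be very close
rbind(sampling    = c(mean = mean(s_umd$yi), sd = sd(s_umd$yi)),
      participant = c(mean = mean(ip_umd$yi), sd = sd(ip_umd$yi)))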

Simulating from sampling distributions [#extra]

The actual advantage is the computational speed. To simulate \(k = 10\) studies 1000 times (similar to a standard Monte Carlo simulation):

bench <- microbenchmark::microbenchmark(
  sampling = sim_k_umd(k = 10, n1 = 50, D = 1),
  participant = replicate(10, sim_umd(n1 = 50, D = 1)),
  times = 1000 
)

(bench <- summary(bench))
#>          expr    min       lq       mean   median        uq      max neval cld
#> 1    sampling  76.67   78.925   86.13985   80.524   83.9885 2860.488  1000  a 
#> 2 participant 984.82 1000.462 1071.56620 1016.103 1047.4885 4540.422  1000   b

bench$mean[2] / bench$mean[1] # faster
#> [1] 12.43984

References

Borenstein, Michael, Larry V Hedges, Julian P T Higgins, and Hannah R Rothstein. 2009. “Introduction to Meta-Analysis.” https://doi.org/10.1002/9780470743386.
Hedges, Larry V. 1989. “An Unbiased Correction for Sampling Error in Validity Generalization Studies.” The Journal of Applied Psychology 74 (June): 469–77. https://doi.org/10.1037/0021-9010.74.3.469.
Veroniki, Areti Angeliki, Dan Jackson, Wolfgang Viechtbauer, Ralf Bender, Jack Bowden, Guido Knapp, Oliver Kuss, Julian P T Higgins, Dean Langan, and Georgia Salanti. 2016. “Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis.” Research Synthesis Methods 7 (March): 55–79. https://doi.org/10.1002/jrsm.1164.
Viechtbauer, Wolfgang. 2005. “Bias and Efficiency of Meta-Analytic Variance Estimators in the Random-Effects Model.” Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association 30 (September): 261–93. https://doi.org/10.3102/10769986030003261.