Multiverse

Replicability Crisis in Science?

Filippo Gambarota

8-10 July 2024

Multiverse analysis

Multiverse analysis Steegen et al. (2016)

Even a simple data analysis is characterized by several (often arbitrary) choices
The impact of these choices is not always clear, known or easy to predict
Researchers (with or without awareness) usually report a single set of choices

Some examples

using a predictor as continous or choosing some thresholds and transforming to categorical
including or excluding an observation (e.g., outlier)
including or excluding a covariate (e.g., controlling for age or not)
using a linear model or an ordinal regression for discrete ordered data
…

The garden of forking paths

A tree of possibilities…

The garden of forking paths

… Where only some of them produce a certain result.

Only plausible vs all scenarios

Not all scenarios are equally plausible or reasonable. Thus the multiverse is not the entire set but a non-random sample of plausibile choices.

Code

x <- seq(0, 1, 0.0001)
y <- dexp(x, 2)
y <- (y-min(y))/(max(y)-min(y))

plot(x*100, y, type = "l",
     xlab = "All possible scenarios",
     ylab = "Plausibility",
     lwd = 2)

Multiverse meta-analysis

Meta-analysis many choices

Despite useful and very powerful, meta-analysis is characterized by several (arbitrary) choices. For example:

Should the study \(x\) be excluded for theoretical or statistical (e.g., outliers) reasons?
Should we use an equal or random-effects model?
Which value should take the pre-post missing correlation?
…

An example: Pre-post designs

Let’s imagine that we have \(k\) studies about the efficacy of a certain treatment. They collected a sample of participants measuring a certain variable \(y\) before and after the treatment.

With this design we can summarise the effect computing the difference between the average of the two time points.

An effect size (e.g., Cohen’s \(d\)) can be calculated as:

\[ d = \frac{\overline y_{post} - \overline y_{pre}}{\sigma_p} \]

And \(\sigma_p\) is the pooled pre-post standard deviation.

An example: Pre-post Cohen’s \(d\)

With a pre-post Cohen’s \(d\) we need the pre-post correlation \(\rho\) to calculate the sampling variance:

\[ \sigma^2_{\epsilon_{pp}} = \frac{2(1 - \rho)}{n} + \frac{d^2}{2n} \]

\(\rho\) is usually non reported and need to be chosen from previous literature or a plausible guess.

Pre-post Cohen’s \(d\)

Code

v_dm <- function(r, d, n){
    2*(1-r)/n + d^2/(2*n)
}

vdd <- data.frame(x = seq(-1, 1, 0.001))
vdd$y <- v_dm(vdd$x, 0, 40)

ggplot(vdd, aes(x = x, y = y)) +
    geom_line() +
    xlab(expression(rho)) +
    ylab(expression(sigma^2)) +
    theme_minimal(25)

A simulated example

The data structure: multiple measures of the same outcome within each paper

study	outcome	ni	yi	vi
1	1	25	0.65	0.09
1	2	25	0.42	0.08
1	3	25	-0.71	0.09
2	1	88	-0.07	0.02
2	2	88	0.15	0.02
...	...	...	...	...
9	3	72	-0.03	0.03
9	4	72	-0.82	0.03
9	5	72	0.13	0.03
10	1	164	0.24	0.01
10	2	164	0.58	0.01

Simulation approach

Let’s make an example for a paper with \(j = 3\) measures of the outcome:

\[ \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \end{bmatrix} = \begin{bmatrix} \mu_{\theta_1} \\ \mu_{\theta_2} \\ \mu_{\theta_3} \end{bmatrix} + \begin{bmatrix} \delta_{\theta_1} \\ \delta_{\theta_2} \\ \delta_{\theta_3} \end{bmatrix} + \begin{bmatrix} \epsilon_{\theta_{11}} \\ \epsilon_{\theta_{12}} \\ \epsilon_{\theta_{13}} \end{bmatrix} \]

\[ \boldsymbol{\delta} \sim \text{MVN}\left( \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \ \begin{matrix} \tau^2_1 & & \\ \rho_{21}\tau_2\tau_1 & \tau^2_1 & \\ \rho_{31}\tau_2\tau_1 & \rho_{32}\tau_2\tau_1 & \tau^2_1 \\ \end{matrix} \right) \]

\[ \boldsymbol{\epsilon} \sim \text{MVN}\left( \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \ \begin{matrix} \sigma^2_{\epsilon_1} & & \\ \rho_{21}\sigma^2_{\epsilon_2}\sigma^2_{\epsilon_1} & \sigma^2_{\epsilon_2} & \\ \rho_{31}\sigma^2_{\epsilon_3}\sigma^2_{\epsilon_1} & \rho_{32}\sigma^2_{\epsilon_3}\sigma^2_{\epsilon_2} & \sigma^2_{\epsilon_3} \\ \end{matrix} \right) \]

Simulation approach

We simulated individual participant data, thus:

Sampling the true values \(\theta_{ij}\) for each study \(i\) and outcome \(j\) from the multivariate distribution
Generating \(n_i\) pre and post data with correlation \(\rho\)
Calculating the effect size (imputing the pre-post correlation)
Aggregating multiple outcomes within the same paper (imputing the correlation)
Fitting the meta-analyis model
Calculating the scores
Repeating 3-4 for each scenario
Using PIMA

Simulation approach

We simulated a relatively simple but plausible multiverse with:

4 pre-post correlations
4 correlations between multiple measures of the same outcome
2 meta-analysis models (fixed and random-effects)

For a total of 32 multiverse scenarios.

Results

Code

p_beta <- ggplot(multi, aes(y = b)) +
    xlim(c(-0.5,0.5)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("$\\mu_{\\,\\, \\theta}$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank())

p_pval <- ggplot(multi, aes(y = -log10(p.value))) +
    xlim(c(-0.5,0.5)) +
    geom_hline(yintercept = -log10(0.05)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("Raw $-log_{10}(p)$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank())

rdata <- cor(zi)
rdata <- data.frame(r = rdata[upper.tri(rdata, diag = FALSE)])

p_cor <- ggplot(rdata, aes(y = r)) +
    xlim(c(-0.5,0.5)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("$\\rho$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank()) +
    ylim(c(-1,1))

cowplot::plot_grid(p_beta, p_cor, p_pval, nrow = 1)

An example from Plessen et al. (2023)

Over the last four decades, more than 80 meta-analyses have examined the efficacy of psychotherapies for depression
More than 700 randomised controlled trials (RCTs)
Not all these studies goes in the same direction

An example from Plessen et al. (2023)

Discrepancies in results could be due to:

Which factors (which data to meta-analyze)

inclusion/esclusion of a subset of studies (e.g., low quality studies)
type of control group or control therapy
…

How factors (how to meta-analyze)

type of model (e.g., equal vs random)
model complexity (two-level, three level, robust, etc.)
correcting for publication bias

Descriptive tools

Describing the multiverse

Hall et al. (2022) and Liu et al. (2021) proposed several tools to describe the results of a multiverse analysis.

The increase in complexity NEED to be managed using appropriate tools to summarise and visualize the results.

Estimated effect of interest

Code

specif_res |> 
  ggplot(aes(x = b)) +
  geom_histogram(bins = 50,
                 fill = "dodgerblue",
                 col = "black") +
  ylab("Frequency") +
  xlab("Estimated Effect") +
  geom_vline(xintercept = 0, col = "red") +
  geom_vline(xintercept = 0.25, col = "green")

Estimated effect as a function of scenarios¹

Code

above <- specif_res |> 
  arrange(b) |> 
  mutate(id = 1:n()) |> 
  ggplot(aes(x = id, y = b)) +
  geom_segment(aes(xend = id, y = ci.lb, yend = ci.ub)) +
  geom_hline(yintercept = 0, col = "red") +
  geom_hline(yintercept = 0.25, col = "green") +
  theme(axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank())
  
below <- specif_res |> 
  arrange(b) |> 
  mutate(id = 1:n()) |> 
  pivot_longer(c(
    target_group,
    format,
    diagnosis,
    risk_of_bias,
    type,
    method
  )) |> 
  ggplot(aes(x = id, y = value)) +
  geom_tile() +
  theme(axis.title.y = element_blank(),
        axis.text.x = element_blank()) +
  xlab("Specification")

plot_grid(above, below, nrow = 2, align = "hv")

Fixing a specific parameter

We can the effect of a set of scenarios having a certain parameter:

Code

specif_res |> 
  arrange(b) |> 
  mutate(id = 1:n()) |> 
  ggplot(aes(x = b, fill = diagnosis)) +
  geom_density(alpha = 0.5)

Inferential tools

Specification Curve Simonsohn, Simmons, and Nelson (2020)

The idea of the specification curve is both descriptive (see previous plots) and inferential.

Inferentially, the approach requires to generate a new dataset under the null hypothesis and compare with the observed value.

Specification Curve in meta-analysis Plessen et al. (2023)

Plessen et al. (2023) describe how to implement the method for meta-analysis.

For each dataset/model generate \(k_s\) data (where \(k\) is the number of effect in the specification \(s\)) sampling from \(\mathcal{N}(0,\sigma^2_{\epsilon_i} + \tau^2)\).
Fit the model (FE or RE)
Get the specification curve
Repeat 1-3 for a large number of times (e.g., 1000)
Compute the 2.5% and 95.75% quantiles
Check if the observed specification is outside the 95% CI

Specification Curve in meta-analysis Plessen et al. (2023)

Code

specif$SD |> 
  filter(spec %in% sample(1:1000, 1)) |> 
  ggplot(aes(x = id, y = b)) +
  geom_line(aes(group = spec))

Specification Curve in meta-analysis Plessen et al. (2023)

Code

specif$SD |> 
  filter(spec %in% sample(1:1000, 1)) |> 
  ggplot(aes(x = id, y = b)) +
  geom_line(aes(group = spec))

Specification Curve in meta-analysis Plessen et al. (2023)

Code

specif$SD |> 
  filter(spec %in% sample(1:1000, 1)) |> 
  ggplot(aes(x = id, y = b)) +
  geom_line(aes(group = spec))

Specification Curve in meta-analysis Plessen et al. (2023)

Code

specif_res <- specif_res |> 
  arrange(b) |> 
  mutate(id = 1:n())

ggplot(data = specif$SD,
       aes(x = id, y = b)) +
  geom_line(aes(group = spec)) +
  geom_line(data = specif_res,
            aes(x = id, y = b),
            col = "firebrick",
            lwd = 2)

PIMA

P-selection Inference in Multiverse Meta-analysis (PIMA) is an inferential approach to multiverse analysis is an inferential framework for:

obtaining an overall p-value across the multiverse
obtaining corrected p-values based on resampling methods

PIMA (Girardi et al. 2024)

PIMA on meta-analysis

We implemented PIMA also on meta-analysis but (for the moment) limited to two-level models without moderators.

PIMA Results

Code

multi_flip_overall <- flip::npc(multi_flip, comb.funct = "maxT")
multi_p <- multi_flip_overall@res$`p-value`
fp <- function(p){
    ifelse(p <= 0.001, "p < 0.001", as.character(round(p, 3)))
}

The multiverse is associated with an overall p value of 0.016 ¹. Then we can describe the overall results:

Code

p_beta <- ggplot(multi, aes(y = b)) +
    xlim(c(-0.5,0.5)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("$\\mu_{\\,\\, \\theta}$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank())

p_pval <- ggplot(multi, aes(y = -log10(p.value))) +
    xlim(c(-0.5,0.5)) +
    geom_hline(yintercept = -log10(0.05)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("Raw $-log_{10}(p)$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank())

rdata <- cor(zi)
rdata <- data.frame(r = rdata[upper.tri(rdata, diag = FALSE)])

p_cor <- ggplot(rdata, aes(y = r)) +
    xlim(c(-0.5,0.5)) +
    geom_boxplot(width = 0.5,
                 fill = "dodgerblue",
                 alpha = 0.6) + 
    ylab(latex2exp::TeX("$\\rho$")) +
    theme_minimal(30) +
    theme(axis.text.x = element_blank()) +
    ylim(c(-1,1))

cowplot::plot_grid(p_beta, p_cor, p_pval, nrow = 1)

Impact of multiplicity correction

Code

tp <- function(p){
    -log10(p)
}

lbl <- c("Never $p \\leq 0.05$", 
         "Before correction $p \\leq 0.05$", 
         "After correction $p \\leq 0.05$")


multi$sign <- factor(multi$sign, 
                     labels = latex2exp::TeX(lbl))

ggplot(multi, aes(x = tp(p.value), tp(adjust.maxt), color = sign)) +
    geom_hline(yintercept = tp(0.05), alpha = 0.4) +
    geom_vline(xintercept = tp(0.05), alpha = 0.4) +
    geom_abline(linetype = "dashed", lwd = 0.5) +
    geom_point(size = 5,
               position = position_jitter(width = 0.1),
               alpha = 0.5) +
    xlim(c(0, 3)) +
    ylim(c(0, 3)) +
    xlab(TeX("Raw p value ($-log_{10}$)")) +
    ylab(TeX("maxT p value ($-log_{10}$)")) +
    theme_minimal(base_size = 20) +
    theme(legend.title = element_blank(),
          legend.position = "bottom") +
    scale_color_manual(labels = scales::parse_format(), values = c("#F8766D", "#7CAE00", "#00BFC4"))

Important aspects of multiverse analysis

Include only plausible scenarios. Regardless the aim (description or inference) of the multiverse, the results are meaningful if and only if scenarios are plausible.
Scenarios are assumed to be equally plausible but we could imagine a plausibility weight (future direction)
Multiverse analysis is also useful to estimate the degree of variability according to plausible choices

Some other multiverse examples and resources

R Packages

https://mucollective.github.io/multiverse/
https://mverseanalysis.github.io/mverse/
https://github.com/uwdata/boba

Multiverse projects

Dafflon et al. (2022) implemented a multiverse analysis for neuroimaging data
Hoogeveen et al. (2023) implemented a Bayesian multiverse analysis within the Many Labs 4 project (a large scale replication project in Psychology)
Olsson-Collentine et al. (2023) re-analyzed some replication projects using a multiverse approach

Multiverse is catchy!

Code

multi_cit <- data.frame(
        year = c(2024L,2023L,2022L,2021L,2020L,2019L,
                 2018L,2017L,2016L,2015L,2014L,2013L),
      papers = c(30L, 46L, 33L, 19L, 10L, 11L, 4L, 2L, 3L, 0L, 1L, 1L),
       total = c(652674L,1295092L,1317981L,1299609L,
                 1161910L,1044346L,955091L,907118L,874112L,840256L,804004L,
                 777242L)
)

multi_cit |> 
    mutate(year = as.integer(year)) |> 
    ggplot(aes(x = year, y = papers/total)) +
    geom_line() +
    geom_label(aes(label = papers)) +
    scale_x_continuous(breaks = seq(2012, 2024, 1)) +
    theme(axis.title.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank()) +
    ylab("# papers on Multiverse")

References

Dafflon, Jessica, Pedro F Da Costa, František Váša, Ricardo Pio Monti, Danilo Bzdok, Peter J Hellyer, Federico Turkheimer, Jonathan Smallwood, Emily Jones, and Robert Leech. 2022. “A Guided Multiverse Study of Neuroimaging Analyses.” Nature Communications 13 (June): 3758. https://doi.org/10.1038/s41467-022-31347-8.

Follmann, D A, and M A Proschan. 1999. “Valid Inference in Random Effects Meta-Analysis.” Biometrics 55 (September): 732–37. https://doi.org/10.1111/j.0006-341x.1999.00732.x.

Girardi, Paolo, Anna Vesely, Daniël Lakens, Gianmarco Altoè, Massimiliano Pastore, Antonio Calcagnì, and Livio Finos. 2024. “Post-Selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test.” Psychometrika, April. https://doi.org/10.1007/s11336-024-09973-6.

Hall, B D, Y Liu, Y Jansen, P Dragicevic, F Chevalier, and M Kay. 2022. “A Survey of Tasks and Visualizations in Multiverse Analysis Reports.” Computer Graphics Forum: Journal of the European Association for Computer Graphics 41: 402–26. https://doi.org/10.1111/cgf.14443.

Hoogeveen, Suzanne, Sophie W Berkhout, Quentin F Gronau, Eric-Jan Wagenmakers, and Julia M Haaf. 2023. “Improving Statistical Analysis in Team Science: The Case of a Bayesian Multiverse of Many Labs 4.” Advances in Methods and Practices in Psychological Science 6 (July): 25152459231182318. https://doi.org/10.1177/25152459231182318.

Liu, Yang, Alex Kale, Tim Althoff, and Jeffrey Heer. 2021. “Boba: Authoring and Visualizing Multiverse Analyses.” IEEE Transactions on Visualization and Computer Graphics 27 (February): 1753–63. https://doi.org/10.1109/TVCG.2020.3028985.

Olsson-Collentine, Anton, Robbie C M van Aert, Marjan Bakker, and Jelte Wicherts. 2023. “Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting.” Psychological Methods, May. https://doi.org/10.1037/met0000559.

Plessen, Constantin Yves, Eirini Karyotaki, Clara Miguel, Marketa Ciharova, and Pim Cuijpers. 2023. “Exploring the Efficacy of Psychotherapies for Depression: A Multiverse Meta-Analysis.” BMJ Mental Health 26 (February). https://doi.org/10.1136/bmjment-2022-300626.

Simonsohn, Uri, Joseph P Simmons, and Leif D Nelson. 2020. “Specification Curve Analysis.” Nature Human Behaviour 4 (November): 1208–14. https://doi.org/10.1038/s41562-020-0912-z.

Steegen, Sara, Francis Tuerlinckx, Andrew Gelman, and Wolf Vanpaemel. 2016. “Increasing Transparency Through a Multiverse Analysis.” Perspectives on Psychological Science: A Journal of the Association for Psychological Science 11 (September): 702–12. https://doi.org/10.1177/1745691616658637.

Westfall, Peter H, and S Stanley Young. 1993. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. John Wiley & Sons. https://play.google.com/store/books/details?id=nuQXORVGI1QC.

Extra - PIMA implementation for meta-analysis

General Workflow

Meta-analysis with permutations (Follmann and Proschan 1999)

With \(k\) observed studies where \(y_i\) and \(\sigma^2_{\epsilon_i}\) being the observed effect sizes and sampling variances:

Generate a random vector \(\mathbf{s}\) of \(\pm 1\) of length \(k\)
Multiply the \(\mathbf{y}\) vector with the \(s\) vector
Fit the meta-analysis model and calculate \(z^{*}_j\) (\(j\) for permuted)
Repeat 1-3 for a large number of times \(B\). With small \(k\) we can do all the permutations \(B = 2^k \times k\)

The first permutation (\(j = 1\)) is the observed data. The p value can be computed as:

\[ p = \frac{\text{\#}(|z^{*}_j| > |z^{*}_1|)}{B} \]

Fast meta-analysis using permutations

Meta-analysis using permutations requires recomputing \(\tau^2\) and \(\mu_\theta\) after each permutation.
We proposed to estimate \(\tau^2\) under \(H_0\) and use the value for the permutations (without re-estimating it)
This is extremely fast especially for large datasets and several multiverse scenarios

Estimating \(\tau^2\) under \(H_0\)

The crucial step is the point (1). This requires maximizing the log-likelihood fixing \(\mu_\theta = 0\):

\[ L(\mu_{\theta}, \tau^2|\mathbf{y}) = -\frac{1}{2}\sum_{i = 1}^k \ln(\tau^2 + \sigma_{\epsilon_i}^2) - \frac{1}{2} \sum_{i = 1}^k \frac{(y_i - \mu_{\theta})^2}{\tau^2 + \sigma_{\epsilon_i}^2} \]

This can be done in R using some optimizer function (e.g., optim) or using directly the metafor package that allows fixing some parameters that are usually estimated.

Multiverse

Multiverse analysis

Multiverse analysis Steegen et al. (2016)

Some examples

The garden of forking paths

The garden of forking paths

Only plausible vs all scenarios

Multiverse meta-analysis

Meta-analysis many choices

An example: Pre-post designs

An example: Pre-post Cohen’s \(d\)

Pre-post Cohen’s \(d\)

A simulated example

Simulation approach

Simulation approach

Simulation approach

Results

An example from Plessen et al. (2023)

An example from Plessen et al. (2023)

An example from Plessen et al. (2023)

Which factors (which data to meta-analyze)

How factors (how to meta-analyze)

Descriptive tools

Describing the multiverse

Estimated effect of interest

Estimated effect as a function of scenarios1

Fixing a specific parameter

Inferential tools

Specification Curve Simonsohn, Simmons, and Nelson (2020)

Specification Curve in meta-analysis Plessen et al. (2023)

Specification Curve in meta-analysis Plessen et al. (2023)

Specification Curve in meta-analysis Plessen et al. (2023)

Specification Curve in meta-analysis Plessen et al. (2023)

Specification Curve in meta-analysis Plessen et al. (2023)

PIMA

PIMA (Girardi et al. 2024)

PIMA on meta-analysis

PIMA Results

Impact of multiplicity correction

Important aspects of multiverse analysis

Important aspects of multiverse analysis

Some other multiverse examples and resources

R Packages

Multiverse projects

Multiverse is catchy!

References

Extra - PIMA implementation for meta-analysis

General Workflow

Meta-analysis with permutations (Follmann and Proschan 1999)

Fast meta-analysis using permutations

Estimating \(\tau^2\) under \(H_0\)

Estimated effect as a function of scenarios¹