Replicability Crisis in Science?
Filippo Gambarota
8-10 July 2024
A tree of possibilities…
… Where only some of them produce a certain result.
Not all scenarios are equally plausible or reasonable. Thus the multiverse is not the entire set but a non-random sample of plausible choices.
Although useful and very powerful, meta-analysis is characterized by several (arbitrary) choices. For example:
Let’s imagine that we have \(k\) studies about the efficacy of a certain treatment. They collected a sample of participants measuring a certain variable \(y\) before and after the treatment.
With this design we can summarise the effect by computing the difference between the averages of the two time points.
An effect size (e.g., Cohen’s \(d\)) can be calculated as:
\[ d = \frac{\overline y_{post} - \overline y_{pre}}{\sigma_p} \]
where \(\sigma_p\) is the pooled pre-post standard deviation.
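For example, a minimal sketch in R from raw pre and post scores (assuming the same participants at both time points and computing \(\sigma_p\) as the square root of the average of the two variances):
# Pre-post Cohen's d from raw scores (illustrative helper)
cohen_d_pp <- function(y_pre, y_post){
  sp <- sqrt((var(y_pre) + var(y_post)) / 2)  # pooled pre-post SD
  (mean(y_post) - mean(y_pre)) / sp
}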
With a pre-post Cohen’s \(d\) we need the pre-post correlation \(\rho\) to calculate the sampling variance:
\[ \sigma^2_{\epsilon_{pp}} = \frac{2(1 - \rho)}{n} + \frac{d^2}{2n} \]
\(\rho\) is usually not reported and needs to be chosen based on previous literature or a plausible guess.
# Sampling variance of the pre-post Cohen's d as a function of the
# pre-post correlation r, the effect size d, and the sample size n
v_dm <- function(r, d, n){
  2*(1-r)/n + d^2/(2*n)
}

# Sampling variance as a function of rho, with d = 0 and n = 40
vdd <- data.frame(x = seq(-1, 1, 0.001))
vdd$y <- v_dm(vdd$x, 0, 40)

ggplot(vdd, aes(x = x, y = y)) +
  geom_line() +
  xlab(expression(rho)) +
  ylab(expression(sigma^2)) +
  theme_minimal(25)
The data structure: multiple measures of the same outcome within each paper
| study | outcome | ni | yi | vi |
|---|---|---|---|---|
| 1 | 1 | 25 | 0.65 | 0.09 |
| 1 | 2 | 25 | 0.42 | 0.08 |
| 1 | 3 | 25 | -0.71 | 0.09 |
| 2 | 1 | 88 | -0.07 | 0.02 |
| 2 | 2 | 88 | 0.15 | 0.02 |
| ... | ... | ... | ... | ... |
| 9 | 3 | 72 | -0.03 | 0.03 |
| 9 | 4 | 72 | -0.82 | 0.03 |
| 9 | 5 | 72 | 0.13 | 0.03 |
| 10 | 1 | 164 | 0.24 | 0.01 |
| 10 | 2 | 164 | 0.58 | 0.01 |
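With this structure, a multilevel model can be fitted, for example with the metafor package (a minimal sketch, assuming the data frame is called dat and has the columns shown above):
library(metafor)

# Effect sizes nested within studies (three-level model); a multivariate model
# with a full variance-covariance structure is also possible
fit <- rma.mv(yi, vi,
              random = ~ 1 | study/outcome,
              data = dat)
summary(fit)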
Let’s consider, as an example, a paper with \(j = 3\) measures of the outcome:
\[ \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \end{bmatrix} = \begin{bmatrix} \mu_{\theta_1} \\ \mu_{\theta_2} \\ \mu_{\theta_3} \end{bmatrix} + \begin{bmatrix} \delta_{\theta_1} \\ \delta_{\theta_2} \\ \delta_{\theta_3} \end{bmatrix} + \begin{bmatrix} \epsilon_{\theta_{11}} \\ \epsilon_{\theta_{12}} \\ \epsilon_{\theta_{13}} \end{bmatrix} \]
\[ \boldsymbol{\delta} \sim \text{MVN}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau^2_1 & & \\ \rho_{21}\tau_2\tau_1 & \tau^2_2 & \\ \rho_{31}\tau_3\tau_1 & \rho_{32}\tau_3\tau_2 & \tau^2_3 \end{bmatrix} \right) \]
\[ \boldsymbol{\epsilon} \sim \text{MVN}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2_{\epsilon_1} & & \\ \rho_{21}\sigma_{\epsilon_2}\sigma_{\epsilon_1} & \sigma^2_{\epsilon_2} & \\ \rho_{31}\sigma_{\epsilon_3}\sigma_{\epsilon_1} & \rho_{32}\sigma_{\epsilon_3}\sigma_{\epsilon_2} & \sigma^2_{\epsilon_3} \end{bmatrix} \right) \]
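As a sketch of this model, we can simulate \(\boldsymbol{\delta}\) and \(\boldsymbol{\epsilon}\) for one paper directly at the effect-size level (hypothetical parameter values, assuming for simplicity a common \(\tau\), \(\rho\), and sampling variance):
library(MASS)

j   <- 3     # outcomes within the paper
mu  <- 0.3   # true average effect (hypothetical)
tau <- 0.2   # between-study SD (hypothetical)
rho <- 0.5   # correlation among outcomes (hypothetical)
vi  <- 0.05  # sampling variance of each outcome (hypothetical)

R <- rho + diag(1 - rho, j)                             # correlation matrix
Tau   <- diag(tau, j) %*% R %*% diag(tau, j)            # covariance of delta
Sigma <- diag(sqrt(vi), j) %*% R %*% diag(sqrt(vi), j)  # covariance of epsilon

delta   <- mvrnorm(1, mu = rep(0, j), Sigma = Tau)
epsilon <- mvrnorm(1, mu = rep(0, j), Sigma = Sigma)
yi <- mu + delta + epsilon  # observed effects for this paper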
We simulated individual participant data, thus:
We simulated a relatively simple but plausible multiverse with:
For a total of 32 multiverse scenarios.
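In practice, such a grid of scenarios can be built by crossing the levels of each choice. A purely hypothetical example with five binary choices (the actual factors of our multiverse differ) yields 2^5 = 32 combinations:
# Hypothetical multiverse grid: each row is one analysis scenario
multiverse_grid <- expand.grid(
  effect_size = c("d", "g"),
  rho_prepost = c(0.5, 0.7),
  outliers    = c("keep", "remove"),
  model       = c("two-level", "three-level"),
  estimator   = c("ML", "REML")
)
nrow(multiverse_grid)  # 32 scenarios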
p_beta <- ggplot(multi, aes(y = b)) +
xlim(c(-0.5,0.5)) +
geom_boxplot(width = 0.5,
fill = "dodgerblue",
alpha = 0.6) +
ylab(latex2exp::TeX("$\\mu_{\\,\\, \\theta}$")) +
theme_minimal(30) +
theme(axis.text.x = element_blank())
p_pval <- ggplot(multi, aes(y = -log10(p.value))) +
xlim(c(-0.5,0.5)) +
geom_hline(yintercept = -log10(0.05)) +
geom_boxplot(width = 0.5,
fill = "dodgerblue",
alpha = 0.6) +
ylab(latex2exp::TeX("Raw $-log_{10}(p)$")) +
theme_minimal(30) +
theme(axis.text.x = element_blank())
rdata <- cor(zi)
rdata <- data.frame(r = rdata[upper.tri(rdata, diag = FALSE)])
p_cor <- ggplot(rdata, aes(y = r)) +
xlim(c(-0.5,0.5)) +
geom_boxplot(width = 0.5,
fill = "dodgerblue",
alpha = 0.6) +
ylab(latex2exp::TeX("$\\rho$")) +
theme_minimal(30) +
theme(axis.text.x = element_blank()) +
ylim(c(-1,1))
cowplot::plot_grid(p_beta, p_cor, p_pval, nrow = 1)
Discrepancies in results could be due to:
Hall et al. (2022) and Liu et al. (2021) proposed several tools to describe the results of a multiverse analysis.
The increase in complexity needs to be managed using appropriate tools to summarise and visualize the results.
specif_res |>
ggplot(aes(x = b)) +
geom_histogram(bins = 50,
fill = "dodgerblue",
col = "black") +
ylab("Frequency") +
xlab("Estimated Effect") +
geom_vline(xintercept = 0, col = "red") +
geom_vline(xintercept = 0.25, col = "green")
above <- specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
ggplot(aes(x = id, y = b)) +
geom_segment(aes(xend = id, y = ci.lb, yend = ci.ub)) +
geom_hline(yintercept = 0, col = "red") +
geom_hline(yintercept = 0.25, col = "green") +
theme(axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank())
below <- specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
pivot_longer(c(
target_group,
format,
diagnosis,
risk_of_bias,
type,
method
)) |>
ggplot(aes(x = id, y = value)) +
geom_tile() +
theme(axis.title.y = element_blank(),
axis.text.x = element_blank()) +
xlab("Specification")
plot_grid(above, below, nrow = 2, align = "hv")
We can also inspect the effect for the set of scenarios sharing a certain parameter:
specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
ggplot(aes(x = b, fill = diagnosis)) +
geom_density(alpha = 0.5)
The idea of the specification curve is both descriptive (see previous plots) and inferential.
Inferentially, the approach requires to generate a new dataset under the null hypothesis and compare with the observed value.
Plessen et al. (2023) describe how to implement the method for meta-analysis.
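A minimal sketch of the idea for a single specification (hypothetical object names, not the exact implementation of Plessen et al., 2023):
library(metafor)

# One "null" estimate: impose H0 by removing the pooled effect, resample the
# studies with replacement, and refit the model
one_null_fit <- function(dat){
  fit_obs <- rma(yi, vi, data = dat)
  dat$yi0 <- dat$yi - coef(fit_obs)                 # shift effects so H0 holds
  boot <- dat[sample(nrow(dat), replace = TRUE), ]  # bootstrap the studies
  coef(rma(yi0, vi, data = boot))
}

# Null distribution of the pooled effect, to be compared with the observed curve
# b_null <- replicate(1000, one_null_fit(dat))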
specif_res <- specif_res |>
arrange(b) |>
mutate(id = 1:n())
ggplot(data = specif$SD,
aes(x = id, y = b)) +
geom_line(aes(group = spec)) +
geom_line(data = specif_res,
aes(x = id, y = b),
col = "firebrick",
lwd = 2)
Post-selection Inference in Multiverse Analysis (PIMA) is an inferential framework for:
We also implemented PIMA for meta-analysis, but (for the moment) it is limited to two-level models without moderators.
# Nonparametric combination (NPC) of the multiverse p values with the maxT method
multi_flip_overall <- flip::npc(multi_flip, comb.funct = "maxT")
multi_p <- multi_flip_overall@res$`p-value`

# Helper to format p values for reporting
fp <- function(p){
  ifelse(p <= 0.001, "p < 0.001", as.character(round(p, 3)))
}
The multiverse is associated with an overall p value of 0.016. Then we can describe the overall results:
# -log10 transform used for plotting p values
tp <- function(p){
  -log10(p)
}
lbl <- c("Never $p \\leq 0.05$",
"Before correction $p \\leq 0.05$",
"After correction $p \\leq 0.05$")
multi$sign <- factor(multi$sign,
labels = latex2exp::TeX(lbl))
ggplot(multi, aes(x = tp(p.value), y = tp(adjust.maxt), color = sign)) +
geom_hline(yintercept = tp(0.05), alpha = 0.4) +
geom_vline(xintercept = tp(0.05), alpha = 0.4) +
geom_abline(linetype = "dashed", lwd = 0.5) +
geom_point(size = 5,
position = position_jitter(width = 0.1),
alpha = 0.5) +
xlim(c(0, 3)) +
ylim(c(0, 3)) +
xlab(TeX("Raw p value ($-log_{10}$)")) +
ylab(TeX("maxT p value ($-log_{10}$)")) +
theme_minimal(base_size = 20) +
theme(legend.title = element_blank(),
legend.position = "bottom") +
scale_color_manual(labels = scales::parse_format(), values = c("#F8766D", "#7CAE00", "#00BFC4"))
multi_cit <- data.frame(
year = c(2024L,2023L,2022L,2021L,2020L,2019L,
2018L,2017L,2016L,2015L,2014L,2013L),
papers = c(30L, 46L, 33L, 19L, 10L, 11L, 4L, 2L, 3L, 0L, 1L, 1L),
total = c(652674L,1295092L,1317981L,1299609L,
1161910L,1044346L,955091L,907118L,874112L,840256L,804004L,
777242L)
)
multi_cit |>
mutate(year = as.integer(year)) |>
ggplot(aes(x = year, y = papers/total)) +
geom_line() +
geom_label(aes(label = papers)) +
scale_x_continuous(breaks = seq(2012, 2024, 1)) +
theme(axis.title.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab("# papers on Multiverse")
With \(k\) observed studies, where \(y_i\) and \(\sigma^2_{\epsilon_i}\) are the observed effect sizes and sampling variances:
The first permutation (\(j = 1\)) is the observed data. The p value can be computed as:
\[ p = \frac{\#\left(|z^{*}_j| \geq |z^{*}_1|\right)}{B} \]
Meta-analysis using permutations requires recomputing \(\tau^2\) and \(\mu_\theta\) after each permutation.
We proposed to estimate \(\tau^2\) under \(H_0\) and use that value for all permutations (without re-estimating it).
This is extremely fast, especially for large datasets and many multiverse scenarios.
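As an illustrative sketch of the sign-flipping scheme (assuming vectors yi and vi and a \(\tau^2\) already estimated under \(H_0\); this is not the actual PIMA code):
# Sign-flipping permutation test for the random-effects mean, keeping tau^2
# fixed across permutations
perm_p <- function(yi, vi, tau2, B = 5000){
  wi <- 1 / (vi + tau2)                  # inverse-variance weights
  zobs <- sum(wi * yi) / sqrt(sum(wi))   # j = 1 is the observed statistic
  zperm <- replicate(B - 1, {
    flip <- sample(c(-1, 1), length(yi), replace = TRUE)
    sum(wi * flip * yi) / sqrt(sum(wi))
  })
  mean(abs(c(zobs, zperm)) >= abs(zobs)) # permutation p value
}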
The crucial step is the first one, i.e., estimating \(\tau^2\) under \(H_0\). This requires maximizing the log-likelihood while fixing \(\mu_\theta = 0\):
\[ L(\mu_{\theta}, \tau^2|\mathbf{y}) = -\frac{1}{2}\sum_{i = 1}^k \ln(\tau^2 + \sigma_{\epsilon_i}^2) - \frac{1}{2} \sum_{i = 1}^k \frac{(y_i - \mu_{\theta})^2}{\tau^2 + \sigma_{\epsilon_i}^2} \]
This can be done in R using an optimizer function (e.g., optim) or directly with the metafor package, which allows fixing some parameters that are usually estimated.
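A minimal sketch of the optimizer route (using base R's optimize for the one-parameter problem; the search interval is an arbitrary assumption):
# Estimate tau^2 under H0 (mu_theta fixed to 0) by maximizing the
# log-likelihood above, i.e., minimizing its negative
tau2_h0 <- function(yi, vi){
  negll <- function(tau2){
    0.5 * sum(log(tau2 + vi)) + 0.5 * sum(yi^2 / (tau2 + vi))
  }
  optimize(negll, interval = c(0, 10))$minimum
}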