Monte Carlo Simulations

Filippo Gambarota

Gianmarco Altoè

University of Padova

08 February 2024

Monte Carlo Simulations (MCS)

MCS are controlled experiments in which, by assuming a data-generation process, we can generate realistic data
MCS (usually) require parametric assumptions (e.g., data are generated from a normal distribution). Note that the statistical models we fit make the same parametric assumptions
MCS allow us to estimate statistical power, evaluate new statistical methods, understand how a specific model behaves under certain conditions, etc.
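A minimal sketch of this idea in R, assuming a simple linear regression with normally distributed errors (the parameter values below are arbitrary choices for illustration): we generate one dataset from a known process and check whether `lm()`, whose standard inference assumes the same normal errors, recovers the true parameters.

```r
set.seed(2024)

# Assumed data-generation process: y = b0 + b1 * x + e, with e ~ N(0, sd_e)
n    <- 1000
b0   <- 0.5   # true intercept (arbitrary)
b1   <- 0.3   # true slope (arbitrary)
sd_e <- 1     # true residual standard deviation (arbitrary)

x <- rnorm(n)
y <- b0 + b1 * x + rnorm(n, mean = 0, sd = sd_e)

# Fit a model that makes the same parametric assumptions
fit <- lm(y ~ x)
est <- coef(fit)
est          # estimates should be close to b0 and b1
sigma(fit)   # estimated residual SD should be close to sd_e
```

Because the true values are known, any systematic gap between `est` and `c(b0, b1)` would say something about the model, not about unknown "true" parameters.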

Why are MCS useful?

while(TRUE){

}

A quick example: the Welch t-test¹

We are learning the t-test, and we read that if the two samples come from populations with the same variance we can use the regular t-test; otherwise, we should use the so-called Welch t-test.

Cool! But why?

Without looking at the formulas, let’s simply simulate t-tests where we know that the two populations have different variances, and let the two groups also have different sample sizes:

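nsim <- 1e4      # number of simulated datasets

n0 <- 30         # sample size of group 0
n1 <- 20         # sample size of group 1
m0 <- 0          # population mean of group 0
m1 <- 0          # population mean of group 1 (no true difference)
sratio <- 3      # SD of group 1 (the SD of group 0 is 1)

# Storage for the results of the two tests
equal_t <- vector(mode = "list", length = nsim)
unequal_t <- vector(mode = "list", length = nsim)

for(i in 1:nsim){
  g0 <- rnorm(n0, m0, 1)
  g1 <- rnorm(n1, m1, sratio)
  equal_t[[i]] <- t.test(g0, g1, var.equal = TRUE)    # standard t-test
  unequal_t[[i]] <- t.test(g0, g1, var.equal = FALSE) # Welch t-test
}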

Cool! But why?

# Extract the p-value from each test
p_equal <- sapply(equal_t, function(x) x$p.value)
p_unequal <- sapply(unequal_t, function(x) x$p.value)

# Empirical type-1 error rate: proportion of p-values <= 0.05
mean(p_equal <= 0.05)
mean(p_unequal <= 0.05)

#> [1] 0.0975
#> [1] 0.0476

The probability of making a type-1 error is roughly twice as high with the standard t-test (0.0975) as with the Welch t-test (0.0476), which stays close to the nominal 0.05
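Since these rates are themselves estimated from a finite number of simulations, it can help to attach a Monte Carlo standard error to them. A minimal sketch, using the binomial standard error sqrt(p * (1 - p) / nsim) and the two rates reported above; `mc_se()` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: Monte Carlo standard error of a proportion
# estimated from nsim independent simulations (binomial SE)
mc_se <- function(p, nsim) sqrt(p * (1 - p) / nsim)

nsim  <- 1e4
rates <- c(equal = 0.0975, unequal = 0.0476)  # rates reported above

se <- mc_se(rates, nsim)

# Approximate 95% intervals for the true type-1 error rates
cbind(rate  = rates,
      lower = rates - 1.96 * se,
      upper = rates + 1.96 * se)
```

With these numbers the interval for the Welch t-test (about 0.043 to 0.052) includes the nominal 0.05, while the one for the standard t-test (about 0.092 to 0.103) lies clearly above it, so the difference is not just simulation noise.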

Cool! But why?

Let’s take a closer look at the simulation results, and we find the answer: the standard error is systematically smaller with the standard t-test. This inflates the t values and produces more small p-values, which in turn inflates the type-1 error rate.
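A minimal sketch to check this directly, assuming the same design as the simulation above (n0 = 30 with SD 1, n1 = 20 with SD 3, equal means). Instead of storing the `t.test()` objects, it computes the standard error of the mean difference by hand for both approaches, following the formulas of the two tests.

```r
set.seed(2024)

nsim   <- 1e4  # number of simulated datasets
n0     <- 30   # group 0: larger sample, SD = 1
n1     <- 20   # group 1: smaller sample, SD = sratio
sratio <- 3

se_pooled <- numeric(nsim)  # SE of the mean difference, standard t-test
se_welch  <- numeric(nsim)  # SE of the mean difference, Welch t-test

for (i in 1:nsim) {
  g0 <- rnorm(n0, 0, 1)
  g1 <- rnorm(n1, 0, sratio)
  v0 <- var(g0)
  v1 <- var(g1)
  # Pooled variance, as in the standard t-test
  sp2 <- ((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2)
  se_pooled[i] <- sqrt(sp2 * (1 / n0 + 1 / n1))
  # Separate variances, as in the Welch t-test
  se_welch[i] <- sqrt(v0 / n0 + v1 / n1)
}

mean(se_pooled)             # average SE, standard t-test
mean(se_welch)              # average SE, Welch t-test
mean(se_pooled < se_welch)  # share of datasets where the pooled SE is smaller
```

In this design the larger group has the smaller variance, so the pooled variance is pulled toward that smaller value and understates the uncertainty of the smaller, noisier group. With n0 = 30 and n1 = 20, a bit of algebra shows that the pooled SE is smaller exactly when the sample variance of group 1 exceeds that of group 0, which happens in virtually every simulated dataset here.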

Cool! But why?¹

Standard t-test

\[t = \frac{\bar{X_1} - \bar{X_2}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\] \[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]
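To connect the standard t-test formula to R’s output, here is a minimal sketch that computes the pooled standard deviation and the t statistic by hand and compares them with `t.test(var.equal = TRUE)`. The data (30 observations with SD 1 and 20 with SD 3) are an arbitrary choice that mirrors the simulation.

```r
set.seed(1)
x1 <- rnorm(30, 0, 1)
x2 <- rnorm(20, 0, 3)

n1 <- length(x1)
n2 <- length(x2)
s1 <- sd(x1)
s2 <- sd(x2)

# Pooled standard deviation s_p
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Standard t statistic and its degrees of freedom
t_pooled  <- (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Compare with R's implementation
tt <- t.test(x1, x2, var.equal = TRUE)
all.equal(unname(tt$statistic), t_pooled)
all.equal(unname(tt$parameter), df_pooled)
```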

Welch’s t-test

\[t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{SE^2_{\bar X_1} + SE^2_{\bar X_2}}}\] \[SE_{\bar X_i} = \frac{s_i}{\sqrt{n_i}}\]
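The same check for Welch’s t-test, as a minimal sketch on the same arbitrary data. The degrees of freedom are not part of the formulas above; the code uses the Welch-Satterthwaite approximation, which is what `t.test(var.equal = FALSE)` reports.

```r
set.seed(1)
x1 <- rnorm(30, 0, 1)
x2 <- rnorm(20, 0, 3)

n1 <- length(x1)
n2 <- length(x2)

# Standard error of each group mean
se1 <- sd(x1) / sqrt(n1)
se2 <- sd(x2) / sqrt(n2)

# Welch t statistic
t_welch <- (mean(x1) - mean(x2)) / sqrt(se1^2 + se2^2)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))

# Compare with R's implementation
tt <- t.test(x1, x2, var.equal = FALSE)
all.equal(unname(tt$statistic), t_welch)
all.equal(unname(tt$parameter), df_welch)
```

Note that the Welch degrees of freedom are generally not an integer and are smaller than n1 + n2 - 2, which makes the test slightly more conservative.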

General MCS strategy

In general, the following workflow can be useful when preparing a simulation:

1. Define the data-generation process, usually starting from the model equation
2. Define the fixed parameters (e.g., the mean of group 1)
3. Find the R functions to generate data given steps 1 and 2
4. Repeat the simulation many times
5. Check the recovery of the simulated parameters
6. Compute the metrics of interest for the simulation (e.g., power, type-1 error rate)
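The six steps above can be sketched for the power of a two-group comparison. This is a minimal example under stated assumptions: the effect size, sample size, and number of simulations are arbitrary, and `sim_once()` is a hypothetical helper name.

```r
set.seed(2024)

# 1. Data-generation process: y_ij = mu_j + e_ij, with e_ij ~ N(0, sd_j)
# 2. Fixed parameters (arbitrary values for illustration)
n  <- 30    # sample size per group
m0 <- 0     # mean of group 0
m1 <- 0.5   # mean of group 1 (standardized effect of 0.5)
s0 <- 1     # SD of group 0
s1 <- 1     # SD of group 1

# 3. R functions to generate data (rnorm) and fit the model (t.test)
sim_once <- function() {
  g0 <- rnorm(n, m0, s0)
  g1 <- rnorm(n, m1, s1)
  fit <- t.test(g1, g0)  # Welch t-test (the default)
  c(diff = unname(fit$estimate[1] - fit$estimate[2]),
    p    = fit$p.value)
}

# 4. Repeat the simulation many times (rows = simulations)
nsim <- 5000
res  <- t(replicate(nsim, sim_once()))

# 5. Check parameter recovery: the average estimated difference
#    should be close to the true difference m1 - m0
mean(res[, "diff"])

# 6. Compute the metric of interest: power = proportion of p <= 0.05
power <- mean(res[, "p"] <= 0.05)
power
```

Changing `m1` to `m0` turns the same script into a type-1 error simulation, and looping over several values of `n` gives a simple power curve.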
