Monte Carlo Simulations

Filippo Gambarota

Gianmarco Altoè

University of Padova

08 February 2024

Monte Carlo Simulations (MCS)

  • MCS are controlled experiments: once we assume a data-generating process, we can generate realistic data from it
  • MCS (usually) require parametric assumptions (e.g., data are generated from a normal distribution). Note that statistical models make the same parametric assumptions
  • MCS make it possible to estimate statistical power, evaluate a new statistical method, understand how a specific model behaves under given conditions, etc.
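
As a minimal sketch of the first point, the code below assumes a simple linear model as the data-generating process, simulates one dataset from it, and refits the model. The names and values (`b0`, `b1`, `sigma`, `n`) are illustrative assumptions, not taken from the slides.

```r
# Minimal sketch: simulate data from an assumed linear model and refit it.
# All parameter values below are illustrative assumptions.
set.seed(2024)

n     <- 1000   # sample size
b0    <- 0.5    # true intercept
b1    <- 0.3    # true slope
sigma <- 1      # residual standard deviation

x <- rnorm(n)                          # predictor
y <- b0 + b1 * x + rnorm(n, 0, sigma)  # outcome under the assumed model

fit <- lm(y ~ x)
est <- coef(fit)  # estimates should be close to b0 and b1
est
```

Because the true parameters are known, any gap between `est` and `(b0, b1)` can be attributed to sampling error or to the model, which is what makes the simulation a controlled experiment.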

Why are MCS useful?


[Diagram: understanding the statistical theory is required for simulating data; simulating data is very useful for understanding the statistical theory]


A quick example: the Welch t-test

We are learning the t-test, and we read that if the two samples come from populations with the same variance, we can use the regular t-test; otherwise, we should use the so-called Welch t-test.

Cool! but why?

Without looking at the formula, let’s simply simulate a t-test where we know that the two populations have different variances and the two groups have different sample sizes:

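nsim <- 1e4     # number of simulated datasets

n0 <- 30        # sample size, group 0
n1 <- 20        # sample size, group 1
m0 <- 0         # population mean, group 0
m1 <- 0         # population mean, group 1 (equal means: the null hypothesis is true)
sratio <- 3     # ratio between the two population standard deviations

equal_t <- vector(mode = "list", length = nsim)
unequal_t <- vector(mode = "list", length = nsim)

for(i in 1:nsim){
  g0 <- rnorm(n0, m0, 1)
  g1 <- rnorm(n1, m1, sratio)
  equal_t[[i]] <- t.test(g0, g1, var.equal = TRUE)     # standard (pooled-variance) t-test
  unequal_t[[i]] <- t.test(g0, g1, var.equal = FALSE)  # Welch t-test
}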

p_equal <- sapply(equal_t, function(x) x$p.value)
p_unequal <- sapply(unequal_t, function(x) x$p.value)

mean(p_equal <= 0.05)    # type-1 error rate, standard t-test
#> [1] 0.0975
mean(p_unequal <= 0.05)  # type-1 error rate, Welch t-test
#> [1] 0.0476

The probability of making a type-1 error is about twice as high with the standard t-test (0.0975 vs. 0.0476, against a nominal level of 0.05).

Let’s take a closer look at the simulation results, and we find the answer: the standard error is systematically lower with the standard t-test than with the Welch t-test. This inflates the t value, produces more small p-values, and therefore raises the type-1 error rate.
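
One way to check this claim is to store the standard error of each test and compare the averages with the true standard error of the mean difference. The sketch below reruns a smaller version of the simulation so that it is self-contained. It assumes R >= 3.6.0, where the object returned by `t.test()` includes a `stderr` component. The names `se_equal`, `se_unequal`, and `true_se` are introduced here for illustration.

```r
# Sketch: compare the standard errors used by the two tests.
# Assumes R >= 3.6.0 (t.test() returns the `stderr` component).
set.seed(2024)

nsim   <- 2000
n0     <- 30
n1     <- 20
sratio <- 3

se_equal   <- numeric(nsim)
se_unequal <- numeric(nsim)

for (i in seq_len(nsim)) {
  g0 <- rnorm(n0, 0, 1)
  g1 <- rnorm(n1, 0, sratio)
  se_equal[i]   <- t.test(g0, g1, var.equal = TRUE)$stderr   # pooled SE
  se_unequal[i] <- t.test(g0, g1, var.equal = FALSE)$stderr  # Welch SE
}

# True standard error of the difference between the two sample means
true_se <- sqrt(1^2 / n0 + sratio^2 / n1)

c(equal = mean(se_equal), unequal = mean(se_unequal), true = true_se)
```

On average, the Welch standard error should land close to `true_se`, while the pooled one should fall clearly below it.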

Standard t-test

\[t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
\[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]

Welch’s t-test

\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{SE^2_{\bar{X}_1} + SE^2_{\bar{X}_2}}}\]
\[SE_{\bar{X}_i} = \frac{s_i}{\sqrt{n_i}}\]
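
To connect the formulas with the simulation, a rough worked example can plug the population values used there (\(n_1 = 30\), \(\sigma_1 = 1\), \(n_2 = 20\), \(\sigma_2 = 3\)) into both denominators in place of the sample variances. This only approximates the average behaviour of the two standard errors; it is not an exact expectation.

```latex
\[s_p^2 \approx \frac{(30 - 1)\cdot 1^2 + (20 - 1)\cdot 3^2}{30 + 20 - 2} = \frac{200}{48} \approx 4.17\]
\[SE_{\text{standard}} \approx \sqrt{4.17\left(\frac{1}{30} + \frac{1}{20}\right)} \approx 0.59\]
\[SE_{\text{Welch}} \approx \sqrt{\frac{1^2}{30} + \frac{3^2}{20}} \approx 0.70\]
```

The pooled variance weights each group by its sample size, so the larger, less variable group dominates. The resulting standard error is about 15% smaller than the Welch one, which is why the t values are too large.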

General MCS strategy

In general, the following workflow can be useful when preparing a simulation:

  1. Define the data-generation process, usually starting from the model equation
  2. Choose the fixed parameters (e.g., the mean of group 1)
  3. Find the R functions to generate data given 1 and 2
  4. Repeat the simulation several times
  5. Check the recovery of the simulated parameters
  6. Compute the metrics that are useful for the simulation (e.g., power, type-1 error)
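
The steps above can be sketched for a two-group comparison as follows. This is one possible layout, not a prescribed implementation. The helper `sim_once()` and the parameter values (`n`, `delta`, `sigma`, `nsim`) are hypothetical choices made for illustration.

```r
# Sketch of the workflow for a two-sample t-test (illustrative values).
set.seed(2024)

# Steps 1-2: two independent normal groups with equal variance;
# the fixed parameters below are assumptions chosen for this example
n     <- 50    # sample size per group
delta <- 0.5   # true mean difference
sigma <- 1     # common standard deviation
nsim  <- 2000  # number of simulated datasets

# Step 3: generate one dataset with rnorm() and analyse it
sim_once <- function(n, delta, sigma) {
  g0  <- rnorm(n, 0, sigma)
  g1  <- rnorm(n, delta, sigma)
  fit <- t.test(g1, g0, var.equal = TRUE)
  c(diff = mean(g1) - mean(g0), p = fit$p.value)
}

# Step 4: repeat the simulation (result is a 2 x nsim matrix)
res <- replicate(nsim, sim_once(n, delta, sigma))

# Step 5: parameter recovery, the average estimate should be near delta
mean_diff <- mean(res["diff", ])

# Step 6: metric of interest, here the estimated power at alpha = 0.05
power_sim <- mean(res["p", ] <= 0.05)

# Analytical power for comparison
power_ref <- power.t.test(n = n, delta = delta, sd = sigma)$power

c(mean_diff = mean_diff, power_sim = power_sim, power_ref = power_ref)
```

Wrapping a single iteration in a function keeps steps 1 to 3 separate from the repetition in step 4, so the same loop can be reused when the data-generating process changes.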