Monte Carlo Simulations

Filippo Gambarota

Gianmarco Altoè

University of Padova

08 February 2024

Monte Carlo Simulations (MCS)

  • MCS are controlled experiments: once a data-generating process is assumed, realistic data can be generated from it
  • MCS (usually) require parametric assumptions (e.g., data are generated from a normal distribution). Note that statistical models make the same kind of parametric assumptions
  • MCS make it possible to estimate statistical power, evaluate a new statistical method, understand how a specific model behaves under given conditions, etc.
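
As a minimal sketch of the first two points, assume a normal data-generating process. The values of `mu`, `sigma` and `n` and the seed are hypothetical, chosen only for illustration:

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Assumed data-generating process: y ~ Normal(mu, sigma)
mu <- 10      # hypothetical population mean
sigma <- 2    # hypothetical population standard deviation
n <- 1e5      # large sample, so estimates should be close to the truth

y <- rnorm(n, mean = mu, sd = sigma)

# The sample statistics should be close to the values we fixed
c(mean = mean(y), sd = sd(y))
```

Because we chose `mu` and `sigma` ourselves, we know the truth and can judge how well any statistic recovers it.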

Why are MCS useful?

while(TRUE){

[Figure: a loop in which "Understanding the statistical theory" is required for (😣) "Simulating Data", and "Simulating Data" is very useful for (😎) "Understanding the statistical theory"]

}

A quick example: Welch's t-test

We are learning the t-test, and we read that if the two samples come from populations with the same variance we can use the regular t-test; otherwise we should use the so-called Welch's t-test.

Cool! but why?

Without looking at the formula, let's simply simulate a t-test where we know that the two populations have different variances, and also use a different sample size in each group:

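nsim <- 1e4    # number of simulated datasets

n0 <- 30       # sample size, group 0
n1 <- 20       # sample size, group 1
m0 <- 0        # population mean, group 0
m1 <- 0        # population mean, group 1 (equal means: the null hypothesis is true)
sratio <- 3    # ratio between the two population standard deviations

# preallocate the lists that store the test results
equal_t <- vector(mode = "list", length = nsim)
unequal_t <- vector(mode = "list", length = nsim)

for(i in seq_len(nsim)){
  g0 <- rnorm(n0, m0, 1)
  g1 <- rnorm(n1, m1, sratio)
  equal_t[[i]] <- t.test(g0, g1, var.equal = TRUE)     # standard t-test
  unequal_t[[i]] <- t.test(g0, g1, var.equal = FALSE)  # Welch's t-test
}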

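# extract the p-value of each simulated test
p_equal <- sapply(equal_t, function(x) x$p.value)
p_unequal <- sapply(unequal_t, function(x) x$p.value)

# proportion of significant results: the type-1 error rate, since the null is true
mean(p_equal <= 0.05)
#> [1] 0.0975
mean(p_unequal <= 0.05)
#> [1] 0.0476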

The probability of making a type-1 error is about twice as high with the standard t-test (0.0975) as with Welch's t-test (0.0476), which stays close to the nominal 0.05 level.

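Let’s take a closer look at the simulation results, and we find the answer! The standard error is systematically lower with the standard t-test: here the group with the larger variance is also the smaller one, so the pooled variance gives it too little weight. A lower standard error increases the absolute t value and the number of small p-values, inflating the type-1 error rate.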
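
The claim about the standard error can be checked directly. The sketch below reruns a smaller version of the simulation (the seed and the reduced `nsim` are assumptions made to keep it quick) and extracts the `stderr` element that `t.test()` returns in R 3.6.0 or later:

```r
set.seed(2024)  # arbitrary seed, not part of the original example

nsim <- 2000    # fewer replications than above, to keep the sketch quick
n0 <- 30
n1 <- 20
sratio <- 3

se_equal <- numeric(nsim)
se_unequal <- numeric(nsim)

for (i in seq_len(nsim)) {
  g0 <- rnorm(n0, 0, 1)
  g1 <- rnorm(n1, 0, sratio)
  # `stderr` is the standard error of the mean difference used in the t statistic
  se_equal[i] <- t.test(g0, g1, var.equal = TRUE)$stderr
  se_unequal[i] <- t.test(g0, g1, var.equal = FALSE)$stderr
}

# True standard error of the mean difference under the assumed populations
se_true <- sqrt(1^2 / n0 + sratio^2 / n1)

c(true = se_true, pooled = mean(se_equal), welch = mean(se_unequal))
```

On average the pooled standard error should fall clearly below the true value, while the Welch standard error should stay close to it.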

Standard t-test

\[t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
\[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]

Welch’s t-test

\[t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{SE^2_{\bar{X}_1} + SE^2_{\bar{X}_2}}}\]
\[SE_{\bar{X}_i} = \frac{s_i}{\sqrt{n_i}}\]
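
Both formulas can be verified numerically. The sketch below computes the two t statistics by hand on one simulated dataset and compares them with `t.test()`. The seed is arbitrary, and the groups are named `g0` and `g1` as in the simulation, where the formulas use indices 1 and 2:

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

g0 <- rnorm(30, 0, 1)
g1 <- rnorm(20, 0, 3)
n0 <- length(g0)
n1 <- length(g1)

diff_means <- mean(g0) - mean(g1)

# Standard t-test: pooled standard deviation
sp <- sqrt(((n0 - 1) * var(g0) + (n1 - 1) * var(g1)) / (n0 + n1 - 2))
t_pooled <- diff_means / (sp * sqrt(1 / n0 + 1 / n1))

# Welch's t-test: one standard error per group
se0 <- sd(g0) / sqrt(n0)
se1 <- sd(g1) / sqrt(n1)
t_welch <- diff_means / sqrt(se0^2 + se1^2)

# Reference values from t.test()
t_pooled_ref <- unname(t.test(g0, g1, var.equal = TRUE)$statistic)
t_welch_ref <- unname(t.test(g0, g1, var.equal = FALSE)$statistic)

all.equal(t_pooled, t_pooled_ref)
all.equal(t_welch, t_welch_ref)
```

The numerator is identical in the two tests. Only the denominator (the standard error) changes, and Welch's test also uses different degrees of freedom, which this sketch does not compute.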

General MCS strategy

In general, the following workflow can be useful when preparing a simulation:

  1. Define the data-generating process, usually starting from the model equation
  2. Identify the parameters to fix (e.g., the mean of group 1)
  3. Find the R functions that generate data given 1 and 2
  4. Repeat the simulation many times
  5. Check that the simulated parameters are recovered
  6. Compute the metrics of interest for the simulation (e.g., power, type-1 error rate)
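
The six steps can be sketched as a small power simulation for a two-group t-test with equal variances. This is one possible layout, not a prescribed template: the helper `sim_once()`, the parameter values and the seed are all hypothetical.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Steps 1-2: data-generating process and fixed parameters (hypothetical values)
n <- 30        # sample size per group
d <- 0.5       # true mean difference, in standard deviation units
nsim <- 2000   # number of replications
alpha <- 0.05  # significance level

# Step 3: generate and analyze one dataset (hypothetical helper)
sim_once <- function(n, d) {
  g0 <- rnorm(n, 0, 1)
  g1 <- rnorm(n, d, 1)
  fit <- t.test(g1, g0, var.equal = TRUE)
  c(est = unname(fit$estimate[1] - fit$estimate[2]), p = fit$p.value)
}

# Step 4: repeat the simulation; the result is a 2 x nsim matrix
res <- replicate(nsim, sim_once(n, d))

# Step 5: the average estimate should be close to the true difference d
mean(res["est", ])

# Step 6: power is the proportion of significant results
power <- mean(res["p", ] <= alpha)
power

# Analytical power for the same scenario, as a sanity check
power_ref <- power.t.test(n = n, delta = d, sd = 1, sig.level = alpha)$power
power_ref
```

For a simple design like this an analytical solution exists, so the simulated power can be validated against `power.t.test()`. The same structure carries over to models where no closed-form solution is available.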