Filippo Gambarota
Gianmarco Altoè
University of Padova
08 February 2024
We are learning the t-test, and we read that if the two samples come from populations with the same variance we can use the regular t-test; otherwise we should use the so-called Welch t-test.
Without looking at the formulas, let's simply simulate a t-test where we know that the two populations have different variances, and let's also use different sample sizes for the two groups:
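nsim <- 1e4     # number of simulated experiments
n0 <- 30        # sample size of group 0
n1 <- 20        # sample size of group 1
m0 <- 0         # population mean of group 0
m1 <- 0         # population mean of group 1 (H0 is true: m0 == m1)
sratio <- 3     # SD of group 1 (group 0 has SD = 1)
equal_t <- vector(mode = "list", length = nsim)
unequal_t <- vector(mode = "list", length = nsim)
for(i in 1:nsim){
  g0 <- rnorm(n0, m0, 1)
  g1 <- rnorm(n1, m1, sratio)
  equal_t[[i]] <- t.test(g0, g1, var.equal = TRUE)    # standard (pooled) t-test
  unequal_t[[i]] <- t.test(g0, g1, var.equal = FALSE) # Welch t-test
}
# type-1 error rate: proportion of p-values below alpha = 0.05
equal_p <- sapply(equal_t, function(x) x$p.value)
unequal_p <- sapply(unequal_t, function(x) x$p.value)
mean(equal_p <= 0.05)
#> [1] 0.0975
mean(unequal_p <= 0.05)
#> [1] 0.0476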
The type-1 error rate of the standard t-test (0.0975) is about twice the nominal α = 0.05, while the Welch t-test (0.0476) stays close to it.
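To check that the gap between 0.0975 and 0.05 is not just simulation noise, one option (not part of the original analysis) is the Monte Carlo standard error of a rejection rate, sqrt(α(1 − α) / nsim). A minimal sketch that plugs in the two rates printed above:

```r
alpha <- 0.05
nsim <- 1e4
# Monte Carlo standard error of a rejection rate when the true rate is alpha
mcse <- sqrt(alpha * (1 - alpha) / nsim)
# type-1 error rates reported by the simulation above
rates <- c(standard = 0.0975, welch = 0.0476)
# distance of each rate from the nominal alpha, in Monte Carlo standard errors
z <- (rates - alpha) / mcse
round(z, 1)
```

The standard t-test lands about 22 Monte Carlo standard errors above 0.05, while Welch's test is about 1 below it, which suggests the inflation is a real property of the test under these settings rather than noise.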
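Let's take a closer look at the simulation results, and we find the answer: the standard error of the standard t-test is systematically smaller than the Welch standard error. The pooled variance weights each group's variance by its degrees of freedom, so the larger group (n0 = 30, SD = 1) dominates and the larger variance of the smaller group (n1 = 20, SD = 3) is underestimated. A smaller standard error inflates the t values, which produces too many small p-values and therefore an inflated type-1 error rate.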
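One way to see this directly is to store the standard error of each test instead of the whole test object. A minimal sketch, assuming a recent R version in which the object returned by `t.test()` contains a `stderr` element (the denominator of the t statistic); the seed and the smaller number of iterations are arbitrary choices:

```r
set.seed(2024)  # arbitrary seed
nsim <- 1000    # fewer iterations than above, to keep the sketch fast
n0 <- 30
n1 <- 20
sratio <- 3
# one row per simulated experiment, one column per test
se <- matrix(NA_real_, nrow = nsim, ncol = 2,
             dimnames = list(NULL, c("standard", "welch")))
for (i in 1:nsim) {
  g0 <- rnorm(n0, 0, 1)
  g1 <- rnorm(n1, 0, sratio)
  # stderr is the denominator of the t statistic for each test
  se[i, "standard"] <- t.test(g0, g1, var.equal = TRUE)$stderr
  se[i, "welch"] <- t.test(g0, g1, var.equal = FALSE)$stderr
}
colMeans(se)                            # average SE of each test
mean(se[, "standard"] < se[, "welch"])  # share of runs where the pooled SE is smaller
```

With these settings the pooled standard error should be the smaller one in nearly every run. Giving the larger sample to the group with the larger variance should reverse the direction, making the standard t-test conservative instead.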
Standard t-test
\[t = \frac{\bar{X_1} - \bar{X_2}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\] \[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]
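As a sanity check on the formula, the pooled t statistic can be computed by hand and compared with `t.test(var.equal = TRUE)`. A minimal sketch on one simulated pair of samples with the same settings as above (the seed is arbitrary); the degrees of freedom n1 + n2 − 2 and the two-sided p-value are standard results not written in the formulas:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
x1 <- rnorm(30, mean = 0, sd = 1)
x2 <- rnorm(20, mean = 0, sd = 3)
n1 <- length(x1)
n2 <- length(x2)
# pooled SD: each group's variance weighted by its degrees of freedom
sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
t_pooled <- (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_pooled <- 2 * pt(-abs(t_pooled), df = df_pooled)  # two-sided p-value
fit <- t.test(x1, x2, var.equal = TRUE)
all.equal(c(t_pooled, df_pooled, p_pooled),
          unname(c(fit$statistic, fit$parameter, fit$p.value)))
```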
Welch’s t-test
\[t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{SE^2_{\bar X_1} + SE^2_{\bar X_2}}}\] \[SE_{\bar X_i} = \frac{s_i}{\sqrt{n_i}}\]
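The same check works for Welch's statistic. The sketch below also computes the Welch-Satterthwaite degrees of freedom, which are not shown in the formulas above but are what `t.test(var.equal = FALSE)` uses to obtain the p-value; the seed and sample settings are arbitrary choices mirroring the simulation:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
x1 <- rnorm(30, mean = 0, sd = 1)
x2 <- rnorm(20, mean = 0, sd = 3)
n1 <- length(x1)
n2 <- length(x2)
# squared standard error of each group mean: s_i^2 / n_i
se2_1 <- var(x1) / n1
se2_2 <- var(x2) / n2
t_welch <- (mean(x1) - mean(x2)) / sqrt(se2_1 + se2_2)
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)  # two-sided p-value
fit <- t.test(x1, x2, var.equal = FALSE)
all.equal(c(t_welch, df_welch, p_welch),
          unname(c(fit$statistic, fit$parameter, fit$p.value)))
```

Unlike the pooled version, the degrees of freedom here are usually not an integer, and they shrink toward n − 1 of the noisier group when one variance dominates.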
In general, the following workflow can be useful when preparing a simulation: