Poisson GLM

Filippo Gambarota

University of Padova

2023

Last Update: 2023-11-29

Poisson GLM

Everything that we discussed for the binomial GLM is also relevant for the Poisson GLM. We will focus on the specifics of the Poisson model, in particular:

  • Poisson distribution and link function
  • Parameter interpretation
  • Overdispersion: causes, consequences, and remedies

Poisson distribution

The Poisson distribution is defined as:

\[ p(y; \lambda) = \frac{\lambda^y e^{-\lambda}}{y!} \] where both the mean and the variance are equal to \(\lambda\).
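
As a quick check (a minimal simulation sketch, not in the original slides), the mean and variance of Poisson draws both approximate \(\lambda\):

set.seed(123)
y <- rpois(1e5, lambda = 5)
c(mean = mean(y), var = var(y)) # both close to lambda = 5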

Poisson distribution

As the mean increases, the variance also increases and the distribution becomes approximately normal:
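
The original slide shows this with a figure; a minimal base-R sketch to visualize it:

# Poisson pmfs for increasing lambda: the shape becomes more symmetric (normal-like)
lambdas <- c(1, 5, 10, 30)
par(mfrow = c(2, 2))
for (l in lambdas) {
  x <- 0:ceiling(l + 4 * sqrt(l))
  plot(x, dpois(x, lambda = l), type = "h",
       main = bquote(lambda == .(l)), xlab = "y", ylab = "p(y)")
}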


Parameter interpretation

An example…

Let's start with a simple example: explaining the number of solved math exercises by students (N = 100) as a function of the number of hours of study.

 id  studyh  solved
  1      32      19
  2      38      20
  3      67      41
  4      10      12
...     ...     ...
 97      81      57
 98      87      66
 99      25       8
100      22      16

An example…

  • There is a clear positive, non-linear relationship
  • Both the mean and the variance increase as a function of the predictor

Model fitting

We can fit the model using the glm function in R, setting the appropriate random component (family) and link function (link):

fit <- glm(solved ~ studyh, family = poisson(link = "log"), data = dat)

summary(fit)
#> 
#> Call:
#> glm(formula = solved ~ studyh, family = poisson(link = "log"), 
#>     data = dat)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.2596  -0.7741  -0.1367   0.7117   2.6079  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 2.2830811  0.0505563   45.16   <2e-16 ***
#> studyh      0.0200652  0.0007048   28.47   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 1011.993  on 99  degrees of freedom
#> Residual deviance:   94.958  on 98  degrees of freedom
#> AIC: 616.12
#> 
#> Number of Fisher Scoring iterations: 4

Parameter interpretation

  • The (Intercept) \(2.283\) is the log of the expected number of solved exercises for a student with 0 hours of study. Taking the exponential we obtain the estimate on the response scale, \(9.807\)
  • The studyh coefficient \(0.020\) is the increase in the expected log number of solved exercises for a one-unit increase in hours of study. Taking the exponential we obtain the multiplicative increase (ratio) in the number of solved exercises, \(1.020\)
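Both quantities can be obtained directly from the fitted model:

b <- coef(fit)
exp(b["(Intercept)"]) # expected solved exercises with 0 hours of study (~9.807)
exp(b["studyh"])      # multiplicative increase per extra hour of study (~1.020)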

Parameter interpretation

Again, as in the binomial model, the effects are linear on the log scale but non-linear on the response scale.

Parameter interpretation

The non-linearity can be easily seen using the predict() function:

linear <- predict(fit, newdata = data.frame(studyh = c(10, 11)))
diff(linear) # same as beta1
#>          2 
#> 0.02006518
nonlinear <- exp(linear) # or predict(..., type = "response")
diff(nonlinear)
#>         2 
#> 0.2429288
# ratio of increase when using the response scale
nonlinear[2]/nonlinear[1]
#>        2 
#> 1.020268
# equivalent to exp(beta1)
exp(coef(fit)[2])
#>   studyh 
#> 1.020268

Parameter interpretation - Categorical variable

Let's make a similar example, comparing the number of solved exercises between students who attended classes online and students who attended in person. The class_c variable is the dummy-coded version of class (0 = online, 1 = inperson).

 id     class  class_c  solved
  1    online        0       8
  2  inperson        1      18
  3    online        0      12
  4  inperson        1      12
...       ...      ...     ...
 97    online        0      10
 98  inperson        1      21
 99    online        0       8
100  inperson        1      17

Parameter interpretation - Categorical variable

By default, R codes categorical variables using dummy coding. In this case the reference category is online.
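
For instance, the reference level can be set explicitly (a one-line sketch, assuming class is stored as a factor):

dat$class <- relevel(factor(dat$class), ref = "online") # "online" becomes the reference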

fit <- glm(solved ~ class, family = poisson(link = "log"), data = dat)
summary(fit)
#> 
#> Call:
#> glm(formula = solved ~ class, family = poisson(link = "log"), 
#>     data = dat)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -2.38818  -0.77486  -0.07274   0.74539   2.81362  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)    2.22138    0.04657  47.695   <2e-16 ***
#> classinperson  0.48801    0.05917   8.248   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 173.39  on 99  degrees of freedom
#> Residual deviance: 103.33  on 98  degrees of freedom
#> AIC: 534.41
#> 
#> Number of Fisher Scoring iterations: 4

Parameter interpretation - Categorical variable

  • Similarly to the previous example, the intercept is the (log) expected number of solved exercises when class_c is 0, thus for online students.
  • The classinperson coefficient is the difference in log solved exercises between in-person and online classes. On the response scale it is the ratio of the expected counts: students attending in person solve 162.91% of the exercises solved by online students.
c(coef(fit)["(Intercept)"], exp(coef(fit)["(Intercept)"]))
#> (Intercept) (Intercept) 
#>    2.221375    9.220000
c(coef(fit)["classinperson"], exp(coef(fit)["classinperson"]))
#> classinperson classinperson 
#>     0.4880076     1.6290672

Overdispersion

Overdispersion means observing greater variance in the data than expected under the model.

The overdispersion \(\phi\) can be estimated using Pearson Residuals:

\[\begin{align*} \hat \phi = \frac{\sum_{i = 1}^n \frac{(y_i - \hat y_i)^2}{\hat y_i}}{n - p - 1} \end{align*}\]

Where the numerator is the sum of squared Pearson residuals, \(n\) is the number of observations and \(p\) the number of predictors. For standard binomial and Poisson models \(\phi = 1\).

Overdispersion

If the model is correctly specified, for binomial and Poisson models the ratio is equal to 1; if the ratio is \(> 1\) there is evidence of overdispersion. In practical terms, if the residual deviance is markedly higher than the residual degrees of freedom, there is evidence of overdispersion.

P <- sum(residuals(fit, type = "pearson")^2)
P / df.residual(fit) # i.e., nrow(dat) - length(coef(fit))
#> [1] 1.036323

Testing overdispersion

To formally test for overdispersion, i.e. to test whether the ratio is significantly different from 1, we can use the performance::check_overdispersion() function.

performance::check_overdispersion(fit)
#> # Overdispersion test
#> 
#>        dispersion ratio =   1.036
#>   Pearson's Chi-Squared = 101.560
#>                 p-value =   0.383

Overdispersion plot

Pearson residuals are defined as:

\[\begin{align*} r_p = \frac{y_i - \hat y_i}{\sqrt{V(\hat y_i)}} \\ V(\hat y_i) = \hat y_i \end{align*}\]

Remember that the mean and the variance are the same in Poisson models. If the model is correct, the standardized residuals should be approximately normally distributed with mean 0 and variance 1.
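
A quick manual check of the definition (a sketch, assuming fit and dat from the class example above):

yhat <- fitted(fit)
r_manual <- (dat$solved - yhat) / sqrt(yhat)               # (y - yhat) / sqrt(V(yhat))
all.equal(unname(r_manual),
          unname(residuals(fit, type = "pearson")))        # TRUE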

Variance-mean relationship

Overdispersion can also be expressed in terms of the variance-mean ratio (VMR): when the ratio is 1, there is no evidence of overdispersion.
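
An empirical check of the VMR within each group (a sketch, assuming the dat from the categorical example):

# empirical variance-mean ratio within each class
tapply(dat$solved, dat$class, function(x) var(x) / mean(x))
# values close to 1 are consistent with the Poisson assumption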

Causes of overdispersion

There could be multiple causes for overdispersion:

  • the phenomenon itself cannot be modelled with a Poisson distribution
  • outliers or anomalous observations that increase the observed variance
  • missing important variables in the model

Outliers or anomalous data

This (simulated) dataset contains \(n = 30\) observations coming from a Poisson model of the form \(y = 1 + 2x\) and \(n = 7\) observations coming from a model \(y = 1 + 10x\).

Outliers or anomalous data

Clearly the sum of squared Pearson residuals is inflated by these values, producing more variance than expected.

c(mean = mean(datout$y), var = var(datout$y)) # mean and variance should be similar
#>     mean      var 
#> 2.756757 6.689189
performance::check_overdispersion(fit)
#> # Overdispersion test
#> 
#>        dispersion ratio =  1.515
#>   Pearson's Chi-Squared = 53.019
#>                 p-value =  0.026

Missing important variables in the model

Let's analyze again a dataset with the number of solved exercises. There is an effect of the studyh variable and, in addition, an effect of the class variable, without interaction.

 id     class  studyh  class_c      lp  solved
  1    online      82        0  22.613      24
  2  inperson      65        1  47.734      50
  3    online      12        0  11.268      15
  4  inperson      54        1  42.785      50
...       ...     ...      ...     ...     ...
 37    online      85        0  23.298      35
 38  inperson      55        1  43.213      43
 39    online      80        0  22.167      26
 40  inperson      77        1  53.788      54

Missing important variables in the model

We can also have a look at the data:

Missing important variables in the model

Now let’s fit the model considering only studyh and ignoring the group:

fit <- glm(solved ~ studyh, data = dat, family = poisson(link = "log"))
summary(fit)
#> 
#> Call:
#> glm(formula = solved ~ studyh, family = poisson(link = "log"), 
#>     data = dat)
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#> -5.047  -2.247  -0.107   2.214   3.246  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 2.692791   0.068831   39.12   <2e-16 ***
#> studyh      0.013624   0.001063   12.82   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 417.99  on 39  degrees of freedom
#> Residual deviance: 243.99  on 38  degrees of freedom
#> AIC: 451.39
#> 
#> Number of Fisher Scoring iterations: 4

Missing important variables in the model

Essentially, we are fitting an average relationship across groups, but the model ignores that the two groups differ. The observed variance is thus higher than expected, because two separate means are needed to capture the class effect.

Missing important variables in the model

By fitting the appropriate model, the overdispersion is no longer a problem:

fit2 <- glm(solved ~ studyh + class, data = dat, family = poisson(link = "log"))
summary(fit2)
#> 
#> Call:
#> glm(formula = solved ~ studyh + class, family = poisson(link = "log"), 
#>     data = dat)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -1.93006  -0.48422   0.00417   0.43834   1.97814  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   2.275159   0.078239   29.08   <2e-16 ***
#> studyh        0.010895   0.001068   10.21   <2e-16 ***
#> classinperson 0.907365   0.065373   13.88   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 417.993  on 39  degrees of freedom
#> Residual deviance:  29.022  on 37  degrees of freedom
#> AIC: 238.41
#> 
#> Number of Fisher Scoring iterations: 4

Missing important variables in the model

performance::check_overdispersion(fit)
#> # Overdispersion test
#> 
#>        dispersion ratio =   6.115
#>   Pearson's Chi-Squared = 232.369
#>                 p-value = < 0.001
performance::check_overdispersion(fit2)
#> # Overdispersion test
#> 
#>        dispersion ratio =  0.777
#>   Pearson's Chi-Squared = 28.738
#>                 p-value =  0.832

Missing important variables in the model

Also the residual plots clearly improve after including all relevant predictors:

Why worry about overdispersion?

Before analyzing the two main strategies to deal with overdispersion, it is important to understand why it is very problematic.

  • Although the estimated parameters (\(\beta\)) are not affected by overdispersion, standard errors are underestimated. The underestimation increases as the overdispersion becomes more severe.

  • Underestimated standard errors produce (biased) higher \(z\) scores, inflating the type-1 error (i.e., lower p values)

Why worry about overdispersion?

The estimated (deviance-based) overdispersion of fit is ~6.42. The summary() function in R has a dispersion argument to check how model parameters are affected.

phi <- fit$deviance / fit$df.residual
summary(fit, dispersion = 1)$coefficients # the default
#>               Estimate  Std. Error  z value     Pr(>|z|)
#> (Intercept) 2.69279058 0.068830558 39.12202 0.000000e+00
#> studyh      0.01362422 0.001062724 12.82009 1.265575e-37
summary(fit, dispersion = 2)$coefficients # assuming phi = 2
#>               Estimate  Std. Error   z value      Pr(>|z|)
#> (Intercept) 2.69279058 0.097341109 27.663447 1.923115e-168
#> studyh      0.01362422 0.001502919  9.065171  1.244091e-19
summary(fit, dispersion = phi)$coefficients # the appropriate
#>               Estimate  Std. Error   z value     Pr(>|z|)
#> (Intercept) 2.69279058 0.174413070 15.439156 8.926080e-54
#> studyh      0.01362422 0.002692888  5.059333 4.207258e-07

Why worry about overdispersion?

By plugging different values of \(\phi\) into summary() we can see the impact on the standard errors; \(\phi = 1\) is what the standard Poisson model assumes.

Dealing with overdispersion

If all the relevant variables are included and there are no outliers, the phenomenon itself contains more variability than the Poisson predicts. There are two main approaches to deal with this situation:

  • the quasi-Poisson model
  • the Poisson-Gamma model, also known as the negative binomial model

Quasi-Poisson model

The quasi-Poisson model is essentially a Poisson model that estimates the \(\phi\) parameter and adjusts the standard errors accordingly. Again, assume we fit the studyh-only model (with overdispersion):

fit <- glm(solved ~ studyh, data = dat, family = quasipoisson(link = "log"))
summary(fit)
#> 
#> Call:
#> glm(formula = solved ~ studyh, family = quasipoisson(link = "log"), 
#>     data = dat)
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#> -5.047  -2.247  -0.107   2.214   3.246  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 2.692791   0.170209  15.821  < 2e-16 ***
#> studyh      0.013624   0.002628   5.184 7.46e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for quasipoisson family taken to be 6.115055)
#> 
#>     Null deviance: 417.99  on 39  degrees of freedom
#> Residual deviance: 243.99  on 38  degrees of freedom
#> AIC: NA
#> 
#> Number of Fisher Scoring iterations: 4

Quasi-Poisson model

The quasi-Poisson model estimates the same parameters (\(\beta\)) and adjusts the standard errors. Notice that using summary(dispersion = ) with the Pearson-based \(\hat\phi\) is the same as fitting a quasi-Poisson model, as verified below.
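
A quick check of this equivalence (a sketch, reusing the same dat; fit is the quasi-Poisson fit above):

# refit the plain Poisson model and plug in the Pearson-based dispersion
fit_p <- glm(solved ~ studyh, data = dat, family = poisson(link = "log"))
phi_pearson <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
summary(fit_p, dispersion = phi_pearson)$coefficients
# same standard errors as summary(fit)$coefficients from quasipoisson()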

The quasi-Poisson model is useful because it is a very simple way to deal with overdispersion.

The variance (\(V(\mu)\)) of the Poisson model is no longer \(\mu\) but \(V(\mu) = \mu\phi\). When \(\phi\) is close to 1, the quasi-Poisson model reduces to a standard Poisson model.

Quasi-Poisson model

Here we compare the expected variance (\(\pm 2\sqrt{V(y_i)}\)) of the quasi-Poisson and Poisson models:

Problems of Quasi-* model

The main problem of quasi-* models is that they do not correspond to a specific distribution family and there is no likelihood function. For this reason, we cannot perform model comparison with the standard AIC/BIC, as shown below. See (Ver Hoef & Boveng, 2007) for an overview.
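
For example, likelihood-based functions return NA for the quasi-Poisson fit above (matching the AIC: NA in its summary):

AIC(fit) # NA: no likelihood is defined for quasi-* families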

Negative-binomial model

A negative binomial model uses a different probability distribution with two parameters: the mean, as in the standard Poisson model, and a dispersion parameter that, similarly to the quasi-Poisson model, is estimated from the data.

Practically, the negative binomial model can be considered as a hierarchical model:

\[\begin{align*} y_i \sim Poisson(\lambda_i) \\ \lambda_i \sim Gamma(\mu, \frac{1}{\theta}) \end{align*}\]

In this way, the Gamma distribution regulates the variability of \(\lambda\), which otherwise is fixed to a single value in the Poisson model.
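
We can check this construction by simulation (a sketch; here the Gamma is parameterized with shape \(\theta\) and rate \(\theta/\mu\), so that \(E(\lambda) = \mu\) and \(V(\lambda) = \mu^2/\theta\)):

set.seed(123)
n <- 1e5; mu <- 10; theta <- 2
lambda <- rgamma(n, shape = theta, rate = theta / mu) # lambda varies across units
y <- rpois(n, lambda)                                 # Poisson draw given each lambda
c(mean = mean(y), var = var(y), expected_var = mu + mu^2 / theta)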

Negative-binomial model

The Poisson-Gamma mixture can be expressed as:

\[ p(y; \mu, \theta) = \frac{\Gamma(y + \theta)}{\Gamma(\theta)\Gamma(y + 1)}\left(\frac{\mu}{\mu + \theta}\right)^{y} \left(\frac{\theta}{\mu + \theta}\right)^\theta \]

Where \(\theta\) is the dispersion parameter and \(\Gamma()\) is the gamma function. The mean is \(\mu\) and the variance is \(\mu + \frac{\mu^2}{\theta}\). \(\theta\) is inversely related to overdispersion: as \(1/\theta\) increases the data are more overdispersed, and as \(1/\theta\) approaches 0 the negative binomial reduces to a Poisson model.
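
This pmf corresponds to R's dnbinom() with the mu/size parameterization, which we can check numerically (a sketch):

mu <- 5; theta <- 2; y <- 0:3
manual <- gamma(y + theta) / (gamma(theta) * gamma(y + 1)) *
  (mu / (mu + theta))^y * (theta / (mu + theta))^theta
rbind(manual = manual, dnbinom = dnbinom(y, size = theta, mu = mu)) # identical rows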

Negative-binomial model

Compared to the Poisson model, the negative binomial allows for overdispersion by estimating the parameter \(\theta\); compared to the quasi-Poisson model, the variance is not a linear function of the mean (\(V(\mu) = \phi\mu\)) but has a quadratic relationship, \(V(\mu) = \mu + \mu^2/\theta\)

Negative-binomial model

We can use the MASS package to sample from the negative binomial distribution with MASS::rnegbin(), as in the sketch below:
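
For example (a sketch), the empirical variance of rnegbin() draws matches \(\mu + \mu^2/\theta\), approaching the Poisson variance \(\mu\) as \(\theta\) grows:

set.seed(123)
mu <- 10
sapply(c(1, 5, 50), function(theta) {
  y <- MASS::rnegbin(1e5, mu = mu, theta = theta)
  c(theta = theta, var = var(y), expected = mu + mu^2 / theta)
})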

Negative-binomial model

The \(\theta\) parameter is the estimated dispersion. Note that it is not the same as \(\phi\) in the quasi-Poisson model. As \(\theta\) increases, the overdispersion is reduced and the model approaches a standard Poisson model.

Negative-binomial model

To fit a negative binomial model we cannot use the glm function; we need the MASS::glm.nb() function. The syntax is almost the same, but we do not need to specify the family because this function only fits negative binomial models. Let's simulate some data coming from a negative binomial distribution with \(\theta = 10\):

Code
theta <- 10
n <- 100
b0 <- 10
b1 <- 5
dat <- sim_design(n, nx = list(x = runif(n))) # sim_design() is a helper from these materials
dat$lp <- with(dat, exp(log(b0) + log(b1) * x)) # expected counts on the response scale
dat$y <- with(dat, MASS::rnegbin(n, lp, theta)) # negative binomial draws around lp
 id      x   y
  1  0.427  11
  2  0.365  13
...    ...  ..
 99  0.966  41
100  0.154  12

Negative-binomial model

Then we can fit the model:

fit_nb <- MASS::glm.nb(y ~ x, data = dat)
summary(fit_nb)
#> 
#> Call:
#> MASS::glm.nb(formula = y ~ x, data = dat, init.theta = 11.32327809, 
#>     link = log)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.7104  -0.7507  -0.1075   0.6324   2.4510  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.17693    0.08282   26.28   <2e-16 ***
#> x            1.82119    0.13811   13.19   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(11.3233) family taken to be 1)
#> 
#>     Null deviance: 273.823  on 99  degrees of freedom
#> Residual deviance:  98.035  on 98  degrees of freedom
#> AIC: 696.14
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  11.32 
#>           Std. Err.:  2.38 
#> 
#>  2 x log-likelihood:  -690.143

Negative-binomial model

Here we compare the expected variance (\(\pm 2\sqrt{V(y_i)}\)) of the Negative binomial and Poisson models:

Negative-binomial model

Here we compare the expected variance (\(\pm 2\sqrt{V(y_i)}\)) of the three models:

Negative-binomial model

We can compare the coefficients fitting the Poisson, the Quasi-Poisson and the Negative-binomial model:

fit_p <- glm(y ~ x, data = dat, family = poisson(link = "log"))
fit_qp <- glm(y ~ x, data = dat, family = quasipoisson(link = "log"))
car::compareCoefs(fit_nb, fit_p, fit_qp)
#> Calls:
#> 1: MASS::glm.nb(formula = y ~ x, data = dat, init.theta = 11.32327809, link 
#>   = log)
#> 2: glm(formula = y ~ x, family = poisson(link = "log"), data = dat)
#> 3: glm(formula = y ~ x, family = quasipoisson(link = "log"), data = dat)
#> 
#>             Model 1 Model 2 Model 3
#> (Intercept)  2.1769  2.2043  2.2043
#> SE           0.0828  0.0534  0.0970
#>                                    
#> x            1.8212  1.7723  1.7723
#> SE           0.1381  0.0802  0.1456
#> 

Negative-binomial model (NB) vs Quasi-poisson (QP)

  • The NB has a likelihood function, thus AIC, LRT and other likelihood-based metrics work, unlike with the QP
  • The NB assumes a different mean-variance relationship, thus its estimated coefficients can differ from the Poisson (P) model, whereas the QP produces the same estimates as P
  • Both NB and QP estimate larger standard errors in the presence of overdispersion
  • If there is evidence of overdispersion, the important thing is to fit a model that takes it into account

Simulate NB data #extra

If you want to simulate NB data you can use the MASS::rnegbin(n, mu, theta) function or the rnb(n, mu, vmr) custom function, which could be more intuitive because it requires \(\mu\) (i.e., the mean) and vmr, the desired variance-mean ratio. Using message = TRUE it will print the \(\theta\) value:

y <- rnb(1e5, mu = 10, vmr = 7, message = TRUE)
#> y ~ NegBin(mu = 10, theta = 1.67), var = 70.00, vmr = 7.00
nprint(mu = mean(y), v = var(y), vmr = var(y) / mean(y))
#>     mu      v    vmr 
#>  9.958 69.569  6.986
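
rnb() is a custom helper of these materials; a minimal sketch of how it could be implemented, using \(\text{VMR} = 1 + \mu/\theta\), i.e. \(\theta = \mu/(\text{VMR} - 1)\) (this reproduces the \(\theta = 1.67\) printed above for \(\mu = 10\), vmr = 7):

rnb <- function(n, mu, vmr = NULL, theta = NULL, message = FALSE) {
  if (is.null(theta)) theta <- mu / (vmr - 1) # convert the variance-mean ratio to theta
  if (message) message(sprintf("y ~ NegBin(mu = %s, theta = %.2f)", mu, theta))
  MASS::rnegbin(n, mu = mu, theta = theta)
}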

Simulate NB data #extra

You can also use the theta parameter within the rnb function. As \(\theta\) increases the overdispersion is reduced; thus you can think of \(1/\theta\) as an index of overdispersion.

y <- rnb(1e5, mu = 10, theta = 10, message = TRUE)
y <- rnb(1e5, mu = 10, theta = 5, message = TRUE)
y <- rnb(1e5, mu = 10, theta = 1, message = TRUE)

Deviance based pseudo-\(R^2\)

The deviance-based pseudo-\(R^2\) is computed from the ratio between the residual deviance and the null deviance:

\[ R^2 = 1 - \frac{D_{current}}{D_{null}} \]

1 - fit_p$deviance/fit_p$null.deviance
#> [1] 0.6229535

McFadden’s pseudo-\(R^2\)

McFadden's pseudo-\(R^2\) is computed from the ratio between the log-likelihood of the current model and the log-likelihood of the intercept-only (i.e., null) model (McFadden, 1987):

\[ R^2 = 1 - \frac{\log(\mathcal{L_{current}})}{\log(\mathcal{L_{null}})} \]

There is also an adjusted version that takes into account the number of parameters of the model. In R it can be computed manually (see the sketch below) or using performance::r2_mcfadden():

performance::r2_mcfadden(fit_p)
#> # R2 for Generalized Linear Regression
#>        R2: 0.396
#>   adj. R2: 0.394
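
The manual computation (a sketch, assuming fit_p is the Poisson fit used above):

fit_null <- update(fit_p, . ~ 1) # intercept-only model
1 - as.numeric(logLik(fit_p)) / as.numeric(logLik(fit_null))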

Nagelkerke’s pseudo-\(R^2\)

The Cox and Snell’s (Cox & Snell, 1989) \(R^2\) is defined as:

\[ R^2 = 1 - \left(\frac{\mathcal{L_{null}}}{\mathcal{L_{current}}}\right)^{\frac{2}{n}} \]

Nagelkerke (Nagelkerke, 1991) provides a correction that rescales it to range between 0 and 1; both versions are computed manually in the sketch below.

performance::r2_nagelkerke(fit_p)
#> Nagelkerke's R2 
#>       0.9950421
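
A manual computation of both versions (a sketch, again assuming fit_p):

n <- nobs(fit_p)
ll_null <- as.numeric(logLik(update(fit_p, . ~ 1)))
ll_cur  <- as.numeric(logLik(fit_p))
r2_cs <- 1 - exp((2 / n) * (ll_null - ll_cur)) # Cox & Snell
r2_cs / (1 - exp((2 / n) * ll_null))           # Nagelkerke: rescaled to [0, 1]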

Extra

Interaction, Poisson vs Linear

Let’s simulate a GLM where there is a main effect but no interaction:

Code
dat <- sim_design(200, nx = list(x = rnorm(200)), cx = list(g = c("a", "b"))) # helper from these materials
dat <- sim_data(dat, exp(log(10) + log(1.5) * x + log(2) * g_c + 0 * g_c * x), model = "poisson") # interaction coefficient set to 0
dat |> 
  ggplot(aes(x = x, y = y, color = g)) +
  geom_point() +
  stat_smooth(method = "glm",
              method.args = list(family = poisson()))

Interaction, Poisson vs Linear

Let’s fit a linear model and the Poisson GLM with the interaction term:

fit_lm <- lm(y ~ x * g, data = dat)
summary(fit_lm)
#> 
#> Call:
#> lm(formula = y ~ x * g, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10.9594  -2.7974  -0.3394   2.4466  14.8487 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  11.0610     0.2941  37.603   <2e-16 ***
#> x             3.9829     0.2979  13.371   <2e-16 ***
#> g2            9.9577     0.4163  23.921   <2e-16 ***
#> x:g2          4.0287     0.4347   9.269   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 4.116 on 396 degrees of freedom
#> Multiple R-squared:  0.7717, Adjusted R-squared:   0.77 
#> F-statistic: 446.3 on 3 and 396 DF,  p-value: < 2.2e-16
fit_glm <- glm(y ~ x * g, data = dat, family = poisson(link = "log"))
summary(fit_glm)
#> 
#> Call:
#> glm(formula = y ~ x * g, family = poisson(link = "log"), data = dat)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -2.61194  -0.71125  -0.05532   0.55622   2.75160  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.33573    0.02238 104.350   <2e-16 ***
#> x            0.38286    0.02254  16.986   <2e-16 ***
#> g2           0.64219    0.02765  23.226   <2e-16 ***
#> x:g2         0.02549    0.02850   0.894    0.371    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 1816.48  on 399  degrees of freedom
#> Residual deviance:  385.92  on 396  degrees of freedom
#> AIC: 2157.8
#> 
#> Number of Fisher Scoring iterations: 4

Interaction, Poisson vs Linear

The linear model estimates a spurious, completely misleading interaction, induced by the mean-variance relationship of the Poisson distribution.

Code
lp_plot <- dat |> 
  select(id, g, x) |> 
  add_predict(fit_glm, se.fit = TRUE) |> 
  ggplot(aes(x = x, 
             y = fit, 
             fill = g,
             ymin = fit - se.fit * 2, 
             ymax = fit + se.fit * 2)) +
  geom_line() +
  geom_ribbon(alpha = 0.5) +
  ylab("y") +
  ggtitle("Linear Predictor")
lp_plot | plot(ggeffect(fit_glm, terms = c("x", "g"))) | plot(ggeffect(fit_lm, terms = c("x", "g")))

References

Cox, D. R., & Snell, E. J. (1989). Analysis of binary data, second edition. CRC Press. https://play.google.com/store/books/details?id=0R8J71LCLXsC
McFadden, D. (1987). Regression-based specification tests for the multinomial logit model. Journal of Econometrics, 34, 63–82. https://doi.org/10.1016/0304-4076(87)90067-4
Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692. https://doi.org/10.1093/biomet/78.3.691
Ver Hoef, J. M., & Boveng, P. L. (2007). Quasi-poisson vs. Negative binomial regression: How should we model overdispersed count data? Ecology, 88, 2766–2772. https://doi.org/10.1890/07-0043.1