The Hidden Trap of Fixed and Random Effects


What Are Random Effects and Fixed Effects?

When designing a study, we often aim to isolate the independent variables of interest from those of no interest so we can observe their true effects on the dependent variables. For instance, suppose we want to examine the effect of using GitHub Copilot on developer productivity. One approach is to measure how much time developers spend using Copilot and how quickly they complete coding tasks. At first glance, we may observe a strong positive correlation: more Copilot usage, faster task completion.

However, other factors can also influence how quickly developers finish their work. For instance, Company A may have faster CI/CD pipelines or deal with smaller and simpler tasks, while Company B may require lengthy code reviews or handle more complex and time-consuming tasks. If we don't account for these organizational differences, we might mistakenly conclude that Copilot is less effective for developers in Company B, even though it's the environment, not Copilot, that really slows them down.

These sorts of group-level variations — differences across teams, companies, or projects — are often called "fixed effects" or "random effects."

Fixed effects are variables of interest, where each group is treated individually using one-hot (dummy) coding. This way, since the within-group variability is captured neatly inside each dummy variable, we are assuming the variance of each group is similar, or homoscedastic.

\[ y_i = \beta_0 + \beta_1 x_i + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \cdots + \varepsilon_i \]

where D₁ᵢ, D₂ᵢ, … are dummy variables indicating whether observation i belongs to group 1, 2, …, and γ₁, γ₂, … are the fixed-effect coefficients for the corresponding groups.
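
As a minimal sketch in R (with hypothetical column names y, x, and group), fixed group effects can be fit by letting the model expand a factor into dummy variables:

# Minimal sketch: fixed group effects via dummy (one-hot) coding.
# 'y', 'x', and 'group' are hypothetical column names.
df$group <- factor(df$group)               # each level becomes a dummy variable
fit_fixed <- lm(y ~ x + group, data = df)  # R drops one reference level
summary(fit_fixed)                         # groupB, groupC, ... are the gamma coefficients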

Random effects, on the other hand, are typically not variables of interest. We assume each group is part of a broader population, and each group effect lies somewhere within the probability distribution of that population. As such, the variance across groups is heterogeneous.

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij} \]

where uⱼ is the random effect of group j, the group to which observation i belongs, drawn from a distribution, typically a normal distribution 𝒩(0, σ²ᵤ).
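
Such a random intercept model can be fit in R with the lme4 package; here is a minimal sketch under the same hypothetical column names:

library(lme4)

# Minimal sketch: a random intercept u_j for each level of 'group'.
# Instead of one coefficient per group, lme4 estimates a single
# variance sigma^2_u for the distribution of group intercepts.
fit_random <- lmer(y ~ x + (1 | group), data = df)
summary(fit_random)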

Rethink Fixed and Random Effects Carefully

However, these effects can mislead your analysis if you simply insert them into your model without thinking carefully about what kinds of variation they are actually capturing.

I recently worked on a project analyzing the energy consumption of AI models, in which I studied how certain architectural features (number of parameters, amount of compute, dataset size, and training time) and hardware choices (hardware type, quantity of hardware) of AI models affect energy use during training. I found that Training_time, Hardware_quantity, and Hardware_type significantly affected the energy usage. The relationship can be roughly modeled as:

\[ \text{energy} = \text{Training\_time} + \text{Hardware\_quantity} + \text{Hardware\_type} \]

Since I assumed there might be differences between organizations, for instance, in coding style, code structure, or algorithm preferences, I thought that including Organization as a random effect would help account for all of these unobserved potential differences. To test my assumption, I compared the results of two models, with and without Organization, to see which one is a better fit. In both models, the dependent variable Energy was extremely right-skewed, so I applied a log transformation to stabilize its variance. I used Generalized Linear Models (GLM) here because the distribution of my data was not normal.
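
For reference, the transformation step might look like the following sketch (Energy is the raw dependent variable named above):

# Sketch of the preprocessing: the raw response is heavily right-skewed,
# so model its natural log instead.
df$log_Energy <- log(df$Energy)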

# Baseline model: gaussian GLM (identity link) without any group effects
glm <- glm(
  log_Energy ~ Training_time_hour + 
               Hardware_quantity + 
               Training_hardware,
               data = df)
summary(glm)

library(lme4)  # provides lmer() for linear mixed models

# Same fixed effects plus a random intercept for each organization.
# The response is gaussian, so lmer() is the right call here
# (glmer() without a family argument just falls back to lmer()).
glm_random_effects <- lmer(
  log_Energy ~ Training_time_hour + 
               Hardware_quantity + 
               Training_hardware + 
               (1 | Organization),  # Random effects
               data = df)
summary(glm_random_effects)
AIC(glm_random_effects)

The GLM model without Organization produced an AIC of 312.55, with Training_time, Hardware_quantity, and certain types of Hardware being statistically significant.

> summary(glm)

Call:
glm(formula = log_Energy ~ Training_time_hour + Hardware_quantity + 
    Training_hardware, data = df)

Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                                     7.134e+00  1.393e+00   5.123 5.07e-06 ***
Training_time_hour                              1.509e-03  2.548e-04   5.922 3.08e-07 ***
Hardware_quantity                               3.674e-04  9.957e-05   3.690 0.000563 ***
Training_hardwareGoogle TPU v3                  1.887e+00  1.508e+00   1.251 0.216956    
Training_hardwareGoogle TPU v4                  3.270e+00  1.591e+00   2.055 0.045247 *  
Training_hardwareHuawei Ascend 910              2.702e+00  2.485e+00   1.087 0.282287    
Training_hardwareNVIDIA A100                    2.528e+00  1.511e+00   1.674 0.100562    
Training_hardwareNVIDIA A100 SXM4 40 GB         3.103e+00  1.750e+00   1.773 0.082409 .  
Training_hardwareNVIDIA A100 SXM4 80 GB         3.866e+00  1.745e+00   2.216 0.031366 *  
Training_hardwareNVIDIA GeForce GTX 285        -4.077e+00  2.412e+00  -1.690 0.097336 .  
Training_hardwareNVIDIA GeForce GTX TITAN X    -9.706e-01  1.969e+00  -0.493 0.624318    
Training_hardwareNVIDIA GTX Titan Black        -8.423e-01  2.415e+00  -0.349 0.728781    
Training_hardwareNVIDIA H100 SXM5 80GB          3.600e+00  1.864e+00   1.931 0.059248 .  
Training_hardwareNVIDIA P100                   -1.663e+00  1.899e+00  -0.876 0.385436    
Training_hardwareNVIDIA Quadro P600            -1.970e+00  2.419e+00  -0.814 0.419398    
Training_hardwareNVIDIA Quadro RTX 4000        -1.367e+00  2.424e+00  -0.564 0.575293    
Training_hardwareNVIDIA Quadro RTX 5000        -2.309e+00  2.418e+00  -0.955 0.344354    
Training_hardwareNVIDIA Tesla K80               1.761e+00  1.988e+00   0.886 0.380116    
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.415e+00  1.833e+00   1.863 0.068501 .  
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.698e+00  2.413e+00   1.532 0.131852    
Training_hardwareNVIDIA V100                   -3.638e-01  1.582e+00  -0.230 0.819087    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 3.877685)

    Null deviance: 901.45  on 69  degrees of freedom
Residual deviance: 190.01  on 49  degrees of freedom
AIC: 312.55

Number of Fisher Scoring iterations: 2

On the other hand, the model with Organization produced an AIC of 300.38, much lower than the previous model, indicating a better fit. However, when taking a closer look, I noticed a big issue: the statistical significance of the other variables had gone away, as if Organization took away the importance from them!

> summary(glm_random_effects)
Linear mixed model fit by REML ['lmerMod']
Formula: log_Energy ~ Training_time_hour + Hardware_quantity + Training_hardware +  
    (1 | Organization)
   Data: df

REML criterion at convergence: 254.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.65549 -0.24100  0.01125  0.26555  1.51828 

Random effects:
 Groups       Name        Variance Std.Dev.
 Organization (Intercept) 3.775    1.943   
 Residual                 1.118    1.057   
Number of obs: 70, groups:  Organization, 44

Fixed effects:
                                                 Estimate Std. Error t value
(Intercept)                                     6.132e+00  1.170e+00   5.243
Training_time_hour                              1.354e-03  2.111e-04   6.411
Hardware_quantity                               3.477e-04  7.035e-05   4.942
Training_hardwareGoogle TPU v3                  2.949e+00  1.069e+00   2.758
Training_hardwareGoogle TPU v4                  2.863e+00  1.081e+00   2.648
Training_hardwareHuawei Ascend 910              4.086e+00  2.534e+00   1.613
Training_hardwareNVIDIA A100                    3.959e+00  1.299e+00   3.047
Training_hardwareNVIDIA A100 SXM4 40 GB         3.728e+00  1.551e+00   2.404
Training_hardwareNVIDIA A100 SXM4 80 GB         4.950e+00  1.478e+00   3.349
Training_hardwareNVIDIA GeForce GTX 285        -3.068e+00  2.502e+00  -1.226
Training_hardwareNVIDIA GeForce GTX TITAN X     4.503e-02  1.952e+00   0.023
Training_hardwareNVIDIA GTX Titan Black         2.375e-01  2.500e+00   0.095
Training_hardwareNVIDIA H100 SXM5 80GB          4.197e+00  1.552e+00   2.704
Training_hardwareNVIDIA P100                   -1.132e+00  1.512e+00  -0.749
Training_hardwareNVIDIA Quadro P600            -1.351e+00  1.904e+00  -0.710
Training_hardwareNVIDIA Quadro RTX 4000        -2.167e-01  2.503e+00  -0.087
Training_hardwareNVIDIA Quadro RTX 5000        -1.203e+00  2.501e+00  -0.481
Training_hardwareNVIDIA Tesla K80               1.559e+00  1.445e+00   1.079
Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.751e+00  1.536e+00   2.443
Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.487e+00  1.761e+00   1.980
Training_hardwareNVIDIA V100                    7.019e-01  1.434e+00   0.489

Correlation matrix not shown by default, as p = 21 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

fit warnings:
Some predictor variables are on very different scales: consider rescaling
> AIC(glm_random_effects)
[1] 300.3767
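
To put the two AICs side by side in one call, the sketch below works; keep in mind that the mixed model above was fit with REML, so a strict AIC comparison would refit it by maximum likelihood:

# Side-by-side comparison; returns degrees of freedom and AIC for each model.
AIC(glm, glm_random_effects)

# For a cleaner comparison, refit the mixed model with REML disabled:
glm_random_effects_ml <- update(glm_random_effects, REML = FALSE)
AIC(glm, glm_random_effects_ml)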

Thinking it over carefully, it made a lot of sense. Certain organizations may consistently prefer specific types of hardware, or larger organizations may be able to afford more expensive hardware and resources to train bigger AI models. In other words, the random effect here likely overlapped with, and over-explained, the variation in our available independent variables, hence it absorbed a large portion of what we were trying to test.
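
One quick way to check for this kind of overlap is to cross-tabulate the grouping factor against a predictor, as in the sketch below. If most organizations use only one or two hardware types, Organization and Training_hardware are largely explaining the same variation.

# Sketch: how confounded is Organization with Training_hardware?
tab <- table(df$Organization, df$Training_hardware)
sort(rowSums(tab > 0))   # distinct hardware types used per organization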

This highlights an important point: while random or fixed effects are useful tools for controlling unwanted group-level differences, they can also unintentionally capture the underlying variation of our independent variables. We should carefully consider what these effects truly represent before blindly introducing them into our models, hoping they will happily absorb all of the noise.


References: Steve Midway, Data Analysis in R, https://bookdown.org/steve_midway/DAR/random-effects.html
