Introduction to Finite Normal Mixtures in Regression with R


Methods to make linear regression flexible enough for non-linear data

Linear regression is often considered not flexible enough to tackle nonlinear data. From a theoretical viewpoint it should not be capable of coping with it. Nevertheless, we can make it work for us with any dataset by using finite normal mixtures in a regression model. This way it becomes a really powerful machine learning tool that can be applied to virtually any dataset, even highly non-normal ones with non-linear dependencies across the variables.

What makes this approach particularly interesting is its interpretability. Despite an extremely high level of flexibility, all of the detected relations can be interpreted directly. The model is as general as a neural network, yet it does not turn into a black box: you can read the relations and understand the impact of individual variables.

In this post, we demonstrate how to simulate a finite mixture model for regression using Markov Chain Monte Carlo (MCMC) sampling. We will generate data with multiple components (groups) and fit a mixture model to recover these components using Bayesian inference. This process involves regression models and mixture models, combining them with MCMC techniques for parameter estimation.

Data simulated as a mixture of three linear regressions

We start by loading the necessary libraries to work with regression models, MCMC, and multivariate distributions.

# Load the required libraries for the various functions used below
library("pscl")     # For pscl-specific functions, such as regression models
library("MCMCpack") # For MCMC sampling functions, including posterior distributions
library("mvtnorm")  # For multivariate normal distribution functions
  • pscl: Used for various statistical functions, such as regression models.
  • MCMCpack: Contains functions for Bayesian inference, particularly MCMC sampling.
  • mvtnorm: Provides tools for working with multivariate normal distributions.

We simulate a dataset where each observation belongs to one of several groups (components of the mixture model), and the response variable is generated using a regression model with random coefficients.

We consider a general setup for a regression model using G Normal mixture components, in which each observation is drawn from one of G component-specific regressions:

y_i | x_i ~ Σ_{g=1}^{G} π_g N(x_i'β_g, σ_g²),

where π_g are the mixing proportions, β_g the component-specific regression coefficients, and σ_g² the component-specific variances.

## Generate the observations
# Set the number of observations per group
N <- 1000
# Set the number of simulations (iterations of the MCMC process)
nSim <- 200
# Set the number of components in the mixture model (G is the number of groups)
G <- 3
  • N: The number of observations per group.
  • nSim: The number of MCMC iterations.
  • G: The number of components (groups) in our mixture model.

Simulating Data

Each group is modeled using a univariate regression model, where the explanatory variable (X) and the response variable (y) are simulated from normal distributions. The betas represent the regression coefficients for each group, and the sigmas represent the variance for each group.

# Set the values for the regression coefficients (betas) for each group
betas <- 1:G * 2.5 # Generate sequential betas with a multiplier of 2.5 (2.5, 5, 7.5)
# Define the variance (sigma) for each component (group) in the mixture
sigmas <- rep(1, G) / 1 # Set the variance to 1 for each component, with a fixed divisor of 1
  • betas: The regression coefficients. Each group’s coefficient is assigned sequentially.
  • sigmas: The variance for each group in the mixture model.

In this model we allow each mixture component to have its own variance parameter and its own set of regression parameters.

Group Assignment and Mixing

We then simulate the group assignment of each observation using a random assignment and mix the data for all components.

We augment the model with a set of component label vectors z_i = (z_1i, …, z_Gi) for i = 1, …, n, where each z_gi ∈ {0, 1} and Σ_g z_gi = 1, and thus z_gi = 1 implies that the i-th individual is drawn from the g-th component of the mixture.

This random assignment forms the z_original vector, representing the true group each observation belongs to.

# Initialize the original group assignments (z_original)
z_original <- matrix(NA, N * G, 1)
# Repeat each group label N times (assign labels to each observation per group)
z_original <- rep(1:G, rep(N, G))
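# Sketch (assumption): the original post does not show how X, y, and the `data` object
# used below are simulated. Here we assume a single standard-normal predictor per
# observation and draw the response from the group-specific regression defined above,
# y_i ~ N(x_i * beta_g, sigma_g^2).
X <- matrix(rnorm(N * G), N * G, 1)
y <- matrix(rnorm(N * G, mean = X[, 1] * betas[z_original], sd = sqrt(sigmas[z_original])), N * G, 1)
# Collect the simulated variables in one data frame so the rows can be shuffled together
data <- data.frame(y = y[, 1], X = X[, 1], z_original = z_original)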
# Resample the data rows in random order
sampled_order <- sample(nrow(data))
# Apply the resampled order to the data
data <- data[sampled_order, ]

We set prior distributions for the regression coefficients and variances. These priors will guide our Bayesian estimation.

## Define priors for Bayesian estimation
# Define the prior mean (muBeta) for the regression coefficients
muBeta <- matrix(0, G, 1)
# Define the prior variance (VBeta) for the regression coefficients
VBeta <- 100 * diag(G) # Large variance (100) as a prior for the beta coefficients
# Prior for the sigma parameters (variance of each component)
ag <- 3 # Shape parameter
bg <- 1/2 # Rate parameter for the prior on sigma
shSigma <- ag
raSigma <- bg^(-1)
  • muBeta: The prior mean for the regression coefficients, set to 0 for all components.
  • VBeta: The prior variance, which is large (100) to allow flexibility in the coefficients.
  • shSigma and raSigma: Shape and rate parameters for the prior on the variance (sigma) of each group (a quick numerical check follows below).
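As a quick sanity check on these prior choices: assuming pscl's inverse-gamma parameterization (density proportional to x^(-(α+1)) exp(-β/x), with mean β/(α − 1) for α > 1), the implied prior mean of each σ_g² is raSigma / (shSigma − 1) = 2 / 2 = 1, which matches the variance used to simulate the data. A minimal Monte Carlo sketch of this check:

# Quick check (sketch): the prior mean of sigma^2 implied by rigamma(., shSigma, raSigma)
# should be close to raSigma / (shSigma - 1) = 1
mean(rigamma(10000, shSigma, raSigma))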

For the component indicators and component probabilities we consider the following prior assignment:

z_i | π ~ M(1, π_1, …, π_G) and π = (π_1, …, π_G) ~ D(a_1, …, a_G).

The multinomial prior M is the multivariate generalization of the binomial, and the Dirichlet prior D is a multivariate generalization of the beta distribution.
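To make this concrete, here is a small sketch of one draw from such a prior: first the mixing proportions from a Dirichlet distribution (rdirichlet from MCMCpack), then a single component label from the corresponding multinomial (base R's rmultinom). The unit concentration parameters are an illustrative assumption.

# Sketch: one draw of mixing proportions from a Dirichlet prior with unit concentration,
# followed by one component label drawn from the corresponding multinomial distribution
pi_draw <- rdirichlet(1, rep(1, G))                       # probability vector of length G
z_draw <- which(rmultinom(1, 1, as.vector(pi_draw)) == 1) # sampled component label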

In this section, we initialize the MCMC process by setting up matrices to store the samples of the regression coefficients, variances, and mixing proportions.

## Initialize MCMC sampling
# Initialize the matrix to store the samples for beta
mBeta <- matrix(NA, nSim, G)
# Assign the first value of beta using a random draw from its normal prior
for (g in 1:G) {
  mBeta[1, g] <- rnorm(1, muBeta[g, 1], sqrt(VBeta[g, g])) # sd is the square root of the prior variance
}
# Initialize the sigma^2 values (variance for each component)
mSigma2 <- matrix(NA, nSim, G)
mSigma2[1, ] <- rigamma(1, shSigma, raSigma)
# Initialize the mixing proportions (pi), using a Dirichlet distribution
mPi <- matrix(NA, nSim, G)
alphaPrior <- rep(N/G, G) # Prior for the mixing proportions, uniform across groups
mPi[1, ] <- rdirichlet(1, alphaPrior)
# Initialize the matrices that store the sampled group assignments (z) and the group
# sizes (ng), which are filled in during the sampling loop below
z <- matrix(NA, nSim, N * G)
ng <- matrix(NA, nSim, G)
  • mBeta: Matrix to store the samples of the regression coefficients.
  • mSigma2: Matrix to store the variances (sigma squared) for each component.
  • mPi: Matrix to store the mixing proportions, initialized using a Dirichlet distribution.

If we condition on the values of the component indicator variables z, the conditional likelihood can be expressed as

p(y | z, β, σ²) = Π_{g=1}^{G} Π_{i: z_gi = 1} N(y_i | x_i'β_g, σ_g²).

In the MCMC sampling loop, we update the group assignments (z), the regression coefficients (beta), and the variances (sigma) based on their posterior distributions. For each observation the likelihood of each group assignment is calculated, and a new assignment is sampled from the resulting posterior probabilities.

The following complete posterior conditionals can be obtained:

β_g | θ_{−β_g} ~ N(D_g d_g, D_g), with D_g = (X_g'X_g / σ_g² + V_β⁻¹)⁻¹ and d_g = X_g'y_g / σ_g² + V_β⁻¹ μ_β,

where θ_{−x} denotes all of the parameters in our posterior apart from x,

σ_g² | θ_{−σ_g²} ~ IG(a_g + n_g / 2, b_g⁻¹ + ½ Σ_{i: z_gi = 1} (y_i − x_i'β_g)²),

where n_g denotes the number of observations in the g-th component of the mixture,

and

π | θ_{−π} ~ D(a_1 + n_1, …, a_G + n_G), with P(z_gi = 1 | θ_{−z_gi}) ∝ π_g N(y_i | x_i'β_g, σ_g²).
The algorithm below draws from the series of posterior distributions above in sequential order.

## Start the MCMC iterations for posterior sampling
# Loop over the number of simulations
for (i in 2:nSim) {
  print(i) # Print the current iteration number

  # For each observation, update the group assignment (z)
  for (t in 1:(N * G)) {
    fig <- NULL
    for (g in 1:G) {
      # Calculate the likelihood of each group and the corresponding posterior probability
      fig[g] <- dnorm(y[t, 1], X[t, ] %*% mBeta[i-1, g], sqrt(mSigma2[i-1, g])) * mPi[i-1, g]
    }
    # Avoid an all-zero likelihood vector by replacing it with equal probabilities
    if (all(fig == 0)) {
      fig <- fig + 1/G
    }

    # Sample a new group assignment based on the posterior probabilities
    z[i, t] <- which(rmultinom(1, 1, fig / sum(fig)) == 1)
  }

  # Update the regression coefficients for each group
  for (g in 1:G) {
    # Compute the posterior variance and mean for beta (using the data for group g)
    DBeta <- solve(t(X[z[i, ] == g, ]) %*% X[z[i, ] == g, ] / mSigma2[i-1, g] + solve(VBeta[g, g]))
    dBeta <- t(X[z[i, ] == g, ]) %*% y[z[i, ] == g, 1] / mSigma2[i-1, g] + solve(VBeta[g, g]) %*% muBeta[g, 1]

    # Sample a new value for beta from the multivariate normal distribution
    mBeta[i, g] <- rmvnorm(1, DBeta %*% dBeta, DBeta)

    # Update the number of observations in group g
    ng[i, g] <- sum(z[i, ] == g)

    # Update the variance (sigma^2) for each group
    mSigma2[i, g] <- rigamma(1, ng[i, g]/2 + shSigma, raSigma + 1/2 * sum((y[z[i, ] == g, 1] - (X[z[i, ] == g, ] * mBeta[i, g]))^2))
  }

  # Reorder the group labels to maintain consistency (avoid label switching)
  reorderWay <- order(mBeta[i, ])
  mBeta[i, ] <- mBeta[i, reorderWay]
  ng[i, ] <- ng[i, reorderWay]
  mSigma2[i, ] <- mSigma2[i, reorderWay]

  # Update the mixing proportions (pi) based on the number of observations in each group
  mPi[i, ] <- rdirichlet(1, alphaPrior + ng[i, ])
}

This block of code performs the key steps of the MCMC:

  • Group Assignment Update: For each observation, we calculate the likelihood of the data belonging to each group and update the group assignment accordingly.
  • Regression Coefficient Update: The regression coefficients for each group are updated using the posterior mean and variance, which are calculated from the observed data.
  • Variance Update: The variance of the response variable for each group is updated using the inverse gamma distribution.

Finally, we visualize the results of the MCMC sampling. We plot the posterior distributions for each regression coefficient, compare them to the true values, and look at the most likely group assignments.

# Plot the posterior distributions for each beta coefficient
par(mfrow = c(G, 1))
for (g in 1:G) {
  # Plot the density of the sampled beta values (discarding the first draws as burn-in)
  plot(density(mBeta[5:nSim, g]), main = 'True parameter (vertical) and the distribution of the samples')
  abline(v = betas[g]) # Add a vertical line at the true value of beta for comparison
}

This plot shows how the MCMC samples (posterior distribution) for the regression coefficients converge to the true values (betas).
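Beyond the density plots, it can be useful to summarize the draws numerically. The snippet below is a small sketch, not part of the original code, that discards the first draws as burn-in, compares the posterior means of the coefficients with the true values, and draws quick trace plots of the chains; the burn-in length of 5 simply mirrors the plotting code above.

# Sketch: posterior means of the coefficients after a short burn-in, compared with the true betas
burnIn <- 5
round(colMeans(mBeta[burnIn:nSim, ]), 2) # posterior means per component
betas                                    # true coefficients used in the simulation
# Quick trace plots of the sampled coefficients to eyeball convergence and mixing
matplot(mBeta, type = "l", lty = 1, xlab = "MCMC iteration", ylab = "beta")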

Through this process, we demonstrated how finite normal mixtures can be used in a regression context, combined with MCMC for parameter estimation. By simulating data with known groupings and recovering the parameters through Bayesian inference, we can assess how well our model captures the underlying structure of the data.

Unless otherwise noted, all images are by the author.
