Create Your Own Nested Sampling Algorithm for Bayesian Parameter Fitting and Model Selection (With Python)
Level Up Coding

In today’s recreational coding exercise, we learn a more advanced and robust Monte Carlo approach for model parameter fitting, one that also allows us to calculate the Bayesian evidence of a model and perform model selection. Previously, we implemented the simpler Metropolis-Hastings MCMC algorithm, and I highly recommend the linked post as a primer on Monte Carlo methods and the Bayesian framework.

You can find the accompanying Nested Sampling Python code on GitHub.

Before we start, below is a gif of what running our model fitting algorithm looks like:

Nested Sampling in action

Nested Sampling

The Nested Sampling algorithm extends ideas from Metropolis-Hastings MCMC. Along with obtaining the posteriors of the model parameters being fit to a data set, the method calculates the Bayesian evidence (Z) of the model. The algorithm originates from John Skilling (2004).

We discussed the Bayesian framework in the Metropolis-Hastings MCMC post. To recap, recall:

Bayes’ Theorem: P(θ|D,H) = P(D|θ,H) P(θ|H) / P(D|H)

where:

  • θ is the set of model parameters
  • D is the data set
  • H is the model (‘hypothesis’)
  • P(θ|D,H) is the posterior distribution of the model parameters given the data and model
  • P(θ|H) is the prior distribution of the model parameters given the model
  • Z = P(D|H) is the Bayesian evidence

In the usual Metropolis-Hastings MCMC algorithm, the Bayesian evidence is just a normalization factor that can be ignored. Here, however, we are interested in calculating it, which is a challenge since it is a high-dimensional integral, the average of the likelihood over the prior:

Bayesian evidence: Z = ∫ P(D|θ,H) P(θ|H) dθ

Luckily, Nested Sampling gives us just the means to calculate this essential quantity.

The algorithm creates a fixed number of ‘live’ particles that sample the parameter space. The live particles are initially drawn from the prior distribution. In each step i = 1..M of the algorithm, perform the following:

  1. Find the live particle that has the lowest likelihood
  2. Replace that particle with a newly proposed higher-likelihood particle by choosing another live particle at random and applying a small number of MCMC steps to it. Repeat until a fixed number of acceptances (e.g. 10) have taken place, to ensure the new particle is uncorrelated with the live sample. In the MCMC step, accept the proposed set of parameters with a probability equal to the ratio of priors
  3. Keep a record of the live particles that are kicked out of the sample, their likelihoods L_i, and their weights w_i = exp(-i/N_live)

At the end of the algorithm, calculate the Bayesian evidence as:

numerical estimate of the Bayesian evidence: Z ≈ Σᵢ Lᵢ wᵢ

Our code does just this. In the example, we create 20 live particles, and a threshold of 10 accepted MCMC steps is used to update particles. A total of M = 600 outer iterations are taken:
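As a compact sketch of the full procedure, here is a hypothetical one-parameter example: fitting the mean of Gaussian data under a uniform prior. All names and settings are illustrative rather than taken from the article’s actual code, the iteration count is reduced for speed, and the proposal width is simply tied to the spread of the live set:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(1.0, 1.0, size=40)        # synthetic data with true mean 1

def log_likelihood(mu):
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * data.size * np.log(2 * np.pi)

def log_prior(mu):                          # uniform prior on [-5, 5]
    return 0.0 if -5.0 <= mu <= 5.0 else -np.inf

N_live, M, n_accept = 20, 300, 10
live = rng.uniform(-5, 5, N_live)           # live particles drawn from the prior
live_logL = np.array([log_likelihood(m) for m in live])
dead_logL, log_w = [], []

for i in range(1, M + 1):
    worst = np.argmin(live_logL)            # 1. lowest-likelihood live particle
    L_star = live_logL[worst]
    dead_logL.append(L_star)
    # 3. record the weight as the prior-volume shrinkage between
    #    X_{i-1} and X_i, with X_i = exp(-i / N_live)
    log_w.append(-i / N_live + np.log(np.expm1(1.0 / N_live)))

    # 2. copy a random live particle and evolve it with MCMC (accepting on
    #    the prior ratio) until n_accept moves above the likelihood floor
    theta = live[rng.integers(N_live)]
    sigma = max(live.std(), 1e-12)          # proposal width ~ live-set spread
    accepted = tries = 0
    while accepted < n_accept and tries < 2000:
        tries += 1
        prop = theta + sigma * rng.normal()
        if (np.log(rng.uniform()) < log_prior(prop) - log_prior(theta)
                and log_likelihood(prop) > L_star):
            theta, accepted = prop, accepted + 1
    live[worst] = theta                     # replace the discarded particle
    live_logL[worst] = log_likelihood(theta)

# evidence Z ~ sum_i L_i w_i, accumulated in log space to avoid underflow
terms = np.array(dead_logL) + np.array(log_w)
logZ = terms.max() + np.log(np.sum(np.exp(terms - terms.max())))
print(f"log-evidence estimate: {logZ:.2f}")
```

A production run would also add the remaining live particles’ contribution to Z and terminate on its convergence rather than after a fixed M.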

In the code above, we have added a refinement to the MCMC step to make it adaptive; we discuss this next.

Adding adaptivity to the MCMC proposal step

Adding adaptivity to the proposal step of the MCMC algorithm helps us search the good parts of the parameter space in less time. By adaptivity, we mean that at the start of the algorithm our initial guesses for the model parameters may be far from a good fit to the data, so MCMC proposals to find better ones benefit from taking large steps. Late in the algorithm, however, when we have presumably honed in on a close fit, we want to take small perturbations around it to better sample the likely fits. We would like the proposal acceptance rate to be around 50%, so we adapt the Gaussian widths of the proposal distribution accordingly:

Having the acceptance rate be around 50% makes sense: if the rate were much smaller, the algorithm would waste a lot of effort rejecting too many proposals, but we also want some rejection, which helps better sample the space.
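One common way to implement this (a heuristic sketch, not necessarily the article’s exact update rule) is to multiply the proposal width by a constant factor when the running acceptance rate exceeds the 50% target, and divide it otherwise:

```python
def adapt_width(sigma, n_accepted, n_tried, target=0.5, factor=1.1):
    """Nudge a Gaussian proposal width toward a target acceptance rate.

    Heuristic sketch: widen the proposal when acceptances are too frequent
    (steps are too timid), shrink it when rejections dominate.
    """
    rate = n_accepted / max(n_tried, 1)
    return sigma * factor if rate > target else sigma / factor

# high acceptance rate -> take bigger steps; low rate -> smaller steps
wide = adapt_width(1.0, n_accepted=8, n_tried=10)
narrow = adapt_width(1.0, n_accepted=2, n_tried=10)
```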

Resampling the Posterior Distribution

To obtain the posterior distribution of the model parameters, we need to perform a post-processing step. We simply resample the kicked-out/collected live particles with weights:

posterior resampling weights: pᵢ = Lᵢ wᵢ / Z

This is done with numpy’s random.choice function:
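For example, with made-up stand-ins for the collected particles and their log(Lᵢ wᵢ) values (not the article’s data):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dead particles (1D parameter values) and their log(L_i * w_i)
dead = np.array([0.8, 0.9, 1.0, 1.1, 1.2])
log_Lw = np.array([-3.0, -1.5, -0.5, -1.5, -3.0])

# normalize to posterior probabilities p_i = L_i w_i / Z, working in log
# space so small likelihoods do not underflow
p = np.exp(log_Lw - log_Lw.max())
p /= p.sum()

# resample with numpy's random choice to get equally-weighted posterior draws
posterior_samples = rng.choice(dead, size=1000, replace=True, p=p)
```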

Model Selection using the Bayesian Evidence

If two different models (hypotheses H₁ and H₂, each with its own set of parameters) are used to fit the data set, comparing their Bayesian evidences (Z) can be used to decide whether one model is preferred over the other. The comparison is formally done by calculating the Bayes Factor B:

Bayes Factor: B = [Z₂ / Z₁] × [P(H₂) / P(H₁)]

where P(H₂)/P(H₁) is the prior probability ratio of the two models; in many contexts, without more information, it may be set to 1.

As a rule of thumb, a difference in the log of the model evidences of:

  • |Δ log Z| < 1 means inconclusive evidence to favor one model over the other
  • 1 < |Δ log Z| < 2.5 means weak evidence
  • 2.5 < |Δ log Z| < 5 means moderate evidence
  • |Δ log Z| > 5 means strong evidence

(where ‘log’ here is the natural logarithm). This heuristic can be used to help decide whether one model is strongly favored over another by the data.
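Putting the pieces together, with made-up evidence values purely for illustration:

```python
def log_bayes_factor(logZ1, logZ2, log_prior_ratio=0.0):
    """log B = log Z2 - log Z1 + log[P(H2)/P(H1)]; the prior ratio defaults
    to 1 (log ratio 0) when there is no extra information."""
    return logZ2 - logZ1 + log_prior_ratio

# hypothetical evidences from two nested-sampling runs; a positive log B
# of this size would count as strong evidence for H2 on the scale above
logB = log_bayes_factor(logZ1=-62.0, logZ2=-56.5)
```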

Nested Sampling vs Metropolis-Hastings MCMC

Nested Sampling has several advantages over Metropolis-Hastings MCMC, including:

✅ Multiple particles concurrently explore the parameter space

  • better coverage, faster convergence, and less chance of getting stuck in a local minimum

✅ Calculation of the Bayesian Evidence Z

  • allows one to quantitatively decide which model is better at explaining the data set

✅ Well-defined termination point

  • look for convergence of Z
