Create Your Own Metropolis-Hastings Markov Chain Monte Carlo Algorithm for Bayesian Inference (With Python)
Level Up Coding

In today’s recreational coding exercise, we learn how to fit model parameters to data (with error bars) and obtain the most likely distribution of model parameters that best explain the data, called the posterior distribution. We will do so in a Bayesian framework, which is a very powerful approach because it allows us to incorporate prior knowledge and uncertainties, and to update our beliefs about the model parameters as we observe more data. We will sample the posterior distribution of model parameters using a simple and general Markov Chain Monte Carlo (MCMC) method known as the Metropolis-Hastings algorithm.

You may find the accompanying Python code on GitHub.

Before we begin, below is a GIF of what running our model-fitting algorithm looks like:

MCMC fitting

Bayesian Framework

We will use a Bayesian framework for model fitting and parameter estimation. Supposing a model with a set of parameters θ and given a dataset D, we seek to find the posterior distribution P(θ|D) for the model parameters using Bayes’ theorem:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

where

  • P(θ|D) is the posterior: the probability for the parameters θ to have a given value, given that the data D are true.
  • P(D|θ) is the likelihood of the data D, assuming θ is true.
  • P(θ) is the prior: the probability that a set of parameters θ is true (before seeing any data).
  • P(D) is the marginalization, that is, the probability of the data D being true.

That’s all we’ll need for Bayesian inference. A simple, yet powerful idea.

Metropolis-Hastings Markov Chain Monte Carlo

The Metropolis-Hastings MCMC algorithm will randomly sample the posterior distribution. A Markov chain is a stochastic process in which each state in the chain depends only on the previous state. The algorithm will:

  1. Draw a random value for θ from the prior distribution; call it θ_prev.
  2. For i = 1 … N, where N is the length of the Markov chain:
  • Propose a new value θ_prop to add to the chain by adding a random perturbation to θ_prev using a proposal distribution.
  • Evaluate the posterior probabilities P_prop and P_prev of the parameters θ_prop and θ_prev.
  • Draw a random number U from a uniform distribution between 0 and 1.
  • If U < min(1, P_prop/P_prev), then add θ_prop to the Markov chain and set θ_prev = θ_prop; otherwise, add (another copy of) θ_prev to the chain.

3. Cut off the beginning of the chain (the ‘burn-in’ region), where the model parameters are far from being a good fit.

In terms of code, this looks like:
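Below is a minimal sketch of the loop, assuming user-supplied helper functions sample_prior(), propose(), and log_posterior() (illustrative names, sketched in the paragraphs that follow), and working with log-posteriors for numerical robustness:

```python
import numpy as np

def metropolis_hastings(log_posterior, sample_prior, propose, n_steps=10_000, burn_in=1_000):
    """Sample a posterior distribution with the Metropolis-Hastings algorithm."""
    theta_prev = sample_prior()              # step 1: draw initial parameters from the prior
    log_p_prev = log_posterior(theta_prev)
    chain = []
    for _ in range(n_steps):                 # step 2: grow the Markov chain
        theta_prop = propose(theta_prev)     # random perturbation of the previous state
        log_p_prop = log_posterior(theta_prop)
        # accept with probability min(1, P_prop / P_prev), evaluated in log space
        if np.log(np.random.rand()) < log_p_prop - log_p_prev:
            theta_prev, log_p_prev = theta_prop, log_p_prop
        chain.append(theta_prev)             # accepted value, or another copy of the previous one
    return np.array(chain)[burn_in:]         # step 3: cut off the burn-in region
```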

The prior distribution may, for example, be a uniform distribution with a min and max value, if not much is known about the parameters.
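As a sketch, a flat prior over a box of per-parameter bounds (the bounds array here is illustrative) might look like:

```python
import numpy as np

# illustrative (min, max) bounds, one row per model parameter
bounds = np.array([[0.0, 10.0],
                   [0.0,  5.0]])

def sample_prior():
    """Draw each parameter uniformly between its min and max."""
    return np.random.uniform(bounds[:, 0], bounds[:, 1])

def log_prior(theta):
    """Flat prior: log-prior is a constant (taken to be 0) inside the box, -inf outside."""
    inside = np.all((theta >= bounds[:, 0]) & (theta <= bounds[:, 1]))
    return 0.0 if inside else -np.inf
```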

The proposal distribution (in the ‘propose()’ function) is a random perturbation that is added to each parameter, as a way to traverse the parameter space. For example, the perturbation step can be a random value drawn from a Gaussian. The important thing to set in that case is the standard deviation of the Gaussian. It makes sense for the perturbation to be neither too large nor too small, so in this example we set it to be 2% of the parameter range. The exact step size does not matter too much, but it can affect the proposal accept/reject rate of the MCMC algorithm. If the parameters are bounded, then we also need to handle the case where a proposed set of parameters θ_prop is out of bounds; in such a case, we can reflect the value across the boundary to bring it back in bounds.
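A sketch of such a propose() function, with a Gaussian step whose standard deviation is 2% of each parameter’s range and with reflection at the bounds (reusing the illustrative bounds array and numpy import from above):

```python
def propose(theta):
    """Gaussian perturbation of each parameter, reflected at the parameter bounds."""
    widths = bounds[:, 1] - bounds[:, 0]
    theta_prop = theta + 0.02 * widths * np.random.randn(len(theta))  # std = 2% of range
    # reflect any out-of-bounds value back across the violated boundary
    theta_prop = np.where(theta_prop < bounds[:, 0], 2.0 * bounds[:, 0] - theta_prop, theta_prop)
    theta_prop = np.where(theta_prop > bounds[:, 1], 2.0 * bounds[:, 1] - theta_prop, theta_prop)
    return theta_prop
```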

Finally, at the heart of the algorithm is the calculation of the posterior. The most important part of that step is the calculation of the likelihood, which depends on the model and parameters you are fitting (more on this in the next section). Since all that matters in the MCMC algorithm is the ratio of posteriors P_prop/P_prev, we do not need to calculate the marginalization P(D), because it cancels in the ratio. Moreover, if the priors are uniform/flat, we can skip calculating those too, because they also cancel.

Note: In some cases, the likelihood can be a very small number. For numerical reasons, it can be more robust to instead calculate the log of the likelihood directly.
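In log space the acceptance test U < P_prop/P_prev becomes log U < log P_prop − log P_prev (as written in the loop above), and a log-posterior that exploits both cancellations might look like:

```python
def log_posterior(theta):
    """Unnormalized log-posterior: the marginalization P(D) cancels in the
    MCMC ratio, and a flat prior contributes only its bounds check."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf                     # out-of-bounds proposals are always rejected
    return lp + log_likelihood(theta)      # log_likelihood is model-specific (next section)
```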

Example: Fitting Exoplanet Orbital Parameters of a Mock Radial Velocity Dataset

We will work through a concrete example of the Metropolis-Hastings MCMC being used to fit a (mock) dataset of exoplanet radial velocity measurements and recover the orbital parameters of the model.

Radial Velocity measurements

A telescope measures the ‘wobble’ (radial velocity) of a star due to an exoplanet orbiting it. This radial velocity v(t), measured at several points in time t, depends on the orbital parameters:

$$v(t) = V - K\left[\sin\left(f(t) + \varpi\right) + e\sin\varpi\right]$$

where

  • V is the systematic velocity of the star system,
  • K is the velocity semi-amplitude of the wobble,
  • ϖ is the longitude of periastron,
  • e is the orbital eccentricity,
  • P is the orbital period,
  • χ is the orbital phase at the start of the observations,
  • f(t) is the true anomaly, obtained from e, P, and χ by solving Kepler’s equation.
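As a sketch (the phase convention for χ here is an assumption), the model can be evaluated in Python by solving Kepler’s equation for the eccentric anomaly with a simple fixed-point iteration and converting it to the true anomaly:

```python
import numpy as np

def true_anomaly(t, P, e, chi, n_iter=20):
    """True anomaly f(t): solve Kepler's equation E - e*sin(E) = M by
    fixed-point iteration (adequate for moderate eccentricities)."""
    M = 2.0 * np.pi * (t / P + chi)   # mean anomaly; chi sets the orbital phase at t = 0
    E = M                             # initial guess for the eccentric anomaly
    for _ in range(n_iter):
        E = M + e * np.sin(E)
    # convert eccentric anomaly E to true anomaly f
    return 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),
                            np.sqrt(1.0 - e) * np.cos(E / 2.0))

def radial_velocity(t, V, K, varpi, e, P, chi):
    """Model radial velocity v(t) = V - K * (sin(f(t) + varpi) + e*sin(varpi))."""
    f = true_anomaly(t, P, e, chi)
    return V - K * (np.sin(f + varpi) + e * np.sin(varpi))
```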

Given measurements vᵢ at times tᵢ with (Gaussian) errors σᵢ, as well as an unknown stellar jitter s (another parameter that we also fit for in our analysis), the likelihood of a given set of parameters θ = [V, s, K, ϖ, e, P, χ] is:

$$\mathcal{L} = \prod_i \frac{1}{\sqrt{2\pi\left(\sigma_i^2 + s^2\right)}} \exp\left(-\frac{\left(v_i - v(t_i)\right)^2}{2\left(\sigma_i^2 + s^2\right)}\right)$$
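In code, the corresponding log-likelihood (with the data arrays t, v, and sigma assumed to be in scope, and using the radial_velocity() sketch above) could read:

```python
def log_likelihood(theta):
    """Gaussian log-likelihood with the stellar jitter s added in quadrature
    to the per-point measurement errors (t, v, sigma are the data arrays)."""
    V, s, K, varpi, e, P, chi = theta
    var = sigma**2 + s**2                                   # total variance per data point
    resid = v - radial_velocity(t, V, K, varpi, e, P, chi)  # data minus model
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + resid**2 / var)
```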
