Home Artificial Intelligence Hidden Markov Models Explained with a Real Life Example and Python code

Hidden Markov Models Explained with a Real Life Example and Python code

0
Hidden Markov Models Explained with a Real Life Example and Python code

The true world is filled with phenomena for which we will see the ultimate end result, but can’t actually observe the underlying aspects that generated those outcomes. One example is predicting the weather, determining if it’s going to be rainy or sunny tomorrow, based on past weather observations and the observed probabilities of the various weather outcomes.

Although driven by aspects we will’t observe, with an Hidden Markov Model it’s possible to model these phenomena as probabilistic systems.

Hidden Markov Models, generally known as HMM for brief, are statistical models that work as a sequence of labeling problems. These are the forms of problems that describe the evolution of observable events, which themselves, are depending on internal aspects that may’t be directly observed — they’re hidden[3].

An Hidden Markov Model is manufactured from two distinct stochastic processes, meaning those are processes that could be defined as sequences of random variables — variables that rely on random events.

There’s an invisible process and an observable process.

The invisible process is a Markov Chain, like chaining together multiple hidden states which might be traversed over time so as to reach an end result. It is a probabilistic process because all of the parameters of the Markov Chain, in addition to the rating of every sequence, are in reality probabilities[4].

Hidden Markov Models describe the evolution of observable events, which themselves, are depending on internal aspects that may’t be directly observed — they’re hidden[3]

Just like every other Markov Chain, so as to know which state you’re going next, the one thing that matters is where you are actually — wherein state of the Markov Chain you’re currently in. Not one of the previous history of states you’ve been up to now matters to know where you’re going next.

This sort of short-term memory is one among the important thing characteristics of HMMs and it’s called the Markov Assumption, indicating that the probability of reaching the subsequent state is barely depending on the probability of the present state.

Markov Assumption. (Image by Creator)

The opposite key characteristic of an HMM, is that it also assumes that every statement is barely depending on the state that produced it due to this fact, being completely independent from every other state within the chain[5].

The Markov Assumption states that the probability of reaching the subsequent state is barely depending on the probability of the present state.

That is all great background information on HMM but, what classes of problems are they really utilized in?

HMMs help model the behavior of phenomena. Besides modeling and allowing to run simulations, you can even ask several types of questions those phenomena:

  • Likelihood or Scoring, as in, determining the probability of observing a sequence
  • Decoding the very best sequence of states that generated a selected statement
  • Learning the parameters of the HMM that led to observing a given sequence, that traversed a selected set of states.

Let’s have a look at this in practice!

Today you’re not as fearful concerning the weather forecast, what’s in your mind is that your dog is possibly graduating from their training lessons. In any case the time, effort and dog treats involved, all you would like is for them to succeed.

During dog training sessions, your four-legged friend is predicted to do a couple of actions or tricks, so the trainer can observe and grade their performance. After combining the scores of three trials, they’ll determine in case your dog graduates or needs additional training.
The trainer only sees the end result, but there are several aspects involved that may’t be directly observed resembling, in case your dog is drained, completely happy, in the event that they don’t just like the trainer in any respect or the opposite dogs around them.

None of those are directly observed, unless there’s undoubtably a selected motion your dog does only after they feel a certain way. Can be great in the event that they could express how they feel in words, perhaps in the longer term!

With Hidden Markov Models fresh in your mind, this looks like the right opportunity to attempt to predict how your dog was feeling through the exam. They may get a certain rating because they were feeling drained, perhaps they were hungry, or they were annoyed on the trainer.

Your dog has been taking lessons for some time and, based on data collected during that training, you could have all of the constructing blocks needed to construct a Hidden Markov Model.

With the intention to construct a HMM that models the performance of your dog within the training evaluation you would like:

  • Hidden States
  • Transition Matrix
  • Sequence of Observations
  • Commentary Likelihood Matrix
  • Initial Probability Distribution

Hidden States are those non-observable aspects that influence the statement sequence. You’ll only consider in case your dog is Drained or Joyful.

Different hidden states within the HMM. (Image by Creator)

Knowing your dog thoroughly, the non-observable aspects that may impact their exam performance are simply being drained or completely happy.

Next you must know what’s the probability of going from one state to a different, which is captured in a Transition Matrix. This matrix must even be row stochastic meaning that the possibilities from one state to every other state within the chain, each row within the matrix, must sum to 1.

Transition Matrix: represents the probability of moving from one state to a different. (Image by Creator)

No matter what form of problem you’re solving for, you mostly need a Sequence of Observations. Each statement representing the results of traversing the Markov Chain. Each statement is drawn from a selected vocabulary.

Vocabulary (Image by Creator)

Within the case of your dog’s exam you observe the rating they get after each trial, which could be Fail, OK or Perfect. These are all of the possible terms within the statement vocabulary.

You furthermore mght need the Commentary Likelihood Matrix, which is the probability of an statement being generated from a selected state.

Commentary Likelihood Matrix. (Image by Creator)

Finally, there’s the Initial Probability Distribution. That is the probability that the Markov Chain will start in each specific hidden state.

There can be some states won’t ever be the starting state within the Markov Chain. In these situations, their initial probability is zero. And similar to the possibilities within the Transition Matrix, these sum of all initial probabilities must add up to 1.

Initial Probabilities (Image by Creator)

The Initial Probability Distribution, together with the Transition Matrix and the Commentary Likelihood, make up the parameters of an HMM. These are the possibilities you’re determining if you could have a sequence of observations and hidden states, and try to learn which specific HMM could have generated them.

Putting all of those pieces together, that is what the Hidden Markov model that represents your dog’s performance on the training exam looks like

Hidden states and the transition probabilities between them. (Image by Autor)

In the course of the exam, your dog will perform three trials, and graduate only in the event that they don’t Fail in two of those trials.

At the top of the day, in case your dog needs more training, you’ll take care of all of them the identical. The large query circling your mind is How are they feeling through the exam.

Imagining a scenario where they graduate with a rating of OK — Fail — Perfect exactly on this order, what sequence of emotional states will they be in? Will they be mostly drained, or hungry throughout, or perhaps a combination of each?

Any such problem falls right under the category of Decoding problems that HMMs could be applied to. On this case, you’re interested determining what’s the very best sequence of states that generated a selected sequence of observations, OK — Fail — Perfect.

The issue of decoding the sequence of states that generated a given sequence of observations leverages the Viterbi Algorithm. Nonetheless, is price doing a brief detour and take a peek into how you might calculate the probability of a given statement sequence — a Likelihood task — using the Forward Algorithm. This can set the stage to higher understanding how the Viterbi Algorithm works.

For those who were modeling this problem like a daily Markov Chain, and desired to calculate the likelihood of observing the sequence of outcomes OK, Fail, Perfect you’d traverse the chain by landing in each specific state that generates the specified end result. At each step you’d take the conditional probability of observing the present end result provided that you’ve observed the previous end result and multiply that probability by the transition probability of going from one state to the opposite.

The large difference is that, in a daily Markov Chain, all states are well-known and observable. Not in an Hidden Markov Model! In an Hidden Markov Model you observe a sequence of outcomes, not knowing which specific sequence of hidden states needed to be traversed so as to observe that.

The large difference is that, in a daily Markov Chain, all states are well-known and observable. Not in an Hidden Markov Model!

At this point you is likely to be pondering, Well I can simply traverse all possible paths and eventually have a rule to select between equivalent paths. The mathematical definition for this approach looks something like this

Calculating the probability of observing a sequence of outcomes, traversing every hidden state sequence possible. (Image by Creator)

That’s one strategy needless to say! You’d should calculate the probability of observing the sequence OK, Fail, Perfect for each single combination of hidden states that would ever generate that sequence.

When you could have a sufficiently small variety of hidden states and sequence of observed outcomes, it’s possible to try this calculation inside an affordable time.

Outline of the possible paths in your HMM (Image by Creator)

Thankfully, the Hidden Markov model you only defined is comparatively easy, with 3 observed outcomes and a couple of hidden states.

For an observed sequence of length L outcomes, on a HMM with M hidden states, you could have “M to the facility L” possible states which in your case, means 2 to the facility of three, i.e., 8 possible paths for the sequence OK — Fail — Perfect, involving an exponential computational complexity of O(M^L L), described in Big O-Notation. Because the complexity of the model increases, the variety of paths you must have in mind grows exponentially.

Because the complexity of the model increases, the variety of paths you must have in mind grows exponentially.

That is where the Forward Algorithm shines.

The Forward Algorithm calculates the probability of a recent symbol within the observed sequence, without the necessity to calculate the possibilities of all possible paths that form that sequence [3].

As an alternative of computing the possibilities of all possible paths that form that sequence the algorithm defines the forward variable and calculates its value recursively.

How the forward variable is calculated recursively. (Image by Creator)

The incontrovertible fact that it uses recursion, is the important thing reason why this algorithm is quicker than calculating all the possibilities of possible paths. In reality, it could possibly calculate the probability of observing the sequence x in just “L times M squared” computations, as a substitute of “M to the facility of L times L”.

In your case, with 2 hidden states and a sequence of three observed outcomes, it’s the difference between calculating the possibilities O(MˆL L) = 2³x3 = 8×3 = 24 times, versus O(L Mˆ2)=3*2²=3×4 = 12 times.

This reduction within the variety of calculations is achieved by Dynamic Programming, a programming technique that uses an auxiliary data structures to store intermediate information, due to this fact ensuring the identical calculations will not be done multiple times.

Each time the algorithm is about to calculate a recent probability it checks if it has already computed it, and if that’s the case, it could possibly easily access that value within the intermediate data structure. Otherwise, the probability is calculated and the worth is stored.

Let’s get back to your decoding problem, using the Viterbi Algorithm.

Pondering in pseudo code, For those who were to brute force your way into decoding the sequence of hidden states that generate a selected statement sequence, all you needed to do was:

  • generate all possible permutations of paths that result in the specified statement sequence
  • use the Forward Algorithm to calculate the likelihood of every statement sequence, for every possible sequence of hidden states
  • pick the sequence of hidden states with highest probability
All possible hidden state sequences that generate the statement sequence OK — Fail — Perfect (Image by Creator)

On your specific HMM, there are 8 possible paths that result in an end result of OK — Fail — Perfect. Add just another statement, and also you’ll have double the quantity of possible sequences of hidden states! Similarly to what was described for the Forward Algorithm, you easily find yourself with an exponentially complex algorithm and hit performance ceiling.

The Viterbi Algorithm, gives you a hand with that.

When the sequence of hidden states within the HMM is traversed, at each step, the probability vt(j) is the probability that the HMM is within the hidden state j after seeing the statement and is being traversed through essentially the most probable state that result in j.

Viterbi path to hidden state j on time step t. (Image by Creator)

The important thing to decoding the sequence of hidden states that generate a selected statement sequence, is this idea of the most probable path. Also called the Viterbi path, essentially the most probable path, is the trail that has highest likelihood, from all of the paths that may result in any given hidden state.

The important thing to decoding the sequence of hidden states that generate a selected statement sequence, is to make use of the Viterbi path. Probably the most probable path that results in any given hidden state.

You may draw a parallel between the Forward Algorithm and the Viterbi Algorithm. Where the Forward Algorithm sums all probabilities to acquire the likelihood of reaching a certain state bearing in mind all of the paths that lead there, the Viterbi algorithm doesn’t need to explore all possibilities. It focuses on essentially the most probable path that results in any given state.

Going back to the duty of decoding the sequence of hidden states that result in the scores of OK — Fail — Perfect of their exam, running the Viterbi Algorithm by hand would appear like this

Viterbi paths and decoded sequence. (Image by Creator)

One other unique characteristic of the Viterbi algorithm is that it will need to have a approach to keep track of all of the paths that led to any given hidden state, so as to compare their probabilities. To do this it keeps track of backpointers to every hidden state, using an auxiliary data structure typical of dynamic programming algorithms. That way it could possibly easily access the probability of any viterbi path traversed up to now.

Backpointers are the important thing to determine essentially the most probable path that results in an statement sequence.

In the instance of your dogs’ exam, once you calculate the Viterbi paths v3(Joyful) and v3(Drained), you choose the trail with highest probability and begin going backwards, i.e., backtracking, through all of the paths that led to where you might be.

Doing all of this by hand is time consuming and error prone. Miss one significant digit and you may have to start out from scratch and re-check all of your probabilities!

The excellent news is that you would be able to leverage software libraries like hmmlearn, and with a couple of lines of code you possibly can decode the sequence of hidden states that result in your dog graduating with OK — Fail — Perfect within the trials, exactly on this order.

from hmmlearn import hmm
import numpy as np

## Part 1. Generating a HMM with specific parameters and simulating the exam
print("Setup HMM model with parameters")
# init_params are the parameters used to initialize the model for training
# s -> start probability
# t -> transition probabilities
# e -> emission probabilities
model = hmm.CategoricalHMM(n_components=2, random_state=425, init_params='ste')

# initial probabilities
# probability of starting within the Drained state = 0
# probability of starting within the Joyful state = 1
initial_distribution = np.array([0.1, 0.9])
model.startprob_ = initial_distribution

print("Step 1. Complete - Defined Initial Distribution")

# transition probabilities
# drained completely happy
# drained 0.4 0.6
# completely happy 0.2 0.8

transition_distribution = np.array([[0.4, 0.6], [0.2, 0.8]])
model.transmat_ = transition_distribution
print("Step 2. Complete - Defined Transition Matrix")

# statement probabilities
# Fail OK Perfect
# drained 0.3 0.5 0.2
# completely happy 0.1 0.5 0.4

observation_probability_matrix = np.array([[0.3, 0.5, 0.2], [0.1, 0.5, 0.4]])
model.emissionprob_ = observation_probability_matrix
print("Step 3. Complete - Defined Commentary Probability Matrix")

# simulate performing 100,000 trials, i.e., aptitude tests
trials, simulated_states = model.sample(100000)

# Output a sample of the simulated trials
# 0 -> Fail
# 1 -> OK
# 2 -> Perfect
print("nSample of Simulated Trials - Based on Model Parameters")
print(trials[:10])

## Part 2 - Decoding the hidden state sequence that leads
## to an statement sequence of OK - Fail - Perfect

# split our data into training and test sets (50/50 split)
X_train = trials[:trials.shape[0] // 2]
X_test = trials[trials.shape[0] // 2:]

model.fit(X_train)

# the exam had 3 trials and your dog had the next rating: OK, Fail, Perfect (1, 0 , 2)
exam_observations = [[1, 0, 2]]
predicted_states = model.predict(X=[[1, 0, 2]])
print("Predict the Hidden State Transitions that were being the exam scores OK, Fail, Perfect: n 0 -> Drained , "
"1 -> Joyful")
print(predicted_states)

In a couple of seconds you get an output that matches results the calculations you probably did by hand, much fast and with much less room for error.

Output of running the code above. (Image by Creator)

What’s fascinating about Hidden Markov Models is how this statistical tool created within the mid 1960’s [6] is so powerful and applicable to real world problems in such distinct areas, from weather forecasting to finding the subsequent word in a sentence.

In this text, you had the prospect to study the various components of an HMM, how they could be applied to several types of tasks, and spotting the similarities between the Forward Algorithm and Viterbi Algorithm. Two very similar algorithms that use dynamic programming to cope with the exponential variety of calculations involved.

Either doing the calculations by hand or plugging within the parameters into TensorFlow code, hope you enjoyed diving deep into the world of HMMs.

Thanks for reading!

  1. D. Khiatani and U. Ghose, “Weather forecasting using Hidden Markov Model,” 2017 International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN), Gurgaon, India, 2017, pp. 220–225, doi: 10.1109/IC3TSN.2017.8284480.
  2. Noguchi H, Kato R, Hanai T, Matsubara Y, Honda H, Brusic V, Kobayashi T. Hidden Markov model-based prediction of antigenic peptides that interact with MHC class II molecules. J Biosci Bioeng. 2002;94(3):264–70. doi: 10.1263/jbb.94.264. PMID: 16233301.
  3. Yoon BJ. Hidden Markov Models and their Applications in Biological Sequence Evaluation. Curr Genomics. 2009 Sep;10(6):402–15. doi: 10.2174/138920209789177575. PMID: 20190955; PMCID: PMC2766791.
  4. Eddy, S. What’s a hidden Markov model?. Nat Biotechnol 22, 1315–1316 (2004). https://doi.org/10.1038/nbt1004-1315
  5. Jurafsky, Dan and Martin, James H.. Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, N.J.: Pearson Prentice Hall, 2009.
  6. Baum, Leonard E., and Ted Petrie. “Statistical Inference for Probabilistic Functions of Finite State Markov Chains.The Annals of Mathematical Statistics 37, no. 6 (1966): 1554–63.

LEAVE A REPLY

Please enter your comment!
Please enter your name here