Change Point Detection — A Bayesian Approach

If you tried the exercise above, you probably found yourself in trouble using just the Kohonen network to find any indication of a change point in another time series. That is because the Kohonen network does not give us a change point directly, but two sets of continuous variables representing the membership of each point in a given cluster.

But recall that the sets μ1(t) and μ2(t) take their values in the interval [0, 1]. This suggests that μ1(t) and μ2(t) are approximations of Beta distributions with different parameters (have you ever heard of Kullback–Leibler divergence?). According to D'Angelo et al. (2011), if the change point is denoted by m, then for t ≤ m we have a Beta(a, b) distribution and for t > m a Beta(c, d) distribution. Given the properties of the Beta distribution, if there is a change point in the time series, parameter a will be greater than parameter b in Beta(a, b) and parameter c will be smaller than parameter d in Beta(c, d).
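As a quick reminder of a standard property of the Beta family (not specific to this method): a Beta(a, b) variable has density f(x) = x^(a−1)(1 − x)^(b−1)/B(a, b) on [0, 1] and mean a/(a + b), so a > b concentrates the mass near 1 while a < b concentrates it near 0. That is exactly the behaviour we expect from the membership values before and after the change point.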

The problem is: how do you build those two Beta distributions? The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method originally proposed by Metropolis et al. (1953) and later generalized by Hastings (1970). According to Gelman et al. (2003), the goal of any Markov chain simulation "is to create a Markov process whose stationary distribution is specified by p(θ | x)." Running the simulation long enough allows us to obtain a distribution close enough to the stationary, posterior distribution. The posterior can then be summarized through expectations of particular functions of the parameter θ, that is, ∫g(θ)p(θ | x)dθ = E[g(θ) | x]. Such an integral is not trivial to compute, which is why MCMC methods are a good way to approximate the posterior distribution.
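In practice, once the chain has produced N draws of θ that approximately follow the posterior, that expectation is estimated by a simple sample average, E[g(θ) | x] ≈ (1/N) Σᵢ g(θᵢ). This is precisely how the chains returned by the function below will be summarized, for instance by taking the mean of the sampled a, b, c and d values.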

The Metropolis-Hastings algorithm uses the concept of rejection: it generates a value from an auxiliary (proposal) distribution and accepts it only with a given probability. If you are unfamiliar with MCMC methods, you may be wondering how the algorithm can reject a drawn value. We use the transition rule given by Hastings (1970):
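In its standard form, a candidate value θ′ proposed from an auxiliary distribution q(θ′ | θ) is accepted with probability α(θ, θ′) = min{1, [p(θ′ | x) q(θ | θ′)] / [p(θ | x) q(θ′ | θ)]}; otherwise the chain keeps the current value θ. In the implementation below, the ratios ra, rb, rc, rd and rm play the role of the quantity inside this min.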

Putting it into words: we can use the two sets of continuous variables given by the fuzzy clustering to reject randomly drawn values from a given prior distribution for the change point. If you want to learn more about MCMC methods, I suggest Gamerman and Lopes (2006).

Let's get back to the Jupyter Notebook. The function below is an implementation of the Metropolis-Hastings algorithm for this problem. Although powerful, the algorithm has some requirements. The first is to set a prior distribution for each parameter to be estimated. For the parameter m, we use a uniform distribution between 1 and 60, which means the algorithm randomly selects a change point candidate within the time series. For the parameters a, b, c and d, I selected weakly informative gamma distributions. The function takes two arguments: a set of membership values (μ1(t) or μ2(t)) and the number of simulations.

import math

import numpy as np
from scipy import stats


def Metropolis_Hastings(center_kohonen, n_sims=1000):
    n = len(center_kohonen)
    m = 1 + round((n - 1) * np.random.uniform(0, 1))
    shape, scale, loc = 10, 0.1, 0

    # Lists to store the sampled values of each parameter
    a_params = []
    b_params = []
    c_params = []
    d_params = []
    m_params = []

    # Initial values drawn from the Gamma (Erlang) priors of a, b, c and d
    a = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
    b = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
    c = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
    d = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]

    for i in range(n_sims):
        # Candidate values drawn from the priors
        m1 = 1 + round((n - 1) * np.random.uniform(0, 1))
        a1 = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
        b1 = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
        c1 = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]
        d1 = stats.erlang.rvs(shape, loc=loc, scale=scale, size=1)[0]

        # PARAM A: compare candidate a1 with current a over t <= m
        aux1 = 1
        for j in range(0, m):
            try:
                aux1 = aux1 * (center_kohonen[j] ** (a1 - 1))
            except:
                pass
        aux2 = 1
        for j in range(0, m):
            try:
                aux2 = aux2 * center_kohonen[j] ** (a - 1)
            except:
                pass
        try:
            ra = (((math.gamma(a1 + b) / math.gamma(a1)) ** m) * aux1
                  * ((((a / a1) ** .9) * math.exp(-.1 * (a1 - a))) ** 2)
                  / (((math.gamma(a + b) / math.gamma(a)) ** m) * aux2))
            if min(1, ra) > np.random.uniform(0, 1):
                a = a1
        except:
            pass

        # PARAM B: compare candidate b1 with current b over t <= m
        aux1 = 1
        for j in range(0, m):
            try:
                aux1 = aux1 * (1 - center_kohonen[j]) ** (b1 - 1)
            except:
                pass
        aux2 = 1
        for j in range(0, m):
            try:
                aux2 = aux2 * (1 - center_kohonen[j]) ** (b - 1)
            except:
                pass
        try:
            rb = (((math.gamma(a + b1) / math.gamma(b1)) ** m) * aux1
                  * ((((b / b1) ** .9) * math.exp(-.1 * (b1 - b))) ** 2)
                  / (((math.gamma(a + b) / math.gamma(b)) ** m) * aux2))
            if min(1, rb) > np.random.uniform(0, 1):
                b = b1
        except:
            pass

        # PARAM C: compare candidate c1 with current c over t > m
        aux1 = 1
        for j in range(m, n):
            try:
                aux1 = aux1 * center_kohonen[j] ** (c1 - 1)
            except:
                pass
        aux2 = 1
        for j in range(m, n):
            try:
                aux2 = aux2 * center_kohonen[j] ** (c - 1)
            except:
                pass
        try:
            rc = (((math.gamma(c1 + d) / math.gamma(c1)) ** (n - m)) * aux1
                  * ((((c / c1) ** .9) * math.exp(-.1 * (c1 - c))) ** 2)
                  / (((math.gamma(c + d) / math.gamma(c)) ** (n - m)) * aux2))
            if min(1, rc) > np.random.uniform(0, 1):
                c = c1
        except:
            pass

        # PARAM D: compare candidate d1 with current d over t > m
        aux1 = 1
        for j in range(m, n):
            try:
                aux1 = aux1 * (1 - center_kohonen[j]) ** (d1 - 1)
            except:
                pass
        aux2 = 1
        for j in range(m, n):
            try:
                aux2 = aux2 * (1 - center_kohonen[j]) ** (d - 1)
            except:
                pass
        try:
            rd = (((math.gamma(c + d1) / math.gamma(d1)) ** (n - m)) * aux1
                  * ((((d / d1) ** .9) * math.exp(-.1 * (d1 - d))) ** 2)
                  / (((math.gamma(c + d) / math.gamma(d)) ** (n - m)) * aux2))
            if min(1, rd) > np.random.uniform(0, 1):
                d = d1
        except:
            pass

        # PARAM M: compare candidate change point m1 with current m
        aux1 = 1
        for j in range(0, m1):
            try:
                aux1 = aux1 * (center_kohonen[j] ** (a - 1)) * ((1 - center_kohonen[j]) ** (b - 1))
            except:
                pass
        aux2 = 1
        for j in range(m1, n):
            try:
                aux2 = aux2 * (center_kohonen[j] ** (c - 1)) * ((1 - center_kohonen[j]) ** (d - 1))
            except:
                pass
        aux3 = 1
        for j in range(0, m):
            try:
                aux3 = aux3 * (center_kohonen[j] ** (a - 1)) * ((1 - center_kohonen[j]) ** (b - 1))
            except:
                pass
        aux4 = 1
        for j in range(m, n):
            try:
                aux4 = aux4 * (center_kohonen[j] ** (c - 1)) * ((1 - center_kohonen[j]) ** (d - 1))
            except:
                pass
        try:
            rm = (
                (((math.gamma(a + b) / (math.gamma(a) * math.gamma(b))) ** m1)
                 * ((math.gamma(c + d) / (math.gamma(c) * math.gamma(d))) ** (n - m1))
                 * aux1 * aux2)
                / (((math.gamma(a + b) / (math.gamma(a) * math.gamma(b))) ** m)
                   * ((math.gamma(c + d) / (math.gamma(c) * math.gamma(d))) ** (n - m))
                   * aux3 * aux4)
            )
            if min(1, rm) > np.random.uniform(0, 1):
                m = m1
        except:
            pass

        # Store the current state of the chain
        a_params.append(a)
        b_params.append(b)
        c_params.append(c)
        d_params.append(d)
        m_params.append(m)

    return a_params, b_params, c_params, d_params, m_params

Call the function with the two required arguments: here, I passed center_1, returned earlier by the Fuzzy_Set function, and n_sims=1000:

a_params, b_params, c_params, d_params, m_params = Metropolis_Hastings(center_1, n_sims=1000)
fig_dims = (16, 10)
fig, ax = plt.subplots(figsize=fig_dims)
plt.plot(m_params, 'r')
ax.set(xlabel='# Simulations', ylabel='Change Point Candidates (m)')
The Metropolis-Hastings Algorithm Simulations — Image by author

You have finally found the change point. This chart is interesting because it shows how the sampling process unfolded. The first value drawn from the uniform distribution was m = 55. The algorithm rejected it and kept trying others until it reached a satisfactory, stationary result. After roughly 150 additional runs, the value m = 30 could no longer be rejected by the algorithm.

Since the function returns the sampled values of every parameter, we can plot them as well. Let's start with the parameter m, that is, all the change point candidates drawn. To see the density plot, you may discard the first 200 simulations as "burn-in":

fig_dims = (16, 10)
fig, ax = plt.subplots(figsize=fig_dims)
ax.set(xlabel='Change Point Candidates', ylabel='Density')
sns.kdeplot(m_params[200:])
Density Plot for the Change Point Candidates — Image by author
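If you prefer a single point estimate instead of a density, a minimal sketch (assuming the m_params list returned by the function above) is to take the most frequent candidate after discarding the burn-in:

m_samples = np.array(m_params[200:], dtype=int)  # discard the first 200 draws as burn-in
m_hat = np.bincount(m_samples).argmax()          # most frequent change point candidate
print(f"Estimated change point: t = {m_hat}")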

We can also build Beta distributions using the means of the four other parameters, namely a, b, c, and d. As discussed previously, these parameters are essential to the Metropolis-Hastings algorithm, since the rejection rule must assert that for t ≤ m we have a Beta(a, b) distribution and for t > m a Beta(c, d) distribution.

Let's go ahead and build a representation of that using the variables a_params, b_params, c_params and d_params, which contain the sampled values of a, b, c and d:

fig, ax = plt.subplots(1, 1, figsize=(16, 8))

ax.set_ylabel('Density')

beta_1 = np.random.beta(np.mean(a_params[200:]), np.mean(b_params[200:]), size=1000)
beta_2 = np.random.beta(np.mean(c_params[200:]), np.mean(d_params[200:]), size=1000)

sns.kdeplot(beta_1, color='b', fill=True, common_norm=False, alpha=.5, linewidth=0)
sns.kdeplot(beta_2, color='r', fill=True, common_norm=False, alpha=.5, linewidth=0)

The first Beta distribution, built with the means of parameters a and b, is the blue one, and the second, built with the means of parameters c and d, is the red one. In the middle, where both densities are low, lies the change point. One of the great benefits of this approach is precisely the availability of these distributions, since we can use them to make Bayesian inferences, enrich a predictive model, or even run other kinds of Monte Carlo simulations.

Representations of the Two Beta Distributions built from the a, b, c and d parameters — Image by author
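As a small illustration of how these distributions can feed further inference (this comparison is my own sketch, not part of D'Angelo et al.'s method, and mu_new is a hypothetical membership value), we can check under which of the two fitted Beta densities a new membership value is more plausible:

a_hat, b_hat = np.mean(a_params[200:]), np.mean(b_params[200:])  # posterior means of the "before the change" Beta(a, b)
c_hat, d_hat = np.mean(c_params[200:]), np.mean(d_params[200:])  # posterior means of the "after the change" Beta(c, d)
mu_new = 0.8  # hypothetical membership value, used only for illustration
if stats.beta.pdf(mu_new, a_hat, b_hat) > stats.beta.pdf(mu_new, c_hat, d_hat):
    print("More consistent with the pre-change regime")
else:
    print("More consistent with the post-change regime")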

Finding change points in time series can prevent systems from getting into serious trouble. Just imagine a machine whose temperature needs to be controlled: any abrupt change must be recognized as soon as possible so that an engineer can inspect it. Or think of the energy consumption of various production facilities: any excessive consumption should be analyzed, because it could indicate a deviation in production or even an energy leak, significantly impacting production costs.

That said, the approach developed by D'Angelo et al. (2011) and implemented here in Python proved to be of great value for detecting change points in a given time series. In addition, as pointed out previously, another great aspect of this approach is precisely the fact that we obtain two Beta distributions as outputs, which can be really useful.

AMINIKHANGHAHI, Samaneh; COOK, Diane J. A Survey of Methods for Time Series Change Point Detection. Knowledge and Information Systems, 51, 2017.

D'ANGELO, Marcos Flávio et al. A Fuzzy/Bayesian Approach for the Time Series Change Point Detection Problem. Pesquisa Operacional, 31(2), 2011.

EHLERS, Ricardo S. Inferência Bayesiana. 5th ed. Departamento de Estatística da Universidade Federal do Paraná, 2007.

FAMA, Eugene. Random Walks in Stock Market Prices. Financial Analysts Journal, 21, 1965.

_____________ Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25, 1970.

GAMERMAN, Dani; LOPES, Hedibert Freitas. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. 2nd ed. Florida: Chapman & Hall/CRC, 2006. 315p.

GELMAN, Andrew et al. Bayesian Data Analysis. 2nd ed. Florida: Chapman & Hall/CRC, 2003. 668p.

HASTINGS, W. K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57, 1970.

HAYKIN, Simon. Redes Neurais: Princípios e Práticas. 2nd ed. Porto Alegre: Bookman, 2007. 900p.

IWATA, Takuma et al. Accelerating Online Change-Point Detection Algorithm using 10GbE FPGA NIC. Paper presented at the 24th International Conference on Parallel and Distributed Computing. Turin, 2018.

KOHONEN, Teuvo. Self-Organizing Maps. New York: Springer, 1990. 501p.

METROPOLIS, Nicholas et al. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21, 1953.

OH, K. J. et al. Developing Time-Based Clustering Neural Networks to Use Change-Point Detection: Application to Financial Time Series. Asia-Pacific Journal of Operational Research, 22(1), 2005.

PAULINO, Carlos Daniel et al. Estatística Bayesiana. 2nd ed. Lisbon: Fundação Calouste Gulbenkian, 2018. 601p.

VAN DEN BURG, Gerrit J. J.; WILLIAMS, Christopher K. I. An Evaluation of Change Point Detection Algorithms. arXiv:2003.06222 [stat.ML], 2020.
