A Practical Toolkit for Time Series Anomaly Detection, Using Python

One of the most fascinating aspects of time series is the intrinsic complexity of such an apparently simple kind of data.

At the end of the day, in a time series you have an x axis that typically represents time (t), and a y axis that represents the quantity of interest (stock price, temperature, traffic, clicks, etc.). That is significantly simpler than a video, for instance, where you may have hundreds of frames, and each frame is a tensor of width, height, and three channels (RGB).

Nevertheless, the evolution of the quantity of interest (y axis) over time (x axis) is where the complexity is hidden. Does this evolution present a trend? Does it have any data points that clearly deviate from the expected signal? Is it stable or unpredictable? Is the average value of the quantity higher than what we'd expect? All of these questions can, in one way or another, be framed as anomalies.

This article is a collection of anomaly detection techniques. The goal is that, given a dataset of multiple time series, we can detect which time series are anomalous and why.

These are the four time series anomalies we're going to detect:

  1. We're going to detect any trend in our time series (trend anomaly).
  2. We're going to evaluate how volatile the time series is (volatility anomaly).
  3. We're going to detect the point anomalies within the time series (single-point anomaly).
  4. We're going to detect the anomalies within our bank of signals, to understand which signal behaves differently from the rest of the set (dataset-level anomaly).
Image by author

We're going to describe the theory behind each anomaly detection method in this collection, and we're going to show the Python implementation. All the code I used for this blog post is included in the PieroPaialungaAI/timeseriesanomaly GitHub repository.

0. The dataset

In order to build the anomaly detector, we need a dataset where we know exactly which anomalies we're looking for, so that we can tell whether our detector is working or not. To do that, I have created a data.py script. The script contains a DataGenerator object that:

  1. Reads the configuration of our dataset from a config.json* file.
  2. Creates a dataset of anomalies.
  3. Gives you the ability to easily store the data and plot it.

This is the code snippet:

Image by author
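A minimal sketch of such a generator could look like the following (the real implementation lives in data.py in the repo; the class internals, the config fields n_points, n_series, and seed, and the injected anomalies are all illustrative):

```python
import json
import numpy as np

class DataGenerator:
    """Hypothetical sketch of the DataGenerator described above."""

    def __init__(self, config_path="config.json"):
        # 1. read the dataset configuration from config.json
        with open(config_path) as f:
            self.config = json.load(f)
        self.t = np.arange(self.config["n_points"])       # shared time axis
        self.rng = np.random.default_rng(self.config.get("seed", 42))

    def generate(self):
        # 2. create a dataset of anomalies on top of a white-noise baseline
        data = self.rng.normal(0, 1, size=(self.config["n_series"], self.t.size))
        data[0] += 0.05 * self.t                          # inject a trend anomaly
        data[1, self.t.size // 2] += 8.0                  # inject a single-point anomaly
        return data
```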

So we can see that we have:

  1. A shared time axis, from 0 to 100.
  2. Multiple time series that form a time series dataset.
  3. Each time series presents one or more anomalies.

The anomalies are, as expected:

  1. The trend behavior, where the time series has a linear or polynomial trend.
  2. The volatility, where the time series is more volatile than normal.
  3. The level shift, where the time series has a higher average than normal.
  4. A point anomaly, where the time series has one anomalous point.

Now our goal will be to have a toolbox that can identify each one of these anomalies across the whole dataset.

*The config.json file allows you to modify all the parameters of our dataset, such as the number of time series, the time axis, and the kind of anomalies. This is what it looks like:
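For illustration, a config of this shape might look like the sketch below (the field names are hypothetical; check the repo for the actual schema):

```json
{
  "n_series": 10,
  "n_points": 101,
  "seed": 42,
  "anomalies": {
    "trend": {"slope": 0.05},
    "volatility": {"scale": 3.0},
    "level_shift": {"offset": 5.0},
    "point": {"magnitude": 8.0}
  }
}
```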

1. Trend Anomaly Identification

1.1 Theory

When we say "a trend anomaly", we're looking for a structural behavior: the series moves upward or downward over time, or it bends in a consistent way. This matters in real data because drift often means sensor degradation, changing user behavior, model/data pipeline issues, or another underlying phenomenon to be investigated in your dataset.

We consider two sorts of trends:

  • Linear regression: we fit the time series with a linear trend
  • Polynomial regression: we fit the time series with a low-degree polynomial.

In practice, we measure the error of the Linear Regression model. If it is too large, we fit the Polynomial Regression one. We consider a trend to be "significant" when the p-value is lower than a set threshold (commonly p < 0.05).
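The two-step test described above can be sketched like this (detect_trend, the relative-error threshold, and the degree-3 polynomial are illustrative choices, not the repo's exact implementation):

```python
import numpy as np
from scipy.stats import linregress

def detect_trend(t, y, p_threshold=0.05, rel_err_threshold=0.5):
    """Illustrative two-step trend test: linear first, polynomial as fallback."""
    # Step 1: fit a linear trend and test its significance (p-value)
    lin = linregress(t, y)
    residual = y - (lin.intercept + lin.slope * t)
    rel_err = np.std(residual) / (np.std(y) + 1e-12)
    if lin.pvalue < p_threshold and rel_err <= rel_err_threshold:
        return {"trend": "linear", "slope": lin.slope, "p_value": lin.pvalue}
    # Step 2: linear error too large -> try a low-degree polynomial
    coeffs = np.polyfit(t, y, deg=3)
    r2 = 1 - np.var(y - np.polyval(coeffs, t)) / (np.var(y) + 1e-12)
    if r2 > 0.5:
        return {"trend": "polynomial", "coeffs": coeffs}
    return {"trend": None}
```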

1.2 Code

The AnomalyDetector object in anomaly_detector.py will run the code described above using the following functions:

  • The detector, which loads the data we generated with DataGenerator.
  • detect_trend_anomaly and detect_all_trends, which detect the trend (if any) for a single time series and for the whole dataset, respectively.
  • get_series_with_trend, which returns the indices that have a significant trend.

We can use plot_trend_anomalies to display the time series and see how we're doing:

Image by author

Good! So we are able to retrieve the "trendy" time series in our dataset without any bugs. Let's move on!

2. Volatility Anomaly Identification

2.1 Theory

Now that we have dealt with the global trend, we can focus on volatility. What I mean by volatility is, in plain English, how much the signal fluctuates around its trend. In more precise terms, it is the variance of the detrended time series.

This is how we're going to test this anomaly:

  1. We're going to remove the trend from the time series dataset.
  2. We're going to compute the variance statistics.
  3. We're going to find the outliers of those statistics.
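The three steps above can be sketched as follows (volatility_outliers and the robust MAD-based Z-score are illustrative choices, not necessarily the repo's exact statistics):

```python
import numpy as np

def volatility_outliers(dataset, z_threshold=3.0):
    """Illustrative sketch: flag series whose detrended variance is an outlier."""
    t = np.arange(dataset.shape[1])
    variances = []
    for y in dataset:
        coeffs = np.polyfit(t, y, deg=1)           # 1. remove the linear trend
        residual = y - np.polyval(coeffs, t)
        variances.append(np.var(residual))         # 2. variance of the detrended series
    variances = np.array(variances)
    med = np.median(variances)                     # 3. robust Z-score across the bank
    mad = np.median(np.abs(variances - med)) + 1e-12
    robust_z = 0.6745 * (variances - med) / mad
    return np.where(np.abs(robust_z) > z_threshold)[0]
```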

Pretty easy, right? Let’s dive in with the code!

2.2 Code

Similarly to what we did for the trends, we have:

  • detect_volatility_anomaly, which checks whether a given time series has a volatility anomaly.
  • detect_all_volatilities and get_series_with_high_volatility, which check the whole time series dataset for volatility anomalies and return the anomalous indices, respectively.

This is how we display the results:

Image by author

3. Single-point Anomaly

3.1 Theory

Okay, now let's ignore all the other time series of the dataset and focus on one time series at a time. For our time series of interest, we want to see if it has one point that is clearly anomalous. There are many ways to do this; we could leverage Transformers, 1D CNNs, LSTMs, encoder-decoders, etc. For the sake of simplicity, let's use a very simple algorithm:

  1. We're going to adopt a rolling window approach, where a fixed-size window moves from left to right.
  2. For each point, we compute the mean and standard deviation of its surrounding window (excluding the point itself).
  3. We calculate how many standard deviations the point is away from its local neighborhood using the Z-score.

We define a point as anomalous when it exceeds a set Z-score threshold. We're going to use Z-score = 3, which means three standard deviations away from the local mean.
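A minimal sketch of this rolling-window test (the function name, window size, and handling of the series edges are illustrative):

```python
import numpy as np

def point_anomalies(y, window=10, z_threshold=3.0):
    """Illustrative rolling-window Z-score test for single-point anomalies."""
    anomalies = []
    n = len(y)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # local neighborhood, excluding the point itself
        neighborhood = np.concatenate([y[lo:i], y[i + 1:hi]])
        mu, sigma = neighborhood.mean(), neighborhood.std()
        # flag the point if it sits more than z_threshold local sigmas away
        if sigma > 0 and abs(y[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies
```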

3.2 Code

Similarly to what we did for the trends and volatility, we have:

  • detect_point_anomaly, which checks if a given time series has any single-point anomalies using the rolling window Z-score method.
  • detect_all_point_anomalies and get_series_with_point_anomalies, which check the whole time series dataset for point anomalies and return the indices of series that contain at least one anomalous point, respectively.

And this is how it performs:

Image by author

4. Dataset-level Anomaly

4.1 Theory

This part is intentionally simple. Here we're not looking for weird points in time; we're looking for weird signals within the bank. What we want to answer is: does any signal live on a completely different scale than the rest of the bank?

To do this, we compress every time series into a single "baseline" number (a typical level), and then we compare those baselines across the whole bank. The comparison can be done in terms of the median and Z-score.
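A minimal sketch of this baseline comparison, using the median as the baseline and a robust Z-score (names and the MAD scaling are illustrative choices):

```python
import numpy as np

def dataset_level_anomalies(dataset, z_threshold=3.0):
    """Illustrative sketch: compare each series' baseline across the whole bank."""
    baselines = np.median(dataset, axis=1)         # one "typical level" per series
    med = np.median(baselines)
    mad = np.median(np.abs(baselines - med)) + 1e-12
    robust_z = 0.6745 * (baselines - med) / mad    # robust Z-score of each baseline
    return np.where(np.abs(robust_z) > z_threshold)[0]
```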

4.2 Code

This is how we do the dataset-level anomaly detection:

  1. detect_dataset_level_anomalies() finds the dataset-level anomalies across the whole dataset.
  2. get_dataset_level_anomalies() finds the indices that present a dataset-level anomaly.
  3. plot_dataset_level_anomalies() displays a sample of time series that present anomalies.

This is the code to do that:

5. All together!

Okay, it's time to put it all together. We will use detector.detect_all_anomalies(), and we'll evaluate anomalies for the whole dataset based on trend, volatility, single-point, and dataset-level anomalies. The script to do that is very simple:

The df will give you the anomalies for each time series. This is what it looks like:
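A sketch of how such a summary table could be assembled (the real detect_all_anomalies lives in anomaly_detector.py; the column names and the robust Z-score statistics here are illustrative):

```python
import numpy as np
import pandas as pd

def robust_z(values):
    """Robust Z-score: deviations from the median, scaled by the MAD."""
    med = np.median(values)
    mad = np.median(np.abs(values - med)) + 1e-12
    return 0.6745 * (values - med) / mad

def detect_all_anomalies(dataset, z_threshold=3.0):
    """Illustrative summary table with one boolean flag per anomaly type."""
    t = np.arange(dataset.shape[1])
    slopes, variances, baselines, point_flags = [], [], [], []
    for y in dataset:
        slope, intercept = np.polyfit(t, y, deg=1)      # trend statistic
        residual = y - (slope * t + intercept)
        slopes.append(slope)
        variances.append(np.var(residual))              # volatility statistic
        baselines.append(np.median(y))                  # dataset-level statistic
        z = np.abs(residual - residual.mean()) / (residual.std() + 1e-12)
        point_flags.append(bool((z > z_threshold).any()))
    return pd.DataFrame({
        "trend_anomaly": np.abs(robust_z(np.array(slopes))) > z_threshold,
        "volatility_anomaly": np.abs(robust_z(np.array(variances))) > z_threshold,
        "dataset_level_anomaly": np.abs(robust_z(np.array(baselines))) > z_threshold,
        "point_anomaly": point_flags,
    })
```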

If we use the following function, we can see it in action:

Image by author

Pretty impressive, right? We did it. 🙂

6. Conclusions

Thanks for spending time with us, it means a lot. ❤️ Here's what we've done together:

  • Built a small anomaly detection toolkit for a bank of time series.
  • Detected trend anomalies using linear regression, and polynomial regression when the linear fit is not enough.
  • Detected volatility anomalies by detrending first and then comparing variance across the dataset.
  • Detected single-point anomalies with a rolling window Z-score (simple, fast, and surprisingly effective).
  • Detected dataset-level anomalies by compressing each series into a baseline (median) and flagging signals that live on a different magnitude scale.
  • Put everything together in a single pipeline that returns a clean summary table we can inspect or plot.

In many real projects, a toolbox like the one we built here gets you very far, because:

  • It gives you explainable signals (trend, volatility, baseline shift, local outliers).
  • It gives you a strong baseline before you move on to heavier models.
  • It scales well when you have many signals, which is where anomaly detection often becomes painful.

Remember that this baseline is simple on purpose, and it uses very simple statistics. Nevertheless, the modularity of the code allows you to easily add complexity by adding the functionality to anomaly_detector_utils.py and anomaly_detector.py.

7. Before you head out!

Thanks again for your time. It means a lot ❤️

My name is Piero Paialunga, and I’m this guy here:

Image by author

I'm originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to know more about machine learning and follow my studies, you can:

A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at 
