Earth Mover distance for time series


Whilst many metrics such as MAPE, MAE and RMSE exist for evaluating forecasting performance, they have significant limitations: they only compare forecast values with actual values at the same dates.
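To make this limitation concrete, here is a small sketch. It reuses the actual-demand values from the example later in this article and compares two hypothetical forecasts: one with the right quantities delivered one step early, and one that misses the 8-unit order altogether. MAPE is left out because it is undefined whenever actual demand is zero, as it often is in intermittent series.

```python
import numpy as np

# Actual demand from t = 8 onwards (the same values as ts1 in the example below).
actual = np.array([0, 8, 0, 0, 6, 0])
# Hypothetical forecast A: correct quantities, but the 8 units arrive one step early.
early = np.array([8, 0, 0, 0, 6, 0])
# Hypothetical forecast B: misses the 8-unit order entirely.
missed = np.array([0, 0, 0, 0, 6, 0])


def mae(y, y_hat):
    # Mean absolute error: compares values at the same time index only.
    return np.mean(np.abs(y - y_hat))


def rmse(y, y_hat):
    # Root mean squared error: also compares values at the same time index only.
    return np.sqrt(np.mean((y - y_hat) ** 2))


for name, forecast in [("one step early", early), ("missed order", missed)]:
    print(f"{name}: MAE = {mae(actual, forecast):.3f}, RMSE = {rmse(actual, forecast):.3f}")
```

Both metrics rate the forecast that misses the order entirely (MAE of about 1.33) as better than the one that gets the quantity right but is one step early (MAE of about 2.67), because neither looks beyond the same time point.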

For many applications, such as demand planning, financial and energy forecasting and trading, this is insufficient, because an accurate evaluation of forecasting performance also requires evaluating how accurate the forecast is in terms of timing.

Evaluating temporal mismatch is crucial, especially in applications such as energy price prediction, where energy providers must submit bids with energy production forecasts for specific dates. Predicting the correct values of future energy prices is only useful if those prices are predicted for the correct dates.

In demand forecasting, even when the magnitudes of demand are predicted correctly, incorrect allocation across time results in lost revenue and high costs: customer orders are not served on time, and excessive inventory storage and financing costs are incurred unnecessarily.

In the example below, we have two time series, and we are interested in understanding how close they are to one another. We use intermittent demand time series only to simplify the explanation; the time series could be of another type, for instance energy consumption, energy prices or the prices of financial assets.

In this example, the two time series match over most of the timeline, but there is a mismatch at two points in time, t=8 and t=9. The first time series could be actual demand, and the second a forecast of that demand.

How can we measure the performance of such a forecast so that it reflects how well it predicts not only the values of demand but also their correct timing?

As we can see from the image below, the mismatch is not only that the forecast (time series 2) under-forecasts demand at time 9 (4 units vs the 8 units required to satisfy demand), but also that it forecasts those 4 units one time step too early.

How can we measure this mismatch more intelligently, so that it reflects both types of mismatch?

Imagine each time series represents a pile of soil, and that the Earth Mover's Distance (EMD) represents the minimum amount of work required to move the soil from one pile to the other. The “work” is defined as the amount of soil moved multiplied by the distance it is moved. The EMD is calculated by finding the optimal alignment (flow) between the two distributions, i.e. the one that minimises the total work. This alignment is found by solving a linear programming problem. Once the optimal alignment is found, the total work is the sum, over all pairs of points on the timeline, of the amount of soil moved multiplied by the distance it is moved; dividing it by the total amount of soil moved gives the normalised distance used in the calculation below.
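The linear programming problem can be sketched with SciPy's `scipy.optimize.linprog`. This is a minimal illustration under stated assumptions, not the article's own code: `emd_lp` is a hypothetical helper, the ground distance is the gap |i - j| between time indices, and because the two series can hold different total amounts, we assume a partial-matching formulation in which the total flow equals the smaller of the two totals and the result is the total work divided by that flow.

```python
import numpy as np
from scipy.optimize import linprog


def emd_lp(p, q):
    """Hypothetical helper: EMD between two non-negative series on the same
    integer time grid, solved as a transportation linear program.

    Assumptions: ground distance |i - j| between time indices; total flow
    equals min(sum(p), sum(q)); result is total work / total flow.
    Returns (emd, flow) where flow[i, j] is the mass moved from p[i] to q[j].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n, m = len(p), len(q)

    # Cost of moving one unit from position i (in p) to position j (in q).
    D = np.abs(np.subtract.outer(np.arange(n), np.arange(m))).astype(float)
    c = D.ravel()  # flow variables f[i, j], flattened row by row

    # Position i of p can send at most p[i]: sum_j f[i, j] <= p[i].
    A_supply = np.kron(np.eye(n), np.ones((1, m)))
    # Position j of q can receive at most q[j]: sum_i f[i, j] <= q[j].
    A_demand = np.kron(np.ones((1, n)), np.eye(m))
    A_ub = np.vstack([A_supply, A_demand])
    b_ub = np.concatenate([p, q])

    # Move as much mass as the smaller pile holds.
    total_flow = min(p.sum(), q.sum())
    if total_flow == 0:
        return 0.0, np.zeros((n, m))
    A_eq = np.ones((1, n * m))
    b_eq = np.array([total_flow])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun / total_flow, res.x.reshape(n, m)


ts1 = np.array([0, 8, 0, 0, 6, 0])  # actual demand from t = 8
ts2 = np.array([4, 0, 0, 0, 6, 0])  # forecast from t = 8
emd, flow = emd_lp(ts1, ts2)
print(f"EMD = {emd:.2f}")  # EMD = 0.40
```

On the ts1/ts2 pair used later in this article, this sketch returns 0.4: the cheapest plan moves the 4 forecast units by one step and matches the 6 units at position 4 in place, for a total work of 4 over 10 units of flow. That is lower than the 1.6 of the step-by-step walk-through, because the walk-through pairs values from left to right rather than searching for the cheapest plan.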

In our example, the mismatch starts at t = 8. As a first step, we calculate the distance matrix D, which represents the “cost” of moving a unit of mass from one point on the timeline to another. In our example, we can use the absolute difference between the time indices as the measure of cost. We consider the two time series from the moment they begin to mismatch:

import numpy as np

ts1 = np.array([0, 8, 0, 0, 6, 0])  # actual demand; index 0 corresponds to t = 8
ts2 = np.array([4, 0, 0, 0, 6, 0])  # forecast
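The distance matrix D for these six time steps can be sketched as follows, assuming (as in the table below) that moving one unit costs the number of time steps it is shifted:

```python
import numpy as np

# Positions 0..5 correspond to t = 8..13, the window where the series diverge.
positions = np.arange(6)

# D[i, j] = |i - j|: cost of moving one unit of mass from position i to position j.
D = np.abs(np.subtract.outer(positions, positions))
print(D)
```

The matrix is symmetric with zeros on the diagonal; for example, D[1, 4] = 3 is the distance used in the second step of the walk-through.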

We first look for positive values and move values from one series to the other so as to match the two series as closely as possible.

In step one, the first positive value of ts1 is 8 (at position 1), whilst the first positive value of ts2 is 4 (at position 0). We move 4 units between position 0 and position 1: the flow is the minimum of the two values, and the distance is 1. The 4 matched units are removed from each time series, so after step one the two series become:

ts1: [0 4 0 0 6 0], ts2: [0 0 0 0 6 0]

In the second step, the first positive values are 4 (at position 1 in ts1) and 6 (at position 4 in ts2), so we move 4 units over 3 positions. In the third step, the remaining 2 units of ts2 at position 4 are matched with ts1 at position 4, at a distance of 0. At that point ts2 has no mass left, so the remaining 4 units of ts1 at position 4 stay unmatched.

step   from   to    flow   dist   work
1      [0]    [1]   4      1      4
2      [1]    [4]   4      3      12
3      [4]    [4]   2      0      0
total               10            16

Wasserstein distance = total work / total flow = 16 / 10 = 1.6
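The table can be reproduced with the short sketch below, which follows the same left-to-right procedure: pair the first remaining positive value in each series, move the smaller amount between their positions, and repeat until one series is used up. `greedy_emd_steps` is a hypothetical helper written for this illustration; positions are reported as (ts1 index, ts2 index), so the direction of each move may read differently from the table. This greedy pairing is simple but is not guaranteed to minimise the total work.

```python
import numpy as np


def greedy_emd_steps(a, b):
    """Hypothetical helper reproducing the step-by-step table: repeatedly pair
    the first remaining positive value of each series, move the smaller of the
    two amounts between their positions, and record the step.
    Assumes non-negative inputs."""
    a = np.array(a, dtype=float)  # copies, so the inputs are not modified
    b = np.array(b, dtype=float)
    steps = []
    while a.any() and b.any():
        i = int(np.flatnonzero(a)[0])  # first positive position in a
        j = int(np.flatnonzero(b)[0])  # first positive position in b
        flow = min(a[i], b[j])
        dist = abs(i - j)
        steps.append((i, j, flow, dist, flow * dist))
        a[i] -= flow  # remove the matched units from both series
        b[j] -= flow
    return steps


ts1 = np.array([0, 8, 0, 0, 6, 0])
ts2 = np.array([4, 0, 0, 0, 6, 0])

steps = greedy_emd_steps(ts1, ts2)
total_flow = sum(s[2] for s in steps)
total_work = sum(s[4] for s in steps)
for k, (i, j, flow, dist, work) in enumerate(steps, start=1):
    print(f"{k}. ts1[{i}] <-> ts2[{j}]  flow={flow:g}  dist={dist}  work={work:g}")
print(f"total work / total flow = {total_work:g} / {total_flow:g} = {total_work / total_flow:g}")
```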

If you are interested in the details, you can use the output below and work through it independently to check your results against the table above.

EMD has applications in various fields, including computer vision, image and signal processing, machine learning and bioinformatics. It is useful when comparing distributions that are not easily characterised by summary statistics such as means and variances, but instead have complex structures that may be difficult to compare using other methods.
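As a small illustration of that point, the sketch below builds two hypothetical discrete distributions with the same mean (0) and variance (1) and compares them with `scipy.stats.wasserstein_distance`, SciPy's one-dimensional Wasserstein (earth mover's) distance. Note that this function normalises each set of weights to sum to 1, so for series with different total amounts it compares their relative shapes rather than reproducing the unnormalised calculation above.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two hypothetical distributions with identical mean (0) and variance (1).
u_values, u_weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
v_values = np.array([-np.sqrt(2), 0.0, np.sqrt(2)])
v_weights = np.array([0.25, 0.5, 0.25])

for vals, w in [(u_values, u_weights), (v_values, v_weights)]:
    mean = np.average(vals, weights=w)
    var = np.average((vals - mean) ** 2, weights=w)
    print(f"mean = {mean:.3f}, variance = {var:.3f}")

# The 1-D Wasserstein (earth mover's) distance still tells them apart.
d = wasserstein_distance(u_values, v_values, u_weights, v_weights)
print(f"EMD = {d:.4f}")  # EMD = 0.7071, i.e. sqrt(2) / 2
```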

Overall, EMD is a robust tool for analysing time series data. It can be used in various applications, such as signal processing, financial analysis and environmental monitoring.
