Modeling Variable Seasonal Features with the Fourier Transform

Improve time series forecast performance with a technique from signal processing

Modeling and forecasting time series data are complex topics. Many techniques can be used to enhance model performance for a forecasting job. Here we will discuss one that improves the way forecasting models absorb time features, utilize them, and generalize from them. The main focus is the creation of the seasonal features that feed the time series forecasting model in training: there are easy gains to be made here if you include the Fourier transform in the feature creation process.

This article assumes you are familiar with the basics of time series forecasting; we will not discuss the topic in general, only a refinement of one aspect of it. For an introduction to time series forecasting, see the Kaggle course on the topic (the technique discussed here builds on top of their lesson on seasonality).

Consider the Quarterly Australian Portland Cement production dataset, showing the total quarterly production of Portland cement in Australia (in millions of tonnes) from 1956:Q1 to 2014:Q1.

import pandas as pd

df = pd.read_csv('Quarterly_Australian_Portland_Cement_production_776_10.csv', usecols=['time', 'value'])
# convert the time column from a year-as-float (e.g. 1956.25) to a proper datetime format
df['time'] = df['time'].apply(lambda x: str(int(x)) + '-' + str(int(1 + 12 * (x % 1))).rjust(2, '0'))
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time').to_period()
df.rename(columns={'value': 'production'}, inplace=True)
        production
time
1956Q1 0.465
1956Q2 0.532
1956Q3 0.561
1956Q4 0.570
1957Q1 0.529
... ...
2013Q1 2.049
2013Q2 2.528
2013Q3 2.637
2013Q4 2.565
2014Q1 2.229

[233 rows x 1 columns]

cement production

This is time series data with a trend, seasonal components, and other attributes. The observations are made quarterly, spanning several decades. Let's take a look at the seasonal components first, using the periodogram plot (all code is included in the companion notebook, linked at the end).

periodogram

The periodogram shows the power density of spectral components (seasonal components). The strongest component is the one with a period equal to 4 quarters, or 1 year. This confirms the visual impression that the strongest up-and-down variations in the graph occur about once per year. There is also a component with a period of 2 quarters: it is the same seasonal phenomenon, and it simply means the 4-quarter periodicity is not a simple sine wave, but has a more complex shape. We will ignore components with periods longer than 4 quarters, for simplicity.
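For reference, here is a minimal sketch of how such a periodogram can be computed with scipy.signal.periodogram (the exact plotting code lives in the companion notebook; the parameter choices below are assumptions):

import numpy as np
from scipy.signal import periodogram

# fs=4 puts the frequency axis in cycles per year, since there are 4 samples per year
freqs, power = periodogram(df['production'], fs=4, detrend='linear',
                           window='boxcar', scaling='spectrum')
# skip the zero-frequency bin, then express each component as a period in quarters
periods_in_quarters = 4.0 / freqs[1:]
strongest = periods_in_quarters[np.argmax(power[1:])]
print(f'strongest component period: {strongest:.1f} quarters')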

If you try to fit a model to this data, perhaps in order to forecast cement production beyond the end of the dataset, it would be a good idea to capture this yearly seasonality in a feature or set of features, and provide those to the model in training. Let's see an example.

Seasonal components can be modeled in a number of different ways. Often they are represented as time dummies, or as sine-cosine pairs. These are synthetic features with a period equal to the seasonality you are trying to model, or to a submultiple of it.

Yearly time dummies:

from statsmodels.tsa.deterministic import DeterministicProcess

seasonal_year = DeterministicProcess(index=df.index, constant=False, seasonal=True).in_sample()
print(seasonal_year)
        s(1,4)  s(2,4)  s(3,4)  s(4,4)
time
1956Q1 1.0 0.0 0.0 0.0
1956Q2 0.0 1.0 0.0 0.0
1956Q3 0.0 0.0 1.0 0.0
1956Q4 0.0 0.0 0.0 1.0
1957Q1 1.0 0.0 0.0 0.0
... ... ... ... ...
2013Q1 1.0 0.0 0.0 0.0
2013Q2 0.0 1.0 0.0 0.0
2013Q3 0.0 0.0 1.0 0.0
2013Q4 0.0 0.0 0.0 1.0
2014Q1 1.0 0.0 0.0 0.0

[233 rows x 4 columns]

Yearly sine-cosine pairs:

from statsmodels.tsa.deterministic import CalendarFourier

cfr = CalendarFourier(freq='Y', order=2)
seasonal_year_trig = DeterministicProcess(index=df.index, seasonal=False, additional_terms=[cfr]).in_sample()
with pd.option_context('display.max_columns', None, 'display.expand_frame_repr', False):
    print(seasonal_year_trig)
        sin(1,freq=A-DEC)  cos(1,freq=A-DEC)  sin(2,freq=A-DEC)  cos(2,freq=A-DEC)
time
1956Q1 0.000000 1.000000 0.000000 1.000000
1956Q2 0.999963 0.008583 0.017166 -0.999853
1956Q3 0.017166 -0.999853 -0.034328 0.999411
1956Q4 -0.999963 -0.008583 0.017166 -0.999853
1957Q1 0.000000 1.000000 0.000000 1.000000
... ... ... ... ...
2013Q1 0.000000 1.000000 0.000000 1.000000
2013Q2 0.999769 0.021516 0.043022 -0.999074
2013Q3 0.025818 -0.999667 -0.051620 0.998667
2013Q4 -0.999917 -0.012910 0.025818 -0.999667
2014Q1 0.000000 1.000000 0.000000 1.000000

[233 rows x 4 columns]

Time dummies can capture very complex waveforms of the seasonal phenomenon. On the flip side, if the period is N, then you need N time dummies, so if the period is very long you will need a lot of dummy columns, which may not be desirable. For non-penalized models, often just N-1 dummies are used; one is dropped to avoid collinearity issues (we will ignore that here).
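As an illustration, here is a minimal sketch of that N-1 encoding using plain pandas. This is a hypothetical alternative, not what this article's pipeline does (DeterministicProcess generated all 4 dummies above):

# drop_first removes one dummy (here, the one for Q1) to avoid perfect
# collinearity with a constant column
quarter_dummies = pd.get_dummies(df.index.quarter.astype(str), prefix='q', drop_first=True)
quarter_dummies.index = df.index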

Sine/cosine pairs can model arbitrarily long time periods. Each pair models a pure sine waveform with some initial phase. As the waveform of the seasonal feature becomes more complex, you will need to add more pairs (increase the order of the output from CalendarFourier).

(Side note: we use a sine/cosine pair for each period we want to model. What we really want is just one column of A*sin(2*pi*t/T + phi), where A is the weight assigned by the model to the column, t is the time index of the series, and T is the component period. But the model cannot adjust the initial phase phi while fitting the data, since the sine values are pre-computed. Luckily, that formula is equivalent to A1*sin(2*pi*t/T) + A2*cos(2*pi*t/T), and the model only needs to find the weights A1 and A2.)
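To see why, expand the sine of a sum: sin(x + phi) = sin(x)*cos(phi) + cos(x)*sin(phi), so A1 = A*cos(phi) and A2 = A*sin(phi). A quick numeric check (the values of A, phi, and T below are arbitrary examples):

import numpy as np

A, phi, T = 1.7, 0.6, 4.0
t = np.arange(16)

lhs = A * np.sin(2 * np.pi * t / T + phi)
# the same waveform, as a weighted sum of a pure sine and a pure cosine
A1, A2 = A * np.cos(phi), A * np.sin(phi)
rhs = A1 * np.sin(2 * np.pi * t / T) + A2 * np.cos(2 * np.pi * t / T)

assert np.allclose(lhs, rhs)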

TLDR: What these two techniques have in common is that they represent seasonality via a set of repetitive features, with values that cycle either once per the time period being modeled (the time dummies, and the base sine/cosine pair) or several times per period (higher-order sine/cosine pairs). And each of these features has values varying within a fixed interval: from 0 to 1, or from -1 to 1. These are all the features we have to model seasonality here.

Let's see what happens when we fit a linear model on such seasonal features. But first, we need to add trend features to the feature collection used to train the model.

Let's create trend features and then join them with the seasonal_year time dummies generated above:

trend_order = 2
trend_year = DeterministicProcess(index=df.index, constant=True, order=trend_order).in_sample()
X = trend_year.copy()
X = X.join(seasonal_year)
print(X)
        const  trend  trend_squared  s(1,4)  s(2,4)  s(3,4)  s(4,4)
time
1956Q1 1.0 1.0 1.0 1.0 0.0 0.0 0.0
1956Q2 1.0 2.0 4.0 0.0 1.0 0.0 0.0
1956Q3 1.0 3.0 9.0 0.0 0.0 1.0 0.0
1956Q4 1.0 4.0 16.0 0.0 0.0 0.0 1.0
1957Q1 1.0 5.0 25.0 1.0 0.0 0.0 0.0
... ... ... ... ... ... ... ...
2013Q1 1.0 229.0 52441.0 1.0 0.0 0.0 0.0
2013Q2 1.0 230.0 52900.0 0.0 1.0 0.0 0.0
2013Q3 1.0 231.0 53361.0 0.0 0.0 1.0 0.0
2013Q4 1.0 232.0 53824.0 0.0 0.0 0.0 1.0
2014Q1 1.0 233.0 54289.0 1.0 0.0 0.0 0.0

[233 rows x 7 columns]

This is the X dataframe (features) that will be used to train and validate the model. We are modeling the data with quadratic trend features, plus the 4 time dummies needed for the yearly seasonality. The y dataframe (target) will be just the cement production numbers.

Let's carve a validation set out of the data, containing the observations from the year 2010 onward. We will fit a linear model on the X features shown above and the y target (cement production) over the train portion, and then evaluate model performance in validation. We will also do all of the above with a trend-only model that fits just the trend features and ignores seasonality.
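The split indices are defined in the companion notebook; here is a minimal sketch, assuming the validation window starts at 2010Q1 (the exact split used by the author may differ):

# train on everything before 2010, validate on 2010 and later
index_train = df.index[df.index.year < 2010]
index_test = df.index[df.index.year >= 2010]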

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_log_error

def do_forecast(X, index_train, index_test, trend_order):
    X_train = X.loc[index_train]
    X_test = X.loc[index_test]

    y_train = df['production'].loc[index_train]
    y_test = df['production'].loc[index_test]

    # full model: trend features plus seasonal features
    model = LinearRegression(fit_intercept=False)
    _ = model.fit(X_train, y_train)
    y_fore = pd.Series(model.predict(X_test), index=index_test)
    y_past = pd.Series(model.predict(X_train), index=index_train)

    # trend-only model: const, trend, trend_squared
    trend_columns = X_train.columns.to_list()[0 : trend_order + 1]
    model_trend = LinearRegression(fit_intercept=False)
    _ = model_trend.fit(X_train[trend_columns], y_train)
    y_trend_fore = pd.Series(model_trend.predict(X_test[trend_columns]), index=index_test)
    y_trend_past = pd.Series(model_trend.predict(X_train[trend_columns]), index=index_train)

    RMSLE = mean_squared_log_error(y_test, y_fore, squared=False)
    print(f'RMSLE: {RMSLE}')

    # plot_params is defined in the companion notebook
    ax = df.plot(**plot_params, title='AUS Cement Production - Forecast')
    ax = y_past.plot(color='C0', label='Backcast')
    ax = y_fore.plot(color='C3', label='Forecast')
    ax = y_trend_past.plot(ax=ax, color='C0', linewidth=3, alpha=0.333, label='Trend Past')
    ax = y_trend_fore.plot(ax=ax, color='C3', linewidth=3, alpha=0.333, label='Trend Future')
    _ = ax.legend()

do_forecast(X, index_train, index_test, trend_order)

RMSLE: 0.03846449744356434
model validation

Blue is train, red is validation. The full model is the sharp, thin line. The trend-only model is the wide, fuzzy line.

This is not bad, but there is one glaring issue: the model has learned a constant-amplitude yearly seasonality. Even though the yearly variation actually increases over time, the model can only produce fixed-amplitude variations. Obviously, that is because we gave the model only fixed-amplitude seasonal features, and there is nothing else in the features (target lags, etc.) to help it overcome this issue.

Let's dig deeper into the signal (the data) to see if there is anything there that could help with the fixed-amplitude issue.

The periodogram highlights all spectral components in the signal (all seasonal components in the data) and provides an overview of their overall strength, but it is an aggregate, a sum over the whole time interval. It says nothing about how the amplitude of each seasonal component may vary over time across the dataset.

To capture that variation, you have to use the Fourier spectrogram instead. It is like the periodogram, but computed repeatedly over many time windows across the whole dataset. Both the periodogram and the spectrogram are available in the scipy library.

Let's plot the spectrogram for the seasonal components with periods of 2 and 4 quarters, mentioned above. As always, the full code is in the companion notebook linked at the end.

spectrum = compute_spectrum(df['production'], 4, 0.1)
plot_spectrogram(spectrum, figsize_x=10)
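compute_spectrum and plot_spectrogram are helper functions defined in the companion notebook. As a rough sketch of what such a spectrum helper might look like, built on scipy.signal.spectrogram (the function name, window length, and overlap below are illustrative assumptions, not the author's actual implementation):

import numpy as np
from scipy.signal import spectrogram

def compute_spectrum_sketch(series, max_period=4, window_fraction=0.1):
    # sliding-window width as a fraction of the series length, rounded to a
    # multiple of 4 so the 4-quarter and 2-quarter bins land exactly
    nperseg = max(8, 4 * round(len(series) * window_fraction / 4))
    freqs, times, Sxx = spectrogram(
        series.to_numpy(), fs=1.0, nperseg=nperseg, noverlap=nperseg - 1,
        detrend='linear', scaling='spectrum', mode='magnitude')
    # convert frequency (cycles per quarter) to period in quarters
    periods = np.divide(1.0, freqs, out=np.full_like(freqs, np.inf), where=freqs > 0)
    keep = periods <= max_period
    # label each window by the observation at its center
    index = series.index[np.round(times).astype(int)]
    return pd.DataFrame(Sxx[keep].T, index=index,
                        columns=[str(round(p, 1)) for p in periods[keep]])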
spectrogram

What this diagram shows is the amplitude, the "strength", of the 2-quarter and 4-quarter components over time. They are pretty weak initially, but become very strong around 2010, which matches the variations you can see in the dataset plot at the top of this article.

What if, instead of feeding the model fixed-amplitude seasonal features, we adjust the amplitude of those features over time, matching what the spectrogram tells us? Would the final model perform better in validation?

Let's pick the 4-quarter seasonal component. We could fit a linear model (call it the envelope model) on the order=2 trend of this component's amplitude on the train data (model.fit()), and extend that trend into validation/testing (model.predict()):

envelope_features = DeterministicProcess(index=X.index, constant=True, order=2).in_sample()

spec4_train = compute_spectrum(df['production'].loc[index_train], max_period=4)
spec4_train

spec4_model = LinearRegression()
spec4_model.fit(envelope_features.loc[spec4_train.index], spec4_train['4.0'])
spec4_regress = pd.Series(spec4_model.predict(envelope_features), index=X.index)

ax = spec4_train['4.0'].plot(label='component envelope', color='gray')
spec4_regress.loc[spec4_train.index].plot(ax=ax, color='C0', label='envelope regression: past')
spec4_regress.loc[index_test].plot(ax=ax, color='C3', label='envelope regression: future')
_ = ax.legend()

envelope fit

The blue line is the strength of the variation of the 4-quarter component within the train data, fitted as a quadratic trend (order=2). The red line is the same thing, extended (predicted) over the validation data.

We have modeled the variation in time of the 4-quarter seasonal component. We can take the output from the envelope model and multiply it by the time dummies corresponding to that component:

# normalize the envelope so it averages to 1, preserving the overall scale of the dummies
spec4_regress = spec4_regress / spec4_regress.mean()

season_columns = ['s(1,4)', 's(2,4)', 's(3,4)', 's(4,4)']
for c in season_columns:
    X[c] = X[c] * spec4_regress
print(X)

        const  trend  trend_squared    s(1,4)    s(2,4)    s(3,4)    s(4,4)
time
1956Q1 1.0 1.0 1.0 0.179989 0.000000 0.000000 0.000000
1956Q2 1.0 2.0 4.0 0.000000 0.181109 0.000000 0.000000
1956Q3 1.0 3.0 9.0 0.000000 0.000000 0.182306 0.000000
1956Q4 1.0 4.0 16.0 0.000000 0.000000 0.000000 0.183581
1957Q1 1.0 5.0 25.0 0.184932 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ...
2013Q1 1.0 229.0 52441.0 2.434701 0.000000 0.000000 0.000000
2013Q2 1.0 230.0 52900.0 0.000000 2.453436 0.000000 0.000000
2013Q3 1.0 231.0 53361.0 0.000000 0.000000 2.472249 0.000000
2013Q4 1.0 232.0 53824.0 0.000000 0.000000 0.000000 2.491139
2014Q1 1.0 233.0 54289.0 2.510106 0.000000 0.000000 0.000000

[233 rows x 7 columns]

The 4 time dummies are no longer either 0 or 1. They have been multiplied by the component envelope, and now their amplitude varies over time, just like the envelope.

Let's train the main model again, now using the modified time dummies.

do_forecast(X, index_train, index_test, trend_order)
RMSLE: 0.02546321729737165
model validation, adjusted time dummies

We are not aiming for a perfect fit here, since this is just a toy model, but the model clearly performs better: it follows the 4-quarter variations in the target more closely. The performance metric has improved as well: the RMSLE dropped from 0.0385 to 0.0255 (the fixed-amplitude model's error was 51% larger), which is not bad at all.

Improving model performance is a vast topic. The behavior of any model does not depend on a single feature or a single technique. However, if you are trying to extract everything you can out of a given model, you should probably feed it meaningful features. Time dummies, or sine/cosine Fourier pairs, are more meaningful when they reflect the variation in time of the seasonality they are modeling.

Adjusting the seasonal component's envelope via the Fourier transform is especially effective for linear models. Boosted trees do not benefit as much, but you may still see improvements almost no matter what model you use. If you are using a hybrid model, where a linear model handles deterministic features (calendar, etc.) while a boosted trees model handles more complex features, it is important that the linear model "gets it right", leaving less work for the other model.

It is also important that the envelope models used to adjust seasonal features are trained only on the train data, and never see any testing data during training, just like any other model. Once you adjust the envelope, the time dummies contain information derived from the target; they are no longer purely deterministic features that can be computed ahead of time over any arbitrary forecast horizon. So at this point, information could leak from validation/testing back into the training data if you are not careful.

The dataset used in this article is available here under the Public Domain (CC0) license:

The code used in this article can be found here:

A notebook submitted to the Store Sales — Time Series Forecasting competition on Kaggle, using the ideas described in this article:

A GitHub repo containing the development version of the notebook submitted to Kaggle is here:

All images and code used in this article were created by the author.
