Co-authored with Viswanath Gangavaram, Karthik Sundar, Ishita Dutta
Food delivery is a classic hyperlocal business spread over thousands of geographical zones across India, where zones represent smaller geographical areas. The ability to accurately predict the future state of demand and supply at a hyperlocal level is paramount to running our operations optimally at Swiggy. In this blog, we aim to provide an overview of how we approach forecasting and the systems that were built to support forecasting at hyperlocal scale.
This blog is split into the following sections — the challenges of forecasting at the hyperlocal level, and how our platform is designed to tackle these challenges. We then give a detailed account of the forecasting platform, covering the tenets and API design choices of the pipelines we have built. Finally, we conclude with results.
Accurate forecasting of time series at smaller time granularities at the hyperlocal level is a difficult task due to the frequent and often large variations in the underlying time series. Time series models typically capture the base level, trend, and seasonality of the series. From our experience, even after modeling seasonality and long-term trends, real-world time series exhibit significant variance due to conditions changing on the ground, like delivery executive (DE) shortages, discounts on the consumer app, etc., as well as external factors like festivals, sporting events, rains, strikes, etc.
To tackle the aforementioned issues, we devised the following set of techniques: 1) a novel forecasting meta-model (tournament), wherein multiple models compete to produce forecasts for different spatiotemporal units, 2) ensemble techniques like simple and weighted averages of the base models, and stacking of the base models, 3) handling of events in independent pre- and post-processing steps. We also experimentally found the following techniques help further reduce forecasting errors in some cases: 4) time-reduction: forecasting at a higher granularity and then distributing the forecasted value as a function of historical weights, and vice versa, 5) time-shift: forecasting for a given time interval and using it for a different time interval.
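As an illustration of the time-reduction idea, here is a minimal sketch (function name and shapes are ours, not the platform's API) of distributing a daily forecast into hourly values using historical hour-of-day weights:

```python
import numpy as np

def distribute_daily_to_hourly(daily_forecast, hourly_history):
    """Time-reduction: forecast at daily granularity, then split each day's
    value into 24 hourly values using the historical share of each hour."""
    hourly_history = np.asarray(hourly_history, dtype=float).reshape(-1, 24)
    weights = hourly_history.sum(axis=0)
    weights = weights / weights.sum()  # historical hour-of-day shares
    # One row per forecasted day, one column per hour of day.
    return np.outer(np.asarray(daily_forecast, dtype=float), weights)
```

The reverse direction (aggregating hourly forecasts up to daily) is just a row sum of the same layout.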
As suggested in the introduction, forecasting is a basic ingredient in various products at Swiggy. Individual teams developed different methods, used a variety of tools to generate forecasts (Databricks, AWS Forecast, Python scripts, Excel), and operated in isolation. Due to the challenges of the hyperlocal scenario mentioned above, these teams had to solve the same problems and required significant analyst/data science/engineering bandwidth to maintain accuracy and scale their systems to a large number of time series. By identifying the business value and the technical challenges of forecasting, a centralized forecasting platform can enable teams to concentrate on their core business problem statements and rely on the platform to do the heavy lifting of forecasting in a standardized, cost-efficient way.
The Swiggy Forecasting platform is a centralized forecasting service, which enables the end user to generate accurate forecasts in a short span of time and at a lower cost than the alternatives.
Time series forecasting problems are well researched, and various approaches have been proposed and put to use on real-world data. As described in our previous blog, we invested in forecasting early on for our in-house anomaly detection framework, where traditional time series techniques (autoregressive, exponential smoothing), more recent forecasting methods (FB Prophet), neural networks (DeepAR), and gradient-boosted trees (XGBoost) are being used extensively. We also include baseline models (moving averages, seasonal moving averages) to compare against the complex models and to build candidates for composite models like tournaments and ensembles.
As described in sections 3.1.2 (Model Tournament) and 3.1.3 (Model subset selection) of our previous blog, we devised a novel forecasting meta-model, wherein multiple models compete to produce forecasts for different spatiotemporal units. We experimentally found that this technique reduces forecasting errors by 15–30% on various time series metrics compared to the base models. For more details on this technique, please refer to that blog. Our platform has built-in support for various ensemble techniques like averaging and stacking of baseline forecasts; these are proven to reduce forecasting errors by 10–20% compared to their baseline variants.
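A minimal sketch of the tournament idea, with toy base models standing in for the real candidates (all names here are illustrative, not the platform's API):

```python
import numpy as np

def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

def run_tournament(series_by_zone, models, backtest_horizon=7):
    """For each spatiotemporal unit, let the candidate models compete on a
    held-out backtest window and keep the winner per zone."""
    winners = {}
    for zone, series in series_by_zone.items():
        train = series[:-backtest_horizon]
        holdout = series[-backtest_horizon:]
        errors = {name: wmape(holdout, model(train, backtest_horizon))
                  for name, model in models.items()}
        winners[zone] = min(errors, key=errors.get)
    return winners

# Two toy candidates: last-value carry-forward and a 7-point moving average.
models = {
    "naive": lambda s, h: [s[-1]] * h,
    "moving_avg": lambda s, h: [sum(s[-7:]) / 7] * h,
}
```

In the real platform the candidates include ARIMA, Prophet, DeepAR, and XGBoost variants, and winners are chosen on predefined backtest windows rather than a single split.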
As mentioned above, the idea of time-shift has been shown to reduce the forecasting error by a reasonable margin. In Figure 1, we need an hourly forecast for days N+8 to N+14; we found that forecasting for days N+1 to N+7 and using those values as the forecast for days N+8 to N+14 reduces the forecasting error by 5–10% for some forecasting models.
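A sketch of the time-shift trick, assuming daily granularity and a forecaster that returns a date-indexed series (the interface below is hypothetical):

```python
import pandas as pd

def time_shift_forecast(forecaster, history, horizon_days=7, shift_days=7):
    """Time-shift: forecast the near window (days N+1..N+7) and reuse those
    values for the far window (days N+8..N+14) by relabeling the index."""
    near = forecaster(history, horizon_days)                 # days N+1 .. N+7
    far = near.copy()
    far.index = near.index + pd.Timedelta(days=shift_days)   # days N+8 .. N+14
    return far
```

This exploits the weekly seasonality of demand: the same weekday one week out is often a better estimate than a model extrapolating twice as far.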
One of the common issues in time series forecasting is the occurrence of events like festivals, sporting events, etc. While some models like Prophet and DeepAR can handle events, in our use cases we found that these models are not that effective in modeling the event impact, whereas models like ETS and ARIMA are not even devised to handle events. The impact of events is twofold. The first issue is that if we do not handle events in the historical data, ML and forecasting models generally tend to inflate forecasts for future dates, due to the higher values on event days in the history. The second issue is that when we forecast for future periods that contain event dates, models generally tend to produce deflated forecasts. To handle these issues, our forecasting pipeline has built-in pre- and post-processing steps.
In the pre-processing step, for the days on which we had events, we replace the actual values with forecasted values in the historical data. This simple trick has proven to be extremely useful in forecasting the dates that come immediately after an event.
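A minimal sketch of this pre-processing step; here simple time interpolation stands in for the model-generated replacement values the platform actually uses:

```python
import pandas as pd

def mask_event_days(history, event_dates):
    """Pre-processing sketch: on event days, replace actuals with values
    reconstructed from surrounding non-event days, so downstream models do
    not learn the event spike as part of the base pattern."""
    cleaned = history.copy()
    cleaned[cleaned.index.isin(pd.to_datetime(event_dates))] = None
    # Time-weighted interpolation fills the masked event days.
    return cleaned.interpolate(method="time")
```
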
In the post-processing step, we feed the forecasted values from the upstream forecasting models to an event impact model (EIM), which explicitly models the impact of events in a separate follow-up step. As we can see in Figure 3, the EIM can reduce the forecasting error in the range of 5–15% on event dates. The EIM models the historical residuals as a function of events; in post-processing, we take the sum of the forecast value from the previous step and the event impact, which results in event-adjusted forecasts.
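A toy sketch of what an EIM could look like, using the mean historical residual per event type as the impact estimate (the real model and its features are not described in detail in the post):

```python
import numpy as np

class EventImpactModel:
    """Post-processing sketch: learn impact from historical residuals
    (actual - base forecast) per event type, then add it back to future
    forecasts on event days."""

    def fit(self, residuals, event_types):
        grouped = {}
        for r, e in zip(residuals, event_types):
            grouped.setdefault(e, []).append(r)
        self.impact_ = {e: float(np.mean(v)) for e, v in grouped.items()}
        return self

    def adjust(self, forecasts, event_types):
        # Non-event days (unknown types) get zero adjustment.
        return [f + self.impact_.get(e, 0.0)
                for f, e in zip(forecasts, event_types)]
```
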
A typical forecasting task follows the universal workflow of machine learning. A pipeline view of forecasting enables us to identify the different modules (refer to Figure 4) and define contracts and Non-Functional Requirements (scalability, reliability, observability) across all stages of the life cycle.
Broadly, the forecasting pipeline has the following functionality: Dataset definition and preparation, Pre-processing, Training, Post-processing, and Deployment.
Dataset definition and preparation: Defines the target metric and related metrics, and generates the training data in a standardized format.
Pre-processing: Formats the input metrics to be fed into the models by handling outliers, missing values, events, etc.
Training:
- Run AutoML, where for each algorithm the best hyper-parameters are identified by evaluating on predefined backtest windows.
- Train a specific algorithm and build a model by specifying model-specific parameters as well as pre/post-processing steps.
- Build composite models like tournaments and ensembles over base models.
Post-processing: Adjusts forecasts to incorporate external event impact.
Deployment: Once the best model is identified, the forecasting service sets up schedules for batch forecasting use cases or deploys the model for real-time forecasting use cases onto the DSP.
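The stage structure above can be sketched as a simple composition of modules with a standard context-in/context-out contract (a toy illustration, not the platform's actual Pipeline API):

```python
from typing import Callable, Dict, List

class Pipeline:
    """Each stage is an independent module with a dict-in/dict-out contract;
    the pipeline threads a shared context through the stages in order."""

    def __init__(self, stages: List[Callable[[Dict], Dict]]):
        self.stages = stages

    def run(self, context: Dict) -> Dict:
        for stage in self.stages:
            context = stage(context)
        return context

# Stage names mirror the blog's stages; the bodies are placeholders.
def prepare(ctx):    ctx["data"] = [1, 2, 3]; return ctx
def preprocess(ctx): ctx["data"] = [x * 2 for x in ctx["data"]]; return ctx
def train(ctx):      ctx["model"] = sum(ctx["data"]) / len(ctx["data"]); return ctx
```
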
Each pipeline stage is structured as an independent module that operates with predefined standard inputs and outputs and provides unique functionality (API). For instance, the data preparation module can ingest and transform (functionality) data from Snowflake/Hive (inputs) and write into a feature Delta table (output).
Extensibility: Individual modules can be implemented differently and executed in different environments to provide new functionality. For instance, a training module can be implemented to perform vanilla time-series model training as well as tournament model training. Prophet training can happen in Databricks clusters and DeepAR via AWS Forecast.
Scalability: Individual stages can be scaled both vertically and horizontally to enable concurrent executions of the pipeline as well as to operate on large volumes of data. For instance, the forecasting pipelines can concurrently run for k metrics where each metric can contain up to 100,000 time series.
Reusability: Allowing the creation of templated pipelines for specific scenarios ensures reusability and reduces the cost of onboarding and duplication. Templated pipelines allow us to be flexible in handling edge cases. For instance, a pipeline that computes an event handler model together with a time-series forecasting model is structurally different from a regular time-series forecasting pipeline.
Collaboration: The vast majority of the forecasting pipeline functionality is ideated by analysts/data scientists, so the pipeline design should be inclusive of their inputs (code, configs). For instance, enable an analyst to change the configuration of the post-processing step in a single forecasting pipeline, and allow a data scientist to author a new model for training.
Testability: The code executed as part of the pipeline should be unit-testable, and integration tests between modules should be automatable. For instance, unit tests must be written for the preprocessing code that handles missing values.
Based on the functionality to be built and the tenets listed above, there is a need to develop a foundational library (SDK) that encapsulates the complexity of forecasting and provides a generic API for different time series forecasting use cases.
Figure 5 illustrates the components that make up the entire pipeline. Stages (i.e., data prep, preprocessing, training) are a thin layer of micro-orchestration code that essentially executes Core APIs in isolated environments and operates under standard I/O interfaces. The Pipeline API is an abstraction over the stages that propagates context and provides an API to perform E2E forecasting reliably and reproducibly.
Since the DAG-like structure of an E2E forecasting pipeline is similar for the majority of use cases, we kept things simple by authoring the Pipeline API as pure Python classes. To make onboarding easy, we authored Databricks notebook utilities that simply consume specs about the target metric and output a trained model.
The classes in the SDK enable forecasters to import code into their notebooks and start building models. Once the model error is in the acceptable range, we can quickly set up inference jobs for consumption.
The internal class design and spec follow Sklearn Estimator semantics, and the base classes extend Sklearn's API. The SDK uses Facebook's time series package to support a few algorithms, open-source XGBoost, and MLflow for logging parameters and models. At Swiggy, we rely heavily on Databricks and Spark to run data pipelines and ML; keeping this in mind, all models and transformers implement a Spark DataFrame-in, DataFrame-out contract.
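A minimal sketch of what such an estimator contract could look like; pandas stands in for Spark DataFrames here, and all class names are illustrative:

```python
import pandas as pd

class BaseForecaster:
    """Sketch of the contract: Sklearn-style fit/predict semantics with a
    DataFrame-in, DataFrame-out interface."""

    def fit(self, df: pd.DataFrame) -> "BaseForecaster":
        raise NotImplementedError

    def predict(self, horizon: int) -> pd.DataFrame:
        raise NotImplementedError

class NaiveForecaster(BaseForecaster):
    """Carries the last observed value forward; a stand-in base model."""

    def fit(self, df: pd.DataFrame) -> "NaiveForecaster":
        self.last_value_ = df["y"].iloc[-1]
        self.last_ts_ = df["ds"].iloc[-1]
        return self

    def predict(self, horizon: int) -> pd.DataFrame:
        ds = pd.date_range(self.last_ts_ + pd.Timedelta(days=1), periods=horizon)
        return pd.DataFrame({"ds": ds, "yhat": [self.last_value_] * horizon})
```
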
Below is an example of how to use XGBoost for time series forecasting. Here the XGBForecastor class automatically does the feature engineering and builds a set of lag features of the target time series based on certain parameters and the time_frequency.
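The kind of lag-feature engineering described can be sketched as follows (the helper below is illustrative, not the actual XGBForecastor internals; an XGBoost regressor would then be fit on the feature columns against `y`):

```python
import pandas as pd

def make_lag_features(series: pd.Series, lags=(1, 7, 14)) -> pd.DataFrame:
    """Turn a date-indexed target series into supervised-learning rows:
    one column per lag plus simple calendar features, target kept as `y`.
    Rows without a full set of lags are dropped."""
    df = pd.DataFrame({"y": series})
    for lag in lags:
        df[f"lag_{lag}"] = series.shift(lag)
    df["dayofweek"] = series.index.dayofweek
    df["hour"] = series.index.hour
    return df.dropna()
```

A gradient-boosted regressor trained on these rows forecasts one step at a time, feeding its own predictions back in as lags for multi-step horizons.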
The XGBForecastor is saved as a custom MLflow Python model, where along with the native XGBoost model, the config used to train the model (data spec, training params), the signature of the model (input features, output vector), and the Python environment (library versions) are saved. This enables the team to maintain a clear log of the kind of data trained on and the algorithm config, as well as the training environment, which helps us with version control and software upgrades.
Complex models like tournaments also implement the same interface as seen above. Time-reduction, time-shift, and similar models implement a decorator pattern, where the models can be chained in any order to produce better results (e.g., forecaster = time-shift(Reduction(Prophet))).
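A hedged sketch of this decorator pattern (toy classes; the real decorators carry more logic, such as index shifting and learned distribution weights):

```python
class Forecaster:
    """Shared interface: forecast(history, horizon) -> list of values."""
    def forecast(self, history, horizon):
        raise NotImplementedError

class Naive(Forecaster):
    def forecast(self, history, horizon):
        return [history[-1]] * horizon

class TimeShift(Forecaster):
    """Decorator: answer a far-window request with a near-window forecast."""
    def __init__(self, inner, shift=7):
        self.inner, self.shift = inner, shift
    def forecast(self, history, horizon):
        # The near-window values are reused for the shifted window.
        return self.inner.forecast(history, horizon)

class Reduction(Forecaster):
    """Decorator: forecast daily, then split each day into 24 hourly values."""
    def __init__(self, inner, weights=None):
        self.inner = inner
        self.weights = weights or [1 / 24] * 24
    def forecast(self, history, horizon):
        daily = self.inner.forecast(history, horizon)
        return [d * w for d in daily for w in self.weights]

# Chained exactly as in the example above, with Naive standing in for Prophet.
forecaster = TimeShift(Reduction(Naive()))
```

Because every decorator exposes the same `forecast` interface, any ordering of the wrappers is valid.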
Below is an example of how to run hyper-parameter optimization on XGBoost.
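As a hedged illustration of tuning over predefined backtest windows, the sketch below grid-searches a single parameter, with a moving average's span standing in for XGBoost's hyper-parameter grid:

```python
def backtest_error(series, window, horizon, forecast_fn):
    """Average absolute error over rolling backtest windows."""
    errors = []
    for end in range(window, len(series) - horizon + 1):
        pred = forecast_fn(series[:end], horizon)
        actual = series[end:end + horizon]
        errors.append(sum(abs(a - p) for a, p in zip(actual, pred)) / horizon)
    return sum(errors) / len(errors)

def tune_span(series, spans, horizon=3, window=10):
    """Pick the best hyper-parameter value by evaluating each candidate on
    the backtest windows, as the AutoML stage does for each algorithm."""
    best_span, best_err = None, float("inf")
    for span in spans:
        fn = lambda s, h, k=span: [sum(s[-k:]) / k] * h
        err = backtest_error(series, window, horizon, fn)
        if err < best_err:
            best_span, best_err = span, err
    return best_span
```

The platform's real tuner searches XGBoost parameters (depth, learning rate, lag set) the same way: every candidate is scored on the same backtest windows, and the lowest-error configuration wins.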
The Pipeline API leverages MLflow Projects and executes each stage in an isolated Conda environment and an individual Databricks job cluster. For instance, if an auto_ml run is configured to use the Prophet, XGBoost, and ARIMA algorithms, then each algorithm is trained and tuned in an isolated cluster. This enables our earlier-defined tenet of scaling each stage horizontally and independently. Based on the number of time series in the input data, one can configure a cluster with a different number of workers and different worker types (AWS EC2 instances). In practice, configurations with fewer workers loaded with an appropriate number of CPU cores and memory work better for ML workloads due to the reduced shuffle overhead between nodes.
This configuration of hardware is enabled by committing a profile file within the SDK code itself and setting the profile environment variable at run time.
Users new to forecasting can simply run the pipelines by providing a data spec inside parameterized Databricks notebooks. This reduces the time to generate v1 models substantially, since no prior understanding of forecasting or machine learning is required.
Forecasting configurations, as well as the execution metadata of each stage of the pipeline, are logged into MLflow. The metrics generated from training runs (wMAPE) and the logged models are available in the Databricks UI after the successful completion of the pipeline.
In this blog, we deep-dived into the engineering tenets and API design choices for implementing a general-purpose, robust forecasting service. We presented how an Event Impact Model can reduce the forecasting errors on event days compared to not using an EIM. We also got deeper insights into how composite models such as ensembles and tournaments can reduce forecasting errors compared to individual models.
Acknowledgments: This forecasting platform wouldn’t be possible without the persistent support of Nishant Agrawal, Goda Ramkumar, Deepak Jindal, Soumya Simanta and Jairaj Sathyanarayana.