Case Study: Applying a Data Science Process Model to a Real-World Scenario
1. Domain and project description
2. Data provision
3. Analysis
4. Deployment
5. (Application) Use and summary
References

In today’s rapidly changing environment, one of the most critical challenges facing corporations is the ability to predict future demand accurately. This is particularly true for supply chain teams, where accurate demand planning is essential for maintaining customer satisfaction and keeping costs under control.

In this article, we’ll explore how a data science process model can help corporations tackle this challenge hands-on by leveraging statistical forecasting methods. The goal of the fictional company was to develop a more accurate demand planning process that reduces stock-outs, increases inventory turnover, and improves overall supply chain performance.

Image by Unsplash

This project is a strong example of how data science can transform a business by unlocking new insights, increasing efficiency, and improving decision-making. I hope that this case study will encourage you to think about the potential applications in your organization and showcase how you can apply the DASC-PM process model successfully.

Please note that the whole article has also been published in the publication below and was written by and :

“Development of a Machine Learning Model for Materials Planning in the Supply Chain” in: DASC-PM v1.1 Case Studies. Available from: https://www.researchgate.net/publication/368661660_DASC-PM_v11_Case_Studies

SCHRAMME AG is a leading provider of dressings, band-aids, and bandages. The management believes that there is qualitative optimization potential and that there are savings opportunities in materials planning and the resulting production processes. Management assigns an internal project manager the task of developing a machine-learning-based model to plan the materials and requirements in the supply chain. Due to negative experiences in previous data science projects, it is proposed that this project should initially be developed using a process model.

The DASC-PM is chosen to ensure a structured and scientific approach to project management. To gain an overview of the project task, the project manager initially works out various use cases, which are then checked for suitability and feasibility. The appropriate use cases then serve as the basis for determining the specific problems and the design of the project. This design is then checked again for suitability and feasibility.

Image by Unsplash

Starting point and use case development

The company currently plans manually and then produces over 2,500 different products. In the last few quarters, it increasingly had inventory shortages for some product series, while for individual products inventories exceeded storage capacities. While the controlling department complains about rising storage costs due to imprecise planning, the demand planners lament the insufficient amount of time available for planning. For some time, the head of the supply chain has criticized the fact that planning is done entirely manually and that the opportunities of digitalization do not appear to be taken advantage of.

One goal of the project is the development of a machine learning model with which a large part of the product requirements should be planned automatically in the future, based on various influencing factors. The demand planners should increasingly focus on the planning of essential product groups and promotions. The system should take account of seasonality, trends, and market developments, and achieve a planning accuracy of 75%. This means that the forecast quantities for each product should deviate from actual requirements by no more than 25%. Order histories, customer inventory and sales figures, and internal promotion plans should be used as potential data sources.
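To make the 75% target concrete, here is a minimal sketch of how such a planning accuracy could be computed as one minus the mean absolute percentage deviation between forecast and actual demand (the numbers are purely illustrative):

import numpy as np

def planning_accuracy(actual, forecast):
    """Planning accuracy as 1 - mean absolute percentage error, floored at 0."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    mask = actual != 0                                   # ignore periods without actual demand
    ape = np.abs(actual[mask] - forecast[mask]) / actual[mask]
    return max(0.0, 1.0 - ape.mean())

# Hypothetical monthly demand vs. forecast for one product
print(f"{planning_accuracy([120, 95, 130, 110], [110, 100, 150, 105]):.1%}")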

Phase 1: Project Order (Schulz et al. 2022)

In addition to the Supply Chain department, close collaboration with Sales and IT is also expected. The planning team within the Supply Chain department now consists of a global market demand planning team that deals with long-term planning (6–18 months) based on market developments, product life cycles, and strategic focus. In individual markets, there are local customer demand planning teams that implement short-term materials and promotion planning (0–6 months) for retail through the corresponding sales channels.

The data science model to be developed should support the monthly planning cycles and quantify the need for short-term and long-term materials. The projection is then loaded into the internal planning software, where it should be analyzed and, if need be, supplemented or corrected. The final planning quantity will ultimately be used by the factories for production planning. To take account of customer- and product-specific expertise, seasonality, and past experience, individual members of the planning team should be included in the project, allocating up to 20% of their working hours to it.

An important aspect of use case selection is the suitability test. The project manager examines whether the project can fundamentally be classified as feasible and whether the requirements can be met with the available resources. Expert interviews have shown that the problem in general is very well suited to the use of data science and that corresponding projects have already been undertaken externally and published. The data science team confirmed that there is a sufficient number of potentially suitable methods for this project and that the required data sources are available.

Finally, the project manager analyzes feasibility. It is necessary to coordinate with the IT department to determine the available infrastructure and the expertise of the involved employees. The available Microsoft cloud infrastructure and the data science team’s experience with Databricks make the project appear fundamentally achievable. The project risk is generally assessed as moderate, since the planners assume a major role as controllers in the implementation phase and the results are checked.

Data Science Process Model DASC-PM (Schulz et al. 2022)

Project design

Based on the problem and the specific facets of the domain, the project manager, the head of the supply chain, and a data scientist are now responsible for formally designing the project.

The project objective is assumed to be an improvement in planning accuracy and a reduction in manual processes, and is tied to the aim of developing a suitable model for the project. According to an initial estimate, the cost framework totals EUR 650,000. A period of six months is proposed as the timeframe for development, with an additional six months planned for process integration.

Since, in contrast to many other projects, full planning and a complete description of the course of a project are often impossible in the data science context, the project manager only prepares a project outline for this process with the basic cornerstones that were already indicated in the previous sections. The budget includes financial resources for 1 full-time project manager, 2 full-time data scientists, and 0.5 full-time data engineers. As already mentioned, the demand planners should allocate roughly 20% of their working hours to share their expertise and experience.

The project as a whole should be handled with an agile working method, based on the DASC-PM phases, according to the Scrum methodology. The work is done iteratively in the areas of data provision, analysis, deployment, and use, with the preceding and following phases moving into focus in each phase. The back-steps are especially important if gaps or problems are present in key areas and can only be solved by returning to the previous phase. The project outline is prepared visually and placed in a highly visible area of the SCHRAMME AG office for all participants. Then the whole project description is checked for suitability and feasibility once more before the process moves on to the next phase.

Data preparation

SCHRAMME AG has several data sources that can be included in automatic planning. Besides the historical sales data from the ERP system, order histories and customer data from the CRM system are options, together with inventories and marketing measures. Azure Data Factory is used to set up a cloud-based pipeline that loads, transforms, and integrates the data from the various source systems. The primary basis for the automated forecasts should be the order histories; the remaining data is used either as background information for the planning teams or, if need be, to perform cluster analyses upfront. In the initial phase of the project, the individual data sources still exhibit big differences in quality and structure. That is why adjustments are made together with the IT and technical departments so that the forecasts can later be prepared on a solid basis.

ELT data preparation process for analysis. Image by author
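To make the integration step more tangible, the following sketch combines order histories, customer data, and promotion plans into a monthly demand table; the file names and column names are hypothetical, and in the case study this logic runs in an Azure Data Factory pipeline rather than in local pandas:

import pandas as pd

orders = pd.read_csv("erp_order_history.csv", parse_dates=["order_date"])    # ERP order histories
customers = pd.read_csv("crm_customers.csv")                                 # CRM master data
promotions = pd.read_csv("promotion_plan.csv", parse_dates=["month"])        # internal promotion plans

# Harmonize and aggregate order quantities to a monthly demand series per product
orders["month"] = orders["order_date"].dt.to_period("M").dt.to_timestamp()
demand = (orders.merge(customers, on="customer_id", how="left")
                .groupby(["product_id", "month"], as_index=False)["quantity"].sum())

# Enrich with promotion information, used later as background for the planning teams
demand = demand.merge(promotions, on=["product_id", "month"], how="left")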

Data management

The data management process is automated by the data engineers and runs on a daily schedule so that the data always remains up to date. To keep the complexity reasonable, the most promising data sources are processed first and the pipeline is then incrementally expanded with Continuous Integration / Continuous Deployment (CI/CD). After deployment, the processed data is stored in Azure Data Lake Storage, where it can be used for future analysis with Azure Databricks. The Data Lake also stores backups of the prepared data and analysis results, as well as other data such as logs, quality metrics, and credential structures. Write and read authorizations, as well as plan versions, also ensure that only the latest planning period can be processed so that values from the past no longer change.

Phase 2: Data Provision (Schulz et al. 2022)

Exploratory data analysis

An important step in data preparation is the exploratory data analysis (EDA), where various statistics and visualizations are produced first. This provides an overview of the distributions, outliers, and correlations in the data. The results of the EDA give insights into characteristics to be considered for the next phase of the analysis. In the second step, feature selection and feature engineering are used to select the relevant characteristics or produce new features. A dimension reduction method such as principal component analysis is applied for data with high dimensionality. The EDA provides information about the existing demand histories of SCHRAMME AG.

Example of results from exploratory data analysis. Image by author
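A minimal EDA sketch along these lines, reusing the hypothetical demand table from the integration sketch and assuming a gapless monthly history of at least two years for the illustrative product "P-001":

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

series = (demand[demand["product_id"] == "P-001"]
          .set_index("month")["quantity"].asfreq("MS"))

print(series.describe())                      # distribution overview, helps spot outliers
decomposition = seasonal_decompose(series, model="additive", period=12)
decomposition.plot()                          # trend, seasonality, and residual components
plt.show()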

Identification of suitable analysis methods

The feasibility test at the start of the project made it clear that this project can and should be solved with data science methods. The two data science employees involved first provide an overview of the existing methods that are well suited to the problem at hand. This problem belongs to the regression problem class within supervised learning. Fundamentally, it is a type of time series analysis that can be expanded by additional factors or multiple regression.

In connection with the key area of scientificity, the latest developments in research on comparable problems were examined. This showed that XGBoost, ARIMA, Facebook Prophet, and LightGBM are frequently named methods for this problem class. A data scientist documents the corresponding advantages and disadvantages of each method and sorts them according to complexity and computational intensity. To obtain first indications of the model suitability for SCHRAMME AG’s products, simpler models are initially chosen by the project team, which then adopts the classical exponential smoothing and ARIMA model families.
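As a rough illustration of such a baseline, the following sketch fits classical exponential smoothing and a seasonal ARIMA model with statsmodels; the model orders are illustrative rather than the tuned values from the project, and it reuses the hypothetical series and planning_accuracy helper from the earlier sketches:

from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

train, valid = series[:-6], series[-6:]       # hold out the last 6 months for validation

ets = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=12).fit()
ets_forecast = ets.forecast(len(valid))

arima = ARIMA(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
arima_forecast = arima.forecast(len(valid))

print(f"Exponential smoothing accuracy: {planning_accuracy(valid, ets_forecast):.1%}")
print(f"ARIMA accuracy:                 {planning_accuracy(valid, arima_forecast):.1%}")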

Phase 3: Analysis (Schulz et al. 2022)

Application of analysis methods

Since multiple users are involved in the analysis process for this project, the team first sets up a suitable notebook-based development environment in Databricks. Following the typical machine learning workflow, the code for the import and data cleansing is implemented first. To ensure validity, the underlying dataset is then divided into training, validation, and test data using cross-validation. The chosen methods are then applied to the training and validation datasets to optimize the model. In this context, repeated attempts are also made to optimize the process parameters and, if need be, to sensibly reduce the number of available dimensions. The data scientists at SCHRAMME AG document the execution and validation results of the individual runs. The ARIMA family models fundamentally exhibit better performance than exponential smoothing, even if the target accuracy of 75% still cannot be achieved, with a current value of 62.4%. The RMSE and MAPE metrics also show potential for optimization.

Comparison of the ARIMA forecast with actual demand. Image by author
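The cross-validation workflow described above could look roughly like the following sketch, which uses scikit-learn's TimeSeriesSplit to preserve the temporal order; the fold sizes and the ARIMA order are illustrative, and the hypothetical series and planning_accuracy helper from the earlier sketches are reused:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from statsmodels.tsa.arima.model import ARIMA

values = series.to_numpy()
tscv = TimeSeriesSplit(n_splits=4, test_size=3)   # e.g. validate on rolling 3-month windows
scores = []

for train_idx, valid_idx in tscv.split(values):
    model = ARIMA(values[train_idx], order=(1, 1, 1)).fit()
    forecast = model.forecast(len(valid_idx))
    scores.append(planning_accuracy(values[valid_idx], forecast))

print(f"Mean validation accuracy: {np.mean(scores):.1%}")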

The parameter configurations and the basis for choosing the final model after the first application iteration are documented and prepared for the project manager and the head of the supply chain in a technically comprehensible way. What can be seen specifically is that some product groups have very unusual seasonality and that certain products are generally very difficult to predict. Even though SCHRAMME AG’s product portfolio was somewhat less affected by the temporary closures (lockdowns) during the corona pandemic, a slight decline in demand for dressing products has been observed. It is assumed that less activity and transport, as well as fewer accidents and injuries, account for this drop.

The trend can be modeled quite well with the analysis method used. To improve the target accuracy, technically more complex methods are used in another experiment, with these methods having proven to be relevant and applicable in the context of identifying suitable methods. After some iterations of parameter optimization and cross-validation, the Prophet and XGBoost methods demonstrated the best validation results at 73.4% and 65.8%, respectively.
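A minimal Prophet sketch for such an experiment (the prophet package requires the ds/y column names; the 18-month horizon follows the case description, while the data and remaining settings are illustrative and reuse the hypothetical series from the earlier sketches):

from prophet import Prophet

df = series.reset_index().rename(columns={"month": "ds", "quantity": "y"})

m = Prophet(yearly_seasonality=True)                      # captures the seasonal demand pattern
m.fit(df)

future = m.make_future_dataframe(periods=18, freq="MS")   # 18-month planning horizon
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())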

The data scientists consider Prophet to be the most suitable method among the applied approaches and determine the planning accuracy relative to the test time series. Even if the accuracy of 73.4% is slightly below the target value, a significant improvement in planning accuracy is achieved. The MAPE is 16.64% and the RMSE is 8,130, which indicates a lower absolute deviation compared to the RMSE of the XGBoost method (10,134). Similar to the first experiment, however, there are product groups that are very difficult to predict overall (37.2%) and negatively impact the cumulative accuracy.

Performance comparison of various methods. Image by author

Evaluation

The results of the analyses are used as the basis for a logical evaluation and classification by the head of the supply chain and the analysts, organized and moderated by the project manager. The metrics adopted for the evaluation are the cumulative planning accuracy of all products defined upfront, along with the average RMSE and MAPE metrics. The department must have a realistic, traceable, and reliable basis for determining requirements at the product level.

Evaluation of the three best models. Image by author

The benchmark for planning accuracy is assumed to be the current (manually planned) median accuracy of 58% over the past two years. The evaluation of the results shows that many product groups can overall be planned with a high degree of accuracy using the data science model and vastly exceed the benchmark. However, there are also product groups whose accuracy is similar to that of manual planning. Above all, the drainage product area needs to be discussed, since it shows much worse results with the model than with manual planning and appears to be unsuitable for a statistical calculation of requirements with the methods used so far.

Evaluation of the best model, distributed across product groups. Image by author

From a technical perspective, the head of the supply chain believes that it makes little sense to plan such product groups statistically, since only limited planning accuracy is feasible due to their specific seasonal and trend-based characteristics. She recommends introducing an error threshold value on a product basis to determine which products should be predicted with the model and which product groups will be removed from the modeling and still planned manually. A value slightly below the current benchmark appears to be a suitable threshold, since from the department’s perspective, nearly as good an accuracy with less manual effort is always an improvement on the way to achieving the project objective. The project leader documents the results of the evaluation together with the decisions and measures adopted.
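A simple illustration of such a per-product threshold rule (the threshold value, product groups, and accuracies below are hypothetical):

ACCURACY_THRESHOLD = 0.55   # illustrative: slightly below the manual benchmark of 58%

def split_by_threshold(accuracy_by_product, threshold=ACCURACY_THRESHOLD):
    """Return the products planned by the model and those kept in manual planning."""
    model_planned = [p for p, acc in accuracy_by_product.items() if acc >= threshold]
    manual_planned = [p for p, acc in accuracy_by_product.items() if acc < threshold]
    return model_planned, manual_planned

accuracies = {"dressings": 0.81, "band-aids": 0.74, "drainage": 0.37}
print(split_by_threshold(accuracies))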

The required quantities of all selected products for the next 18 months can be documented as the analysis result after the first real modeling run. This result can now be used and integrated into the planning process of the teams.

The team now enters the deployment phase of the DASC-PM for integration.

Phase 4: Deployment (Schulz et al. 2022)

Technical-methodological preparation

It is possible to rely on the existing infrastructure for deployment. The forecasts are loaded into the planning software IBM Planning Analytics, where they are tested and reprocessed. The so-called TurboIntegrator, a central component of IBM Planning Analytics, is used to automate the loading process. The OLAP structure of Planning Analytics allows for the creation of flexible views in which the users can personally select their context (time reference, product groups, etc.) and adjust calculations in real time. Moreover, the reporting software QlikSense is also integrated for more in-depth analyses. Here, the components of the time series (trend, seasonality, noise) can be visualized on the one hand, and additional information such as outliers and median values can be displayed on the other. The final plans are loaded into the Data Lake after processing by the planning teams so that they can be referenced in the future.

Ensuring technical feasibility

The forecasts themselves are automatically regenerated at the beginning of the month. The planners can make their corrections during the first four working days of the month and view the results in the planning system in real time. Since the algorithms run in a cloud environment, the computing power can be scaled if need be. To get all processes to run automatically, changes in the data sources should be minimized. If there is a need for adjustment, the data engineer is informed and the interface document is updated by recording all information on data sources and connections. The planning and forecasting system is a combination of the cloud (Microsoft Azure) and an on-premise system (Planning Analytics), with the planners only having active access to the on-premise structures. Credentials are assigned so that the local planners only have access to their areas, while the global planners can view all topics. After the end of the development phase, the support services are mainly handled by the IT department. In the case of complex problems, data scientists or data engineers are also consulted.

Image by Unsplash

Ensuring applicability

Users of the solution are the local and global planning teams. Since members of the teams have less of a technical orientation, training sessions are held to help them interpret the forecasts and classify their quality. The user interface is also designed with a focus on clarity and understandability. Simple line and bar charts for trends and benchmarks are used, together with tables reduced to what is most important. The users are included in the development from the start to ensure technical correctness and relevance, and to ensure familiarity with the solution before the end of the development phase. In addition, complete documentation is drafted. The technical part of the documentation mostly builds on the interface document by describing the data structures and connections, while the content part is jointly prepared with the users.

Technical preparation

To ensure that the new solution does not lose relevance or quality after a few months, work continues on improvements after the completion of the first development phase, even if substantially less time is spent on it. The most important aspect of the continued improvement is the constant automated adjustment of the prediction model to new data. Other parts of the system that still require manual work at first are also automated over time. Changes to various parameters, such as the forecast horizon or the threshold values for prediction accuracy, can be made by the planners themselves in Planning Analytics, keeping the model flexible. Problems occurring after the release of the first version are entered via the IT ticket system and assigned to the data science area. At regular intervals, it is also checked whether the model still satisfies the expectations of the company or whether changes are necessary.
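A minimal sketch of what such an automated monthly refit could look like, assuming a Prophet model as selected above (the function, variable names, and horizon default are hypothetical):

import pandas as pd
from prophet import Prophet

def retrain_and_forecast(history: pd.Series, horizon: int = 18) -> pd.Series:
    """Refit the model on the latest demand history and return the forecast for the planning horizon."""
    df = history.reset_index().rename(columns={"month": "ds", "quantity": "y"})
    model = Prophet(yearly_seasonality=True).fit(df)
    future = model.make_future_dataframe(periods=horizon, freq="MS")
    forecast = model.predict(future)
    return forecast.set_index("ds")["yhat"].tail(horizon)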

Phase 5: Application (Schulz et al. 2022)

The transition to using the developed model means that the Data Science Process Model (DASC-PM) enters its last phase. Overall, SCHRAMME AG was able to achieve the objectives it had set in the supply chain area by using a structured and holistic approach. Additional or new projects can now be derived from here. The planning processes were largely automated and supported by machine learning algorithms. The relevant stakeholders in management, finance, and the supply chain were highly satisfied. After initial skepticism, the planning team itself is now also convinced by the reduction in workload and the possibility to prioritize. Nevertheless, it is also conceivable that weak points will surface during use and that more iterations will be required in later phases.

The case study as a whole showed that non-linear process models in particular are advantageous for the field of data science. The DASC-PM is a suitable, novel process model that can be transferred to numerous other domains and problems.

Conclusion

In conclusion, data science plays an integral role in solving complex business problems by identifying hidden patterns and extracting actionable insights from data. Through this case study, we demonstrated how data science techniques can be used to develop predictive models that help businesses make informed decisions, e.g., in the supply chain.

While this case study focuses on demand planning, the process model can be applied in various ways, such as building personalized recommendations on e-commerce websites, identifying fraud in financial transactions, or predicting customer churn in telecom or subscription-based businesses.

However, it is essential to note that real-world data science projects pose several challenges, such as data quality issues, lack of domain expertise, and inadequate communication between stakeholders. In comparison, fictitious case studies provide an idealized environment with clean, well-labeled data and well-defined problem statements. Thus, real-world projects require a pragmatic approach that takes into account various factors such as business objectives, data quality, computational resources, and ethical considerations. I’m pretty sure you know this from your own experience. Don’t underestimate reality!

In summary, data science has immense potential to transform industries and society and to create new opportunities for businesses. The DASC-PM (or any) process model can help to structure the approach logically and ensure clear guidance for both business stakeholders and the project team itself.

Please let me know about your experience with data science projects. How do you structure them, and what are the biggest challenges? Feel free to leave a comment!

Image by Unsplash
