Parts 1 and 2 of this series focussed on the technical side of improving the experimentation process. This began with rethinking how code is created, stored and used, and ended with using large-scale parallelization to cut down the time taken to run experiments. This article takes a step back from the implementation details and instead takes a wider look at how and why we experiment, and how we can reduce the time to value of our projects by being smarter about experimenting.
Failing to plan is planning to fail
Starting a new project is an exciting time as a data scientist. You are faced with a new dataset with different requirements compared to previous projects, and you may have the chance to try out novel modelling techniques you have never used before. It is sorely tempting to jump straight into the data, starting with EDA and maybe some preliminary modelling. You feel energised and optimistic about the prospect of building a model that will deliver results to the business.
While enthusiasm is commendable, the situation can quickly change. Imagine that months have passed and you are still running experiments, having already run hundreds, tweaking hyperparameters to gain an extra 1-2% in model performance. Your final model configuration has turned into a complex, interconnected ensemble, using 4-5 base models that all need to be trained and monitored. Finally, after all of this, you discover that your model barely improves upon the existing process.
All of this could have been avoided by taking a more structured approach to the experimentation process. You are a data scientist, with the emphasis on scientist, so knowing how to conduct an experiment is critical. In this article, I want to give some guidance on how to structure your project experimentation efficiently, so that you stay focussed on what matters when delivering a solution to the business.
Gather more business information and then start simple
Before any modelling begins, you need to set out very clearly what you are trying to achieve. This is where a disconnect can occur between the technical and business sides of a project. The most important thing to remember as a data scientist is:
Your job is not to build a model; your job is to solve a business problem that may involve a model!
Keeping this viewpoint in mind is invaluable to succeeding as a data scientist. I have been on projects where we built a solution that had no problem to solve. Framing everything you do around supporting the business will greatly improve the chances of your solution being adopted.
With this in mind, your first step should always be to gather the following pieces of information if they have not already been supplied:
- What is the current business situation?
- What are the key metrics that define the problem, and how do they want to improve them?
- What metric improvement is acceptable for a proposed solution to be considered successful?
An example of this would be:
You work for an online retailer that needs to make sure it is always stocked. It is currently experiencing issues with either having too much stock lying around, which takes up inventory space, or not having enough stock to meet customer demand, which leads to delays. You are asked to improve this process, ensuring there is enough product to meet demand without overstocking.
Admittedly this is a contrived problem, but it hopefully illustrates that your role here is to unblock a business problem, not necessarily to build a model to do so. From here you can dig deeper and ask:
- How often are they overstocked or understocked?
- Is it better to be overstocked or understocked?
Now that we have the problem properly framed, we can start thinking about a solution. Again, before going straight to a model, consider whether there are simpler methods that could be used. While training a model to forecast future demand may give great results, it also comes with baggage:
- Where is the model going to be deployed?
- What will happen if performance drops and the model needs to be retrained?
- How will you explain its decisions to stakeholders if something goes wrong?
Starting with something simpler and non-ML based gives us a baseline to work from. There is also the possibility that this baseline solves the problem at hand, entirely removing the need for a complex ML solution. Continuing the example above, perhaps a simple or weighted rolling average of previous customer demand is sufficient. Or perhaps the items are seasonal and you need to scale demand up or down depending on the time of year.
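To make this concrete, here is a minimal sketch of both variants, assuming a pandas DataFrame with weekly demand per product; the column names, window size and weights are hypothetical choices for illustration:

```python
import pandas as pd

# Hypothetical example data: weekly demand per product
df = pd.DataFrame({
    "product_id": ["A"] * 8,
    "week": pd.date_range("2024-01-01", periods=8, freq="W"),
    "demand": [120, 135, 128, 140, 150, 145, 160, 155],
})

# Simple baseline: forecast next week's demand as the mean of the last 4 weeks
df["forecast_simple"] = (
    df.groupby("product_id")["demand"]
      .transform(lambda s: s.rolling(window=4).mean().shift(1))
)

# Weighted baseline: more recent weeks count for more
weights = [0.1, 0.2, 0.3, 0.4]
df["forecast_weighted"] = (
    df.groupby("product_id")["demand"]
      .transform(lambda s: s.rolling(window=4)
                            .apply(lambda w: (w * weights).sum())
                            .shift(1))
)
```

Comparing either forecast against actual demand gives you a number that any model-based solution would have to beat.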
If a non-model baseline is not feasible or cannot answer the business problem, then moving on to a model-based solution is the next step. Taking a principled approach to iterating through ideas and trying out different experiment configurations will be critical to ensure you arrive at a solution in a timely manner.
Have a clear plan for experimentation
Once you have decided that a model is required, it is time to think about how you approach experimenting. While you could go straight into an exhaustive search over every possible model, hyperparameter, feature selection process, data treatment and so on, being more focussed in your setups and having a deliberate strategy will make it easier to work out what is working and what isn't. With this in mind, here are some ideas you should consider.
Pay attention to any constraints
Experimentation doesn't happen in a vacuum; it is one part of the project development process, which itself is just one project happening inside an organisation. As such, you will likely have to run your experimentation subject to limitations placed on it by the business. These constraints require you to be economical with your time and may steer you towards particular solutions. Some example constraints that are likely to be placed on experiments are:
- Timeboxing: Letting experiments go on endlessly is a dangerous endeavour, as you run the risk of your solution never making it to productionisation. As such, it is common to be given a fixed amount of time to develop a viable working solution, after which you move on to something else if it is not feasible
- Monetary: Running experiments takes up compute time, and that isn't free. This is especially true if you are leveraging 3rd party compute, where VMs are typically priced by the hour. If you are not careful you can easily rack up an enormous compute bill, especially if you require GPUs for example. So care must be taken to understand the cost of your experimentation
- Resource Availability: Your experiment will not be the only one happening in your organisation, and there may be fixed computational resources. This means you may be limited in how many experiments you can run at any one time, so you will have to be smart in choosing which lines of work to explore.
- Explainability: While understanding the decisions made by your model is always important, it becomes critical if you work in a regulated industry such as finance, where any bias or prejudice in your model can have serious repercussions. To ensure compliance, you may need to limit yourself to simpler but easier to interpret models such as regressions, Decision Trees or Support Vector Machines.
You may be subject to one or all of these constraints, so be prepared to navigate them.
Start with simple baselines
When dealing with binary classification, for example, it may seem sensible to go straight to a complex model such as LightGBM, as there is a wealth of literature on its efficacy for these types of problems. Before that, however, training a simple Logistic Regression model to serve as a baseline (sketched after the list below) comes with the following advantages:
- Few or no hyperparameters to evaluate, so experiments iterate quickly
- A decision process that is very easy to explain
- More complicated models have to be better than this to justify their complexity
- It may be enough to solve the problem at hand

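A minimal sketch of such a baseline using scikit-learn; the synthetic dataset here is a stand-in for your real data, and the choice of F1 as the metric is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Stand-in for your real dataset
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: default Logistic Regression, no tuning
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

baseline_f1 = f1_score(y_val, baseline.predict(X_val))
print(f"Logistic Regression baseline F1: {baseline_f1:.3f}")
```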
Beyond Logistic Regression, having an 'untuned' experiment for a particular model (little to no data treatment, no explicit feature selection, default hyperparameters) is also important, as it gives an indication of how far you can push a particular avenue of experimentation. For example, if different experimental configurations are barely outperforming the untuned experiment, that could be evidence that you should refocus your efforts elsewhere.
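A sketch of what that reference point might look like with LightGBM, reusing the split from the baseline sketch above; the tuned configurations and their scores are made up for illustration:

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score

# Untuned reference: default hyperparameters, no feature selection
untuned = LGBMClassifier(random_state=42)
untuned.fit(X_train, y_train)
untuned_f1 = f1_score(y_val, untuned.predict(X_val))

# Hypothetical scores from tuned configurations run elsewhere
tuned_runs = {"config_a": 0.871, "config_b": 0.874, "config_c": 0.869}

for name, score in tuned_runs.items():
    uplift = score - untuned_f1
    print(f"{name}: F1 {score:.3f} (uplift over untuned: {uplift:+.3f})")
# If the uplifts are consistently tiny, further tuning of this model
# family may not be worth the effort
```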
Using raw vs semi-processed data
From a practical standpoint, the data you receive from data engineering may not be in the right format to be consumed by your experiments. Issues can include:
- Thousands of columns and millions of transactions, putting a strain on memory resources
- Features that cannot easily be used within a model, such as nested structures like dictionaries or datatypes like datetimes

There are a number of different tactics to handle these scenarios:
- Scale up the memory allocation of your experiments to handle the data size requirements. This may not always be possible
- Include feature engineering as part of the experiment process
- Lightly process your data prior to experimentation
There are pros and cons to each approach, and it is up to you to decide. Performing some pre-processing, such as removing features with complex data structures or incompatible datatypes, may be helpful now, but it may require backtracking if those features come into scope later in the experimentation process. Feature engineering inside the experiment gives you finer control over what is being created, but it introduces extra processing overhead for something that may be common across all experiments. There is no single correct choice here; it is very much situation dependent.
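As an illustration of the light pre-processing route, here is a sketch; which columns to drop, expand or downcast is entirely situational, and the derived datetime features are just examples:

```python
import pandas as pd

def light_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Lightly prepare raw data for experimentation."""
    df = df.copy()

    # Drop nested structures (dicts/lists) that models can't consume directly
    nested_cols = [
        c for c in df.columns
        if df[c].map(lambda v: isinstance(v, (dict, list))).any()
    ]
    df = df.drop(columns=nested_cols)

    # Expand datetimes into simple numeric features, then drop the originals
    datetime_cols = df.select_dtypes(include="datetime64[ns]").columns
    for c in datetime_cols:
        df[f"{c}_dayofweek"] = df[c].dt.dayofweek
        df[f"{c}_month"] = df[c].dt.month
    df = df.drop(columns=datetime_cols)

    # Downcast float columns to reduce the memory footprint
    for c in df.select_dtypes(include="float").columns:
        df[c] = pd.to_numeric(df[c], downcast="float")

    return df
```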
Evaluate model performance fairly
Calculating final model performance is the end goal of your experimentation. This is the result you will present to the business in the hope of getting approval to move on to the production phase of your project. So it is crucial that you give a fair and unbiased evaluation of your model that aligns with stakeholder requirements. Key aspects are:
- Ensure your evaluation dataset took no part in your experimentation process
- Your evaluation dataset should reflect a real-life production setting
- Your evaluation metrics should be business focussed, not model focussed

Having a standalone dataset for final evaluation ensures there is no bias in your results. For instance, evaluating on the validation dataset you used to select features or hyperparameters is not a fair comparison, as you run the risk of having overfitted your solution to that data. You therefore need a clean dataset that hasn't been used before. This may feel simplistic to call out, but it is so important that it bears repeating.
An evaluation dataset that truly reflects production gives confidence in your results. For example, models I have trained in the past were trained on months or even years' worth of data to ensure behaviours such as seasonality were captured. Because of these time scales, the data volume was too large to use in its raw state, so downsampling had to happen prior to experimenting. However, the evaluation dataset should not be downsampled or modified in a way that distorts it from real life. This is workable because, at inference time, you can use techniques like streaming or mini-batching to ingest the data.
Your evaluation data should also cover at least the minimum period that will be used in production, and ideally multiples of that period. For instance, if your model will score data every week, then a day's worth of evaluation data is not sufficient. It should be at least a week's worth, ideally 3 or 4 weeks' worth, so you can assess the variability in results.
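A minimal sketch of carving out such a holdout, assuming a DataFrame with a `timestamp` column and a weekly scoring cadence (the 4-week window is an illustrative choice):

```python
import pandas as pd

def time_based_holdout(df: pd.DataFrame, holdout_weeks: int = 4):
    """Reserve the final weeks of data for final evaluation only.

    Everything before the cutoff is available for experimentation
    (training, validation, feature selection, tuning); everything
    after it is never touched until the final evaluation.
    """
    cutoff = df["timestamp"].max() - pd.Timedelta(weeks=holdout_weeks)
    experiment_df = df[df["timestamp"] <= cutoff]
    holdout_df = df[df["timestamp"] > cutoff]
    return experiment_df, holdout_df
```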
Validating the business value of your solution links back to what was said earlier about your role as a data scientist. You are here to solve a problem, not merely to build a model. As such, it is very important to balance statistical significance against business significance when deciding how to showcase your proposed solution. The first aspect of this is to present results in terms of a metric the business can act on. Stakeholders may not know what a model with an F1 score of 0.95 means, but they know what a model that can save them £10 million annually brings to the company.
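One way to make that translation explicit is to attach monetary values to the model's outcomes. A sketch follows, where the per-outcome values and the annual case volume are entirely made up and would need to be agreed with the business:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical per-outcome values agreed with the business (in £)
VALUE_TRUE_POSITIVE = 250   # saving from a correctly flagged case
COST_FALSE_POSITIVE = 40    # wasted effort on a false alarm
COST_FALSE_NEGATIVE = 400   # loss from a missed case

def projected_annual_value(y_true, y_pred, cases_per_year: int) -> float:
    """Translate classification outcomes into a projected annual £ impact."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    total = tn + fp + fn + tp
    value_per_case = (
        tp * VALUE_TRUE_POSITIVE
        - fp * COST_FALSE_POSITIVE
        - fn * COST_FALSE_NEGATIVE
    ) / total
    return value_per_case * cases_per_year

# e.g. projected_annual_value(y_val, baseline.predict(X_val), cases_per_year=2_000_000)
```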
The second aspect is to take a cautious view of any proposed solution and consider all the failure points that could occur, especially if we start introducing complexity. Consider 2 proposed models:
- A Logistic Regression model that operates on raw data with a projected saving of £10 million annually
- A 100M parameter Neural Network that required extensive feature engineering, selection and model tuning with a projected saving of £10.5 million annually
The Neural Network is better in terms of absolute return, but it has significantly more complexity and potential points of failure. Additional engineering pipelines, complex retraining protocols and a lack of explainability are all important aspects to consider, and we need to think about whether this overhead is worth an extra 5% uplift in performance. This scenario is fanciful in nature, but it hopefully illustrates the need to keep a critical eye when evaluating results.
Know when to stop
When running the experimentation phase you are balancing 2 objectives: the desire to try out as many different experimental setups as possible vs any constraints you are facing, most likely the time the business has allocated for you to experiment. There is a third aspect you should consider, and that is knowing when to end the experimentation phase early. This could be for a range of reasons:
- Your proposed solution already answers the business problem
- Further experiments are yielding diminishing returns
- Your experiments aren’t producing the outcomes you wanted
Your first instinct will likely be to use up all of your available time, either to try to fix your model or to push your solution to be the best it can be. However, you should ask yourself whether your time could be better spent elsewhere, either by moving on to productionisation, re-interpreting the current business problem if your solution isn't working, or moving on to another problem entirely. Your time is precious, and you should treat it accordingly to ensure that whatever you are working on will have the biggest impact on the business.
Conclusion
In this article we have considered how to plan the model experimentation phase of your project. We have focussed less on technical details and more on the ethos you should bring to experimentation. This started with taking the time to understand the business problem better, so as to clearly define what must be achieved for any proposed solution to be considered successful. We spoke about the importance of simple baselines as a reference point that more complicated solutions can be compared against. We then moved on to the constraints you may face and how they can impact your experimentation. We finished off by emphasising the importance of a fair dataset for calculating business metrics, to ensure there is no bias in your results. By following the recommendations laid out here, we greatly increase our chances of reducing the time to value of our data science projects by quickly and confidently iterating through the experimentation process.