Building Better ML Systems — Chapter 1: Every Project Must Start with a Plan

April 22, 2023

About the ML project lifecycle, design docs, business value, and requirements. About starting small and failing fast.

Many data scientists and ML engineers graduate from university with a false picture of what their day-to-day work will look like. They expect it to be just like their studies:

Trying cool state-of-the-art algorithms on fixed, relatively clean datasets and choosing the best one in terms of accuracy (Expectations).

You don’t need:

To think about the business value and a never-ending list of requirements.
(Most likely) to collect, label, and clean the dataset. In some cases, even the train/validation/test split is already done for you.
To thoroughly evaluate your model, check for biases, and conduct A/B tests.
To deploy the model to thousands (or hundreds of thousands) of users and ensure it’s up and running 99.9% of the time.
To monitor the model, catch any drops in accuracy, and retrain it as needed.
To collect new data immediately after deploying a previous version and start working on a new, hopefully better, model.

Yes, you don’t have to think about all this during a research or study project. But in real-life projects, it becomes crucial.

The main difference between research and a real-life project is that:

In real life, there are many users who use your model in every conceivable and inconceivable way and expect it to always work quickly, accurately, and fairly, without bias. Users’ behavior constantly changes, epidemics and wars may occur, and meanwhile your organization tries to make a profit by delivering what users want and to build a competitive advantage by applying Machine Learning in ways that nobody else has tried or succeeded at before (Reality).

Throughout this series, you’ll learn that building better Machine Learning systems requires thinking of them as systems: paying enough attention to every component and to the relationships between them.

This series will be helpful for Data Scientists, Machine Learning Engineers, and Team and Tech Leads (or those who aspire to be one). Don’t expect it to be comprehensive, but it will help you lay a solid foundation in ML system design, fill in gaps, and explore topics that are less familiar to you. Along the way, I’ll provide links to many excellent posts, papers, and books.

Without further ado, let’s begin!

Below is the Machine Learning project lifecycle. Get comfortable with it. First, you understand the task and determine what needs to be done. Then, you collect, label, and clean the data. Next, you move on to modeling. After that, you evaluate the models and choose the best one. Finally, you deploy the model and monitor its performance.

*Image 1. Life Cycle of a Machine Learning Project. Image by Author.*

Is this the end? No, it is only the beginning.

While monitoring, you may discover that the model is not working well for some subset of users, or that its accuracy is degrading over time, so you start again: understand the problem -> get data -> model and evaluate -> deploy.

Or, during model evaluation, you may find that the model is not good enough to deploy, so you start again: understand what is not working and how to improve it -> collect more data -> do more modeling -> evaluate to (hopefully) get better results this time.
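The monitoring checks that send you back to the start can be made concrete. Below is a minimal Python sketch, assuming you log each prediction together with a user segment and the eventual ground-truth label; the record format, the `find_problems` helper, `baseline_acc`, and the 5-point tolerance are hypothetical choices for illustration, not a standard.

```python
from collections import defaultdict


def accuracy(pairs):
    """Share of (prediction, label) pairs where the prediction is correct."""
    return sum(pred == label for pred, label in pairs) / len(pairs)


def find_problems(records, baseline_acc, max_drop=0.05):
    """Flag an overall accuracy drop and user segments where the model underperforms.

    records: iterable of (segment, prediction, label) tuples (hypothetical log format).
    baseline_acc: accuracy measured when the model was deployed.
    max_drop: tolerated absolute drop before raising a flag (assumed value).
    """
    threshold = baseline_acc - max_drop
    by_segment = defaultdict(list)
    for segment, pred, label in records:
        by_segment[segment].append((pred, label))
    if not by_segment:
        raise ValueError("no records to check")

    all_pairs = [pair for pairs in by_segment.values() for pair in pairs]
    degraded = accuracy(all_pairs) < threshold
    weak_segments = sorted(
        seg for seg, pairs in by_segment.items() if accuracy(pairs) < threshold
    )
    return degraded, weak_segments


# Hypothetical logs: the model is fine on web users but struggles on mobile users.
logs = [
    ("web", 1, 1), ("web", 0, 0), ("web", 1, 1),
    ("mobile", 1, 0), ("mobile", 0, 1), ("mobile", 1, 1),
]
print(find_problems(logs, baseline_acc=0.9))  # prints (True, ['mobile'])
```

Segments are checked separately because an overall average can hide a group of users for whom the model is doing badly, which is exactly the "subset of users" case described above.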

(If this is your first time learning about the Machine Learning project lifecycle, I recommend checking out Anton Morgunov’s post: The Life Cycle of a Machine Learning Project: What Are the Stages?)

So there are two important things to understand:

1. The lifecycle is a loop, not a straight line: an ML project is never really finished. (No rest for the wicked)
2. Image 1 provides a simplified version of how a Machine Learning system is developed, but in reality, you don’t move smoothly and sequentially from stage to stage. Something can go wrong (and often does) at each stage, which may set you back a few steps or even throw you back to the start. (Welcome to the real world)

*Image 2. Life Cycle of a Machine Learning Project. Image by Author.*

Those with engineering backgrounds may wonder: What’s the difference between Machine Learning projects and traditional software development? Where are the tests, builds, and releases? Thanks for asking.

The truth is that a Machine Learning project is a subclass of a software engineering project, so all software engineering best practices apply to it. With that said, let me introduce you to a truly realistic life cycle of a software project with a Machine Learning component:

*Image 3. A Truly Realistic Life Cycle of a Software Project with a Machine Learning Component. Image by Author.*

(Read MLOps: Machine Learning Life Cycle by Satish Chandra Gupta to learn more about the ML software development lifecycle.)

Before spending thousands of dollars on data annotation and weeks and weeks on Machine Learning model development, there are four things you need to do. Let’s call this the “pre-coding” stage. So close PyCharm for now: all you need is a Google Doc, your brain, and Zoom.

1. Estimate the business value.

The goal of any business is to make more money or to provide a better customer experience… in order to make more money. With this simple axiom in mind, convince your boss, C-level management, and stakeholders that the current ML project is a good investment.

Ideally, you should provide some rough numbers on how the ML model will increase the company’s revenue or user engagement, reduce request processing time, and so on. Be creative here, turn off your perfectionism, and don’t hesitate to ask colleagues from the finance and marketing departments for help.

(Keep in mind that this metric will later be used to assess the project, so be realistic about what you promise to deliver.)
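A back-of-the-envelope estimate is usually enough at this stage. The Python sketch below shows one way to turn rough numbers into a yearly figure; the formula, the `estimate_annual_value` helper, and every input (users, conversion rate, expected uplift, order value, cost) are hypothetical assumptions you would agree on with finance and marketing, not benchmarks.

```python
def estimate_annual_value(monthly_users, conversion_rate, relative_uplift,
                          avg_order_value, annual_cost):
    """Rough yearly net value of an ML feature (all inputs are assumptions).

    relative_uplift: hoped-for relative increase in conversion, e.g. 0.05 for +5%.
    annual_cost: data labeling, compute, and team time attributed to the project.
    """
    extra_orders_per_month = monthly_users * conversion_rate * relative_uplift
    extra_revenue_per_year = extra_orders_per_month * avg_order_value * 12
    return extra_revenue_per_year - annual_cost


# Compare an optimistic and a pessimistic scenario with the same (hypothetical) cost.
for uplift in (0.05, 0.02):
    value = estimate_annual_value(
        monthly_users=100_000, conversion_rate=0.02, relative_uplift=uplift,
        avg_order_value=40.0, annual_cost=30_000.0,
    )
    print(f"uplift {uplift:.0%}: net value {value:,.0f} per year")
```

With these made-up numbers, a 5% uplift yields roughly 18,000 per year, while a 2% uplift does not even cover the cost (roughly -10,800). Showing both scenarios is a simple way to stay realistic about what you promise.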

Once nobody doubts that the ML model is needed, start collecting requirements.

Every domain is specific and every project is unique, so there is no exhaustive list of requirements to refer to. Trust your experience and collaborate with your colleagues.

Here’s a helpful tip: come up with a list of generic questions (I’ll share mine below) and just ask. Start the conversation, and as you discuss, more project-specific questions will naturally arise.

How much data do we have? How are we going to label it?
What should model latency be?
Where will the model be deployed — cloud or on-premises? What are the instance specifications?
Are there any requirements for data privacy and model explainability?
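Once the answers come in, it helps to write them down in a form you can check against later. Here is a minimal Python sketch, assuming the latency requirement is stated as a 95th-percentile budget in milliseconds; the `Requirements` fields and the helper functions are hypothetical and would mirror whatever your own questions turn up.

```python
import math
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical fields: mirror the answers you collected from stakeholders.
    max_p95_latency_ms: float
    deployment: str                 # e.g. "cloud" or "on-premises"
    needs_explainability: bool = False
    contains_personal_data: bool = False


def p95(latencies_ms):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no latency measurements")
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def meets_latency(reqs, latencies_ms):
    """True if the measured p95 latency fits within the agreed budget."""
    return p95(latencies_ms) <= reqs.max_p95_latency_ms


reqs = Requirements(max_p95_latency_ms=100.0, deployment="cloud")
measured = [20, 35, 40, 42, 55, 61, 70, 88, 95, 140]  # hypothetical timings, ms
print(p95(measured), meets_latency(reqs, measured))
```

With only ten measurements, the nearest-rank p95 is simply the slowest request (140 ms), so this hypothetical model would miss the 100 ms budget; in practice you would measure many more requests before drawing conclusions.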

At this point, I suggest you reconsider whether a pure software engineering approach or a simple rule-based approach might be a suitable solution. Here are some posts that can help you with that:
– When to Use Machine Learning by Amazon
– 4 Situations Where you Should Not use Machine Learning by Svenja Szillat
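One practical way to run this check is to write the simplest rule you can think of and measure it on a small labeled sample before any ML work. The Python sketch below assumes a hypothetical task (routing support tickets to a billing queue), a made-up keyword list, and four hand-labeled examples. If a rule like this already meets the requirements, you may not need ML; if it does not, it becomes the baseline an ML model has to beat.

```python
import re

# Hypothetical keyword list for a hypothetical ticket-routing task.
BILLING_WORDS = {"invoice", "refund", "charge", "payment"}


def rule_based_label(text):
    """Route a ticket to 'billing' if it contains any billing keyword."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "billing" if words & BILLING_WORDS else "other"


def accuracy(examples, predict):
    """Share of (text, label) examples the predictor gets right."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


labeled_sample = [
    ("I need a refund for my order", "billing"),
    ("The app crashes on start", "other"),
    ("Why was I charged twice?", "billing"),  # the rule misses "charged"
    ("Where is my invoice?", "billing"),
]
print(accuracy(labeled_sample, rule_based_label))  # prints 0.75
```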

Make sure you have the data. It may seem obvious, but unfortunately, in my career, I’ve seen too many companies make the same mistake: they want AI, but their datasets are small, lacking important features, or dirty. A great post, “The AI Hierarchy of Needs” by Monica Rogati, encourages us to think of AI as the top of a pyramid of needs, with data collection, storage, and cleaning at its foundation.

*Image 4. The AI Hierarchy of Needs. Adapted from an image by Monica Rogati in “The AI Hierarchy of Needs”.*
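Before planning any modeling, a quick audit of the data you already have shows where on that pyramid you stand. Below is a minimal Python sketch using only the standard library, assuming records arrive as a list of dictionaries with a `label` key; the `audit_dataset` helper, the field names, and the example rows are hypothetical.

```python
from collections import Counter


def audit_dataset(rows, label_key="label"):
    """Summarize size, missing values per column, and label balance."""
    if not rows:
        raise ValueError("the dataset is empty")
    n = len(rows)
    columns = sorted({key for row in rows for key in row})
    # A value counts as missing if the key is absent or explicitly None.
    missing_share = {
        col: sum(row.get(col) is None for row in rows) / n for col in columns
    }
    label_counts = Counter(
        row[label_key] for row in rows if row.get(label_key) is not None
    )
    return {"rows": n, "missing_share": missing_share,
            "label_counts": dict(label_counts)}


rows = [
    {"age": 30, "country": "DE", "label": 1},
    {"age": None, "country": "FR", "label": 0},
    {"age": 41, "country": None, "label": 1},
    {"age": 25, "country": "DE"},  # unlabeled record
]
print(audit_dataset(rows))
```

A high share of missing values, a heavily skewed label count, or simply too few rows are signs that the lower layers of the pyramid need work first.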

Even if your goal is to create an ML system that serves hundreds of thousands of users per day, it’s wise to start with something much, much smaller:

PoC (Proof of Concept). Manually retrieve data from the data storage, quickly iterate through a few algorithms in a Jupyter Notebook, and finally, prove (or reject) the hypothesis that the problem can be solved with Machine Learning. During the PoC stage, you’ll also understand what is required to deploy and scale the model.
MVP (Minimum Viable Product). Assuming the PoC stage was successful, you now build a product with only the essential functionality and release it to users. In a Machine Learning project, this means putting a first, simple version of the model in front of real users.
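It helps to agree on the PoC’s success criterion before opening the notebook, so the decision at the end is not a matter of taste. Below is a minimal Python sketch of such a go/no-go check, assuming you have one validation score per algorithm you tried, a score for the simplest non-ML baseline, and a target agreed with stakeholders; the `poc_decision` helper and all names and numbers are hypothetical.

```python
def poc_decision(candidate_scores, baseline_score, target_score):
    """Return ("go" or "no-go", best algorithm name) for a PoC.

    candidate_scores: {algorithm name: validation score}, higher is better.
    baseline_score: score of the simplest alternative, e.g. a rule-based approach.
    target_score: minimum quality agreed with stakeholders before the PoC started.
    """
    best_name = max(candidate_scores, key=candidate_scores.get)
    best_score = candidate_scores[best_name]
    if best_score >= target_score and best_score > baseline_score:
        return "go", best_name
    return "no-go", best_name


scores = {"logistic_regression": 0.81, "gradient_boosting": 0.86}  # hypothetical
print(poc_decision(scores, baseline_score=0.75, target_score=0.85))
print(poc_decision(scores, baseline_score=0.75, target_score=0.90))
```

A "no-go" is a useful outcome too: it is a cheap, early failure reached before any money is spent on building and scaling the product.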

When you realize that an idea is not working out, abandon it with a clear conscience and move on to the next one. This is much easier to do if you haven’t already spent years of work or hundreds of thousands of dollars. Keeping the cost of failure low is a key factor in a project’s success.

(To explore this topic further, read POC vs MVP: What to Choose to Build a Great Product by Dmitry Chekalin.)

A design document in software engineering is a description of the software system’s architecture: its overall structure, its individual components, and the interactions between them. It can take any form and structure, be formal or informal, high-level or detailed (it’s up to the team to decide). During the implementation phase of software development, the design document serves as a blueprint for developers to follow.

It is a best practice in software engineering, and, as I mentioned earlier, all software engineering best practices are very welcome in ML projects.

My personal reasons for liking design docs are:

Writing a design doc is like implementing the project at a high level: you don’t actually write code, but you still make decisions about data, algorithms, and infrastructure. You think through all scenarios and evaluate trade-offs, which saves time and money in the long run by helping you avoid dead ends.
The document is shared among team members so that they can review it, familiarize themselves with the system design, and start discussions if needed. Nobody is left out, and everyone is encouraged to contribute.

If you are ready to start writing a design doc, here’s a template for machine learning systems proposed by Eugene Yan. Feel free to modify it and adapt it to your project’s needs.

If you would like to learn more about design documents as a concept, check out these posts:
– Write Design Docs for Machine Learning Systems by Eugene Yan
– Design Docs at Google by Malte Ubl

In this chapter, we learned that every project must start with a plan because ML systems are too complex to implement in an ad hoc manner. We reviewed the ML project lifecycle, discussed why and how to estimate a project’s business value and collect the requirements, and then how to reevaluate with a cool head whether ML is really needed. We learned how to start small and fail fast using concepts like “PoC” and “MVP”. And finally, we talked about the importance of design documents during the planning stage.

In the next posts, you’ll learn about data collection and labeling, model development, experiment tracking, online and offline evaluation, deployment, monitoring, retraining, and much more, all of which will help you build better Machine Learning systems.

The next chapter will be available soon. Subscribe to stay tuned.