Machine Learning in Production? What This Really Means


Whether you’re a manager, a data scientist, an engineer, or a product owner, you’ve almost certainly been in at least one meeting where the discussion revolved around “putting a model in production.”

But seriously, what does production even mean?

As you may know, I’m an AI engineer. I began my first data science job in 2015, at a large French company in the energy sector. At the time, we were among the first players building AI applications for energy management and production (nuclear, hydraulic, and renewable). And if there’s one domain where putting AI into production is heavily regulated, it’s energy, especially nuclear. That is closely tied to the nature of the data and the fact that you can’t easily push machine learning models into an existing environment.

Thanks to this experience, I learned very early that building a model in a notebook is just the tip of the iceberg. I also began talking about production very quickly, without really knowing what it meant. For these reasons, I want to share with you the clearer view I’ve developed over the years when it comes to pushing machine learning projects into production.


But let’s pause for a moment and think about our essential question.

What does production actually mean?

Sometimes, what’s behind this buzzword, “production,” can be tough to pin down. There are countless YouTube videos and articles about it, but very few that translate into something you can actually apply in real projects.

If you try to answer it yourself, our views will likely converge by the end of this article, even if the methods we use to reach production differ from one context to another.


The Essential Definition

In the context of machine learning, production means that your model’s outputs directly affect a user or a product.

That impact can take many forms, such as educating someone, helping them make a decision, or enabling something they couldn’t do before; it can also mean adding a feature to a shopping app’s recommendation system.

Any program containing a machine learning algorithm used by an end user or by another product or application can be considered a model in production.

Beyond having impact, production also comes with a layer of accountability. What I mean is that if no person or system is responsible for correcting the model when it’s wrong, then your model may be deployed, but it isn’t in production.

There’s a commonly cited claim that 87% of ML projects fail to reach production. I don’t know if that’s strictly true, but my interpretation is simple: many ML models never reach the point where they actually affect a user or a product. And even when they do, there is often no system in place to keep them reliable over time, so they are merely deployed and accessible.

So if we agree that production means having an ML project that is impactful and accountable, how do we get there?


The Many Faces of Production

To answer that, we need to accept that production has many faces. The model is just one component inside a larger ETL pipeline.

This point is crucial.

We often imagine a model as a black box: data goes in, math magic happens, and a prediction comes out. In reality, that’s a big oversimplification. In production, models are usually part of a broader data flow, often closer to a data transformation than an isolated decision engine.
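To make that concrete, here’s a minimal sketch of a model sitting inside a simple extract-transform-predict-load flow. Every name here (paths, columns) is an illustrative assumption, not a prescription:

```python
import pandas as pd

def extract(source_path: str) -> pd.DataFrame:
    """Pull raw records from the production data store."""
    return pd.read_parquet(source_path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Prepare features exactly as the model expects them."""
    clean = raw.dropna(subset=["usage_kwh", "days"])
    return clean.assign(usage_per_day=clean["usage_kwh"] / clean["days"])[["usage_per_day"]]

def predict(features: pd.DataFrame, model) -> pd.Series:
    """The ML step: just one transformation among others in the flow."""
    return pd.Series(model.predict(features), index=features.index)

def load(predictions: pd.Series, sink_path: str) -> None:
    """Write scores back where downstream systems can read them."""
    predictions.to_frame("score").to_parquet(sink_path)
```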

Also, not all “production” looks the same; it depends on how much authority the model has in the final system.

Sometimes the model supports a decision: a score, a recommendation, an alert, or a dashboard.

Sometimes it decides: automatic actions, real-time blocking, or triggering workflows.

The difference matters a lot. When your system acts automatically, the cost of a mistake is not the same, and the engineering requirements usually increase very fast.

From my experience, most production systems can be broken down into:

→ The data storage system in production, meaning that all data is stored in file systems or databases that are safely hosted in production environments (cloud or on-premise).

→ The data acquisition part in production, meaning a system or workflow that connects to production databases and retrieves the data that will be used as input for the model. These workflows can contain the data preparation steps.

→ Pushing the machine learning component into production, which is the part that interests us. It means the model is already trained, and we need a system that allows it to run in the same environment as the other components.

These three parts show clearly that ML in production is not about the machine learning model itself; it’s about everything around it.

But let’s focus only on component 3, “pushing the ML into production,” because the other steps are often handled by different teams in a company.


The 5-Step Breakdown

If I had to explain to a junior data scientist how to work on this component, I’d separate it as follows:

Step 1: The Function

You begin with a trained model. The very first thing you need is a function: some code that loads the model, receives input data, performs the prediction, and returns an output.

At this stage, everything works locally. It’s exciting the first time you see predictions appear, but we don’t want to stop there.

A practical detail that matters early: don’t only think “does it predict?”, also think “does it fail cleanly?” In production, your function will eventually receive weird inputs: missing values, unexpected categories, corrupted files, or out-of-range signals. Your future self will thank you for basic validation and clear error messages.
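Here’s a minimal sketch of such a function, assuming a scikit-learn model saved with joblib; the file path and feature names are hypothetical:

```python
import joblib
import numpy as np

# Hypothetical artifact path and feature names, for illustration only.
MODEL_PATH = "model.joblib"
EXPECTED_FEATURES = ["temperature", "load", "hour_of_day"]

_model = joblib.load(MODEL_PATH)  # load once, not on every call

def predict_one(record: dict) -> float:
    """Validate one input record, then return a single prediction."""
    missing = [f for f in EXPECTED_FEATURES if f not in record]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    values = [record[f] for f in EXPECTED_FEATURES]
    if not all(isinstance(v, (int, float)) and np.isfinite(v) for v in values):
        raise ValueError("All features must be finite numbers")
    x = np.array(values, dtype=float).reshape(1, -1)
    return float(_model.predict(x)[0])
```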

Step 2: The Interface

To make this function usable by others (without asking them to run your code), you need an interface, most often an API.

Once deployed, this API receives standardized requests containing input data, passes them to your prediction function, and returns the output. This is what allows other systems, applications, or users to interact with your model.

And here’s a production reality: the interface is not only a technical thing, it’s a contract. If another system expects /predict and you expose something else, friction is guaranteed. The same applies if you change the schema every two weeks. When teams say “the model is in production,” many times what they really mean is “we created a contract that other people depend on.”
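As a sketch only, here’s what a minimal FastAPI wrapper around the Step 1 function could look like; the request schema is an assumption, something you would agree on with the consumers of the contract:

```python
# A minimal /predict endpoint with FastAPI and Pydantic (v2).
# The schema and the imported predict_one() are illustrative assumptions.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from predictor import predict_one  # the Step 1 function, assumed to live in predictor.py

app = FastAPI()

class PredictRequest(BaseModel):
    temperature: float
    load: float
    hour_of_day: int

class PredictResponse(BaseModel):
    score: float

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    try:
        score = predict_one(request.model_dump())
    except ValueError as exc:
        # Invalid inputs come back as a clear 422, not a stack trace.
        raise HTTPException(status_code=422, detail=str(exc))
    return PredictResponse(score=score)
```

Once this is deployed, `POST /predict` with that fixed schema is the contract other systems build against.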

Step 3: The Environment

Now we need portability. That means packaging the environment: the code, the API, and all dependencies, so the system can run elsewhere without modification.

If you’ve followed the steps so far, you’ve built a model, wrapped it in a function, and exposed it through an API. But none of that matters if everything stays locked in your local environment.

This is where things become more professional: reproducibility, versioning, and traceability. Not necessarily fancy, just enough so that if you deploy v1.2 today, you can explain in three months what changed and why.
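One lightweight way to get that traceability, shown purely as an illustration, is to write a small metadata file next to every model artifact you ship:

```python
import json
import subprocess
from datetime import datetime, timezone

def write_model_metadata(version: str, metrics: dict, path: str = "model_meta.json") -> None:
    """Record what was shipped, so 'what changed in v1.2?' has an answer later."""
    metadata = {
        "version": version,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        # Assumes training runs inside a git repository.
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
        "metrics": metrics,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)

# Illustrative values only.
write_model_metadata("1.2.0", {"rmse": 0.42})
```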

Step 4: The Infrastructure

The next step is hosting everything somewhere users or applications can actually access it.

In practice, this often means the cloud, but it can also be internal company servers or edge infrastructure. The key point is that what you built must be reachable, stable, and usable where it’s needed.

And this is where many teams learn a hard lesson. In production, the “best model” is often not the one with the best metric in a notebook. It’s the one that fits real constraints: latency, cost, security, regulation, monitoring, maintainability, and sometimes simply, “can we operate this with the team we have?”

Step 5: The Monitoring

You can have the cleanest API and the nicest infrastructure, and still fail in production because you don’t see problems early.

A model in production that isn’t monitored is essentially broken already; you just don’t know it yet.

Monitoring doesn’t have to be complicated. At minimum, you want to know (a minimal sketch follows the list below):

  • Is the service up, and is latency tolerable?
  • Do inputs still look “normal”?
  • Are the model’s outputs drifting?
  • Does the business impact still make sense?
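As an illustration only, here’s a deliberately simple input drift check. It assumes you logged a key feature’s mean and standard deviation at training time; the numbers, names, and threshold are hypothetical:

```python
import numpy as np

# Training-time statistics for one key input feature (illustrative values).
TRAIN_MEAN = 12.4
TRAIN_STD = 3.1

def input_looks_normal(recent_values: np.ndarray, max_z: float = 3.0) -> bool:
    """Flag when the recent input mean drifts too far from training."""
    z_score = abs(recent_values.mean() - TRAIN_MEAN) / TRAIN_STD
    return z_score <= max_z

# Example: check the last batch of inputs seen in production.
recent = np.array([11.8, 13.0, 12.1, 14.2])
if not input_looks_normal(recent):
    print("ALERT: input distribution has drifted; investigate before trusting outputs")
```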

In many real-world projects, performance doesn’t collapse with a big crash. It decays quietly.

Having all these components in place is what turns a model into something useful and impactful. Based on my experience:

For Step 1 (The Function), stick with standard tools (scikit-learn, PyTorch, TensorFlow), but think about portability early. Formats like ONNX can make future automation much easier. If you develop your own packages, make sure, whether you’re a manager or a data scientist, that the required software engineering or data engineering skills are present, because building internal libraries is a very different story from using off-the-shelf tools.
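As an example of that portability, a scikit-learn model can be exported with the skl2onnx package; this sketch assumes a simple three-feature regressor trained on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a toy model on random data (illustration only).
X, y = np.random.rand(100, 3), np.random.rand(100)
model = RandomForestRegressor(n_estimators=10).fit(X, y)

# Convert to ONNX: any ONNX runtime can now serve it, independent of scikit-learn.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 3]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```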

For Step 2 (The Interface), frameworks like FastAPI work very well, but always think about the consumer. If another system expects /predict and you expose something else, friction is guaranteed. You need to be aligned with your stakeholders; all technical points about where the machine learning output goes should be very clear.

For Step 3 (The Environment), this is where Docker comes in. You don’t have to master everything immediately, but you should understand the fundamentals. Think of Docker as putting everything you built into a box that can run almost anywhere. If you already have good data engineering skills, this should be fine. If not, you either need to build them or rely on someone on the team who has them.
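To give a concrete idea, a minimal Dockerfile for the FastAPI service sketched earlier might look like this; the Python version and file names are assumptions:

```dockerfile
# Illustrative only: package the API, the model artifact, and dependencies together.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build it once (for example, `docker build -t my-model:1.2.0 .`) and the same image runs on your laptop, an internal server, or the cloud.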

For Step 4 (The Infrastructure), constraints dictate decisions: Lambda, microservices, edge devices, and of course, GPUs. ML workloads often need specialized infrastructure, sometimes via managed services like SageMaker.


Across all steps, one rule that saves lives: always keep a simple way to roll back. Production is not only about deploying; it’s also about recovering when reality hits.

Don’t think of this part of your data science project as a single milestone. It’s a sequence of steps and a shift of mindset. In a company, we are not waiting for you to push the most complicated model; we want you to build a model that answers business questions or adds a feature expected by a specific product. We need this model to reach the product or the user, and to be monitored so that people keep trusting and using it.

Understanding your environment is very important. The tools I mentioned can differ from one team to another, but the methodology is the same. I’m sharing them only to give you a concrete idea.

You can build a great model, but if nobody uses it, it doesn’t matter.

And if people do use it, then it becomes real: it needs ownership, monitoring, constraints, and a system around it.

Don’t let your work stay in the 87%.


🤝 Stay Connected

If you enjoyed this article, feel free to follow me on LinkedIn for more honest insights about AI, Data Science, and careers.

👉 LinkedIn: 

👉 Medium: https://medium.com/@sabrine.bendimerad1

👉 Instagram: https://tinyurl.com/datailearn
