
Ensuring Trustworthy ML Systems With Data Validation and Real-Time Monitoring


Theoretical Concepts & Tools

Data Validation: Data validation refers to the process of ensuring data quality and integrity. What do I mean by that?

As you automatically gather data from different sources (in our case, an API), you need a way to continually validate that the data you just extracted follows a set of rules that your system expects.

For instance, you expect that the energy consumption values are:

  • of type float,
  • not null,
  • ≥0.
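
As a minimal sketch, these three rules could be checked in plain Python before ingestion (this is illustrative only; in the project itself this role is played by a Great Expectations suite, shown below):

```python
def validate_energy_values(values: list) -> list:
    """Check the 'data contract' for energy consumption values:
    each value must be a non-null float that is >= 0."""
    errors = []
    for i, v in enumerate(values):
        if v is None:
            errors.append(f"row {i}: value is null")
        elif not isinstance(v, float):
            errors.append(f"row {i}: expected float, got {type(v).__name__}")
        elif v < 0:
            errors.append(f"row {i}: value {v} is negative")
    return errors

# Reports the null, the negative value, and the int row:
print(validate_energy_values([12.5, None, -3.0, 7]))
```

If the returned list is non-empty, the batch violates the contract and should not be inserted into the feature store.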

When you developed the ML pipeline, the API returned only values that respected these terms, or, as data people call it, a "data contract."

But as you let your system run in production for 1 month, 1 year, 2 years, etc., you never know what might change in data sources you don't have control over.

Thus, you need a way to always check these characteristics before ingesting the data into the Feature Store.

Note: To see how to extend this idea to unstructured data, such as images, you can check my Master Data Integrity to Clean Your Computer Vision Datasets article.

Great Expectations (aka GE): GE is a popular tool that easily lets you do data validation and report the results. Hopsworks has GE support. You can add a GE validation suite to Hopsworks and choose how to behave when new data is inserted and the validation step fails; read more about GE + Hopsworks [2].

Screenshot of GE data validation runs inside Hopsworks [Image by the Author].

Ground Truth Types: While your model is running in production, you can have access to your ground truth in 3 different scenarios:

  1. real-time: an ideal scenario where you can easily access your target. For example, when you recommend an ad and the consumer either clicks it or not.
  2. delayed: eventually, you will access the ground truths. But, unfortunately, it will be too late to react adequately in time.
  3. none: you can't automatically collect any GT. Often, in these cases, you have to hire human annotators if you need any actuals.
Ground truth/targets/actuals types [Image by the Author].

In our case, we're somewhere between #1 and #2. The GT isn't precisely in real time, but it has a delay of only 1 hour.

Whether a delay of 1 hour is OK depends a lot on the business context, but let's say that, in your case, it's okay.

As we considered that a delay of 1 hour is okay for our use case, we're in luck: we have access to the GT in real-time(ish).

This means we can use metrics such as MAPE to monitor the model's performance in real-time(ish).
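
For reference, MAPE (Mean Absolute Percentage Error) takes only a few lines of Python; this is a generic sketch, not the exact code from the pipeline:

```python
def mape(actuals: list, predictions: list) -> float:
    """Mean Absolute Percentage Error, in percent.
    Note: undefined when an actual value is 0, which is not an
    issue for strictly positive energy consumption values."""
    assert len(actuals) == len(predictions)
    return 100 * sum(
        abs((a - p) / a) for a, p in zip(actuals, predictions)
    ) / len(actuals)

print(mape([100.0, 200.0], [110.0, 180.0]))  # → 10.0
```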

In scenarios 2 or 3, we would need to use data & concept drift as proxy metrics to compute performance signals in time.
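
When the GT is delayed or missing, one common proxy is to compare the distribution of incoming features (or predictions) against a reference window. Here is a deliberately crude sketch; the z-score test and the threshold of 3 standard deviations are assumptions for illustration, not the method used in the project:

```python
import statistics

def drifted(reference: list, current: list, z_threshold: float = 3.0) -> bool:
    """Flag drift when the current window's mean moves more than
    z_threshold reference standard deviations from the reference mean."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    z = abs(statistics.mean(current) - ref_mean) / ref_std
    return z > z_threshold

ref = [10.0, 11.0, 9.0, 10.5, 9.5]
print(drifted(ref, [10.2, 9.8, 10.1]))   # stable window → False
print(drifted(ref, [25.0, 26.0, 24.5]))  # shifted window → True
```

Production tools usually apply more robust tests (e.g., population stability index or Kolmogorov-Smirnov), but the idea is the same: a distribution shift is an early warning that the model's performance may be degrading.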

Screenshot with the observations and predictions overlapped over time. As you can see, the GT isn't available for the most recent 24 hours of forecasts [Image by the Author].

ML Monitoring: ML monitoring is the process of ensuring that your production system works well over time. Also, it gives you a mechanism to proactively adapt your system, such as retraining your model in time or adapting it to new changes in the environment.

In our case, we will continually compute the MAPE metric. Thus, if the error suddenly spikes, you can create an alarm to notify you or automatically trigger a hyperparameter tuning step to adapt the model configuration to the new environment.
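
A minimal version of such an alarm check could look like this; the 15% threshold and the function name are illustrative, not taken from the project:

```python
MAPE_ALERT_THRESHOLD = 15.0  # percent; hypothetical value

def check_mape_alarm(latest_mape: float,
                     threshold: float = MAPE_ALERT_THRESHOLD) -> bool:
    """Return True when the monitored error spikes above the threshold."""
    if latest_mape > threshold:
        # In a real system, this branch would page you or trigger a
        # hyperparameter tuning / retraining job instead of printing.
        print(f"ALERT: MAPE spiked to {latest_mape:.1f}%")
        return True
    return False

check_mape_alarm(8.3)    # below threshold, stays quiet
check_mape_alarm(22.7)   # fires the alert
```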

Screenshot with the mean MAPE metric between all the time series computed over time [Image by the Author].
