Prescriptive Modeling Unpacked: A Complete Guide to Intervention With Bayesian Modeling.


In this article, I'll demonstrate how to move from simply forecasting outcomes to actively intervening in systems to steer them toward desired goals. With hands-on examples in predictive maintenance, I'll show how data-driven decisions can optimize operations and reduce downtime.

We are all familiar with descriptive analysis, where we examine "what has happened". In predictive analysis, we aim for insights and determine "what will happen". With Bayesian prescriptive modeling, we can go beyond prediction and aim to intervene in the outcome. I'll demonstrate how you can use data to answer the question "how can we make it happen?". To do this, we need to understand the complex relationships between variables in a (closed) system. Modeling causal networks is essential, and in addition, we need to make inferences to quantify how the system is affected for the desired outcome. In this article, I'll briefly start by explaining the theoretical background. In the second part, I'll demonstrate a hands-on use case in predictive maintenance.



What You Need To Know About Prescriptive Analysis: A Brief Introduction.

Prescriptive analysis may be the most powerful way to understand your business performance and trends and to optimize for efficiency, but it is certainly not the first step to take in your analysis. The first step should be, as always, understanding the data through Exploratory Data Analysis (EDA). This is the step where we need to figure out "what kind of data we are dealing with". It is super important because it provides us with deeper insights into the variables and their dependencies in the system, which subsequently helps to clean, normalize, and standardize the variables in our data set. A cleaned data set is the foundation of every analysis.

With the cleaned data set, we can start working on our model. Typically, for these types of analyses, we need a lot of data. The reason is simple: the more accurately we can learn a model that fits the data, the better we can detect causal relationships. In this article, I'll use the notion of "driver variables" frequently, so let me first define it: a driver variable is a variable that has a direct causal influence on the outcome of interest.

As an example, suppose we have a healthcare system that contains information about patients with their symptoms, treatments, genetics, environmental variables, and behavioral information. If we understand the causal process, we can intervene by influencing (one or multiple) driver variables. To improve the patient's outcome, we may only need a relatively small change, such as improving their diet. Importantly, the variable that we aim to influence or intervene on must be a driver variable to make it impactful. Generally speaking, changing variables for a desired outcome is something we do in our daily lives: from closing the window to prevent rain from coming in, to the advice from friends, family, or professionals that we consider for a specific outcome. But this can also be more of a trial-and-error procedure. With prescriptive analysis, we aim to determine the driver variables and then quantify what happens on intervention.

Throughout this article, I'll focus on applications with systems that include physical components, such as bridges, pumps, and dikes, together with environmental variables such as rainfall, river levels, and soil erosion, and human decisions (e.g., maintenance schedules and costs). In the field of water management, there are classic cases of complex systems where prescriptive analysis can offer serious value. A great candidate for prescriptive analysis is predictive maintenance, which can increase operational time and reduce costs. Such systems often contain various sensors, making them data-rich. At the same time, the variables in such systems are often interdependent, meaning that actions in one part of the system often ripple through and affect others. For instance, opening a floodgate upstream can change water pressure and flow dynamics downstream. This interconnectedness is exactly why understanding causal relationships is vital. When we understand the crucial parts of the entire system, we can intervene more accurately. With Bayesian modeling, we aim to uncover and quantify these causal relationships.


Bayesian Networks and Causal Inference: The Building Blocks.

At its core, a Bayesian network is a graphical model that represents probabilistic relationships between variables. When these networks encode causal relationships, they are powerful tools for prescriptive modeling. Let's break this down using a classic example: the sprinkler system. Suppose you're trying to figure out why your grass is wet. One possibility is that you turned on the sprinkler; another is that it rained. The weather plays a role too; on cloudy days, it's more likely to rain, and the sprinkler might behave differently depending on the forecast. These dependencies form a network of causal relationships that we can model. With bnlearn for Python, we can model the relationships as shown in the code block:

# Install Python bnlearn package
pip install bnlearn
# Import library
import bnlearn as bn

# Define the causal relationships
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]

# Create the Bayesian network
DAG = bn.make_DAG(edges)

# Visualize the network
bn.plot(DAG)
Figure 1: DAG for the sprinkler system. It encodes the following logic: wet grass depends on sprinkler and rain. The sprinkler depends on cloudy, and rain depends on cloudy (image by author).

This creates a Directed Acyclic Graph (DAG) where each node represents a variable, each edge represents a causal relationship, and the direction of the edge shows the direction of causality. So far, we have not modeled any data, but only provided the causal structure based on our own domain knowledge about the weather, together with our understanding/hypothesis of the system. Important to understand is that such a DAG forms the basis for Bayesian learning!
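To illustrate how expert knowledge can be attached to this structure, here is a minimal sketch that defines conditional probability tables (CPTs) for the sprinkler DAG. The probability values below are illustrative assumptions, not estimates from data; bnlearn builds on pgmpy, so the CPTs are expressed as TabularCPD objects.

# Import libraries (bnlearn builds on pgmpy)
from pgmpy.factors.discrete import TabularCPD
import bnlearn as bn

# Illustrative CPTs: the probabilities are assumptions, not learned from data
cpt_cloudy = TabularCPD(variable='Cloudy', variable_card=2, values=[[0.5], [0.5]])
cpt_sprinkler = TabularCPD(variable='Sprinkler', variable_card=2,
                           values=[[0.5, 0.9], [0.5, 0.1]],
                           evidence=['Cloudy'], evidence_card=[2])
cpt_rain = TabularCPD(variable='Rain', variable_card=2,
                      values=[[0.8, 0.2], [0.2, 0.8]],
                      evidence=['Cloudy'], evidence_card=[2])
cpt_wet_grass = TabularCPD(variable='Wet_Grass', variable_card=2,
                           values=[[1.0, 0.1, 0.1, 0.01],
                                   [0.0, 0.9, 0.9, 0.99]],
                           evidence=['Sprinkler', 'Rain'], evidence_card=[2, 2])

# Attach the CPTs to the DAG from the previous code block and print them
DAG = bn.make_DAG(edges, CPD=[cpt_cloudy, cpt_sprinkler, cpt_rain, cpt_wet_grass])
bn.print_CPD(DAG)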

Learning Structure from Data.

In many cases, we don't know the causal relationships beforehand, but we do have data from which we can learn the structure. The bnlearn library provides several structure-learning approaches that can be chosen based on the type of input data (discrete, continuous, or mixed data sets). But the choice of algorithm can also be based on the type of network you aim for. You can, for instance, set a root node if you have a good reason for it; a sketch follows the next code block.

# Import library
import bnlearn as bn

# Load Sprinkler data set
df = bn.import_example(data='sprinkler')

# Show dataframe
print(df)
+--------+------------+------+------------+
| Cloudy | Sprinkler | Rain | Wet_Grass   |
+--------+------------+------+------------+
|   0    |     0      |  0   |     0      |
|   1    |     0      |  1   |     1      |
|   0    |     1      |  0   |     1      |
|   1    |     1      |  1   |     1      |
|   1    |     1      |  1   |     1      |
|  ...   |    ...     | ...  |    ...     |
+--------+------------+------+------------+
[1000 rows x 4 columns]

# Structure learning
model = bn.structure_learning.fit(df)

# Visualize the network
bn.plot(model)
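On the point about root nodes: below is a hedged sketch of a tree-based search with a fixed root, assuming the methodtype and root_node arguments as described in the bnlearn documentation.

# Chow-Liu tree search with 'Cloudy' fixed as the root node
# (hedged sketch; arguments as per the bnlearn documentation)
model_cl = bn.structure_learning.fit(df, methodtype='cl', root_node='Cloudy')
bn.plot(model_cl)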

DAGs Matter for Causal Inference.

The bottom line is that Directed Acyclic Graphs (DAGs) depict the causal relationships between the variables. This learned model forms the basis for making inferences and answering questions like: what is the probability of wet grass given that the sprinkler is on?

Making inferences is crucial for prescriptive modeling because it helps us understand and quantify the impact of the variables on intervention. As mentioned before, not all variables in systems are of interest or subject to intervention. In our simple use case, we can intervene on wet grass through the sprinkler, but we cannot intervene on rain through cloudy conditions, because we cannot control the weather.
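As a minimal sketch of such a question on the sprinkler model (assuming the model and df from the previous code blocks), we first learn the parameters and then query the probability of wet grass given that the sprinkler is on:

# Learn the conditional probability tables (CPTs) on the sprinkler data
model = bn.parameter_learning.fit(model, df)

# P(Wet_Grass | Sprinkler=1): query the learned model
q = bn.inference.fit(model, variables=['Wet_Grass'], evidence={'Sprinkler': 1})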


Generate Synthetic Data in Case You Only Have Experts’ Knowledge or Few Samples.

In many domains, such as healthcare, finance, cybersecurity, and autonomous systems, real-world data can be sensitive, expensive, imbalanced, or difficult to collect, particularly for rare or edge-case scenarios. This is where synthetic data becomes a powerful alternative. There are, roughly speaking, two main categories of creating synthetic data: probabilistic and generative. If you would like to know more, I recommend reading this blog about it [3]. The following four scenarios can be distinguished (a sampling sketch follows the list):

  1. Generate synthetic data that mimics existing continuous measurements (expected with independent variables).
  2. Generate synthetic data that mimics expert knowledge (expected to be continuous and with independent variables).
  3. Generate synthetic data that mimics an existing categorical data set (expected with dependent variables).
  4. Generate synthetic data that mimics expert knowledge (expected to be categorical and with dependent variables).
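For the probabilistic category, here is a minimal sketch of generating synthetic data by forward sampling from a Bayesian network fitted on the sprinkler data, assuming bn.sampling as described in the bnlearn documentation:

# Import library
import bnlearn as bn

# Fit a network on the small sprinkler data set
df = bn.import_example(data='sprinkler')
DAG = bn.make_DAG([('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
                   ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')])
model = bn.parameter_learning.fit(DAG, df)

# Generate 10,000 synthetic samples that follow the learned distributions
df_synthetic = bn.sampling(model, n=10000)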

A Real-World Use Case in Predictive Maintenance.

So far, I have briefly described the Bayesian theory and demonstrated how to learn structures using the sprinkler data set. In this section, we will work with a complex real-world data set to determine the causal relationships, perform inferences, and assess whether we can recommend interventions in the system to change the outcome of machine failures. Suppose you're responsible for the engines that operate a water lock, and you're trying to understand what factors drive potential machine failures, because your goal is to keep the engines running without failures. In the following sections, we will step through the data modeling parts and try to figure out how we can keep the engines running without failures.

Photo by Jani Brumat on Unsplash.

Step 1: Data Understanding.

The data set we will use is a predictive maintenance data set [1]. It captures a simulated but realistic representation of sensor data from machinery over time. In our case, we treat this as if it were collected from a complex infrastructure system, such as the motors controlling a water lock, where equipment reliability is critical. See the code block below to load the data set.

# Import library
import bnlearn as bn

# Load data set
df = bn.import_example('predictive_maintenance')

# print dataframe
+-------+------------+------+------------------+----+-----+-----+-----+-----+
|  UDI | Product ID  | Type | Air temperature  | .. | HDF | PWF | OSF | RNF |
+-------+------------+------+------------------+----+-----+-----+-----+-----+
|    1 | M14860      |   M  | 298.1            | .. |   0 |   0 |   0 |   0 |
|    2 | L47181      |   L  | 298.2            | .. |   0 |   0 |   0 |   0 |
|    3 | L47182      |   L  | 298.1            | .. |   0 |   0 |   0 |   0 |
|    4 | L47183      |   L  | 298.2            | .. |   0 |   0 |   0 |   0 |
|    5 | L47184      |   L  | 298.2            | .. |   0 |   0 |   0 |   0 |
| ...  | ...         | ...  | ...              | .. | ... | ... | ... | ... |
| 9996 | M24855      |   M  | 298.8            | .. |   0 |   0 |   0 |   0 |
| 9997 | H39410      |   H  | 298.9            | .. |   0 |   0 |   0 |   0 |
| 9998 | M24857      |   M  | 299.0            | .. |   0 |   0 |   0 |   0 |
| 9999 | H39412      |   H  | 299.0            | .. |   0 |   0 |   0 |   0 |
|10000 | M24859      |   M  | 299.0            | .. |   0 |   0 |   0 |   0 |
+-------+------------+------+------------------+----+-----+-----+-----+-----+
[10000 rows x 14 columns]

The predictive maintenance data set is a so-called mixed-type data set containing a mixture of continuous, categorical, and binary variables. It captures operational data from machines, including both sensor readings and failure events. For instance, it includes sensor readings like rotational speed, torque, and tool wear (all continuous variables reflecting how the machine is behaving over time). Alongside these, we have categorical information such as the machine type, and environmental measurements like air temperature. The data set also records whether specific types of failures occurred, such as tool wear failure or heat dissipation failure, represented as binary variables. This mixture of variables allows us to not only observe what happens under different conditions but also to explore the potential causal relationships that may drive machine failures.

Table 1: Overview of the variables in the predictive maintenance data set. There are various kinds of variables: identifiers, sensor readings, and target variables (failure indicators). Each variable is characterized by its role, data type, and a brief description.

Step 2: Data Cleansing

Before we can begin learning the causal structure of this system using Bayesian methods, we first need to perform some pre-processing steps. The first step is to remove irrelevant columns, such as the unique identifiers (UDI and Product ID), which hold no meaningful information for modeling. In this data set, there are no missing values; if there were, we would have needed to impute or remove them. For that purpose, bnlearn provides two imputation methods for handling missing data: the K-Nearest Neighbor imputer (knn_imputer) and the MICE imputation approach (mice_imputer). Both methods follow a two-step approach in which first the numerical values are imputed and then the categorical values. This two-step approach is an enhancement on existing methods for handling missing values in mixed-type data sets. A hedged sketch of the imputation call follows the code block below.

# Remove IDs from Dataframe
del df['UDI']
del df['Product ID']
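For completeness, here is a hedged sketch of what the imputation call could look like if this data set did have missing values, assuming the knn_imputer signature from the bnlearn documentation:

# Hedged sketch: impute missing values with the K-Nearest Neighbor imputer
# (numerical columns first, then categorical), assuming bnlearn's impute API
df_imputed = bn.impute.knn_imputer(df, n_neighbors=3)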

Step 3: Discretization Using Probability Density Functions.

Most Bayesian models are designed to model categorical variables. Continuous variables can distort computations because they require assumptions about the underlying distributions, which are not always easy to validate. For data sets that contain both continuous and discrete variables, it is best to discretize the continuous variables. There are multiple ways to discretize, and in bnlearn the following solutions are implemented:

  1. Discretize using probability density fitting. This approach automatically fits the best distribution to the variable and bins it into 95% confidence intervals (the thresholds can be adjusted). A semi-automatic approach is advisable because the default CII (upper, lower) intervals may not correspond to meaningful domain-specific boundaries.
  2. Discretize using a principled Bayesian discretization method. This approach requires providing the DAG before applying the discretization method. The underlying idea is that experts' knowledge is included in the discretization, which therefore increases the accuracy of the binning.
  3. Don't discretize, but model continuous and hybrid data sets in a semi-parametric approach. Two approaches implemented in bnlearn can handle mixed data sets: direct-lingam and ica-lingam, which both assume linear relationships.
  4. Manually discretize using the expert's domain knowledge. Such a solution can be helpful, but it requires expert-level mechanical knowledge or access to detailed operational thresholds. A limitation is that it can introduce bias into the variables, because the thresholds reflect subjective assumptions and may not capture the true underlying variability or relationships in the data (a minimal sketch follows this list).
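For option 4, here is a minimal sketch of manual discretization with pandas; the cut points below are illustrative assumptions, not values from the data set documentation:

# Import library
import pandas as pd

# Hypothetical expert thresholds for Torque in Nm (illustrative only)
bins = [0, 30, 50, float('inf')]
labels = ['low', 'medium', 'high']
df['Torque [Nm]_category'] = pd.cut(df['Torque [Nm]'], bins=bins, labels=labels)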

Approaches 2 and 3 may be less suitable for our current use case: Bayesian discretization methods often require strong priors or assumptions about the system (DAG) that I cannot confidently provide, while the semi-parametric approach may introduce unnecessary complexity for this relatively small data set. The discretization approach that I'll use is a combination of probability density fitting [3] together with the specifications about the operating ranges of the mechanical devices. I don't have expert-level mechanical knowledge to confidently set the thresholds. However, the specifications for normal mechanical operations are listed in the documentation [1]. Let me elaborate on this. The data set description lists the following specifications: Air temperature is measured in Kelvin, around 300 K with a standard deviation of 2 K. The process temperature during the manufacturing process is roughly the air temperature plus 10 K. The rotational speed of the machine is in revolutions per minute and is calculated from a power of 2860 W. The torque is in Newton-meters, around 40 Nm and without negative values. The tool wear is the cumulative number of minutes of use. With this information, we can decide whether we need to set lower and/or upper boundaries for our probability density fitting approach.

Table 2: Overview of how the continuous sensor variables are discretized using probability density fitting, taking the expected operating ranges of the machinery into account.

See Table 2, where I defined the normal and critical operating ranges, and the code block below to set the threshold values based on the data distributions of the variables.

pip install distfit

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from distfit import distfit

# Discretize the following columns
colnames = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
colors = ['#87CEEB', '#FFA500', '#800080', '#FF4500', '#A9A9A9']

# Apply distribution fitting to each variable
for colname, color in zip(colnames, colors):
    # Initialize and set 95% confidence interval
    if colname=='Tool wear [min]' or colname=='Process temperature [K]':
        # Set model parameters to determine the medium-high ranges
        dist = distfit(alpha=0.05, bound='up', stats='RSS')
        labels = ['medium', 'high']
    else:
        # Set model parameters to determine the low-medium-high ranges
        dist = distfit(alpha=0.05, stats='RSS')
        labels = ['low', 'medium', 'high']

    # Distribution fitting
    dist.fit_transform(df[colname])

    # Plot
    dist.plot(title=colname, bar_properties={'color': color})
    plt.show()

    # Define bins based on distribution
    bins = [df[colname].min(), dist.model['CII_min_alpha'], dist.model['CII_max_alpha'], df[colname].max()]
    # Remove None
    bins = [x for x in bins if x is not None]

    # Discretize using the defined bins and add to dataframe
    df[colname + '_category'] = pd.cut(df[colname], bins=bins, labels=labels, include_lowest=True)
    # Delete the original column
    del df[colname]

This semi-automated approach determines the optimal binning for each variable given the critical operating ranges. We thus fit a probability density function (PDF) to each continuous variable and use statistical properties, such as the 95% confidence interval, to define categories like low, medium, and high. This preserves the underlying distribution of the data while still allowing for interpretable discretization aligned with the natural variation in the system, creating bins that are both statistically sound and interpretable. As always, plot the results and run sanity checks, because the resulting intervals may not always align with meaningful, domain-specific thresholds. See Figure 2 with the estimated PDFs and thresholds for the continuous variables. In this scenario, we see nicely that two variables are binned into medium-high, while the rest are binned into low-medium-high.

Figure 2: Estimated probability density function (PDF) and thresholds for each continuous variable based on the 95% confidence interval.

Step 4: The Final Cleaned Data set.

At this point, we have a cleaned and discretized data set. The remaining variables in the data set are the failure modes (TWF, HDF, PWF, OSF, RNF), which are boolean variables for which no transformation step is required. These variables are kept in the model because of their possible relationships with the other variables. For instance, Torque may be linked to OSF (overstrain failure), air temperature differences to HDF (heat dissipation failure), and tool wear to TWF (tool wear failure). The data set description states that if at least one failure mode is true, the process fails and the Machine Failure label is set to 1. It is, however, not transparent which of the failure modes caused the process to fail. In other words, the Machine Failure label is a composite outcome: it only tells you that something went wrong, but not which causal path led to the failure.
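We can verify this composite behavior directly on the dataframe; a minimal sketch (if the description holds, the check below should print True):

# Machine failure should equal the OR over the individual failure modes [1]
failure_modes = ['TWF', 'HDF', 'PWF', 'OSF', 'RNF']
composite = df[failure_modes].any(axis=1).astype(int)
print((composite == df['Machine failure']).all())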

Step 5: Learning The Causal Structure.

In this step, we will determine the causal relationships. In contrast to supervised machine learning approaches, we do not have to set a target variable such as Machine Failure. The Bayesian model will learn the causal relationships from the data using a search strategy and a scoring function. The scoring function quantifies how well a particular DAG explains the observed data, and the search strategy walks efficiently through the search space of DAGs to eventually find the most optimal DAG without testing them all. For this use case, we will use HillClimbSearch as the search strategy and the Bayesian Information Criterion (BIC) as the scoring function. See the code block to learn the structure using Python bnlearn.

# Structure learning
model = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
# [bnlearn] >Warning: Computing DAG with 12 nodes can take a very long time!
# [bnlearn] >Computing best DAG using [hc]
# [bnlearn] >Set scoring type at [bds]
# [bnlearn] >Compute structure scores for model comparison (higher is better).

print(model['structure_scores'])
# {'k2': -23261.534992034045,
# 'bic': -23296.9910477033,
# 'bdeu': -23325.348497769708,
# 'bds': -23397.741317668322}

# Compute edge weights using ChiSquare independence test.
model = bn.independence_test(model, df, test='chi_square', prune=True)

# Plot the most effective DAG
bn.plot(model, edge_labels='pvalue', params_static={'maxscale': 4, 'figsize': (15, 15), 'font_size': 14, 'arrowsize': 10})

dotgraph = bn.plot_graphviz(model, edge_labels='pvalue')
dotgraph

# Store to pdf
dotgraph.view(filename='bnlearn_predictive_maintenance')

Each model can be scored based on its structure. However, the scores do not have a straightforward interpretation, but they can be used to compare different models. A higher score represents a better fit, but keep in mind that scores are usually log-likelihood based, so a less negative score is better. From the results, we can see that K2=-23261 scored the best, meaning that the learned structure had the best fit on the data.

However, the difference with BIC=-23296 is very small. In that case, I prefer selecting the DAG determined by BIC over K2, as DAGs detected with BIC are generally sparser, and thus cleaner, because BIC adds a penalty for complexity (number of parameters, number of edges). The K2 approach, on the other hand, determines the DAG purely on the likelihood, i.e., the fit on the data. Thus, there is no penalty for creating a more complex network (more edges, more parents).

Figure 3: DAG based on HillClimbSearch and the BIC scoring function. All continuous values are discretized using distfit with the 95% confidence intervals. The edges are labeled with the -log10(P-values) determined using the chi-square test. The plot is created using bnlearn. Image by the author.

Discover Potential Interventions for Machine Failure.

I introduced the idea that Bayesian analysis enables active intervention in a system, meaning that we can steer towards our desired outcomes, aka prescriptive analysis. To do so, we first need a causal understanding of the system. At this point, we have obtained our DAG (Figure 3) and can start interpreting it to determine the possible driver variables of machine failures.

From Figure 3, it can be observed that the Machine Failure label is a composite outcome; it is influenced by multiple underlying variables. We can use the DAG to systematically identify the variables for intervention on machine failures. Let's start by examining the root variable, which is PWF (power failure). The DAG shows that preventing power failures would directly contribute to preventing machine failures overall. Although this finding is intuitive (power issues lead to system failure), it is important to recognize that this conclusion has been derived purely from data. If it were a different variable, we would have needed to think about what it could mean and whether the DAG is accurate for our data set.

When we continue to examine the DAG, we see that Torque is linked to OSF (overstrain failure), air temperature is linked to HDF (heat dissipation failure), and tool wear is linked to TWF (tool wear failure). Ideally, we expect that failure modes (TWF, HDF, PWF, OSF, RNF) are effects, while physical variables like torque, air temperature, and tool wear act as causes. Although structure learning detected these relationships quite well, it does not always capture the correct causal direction purely from observational data. Nonetheless, the discovered edges provide actionable starting points that can be used to design our interventions:

  • Torque → OSF (Overstrain Failure):
    Actively monitoring and controlling torque levels can prevent overstrain-related failures.
  • Air Temperature → HDF (Heat Dissipation Failure):
    Managing the ambient environment (e.g., through improved cooling systems) may reduce heat dissipation issues.
  • Tool Wear → TWF (Tool Wear Failure):
     Real-time tool wear monitoring can prevent tool wear failures.

Moreover, random failures (RNF) are not detected with any outgoing or incoming connections, indicating that such failures are truly stochastic within this data set and cannot be mitigated through interventions on observed variables. This is a great sanity check for the model, because we would not expect RNF to be important in the DAG! The sketch below shows how this can be verified on the adjacency matrix.
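A minimal sketch to verify this on the learned model, assuming the model dict exposes the adjacency matrix as a pandas DataFrame under 'adjmat' (as in bnlearn):

# Check for outgoing/incoming edges of RNF in the adjacency matrix
adjmat = model['adjmat']
print(adjmat.loc['RNF'].any())  # outgoing edges from RNF (expected: False)
print(adjmat['RNF'].any())      # incoming edges to RNF (expected: False)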


Quantify with Interventions.

Up to this point, we have learned the structure of the system and identified which variables can be targeted for intervention. However, we are not finished yet. To make these interventions meaningful, we must quantify the expected outcomes.

This is where inference in Bayesian networks comes into play. Let me elaborate a bit more on this, because when I describe an intervention, I mean changing a variable in the system, like keeping torque at a low level, reducing tool wear before it hits high values, or making sure air temperature stays stable. In this way, we can reason over the learned model: the system is interdependent, and a change in one variable can ripple through the entire system.

Inference is thus essential, and for various reasons:

  1. Forward inference, where we aim to predict future outcomes given current evidence.
  2. Backward inference, where we diagnose the most likely cause after an event has occurred.
  3. Counterfactual inference, to simulate "what-if" scenarios.

In the context of our predictive maintenance data set, inference can now help answer specific questions. But first, we need to learn the inference model, which is done easily as shown in the code block below. With the model, we can start asking questions and see how the effects ripple through the system.

# Learn inference model
model = bn.parameter_learning.fit(model, df, methodtype="bayes")
q = bn.inference.fit(model, variables=['Machine failure'],
                      evidence={'Torque [Nm]_category': 'high'},
                      plot=True)

+-------------------+----------+
|   Machine failure |        p |
+===================+==========+
|                 0 | 0.584588 |
+-------------------+----------+
|                 1 | 0.415412 |
+-------------------+----------+

Machine failure = 0: No machine failure occurred.
Machine failure = 1: A machine failure occurred.

Given that the Torque is high:
There is about a 58.5% likelihood the machine will not fail.
There is about a 41.5% likelihood the machine will fail.

A high Torque value thus significantly increases the risk of machine failure.
Think about it: without conditioning, machine failure probably happens
at a much lower rate. Thus, controlling the torque and keeping it out of
the high range could be an important prescriptive action to prevent failures.
Figure 4. Inference summary. Image by the author.
q = bn.inference.fit(model, variables=['HDF'],
                      evidence={'Air temperature [K]_category': 'medium'},
                      plot=True)

+-------+-----------+
|   HDF |         p |
+=======+===========+
|     0 | 0.972256  |
+-------+-----------+
|     1 | 0.0277441 |
+-------+-----------+

HDF = 0 means "no heat dissipation failure."
HDF = 1 means "there is a heat dissipation failure."

Given that the Air Temperature is kept at a medium level:
There is a 97.22% likelihood that no failure will occur.
There is only a 2.77% likelihood that a failure will occur.
Figure 5. Inference summary. Image by the author.
q = bn.inference.fit(model, variables=['TWF', 'HDF', 'PWF', 'OSF'],
                      evidence={'Machine failure': 1},
                       plot=True)

+----+-------+-------+-------+-------+-------------+
|    |   TWF |   HDF |   PWF |   OSF |           p |
+====+=======+=======+=======+=======+=============+
|  0 |     0 |     0 |     0 |     0 | 0.0240521   |
+----+-------+-------+-------+-------+-------------+
|  1 |     0 |     0 |     0 |     1 | 0.210243    | <- OSF
+----+-------+-------+-------+-------+-------------+
|  2 |     0 |     0 |     1 |     0 | 0.207443    | <- PWF
+----+-------+-------+-------+-------+-------------+
|  3 |     0 |     0 |     1 |     1 | 0.0321357   |
+----+-------+-------+-------+-------+-------------+
|  4 |     0 |     1 |     0 |     0 | 0.245374    | <- HDF
+----+-------+-------+-------+-------+-------------+
|  5 |     0 |     1 |     0 |     1 | 0.0177909   |
+----+-------+-------+-------+-------+-------------+
|  6 |     0 |     1 |     1 |     0 | 0.0185796   |
+----+-------+-------+-------+-------+-------------+
|  7 |     0 |     1 |     1 |     1 | 0.00499062  |
+----+-------+-------+-------+-------+-------------+
|  8 |     1 |     0 |     0 |     0 | 0.21378     | <- TWF
+----+-------+-------+-------+-------+-------------+
|  9 |     1 |     0 |     0 |     1 | 0.00727977  |
+----+-------+-------+-------+-------+-------------+
| 10 |     1 |     0 |     1 |     0 | 0.00693896  |
+----+-------+-------+-------+-------+-------------+
| 11 |     1 |     0 |     1 |     1 | 0.00148291  |
+----+-------+-------+-------+-------+-------------+
| 12 |     1 |     1 |     0 |     0 | 0.00786678  |
+----+-------+-------+-------+-------+-------------+
| 13 |     1 |     1 |     0 |     1 | 0.000854361 |
+----+-------+-------+-------+-------+-------------+
| 14 |     1 |     1 |     1 |     0 | 0.000927891 |
+----+-------+-------+-------+-------+-------------+
| 15 |     1 |     1 |     1 |     1 | 0.000260654 |
+----+-------+-------+-------+-------+-------------+

Each row represents a possible combination of failure modes:

TWF: Tool Wear Failure
HDF: Heat Dissipation Failure
PWF: Power Failure
OSF: Overstrain Failure

Most of the time, when a machine failure occurs, it can be traced back to
exactly one dominant failure mode:
HDF (24.5%)
OSF (21.0%)
PWF (20.7%)
TWF (21.4%)

Combined failures (e.g., HDF + PWF active at the same time) are much
less frequent (<5% combined).

When a machine fails, it is almost always attributable to one specific failure mode and not a combination.
Heat dissipation failure (HDF) is the most common root cause (24.5%), but the others are very close.
Intervening on these individual failure types could significantly reduce machine failures.

I demonstrated three examples using inference with interventions at different points. Remember that to make the interventions meaningful, we must quantify the expected outcomes. If we don't quantify how these actions will change the probability of machine failure, we are only guessing. This quantification is exactly what inference in Bayesian networks does: it updates the probabilities based on our intervention (the evidence), and then tells us how much impact our control action will have. I do have one last section that I want to share, which is about cost-sensitive modeling. The question you should ask yourself is not just "Can I predict or prevent failures?" but also "How cost-effective is it?"


Cost-Sensitive Modeling: Finding the Sweet Spot.

This is the question you should ask yourself before taking action. When we build prescriptive maintenance models and recommend interventions based on model outputs, we must also understand the economic returns. This moves the discussion from pure model accuracy to a cost-optimization framework.

One way to do this is by translating the traditional confusion matrix into a cost-optimization matrix, as depicted in Figure 6. The confusion matrix has the four known states (A), but each state can have a different cost implication (B). For illustration, in Figure 6C, a premature replacement (false positive) costs €2000 in unnecessary maintenance. In contrast, missing a true failure (false negative) can cost €8000 (including €6000 damage and €2000 replacement costs). This asymmetry highlights why cost-sensitive modeling is critical: false negatives are 4x more costly than false positives. A short sketch after Figure 6 illustrates the expected-cost computation.

Figure 6. Cost-sensitive modeling. Image by the author.
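A minimal sketch of the expected cost per machine for a given confusion matrix. The FP/FN costs follow Figure 6C; the TP/TN costs and the confusion-matrix counts are illustrative assumptions.

# Expected cost per machine for a given confusion matrix
def expected_cost(counts, costs):
    # Weight each confusion-matrix cell by its cost and average over all machines
    total = sum(counts[k] * costs[k] for k in counts)
    return total / sum(counts.values())

# Euros per outcome; TP and TN costs are illustrative assumptions
costs = {'TP': 2000, 'FP': 2000, 'TN': 0, 'FN': 8000}
# Hypothetical confusion-matrix counts
counts = {'TP': 40, 'FP': 100, 'TN': 9800, 'FN': 60}
print(f"Expected cost per machine: EUR {expected_cost(counts, costs):.2f}")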

In practice, we should therefore not only optimize for model performance but also minimize the total expected costs. A model with a higher false positive rate (premature replacement) can therefore be more optimal if it significantly reduces the costs compared to the much costlier false negatives (failure). Having said this, it does not mean that we should always go for premature replacements, because besides the costs there is also the timing of replacement. In other words:

The exact moment when equipment must be replaced or serviced is inherently uncertain. Mechanical processes with wear and tear are stochastic, so we cannot expect to know the exact point of optimal intervention. What we can do is search for the so-called sweet spot for maintenance, where intervention is most cost-effective, as depicted in Figure 7.

Figure 7. Finding the optimal replacement time (sweet spot) using ownership and repair costs. Image by the author.

This figure shows how the costs of owning (orange) and repairing (blue) an asset evolve over time. At the beginning of an asset's life, owning costs are high (but decrease steadily), while repair costs are low (but rise over time). When these two trends are combined, the total cost initially declines but then starts to increase again.

The sweet spot occurs in the period where the total cost of ownership and repair is at its lowest. Although the sweet spot can be estimated, it usually cannot be pinpointed exactly because real-world conditions vary; we can better define a sweet-spot window. Good monitoring and data-driven strategies allow us to stay near it and avoid the steep costs associated with unexpected failure later in the asset's life. Acting during this sweet-spot window (e.g., replacing, overhauling) ensures the best financial outcome. Intervening too early means missing out on usable life, while waiting too long leads to rising repair costs and an increased risk of failure. The main takeaway is that effective asset management aims to act near the sweet spot, avoiding both unnecessary early replacement and costly reactive maintenance after failure. A toy sketch of this trade-off follows below.
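A toy sketch of the sweet-spot idea; the cost-curve shapes and parameters are illustrative assumptions, not derived from the data set.

# Import library
import numpy as np

# Toy cost curves over an asset's life (shapes are illustrative assumptions):
# ownership cost falls over time, repair cost rises over time
t = np.arange(1, 21)              # years in service
owning = 10000 / t                # decreasing ownership cost
repair = 150 * t**1.5             # increasing repair cost
total = owning + repair

# The sweet-spot window lies around the minimum of the total cost curve
print(f"Lowest total cost around year {t[np.argmin(total)]}")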


Wrapping up.

In this article, we moved from a raw data set to a causal Directed Acyclic Graph (DAG), which enabled us to go beyond descriptive statistics towards prescriptive analysis. I demonstrated a data-driven approach to learn the causal structure of a data set and to identify which parts of the system can be adjusted to reduce failure rates. Before making interventions, we also need to perform inferences, which give us the updated probabilities when we fix (or observe) certain variables. Without this step, an intervention is just guessing, because actions in one part of the system often ripple through and affect others. This interconnectedness is exactly why understanding causal relationships is so important.

Before moving into prescriptive analytics and taking action based on our analytical interventions, it is highly advisable to investigate whether the cost of failure outweighs the cost of maintenance. The challenge is to find the sweet spot: the point where the cost of preventive maintenance is balanced against the rising risk and cost of failure. I showed with Bayesian inference how variables like Torque can shift the failure probability. Such insights provide an understanding of the impact of an intervention. The timing of the intervention is crucial to make it cost-effective; being too early wastes resources, and being too late can result in high failure costs.

Just like all other models, Bayesian models are also "wrong" models, and the causal network needs experimental validation before making any critical decisions.





References

  1. AI4I 2020 Predictive Maintenance Data Set (2020). UCI Machine Learning Repository. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
  2. E. Taskesen, bnlearn for Python library.
  3. E. Taskesen, Towards Data Science (TDS), May 2026.