Beginner’s Guide to Machine Learning and Power BI: Constructing a Lead Scoring Dashboard

Step-by-Step Tutorial for Constructing a Machine Learning-Powered Lead Scoring Dashboard with Power BI

This tutorial will walk you through the process of building a lead scoring dashboard with Power BI. We’ll use a dataset of 1000 leads and build a machine learning model with PyCaret to predict the likelihood of a lead converting to a customer. We’ll then use the model to score the leads, store the new lead dataset in a PostgreSQL database, and visualize the results in Power BI.

You can find the code for this tutorial on my GitHub here. To follow along, you’ll need:

– Basic knowledge of Python

– Basic knowledge of Power BI

– Basic knowledge of SQL

– Basic knowledge of machine learning classification models

Lead scoring is the process of assigning a score to a lead based on their behavior and profile. The score is used to determine the likelihood of a lead converting to a customer. Lead scoring is a common practice in the marketing industry and is used to prioritize leads for sales teams.

Normally, lead scoring is done manually by sales teams. However, it’s a time-consuming process and can be error-prone. Machine learning can be used to automate the process and make it more efficient.

We’ll use the following tools:

– Jupyter Notebook to build the machine learning model with PyCaret

– Power BI to visualize the results and build the dashboard with the data from the PostgreSQL database

– PostgreSQL to store the new lead dataset with the lead scores and probabilities predicted by the machine learning model

– PyCaret to build the machine learning model and predict the lead scores and probabilities

Power BI is a business analytics service by Microsoft. It provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards. It can be used to analyze data and share insights across your organization, or to embed them in your app or website.

Power BI can connect to a wide range of data sources, and you can use the built-in templates to create reports and dashboards in minutes. Power BI is a great tool for data visualization and dashboard building. It is easy to use and has a variety of features that make it a powerful tool for data analysis. It’s also free to use for individuals and small businesses. You can find more details about Power BI here.

PostgreSQL is a powerful, open-source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It’s fully ACID compliant and has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages).

It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, and video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, and ODBC, among others, and exceptional documentation.

It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It’s free to use, and you can find more details about PostgreSQL here.

PyCaret is an open-source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes, in your choice of notebook environment.

It’s an end-to-end machine learning library that automates machine learning workflows. It is simple, easy to use, and fast. It’s free to use, and you can find more details about PyCaret here.

The dataset used is from Kaggle and can be found here. We’re provided with a historical dataset which lists the various leads, their respective attributes such as the lead source or website usage data, and a label indicating whether or not that lead has converted in the past. Below is a preview of the dataset:

The first step is to import the necessary libraries and load the dataset into a pandas DataFrame. We’ll then check the shape of the dataset and the first five rows to get a better understanding of the data. We’ll also check the data types of the columns and the number of missing values in each column, as shown in the sketch below.
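Here is a minimal sketch of that first inspection pass. The file name Leads.csv is an assumption based on the Kaggle dataset; adjust the path to wherever you saved the download.

import pandas as pd

# load the raw Kaggle dataset (file name is an assumption; adjust the path)
df = pd.read_csv('Leads.csv')

# shape and first five rows
print(df.shape)
print(df.head())

# data types and missing values per column
print(df.dtypes)
print(df.isnull().sum().sort_values(ascending=False))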

We’ll apply basic cleaning to drop the columns that are not relevant to the analysis and drop the rows with missing values. We’ll also check for duplicates and drop them if necessary (see the sketch after the snippets below). Finally, we’ll apply a basic EDA (exploratory data analysis) to get a better understanding of the data.

The cleaning notebook can be found here. Below I’ll show only the relevant code snippets from the notebook.

The code snippet below shows how to drop the columns that are not relevant to the analysis.

df.drop(['How did you hear about X Education', 'What is your current occupation', 'What matters most to you in choosing a course', 'Tags', 'Lead Quality'], axis=1, inplace=True)

The code snippet below shows how to drop the rows with missing values and reset the index.

# drop rows with null values

df.dropna(inplace=True)

# reset the index

df.reset_index(drop=True, inplace=True)
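For the duplicates check mentioned above, a minimal sketch (not shown in the original snippets) could look like this:

# count fully duplicated rows and drop them if any are found
print(df.duplicated().sum())
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)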

The updated dataset has 6173 rows and 27 columns.

The code snippet below shows how to visualize the Lead Origin and City columns, as well as the counts of converted and non-converted leads. The Converted column is a binary variable (1 for converted, 0 for not converted).

# value counts bar chart of column 'Lead Origin'

df['Lead Origin'].value_counts().plot.bar(figsize=(12,6))

# value counts bar chart of column 'City'

df['City'].value_counts().plot.bar(figsize=(12,6))

# value counts bar chart of column 'Converted'

df['Converted'].value_counts().plot.bar(figsize=(12,6))
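As a quick numeric complement to the bar chart (an aside, not in the original notebook), the overall conversion rate is simply the mean of the binary column:

# share of leads that converted
print(df['Converted'].mean())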

Now that we’ve cleaned the dataset, we can move on to the next step: we’ll save the cleaned dataset to an Excel file, and then we’ll build the model using PyCaret.

# save the cleaned data to a new Excel file

df.to_excel('C:/Users/marce/Desktop/Python/pythoncheatsheets/python_projects/Lead_scoring/archive/Lead_Scoring_cleaned.xlsx', index=False)

With the dataset cleaned and saved, we can build the model using PyCaret. The code snippet below shows how to load the cleaned dataset into a pandas DataFrame. The modeling notebook can be found here.

# import the necessary packages

import pandas as pd
import numpy as np
from pycaret.classification import *  # import the PyCaret classification module

# load the cleaned dataset

df_historical = pd.read_excel('C:/Users/marce/Desktop/Python/pythoncheatsheets/python_projects/Lead_scoring/archive/Lead_Scoring_cleaned.xlsx')

The code snippet below shows how to set up the PyCaret pipeline. We’ll set the target variable to Converted, ignore the Prospect_ID column, bin the numeric features, normalize the data, and remove multicollinearity. We’ll also set the session_id to 123 and use the GPU. The setup function will return a PyCaret environment, which we’ll save in the variable exp_1.

# initialize setup

exp_1 = setup(data=df_historical, target='Converted', session_id=123,
              ignore_features=['Prospect_ID'],
              bin_numeric_features=['TotalVisits', 'Total_Time_Spent_on_Website', 'Page_Views_Per_Visit'],
              normalize=True, remove_multicollinearity=True, use_gpu=True)

The code snippet below shows how to compare the models. We’ll sort the comparison by the AUC metric, using 10-fold cross-validation (PyCaret’s default). With n_select=5, the compare_models function prints a comparison grid and returns the top five models as a list, which we’ll save in the variable best_model.

best_model = compare_models(n_select=5, sort='AUC')

From the results of the comparison, we can see that the logistic regression model has the best AUC score. We’ll now create the logistic regression model with 10-fold cross-validation, evaluated on the AUC metric. The create_model function will return a trained model, which we’ll save in the variable lr.

lr = create_model('lr')

The code snippet below shows how to create and test a bagged ensemble of the logistic regression model, again with 10-fold cross-validation and optimized for AUC. The ensemble_model function will return a trained model, which we’ll save in the variable bagged_lr.

bagged_lr = ensemble_model(lr, method='Bagging', round=2, choose_better=True, optimize='AUC')

The code snippet below shows how to plot the model. We’ll plot the ROC curve using the AUC plot; the plot_model function displays the plot directly.

# Plot the model based on AUC

plot_model(bagged_lr, plot='auc')

Looking at the feature importance of the model, Total Time Spent on Website, Do Not Email, Last Activity, and Lead Source are the key factors that drive a successful conversion. We’ll therefore later create a new column with a lead score of Hot Lead or Cold Lead based on the time spent on the website.
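The feature importance itself can be inspected with the same plot_model function; a minimal sketch of how that plot is produced (shown here on the base logistic regression model, since the bagged ensemble doesn’t expose coefficients directly):

# plot feature importance; logistic regression coefficients drive the ranking
plot_model(lr, plot='feature')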

The code snippet below shows how to predict with the model on the hold-out test data. We’ll use the predict_model function, which prints the hold-out metrics and returns a pandas DataFrame with the predictions; we’ll save it in the variable test_pred.

test_pred = predict_model(bagged_lr)
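In PyCaret 2.x, the returned DataFrame carries the predictions in two extra columns: Label (the predicted class) and Score (the predicted probability). A quick peek, assuming those column names:

# compare the actual target with the predicted label and probability
test_pred[['Converted', 'Label', 'Score']].head()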

The code snippet below shows how to predict with the model on new/unseen data. We’ll first finalize the model, which refits it on the full dataset, and then use the predict_model function on the new leads. The predict_model function will return a pandas DataFrame with the predictions, which we’ll save in the variable unseen_predictions_new.

# finalize the model: refit it on the full dataset before scoring new data
final_bagged_lr = finalize_model(bagged_lr)

# import the unseen data Excel file
unseen_data = pd.read_excel('C:/Users/marce/Desktop/Python/pythoncheatsheets/python_projects/Lead_scoring/archive/Lead_new.xlsx')

# predict on the unseen data
unseen_predictions_new = predict_model(final_bagged_lr, data=unseen_data)

unseen_predictions_new.head(5)
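If you want to reuse the model outside the notebook (an optional step, not part of the original walkthrough), PyCaret can persist the whole pipeline to disk:

# save the finalized pipeline; this creates lead_scoring_model.pkl
save_model(final_bagged_lr, 'lead_scoring_model')

# later, reload it and score fresh leads the same way
loaded_model = load_model('lead_scoring_model')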

The code snippet below shows how to create a new column with a lead score of Hot Lead or Cold Lead based on the time spent on the website. We’ll use the numpy.where function to fill a new column named Lead_Score.

# create the lead score for the new unseen data (the Label column, i.e. the predicted column, is dropped further below)

unseen_predictions_new_BI = unseen_predictions_new.copy()

# rule to score leads as Hot or Cold: Total_Time_Spent_on_Website >= 600

unseen_predictions_new_BI['Lead_Score'] = np.where(unseen_predictions_new_BI['Total_Time_Spent_on_Website'] >= 600, 'Hot', 'Cold')

unseen_predictions_new_BI.head(5)

Now we clean the dataset before saving it to Excel and SQL: we drop the Label column, which is not useful for our dashboard, rename the ‘Score’ column to ‘ML_Pred_Proba’, and rename ‘Lead_Score’ to ‘Lead_Category’.

unseen_predictions_new_BI.drop(columns='Label', inplace=True, axis=1)

unseen_predictions_new_BI.rename(columns={'Score': 'ML_Pred_Proba'}, inplace=True)

unseen_predictions_new_BI.rename(columns={'Lead_Score': 'Lead_Category'}, inplace=True)

unseen_predictions_new_BI.head(5)

**Save the data into an Excel file**

unseen_predictions_new_BI.to_excel('new_leads_BI.xlsx', index=None)

Now we’ll store the data in a SQL database. If you want to know more about how to install and configure PostgreSQL, you can check the following article. The full code is available in my GitHub repository.

We’ll first load the Excel file into a pandas DataFrame. Then we’ll create a connection to the database, create a cursor, and execute the SQL queries to create the table and insert the data. See the code below:

'''
Upload the new leads data to a PostgreSQL database using psycopg2 and pandas
'''

import pandas as pd
import numpy as np
import psycopg2

# read the Excel file and convert it to a DataFrame
df = pd.read_excel('C:/Users/marce/Desktop/Python/pythoncheatsheets/python_projects/Lead_scoring/data_powerBI/new_leads_BI.xlsx')

# connect to the database
conn = psycopg2.connect(
    host='localhost',
    dbname='Leads',
    user='postgres',
    password='LXXXX',
    port='5432')

# create a cursor object
cur = conn.cursor()

# generate the SQL statements to create the table and insert the data
table_name = 'new_leads'  # name of the table to be created
cols = []  # list of column definitions
for col, dtype in zip(df.columns, df.dtypes):  # map pandas dtypes to SQL types
    if dtype == 'int64':
        cols.append(f'{col} INTEGER')
    elif dtype == 'float64':
        cols.append(f'{col} FLOAT')
    else:
        cols.append(f'{col} VARCHAR(255)')
cols = ', '.join(cols)

# create placeholders for the values to be inserted
placeholders = ', '.join(['%s'] * len(df.columns))

create_table = f"CREATE TABLE IF NOT EXISTS {table_name} ({cols})"
insert_data = f"INSERT INTO {table_name} VALUES ({placeholders})"

# execute the SQL statements
cur.execute(create_table)
for i, row in df.iterrows():
    cur.execute(insert_data, tuple(row))

# commit the changes and close the cursor
conn.commit()
cur.close()

# close the connection
conn.close()
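As a quick sanity check (an addition of mine, not in the original code), you can read the table back and confirm the row count matches the DataFrame, reusing the same connection settings as above:

import pandas as pd
import psycopg2

# reconnect and count the rows in the new table
conn = psycopg2.connect(host='localhost', dbname='Leads', user='postgres',
                        password='LXXXX', port='5432')
check = pd.read_sql('SELECT COUNT(*) FROM new_leads', conn)
print(check)
conn.close()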

Now we’ll connect Power BI to the PostgreSQL table and create the Power BI dashboard. If you want to know more about how to install and configure Power BI, you can check the following article. The Power BI file is available in my GitHub repository.

To connect the PostgreSQL table to Power BI, we’ll follow the steps below:

1) Create a new data source.

2) Select the PostgreSQL database.

3) Select the table we want to connect.

4) Select the columns we want to connect.

5) Select the data type of the columns.

6) Select the connection mode.

7) Select the authentication mode.

8) Select the server name.

9) Select the database name, user name, password, and port number.

After following the steps above, the new leads data will be connected to the Power BI dashboard, and it appears as follows under the table section.

In this article, we’ve seen how to create a machine learning model to predict the lead score of new leads, and how to build a Power BI dashboard to visualize the results of the machine learning model.

This approach clearly informs our hypothetical sales team about the quality of incoming leads and enables them to follow up on the ‘really’ hot leads.

I hope this inspires and encourages you to try it out for yourself. If you have any questions or comments, please feel free to leave them below. I will be happy to answer them.

I write about data science, Python coding projects, and data-driven marketing. I also provide data and business mentorship for people who are new to data or at entry level.

You can follow me on Medium and Twitter, or visit my website and GitHub page.
