The most effective optimization techniques to make ML models outperform — Part-2

This article applies the optimization techniques we discussed in part-1 to a dataset using several different algorithms. At the end, we compare the metrics of those algorithms after tuning them with cross-validation and evaluating them on the test data. Along the way, it offers practical insight into how you can experiment with hyperparameter tuning.

In the previous article we covered the theoretical concepts behind optimization techniques such as GridSearchCV, RandomizedSearchCV, and Bayesian optimization. In this article we take a dataset, apply these optimizers, and see how the results vary between them.

For our experiments, we will use a heart-care dataset in which the goal is to predict the chance of a heart attack.

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def load_data(file_path):
    """Load data from CSV file."""
    data = pd.read_csv(file_path)
    X = data.drop('cardio', axis=1)
    y = data['cardio']
    return X, y

def split_data(X, y):
    """Standardize the features, then split the data into training and test sets."""
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test

Now let's write two separate helper functions for the two optimization algorithms, GridSearchCV and RandomizedSearchCV, as below.

def train_model_using_gridsearchcv(model, X_train, y_train, param_grid):
    """Train a machine learning model using GridSearchCV for hyperparameter tuning."""
    grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    return grid_search.best_estimator_

def train_model_using_randomsearchcv(model, X_train, y_train, param_grid):
    """Train a machine learning model using RandomizedSearchCV for hyperparameter tuning."""
    random_search = RandomizedSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    random_search.fit(X_train, y_train)
    return random_search.best_estimator_
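
Part-1 also covered Bayesian optimization. For completeness, here is a minimal sketch of what an equivalent helper could look like. This assumes the scikit-optimize package is installed (its BayesSearchCV is not part of scikit-learn, and this helper is not used in the rest of the article), and the search space passed in would need skopt-compatible dimensions such as (low, high) ranges or lists of categories.

# Minimal sketch, assuming scikit-optimize (pip install scikit-optimize) is available.
from skopt import BayesSearchCV

def train_model_using_bayessearchcv(model, X_train, y_train, search_spaces):
    """Train a machine learning model using BayesSearchCV for hyperparameter tuning."""
    bayes_search = BayesSearchCV(model, search_spaces, n_iter=20, cv=5,
                                 scoring='accuracy', n_jobs=-1, random_state=42)
    bayes_search.fit(X_train, y_train)
    return bayes_search.best_estimator_

# Example usage with a hypothetical integer search space:
# train_model_using_bayessearchcv(RandomForestClassifier(), X_train, y_train,
#                                 {'n_estimators': (5, 50), 'max_depth': (2, 20)})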

def evaluate_model(model, X_test, y_test):
    """Evaluate a machine learning model on the test set."""
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    return accuracy, precision, recall, f1

Now we run everything through a driver function, as below.

def run_models(file_path):
    """Load data, split into training and test sets, then train and evaluate multiple models."""
    X, y = load_data(file_path)
    X_train, X_test, y_train, y_test = split_data(X, y)

    models = [
        ('Random Forest', RandomForestClassifier(), {'n_estimators': [5, 10, 20], 'max_depth': [None, 10, 20]}),
        ('AdaBoost', AdaBoostClassifier(), {'n_estimators': [5, 10, 20], 'learning_rate': [0.1, 0.5, 1.0]}),
        ('Gradient Boosting', GradientBoostingClassifier(), {'n_estimators': [5, 10, 20], 'learning_rate': [0.1, 0.5, 1.0], 'max_depth': [None, 10, 20]}),
        ('K-Nearest Neighbors', KNeighborsClassifier(), {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']})
    ]

    for name, model, param_grid in models:
        print(f'Training {name} using GridSearchCV')
        best_model = train_model_using_gridsearchcv(model, X_train, y_train, param_grid)
        accuracy, precision, recall, f1 = evaluate_model(best_model, X_test, y_test)
        print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}')

    for name, model, param_grid in models:
        print(f'Training {name} using RandomizedSearchCV')
        best_model = train_model_using_randomsearchcv(model, X_train, y_train, param_grid)
        accuracy, precision, recall, f1 = evaluate_model(best_model, X_test, y_test)
        print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}')

file_path = '/.csv'
run_models(file_path=file_path)

Once training is complete and we compare the outcomes, here is the kind of output you may see; the exact numbers can vary depending on your machine configuration.

Metrics comparison of the different ML algorithms, tuned using the different optimization techniques
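
To make this comparison easier to read, one option is to append each run's metrics to a list of rows and tabulate them with pandas instead of printing line by line. The collect_results helper below is a hypothetical addition and is not part of the original script.

# Hypothetical helper: gather the metrics of every (model, optimizer) run
# into a pandas DataFrame for a side-by-side comparison.
import pandas as pd

def collect_results(rows):
    """rows: list of (model, optimizer, accuracy, precision, recall, f1) tuples."""
    return pd.DataFrame(rows, columns=['Model', 'Optimizer', 'Accuracy',
                                       'Precision', 'Recall', 'F1-Score'])

# Example usage inside run_models(), after each evaluate_model() call:
# rows.append((name, 'GridSearchCV', accuracy, precision, recall, f1))
# print(collect_results(rows).sort_values('F1-Score', ascending=False))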

In conclusion, optimizing machine learning algorithms is crucial for getting the best possible performance out of our models. Hyperparameter tuning with techniques like GridSearchCV and RandomizedSearchCV can help discover the best hyperparameters for a given model, leading to improved accuracy and generalization. Feature selection and engineering can also improve model performance by reducing noise in the data and highlighting the most important features. However, optimizing machine learning models requires a careful balance between bias and variance, and overfitting must be avoided. Finally, the optimization process can be time-consuming, so it may be necessary to consider the available hardware resources when choosing an optimization technique. By following these best practices, we can improve the accuracy and reliability of our machine learning models and enable them to make better predictions on new data.
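
As a concrete illustration of the feature-selection step mentioned above, here is a minimal sketch using scikit-learn's SelectKBest. It is not taken from the article's experiments, and the k=8 value is an arbitrary assumption.

# Minimal feature-selection sketch (not part of the original experiments).
from sklearn.feature_selection import SelectKBest, f_classif

def select_features(X_train, X_test, y_train, k=8):
    """Fit a univariate selector on the training split only, then apply it to both splits."""
    selector = SelectKBest(score_func=f_classif, k=k)
    X_train_selected = selector.fit_transform(X_train, y_train)
    X_test_selected = selector.transform(X_test)
    return X_train_selected, X_test_selected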

Credits to my friends Ajani and Raj, who helped me understand these concepts better.
