This article applies the optimization techniques discussed in part 1 to a dataset, using several different algorithms. We then compare the metrics of those algorithms on held-out test data after cross-validated tuning. Along the way, it should give you some insight into how to experiment with hyperparameter tuning.
In the previous article we covered the theory behind optimization techniques such as GridSearchCV, RandomizedSearchCV, and Bayesian optimization. In this article we take a dataset, apply GridSearchCV and RandomizedSearchCV to it, and see how the outcomes vary between them.
For this exercise we use a heart-care dataset, where the aim is to predict the chance of a heart attack.
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
def load_data(file_path):
    """Load data from CSV file."""
    data = pd.read_csv(file_path)
    X = data.drop('cardio', axis=1)
    y = data['cardio']
    return X, y

def split_data(X, y):
    """Split data into training and test sets, then scale the features."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Fit the scaler on the training set only, so that no test-set
    # statistics leak into training.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test
Now let's write two separate functions for the two optimization algorithms, GridSearchCV and RandomizedSearchCV, as shown below.
def train_model_using_gridsearchcv(model, X_train, y_train, param_grid):
    """Train a machine learning model using GridSearchCV for hyperparameter tuning."""
    grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    return grid_search.best_estimator_

def train_model_using_randomsearchcv(model, X_train, y_train, param_grid):
    """Train a machine learning model using RandomizedSearchCV for hyperparameter tuning."""
    random_search = RandomizedSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
    random_search.fit(X_train, y_train)
    return random_search.best_estimator_

def evaluate_model(model, X_test, y_test):
    """Evaluate a machine learning model on the test set."""
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    return accuracy, precision, recall, f1
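To make the difference between the two searches concrete, the following plain-Python sketch counts the candidates each strategy would evaluate for two of the grids used in this article. It illustrates the idea only and is not scikit-learn's implementation: `grid_candidates` and `random_candidates` are hypothetical helper names, and `n_iter=10` mirrors the default number of candidates that `RandomizedSearchCV` samples.

```python
import itertools
import random

def grid_candidates(param_grid):
    """Return every combination in the grid as a list of dicts."""
    keys = list(param_grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))]

def random_candidates(param_grid, n_iter=10, seed=0):
    """Sample up to n_iter distinct combinations from the grid."""
    candidates = grid_candidates(param_grid)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_iter, len(candidates)))

gb_grid = {
    'n_estimators': [5, 10, 20],
    'learning_rate': [0.1, 0.5, 1.0],
    'max_depth': [None, 10, 20],
}
knn_grid = {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']}

cv = 5  # number of folds, as in the functions above
for name, grid in [('Gradient Boosting', gb_grid), ('K-Nearest Neighbors', knn_grid)]:
    n_grid = len(grid_candidates(grid))
    n_random = len(random_candidates(grid))
    print(f'{name}: grid search runs {n_grid * cv} cross-validation fits, '
          f'random search runs {n_random * cv}')
```

When every parameter is given as a list, scikit-learn samples without replacement, so a grid of ten or fewer combinations is covered completely by a randomized search with the default `n_iter=10`. For the Random Forest (9 combinations), AdaBoost (9), and K-Nearest Neighbors (6) grids in this article, both strategies therefore evaluate the same candidates. Only the Gradient Boosting grid (27 combinations) is actually subsampled.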
Now we run the code through a driver function, as below.
def run_models(file_path):
    """Load data, split into training and test sets, train and evaluate multiple models."""
    X, y = load_data(file_path)
    X_train, X_test, y_train, y_test = split_data(X, y)

    # Each entry: (display name, estimator, hyperparameter grid to search).
    models = [
        ('Random Forest', RandomForestClassifier(), {'n_estimators': [5, 10, 20], 'max_depth': [None, 10, 20]}),
        ('AdaBoost', AdaBoostClassifier(), {'n_estimators': [5, 10, 20], 'learning_rate': [0.1, 0.5, 1.0]}),
        ('Gradient Boosting', GradientBoostingClassifier(), {'n_estimators': [5, 10, 20], 'learning_rate': [0.1, 0.5, 1.0], 'max_depth': [None, 10, 20]}),
        ('K-Nearest Neighbors', KNeighborsClassifier(), {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']})
    ]

    for name, model, param_grid in models:
        print(f'Training {name} using GridSearchCV')
        best_model = train_model_using_gridsearchcv(model, X_train, y_train, param_grid)
        accuracy, precision, recall, f1 = evaluate_model(best_model, X_test, y_test)
        print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}')

    for name, model, param_grid in models:
        print(f'Training {name} using RandomizedSearchCV')
        best_model = train_model_using_randomsearchcv(model, X_train, y_train, param_grid)
        accuracy, precision, recall, f1 = evaluate_model(best_model, X_test, y_test)
        print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}')

file_path = '/.csv'
run_models(file_path=file_path)
Once training is complete, the script prints the metrics for each model under both search strategies, so the outcomes can be compared side by side. The exact numbers can vary between runs and with your machine configuration.
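The helper functions above return only `best_estimator_`, so the selected hyperparameters and the cross-validated score are never shown. As a minimal sketch, assuming a synthetic dataset from `make_classification` as a stand-in for the heart dataset (which is not reproduced here), the fitted search object can be inspected as follows. The scaler sits inside a `Pipeline` so that it is re-fitted on the training part of every cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the article's dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])
# Parameters of a pipeline step are addressed as '<step>__<parameter>'.
param_grid = {
    'knn__n_neighbors': [3, 5, 7],
    'knn__weights': ['uniform', 'distance'],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print(f'Mean cross-validated accuracy: {search.best_score_:.4f}')
# score() uses the refitted best pipeline and the 'accuracy' scorer.
print(f'Test accuracy: {search.score(X_test, y_test):.4f}')
```

Comparing the cross-validated score with the test score gives a rough check on overfitting to the validation folds: a large gap between the two suggests the tuned model generalizes less well than the search implied.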
In conclusion, optimizing machine learning algorithms is crucial for getting the best possible performance from a model. Hyperparameter tuning with techniques like GridSearchCV and RandomizedSearchCV can help find the best hyperparameters for a specific model, resulting in improved accuracy and generalization. Feature selection and feature engineering can also improve performance by reducing noise in the data and highlighting the most important features. However, optimizing a model requires a careful balance between bias and variance, and overfitting must be avoided. Finally, the optimization process can be time-consuming, so the available hardware resources should be taken into account when choosing an optimization technique. By following these practices, we can improve the accuracy and reliability of our machine learning models and enable them to make better predictions on new data.
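One way to keep the search cost predictable is to fix the number of sampled candidates with `n_iter` and to sample from distributions instead of short lists. The sketch below assumes synthetic data and illustrative parameter ranges (neither comes from this article) and uses `scipy.stats` distributions, which `RandomizedSearchCV` accepts as parameter values.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (an assumption; not the article's dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative ranges, chosen only for this example.
param_distributions = {
    'n_estimators': randint(5, 50),         # integers in [5, 50)
    'learning_rate': uniform(0.05, 0.95),   # floats in [0.05, 1.0]
    'max_depth': randint(2, 6),             # integers in [2, 6)
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=8,          # budget: exactly 8 candidates are evaluated
    cv=3,
    scoring='accuracy',
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(X, y)

print('Candidates evaluated:', len(search.cv_results_['params']))
print('Best parameters:', search.best_params_)
```

Here the cost is 8 candidates times 3 folds, plus one final refit, regardless of how wide the ranges are. A grid over the same ranges would grow multiplicatively with every value added.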
Credits to my friends Ajani and Raj, who helped me understand these concepts better.