Who will win IPL 2023??
Data
Where is the code?
1. Data cleaning and formatting
2. Exploratory data analysis
3. Feature engineering and selection
4. Comparing several machine learning models on a performance metric
5. Performing hyperparameter tuning on the best model
6. Evaluating the best model on the testing set
According to this model, Rajasthan Royals are likely to win IPL 2023

The IPL, one of the most prominent cricketing events in the world with over 400 million viewers across the globe, has proven to be one of the sport’s mega-events.

IPL 2023 is in full swing. At the halfway stage of the league, the points table is evenly poised and teams are aiming for the top 4 spots. We have witnessed some nerve-wracking finishes this season. With 40 matches completed and 30 to go as I write this, the question on everyone’s mind now is “Who will win IPL 2023?”

I intended to use machine learning to forecast this year’s champions at the beginning of the season, but owing to unforeseen events, the task was postponed until roughly the midway point of the tournament. At that point, I decided to use the data to try to forecast the outcomes of the remaining games.

So, brace yourself as I take you on the journey to predict who will be crowned champions this year.

Please note that you should not use these results to place bets. I created this as a simple mathematical exercise, driven by my passion for the sport, to better grasp the capabilities of ML.

I obtained the data for the years 2008–2022 here. However, the data for the 2023 season up to match number 40 had to be extracted from Cricbuzz’s website. I manually created a worksheet to populate all the columns whose data was spread across multiple web pages, and I also created testing data for the upcoming matches.

I followed the general machine learning workflow step by step:

  1. Data cleaning and formatting.
  2. Exploratory data analysis.
  3. Feature engineering and selection.
  4. Compare several machine learning models on a performance metric.
  5. Perform hyperparameter tuning on the best model.
  6. Evaluate the best model on the testing set.
  7. Interpret the model results.
  8. Draw conclusions and document work.

Without further ado, let’s start. The entire project on Kaggle can be found here.

I began by loading the CSV data from Kaggle as well as the data that I prepared. Here’s a snapshot of the data.

There were some null values in the City column; I identified that they belonged to the games played in Dubai in 2021 and updated them. Other columns had nulls for legitimate reasons.
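A minimal pandas sketch of that fix, assuming the columns are named `City`, `Venue` and `Season` (the toy rows below are made up, not taken from the real dataset):

```python
import pandas as pd

# Toy rows standing in for the match data; the column names are assumptions.
matches = pd.DataFrame({
    "Season": [2021, 2021, 2022],
    "Venue": ["Dubai International Cricket Stadium", "Wankhede Stadium", "Wankhede Stadium"],
    "City": [None, "Mumbai", "Mumbai"],
})

# 2021 games at a Dubai venue with no City recorded get City = "Dubai".
is_dubai_2021 = (
    matches["City"].isna()
    & (matches["Season"] == 2021)
    & matches["Venue"].str.contains("Dubai")
)
matches.loc[is_dubai_2021, "City"] = "Dubai"
print(matches["City"].isna().sum())  # number of City values still missing
```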

The next step was to explore the data and draw some insights.

I began by fetching the details of previous season winners (to make sure everything is correct).
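One way to get the champions is to take the winner of the last match played in each season, since the final closes the season. This is a sketch under assumed column names (`Season`, `Date`, `WinningTeam`), with a few illustrative rows rather than the real file:

```python
import pandas as pd

# Illustrative rows; column names are assumptions about the dataset.
matches = pd.DataFrame({
    "Season": [2021, 2021, 2022, 2022],
    "Date": pd.to_datetime(["2021-10-10", "2021-10-15", "2022-05-27", "2022-05-29"]),
    "WinningTeam": ["Chennai Super Kings", "Chennai Super Kings",
                    "Rajasthan Royals", "Gujarat Titans"],
})

# The final is the last match of a season, so its winner is the champion.
winners = matches.sort_values("Date").groupby("Season")["WinningTeam"].last()
print(winners)
```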

This data looks accurate. Next, I wanted to know which team has the most wins in the history of the IPL.

As we can see, Mumbai Indians are in the top position. One thing I noticed here was that the new team names were not updated in the data (we can see both Sunrisers Hyderabad and Deccan Chargers, which is the same team, renamed). Hence, I fixed this before proceeding.
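The renaming can be done with a small mapping applied to every team column before counting wins. The mapping below holds only the pair mentioned above and is meant to be extended; the column names `Team1`, `Team2` and `WinningTeam` are assumptions:

```python
import pandas as pd

# Hypothetical map: old name -> current name. Extend with other renamed teams as needed.
TEAM_RENAMES = {"Deccan Chargers": "Sunrisers Hyderabad"}
TEAM_COLUMNS = ["Team1", "Team2", "WinningTeam"]

matches = pd.DataFrame({
    "Team1": ["Deccan Chargers", "Mumbai Indians", "Sunrisers Hyderabad"],
    "Team2": ["Mumbai Indians", "Sunrisers Hyderabad", "Chennai Super Kings"],
    "WinningTeam": ["Deccan Chargers", "Mumbai Indians", "Sunrisers Hyderabad"],
})

# Replace old names in all team columns, then count wins per (current) team.
matches[TEAM_COLUMNS] = matches[TEAM_COLUMNS].replace(TEAM_RENAMES)
wins = matches["WinningTeam"].value_counts()
print(wins)
```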

Next, we see which city has hosted the most matches.

Mumbai has hosted the highest number of matches, with 3 different stadiums in the city. Next, we see which player has won the most Player of the Match awards.
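Both counts come down to `value_counts` in pandas. A sketch with assumed column names (`City`, `Venue`, `Player_of_Match`) and a handful of illustrative rows:

```python
import pandas as pd

# Illustrative rows; column names are assumptions.
matches = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Chennai", "Mumbai", "Kolkata"],
    "Venue": ["Wankhede Stadium", "Brabourne Stadium", "MA Chidambaram Stadium",
              "Dr DY Patil Sports Academy", "Eden Gardens"],
    "Player_of_Match": ["AB de Villiers", "CH Gayle", "AB de Villiers",
                        "DA Warner", "AB de Villiers"],
})

matches_per_city = matches["City"].value_counts()            # matches hosted per city
venues_per_city = matches.groupby("City")["Venue"].nunique()  # distinct stadiums per city
top_players = matches["Player_of_Match"].value_counts().head(3)
print(matches_per_city, venues_per_city, top_players, sep="\n\n")
```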

Unsurprisingly, AB de Villiers has won the most of these awards in the history of the IPL, followed by Gayle and Warner.

Next, I analyzed teams’ inclination toward toss decisions in the IPL and found that in around 64% of matches, the team winning the toss decided to field first. But I wanted to explore the trend for each venue.

As we can see here, fielding first has been the most common decision, especially at venues like Mumbai and Bangalore, indicating they are chasing-friendly. However, the trend is the opposite in Chennai, where batting first is chosen more often.
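The overall share and the per-venue breakdown can be computed with `value_counts(normalize=True)` and `pd.crosstab`. The column names and rows here are illustrative assumptions:

```python
import pandas as pd

# Illustrative rows; column names are assumptions.
matches = pd.DataFrame({
    "Venue": ["Wankhede", "Wankhede", "Chepauk", "Chepauk", "Chinnaswamy"],
    "TossDecision": ["field", "field", "bat", "bat", "field"],
})

# Share of each toss decision across all matches.
overall = matches["TossDecision"].value_counts(normalize=True)

# Same share computed separately for every venue (each row sums to 1).
by_venue = pd.crosstab(matches["Venue"], matches["TossDecision"], normalize="index")
print(overall, by_venue, sep="\n\n")
```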

This is probably the most crucial part of the machine learning workflow. Because the algorithm depends entirely on the data we feed into it, feature engineering should be given top priority in every machine learning project.

Correlation analysis was performed, and the features that would not help in predicting subsequent matches were then dropped.
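The post doesn’t list the exact features or cut-off, so the following is only one possible way to do it: correlate each numeric feature with the target and drop the weakly correlated ones. The column names and the 0.1 threshold are hypothetical:

```python
import pandas as pd

# Hypothetical numeric features and target; not the project's real columns.
df = pd.DataFrame({
    "toss_won_by_team1": [1, 0, 1, 1, 0, 0],
    "match_id":          [1, 2, 3, 4, 5, 6],
    "team1_won":         [1, 0, 1, 1, 0, 1],
})

# Pearson correlation of every feature with the target.
corr = df.corr()["team1_won"].drop("team1_won")

# Drop features whose absolute correlation falls below an assumed threshold.
weak = corr[corr.abs() < 0.1].index
df = df.drop(columns=weak)
print(corr, list(df.columns), sep="\n\n")
```

Pearson correlation only picks up linear relationships, so a low score is a hint rather than proof that a feature is useless.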

The next step was to encode the data, as machine learning algorithms only understand data in a numerical format, unlike the categorical data that we have. Various encoding methods were explored, such as One-Hot, Label and Binary encoders, and I decided to go with the Binary encoder, as my data has many categories and they are nominal.
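Binary encoding maps each category to an integer and then spreads that integer’s bits over a few 0/1 columns, so n categories need about log2(n) columns instead of the n columns one-hot encoding would create. Libraries such as `category_encoders` provide a `BinaryEncoder`; the hand-rolled sketch below (the function name `binary_encode` is hypothetical) just makes the idea explicit:

```python
import math

import pandas as pd


def binary_encode(column: pd.Series, categories: list) -> pd.DataFrame:
    """Encode a nominal column as 0/1 columns holding the bits of an integer code.

    `categories` fixes the category -> code mapping, so training data and new
    fixtures are encoded identically. Code 0 is reserved for unseen values.
    """
    codes = {cat: i + 1 for i, cat in enumerate(categories)}
    n_bits = max(1, math.ceil(math.log2(len(categories) + 1)))
    ints = column.map(codes).fillna(0).astype(int)
    bits = {}
    for b in range(n_bits):
        # Most significant bit first.
        bits[f"{column.name}_bit{b}"] = (ints // 2 ** (n_bits - 1 - b)) % 2
    return pd.DataFrame(bits, index=column.index)


teams = ["Chennai Super Kings", "Mumbai Indians", "Rajasthan Royals"]
encoded = binary_encode(
    pd.Series(["Mumbai Indians", "Rajasthan Royals", "Gujarat Titans"], name="Team1"),
    teams,
)
print(encoded)
```

Here 3 known teams fit in 2 bit columns; the unseen "Gujarat Titans" falls back to code 0 (both bits 0).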

I then split the data into training and test sets, with 70% used for training and 30% for testing.
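With scikit-learn, a 70/30 split is a single call to `train_test_split`; the arrays and `random_state` below are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels standing in for the encoded match data.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 30% of rows go to the test set; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))
```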

As this is a supervised learning problem, specifically a classification task, I explored various classification algorithms and trained Logistic Regression, Support Vector Machines, Random Forests and Decision Trees.

Support Vector Machines outperformed all the other models, with 73% training accuracy and 57% testing accuracy.
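A sketch of that comparison loop with scikit-learn. It uses synthetic data from `make_classification` as a stand-in for the encoded match features, so the scores it prints say nothing about the IPL data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features and match outcomes.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record (training accuracy, testing accuracy).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_acc, test_acc) in results.items():
    print(f"{name:20s} train={train_acc:.2f} test={test_acc:.2f}")
```

A large gap between training and testing accuracy, like 73% vs 57%, is a sign of overfitting worth keeping an eye on.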

Hyperparameter tuning is the process of choosing the optimal set of hyperparameters for a machine learning algorithm to improve its performance on a given dataset. Hyperparameters are parameters that are not learned during the training process but are instead set before training begins.

Hyperparameter tuning is done because the default hyperparameters of a machine learning algorithm may not be optimal for a particular dataset, and can therefore result in suboptimal performance. By tuning the hyperparameters, we can find the best combination for a given problem, which can lead to significant improvements in model accuracy and generalization.

I used GridSearchCV to perform hyperparameter tuning. It takes a set of hyperparameters and their possible values, evaluates every combination using cross-validation, and reports which combination produces the best model performance.
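A minimal `GridSearchCV` sketch for an SVM on synthetic data; the grid values are assumptions, not the ones used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the encoded training data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# 3 x 2 x 2 = 12 combinations, each scored with 5-fold cross-validation.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
best_model = search.best_estimator_  # refit on all of X, y with the best params
```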

Now comes the trickier part: predicting the outcomes of the new matches. For this, I loaded the data I had prepared for games 41 to 70, including the knockouts, and applied the same encoding to it before feeding it to my model.

I faced challenges here because the model had not seen this data, and it contained fewer categories than the training data. Thus, I had to make it consistent with the training data so that the model could predict the outcome of each game.
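One common way to keep new data consistent with the training data is to encode it and then reindex its columns to exactly the training columns, filling the missing ones with 0. The sketch below uses one-hot encoding (`pd.get_dummies`) for brevity rather than the binary encoder described above, and the venue values are illustrative:

```python
import pandas as pd

train = pd.DataFrame({"Venue": ["Wankhede", "Chepauk", "Eden Gardens"]})
upcoming = pd.DataFrame({"Venue": ["Chepauk", "Chepauk"]})

X_train = pd.get_dummies(train, columns=["Venue"])

# The upcoming fixtures contain only one venue, so get_dummies yields a single
# column; reindex adds the missing training columns (as 0) in the same order.
X_upcoming = pd.get_dummies(upcoming, columns=["Venue"]).reindex(
    columns=X_train.columns, fill_value=0
)
print(X_upcoming)
```

A category that never appeared in training is dropped by `reindex`, so that row simply gets 0 in every column of the group.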

I then fed in the data for all the league matches and voilà, I had the predicted result of every league match!

At this stage, I had prediction results for all the league-stage matches. I combined these results with the current points table (up to match number 40) to identify the top 4 teams.
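The projection itself is simple arithmetic: start from the current points and add 2 points (the IPL’s reward for a win) for every predicted win. A sketch with placeholder team names and made-up numbers (`project_table` is a hypothetical helper):

```python
from collections import Counter

POINTS_PER_WIN = 2  # points awarded for a win in the IPL league stage


def project_table(current_points, predicted_winners):
    """Add predicted wins to the current points and sort teams by total points."""
    table = Counter(current_points)
    for winner in predicted_winners:
        table[winner] += POINTS_PER_WIN
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder standings and predicted winners of the remaining league games.
current = {"Team A": 8, "Team B": 10, "Team C": 6, "Team D": 8, "Team E": 4}
remaining = ["Team A", "Team C", "Team A", "Team D", "Team C"]

table = project_table(current, remaining)
top4 = [team for team, _ in table[:4]]
print(table)
```

Real IPL standings break ties on net run rate and award 1 point for a no-result; this sketch models neither.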

The 4 teams that progressed to the knockout games are:

  1. Gujarat Titans: 18 points
  2. Chennai Super Kings: 18 points
  3. Rajasthan Royals: 16 points
  4. Lucknow Super Giants: 16 points

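The IPL uses the following format for its knockout games: the top 2 teams play Qualifier 1, whose winner goes straight to the final; teams 3 and 4 play the Eliminator; and the loser of Qualifier 1 meets the winner of the Eliminator in Qualifier 2 for the other spot in the final. Since the participants of each game depend on the winner of a previous one, I had to predict the games one at a time.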

I entered the data that I had for the Qualifier 1 game (venue, Team1 and Team2) and asked my model to predict the winner.

Then I performed the same task for the Eliminator and Qualifier 2 games to predict the other finalist.

So the two finalists are Rajasthan Royals and Chennai Super Kings.

Finally, the model predicted the winner of the final.
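The knockout logic can be written as a small function that feeds each predicted winner into the next game. Here `predict` is a hypothetical stand-in for the trained model (the real model also takes the venue as input), and the toy predictor in the usage example simply picks the team with the higher made-up rating:

```python
from typing import Callable, List

# Hypothetical signature: (team1, team2) -> predicted winner.
Predictor = Callable[[str, str], str]


def simulate_playoffs(top4: List[str], predict: Predictor) -> str:
    """Return the predicted champion given the top 4 teams in table order."""
    first, second, third, fourth = top4
    q1_winner = predict(first, second)                # Qualifier 1: 1st vs 2nd
    q1_loser = second if q1_winner == first else first
    eliminator_winner = predict(third, fourth)        # Eliminator: 3rd vs 4th
    q2_winner = predict(q1_loser, eliminator_winner)  # Qualifier 2
    return predict(q1_winner, q2_winner)              # Final


# Toy predictor: the team with the higher (made-up) rating wins.
rating = {"Team A": 1, "Team B": 4, "Team C": 3, "Team D": 2}
champion = simulate_playoffs(
    ["Team A", "Team B", "Team C", "Team D"],
    lambda t1, t2: t1 if rating[t1] >= rating[t2] else t2,
)
print(champion)
```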

This article and project are a demonstration of how machine learning can be used to predict the outcome of matches in the sports domain. However, many assumptions were made in this project; it was my attempt to combine my passion for data and cricket.

  1. This model has a testing accuracy of 57%, which could be improved with other techniques. Some of the predicted results were saddening (especially for an RCB fan).
  2. More data could be added, which would help the model make better predictions.
  3. Further analysis could be done to understand where the model predicts wrongly and try to rectify that.
  4. Other encoding methods that suit this data better could be explored.

Any suggestions or improvements are most welcome!

Cheers!

Aditya Bharadwaj

