How I Won at Italian Fantasy Football ⚽ Using Machine Learning

Cracking the code of Fantacalcio through the power of AI

As a mechanical engineer with a keen interest in programming and computer science, I became fascinated by the world of machine learning and artificial intelligence just a few years ago. Recognizing their potential across various engineering disciplines, I embarked on a journey to study machine learning. However, despite acquiring theoretical knowledge, I struggled to find practical ways to apply and practice my newfound skills. While ready-made datasets were available, they didn’t provide the full experience of collecting and processing data. Then a thought occurred to me: why not apply machine learning to help me win at Fantacalcio?

Introduction to Fantacalcio

Fantacalcio is a highly popular game among Italian football fans. Participants form groups and compete throughout the season based on the performances of real players in Serie A, the top Italian football league. Before the start of the season, participants hold an auction to draft their rosters of more than 20 players. After each Serie A matchday, players receive votes based on their performance, with additional bonuses for goals and assists. These accrued votes and bonuses determine the participants’ scores. One of the crucial aspects of the game is choosing a weekly lineup, deciding which players to field regularly and which to bench.

Aim of my work

The primary objective of my machine learning algorithm was to predict the vote and fanta-vote (vote plus bonus) of Serie A players based on their team’s match. Football is an inherently uncertain game, as it is impossible to guarantee whether a player will score or not. However, certain players have a higher likelihood of scoring than others, and their performance can vary depending on the team they are up against. My goal was to find an objective method for determining which player had a higher probability of delivering a stronger performance on any given Serie A matchday.

Disclaimer: sections like this will be used in the article to provide real-case examples from Fantacalcio, to illustrate the concepts discussed. If you aren’t familiar with the game or Serie A players, feel free to skip these sections.

Preview of the algorithm results for predicting the performance of players in a Fantacalcio lineup. Image by author.

Gathering and processing the data

Once I downloaded the archive of votes from Fantacalcio, the next step was to gather a comprehensive set of features to train the machine learning algorithm. To build this dataset, I found FBref to be a valuable resource, providing a convenient way to scrape statistics for both Serie A players and teams. The site offered an extensive range of meticulously compiled statistics, encompassing metrics such as expected goals, tackles, passes, and average number of chances created. The abundance of detailed data available on FBref greatly facilitated the process of assembling a robust feature set for training the machine learning algorithm.

Table showing some of the player stats available. Source: FBref.

Table showing some of the team stats available. Source: FBref.

The approach I took involved building a dataset comprising more than 50 features for each player. Each row combined the player’s processed average statistics with their team’s stats and the stats of the opposing team for a given matchday. The target outputs for each row were the player’s vote and fanta-vote. To build the dataset, I considered the last three seasons of Serie A.
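As a rough illustration of how one such row could be assembled, the sketch below merges a player's averaged stats with those of their team and of the opponent. The stat names, the values, and the `build_row` helper are hypothetical and are not taken from the project's code; the real dataset has more than 50 features per row.

```python
def build_row(player_stats, team_stats, opponent_stats, is_home,
              vote=None, fanta_vote=None):
    """Merge player, team, and opponent stats into one flat dataset row."""
    row = {f"player_{k}": v for k, v in player_stats.items()}
    row.update({f"team_{k}": v for k, v in team_stats.items()})
    row.update({f"opp_{k}": v for k, v in opponent_stats.items()})
    row["is_home"] = 1.0 if is_home else 0.0
    # Targets are only known for matchdays that have already been played
    row["vote"] = vote
    row["fanta_vote"] = fanta_vote
    return row


# Hypothetical averaged stats, for illustration only
player = {"xg": 0.45, "key_passes": 1.8}
team = {"xg": 1.7, "possession": 55.0}
opponent = {"xg_against": 1.2, "tackles": 16.0}

row = build_row(player, team, opponent, is_home=True, vote=6.5, fanta_vote=9.5)
feature_names = [k for k in row if k not in ("vote", "fanta_vote")]
print(feature_names)
```

Prefixing the keys keeps the player, team, and opponent versions of the same stat apart once everything sits in a single row.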

To address the challenge of unreliable statistics for players with limited game time in the season, I employed three strategies:

Weighted averaging with the previous season’s stats.
In the absence of reliable historical data, averaging the player’s stats with those of the average player in a similar role.
Using a predefined list to partially average a player’s stats with those of a previous player from the same team who played a similar role.

For instance, the performance of Napoli rookie Kim could be compared to the previous performance of Koulibaly, or Thauvin’s performance could be assessed in relation to his predecessor, Deulofeu (though this turned out to be incorrect).

Scheme representing each dataset row. The player’s vote and fanta-vote are the target outputs, while all the other stats, and home/away factors, are merged into the feature set. Image by author.
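The weighted averaging used in these strategies could look something like the sketch below. The linear weighting by minutes played, the 900-minute threshold, and the `blend_stats` helper are my assumptions for illustration; the article does not state the exact weights.

```python
def blend_stats(current, prior, minutes_played, full_weight_minutes=900.0):
    """Blend current-season stats with a prior (previous season, role average,
    or a predecessor's stats), trusting the current season more as minutes grow."""
    w = min(minutes_played / full_weight_minutes, 1.0)
    return {k: w * current[k] + (1.0 - w) * prior[k] for k in current}


current = {"xg": 0.80, "key_passes": 1.0}  # computed from only 180 minutes
prior = {"xg": 0.30, "key_passes": 2.0}    # e.g. previous season or role average

blended = blend_stats(current, prior, minutes_played=180)
print(blended)
```

With only 180 minutes played, the blended values stay close to the prior; once the player passes the threshold, the current-season stats are used unchanged.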

Definition and training of the algorithm

To make things more interesting and the results nicer to visualize, the machine learning algorithm was designed to go beyond simple vote and fanta-vote predictions. Instead, a probabilistic approach was adopted, leveraging TensorFlow and TensorFlow Probability to build a neural network capable of generating a probability distribution. Specifically, the network predicted the parameters of a sinh-arcsinh probability distribution. This choice was made to account for the inherent skewness in the distribution of player performance votes. For instance, in the case of an offensive player, although their average fanta-vote may be around 6.5, the algorithm recognized that a vote of 10 (indicating an exceptional performance, such as scoring a goal) would be far more likely to occur than a vote of 4 (representing a rare subpar performance).

Sinh-arcsinh probability distributions with different parameters. Source: ResearchGate.
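To get a feel for the skewness, here is a minimal sketch that samples from a sinh-arcsinh distribution by transforming standard normal draws. It uses the textbook transformation with made-up parameter values; TensorFlow Probability's `SinhArcsinh` applies its own normalization, so the numbers will not match the network's outputs exactly.

```python
import math
import random


def sample_sinh_arcsinh(rng, loc, scale, skewness, tailweight):
    """Draw one sample by transforming a standard normal variate."""
    z = rng.gauss(0.0, 1.0)
    return loc + scale * math.sinh((math.asinh(z) + skewness) * tailweight)


rng = random.Random(0)
loc = 6.0  # illustrative values, not fitted parameters
samples = [
    sample_sinh_arcsinh(rng, loc=loc, scale=0.5, skewness=0.5, tailweight=1.0)
    for _ in range(20000)
]

mean = sum(samples) / len(samples)
median = sorted(samples)[len(samples) // 2]
# Compare the two tails at the same distance from the location parameter
share_high = sum(s >= loc + 1.5 for s in samples) / len(samples)
share_low = sum(s <= loc - 1.5 for s in samples) / len(samples)
print(mean, median, share_high, share_low)
```

With positive skewness the mean sits above the median and the upper tail carries much more probability than the lower one, which is the asymmetry described for offensive players.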

The architecture employed for this task comprised multiple dense layers using ReLU and sigmoid activation functions. To prevent overfitting and improve generalization, regularization techniques such as Dropout and Early Stopping were used. Dropout randomly disables a fraction of neural network units during training, while Early Stopping halts the training process if the validation loss stops improving. The chosen loss function for training the model was Negative Log Likelihood, which measures the discrepancy between the predicted probability distribution and the actual outcomes.
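To show what the Negative Log Likelihood rewards, the sketch below evaluates it for a single observed vote under three hypothetical predicted distributions. For simplicity it uses a normal distribution rather than the sinh-arcsinh one, and all numbers are invented.

```python
import math


def normal_nll(y, mean, std):
    """Negative log likelihood of observation y under Normal(mean, std)."""
    var = std ** 2
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mean) ** 2 / (2.0 * var)


observed_vote = 6.5
nll_close = normal_nll(observed_vote, mean=6.4, std=0.5)  # well-placed and confident
nll_vague = normal_nll(observed_vote, mean=6.4, std=3.0)  # right mean, too uncertain
nll_off = normal_nll(observed_vote, mean=5.0, std=0.5)    # confident but wrong
print(nll_close, nll_vague, nll_off)
```

The loss is lowest when the predicted distribution is both centered near the outcome and appropriately narrow, so the network is penalized for being overconfident as well as for being too vague.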

A snippet of the code written for constructing the neural network is shown here:

import tensorflow as tf
import tensorflow_probability as tfp

tfk = tf.keras

# Stop training once the validation loss has not improved for 10 epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
# Loss: negative log likelihood of the true value under the predicted distribution
neg_log_likelihood = lambda x, rv_x: -rv_x.log_prob(x)

# Shared layers (X_len is the number of input features)
inputs = tfk.layers.Input(shape=(X_len,), name="input")
x = tfk.layers.Dropout(0.2)(inputs)
x = tfk.layers.Dense(16, activation="relu")(x)
x = tfk.layers.Dropout(0.2)(x)
x = tfk.layers.Dense(16, activation="relu")(x)

prob_dist_params = 4  # loc, scale, skewness, tailweight

def prob_dist(t):
    # tailweight_min and tailweight_range are not defined in this snippet;
    # they bound the tailweight parameter
    return tfp.distributions.SinhArcsinh(
        loc=t[..., 0],
        scale=1e-3 + tf.math.softplus(t[..., 1]),
        skewness=t[..., 2],
        tailweight=tailweight_min + tailweight_range * tf.math.sigmoid(t[..., 3]),
        allow_nan_stats=False)

# One output head per target column (y[:, 0] and y[:, 1])
x1 = tfk.layers.Dense(8, activation="sigmoid")(x)
x1 = tfk.layers.Dense(prob_dist_params, activation="linear")(x1)
out_1 = tfp.layers.DistributionLambda(prob_dist)(x1)
x2 = tfk.layers.Dense(8, activation="sigmoid")(x)
x2 = tfk.layers.Dense(prob_dist_params, activation="linear")(x2)
out_2 = tfp.layers.DistributionLambda(prob_dist)(x2)

modelb = tf.keras.Model(inputs, [out_1, out_2])
modelb.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=0.001),
               loss=neg_log_likelihood)
modelb.fit(X_train.astype('float32'),
           [y_train[:, 0].astype('float32'), y_train[:, 1].astype('float32')],
           validation_data=(X_test.astype('float32'),
                            [y_test[:, 0].astype('float32'), y_test[:, 1].astype('float32')]),
           # The original snippet is cut off here; passing the callback is assumed
           callbacks=[callback])

Employing the neural network for predictions

The trained algorithm offered probability distribution predictions for a player’s vote and fanta-vote. By considering the player’s averaged stats, team information, opponent data, and home/away factors, it was capable of predicting the player’s performance for future Serie A matches. Through post-processing of the probability distributions, an expected numeric vote and a maximum potential vote could be derived, simplifying lineup decisions in Fantacalcio.

Plot showing player vote (blue) and fanta-vote (green) probability distributions. Image by author.
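One possible way to do this post-processing is to work on samples drawn from the predicted distribution, as in the sketch below. Taking the mean as the expected vote and the 95th percentile as the "maximum potential" vote is my assumption, and the samples come from a simple stand-in mixture rather than from the trained network.

```python
import random
import statistics


def summarize(samples, upper_quantile=0.95):
    """Return (expected vote, high-percentile vote) from distribution samples."""
    ordered = sorted(samples)
    idx = min(int(upper_quantile * len(ordered)), len(ordered) - 1)
    return statistics.fmean(ordered), ordered[idx]


rng = random.Random(42)
# Stand-in for a predicted fanta-vote distribution: a base vote around 6,
# plus a +3 bonus in roughly one match out of four (invented numbers)
fanta_vote_samples = [
    rng.gauss(6.0, 0.5) + (3.0 if rng.random() < 0.25 else 0.0)
    for _ in range(10000)
]

expected_vote, potential_vote = summarize(fanta_vote_samples)
print(round(expected_vote, 2), round(potential_vote, 2))
```

Two players with a similar expected vote can have very different upside, which is why keeping both numbers helps when choosing between them.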

Using the Monte Carlo technique, the probability distributions of each player were used to predict the expected total vote of a lineup. The Monte Carlo method involves running multiple random simulations to estimate potential outcomes. And that’s it! I had all the tools that allowed me to choose the best lineup from my Fantacalcio roster for each Serie A matchday.

Plot showing lineup points probability distribution, obtained through Monte-Carlo simulation. Image by author.
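A minimal sketch of such a simulation is shown below. Each player is represented by a simplified sampler (a normal base vote plus an occasional +3 bonus) standing in for the network's predicted distribution; the parameters, the 72-point threshold, and the helper names are hypothetical.

```python
import random
import statistics


def make_sampler(mean, std, bonus_prob):
    """Simplified stand-in for one player's predicted fanta-vote distribution."""
    def sampler(rng):
        return rng.gauss(mean, std) + (3.0 if rng.random() < bonus_prob else 0.0)
    return sampler


def simulate_lineup_totals(player_samplers, n_simulations=5000, seed=0):
    """Draw one vote per player in each simulation and sum them up."""
    rng = random.Random(seed)
    return [
        sum(sampler(rng) for sampler in player_samplers)
        for _ in range(n_simulations)
    ]


# Eleven players: seven with little bonus potential, four attacking ones
lineup = [make_sampler(6.0, 0.5, 0.05) for _ in range(7)]
lineup += [make_sampler(6.2, 0.6, 0.30) for _ in range(4)]

totals = simulate_lineup_totals(lineup)
expected_total = statistics.fmean(totals)
p_above_72 = sum(t >= 72 for t in totals) / len(totals)
print(round(expected_total, 2), round(p_above_72, 3))
```

Running the same simulation for each candidate lineup makes it possible to compare them on both the expected total and the chance of clearing a given score.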

Where the algorithm succeeded

As an additional metric, I compared the predicted votes with my own subjective expectations and found the results satisfying. The algorithm proved particularly effective in the Fantacalcio variant in which players are assigned multiple roles similar to real football, ranging from centre-backs and full-backs to wingers and strikers. Choosing the optimal lineup from the available formations presented a complex challenge, as it wasn’t always the case that offensive players outperformed defensive ones.

Moreover, by running predictions against a statistically average Serie A team as the opponent, the algorithm was useful in preparing for the January market auction. It enabled me to identify undervalued players who may have been underestimated by popular opinion.

Players like El Shaarawy and Orsolini are notable examples of players who performed exceptionally well in the later stages of the Serie A season. The algorithm predicted their expected performance to be at the level of other top midfielders already by the end of January.

Table showing vote predictions generated for a Fantacalcio roster, in one of the Serie A matchdays. Image by author.
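The "statistically average opponent" can be built by averaging each stat across all teams, as in this small sketch. The team names, stat names, and values are placeholders, and the `average_team` helper is hypothetical.

```python
import statistics


def average_team(teams):
    """Average every stat across teams to obtain a neutral opponent profile."""
    keys = next(iter(teams.values())).keys()
    return {k: statistics.fmean(t[k] for t in teams.values()) for k in keys}


# Placeholder team stats, for illustration only
teams = {
    "Team A": {"xg_against": 1.0, "tackles": 15.0},
    "Team B": {"xg_against": 1.4, "tackles": 17.0},
    "Team C": {"xg_against": 1.2, "tackles": 19.0},
}

neutral_opponent = average_team(teams)
print(neutral_opponent)
```

Feeding this neutral profile in place of a real opponent gives a prediction that does not depend on the upcoming fixture, which is what makes players comparable ahead of an auction.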

Where it failed or could be improved

The algorithm’s weak point lies in predicting the performance of goalkeepers. A separate neural network was developed, using different features and adding clean sheet probability as an output. However, the results weren’t as satisfying, likely due to the limited number of goalkeepers (only one per team) compared to outfield players. This resulted in a less diverse dataset, increasing the risk of overfitting.

Moreover, the algorithm considered only the average stats of each player throughout the season. While this approach was sufficient, incorporating data from the player’s previous two or three matches leading up to a given matchday could improve the algorithm’s ability to account for their current form. This would provide a more comprehensive assessment of the player’s recent performance.
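A recent-form feature of this kind could be computed as sketched below. This is only an idea for a future improvement, not something the current algorithm does; the `recent_form` helper and the per-match values are hypothetical.

```python
def recent_form(values, window=3):
    """For each match, average the stat over the previous `window` matches.

    Only matches played before the current one are used, so the feature
    never leaks information from the match being predicted.
    """
    form = []
    for i in range(len(values)):
        previous = values[max(0, i - window):i]
        form.append(sum(previous) / len(previous) if previous else None)
    return form


xg_per_match = [0.1, 0.4, 0.0, 0.9, 0.6]  # invented per-match values
form = recent_form(xg_per_match)
print(form)
```

The first match has no history, so its form value is left empty and would need a fallback such as the season average.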

The entire work in public

The code written for this project, as well as the results generated for several Serie A matchdays, is publicly available. I plan to make further improvements for the next season whenever time permits. If you have any questions or need clarification, feel free to contact me.
