When You Just Can’t Settle on a Single Action


In game theory, players typically need to make assumptions about the other players’ actions. What is the other player going to do? Will they play rock, paper, or scissors? You never know, but in some cases you may have an idea that some actions are more probable than others. Adding such a notion of probability or randomness opens up a whole new chapter of game theory that lets us analyse more complicated scenarios.

This article is the third in a four-chapter series on the fundamentals of game theory. If you haven’t checked out the first two chapters yet, I’d encourage you to do so to become familiar with the basic terms and concepts used in the following. If you feel ready, let’s go ahead!

Mixed Strategies

To the best of my knowledge, soccer is all about hitting the goal, although that happens very infrequently. Photo by Zainu Color on Unsplash

So far we have always considered games where each player chooses exactly one action. Now we will extend our games by allowing each player to select different actions with given probabilities, which we call a mixed strategy. If you play rock-paper-scissors, you do not know which action your opponent takes, but you might guess that they select each action with a probability of 33%, and if you play 99 games of rock-paper-scissors, you might indeed find your opponent choosing each action roughly 33 times. With this example, you directly see the main reasons why we want to introduce probability. First, it allows us to describe games that are played multiple times, and second, it enables us to consider a notion of the (assumed) likelihood of a player’s actions.
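To see this frequency argument in action, here is a minimal sketch of the 99-game scenario from above, with the opponent playing the uniform 1/3 mixed strategy (the fixed seed is only there to make the run reproducible):

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

ACTIONS = ["rock", "paper", "scissors"]

# The opponent plays the uniform mixed strategy: each action with probability 1/3.
plays = [random.choice(ACTIONS) for _ in range(99)]
counts = Counter(plays)

for action in ACTIONS:
    print(action, counts[action])  # each count comes out roughly around 33
```

Over many more games the relative frequencies converge to the strategy's probabilities, which is exactly the interpretation of a mixed strategy used below.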

Let me demonstrate the latter point in more detail. We come back to the soccer game we saw in chapter 2, where the keeper decides on a corner to jump into and the other player decides on a corner to aim for.

A game matrix for a penalty shooting.

If you are the keeper, you win (reward of 1) if you choose the same corner as the opponent, and you lose (reward of -1) if you choose the other one. For your opponent, it is the other way round: they win if you choose different corners. This game only makes sense if both the keeper and the opponent choose a corner randomly. To be precise, if one player knows that the other always selects the same corner, they know exactly what to do to win. So, the key to success in this game is to choose the corner by some random mechanism. The main question now is: what probability should the keeper and the opponent assign to each corner? Would it be a good strategy to choose the right corner with a probability of 80%? Probably not.

To find the best strategy, we need to find the Nash equilibrium, because that is the state where no player can do any better by changing their behaviour. In the case of mixed strategies, such a Nash equilibrium is described by a probability distribution over the actions, where no player wants to increase or decrease any probability anymore. In other words, it is optimal (because if it weren’t optimal, one player would want to change). We can find this optimal probability distribution if we consider the expected reward. As you might guess, the expected reward consists of the reward (also called utility) the players get (which is given in the matrix above) times the likelihood of that reward. Let’s say the shooter chooses the left corner with probability p and the right corner with probability 1-p. What reward can the keeper expect? Well, if they choose the left corner, they can expect a reward of p*1 + (1-p)*(-1). Do you see how this is derived from the game matrix? If the keeper chooses the left corner, there is a probability of p that the shooter chooses the same corner, which is good for the keeper (reward of 1). But with a likelihood of (1-p), the shooter chooses the other corner and the keeper loses (reward of -1). Likewise, if the keeper chooses the right corner, they can expect a reward of (1-p)*1 + p*(-1). Consequently, if the keeper chooses the left corner with probability q and the right corner with probability (1-q), the overall expected reward for the keeper is q times the expected reward for the left corner plus (1-q) times the expected reward for the right corner.
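The keeper’s expected reward described above can be written out directly. The sketch below simply encodes the two formulas, with p the shooter’s probability for the left corner and q the keeper’s:

```python
def keeper_expected_reward(p: float, q: float) -> float:
    """Keeper's expected reward, with rewards taken from the game matrix.

    p: probability that the shooter aims at the left corner
    q: probability that the keeper jumps into the left corner
    """
    left = p * 1 + (1 - p) * (-1)    # keeper jumps left
    right = (1 - p) * 1 + p * (-1)   # keeper jumps right
    return q * left + (1 - q) * right

# At p = 0.5 the keeper is indifferent: every q yields the same reward of 0.
for q in (0.0, 0.3, 1.0):
    print(keeper_expected_reward(0.5, q))  # 0.0 each time
```

This indifference at p = 0.5 is exactly what the shooter aims for in the next paragraph.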

Now let’s take the perspective of the shooter. He wants the keeper to be indifferent between the corners. In other words, he wants the keeper to see no advantage in either corner, so that the keeper chooses randomly. Mathematically, that means the expected rewards for both corners must be equal, i.e.

p*1 + (1-p)*(-1) = (1-p)*1 + p*(-1)

which can be solved to p=0.5. So the optimal strategy for the shooter to keep the keeper indifferent is to choose each corner with an equal probability of 0.5.

But now imagine a shooter who is well known for his tendency to choose the right corner. You might not expect a 50/50 probability for each corner, but instead assume he will choose the right corner with a probability of 70%. If the keeper stays with their 50/50 split for choosing a corner, their expected reward is 0.5 times the expected reward for the left corner plus 0.5 times the expected reward for the right corner:

0.5*(0.3*1 + 0.7*(-1)) + 0.5*(0.7*1 + 0.3*(-1)) = 0.5*(-0.4) + 0.5*0.4 = 0
That doesn’t sound too bad, but there is a better option still. If the keeper always chooses the right corner (i.e., q=1), they get a reward of 0.4, which is better than 0. In this case, there is a clear best answer for the keeper, which is to favour the corner the shooter prefers. That, however, would lower the shooter’s reward. If the keeper always chooses the right corner, the shooter would get a reward of -1 with a probability of 70% (because the shooter themselves chooses the right corner with a probability of 70%) and a reward of 1 in the remaining 30% of cases, which yields an expected reward of 0.7*(-1) + 0.3*1 = -0.4. That is worse than the reward of 0 they got when they chose 50/50. Do you remember that a Nash equilibrium is a state where no player has any reason to change their action unless any other player does? This scenario is not a Nash equilibrium, because the shooter has an incentive to move back towards a 50/50 split, even if the keeper doesn’t change their strategy. The 50/50 split, however, is a Nash equilibrium, because in that scenario neither the shooter nor the keeper gains anything from changing their probability of choosing one corner or the other.
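The biased-shooter numbers above are easy to verify in code. This sketch uses p for the shooter’s probability of aiming at the right corner and q for the keeper’s probability of jumping right:

```python
def keeper_reward(p: float, q: float) -> float:
    # p: probability the shooter aims at the right corner
    # q: probability the keeper jumps into the right corner
    jump_right = p * 1 + (1 - p) * (-1)   # keeper guesses right
    jump_left = (1 - p) * 1 + p * (-1)    # keeper guesses left
    return q * jump_right + (1 - q) * jump_left

# Against a shooter with a 70% bias to the right:
print(round(keeper_reward(0.7, 0.5), 2))  # 0.0 - the 50/50 keeper gains nothing
print(round(keeper_reward(0.7, 1.0), 2))  # 0.4 - always jumping right exploits the bias
```

Any deviation from 50/50 by the shooter hands the keeper an exploitable edge, which is why only the 50/50 split survives as an equilibrium.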

Fighting birds

Food can be a reason for birds to fight one another. Photo by Viktor Keri on Unsplash

From the previous example, we saw that a player’s assumptions about the other player’s actions influence the first player’s choice of action as well. If a player wants to behave rationally (and that is what we always assume in game theory), they would choose actions that maximize their expected reward given the other players’ mixed strategies. In the soccer scenario it is quite easy to jump into a corner more often if you assume that the opponent will choose that corner more often, so let us continue with a more complicated example that takes us outside into nature.

As we walk through the forest, we notice some interesting behaviour in wild animals. Say two birds come to a place where there is something to eat. If you were a bird, what would you do? Would you share the food with the other bird, which means less food for both of you? Or would you fight? If you threaten your opponent, they might give in and you have all the food for yourself. But if the other bird is as aggressive as you, you end up in a real fight and you hurt each other. Then you might have preferred to give in in the first place and just leave without a fight. As you see, the outcome of your action depends on the other bird. Preparing to fight can be very rewarding if the opponent gives in, but very costly if the other bird is willing to fight as well. In matrix notation, this game looks like this:


A matrix for a game that is sometimes called hawk vs. dove.

The question is, what would be the rational behaviour for a given distribution of birds who fight or give in? If you are in a very dangerous environment, where most birds are known to be aggressive fighters, you might prefer giving in to avoid getting hurt. But if you assume that most other birds are cowards, you might see a potential benefit in preparing for a fight to scare the others away. By calculating the expected reward, we can figure out the exact proportions of fighting birds and yielding birds that form an equilibrium. Say the probability of fighting is denoted p for bird 1 and q for bird 2; then the probability of giving in is 1-p for bird 1 and 1-q for bird 2. In a Nash equilibrium, no player wants to change their strategy unless any other player does. Formally, that means both options have to yield the same expected reward. So, for bird 2, fighting must be as good as giving in. With u2(a1, a2) denoting bird 2’s reward from the matrix when bird 1 plays a1 and bird 2 plays a2, this leads us to the following condition, which we can solve for p (and, by the symmetry of the game, q takes the same value):

p*u2(fight, fight) + (1-p)*u2(give in, fight) = p*u2(fight, give in) + (1-p)*u2(give in, give in)
For bird 2 it turns out to be optimal to fight with a probability of 1/3 and give in with a probability of 2/3, and the same holds for bird 1 because of the symmetry of the game. In a big population of birds, that would mean that a third of the birds are fighters, who usually seek the fight, and the other two-thirds prefer giving in. As this is an equilibrium, these ratios stay stable over time. If it were to happen that more birds became cowards who always give in, fighting would become more rewarding, as the chance of winning increases. Then, however, more birds would choose to fight, the number of cowardly birds would decrease, and the stable equilibrium would be reached again.
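The matrix values are not reproduced here, so the sketch below assumes an illustrative hawk-vs.-dove payoff matrix (both fight: -2 each; fighting a yielding bird: 2 vs. 0; both give in: 1 each), chosen so that it reproduces the 1/3 equilibrium; the actual numbers in the matrix above may differ.

```python
# Assumed illustrative payoffs for bird 1 (bird 2 is symmetric):
FIGHT_FIGHT = -2   # both birds fight and get hurt
FIGHT_GIVE = 2     # you fight, the other gives in: all the food for you
GIVE_FIGHT = 0     # you give in against a fighter: you leave empty-handed
GIVE_GIVE = 1      # both give in and share the food

def expected_reward(action: str, q: float) -> float:
    """Bird 1's expected reward if bird 2 fights with probability q."""
    if action == "fight":
        return q * FIGHT_FIGHT + (1 - q) * FIGHT_GIVE
    return q * GIVE_FIGHT + (1 - q) * GIVE_GIVE

# Scan for the indifference point where fighting and giving in tie:
q_star = min((i / 1000 for i in range(1001)),
             key=lambda q: abs(expected_reward("fight", q) - expected_reward("give in", q)))
print(q_star)  # 0.333, i.e. roughly 1/3
```

With these payoffs, fighting yields 2 - 4q and giving in yields 1 - q, so the indifference point sits at q = 1/3, matching the population split described above.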

Reporting a crime

There is nothing to see here. Move on and learn more about game theory. Photo by JOSHUA COLEMAN on Unsplash

Now that we have understood that we can find optimal Nash equilibria by comparing the expected rewards for the different options, we will use this strategy on a more sophisticated example to unleash the power game-theoretic analyses can have for realistic, complex scenarios.

Say a crime happened in the middle of the town centre and there are multiple witnesses. The question is: who calls the police now? As there are many people around, everybody might expect others to call the police and hence refrain from doing it themselves. We can model this scenario as a game again. Let’s say we have n players and everybody has two options, namely calling the police or not calling the police. And what is the reward? For the reward, we distinguish three cases. If nobody calls the police, the reward is zero, because then the crime is not reported. If you call the police, you have some costs (e.g. the time you have to spend to wait and tell the police what happened), but the crime is reported, which helps keep your city safe. If somebody else reports the crime, the city would still be kept safe, but you wouldn’t have the costs of calling the police yourself. Formally, we can write this down as follows:

r = v       if somebody else calls the police
r = v - c   if you call the police yourself
r = 0       if nobody calls the police

v is the reward of keeping the city safe, which you get either if somebody else calls the police (first row) or if you call the police yourself (second row). In the second case, however, your reward is diminished a little by the costs c you have to bear. Let us assume that c is smaller than v, which means that the costs of calling the police never exceed the reward you get from keeping your city safe. In the last case, where nobody calls the police, your reward is zero.

This game looks a little different from the previous ones, mainly because we didn’t display it as a matrix. In fact, it is more complicated. We didn’t specify the exact number of players (we just called it n), and we also didn’t specify the rewards explicitly but just introduced some values v and c. However, this helps us model a quite complicated real situation as a game and will allow us to answer more interesting questions: First, what happens if more people witness the crime? Will it become more likely that somebody reports it? Second, how do the costs c influence the likelihood of the crime being reported? We can answer these questions with the game-theoretic concepts we have already learned.

As in the previous examples, we will use the Nash equilibrium’s property that in an optimal state, nobody should want to change their action. That means, for every individual, calling the police must be as good as not calling it, which leads us to the following formula:

v - c = v * (1 - (1-p)^(n-1))
On the left, you have the reward you get if you call the police yourself (v - c). This must be as good as a reward of v times the likelihood that anybody else calls the police. Now, the probability of anybody else calling the police is the same as 1 minus the probability that nobody else calls the police. If we denote the probability that an individual calls the police by p, the probability that a single individual does not call the police is 1-p. Consequently, the probability that two individuals both don’t call the police is the product of the single probabilities, (1-p)*(1-p). For n-1 individuals (all individuals except you), this gives us the term (1-p)^(n-1) on the right-hand side. We can solve this equation for p and finally arrive at:

p = 1 - (c/v)^(1/(n-1))
This gives the probability of a single individual calling the police. What happens if there are more witnesses to the crime? If n gets larger, the exponent 1/(n-1) goes towards 0, which finally leads to:

p = 1 - (c/v)^0
Given that x to the power of 0 is always 1, p becomes zero. In other words, the more witnesses are around (higher n), the less likely it becomes that you call the police, and for an infinite number of other witnesses, the probability drops to zero. This sounds reasonable. The more other people around, the more likely you are to expect that somebody else will call the police, and the smaller you perceive your own responsibility. Naturally, all other individuals have the same chain of thought. But that also sounds a little tragic, doesn’t it? Does this mean that nobody will call the police if there are many witnesses?
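The decline of the individual probability is easy to tabulate. The costs c and reward v were left symbolic above, so the values c = 1 and v = 4 in this sketch are purely illustrative assumptions:

```python
c, v = 1.0, 4.0  # assumed illustrative cost and reward, with c < v

def individual_call_probability(n: int) -> float:
    """Equilibrium probability p = 1 - (c/v)^(1/(n-1)) that one witness calls."""
    return 1 - (c / v) ** (1 / (n - 1))

for n in (2, 5, 10, 100, 1000):
    print(n, round(individual_call_probability(n), 4))  # shrinks towards 0
```

For two witnesses each calls with probability 0.75 under these values, but for a thousand witnesses the individual probability is already a fraction of a percent.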

Well, not necessarily. We just saw that the probability of a single person calling the police declines with higher n, but there are also more people around. Perhaps the sheer number of people counteracts the diminishing individual probability. A hundred individuals, each with a small probability of calling the police, might still be worth more than a few individuals with moderate individual probabilities. Let us now take a look at the probability that anybody calls the police.

The probability that anybody calls the police is equal to 1 minus the probability that nobody calls the police. As in the example before, the probability of nobody calling the police is (1-p)^n. We then use the equation we derived previously to replace (1-p)^(n-1) with c/v:

1 - (1-p)^n = 1 - (1-p)^(n-1) * (1-p) = 1 - (c/v) * (1-p)
When we look at the last line of our calculation, what happens for large n? We already know that p drops to zero, leaving us with a probability of 1 - c/v. This is the likelihood that anybody will call the police if there are many people around (note that this is different from the probability that a particular individual calls the police). We see that this likelihood heavily depends on the ratio of c and v. The smaller c, the more likely it is that anybody calls the police. If c is (close to) zero, it is almost certain that the police will be called, but if c is almost as big as v (that is, the costs of calling the police eat up the reward of reporting the crime), it becomes unlikely that anybody calls the police. This gives us a lever to influence the probability of reporting crimes: calling the police and reporting a crime should be as effortless and low-threshold as possible.
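To see the convergence towards 1 - c/v numerically, here is the same sketch extended to the probability that anybody calls, again with the purely illustrative assumptions c = 1 and v = 4:

```python
c, v = 1.0, 4.0  # assumed illustrative cost and reward, with c < v

def prob_anyone_calls(n: int) -> float:
    p = 1 - (c / v) ** (1 / (n - 1))  # individual equilibrium probability
    return 1 - (1 - p) ** n           # 1 minus "nobody calls"

for n in (2, 10, 100, 100000):
    print(n, round(prob_anyone_calls(n), 4))

print(1 - c / v)  # the large-n limit: 0.75
```

Lowering c (making reporting easier) pushes this limit towards 1, which is the policy lever described above.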

Summary

We have learned a lot about probabilities and choosing actions randomly today. Photo by Robert Stump on Unsplash

In this chapter of our journey through the realms of game theory, we have introduced so-called mixed strategies, which allow us to describe games by the probabilities with which different actions are taken. We can summarize our key findings as follows:

  • A mixed strategy is described by a probability distribution over the different actions.
  • In a Nash equilibrium, the expected rewards for all actions a player can take must be equal.
  • In mixed strategies, a Nash equilibrium means that no player wants to change the probabilities of their actions.
  • We can find the probabilities of the different actions in a Nash equilibrium by setting the expected rewards of two (or more) options equal.
  • Game-theoretic concepts allow us to analyse scenarios with an arbitrary number of players. Such analyses can also tell us how the exact shaping of the reward influences the probabilities in a Nash equilibrium. This can be used to encourage desired decisions in the real world, as we saw in the crime-reporting example.

We are almost through with our series on the fundamentals of game theory. In the next and final chapter, we will introduce the idea of taking turns in games. Stay tuned!

References

The topics introduced here are typically covered in standard textbooks on game theory. I mainly used this one, which is written in German:

  • Bartholomae, F., & Wiens, M. (2016). Spieltheorie: Ein anwendungsorientiertes Lehrbuch. Wiesbaden: Springer Fachmedien Wiesbaden.

An alternative in English could be this one:

  • Espinola-Arredondo, A., & Muñoz-Garcia, F. (2023). Game Theory: An Introduction with Step-by-Step Examples. Springer Nature.

Game theory is a rather young field of research, with the first main textbook being this one:

  • Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior.
