First, we set out the scope and assumptions of the question of learning in games.
Second, we go through several learning models:
- Pure Strategy Best Response Equilibrium and Best Response Dynamics
- Mixed Strategy Best Response Equilibrium (Nash Equilibrium) and Best Response Dynamics
- Pure Strategy Fictitious Play
- Replicator Dynamics, mimicking the best or the better
- Pure Strategy Smoothed Best Response Equilibrium and Best Response Dynamics
- Smoothed Fictitious Play
- Something natural but not in the book: Quantal Response Equilibrium (QRE) and Dynamical QRE, i.e., mixed strategy smoothed best response and its dynamical version
- In principle, one can also have mixed fictitious play with smoothed best response
\[S^{i} = BR^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = BR^{i}\left(S^{-i}\left(t\right)\right)\]
where \(S^{i}\) is the pure strategy of player \(i\) and \(S^{-i}\) is the pure strategy state of the players other than player \(i\).
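A minimal numerical sketch of this best response dynamics, assuming a two-player bimatrix game; the payoff matrices, the helper name `best_response`, and the use of numpy are illustrative assumptions, not from the text:

```python
import numpy as np

# Illustrative 2x2 coordination game; the payoffs are an assumption, not from the text.
A = np.array([[2, 0],
              [0, 1]])   # player 1's payoffs: rows = own strategies, columns = opponent's
B = np.array([[2, 0],
              [0, 1]])   # player 2's payoffs: rows = player 1's strategies, columns = own

def best_response(payoff, opponent_strategy):
    """Pure strategy best response BR^i(S^{-i}): argmax over own pure strategies."""
    return int(np.argmax(payoff[:, opponent_strategy]))

# Best response dynamics S^i(t+1) = BR^i(S^{-i}(t)) with simultaneous updates.
s1, s2 = 0, 1
for t in range(6):
    s1, s2 = best_response(A, s2), best_response(B.T, s1)
    print(t, s1, s2)
# From this mis-coordinated start the simultaneous update cycles between (0,1) and (1,0);
# an alternating update (or a coordinated start) settles at a pure best response equilibrium.
```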
\[\rho^{i} = BR^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i}\left(t\right)\right)\]
where \(\rho^{i}\) is the mixed strategy of player \(i\) and \(\rho^{-i}\) is the mixed strategy state of the players other than player \(i\). The static equation is the Nash equilibrium condition.
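For the mixed strategy version, a best response to a mixed state is typically a pure strategy, so the dynamics jump between degenerate mixed strategies. A minimal sketch, assuming matching pennies payoffs and deterministic tie breaking (both assumptions of this example):

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies, player 1 (illustrative payoffs)
B = -A                    # zero-sum: player 2's payoffs

def mixed_best_response(payoff, rho_opponent):
    """BR^i(rho^{-i}): a maximizer of expected payoff, returned as a degenerate
    mixed strategy; ties are broken toward the lowest index."""
    expected = payoff @ rho_opponent
    br = np.zeros(len(expected))
    br[np.argmax(expected)] = 1.0
    return br

rho1 = np.array([0.5, 0.5])
rho2 = np.array([0.3, 0.7])
for t in range(6):
    rho1, rho2 = mixed_best_response(A, rho2), mixed_best_response(B.T, rho1)
    print(t, rho1, rho2)
# The iteration cycles here; the mixed Nash equilibrium (1/2, 1/2) is a fixed point of
# the best response correspondence but is not found by this naive iteration.
```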
\[S^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
where \(\rho^{-i,E}\) is the empirical distribution of the strategies of the players other than player \(i\), computed from the whole history or from a window of the most recent actions.
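A sketch of pure strategy fictitious play under the same assumed matching pennies payoffs; the unit initial counts, which act as a tie-breaking prior, are also an assumption of this example:

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A

def best_reply_to_mixture(payoff, empirical):
    """BR^i(rho^{-i,E}): best pure reply to the opponent's empirical mixture."""
    return int(np.argmax(payoff @ empirical))

counts1 = np.ones(2)   # counts of player 1's past pure strategies (unit prior)
counts2 = np.ones(2)   # counts of player 2's past pure strategies
for t in range(2000):
    emp1, emp2 = counts1 / counts1.sum(), counts2 / counts2.sum()
    s1 = best_reply_to_mixture(A, emp2)     # S^1(t+1) = BR^1(rho^{-1,E}(t))
    s2 = best_reply_to_mixture(B.T, emp1)   # S^2(t+1) = BR^2(rho^{-2,E}(t))
    counts1[s1] += 1
    counts2[s2] += 1

# Empirical frequencies approach the mixed Nash equilibrium (1/2, 1/2) in this game.
print(counts1 / counts1.sum(), counts2 / counts2.sum())
```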
\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) = \delta_{E^j\left(t\right), Max\left(E^{1}\left(t\right), \cdots, E^{i}\left(t\right), \cdots, E^{N}\left(t\right)\right)}\]
or
\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) \propto e^{\beta\left(E^{j}\left(t\right)-E^{i}\left(t\right)\right)}\]
where \(E^{j}\left(t\right)\) is the payoff received by player \(j\) at time \(t\).
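A sketch of these two imitation rules, assuming a population of `N` players in a symmetric 2-strategy game where each player's payoff \(E^{i}(t)\) is the expected payoff of its pure strategy against the population mixture (a mean-field simplification); the game matrix, `N`, and `beta` are all assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[3, 0],
              [5, 1]])     # illustrative symmetric 2-strategy game
N, beta = 50, 2.0
mimic_the_best = False     # True: delta rule (copy the best); False: logit rule (copy the better)

strategies = rng.integers(0, 2, size=N)   # S^i(t): pure strategy of each player

for t in range(30):
    pop = np.bincount(strategies, minlength=2) / N
    payoffs = M[strategies] @ pop          # E^i(t): payoff against the population mixture

    if mimic_the_best:
        # Everyone copies the strategy of the player with the maximal payoff.
        strategies = np.full(N, strategies[np.argmax(payoffs)])
    else:
        # Player i copies player j with probability proportional to exp(beta*(E^j - E^i)).
        new_strategies = np.empty_like(strategies)
        for i in range(N):
            weights = np.exp(beta * (payoffs - payoffs[i]))
            j = rng.choice(N, p=weights / weights.sum())
            new_strategies[i] = strategies[j]
        strategies = new_strategies

print(np.bincount(strategies, minlength=2) / N)   # final strategy frequencies
```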
\[S^{i} = \bar{BR}^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(S^{-i}\left(t\right)\right)\]
where
\[\bar{BR}^{i}\left(\rho^{-i}\right)\propto e^{\beta E\left(s^{i},\rho^{-i}\right)}\]
is a probability distribution over player \(i\)'s pure strategies \(s^{i}\), written here for a general opponent state \(\rho^{-i}\) (the pure profile \(S^{-i}\) is the degenerate case), and \(S^{i}\) is one sample drawn from this distribution at each step.
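A sketch of the pure strategy smoothed best response dynamics, assuming the same illustrative coordination game as above and a logit parameter `beta`; the random sampling setup is also an assumption of this example:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2, 0],
              [0, 1]])   # illustrative coordination game
B = A.copy()
beta = 3.0

def smoothed_best_response(payoff, opponent_strategy, beta):
    """bar{BR}^i(S^{-i}): logit probability distribution over own pure strategies."""
    logits = beta * payoff[:, opponent_strategy]
    p = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return p / p.sum()

s1, s2 = 0, 1
for t in range(20):
    p1 = smoothed_best_response(A, s2, beta)
    p2 = smoothed_best_response(B.T, s1, beta)
    s1 = rng.choice(len(p1), p=p1)      # S^1(t+1) is one sample from bar{BR}^1
    s2 = rng.choice(len(p2), p=p2)      # S^2(t+1) is one sample from bar{BR}^2
    print(t, s1, s2)
```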
\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
Again, \(S^{i}\left(t+1\right)\) is one sample drawn from this probability distribution at each step.
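Smoothed fictitious play then combines the empirical distribution with the logit sampling; a sketch under the same assumed matching pennies payoffs and unit prior counts:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A
beta = 5.0

def smoothed_br_to_mixture(payoff, empirical, beta):
    """bar{BR}^i(rho^{-i,E}): logit distribution over expected payoffs."""
    logits = beta * (payoff @ empirical)
    p = np.exp(logits - logits.max())
    return p / p.sum()

counts1, counts2 = np.ones(2), np.ones(2)   # unit prior counts (an assumption)
for t in range(3000):
    emp1, emp2 = counts1 / counts1.sum(), counts2 / counts2.sum()
    s1 = rng.choice(2, p=smoothed_br_to_mixture(A, emp2, beta))
    s2 = rng.choice(2, p=smoothed_br_to_mixture(B.T, emp1, beta))
    counts1[s1] += 1
    counts2[s2] += 1

print(counts1 / counts1.sum(), counts2 / counts2.sum())   # close to (1/2, 1/2) here
```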
\[\rho^{i} = \bar{BR}^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i}\left(t\right)\right)\]
This simply replaces the static/dynamical mixed best response with the static/dynamical mixed smoothed best response. The static equation defines the QRE; the dynamical version is what we have worked on in this field: dynamical QRE and its stability.
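A sketch of the dynamical QRE map, i.e. iterating the mixed smoothed best response; the coordination game payoffs and `beta = 1.0` are assumptions chosen so that the simultaneous iteration converges (for larger `beta`, or for other games, the fixed point can lose stability, which is the stability question mentioned above):

```python
import numpy as np

A = np.array([[2, 0],
              [0, 1]])   # illustrative coordination game
B = A.copy()
beta = 1.0

def logit_response(payoff, rho_opponent, beta):
    """Mixed smoothed best response: bar{BR}^i(rho^{-i}) proportional to exp(beta * expected payoff)."""
    logits = beta * (payoff @ rho_opponent)
    p = np.exp(logits - logits.max())
    return p / p.sum()

rho1 = np.array([0.5, 0.5])
rho2 = np.array([0.4, 0.6])
for t in range(200):   # rho^i(t+1) = bar{BR}^i(rho^{-i}(t)), simultaneous update
    rho1, rho2 = logit_response(A, rho2, beta), logit_response(B.T, rho1, beta)

# The iteration converges to the (logit) quantal response equilibrium
# of this game at this value of beta.
print(rho1, rho2)
```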
\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i,E}\left(t\right)\right)\]
where \(\rho^{-i,E}\left(t\right)\) is some empirical distribution of the strategies of the players other than player \(i\). For example, one approach is to take the average of all historical \(\rho^{j}\left(\tau\right)\) with \(\tau<t+1\),
\[\rho^{j,E}\left(t\right) = \frac{1}{t}\sum_{\tau<t+1}\rho^{j}\left(\tau\right).\]
We are not sure whether this has been discussed by others.
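A sketch of this mixed smoothed fictitious play, with \(\rho^{j,E}\left(t\right)\) kept as a running average exactly as in the formula above; the matching pennies payoffs, `beta`, and the horizon `T` are again assumptions of this example:

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A
beta = 2.0
T = 500

def logit_response(payoff, rho_opponent, beta):
    """bar{BR}^i: logit distribution over expected payoffs."""
    logits = beta * (payoff @ rho_opponent)
    p = np.exp(logits - logits.max())
    return p / p.sum()

rho1 = np.array([0.9, 0.1])
rho2 = np.array([0.2, 0.8])
sum1, sum2 = rho1.copy(), rho2.copy()     # running sums of rho^j(tau), tau <= t
for t in range(1, T):
    emp1, emp2 = sum1 / t, sum2 / t       # rho^{j,E}(t) = (1/t) * sum_{tau<t+1} rho^j(tau)
    rho1 = logit_response(A, emp2, beta)  # rho^1(t+1) = bar{BR}^1(rho^{-1,E}(t))
    rho2 = logit_response(B.T, emp1, beta)
    sum1 += rho1
    sum2 += rho2

print(sum1 / T, sum2 / T)   # time-averaged mixed strategies rho^{j,E}(T)
```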
All of the above models can be updated either simultaneously or alternately (players taking turns).