Summary of The Theory of Learning in Games by Fudenberg and Levine

First, the book sets out the scope and assumptions of the question of learning in games.

Second, it presents several learning models.

Pure Strategy Best Response Equilibrium and Best Response Dynamics

\[S^{i} = BR^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = BR^{i}\left(S^{-i}\left(t\right)\right)\]
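where \(S^{i}\) is the pure strategy of player \(i\) and \(S^{-i}\) is the pure strategy profile of all players other than \(i\). The first equation is the equilibrium (fixed-point) condition and the second is the corresponding dynamics.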
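As a minimal sketch of the two equations above, the following Python code runs simultaneous best response dynamics on a two-player game and checks the fixed-point condition. The 2x2 coordination payoffs and the names `PAYOFF`, `best_response`, `best_response_dynamics`, and `is_equilibrium` are assumptions made for illustration; they are not taken from the book.

```python
# Hypothetical 2x2 coordination game (an assumption for illustration).
# PAYOFF[i][a0][a1] is player i's payoff when player 0 plays a0 and player 1 plays a1.
PAYOFF = [
    [[2, 0], [0, 1]],  # player 0
    [[2, 0], [0, 1]],  # player 1
]


def best_response(player, other_action):
    """Return the pure strategy that maximizes `player`'s payoff against other_action."""
    def payoff(action):
        if player == 0:
            return PAYOFF[0][action][other_action]
        return PAYOFF[1][other_action][action]
    return max(range(2), key=payoff)


def best_response_dynamics(state, steps):
    """Simultaneous update S^i(t+1) = BR^i(S^{-i}(t)); returns the visited states."""
    history = [state]
    for _ in range(steps):
        s0, s1 = state
        state = (best_response(0, s1), best_response(1, s0))
        history.append(state)
    return history


def is_equilibrium(state):
    """Check the fixed-point condition S^i = BR^i(S^{-i}) for both players."""
    s0, s1 = state
    return s0 == best_response(0, s1) and s1 == best_response(1, s0)
```

With these hypothetical payoffs, both `(0, 0)` and `(1, 1)` satisfy the equilibrium condition, so the dynamics stay put when started from either of them.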

Mixed Strategy Best Response Equilibrium (Nash Equilibrium) and Best Response Dynamics

\[\rho^{i} = BR^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i}\left(t\right)\right)\]
where \(\rho^{i}\) is the mixed strategy of player \(i\) and \(\rho^{-i}\) is the mixed strategy profile of all players other than \(i\).

Pure Strategy Fictitious Play

\[S^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
where \(\rho^{-i,E}\) is the empirical distribution of the strategies played by players other than \(i\), computed either from the whole history or from a fixed-length window of the most recent actions.
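The update above can be sketched for two players as follows. This is a minimal illustration that uses the whole history for the empirical distribution; the payoff layout `payoff_i[own][other]`, the tie-breaking rule (lowest index wins), the matching pennies payoffs, and all function names are assumptions, not something specified in the book summary.

```python
def expected_payoffs(payoff_rows, opponent_dist):
    """Expected payoff of each own pure strategy against a mixed opponent strategy."""
    return [sum(u * q for u, q in zip(row, opponent_dist)) for row in payoff_rows]


def empirical_distribution(counts):
    """Turn action counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(payoff_0, payoff_1, first_actions, steps):
    """Run `steps` rounds (steps >= 1) and return each player's empirical frequencies.

    payoff_i[own][other] is player i's payoff (hypothetical layout).
    """
    counts = [[0] * len(payoff_0), [0] * len(payoff_1)]
    actions = list(first_actions)
    for _ in range(steps):
        # Record the actions played at time t.
        counts[0][actions[0]] += 1
        counts[1][actions[1]] += 1
        # S^i(t+1) = BR^i(rho^{-i,E}(t)); ties go to the lowest index.
        u0 = expected_payoffs(payoff_0, empirical_distribution(counts[1]))
        u1 = expected_payoffs(payoff_1, empirical_distribution(counts[0]))
        actions = [u0.index(max(u0)), u1.index(max(u1))]
    return [empirical_distribution(counts[0]), empirical_distribution(counts[1])]


# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
MATCHING_PENNIES_0 = [[1, -1], [-1, 1]]
MATCHING_PENNIES_1 = [[-1, 1], [1, -1]]
frequencies = fictitious_play(MATCHING_PENNIES_0, MATCHING_PENNIES_1, (0, 0), 2000)
```

In this toy run the realized actions keep cycling, while the empirical frequencies in `frequencies` should drift toward one half for each action.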

Replicator Dynamics, mimicking the best or the better

\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) = \delta_{E^j\left(t\right), Max\left(E^{1}\left(t\right), \cdots, E^{i}\left(t\right), \cdots, E^{N}\left(t\right)\right)}\]
or
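\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) \propto e^{\beta\left(E^{j}\left(t\right)-E^{i}\left(t\right)\right)}\]
where \(E^{j}\left(t\right)\) is the payoff of player \(j\) at time \(t\), \(\delta\) is the Kronecker delta, and \(\beta\) controls how strongly higher payoffs are favored.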
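Both imitation rules above can be sketched in a few lines. The sketch assumes that \(E^{j}\left(t\right)\) is player \(j\)'s payoff at time \(t\) and that the proportionality is normalized over all players \(j\), including \(j=i\); both are assumptions, as are the function names. Under this normalization the term \(E^{i}\) cancels, so the second rule reduces to a softmax over the payoffs.

```python
import math


def imitate_the_best(payoffs):
    """Kronecker-delta rule: indices of the players whose payoff equals the maximum."""
    best = max(payoffs)
    return [j for j, e in enumerate(payoffs) if e == best]


def imitation_probabilities(payoffs, i, beta):
    """P(player i copies player j), proportional to exp(beta * (E^j - E^i)).

    Normalizing over all j is an assumption of this sketch.
    """
    weights = [math.exp(beta * (e_j - payoffs[i])) for e_j in payoffs]
    total = sum(weights)
    return [w / total for w in weights]
```

With `beta = 0` every player is copied with equal probability, and as `beta` grows the rule approaches imitating the best.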

Pure Strategy Smoothed Best Response Equilibrium and Best Response Dynamics

\[S^{i} = \bar{BR}^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(S^{-i}\left(t\right)\right)\]
where
\[\bar{BR}^{i}\left(\rho^{-i}\right)\propto e^{\beta E\left(s^{i},\rho^{-i}\right)}\]
is a probability distribution over player \(i\)’s strategies \(s^{i}\), with \(E\left(s^{i},\rho^{-i}\right)\) the expected payoff of \(s^{i}\) against \(\rho^{-i}\); \(S^{i}\) is one sample drawn from this distribution at each step.
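The smoothed best response above is a logit (softmax) distribution, and it can be sketched as follows. The payoff layout `payoff_rows[own][other]` and the function names are assumptions for illustration; subtracting the maximum before exponentiating is only a numerical safeguard and does not change the distribution.

```python
import math
import random


def smoothed_best_response(payoff_rows, opponent_dist, beta):
    """Distribution over own strategies with weight exp(beta * expected payoff)."""
    utilities = [sum(u * q for u, q in zip(row, opponent_dist)) for row in payoff_rows]
    top = max(utilities)  # shift by the max so exp() cannot overflow
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_strategy(distribution, rng=random):
    """Draw one pure strategy S^i from the smoothed best response distribution."""
    return rng.choices(range(len(distribution)), weights=distribution, k=1)[0]
```

At `beta = 0` the distribution is uniform, and for large `beta` almost all probability sits on the ordinary best response.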

Smoothed Fictitious Play

\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
Again, \(S^{i}\) is one sample drawn from this probability distribution at each step.

Here is something natural that is not in the book: Quantal Response Equilibrium (QRE) and Dynamical QRE, that is, the mixed strategy smoothed best response and its dynamical version

\[\rho^{i} = \bar{BR}^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i}\left(t\right)\right)\]
This simply replaces the static/dynamical mixed best response with the static/dynamical mixed smoothed best response. This is what we have worked on in this field: Dynamical QRE and its stability.
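To make the dynamical version concrete, here is a minimal sketch that iterates the simultaneous update for two players. The matching pennies payoffs, the starting strategies, the two values of \(\beta\), and the function names are all assumptions chosen for illustration; this is a toy experiment, not the stability analysis referred to above.

```python
import math


def logit_response(payoff_rows, opponent_dist, beta):
    """Smoothed best response: weights exp(beta * expected payoff), normalized."""
    utilities = [sum(u * q for u, q in zip(row, opponent_dist)) for row in payoff_rows]
    top = max(utilities)  # shift by the max for numerical stability
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def dynamical_qre(payoff_0, payoff_1, rho_0, rho_1, beta, steps):
    """Iterate rho^i(t+1) = smoothed BR^i(rho^{-i}(t)) for both players at once."""
    for _ in range(steps):
        rho_0, rho_1 = (logit_response(payoff_0, rho_1, beta),
                        logit_response(payoff_1, rho_0, beta))
    return rho_0, rho_1


# Matching pennies, with payoff_i[own][other] as a hypothetical layout.
MP_0 = [[1, -1], [-1, 1]]
MP_1 = [[-1, 1], [1, -1]]
low_beta = dynamical_qre(MP_0, MP_1, [0.9, 0.1], [0.2, 0.8], beta=0.5, steps=200)
high_beta = dynamical_qre(MP_0, MP_1, [0.9, 0.1], [0.2, 0.8], beta=2.0, steps=200)
```

In this toy game the uniform strategy is a fixed point for any \(\beta\). Starting away from it, the small value of \(\beta\) pulls the iteration back to the uniform point, while the larger one does not, which illustrates why the stability of the dynamical QRE is a question in its own right.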

In principle, one can also have mixed fictitious play with smoothed best response

\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i,E}\left(t\right)\right)\]
where \(\rho^{-i,E}\left(t\right)\) is some kind of empirical distribution of the strategies of players other than \(i\). For example, one approach is to average all historical \(\rho^{j}\left(\tau\right)\) with \(\tau<t+1\),
\[\rho^{j,E}\left(t\right) = \frac{1}{t}\sum_{\tau<t+1}\rho^{j}\left(\tau\right).\]
I am not sure whether this has been discussed by others.
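The averaging idea above can be sketched with a running mean, so that the full history never has to be stored. The sketch assumes two players, time starting at \(t=1\), and simultaneous updating; the function names and the payoff layout `payoff_i[own][other]` are hypothetical.

```python
import math


def logit_response(payoff_rows, opponent_dist, beta):
    """Smoothed best response: weights exp(beta * expected payoff), normalized."""
    utilities = [sum(u * q for u, q in zip(row, opponent_dist)) for row in payoff_rows]
    top = max(utilities)  # shift by the max for numerical stability
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def averaged_smoothed_play(payoff_0, payoff_1, rho_0, rho_1, beta, steps):
    """Mixed fictitious play with smoothed best response against time averages.

    Returns the final mixed strategies and the final time averages.
    """
    avg_0, avg_1 = list(rho_0), list(rho_1)  # at t = 1 the average is rho(1) itself
    for t in range(1, steps + 1):
        # rho^i(t+1) = smoothed BR^i(rho^{-i,E}(t))
        new_0 = logit_response(payoff_0, avg_1, beta)
        new_1 = logit_response(payoff_1, avg_0, beta)
        # Running mean: avg(t+1) = avg(t) + (rho(t+1) - avg(t)) / (t + 1)
        avg_0 = [a + (x - a) / (t + 1) for a, x in zip(avg_0, new_0)]
        avg_1 = [a + (x - a) / (t + 1) for a, x in zip(avg_1, new_1)]
        rho_0, rho_1 = new_0, new_1
    return (rho_0, rho_1), (avg_0, avg_1)
```

Because each new strategy enters the average with weight \(1/(t+1)\), the averaged opponent strategy changes more and more slowly over time, which tends to damp the oscillations that the plain iteration can show.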

All of the above models can be updated either simultaneously (all players at once) or alternately (one player at a time).
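The difference between the two update schedules can matter. The sketch below applies both to pure strategy best response dynamics in a hypothetical 2x2 coordination game; the payoffs and function names are assumptions made for illustration.

```python
# Hypothetical coordination game: PAYOFF[i][a0][a1] is player i's payoff.
PAYOFF = [
    [[2, 0], [0, 1]],  # player 0
    [[2, 0], [0, 1]],  # player 1
]


def best_response(player, state):
    """Best pure strategy of `player`, holding the other player's action in `state` fixed."""
    def payoff(action):
        trial = list(state)
        trial[player] = action
        return PAYOFF[player][trial[0]][trial[1]]
    return max(range(2), key=payoff)


def simultaneous_step(state):
    """Both players respond to the same old state."""
    return tuple(best_response(i, state) for i in range(2))


def alternating_step(state):
    """Players update one at a time; the second sees the first player's new action."""
    state = list(state)
    for i in range(2):
        state[i] = best_response(i, state)
    return tuple(state)
```

Starting from the miscoordinated state `(0, 1)`, the simultaneous schedule keeps swapping the two actions, whereas the alternating schedule settles on a coordinated state after one round.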
