Summary of The Theory of Learning in Games by Fudenberg and Levine

First, the book sets out the scope and assumptions of the question of learning in games.

Second, the book presents several learning models.

  1. Pure Strategy Best Response Equilibrium and Best Response Dynamics
  2. \[S^{i} = BR^{i}\left(S^{-i}\right)\]
    and
    \[S^{i}\left(t+1\right) = BR^{i}\left(S^{-i}\left(t\right)\right)\]
    where \(S^{i}\) is the pure strategy of player \(i\) and \(S^{-i}\) is the profile of pure strategies of all players other than \(i\).

  3. Mixed Strategy Best Response Equilibrium (Nash Equilibrium) and Best Response Dynamics
  4. \[\rho^{i} = BR^{i}\left(\rho^{-i}\right)\]
    and
    \[\rho^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i}\left(t\right)\right)\]
    where \(\rho^{i}\) is the mixed strategy of player \(i\) and \(\rho^{-i}\) is the profile of mixed strategies of all players other than \(i\).

  5. Pure Strategy Fictitious Play
  6. \[S^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
    where \(\rho^{-i,E}\left(t\right)\) is the empirical distribution of the strategies played by all players other than \(i\), computed over the whole history or over a fixed-length window of the most recent actions.

  7. Replicator Dynamics, mimicking the best or the better
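  8. \[\mathrm{Prob}\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) = \delta_{E^j\left(t\right), \max\left(E^{1}\left(t\right), \cdots, E^{i}\left(t\right), \cdots, E^{N}\left(t\right)\right)}\]
    or
    \[\mathrm{Prob}\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) \propto e^{\beta\left(E^{j}\left(t\right)-E^{i}\left(t\right)\right)}\]
    where \(E^{j}\left(t\right)\) is the payoff of player \(j\) at time \(t\) and \(\delta\) is the Kronecker delta, so the first rule copies a player with the highest payoff, while \(\beta\) sets how strongly the second rule favors better-performing players.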

  9. Pure Strategy Smoothed Best Response Equilibrium and Best Response Dynamics
  10. \[S^{i} = \bar{BR}^{i}\left(S^{-i}\right)\]
    and
    \[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(S^{-i}\left(t\right)\right)\]
    where
    \[\bar{BR}^{i}\left(\rho^{-i}\right)\propto e^{\beta E\left(s^{i},\rho^{-i}\right)}\]
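    is a probability distribution over player \(i\)'s pure strategies \(s^{i}\), \(E\left(s^{i},\rho^{-i}\right)\) is the expected payoff of \(s^{i}\) against \(\rho^{-i}\), and \(S^{i}\) is one sample drawn from this distribution at each step.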

  11. Smoothed Fictitious Play
  12. \[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
    Again, \(S^{i}\) is one sample drawn from this probability distribution at each step.

  13. Here comes something that is natural but not in the book: Quantal Response Equilibrium (QRE) and Dynamical QRE, or mixed strategy smoothed best response and its dynamical version
  14. \[\rho^{i} = \bar{BR}^{i}\left(\rho^{-i}\right)\]
    and
    \[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i}\left(t\right)\right)\]
    This simply replaces the static/dynamical mixed best response with the static/dynamical mixed smoothed best response. This is what we have worked on in this field: dynamical QRE and its stability.

  15. In principle, one can also have mixed fictitious play with smoothed best response
  16. \[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i,E}\left(t\right)\right)\]
    where \(\rho^{-i,E}\left(t\right)\) is some kind of empirical distribution of the strategies of players other than player \(i\). For example, one approach is to average all historical \(\rho^{j}\left(\tau\right)\) with \(\tau \le t\),
    \[\rho^{j,E}\left(t\right) = \frac{1}{t}\sum_{\tau=1}^{t}\rho^{j}\left(\tau\right).\]
    I am not sure whether this has been discussed by others.
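
To make a few of the rules in the list concrete, here is a minimal Python sketch of pure strategy fictitious play and of the dynamical QRE iteration with a logit smoothed best response. The game (matching pennies), the `PAYOFF` table, the initial counts, the tie-breaking rule and all function names are my own assumptions for illustration; none of them comes from the book.

```python
import math

# Hypothetical example game (an assumption, not from the book): matching pennies.
# PAYOFF[i][a][b] is the payoff to player i when player 0 plays a and player 1 plays b.
PAYOFF = [
    [[1.0, -1.0], [-1.0, 1.0]],   # player 0 wants to match
    [[-1.0, 1.0], [1.0, -1.0]],   # player 1 wants to mismatch
]

def expected_payoffs(i, rho_other):
    """E(s^i, rho^{-i}) for each pure strategy s^i of player i."""
    if i == 0:
        return [sum(PAYOFF[0][a][b] * rho_other[b] for b in range(2)) for a in range(2)]
    return [sum(PAYOFF[1][a][b] * rho_other[a] for a in range(2)) for b in range(2)]

def best_response(i, rho_other):
    """Pure best response BR^i; ties go to the lowest index (an arbitrary choice)."""
    e = expected_payoffs(i, rho_other)
    return max(range(len(e)), key=lambda s: e[s])

def smoothed_best_response(i, rho_other, beta):
    """Logit response: probabilities proportional to exp(beta * E(s^i, rho^{-i}))."""
    e = expected_payoffs(i, rho_other)
    m = max(e)  # subtracting the max avoids overflow and leaves the ratios unchanged
    w = [math.exp(beta * (x - m)) for x in e]
    z = sum(w)
    return [x / z for x in w]

def fictitious_play(steps):
    """Each player best-responds to the empirical frequencies of the other's past actions."""
    counts = [[1, 0], [0, 1]]  # assumed first actions: player 0 played 0, player 1 played 1
    for _ in range(steps):
        emp = [[c / sum(cs) for c in cs] for cs in counts]
        a = best_response(0, emp[1])
        b = best_response(1, emp[0])
        counts[0][a] += 1
        counts[1][b] += 1
    return [[c / sum(cs) for c in cs] for cs in counts]

def dynamical_qre(rho, beta, steps):
    """Simultaneous iteration rho^i(t+1) = smoothed BR^i(rho^{-i}(t))."""
    for _ in range(steps):
        rho = [smoothed_best_response(0, rho[1], beta),
               smoothed_best_response(1, rho[0], beta)]
    return rho

if __name__ == "__main__":
    print(fictitious_play(2000))
    print(dynamical_qre([[0.9, 0.1], [0.2, 0.8]], beta=0.5, steps=200))
```

In this toy game a small \(\beta\) lets the dynamical QRE iteration settle near the uniform mixed strategy, while larger values need not converge, which is the kind of stability question mentioned above. Replacing `best_response` by one sample drawn from `smoothed_best_response` inside `fictitious_play` would give a sketch of smoothed fictitious play.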

All of the above models can be updated either simultaneously or alternately.
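
The choice of update order can change the outcome. As a rough illustration, the sketch below runs pure strategy best response dynamics both ways on a hypothetical 2x2 coordination game; the payoff table, the starting state and the function names are assumptions of mine, not taken from the book.

```python
# Hypothetical coordination game (an assumption): both players get 1 if they
# choose the same action and 0 otherwise.
# PAYOFF[i][a][b] is the payoff to player i when player 0 plays a and player 1 plays b.
PAYOFF = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 1]],
]

def best_response(i, other_action):
    """Pure best response of player i to the other player's action."""
    if i == 0:
        payoffs = [PAYOFF[0][a][other_action] for a in range(2)]
    else:
        payoffs = [PAYOFF[1][other_action][b] for b in range(2)]
    return payoffs.index(max(payoffs))

def simultaneous(state, steps):
    """Both players respond to the other's action from the previous step."""
    history = [state]
    for _ in range(steps):
        s0, s1 = state
        state = (best_response(0, s1), best_response(1, s0))
        history.append(state)
    return history

def alternating(state, steps):
    """Players take turns; the mover responds to the other's current action."""
    history = [state]
    for t in range(steps):
        s0, s1 = state
        if t % 2 == 0:
            state = (best_response(0, s1), s1)   # player 0 moves
        else:
            state = (s0, best_response(1, s0))   # player 1 moves
        history.append(state)
    return history

if __name__ == "__main__":
    print(simultaneous((0, 1), 4))  # keeps swapping between (0, 1) and (1, 0)
    print(alternating((0, 1), 4))   # reaches (1, 1) after the first move and stays
```

Starting from the miscoordinated state `(0, 1)`, simultaneous updating keeps both players chasing each other, while alternating updating reaches the pure equilibrium `(1, 1)` after one move.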
