First, we set out the scope and assumptions of the question of learning in games.
Second, we go through several learning models:
- Pure Strategy Best Response Equilibrium and Best Response Dynamics
- Mixed Strategy Best Response Equilibrium (Nash Equilibrium) and Best Response Dynamics
- Pure Strategy Fictitious Play
- Replicator Dynamics, mimicking the best or the better
- Pure Strategy Smoothed Best Response Equilibrium and Best Response Dynamics
- Smoothed Fictitious Play
- Something natural but not in the book: Quantal Response Equilibrium (QRE) and Dynamical QRE, i.e., mixed strategy smoothed best response and its dynamical version
- In principle, one can also have mixed fictitious play with smoothed best response
\[S^{i} = BR^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = BR^{i}\left(S^{-i}\left(t\right)\right)\]
where \(S^{i}\) is the pure strategy of player \(i\) and \(S^{-i}\) is the pure strategy state of the players other than player \(i\).
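A minimal numerical sketch of this best response dynamics, assuming a two-player bimatrix game; the payoff matrices, the helper name `best_response`, and the use of numpy are illustrative assumptions, not from the text:

```python
import numpy as np

# Illustrative 2x2 coordination game; the payoffs are an assumption, not from the text.
A = np.array([[2, 0],
              [0, 1]])   # player 1's payoffs: rows = own strategies, columns = opponent's
B = np.array([[2, 0],
              [0, 1]])   # player 2's payoffs: rows = player 1's strategies, columns = own

def best_response(payoff, opponent_strategy):
    """Pure strategy best response BR^i(S^{-i}): argmax over own pure strategies."""
    return int(np.argmax(payoff[:, opponent_strategy]))

# Best response dynamics S^i(t+1) = BR^i(S^{-i}(t)) with simultaneous updates.
s1, s2 = 0, 1
for t in range(6):
    s1, s2 = best_response(A, s2), best_response(B.T, s1)
    print(t, s1, s2)
# From this mis-coordinated start the simultaneous update cycles between (0,1) and (1,0);
# an alternating update (or a coordinated start) settles at a pure best response equilibrium.
```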
\[\rho^{i} = BR^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i}\left(t\right)\right)\]
where \(\rho^{i}\) is the mixed strategy of player \(i\) and \(\rho^{-i}\) is the mixed strategy state of the players other than player \(i\). The static equation is the Nash equilibrium condition.
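For the mixed strategy version, a best response to a mixed state is typically a pure strategy, so the dynamics jump between degenerate mixed strategies. A minimal sketch, assuming matching pennies payoffs and deterministic tie breaking (both assumptions of this example):

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies, player 1 (illustrative payoffs)
B = -A                    # zero-sum: player 2's payoffs

def mixed_best_response(payoff, rho_opponent):
    """BR^i(rho^{-i}): a maximizer of expected payoff, returned as a degenerate
    mixed strategy; ties are broken toward the lowest index."""
    expected = payoff @ rho_opponent
    br = np.zeros(len(expected))
    br[np.argmax(expected)] = 1.0
    return br

rho1 = np.array([0.5, 0.5])
rho2 = np.array([0.3, 0.7])
for t in range(6):
    rho1, rho2 = mixed_best_response(A, rho2), mixed_best_response(B.T, rho1)
    print(t, rho1, rho2)
# The iteration cycles here; the mixed Nash equilibrium (1/2, 1/2) is a fixed point of
# the best response correspondence but is not found by this naive iteration.
```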
\[S^{i}\left(t+1\right) = BR^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
where \(\rho^{-i,E}\) is the empirical distribution of the strategies of the players other than player \(i\), computed from the whole history or from a window of the most recent actions.
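A sketch of pure strategy fictitious play under the same assumed matching pennies payoffs; the unit initial counts, which act as a tie-breaking prior, are also an assumption of this example:

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A

def best_reply_to_mixture(payoff, empirical):
    """BR^i(rho^{-i,E}): best pure reply to the opponent's empirical mixture."""
    return int(np.argmax(payoff @ empirical))

counts1 = np.ones(2)   # counts of player 1's past pure strategies (unit prior)
counts2 = np.ones(2)   # counts of player 2's past pure strategies
for t in range(2000):
    emp1, emp2 = counts1 / counts1.sum(), counts2 / counts2.sum()
    s1 = best_reply_to_mixture(A, emp2)     # S^1(t+1) = BR^1(rho^{-1,E}(t))
    s2 = best_reply_to_mixture(B.T, emp1)   # S^2(t+1) = BR^2(rho^{-2,E}(t))
    counts1[s1] += 1
    counts2[s2] += 1

# Empirical frequencies approach the mixed Nash equilibrium (1/2, 1/2) in this game.
print(counts1 / counts1.sum(), counts2 / counts2.sum())
```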
\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) = \delta_{E^j\left(t\right), Max\left(E^{1}\left(t\right), \cdots, E^{i}\left(t\right), \cdots, E^{N}\left(t\right)\right)}\]
or
\[Prob\left(S^{i}\left(t+1\right)=S^{j}\left(t\right)\right) \propto e^{\beta\left(E^{j}\left(t\right)-E^{i}\left(t\right)\right)}\]
where \(E^{j}\left(t\right)\) is the payoff received by player \(j\) at time \(t\).
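A sketch of these two imitation rules, assuming a population of `N` players in a symmetric 2-strategy game where each player's payoff \(E^{i}(t)\) is the expected payoff of its pure strategy against the population mixture (a mean-field simplification); the game matrix, `N`, and `beta` are all assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[3, 0],
              [5, 1]])     # illustrative symmetric 2-strategy game
N, beta = 50, 2.0
mimic_the_best = False     # True: delta rule (copy the best); False: logit rule (copy the better)

strategies = rng.integers(0, 2, size=N)   # S^i(t): pure strategy of each player

for t in range(30):
    pop = np.bincount(strategies, minlength=2) / N
    payoffs = M[strategies] @ pop          # E^i(t): payoff against the population mixture

    if mimic_the_best:
        # Everyone copies the strategy of the player with the maximal payoff.
        strategies = np.full(N, strategies[np.argmax(payoffs)])
    else:
        # Player i copies player j with probability proportional to exp(beta*(E^j - E^i)).
        new_strategies = np.empty_like(strategies)
        for i in range(N):
            weights = np.exp(beta * (payoffs - payoffs[i]))
            j = rng.choice(N, p=weights / weights.sum())
            new_strategies[i] = strategies[j]
        strategies = new_strategies

print(np.bincount(strategies, minlength=2) / N)   # final strategy frequencies
```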
\[S^{i} = \bar{BR}^{i}\left(S^{-i}\right)\]
and
\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(S^{-i}\left(t\right)\right)\]
where
\[\bar{BR}^{i}\left(\rho^{-i}\right)\propto e^{\beta E\left(s^{i},\rho^{-i}\right)}\]
is a probability distribution over player \(i\)'s pure strategies \(s^{i}\), written here for a general opponent state \(\rho^{-i}\) (the pure profile \(S^{-i}\) is the degenerate case), and \(S^{i}\) is one sample drawn from this distribution at each step.
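A sketch of the pure strategy smoothed best response dynamics, assuming the same illustrative coordination game as above and a logit parameter `beta`; the random sampling setup is also an assumption of this example:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2, 0],
              [0, 1]])   # illustrative coordination game
B = A.copy()
beta = 3.0

def smoothed_best_response(payoff, opponent_strategy, beta):
    """bar{BR}^i(S^{-i}): logit probability distribution over own pure strategies."""
    logits = beta * payoff[:, opponent_strategy]
    p = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return p / p.sum()

s1, s2 = 0, 1
for t in range(20):
    p1 = smoothed_best_response(A, s2, beta)
    p2 = smoothed_best_response(B.T, s1, beta)
    s1 = rng.choice(len(p1), p=p1)      # S^1(t+1) is one sample from bar{BR}^1
    s2 = rng.choice(len(p2), p=p2)      # S^2(t+1) is one sample from bar{BR}^2
    print(t, s1, s2)
```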
\[S^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i, E}\left(t\right)\right)\]
Again, \(S^{i}\left(t+1\right)\) is one sample drawn from this probability distribution at each step.
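Smoothed fictitious play then combines the empirical distribution with the logit sampling; a sketch under the same assumed matching pennies payoffs and unit prior counts:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A
beta = 5.0

def smoothed_br_to_mixture(payoff, empirical, beta):
    """bar{BR}^i(rho^{-i,E}): logit distribution over expected payoffs."""
    logits = beta * (payoff @ empirical)
    p = np.exp(logits - logits.max())
    return p / p.sum()

counts1, counts2 = np.ones(2), np.ones(2)   # unit prior counts (an assumption)
for t in range(3000):
    emp1, emp2 = counts1 / counts1.sum(), counts2 / counts2.sum()
    s1 = rng.choice(2, p=smoothed_br_to_mixture(A, emp2, beta))
    s2 = rng.choice(2, p=smoothed_br_to_mixture(B.T, emp1, beta))
    counts1[s1] += 1
    counts2[s2] += 1

print(counts1 / counts1.sum(), counts2 / counts2.sum())   # close to (1/2, 1/2) here
```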
\[\rho^{i} = \bar{BR}^{i}\left(\rho^{-i}\right)\]
and
\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i}\left(t\right)\right)\]
This simply replaces the static/dynamical mixed best response with the static/dynamical mixed smoothed best response. The static equation defines the QRE; the dynamical version is what we have worked on in this field: dynamical QRE and its stability.
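A sketch of the dynamical QRE map, i.e. iterating the mixed smoothed best response; the coordination game payoffs and `beta = 1.0` are assumptions chosen so that the simultaneous iteration converges (for larger `beta`, or for other games, the fixed point can lose stability, which is the stability question mentioned above):

```python
import numpy as np

A = np.array([[2, 0],
              [0, 1]])   # illustrative coordination game
B = A.copy()
beta = 1.0

def logit_response(payoff, rho_opponent, beta):
    """Mixed smoothed best response: bar{BR}^i(rho^{-i}) proportional to exp(beta * expected payoff)."""
    logits = beta * (payoff @ rho_opponent)
    p = np.exp(logits - logits.max())
    return p / p.sum()

rho1 = np.array([0.5, 0.5])
rho2 = np.array([0.4, 0.6])
for t in range(200):   # rho^i(t+1) = bar{BR}^i(rho^{-i}(t)), simultaneous update
    rho1, rho2 = logit_response(A, rho2, beta), logit_response(B.T, rho1, beta)

# The iteration converges to the (logit) quantal response equilibrium
# of this game at this value of beta.
print(rho1, rho2)
```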
\[\rho^{i}\left(t+1\right) = \bar{BR}^{i}\left(\rho^{-i,E}\left(t\right)\right)\]
where \(\rho^{-i,E}\left(t\right)\) is some empirical distribution of the strategies of the players other than player \(i\). For example, one approach is to take the average of all historical \(\rho^{j}\left(\tau\right)\) with \(\tau<t+1\),
\[\rho^{j,E}\left(t\right) = \frac{1}{t}\sum_{\tau<t+1}\rho^{j}\left(\tau\right).\]
We are not sure whether this has been discussed by others.
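A sketch of this mixed smoothed fictitious play, with \(\rho^{j,E}\left(t\right)\) kept as a running average exactly as in the formula above; the matching pennies payoffs, `beta`, and the horizon `T` are again assumptions of this example:

```python
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])   # matching pennies (illustrative payoffs)
B = -A
beta = 2.0
T = 500

def logit_response(payoff, rho_opponent, beta):
    """bar{BR}^i: logit distribution over expected payoffs."""
    logits = beta * (payoff @ rho_opponent)
    p = np.exp(logits - logits.max())
    return p / p.sum()

rho1 = np.array([0.9, 0.1])
rho2 = np.array([0.2, 0.8])
sum1, sum2 = rho1.copy(), rho2.copy()     # running sums of rho^j(tau), tau <= t
for t in range(1, T):
    emp1, emp2 = sum1 / t, sum2 / t       # rho^{j,E}(t) = (1/t) * sum_{tau<t+1} rho^j(tau)
    rho1 = logit_response(A, emp2, beta)  # rho^1(t+1) = bar{BR}^1(rho^{-1,E}(t))
    rho2 = logit_response(B.T, emp1, beta)
    sum1 += rho1
    sum2 += rho2

print(sum1 / T, sum2 / T)   # time-averaged mixed strategies rho^{j,E}(T)
```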
All of the above models can be updated either simultaneously or alternately (players taking turns).