Learned something new about Bayes' formula last night, the hard way

Bayes' formula,
[P(A|B)=\frac{P(B|A)P(A)}{P(B|A)P(A)+P(B|\bar{A})P(\bar{A})}]
is conceptually straightforward but remarkably useful in statistics. It turns the calculation of (P(A|B)) into finding (P(B|A)), simply by making use of the product rule,
[P(A\cap B) = P(A|B)P(B) = P(B|A)P(A)]
and the rule of total probability,
[P(B) = P(A\cap B) + P(\bar{A}\cap B). ]
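As a quick numerical illustration of the formula, here is a minimal Python sketch. The function name `bayes_posterior` and the input numbers are made up for this example; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def bayes_posterior(p_b_given_a, p_b_given_not_a, p_a):
    """Return P(A|B) from P(B|A), P(B|not A) and the prior P(A)."""
    p_not_a = 1 - p_a
    # Rule of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    # Product rule rearranged: P(A|B) = P(B|A)P(A) / P(B)
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: P(B|A) = 9/10, P(B|not A) = 1/10, P(A) = 1/100
posterior = bayes_posterior(Fraction(9, 10), Fraction(1, 10), Fraction(1, 100))
print(posterior)  # 1/12
```

Even though (P(B|A)) is large here, the small prior (P(A)) keeps (P(A|B)) small.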

This seems rather trivial to me. Here comes the surprising part. Let us now add another set (C) in the following way,
[P\left(A|B\right) = P\left(\left(A|C\right)|B\right)P\left(C|B\right) + P\left(\left(A|\bar{C}\right)|B\right)P\left(\bar{C}|B\right), \hspace{2cm} (1)]
or in this way,
[P\left(A|B\right) = P\left(\left(A|B\right)|C\right)P\left(C\right) + P\left(\left(A|B\right)|\bar{C}\right)P\left(\bar{C}\right). \hspace{2cm} (2)]

Now let us ask which of the two formulae above is correct: one of them, both, or neither?

It is easy to verify the first one. Assuming
[P\left(\left(A|C\right)|B\right) = P\left(A|\left(C,B\right)\right) = \frac{P\left(A\cap B \cap C\right)}{P\left(B \cap C\right)}, \hspace{2cm} (3)]
the right-hand side of Equ(1) becomes
[\frac{P\left(A\cap B \cap C\right)}{P\left(B \cap C\right)}\frac{P\left(B\cap C\right)}{P\left(B\right)} + \frac{P\left(A\cap B \cap \bar{C}\right)}{P\left(B \cap \bar{C}\right)}\frac{P\left(B\cap \bar{C}\right)}{P\left(B\right)} = \frac{P\left(A\cap B \cap C\right)}{P\left(B\right)} + \frac{P\left(A\cap B \cap \bar{C}\right)}{P\left(B\right)} = \frac{P\left(A\cap B\right)}{P\left(B\right)}, \hspace{1cm} (4)]
which is exactly the left-hand side of Equ(1).
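The same check can be done by brute force. The sketch below is only an illustration: it assumes a uniform measure on a four-element sample space (my choice, not part of the argument above) and tests Equ(1) for every triple of events for which all the conditional probabilities are defined. The helper names `prob`, `cond`, `subsets` and `eq1_holds` are made up for this example.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))  # small sample space, chosen only for illustration


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def cond(event, given):
    """P(event | given); the caller must make sure P(given) > 0."""
    return prob(event & given) / prob(given)


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def eq1_holds(a, b, c):
    """Check P(A|B) = P(A|B,C) P(C|B) + P(A|B,not C) P(not C|B)."""
    not_c = OMEGA - c
    rhs = cond(a, b & c) * cond(c, b) + cond(a, b & not_c) * cond(not_c, b)
    return cond(a, b) == rhs


checked = 0
all_ok = True
for a in subsets(OMEGA):
    for b in subsets(OMEGA):
        for c in subsets(OMEGA):
            # Skip triples where a conditional probability is undefined.
            if not (b & c) or not (b - c):
                continue
            checked += 1
            all_ok = all_ok and eq1_holds(a, b, c)

print(checked, all_ok)
```

A passing check on one small space is of course not a proof, but it agrees with the derivation in Equ(4).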

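Verifying Equ(2), however, is not easy. If the assumption in Equ(3) is right, then the right-hand side of Equ(2) becomes
[\frac{P\left(A\cap B \cap C\right)}{P\left(B \cap C\right)}\frac{P\left(C\right)}{P\left(\Omega\right)} + \frac{P\left(A\cap B \cap \bar{C}\right)}{P\left(B \cap \bar{C}\right)}\frac{P\left(\bar{C}\right)}{P\left(\Omega\right)}. \hspace{1cm} (5)]
I can see no reason why this expression should equal (\frac{P\left(A\cap B\right)}{P\left(B\right)}).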
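A concrete counterexample makes the mismatch visible. The sketch below assumes one roll of a fair die and events (A), (B), (C) that I picked by hand; it reads (P((A|B)|C)) as (P(A|(B,C))), as in Equ(3), and compares (P(A|B)) with the right-hand sides of Equ(1) and Equ(2).

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair die (illustrative choice)
A = frozenset({1})
B = frozenset({1, 2, 3})
C = frozenset({1, 4, 5, 6})
NOT_C = OMEGA - C


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def cond(event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(event & given) / prob(given)


lhs = cond(A, B)
# Equ(1): the weights are P(C|B) and P(not C|B)
rhs_eq1 = cond(A, B & C) * cond(C, B) + cond(A, B & NOT_C) * cond(NOT_C, B)
# Equ(2): the weights are P(C) and P(not C)
rhs_eq2 = cond(A, B & C) * prob(C) + cond(A, B & NOT_C) * prob(NOT_C)

print(lhs, rhs_eq1, rhs_eq2)  # 1/3 1/3 2/3
```

Here (B) and (C) are not independent, so (P(C|B) = 1/3) differs from (P(C) = 2/3), and that difference is exactly what separates the two right-hand sides.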

However, if (P\left(A|B\right)) were the probability of an ordinary event, then the second formula should be correct too. So what is the problem? It seems to me that when discussing (P\left(A|B\right)), we have implicitly restricted the whole sample space, originally (\Omega), to (B), so every expression derived from there must carry the condition (B) throughout. Put differently, (A|B) is not itself an event in (\Omega) to which the rule of total probability can be applied directly. So lesson one: keep (B) as a condition on every other event. Therefore Equ(1), not Equ(2), should be used in our case.

Another lesson learned is that conditional probability is a tricky concept and has to be handled with extra care.

Highly recommended: The Princeton Companion to Mathematics

Recently I read a little of The Princeton Companion to Mathematics, and it is a real gem. It is like a guide, or a big-picture introduction, to almost every subfield of mathematics.

The book introduces the research questions, main ideas, and learning materials of the major branches of mathematics, and it does so in an accessible way without sacrificing accuracy, rigor, or appeal.

High school students, undergraduates, graduate students, and professors will all get something out of it. I strongly recommend that all mathematicians, physicists, and students of math, physics, or other fields related to applied math read at least parts of this great book.

I think physicists should produce a similar book on physics too, and so should systems science. Or maybe every discipline should have a similar one.