Bayes’ theorem: a comment on a comment
March 10, 2014. Posted by larry in Bayes' theorem, Logic, Philosophy, Statistics.
Assume the standard axioms of set theory, say, the Zermelo-Fraenkel axioms.
Then provide a definition of conditional probability, for events A and B with P(B) > 0:
P(A|B) = P(A ∩ B) / P(B)
Because set intersection is commutative, A ∩ B = B ∩ A, so for P(A) > 0 you can equally have this:
P(B|A) = P(B ∩ A) / P(A) = P(A ∩ B) / P(A)
What we have here is a complex, contextual definition relating a term, P, from probability theory with a newly introduced stroke operator, |, read as “given”, so the locution becomes, for instance, the probability, P, of A given B. Effectively, the definition is a contextual definition of the stroke operator, |, “given”.
Although set intersection (equivalent in this context to conjunction) is commutative, conditional probability isn’t, which is due to the asymmetric character of the stroke operator, |. This means that, in general, P(A|B) ≠ P(B|A). If we consider the example of Data vs. Hypothesis, with A = Hypothesis and B = Data, we can see that, in general, P(Hypothesis|Data) ≠ P(Data|Hypothesis).
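To make the asymmetry concrete, here is a minimal sketch in Python. It assumes a single fair six-sided die as the sample space and the usual ratio definition P(A|B) = P(A ∩ B) / P(B). The names OMEGA, prob and cond, and the two events chosen, are mine and purely illustrative.

```python
from fractions import Fraction

# Assumed sample space: the six outcomes of one fair die (illustrative only).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond(a, b):
    """P(a|b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

A = frozenset({6})        # "the roll is a six"
B = frozenset({2, 4, 6})  # "the roll is even"

# Intersection is commutative, so the numerator is the same either way ...
same_numerator = prob(A & B) == prob(B & A)

# ... but the denominators differ, so the two conditionals differ.
p_a_given_b = cond(A, B)
p_b_given_a = cond(B, A)

print(same_numerator, p_a_given_b, p_b_given_a)  # True 1/3 1
```

The two conditionals share a numerator and differ only in what they are divided by, which is all the asymmetry of the stroke operator amounts to in this setting.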
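Now, from the definition of “conditional probability” and the standard axioms of set theory, which have already been implicitly used, we obtain Bayes’ theorem trivially, mathematically speaking, via a couple of simple substitutions. Applying the definition in both directions gives P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A); dividing by P(B) yields:
P(A|B) = P(B|A) P(A) / P(B)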
Bayes’ theorem is also known as the Bayes-Laplace theorem, since Laplace discovered the rule independently. However, according to Stigler’s law of eponymy, theorems in mathematics are invariably attributed to the wrong person (Stigler, “Who Discovered Bayes’ Theorem?”, in Stigler, Statistics on the Table, 1999).
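That the theorem is a consequence of the definition, rather than something additional, can be checked mechanically on a small example. The sketch below assumes a uniform measure on a six-element sample space (one fair die) and confirms the identity P(A|B) = P(B|A) P(A) / P(B) for every pair of events with non-zero probability. It is an illustration on one finite space, not a proof, and all the names in it are my own.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed sample space: one fair six-sided die (illustrative only).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability of a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))

def cond(a, b):
    """Ratio definition of conditional probability; requires P(b) > 0."""
    return prob(a & b) / prob(b)

def all_events(omega):
    """Every subset of omega, i.e. every event, as a frozenset."""
    items = sorted(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

# Keep only events that may appear to the right of the stroke.
events = [e for e in all_events(OMEGA) if prob(e) > 0]

checked = 0
for a in events:
    for b in events:
        # Bayes' theorem, obtained by substituting P(a ∩ b) = P(b|a) P(a).
        assert cond(a, b) == cond(b, a) * prob(a) / prob(b)
        checked += 1

print(len(events), checked)  # 63 3969
```

Exact rational arithmetic (Fraction) is used so that the equality test is not disturbed by floating-point rounding.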
Now, since we have seen that Bayes’ theorem follows from the axioms of set theory plus the definition of “conditional probability”, the following comments from a recent tutorial text on Bayes’ theorem can only be regarded as odd. The quote is from James V. Stone’s Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis (3rd printing, Jan. 2014).
If we had to establish the rules for calculating with probabilities, we would insist that the result of such calculations must tally with our everyday experience of the physical world, just as surely as we would insist that 1+1 = 2. Indeed, if we insist that probabilities must be combined with each other in accordance with certain common sense principles then Cox (1946) showed that this leads to a unique set of rules, a set which includes Bayes’ rule, which also appears as part of Kolmogorov’s (1933) (arguably, more rigorous) theory of probability (Stone: pp. 2-3).
Bayes’ theorem does not form part of Kolmogorov’s set of axioms. Strictly speaking, Bayes’ rule must be viewed as a logical consequence of the axioms of set theory, the Kolmogorov axioms of probability, and the definition of “conditional probability”.
Whether Kolmogorov’s axioms for probability tally with our experience of the real world is another question. The axioms are sometimes used as indications of non-rational thought processes in certain psychological experiments, such as the Linda experiment by Tversky and Kahneman. (For an alternative interpretation of this experiment that brings into question the assumption that people either do or should reason according to a simple application of the Kolmogorov axioms, cf. Luc Bovens & Stephan Hartmann, Bayesian Epistemology, 2003: 85-88).
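The consequence of the axioms usually invoked in discussions of the Linda experiment is the conjunction rule: P(A ∩ B) ≤ P(A) for any events A and B. The sketch below checks that rule on randomly generated probability measures over a four-outcome space. The outcome labels (bank teller or not, feminist or not) follow the usual description of the experiment; the weights are arbitrary numbers produced by a seeded generator and are not data from Tversky and Kahneman or anyone else.

```python
from fractions import Fraction
import random

# Four mutually exclusive outcomes: (is_bank_teller, is_feminist).
OUTCOMES = [(teller, feminist)
            for teller in (True, False)
            for feminist in (True, False)]

TELLER = {o for o in OUTCOMES if o[0]}
TELLER_AND_FEMINIST = {o for o in OUTCOMES if o[0] and o[1]}

def random_measure(rng):
    """An arbitrary probability measure on OUTCOMES (hypothetical weights)."""
    weights = [rng.randint(1, 100) for _ in OUTCOMES]
    total = sum(weights)
    return {o: Fraction(w, total) for o, w in zip(OUTCOMES, weights)}

def prob(measure, event):
    """Additivity: the probability of an event is the sum over its outcomes."""
    return sum((p for o, p in measure.items() if o in event), Fraction(0))

rng = random.Random(0)  # fixed seed so the run is repeatable
violations = 0
for _ in range(10_000):
    m = random_measure(rng)
    assert sum(m.values()) == 1  # the measure is normalised
    if prob(m, TELLER_AND_FEMINIST) > prob(m, TELLER):
        violations += 1

print(violations)  # 0
```

No measure can violate the rule, because the conjunction is a subset of each conjunct. Whether ranking the conjunction higher is therefore irrational is exactly the interpretive question raised above.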
A matter of interpretation
In the discussion above, the particular set theory and the Kolmogorov axioms mentioned and used were interpreted via the first-order extensional predicate calculus. This means that both theories can be viewed as not involving intensional contexts such as beliefs. The probability axioms in particular were understood by Kolmogorov and others using them as relating to objective frequencies and applicable to the real world, not to beliefs we might have about the world. For instance, an unbiased coin and an unbiased six-sided die, in the ideal case admittedly, are considered to have a probability of .5 for each side of the coin and 1/6 (about .167) for each side of the die, on each flip or throw of the object in question. In these two particular cases, only behavior observed over a long period of time can produce data that will show whether our assumption that the coin and the die are unbiased is in fact true.
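The frequency reading can be illustrated with a small simulation. The sketch below assumes that Python's pseudo-random generator is an acceptable stand-in for an ideal unbiased coin and die, so it shows what long-run behavior looks like under that assumption; it says nothing about any physical coin or die. The function name and the sample sizes are my own choices.

```python
import random

def relative_frequency(trial, is_success, n, rng):
    """Fraction of n independent trials on which is_success holds."""
    hits = sum(1 for _ in range(n) if is_success(trial(rng)))
    return hits / n

rng = random.Random(2014)  # fixed seed so the run is repeatable

for n in (100, 10_000, 100_000):
    # Idealised coin: "H" or "T" with equal chance.
    heads = relative_frequency(
        lambda r: r.choice("HT"), lambda x: x == "H", n, rng
    )
    # Idealised die: an integer from 1 to 6 with equal chance.
    sixes = relative_frequency(
        lambda r: r.randint(1, 6), lambda x: x == 6, n, rng
    )
    print(n, round(heads, 4), round(sixes, 4))
```

With few trials the observed frequencies can sit some distance from .5 and 1/6; they settle near those values only as the number of trials grows, which is why a short run cannot establish that an object is unbiased.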
Why does this matter? Simply because Bayes’ theorem has been interpreted in two distinct ways: as a descriptively objective statement about the character of the world, and as a subjective statement about a user’s beliefs about the state of the world. The derivation above rests on two theories that are considered to be non-subjective in character. One can then reasonably ask: where does the subjective interpretation of Bayes’ theorem come from? Two answers suggest themselves, though these are not the only ones. One is that Bayes’ theorem is arrived at via a different derivation from the one I considered, relying, say, on a notion of probability different from Kolmogorov’s. The other is that Bayesian subjectivity is introduced by means of the stroke (or ‘given’) operator, |.
Personally, I see nothing subjective about the statement that the probability of obtaining an H or a T on the flip of a coin is .5, or that the probability of obtaining one particular side of a six-sided die is 1/6 (about .167). These probabilities are about the objects themselves, and not about our beliefs concerning them. Of course, this leaves open the possibility of alternative interpretations of probabilities in other contexts, say the probability of guilt or non-guilt in a jury trial. Whether the notions of probability involving coins or dice are the same as those involving situations such as jury trials is a matter for further debate.