Probability is basic to the Bayesian way of thinking. We use probability directly in our computations, and our conclusions are expressed as probabilities.

### What is probability?

The modern theory of probability is based on work by Andrey Kolmogorov, published in German in 1933, with the English translation appearing in 1950 (Kolmogorov, A.N. (1950) *Foundations of the theory of probability*; 2nd English edition 1956, Chelsea Publishing Company, New York).

**Basic definition**: Probability is a number attached to an **event** which has three properties:

a. It is between 0 and 1.

b. Addition: If events are **mutually exclusive**, we can add the probabilities.

c. Multiplication: If events are **independent**, we can multiply the probabilities.

**Event**: An event may be something which occurs, such as rain tomorrow, or a die coming up 6, or Manchester United winning the Premier League next year. We can also relate it to a statement which can be true or false, for example “It will rain tomorrow” or “There are 231 tigers in Malaysia”.

**Mutually exclusive**: A die cannot come up with 6 and 1 at the same time, so we can add; the probability of each with a fair 6-sided die is 1/6, so the probability of 1 or 6 = 1/6 + 1/6 = 1/3. But for drawing cards from a deck, aces and hearts are not mutually exclusive – it’s possible to draw the ace of hearts; the probability of drawing an ace is 4/52 and the probability of a heart is 13/52, but we can’t just add them up and say the probability of “ace” or “heart” is 17/52. That would count the ace of hearts twice; subtracting its probability, 1/52, gives the correct answer, 16/52.
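The ace-or-heart case can be checked by listing all 52 cards and counting. This is a minimal sketch; the names `deck` and `prob` are illustrative helpers, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def prob(event):
    """Probability that one card drawn at random satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_ace = prob(lambda card: card[0] == "A")
p_heart = prob(lambda card: card[1] == "hearts")
p_ace_and_heart = prob(lambda card: card[0] == "A" and card[1] == "hearts")
p_ace_or_heart = prob(lambda card: card[0] == "A" or card[1] == "hearts")

# Simple addition overcounts, because the ace of hearts is in both events.
print("naive sum:", p_ace + p_heart)
print("by counting:", p_ace_or_heart)
print("sum minus overlap:", p_ace + p_heart - p_ace_and_heart)
```

Counting directly and subtracting the overlap give the same value, which is smaller than the naive sum.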

**Independent**: If I roll two dice, what’s the probability they will both show 6? The score on one die does not affect the score on the other, so we can multiply: 1/6 × 1/6 = 1/36. If I draw two cards from the same deck, what’s the probability they will both be aces? If the first card is an ace, there are now only three aces left in the deck, so the second draw is affected by the first draw. The draws are not independent, and we can’t just multiply 1/13 × 1/13. Instead, we multiply the probability of an ace on the first draw by the probability of an ace on the second draw given that the first was an ace: 4/52 × 3/51 = 1/221.
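Both cases can be verified by enumerating every possible outcome. The sketch below assumes the two cards are drawn without replacement from a standard 52-card deck; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Two dice: all 36 ordered outcomes are equally likely.
dice_rolls = list(product(range(1, 7), repeat=2))
p_two_sixes = Fraction(
    sum(1 for a, b in dice_rolls if a == 6 and b == 6), len(dice_rolls)
)

# Two cards without replacement: all ordered pairs of distinct cards.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
draws = list(permutations(deck, 2))
p_two_aces = Fraction(
    sum(1 for first, second in draws if first[0] == "A" and second[0] == "A"),
    len(draws),
)

# What we would get if we wrongly treated the draws as independent.
naive_two_aces = Fraction(1, 13) ** 2

print("two sixes:", p_two_sixes)
print("two aces, by counting:", p_two_aces)
print("two aces, naive product:", naive_two_aces)
```

For the dice, counting agrees with the simple product; for the cards, it agrees with 4/52 × 3/51 rather than 1/13 × 1/13.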

### Probability with two variables

When we are working with two or more variables, we need additional terminology: marginal, conditional and joint (or conjoint) probabilities. We’ll use an example to explain these:

To test the effect of vitamin C on the probability of getting a common cold during winter, 818 volunteers were recruited and randomly assigned to receive vitamin C or a placebo. After the winter, each was interviewed to determine if they had had a cold (data from Anderson, Reid and Beaton (1972) Vitamin C and the common cold, *Canadian Medical Association Journal* 107:503-508). The results are shown below:

| | cold | no cold | totals |
|---|---|---|---|
| vitamin C | 302 | 105 | 407 |
| no vitamin C | 335 | 76 | 411 |
| totals | 637 | 181 | 818 |

What’s the probability that if we take one volunteer at random they had a cold **and** took vitamin C? This is the **joint probability**. We can calculate that as $${\rm I\!P}(cold \cap vitamin) = \frac {302}{818} = 0.37$$ Here we are using the symbol $\cap$ for intersection, i.e., the people who belong to both the vitamin group and the cold group.

Each of the 4 cells in the table divided by the grand total produces a joint probability. They add up to 1.
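As a quick check, here is a minimal sketch that computes all four joint probabilities from the counts in the table and confirms that they add up to 1. The dictionary layout and variable names are just one convenient choice.

```python
from fractions import Fraction

# Counts from the table: (treatment, outcome) -> number of volunteers.
counts = {
    ("vitamin C", "cold"): 302,
    ("vitamin C", "no cold"): 105,
    ("no vitamin C", "cold"): 335,
    ("no vitamin C", "no cold"): 76,
}
total = sum(counts.values())  # grand total of volunteers

# Joint probability of each cell = cell count / grand total.
joint = {cell: Fraction(n, total) for cell, n in counts.items()}

for (treatment, outcome), p in joint.items():
    print(f"P({outcome} and {treatment}) = {float(p):.3f}")

print("sum of joint probabilities:", sum(joint.values()))
```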

What’s the probability that a person who took vitamin C had a cold? Now we are only interested in the 407 volunteers who took the vitamin, and we calculate the **conditional probability** as $${\rm I\!P}(cold | vitamin) = \frac {302}{407} = 0.74$$ The vertical bar, |, means “given that”.

That used the marginal total for the row; we can do the same thing using the column totals. What’s the probability that a person who had a cold took vitamin C? (This question *looks* similar to the question in the previous paragraph, but is in fact quite different, and the answer is different.) Now we are interested in the 637 volunteers who had colds: $${\rm I\!P}(vitamin| cold) = \frac {302}{637} = 0.474$$

Each of the 4 cells in the table can be divided by the row sums or the column sums, giving 8 conditional probabilities. Conditional probabilities only make sense if the condition is clear.
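The eight conditional probabilities can be tabulated as follows. This is a sketch using the counts from the table; the dictionary keys are written as (event, condition) so that the condition is always explicit.

```python
from fractions import Fraction

# Counts from the table: (treatment, outcome) -> number of volunteers.
counts = {
    ("vitamin C", "cold"): 302,
    ("vitamin C", "no cold"): 105,
    ("no vitamin C", "cold"): 335,
    ("no vitamin C", "no cold"): 76,
}
treatments = ["vitamin C", "no vitamin C"]
outcomes = ["cold", "no cold"]

row_totals = {t: sum(counts[(t, o)] for o in outcomes) for t in treatments}
col_totals = {o: sum(counts[(t, o)] for t in treatments) for o in outcomes}

# P(outcome | treatment): divide each cell by its row total.
p_outcome_given_treatment = {
    (o, t): Fraction(counts[(t, o)], row_totals[t])
    for t in treatments
    for o in outcomes
}

# P(treatment | outcome): divide each cell by its column total.
p_treatment_given_outcome = {
    (t, o): Fraction(counts[(t, o)], col_totals[o])
    for t in treatments
    for o in outcomes
}

for (event, condition), p in p_outcome_given_treatment.items():
    print(f"P({event} | {condition}) = {float(p):.3f}")
for (event, condition), p in p_treatment_given_outcome.items():
    print(f"P({event} | {condition}) = {float(p):.3f}")
```

Within each condition the probabilities add up to 1: for example, P(cold | vitamin C) + P(no cold | vitamin C) = 1.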

What’s the probability that a volunteer got a cold? We don’t care if they took the vitamin or not, just whether they got a cold. Here we need the number in the bottom margin, 637, and calculate the **marginal probability** with $${\rm I\!P}(cold) = \frac {637}{818} = 0.78$$
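A marginal probability can also be obtained by adding up the joint probabilities along a row or column, which is where the margin totals come from. A small sketch with the counts from the table (variable names are illustrative):

```python
from fractions import Fraction

# Counts from the table: (treatment, outcome) -> number of volunteers.
counts = {
    ("vitamin C", "cold"): 302,
    ("vitamin C", "no cold"): 105,
    ("no vitamin C", "cold"): 335,
    ("no vitamin C", "no cold"): 76,
}
total = sum(counts.values())

# P(cold): sum the joint probabilities over both treatments.
p_cold = sum(
    Fraction(n, total) for (treatment, outcome), n in counts.items()
    if outcome == "cold"
)

# P(vitamin C): sum the joint probabilities over both outcomes.
p_vitamin = sum(
    Fraction(n, total) for (treatment, outcome), n in counts.items()
    if treatment == "vitamin C"
)

print(f"P(cold) = {float(p_cold):.2f}")
print(f"P(vitamin C) = {float(p_vitamin):.2f}")
```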

### Deriving Bayes’ Rule

Notice that we can calculate the joint probability from the marginal and conditional probabilities. We look at the people who got colds and the proportion of those who took vitamin C: $${\rm I\!P}(vitamin\cap cold) = {\rm I\!P}(vitamin | cold) {\rm I\!P}(cold)$$ $$ = \frac {302}{637} \frac {637}{818} = \frac {302}{818} = 0.37$$

We can do it the other way around too: $${\rm I\!P}(vitamin\cap cold) = {\rm I\!P}(cold \cap vitamin) = {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)$$ $$ = \frac {302}{407} \frac {407}{818} = \frac {302}{818} = 0.37$$

Hence $$ {\rm I\!P}(vitamin | cold) {\rm I\!P}(cold) = {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)$$ And with a little bit of algebra we have $$ {\rm I\!P}(vitamin | cold) = \frac{ {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)}{ {\rm I\!P}(cold) }$$
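We can confirm this numerically with the values from the vitamin C table: plugging P(cold | vitamin), P(vitamin) and P(cold) into the right-hand side should reproduce the value of P(vitamin | cold) calculated directly from the column total. A minimal sketch using exact fractions:

```python
from fractions import Fraction

# Values read directly from the table.
p_cold_given_vitamin = Fraction(302, 407)  # conditional, from the row total
p_vitamin = Fraction(407, 818)             # marginal for the vitamin C row
p_cold = Fraction(637, 818)                # marginal for the cold column

# Right-hand side of the rearranged equation.
p_vitamin_given_cold = p_cold_given_vitamin * p_vitamin / p_cold

print("via the formula:", p_vitamin_given_cold)
print("directly from the table:", Fraction(302, 637))
print(f"as a decimal: {float(p_vitamin_given_cold):.3f}")
```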

Let’s replace vitamin and cold with the usual things we are interested in: parameter values $\theta$ and data: $$ {\rm I\!P}(\theta| data) = \frac{ {\rm I\!P}(data| \theta) {\rm I\!P}(\theta)}{ {\rm I\!P}(data) }$$ This is the usual form of Bayes’ Rule, and it can be derived directly from the axioms of probability theory.

- $ {\rm I\!P}(\theta| data) $ is the **posterior** probability, based on the likelihood and the prior.
- ${\rm I\!P}(data| \theta)$ is the **likelihood**, the probability of observing the data for given values of $\theta$.
- $ {\rm I\!P}(\theta)$ is the **prior** probability for $\theta$ before considering the data.
- ${\rm I\!P}(data)$ is the marginal likelihood, the probability of observing the data ignoring the values of $\theta$.

The marginal likelihood is usually impossible to calculate for models with multiple parameters, but for a given data set it does not depend on $\theta$. We can then use the simpler form: $$ {\rm I\!P}(\theta| data) \propto {\rm I\!P}(data| \theta) {\rm I\!P}(\theta) $$ We can still get proper posterior distributions, since we know that probabilities must add to one.
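To illustrate how normalising recovers a proper posterior, here is a rough sketch, not a recommended analysis. It makes several assumptions that are not in the text above: $\theta$ is taken to be the probability of catching a cold for someone given vitamin C, the data are the 302 colds among the 407 vitamin C volunteers, the prior is flat, and $\theta$ is restricted to a grid of 999 values. The binomial coefficient is left out of the likelihood because it does not depend on $\theta$.

```python
import math

# Assumed data: the vitamin C row of the table.
colds, no_colds = 302, 105

# Assumed grid of candidate values for theta (0 and 1 excluded to avoid log(0)).
grid = [i / 1000 for i in range(1, 1000)]


def log_likelihood(theta):
    """Binomial log-likelihood, up to a constant that does not involve theta."""
    return colds * math.log(theta) + no_colds * math.log(1 - theta)


# Assumed flat prior: the same (log) weight for every grid value.
log_prior = [0.0 for _ in grid]

# Unnormalised log posterior = log likelihood + log prior.
log_unnorm = [log_likelihood(t) + lp for t, lp in zip(grid, log_prior)]

# Subtract the maximum before exponentiating to avoid numerical underflow.
shift = max(log_unnorm)
unnorm = [math.exp(v - shift) for v in log_unnorm]

# Normalise so that the posterior probabilities add to one.
norm = sum(unnorm)
posterior = [u / norm for u in unnorm]

mode = grid[posterior.index(max(posterior))]
print(f"posterior mode is near theta = {mode:.3f}")
print(f"posterior sums to {sum(posterior):.6f}")
```

Dividing by `norm` plays the role of the marginal likelihood here: we never calculate it from the model, only from the requirement that the probabilities add to one.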

A mistake occurred when you calculated the conditional probability. It is not 302/407; instead it is 302/637. You reported the correct formula when you derived Bayes’ rule, but I noticed that something went wrong in the previous calculation.

No, sorry. After looking more closely, the calculation of each probability one by one seems to be OK. However, in the derivation of Bayes’ rule, there is still something that I don’t get, as the fraction values of the conditional probability do not match the calculation above. Please, can anyone explain? Many thanks.

Thanks for pointing out that it’s not clear! For the example of conditional probability we used P(cold | vitamin) and in the Bayes’ Rule section P(vitamin | cold) – they are not the same. I’ve now added it to the text. I’ve also added the arithmetic for the calculation of P(cold | vitamin)P(vitamin). Look for the NEW and UPDATED icons in the text.

I hope that makes it clearer.

Thank you very much for the explanation. Now it’s clear.