Probability

Probability is basic to the Bayesian way of thinking. We use probability directly in our computations, and our conclusions are expressed as probabilities.

What is probability?

The modern theory of probability is based on work by Andrey Kolmogorov, published in German in 1933 with the English translation appearing in 1950 (Kolmogorov, A.N. (1950) Foundations of the Theory of Probability; 2nd English edition 1956. Chelsea Publishing Company, New York).

Basic definition: Probability is a number attached to an event which has three properties:
a. It is between 0 and 1.
b. Addition: If events are mutually exclusive, we can add the probabilities.
c. Multiplication: If events are independent, we can multiply the probabilities.

Event: An event may be something which occurs, such as rain tomorrow, or a die coming up 6, or Manchester United winning the Premier League next year. We can also relate it to a statement which can be true or false, for example “It will rain tomorrow” or “There are 231 tigers in Malaysia”.

Mutually exclusive: A die cannot come up with 6 and 1 at the same time, so we can add; the probability of each with a fair 6-sided die is 1/6, so the probability of 1 or 6 = 1/6 + 1/6 = 1/3. But for drawing cards from a deck, aces and hearts are not mutually exclusive: it’s possible to draw the ace of hearts. The probability of drawing an ace is 4/52 and the probability of a heart is 13/52, but we can’t just add them up and say the probability of “ace” or “heart” is 17/52, because that would count the ace of hearts twice.
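The card example can be checked by counting. The sketch below enumerates a standard 52-card deck and counts the cards that are an ace, a heart, or both. The helper names (`prob`, `is_ace`, `is_heart`) are made up for this illustration, and the final comparison uses the standard result that for overlapping events we subtract the probability of the overlap.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event: favourable cards divided by all cards."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


p_ace = prob(is_ace)                                   # 4/52
p_heart = prob(is_heart)                               # 13/52
p_both = prob(lambda c: is_ace(c) and is_heart(c))     # 1/52, the ace of hearts
p_either = prob(lambda c: is_ace(c) or is_heart(c))    # 16/52

print("ace or heart:", p_either)
print("naive sum:   ", p_ace + p_heart)
print("corrected:   ", p_ace + p_heart - p_both)
```

The naive sum gives 17/52, while direct counting gives 16/52: the ace of hearts was counted once as an ace and once as a heart.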

Independent: If I roll two dice, what’s the probability they will both show 6? The score on one die does not affect the score on the other, so we can multiply: 1/6 x 1/6 = 1/36. If I draw two cards from the same deck, what’s the probability they will both be aces? If the first card is an ace, there are now only three aces left in the deck, so the second draw is affected by the first draw. The draws are not independent, and we can’t just multiply 1/13 x 1/13.
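Both cases can be checked by listing every possible outcome. This is a minimal sketch: the deck is reduced to "ace" and "other" labels because only that distinction matters here, and the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Two dice: every ordered pair of scores is equally likely.
dice_outcomes = list(product(range(1, 7), repeat=2))  # 36 outcomes
double_six = sum(1 for a, b in dice_outcomes if a == 6 and b == 6)
p_double_six = Fraction(double_six, len(dice_outcomes))

# Two cards drawn without replacement: ordered pairs of distinct cards.
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(len(deck)), 2))  # 52 * 51 ordered pairs
two_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
p_two_aces = Fraction(two_aces, len(draws))

print("two sixes:", p_double_six)                     # equals 1/6 * 1/6
print("two aces: ", p_two_aces)
print("naive 1/13 * 1/13:", Fraction(1, 13) ** 2)     # not the same
print("4/52 * 3/51:      ", Fraction(4, 52) * Fraction(3, 51))
```

For the cards, the count matches 4/52 × 3/51: the probability of an ace on the first draw times the probability of an ace on the second draw given that the first was an ace.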

Probability with two variables

When we are working with two or more variables, we need additional terminology: marginal, conditional and joint (or conjoint) probabilities. We’ll use an example to explain these:

To test the effect of vitamin C on the probability of getting a common cold during winter, 818 volunteers were recruited and randomly assigned to receive vitamin C or a placebo. After the winter, each was interviewed to determine if they had had a cold (data from Anderson, Reid and Beaton (1972) Vitamin C and the common cold, Canadian Medical Association Journal 107:503-508). The results are shown below:

               cold   no cold   totals
vitamin C       302       105      407
no vitamin C    335        76      411
totals          637       181      818

What’s the probability that if we take one volunteer at random they had a cold and took vitamin C? This is the joint probability. We can calculate that as $${\rm I\!P}(cold \cap vitamin) = \frac {302}{818} = 0.37$$ Here we are using the symbol $\cap$ for intersection, i.e., the people who belong to both the vitamin group and the cold group.

Each of the 4 cells in the table divided by the grand total produces a joint probability. They add up to 1.
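As a quick check, the four joint probabilities can be computed from the cell counts. This is a small sketch; the dictionary layout and names are just one convenient way to hold the table.

```python
# Cell counts from the vitamin C trial, keyed by (treatment, outcome).
counts = {
    ("vitamin C", "cold"): 302,
    ("vitamin C", "no cold"): 105,
    ("no vitamin C", "cold"): 335,
    ("no vitamin C", "no cold"): 76,
}

total = sum(counts.values())  # grand total of volunteers

# Joint probability of each cell: count divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}

for (treatment, outcome), p in joint.items():
    print(f"P({outcome} and {treatment}) = {p:.3f}")

print("sum of joint probabilities:", sum(joint.values()))
```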

What’s the probability that a person who took vitamin C had a cold? Now we are only interested in the 407 volunteers who took the vitamin, and we calculate the conditional probability as $${\rm I\!P}(cold | vitamin) = \frac {302}{407} = 0.74$$ The vertical bar, |, means “given that”.

That used the marginal total for the row; we can do the same thing using the column totals. What’s the probability that a person who had a cold took vitamin C? (This question looks similar to the question in the previous paragraph, but is in fact quite different, and the answer is different.) Now we are interested in the 637 volunteers who had colds: $${\rm I\!P}(vitamin| cold) = \frac {302}{637} = 0.474$$

Each of the 4 cells in the table can be divided by the row sums or the column sums, giving 8 conditional probabilities. Conditional probabilities only make sense if the condition is clear.

What’s the probability that a volunteer got a cold? We don’t care if they took the vitamin or not, just whether they got a cold. Here we need the number in the bottom margin, 637, and calculate the marginal probability with $${\rm I\!P}(cold) = \frac {637}{818} = 0.78$$
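The marginal and conditional probabilities can all be computed from the same four counts. In this sketch the row and column totals are built first, then each cell is divided by its row total or its column total to give the 8 conditional probabilities. The variable names are chosen for this illustration.

```python
counts = {
    ("vitamin C", "cold"): 302,
    ("vitamin C", "no cold"): 105,
    ("no vitamin C", "cold"): 335,
    ("no vitamin C", "no cold"): 76,
}
total = sum(counts.values())

# Marginal totals: sum over the variable we do not care about.
row_totals = {}
col_totals = {}
for (treatment, outcome), n in counts.items():
    row_totals[treatment] = row_totals.get(treatment, 0) + n
    col_totals[outcome] = col_totals.get(outcome, 0) + n

# Marginal probabilities.
p_treatment = {t: n / total for t, n in row_totals.items()}
p_outcome = {o: n / total for o, n in col_totals.items()}

# Conditional probabilities: keys read as (event, condition).
p_outcome_given_treatment = {
    (o, t): n / row_totals[t] for (t, o), n in counts.items()
}
p_treatment_given_outcome = {
    (t, o): n / col_totals[o] for (t, o), n in counts.items()
}

print("P(cold) =", round(p_outcome["cold"], 2))
print("P(cold | vitamin C) =",
      round(p_outcome_given_treatment[("cold", "vitamin C")], 2))
print("P(vitamin C | cold) =",
      round(p_treatment_given_outcome[("vitamin C", "cold")], 3))
```

Within each condition the conditional probabilities add to 1; for example, P(cold | vitamin C) + P(no cold | vitamin C) = 1.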

Deriving Bayes’ Rule

Notice that we can calculate the joint probability from the marginal and conditional probabilities. We look at the people who got colds and the proportion of those who took vitamin C: $${\rm I\!P}(vitamin\cap cold) = {\rm I\!P}(vitamin | cold) {\rm I\!P}(cold)$$ $$ = \frac {302}{637} \frac {637}{818} = \frac {302}{818} = 0.37$$

We can do it the other way around too: $${\rm I\!P}(vitamin\cap cold) = {\rm I\!P}(cold \cap vitamin) = {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)$$ $$ = \frac {302}{407} \frac {407}{818} = \frac {302}{818} = 0.37$$

Hence $$ {\rm I\!P}(vitamin | cold) {\rm I\!P}(cold) = {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)$$ And with a little bit of algebra we have $$ {\rm I\!P}(vitamin | cold) = \frac{ {\rm I\!P}(cold| vitamin) {\rm I\!P}(vitamin)}{ {\rm I\!P}(cold) }$$
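We can confirm this with the numbers from the table. The sketch below uses exact fractions so that no rounding gets in the way; the variable names are just for this illustration.

```python
from fractions import Fraction

p_cold_given_vitamin = Fraction(302, 407)  # conditional, from the vitamin C row
p_vitamin = Fraction(407, 818)             # marginal, row total / grand total
p_cold = Fraction(637, 818)                # marginal, column total / grand total

# Bayes' Rule: P(vitamin | cold) = P(cold | vitamin) P(vitamin) / P(cold)
p_vitamin_given_cold = p_cold_given_vitamin * p_vitamin / p_cold

print(p_vitamin_given_cold)          # same as 302/637 read directly from the table
print(float(p_vitamin_given_cold))
```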

Let’s replace vitamin and cold with the usual things we are interested in: parameter values $\theta$ and data: $$ {\rm I\!P}(\theta| data) = \frac{ {\rm I\!P}(data| \theta) {\rm I\!P}(\theta)}{ {\rm I\!P}(data) }$$ This is the usual form of Bayes’ Rule, and it can be derived directly from the axioms of probability theory.

  • $ {\rm I\!P}(\theta| data) $ is the posterior probability, based on the likelihood and the prior.
  • ${\rm I\!P}(data| \theta)$ is the likelihood, the probability of observing the data for given values of $\theta$.
  • $ {\rm I\!P}(\theta)$ is the prior probability for $\theta$ before considering the data.
  • ${\rm I\!P}(data)$ is the marginal likelihood, the probability of observing the data ignoring the values of $\theta$.

The marginal likelihood is usually impractical to calculate for models with multiple parameters, but for a given data set it does not depend on $\theta$. We can then use the simpler form: $$ {\rm I\!P}(\theta| data) \propto {\rm I\!P}(data| \theta) {\rm I\!P}(\theta) $$ We can still get proper posterior distributions, since we know that probabilities must add (or, for continuous parameters, integrate) to one.
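To see how the proportional form still gives a proper posterior, here is a minimal sketch using a grid of candidate values for $\theta$. Everything about the model is an assumption made for illustration, not part of the analysis above: $\theta$ is taken to be the probability of catching a cold in the vitamin C group, the data are the 302 colds among 407 volunteers, the likelihood is binomial, and the prior is flat.

```python
import math

# Hypothetical example: theta = probability of a cold in the vitamin C group.
colds, no_colds = 302, 105

# Grid of candidate theta values, excluding 0 and 1 to avoid log(0).
n_grid = 999
thetas = [(i + 1) / (n_grid + 1) for i in range(n_grid)]  # 0.001, ..., 0.999

# Flat prior (an assumption): every grid value is equally plausible beforehand.
prior = [1.0 for _ in thetas]

# Binomial log-likelihood up to a constant; the binomial coefficient is
# dropped because it does not depend on theta.
log_like = [colds * math.log(t) + no_colds * math.log(1 - t) for t in thetas]

# Subtract the maximum before exponentiating to keep the numbers well scaled.
max_ll = max(log_like)
unnormalised = [math.exp(ll - max_ll) * p for ll, p in zip(log_like, prior)]

# Normalise: divide by the sum so the posterior probabilities add to one.
norm = sum(unnormalised)
posterior = [u / norm for u in unnormalised]

best = thetas[posterior.index(max(posterior))]
print("sum of posterior:", sum(posterior))
print("most probable grid value of theta:", best)
```

We never computed ${\rm I\!P}(data)$: dividing by the sum over the grid does the same job. With a flat prior, the most probable value lands next to the observed proportion 302/407.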

8 thoughts on “Probability”

  1. A mistake occurred when you calculated the conditional probability. It should be 302/637, not 302/407. You reported the correct formula when you derived Bayes’ rule, but I noticed that something went wrong in the previous calculation.

    1. No, sorry. After looking more carefully, as far as the calculation of each probability one by one is concerned, it seems that everything is ok. However, considering the derivation of Bayes’ rule, there is still something that I don’t get, as the fraction values of the conditional probability do not match the calculation above. Please, can anyone explain? Many thanks.

      1. Thanks for pointing out that it’s not clear! For the example of conditional probability we used P(cold | vitamin) and in the Bayes’ Rule section P(vitamin | cold) – they are not the same. I’ve now added it to the text. I’ve also added the arithmetic for the calculation of P(cold | vitamin)P(vitamin). Look for the NEW and UPDATED icons in the text.
        I hope that makes it clearer.

  2. Wonderful explanation! It’s much clearer than many other presentations I have come across. What might be helpful is explaining the alpha symbol found in the last formula (in the Simplified form of Bayes Rule). I believe it means “is proportional to”, but I’m not certain why it isn’t represented as an equals sign ‘=’.

    1. Michael,
      Yes, that’s the “proportional to” symbol. You can check your guess by Googling “proportional to”. There’s a list of math symbols here.

      “Proportional to” is not the same as “equal to”. If someone works for a fixed hourly rate of pay, their pay is proportional to the hours worked, pay $\propto$ hours. That doesn’t mean pay = hours. We’d need to know their hourly rate (dollars per hour) to complete the calculation: pay = hours * rate.

      1. Ah yes, Thanks! I think I get it. When you simplified the equation, you removed the marginal likelihood from the equation. Therefore, you need to state that the two sides “are proportional to”, as you would need to know the marginal likelihood to complete the equality calculation.
        I really appreciate the link to the list of math symbols, thanks! I’m relatively far into my stats training (as far as school goes) but am pretty weak in my mathematics background.
