Bayes’ theorem is the mathematical formula for updating a belief when you see new evidence. It is the single most important result in probability theory for real-world reasoning:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

In words: the probability of hypothesis $A$ given evidence $B$ equals the probability of evidence $B$ given $A$, times the prior probability of $A$, divided by the total probability of evidence $B$.

The intuition

Suppose a disease affects 1 in 1000 people. A test correctly flags 99% of the people who have the disease but also raises a false alarm for 5% of the people who do not. You test positive. What is the probability you actually have the disease?

Most people’s intuition says “about 99%.” The correct answer, from Bayes, is about 2%. The reason: the disease is so rare that even a highly accurate test generates more false positives than true positives in the general population.
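The 2% figure can be checked by hand. As a sketch, take $A$ to mean "has the disease" and $B$ to mean "tests positive", and expand the denominator with the law of total probability. The symbol $\neg A$ for "does not have the disease" is notation introduced here for the calculation:

$$P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A) = 0.99 \cdot 0.001 + 0.05 \cdot 0.999 = 0.05094$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.99 \cdot 0.001}{0.05094} \approx 0.0194$$

In a population of 100,000 people, these numbers correspond to about 99 true positives against 4,995 false positives.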

Bayes’ theorem forces you to combine three quantities honestly:

  1. Prior: how common is the condition before any test ($P(A)$)
  2. Likelihood: how likely is the evidence if the condition holds ($P(B \mid A)$)
  3. Normalization: the total probability of the evidence ($P(B)$)

Ignoring any of these, especially the prior, is a common source of statistical error in medicine, law, and journalism. Neglecting the prior in particular is known as the base rate fallacy.
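The three quantities above can be combined in a few lines. The following is a minimal sketch for the two-outcome case (condition present or absent); the function name `posterior` and its parameter names are assumptions made for illustration, and the normalization is computed with the law of total probability.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(A | B): probability the condition holds given a positive result.

    prior               -- P(A), how common the condition is
    sensitivity         -- P(B | A), chance of a positive result if A holds
    false_positive_rate -- P(B | not A), chance of a positive result if A does not hold
    """
    for name, value in (("prior", prior),
                        ("sensitivity", sensitivity),
                        ("false_positive_rate", false_positive_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # Normalization: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("a positive result has zero probability under these inputs")

    return sensitivity * prior / evidence


# Same test, different priors: the prior drives the answer.
for p in (0.001, 0.01, 0.1, 0.5):
    print(f"prior={p:<6} posterior={posterior(p, 0.99, 0.05):.3f}")
```

With the test from the example held fixed, the posterior moves from about 2% at a prior of 0.001 to about 95% at a prior of 0.5, which is why dropping the prior gives such misleading answers.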

The derivation

The theorem falls directly out of the definition of conditional probability. By definition:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Multiplying the second equation by $P(A)$ gives $P(A \cap B) = P(B \mid A) \cdot P(A)$. Substituting into the first equation yields Bayes’ theorem.
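The derivation can also be checked numerically. The sketch below assumes a small, made-up joint distribution over two binary events (the four probabilities are illustrative, not taken from any dataset) and confirms that the definition of conditional probability and Bayes' formula give the same value.

```python
import math

# Hypothetical joint distribution P(A=a, B=b); values are illustrative only.
joint = {
    (True, True): 0.20,
    (True, False): 0.10,
    (False, True): 0.20,
    (False, False): 0.50,
}
assert math.isclose(sum(joint.values()), 1.0)

# Marginals: sum the joint over the other event.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)
p_a_and_b = joint[(True, True)]

# Definition of conditional probability, applied in both directions.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
via_bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, via_bayes)
```

Both routes arrive at the same number (up to floating-point rounding) because the $P(A)$ factors cancel, which is the substitution step in the derivation.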

It is almost trivial as a mathematical statement. Its importance is purely interpretive: it tells you how rational belief should move in response to data.
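One way to watch belief move is to feed each posterior back in as the next prior. The sketch below reuses the disease-test numbers from the earlier example and assumes, purely for illustration, that repeated tests are conditionally independent given the true disease status; real repeat tests often are not. The helper name `update` is hypothetical.

```python
def update(prior: float,
           p_pos_if_sick: float = 0.99,
           p_pos_if_healthy: float = 0.05) -> float:
    """One Bayesian update after a positive test: returns P(sick | positive)."""
    numerator = p_pos_if_sick * prior
    evidence = numerator + p_pos_if_healthy * (1.0 - prior)
    return numerator / evidence


belief = 0.001          # prior: 1 in 1000
history = [belief]
for _ in range(3):      # three positive results in a row (assumed independent)
    belief = update(belief)   # the previous posterior becomes the new prior
    history.append(belief)

print([round(b, 3) for b in history])
```

Under these assumptions the belief climbs from 0.1% to roughly 2%, 28%, and 89% after one, two, and three positive results.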

History

Thomas Bayes (1701–1761) was an English Presbyterian minister and amateur mathematician. His essay “An Essay towards solving a Problem in the Doctrine of Chances” was found among his papers after his death and published in 1763 by Richard Price. Pierre-Simon Laplace, working independently a decade later, generalized the result and laid the foundations of what we now call Bayesian inference.

For most of the 20th century, Bayesian methods were considered suspect by mainstream statisticians, who preferred the “frequentist” interpretation championed by Fisher, Neyman, and Pearson. The Bayesian view made a comeback in the 1980s as computing power made the calculations feasible, and it is now widely used in machine learning, scientific inference, and artificial intelligence.

Modern relevance

Many modern machine learning methods have a Bayesian core or a Bayesian interpretation. Naive Bayes spam filters classify email by applying Bayes’ theorem to word frequencies. Self-driving cars update their world-model against sensor readings using Bayesian filters such as Kalman and particle filters. Medical diagnostic systems, legal evidence evaluation, weather forecasting, and astronomical signal detection all draw on the same formula.
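As a rough illustration of the spam-filter case, the sketch below is a toy naive Bayes classifier. The training messages, the function names, and the use of add-one (Laplace) smoothing are all assumptions made for this example. It also treats words as independent given the class, which is the "naive" simplification.

```python
import math
from collections import Counter

# Hypothetical training data, invented for this example.
TRAIN = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
    ("project update at noon", "ham"),
]


def train(examples):
    """Count how often each class occurs and how often each word occurs per class."""
    class_counts = Counter()
    word_counts = {}
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def classify(text, class_counts, word_counts, vocab):
    """Return P(class | words) for every class, using Bayes' theorem."""
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)          # log prior
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Log likelihood; add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalization: divide by the total probability of the evidence.
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    evidence = sum(unnormalized.values())
    return {k: v / evidence for k, v in unnormalized.items()}


model = train(TRAIN)
print(classify("free money", *model))
print(classify("meeting tomorrow", *model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; the final step converts back to probabilities that sum to one.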

When people say a modern AI “learns from data,” what the AI is doing can often be read, formally or approximately, as Bayesian updating.

Frequently asked

Who discovered Bayes' theorem?

It is named after the English clergyman Thomas Bayes (1701–1761), whose essay was published posthumously in 1763 by Richard Price. Pierre-Simon Laplace independently rediscovered and generalized it around 1774.

What is the simplest way to understand it?

Bayes' theorem tells you how to update your belief in a hypothesis when you see new evidence. Start with your prior belief, weight it by how likely the evidence is if the hypothesis were true, and normalize.
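The same three steps work when there are more than two hypotheses. The sketch below assumes a made-up setting: a coin whose probability of heads is one of three candidate values, with equal prior weight on each. The candidate values, the observed flips, and the name `bayes_update` are illustrative assumptions.

```python
def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Weight each hypothesis's prior by its likelihood, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())   # total probability of the evidence
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypotheses: the coin's probability of heads (hypothetical candidates).
belief = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}

for flip in "HHTH":   # illustrative observations
    # Likelihood of this single flip under each hypothesis.
    likelihood = {h: (h if flip == "H" else 1 - h) for h in belief}
    belief = bayes_update(belief, likelihood)

print({h: round(p, 3) for h, p in belief.items()})
```

After three heads and one tail, most of the weight has shifted to the hypothesis that the coin favors heads, though the fair-coin hypothesis is far from ruled out.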

Why is it controversial?

Applying Bayes requires a prior probability, which is often subjective. 'Frequentist' statisticians argue this injects personal judgment into scientific reasoning; 'Bayesians' argue that every inference implicitly uses priors, and making them explicit is a feature.