You flip a fair coin 10 times. You get 7 heads, 3 tails. That’s a 70% heads rate — quite different from the 50% you’d expect.
You flip the same coin 100 times. The result is closer to 50%, maybe 53%.
You flip it 10,000 times. Almost certainly you’ll be within a percent or two of 50%.
You flip it a million times. You’ll be very close to 50%, with deviations typically smaller than 0.1%.
This is the Law of Large Numbers in action — perhaps the most important single result in probability theory, and one that underlies almost every practical use of probability and statistics. Casinos exploit it. Insurance companies depend on it. Statisticians invoke it constantly. And popular intuition gets it wrong all the time.
This article is about what the law actually says, why it works, and what it does NOT say (which causes most misunderstandings).
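The four experiments above can be simulated in a few lines. The following is a minimal sketch, assuming Python's standard `random` module as the coin and an arbitrary fixed seed; neither comes from the article, and the exact fractions will differ from the 7-of-10 and 53% figures quoted above.

```python
import random

def heads_fraction(n_flips, rng):
    """Return the fraction of heads in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed (an arbitrary choice) so reruns match
results = {n: heads_fraction(n, rng) for n in (10, 100, 10_000, 1_000_000)}
for n, frac in results.items():
    print(f"{n:>9,} flips: {frac:.4f} heads, off by {abs(frac - 0.5):.4f}")
```

The small runs can land far from 0.5, while the million-flip run should sit very close to it.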
The statement
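Let $X_1, X_2, \ldots, X_n$ be independent random variables, all drawn from the same distribution with finite mean $\mu$. The sample mean is
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
The Law of Large Numbers states:
$$\bar{X}_n \to \mu \quad \text{as } n \to \infty$$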
In words: as you take more and more samples and compute their average, the sample average converges to the true (population) mean.
This sounds intuitive. It’s not as intuitive as it sounds. Let me unpack what this actually says.
What the law tells you
Long-run averages converge: This is the headline result. If you flip a fair coin enough times, the proportion of heads will approach 0.5. Roll a fair die many times and the average roll approaches 3.5. Sample from a normal distribution and the average approaches the mean.
Convergence rate is $1/\sqrt{n}$: When the variance is finite, the standard deviation of $\bar{X}_n$ is $\sigma/\sqrt{n}$, where $\sigma$ is the original distribution’s standard deviation. This means: to halve your typical error, you need 4× as many samples. To get 10× more precision, you need 100× more samples.
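The square-root rate can be checked empirically. The sketch below assumes a fair coin (so a single flip has standard deviation 0.5) and repeats the experiment 2,000 times at each sample size; the function names, seed, and trial count are illustrative choices, not part of the article.

```python
import random
import statistics

def sample_mean(n, rng):
    """Fraction of heads in n fair-coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

def spread_of_sample_means(n, trials, rng):
    """Standard deviation of the sample mean across repeated experiments."""
    return statistics.pstdev([sample_mean(n, rng) for _ in range(trials)])

rng = random.Random(0)  # arbitrary fixed seed
spread_100 = spread_of_sample_means(100, 2_000, rng)
spread_400 = spread_of_sample_means(400, 2_000, rng)
print(f"n = 100: spread {spread_100:.4f} (theory {0.5 / 100 ** 0.5:.4f})")
print(f"n = 400: spread {spread_400:.4f} (theory {0.5 / 400 ** 0.5:.4f})")
print(f"ratio: {spread_100 / spread_400:.2f}")
```

Quadrupling the sample size should roughly halve the spread, so the printed ratio should be close to 2.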
This is the basis of polling and surveys: When pollsters announce “1000 randomly selected voters, margin of error ±3%”, they’re using the Law. The 3% comes from $1/\sqrt{1000} \approx 0.032$, plus some extra factors.
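To see where the ±3% figure comes from, here is a small sketch of the usual textbook approximation. It assumes a simple random sample, the worst-case proportion p = 0.5, and z = 1.96 for a 95% interval; those defaults are conventional assumptions rather than anything stated in the article, and real pollsters apply further adjustments.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from n independent responses."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000):
    rough = 1 / math.sqrt(n)  # the rule of thumb used in the text
    print(f"n = {n:>6,}: +/- {margin_of_error(n):.1%}  (1/sqrt(n) = {rough:.1%})")
```

With p = 0.5 and z = 1.96 the formula reduces to 0.98/sqrt(n), which is why 1/sqrt(n) works as a rule of thumb.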
This is why casinos make money: Each individual bet has a small house edge, perhaps 1-2%. Over millions of bets, the actual win rate for the house converges to the expected edge with negligible variance. The casino doesn’t have to win every bet — it has to host enough bets that the law of large numbers does the work.
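The casino argument can be quantified with a short calculation. The sketch below assumes equal-sized, independent even-money bets on an American roulette wheel (18 red, 18 black, 2 green), which is one illustrative game rather than the 1-2% edge mentioned above; the variable and function names are hypothetical.

```python
import math

P_HOUSE_WINS = 20 / 38                    # the player loses on 20 of 38 pockets
edge = P_HOUSE_WINS - (1 - P_HOUSE_WINS)  # expected house profit per unit bet
sd_one_bet = math.sqrt(1 - edge ** 2)     # each outcome is +1 or -1 for the house

def house_take_spread(n_bets):
    """Standard deviation of the house's average profit per unit bet."""
    return sd_one_bet / math.sqrt(n_bets)

for n_bets in (100, 10_000, 1_000_000):
    print(f"{n_bets:>9,} bets: edge {edge:.2%} +/- {house_take_spread(n_bets):.2%}")
```

Over a hundred bets the noise is larger than the edge itself; over a million it is a small fraction of it.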
This is what makes insurance work: An individual insurance claim is unpredictable. Average over a million policy-holders and the claim cost per policy becomes predictable. Insurance companies set premiums based on the expected loss plus a margin; as long as claims are roughly independent, the law guarantees that, over many policies, total payouts will be close to the predicted total in relative terms.
What the law does NOT tell you
This is where most popular misunderstanding occurs.
The law does not say averages return to the mean after deviations: Suppose you’ve flipped a coin 100 times and gotten 60 heads. The sample mean is 0.6, ten percentage points above 0.5. People often think: “tails is ‘due’ to come up; the future flips will compensate.”
This is wrong. The coin has no memory. Each future flip is independent. The future flips will average around 0.5, just like the past ones should have. But the past 100 flips are fixed: they happened, and you got 60 heads. The way the running average drifts back to 0.5 is by dilution: as $n$ grows, the past 100 flips become a smaller and smaller fraction of total flips.
To make this concrete: after 100 flips with 60 heads, suppose you now flip 10,000 more times. The new flips give about 5,000 heads. Total: 5,060 heads in 10,100 flips, or 50.1%. The early imbalance is still there in absolute terms (60 - 50 = 10 extra heads), but it’s been overwhelmed by the much larger pool of subsequent flips.
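The dilution arithmetic above generalizes to any number of additional flips. This is a minimal sketch that works with expected values only (each new flip contributes half a head on average); the function name and defaults are illustrative.

```python
def after_more_flips(extra_flips, start_heads=60, start_flips=100):
    """Expected heads fraction and heads surplus after extra fair flips."""
    total_flips = start_flips + extra_flips
    expected_heads = start_heads + 0.5 * extra_flips
    fraction = expected_heads / total_flips
    surplus = expected_heads - 0.5 * total_flips  # heads beyond an even split
    return fraction, surplus

for extra in (0, 100, 1_000, 10_000, 1_000_000):
    fraction, surplus = after_more_flips(extra)
    print(f"{extra:>9,} more flips: fraction {fraction:.4f}, surplus {surplus:.0f} heads")
```

The expected surplus stays at 10 heads in every row; only the fraction moves toward 0.5.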
This is the Law of Large Numbers correctly understood: averages converge by drowning early imbalances in later samples, not by correcting them.
The law does not say all sample means are near the true mean: Even with a large number of flips $n$, you might (rarely) see a sample mean of 0.45 or 0.55. The probability is small but not zero. The law only says: for any tolerance $\varepsilon > 0$, the probability of deviating by more than $\varepsilon$ goes to zero as $n \to \infty$.
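For a fair coin, the probability of missing by more than a given tolerance can be computed exactly from the binomial distribution. The sketch below assumes a tolerance of 0.05 (matching the 0.45 and 0.55 figures above) and three arbitrary sample sizes; it uses exact integer and fraction arithmetic to avoid rounding problems at the boundary.

```python
from fractions import Fraction
from math import comb

def prob_mean_off_by_more_than(n, eps=Fraction(1, 20)):
    """Exact P(|heads/n - 1/2| > eps) for n fair-coin flips."""
    half = Fraction(1, 2)
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) > eps
    )
    return favourable / 2 ** n  # each of the 2**n sequences is equally likely

for n in (100, 400, 1_600):
    print(f"n = {n:>5,}: P(off by more than 0.05) = {prob_mean_off_by_more_than(n):.2e}")
```

The probability shrinks quickly as the sample grows, but it is never exactly zero for any finite sample size.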
The law assumes independence: If your samples are correlated, the law may fail. Time series with autocorrelation, social-network data with clustering, financial returns with volatility clustering — all violate the simple i.i.d. assumption. Generalized versions of the law exist, but they require additional structure.
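The law assumes a finite mean: For i.i.d. samples the law needs only a finite mean, not a finite variance. It fails when the mean itself is undefined or infinite, as for the Cauchy distribution or Pareto distributions with tail index at most 1. Sample averages of Cauchy-distributed data don’t converge: the average of $n$ standard Cauchy draws has the same distribution as a single draw. Heavy-tailed distributions with a finite mean but infinite variance (Pareto with tail index between 1 and 2, for example) still obey the law, but convergence can be far slower than the $1/\sqrt{n}$ error rate that finite-variance distributions enjoy.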
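The Cauchy claim can be checked by simulation. The sketch below compares how widely the sample mean scatters across 500 repeated experiments for normal and Cauchy draws, measured by the interquartile range (IQR) because the Cauchy distribution has no standard deviation. The helper names, seed, and trial counts are illustrative assumptions.

```python
import math
import random
import statistics

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variable.
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_sample_means(draw, n, trials, rng):
    """Interquartile range of the sample mean across repeated experiments."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(1)  # arbitrary fixed seed
iqrs = {}
for n in (10, 1_000):
    iqrs[n] = (
        iqr_of_sample_means(draw_normal, n, 500, rng),
        iqr_of_sample_means(draw_cauchy, n, 500, rng),
    )
    print(f"n = {n:>5,}: normal IQR {iqrs[n][0]:.3f}, Cauchy IQR {iqrs[n][1]:.3f}")
```

The normal column should shrink by roughly a factor of ten between the two rows, while the Cauchy column should stay near 2, the IQR of a single standard Cauchy draw.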
The gambler’s fallacy
The most pervasive misunderstanding is the gambler’s fallacy: the belief that random events “even out” in the short term, so that recent imbalances must be corrected by upcoming opposite results.
The fallacy is everywhere:
- “Black has come up 8 times in a row at the roulette wheel; red is due.”
- “I’ve gotten three losing scratch tickets in a row; I’m bound to win soon.”
- “We’ve had three sons; the next baby is more likely to be a girl.”
In each case, the underlying probabilistic mechanism doesn’t have memory. Past outcomes don’t influence future ones. Yet humans persistently believe in “balance,” “due,” and “evening out.”
The gambler’s fallacy has consequences. Casinos benefit from it; roulette displays of recent results invite betting based on patterns. Lotto players choosing “overdue” numbers are betting against independence. In 1913, a roulette wheel at the Monte Carlo casino famously landed on black 26 times in a row; gamblers betting heavily on red lost millions.
The Law of Large Numbers is correctly understood as a statement about long-run averages, not a statement that the universe enforces fairness short-term. Quite the opposite — short-term, anything can happen. The law just says that over enough trials, the noise gets averaged out.
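The "no memory" claim is easy to test by simulation. The sketch below flips a simulated fair coin half a million times and looks only at the flips that come immediately after five heads in a row; the streak length, flip count, and seed are arbitrary illustrative choices.

```python
import random

def heads_rate_after_streak(n_flips, streak_len, rng):
    """Fraction of heads on flips that immediately follow streak_len heads in a row."""
    run = 0              # length of the current run of heads
    follow_ups = 0       # flips that came right after a long-enough run
    follow_up_heads = 0
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if run >= streak_len:
            follow_ups += 1
            follow_up_heads += is_heads
        run = run + 1 if is_heads else 0
    return follow_up_heads / follow_ups

rng = random.Random(2)  # arbitrary fixed seed
rate = heads_rate_after_streak(500_000, streak_len=5, rng=rng)
print(f"heads rate right after 5 heads in a row: {rate:.3f}")
```

If tails were ever "due", this rate would fall noticeably below 0.5; for an independent coin it should stay close to 0.5.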
Variance and the Central Limit Theorem
The Law of Large Numbers tells you the sample mean converges to the true mean. The next-level question is: how fast?
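The Central Limit Theorem answers this. It says that, when the variance $\sigma^2$ is finite, the deviation $\bar{X}_n - \mu$, properly scaled, approaches a normal (Gaussian) distribution:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)$$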
So you can be much more precise than just “the sample mean converges.” You can describe the distribution of the deviation, and use that to compute confidence intervals.
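As an illustration of turning this into a confidence interval, the sketch below uses the normal approximation with z = 1.96 (a conventional 95% choice, assumed here) and the sample standard deviation in place of the unknown true one. The die-roll data are made up and perfectly balanced, purely to keep the arithmetic easy to follow.

```python
import math
import statistics

def mean_confidence_interval(samples, z=1.96):
    """Approximate 95% CLT-based confidence interval for the population mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(n)
    return mean - z * std_error, mean + z * std_error

rolls = [1, 2, 3, 4, 5, 6] * 50  # 300 made-up die rolls, not real data
low, high = mean_confidence_interval(rolls)
print(f"mean of {len(rolls)} rolls: between {low:.2f} and {high:.2f} (approx. 95%)")
```

The half-width of the interval is z times the standard error, so it shrinks like one over the square root of the number of rolls.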
For details, see our Central Limit Theorem post. The two results together, the Law of Large Numbers and the Central Limit Theorem, form the foundation of classical statistics. The first says averages stabilize; the second says the remaining fluctuations around the mean are approximately Gaussian, with a spread that shrinks like $1/\sqrt{n}$.
Where the law shows up
Almost everywhere statistics is used:
Polls and surveys: 1000-respondent polls have margin of error ~3% by the law plus CLT. Larger samples give tighter intervals.
Quality control: a production line with a small defect rate produces unpredictable individual products but predictable average defect rates. Statistical process control uses this for monitoring.
Insurance pricing: an individual house’s flood risk is unpredictable; the expected total flood claims for 100,000 houses across a state are predictable to within a few percent.
Casino edge: roulette has a 5.26% house edge in American versions. Any specific bet might win or lose, but over millions of bets, the casino’s profit converges to 5.26% of total amount wagered.
Drug efficacy: a clinical trial of 1,000 patients gives much tighter estimates of drug efficacy than a trial of 30. Required sample sizes for medical studies are determined by power calculations rooted in the law.
Election forecasting: aggregating polls reduces variance below any individual poll. Nate Silver’s models in the 2010s exploited this aggregation explicitly.
Sports analytics: a player’s batting average over 600 at-bats per season is far more reliable than an average over 50 at-bats. Small samples deceive.
Financial trading: high-frequency trading firms make millions of small bets, each with a small expected profit. As long as those bets are not strongly correlated, the law of large numbers keeps their daily P&L close to the expected value, with manageable variance.
What the law teaches
The deepest lesson of the Law of Large Numbers is that large-scale averaging is reliable in a precise mathematical sense. The randomness of individual events doesn’t infect the average; given enough events, the average becomes essentially deterministic.
This is the foundation of much of empirical science. You can’t predict any particular molecule’s behavior, but the average pressure of a gas is utterly predictable. You can’t predict any individual person’s day, but the average behavior of a million people on a Saturday afternoon is highly predictable. You can’t predict whether a single coin flip will be heads, but the long-run fraction converges to 1/2.
This kind of “convergence of randomness” is why statistics works. Without the law, sample averages would be just guesses. With the law, sample averages are highly reliable estimates with computable error bars.
The flip side — what the law does NOT tell you — is the source of most popular misunderstandings of probability. Random events don’t “even out” in the short term. Streaks happen. Coincidences happen. The law operates at long-run scales, not short-run “balance.”
For everyone working with data: trust averages over large samples. Distrust averages over small samples. Don’t expect short-term balance from random processes. And remember that “expected” doesn’t mean “what you’ll see most of the time” — it means what averages converge to over many trials.
Jacob Bernoulli, whose proof of a version of the law was published posthumously in 1713, discovered something deep: the universe doesn’t care about individual outcomes, but it cares enormously about long-run averages. The law that governs this caring is the Law of Large Numbers.
Three hundred years later, it’s still the foundation of every confidence interval, every poll, every casino payout, and every insurance premium. The mathematics is nearly as old as calculus and just as durable. When you see a “margin of error,” you’re seeing the Law of Large Numbers at work, quietly making the average reliable so the rest of statistics can do its job.
Frequently asked
Is the gambler's fallacy related to the Law of Large Numbers?
Yes — it's the most common misunderstanding of the Law. After a long streak of heads, people often think tails is 'due' because the long-run average must be 50%. This is wrong. Each new flip is independent of the past; the coin has no memory. The Law says averages converge over many trials; it doesn't say recent imbalances are corrected by future imbalances.
What's the difference between the weak and strong law of large numbers?
The weak law says that for any small ε, the probability that the sample mean differs from the true mean by more than ε converges to zero as n grows. The strong law says the sample mean converges to the true mean with probability 1 — almost every individual sequence converges, not just the probabilities. The strong law is mathematically stronger and was proven later (Kolmogorov, 1930s).
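In symbols, the two statements can be written side by side. This uses $\bar{X}_n$ for the sample mean of the first $n$ draws and $\mu$ for the true mean, the standard notation for this result:

```latex
\begin{align*}
\text{Weak law:}   \quad & \lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0
                           \quad \text{for every } \varepsilon > 0 \\
\text{Strong law:} \quad & P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1
\end{align*}
```

The difference is where the limit sits: outside the probability in the weak law, inside it in the strong law.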
Does the Law of Large Numbers apply to non-i.i.d. data?
Generalized versions do, but the assumptions matter. Independent but non-identically distributed: the law holds under various conditions. Dependent data: laws of large numbers exist for ergodic processes and Markov chains, but the convergence rates and conditions are subtler. Real-world data with strong correlations or non-stationarity may violate the law's predictions.