Some events arrive at random. A radioactive nucleus may decay at any instant; a meteorite may strike Earth at no fixed time; a customer may walk into a shop on no schedule at all. If you watch for a fixed length of time, the number of such events you see (zero, one, two, or perhaps a few more) is a random quantity with a probability distribution. Remarkably, when the events are rare, independent, and occur at a steady average rate, that distribution always takes the same simple form, captured by one of the most useful formulas in probability:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots$$

This is the Poisson distribution, named after the French mathematician Siméon Denis Poisson, who derived it in 1837. The single parameter $\lambda$ (the Greek letter lambda) is the average number of events expected in the observation window. Plug in any non-negative integer $k$ and the formula tells you the probability of seeing exactly that many. This article explains why this little formula governs everything from telephone calls and queueing customers to typos on a page and the clicks of a Geiger counter.
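As a minimal sketch, the formula can be evaluated directly with Python's standard library. The function name `poisson_pmf` and the choice λ = 3 are illustrative assumptions, not part of the original text.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Illustrative choice: an average of 3 events per observation window.
for k in range(6):
    print(k, round(poisson_pmf(k, 3.0), 4))
```

For large `k` or large `lam` the direct formula can overflow or lose precision; working with logarithms is the usual remedy, but it is not needed at this scale.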

The derivation: a limit of coin flips

The fastest way to feel where the formula comes from is to start with something familiar. Imagine cutting the observation window into a very large number $n$ of tiny equal slots, each so small that at most one event can occur in it. Suppose each slot independently contains an event with the same small probability $p$. The total count is then a sum of $n$ independent yes/no trials, that is, a binomial distribution with parameters $n$ and $p$, whose probability of exactly $k$ events is $\binom{n}{k} p^k (1-p)^{n-k}$ and whose mean is $np$.

Now refine the picture. Keep the average count at a fixed value $\lambda$, but cut the time into ever-finer slots: let $n \to \infty$ while $p = \lambda/n \to 0$. The expression $\binom{n}{k}\,(\lambda/n)^k\,(1-\lambda/n)^{n-k}$ has a clean limit. The factor $(1-\lambda/n)^n$ approaches $e^{-\lambda}$, the leftover factor $(1-\lambda/n)^{-k}$ tends to $1$, the ratio $\binom{n}{k}/n^k$ tends to $1/k!$, and the remaining factor is $\lambda^k$. What is left is exactly Poisson's formula. So the Poisson distribution is the limit of binomial counts when the events are dilute: many opportunities, each rarely seized, with a steady total rate. Real-world rare-event counts inherit the same distribution whenever those assumptions are approximately met.
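The limit can also be checked numerically. The sketch below compares the binomial probability with p = λ/n against the Poisson value as n grows; the helper names and the values λ = 3, k = 2 are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam, k = 3.0, 2  # illustrative values
target = poisson_pmf(k, lam)
for n in (10, 100, 1000, 10000):
    b = binomial_pmf(k, n, lam / n)
    print(f"n={n:>6}  binomial={b:.6f}  poisson={target:.6f}  gap={abs(b - target):.2e}")
```

The gap shrinks roughly in proportion to 1/n, which is why the Poisson formula is a good stand-in for the binomial whenever n is large and p is small.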

What it looks like

The shape of the distribution depends entirely on the mean $\lambda$. Below is the case $\lambda = 3$, which might be the average number of e-mails arriving in your inbox during a coffee break, or the number of goals in a typical football match. The probabilities of $0, 1, 2, \dots$ events are plotted as bars: small at zero, rising to a peak at $k = 2$ and $k = 3$ (both about $0.224$), then thinning out, with the long right tail that is the signature of Poisson counts.

[Figure: Poisson distribution with λ = 3. Bar chart of P(X = k) against the number of events k, for k = 0 to 10.]

A useful sanity check: add up the bar heights over every $k$, including the vanishingly small ones beyond the plotted range, and you get $1$, as the total probability must. The fact that $\sum_{k=0}^{\infty} \lambda^k/k! = e^{\lambda}$ is exactly what makes the $e^{-\lambda}$ prefactor normalize the formula correctly.
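The bar heights and the normalization check can be reproduced with a few lines of Python. This is a rough sketch that assumes λ = 3 and cuts the infinite sum off at k = 59, where the remaining terms are far below floating-point precision.

```python
import math

lam = 3.0  # illustrative mean

def term(k: int) -> float:
    """Poisson probability of exactly k events for the mean above."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Heights of the bars for k = 0..10, drawn as a crude text chart.
probs = [term(k) for k in range(11)]
for k, p in enumerate(probs):
    print(f"{k:>2}  {p:.4f}  {'#' * round(p * 100)}")

# The plotted bars alone fall just short of 1; the tail supplies the rest.
print("sum for k = 0..10:", sum(probs))
total = sum(term(k) for k in range(60))
print("sum for k = 0..59:", total)
```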

A signature property: mean equals variance

The Poisson distribution has a striking algebraic peculiarity. Its mean is $\lambda$, which is unsurprising, since $\lambda$ was defined as the average count. Its variance, which measures how widely the count tends to scatter around that average, is also $\lambda$. Mean and variance coincide. This is a fingerprint: if you observe a series of counts and find that the sample variance is approximately equal to the sample mean, you have circumstantial evidence that the underlying mechanism is Poisson-like. If the variance is noticeably larger (overdispersion), the events are probably clumping or have a varying rate, and the simple Poisson model needs a richer cousin.
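One way to apply this fingerprint is to compute the ratio of sample variance to sample mean, often called the dispersion index. The sketch below tests it on simulated counts; the sampler uses Knuth's multiplication method, and the function names, the seed, and the value λ = 3 are all assumptions made for the example.

```python
import math
import random
import statistics

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def dispersion_index(counts) -> float:
    """Sample variance divided by sample mean; close to 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = [sample_poisson(3.0, rng) for _ in range(10_000)]
print("mean:", statistics.mean(counts))
print("variance:", statistics.variance(counts))
print("dispersion index:", dispersion_index(counts))
```

A ratio near 1 is consistent with a Poisson mechanism but does not prove it; a formal goodness-of-fit test would be needed for a firmer conclusion.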

The Poisson process: the same idea in continuous time

So far we have looked at one window. In practice we usually have a stream of arrivals (calls into a help desk, particles entering a detector, hits on a web server) and we care about when they happen, not just how many. This is captured by the Poisson process, the natural continuous-time companion of the distribution. Here $\lambda$ denotes the average rate of events per unit time: events occur at random instants such that the number of events in any time interval of length $t$ is Poisson distributed with mean $\lambda t$, and the counts in disjoint intervals are independent.

A beautiful equivalent description: the waiting times between successive events of a Poisson process are independent, each distributed exponentially with rate $\lambda$. So a Poisson process is a probabilistic clock whose tick spacing is exponentially distributed. This is the simplest non-trivial point process, and it underlies queueing theory (the M/M/1 queue and its descendants), reliability engineering (component failures), and the simulation of stochastic systems across the sciences.
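The waiting-time description suggests a simple way to simulate a Poisson process: keep adding exponential gaps until the time horizon is reached. The following is a sketch under assumed values (rate 3 per unit time, 5000 unit-length windows) with hypothetical helper names; it then counts arrivals per window to confirm that mean and variance both land near the rate.

```python
import random
import statistics

def simulate_arrivals(rate: float, horizon: float, rng: random.Random) -> list:
    """Arrival times in [0, horizon), built by summing exponential waiting times."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def counts_per_window(times, horizon: float, width: float = 1.0) -> list:
    """Number of arrivals falling in each consecutive window of the given width."""
    counts = [0] * int(horizon / width)
    for t in times:
        idx = int(t / width)
        if idx < len(counts):
            counts[idx] += 1
    return counts

rng = random.Random(1)  # fixed seed so the run is repeatable
rate, horizon = 3.0, 5000.0
arrivals = simulate_arrivals(rate, horizon, rng)
counts = counts_per_window(arrivals, horizon)
print("mean count per window:", statistics.mean(counts))
print("variance of counts:", statistics.variance(counts))
```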

Where it shows up

Once you know what to look for, the Poisson distribution turns up everywhere counts of rare events are involved. Radioactive decay: in any short interval the number of decays from a sample is Poisson, which is why Geiger counters click in a characteristically irregular pattern. Traffic and queues: arrivals at a call centre, a server, or a checkout line are well modelled by a Poisson process when customers act independently. Other rare counts: the number of new mutations in a small region of a genome per generation, the number of cosmic-ray hits a memory cell takes per day, even the number of horse-kick deaths per Prussian cavalry corps in Bortkiewicz's famous 1898 study. All are well described by the Poisson distribution because each involves many opportunities and a very small probability per opportunity.

When the assumptions hold, the Poisson distribution is the right answer; when they almost hold, it is an excellent approximation; and when they fail, the manner of the failure tells you something about the underlying mechanism — perhaps the rate is varying, perhaps events are correlated, perhaps the system is bursty. That a single short formula can serve simultaneously as a model, an approximation, and a diagnostic is the quiet reason it sits among the half-dozen most important distributions in all of probability.

Frequently asked

Who was Poisson and when did the distribution appear?

Siméon Denis Poisson was a French mathematician and physicist who published the distribution in 1837 as a limiting case of the binomial in a treatise on judicial probability. The result sat largely unused for half a century, then gained wide currency in 1898 when Ladislaus Bortkiewicz showed it fitted the famous data on Prussian cavalrymen killed by horse-kicks — small whole-number counts of rare, independent events, exactly the regime Poisson's law was made for.

Why are the mean and the variance both equal to λ?

Both are inherited from the binomial distribution from which the Poisson is derived. A binomial(n, p) has mean np and variance np(1−p). As n grows and p shrinks with np held fixed at λ, the mean stays at λ and the variance, np(1−p), tends to λ as well, since 1−p → 1. The two quantities coincide in the limit. This 'mean equals variance' fingerprint is a quick check for whether real-world counts plausibly come from a Poisson process at all.

What is a Poisson process?

It is the continuous-time generator of Poisson counts. Events occur at random instants such that any two disjoint time intervals contain independent counts and the count in an interval of length t is Poisson with mean λt. Equivalently, the waiting time between successive events is exponentially distributed with rate λ. The Poisson process is the simplest model of memoryless arrivals and underpins queueing theory, reliability engineering, and the simulation of stochastic systems.

When does the Poisson distribution fail to fit real data?

Whenever the events influence each other or the rate changes. Goals in a football match are nearly Poisson — the assumption of independent rare events fits well — but pile-ups on motorways are not: one accident causes congestion that triggers more. Bursty data such as packet arrivals on a network or word occurrences in text typically show variance larger than the mean — 'overdispersion' — and are better modelled by mixtures or by negative-binomial distributions instead.
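Overdispersion from a varying rate is easy to reproduce in simulation. In the sketch below each window gets its own rate drawn from a gamma distribution before the count is drawn, which is the standard construction of a negative-binomial count. The gamma parameters (shape 2, scale 1.5, so the mean rate is 3), the seed, and the helper names are assumptions for illustration only.

```python
import math
import random
import statistics

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_mixed_count(shape: float, scale: float, rng: random.Random) -> int:
    """Draw a rate from a gamma distribution, then a Poisson count at that rate."""
    rate = rng.gammavariate(shape, scale)
    return sample_poisson(rate, rng)

rng = random.Random(2)  # fixed seed so the run is repeatable
counts = [sample_mixed_count(2.0, 1.5, rng) for _ in range(10_000)]
mean = statistics.mean(counts)
variance = statistics.variance(counts)
print("mean:", mean)
print("variance:", variance)
print("variance / mean:", variance / mean)
```

With these parameters the theoretical variance is the mean plus the variance of the rate, 3 + 4.5 = 7.5, so the ratio should come out well above 1.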