In 1927, two Scottish researchers — William Ogilvy Kermack (a biochemist) and Anderson Gray McKendrick (a military doctor) — published a paper in the Proceedings of the Royal Society with a remarkably ambitious goal. They wanted to capture the entire trajectory of an epidemic — the rise, peak, and decline — using nothing more than a few coupled differential equations.
The paper introduced what we now call the SIR model, named for its three categories: Susceptible, Infected, Recovered. Almost a hundred years later, this remains the workhorse model of mathematical epidemiology. Every COVID-19 forecast, every flu-season projection, every measles-outbreak risk assessment uses some descendant of Kermack and McKendrick’s framework.
This article is about what SIR models are, what they predict, and why the simple three-compartment picture captures so much of how diseases actually spread.
The three compartments
Imagine a population divided into three groups at any moment in time:
- $S(t)$: number of Susceptible individuals, people who could catch the disease.
- $I(t)$: number of Infected individuals, currently sick and contagious.
- $R(t)$: number of Recovered individuals, who have either recovered (and are assumed immune) or died.
The total population is $N = S + I + R$ (assumed constant: no births or non-disease deaths during the outbreak).
The model describes how people move between these compartments:
People flow from $S$ to $I$ at rate $\beta S I / N$. The factor $S I$ is the mass action rate of contacts between susceptible and infected people: bigger $S$ or bigger $I$ produces more new infections, and dividing by $N$ normalizes the contact rate. The parameter $\beta$ is the transmission rate: how infectious the disease is.
People flow from $I$ to $R$ at rate $\gamma I$. The parameter $\gamma$ is the recovery rate; its reciprocal $1/\gamma$ is the average time an individual stays infectious.
That’s the entire model: two rate constants, three differential equations.
The equations
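The mathematical content of the SIR model is three coupled ordinary differential equations:
$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$
Reading them: $S$ decreases as people get infected. $I$ increases from new infections and decreases as people recover. $R$ increases from recoveries. The sum $S + I + R = N$ is conserved.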
These are nonlinear coupled ODEs. They don’t have a clean closed-form solution, but they can be solved numerically with simple methods (Euler integration, Runge-Kutta) — and even computed in real time on any modern computer.
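As a minimal sketch of the numerical approach, the Python below integrates the three equations with a classical fourth-order Runge-Kutta step, using the $\beta S I / N$ infection term and the $\gamma I$ recovery term. The function names (`sir_derivatives`, `rk4_step`, `simulate_sir`), the parameter values ($\beta = 0.3$ and $\gamma = 0.1$ per day, $N = 1000$), and the step size are illustrative assumptions, not values taken from Kermack and McKendrick's paper.

```python
def sir_derivatives(s, i, r, beta, gamma, n):
    """Right-hand side of the SIR equations: returns (dS/dt, dI/dt, dR/dt)."""
    new_infections = beta * s * i / n   # flow S -> I
    recoveries = gamma * i              # flow I -> R
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, beta, gamma, n, dt):
    """Advance (S, I, R) by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_derivatives(*state, beta, gamma, n)
    k2 = sir_derivatives(*shifted(state, k1, dt / 2), beta, gamma, n)
    k3 = sir_derivatives(*shifted(state, k2, dt / 2), beta, gamma, n)
    k4 = sir_derivatives(*shifted(state, k3, dt), beta, gamma, n)
    return tuple(
        x + dt * (a + 2 * b + 2 * c + d) / 6
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r_init, beta, gamma, days, dt=0.1):
    """Return a list of (t, S, I, R) rows from t = 0 to t = days."""
    n = s0 + i0 + r_init            # total population, constant in this model
    state = (s0, i0, r_init)
    rows = [(0.0, *state)]
    for step in range(1, int(round(days / dt)) + 1):
        state = rk4_step(state, beta, gamma, n, dt)
        rows.append((step * dt, *state))
    return rows


if __name__ == "__main__":
    # Illustrative values only: 1 initial case in 1000 people, beta/gamma = 3.
    rows = simulate_sir(999.0, 1.0, 0.0, beta=0.3, gamma=0.1, days=160)
    t_peak, s_peak, i_peak, _ = max(rows, key=lambda row: row[2])
    print(f"peak of about {i_peak:.0f} infected on day {t_peak:.1f}, S = {s_peak:.0f}")
```

A plain Euler step would also work for a quick look; Runge-Kutta is used here only because it tolerates a coarser step size for the same accuracy.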
The basic reproduction number R₀
The single most important quantity in epidemiology is the basic reproduction number, denoted $R_0$:
$$R_0 = \frac{\beta}{\gamma}$$
This is the average number of new infections caused by one infected person in a fully susceptible population. The interpretation: each infected person is infectious for time $1/\gamma$ and produces $\beta$ new infections per unit time, so they cause $\beta/\gamma$ total infections before recovering.
The crucial threshold is $R_0 = 1$:
- $R_0 > 1$: each infected person produces more than one new infection on average. The disease spreads exponentially in its early phase.
- $R_0 < 1$: each infected person produces less than one new infection. The disease dies out.
- $R_0 = 1$: the marginal case, in which each infection only just replaces itself and there is no sustained growth.
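The threshold can be made concrete with a short sketch. Early in an outbreak almost everyone is susceptible ($S \approx N$), so the infected count obeys approximately $dI/dt \approx (\beta - \gamma) I$: it grows exponentially when $\beta > \gamma$ and shrinks otherwise. The helper names and the example rates below are hypothetical, chosen only for illustration.

```python
import math


def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma: infections per unit time multiplied by infectious time."""
    return beta / gamma


def early_growth_rate(beta, gamma):
    """Approximate exponential rate of I(t) while S is still close to N."""
    return beta - gamma


def doubling_time(beta, gamma):
    """Time for I(t) to double in the early phase; infinite if there is no growth."""
    rate = early_growth_rate(beta, gamma)
    if rate <= 0:
        return math.inf
    return math.log(2) / rate


if __name__ == "__main__":
    # Assumed rates per day: the first pair has R0 = 3, the second R0 = 0.5.
    for beta, gamma in [(0.3, 0.1), (0.05, 0.1)]:
        r0 = basic_reproduction_number(beta, gamma)
        print(f"R0 = {r0:.2f}, doubling time = {doubling_time(beta, gamma):.2f} days")
```

The approximation holds only while the susceptible pool is nearly intact; once $S$ falls noticeably, growth slows and the full equations are needed.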
Estimated values for some diseases:
| Disease | $R_0$ |
|---|---|
| Measles | 12–18 |
| Pertussis | 12–17 |
| Mumps | 10–12 |
| Rubella | 6–7 |
| Smallpox | 5–7 |
| Polio | 5–7 |
| Chickenpox | 10–12 |
| COVID-19 (original) | 2–3 |
| COVID-19 (Omicron) | 9–10 |
| Seasonal influenza | 1–2 |
| Ebola | 1.5–2.5 |
Measles is one of the most infectious known human diseases: a single case produces 12–18 new cases on average in a fully susceptible population. COVID-19’s original variant was much less transmissible by this measure (2–3), closer to a bad influenza season than to measles. Omicron’s much higher transmissibility was one factor in its rapid global spread.
The shape of an outbreak
When you solve the SIR equations numerically with reasonable parameter values, you get the characteristic shape of an epidemic:
The blue curve $S(t)$ is the susceptible population: it monotonically decreases as people get infected. The red curve $I(t)$ rises rapidly, peaks, and decays; this is the famous “epidemic curve” that public health officials track. The green curve $R(t)$ rises monotonically toward the final total of people who got infected.
Three features stand out:
The peak: $I(t)$ has a maximum at the time when $dI/dt = 0$, i.e., when $\beta S / N = \gamma$, or equivalently $S = N / R_0$. When $S$ is below this threshold, infections decline; above it, they grow.
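The final size: not everyone gets infected. Infections start to decline once $S$ drops below $N / R_0$ (when the effective reproduction number falls below 1), and the outbreak ends when $I$ reaches zero with some susceptibles still left. The fraction of the population that escapes infection is the “final susceptible fraction” $s_\infty = S(\infty)/N$, which depends on $R_0$ in a way that has no closed-form expression in elementary functions, but it’s positive for any finite $R_0$.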
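Although there is no elementary formula, the final susceptible fraction is easy to compute. A standard result for the basic SIR model, assuming the outbreak starts from a vanishingly small seed in an otherwise fully susceptible population, is that the fraction $s_\infty$ satisfies $s_\infty = e^{-R_0 (1 - s_\infty)}$. The sketch below solves that relation by bisection; the function name and tolerance are illustrative assumptions.

```python
import math


def final_susceptible_fraction(r0, tol=1e-12):
    """Solve s = exp(-r0 * (1 - s)) for the root below 1/r0 by bisection.

    Assumes an outbreak seeded by a negligible number of cases in a fully
    susceptible population. For r0 <= 1 no outbreak takes off, so 1.0 is returned.
    """
    if r0 <= 1.0:
        return 1.0

    def residual(s):
        return s - math.exp(-r0 * (1.0 - s))

    # residual(0) < 0 and residual(1/r0) > 0 whenever r0 > 1, so the root is bracketed.
    lo, hi = 0.0, 1.0 / r0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r0 in (1.5, 2.0, 3.0, 15.0):
        escaped = final_susceptible_fraction(r0)
        print(f"R0 = {r0:>4}: {100 * escaped:.2f}% never infected")
```

Even for large $R_0$ the result stays above zero, which matches the claim that some fraction of the population always escapes infection.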
The basic shape: this characteristic bell curve appears in nearly every observed disease outbreak. From the bubonic plague to influenza to COVID-19, the trajectory of $I(t)$ has roughly this shape: slow start, exponential rise, peak, decay, eventual end.
“Flatten the curve”
The phrase “flatten the curve”, ubiquitous in 2020 during COVID-19, comes directly from SIR models. The idea is to use public health interventions (distancing, masking, hygiene, quarantine) to reduce $\beta$, which:
- Lowers the peak of $I(t)$: fewer simultaneous infections, less pressure on hospitals.
- Extends the duration of the outbreak — total infected fraction may stay similar.
- Buys time for vaccine development, hospital capacity expansion, or treatment improvements.
The unmitigated outbreak (red) has a high peak that exceeds hospital capacity, causing deaths from lack of care. The flattened version (green) has a lower peak — fewer concurrent severe cases — even if the total number of eventual infections is similar.
The mathematics here is precise: the peak height $I_{\max}$ depends on $\beta$ and $\gamma$. Reducing $\beta$ (via interventions) lowers $R_0$, which lowers the peak. If $R_0$ drops below 1, the outbreak dies out.
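To put rough numbers on this, a standard result for the basic SIR model gives the peak infected fraction as $1 - (1 + \ln R_0)/R_0$ with $R_0 = \beta/\gamma$, assuming the outbreak starts from a very small seed in a fully susceptible population. The sketch below compares an unmitigated and a mitigated scenario; the function name and both transmission rates are hypothetical values chosen for illustration.

```python
import math


def peak_infected_fraction(beta, gamma):
    """Peak of I(t)/N for basic SIR, assuming a tiny seed and S(0) close to N."""
    r0 = beta / gamma
    if r0 <= 1.0:
        return 0.0  # infections never rise above the (vanishing) initial seed
    return 1.0 - (1.0 + math.log(r0)) / r0


if __name__ == "__main__":
    gamma = 0.1  # assumed recovery rate per day (10-day infectious period)
    scenarios = [("unmitigated", 0.30), ("contacts halved", 0.15)]
    for label, beta in scenarios:
        peak = peak_infected_fraction(beta, gamma)
        print(f"{label}: R0 = {beta / gamma:.1f}, peak = {100 * peak:.1f}% infected at once")
```

Under these assumed rates, halving $\beta$ cuts the peak from about 30% of the population to about 6%, a much larger drop than the halving of $R_0$ alone might suggest.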
Herd immunity
The herd immunity threshold is the fraction of the population that must be immune (through prior infection or vaccination) to prevent sustained spread. It’s a direct consequence of the SIR model.
If a fraction $p$ of the population is immune, the effective reproduction number becomes:
$$R_{\text{eff}} = R_0 (1 - p)$$
To prevent spread, we need $R_{\text{eff}} < 1$, giving the threshold:
$$p > 1 - \frac{1}{R_0}$$
For some diseases:
| Disease | $R_0$ | Herd immunity threshold |
|---|---|---|
| Measles | 15 | 93% |
| Mumps | 11 | 91% |
| Polio | 6 | 83% |
| Rubella | 6 | 83% |
| Smallpox | 6 | 83% |
| COVID-19 (Omicron) | 9 | 89% |
| Seasonal influenza | 1.5 | 33% |
The high herd immunity threshold for measles (93%) explains why even small drops in vaccination coverage cause outbreaks. A community with 85% vaccination is below threshold, and measles will spread when introduced.
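The table can be reproduced with a few lines of Python. This is a sketch under the article's own assumptions (the threshold $1 - 1/R_0$ and the representative $R_0$ values listed above); the function names are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Immune fraction needed for the effective reproduction number to reach 1."""
    if r0 <= 1.0:
        return 0.0  # the disease cannot sustain spread even with no immunity
    return 1.0 - 1.0 / r0


def effective_reproduction_number(r0, immune_fraction):
    """R_eff = R0 * (1 - p) when a fraction p of the population is immune."""
    return r0 * (1.0 - immune_fraction)


if __name__ == "__main__":
    # Representative R0 values copied from the table in this article.
    table = {"Measles": 15, "Mumps": 11, "Polio": 6, "COVID-19 (Omicron)": 9,
             "Seasonal influenza": 1.5}
    for disease, r0 in table.items():
        print(f"{disease}: {100 * herd_immunity_threshold(r0):.0f}%")

    # Measles with 85% coverage: each case still infects more than one person.
    print(effective_reproduction_number(15, 0.85))
```

With 85% of a community immune, measles still has an effective reproduction number of about 2.25, which is why it spreads when introduced.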
Extensions and variants
The basic SIR model has been extended in many directions, each adding more realistic detail:
SIRS: includes loss of immunity, where Recovered individuals return to Susceptible after some time. Models diseases like influenza where immunity wanes.
SEIR: adds an Exposed compartment for people who are infected but not yet contagious. Captures the incubation period that most diseases have.
MSEIR: adds Maternal immunity for newborns who initially have antibodies from their mother.
Spatial models: track where people are, not just compartments. Used for modeling geographic spread of disease via cellular automata or PDEs.
Age-structured models: divide each compartment by age class. Critical for diseases where transmission and severity vary with age (COVID-19, measles, RSV).
Network models: each person is a node in a contact graph. Captures heterogeneity of contact patterns much better than mean-field models.
Stochastic SIR: replaces the deterministic ODEs with random processes. Important for small outbreaks where chance matters.
Multi-strain models: track multiple variants competing for hosts. Used for influenza evolution and COVID-19 variant tracking.
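As one concrete illustration of how these variants modify the basic model, here is a minimal SEIR sketch: newly infected people enter an Exposed compartment and become infectious at rate $\sigma$, so $1/\sigma$ is the mean incubation period. The symbol $\sigma$, the function names, the parameter values, and the use of a simple forward Euler step are all assumptions made for this example rather than details from any particular forecasting model.

```python
def seir_step(state, beta, sigma, gamma, dt):
    """Advance (S, E, I, R) by one forward Euler step of length dt."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E: infected but not yet contagious
    new_infectious = sigma * e * dt       # E -> I: incubation period ends
    new_recovered = gamma * i * dt        # I -> R: recovery
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate_seir(state, beta, sigma, gamma, days, dt=0.05):
    """Return the list of (S, E, I, R) states from t = 0 to t = days."""
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = seir_step(state, beta, sigma, gamma, dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Illustrative values: 5-day incubation, 10-day infectious period, beta/gamma = 3.
    start = (999.0, 0.0, 1.0, 0.0)
    trajectory = simulate_seir(start, beta=0.3, sigma=0.2, gamma=0.1, days=200)
    peak_index = max(range(len(trajectory)), key=lambda k: trajectory[k][2])
    print(f"infectious peak on day {peak_index * 0.05:.0f}")
```

The other compartmental variants follow the same pattern: each adds a compartment or a flow, and the step function gains one more term per new arrow in the diagram.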
Modern epidemic forecasting combines many of these extensions into models with hundreds of compartments and thousands of parameters. Many of the COVID-19 forecasting models used by the CDC, WHO, and academic groups (Imperial College, IHME, Columbia) are SIR-family models with extensive elaborations.
Historical context
The SIR framework was developed in the wake of one of the major epidemiological catastrophes of the 20th century: the 1918 influenza pandemic, which killed an estimated 50–100 million people. Kermack and McKendrick worked in the aftermath, trying to understand mathematically why some outbreaks burned out while others persisted.
Their 1927 paper A Contribution to the Mathematical Theory of Epidemics is the foundational document of mathematical epidemiology. The model has held up remarkably well — almost a century of disease modeling has elaborated rather than replaced the framework.
Earlier work existed: Daniel Bernoulli (yes, the same family from the St. Petersburg paradox) modeled smallpox in 1760 to argue for variolation (a precursor to vaccination). But Bernoulli’s work was lost in the broader research literature for over a century. Kermack and McKendrick’s framework is what survived as the standard.
What SIR models teach
The deepest lesson of the SIR model is that collective behavior of a population can be predicted from individual-level interaction rules. Each person is just one of $N$, but if their contact patterns and recovery rates can be modeled, the aggregate behavior follows from simple ODEs.
This is the same lesson as statistical mechanics, the Law of Large Numbers, and large-scale physics: large numbers of similar agents produce predictable aggregate behavior, even when individual behavior is random or complex.
For policy makers, SIR models give a precise framework for thinking about interventions. Vaccinate a large enough fraction $p$ of the population, you push the effective reproduction number $R_0 (1 - p)$ below 1. Reduce contacts by $x$%, you lower the peak. These aren’t precise quantitative predictions; they’re qualitative tools for reasoning about which interventions matter most.
For everyday users, the next time you read a headline about R₀ or herd immunity, you’re seeing the SIR model in action. The basic reproduction number isn’t a vague intuition; it’s a specific number from a specific set of differential equations, calibrated against data. The numbers in pandemic news are mathematical objects with precise definitions.
For mathematics students, the SIR model is one of the cleanest examples of applied differential equations in real-world impact. The model is elementary by ODE standards — three equations, two parameters — and yet it correctly predicts the qualitative trajectory of nearly every epidemic ever observed.
A hundred years after Kermack and McKendrick wrote it down, their three coupled equations remain the silent grammar of every pandemic response. Every news headline about transmission rates, every WHO advisory, every CDC recommendation rests on this framework. The mathematics is from 1927. The lives saved are continuously updated.
That’s what makes good applied mathematics: simple enough to understand, powerful enough to predict, durable enough to last.
Frequently asked
Why is R₀ called the 'basic reproduction number'?
Because it measures the average number of new infections caused by one infected person in a fully susceptible population — the disease's 'reproduction' in epidemiological terms. If R₀ > 1, the disease spreads; if R₀ < 1, it dies out. The threshold R₀ = 1 separates outbreaks from contained spread, which is why public health interventions aim to push R₀ below 1.
Are SIR models actually accurate?
They capture the qualitative shape of most outbreaks remarkably well — the rapid rise, peak, and decay. For quantitative predictions, modifications are needed: age structure, geographic heterogeneity, asymptomatic carriers, vaccination, waning immunity. Real-world epidemic forecasting uses 'compartmental models' that extend SIR with many more compartments and parameters.
How does herd immunity actually work?
When enough people are immune (either through infection or vaccination), the effective reproduction number drops below 1 and outbreaks cannot sustain themselves. The threshold fraction needed is 1 - 1/R₀. For a disease with R₀ = 4, you need 75% immunity for herd immunity. For measles (R₀ ≈ 15), you need over 93% — which is why even small vaccination gaps allow measles outbreaks.