There is a particular sequence of integers that has the disconcerting habit of showing up everywhere in combinatorics. Its first few terms are

1,  1,  2,  5,  14,  42,  132,  429,  1430,  1,\; 1,\; 2,\; 5,\; 14,\; 42,\; 132,\; 429,\; 1430,\; \dots

Count the ways to fully parenthesize a string of letters: same numbers. Count the rooted binary trees of a given size: same numbers. Count the triangulations of a convex polygon: same numbers. Count the lattice paths from one corner of a grid to the opposite corner that stay above the diagonal: same numbers. The integers above are the Catalan numbers, denoted CnC_n for n=0,1,2,n = 0, 1, 2, \dots, and they are one of the most ubiquitous sequences in all of mathematics. This article is about why they keep appearing, and what they really are.

The formula

There are several ways to write the nn-th Catalan number. The most compact uses a single binomial coefficient:

Cn=1n+1(2nn)=(2n)!(n+1)!n!.C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!}.

A short and useful recurrence, which is in fact how the numbers usually arise from a combinatorial problem, is

Cn+1=i=0nCiCni,C0=1.C_{n+1} = \sum_{i=0}^{n} C_i\,C_{n-i}, \qquad C_0 = 1.

The recurrence says: to build a Catalan object of size n+1n+1, split it into a “left part” of size ii and a “right part” of size nin - i, count those independently, and sum over all ways to split. Almost every Catalan-counted family has a natural split like this, which is what makes the recurrence so widely applicable.

A picture: the five mountain ranges of height 3

The cleanest concrete picture of Catalan numbers is a Dyck path: a sequence of up-steps // and down-steps \\backslash that starts and ends on the horizontal axis, uses equally many of each, and never dips below the axis. A Dyck path of length 2n2n — equivalently, nn ups and nn downs — is exactly the kind of “mountain skyline” that respects sea level. The number of distinct skylines of length 2n2n is CnC_n.

The smallest non-trivial case is n=3n = 3, where C3=5C_3 = 5. The five skylines are shown below — every up-step climbs one unit, every down-step drops one unit, and the path is never allowed underwater.

The five Dyck paths of length 6 (C₃ = 5)

Now read the same picture differently. Treat each up-step as an opening bracket "((" and each down-step as a closing bracket "))". The five paths correspond to the five legal bracket strings of length six: ((()))((())), (()())(()()), (())()(())(), ()(())()(()), ()()()()()(). The “never dipping below the axis” rule becomes “never closing a bracket that was not yet opened.” A Dyck path is a balanced parenthesization in geometric disguise.

Why Catalan numbers count so many things

The bijection between mountain paths and bracket strings is the start of a long chain. Replace each pair of matching brackets with a parent node having two children, and a parenthesization becomes a rooted binary tree — there are CnC_n rooted binary trees with n+1n+1 leaves. Draw chords inside a convex polygon to triangulate it; each triangulation is the dual of a binary tree, so a convex polygon with n+2n+2 sides admits exactly CnC_n triangulations. Insert nn items into a stack in some order and pop them in another: the number of legal stack-shuffle sequences for nn items is CnC_n. Two armies of nn soldiers shaking hands across a table without crossing arms: CnC_n ways. Mountain ranges, brackets, trees, polygons, stacks, handshakes — all the same counting function.

The reason these problems land at the same answer is that each of them carries a hidden split-into-left-and-right structure that obeys the Catalan recurrence. A binary tree with n+1n+1 leaves consists of a left subtree with i+1i+1 leaves and a right subtree with ni+1n-i+1 leaves; a triangulated polygon has one triangle resting on a chosen side and two smaller polygons below it; a bracket string starts with an opening bracket whose matching closer somewhere in the middle splits the rest into two balanced halves. In every case the recurrence Cn+1=i=0nCiCniC_{n+1} = \sum_{i=0}^{n} C_i C_{n-i} falls out for free. Once this is recognised, the closed form Cn=(2nn)/(n+1)C_n = \binom{2n}{n}/(n+1) follows by a beautiful counting trick called the reflection principle, which counts the bad lattice paths (the ones that do dip below the axis) by reflecting them across the axis and bijecting them with shifted unrestricted paths.

A glimpse of the generating function

A more grown-up perspective bundles the entire infinite sequence into a single formal power series:

C(x)=n=0Cnxn=1+x+2x2+5x3+14x4+C(x) = \sum_{n=0}^{\infty} C_n\,x^n = 1 + x + 2x^2 + 5x^3 + 14x^4 + \cdots

The Catalan recurrence is exactly the statement that C(x)=1+xC(x)2C(x) = 1 + x\,C(x)^2. Solving this quadratic for C(x)C(x) gives the closed form

C(x)=114x2x.C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}.

Expand the square root by Newton’s binomial theorem and the coefficients fall out as the Catalan numbers, recovering the formula. The same generating function appears, often with a change of variables, throughout combinatorics and algebra — in random matrix theory it is the moment generating function of the semicircle distribution, the universal density of eigenvalues of large random symmetric matrices. The Catalan numbers are the moments of a continuous distribution from one of the deepest pieces of modern probability, a connection that quietly hints at why they should be everywhere.

Why they matter

Catalan numbers are not the most computationally important sequence in mathematics. What they are is a unifier. When two combinatorial families — say, binary trees and triangulated polygons — both turn out to be counted by Catalan numbers, the next step is to find an explicit bijection between them. Constructing these bijections is a sport that has occupied combinatorialists for the better part of a century, and each new bijection reveals a fresh way in which two apparently unrelated structures are really the same.

The deeper lesson is that simple recursive structure — split into two pieces and solve each — produces a single sequence regardless of the dressing the problem wears. Once you understand the Catalan numbers, you start to notice them everywhere: in the data structures of computer science, in the chord diagrams of knot theory, in the eigenvalue statistics of random matrices, even in the seemingly innocent problem of fitting a maximum number of non-crossing arcs above a baseline. A short list of integers — 1,1,2,5,14,421, 1, 2, 5, 14, 42 — turns out to be one of the alphabets of combinatorics, and learning it is one of those small acts of mathematical literacy that pays back forever.

Frequently asked

Why are these numbers named after Catalan?

Eugène Charles Catalan was a 19th-century Belgian mathematician who studied the sequence in connection with the problem of counting the ways to parenthesize a non-associative product. He was not the first to encounter the numbers — Mongolian mathematician Minggatu found them around 1730, and Euler studied them in the 1750s while counting triangulations of polygons — but Catalan's 1838 paper gave the modern name and the modern recurrence, and the sequence has carried his name ever since.

How many things do Catalan numbers count?

Hundreds. Richard Stanley's monograph Catalan Numbers lists over two hundred different combinatorial families, all of which are counted by the same sequence. Whenever a problem involves nesting, balancing, branching, or a one-sided constraint — anything with the flavour of 'staying above zero' or 'never going below the baseline' — Catalan numbers tend to appear. Proving two such families have the same count typically means exhibiting an explicit bijection between them, and the network of such bijections is a beautiful subfield on its own.

Why is C_n divisible by n+1?

The closed form C_n = (2n choose n)/(n+1) raises an immediate question: why does dividing the central binomial coefficient by n+1 ever give an integer? The cleanest explanation is the cycle lemma of Dvoretzky and Motzkin (1947): among the 2n+1 cyclic rotations of any sequence of n+1 'ups' and n 'downs', exactly one stays strictly above zero. This neatly forces (2n+1 choose n)/(2n+1) to be an integer, which equals the Catalan number C_n.

What is the generating function of the Catalan numbers?

If C(x) = Σ C_n x^n is the formal power series whose coefficients are the Catalan numbers, then the recurrence C_{n+1} = Σ C_i C_{n−i} translates into the algebraic equation C(x) = 1 + x·C(x)². Solving the quadratic for C(x) gives the closed form C(x) = (1 − √(1 − 4x)) / (2x). This compact expression encodes the whole infinite sequence and is the easiest route to deriving the closed-form formula for C_n itself.