If you learned linear algebra in school, you probably met matrices as rectangular grids of numbers with a specific multiplication rule: to compute the $(i,j)$ entry of $AB$, you take the dot product of the $i$-th row of $A$ with the $j$-th column of $B$. Memorise the rule, do the drills, advance to eigenvalues.

This description is technically accurate. It is also a profoundly misleading introduction to why matrices are one of the most important objects in mathematics. The rule that seems arbitrary is actually forced. The calculation you grind through by hand is a special case of something much more general. And the reason linear algebra is the language of computer graphics, machine learning, quantum mechanics, and Google PageRank is not that these fields happen to like grids of numbers. It’s that they all involve linear transformations, and matrices are the most convenient way to represent them.

This article is an attempt to re-teach matrices from the point of view their original inventors had — as a compact notation for something more interesting than themselves.

Linear transformations

Start with a very different-sounding concept: a linear transformation is a function $T: V \to W$ between vector spaces that respects addition and scalar multiplication:

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), \quad T(c\mathbf{v}) = c \cdot T(\mathbf{v}).$$

That’s it. Linear transformations are functions that preserve the two operations defining a vector space. Examples include: rotations of the plane, projections onto a subspace, scaling, shear, any combination of these. Most of the geometry you do in computer graphics consists of chains of linear transformations.

Once you know the structure above, you know most of what a linear transformation is. In particular, if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is a basis of $V$, then $T$ is completely determined by what it does to those basis vectors. Any other vector can be written as a linear combination $\mathbf{v} = c_1 \mathbf{e}_1 + \cdots + c_n \mathbf{e}_n$, and linearity gives

$$T(\mathbf{v}) = c_1 T(\mathbf{e}_1) + \cdots + c_n T(\mathbf{e}_n).$$

So to describe a linear transformation, it is enough to list the images of the basis vectors. That list, written as columns, is the matrix.

A matrix is a list of images

Here is the definition that makes everything fall into place: the matrix of a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$, with respect to the standard bases, is the $m \times n$ array whose $j$-th column is the vector $T(\mathbf{e}_j)$.

For example, consider the rotation of the plane by 90 degrees counterclockwise. The standard basis vectors are $\mathbf{e}_1 = (1,0)$ and $\mathbf{e}_2 = (0,1)$. Rotating them by 90° gives $T(\mathbf{e}_1) = (0,1)$ and $T(\mathbf{e}_2) = (-1,0)$. Writing these as columns:

$$R_{90} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

That’s the matrix of the rotation. The entries aren’t mysterious — they are literally where the basis vectors end up after the transformation.
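The recipe "apply the transformation to each basis vector and write the results as columns" can be sketched in a few lines of Python. This is only an illustration, assuming matrices are stored as lists of rows; the helper names `standard_basis`, `matrix_of` and `rotate90` are made up for this example and do not come from any library.

```python
def standard_basis(n):
    """Return the standard basis vectors e_1, ..., e_n of R^n as tuples."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]


def matrix_of(transform, n):
    """Build the matrix of a linear map R^n -> R^m, stored as a list of rows.

    Column j of the result is transform(e_j).
    """
    images = [transform(e) for e in standard_basis(n)]  # one image per basis vector
    m = len(images[0])
    # Entry (i, j) is the i-th coordinate of the image of e_j.
    return [[images[j][i] for j in range(n)] for i in range(m)]


def rotate90(v):
    """Rotate a plane vector by 90 degrees counterclockwise (example map)."""
    x, y = v
    return (-y, x)


R90 = matrix_of(rotate90, 2)
print(R90)  # [[0, -1], [1, 0]]
```

Note that `matrix_of` never looks inside `rotate90`. It only asks where the basis vectors land, which is all the matrix records.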

To apply the transformation to any other vector $\mathbf{v} = (x, y)$, you use linearity:

$$T(\mathbf{v}) = x \cdot T(\mathbf{e}_1) + y \cdot T(\mathbf{e}_2).$$

In other words, $T(\mathbf{v})$ is $x$ times the first column of the matrix plus $y$ times the second column. Which is exactly the rule you learned as “matrix times vector.”

The matrix is just bookkeeping for “where does each basis vector go?” Everything else — the multiplication rule, the transpose, the inverse — comes from the underlying transformation, not from the grid.
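To see that the column picture and the school rule really are the same computation, here is a small sketch that does both. The function names `apply_by_columns` and `apply_by_rows` and the test vector are chosen for this example only.

```python
def apply_by_columns(matrix, v):
    """Compute T(v) as v[0] * (column 0) + v[1] * (column 1) + ..."""
    m, n = len(matrix), len(matrix[0])
    result = [0] * m
    for j in range(n):          # one term per basis vector
        for i in range(m):
            result[i] += v[j] * matrix[i][j]
    return result


def apply_by_rows(matrix, v):
    """Compute T(v) with the school rule: entry i is (row i) dot v."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


R90 = [[0, -1], [1, 0]]  # rotation by 90 degrees counterclockwise
v = [3, 2]

print(apply_by_columns(R90, v))  # [-2, 3]
print(apply_by_rows(R90, v))     # [-2, 3]
```

The two functions add up the same products in a different order, so they agree for every matrix and vector of compatible sizes.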

Why matrix multiplication is what it is

If $A$ is the matrix of a transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ and $B$ is the matrix of $S: \mathbb{R}^p \to \mathbb{R}^n$, then the matrix of the composition $T \circ S$ (first apply $S$, then $T$) is the product $AB$.

This is, in my opinion, the correct definition: matrix multiplication is defined to be whatever makes composition of transformations correspond to multiplication of matrices.

Working out what this forces: the $j$-th column of $AB$ should be $T(S(\mathbf{e}_j))$. The vector $S(\mathbf{e}_j)$ is the $j$-th column of $B$, call it $\mathbf{b}_j$. To apply $T$ to $\mathbf{b}_j$, we use the rule above: $T(\mathbf{b}_j) = b_{1j} A_{*1} + b_{2j} A_{*2} + \cdots + b_{nj} A_{*n}$, a linear combination of the columns of $A$.

Working out the $i$-th entry of this gives exactly

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$

which is the row-times-column formula. It looks arbitrary when presented as a definition. It is the unique formula that makes composition work.

As a corollary, matrix multiplication is associative ($A(BC) = (AB)C$), because function composition is associative. It is not commutative ($AB \neq BA$ in general), because neither is function composition: rotating and then stretching along one axis is generally different from stretching and then rotating.
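A short sketch makes the correspondence concrete: applying two transformations one after the other gives the same result as applying their product once, and swapping the order changes the answer. The matrices `R` and `S` and the helper names are chosen for this example.

```python
def matmul(a, b):
    """Row-times-column product: (AB)_ij = sum over k of a_ik * b_kj."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


R = [[0, -1], [1, 0]]  # rotate 90 degrees counterclockwise
S = [[2, 0], [0, 1]]   # stretch the x-axis by a factor of 2
v = [1, 1]

# "Stretch, then rotate" computed step by step and via the single matrix RS.
print(apply(R, apply(S, v)))   # [-1, 2]
print(apply(matmul(R, S), v))  # [-1, 2]

# Swapping the order gives a different transformation.
print(matmul(R, S))  # [[0, -1], [2, 0]]
print(matmul(S, R))  # [[0, -2], [1, 0]]
```

A uniform scaling would commute with the rotation, which is why the example stretches only one axis.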

This is the pattern throughout linear algebra: every “rule” you memorised is a consequence of what matrices represent.

The determinant, rethought

Another concept that looks arbitrary in introductory courses: the determinant. In 2×2 the formula $\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$ is easy enough. In 3×3 it becomes the rule of Sarrus or cofactor expansion. In higher dimensions it looks worse.

The geometric meaning is simpler than any formula: the determinant of a matrix is the signed volume scaling factor of the linear transformation.

If $T$ maps the unit square to a parallelogram of area 2 without flipping its orientation, then $\det(T) = 2$. If $T$ flattens everything onto a line (collapses the square to a segment), the area is 0 and $\det(T) = 0$. If $T$ reflects things (changes orientation), the determinant is negative.

This interpretation makes every property of the determinant either obvious or intuitive:

  • $\det(AB) = \det(A) \det(B)$: two transformations in a row multiply their volume scaling factors.
  • $\det(A^{-1}) = 1/\det(A)$: the inverse undoes the scaling.
  • $\det(A) = 0$ iff $A$ is singular: the transformation has collapsed at least one dimension, so no inverse exists.
  • The determinant is unchanged by adding a multiple of one row to another: this is shearing, which doesn’t change volume.

The algebraic formulas are computational tools. The geometric meaning is what the determinant is.
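In two dimensions the claim can be checked directly: measure the signed area of the image of the unit square and compare it with $ad - bc$. The sketch below does that with the shoelace formula for polygon area. The example matrices and function names are chosen here for illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def signed_area_of_image(m):
    """Signed area of the image of the unit square under m (shoelace formula).

    The corners 0, e1, e1 + e2, e2 map to 0, col1, col1 + col2, col2.
    """
    (a, b), (c, d) = m
    corners = [(0, 0), (a, c), (a + b, c + d), (b, d)]
    twice_area = 0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        twice_area += x0 * y1 - x1 * y0
    return twice_area / 2


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


examples = {
    "stretch x by 2": [[2, 0], [0, 1]],
    "shear": [[1, 3], [0, 1]],
    "reflection": [[0, 1], [1, 0]],
    "collapse onto a line": [[1, 2], [2, 4]],
}
for name, m in examples.items():
    # The determinants are 2, 1, -1 and 0; the areas match them.
    print(name, det2(m), signed_area_of_image(m))

A, B = examples["stretch x by 2"], examples["shear"]
print(det2(matmul2(A, B)) == det2(A) * det2(B))  # True
```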

Eigenvalues and eigenvectors

The final conceptual move is to notice that many linear transformations from a space to itself have special vectors that don’t get rotated: they only get stretched (or flipped). These are eigenvectors, and the stretch factor is the eigenvalue.

Formally, a nonzero vector $\mathbf{v}$ is an eigenvector of $T$ with eigenvalue $\lambda$ if

$$T(\mathbf{v}) = \lambda \mathbf{v}.$$

Geometrically: an eigenvector is a direction that the transformation preserves. Finding the eigenvectors and eigenvalues of a transformation is essentially finding the axes along which it acts most simply: purely by scaling, with no rotation.
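For a 2×2 matrix the eigenvalues are the roots of the characteristic polynomial $\lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0$, a standard fact not derived in this article. The sketch below uses it to find the eigenvalues of an example matrix chosen here, then checks the defining equation on two vectors. It also shows that a rotation by 90° has no real eigenvectors, since it leaves no direction fixed.

```python
import math


def apply(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def eigenvalues_2x2(m):
    """Real eigenvalues of a 2x2 matrix via lambda^2 - trace*lambda + det = 0.

    Raises ValueError when the roots are complex.
    """
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("no real eigenvalues (for example, a rotation)")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)


A = [[2, 1], [1, 2]]
print(eigenvalues_2x2(A))  # (3.0, 1.0)
print(apply(A, [1, 1]))    # [3, 3]   = 3 * (1, 1)
print(apply(A, [1, -1]))   # [1, -1]  = 1 * (1, -1)
print(apply(A, [1, 0]))    # [2, 1]   not a multiple of (1, 0)

try:
    eigenvalues_2x2([[0, -1], [1, 0]])  # rotation by 90 degrees
except ValueError as err:
    print(err)
```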

This is why eigenvalues turn up everywhere. The principal components of a dataset are eigenvectors of its covariance matrix. The vibrational modes of a bridge are eigenvectors of a stiffness matrix. The ranking of web pages in Google’s PageRank algorithm is computed from the dominant eigenvector of a linkage matrix. Quantum mechanics, in its matrix formulation, makes the measurement of an observable correspond to projecting a state vector onto eigenspaces of the observable’s matrix.
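One simple way to find a dominant eigenvector is power iteration: apply the matrix repeatedly and renormalise. The sketch below runs it on a made-up three-page web. It is a toy illustration of the idea only, not Google's actual algorithm (in particular it omits the damping factor PageRank uses), and the link structure and function name are invented for this example.

```python
def power_iteration(matrix, steps=100):
    """Approximate the dominant eigenvector by repeated multiplication."""
    n = len(matrix)
    v = [1 / n] * n  # start from a uniform vector
    for _ in range(steps):
        v = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        total = sum(v)
        v = [x / total for x in v]  # keep the entries summing to 1
    return v


# Column j describes where page j's links go, with equal weight per link:
# page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]

ranks = power_iteration(M)
print([round(r, 3) for r in ranks])  # [0.4, 0.2, 0.4]
```

The result satisfies `M @ ranks = ranks`, so it is an eigenvector with eigenvalue 1. Pages 0 and 2 tie because each receives all of the other's weight, directly or through page 1.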

The unifying theme: if you can find the eigenvectors, the transformation decomposes into independent one-dimensional actions, each a simple scaling. Many hard problems become easy in the right basis, and linear algebra is, to a great extent, the art of finding that basis.
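Here is a small sketch of what "easy in the right basis" buys you. To apply a matrix five times, you can multiply five times, or you can split the vector along the eigenvectors and raise each eigenvalue to the fifth power. The matrix, its eigenpairs (which are easy to verify by hand) and the function names are chosen for this example.

```python
def apply(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


A = [[2, 1], [1, 2]]
# Eigenpairs of A: A(1, 1) = 3 * (1, 1) and A(1, -1) = 1 * (1, -1).
eigenpairs = [(3, [1, 1]), (1, [1, -1])]


def power_direct(matrix, v, n):
    """Apply the matrix n times, one multiplication at a time."""
    for _ in range(n):
        v = apply(matrix, v)
    return v


def power_via_eigenbasis(coords, n):
    """Apply A n times to the vector whose eigenbasis coefficients are coords.

    Each eigen-direction is scaled independently by eigenvalue ** n.
    """
    result = [0, 0]
    for c, (lam, vec) in zip(coords, eigenpairs):
        scale = c * lam ** n
        result = [r + scale * x for r, x in zip(result, vec)]
    return result


# (1, 0) = 0.5 * (1, 1) + 0.5 * (1, -1)
print(power_direct(A, [1, 0], 5))            # [122, 121]
print(power_via_eigenbasis([0.5, 0.5], 5))   # [122.0, 121.0]
```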

Why this matters beyond mathematics

Linear algebra (matrices and their associated operations) is the computational backbone of modern science and engineering. It turns up everywhere because many systems are approximately linear over small scales, and linear problems have tractable solutions.

Computer graphics moves polygons around by multiplying their vertices, written in homogeneous coordinates, by 4×4 matrices representing rotations, translations, and projections. Every video game you have ever played is, at the render level, a matrix-heavy computation.
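A translation is not a linear transformation of 3D space, because it moves the origin. The standard workaround is to append a fourth coordinate equal to 1 to every point, after which a translation can be written as a 4×4 matrix and chained with rotations by ordinary multiplication. A minimal sketch, with helper names and numbers chosen for this example:

```python
def matmul(a, b):
    """Row-times-column matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def translation(tx, ty, tz):
    """4x4 matrix that shifts a point (x, y, z, 1) by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


# Rotation by 90 degrees about the z-axis, padded to 4x4.
ROTATE_Z_90 = [[0, -1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]

vertex = [1, 0, 0, 1]  # the point (1, 0, 0) with the extra coordinate 1

# Rightmost matrix acts first: rotate, then translate by 5 along x.
model = matmul(translation(5, 0, 0), ROTATE_Z_90)
print(apply(model, vertex))  # [5, 1, 0, 1]
```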

Machine learning is linear algebra at scale. A neural network layer applies a matrix transformation followed by a nonlinearity; the matrices are the parameters that training adjusts. Training a modern language model involves adjusting billions of matrix entries to minimise a loss function.
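A single layer of this kind fits in a few lines. The sketch below assumes ReLU as the nonlinearity and leaves out the bias term that real layers usually add; the weights are arbitrary numbers picked for the example, not trained values.

```python
def dense_layer(weights, x):
    """Apply a weight matrix to x, then ReLU (max(0, value)) to each entry."""
    pre_activation = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [max(0.0, p) for p in pre_activation]


# A 3x2 weight matrix maps 2 inputs to 3 outputs.
W = [[1.0, -2.0],
     [0.5, 0.5],
     [-1.0, 1.0]]
x = [2.0, 1.0]

print(dense_layer(W, x))  # [0.0, 1.5, 0.0]
```

Training would adjust the six entries of `W`; the matrix-times-vector step is the same one used everywhere else in this article.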

Quantum mechanics represents states as vectors in a Hilbert space and physical observables as Hermitian matrices. Measurement outcomes correspond to eigenvalues, probabilities to squared magnitudes of projections. The entire framework is linear algebra plus probability.

Data science reduces high-dimensional datasets with techniques like principal component analysis — a pure eigenvalue decomposition of the data’s covariance matrix.
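For two-dimensional data the whole procedure can be written out by hand. The sketch below builds the sample covariance matrix of four made-up points and finds its largest eigenvalue and the matching unit eigenvector, which is the first principal component. The data, the function names and the closed-form 2×2 eigenvalue formula are choices made for this illustration; real datasets call for a numerical library.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix (dividing by n - 1) of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def principal_component(cov):
    """Largest eigenvalue and unit eigenvector of a symmetric 2x2 matrix.

    Assumes the off-diagonal entry is nonzero.
    """
    (a, b), (_, d) = cov
    lam = (a + d + math.sqrt((a - d) ** 2 + 4 * b * b)) / 2
    vx, vy = b, lam - a  # solves (a - lam) * vx + b * vy = 0
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


points = [(2, 1), (1, 2), (-2, -1), (-1, -2)]  # spread mostly along y = x
lam, direction = principal_component(covariance_2d(points))
print(round(lam, 6), [round(c, 3) for c in direction])  # 6.0 [0.707, 0.707]
```

The direction found is the diagonal $y = x$, along which the points vary the most.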

Numerical simulation of physical systems (weather, fluid dynamics, structural engineering) discretises continuous equations into enormous linear systems. Solving them often accounts for most of the computational budget.

None of this works if you treat matrices as opaque grids of numbers. All of it depends on understanding matrices as transformations — with eigenvalues that tell you the principal directions of action, with determinants that tell you how volumes scale, with multiplications that tell you how transformations compose.

What to take away

If you came out of school thinking “matrix” meant “grid of numbers to multiply in the row-times-column way,” you were given a correct but impoverished picture. A matrix is a transformation. The grid is just its fingerprint. Every rule you learned falls out of that one fact.

Once you have that intuition, linear algebra stops being a bag of procedures and becomes a language for thinking about structure. That language is the reason so many apparently unrelated subjects — graphics, learning, physics, statistics — share a common mathematical vocabulary. They are all, at their computational core, doing linear algebra.

Matrices are important not because they are elegant, but because they turn out to be the right tool for describing an astonishing range of natural and engineered systems. The multiplication rule that looks arbitrary is actually the most economical way to encode function composition on vector spaces. Once you see that, the subject unfolds.

Frequently asked

Why is matrix multiplication defined in that weird way?

Because it exactly matches the composition of linear transformations. If A is the matrix of one transformation and B of another, then the matrix of 'do B, then A' is the product AB — with exactly the multiplication rule you learned. The rule is not arbitrary; it's forced by wanting matrices to represent transformations.

What does the determinant actually measure?

The signed volume scaling factor of the linear transformation the matrix represents. A determinant of 2 means the transformation doubles volumes. Zero means the transformation collapses volumes to zero (the matrix is singular). Negative means orientation is flipped.