In high school math, you learn that scalars are numbers, vectors are arrows in space, and matrices are tables of numbers that transform vectors. This scalar-vector-matrix hierarchy seems complete. It isn’t.

There’s a more general object that includes all three as special cases and extends the pattern indefinitely. It’s called a tensor, and it has emerged as one of the most important mathematical structures of the 20th and 21st centuries. Tensors are the language Einstein used to write his theory of general relativity. They’re also the central computational object in modern artificial intelligence — every deep neural network is a sequence of tensor operations.

The fact that a single mathematical concept underlies both Einstein’s curved spacetime and ChatGPT’s billions of parameters is one of those quiet miracles of mathematics: a structure invented for one purpose turns out, decades later, to be exactly what an unrelated field needs.

This article is about what tensors are, why they matter, and how they’ve become both essential infrastructure for physics and the central abstraction of modern AI.

The hierarchy

Start with the basics:

Scalars are pure numbers. Temperature is a scalar; mass is a scalar; the value of a function at a point is a scalar. A scalar has no direction.

Vectors have magnitude and direction. Velocity is a vector; force is a vector. In a particular coordinate system, you can write a vector as a list of numbers — its components — but the vector itself is a geometric object that exists independently of any coordinate choice.

Matrices are tables of numbers, but more importantly, they represent linear transformations of vectors. A 3×3 matrix turns one 3D vector into another 3D vector. Read our piece on matrices for the deeper story: matrices are notation for linear maps.

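Tensors generalize all of these. A tensor of “rank 0” is a scalar. A tensor of “rank 1” is a vector. A tensor of “rank 2” is essentially a matrix (with some technical refinement). A tensor of rank 3 has three indices, so its components form a three-dimensional array of numbers. A tensor of rank 4 has four indices and a four-dimensional array. And so on. Note that rank counts indices, not the dimension of the underlying space: each index separately runs over that space’s dimensions.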

Tensors of arbitrarily high rank exist mathematically. The Riemann curvature tensor, central to general relativity, has rank 4 and therefore $4^4 = 256$ components in 4D spacetime (with symmetries reducing the independent components to 20).
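The component counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the second function uses the standard count n²(n² − 1)/12 for the independent components of the Riemann tensor in n dimensions.

```python
def total_components(dim: int, rank: int) -> int:
    """Each of the `rank` indices runs over `dim` values."""
    return dim ** rank


def riemann_independent_components(dim: int) -> int:
    """Standard count n^2 (n^2 - 1) / 12 after applying the Riemann symmetries."""
    return dim ** 2 * (dim ** 2 - 1) // 12


print(total_components(4, 4))             # 256
print(riemann_independent_components(4))  # 20
```

The same formula gives 6 independent components in three dimensions and just 1 in two dimensions, where curvature reduces to a single number at each point.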

What makes a tensor a tensor

A common misconception is that tensors are just multi-dimensional arrays. Computationally that’s true. Mathematically, it’s not quite the whole story.

A tensor is an object that transforms in a specific way under coordinate changes. If you switch from one coordinate system to another, the components of a tensor change in a predictable manner — and that transformation rule is precisely what makes it a tensor.

Vectors transform in one specific way (linearly under change of basis). Matrices, viewed as 2-tensors, transform in a related way (with the basis change applied to both indices). Higher-rank tensors transform with the basis change applied to each index in turn.

The point is that the underlying geometric or physical reality doesn’t change when you switch coordinates. Only its representation does. A tensor captures the underlying object; its components are merely its expression in a particular frame.
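The transformation rule can be made concrete with a small NumPy sketch. It assumes orthonormal bases related by a rotation, so there is no need to distinguish upper and lower indices; the vector `v` and the rank-2 tensor `T` hold arbitrary illustrative values.

```python
import numpy as np

theta = np.pi / 6
c, s = np.cos(theta), np.sin(theta)
# Rotation about the z-axis: an orthogonal change of basis.
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])

v = np.array([1.0, 2.0, 3.0])        # rank-1 components (illustrative)
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])      # rank-2 components (illustrative)

# One factor of R per index.
v_new = np.einsum("ik,k->i", R, v)
T_new = np.einsum("ik,jl,kl->ij", R, R, T)   # same as R @ T @ R.T

# A scalar built from v and T is the same in both frames.
scalar_old = v @ T @ v
scalar_new = v_new @ T_new @ v_new
print(np.allclose(scalar_old, scalar_new))   # True
```

The components of `v` and `T` both change, but the fully contracted quantity does not. That is the sense in which the components are a representation and the tensor is the underlying object.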

This invariance is critically important for physics. Physical laws should not depend on your choice of coordinates. Writing them in tensor language guarantees this automatically. Einstein’s general relativity is the canonical example — it’s expressed in coordinate-free tensor language, and the same equations apply whether you’re in flat Cartesian coordinates, spherical coordinates near a black hole, or any other frame.

Tensors in physics

The use of tensors in physics is everywhere once you know to look:

General relativity. Einstein’s field equations relate the Einstein tensor (a rank-2 object built from the Ricci curvature tensor and the metric) to the stress-energy tensor (also rank 2). The full Riemann curvature tensor (rank 4) encodes all the local geometric information about how spacetime is curved. Without tensors, general relativity cannot be written. (See our calculus of variations post for related mathematical machinery.)

Continuum mechanics. Stress in a deformable solid is a rank-2 tensor — at each point, three “directions” of force can act on three “directions” of surface, giving 9 components (with symmetry reducing it to 6 independent). Strain is similarly a rank-2 tensor. Hooke’s law for materials, in tensor form, captures how stress and strain are related.
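As a rough illustration of the stress tensor at work, the sketch below applies Cauchy's relation (traction = stress tensor contracted with the surface normal). The numerical values are invented for the example and carry no physical significance.

```python
import numpy as np

# Hypothetical stress state at one point (say, in MPa). Symmetric by construction.
sigma = np.array([[10.0,  2.0,  0.0],
                  [ 2.0,  5.0, -1.0],
                  [ 0.0, -1.0,  3.0]])

# Unit normal of the surface we cut through the material (here: facing +z).
n = np.array([0.0, 0.0, 1.0])

# Force per unit area acting on that surface: t_i = sigma_ij n_j.
traction = sigma @ n
print(traction)            # [ 0. -1.  3.]

# Symmetry leaves only the upper triangle as independent numbers.
independent = sigma[np.triu_indices(3)]
print(independent.size)    # 6
```

Choosing a different normal `n` gives a different traction vector from the same tensor, which is why stress needs two indices rather than one.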

Electromagnetism. The electromagnetic field tensor combines the electric and magnetic fields into a single rank-2 antisymmetric tensor in 4D spacetime. Maxwell’s equations, written in tensor form, become two short tensor equations instead of four vector equations.
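For reference, one common way of writing this is sketched below. Signs and constants depend on the metric signature and unit system, so treat the exact form as an assumption (here: SI units, with $A_\mu$ the four-potential and $J^\nu$ the four-current) rather than the only convention.

```latex
% Field tensor built from the four-potential; antisymmetric, so a 4x4 array
% with 6 independent components (3 electric + 3 magnetic).
F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu

% Gauss's law and the Ampere-Maxwell law:
\partial_\mu F^{\mu\nu} = \mu_0 J^\nu

% Faraday's law and the absence of magnetic monopoles:
\partial_\lambda F_{\mu\nu} + \partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} = 0
```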

Fluid dynamics. The stress tensor of a fluid encodes pressure and viscous forces. The Navier-Stokes equations involve tensor operations.

Quantum mechanics. Tensor products of Hilbert spaces describe composite quantum systems. Quantum entanglement is fundamentally a tensor-product phenomenon.
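A two-qubit example shows what “tensor-product phenomenon” means in practice. This is a minimal sketch using NumPy's Kronecker product; the helper `schmidt_rank` is a name introduced here for illustration. It reshapes the four amplitudes into a 2×2 coefficient matrix, whose rank is 1 exactly when the state factors into two single-qubit states.

```python
import numpy as np

zero = np.array([1.0, 0.0])   # |0>
one = np.array([0.0, 1.0])    # |1>

# A product state: the tensor product of two single-qubit states.
product_state = np.kron(zero, one)                              # |01>

# A Bell state: a sum of tensor products that cannot be factored.
bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)


def schmidt_rank(state: np.ndarray) -> int:
    """Rank of the 2x2 coefficient matrix; 1 means not entangled."""
    return int(np.linalg.matrix_rank(state.reshape(2, 2)))


print(schmidt_rank(product_state))  # 1
print(schmidt_rank(bell))           # 2
```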

In each of these fields, tensors aren’t a notational choice — they’re the natural mathematical structure. Trying to do general relativity or continuum mechanics without tensors is like trying to do Newtonian mechanics without calculus.

Tensors in machine learning

The other major modern application — perhaps the more visible one — is in machine learning.

Modern neural networks are sequences of tensor operations. A typical deep network might have:

  • An input tensor (for an image classifier: a 4D tensor of shape [batch, height, width, channels])
  • Weight tensors (for each layer: 2D, 3D, or 4D, depending on layer type)
  • Activation tensors flowing through the layers
  • Output tensors

Each layer applies operations like matrix multiplication, convolution, addition, or non-linear activation. These are all tensor operations. The forward pass of a neural network is a long sequence of tensor manipulations; the backward pass (gradient computation) is essentially differentiating tensor operations.
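The forward and backward passes described above can be sketched in plain NumPy. This is a toy two-layer network with made-up sizes and a made-up loss (half the sum of squared outputs), not how any particular framework implements it. The gradient for the second weight matrix is derived by hand and checked against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_hidden, d_out = 4, 8, 16, 3    # hypothetical sizes

x = rng.normal(size=(batch, d_in))            # input tensor  [batch, d_in]
W1 = 0.1 * rng.normal(size=(d_in, d_hidden))  # weight tensor [d_in, d_hidden]
W2 = 0.1 * rng.normal(size=(d_hidden, d_out)) # weight tensor [d_hidden, d_out]


def forward(W2_):
    h = np.maximum(x @ W1, 0.0)   # linear map followed by ReLU
    y = h @ W2_                   # second linear map
    return h, y


def loss(W2_):
    _, y = forward(W2_)
    return 0.5 * np.sum(y ** 2)


# Backward pass for W2, derived by hand: dL/dW2 = h^T @ y.
h, y = forward(W2)
grad_W2 = h.T @ y

# Finite-difference check on a single entry of W2.
eps = 1e-6
bump = np.zeros_like(W2)
bump[0, 0] = eps
numeric = (loss(W2 + bump) - loss(W2 - bump)) / (2 * eps)
print(np.isclose(numeric, grad_W2[0, 0]))     # True
```

Automatic differentiation libraries perform the hand-derived step for every operation in the network, which is what makes large models practical to train.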

The major frameworks — TensorFlow (named for tensors), PyTorch, JAX — are essentially efficient tensor libraries with automatic differentiation. They make tensor operations fast (often by running them on GPUs or TPUs, which are hardware optimized for parallel tensor math) and enable computing gradients automatically.

Modern transformer models (the basis of ChatGPT and similar) use tensors particularly heavily:

  • Input is a 3D tensor [batch, sequence_length, embedding_dim]
  • Attention layers compute query, key, and value tensors
  • Multi-head attention works in 4D tensors
  • A 175-billion-parameter language model is essentially a vast collection of weight tensors
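The shapes in the list above can be traced through a stripped-down attention computation. This is a sketch under simplifying assumptions: random weights, tiny made-up sizes, no masking, and no output projection. It only illustrates how the 3D input becomes 4D inside multi-head attention.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, d_model, n_heads = 2, 5, 16, 4   # hypothetical sizes
d_head = d_model // n_heads

x = rng.normal(size=(batch, seq_len, d_model))   # [batch, sequence_length, embedding_dim]
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))


def split_heads(t):
    # [batch, seq, d_model] -> [batch, heads, seq, d_head]
    return t.reshape(batch, seq_len, n_heads, d_head).transpose(0, 2, 1, 3)


q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

# Scaled dot-product scores: contract the d_head index of q and k.
scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(d_head)

# Softmax over the key index (shifted by the max for numerical stability).
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# Weighted sum of values, then merge the heads back into one embedding.
out = np.einsum("bhqk,bhkd->bhqd", weights, v)
merged = out.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)

print(q.shape, scores.shape, merged.shape)
```

Every step is either a reshape or a contraction over one named index, which is why index notation such as `einsum` strings fits attention so naturally.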

The total computation in training a large language model can reach the order of $10^{25}$ floating-point operations, almost all of them spent inside tensor operations such as matrix multiplication. This is feasible only because of the enormous infrastructure that has been built around tensor manipulation.

The mathematical formalism

For a more rigorous picture of what a tensor is, mathematicians use the language of multilinear algebra.

A rank-$n$ tensor on a vector space $V$ is a multilinear map from $n$ copies of $V$ (or its dual $V^*$) to the real numbers. Linearity in each argument is the defining property. The “components” of the tensor in a particular basis are the values it takes when you plug in basis vectors.
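The definition above can be tried out on the smallest interesting case: a rank-2 tensor on a two-dimensional space. The map `T` below is an arbitrary bilinear function chosen for illustration. The sketch reads off its components by plugging in basis vectors, then checks linearity in the first argument.

```python
import numpy as np


def T(u, v):
    """A hypothetical bilinear map R^2 x R^2 -> R, written without any matrix."""
    return u[0] * v[0] + 2.0 * u[0] * v[1] + 3.0 * u[1] * v[1]


# Components in the standard basis: T_ij = T(e_i, e_j).
basis = np.eye(2)
components = np.array([[T(basis[i], basis[j]) for j in range(2)]
                       for i in range(2)])
print(components)
# [[1. 2.]
#  [0. 3.]]

# The component array reproduces the map on arbitrary inputs.
u, v, w = np.array([1.0, -2.0]), np.array([0.5, 4.0]), np.array([3.0, 1.0])
print(np.isclose(T(u, v), u @ components @ v))                     # True

# Linearity in the first argument: T(a*u + w, v) = a*T(u, v) + T(w, v).
a = 2.5
print(np.isclose(T(a * u + w, v), a * T(u, v) + T(w, v)))          # True
```

The function `T` is the tensor; the array `components` is only what it looks like in one basis.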

This abstract definition has a specific advantage: it’s coordinate-free. A tensor is a geometric/algebraic object that exists without reference to any basis. The components change when you change basis; the tensor itself doesn’t. This is why tensors capture invariant physical quantities so cleanly.

In physics terminology, you’ll see “covariant” and “contravariant” tensor components. These distinguish the two ways a vector can be represented — as a column or as a row — and how each transforms under coordinate changes. The terminology is annoying but reflects a real mathematical distinction.
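In index notation the distinction looks like this. The block is a sketch of the standard transformation laws under a change of coordinates $x \to x'$, with summation over repeated indices implied; the symbols $v^i$ and $w_i$ are introduced here for illustration.

```latex
% Contravariant components (upper index) transform with the Jacobian:
v'^{\,i} = \frac{\partial x'^{\,i}}{\partial x^{j}} \, v^{j}

% Covariant components (lower index) transform with the inverse Jacobian:
w'_{i} = \frac{\partial x^{j}}{\partial x'^{\,i}} \, w_{j}

% The two factors cancel in a contraction, so the scalar is frame-independent:
w'_{i} \, v'^{\,i} = w_{j} \, v^{j}
```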

A worked example: rotations and the metric

Take 3D space with a Cartesian coordinate system. The “metric tensor”, usually denoted $g_{ij}$, describes how distances are measured. In Cartesian coordinates, $g_{ij}$ is the identity matrix:

$$g_{ij} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The distance between $\mathbf{x}$ and $\mathbf{y}$ is $\sqrt{\sum_{ij} g_{ij}(x_i - y_i)(x_j - y_j)} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$, which is the Pythagorean formula.

Switch to spherical coordinates. The components of the metric tensor change to

$$g_{ij} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}.$$

But it’s still the same metric tensor — distances haven’t changed. Only the components change. The formulae for length and angle, written in tensor notation, give the same answer regardless of coordinates.
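The spherical components can be recovered numerically from the Cartesian ones. The sketch below assumes the usual physics convention (x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ) and applies the rank-2 transformation rule with the Jacobian of that map. The sample point is arbitrary.

```python
import numpy as np


def jacobian(r, theta, phi):
    """Partial derivatives of (x, y, z) with respect to (r, theta, phi)."""
    st, ct = np.sin(theta), np.cos(theta)
    sp, cp = np.sin(phi), np.cos(phi)
    return np.array([
        [st * cp, r * ct * cp, -r * st * sp],
        [st * sp, r * ct * sp,  r * st * cp],
        [ct,      -r * st,      0.0],
    ])


r, theta, phi = 2.0, 0.7, 1.3          # arbitrary sample point
J = jacobian(r, theta, phi)

g_cartesian = np.eye(3)
# Both indices of the metric pick up one factor of the Jacobian.
g_spherical = J.T @ g_cartesian @ J

expected = np.diag([1.0, r ** 2, r ** 2 * np.sin(theta) ** 2])
print(np.allclose(g_spherical, expected))   # True
```

Nothing about the geometry was changed in this calculation. The identity matrix and the diagonal matrix with entries 1, r², r² sin² θ are two component arrays for one and the same tensor.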

In curved spaces (non-Euclidean geometries), $g_{ij}$ depends on position, and the entire structure of curvature flows from how it varies. This is the bridge from special relativity (flat spacetime, constant metric) to general relativity (curved spacetime, position-dependent metric).

Why this works for AI

It’s worth asking: why is the same mathematical structure that captures spacetime curvature also right for neural networks?

The honest answer is that the connection is partly accidental — multidimensional arrays are useful in many contexts. But there’s also a deep reason: tensors capture structured numerical relationships in their full generality.

A neural network maps an input tensor (e.g. an image) to an output tensor (e.g. classification scores) via a sequence of operations that are mostly linear (matrix multiplication, convolution) interleaved with nonlinearity (ReLU, sigmoid). The linear parts are tensor contractions. The whole computation is a tensor function.
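To see the linear part as a contraction, it helps to write it with explicit index labels. The sketch below uses NumPy's `einsum` with small made-up shapes. A repeated index that is missing from the output is summed over, which is exactly what contraction means.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # weights: indices (i, j)
x = rng.normal(size=4)          # one input vector: index (j)
X = rng.normal(size=(5, 4))     # a mini-batch of inputs: indices (b, j)

# Contract over j: y_i = sum_j W_ij x_j. Identical to W @ x.
y = np.einsum("ij,j->i", W, x)

# Same contraction with an extra batch index carried along untouched.
Y = np.einsum("ij,bj->bi", W, X)

# The nonlinearity acts elementwise and involves no contraction.
activated = np.maximum(Y, 0.0)

print(y.shape, Y.shape)         # (3,) (5, 3)
```

Adding the batch index changed the index string but not the operation, which is the pattern that repeats as networks move to images, sequences, and attention heads.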

When you generalize from individual layers (acting on vectors) to mini-batches (acting on collections of vectors at once) to convolutional layers (acting on spatial structure) to attention layers (acting on sequences with relationships), you’re climbing the tensor rank ladder. Each generalization adds an index to the relevant arrays.

The mathematical machinery that handles all of this elegantly (tensor products and contraction, with broadcasting as a computational convenience layered on top) is largely the machinery of multilinear algebra that mathematicians developed and physicists adopted for general relativity. A century later, AI engineers reach into the same toolbox.

What tensors teach

The deepest lesson of tensors is that the right abstraction has wide applicability. Multi-indexed arrays of numbers transforming consistently under coordinate change is a structural pattern that appears wherever you have vectors with many degrees of freedom and operations that compose linearly.

Once you have tensor notation, many problems become tractable that weren’t before. General relativity has 10 coupled differential equations; in tensor notation, it’s $G_{\mu\nu} = 8\pi T_{\mu\nu}$ (in units where Newton’s constant and the speed of light are both 1): two indices, one equation. Modern deep learning has billions of operations; in tensor notation, the entire forward pass is a few lines of code.
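For readers who want the constants restored, the compact form above unpacks as follows. This sketch uses SI units, omits the cosmological constant, and writes Newton's constant as $G_{\mathrm{N}}$ to keep it distinct from the Einstein tensor $G_{\mu\nu}$.

```latex
% Einstein tensor, built from the Ricci tensor, the Ricci scalar and the metric:
G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu}

% Field equations with constants restored:
G_{\mu\nu} = \frac{8 \pi G_{\mathrm{N}}}{c^{4}} \, T_{\mu\nu},
\qquad \mu, \nu \in \{0, 1, 2, 3\}

% Both sides are symmetric 4x4 arrays, so the number of independent equations is
\frac{4 \cdot (4 + 1)}{2} = 10
```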

For a working scientist, the lesson is to invest in the right notation. Notation that captures the structure of the problem is one of the most powerful tools mathematics provides. Tensors are a particularly successful example. Mathematicians invented them. Physicists used them to revolutionize physics. Engineers used them to revolutionize machine learning. The next field to discover its tensor-notation moment is probably already underway.

For everyone else, tensors are a reminder that the mathematics underlying modern technology is often deeper than it looks. The “TensorFlow” in TensorFlow isn’t just a marketing name: the arrays it manipulates are the same kind of component arrays that appear in the tensor calculus Einstein used, even though machine learning code rarely tracks how they transform under coordinate changes. The mathematics that runs on the GPU in your phone is, in a deep sense, closely related to the mathematics that describes the curvature of spacetime around a black hole.

That’s a remarkable continuity for a single piece of mathematical infrastructure. From 1915 to today, tensors have only become more central, not less. And the next century of mathematical physics, computer science, and engineering will probably continue to find new uses for them — that’s how good mathematical abstractions work.

Frequently asked

Are tensors just multidimensional arrays?

Computationally, yes: that's how they're represented. Mathematically, they're more: tensors are objects whose components transform in specific ways under a change of coordinates. A 'tensor' in the strict sense is a multilinear map, and the array of components is its representation in a particular basis. Machine learning often uses 'tensor' to mean simply 'multidimensional array'; that usage is looser but widely accepted.

Why did Einstein need tensors?

Because gravity in general relativity is curvature of spacetime, and curvature has too many components to fit into vectors or matrices. The Riemann curvature tensor has 4 indices and contains all the local geometric information about curvature. Einstein's field equations relate this tensor to the stress-energy tensor (which describes matter and energy distribution). Without tensors, the theory cannot be written down.

Why is TensorFlow named after tensors, and why are they just as central to PyTorch?

Because the basic operation in deep learning is multiplying multi-dimensional arrays — tensors. A single layer of a neural network involves a tensor of weights times a tensor of inputs to produce a tensor of outputs. Modern AI frameworks are essentially efficient tensor manipulation libraries with automatic differentiation built in.