# Terry Tao’s “what is a gauge?”

From the introduction to the blog post of the same name:

What is a gauge?

“Gauge theory” is a term which has connotations of being a fearsomely complicated part of mathematics – for instance, playing an important role in quantum field theory, general relativity, geometric PDE, and so forth.  But the underlying concept is really quite simple: a gauge is nothing more than a “coordinate system” that varies depending on one’s “location” with respect to some “base space” or “parameter space”, a gauge transform is a change of coordinates applied to each such location, and a gauge theory is a model for some physical or mathematical system to which gauge transforms can be applied (and is typically gauge invariant, in that all physically meaningful quantities are left unchanged (or transform naturally) under gauge transformations).  By fixing a gauge (thus breaking or spending the gauge symmetry), the model becomes something easier to analyse mathematically, such as a system of partial differential equations (in classical gauge theories) or a perturbative quantum field theory (in quantum gauge theories), though the tractability of the resulting problem can be heavily dependent on the choice of gauge that one fixed.  Deciding exactly how to fix a gauge (or whether one should spend the gauge symmetry at all) is a key question in the analysis of gauge theories, and one that often requires the input of geometric ideas and intuition into that analysis.

I was asked recently to explain what a gauge theory was, and so I will try to do so in this post.  For simplicity, I will focus exclusively on classical gauge theories; quantum gauge theories are the quantization of classical gauge theories and have their own set of conceptual difficulties (coming from quantum field theory) that I will not discuss here. While gauge theories originated from physics, I will not discuss the physical significance of these theories much here, instead focusing just on their mathematical aspects.  My discussion will be informal, as I want to try to convey the geometric intuition rather than the rigorous formalism (which can, of course, be found in any graduate text on differential geometry).

– Coordinate systems –

Before I discuss gauges, I first review the more familiar concept of a coordinate system, which is basically the special case of a gauge when the base space (or parameter space) is trivial.

Classical mathematics, such as practised by the ancient Greeks, could be loosely divided into two disciplines, geometry and number theory, where I use the latter term very broadly, to encompass all sorts of mathematics dealing with any sort of number.  The two disciplines are unified by the concept of a coordinate system, which allows one to convert geometric objects to numeric ones or vice versa.  The most well known example of a coordinate system is the Cartesian coordinate system for the plane (or more generally for a Euclidean space), but this is just one example of many such systems.  For instance:

1. One can convert a length (of, say, an interval) into an (unsigned) real number, or vice versa, once one fixes a unit of length (e.g. the metre or the foot).  In this case, the coordinate system is specified by the choice of length unit.
2. One can convert a displacement along a line into a (signed) real number, or vice versa, once one fixes a unit of length and an orientation along that line.  In this case, the coordinate system is specified by the length unit together with the choice of orientation.  Alternatively, one can replace the unit of length and the orientation by a unit displacement vector $e$ along the line.
3. One can convert a position (i.e. a point) on a line into a real number, or vice versa, once one fixes a unit of length, an orientation along the line, and an origin on that line.  Equivalently, one can pick an origin $O$ and a unit displacement vector $e$.  This coordinate system essentially identifies the original line with the standard real line ${\Bbb R}$.
4. One can generalise these systems to higher dimensions.  For instance, one can convert a displacement along a plane into a vector in ${\Bbb R}^2$, or vice versa, once one fixes two linearly independent displacement vectors $e_1, e_2$ (i.e. a basis) to span that plane; the Cartesian coordinate system is just one special case of this general scheme.  Similarly, one can convert a position on a plane to a vector in ${\Bbb R}^2$ once one picks a basis $e_1, e_2$ for that plane as well as an origin $O$, thus identifying that plane with the standard Euclidean plane ${\Bbb R}^2$.  (To put it another way, units of measurement are nothing more than one-dimensional (i.e. scalar) coordinate systems.)
5. To convert an angle in a plane to a signed number (modulo multiples of $2\pi$), or vice versa, one needs to pick an orientation on the plane (e.g. to decide that anti-clockwise angles are positive).
6. To convert a direction in a plane to a signed number (again modulo multiples of $2\pi$), or vice versa, one needs to pick an orientation on the plane, as well as a reference direction (e.g. true or magnetic north is often used in the case of ocean navigation).
7. Similarly, to convert a position on a circle to a number (modulo multiples of $2\pi$), or vice versa, one needs to pick an orientation on that circle, together with an origin on that circle.  Such a coordinate system then equates the original circle to the standard unit circle $S^1 := \{ z \in {\Bbb C}: |z| = 1 \}$ (with the standard origin $+1$ and the standard anticlockwise orientation $\circlearrowleft$).
8. To convert a position on a two-dimensional sphere (e.g. the surface of the Earth, as a first approximation) to a point on the standard unit sphere $S^2 := \{ (x,y,z) \in {\Bbb R}^3: x^2+y^2+z^2 = 1 \}$, one can pick an orientation on that sphere, an “origin” (or “north pole”) for that sphere, and a “prime meridian” connecting the north pole to its antipode.  Alternatively, one can view this coordinate system as determining a pair of Euler angles $\phi, \lambda$ (or a latitude and longitude) to be assigned to every point on one’s original sphere.
9. The above examples were all geometric in nature, but one can also consider “combinatorial” coordinate systems, which allow one to identify combinatorial objects with numerical ones.  An extremely familiar example of this is enumeration: one can identify a set A of (say) five elements with the numbers 1,2,3,4,5 simply by choosing an enumeration $a_1, a_2, \ldots, a_5$ of the set A.  One can similarly enumerate other combinatorial objects (e.g. graphs, relations, trees, partial orders, etc.), and indeed this is done all the time in combinatorics.  Similarly for algebraic objects, such as cosets of a subgroup H (or more generally, torsors of a group G); one can identify such a coset with H itself by designating an element of that coset to be the “identity” or “origin”.
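To make example 4 concrete, here is a minimal Python sketch of a coordinate system on a plane specified by an origin $O$ and a basis $e_1, e_2$.  The function name `make_plane_coordinates` is hypothetical, and the "geometric" points are themselves stored as pairs of numbers, which is of course already a choice of ambient coordinates; the sketch only illustrates the conversion in both directions.

```python
from fractions import Fraction

def make_plane_coordinates(origin, e1, e2):
    """Return (to_coords, from_coords) for the coordinate system on a plane
    given by an origin O and a basis e1, e2.

    Assumption of this sketch: points and displacements are pairs of
    integers or Fractions in some fixed ambient description of the plane.
    """
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 must be linearly independent")

    def to_coords(p):
        # Displacement from the origin to p.
        dx, dy = p[0] - origin[0], p[1] - origin[1]
        # Solve a*e1 + b*e2 = (dx, dy) by Cramer's rule.
        a = Fraction(dx * e2[1] - dy * e2[0], det)
        b = Fraction(e1[0] * dy - e1[1] * dx, det)
        return (a, b)

    def from_coords(c):
        # Rebuild the point as O + a*e1 + b*e2.
        a, b = c
        return (origin[0] + a * e1[0] + b * e2[0],
                origin[1] + a * e1[1] + b * e2[1])

    return to_coords, from_coords

to_coords, from_coords = make_plane_coordinates(origin=(1, 1), e1=(2, 0), e2=(1, 1))
P = (6, 3)
coords = to_coords(P)          # the numerical description of P
recovered = from_coords(coords)  # converting back gives P again
```

Choosing a different origin or basis gives a different, equally valid pair of conversion functions for the same plane.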

More generally, a coordinate system $\Phi$ can be viewed as an isomorphism $\Phi: A \to G$ between a given geometric (or combinatorial) object A in some class (e.g. a circle), and a standard object G in that class (e.g. the standard unit circle).  (To be pedantic, this is what a global coordinate system is; a local coordinate system, such as the coordinate charts on a manifold, is an isomorphism between a local piece of a geometric or combinatorial object in a class, and a local piece of a standard object in that class.  I will restrict attention to global coordinate systems for this discussion.)

Coordinate systems identify geometric or combinatorial objects with numerical (or standard) ones, but in many cases, there is no natural (or canonical) choice of this identification; instead, one may be faced with a variety of coordinate systems, all equally valid.  One can of course just fix one such system once and for all, in which case there is no real harm in thinking of the geometric and numeric objects as being equivalent.  If however one plans to change from one system to the next (or to avoid using such systems altogether), then it becomes important to carefully distinguish these two types of objects, to avoid confusion.  For instance, if an interval AB is measured to have a length of 3 yards, then it is OK to write $|AB|=3$ (identifying the geometric concept of length with the numeric concept of a positive real number) so long as one plans to stick with the yard as the unit of length for the rest of one’s analysis.  But if one were also planning to use, say, feet as a unit of length, then to avoid confusing statements such as “$|AB|=3$ and $|AB|=9$”, one should specify the coordinate systems explicitly, e.g. “$|AB| = 3 \hbox{ yards}$ and $|AB| = 9 \hbox{ feet}$”.  Similarly, identifying a point P in a plane with its coordinates (e.g. $P = (4,3)$) is safe as long as one intends to only use a single coordinate system throughout; but if one intends to change coordinates at some point (or to switch to a coordinate-free perspective) then one should be more careful, e.g. writing $P = 4 e_1 + 3 e_2$, or even $P = O + 4 e_1 + 3 e_2$, if the origin O and basis vectors $e_1, e_2$ of one’s coordinate systems might be subject to future change.
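The yards-versus-feet example can be sketched in code by keeping the length and its numerical value as separate things.  The class `Length` and the table `UNIT_IN_INCHES` below are hypothetical names for this illustration; storing the length internally in inches is just an implementation convenience that callers never see.

```python
from fractions import Fraction

# Assumed conversion table for this sketch (exact, by definition of the units).
UNIT_IN_INCHES = {"inch": Fraction(1), "foot": Fraction(12), "yard": Fraction(36)}

class Length:
    """A length that only becomes a number once a unit is specified."""

    def __init__(self, value, unit):
        # Internal representation; hidden from users of the class.
        self._inches = Fraction(value) * UNIT_IN_INCHES[unit]

    def measure(self, unit):
        """Return the numerical value of this length in the given unit."""
        return self._inches / UNIT_IN_INCHES[unit]

    def __eq__(self, other):
        # Two lengths are equal regardless of the unit used to create them.
        return isinstance(other, Length) and self._inches == other._inches

AB = Length(3, "yard")
in_yards = AB.measure("yard")  # the number attached to AB when the unit is the yard
in_feet = AB.measure("foot")   # a different number for the same length
```

Here the ambiguous statement “$|AB|=3$ and $|AB|=9$” cannot even be written down: one has to say which unit each number refers to.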

As mentioned above, it is possible in many cases to dispense with coordinates altogether.  For instance, one can view the length $|AB|$ of a line segment AB not as a number (which requires one to select a unit of length), but more abstractly as the equivalence class of all line segments CD that are congruent to AB.  With this perspective, $|AB|$ no longer lies in the standard semigroup ${\Bbb R}^+$, but in a more abstract semigroup ${\mathcal L}$ (the space of line segments quotiented by congruence), with addition now defined geometrically (by concatenation of intervals) rather than numerically.  A unit of length can now be viewed as just one of many different isomorphisms $\Phi: {\mathcal L} \to {\Bbb R}^+$ between ${\mathcal L}$ and ${\Bbb R}^+$, but one can abandon the use of such units and just work with ${\mathcal L}$ directly.  Many statements in Euclidean geometry involving length can be phrased in this manner.  For instance, if B lies in AC, then the statement $|AC|=|AB|+|BC|$ can be stated in ${\mathcal L}$, and does not require any units to convert ${\mathcal L}$ to ${\Bbb R}^+$; with a bit more work, one can also make sense of such statements as $|AC|^2 = |AB|^2 + |BC|^2$ for a right-angled triangle ABC (i.e. Pythagoras’ theorem) while avoiding units, by defining a symmetric bilinear product operation $\times: {\mathcal L} \times {\mathcal L} \to {\mathcal A}$ from the abstract semigroup ${\mathcal L}$ of lengths to the abstract semigroup ${\mathcal A}$ of areas.  (Indeed, this is basically how the ancient Greeks, who did not quite possess the modern real number system ${\Bbb R}$, viewed geometry, though of course without the assistance of such modern terminology as “semigroup” or “bilinear”.)

The above abstract coordinate-free perspective is equivalent to a more concrete coordinate-invariant perspective, in which we do allow the use of coordinates to convert all geometric quantities to numeric ones, but insist that every statement that we write down is invariant under changes of coordinates.  For instance, if we shrink our chosen unit of length by a factor $\lambda > 0$, then the numerical length of every interval increases by a factor of $\lambda$, e.g. $|AB| \mapsto \lambda |AB|$.  The coordinate-invariant approach to length measurement then treats lengths such as $|AB|$ as numbers, but requires all statements involving such lengths to be invariant under the above scaling symmetry.  For instance, a statement such as $|AC|^2 = |AB|^2 + |BC|^2$ is legitimate under this perspective, but a statement such as $|AB| = |BC|^2$ or $|AB| = 3$ is not.  [In other words, coordinate invariance here is the same thing as being dimensionally consistent.  Indeed, dimensional analysis is nothing more than the analysis of the scaling symmetries in one’s coordinate systems.]  One can retain this coordinate-invariance symmetry throughout one’s arguments; or one can, at some point, choose to spend (or break) this coordinate invariance by selecting (or fixing) the coordinate system (which, in this case, means selecting a unit length).  The advantage in spending such a symmetry is that one can often normalise one or more quantities to equal a particularly nice value; for instance, if a length $|AB|$ is appearing everywhere in one’s arguments, and one has carefully retained coordinate-invariance up until some key point, then it can be convenient to spend this invariance to normalise $|AB|$ to equal 1.
(In this case, one only has a one-dimensional family of symmetries, and so can only normalise one quantity at a time; but when one’s symmetry group is larger, one can often normalise many more quantities at once; as a rule of thumb, one can normalise one quantity for each degree of freedom in the symmetry group.)  Conversely, if one has already spent the coordinate invariance, one can often buy it back by converting all the facts, hypotheses, and desired conclusions one currently possesses in the situation back to a coordinate-invariant formulation.  Thus one could imagine performing one normalisation to do one set of calculations, then undoing that normalisation to return to a coordinate-free perspective, doing some coordinate-free manipulations, and then performing a different normalisation to work on another part of the problem, and so forth.  (For instance, in Euclidean geometry problems, it is often convenient to temporarily assign one key point to be the origin (thus spending translation invariance symmetry), then another, then switch back to a translation-invariant perspective, and so forth.  As long as one is correctly accounting for what symmetries are being spent and bought at any given time, this can be a very powerful way of simplifying one’s calculations.)
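The scaling symmetry and the act of spending it can be illustrated with a small Python sketch.  All names here (`rescale`, `looks_scale_invariant`, `normalise`, and the sample triangle) are hypothetical, and the invariance check only tries a handful of scaling factors: agreement on those is evidence of invariance rather than a proof, and a statement that happens to be false for every sampled factor would pass the check vacuously.

```python
from fractions import Fraction

def rescale(lengths, lam):
    """Apply the change of unit |XY| -> lam * |XY| to every numerical length."""
    return {name: lam * value for name, value in lengths.items()}

def looks_scale_invariant(statement, lengths, factors=(Fraction(1, 2), 2, 3)):
    """Check whether the truth value of a statement survives a few rescalings."""
    baseline = statement(lengths)
    return all(statement(rescale(lengths, lam)) == baseline for lam in factors)

def normalise(lengths, name):
    """Spend the scaling symmetry by choosing the unit so that lengths[name] = 1."""
    return rescale(lengths, 1 / lengths[name])

def pythagoras(l):
    return l["AC"] ** 2 == l["AB"] ** 2 + l["BC"] ** 2

def ab_is_three(l):
    return l["AB"] == 3

# Numerical lengths of a right-angled triangle in some initial unit.
triangle = {"AB": Fraction(3), "BC": Fraction(4), "AC": Fraction(5)}

pythagoras_ok = looks_scale_invariant(pythagoras, triangle)    # dimensionally consistent
ab_three_ok = looks_scale_invariant(ab_is_three, triangle)     # depends on the unit
normalised = normalise(triangle, "AB")                         # now |AB| = 1
```

After normalising, Pythagoras’ theorem still holds for the new numbers, which is what one expects of a statement that never depended on the unit in the first place.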

Given a coordinate system $\Phi: A \to G$ that identifies some geometric object A with a standard object G, and some isomorphism $\Psi: G \to G$ of that standard object, we can obtain a new coordinate system $\Psi \circ \Phi: A \to G$ of A by composing the two isomorphisms.  [I will be vague on what “isomorphism” means; one can formalise the concept using the language of category theory.] Conversely, every other coordinate system $\Phi': A \to G$ of $A$ arises in this manner.  Thus, the space of coordinate systems on A is (non-canonically) identifiable with the isomorphism group $\hbox{Isom}(G)$ of G.  This isomorphism group is called the structure group (or gauge group) of the class of geometric objects.  For example, the structure group for lengths is ${\Bbb R}^+$; the structure group for angles is ${\Bbb Z}/2{\Bbb Z}$; the structure group for lines is the affine group $\hbox{Aff}({\Bbb R})$; the structure group for $n$-dimensional Euclidean geometry is the Euclidean group $E(n)$; the structure group for (oriented) 2-spheres is the (special) orthogonal group $SO(3)$; and so forth.  (Indeed, one can basically describe each of the classical geometries (Euclidean, affine, projective, spherical, hyperbolic, Minkowski, etc.) as a homogeneous space for its structure group, as per the Erlangen program.)
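For the case of a line, where the structure group is $\hbox{Aff}({\Bbb R})$, the relation $\Phi' = \Psi \circ \Phi$ can be checked numerically.  The sketch below uses hypothetical helpers `line_coordinates` and `transition`, and represents points of the line as integers purely for illustration; it recovers $\Psi = \Phi' \circ \Phi^{-1}$ as an affine map $x \mapsto ax + b$ with $a \neq 0$.

```python
from fractions import Fraction

def line_coordinates(origin, unit):
    """Coordinate system on a line given by an origin O and a unit displacement e.

    Returns (phi, phi_inv): phi sends a point to its coordinate, phi_inv
    sends a coordinate back to the point.
    """
    if unit == 0:
        raise ValueError("the unit displacement must be nonzero")

    def phi(p):
        return Fraction(p - origin, unit)

    def phi_inv(x):
        return origin + x * unit

    return phi, phi_inv

def transition(phi_inv, phi_new):
    """Return (a, b) such that phi_new(phi_inv(x)) == a*x + b.

    This relies on the transition map between two such systems being affine,
    so it is determined by its values at x = 0 and x = 1.
    """
    b = phi_new(phi_inv(0))
    a = phi_new(phi_inv(1)) - b
    return a, b

# Two coordinate systems on the same line; the second reverses the orientation.
phi, phi_inv = line_coordinates(origin=2, unit=3)
phi_new, phi_new_inv = line_coordinates(origin=5, unit=-1)

a, b = transition(phi_inv, phi_new)  # Psi(x) = a*x + b, an element of Aff(R)
```

A negative value of `a` corresponds to the two systems disagreeing on orientation, and `b` records how far apart their origins are as seen in the new coordinates.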