MathOverflow: a miscellany (Part I)

MathOverflow is one of my favorite places to lurk online. It feels like I’ve been there long enough that I should start noting down favorite answers and the like “for posterity”.

Hence this post, the first in a weekly series of quote-worthy answers to nice questions, every Tuesday!

Unfortunately the LaTeX doesn’t transfer directly to WordPress via copy-pasting, which makes the notation downright ugly – so I’ll leave the formulas as they are and provide links to the original sources, as per standard practice.


What are the most misleading alternate definitions taught in mathematics?

It is often the case that two or more equivalent (but not necessarily semantically equivalent) definitions of the same idea/object are used in practice. Are there examples of equivalent definitions where one is more natural or intuitive? (I mean so much more intuitive that the preference is not merely subjective.)

Alternatively, what common examples are there in standard lecture courses where a particular symbolic definition obscures the concept being conveyed?

I

Many topics in linear algebra suffer from the issue in the question. For example:

In linear algebra, one often sees the determinant of a matrix defined by some ungodly formula, often even with special diagrams and mnemonics given for how to compute it in the 3×3 case, say.

det(A) = some horrible mess of a formula

Even relatively sophisticated people will insist that det(A) is the sum over permutations of products of entries, with a sign for the parity of each permutation, and so on. Students trapped in this way of thinking do not understand the determinant.

The right definition is that det(A) is the volume of the image of the unit cube after applying the transformation determined by A. From this alone, everything follows. One sees immediately the importance of det(A)=0, the reason why elementary operations affect the determinant the way they do, and why diagonal and triangular matrices have the determinants they do.
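
As an aside from me (not part of the answer): here is a small numerical sketch of the “volume” point of view, using numpy; the matrices are just ones I made up for illustration.

    import numpy as np

    # det(A) is the signed volume of the image of the unit cube under A,
    # i.e. of the parallelepiped spanned by the columns of A.
    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0]])
    print(np.linalg.det(A))  # ~7.0

    # Triangular matrices: the cube is sheared, but each edge still scales along
    # its own axis, so the volume is just the product of the diagonal entries.
    T = np.triu(A)
    print(np.linalg.det(T), np.prod(np.diag(T)))  # both ~6.0

    # det = 0 means the image of the cube is squashed into a lower-dimensional
    # set of zero volume, i.e. the transformation is not invertible.
    S = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],   # second row is a multiple of the first
                  [0.0, 1.0, 1.0]])
    print(np.linalg.det(S))  # ~0.0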

Even matrix multiplication, if defined by the usual formula, seems arbitrary and even crazy, without some background understanding of why the definition is that way.

The larger point here is that although the question asked about having a single wrong definition, really the problem is that a limiting perspective can infect one’s entire approach to a subject. Theorems, questions, exercises, and examples, as well as definitions, can all stem from an incorrect view of a subject!

Too often, (undergraduate) linear algebra is taught as a subject about static objects—matrices sitting there, having complicated formulas associated with them and complex procedures carried out with them, often for no immediately discernible reason. From this perspective, many matrix rules seem completely arbitrary.

The right way to teach and to understand linear algebra is as a fully dynamic subject. The purpose is to understand transformations of space. It is exciting! We want to stretch space, skew it, reflect it, rotate it around. How can we represent these transformations? If they are linear, then we are led to consider the action on unit basis vectors, so we are led naturally to matrices. Multiplying matrices should mean composing the transformations, and from this one derives the multiplication rules. All the usual topics in elementary linear algebra have a deep connection with essentially geometric concepts arising from the corresponding transformations.
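
Again an aside from me: a short numpy check that multiplying matrices really is composing the corresponding transformations (the particular rotation and stretch are just my choice of example).

    import numpy as np

    R = np.array([[0.0, -1.0],
                  [1.0,  0.0]])   # rotate the plane by 90 degrees
    S = np.array([[2.0, 0.0],
                  [0.0, 1.0]])    # stretch along the x-axis

    v = np.array([1.0, 1.0])

    step_by_step = S @ (R @ v)    # first rotate v, then stretch the result
    composed = (S @ R) @ v        # multiply the matrices first, then apply once

    print(step_by_step, composed)  # both give [-2.  1.]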

II

I normally won’t bother with a 5-month-old community wiki, but someone else bumped it and I couldn’t help but notice that the significant majority of the examples are highly algebraic. I wouldn’t want the casual reader to go away with the impression that everything is defined correctly all the time in analysis and geometry, so here we go…

1) “A smooth structure on a manifold is an equivalence class of atlases…” Aside from the fact that one hardly ever works directly with an explicit example of an atlas (apart from important counter-examples like stereographic projections on spheres and homogeneous coordinates on projective space), this point of view seems to obscure two important features of a smooth structure. First, the real point of a smooth structure is to produce a notion of smooth functions, and the definition should reflect that focus. With the atlas definition, one has to prove that a function which is smooth in one atlas is also smooth in any equivalent atlas (not exactly difficult, but still an irritating and largely irrelevant chore). Second, it should be clear from the definition that smoothness is really a local condition (the fact that there are global obstructions to every point being a “smooth” point is of course interesting, but also not the point). The solution to both problems is to invoke some version of the locally ringed space formalism from the get-go. Yes, it takes some work on the part of the instructor and the students, but I and a number of my peers are living proof that geometry can be taught that way to second year undergraduates. If you still don’t believe there are any benefits, try the following exercise. Sit down and write out a complete proof that the quotient of a manifold by a free and properly discontinuous group action has a canonical smooth structure using (a) the maximal atlas definition and (b) the locally ringed space definition.
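
(My aside, not from the answer: for readers who want to see the kind of atlas the answer alludes to, here is a quick sympy computation of the two stereographic charts on the circle S^1 and their transition map; the sphere case is analogous.)

    import sympy as sp

    u = sp.symbols('u', real=True, nonzero=True)

    # Inverse of the chart that projects S^1 from the north pole (0, 1):
    # the parameter u on the line y = 0 corresponds to this point of the circle.
    x = 2*u / (u**2 + 1)
    y = (u**2 - 1) / (u**2 + 1)
    assert sp.simplify(x**2 + y**2 - 1) == 0   # the point really lies on S^1

    # The chart that projects from the south pole (0, -1) sends (x, y) to x/(1 + y);
    # composing gives the transition map between the two charts.
    v = sp.simplify(x / (1 + y))
    print(v)   # 1/u -- smooth wherever the two charts overlap (u != 0)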

2) “A tangent vector on a manifold is a point derivation…” While there are absolutely a lot of advantages to having this point of view around (not the least of which is that it is a better definition in algebraic geometry), I believe that this is misleading as a definition. Indeed, the key property that a good definition should have in my opinion is an emphasis on the close relationship between tangent vectors and smooth curves. Note that such a definition is bound to involve equivalence classes of smooth curves having the same derivative at a given point, and the notion of the derivative of a smooth curve is defined by composing with a smooth function. So for those who really like point derivations, they aren’t far behind. There just needs to be some mention of curves, which in many ways are really what give differential geometry its unique flavor.
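
A tiny sympy check of the curves-versus-derivations point (mine, with an arbitrary curve and function): differentiating a smooth function along a curve through a point depends only on the curve’s velocity there, which is exactly the point derivation the curve defines.

    import sympy as sp

    t, x, y = sp.symbols('t x y', real=True)

    gamma = (sp.cos(t), sp.sin(t))   # a curve through p = (1, 0) at t = 0
    f = x**2 * y + sp.exp(y)         # an arbitrary smooth function

    # Derivative of f along the curve at t = 0 ...
    along_curve = sp.diff(f.subs({x: gamma[0], y: gamma[1]}), t).subs(t, 0)

    # ... equals the pairing of the velocity gamma'(0) with the gradient of f at p.
    velocity = (sp.diff(gamma[0], t).subs(t, 0), sp.diff(gamma[1], t).subs(t, 0))
    gradient = (sp.diff(f, x).subs({x: 1, y: 0}), sp.diff(f, y).subs({x: 1, y: 0}))
    paired = velocity[0]*gradient[0] + velocity[1]*gradient[1]

    print(sp.simplify(along_curve - paired))   # 0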

3) The notion of amenability in geometric group theory particularly lends itself to misleading definitions. I think there are two reasons. The first is that modulo some mild exaggeration basically every property shared by all amenable groups is equivalent to the definition. The second is that amenability comes up in so many different contexts that it is probably impossible to say there is one and only one “right” definition. Every definition is useful for some purposes and not useful for others. For example the definition involving left invariant means is probably most useful to geometric group theorists while the definition involving the topological properties of the regular representation in the dual is probably more relevant to representation theorists. All that being said, I think I can confidently say that there are “wrong” definitions. For example, I spent about a year of my life thinking that the right definition of amenability for a group is that its reduced group C* algebra and its full group C* algebra are the same.
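
(Another aside from me, using yet another of the equivalent characterizations the answer is talking about: Z is amenable because it admits a Følner sequence, i.e. finite sets that are almost invariant under translation.)

    # F_n = {-n, ..., n} is almost invariant under shifting by g:
    # the symmetric difference is tiny compared to |F_n|.
    def folner_ratio(n: int, g: int = 1) -> float:
        F = set(range(-n, n + 1))
        shifted = {a + g for a in F}
        return len(F ^ shifted) / len(F)

    for n in (10, 100, 1000):
        print(n, folner_ratio(n))   # ratios 2/(2n+1) -> 0, witnessing amenability of Z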

4) Some functional analysis books have really bad definitions of weak topologies, involving specifying certain bases of open sets. This point of view can be useful for proving certain lemmas and working with some examples, but given the plethora of weak topologies in analysis these books should really give an abstract definition of weak topologies relative to any given family of functions and from then on specify the topology by specifying the relevant family of functions.
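
(A quick illustration from me of what the abstract definition buys you: the weak topology relative to a family of functions is the coarsest topology making them all continuous, so a sequence converges weakly iff every function in the family converges along it. In l^2 the basis vectors e_n go to 0 against every fixed vector, yet stay at norm 1, so the weak and norm topologies genuinely differ. The truncation to R^N below is just to make the demo finite.)

    import numpy as np

    N = 2000                        # truncate l^2 to R^N for the demo
    y = 1.0 / (1.0 + np.arange(N))  # a fixed vector; <., y> is one functional in the family

    for n in (10, 100, 1000):
        e_n = np.zeros(N)
        e_n[n] = 1.0
        print(n, np.dot(e_n, y), np.linalg.norm(e_n))  # functional values -> 0, norms stay 1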

III

A simple example is given by the two definitions of independence of events:

  1. A and B are independent iff P(AB)=P(A)P(B)
  2. A is independent from B iff P(A|B)=P(A)

Some presentations start with Definition 1, which is entirely uninformative: nothing in it explains why on earth we bother discussing this. In contrast, Definition 2 says exactly what “independent” means: knowing that B has occurred does not change the probability that A occurs as well.

A reasonable introduction to the subject should start with Definition 2; then observe there is an issue when P(B)=0, and resolve it; then observe independence is symmetric; then derive Definition 1.
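
(My aside: a quick simulation with two fair dice, A = “first die is even”, B = “second die shows 6”, just to see both definitions agree in practice.)

    import numpy as np

    rng = np.random.default_rng(0)
    d1 = rng.integers(1, 7, size=1_000_000)
    d2 = rng.integers(1, 7, size=1_000_000)

    A = (d1 % 2 == 0)
    B = (d2 == 6)

    p_A = A.mean()
    p_B = B.mean()
    p_AB = (A & B).mean()
    p_A_given_B = (A & B).sum() / B.sum()

    print(p_A_given_B, p_A)   # ~0.5 and ~0.5: Definition 2
    print(p_AB, p_A * p_B)    # ~1/12 and ~1/12: Definition 1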

IV

Another simple example is given by the two definitions of an equivalence relation:

  1. R(.,.) is an equivalence relation iff R is reflexive, symmetric, and transitive.
  2. R(.,.) is an equivalence relation iff there exists a function f such that R(a,b) iff f(a)=f(b).

Most presentations start with Definition 1, which contains no hint as to why we bother discussing such relations or why we call them “equivalences”. In contrast, Definition 2 (along with a couple of examples) immediately tells you that R captures one particular attribute of the elements of the domain; and, since elements with the same value for this attribute are called “equivalent”, R is called an “equivalence”.

A reasonable introduction should start with Definition 2, then go on to prove Definition 1 is a convenient alternative characterization.
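
(My aside: the smallest possible worked example, with f(n) = n mod 3. The relation defined by “same value of f” is automatically reflexive, symmetric, and transitive, and its classes are exactly the fibres of f.)

    def f(n: int) -> int:
        return n % 3

    def R(a: int, b: int) -> bool:
        return f(a) == f(b)

    domain = range(12)
    assert all(R(a, a) for a in domain)                             # reflexive
    assert all(R(b, a) for a in domain for b in domain if R(a, b))  # symmetric
    assert all(R(a, c) for a in domain for b in domain for c in domain
               if R(a, b) and R(b, c))                              # transitive

    # The equivalence classes are the sets of elements sharing a value of f.
    classes = {}
    for a in domain:
        classes.setdefault(f(a), []).append(a)
    print(classes)   # {0: [0, 3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8, 11]}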

V

My biggest issue is with the coordinate-definition of tensor products. A physicist defines a rank k tensor over a vector space V of dimension n to be an array of n^k scalars associated to each basis of V which satisfy certain transformation rules; in particular, if we know the array for a given basis, we can automatically determine it for a different basis. Another way to say this is that the space of tensors is the set of pairs consisting of a basis and an n^k array of scalars, identified by an equivalence relation which gives the coordinate transformation law. For some strange reason, people seem to call this a coordinate-free definition. While it is in a sense coordinate-free (the transformation between coordinates lets you break free of coordinates in a sense), it is very confusing at first sight. People who use this definition will then say that certain operations are coordinate-free. What they mean by this, and it took me a long time to figure this out, is that you can do a certain algebraic operation to the coordinates of the tensor, and the formula is the same no matter which basis you work with (e.g., multiplying a covariant rank 1 tensor with a contravariant rank 1 tensor to get a scalar, or exterior differentiation of differential forms, or multiplying two vectors to get a rank 2 tensor).

The much nicer definition uses tensor products. This is a coordinate-free construction, as opposed to the coordinate-full description given above. This definition is nice because it connects to multilinear maps (in particular, it has a nice universal property). It also helped me see why tensors are different from elements of some n^k-dimensional vector space over the same field (they are special because we are equipped not just with a vector space but with a multilinear map from V × ⋯ × V to V ⊗ ⋯ ⊗ V). The covariant/contravariant distinction can be explained in terms of functionals. This allows you to talk about contraction of tensors without having to prove that it is coordinate-invariant! Finally, once you have all that under your belt, you can easily derive the coordinate transformation laws from the multilinearity of ⊗.
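
(An aside from me: a numerical sketch of that last point, with a change-of-basis matrix P I made up. The coordinates of a vector transform by P^{-1} (contravariantly), the coordinates of a functional by P^T (covariantly), and the contraction of one with the other comes out the same in both bases.)

    import numpy as np

    P = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0]])   # columns = new basis vectors in old coordinates

    v = np.array([1.0, 2.0, 3.0])     # coordinates of a vector in the old basis
    w = np.array([4.0, 5.0, 6.0])     # coordinates of a functional in the old basis

    v_new = np.linalg.solve(P, v)     # contravariant rule: new coordinates are P^{-1} v
    w_new = P.T @ w                   # covariant rule: new coordinates are P^T w

    print(w @ v, w_new @ v_new)       # both ~32.0: the contraction is basis-independent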

VI

I often see this problem crop up: a certain mathematical object has many characterizations, any one of which can be taken as the definition. Which do you use when you are introducing the subject?

The first one that comes to mind is the basis of a vector space. Perhaps this is not the best example for the title question of this thread of discussion, but I know that this confuses some students. When I last taught linear algebra, we taught them at least four characterizations. It’s not really that any of the characterizations is obscuring or misleading. Rather, each one highlights some important property(-ies). Of course, the better students enjoy seeing all of the characterizations, and they appreciate every one. The less facile students get flustered because they want there to be just One Right Way of thinking about them.

A similar issue arises with the characterizations of an invertible matrix or linear transformation, though at least with a matrix it seems most reasonable to define an invertible matrix as one that has an inverse, namely another matrix that you can multiply it by to get the identity matrix.
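
(My aside: a few of those equivalent characterizations of invertibility, checked numerically on one made-up matrix.)

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

    A_inv = np.linalg.inv(A)                          # 1. an inverse matrix exists ...
    print(np.allclose(A @ A_inv, np.eye(2)))          #    ... and A A^{-1} = I

    print(not np.isclose(np.linalg.det(A), 0.0))      # 2. nonzero determinant (nonzero volume scaling)

    print(np.linalg.matrix_rank(A) == A.shape[0])     # 3. full rank: the columns form a basis

    b = np.array([3.0, 5.0])
    x = np.linalg.solve(A, b)                         # 4. Ax = b is solvable for every b
    print(np.allclose(A @ x, b))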

The issue comes up in spades when introducing matroids.
