Nice quote-worthy answers and questions from MathOverflow every Tuesday!
I’ve been fascinated by the phenomenon the question addresses for a long time. We have complex minds evolved over many millions of years, with many modules always at work. A lot we don’t habitually verbalize, and some of it is very challenging to verbalize or to communicate in any medium. Whether for this or other reasons, I’m under the impression that mathematicians often have unspoken thought processes guiding their work which may be difficult to explain, or they feel too inhibited to try. One prototypical situation is this: there’s a mathematical object that’s obviously (to you) invariant under a certain transformation. For instance, a linear map might conserve volume for an ‘obvious’ reason. But you don’t have good language to explain your reason, so instead of explaining, or perhaps after trying to explain and failing, you fall back on computation. You turn the crank and, without undue effort, demonstrate that the object is indeed invariant.
Here’s a specific example. Once I mentioned this phenomenon to Andy Gleason; he immediately responded that when he taught algebra courses, if he was discussing cyclic subgroups of a group, he had a mental image of group elements breaking into a formation organized into circular groups. He said that ‘we’ never would say anything like that to the students. His words made a vivid picture in my head, because it fit with how I thought about groups. I was reminded of my long struggle as a student, trying to attach meaning to ‘group’, rather than just a collection of symbols, words, definitions, theorems and proofs that I read in a textbook.
I find there is a world of difference between explaining things to a colleague, and explaining things to a close collaborator. With the latter, one really can communicate at the intuitive level, because one already has a reasonable idea of what the other person’s mental model of the problem is. In some ways, I find that throwing out things to a collaborator is closer to the mathematical thought process than just thinking about maths on one’s own, if that makes any sense.
One specific mental image that I can communicate easily with collaborators, but not always to more general audiences, is to think of quantifiers in game theoretic terms. Do we need to show that for every epsilon there exists a delta? Then imagine that you have a bag of deltas in your hand, but you can wait until your opponent (or some malicious force of nature) produces an epsilon to bother you, at which point you can reach into your bag and find the right delta to deal with the problem. Somehow, anthropomorphising the “enemy” (as well as one’s “allies”) can focus one’s thoughts quite well. This intuition also combines well with probabilistic methods, in which case in addition to you and the adversary, there is also a Random player who spits out mathematical quantities in a way that is neither maximally helpful nor maximally adverse to your cause, but just some randomly chosen quantity in between. The trick is then to harness this randomness to let you evade and confuse your adversary.
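The epsilon-delta game can be played out in a few lines of Python. This is only an illustrative sketch: the function f(x) = x^2, the point x0 = 3, and the names `delta_strategy` and `adversary_wins` are assumptions chosen for the example. Sampling points can of course only spot-check the strategy; the comment inside `delta_strategy` carries the actual argument.

```python
def f(x):
    return x * x

def delta_strategy(x0, eps):
    """Our move: answer the adversary's eps with a delta that works for f at x0."""
    # If |x - x0| < delta <= 1 then |x + x0| < 2|x0| + 1, so
    # |f(x) - f(x0)| = |x - x0| * |x + x0| < delta * (2|x0| + 1) <= eps.
    return min(1.0, eps / (2 * abs(x0) + 1))

def adversary_wins(x0, eps, samples=1000):
    """The adversary probes points within delta of x0, looking for a violation."""
    delta = delta_strategy(x0, eps)
    for k in range(-samples + 1, samples):
        x = x0 + delta * k / samples  # sampled so that |x - x0| < delta
        if abs(f(x) - f(x0)) >= eps:
            return True
    return False

# The adversary throws successively nastier epsilons at us.
challenges = [1.0, 0.1, 1e-3, 1e-6]
results = [adversary_wins(3.0, eps) for eps in challenges]
```

The order of play matters: `delta_strategy` receives `eps` as an argument, which is the code-level version of waiting for the opponent to move first.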
Is there a quantity in one’s PDE or dynamical system that one can bound, but not otherwise estimate very well? Then imagine that it is controlled by an adversary or by Murphy’s law, and will always push things in the most unfavorable direction for whatever you are trying to accomplish. Sometimes this will make that term “win” the game, in which case one either gives up (or starts hunting for negative results), or looks for additional ways to “tame” or “constrain” that troublesome term, for instance by exploiting some conservation law structure of the PDE.
For evolutionary PDEs in particular, I find there is a rich zoo of colourful physical analogies that one can use to get a grip on a problem. I’ve used the metaphor of an egg yolk frying in a pool of oil, or a jetski riding ocean waves, to understand the behaviour of a fine-scaled or high-frequency component of a wave when under the influence of a lower frequency field, and how it exchanges mass, energy, or momentum with its environment. In one extreme case, I ended up rolling around on the floor with my eyes closed in order to understand the effect of a gauge transformation that was based on this type of interaction between different frequencies. (Incidentally, that particular gauge transformation won me a Bocher prize, once I understood how it worked.) I guess this last example is one that I would have difficulty communicating to even my closest collaborators. Needless to say, none of these analogies show up in my published papers, although I did try to convey some of them in my PDE book eventually.
ADDED LATER: I think one reason why one cannot communicate most of one’s internal mathematical thoughts is that one’s internal mathematical model is very much a function of one’s mathematical upbringing. For instance, my background is in harmonic analysis, and so I try to visualise as much as possible in terms of things like interactions between frequencies, or contests between different quantitative bounds. This is probably quite a different perspective from someone brought up from, say, an algebraic, geometric, or logical background. I can appreciate these other perspectives, but still tend to revert to the ones I am most personally comfortable with when I am thinking about these things on my own.
ADDED (MUCH) LATER: Another mode of thought that I and many others use routinely, but which I realised only recently was not as ubiquitous as I believed, is to use an “economic” mindset to prove inequalities such as X≤Y or X≤CY for various positive quantities X,Y, interpreting them in the form “If I can afford Y, can I therefore afford X?” or “If I can afford lots of Y, can I therefore afford X?” respectively. This frame of reference starts one thinking about what types of quantities are “cheap” and what are “expensive”, and whether the use of various standard inequalities constitutes a “good deal” or not. It also helps one understand the role of weights, which make things more expensive when the weight is large, and cheaper when the weight is small.
I think the root of the phenomenon is that we can only communicate to others what we know, not what we understand.
Also, it is not unreasonable to think that one’s mental images are not going to be of any help to others. (In fact, they may well make things more complicated, or confusing for others: I have been told mental images by others, sometimes indirectly through the choice of the word introduced in a definition, and been thereby misled; here «misled» means «led in a direction different to the one I personally would follow in order to form my own mental image of the concept».) For example, for me resolving the singularities of algebraic varieties makes a clicking (or clacking) sound: this is quite significant for me in a way, but when talking to others I doubt I’d make any mention of this, for I seriously doubt it would help 🙂
There is a huge gap.
I’ve always been interested in this question from the point of not just the “hermeneutics” of mathematics, but also from the standpoint of motivation for mathematicians. I wonder to what degree doing mathematics is constructing a mental model for a mathematical object, comparing the properties of that model to the facts associated to the object and then trying to reconcile the model with the facts? Some personal examples: I often read a definition, and then try to write the definition in an equivalent form in my own words…based on whatever vague impression/model the definition inspired. After this, I compare my definition with the original and then try to see what in my conception/mental picture needs altering. Another thing I commonly do is to read the statement of a theorem and try to prove it for myself…however I am not simply trying to prove the theorem formally, but am trying to build a conceptual construct or point of view that will make the proof of the theorem evident in light of that construct. Personally, this (often failing) attempt to build a sharp intuitive mental model of a mathematical object is the primary motivation for doing mathematics at all.
This all reminds me of what was (purportedly) written on Richard Feynman’s blackboard at the time of his death: If you cannot create it, you don’t understand it. One is forced to wonder what constitutes “creation” here.
EDIT/ADDENDUM: In the interview with Alain Connes here there is a great description of internal mental process. This has inspired me to distill that, for me, mathematics is the triumph of concept over brute computation. The uncommunicated mental models that allow us to organize and complete computations and proofs that seem impossible are the central source of joy and surprise in mathematics.
Paraphrasing Gowers, philosophy of mathematics is useful in that it affects the practice of mathematics. The above viewpoint is liberating as it points at what to spend time trying to do. It is frustrating to watch introductory analysis students stumble around with formalism with no apparent “picture” (not necessarily geometric) of what is going on in a proof. The typical undergraduate seems to spend very little time trying to find a mental model that generates a vivid proof. This may be due to the fact that a fig leaf covers the essential part of what we do!
The well known situation of language translation is I believe akin to the tension between thinking and explaining.
I am French; I can understand, write and explain myself in English, yet I am a bit at a loss when required to translate some piece of English into French, so I call this translating ability a third language. My guess, or feeling, or personal view is that thinking is more semantic and, for weird (and sad) cultural reasons, mathematical explaining is too often required to be on a syntactic level.
Just to talk about something fresh for which I still have a good memory of what I actually thought and what I wrote, let’s take this example
1) Ivan Fesenko teased me with the puzzle without the gradient condition 10 years ago. I solved it with the standard $(1-xy)^2+x^2$, but it would be nice to tease him with the upgraded puzzle when I talk to him next. Also, it is high time to finish it off.
2) The standard polynomial is even and the only critical point is a saddle. If you think of it, this saddle is inevitable: we have low points on the landscape on both sides of the x-axis and high values on the x-axis going to +∞ both ways. Thus the mountain pass lemma will ensure a saddle somewhere.
3) This works every time when the sequence of points where the polynomial goes to 0 has two different limiting directions. Then we can separate them by a line and run the same argument. So, the limiting direction must be unique.
4) This seems impossible because the highest (even) degree homogeneous part should vanish in this direction but then it also vanishes in the opposite direction. This forces the second highest (odd) degree homogeneous part to vanish on the entire line too. The next degree is unclear though…
5) We certainly need something to break the symmetry here. A family $P_y(x)$ of polynomials in $x$ that have roots close to 0 when $y\to+\infty$ and no roots when $y\to-\infty$ would be nice.
6) Hey, I know this one: $yx^2-1$. Let’s try $x^2+(x^2y-1)^2$.
7) Damn, it doesn’t work. The origin is still a critical point.
8) Yeah, what else would you expect: the low points on the landscape are accumulating to one direction but still are separated by the line, so the mountain pass lemma is as powerful as before. To kill it, we need to shift both descents to one side.
9) Add $x$ to $P_y(x)$. That won’t change the limiting direction but will shift the zeroes a bit. So, let’s try $f(x,y)=x^2+(x^2y+x+1)^2$.
10) The origin is good now: $f_x=2$ there. Actually, it is 2 everywhere where $x=0$.
11) If $x\neq 0$, then $f_y=0$ only if $x^2y+x+1=0$, but then $f_x=2x\neq 0$ by the chain rule.
12) OK, let’s post the example and keep it in memory for teasing people…
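The checks in steps 10) and 11) can be replayed mechanically. The following is a small sketch using exact rational arithmetic, not the author's own computation; the helper names (`grad_f`, `value_on_curve`) and the grid size are assumptions made for illustration, and the grid scan is only a spot check, not a proof.

```python
from fractions import Fraction

def f(x, y):
    return x**2 + (x**2 * y + x + 1)**2

def grad_f(x, y):
    g = x**2 * y + x + 1                # inner factor x^2*y + x + 1
    fx = 2 * x + 2 * g * (2 * x * y + 1)
    fy = 2 * g * x**2
    return fx, fy

# Step 10: on the line x = 0 the gradient is (2, 0) for every y.
on_axis = [grad_f(0, y) for y in range(-3, 4)]

# Step 11 (spot check only): no vanishing gradient on a small integer grid.
grid_has_critical_point = any(
    grad_f(x, y) == (0, 0) for x in range(-5, 6) for y in range(-5, 6)
)

def value_on_curve(x):
    """Value of f on the curve x^2*y + x + 1 = 0, where f reduces to x^2."""
    y = -(x + 1) / x**2
    return f(x, y)

# Along that curve f tends to 0 as x -> 0, although f is never 0.
values = [value_on_curve(Fraction(1, 10**k)) for k in range(1, 5)]
```

Here `values` shrinks like x^2 along the curve, so the infimum of f is 0 even though the gradient never vanishes.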
That’s what I actually thought for the last hour or so (interlaced with some personal thoughts that are of no interest for this thread).
What I wrote can be seen easily if you follow the link.
Why such discrepancy?
a) Some steps in the chain like 1) and 12) are too personal to be of interest to anybody. You need them to “start the engine running” and to “vent the steam”, but they aren’t, strictly speaking, mathematics and do not make me look any better, so why publish them?
b) Some steps like 8) and the heuristics in 9) are actually false. To publish them would be ridiculous.
c) 4) and 7) are “failures” on the way. There is no point in telling anyone where and how I failed. I could fill volumes with my failed attempts if I started doing it.
d) 10) and 11) are trivial computations. Everyone can do those himself.
e) 2), 3), 5) are left. 9) is the counterexample. 2), 3) are steps in the direction of the affirmative answer. Once the final answer is negative, there is no point in talking of the steps in the opposite direction.
f) 5) is a nice idea but everybody knows that y is not even. One can see the whole mechanics in the answer itself, so there is no need to explain it separately.
I don’t know if this account of one personal affair with one relatively simple problem can really shed much light on why we do not tell/write exactly what we see/think of, but you asked and I answered.
One observation: People understandably hesitate telling a half-truth. When you teach a heuristic picture to someone, you also need to teach them about how fuzzy it is and when it starts to break down. A more calculational proof has the virtue of being self-contained and robustly transmissible. This is even more important when writing a textbook.
Then there are other cases where I’m puzzled why certain heuristic means of understanding and organizing knowledge don’t seem to be usually taught. Take the concept of normal subgroups. In one of his books, V.I. Arnold says that a subgroup is normal when it is relativistically invariant, and he doesn’t develop that line of thought any deeper. That statement is a good example of a heuristic analogy that is specific in its detail but general in its spirit. However you phrase it, certainly you should give your students the idea that a normal subgroup is something whose structure is invariant with respect to the parent group’s symmetries. As a litmus test, your students should be able to tell whether these subgroups are normal at a glance, without calculation:
Let E2 be the Euclidean group of the plane and let O2 be the subgroup fixing some point.
Subgroups of E2:
- Translations along some particular direction.
- Translations along every direction.
- Translations and glides along every direction.
- Reflections in every line.
- Rotations around some particular point.
- Symmetries of a tessellation.
Subgroups of O2:
- Symmetries of a regular polygon.
- Reflections in a line.
The way I think about the non-normal cases is that there is something non-isotropic about them, some structure that the subgroup preserves that is not preserved by the parent group. For example:
- Translations along some particular direction: Rotations don’t preserve the direction.
- Translations along every direction: No special directions, so it’s normal.
- Translations and glides along every direction: Ditto.
- Reflections in every line: Ditto. (This combines the two previous cases.)
- Rotations around some particular point: Translations don’t preserve the point.
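The "at a glance" judgments for translations and rotations can be backed by a two-line conjugation computation. The notation here is an assumption for illustration: $g(x) = Ax + b$ is a general isometry of the plane with $A$ orthogonal, $t_v(x) = x + v$ is translation by $v$, and $r_p(x) = p + R(x - p)$ is a rotation about the point $p$.

```latex
\[
  g \, t_v \, g^{-1}(x) = A\bigl(A^{-1}(x - b) + v\bigr) + b = x + Av
  \quad\Longrightarrow\quad g \, t_v \, g^{-1} = t_{Av},
\]
\[
  t_w \, r_p \, t_w^{-1}(x) = p + R(x - w - p) + w = (p + w) + R\bigl(x - (p + w)\bigr)
  \quad\Longrightarrow\quad t_w \, r_p \, t_w^{-1} = r_{p + w}.
\]
```

Conjugating a translation rotates its direction, so translations along one fixed direction are not normal while the group of all translations is; conjugating a rotation by a translation moves its centre, so rotations about one fixed point are not normal.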
This phenomenon occurs not just in advanced mathematics but also right at the very bottom in simple mental arithmetic. If I have to do a moderately complicated calculation such as adding two three-digit numbers, there’s often a part of my brain that jumps ahead to the answer before another more cautious part has got there with carefully checked calculations. The first part just sort of feels the answer and then says “I told you so” to the second part, except occasionally when the first part gets it wrong and the second part says “Now you know why I bother to be careful” to the first part.
And there are also aspects of how I carry out integer addition and subtraction that I would normally be a bit embarrassed to verbalize, such as that if I subtract 48 from 135 then there’s a preliminary answer, 97, that I know from experience is wrong and has to be corrected by subtracting 10. (The justification for the preliminary answer is that 13-4=9 and that the answer must end in a 7.) It’s not quite what’s going on in my head, but it’s almost as though I say, “OK I’ll subtract 58 instead so as to get the right answer.” But if I were teaching this to a child then I’d tell a slightly different story, such as borrowing 1, or first subtracting 50 and then adding 2.
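The "preliminary answer, then correct by 10" routine can be written down as a short Python sketch. The function name `subtract_with_correction` and the restriction to non-negative integers with a ≥ b are assumptions for the example; it is one possible formalisation of the habit described, not a claim about what actually happens in anyone's head.

```python
def subtract_with_correction(a, b):
    """Return (preliminary, corrected) for a - b, assuming a >= b >= 0."""
    tens = a // 10 - b // 10              # 13 - 4 = 9 for 135 - 48
    last_digit = (a % 10 - b % 10) % 10   # the answer must end in 7
    preliminary = 10 * tens + last_digit  # 97: right last digit, wrong tens
    # A borrow is needed exactly when the last digit of a is smaller than that of b.
    corrected = preliminary - 10 if a % 10 < b % 10 else preliminary
    return preliminary, corrected

preliminary, corrected = subtract_with_correction(135, 48)
```

When no borrow is needed, the preliminary answer is already correct and the correction step does nothing.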
I have a worse problem than having unspoken thought processes: some of my best thought processes are simply beneath the level of consciousness and I don’t notice them at all until they’re finished. Even then, I often get only an answer and not an explanation out of them. Surely this happens to everyone: the problem solved during sleep, the idea on a walk in the woods, the conviction that a conjecture is true on utterly minimal evidence, the argument that pops up full formed in the middle of a conversation.
My mathematical process is roughly this: consciously, I try a lot of stupid things which essentially have no chance of working but do have the benefit of exposing me to lots of examples; these examples pile up and are subconsciously masticated for days, weeks, months — I’m not old enough mathematically to put “years” here yet — and eventually by some inner and unobservable process I just have a feeling about what to do.
Perhaps that’s an exaggeration. But I certainly do feel that way sometimes, and to the extent that it’s true, it means that the whole project of trying to communicate how I thought of something is just telling stories, at least if I say anything other than “well, I just knew one day.”
I am immediately reminded of the following phenomenon:
“Let’s consider an elliptic curve $E$ over $\mathbb{Q}_p$.”
~Speaker begins to draw a donut at the board~
Clearly this is wrong. The speaker may even mention how wrong it is. But there’s enough of a kernel of truth to the picture that it may be helpful for the audience. Justifying why the picture is valid in the situation being considered would involve some hard and not undue model theory, but doing anything more than waving your hands and saying the magic words (“a-la-ca-Lefschetz!”) is likely to derail your talk so badly as to effectively destroy it. Even if the picture is not completely valid for what you’re talking about, it’s a visual aid which is at least an easily described first approximation to the truth.
The issue seems, to me, that a lot of these mental pictures are very personal.
I am reminded of an anecdote by Richard Feynman, from “The Pleasure of Finding Things Out”. He explains how counting, for him, is a verbal process (he speaks the numbers to himself as he goes along), but that a friend of his would manage visually. (Text here)
He finishes by saying:
I often think about that, especially when I’m teaching some esoteric technique such as integrating Bessel functions. When I see equations, I see the letters in colors — I don’t know why. As I’m talking, I see vague pictures of Bessel functions from Jahnke and Emde’s book, with light-tan j’s, slightly violet-bluish n’s, and dark brown x’s flying around. And I wonder what the hell it must look like to the students.
Because of this, I think there might not always be a significant value in trying to pass those mental pictures over – the real aim is to provoke the student into developing his own mental pictures, that he can strongly relate to. Some words such as “homological” or “homotopical” spark up very distinctive feelings in me, in a similar way as hearing “mountain” would make me visualise various mountains, hills, cliffs, etc. But whereas the meaning of “mountain” came to me through vision (mainly, but also other senses), the origin of my mental images of mathematical ideas comes through the practice of mathematics. As such, it seems harder to convey these mathematical pictures: they must be backed up by precise mathematical understanding, which at any rate should end up conjuring these mental pictures. Of course, many mental pictures are simple enough or “canonical” enough that one might imagine everyone would come to develop very similar ones upon understanding of one particular concept; the previously mentioned example of cyclic groups comes to mind. So there might be value in passing that on, but in the end I would think that understanding accompanied by the attention to what meaning it provides already goes a long way towards developing personal mental images.
People have mentioned examples which are hard to share due to some kind of prerequisites. Here’s one: I learned PDE from a professor who, in his mind, was always thinking about distribution theory, but officially could not talk about it until after he covered the material relevant to the exams. In distribution theory, whenever you see an integral over a domain $\int_\Omega u(x)\,dx$ you actually picture the characteristic function, $\int \chi_\Omega(x)u(x)\,dx$, or $\int H(f(x))u(x)\,dx$ if $f$ is a defining function for $\Omega$ and $H$ is the Heaviside function. From this point of view, you imagine that all functions are smooth and compactly supported (or you can imagine their approximations), so that you can integrate by parts: $\int \chi_\Omega \nabla u(x)\,dx = -\int \nabla\chi_\Omega\, u(x)\,dx = -\int \delta(f(x))\nabla f(x)\,u(x)\,dx$. The boundary terms come when the derivative hits the characteristic function. Same thing for Stokes’ theorem, Gauss’s divergence theorem. It’s pretty handy to compute this way.
For a little while this was all I understood until I later found out what was going on. The limit of difference quotients of $\chi_\Omega$ is clearly supported on the boundary of $\Omega$ and it’s clear, especially if you picture an approximation, that $\nabla\chi_\Omega$ points in the direction of increase of $\chi_\Omega$, i.e. the inward normal. More simply, there are two points of view: if you were to take difference quotients of $u$, you use a Lagrangian point of view in which the point at position $x$ moves in the direction $i$, and you observe a change in $u$ between those points; instead, you can take an Eulerian point of view (where the adjoint difference quotients go on the characteristic function) and look at movement of the region with $u$ fixed.
Until I understood this point of view in a simpler way, it would not really be sensible to explain it to others. But now I know that giving a watered down version of the same proof when “proving” the fundamental theorem of calculus / Gauss’s divergence for a calculus class in fact does not lose any key ideas (except for technicalities like how you need the mean value theorem to ensure the difference quotients are bounded). Of course, I would also talk about characteristic functions to any math student, since it is a nice point of view.
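In one dimension the characteristic-function computation reduces to the fundamental theorem of calculus. The following is a formal sketch under the same assumption as above, namely that $u$ is smooth and compactly supported; the interval $\Omega = (a, b)$ is chosen only for illustration.

```latex
\[
  \chi_\Omega(x) = H(x - a) - H(x - b),
  \qquad
  \chi_\Omega'(x) = \delta(x - a) - \delta(x - b),
\]
\[
  \int_a^b u'(x)\,dx
  = \int \chi_\Omega(x)\, u'(x)\,dx
  = -\int \chi_\Omega'(x)\, u(x)\,dx
  = -u(a) + u(b).
\]
```

The two delta functions sit at the endpoints, which is exactly where the boundary terms come from: the derivative lands on the characteristic function rather than on $u$.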
By the way, in the calculus of variations, when your u(x)=L(x,ϕ(x)) is a Lagrangian and ϕ(x) is a solution to the Euler-Lagrange equations, and you take difference quotients using the flow of a vector field whose flow preserves the Lagrangian (a “symmetry”), you end up with Noether’s theorem through only this one variation (there are only boundary terms in what I called “the Lagrangian point of view” because you vary through a family of solutions except at the boundary). So it’s also a nice way to prove conservation laws in one swoop.
My point: for a little while, distribution theory seemed like a magical theory with prerequisites that made it unexplainable in everyday talk, but once I really understood the ideas I could usually discard the vocabulary (actually, the whole theory can often be replaced by cutoffs, partitions of unity, Taylor expansion, and changes of variable — although I still think it’s great to learn). I suspect that this phenomenon is not uncommon for elementary applications of “fancy” mathematical theories. I believe that often once one has a more basic understanding, one can throw away the new words but still fully reveal the ideas (but maybe that’s completely due to my own background). People here have talked about Feynman — he was good at doing this in the context of physics. If you watch his (outstanding) lectures on Project Tuva you will see more or less the proof of Noether’s theorem about which I just wrote.
A second point:
Another thing I think happens to me is that I feel some pressure not to convey just how often I rely on geometric modes of thought, especially when they go against the usual way of explaining things, or the background of a typical student, and are not completely necessary.
Example 1: When you row-reduce a matrix, you make a bunch of changes (most importantly some “transvections”) in the basis of the image space until a few of your basis vectors (say $v_1=Te_1$, $v_2=Te_2$) span the image of the matrix $T$. When you picture the domain of $T$ foliated by level sets (which are parallel to the null space of $T$), you know that the remaining basis vectors $e_3, e_4, \ldots$ can be translated by some element in the span of $e_1, e_2$ (i.e. whichever one lies on the same level set) in order to obtain a basis for the null space. Now, this is how we visualize the situation, but is it how we compute and explain? Or do we just do the algebra, which at this point is quite easy? If the algebra is easy and the geometry takes a while to explain and is not “necessary” for the computation, why explain it? This is a dilemma because once algebra is sufficiently well-developed it’s possible that the necessity of (completely equivalent) geometric thinking may become more and more rare; and algebra seems to be more “robust” in that you can explore things you can’t see very well. But then, when students learn the implicit function theorem, somehow I feel like having relied on that kind of foliation much more often would help understand its geometric content. Still, even if it’s in your head and very important, are you going to draw a foliation every time you do row operations? We know the geometry, know the algebra, but it would take a while to repeatedly explain how to rely on the geometry while executing computations.
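The "translate the remaining basis vectors along the level set" picture can be made concrete on a tiny example. This is a hedged sketch, not a general row-reduction routine: the function name `null_vector_by_translation`, the 0-based indexing, and the restriction to a two-row matrix whose first two columns are independent are all assumptions made to keep the example short.

```python
from fractions import Fraction

def matvec(T, v):
    """Multiply the matrix T (list of rows) by the vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in T]

def null_vector_by_translation(T, k):
    """Translate e_k along span(e_0, e_1) until it lands in the null space.

    Assumes T has two rows and that its first two columns are independent.
    """
    a, b = T[0][0], T[0][1]
    c, d = T[1][0], T[1][1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("first two columns must be independent")
    p, q = T[0][k], T[1][k]
    # Solve s*T(e_0) + t*T(e_1) = T(e_k) by Cramer's rule: the point
    # s*e_0 + t*e_1 lies on the same level set of T as e_k.
    s = Fraction(p * d - b * q, det)
    t = Fraction(a * q - c * p, det)
    n = [Fraction(0)] * len(T[0])
    n[0], n[1], n[k] = -s, -t, Fraction(1)
    return n  # e_k - s*e_0 - t*e_1

T = [[1, 0, 2],
     [0, 1, 3]]
n = null_vector_by_translation(T, 2)
image_of_n = matvec(T, n)
```

For this `T`, the third basis vector sits on the same level set as 2·e_0 + 3·e_1, so subtracting that combination gives the null vector (-2, -3, 1).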
Example 2: (Things that aren’t graphs)
Another problem geometric thinking faces is that modern math often seems to regard pictures as not being proofs, even if they are more convincing, so there is a bias regarding how to choose to spend class time. Let’s say you want to differentiate $x^3$. You can draw a cube, and a slightly larger cube, and then look at the difference of the cubes and subdivide it into a bunch of small regions, three larger slabs taking up most of the volume. Algebraically, this subdivision corresponds to multiplying out $(x+h)^3$; collecting the terms uses the commutativity, which corresponds to rotating the various identical pieces. It is no different to write this proof out algebraically; the difference is that the algebraic one is a “proof” but the geometric one is... not? Even if it’s more convincing. So it’s like the picture is only there for culture.
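Written out, the dictionary between the pieces of the cube picture and the terms of the expansion looks roughly like this (the labels "slabs", "rods" and "corner cube" are informal names chosen here for the pieces):

```latex
\[
  (x+h)^3 - x^3
  = \underbrace{3x^2 h}_{\text{three slabs } x \times x \times h}
  + \underbrace{3x h^2}_{\text{three rods } x \times h \times h}
  + \underbrace{h^3}_{\text{corner cube}},
\]
\[
  \frac{(x+h)^3 - x^3}{h} = 3x^2 + 3xh + h^2 \;\longrightarrow\; 3x^2
  \quad \text{as } h \to 0.
\]
```

Only the three slabs survive in the limit, which is the "three larger slabs taking up most of the volume" in the picture.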
If I have the lecture time to teach both, I will. But I would like to go farther than that. When I differentiate the cube root function, the same cube appears and I go through it again if I feel like it just to convince myself of the truth. Actually, every time I ever use the product rule I always picture the same rectangle with a slightly larger rectangle. My point of view is that one important “definition” of multiplication is in terms of areas, and that a linear function is not necessarily a graph. When you think of a linear function, you should also picture things like rectangles, sectors, similar triangles like the kind that come up when “proving” basic differentiation formulas. Differentiating the integral may seem like a magical trick, but it’s really just a continuation of the point of view that multiplication can look like an area/volume and differentiation means taking a small change in the input.
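The rectangle picture for the product rule amounts to the following identity, where $u$ and $v$ are the side lengths and $\Delta u$, $\Delta v$ their small changes (notation assumed here for illustration):

```latex
\[
  (u + \Delta u)(v + \Delta v) - uv
  = \underbrace{u\,\Delta v}_{\text{strip along one side}}
  + \underbrace{v\,\Delta u}_{\text{strip along the other}}
  + \underbrace{\Delta u\,\Delta v}_{\text{small corner}},
\]
\[
  \frac{\Delta(uv)}{\Delta x}
  = u\,\frac{\Delta v}{\Delta x} + v\,\frac{\Delta u}{\Delta x}
  + \frac{\Delta u}{\Delta x}\,\Delta v
  \;\longrightarrow\; u\,v' + u'\,v
  \quad \text{as } \Delta x \to 0.
\]
```

The corner rectangle is second order in the small change, so it drops out in the limit and the two strips give the familiar formula.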
Now, I’d like that point of view to be absorbed, but it’s not exactly in the textbook, or completely consistent with what students’ other teachers taught them. It’s hard to go against the idea that “you should think graphically” — if I ever think about the sine or tangent function now, it might be the area of a triangle, it might be the length of some vertical line segment, but it’s basically never using the graph, which contains basically no additional information. If I have more than one shot at it, I’ll try to explain both, but is it really of service to go around saying all the time why graphs aren’t the end-all-be-all?
Also, while I can express the pictures in my head one at a time, the fact that I repeatedly, repeatedly see these pictures is something that I feel is harder to express. After all, can’t you just do algebra and get through this stuff more quickly? The algebra is “easier” too; it takes up less space.