Why the h-index is bad as an “objective” measure of individual scientific productivity

Firstly, Goodhart’s law:

“When a measure becomes a target, it ceases to be a good measure.” ….

Tangentially related is Terry Tao’s advice on doing mathematics; I’ve found Tao to be a very reliable source of evidence-based common sense in areas he’s expert in (which, unfortunately, is not necessarily the case for most brilliant minds). I quote a paragraph from this article (this was where I first heard of the law above, by the way):

It is also worth noting that even one’s own personal benchmarks, such as the number of theorems and proofs from <standard reference text in your field> you have memorised, or how quickly one can solve qualifying exam problems, should also not be overemphasised in one’s personal study at the expense of actually learning the underlying mathematics, lest one fall prey to Goodhart’s law.  Such metrics can be useful as a rough assessment of your understanding of a subject, but they should not become the primary goal of one’s study.

I digress. Back to the h-index, which you can intuitively think of as someone’s “academic footprint”. Here’s Wikipedia:

The h-index is an index that attempts to measure both the productivity and citation impact of the published body of work of a scientist or scholar. The index is based on the set of the scientist’s most cited papers and the number of citations that they have received in other publications. ….

The index is based on the distribution of citations received by a given researcher’s publications. Hirsch writes:

A scientist has index h if h of his/her N_p papers have at least h citations each, and the other (N_p − h) papers have no more than h citations each.

In other words, a scholar with an index of h has published h papers each of which has been cited in other papers at least h times.[4] Thus, the h-index reflects both the number of publications and the number of citations per publication. The index is designed to improve upon simpler measures such as the total number of citations or publications. The index works properly only for comparing scientists working in the same field; citation conventions differ widely among different fields.
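To make the definition concrete, here’s a minimal sketch in Python (my own illustration, not part of the Wikipedia article) that computes the h-index from a list of per-paper citation counts:

```python
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts."""
    # Sort citation counts in descending order.
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        # The i-th most cited paper has c citations; h is the largest i
        # such that the top i papers each have at least i citations.
        if c >= i:
            h = i
        else:
            break
    return h

# Example: five papers with these citation counts give an h-index of 3,
# because three papers have at least 3 citations each, and the rest
# have no more than 3.
print(h_index([10, 8, 5, 2, 1]))  # -> 3
```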

Alright, now let’s bash the h-index. Brian Farley, a postdoc biochemist at UC Berkeley, writes on Quora in reply to the question “Is the h-index a proper index to evaluate the productivity and citation impact of a scientist?”:

Absolutely, unequivocally not. It’s an “objective” metric that can be gamed to hell and back that’s used to absolve oneself of the burden of actually evaluating a scientist’s work. Yes, it takes a lot of work to understand and evaluate what a scientist does, but if you’re about to support their ability to do it, you should probably understand why. I can understand how people (especially the dreaded administrators) can be seduced by the simplicity of a single “objective” number, but the simplicity is exactly why it fails to be meaningful or even useful. Scientists are much, much more than the papers which their name happens to be on — they train, they collaborate, they have informal discussions, they make a huge number of unpublished contributions to the community as a whole — but the h-index ignores all of that. It places an extremely inappropriate emphasis on publications and leads to all kinds of behavior that hurts the scientific community, up to and including academic fraud.

As tempting as it is to make simple, objective metrics for everything, I cannot imagine a single situation in which it is appropriate to dodge the personal responsibility associated with making a subjective decision about a scientist.
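Farley’s claim that the metric “can be gamed to hell and back” is easy to see numerically. Here’s a toy sketch (my own, reusing the h_index function above, and assuming self-citations count the same as any other citation):

```python
# Toy gaming example: pad each paper with 3 self-citations and the
# h-index climbs without any new science being done.
citations = [10, 8, 5, 2, 1]            # honest per-paper citation counts
padded = [c + 3 for c in citations]     # each paper gains 3 self-citations
print(h_index(citations), h_index(padded))  # -> 3 4
```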

I’ve collected material on this subject (h-index bashing) before, in a previous post on either Ron Maimon or Ed Witten; I don’t remember which.

Lastly, here’s an exchange I found in a Less Wrong comments thread that makes basically the same point:

“The fact that students who are motivated to get good scores in exams very often get better scores than students who are genuinely interested in the subject is probably also an application of Goodhart’s Law?”

“Partially; but a lot of what is being tested is actually skills correlated with being good in exams – working hard, memorisation, bending yourself to the rules, ability to learn skill sets even if you don’t love them, gaming the system – rather than interest in the subject.”

“But those skills don’t correlate with doing good science, or with good use of the subject of the exams in general, nearly so well, and they are easy to test in other ways.”
