On examining evidence for points of view, etc

Again, Scott Alexander. What’s new? He seems optimized for what counts as “interesting to me” in a way few other writers are: tons of interest overlap, high originality, and a capacity to absorb large amounts of self-directed research on a subject and distill it coherently in a manner that’s not just understandable but compulsively readable – I could go on, but I’ll just shut up and let his post do the talking.

As usual: go to Slate Star Codex if you happen to find this kind of stuff to your liking; what follows below is pure copy-pasting done for my own future benefit.


Beware The Man of One Study

Aquinas famously said: beware the man of one book. I would add: beware the man of one study.

For example, take medical research. Suppose a certain drug is weakly effective against a certain disease. After a few years, a bunch of different research groups have gotten their hands on it and done all sorts of different studies. In the best case scenario the average study will find the true result – that it’s weakly effective.

But there will also be random noise caused by inevitable variation and by some of the experiments being better quality than others. In the end, we might expect something looking kind of like a bell curve. The peak will be at “weakly effective”, but there will be a few studies to either side. Something like this:

We see that the peak of the curve is somewhere to the right of neutral – i.e. weakly effective – and that there are about 15 studies that find this correct result.

But there are also about 5 studies that find that the drug is very good, and 5 studies missing the sign entirely and finding that the drug is actively bad. There’s even 1 study finding that the drug is very bad, maybe seriously dangerous.

This is before we get into fraud or statistical malpractice. I’m saying this is what’s going to happen just by normal variation in experimental design. As we increase experimental rigor, the bell curve might get squashed horizontally, but there will still be a bell curve.
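To make the bell-curve picture concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the drug has a true effect of 0.2 on an arbitrary scale, that each study reports that effect plus Gaussian noise with a standard error of 0.15, and that the verbal categories map onto made-up cutoffs. None of these numbers come from real trials.

```python
import random
import statistics

# Hypothetical illustration: all numbers are assumptions chosen to mimic the
# essay's picture, not data from any real drug trial.
TRUE_EFFECT = 0.2      # "weakly effective" on an arbitrary effect-size scale
STUDY_SE = 0.15        # assumed standard error of a single study's estimate
N_STUDIES = 1000


def simulate_studies(n_studies, true_effect, se, seed=0):
    """Each study reports the true effect plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, se) for _ in range(n_studies)]


def classify(estimate):
    """Bucket an estimate into the essay's verbal categories (cutoffs assumed)."""
    if estimate < -0.2:
        return "very bad"
    if estimate < 0.0:
        return "bad"
    if estimate < 0.4:
        return "weakly effective"
    return "very good"


estimates = simulate_studies(N_STUDIES, TRUE_EFFECT, STUDY_SE)
counts = {}
for e in estimates:
    label = classify(e)
    counts[label] = counts.get(label, 0) + 1

print(f"mean estimate: {statistics.mean(estimates):.3f}")
for label in ("very bad", "bad", "weakly effective", "very good"):
    print(f"{label:>16}: {counts.get(label, 0)}")
```

Under these assumptions about 9% of simulated studies are expected to come out “very good” and a similar share “bad”, with a small fraction “very bad”, even though the model contains no fraud or malpractice, only noise.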

In practice it’s worse than this, because this is assuming everyone is investigating exactly the same question.

Suppose that the graph is titled “Effectiveness Of This Drug In Treating Bipolar Disorder”.

But maybe the drug is more effective in bipolar I than in bipolar II (Depakote, for example).

Or maybe the drug is very effective against bipolar mania, but much less effective against bipolar depression (Depakote again).

Or maybe the drug is a good acute antimanic agent, but very poor at maintenance treatment (let’s stick with Depakote).

If you have a graph titled “Effectiveness Of Depakote In Treating Bipolar Disorder” plotting studies from “Very Bad” to “Very Good”, and you stick all the studies (maintenance, manic, depressive, bipolar I, bipolar II) on the graph, then you’re going to end up running the gamut from “very bad” to “very good” even before you factor in noise and even before you factor in bias and poor experimental design.

So here’s why you should beware the man of one study.

If you go to your better class of alternative medicine websites, they don’t tell you “Studies are a logocentric phallocentric tool of Western medicine and the Big Pharma conspiracy.”

They tell you “medical science has proved that this drug is terrible, but ignorant doctors are pushing it on you anyway. Look, here’s a study by a reputable institution proving that the drug is not only ineffective, but harmful.”

And the study will exist, and the authors will be prestigious scientists, and it will probably be about as rigorous and well-done as any other study.

And then a lot of people raised on the idea that some things have Evidence and other things have No Evidence think holy s**t, they’re right!

On the other hand, your doctor isn’t going to a sketchy alternative medicine website. She’s examining the entire literature and extracting careful and well-informed conclusions from…

Haha, just kidding. She’s going to a luncheon at a really nice restaurant sponsored by a pharmaceutical company, which assures her that they would never take advantage of such an opportunity to shill their drug, they just want to raise awareness of the latest study. And the latest study shows that their drug is great! Super great! And your doctor nods along, because the authors of the study are prestigious scientists, and it’s about as rigorous and well-done as any other study.

But obviously the pharmaceutical company has selected one of the studies from the “very good” end of the bell curve.

And I called this “Beware The Man of One Study”, but it’s easy to see that in the little diagram there are like three or four studies showing that the drug is “very good”, so if your doctor is a little skeptical, the pharmaceutical company can say “You are right to be skeptical, one study doesn’t prove anything, but look – here’s another group that finds the same thing, here’s yet another group that finds the same thing, and here’s a replication that confirms both of them.”

And even though it looks like in our example the sketchy alternative medicine website only has one “very bad” study to go off of, they could easily supplement it with a bunch of merely “bad” studies. Or they could add all of those studies about slightly different things. Depakote is ineffective at treating bipolar depression. Depakote is ineffective at maintenance bipolar therapy. Depakote is ineffective at bipolar ii.

So just sum it up as “Smith et al 1987 found the drug ineffective, yet doctors continue to prescribe it anyway”. Even if you hunt down the original study (which no one does), Smith et al won’t say specifically “Do remember that this study is only looking at bipolar maintenance, which is a different topic from bipolar acute antimanic treatment, and we’re not saying anything about that.” It will just be titled something like “Depakote fails to separate from placebo in six month trial of 91 patients” and trust that the responsible professionals reading it are well aware of the difference between acute and maintenance treatments (hahahahaha).

So it’s not so much “beware the man of one study” as “beware the man of any number of studies less than a relatively complete and not-cherry-picked survey of the research”.
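The cherry-picking described above can be sketched the same way. This hypothetical Python example uses invented numbers (30 studies, true effect 0.2, per-study standard error 0.15), draws every result from the same honest distribution, and then lets each side quote only its four favorite studies.

```python
import random
import statistics

# Hypothetical numbers: a weakly effective drug (true effect 0.2) studied
# 30 times, each study with an assumed standard error of 0.15.
TRUE_EFFECT = 0.2
rng = random.Random(1)
studies = [rng.gauss(TRUE_EFFECT, 0.15) for _ in range(30)]

ranked = sorted(studies)
pharma_pick = ranked[-4:]     # the four most favorable results
altmed_pick = ranked[:4]      # the four least favorable results

print(f"all 30 studies:  mean {statistics.mean(studies):+.2f}")
print(f"pharma luncheon: mean {statistics.mean(pharma_pick):+.2f}")
print(f"alt-med website: mean {statistics.mean(altmed_pick):+.2f}")
```

Because the two “picks” are just the top and bottom of one sorted list, the favorable set always averages above the full literature and the unfavorable set below it; only the complete survey tracks the true effect.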

II.

I think medical science is still pretty healthy, and that the consensus of doctors and researchers is more-or-less right on most controversial medical issues.

(it’s the uncontroversial ones you have to worry about)

Politics doesn’t have this protection.

Like, take the minimum wage question (please). We all know about the Card and Krueger study in New Jersey that found no evidence that high minimum wages hurt the economy. We probably also know the counterclaims that it was completely debunked as despicable dishonest statistical malpractice. Maybe some of us know Card and Krueger wrote a pretty convincing rebuttal of those claims. Or that a bunch of large and methodologically advanced studies have come out since then, some finding no effect like Dube, others finding strong effects like Rubinstein and Wither. These are just examples; there are at least dozens and probably hundreds of studies on both sides.

But we can solve this with meta-analyses and systematic reviews, right?

Depends which one you want. Do you go with this meta-analysis of fourteen studies that shows that any presumed negative effect of high minimum wages is likely publication bias? With this meta-analysis of sixty-four studies that finds the same thing and discovers no effect of minimum wage after correcting for the problem? Or how about this meta-analysis of fifty-five countries that does find effects in most of them? Maybe you prefer this systematic review of a hundred or so studies that finds strong and consistent effects?

Can we trust news sources, think tanks, econblogs, and other institutions to sum up the state of the evidence?

CNN claims that 85% of credible studies have shown the minimum wage causes job loss. But raisetheminimumwage.com declares that “two decades of rigorous economic research have found that raising the minimum wage does not result in job loss…researchers and businesses alike agree today that the weight of the evidence shows no reduction in employment resulting from minimum wage increases.” Modeled Behavior says “the majority of the new minimum wage research supports the hypothesis that the minimum wage increases unemployment.” The Center for Budget and Policy Priorities says “The common claim that raising the minimum wage reduces employment for low-wage workers is one of the most extensively studied issues in empirical economics. The weight of the evidence is that such impacts are small to none.”

Okay, fine. What about economists? They seem like experts. What do they think?

Well, five hundred economists signed a letter to policy makers saying that the science of economics shows increasing the minimum wage would be a bad idea. That sounds like a promising consensus…

…except that six hundred economists signed a letter to policy makers saying that the science of economics shows increasing the minimum wage would be a good idea. (h/t Greg Mankiw)

Fine then. Let’s do a formal survey of economists. Now what?

raisetheminimumwage.com, an unbiased source if ever there was one, confidently tells us that “indicative is a 2013 survey by the University of Chicago’s Booth School of Business in which leading economists agreed by a nearly 4 to 1 margin that the benefits of raising and indexing the minimum wage outweigh the costs.”

But the Employment Policies Institute, which sounds like it’s trying way too hard to sound like an unbiased source, tells us that “Over 73 percent of AEA labor economists believe that a significant increase will lead to employment losses and 68 percent think these employment losses fall disproportionately on the least skilled. Only 6 percent feel that minimum wage hikes are an efficient way to alleviate poverty.”

So the whole thing is fiendishly complicated. But unless you look very very hard, you will never know that.

If you are a conservative, what you will find on the sites you trust will be something like this:

Economic theory has always shown that minimum wage increases decrease employment, but the Left has never been willing to accept this basic fact. In 1992, they trumpeted a single study by Card and Krueger that purported to show no negative effects from a minimum wage increase. This study was immediately debunked and found to be based on statistical malpractice and “massaging the numbers”. Since then, dozens of studies have come out confirming what we knew all along – that a high minimum wage is economic suicide. Systematic reviews and meta-analyses (Neumark 2006, Boockman 2010) consistently show that an overwhelming majority of the research agrees on this fact – as do 73% of economists. That’s why five hundred top economists recently signed a letter urging policy makers not to buy into discredited liberal minimum wage theories. Instead of listening to starry-eyed liberal woo, listen to the empirical evidence and an overwhelming majority of economists and oppose a raise in the minimum wage.

And if you are a leftist, what you will find on the sites you trust will be something like this:

People used to believe that the minimum wage decreased employment. But Card and Krueger’s famous 1992 study exploded that conventional wisdom. Since then, the results have been replicated over fifty times, and further meta-analyses (Card and Krueger 1995, Dube 2010) have found no evidence of any effect. Leading economists agree by a 4 to 1 margin that the benefits of raising the minimum wage outweigh the costs, and that’s why more than 600 of them have signed a petition telling the government to do exactly that. Instead of listening to conservative scare tactics based on long-debunked theories, listen to the empirical evidence and the overwhelming majority of economists and support a raise in the minimum wage.

Go ahead. Google the issue and see what stuff comes up. If it doesn’t quite match what I said above, it’s usually because they can’t even muster that level of scholarship. Half the sites just cite Card and Krueger and call it a day!

These sites with their long lists of studies and experts are super convincing. And half of them are wrong.

At some point in their education, most smart people usually learn not to credit arguments from authority. If someone says “Believe me about the minimum wage because I seem like a trustworthy guy,” most of them will have at least one neuron in their head that says “I should ask for some evidence”. If they’re really smart, they’ll use the magic words “peer-reviewed experimental studies.”

But I worry that most smart people have not learned that a list of dozens of studies, several meta-analyses, hundreds of experts, and expert surveys showing almost all academics support your thesis – can still be bullshit.

Which is too bad, because that’s exactly what people who want to bamboozle an educated audience are going to use.

III.

I do not want to preach radical skepticism.

For example, on the minimum wage issue, I notice only one side has presented a funnel plot. A funnel plot is usually used to investigate publication bias, but it has another use as well – it’s pretty much an exact presentation of the “bell curve” we talked about above.

This is more of a needle curve than a bell curve, but the point still stands. We see it’s centered around 0, which means there’s some evidence that’s the real signal among all this noise. The bell skews more to the left than to the right, which means more studies have found negative effects of the minimum wage than positive effects. But since the bell curve is asymmetrical, we interpret that as probably publication bias. So all in all, I think there’s at least some evidence that the liberals are right on this one.

Unless, of course, someone has realized that I’ve wised up to the studies and meta-analyses and expert surveys, and figured out a way to hack funnel plots, which I am totally not ruling out.

(okay, I kind of want to preach radical skepticism)

Also, I should probably mention that it’s much more complicated than one side being right, and that the minimum wage probably works differently depending on what industry you’re talking about, whether it’s state wage or federal wage, whether it’s a recession or a boom, whether we’re talking about increasing from $5 to $6 or from $20 to $30, etc, etc, etc. There are eleven studies on that plot showing an effect even worse than -5, and very possibly they are all accurate for whatever subproblem they have chosen to study – much like the example with Depakote where it might be an effective antimanic but a terrible antidepressant.

(radical skepticism actually sounds a lot better than figuring this all out).
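The funnel-plot reasoning in this section can also be simulated. The sketch below is illustrative only: it assumes a true effect of exactly zero, study standard errors between 0.05 and 0.5, and a hypothetical publication filter that keeps every negative estimate but only 20% of positive ones. It then applies Egger’s regression test, one standard way to quantify funnel-plot asymmetry (the essay itself only eyeballs the plot), by regressing each study’s standardized effect on its precision.

```python
import random
import statistics

# Assumptions for illustration only: the true effect is exactly zero, study
# standard errors range from 0.05 (large study) to 0.5 (small study), and
# journals publish every negative estimate but only 20% of positive ones.
TRUE_EFFECT = 0.0
rng = random.Random(42)

published = []  # (estimate, standard_error) pairs that "made it into print"
for _ in range(2000):
    se = rng.uniform(0.05, 0.5)
    estimate = rng.gauss(TRUE_EFFECT, se)
    if estimate < 0 or rng.random() < 0.2:
        published.append((estimate, se))


def ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Egger's regression: standardized effect (estimate / se) against precision
# (1 / se). A symmetric funnel gives an intercept near 0; the slope estimates
# the effect for an infinitely precise study.
precision = [1 / se for _, se in published]
z_scores = [est / se for est, se in published]
intercept, slope = ols(precision, z_scores)

naive_mean = statistics.mean([est for est, _ in published])
print(f"published studies:       {len(published)}")
print(f"naive mean of estimates: {naive_mean:+.3f}")
print(f"Egger intercept:         {intercept:+.3f}  (asymmetry)")
print(f"Egger slope:             {slope:+.3f}  (bias-adjusted effect)")
```

Under these assumptions the naive average of published estimates comes out negative and the Egger intercept sits clearly below zero (the left skew), while the slope stays close to the true value of zero.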

IV.

But the question remains: what happens when (like in most cases) you don’t have a funnel plot?

I don’t have a good positive answer. I do have several good negative answers.

Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.

Do not trust websites which are obviously biased (e.g. Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.

Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence.


Debunked and Well-Refuted

I.

As usual, I was insufficiently pessimistic.

I infer this from The Federalist’s article on campus rape:

A new report on sexual assault released today by the U.S. Department of Justice (DOJ) officially puts to bed the bogus statistic that one in five women on college campuses are victims of sexual assault. In fact, non-students are 25 percent more likely to be victims of sexual assault than students, according to the data. And the real number of assault victims is several orders of magnitude lower than one-in-five.

The article compares the older Campus Sexual Assault Survey (which found 14-20% of women were raped since entering college) to the just-released National Crime Victimization Survey (which found that 0.6% of female college students are raped per year). They write “Instead of 1 in 5, the real number is 0.03 in 5.”

So the first thing I will mock The Federalist for doing is directly comparing per-year sexual assault rates to per-college-career sexual assault rates, whereas obviously these are very different things. You can’t quite just divide the latter by four to get the former, but that’s going to work a heck of a lot better than not doing it, so let’s estimate the real discrepancy as more like 0.5% per year versus 5% per year.
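The per-year versus per-career conversion can be made explicit. This is a minimal sketch, assuming a constant and independent annual risk over a four-year college career and counting only whether someone is ever victimized; both are simplifying assumptions, not properties of either survey.

```python
# Assumption: a constant, independent annual risk p over a 4-year college
# career, counting only whether someone is ever victimized:
#     career = 1 - (1 - p) ** years
# Real surveys violate both assumptions (repeat victimization, risk that
# varies by year), which is one reason "divide by four" is only approximate.
YEARS = 4


def career_rate(annual, years=YEARS):
    """Probability of at least one victimization over `years` years."""
    return 1 - (1 - annual) ** years


def annual_rate(career, years=YEARS):
    """Invert career_rate: the constant annual risk that yields `career`."""
    return 1 - (1 - career) ** (1 / years)


for career in (0.14, 0.20):
    print(f"career {career:.0%}: exact annual {annual_rate(career):.2%}, "
          f"divide-by-four {career / YEARS:.2%}")
print(f"annual 0.6% over {YEARS} years: {career_rate(0.006):.2%}")
```

Under this model a 20% career rate corresponds to about 5.4% per year, close to the 5% that dividing by four gives, while 0.6% per year compounds to only about 2.4% over four years.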

But I can’t get too mad at them yet, because that’s still a pretty big discrepancy.

However, faced with this discrepancy a reasonable person might say “Hmm, we have two different studies that say two different things. I wonder what’s going on here and which study we should believe?”

The Federalist staff said “Ha! There’s an old study with findings we didn’t like, but now there’s a new study with different findings we do like. So the old study is debunked!”

II.

My last essay, Beware The Man Of One Study, noted that one thing partisans do to justify their bias is selectively acknowledge studies from only one side of a complicated literature.

The reason it was insufficiently pessimistic is that there are also people like the Federalist staff, who acknowledge the existence of opposing studies, but only with the adjective “debunked” in front of them. By “debunked” they usually mean one of two things:

1. Someone on my side published a study later that found something else
2. Someone on my side accused it of having methodological flaws

Since the Federalist has so amply demonstrated the first failure mode, let me say a little more about the second. Did you know that anyone with a keyboard can just type up any of the following things?

– “That study is a piece of garbage that’s not worth the paper it’s written on.”
– “People in the know dismissed that study years ago.”
– “Nobody in the field takes that study seriously.”
– “That study uses methods that are laughable to anybody who knows statistics.”
– “All the other research that has come out since discredits that study.”

They can say these things whether they are true or not. I’m kind of harping on this point, but it’s because it’s something I didn’t realize until much later than I should have.

There are many “questions” that are pretty much settled – evolution, global warming, homeopathy. But taking these as representative closes your mind and gives you a skewed picture of academia. On many issues, academics are just as divided as anyone else, and their arguments can be just as acrimonious as anyone else’s. The arguments usually take the form of one side publishing a study, the other side ripping the study apart and publishing their own study which they say is better, and the first side ripping the second study apart and arguing that their study was better all along.

Every study has flaws. No study has perfect methodology. If you like a study, you can say that it did the best it could on a difficult research area and has improved upon even-worse predecessor studies. If you don’t like a study, you can say “LOOK AT THESE FLAWS THESE PEOPLE ARE IDIOTS THE CONCLUSION IS COMPLETELY INVALID”. All you need to do is make enough isolated demands for rigor against anything you disagree with.

And so if the first level of confirmation bias is believing every study that supports your views, the second layer of confirmation bias is believing every supposed refutation that supports your views.

See for example this recent Xenosystems post about a Twitterer claiming The Bell Curve has been “well-refuted”. There are definitely a lot of people who have written books, articles, and papers arguing that The Bell Curve is wrong, often in very strong terms. There are also a lot of people who have written books, articles, and papers saying that the first set of books, articles, and papers are wrong and The Bell Curve is right, also in very strong terms. To say that the first set is a “refutation” or “debunking” is as basic a mistake as saying that the new rape study is a “refutation” or “debunking” of the earlier rape study.

(albeit a mistake likely to be made by exactly the opposite people)

There are certainly things that have been “well-refuted” and “debunked”. Andrew Wakefield’s study purporting to prove that vaccines cause autism is a pretty good example. But you will notice that it had multiple failed replications, journals published reports showing he falsified data, the study’s co-authors retracted their support, the journal it was published in retracted it and issued an apology, the General Medical Council convicted Wakefield of sixteen counts of misconduct, and Wakefield was stripped of his medical license and barred from practicing medicine ever again in the UK. The British Medical Journal, one of the best-respected medical journals in the world, published an editorial concluding:

Clear evidence of falsification of data should now close the door on this damaging vaccine scare … Who perpetrated this fraud? There is no doubt that it was Wakefield. Is it possible that he was wrong, but not dishonest: that he was so incompetent that he was unable to fairly describe the project, or to report even one of the 12 children’s cases accurately? No.

Meanwhile, The Bell Curve was lambasted in the popular press and by many academics. But it also got fifty of the top researchers in its field to sign a consensus statement saying it was pretty much right about everything and the people attacking it were biased and confused. Three years later, they re-issued their statement saying nothing had changed and more recent findings had only confirmed their opinion. The American Psychological Association launched a task force to settle the issue which stopped short of complete agreement but which given the circumstances was pretty darned supportive. There are certainly a lot of smart people with very strong negative opinions, but each one is still usually met by an equally ardent and credentialed proponent.

One of these two things has been “well-refuted”. The other has been “argued against”.

III.

I saw this same dynamic at work the other day, looking through the minimum wage literature.

The primordial titanomachy of the minimum wage literature goes like this. In 1994, two guys named Card and Krueger published a study showing the minimum wage had if anything positive effects on New Jersey restaurants, convincing many people that minimum wages were good. In 1996, two guys named Neumark and Wascher reanalyzed the New Jersey data using a different source and found that it showed the minimum wage had very bad effects on New Jersey restaurants. In 2000, Card and Krueger responded, saying that their analysis was better than Neumark and Wascher’s re-analysis, and also they had done a re-analysis of their own which confirmed their original position.

Let’s see how conservative sites present this picture:

“The support for this assertion is the oft-cited 1994 study by Card and Krueger showing a positive correlation between an increased minimum wage and employment in New Jersey. Many others have thoroughly debunked this study.” (source)

“I was under the impression that the original study done by Card and Krueger had been thoroughly debunked by Michigan State University economist David Neumark and William Wascher” (source)

“The study … by Card and Krueger has been debunked by several different people several different times. When other researchers re-evaluated the study, they found that data collected using those records ‘lead to the opposite conclusion from that reached by’ Card and Krueger.” (source)

“It was only a short time before the fantastic Card-Krueger findings were challenged and debunked by several subsequent studies…in 1995, economists David Neumark and David Wascher used actual payroll records (instead of survey data used by Card and Krueger) and published their results in an NBER paper with an amazing finding: Demand curves for unskilled labor really do slope downward, confirming 200 years of economic theory and mountains of empirical evidence” (source)

And now let’s look at how lefty sites present this picture:

“…a long-debunked paper [by Neumark and Wascher]” (source)

“Note that your Mises heroes, Neumark and Wascher are roundly debunked.” (source)

“Neumark’s living wage and minimum wage research have been found to be seriously flawed…based on faulty methods which when corrected refute his conclusion.” – (source)

“…Neumark and Wascher, a study which Elizabeth Warren debunked in a Senate hearing” (source)

So if you’re conservative, Neumark and Wascher debunked Card and Krueger. But if you’re liberal, Card and Krueger debunked Neumark and Wascher.

Both sides are no doubt very pleased with themselves. They’re not men of one study. They look at all of the research – except of course the studies that have been “debunked” or “well-refuted”. Why would you waste your time with those?

IV.

Once again, I’m not preaching radical skepticism.

First of all, some studies are super-debunked. Wakefield is a good example.

Second of all, some studies that don’t quite meet Wakefield-level of awfulness are indeed really bad and need refuting. I don’t think this is beyond the intellectual capacities of most people. I think in many cases it’s easy to understand why a study is wrong, you should try to do that, and once you do it you can safely discount the results of the study.

I’m not against pointing out when you disagree with studies or think they’re flawed. I’d be a giant hypocrite if I was.

But “debunked” and “refuted” aren’t saying you disagree with a study. They’re making arguments from authority. They’re saying “the authority of the scientific community has come together and said this is a piece of crap that doesn’t count”.

And that’s fine if that’s actually happened. But you had better make sure that you’re calling upon an ex cathedra statement by the community itself, and not a single guy with an axe to grind. Or one side of a complicated and interminable debate where both sides have about equal credentials and sway.

If you can’t do that, you say “I think that my side of the academic debate is in the right, and here’s why,” not “your side has been debunked”.

Otherwise you’re going to end up like the minimum wage debaters, where both sides claim to have debunked the other. Or like that woman on Twitter, who calls a common position backed by leading researchers “well-refuted”. Or like the Federalist article that says a study has been “put to bed” as “bogus” just because another study said something different.

I think this is part of my reply to the claim that empiricism is so great that no one needs rationality.

A naive empiricist who swears off critical thinking because they can just “follow the evidence” has no contingency plan for when the evidence gets confusing. Their only recourse is to deny that the evidence is confusing, to assert that one side or the other has been “debunked”. Since they’ve already made a principled decision not to study confirmation bias, chances are it’s going to be whichever side they don’t like that’s “already been debunked”. And by “debunked” they mean “a scientist on my side said it was wrong, so now I am relieved from the burden of thinking about it.”

On the original post, I wrote:

Life is made up of limited, confusing, contradictory, and maliciously doctored facts. Anyone who says otherwise is either sticking to such incredibly easily solved problems that they never encounter anything outside their comfort level, or so closed-minded that they shut out any evidence that challenges their beliefs.

In the absence of any actual debunking more damning than a counterargument, “that’s been debunked” is the way “shuts out any evidence that challenges their beliefs” feels from the inside.

V.

Somebody’s going to want to know what’s up with the original rape studies. The answer is that a small part of the discrepancy is response bias on the CSAS, but most of it is that the two surveys encourage respondents to define “sexual assault” in very different ways. Vox has an excellent article on this which for once I 100% endorse.

In other words, both are valid, both come together to form a more nuanced picture of campus violence, and neither one “debunks” the other. How about that?
