Follow the Science?

It is rare that one is able to publish a book as timely, given the confusion and hysteria surrounding the COVID-19 pandemic, as Stuart Ritchie’s Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth, which is about malfeasance in contemporary natural and social sciences. Ritchie, a psychology lecturer at King’s College London, has written a prescient and absorbing book regarding the replication crisis, the imperfect peer-review process, and scientific misdeeds. It manages to balance conversational prose with what is in many ways an academic literature review distilled for the educated public. While Science Fictions certainly contributes to a greater understanding of some of the worrisome issues surrounding contemporary science, it is not without its omissions and faults.

To be clear at the outset, Ritchie emphatically believes that science contains objective truth, discerned through “scrutiny, questioning, revision, refinement and consensus.” When Ritchie illustrates scientific malfeasance through a large number of case studies, it is clear that for each of these, at least one of the five aspects of the scientific modus operandi has been violated. This is not to say that science has not been amazingly successful in many respects: electricity, spacecraft, and vaccines, to name just a few. Indeed, science’s successes are, in part, what make it so susceptible to what Ritchie identifies as its four main problems: fraud, bias, negligence, and hype.

Distorting the Scientific Method

Scientific inquiry depends on the ability to continually retest hypotheses by replication. Yet a shocking number of scientific papers fail this test. How bad is the problem? Ritchie gives some distressing numbers in a myriad of fields. For example, in 2018 there was an attempt to replicate 21 social science papers that were published in the two most prestigious science journals, Nature and Science: the replication rate was 62 percent. Alas, the social sciences are not alone. A study from 2016 found that of 268 randomly sampled biomedical papers, including clinical trials, only one of them reported its full protocol. This means that a scientist could not even attempt to replicate 267 of these studies. Replication crisis, indeed.

Ritchie takes us on a tour of fraud, bias, negligence, and hype. Whilst discussing fraud, he points the reader to the website Retraction Watch. (As I write this, the banner blares, “The list of retracted COVID-19 papers is up to 33.”) There have been over 18,000 retractions in the scientific literature since the 1970s, and a number of these papers are, despite their retraction, still cited positively out of ignorance. Ritchie homes in on a number of case studies that illustrate the myriad scientific sins committed by such papers. We learn about the defrocked physician Andrew Wakefield, who made up a link between vaccines and autism in the respected peer-reviewed journal The Lancet for financial gain. There is the spreadsheet error that made it into Carmen Reinhart and Kenneth Rogoff’s peer-reviewed paper in American Economic Review. In the original paper, they stated that any country with a debt-to-GDP ratio above 90 percent should go through austerity during the aftermath of the 2008 financial crisis. As it turns out, once the error was fixed in the spreadsheet, the 90 percent threshold was no more. A peer-review crisis, indeed, in addition to the replication crisis!

The case of former Cornell Professor Brian Wansink is an exemplar of all four of Ritchie’s ailments that are affecting the sciences. Wansink’s troubles began with a self-aggrandizing blog post in the fall of 2016, in which he unintentionally admitted to asking one of his post-docs to “p-hack” what became known as the “pizza data” (it was from an Italian restaurant). But before we discuss what p-hacking is, we’ll discuss some of the fallout from this post, which is truly an example of pride coming before the fall.

Skeptics began to look at Wansink’s peer-reviewed works after the blog post and found them riddled with errors. For example, in the four papers that were published using the pizza data, there were no fewer than 150 errors. Similar errors were found in other papers. It is likely that these errors were the result of a combination of negligence (not paying attention to detail) and fraud (purposely massaging the data to look a certain way). Wansink was also an example of media hype, with a couple of best-selling books on food psychology. He also parlayed the media attention into influencing governmental nutrition policy. Michelle Obama’s much-maligned school lunch program was influenced by Wansink’s work. Republicans were not immune, either, as he served under George W. Bush in the USDA. (Perhaps this gives evidence for why governments should not have nutrition policies.)

Wansink’s negligence, fraud, and hype were enough to cause 18 of his papers to be retracted and for him to resign from Cornell. But his p-hacking was the worst offense. To discuss it, we first have to confront the widely misunderstood p-value (where p stands for probability) and its relation to bias in scientific research. Ritchie does an admirable job explaining the p-value as well as concomitant notions, such as effect size and sample size, in the most technical part of the book. I do wish he had elaborated more fully on where the p-value lives in a wider statistical context. (As an applied statistician, perhaps my bias is showing here.)

For the sake of this review, we can informally say that the p-value is a probability that measures surprise: is the observed data surprising, given an assumption about the nature of the population? Due to a tradition dating back over a century, a p-value < 0.05 is typically considered to be surprising, or “statistically significant.” Generally, it is only papers that have met this arbitrary threshold, which has no actual statistical justification, that get published in peer-reviewed journals. Having statistical significance set at p-value < 0.05 for all of the sciences ignores the diversity of the sciences and what counts as sufficient evidence in each field.
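
To make the idea of surprise concrete, here is a minimal Python sketch. The coin-flip setup, the helper name binomial_p_value, and the counts of 60 and 61 heads are my own illustrative assumptions, not examples taken from Ritchie’s book. The sketch computes an exact two-sided p-value: the probability, if the coin is fair, of seeing an outcome no more likely than the one observed.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a coin-flip experiment (illustrative helper)."""

    def pmf(k: int) -> float:
        # Probability of exactly k heads if the null hypothesis is true.
        return comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)

    observed = pmf(heads)
    # Add up every outcome that is no more likely than the observed one.
    # The small tolerance keeps equally likely outcomes in despite rounding.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))


p_60 = binomial_p_value(60, 100)
p_61 = binomial_p_value(61, 100)
print(f"60 heads in 100 flips: p = {p_60:.4f}")
print(f"61 heads in 100 flips: p = {p_61:.4f}")
```

Under these assumptions, 60 heads falls just short of the conventional cutoff while 61 heads clears it. A single extra head separates a "significant" result from a non-result, which is one way to see how arbitrary the threshold is.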

There are three types of publication biases that arise from the fixation on a p-value < 0.05. The first is that scientists are not inclined to publish work that has a p-value of 0.05 or more. We can call this the “Self-Censorship Bias.” This is harmful since a negative result is still a result that can add to the collective knowledge about a field of study. To use one of Wansink’s discredited dietary studies: knowing that putting an Elmo sticker on an apple does not make children more likely to pick it than a cookie is just as valid a piece of knowledge as the opposite, which the paper claimed. (I wonder if any of the peer-reviewers had children of their own, since any parent would be very skeptical of the claim that an apple would ever trump a cookie.)

Secondly, there is motivation to manipulate the data to give a p-value that is surprising enough. For example, in a meta-science study from 2010 that looked at over 2,500 papers from a myriad of scientific disciplines, the share of papers reporting a p-value < 0.05 ranged from 70.2 percent in space science (the lowest rate) to 91.5 percent in psychology/psychiatry (the highest rate). We can call the second type of publication bias the “Publish or Perish Bias.”

This is where p-hacking, the scientific crime that Wansink committed, comes in: a researcher p-hacks when, instead of starting with a hypothesis about the population, he mines the data until it gives a p-value less than five percent. This inverts the scientific process: rather than testing a theory, he finds a result in need of a theory. This is related to the tyranny of metrics, as Jerry Muller calls it, where a metric such as a p-value becomes the telos rather than the scientific theory. Also, even if there is a statistically significant result, this does not mean it is necessarily an important one. For example, there is a statistically significant difference in the ages of heterosexual couples in England, but the difference in ages for each couple is on average about two years. There is no practical difference in the ages of a 35-year-old husband and a 33-year-old wife, statistical significance notwithstanding.
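
A small simulation shows why mining the data is so effective at producing "findings." This is a sketch under my own assumptions, not a reconstruction of Wansink’s analysis: the function names, the 20 outcomes per study, the 20 subjects per group, and the simple z-test with a known variance of 1 are all hypothetical choices made for illustration. Every simulated study compares two groups drawn from the same distribution, so any significant result is a false positive.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # the conventional significance threshold


def two_sample_p_value(a, b):
    """Two-sided z-test for equal means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def hacked_study_finds_something(rng, outcomes=20, n=20):
    """Simulate one study with no real effect that tests many outcomes.

    Returns True if at least one outcome reaches p < ALPHA.
    """
    for _ in range(outcomes):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution
        if two_sample_p_value(group_a, group_b) < ALPHA:
            return True
    return False


rng = random.Random(2016)  # fixed seed so the run is repeatable
studies = 1000
false_positive_rate = (
    sum(hacked_study_finds_something(rng) for _ in range(studies)) / studies
)
# Chance that at least one of 20 independent null tests is "significant".
expected_rate = 1 - (1 - ALPHA) ** 20
print(f"simulated: {false_positive_rate:.2f}, theoretical: {expected_rate:.2f}")
```

In this sketch, roughly two out of three studies of pure noise turn up at least one publishable p-value, even though each individual test has only a five percent false positive rate. The researcher who reports only the winning outcome has found a result in need of a theory.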

There is a third type of publication bias: “The Black Box Bias.” Most statistical software and programming languages that scientists use are “black box” methods, meaning that the scientists enter data and the software magically produces statistics such as p-values. What happens between the input and the output is often ignored or misunderstood; hence, “black box” methods. Ritchie should have emphasized the danger of relying on the p-value with “black box” statistical software. He did mention it in passing, but it is truly a type of publication bias in its own right. While a scientist cannot fairly be expected to understand the nuances of all of the statistical models that he uses, he should be conversant in them. He cannot properly interpret a p-value in context if he does not understand the process that gave birth to it. This is akin to having the ingredients for a cake and then magically getting the baked cake after putting them into a special machine. While the so-called baker can tell us if the cake tastes good or not, he cannot tell us why this is the case.

Be Skeptical of the Experts

Ritchie concludes the book with pragmatic suggestions on how to fix these issues of fraud, bias, negligence, and hype in the sciences. For example, there are algorithms that can check papers for numeric errors and have already been deployed ex post facto on published papers. Ritchie believes they should be used during the peer-review process. Another suggestion, which the American Statistical Association has been championing, is to move away from a fixed p-value threshold.

While strongly suggesting that anyone interested in the issues presented in the book should read it, I do feel obliged to nitpick a few topics in addition to the aforementioned belief that statistics needed to be discussed more. While there are wonderfully extensive endnotes, there is no separate bibliography, which makes it difficult to keep track of the voluminous references. While prima facie it seems unfair to criticize a book for lacking one particular reference, I will make an exception: it is inexcusable that Ritchie does not include John Staddon’s Scientific Method, a recently published monograph that covers much of the same ground as Science Fictions.

While Ritchie touches on the philosophy of science and ethical issues with scientific malfeasance, he should have given a more unified effort in this regard. A suggestion that he does not give for “fixing science” is that, as part of the graduate school curriculum, scientists need to learn about the philosophy and ethics of science. Ritchie justifiably expresses skepticism about many scientific results, yet he intransigently defends the notion of climate change and hypocritically explains away the actions of the East Anglia climate scientists who actively manipulated the peer-review system to get their papers published.

In an era in which scientists are among the most trusted professionals in the United States, this book reminds us to be skeptical of the experts. As we have seen repeatedly with COVID-19, and as is illustrated by Science Fictions, scientists are human and are susceptible to the same sins as non-scientists. Great societal damage is done by scientists who lie (e.g., Wakefield and vaccines) or who attempt to direct social policy (e.g., keeping schools closed this fall due to COVID-19 when children who are infected have a 0.99997 survival rate). In Thomas Sowell’s seminal A Conflict of Visions, he presents the antinomy of the constrained and unconstrained visions. The scientist must resist the temptation of the unconstrained vision, where the scientist believes that he can overcome self-interest (whether it is manifested in fraud, bias, negligence, or hype) through reason. Further, a scientist who is an expert in one field of science is not an expert in another field, and is certainly not an expert in politics or religion. Rather, the scientist must hold the constrained vision and admit that he is self-interested and that there are limits on his expertise and knowledge. The peer-review process, properly executed, is a manifestation of the constrained vision.

Irrespective of any of its flaws or omissions, Ritchie’s Science Fictions has a good chance to change science for the better and for that, we all benefit from his work.