Follow the Science?

It is rare that one is able to publish a book as timely, given the confusion and hysteria surrounding the COVID-19 pandemic, as Stuart Ritchie’s Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth, which is about malfeasance in contemporary natural and social sciences. Ritchie, a psychology lecturer at King’s College London, has written a prescient and absorbing book regarding the replication crisis, the imperfect peer-review process, and scientific misdeeds. It manages to balance conversational prose with what is in many ways an academic literature review distilled for the educated public. While Science Fictions certainly contributes to a greater understanding of some of the worrisome issues surrounding contemporary science, it is not without its omissions and faults.

To be clear at the outset, Ritchie emphatically believes that science contains objective truth, discerned through “scrutiny, questioning, revision, refinement and consensus.” When Ritchie illustrates scientific malfeasance through a large number of case studies, it is clear that for each of these, at least one of those five aspects of the scientific modus operandi has been violated. This is not to say that science has not been amazingly successful in many respects—electricity, spacecraft, and vaccines, to name just a few. Indeed, science’s successes are, in part, what make it so susceptible to what Ritchie identifies as its four main problems: fraud, bias, negligence, and hype.

Distorting the Scientific Method

Scientific inquiry depends on the ability to continually retest hypotheses by replication. Yet a shocking number of scientific papers fail this test. How bad is the problem? Ritchie gives some distressing numbers in a myriad of fields. For example, in 2018 there was an attempt to replicate 21 social science papers that were published in the two most prestigious science journals, Nature and Science: the replication rate was 62 percent. Alas, the social sciences are not alone. A study from 2016 found that of 268 randomly sampled biomedical papers, including clinical trials, only one of them reported its full protocol. This means that a scientist could not even attempt to replicate 267 of these studies. Replication crisis, indeed.

Ritchie takes us on a tour of fraud, bias, negligence, and hype. Whilst discussing fraud, he points the reader to the website Retraction Watch. (As I write this, the banner blares, “The list of retracted COVID-19 papers is up to 33.”) There have been over 18,000 retractions in the scientific literature since the 1970s, and a number of these papers are, despite their retraction, still cited approvingly out of ignorance. Ritchie homes in on a number of case studies that illustrate the myriad scientific sins committed by such papers. We learn about the defrocked physician Andrew Wakefield, who, for financial gain, fabricated a link between vaccines and autism in the respected peer-reviewed journal The Lancet. There is the spreadsheet error that made it into Carmen Reinhart and Kenneth Rogoff’s peer-reviewed paper in the American Economic Review, in which they argued that any country with a debt-to-GDP ratio above 90 percent should go through austerity in the aftermath of the 2008 financial crisis. As it turns out, once the spreadsheet error was fixed, the 90 percent threshold was no more. A peer-review crisis, indeed, in addition to the replication crisis!

The case of former Cornell professor Brian Wansink is an exemplar of all four of Ritchie’s ailments afflicting the sciences. Wansink’s troubles began with a self-aggrandizing blog post in the fall of 2016 in which he unintentionally admitted to asking one of his post-docs to “p-hack” what became known as the “pizza data” (it was collected at an Italian restaurant). Before we discuss what p-hacking is, we’ll cover some of the fallout from this post, which is truly an example of pride coming before the fall.

Skeptics began to look at Wansink’s peer-reviewed works after the blog post and found them riddled with errors. For example, in the four papers published using the pizza data, there were no fewer than 150 errors. Similar errors were found in other papers. These errors were likely the result of a combination of negligence (not paying attention to detail) and fraud (purposely massaging the data to look a certain way). Wansink was also an example of media hype, with a couple of best-selling books on food psychology, and he parlayed the media attention into influencing governmental nutrition policy. Michelle Obama’s much-maligned school lunch program was influenced by his work. Republicans were not immune, either, as he served in the George W. Bush administration at the USDA. (Perhaps this gives evidence for why governments should not have nutrition policies.)

Wansink’s negligence, fraud, and hype were enough to cause 18 of his papers to be retracted and for him to resign from Cornell. But his p-hacking was the worst offense. To discuss it, we first have to confront the widely misunderstood p-value (where p stands for probability) and its relation to bias in scientific research. Ritchie does an admirable job explaining the p-value as well as concomitant notions, such as effect size and sample size, in the most technical part of the book. I do wish he had elaborated more fully on where the p-value lives in a wider statistical context. (Though as an applied statistician, perhaps my bias is showing here.)

The scientist must resist the temptation of the unconstrained vision, where he believes that he can overcome self-interest (whether it is manifested in fraud, bias, negligence, or hype) through reason.

For the sake of this review, we can informally say that the p-value is a probability that measures surprise: is the observed data surprising, given an assumption about the nature of the population? Due to a tradition dating back over a century, a p-value < 0.05 is typically considered surprising, or “statistically significant.” Generally, it is only papers that have met this arbitrary threshold, one with no actual statistical justification, that get published in peer-reviewed journals. Setting statistical significance at p < 0.05 for all of the sciences ignores the diversity of the sciences and what counts as sufficient evidence in each field.
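To make the notion of “measuring surprise” concrete, here is a minimal sketch in Python (my own illustration, not from the book) that computes an exact two-sided p-value for the simplest possible experiment: flipping a coin we hypothesize is fair.

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for the null hypothesis of a fair coin:
    the probability, if the coin really is fair, of a head count at
    least as far from the expected value as the one observed."""
    expected = flips / 2
    gap = abs(heads - expected)
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= gap)
    return extreme / 2**flips

# 60 heads in 100 flips of a supposedly fair coin:
print(round(binomial_p_value(60, 100), 4))  # 0.0569 -- surprising-ish, yet not "significant"
```

Note how close this sits to the 0.05 line: the very same data would be publishable at a 0.06 threshold and unpublishable at 0.05, which is the arbitrariness the tradition papers over.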

There are three types of publication bias that arise from the fixation on a p-value < 0.05. The first is that scientists are not inclined to publish work with a p-value of 0.05 or more. We can call this the “Self-Censorship Bias.” This is harmful, since a negative result is still a result that can add to the collective knowledge of a field. To use one of Wansink’s discredited dietary studies: knowing that putting an Elmo sticker on an apple does not make children more likely to pick it over a cookie is just as valid a piece of knowledge as the opposite claim, which the paper made. (I wonder if any of the peer reviewers had children of their own, since any parent would be very skeptical of the claim that an apple would ever trump a cookie.)

Secondly, there is motivation to manipulate the data until it gives a p-value that is surprising enough. For example, a meta-science study from 2010 that looked at over 2,500 papers from a myriad of scientific disciplines found that 70.2 percent of space-science papers reported a p-value < 0.05 (the lowest rate), compared with 91.5 percent of psychology/psychiatry papers (the highest rate). We can call this second type of publication bias the “Publish or Perish Bias.”

This is where p-hacking, the scientific crime Wansink committed, comes in: a researcher p-hacks when, instead of starting with a hypothesis about the population, he mines the data until it yields a p-value below five percent. This inverts the scientific process: rather than testing a theory, he finds a result in need of a theory. It is related to the tyranny of metrics, as Jerry Muller calls it, where a metric such as a p-value becomes the telos rather than the scientific theory. Moreover, even a statistically significant result is not necessarily an important one. For example, there is a statistically significant difference in the ages of heterosexual couples in England, but the difference for each couple is on average about two years. There is no practical difference between the ages of a 35-year-old husband and a 33-year-old wife, statistical significance notwithstanding.
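As an illustration of why mining the data is so effective (mine, not Ritchie’s), the following Python sketch generates 20 “outcome variables” that are, by construction, pure noise, then does what the p-hacker does: scans them all and reports only the smallest p-value.

```python
import random
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under a fair-coin null."""
    expected = flips / 2
    gap = abs(heads - expected)
    return sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - expected) >= gap) / 2**flips

random.seed(1)
flips = 100

# Twenty "outcome variables" for which the null hypothesis is true:
# every data set here is nothing but coin-flip noise.
p_values = [binomial_p_value(sum(random.random() < 0.5 for _ in range(flips)), flips)
            for _ in range(20)]

# The honest report is all twenty p-values; the p-hacked report is the minimum.
print(f"smallest of 20 null p-values: {min(p_values):.4f}")
# If each test had exactly a 5% false-positive rate, at least one of 20
# would dip below 0.05 about 1 - 0.95**20, or roughly 64%, of the time.
```

In other words, given enough looks at pure noise, a “significant” finding is more likely than not, which is why a hypothesis chosen after seeing the data proves nothing.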

There is a third type of publication bias: the “Black Box Bias.” Most statistical software and programming languages that scientists use are “black boxes”: the scientist enters data, and the software magically produces statistics like p-values. What happens between the input and the output is often ignored or misunderstood; hence, “black box” methods. Ritchie should have emphasized the danger of relying on the p-value produced by “black box” statistical software. He mentions it in passing, but it is truly a type of publication bias in its own right. While a scientist cannot fairly be expected to understand the nuances of all the statistical models he uses, he should be conversant in them. He cannot properly interpret a p-value in context if he does not understand the process that gave birth to it. This is akin to having the ingredients for a cake and magically getting the baked cake after putting them into a special machine. While the so-called baker can tell us whether the cake tastes good, he cannot tell us why.
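To show what such a box is hiding, here is a short sketch (again my own, not from the book) that computes a p-value “by hand” for one simple case, a large-sample test of whether a coin is fair, so that every step between input and output is visible.

```python
from math import sqrt, erfc

heads, flips = 60, 100

# Step 1: the test statistic -- how many standard errors the observed
# proportion of heads sits from the null value of 0.5.
p_hat = heads / flips
se = sqrt(0.5 * 0.5 / flips)   # standard error of the proportion under the null
z = (p_hat - 0.5) / se

# Step 2: the p-value -- the chance, under the null, of a statistic at
# least this extreme in either direction (normal approximation).
p = erfc(abs(z) / sqrt(2))

print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

A scientist who has walked through even one such calculation knows that the 0.0455 rests on an approximation (the exact binomial calculation for the same data gives about 0.057, on the other side of the 0.05 line), which is precisely the kind of context the black box conceals.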

Be Skeptical of the Experts

Ritchie concludes the book with pragmatic suggestions on how to fix these issues of fraud, bias, negligence, and hype in the sciences. For example, there are algorithms that can check papers for numeric errors and have already been deployed ex post facto on published papers. Ritchie believes they should be used during the peer-review process. Another suggestion, which the American Statistical Association has been championing, is to move away from a fixed p-value.

While strongly suggesting that anyone interested in the issues presented in the book should read it, I feel obliged to nitpick a few topics, in addition to my aforementioned wish that the statistics had been discussed more fully. While there are wonderfully extensive endnotes, there is no separate bibliography, which makes it difficult to keep track of the voluminous references. And while prima facie it seems unfair to criticize a book for omitting one particular reference, I will make an exception: it is inexcusable that Ritchie does not include John Staddon’s Scientific Method, a recently published monograph that covers much of the same ground as Science Fictions.

While Ritchie touches on the philosophy of science and the ethical issues raised by scientific malfeasance, he should have treated them in a more unified way. One suggestion he does not make for “fixing science” is that scientists, as part of the graduate school curriculum, should learn about the philosophy and ethics of science. Ritchie justifiably expresses skepticism about many scientific results, yet he intransigently defends the notion of climate change and hypocritically explains away the actions of the East Anglia climate scientists who actively manipulated the peer-review system to get their papers published.

In an era in which scientists are among the most trusted professionals in the United States, this book reminds us to be skeptical of the experts. As we have seen repeatedly with COVID-19, and as is illustrated by Science Fictions, scientists are human and are susceptible to the same sins as non-scientists. Great societal damage is done by scientists who lie (e.g., Wakefield and vaccines) or who attempt to direct social policy (e.g., keeping schools closed this fall due to COVID-19 when infected children have a 0.99997 survival rate). In his seminal A Conflict of Visions, Thomas Sowell presents the antinomy of the constrained and unconstrained visions. The scientist must resist the temptation of the unconstrained vision, in which he believes that he can overcome self-interest (whether it is manifested in fraud, bias, negligence, or hype) through reason. Further, a scientist who is an expert in one field of science is not an expert in another field—and is certainly not an expert in politics or religion. Rather, the scientist must hold the constrained vision and admit that he is self-interested and that there are limits on his expertise and knowledge. The peer-review process, properly executed, is a manifestation of the constrained vision.

Irrespective of its flaws and omissions, Ritchie’s Science Fictions has a good chance to change science for the better, and for that we all benefit from his work.

Reader Discussion

Law & Liberty welcomes civil and lively discussion of its articles. Abusive comments will not be tolerated. We reserve the right to delete comments - or ban users - without notification or explanation.

on October 19, 2020 at 13:08:30 pm

Professor Purdy's book reviews are excellent. I hope to see his work published more often on L&L.

Today's subject is of mammoth importance, given a) the dominance of the Administrative State, in which hierarchical bureaucratic rule by scientific experts replaces BOTH equality before law and democratic rule by representative government, b) the politicization of science (and of all other important institutions), which is essential to bureaucratic rule, and c) the resurgence of scientism, which is the deployment for ideological purposes of fake science and fake expertise, which are essential to the cancel culture, to media manipulation of science, to scientific disinformation campaigns waged against the public by the administrative state, and to rule by bureaucrats.

This is how the CCP runs China, and it is the form of postmodern rule which the USA is increasingly embracing. Fake science and deference to bureaucratic scientific expertise are the shale-stone of its foundation.

President Eisenhower prophesied of the danger to a democratic republic of BOTH the military industrial complex AND the scientific government complex. Ike knew but a smattering of the real danger of which he spoke.

paladin
on October 19, 2020 at 15:09:30 pm

I'm a little biased. I used to do data mining on a population of over a half million subjects, with hundreds of possible demographic and socioeconomic indicators. At that scale, it was not uncommon to find correlations with p < .0001, and in most of those cases it wasn't really any 'surprise' that there was a relationship. The 'facts' which are proven are as straightforward as 'people with higher incomes give more money to charities.' And so in my experience, if there were relationships with a p-value of greater than maybe even .001 (much more so as they approached that mystical .05 threshold), usually it suggested that while the directional relationship was quite real, the cause and effect issues were indirect and muddy. There were probably other factors in play but which we didn't have available as data.
The standard threshold of .05 is pretty much an artifact of an environment 75 - 100 years ago, when the science of statistics was being developed off the results of experiments with labor-intensive methodologies. You need a hundred human subjects, or 500 petri dishes, and you run them thru many tests. Subjects don't come cheap, and running tests and recording results is tedious. But on the number of data points you can generate in those environments, you're doing pretty well to get P < .05. It's perhaps enough to justify the work of writing up the results and publishing them so that somebody else can try to replicate it and take it to the next step.
But now data storage is very cheap, and running analyses on large volumes of data is trivially easy thru the black box. The weakness is on the input end: the difficulty of collecting good clean data from somewhat controlled contexts, and the difficulty of making clean, well-defined measurements. And the thought processes which go into imagining what data to collect. The result is we have far too much data with too little clarity. But at the same time, we still seem to be teaching inquiry and analysis within a pedagogical and philosophical framework left over from the last century. It's playground science. But how can it pull itself up by its own bootstraps, and grow up to where it's serious and trustworthy?

cmcc_aus
on October 19, 2020 at 18:44:03 pm

In the "hard" sciences like physics and chemistry, it is easier to control external variables that might distort or disturb your results, so getting decent answers is relatively "easy". But with the "soft" sciences that attempt to explore areas involving complex human beings, the ability to remove extraneous influences/ influencers is much more difficult, so they are in fact the "harder" sciences to pursue. I only came to this realization after a few decades of patting myself on the back for being a "hard" scientist/engineer, and "sneering" at the soft sciences. Thus any study that employs less than 1,000 to 10,000 participants/ subjects is likely to have questionable results*. But that size study costs an awful lot, so maybe we should be focusing our attention on fewer, better thought out, social science investigations (possibly along the lines of using data sources and methods suggested by CMCC_AUS). These might also need questions about political orientation or bias and level of cognitive dissonance to help the investigators remove such bias from their own analysis. Another bias not mentioned in the essay was the reliance on smallish college age cohorts and/or mostly Western civilization based groups to measure humanity wide characteristics, presumably due to availability and cost constraints.

Speaking of constraints, I for one would like to see a survey of a large group asking them about their inclination towards holding Sowell's constrained or unconstrained visions, correlated to political party or political outlook, etc. The gut feel is that people who are "prudent" and conservative are likely to also be constrained vision believers, and the converse for the liberals, but it would be nice to have real data to confirm or refute that supposition. I suppose Jonathan Haidt comes close, but his solicitation of unvetted participants via a web based survey has its own limits, even with the very large number of responses he now has/had to analyze.

*We are all crossing our fingers that the election polling results are wrong if they disagree with our preference, and vice versa.

RR2L
on October 26, 2020 at 01:41:08 am

“The lecture started with a line that became widely known: ‘I am a Professor of Scientific Method—but I have a problem: there is no scientific method.’ ‘However,’ [Sir Karl] Popper continued, ‘there are some simple rules of thumb, and they are quite helpful’” (Paul Feyerabend, Killing Time). And then consider Feyerabend himself in Against Method. The Feyerabend idea of "methodological anarchism" is one that needs revisiting from time to time.

Cary Nederman