Measurement versus Judgment

There is a contradiction inherent in the public life of the United States between the personal judgement we all wish to exercise in our own lives and the metric-based rules that tend to govern our society. We all make our judgments based on our experience, while metric-based rules are explicitly created to supplant personal judgments. Thomas Sowell, in A Conflict of Visions, offers a constructive paradigm for considering this antinomy. He distinguishes between the constrained vision and the unconstrained vision. The former holds that man is fundamentally self-interested and that his knowledge is severely limited, while the latter holds that man can overcome this self-interest and that, through reason, he can reach perfectibility of judgement.

As heirs to the constrained vision of Adam Smith, Edmund Burke, and The Federalist Papers, Americans are not only generally skeptical of the powers of the government, but also of those who work within it (particularly in the bureaucracies). This skepticism leads to a distrust of the perspicacity of those who work in government, which results in the creation of rules that substitute for personal judgement. These rules, which those in the government must follow, are often numeric-based. Further, these numbers are not usually objective measurements as in the natural sciences, but merely normative. Alas, it is the unconstrained vision that advocates for the ability to plan and to choose for others, especially with the use of mathematics and purportedly objective rules, which is antithetical to the constrained vision.

This, then, is the antinomy:

  • Americans are skeptical of allowing unelected individuals in the government to exercise judgement to control and plan significant areas of their lives. (The Constrained Vision.)
  • This skepticism begets a situation where bureaucrats create rules based on the numbers that in turn are then used to control and plan. (The Unconstrained Vision.)

And indeed, this contradiction is but an instantiation of the larger antinomy that is created when immense bureaucracies are coupled with a Constitution that promises a separation of powers and limited government.

The use of quantitative-based rules is not limited to government institutions in the United States like public education and the military, but extends to other spheres of society such as medicine, non-profit organizations, and even baseball. If this were merely a philosophical inconsistency between judgment and rules with no pragmatic import, then it could be safely ignored like most of what philosophers worry about. However, in his well-written and engaging monograph The Tyranny of Metrics, Jerry Z. Muller demonstrates that the reliance of government and other public entities on metrics, i.e. descriptive statistics that are quantitative measurements of performance, results in a host of negative unintended consequences. A descriptive statistic is a single value that summarizes data; a metric has the additional attribute of being a statistic that measures performance in some way. Two examples of metrics that Muller discusses are the college and university rankings of U.S. News & World Report and the mortgage-backed security ratings that were part of the financial crisis of 2008.

While Muller freely admits that his ideas regarding metrics are neither new nor original, he offers a compact synthesis of a disparate literature. Criticisms of metrics have not only been isolated from one another by the archipelago of academic disciplines, but have also been almost completely absent from public conversation outside of baseball. Muller ardently hopes to remedy this privation of public discourse on metrics with his approachable work on what he terms “metric fixation”. The best summation of Muller’s distinction between metrics and metric fixation can be found in the Introduction.

There are things that can be measured. There are things that are worth measuring. But what can be measured is not always worth measuring; what gets measured may have no relationship to what we actually want to know. The costs of measuring may be greater than the benefits. The things that get measured may draw effort away from the things we really care about. And measurement may provide us with distorted knowledge—knowledge that seems solid but is actually deceptive.

For Muller, metrics are worth measuring as long as they accurately reflect what they profess to measure, the cost (in both time and money) is justified, and they do not have us chase the metrics as an end in itself.

Muller’s initial motivation for writing The Tyranny of Metrics stems from his time as chair of a history department, when he had to tabulate metrics in order to appease the gods of accreditation. It was through this process and his subsequent research that he diagnosed the ubiquitous disease of metric fixation. Hence, the fundamental motif of this book is that measurement does not equal knowledge, thereby making the converse of the dictum attributed to Lord Kelvin false; however, Muller argues that measurement can inform judgment, and that the two together can yield knowledge and lead to prudential decisions.

Failures in Social Science

The historian Muller examines metrics, and in particular metric fixation, primarily through case studies. These form roughly half of the two-hundred or so page book and survey the following topics: education, medicine, criminal justice, the military, business, philanthropy, and foreign aid. Before Muller discusses each one of the case studies from the social realm, he gives his basic argument opposing metric fixation, which is encapsulated in the quote above, and a historical and philosophical critique of the tyranny of metrics. He finishes the brief book with an excursus on transparency and a conclusion about not only the unintended (predictable) negative consequences of metric fixation, but also when one can correctly use metrics.

Prior to evaluating the case studies, Muller traces the origin of social metrics to education in Victorian England and then shows how they filtered to the United States, where they were combined with the influence of business schools. Much of this story is little-known, with the exception of “Whiz Kid” Robert McNamara and the tragically pointless metric of “body counts” during the Vietnam War. (There are 58,318 names etched in black granite on the Vietnam Veterans Memorial in Washington, D.C., which is also a metric.) The historical threads that he ties together demonstrate the dangers of overbearing governmental control as well as of importing an idea from one field, such as business sales, into another, like military strategy. When I discuss Muller’s conclusion, I’ll reference some of the case studies explicitly.

While Muller argues strongly against metric fixation, he does not speak more generally about the current use of mathematics in the social sciences, of which metric fixation is but an exemplar. In particular, Muller neglects to mention the use of inferential statistics in the social sciences, which, while more sophisticated than the descriptive statistics behind metrics, is at least as pernicious. The tyranny of metrics is indeed oppressive, but the totalitarianism of null hypothesis significance testing (NHST) in the biological and social sciences likewise subjugates what these disciplines can accomplish. Scientists are usually required to use NHST in order to fulfill the obligations of their grants, to get published, or to gain approval from the FDA.

NHST is a method of statistical inference where a researcher typically makes an assumption called the null hypothesis, which will be something like “there is no difference between the population means of the two groups.” If the data is different enough from the assumed model of the population, then the p-value will be small. The often misunderstood p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining data at least as extreme as what was actually observed. It is not the probability that the null hypothesis is true, nor is it by itself the rate of Type I Errors (a.k.a. False Positives), which is set in advance by the chosen significance threshold. There are deep theoretical and pragmatic issues with the NHST process that would take me too far afield to detail, but many of the flaws mirror the same ills that befall metric fixation. So Muller misses the easy opportunity to make his argument regarding metric fixation even more powerful by discussing the use of NHST in the sciences and noting the sin of “p-value fixation.” Further, it is not just metric or p-value fixations that are the problems in the biological and social sciences, but “quantification fixation” in these sciences writ large.
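
For readers who have not met NHST in practice, the following is a minimal sketch of one common way to obtain a p-value for the "no difference between the two group means" null hypothesis: a permutation test. It is offered as an illustration only, not as the procedure any particular study or agency uses. The function name `permutation_p_value`, its parameters, and the two small data sets are invented for this example. The idea is that, if the null hypothesis is true, the group labels are interchangeable, so the p-value can be estimated as the share of random relabelings whose difference in means is at least as extreme as the one actually observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Hypothetical helper for illustration: under the null hypothesis the
    group labels carry no information, so we shuffle them repeatedly and
    count how often the shuffled difference is at least as large as the
    observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Invented scores for two small groups, purely for demonstration.
    control = [71, 68, 75, 70, 66, 73]
    treated = [74, 79, 72, 81, 77, 76]
    print(permutation_p_value(control, treated))
```

Note what the number does and does not say: it describes how surprising the data would be if the null hypothesis held, and nothing more. Treating it as a verdict on the hypothesis itself is precisely the kind of fixation at issue here.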

Related to this forfeited opportunity regarding NHST is Muller’s discussion of the philosophical critiques of metric fixation in Chapter 6, which is based largely on the thought of Michael Oakeshott, Michael Polanyi, and F.A. Hayek. While the rest of the purposefully brief book seems to contain the proper amount of information, Chapter 6, even though it is not the shortest chapter, is excessively concise and gives an incomplete picture of the profound philosophical issues regarding the use of quantification in the biological and social sciences. In particular, Muller does not argue strongly enough that the entire thesis of this book follows from Hayek’s work on knowledge and facts in the social sciences.

Muller does note that Hayek’s critique of scientism, i.e. the use of the methods of the natural sciences in the social sciences, also pertains to the ideology of metric fixation. Muller recounts that Hayek critiqued “the pretense of knowledge” of the planned economy since not only does it fail to consider germane and dispersed information, but it also prohibits the discovery of how to meet unforeseen needs. In a similar way, Muller writes, metric fixation necessarily limits the goals pursued to those that can purportedly be measured. This analogy between Hayek’s criticism of scientism and the tyranny of metrics is correct, but regrettably Muller does not detail Hayek’s epistemology of the social sciences further. Muller would have needed only a small addition on Hayek’s conception of quantification in the social sciences and how that relates to his theory of social facts. While Hayek admitted “that statistics are very useful in informing about the current state of affairs,” he didn’t “think that statistical information has anything to contribute to the theoretical explanation of process” (Hayek on Hayek, p. 131). Thus, while a given metric, e.g., the graduation rate at a community college, can tell you the percentage of students graduating, one cannot draw from the metric alone any meaningful conclusion about why students do or do not graduate.

Metric Fixation

There are eleven “unintended but predictable negative consequences” that Muller enumerates in his conclusion. To give the flavor of these sequelae, I shall consider a few and how they relate to a sampling of his case studies.

  • Goal displacement is when there is a diversion of effort to what gets measured, rather than what needs to be done (p. 169). For example, in TK-12 education, when there is a metric based on standardized tests, this results in goal displacement since students will be taught how to pass the test (“what gets measured”), but will not learn the subject matter itself (“what needs to be done”).
  • Costs in employee time is the expenditure of employee time in compiling and processing the metrics. Like the author, I have spent (wasted) time appeasing the gods of accreditation by collecting, analyzing, and reporting on data that in the end does nothing to help the faculty in my department improve our teaching or benefit our students in any way.
  • Rule cascading is when more and more rules are added to counteract the “flow of faulty metrics through gaming, cheating, and goal diversion.” An example of gaming comes from policing, when a “break-in” is reclassified as “trespassing,” since that magically turns a felony into a misdemeanor. Attempts to stop such gaming by introducing more quantitative-based rules just continue the cycle and do nothing to stop it. As Muller argues persuasively (echoing Hayek), since rules do not and cannot take into account the myriad of possibilities and nuances of every situation, there will always be gaps between the rules and reality.

In his final paragraph, Muller offers the following advice to politicians, business leaders, policy officials, and academic officials on how to break free from the tyranny of metric fixation, which is worth reiterating whether or not one reads his book, though I suggest that one should: “Ultimately, the issue is not one of metrics versus judgement, but metrics as informing judgments, which includes knowing how much weight to give to metrics, recognizing their characteristic distortions, and appreciating what can’t be measured.” Muller’s prudent counsel is in line with the constrained vision: there are limitations to human knowledge and judgement, and while metrics can inform the judgement, metrics qua metrics are not a panacea. Rather, metrics are just one more price signal that an individual can use to make a choice, but all too often, metrics turn out to be a source of noise.