Measurement versus Judgment: On Jerry Muller’s Tyranny of Metrics

There is a contradiction inherent in the public life of the United States between the personal judgment we all wish to exercise in our own lives and the metric-based rules that tend to govern our society. We all make judgments based on our experience, while metric-based rules are explicitly created to supplant personal judgment. Thomas Sowell in A Conflict of Visions offers a constructive paradigm for considering this antinomy. He distinguishes between the constrained vision and the unconstrained vision. The former holds that man is fundamentally self-interested and his knowledge severely limited, while the latter holds that man can overcome this self-interest and, through reason, reach perfectibility of judgment.

As heirs to the constrained vision of Adam Smith, Edmund Burke, and The Federalist Papers, Americans are generally skeptical not only of the powers of the government but also of those who work within it (particularly in the bureaucracies). This skepticism leads to a distrust of the perspicacity of those who work in government, which results in the creation of rules that substitute for personal judgment. These rules, which those in the government must follow, are often numeric-based. Further, these numbers are usually not objective measurements as in the natural sciences, but merely normative. Alas, it is the unconstrained vision, antithetical to the constrained one, that advocates the ability to plan and to choose for others, especially with the use of mathematics and purportedly objective rules.

This, then, is the antinomy:

  • Americans are skeptical of allowing unelected individuals in the government to exercise judgment to control and plan significant areas of their lives. (The Constrained Vision.)
  • This skepticism begets a situation where bureaucrats create rules based on the numbers that in turn are then used to control and plan. (The Unconstrained Vision.)

And indeed, this contradiction is but an instantiation of the larger antinomy that is created when immense bureaucracies are coupled with a Constitution that promises a separation of powers and limited government.

The use of quantitative-based rules is not limited to government spheres in the United States like public education and the military; it extends to other spheres of society such as medicine, non-profit organizations, and even baseball. If this were merely a philosophical inconsistency between judgment and rules with no pragmatic import, then it could be safely ignored like most of what philosophers worry about. However, in his well-written and engaging monograph The Tyranny of Metrics, Jerry Z. Muller demonstrates that the reliance of the government and other public entities on metrics, i.e., descriptive statistics that are quantitative measurements of performance, results in a host of negative unintended consequences. A descriptive statistic is a single value that summarizes data; a metric has the additional attribute of being a statistic that specifically measures performance in some way. Two examples of metrics that Muller discusses are the college and university rankings of US News and World Report and the mortgage-backed security ratings that were part of the financial crisis of 2008.
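To make the distinction concrete, here is a minimal sketch in Python (with invented numbers, purely for illustration). The same kind of single-value summary is a descriptive statistic when it merely describes data, and becomes a metric once it is pressed into service as a measurement of performance; the cutoff and scores below are hypothetical, not drawn from Muller's book.

```python
# Hypothetical test scores for a class; the numbers are invented for illustration.
scores = [62, 71, 85, 90, 78, 66, 94]

# A descriptive statistic: a single value that summarizes the data.
mean_score = sum(scores) / len(scores)

# The same sort of statistic becomes a "metric" once it measures performance,
# e.g. the share of students clearing a cutoff a rule-maker deems "proficient".
PROFICIENCY_CUTOFF = 70
pass_rate = sum(s >= PROFICIENCY_CUTOFF for s in scores) / len(scores)

print(mean_score)  # 78.0
```

Note that nothing in the arithmetic distinguishes the two; the distinction lies entirely in how the number is used, which is precisely what makes metric fixation possible.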

While Muller freely admits that his ideas regarding metrics are neither new nor original, he offers a compact synthesis of the disparate literature. The criticisms of metrics in the literature have not only generally been isolated from one another by the archipelago of academic disciplines, but have been largely absent from the public conversation, baseball excepted. Muller ardently hopes to remedy this privation of public discourse on metrics with his approachable work on what he terms “metric fixation”. The best summation of Muller’s distinction between metrics and metric fixation can be found in the Introduction.

There are things that can be measured. There are things that are worth measuring. But what can be measured is not always worth measuring; what gets measured may have no relationship to what we actually want to know. The costs of measuring may be greater than the benefits. The things that get measured may draw effort away from the things we really care about. And measurement may provide us with distorted knowledge—knowledge that seems solid but is actually deceptive.

For Muller, metrics are those things that are worth measuring, so long as they accurately reflect what they profess to measure, the cost (in both time and money) is justified, and they do not have us chase the metrics as an end in themselves.

Muller’s initial motivation for writing The Tyranny of Metrics came from his experience as chair of a history department who had to tabulate metrics in order to appease the gods of accreditation. It was through this process and his subsequent research that he diagnosed the ubiquitous disease of metric fixation. Hence, the fundamental motif of this book is that measurement does not equal knowledge, thereby making the converse of the dictum attributed to Lord Kelvin false; however, Muller argues that measurement can inform judgment, and the two together can yield knowledge and lead to prudential judgments.

The structure of the book and two related criticisms

The historian Muller examines metrics, and in particular metric fixation, primarily through case studies. These form roughly half of the two-hundred or so page book and survey the following topics: education, medicine, criminal justice, the military, business, philanthropy, and foreign aid. Before Muller discusses each one of the case studies from the social realm, he gives his basic argument opposing metric fixation, which is encapsulated in the quote above, and a historical and philosophical critique of the tyranny of metrics. He finishes the brief book with an excursus on transparency and a conclusion about not only the unintended (predictable) negative consequences of metric fixation, but also when one can correctly use metrics.

Prior to evaluating the case studies, Muller traces the origin of social metrics in education in Victorian England and then to how they filtered to the United States where they were combined with the influence of business schools. Much of this story is little-known, with the exception of “Whiz Kid” Robert McNamara and the tragically pointless metric of “body counts” during the Vietnam War. (There are 58,318 names etched in black granite on the Vietnam War Memorial in Washington, D.C., which is also a metric.) The historical threads that he ties together demonstrate the dangers of overbearing governmental control as well as importing an idea from one field such as business sales to another like military strategy. When I discuss Muller’s conclusion, I’ll reference some of the case studies explicitly.

While Muller argues strongly against metric fixation, he does not speak more generally about the current use of mathematics in the social sciences, of which metric fixation is but an exemplar. In particular, Muller neglects to mention that the use of inferential statistics in the social sciences, while more sophisticated than descriptive-statistic metrics, is at least as pernicious. The tyranny of metrics is indeed oppressive, but the totalitarianism of null hypothesis significance testing (NHST) in the biological and social sciences likewise subjugates what these disciplines can accomplish. Scientists are usually required to use NHST in order to fulfill the obligations of their grants, to get published, or to gain approval from the FDA.

NHST is a method of statistical inference in which a researcher typically makes an assumption called the null hypothesis, which will be something like “there is no difference between the population means of the two groups.” If the data are different enough from what the null hypothesis predicts, then the p-value will be small. Another way to think about the often misunderstood p-value is as the probability of obtaining data at least as extreme as those observed, under the assumption that the null hypothesis is true; it is not, as is commonly believed, the probability that the null hypothesis is true given the data. There are deep theoretical and pragmatic issues with the NHST process that would take me too far afield to detail, but many of its flaws mirror the same ills that befall metric fixation. So Muller misses an easy opportunity to make his argument regarding metric fixation even more powerful by discussing the use of NHST in the sciences and noting the sin of “p-value fixation.” Further, it is not just metric or p-value fixations that are the problems in the biological and social sciences, but “quantification fixation” in these sciences writ large.
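As a concrete illustration of the procedure just described (not an example from Muller's book), here is a minimal sketch in Python of a p-value computed by simulation: a permutation test that asks how often random relabelings of the pooled data produce a difference in group means at least as large as the one observed. The data and function name are invented for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis that the two
    groups were drawn from populations with the same mean."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel observations at random, as the null allows
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    # Fraction of random relabelings at least as extreme as the observed difference
    return extreme / n_perm

a = [2.1, 2.5, 2.8, 3.0, 2.7]  # invented sample data
b = [3.4, 3.8, 3.1, 3.6, 3.9]
p = permutation_p_value(a, b)  # small p: the observed gap is rare under the null
```

If the null hypothesis were true, the group labels would be arbitrary, so a small p-value says only that the observed arrangement is rare among relabelings; on its own, it says nothing about the probability that the null hypothesis is true, which is exactly the misreading that fuels p-value fixation.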

Related to this forfeited opportunity regarding NHST is Muller’s discussion of the philosophical critiques of metric fixation in Chapter 6, which is based largely on the thought of Michael Oakeshott, Michael Polanyi, and F.A. Hayek. While the rest of the purposefully brief book seems to contain the proper amount of information, Chapter 6, though not the shortest chapter, is so excessively concise that it gives an incomplete picture of the profound philosophical issues regarding the use of quantification in the biological and social sciences. In particular, Muller does not argue strongly enough that the entire thesis of the book follows from Hayek’s work on knowledge and facts in the social sciences.

Muller does note that Hayek’s critique of scientism, i.e., the use of the methods of the natural sciences in the social sciences, also pertains to the ideology of metric fixation. Muller recounts that Hayek critiqued “the pretense of knowledge” of the planned economy, since it not only fails to consider germane and dispersed information but also prohibits the discovery of how to meet unforeseen needs. In a similar way, Muller writes, metric fixation necessarily limits the goals that can purportedly be measured. This analogy between Hayek’s criticism of scientism and the tyranny of metrics is correct, but regrettably Muller does not detail Hayek’s epistemology of the social sciences further. Muller would only have needed a small addition on Hayek’s conception of quantification in the social sciences and how it relates to his theory of social facts. While Hayek admitted “that statistics are very useful in informing about the current state of affairs,” he didn’t “think that statistical information has anything to contribute to the theoretical explanation of process” (Hayek on Hayek, p. 131). Thus, while a given metric, e.g., the graduation rate at a community college, can tell you the percentage of students graduating, one cannot draw any meaningful theoretical conclusion from this metric alone.

The tyranny of metric fixation

There are eleven “unintended but predictable negative consequences” that Muller enumerates in his conclusion; to get the flavor of these sequelae, I shall consider a few and how they relate to a sampling of his case studies.

  • Goal displacement is when there is a diversion of effort to what gets measured, rather than what needs to be done (p. 169). For example, in TK-12 education, when there is a metric based on standardized tests, this results in goal displacement since students will be taught how to pass the test (“what gets measured”), but will not learn the subject matter itself (“what needs to be done”).
  • Costs in employee time is the expenditure of employee time in compiling and processing the metrics. Like the author, I have spent (wasted) time appeasing the gods of accreditation by collecting, analyzing, and reporting on data that in the end does nothing to help the faculty in my department improve our teaching or benefit our students in any way.
  • Rule cascading, where more and more rules are added to counteract the “flow of faulty metrics through gaming, cheating, and goal diversion.” An example of gaming in policing is when a “break-in” becomes “trespassing,” since that will magically turn a felony into a misdemeanor. Attempting to stop such gaming by introducing more quantitative-based rules continues the cycle but does nothing to stop it. As Muller argues persuasively (echoing Hayek), since rules do not and cannot take into account the myriad possibilities and nuances of every situation, there will always be gaps between the rules and reality.

In his final paragraph, Muller offers the following advice to politicians, business leaders, policymakers, and academic officials on how to break free from the tyranny of metric fixation, which is worth reiterating whether or not one reads his book (though I suggest that one should): “Ultimately, the issue is not one of metrics versus judgment, but metrics as informing judgment, which includes knowing how much weight to give to metrics, recognizing their characteristic distortions, and appreciating what can’t be measured.” Muller’s prudent counsel is in line with the constrained vision: there are limitations to human knowledge and judgment, and while metrics can inform judgment, metrics qua metrics are not a panacea. Rather, metrics are just one more price signal that an individual can use to make a choice, but all too often, metrics turn out to be a source of noise.

Reader Discussion

Law & Liberty welcomes civil and lively discussion of its articles. Abusive comments will not be tolerated. We reserve the right to delete comments - or ban users - without notification or explanation.

on July 16, 2018 at 09:23:18 am

A welcome piece - quite informative.

1) nobody.really should have this on his reading list. Ha!

2) Oddly, what came to mind while reading this essay was not all the government statistical studies, rules, etc., but rather a former practice of a (many, if not all) manufacturing company. Each year, the plant would be shut down for two days to *count* inventory, including certain pieces of hardware that, at the time, had a value of less than $0.01.

Yet, measure / count we did, no doubt spurred on by the needs of the biggest counters / statisticians within the company, Accounting. Yes, we made a(n) (im)precise count of our parts while succeeding in shutting down a factory for two days AND learning nothing whatsoever about our product quality / reliability or performance.
Ultimately, we dispensed with this metric; oddly enough both profits and performance improved as we were able to focus on those metrics which mattered: How did our machines perform; how well did they diagnose disease; how did our customers respond to them; and how did our capabilities compare to others - NOT HOW MANY penny screws we lost or took home with us.
Not the most exciting example, I confess BUT a pretty clear one.

gabe
on July 16, 2018 at 09:37:14 am

See also Russ Roberts's interview with Muller here: http://www.econtalk.org/jerry-muller-on-the-tyranny-of-metrics/?highlight=%5B%22muller%22%5D

Amy Willis
on July 16, 2018 at 09:49:02 am

On second thought, let me add something which may provide a concrete example of what both Muller and the essayist are asserting.

As a result of the suspension of annual inventories (actually semi-annual for a time), we were able to change both behavior and practices and eliminate those behaviors incidental to the rules.

No longer required to count every item, we dispensed with stockrooms and the requirement that a needed part be requisitioned from the stockroom. No longer would an employee be removed from his / her work station for 15 minutes while gathering parts. Incidental to that stockroom visit, the employee may also spend an additional 5-10 minutes chatting with a friend. Thus, the line, or a sub-assembly line is delayed for a considerable period.
It also allowed us to afford greater responsibility to supply chain partners who would simply restock our shelves themselves DIRECTLY at the point of assembly. (In a nutshell, we moved to a "Just-in-Time" method of manufacturing). I'll skip the theoretical underpinnings of this manufacturing methodology so as not to bore the readers BUT I will say that the changes in behavior AND the consequent change in focus allowed us to CHANGE OUR GOALS. Indeed, forcing employees to "react" when a part (now, not available in abundance) failed caused us to examine both our manufacturing processes in depth, the components themselves and our design. None of this would be possible had we not dispensed with the burden of inventory or alternatively the luxury of having excess components on hand.
In short order, we became the premier manufacturer / designer of this diagnostic equipment by focusing on the proper metrics and using that data to inform future behavior / design and supply chain partners.

So yep, metrics may be misleading and they quite often adversely affect goals, performance and behavior - indeed, metrics distort or at least condition epistemology.

gabe
on July 16, 2018 at 13:43:13 pm

gabe: nobody.really should have this on his reading list. Ha!

Yeah, when I saw the title of this essay, I knew what I was in for. :-)

As Muller acknowledges, complaints about reliance on data are hardly new. Stephen Jay Gould complained about The Mismeasure of Man (1981). Philip K. Howard complained about how an excessive reliance on objective standards has led to The Death of Common Sense (1994). Robin Williams’s character complained about the effort to establish an objective standard for evaluating poetry in Dead Poets Society (1989).

And in The Little Prince (1943), Saint-Exupéry remarks that number-obsession seems to be a universal part of “grown-ups and their ways. When you tell them that you have made a new friend, they never ask you any questions about essential matters. They never say to you, ‘What does his voice sound like? What games does he love best? Does he collect butterflies?’ Instead, they demand: ‘How old is he? How many brothers has he? How much does he weigh? How much money does his father make?’ Only from these figures do they think they have learned anything about him.

If you were to say to the grown-ups: ‘I saw a beautiful house made of rosy brick, with geraniums in the windows and doves on the roof,’ they would not be able to get any idea of that house at all. You would have to say to them: ‘I saw a house that cost $20,000.’ Then they would exclaim: ‘Oh, what a pretty house that is!’”

Yet, oddly, I rarely hear people complaining that our financial meltdown proves that mortgage lenders should stop relying on numerical data and instead make lending decisions based on the color of a home’s bricks, the flowers in the windows, or the birds on the roof. Of course, these complainers were all grown-ups, so what should I expect?

The historian Muller examines metrics, and in particular metric fixation, primarily through case studies. These form roughly half of the two-hundred or so page book and survey the following topics: education, medicine, criminal justice, the military, business, philanthropy, and foreign aid.

I suspect Muller has a target-rich environment for his work. But I must ask: How do the results of our current, data-ladened world of education, medicine, criminal justice, the military, business, philanthropy, and foreign aid compare to results from the less data-ladened past? How do today’s results compare to the results achieved in less data-ladened parts of the world?

Moreover, can we even hope to make such comparisons if we’re not going to consider quantitative data?

I don't mean to deny any of the harms alleged to arise from reliance on data. Consider rule cascading: In an effort to reduce needless paperwork, Congress passed the 1980 Paperwork Reduction Act. The effect of this act is to require federal agencies gathering data to do additional paperwork to demonstrate that they considered how much paperwork the policy change would require. Irony is too slight a word.

But, more seriously, let's look at education: Yes, in a world of high-stakes standardized tests, teachers will “teach to the test,” which may burden some teaching styles. But in a world without such tests, abysmal schools can putter along forever, without ever attracting much public attention to their condition. So, are we better off with imperfect information, or perfect ignorance?

One advantage of erring on the side of seeking information, even if imperfect, is that we have the opportunity to refine and improve our information over time. For example, we can reduce the harm of "teaching to the test" by making tests better conform to the substance we want taught. But when we err on the side of perfect ignorance, we start off with perfection--and so have little opportunity for improvement.

“Ultimately, the issue is not one of metrics versus judgement, but metrics as informing judgments, which includes knowing how much weight to give to metrics, recognizing their characteristic distortions, and appreciating what can’t be measured.”

I concur. But, of course, this policy relies on decision-makers who exercise a modicum of discretion based on their sophisticated understanding of the data’s limitations. And it relies on a public that will trust sophisticated decision-makers rendering decisions that the public—lacking the same degree of expertise—may not understand or support.

Consider: We’ve recently had a lengthy discussion about Harvard’s admissions practices. Some people object to Harvard’s exercise of discretion, presumably arguing that Harvard should instead rely on some more objective measures for making its decisions. But what measures specifically should Harvard rely on—and why should we regard those measures as somehow more worthy than the considerations that Harvard’s admissions officers employ? Do we really think that our understanding of admissions criteria is comparable to the understandings of people who do this on a daily basis?

nobody.really
on July 16, 2018 at 15:46:12 pm


Luvv'd the "Little Prince" reference.

Here is an area in which we are in agreement. Data requires a sophistication of observation AND an understanding of limitations.

In the real world example I cited this should be quite plain. We were measuring the *wrong* results, no doubt spurred on by the need of one functional Department to measure, or at least project, *its* efficiency. That danger may also be observed in current government agency directives / rules.

what is needed is the *sophistication* you argue for; one that is capable of recognizing its own failures, its own prior errors and most importantly - its original mission. Add to that the ability to assess whether its original mission is still required / possible or even worthwhile while simultaneously recognizing that its current focus may be deterring it from pursuing other more worthwhile / beneficial objectives as in the Tyranny of the Accountants I alluded to above.
Their "institutional" mission actively deterred others from the attainment of ostensibly more "productive" corporate goals.

Hey, BTW:
While the little ones were catnapping, I watched a show from FOX Business (I dvr'ed it, as I don't normally watch much of this silly stuff). It was called Three Days in January - about Ike's transition with JFK - AND the (big lead-in) was Ike's famed "Military Industrial Complex" quote.

I must go and reread Ike's Farewell Address as he also cautions against an "Academic- Science - Bureaucracy Complex." It is rumored that he also wanted to include the Congress in that last complex but......

I like Ike - more so, the more I read of him.

gabe
on July 16, 2018 at 18:09:00 pm

Oops, I forgot this:

We did not stop measurements; rather we determined WHAT we needed to measure, how to measure it, and how to respond to those elements deemed sufficiently critical to measure.
Easy part of the process: Very few of these *elements* were subject to the human "factors" that arise in social science research.

gabe
on July 16, 2018 at 18:38:07 pm


and now for a different take on statistics, i.e., specifically Moneyball and sabermetrics: it jus' ain't no fun! or How Statistics Can Ruin a Good Game!


gabe
on July 16, 2018 at 21:16:54 pm

This begs the question: If you select your strategy to optimize some outcome, what specific outcome should owners optimize? How 'bout head coaches? Assistant coaches? Players? Because different goals warrant different strategies.

This calls to mind two recent discussions we've had here. First, as we discussed in the context of soccer (football), players seem to be paid for their entertainment value--that is, in proportion to their looks or their off-field antics. In other words, it appears that owners may be trying to optimize revenues (entertainment value?) rather than wins. Thus, soccer may have more in common with professional wrestling than you imagined.

Second, during our discussion about Affirmative Action and sports, I noted that the rules of sports are arbitrary, and we change them to generate whatever outcomes we favor--and I noted a long list of examples. When fans could count on each down in (American) football resulting in "three yards and a cloud of dust," people invented the forward pass. When various sports became dominated by people crowding around the goal/basket, people invented offsides/time-in-the-lane limits. When people became weary of pitchers striking out, (some) people adopted the Designated Hitter rule. When baskets became too easy, people raised the hoop. And when lay-ups became too dominant, people invented the 3-point shot. Etc.

So, if baseball is becoming too conservative, perhaps we need a rule change to create an incentive for teams to change their strategies. Maybe give bonus runs for Grand Slams--thereby creating an incentive to fill the bases?

nobody.really
on July 16, 2018 at 21:48:52 pm

This is a worthwhile article, and Dr. Purdy deserves credit for noticing that reverence for p-values is afflicted by many of the same concerns affecting measurement. In addition to the drawbacks of measurement fetish (or "metric fixation") identified in the subject book by Professor Muller, there are other derivative concerns:

1. The inevitable rise of the priest class of "experts" and pseudo experts who interpret the measurements for us and thereby tell us the will of the gods;

2. The natural tendency to associate measurements with rankings, and therefore order things that otherwise defy numerical characterization. What measure do we use to determine who was worse, Hitler or Pol Pot?

3. The incorrect tendency to assume that "measurements" are objective, and therefore to minimize subjective influences in public or political matters. Everyone involved in a subject of measurement has interests: the people who want something tangible by which to make decisions, the people doing the measuring, the people who are affected by the results of measurement, the people who interpret measurements, etc.

I don't think it is quite accurate to think that there is an antagonism between measurement and judgment as implied by the title of this post. I think even Professor Muller would agree given the sentence quoted in the last paragraph of the article. But I would propose going a bit further. It is not simply that measurement can "inform judgment." Measurements are data and data are evidence. Evidence can be conclusive, inconclusive, suggestive, misleading, irrelevant and confusing. The same applies to the relationship between measurement and judgment. Useful measurements are evidence of a particular state of affairs; non-useful measurements generally are not, although they may hold some value for other contexts, or other arguments. Look at all of the data, i.e. measurements, by which we were assured that alcohol was bad for you. I mean good for you. No, bad. Or is it? I can't remember.

z9z99
on July 16, 2018 at 23:03:09 pm

Everyone involved in a subject of measurement has interests: the people who want something tangible by which to make decisions, the people doing the measuring, the people who are affected by the results of measurement, the people who interpret measurements, etc.

“I care not who casts the votes of a nation if they’ll let me make the count.” George Creel, Uncle Henry (1922).

Look at all of the data, i.e. measurements, by which we were assured that alcohol was bad for you. I mean good for you. No, bad. Or is it? I can’t remember.

Uh ... ok, I think z9z99 has had one too many; time to sleep it off....

For what it's worth, NPR's "On the Media" did a story on the history of the story that wine is good for you. Some guy was focusing on the "French paradox"--the observation that French people ate rich foods, yet didn't suffer from heart disease at the same rate as Americans. He struck on the idea that red wine (among other things) was the saving grace; Morley Safer reported the story on "60 Minutes"; and US wine consumption soared--but still not to the level of French wine consumption.

The irony is that at precisely the same time, the French were cracking down on wine consumption--also as a public health measure: True, relatively few French people died of heart disease--because they were dying of alcoholism and drunk driving first.

nobody.really
on July 17, 2018 at 00:24:26 am

Pharmaceuticals are an area rife with examples of "measurements" leading to folly. There were data, i.e. someone measured a benefit, showing that ibuprofen prevented Alzheimer's disease. Eli Lilly conducted a study called PROWESS that showed that a drug called drotrecogin (apparently it is easier to get FDA approval for drugs with unpronounceable names) had a small survival benefit in patients with severe sepsis. On this basis the FDA approved the drug, but required further study. The further study showed higher mortality in patients treated with drotrecogin and Lilly withdrew the drug from the market. Medical "links" are all over the place: Alcohol consumption is associated with improved lipid profiles, and lower cardiovascular mortality until it isn't. Power lines cause leukemia. Saccharine causes bladder cancer...in mice. Cell phones...Chocolate...breast feeding...plastic water bottles...

z9z99
on July 17, 2018 at 01:15:19 am

“Because belief in its efficacy seems to outlast evidence that it frequently doesn’t work, metric fixation has elements of a cult.’’

‘Cult’! What? . . .

“Studies that demonstrate its lack of effectiveness are either ignored, or met with the assertion that what is needed is more data and better measurement. Metric fixation, which aspires to imitate science, too often resembles faith.’’ (19)

The line between ‘science’ and ‘faith’ is hard to ‘measure’.

Where else measurements silly?

“To demand or preach mechanical precision, even in principle, in a field incapable of it is to be blind and to mislead others,” as the British liberal philosopher Isaiah Berlin noted in an essay on political judgment. Indeed what Berlin says of political judgment applies more broadly:”

‘In a field incapable of measurement’; what is needed?

“judgment is a sort of skill at grasping the unique particularities of a situation, and it entails a talent for synthesis rather than analysis, “a capacity for taking in the total pattern of a human situation, of the way in which things hang together.”

“A feel for the whole and a sense for the unique are precisely what numerical metrics cannot supply.’’

Chapter 4 “Why Metrics Became So Popular’’

“The demand for measured accountability and transparency waxes as trust wanes.’’

Where more trust found?

“In societies with an established, transgenerational upper class, the members of that class are more likely to feel secure in their positions, to trust one another, and to have imbibed a degree of tacit knowledge about how to govern from their families, giving them a high degree of confidence in their judgments.’’

Well. . .what are most. . .

“By contrast, in meritocratic societies with more open and changing elites, those who reach positions of authority are less likely to feel secure in their judgments, and more likely to seek seemingly objective criteria by which to make decisions. And numbers convey the air of objectivity; they imply the exclusion of subjective judgment.” (39)

Numbers never lie! (Some medieval cities burned mathematicians)

1 The Argument in a Nutshell
2 Recurring Flaws

3 The Origins of Measuring and Paying for Performance
4 Why Metrics Became So Popular
5 Principals, Agents, and Motivation
6 Philosophical Critiques

Case Studies

14 When Transparency Is the Enemy of Performance: Politics, Diplomacy, Intelligence, and Marriage

15 Unintended but Predictable Negative Consequences
16 When and How to Use Metrics: A Checklist

Last page . . .

“Remember that sometimes, recognizing the limits of the possible is the beginning of wisdom. Not all problems are soluble, and even fewer are soluble by metrics. It’s not true that everything can be improved by measurement, or that everything that can be measured can be improved. Nor is making a problem more transparent necessarily a step to its solution.’’

Recognizing limits is. . .is. . .painful.

“Transparency may make a troubling situation more salient, without making it more soluble. In the end, there is no silver bullet, no substitute for actually knowing one’s subject and one’s organization, which is partly a matter of experience and partly a matter of unquantifiable skill.’’

Skill is precious. Can’t put a number on it!

“Many matters of importance are too subject to judgment and interpretation to be solved by standardized metrics. Ultimately, the issue is not one of metrics versus judgment, but metrics as informing judgment, which includes knowing how much weight to give to metrics, recognizing their characteristic distortions, and appreciating what can’t be measured. In recent decades, too many politicians, business leaders, policymakers, and academic officials have lost sight of that.’’

Clay garner
on July 17, 2018 at 10:58:05 am

Pharmaceuticals are an area rife with examples of “measurements” leading to folly.

I don't know if this is an example, but are you acquainted with BiDil?

Medco asked the federal Food & Drug Administration (FDA) to approve their new drug to treat congestive heart failure (CHF). The FDA rejected the application because the supporting data were insufficient. Medco then re-sliced the data, hunting for some sub-section of the sample that could demonstrate efficacy. And they found one: the drug had proven atypically effective—for black patients. And, for purposes of the study, “black” was defined as people who self-identify as black.

Think about that: A social label is used as a relevant variable for a medical test. But, in fairness, the old treatments for CHF (for example, ACE inhibitors) had not worked well for black people, so the idea of some meaningful black/white distinction regarding CHF meds was already planted in physicians’ minds.

The New England Journal of Medicine then published a study showing that the drug was a hit: it reduced mortality by 43%, reduced hospitalizations by 39%, and improved quality of life markers--in black CHF patients. On this basis, in 2005 the FDA approved the drug, now named BiDil, for use in black patients.

Mere statistical games—or real phenomenon? You make the call.
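The re-slicing move has a measurable cost that the eventual approval never shows: each extra subgroup tested is another draw at the p < 0.05 lottery. A hedged sketch of the mechanism (hypothetical labels and sample sizes, not the actual BiDil data; the crude z-test is my assumption):

```python
# Sketch of subgroup fishing: one trial where the drug truly does nothing,
# re-sliced across 20 random (meaningless) subgroup labels. Most of the
# time at least one subgroup comes out "significant."
import math
import random
import statistics

random.seed(1)

def two_sample_p(a, b):
    """Two-sided p-value from a crude two-sample z-test (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def fishing_finds_subgroup(n=200, n_labels=20):
    """One null trial, then a drug-effect test within each random subgroup label."""
    treated = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]  # drug has no effect at all
    for _ in range(n_labels):
        sub_t = [x for x in treated if random.random() < 0.5]
        sub_c = [x for x in control if random.random() < 0.5]
        if two_sample_p(sub_t, sub_c) < 0.05:
            return True  # "the drug works -- in this subgroup!"
    return False

hits = sum(fishing_finds_subgroup() for _ in range(100))
print(f"A 'responsive subgroup' surfaced in {hits} of 100 null trials")
```

Even with a drug that does nothing, fishing across twenty subgroups finds a "responsive" one in a large fraction of trials, which is why a subgroup discovered after the fact carries far less evidential weight than the same result pre-registered.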

nobody.really
on July 17, 2018 at 12:49:40 pm

Mere statistical games—or real phenomenon? You make the call.

It is both. Doctors have been using the combination of hydralazine and isosorbide to treat heart failure in patients of all colors and shapes for at least twenty-five years. It is the go-to when patients cannot take ACE inhibitors because of impaired kidney function (a frequent occurrence in patients with heart failure and diabetes). Since blacks show a smaller blood pressure response (and other physiological responses) to ACE inhibitors, they should demonstrate a statistical benefit even if the positive response to the combination is the same across racial traits, because the positive response to the comparator, enalapril, is not. (And of course, this does not rule out that there may in fact be a marginal beneficial response in those with black racial traits.)
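The comparator point is worth making arithmetic: a head-to-head trial measures the drug *relative to* the comparator, so a subgroup "benefit" can appear even when the new drug helps everyone identically, provided the comparator helps that subgroup less. A toy calculation with invented round numbers (purely illustrative, not trial data):

```python
# Invented numbers to illustrate the comparator effect -- not BiDil data.
# The combination cuts mortality risk identically in both groups; only the
# comparator (enalapril) differs, yet the head-to-head "benefit" shows up
# in one group alone.
baseline_mortality = 0.30                       # assumed untreated mortality
combo_reduction = 0.25                          # combination cuts risk 25% for everyone
ace_reduction = {"white": 0.25, "black": 0.05}  # assumed: enalapril helps one group less

for group, ace in ace_reduction.items():
    on_combo = baseline_mortality * (1 - combo_reduction)
    on_ace = baseline_mortality * (1 - ace)
    relative_benefit = 1 - on_combo / on_ace    # combo vs. enalapril
    print(f"{group}: relative risk reduction vs. enalapril = {relative_benefit:.0%}")
```

With these assumed figures the combination shows no relative benefit in the group where enalapril works equally well, and about a 21% relative risk reduction in the group where enalapril is the weaker comparator, even though the combination's effect was identical in both.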

z9z99
on October 19, 2020 at 07:33:42 am

[…] a theory, he finds a result in need of a theory. This is related to the tyranny of metrics, as Jerry Muller calls it, where a metric such as a p-value becomes the telos rather than the scientific theory. […]

