The Tyranny of Intelligence Testing

Much as the ancient Greeks consulted the oracle at Delphi, contemporary Americans consult intelligence tests. We have always wished to divine the future; only the sages have been replaced by psychometricians. The ancient oracle’s pronouncements could be enigmatic, as when she told the Lydian king Croesus that his attack on Persia would destroy a great empire, which turned out to be his own. Likewise, intelligence tests seem to predict something, but precisely what they mean, and whether and how they redound to the benefit of those who take them, is at best highly ambiguous.

For several decades beginning at the end of World War II, students in England and Wales took the 11+ test, whose result powerfully directed their future. The test assessed ability in three areas: math, English, and verbal reasoning. Based on their performance, students were assigned to one of three types of schools: grammar schools (the most prestigious), whose graduates often progressed to university; secondary modern schools, which taught basic skills such as arithmetic; and technical schools, which taught such subjects as engineering and architecture.

The high-stakes 11+ test soon became a defining moment in the lives of many students. Society learned which children merited the greatest educational investment, parents learned which child’s intellect they should be proudest of, and children themselves discovered what to expect of themselves going forward, both educationally and vocationally. The test was the principal means by which children entering their second decade of life were sorted into the pathways branching out from this momentous trident in the road.

The introduction of the 11+ system had predictable effects. Parents wanted their children to be as well prepared as possible, so those with means invested in test preparation materials. Teachers soon incorporated 11+-style questions into primary school curricula and drilled students regularly in answering them. Critics charged that the tests were biased in favor of children from advantaged socioeconomic backgrounds, so the tests were redesigned to be more “objective” and “fair” by more closely approximating IQ tests.

The idea that intelligence is testable owes a great deal to the English polymath Francis Galton, a cousin of Charles Darwin, who set up the world’s first testing center in the mid-1880s. Galton argued that human reproduction should be harnessed to enhance the intelligence of future generations, mainly by ensuring that intelligent people choose intelligent mates and produce large numbers of children. In his 1869 book Hereditary Genius, Galton objected in what he called “the most unqualified manner” to the suggestion that intelligence is naturally equal, positing heredity as the key to its improvement.

The first modern intelligence tests were developed by French psychologist Alfred Binet and his colleagues, who sought to identify mental deficiency in schoolchildren. Binet believed that it was possible to determine the mental age of any child: those whose test results showed high intelligence would have mental ages above their chronological ages, while those who performed poorly would have mental ages below them. When Stanford’s Lewis Terman revised Binet’s work, he produced the Stanford-Binet test, which dominated intelligence testing in the US for decades.

Intelligence testing came into its own with World War I. The US Army needed an efficient means of determining which recruits were unfit for military service by virtue of low intelligence, and to which stations the others were best suited. An estimated 1.75 million men were tested, by far the largest sample in the history of intelligence testing up to that time. In subsequent years, variations in performance on such tests were used to argue for racial differences in intelligence and to promote policies such as restricting immigration by “inferior” races.

Intelligence testing suited the eugenicists well, since they held that some groups of people are biologically more intelligent than others. The fate of nations, they argued, hinged on restricting the reproduction of individuals they deemed intellectually unfit or “feeble-minded.” Psychologists such as Henry Goddard argued that such individuals should be institutionalized to prevent them from reproducing, a policy that many states soon extended to forced sterilization. In Buck v. Bell (1927), which upheld a state sterilization law, Justice Oliver Wendell Holmes Jr. famously wrote that “Three generations of imbeciles are enough.”

The Nazis under Adolf Hitler expanded on US eugenic theories and policies in a program they termed “racial hygiene.” Starting with the feeble-minded and disabled, the German program quickly enlarged the ranks of those deemed “living but unworthy of life” to include criminals and dissidents. It sterilized and “euthanized” hundreds of thousands of people. In his Second Book, Hitler wrote:

The destruction of the sick, weak, and deformed children [in ancient Sparta] was more decent and in truth a thousand times more humane than the wretched insanity of our day, which preserves the most pathological subject, at any price, and yet takes the lives of a hundred thousand healthy children in consequence of birth control or through abortions, in order subsequently to breed a race of degenerates burdened with illnesses.

The German system operated with ruthless efficiency. Using punch-card tabulating machines supplied by a German subsidiary of IBM, the government office charged with implementing the program recorded information on intelligence and disability and then produced lists of individuals who were to be sterilized or killed. Prospective marriages were scrutinized for signs of hereditary defects. Conversely, to increase the number of the genetically fit, especially members of the so-called Aryan race, abortion was outlawed for women deemed racially valuable, preventing the loss of “robust” offspring.

A common thread runs through both the British effort to sort students into appropriate educational tracks and the eugenics-inspired Nazi programs of sterilization and extermination: the effort to develop human categories so valid, reliable, objective, and fair that they amount to something approaching human destiny. If only we could produce a test of sufficient quality, enthusiasts argued, we could determine far in advance who should become a doctor, lawyer, or professor and who should sweep the streets, or even who should live, who should reproduce, and who should die.

Yet the tests were a highly imperfect means of promoting a highly questionable end. In the early 20th century, American educators confronted daunting challenges, including a national population swollen by tens of millions of immigrants, a highly heterogeneous and poorly funded educational system, and a teacher shortage. Standardized testing seemed to address such concerns: compared with an oral or essay examination, it was quick, relatively inexpensive, and easy to administer and grade, and it seemed to take differences in teachers’ knowledge, experience, and judgment off the table.

Give the same test to everyone and see how they do. Those who perform well on the reading test are good readers, those who shine on the math test are good at math, and those who perform well on general problem solving are good problem solvers. Those who perform well on all three are highly intelligent. Conversely, the same tests would identify the feeble-minded, who could be shunted into other educational and life pathways. By treating people as members of objectively defined categories, testers believed they could discriminate between people fairly.

And yet, such tests are subject to a host of linguistic, socioeconomic, and cultural biases that often favor the children of native-born middle- and upper-class parents and tilt heavily against the children of non-English speakers, the poor, those who grew up in other parts of the world, and the children of the less educated. For example, when asked to reproduce figures using either pencil and paper or wire, North American children perform better with paper and pencil, the traditional format of intelligence tests, while children from Africa do better with wire.

Yet there are far more serious problems with intelligence testing than bias, and they are intrinsic to the tests’ putative advantages. Because every child takes the same test, it is possible to rank test takers by the abilities the test assesses. But such a ranking can reflect only what the test measures, and many traits and aptitudes go untested: curiosity, handling questions that have more than one answer, developing a convincing argument, pursuing understanding of new things, working collaboratively with others, learning from mistakes, and incorporating new insights into daily life.

It turns out that standardized intelligence tests do not do a very good job of predicting how fulfilling a life someone will lead or the magnitude of the contributions they will make to their community. They do, however, predict how well test takers will perform on their next standardized test. Because such tests play an outsize role in contemporary education, they offer some insight into how well people are likely to perform over their academic careers. Yet equating such test results with intelligence overlooks the many different ways of approaching problems and opportunities.

Merely having the childhood intelligence test scores of a Shakespeare, a Lincoln, or an Einstein would not enable a psychometrician to predict that one would produce King Lear, another would pen the Gettysburg Address, and another would devise the formula E = mc². Because such tests necessarily treat every test taker the same, they cannot account for distinctive capacities such as creativity and imagination. They might place a very small percentage of test takers in the genius category, but they offer little insight into the ways ingenuity will express itself in work and life.

Consider the once-ubiquitous college entrance examination, the SAT. Introduced in the 1920s as the Scholastic Aptitude Test, it purported to measure test takers’ readiness for college. Yet in the 1990s the name was changed, first to the Scholastic Assessment Test and then to simply the SAT, with the letters standing for nothing at all. Why the change? Decades of research had failed to show that test results could account for more than about 20% of the variation in college performance, as measured by course grades. The test simply did not perform well as an assessment of college aptitude.

When it comes to the assessment of intelligence, we keep making the same mistakes. We suppose that intelligence must be hereditary, and then we devise tests to prove it. We make idols of objectivity and fairness, forgetting that learners are different and require different approaches to express their capabilities. We turn means of assessment into educational ends, leading students and teachers to devote inordinate attention to test preparation and test scores. We forget that the purpose of education is not to stratify but to help learners lead good lives and contribute as much as they can.

What matters most is not raw intelligence but the degree to which intelligence is yoked to the appetite for excellence and its pursuit. Do learners love to know, and do they value truth above convenience? Can they distinguish between worthy and unworthy ends, and do they relish hard work in pursuit of the worthy? Do they see the evil in preferring injustice to justice and falsehood to truth? Do they recognize what is truly lovable in other people, and are they prepared to sacrifice on their behalf? Do they understand why it is so important to examine and learn from the lives they are leading?

Compared to the lessons of a Socrates, the 11+, the SAT, and their ilk are thin gruel indeed. They both ask of us and tell us far too little about the human beings whose welfare we are called to look out for. Socrates operated with a very different set of educational priorities, which demanded a very different means of assessment. No simple score, whether a standardized test result or net worth, could begin to capture what we most need to know. Facing imminent death, he highlighted what matters most when he asked his fellow Athenians to reprove his sons if they ever cared about anything more than virtue.