Organization of American Historians

Reprinted from the OAH Magazine of History
7 (Fall 1992). ISSN 0882-228X

History by the Numbers: Why Counting Matters

Peter A. Coclanis

One of my best graduate students at UNC-Chapel Hill recently told me that whenever he comes across a statistical table in a history book, he skips over it, claiming “numbers just don’t do it for me.” If recent studies can be believed, numbers apparently don’t do it for most other Americans either. Nor does much sheepishness exist among the “innumerate” about their inability or unwillingness to deal with numerical information. The same people who would blanch tout de suite if they failed to recognize an esoteric foreign expression have no compunction whatsoever about announcing to one and all that they can’t balance a checkbook (1)!

This essay represents a modest attempt to address this problem. It is predicated upon the notion that numerical representation is often helpful in attempting to understand and interpret the past. As a historian who deals frequently with numbers (but who is neither a believer in “scientific history” nor an advocate of privileging the numerical over other approaches), I shall attempt here to explain that for some purposes, numbers can “do it.”

Numerical representation was not employed in a systematic and analytical way until relatively recently in world history. To be sure, numerical data had been collected for millennia by peoples all over the world. For the most part, though, such data was collected haphazardly and/or irregularly and was primarily for illustrative, descriptive, ceremonial or rudimentary record-keeping purposes. Around the seventeenth century, things began to change as certain individuals and groups in Western nation-states started to produce, distribute, and employ numerical information in more standardized and rigorous ways. This new emphasis on system, method, and classification can be attributed to Baconian empiricism, the rise of capitalism, the needs of state building, or to factors internal to mathematics itself. For whatever cause, one finds the timely emergence of what sociologist Paul Starr calls “statistical systems”: the appearance of coherent structures, social and cognitive, by which numerical information is collected, organized, analyzed, distributed, and used, typically for programmatic ends. Such structures developed in both the public and private realms of seventeenth-century Europe, most rapidly in the former; indeed, the term statistics itself originally was used to denote facts about the state (2).

Over the next two centuries these systems gradually became more sophisticated and specialized, and by the mid-nineteenth century, “modern” statistical systems had begun to emerge in parts of Europe and America. Considerable evidence supports this point. By that time, for example, national censuses were becoming common, mercantile bookkeeping was being transformed into accounting, and the academic discipline of statistics, in both its descriptive and inferential modes, had begun to take root (3).

The early history of the United States Census reflects some of the ongoing changes. The U.S. was the first nation in the world to establish a periodic (decennial) census; inaugurated in 1790, it became increasingly detailed, elaborate, and useful over time. From its origins as a rude population count, the Census, by the middle of the nineteenth century, had become a rich treasure trove of statistical information about the demographic, economic, and social characteristics of the American population (4). Clearly, by that time, in the United States and elsewhere, numbers had begun to count.

To say that numbers began to count is not to imply that they spoke for themselves. One must take care to note that numbers and statistics are socially constructed rather than self-evident. They are as dependent on politics and social process as on method and technique. The questions asked (and those omitted), the categories studied, the methods employed, and the results reported in social science are all contingent and, as scholars influenced by phenomenology suggest, always open to interpretation (5). This point should not trouble us unduly (few historians continue to cling to positivist notions of objectivity at this late date), but it needs to be made explicit nonetheless. So, too, do the limits and misuses of numbers and the statistical approach.

Let me make the same point another way. The term statistics can be used to denote both numerical representations per se and the discipline concerned with the development of techniques that enable us to make more compact, meaningful, and logical conclusions about samples and populations (6). While our main concern here is with the former usage, it is important again to stress that numerical representations, however elegant, are just that: representations and, moreover, representations of a richer, denser, more complex, and contingent reality.

If numbers are powerful tools and if the field of statistics offers a valuable “kit” of techniques, neither is totally “objective” or “scientific,” much less pure in a Platonic sense. Lenin’s sardonic reference in Imperialism to “irrefutable bourgeois statistics” is well-taken in this regard: statistics can help add muscle to historical analysis and interpretation, but both numerical representations and the field of statistics itself are as much art as they are science, as Darrell Huff long ago pointed out (7).

Art and artifice share a common root, of course, and some would contend that historical statistics partake rather more of the latter than of the former. Huff states more simply that “[a] great many manipulations and even distortions are possible within the bounds of propriety” in employing statistical data (8). Indeed, everyone who uses statistics on a regular basis has a favorite joke, story, or anecdote that illustrates this point. There’s the one, for example, about the statistician who drowned in a lake with a mean depth of two feet, and the one about statisticians using numbers the way drunks use lampposts: more for support than illumination! Perhaps the most cynical of all, recounted by Paul Starr, comes, not surprisingly, from the late Soviet Union. In this story, three men apply for a job as an auditor in a Soviet factory. During his screening test, the first candidate is asked, “How much is two and two?” “Four,” he replies, thereby failing the test summarily. The second candidate answers “Five” to the same question, and fails the test as well. The third job seeker, also asked this question, responds with one of his own: “How much do you need?” This response gets him the job (9).

These jokes suggest that care and caution are necessary in using and interpreting statistics. There are, in fact, almost as many ways to deceive and distort with numbers as there are ways to inform and instruct. No one has demonstrated the problems and pitfalls of statistical (mis)representations better than the aforementioned Darrell Huff, whose little primer How to Lie with Statistics, published in 1954, is now in its forty-first printing! In this deceptively simple work, Huff seeks above all else to alert readers to some of the ways in which numbers can be used to distort, mislead, conceal, or even fabricate information. Yet in so doing, Huff demystifies statistics and enables readers to better appreciate the power of numerical representations of reality.

Readers interested in further pursuing sources of statistical errors, abuses, and deceptions would do well to go directly to Huff, to Oskar Morgenstern’s classic treatise On the Accuracy of Economic Observations, or to any honest statistics text (10). At this time, it is only important to note that numbers and statistics can be, and often are, used carelessly, employed without sufficient information, or trotted out for partisan purposes. Countless examples exist in the historical literature of such problems, though a few words on the most common abuses will serve to illustrate our point. Data from vague and/or questionable sources are often accepted without hesitation, for example, and results derived from tiny samples or samples with built-in biases are frequently reported without qualification. Statistically insignificant findings are commonly paraded, and weak associations and correlations treated with undue respect merely because they are expressed numerically and resulted from a bit of algebraic manipulation.

Similarly, quantitative results are often rendered tendentious because they are “cooked,” served up only after experimentation with various base periods, or after seasoning with inappropriate comparisons (what Huff calls “semi-attached figures”), or after introducing a little definitional imprecision without a pinch of elaboration. For example, average income in country X, according to Professor Y, is Z, but does Z refer to mean income, median income, or even modal income? Company A’s rate of return last year was B percent, but does B refer to return on sales, return on investment, return on equity, or to something else?
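The ambiguity of the word “average” is easy to see in a small calculation. The following is only a minimal sketch: the income figures are invented for illustration, and the function name summarize_incomes is hypothetical rather than drawn from any source cited here.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes for seven households in "country X".
# These figures are invented for illustration; they are not real data.
incomes = [12_000, 15_000, 15_000, 18_000, 22_000, 30_000, 238_000]


def summarize_incomes(values):
    """Return the three common 'averages' for a list of incomes."""
    return {
        "mean": mean(values),      # arithmetic average, pulled up by the one large income
        "median": median(values),  # middle value once the incomes are sorted
        "mode": mode(values),      # most frequently occurring value
    }


summary = summarize_incomes(incomes)
for name, value in summary.items():
    print(f"{name:>6}: {value:,}")
```

With these invented figures the mean is 50,000, the median 18,000, and the mode 15,000. All three can honestly be called “average income,” which is why a reported average means little until one knows which of them Professor Y had in mind.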

If imprecision can lead to serious interpretive problems, inappropriate or unrealistic statistical precision can mislead as well. Can we really calculate mean per capita wealth in seventeenth-century New England down to the second decimal place? Is it reasonable, let alone necessary, to calculate turn-of-the-century rural fertility in the U.S. down to the third? Such virtuoso displays of statistical precision, what Morgenstern refers to as “specious accuracy,” simply will not do. They misleadingly suggest that we have recovered more of the past than anyone has ever even known (11).

We have hardly exhausted the catalogue of common statistical abuses. We haven’t even mentioned, for example, the serious problems resulting from so-called white noise, that is, from human error in the handling and transmission of data. By this time, the upshot of our discussion should be clear nonetheless: one must exercise considerable caution in working with and interpreting numbers and statistics, particularly those that purportedly represent historical phenomena. Nowhere is this point made more strongly than in the sobering epigram with which Morgenstern opens On the Accuracy of Economic Observations: “Qui numerare incipit errare incipit.” He who begins to count, begins to err (12).

If all of this is true, and I believe that it is, then why bother with numbers? In any case, why must we resort to such a stark, cold, sterile, and ultimately dehumanizing way to represent the past? For all of Stalin’s crimes, he may have been on to something when he said: “A single death is a tragedy, a million is a statistic.” Nor was he alone in this belief. Traditional humanists have long perceived social statistics to be “soulless,” and Marx and Engels argued over a century ago that the bourgeoisie, obsessed with statistical representations, drowned all human sentiment “in the icy water of egotistical calculation.” Then there’s that acerbic joke currently making the rounds about the definition of a statistician: a person who loves numbers but who hasn’t sufficient charm, grace, or wit to become an accountant! Once again, then, why bother with numbers (13)?

There are good reasons. As suggested previously, statistics can add muscle to flabby arguments and interpretations, bringing a degree of specificity to otherwise vague generalizations. Numbers and percentages, however icy, are less chilling to the mind than infuriatingly obscure adjectives and adverbs such as “few,” “some,” “generally,” “seldom,” and “often.” Moreover, because numerical representations are subject to formal verification or refutation more readily than are indefinite adjectives and adverbs, they serve to discipline and check factual sloppiness and wildly exaggerated claims with at least some degree of evidence.

Numbers and statistics, furthermore, can help to illustrate, clarify, or make difficult, dry, or conceptually challenging points palpable. To say that income and wealth are unequally distributed in our country today is correct but monochromatic. To add that the family income of the top five percent of American families is greater than that of the bottom forty percent and that the wealthiest individual, media magnate John Werner Kluge, is worth $5.9 billion helps to create a more vivid mental picture of contemporary income and wealth-holding patterns in the United States (14).

In addition, numerical and statistical series, particularly those that span relatively long periods of time, can provide an analytical structure helpful in interpreting the evolution and development of any object of historical study, whether it be an entire country and its economy; a particular industry, political party, or religious body; or even an entertainment medium. In so doing, statistics can serve at once as yardsticks, bench marks, and spine, allowing us quickly and easily to gauge such things as absolute growth and rates of growth (or, conversely, decline) over various periods of time.
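How a long series yields both kinds of yardstick can be sketched in a few lines. The example below rests on stated assumptions: the two observations are hypothetical, the function names are my own, and the rate shown is the familiar compound annual rate, which is only one of several ways growth over a period might be expressed.

```python
def absolute_growth(start_value, end_value):
    """Absolute change between two observations (negative means decline)."""
    return end_value - start_value


def annual_growth_rate(start_value, end_value, years):
    """Compound annual rate of growth between two observations."""
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical series: an index that stands at 100 in one year
# and at 200 ten years later. Invented for illustration only.
start, end, span = 100.0, 200.0, 10

change = absolute_growth(start, end)
rate = annual_growth_rate(start, end, span)

print(f"absolute growth: {change:.0f}")
print(f"annual growth rate: {rate:.1%}")
```

In this invented case the index doubles, an absolute gain of 100, yet the compound rate works out to only about 7.2 percent per year. The two measures answer different questions, and a careful account says which one is being reported.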

In a more general sense, familiarity and facility with numbers, particularly large numbers, is becoming increasingly important as the United States itself becomes larger in numbers, more complex, and more dependent upon, if not dominated by, advanced technology and difficult technical and scientific issues. However, as John Allen Paulos has pointed out in his recent best-seller Innumeracy: Mathematical Illiteracy and its Consequences, many Americans are being left behind because of their estrangement from, resistance to, and incapacity in mathematics, as well as their fear of numbers in general. This has troubling consequences, to say the least, in an age of trillion-dollar GNPs, hundreds-of-billion-dollar deficits, billion-dollar R&D budgets, million-dollar signing bonuses, and, alas, five-figure wages and salaries. All of those zeroes can get confusing, particularly to those who proclaim that “numbers just don’t do it for me.” Finally, in an era of crystal power, channelling, and other New Age babbling, we desperately need to redouble our support for all rational modes of inquiry.

In this essay I have argued that although numbers and statistics do some things poorly, they do other things extremely well. They must always be used carefully, however, and with a grain or two of salt. And they must never be used in lieu of common sense. Otherwise one can find oneself in an unfortunate situation similar to that of famed sabermetrician Bill James in his Historical Baseball Abstract (15). In this work, James argues, on the basis of statistics, that Pete Rose was vastly overrated as a ballplayer. Anyone who really knows the game, anyone who ever saw Rose play, knows better than that. On this, I am willing to wager.

Endnotes
1. For a lively and popular account of these problems in contemporary American culture, see John Allen Paulos, Innumeracy: Mathematical Illiteracy and its Consequences (New York: Hill and Wang, 1988).

2. For an excellent short account of the origins and development of “statistical systems,” see Paul Starr, “The Sociology of Official Statistics,” in William Alonso and Paul Starr, eds., The Politics of Numbers (New York: Russell Sage Foundation, 1987), 7-57.

3. See James H. Cassedy, Demography in Early America: Beginnings of the Statistical Mind, 1600-1800 (Cambridge: Harvard University Press, 1969); Patricia Cline Cohen, A Calculating People: The Spread of Numeracy in Early America (Chicago: University of Chicago Press, 1982).

4. See, for example, Margo J. Anderson, The American Census: A Social History (New Haven: Yale University Press, 1988), 7-57 especially.

5. See William Alonso and Paul Starr, “Introduction,” in Alonso and Starr, eds., The Politics of Numbers, 1-6; Starr, “The Sociology of Official Statistics;” Sal Restivo, The Sociological Worldview (Oxford: Basil Blackwell, 1991), 161-173.

6. Any good statistics textbook can be consulted on these questions of definition and purview. See, for example, Herman J. Loether and Donald G. McTavish, Descriptive and Inferential Statistics: An Introduction (Boston: Allyn and Bacon, 1976), 3-10; Lucy Horwitz and Lou Ferleger, Statistics for Social Change (Boston: South End Press, 1980), ix-xiii.

7. V.I. Lenin, Imperialism, The Highest Stage of Capitalism: A Popular Outline (New York: International Publishers, 1939), 9; Darrell Huff, How to Lie With Statistics (New York: W.W. Norton, 1954).

8. Huff, How to Lie With Statistics, 120.

9. Starr, “The Sociology of Official Statistics,” 34-35. One should note that in 1992 good, reliable statistics on the Russian economy became available for the first time with the launching of the new quarterly report, Russian Economic Trends.

10. Oskar Morgenstern, On the Accuracy of Economic Observations, 2nd ed. (Princeton: Princeton University Press, 1963).

11. Ibid., 8-9, 62-65.

12. Ibid. (The translation is my own.)

13. The quotes are from The New York Times Book Review, 28 September 1958, 3; Eric L. Jones, The European Miracle: Environments, Economies and Geopolitics in the History of Europe and Asia, 2nd ed. (Cambridge: Cambridge University Press, 1987), xii; Karl Marx and Friedrich Engels, The Communist Manifesto, trans. Samuel Moore (New York: Washington Square Press, 1964), 62.

14. See Frank Levy, Dollars and Dreams: The Changing American Income Distribution (New York: W.W. Norton, 1988), 14; Forbes, “The 400 Richest People in America,” 1991 Edition, 21 October 1991, 150.

15. Bill James, The Bill James Historical Baseball Abstract (New York: Villard Books, 1986). In his section on the greatest players of the twentieth century, for example, James ranks Rose, at his peak, as the ninety-seventh greatest player of the century, behind such players at their peaks as Dan Quisenberry, Thurman Munson, Gary Carter, Vida Blue, Lou Boudreau, Duke Snider, Larry Doby, Ralph Kiner, Johnny Evers, and Frank Chance. See 430-433.

Peter A. Coclanis is Associate Professor at UNC-Chapel Hill and the author of The Shadow of a Dream: Economic Life and Death in the South Carolina Low Country, 1670-1920 (1989).