The real problem with distant reading.

This will be an old-fashioned, shamelessly opinionated, 1000-word blog post.

Anyone who has tried to write literary history using numbers knows that they are a double-edged sword. On the one hand they make it possible, not only to consider more examples, but often to trace subtler, looser patterns than we could trace by hand.

On the other hand, quantitative methods are inherently complex, and unfamiliar for humanists. So it’s easy to bog down in preambles about method.

Social scientists talking about access to healthcare may get away with being slightly dull. But literary criticism has little reason to exist unless it’s interesting; if it bogs down in a methodological preamble, it’s already dead. Some people call the cause of death “positivism,” but only because that sounds more official than “boredom.”

This is a rhetorical rather than epistemological problem, and it needs a rhetorical solution. For instance, Matthew Wilkens and Cameron Blevins have rightly been praised for focusing on historical questions, moving methods to an appendix if necessary. You may also recall that a book titled Distant Reading recently won a book award in the US. Clearly, distant reading can be interesting, even exciting, when writers are able to keep it simple. That requires resisting several temptations.

One temptation to complexity is technical, of course: writers who want to reach a broad audience need to resist geeking out over the latest algorithm. Perhaps fewer people recognize that the advice of more traditional colleagues can be another source of temptation. Scholars who haven’t used computational methods rarely understand the rhetorical challenges that confront distant readers. They worry that our articles won’t be messy enough — bless their hearts — so they advise us to multiply close readings, use special humanistic visualizations, add editorial apparatus to the corpus, and scatter nuances liberally over everything.

Some parts of this advice are useful: a crisp, pointed close reading can be a jolt of energy. And Katherine Bode is right that, in 2016, scholars should share data. But a lot of the extra knobs and nuances that colleagues suggest adding are gimcrack notions that would aggravate the real problem we face: complexity.

Consider the common advice that distant readers should address questions about the representativeness of their corpora by weighting all the volumes to reflect their relative cultural importance. A lot of otherwise smart people have recommended this. But as far as I know, no one ever does it. The people who recommend it don’t do it themselves, because a moment’s thought reveals that weighting volumes can only multiply dubious assumptions. Personally, I suspect that all quests for the One Truly Representative Corpus are a mug’s game. People who worry about representativeness are better advised to consult several differently selected samples. That sometimes reveals confounding variables — but just as often reveals that selection practices make little difference for the long-term history of the variable you’re tracing. (The differences between canon and archive are not always as large, or as central to this project, as Franco Moretti initially assumed.)

Sic semper robotis. “I, Mudd” (1967).

Another tempting piece of advice comes from colleagues who invite distant readers to prove humanistic credentials by adding complexity to their data models. This suggestion draws moral authority from a long-standing belief that computers force everything to be a one or a zero, whereas human beings are naturally at home in paradox. That’s why Captain Kirk could easily destroy alien robots by confronting them with the Liar’s Paradox. “How can? It be X. But also? Not X. Does not compute. <Smell of frying circuitry>.”

Maybe in 1967 it was true that computers could only handle exclusive binary categories. I don’t know: I was divided into a pair of gametes at the time myself. But nowadays data models can be as complex as you like. Go ahead, add another column to the database. No technical limit will stop you. Make categories perspectival, by linking each tag to a specific observer. Allow contradictory tags. If you’re still worried that things are too simple, make each variable a probability distribution. The computer won’t break a sweat, although your data model may look like the illustration below to human readers.

Rube Goldberg, “Professor Butts and the Self-Operating Napkin” (1931), via Wikimedia Commons.

I just wrote an article, for instance, where I consider eighteen different sources of testimony about genre — each of which models a genre in ways that can implicitly conflict with, or overlap with, other definitions. I trust you can see the danger: it’s not that the argument will be too reductive. I was willing to run a risk of complexity in this case because I was tired of being told that computers force everything into binaries. Machine learning is actually good at eschewing fixed categories to tease out loose family resemblances; it can be every bit as perspectival, multifaceted, and blurry as you wanna be.

I hope my article manages to remain lively, but I think readers will discover, by the end, that it could have succeeded with a simpler data model. When I rework it for the book version, I may do some streamlining.

It’s okay to simplify the world in order to investigate a specific question. That’s what smart qualitative scholars do themselves, when they’re not busy giving impractical advice to their quantitative friends. Max Weber and Hannah Arendt didn’t make an impact on their respective fields — or on public life — by adding the maximum amount of nuance to everything, so their models could represent every aspect of reality at once, and also function as self-operating napkins.

Because distant readers use larger corpora and more explicit data models than is usual for literary study, critics of the field (internal as well as external) have a tendency to push on those visible innovations, asking “What is still left out?” Something will always be left out, but I don’t think distant reading needs to be pushed toward even more complexity and completism. Computers make those things only too easy. Instead distant readers need to struggle to retain the vividness and polemical verve of the best literary criticism, and the  “combination of simplicity and strength” that characterizes useful social theory.

 

Emerging conversations between literary history and sociology.

As Jim English remarked in 2010, literary scholars have tended to use sociology “for its conclusions rather than its methods.” We might borrow a term like “habitus” from Bourdieu, but we weren’t interested in borrowing correspondence analysis. If we wanted to talk about methodology with social scientists at all, we were more likely to go to the linguists. (A connection to linguistics in fact almost defined “humanities computing.”)

But a different conversation seems to have emerged recently. A special issue of Poetics on topic models in 2013 was one early sign of methodological conversation between sociology and literary study. This year, Ben Merriman’s sociological review of books by Moretti and Jockers was followed by comments from Andrew Goldstone and Tressie McMillan Cottom, and then by a special issue of Cultural Sociology and by Goldstone’s response to Gisèle Sapiro. Most recently a special issue of Big Data and Society (table of contents), organized by sociologists, included several articles on literary history and/or literary theory.

What’s going on here?

Conveniently, several articles in Big Data and Society are trying to explain the reasons for growing methodological overlap between these disciplines. I think it’s interesting that the sociologists and literary scholars involved are telling largely the same story (though viewing it, perhaps, from opposite sides of a mirror).

First, the perspective of social scientists. In “Toward a computational hermeneutics,” John W. Mohr, Robin Wagner-Pacifici, and Ronald L. Breiger (who collectively edited this special issue of BDS) suggest that computational methods are facilitating a convergence between the social-scientific tradition of “content analysis” and kinds of close reading that have typically been more central to the humanities.

Close reading? Well, yes, relative to what was previously possible at scale. Content analysis was originally restricted to predefined keywords and phrases that captured the “manifest meaning of a textual corpus” (2). Other kinds of meaning, implicit in “complexities of phrasing” or “rhetorical forms,” had to be discarded to make text usable as data. But according to the authors, computational approaches to text analysis “give us the ability to instead consider a textual corpus in its full hermeneutic complexity,” going beyond the level of interpretation Kenneth Burke called “semantic” to one he considered “poetic” (3-4). This may be interpretation on a larger scale than literary scholars are accustomed to, but from the social-scientific side of the border, it looks like a move in our direction.

Jari Schroderus, “Through the Looking Glass,” 2006, CC BY-NC-ND 2.0.

The essay I contributed to BDS tells a mirror image of this story. I think twentieth-century literary scholars were largely right to ignore quantitative methods. The problems that interested us weren’t easy to represent, for exactly the reason Mohr, Wagner-Pacifici, and Breiger note: the latent complexities of a text had to be discarded in order to treat it as structured data.

But that’s changing. We can pour loosely structured qualitative data into statistical models these days, and that advance basically blurs the boundary we have taken for granted between the quantitative social sciences and humanities. We can create statistical models now where loosely structured texts sit on one side of an equals sign, and evidence about social identity, prestige, or power sits on the other side.

For me, the point of that sort of model is to get beyond one of the frustrating limitations of “humanities computing,” which was that it tended to stall out at the level of linguistic detail. Before we could pose questions about literary form or social conflict, we believed we had to first agree on a stopword list, and a set of features, and a coding scheme, and … in short, if social questions can only be addressed after you solve all the linguistic ones, you never get to any social questions.

But (as I explain at more length in the essay) new approaches to statistical modeling are less finicky about linguistic detail than they used to be. Instead of fretting endlessly about feature selection and xml tags, we can move on to the social questions we want to pose — questions about literary prestige, or genre, or class, or race, or gender. Text can become to some extent a space where we trace social boundaries and study the relations between them.

In short, the long-standing (and still valuable) connection between digital literary scholarship and linguistics can finally be complemented by equally strong connections to other social sciences. I think those connections are going to have fruitful implications, beginning to become visible in this issue of Big Data and Society, and (just over the horizon) in work in progress sponsored by groups like NovelTM and the Chicago Text Lab.

A final question raised by this interdisciplinary conversation involves the notion of big data foregrounded in the journal title. For social scientists, “big data” has a fairly clear meaning — which has less to do with scale, really, than with new ways of gathering data without surveys. But of course surveys were never central to literary study, and it may be no accident that few of the literary scholars involved in this issue of BDS are stressing the bigness of big data. We’ve got terabytes of literature in digital libraries, and we’re using them. But we’re not necessarily making a fuss about “bigness” as such.

Rachel Buurma’s essay on topic-modeling Trollope’s Barsetshire novels explicitly makes a case for the value of topic-modeling at an intermediate scale — while, by the way, arguing persuasively that a topic model is best understood as an “uncanny, shifting, temporary index,” or “counter-factual map” (4). In my essay I discuss a collection of 720 books. That may sound biggish relative to what literary scholars ordinarily do, but it’s explicitly a sample rather than an attempt at coverage, and I argue against calling it big data.

There are a bunch of reasons for that. I’ve argued in the past that the term doesn’t have a clear meaning for humanists. But my stronger objection is that it distracts readers from more interesting things. It allows us to imagine that recent changes are just being driven by faster computers or bigger disks — and obscures underlying philosophical developments that would fascinate humanists if we knew about them.

I believe the advances that matter for humanists have depended less on sheer scale than on new ideas about what it means to model evidence (i.e., learn from it, generalize from it). Machine learning honestly is founded on a theory of learning, and it’s kind of tragic that humanists are understanding something that interesting as a purely technical phenomenon called “big data.” I’m not going to try to explain statistical theories of learning in a short blog post, but in my essay I do at least gesture at a classic discussion by Leo Breiman. Some of my observations overlap with an essay in this same issue of BDS by Paul DiMaggio, who is likewise interested in the epistemological premises involved in machine learning.

Can we date revolutions in the history of literature and music?

Humanists know the subjects we study are complex. So on the rare occasions when we describe them with numbers at all, we tend to proceed cautiously. Maybe too cautiously. Distant readers have spent a lot of time, for instance, just convincing colleagues that it might be okay to use numbers for exploratory purposes.

But the pace of this conversation is not entirely up to us. Outsiders to our disciplines may rush in where we fear to tread, forcing us to confront questions we haven’t faced squarely.

For instance, can we use numbers to identify historical periods when music or literature changed especially rapidly or slowly? Humanists have often used qualitative methods to make that sort of argument. At least since the nineteenth century, our narratives have described periods of stasis separated by paradigm shifts and revolutionary ruptures. For scientists, this raises an obvious, tempting question: why not actually measure rates of change and specify the points on the timeline when ruptures happened?

The highest-profile recent example of this approach is an article in Royal Society Open Science titled “The evolution of popular music” (Mauch et al. 2015). The authors identify three moments of rapid change in US popular music between 1960 and 2010. Moreover, they rank those moments, and argue that the advent of rap caused popular music to change more rapidly than the British Invasion — a claim you may remember, because it got a lot of play in the popular press. Similar arguments have appeared about the pace of change in written expression — e.g., a recent article argues that 1917 was a turning point in political rhetoric (h/t Cameron Blevins).

When disciplinary outsiders make big historical claims, humanists may be tempted just to roll our eyes. But I don’t think this is a kind of intervention we can afford to ignore. Arguments about the pace of cultural change engage theoretical questions that are fundamental to our disciplines, and questions that genuinely fascinate the public. If scientists are posing these questions badly, we need to explain why. On the other hand, if outsiders are addressing important questions with new methods, we need to learn from them. Scholarship is not a struggle between disciplines where the winner is the discipline that repels foreign ideas with greatest determination.

I feel particularly obligated to think this through, because I’ve been arguing for a couple of years that quantitative methods tend to reveal gradual change rather than the sharply periodized plateaus we might like to discover in the past. But maybe I just haven’t been looking closely enough for discontinuities? Recent articles introduce new ways of locating and measuring them.

This blog post applies methods from “The evolution of popular music” to a domain I understand better — nineteenth-century literary history. I’m not making a historical argument yet, just trying to figure out how much weight these new methods could actually support. I hope readers will share their own opinions in the comments. So far I would say I’m skeptical about these methods — or at least skeptical that I know how to interpret them.

How scientists found musical revolutions.

Mauch et al. start by collecting thirty-second snippets of songs in the Billboard Hot 100 between 1960 and 2010. Then they topic-model the collection to identify recurring harmonic and timbral topics. To study historical change, they divide the fifty-year collection into two hundred quarter-year periods, and aggregate the topic frequencies for each quarter. They’re thus able to create a heat map of pairwise “distances” between all these quarter-year periods. This heat map becomes the foundation for the crucial next step in their argument — the calculation of “Foote novelty” that actually identifies revolutionary ruptures in music history.

Figure 5 from Mauch et al., “The evolution of popular music” (RSOS 2015).

The diagonal line from bottom left to top right of the heat map represents comparisons of each time segment to itself: that distance, obviously, should be zero. As you rise above that line, you’re comparing the same moment to quarters in its future; if you sink below, you’re comparing it to its past. Long periods where topic distributions remain roughly similar are visible in this heat map as yellowish squares. (In the center of those squares, you can wander a long way from the diagonal line without hitting much dissimilarity.) The places where squares are connected at the corners are moments of rapid change. (Intuitively, if you deviate to either side of the narrow bridge there, you quickly get into red areas. The temporal “window of similarity” is narrow.) Using an algorithm outlined by Jonathan Foote (2000), the authors translate this grid into a line plot where the dips represent musical “revolutions.”
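
To make that concrete, here is a minimal sketch of the Foote novelty calculation in Python. This is not the code Mauch et al. used; the function name, the window width, and the sign convention are my own choices, and dist is assumed to be a square matrix of pairwise distances between time segments.

```python
import numpy as np

def foote_novelty(dist, w=5):
    """Slide a checkerboard kernel of half-width w along the diagonal of a
    pairwise distance matrix. High scores mark moments where the segments
    before and after a point differ from each other more than they differ
    among themselves."""
    n = dist.shape[0]
    kernel = np.ones((2 * w, 2 * w))
    kernel[:w, :w] = -1   # within-block comparisons count against novelty,
    kernel[w:, w:] = -1   # because dist holds distances rather than similarities
    novelty = np.zeros(n)
    for i in range(w, n - w):
        window = dist[i - w:i + w, i - w:i + w]
        novelty[i] = np.sum(window * kernel)
    return novelty
```

Widening w makes the measure sensitive to slower, broader transitions; narrowing it highlights abrupt ones.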

Trying the same thing on the history of the novel.

Could we do the same thing for the history of fiction? The labor-intensive part would be coming up with a corpus. Nineteenth-century literary scholars don’t have a Billboard Hot 100. We could construct one, but before I spend months crafting a corpus to address this question, I’d like to know whether the question itself is meaningful. So this is a deliberately rough first pass. I’ve created a sample of roughly 1000 novels in a quick and dirty way by randomly selecting 50 male and 50 female authors from each decade 1820-1919 in HathiTrust. Each author is represented in the whole corpus only by a single volume. The corpus covers British and American authors; spelling is normalized to modern British practice. If I were writing an article on this topic I would want a larger dataset and I would definitely want to record things like each author’s year of birth and nationality. This is just a first pass.

Because this is a longer and sparser sample than Mauch et al. use, we’ll have to compare two-year periods instead of quarters of a year, giving us a coarser picture of change. It’s a simple matter to run a topic model (with 50 topics) and then plot a heat map based on cosine similarities between the topic distributions in each two-year period.
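
Concretely, the heat map can be built along these lines. This is only a sketch, under assumptions: doc_topics is a matrix of per-novel topic proportions from the model, and years records each novel’s publication date; both names are placeholders rather than the output of any particular package.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# doc_topics: (n_novels, n_topics) topic proportions; years: publication year
# for each novel (placeholder names, assumed to come from the topic model)

def period_profiles(doc_topics, years, start=1820, end=1920, width=2):
    """Average the topic proportions of all novels in each fixed-width period,
    normalizing so that each row is itself a distribution over topics."""
    profiles = []
    for left in range(start, end, width):
        mask = (years >= left) & (years < left + width)
        mean = doc_topics[mask].mean(axis=0)   # assumes every period contains novels
        profiles.append(mean / mean.sum())
    return np.array(profiles)

profiles = period_profiles(doc_topics, years)
# pairwise cosine distances between the fifty two-year periods: the heat map
distances = squareform(pdist(profiles, metric='cosine'))
```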

Heatmap and Foote novelty for 1000 novels, 1820-1919. Rises in the trend lines correspond to increased Foote novelty.

Voila! The dark and light patterns are not quite as clear here as they are in “The evolution of popular music.” But there are certainly some squarish areas of similarity connected at the corners. If we use Foote novelty to interpret this graph, we’ll have one major revolution in fiction around 1848, and a minor one around 1890. (I’ve flipped the axis so peaks, rather than dips, represent rapid change.) Between these peaks, presumably, lies a valley of Victorian stasis.

Is any of that true? How would we know? If we just ask whether this story fits our existing preconceptions, I guess we could make it fit reasonably well. As Eleanor Courtemanche pointed out when I discussed this with her, the end of the 1840s is often understood as a moment of transition to realism in British fiction, and the 1890s mark the demise of the three-volume novel. But it’s always easy to assimilate new evidence to our preconceptions. Before we rush to do it, let’s ask whether the quantitative part of this argument has given us any reason at all to believe that the development of English-language fiction really accelerated in the 1840s.

I want to pose four skeptical questions, covering the spectrum from fiddly quantitative details to broad theoretical doubts. I’ll start with the fiddliest part.

1) Is this method robust to different ways of measuring the “distance” between texts?

The short answer is “yes.” The heat maps plotted above are calculated on a topic model, after removing stopwords, but I get very similar results if I compare texts directly, without a topic model, using a range of different distance metrics. Mauch et al. actually apply PCA as well as a topic model; that doesn’t seem to make much difference. The “moments of revolution” stay roughly in the same place.

2) How crucial is the “Foote novelty” piece of the method?

Very crucial, and this is where I think we should start to be skeptical. Mauch et al. are identifying moments of transition using a method that Jonathan Foote developed to segment audio files. The algorithm is designed to find moments of transition, even if those moments are quite subtle. It achieves this by making comparisons — not just between the immediately previous and subsequent moments in a stream of observations — but between all segments of the timeline.

It’s a clever and sensitive method. But there are other, more intuitive ways of thinking about change. For instance, we could take the first ten years of the dataset as a baseline and directly compare the topic distributions in each subsequent novel back to the average distribution in 1820-1829. Here’s the pattern we see if we do that:

Cosine distance of each novel from the average topic distribution of 1820-1829, plotted by year of publication.

That looks an awful lot like a steady trend; the trend may gradually flatten out (either because change really slows down or, more likely, because cosine distances are bounded at 1.0), but significant spurts of revolutionary novelty are in any case quite difficult to see here.
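
That trend line comes from a calculation roughly like the following. I’m using cosine distance, as in the heat map, and reusing the placeholder doc_topics and years from the earlier sketch.

```python
import numpy as np

def cosine_distance(a, b):
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# average topic profile of the first decade, used as a fixed baseline
baseline = doc_topics[(years >= 1820) & (years < 1830)].mean(axis=0)

# distance of every subsequent novel from that baseline; plotting this
# against years produces the roughly steady trend described above
dist_from_baseline = np.array([cosine_distance(row, baseline)
                               for row in doc_topics])
```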

That made me wonder about the statistical significance of “Foote novelty,” and I’m not satisfied that we know how to assess it. One way to test the statistical significance of a pattern is to randomly permute your data and see how often patterns of the same magnitude turn up. So I repeatedly scrambled the two-year periods I had been comparing, constructed a heat matrix by comparing them pairwise, and calculated Foote novelty.
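
Here is roughly what that test looks like, reusing the foote_novelty and period_profiles sketches above; the number of permutations and the window width are arbitrary choices.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def max_novelty(profiles, w=5):
    dist = squareform(pdist(profiles, metric='cosine'))
    return foote_novelty(dist, w=w).max()

def permutation_test(profiles, n_permutations=1000, w=5):
    """Shuffle the order of the period profiles, rebuild the distance matrix
    each time, and ask how often chance alone yields a maximum Foote novelty
    as large as the one in the real, chronologically ordered sequence."""
    observed = max_novelty(profiles, w)
    rng = np.random.default_rng(0)
    null = [max_novelty(rng.permutation(profiles), w)
            for _ in range(n_permutations)]
    p_value = np.mean(np.array(null) >= observed)
    return observed, p_value
```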

A heatmap produced by randomly scrambling the fifty two-year periods in the corpus. The “dates” on the timeline are now meaningless.

When I do this I almost always find Foote novelties that are as large as the ones we were calling “revolutions” in the earlier graph.

The authors of “The evolution of popular music” also tested significance with a permutation test. They report high levels of significance (p < 0.01) and large effect sizes (they say music changes four to six times faster at the peak of a revolution than at the bottom of a trough). Moreover, they have generously made their data available, in a very full and clearly-organized csv. But when I run my permutation test on their data, I run into the same problem — I keep discovering random Foote novelties that seem as large as the ones in the real data.

It’s possible that I’m making some error, or that we're testing significance differently. I'm permuting the underlying data, which always gives me a matrix that has the checkerboardy look you see above. The symmetrical logic of pairwise comparison still guarantees that random streaks organize themselves in a squarish way, so there are still “pinch points” in the matrix that create high Foote novelties. But the article reports that significance was calculated “by random permutation of the distance matrix.” If I actually scramble the rows or columns of the distance matrix itself I get a completely random pattern that does give me very low Foote novelty scores. But I would never get a pattern like that by calculating pairwise distances in a real dataset, so I haven’t been able to convince myself that it’s an appropriate test.

3) How do we know that all forms of change should carry equal cultural weight?

Now we reach some questions that will make humanists feel more at home. The basic assumption we’re making in the discussion above is that all the features of an expressive medium bear historical significance. If writers replace “love” with “spleen,” or replace “cannot” with “can’t,” the two substitutions count more or less equally as far as this method is concerned. Either one potentially registers as change.

This is not to say that all verbal substitutions will carry exactly equal weight. The weight assigned to words can vary a great deal depending on how exactly you measure the distance between texts; topic models, for instance, will tend to treat synonyms as equivalent. But — broadly speaking — things like contractions can still potentially count as literary change, just as instrumentation and timbre count as musical change in “The evolution of popular music.”

At this point a lot of humanists will heave a relieved sigh and say “Well! We know that cultural change doesn’t depend on that kind of merely verbal difference between texts, so I can stop worrying about this whole question.”

Not so fast! I doubt that we know half as much as we think we know about this, and I particularly doubt that we have good reasons to ignore all the kinds of change we’re currently ignoring. Paying attention to merely verbal differences is revealing some massive changes in fiction that previously slipped through our net — like the steady displacement of abstract social judgment by concrete description outlined by Heuser and Le-Khac in LitLab pamphlet #4.

For me, the bottom line is that we know very little about the kinds of change that should, or shouldn’t, count in cultural history. “The evolution of popular music” may move too rapidly to assume that every variation of a waveform bears roughly equal historical significance. But in our daily practice, literary historians rely on a set of assumptions that are much narrower and just as arbitrary. An interesting debate could take place about these questions, once humanists realize what’s at stake, but it’s going to be a thorny debate, and it may not be the only way forward, because …

4) Instead of discussing change in the abstract, we might get further by specifying the particular kinds of change we care about.

Our systems of cultural periodization tend to imply that lots of different aspects of writing (form and style and theme) all change at the same time — when (say) “aestheticism” is replaced by “modernism.” That underlying theory justifies the quest for generalized cultural growth spurts in “The evolution of popular music.”

But we don’t actually have to think about change so generally. We could specify particular social questions that interest us, and measure change relative to those questions.

The advantage of this approach is that you no longer have to start with arbitrary assumptions about the kind of “distance” that counts. Instead you could use social evidence to train a predictive model. Insofar as that model predicts the variables you care about, you know that it’s capturing the specific kind of change that matters for your question.

Jordan Sellers and I took this approach in a working paper we released last spring, modeling the boundary between volumes of poetry that were reviewed in prominent venues, and those that remained obscure. We found that the stylistic signals of poetic prestige remained relatively stable across time, but we also found that they did move, gradually, in a coherent direction. What we didn’t do, in that article, is try to measure the pace of change very precisely. But conceivably you could, using Foote novelty or some other method. Instead of creating a heatmap that represents pairwise distances between texts, you could create a grid where models trained to recognize a social boundary in particular decades make predictions about the same boundary in other decades. If gender ideologies or definitions of poetic prestige do change rapidly in a particular decade, it would show up in the grid, because models trained to predict authorial gender or poetic prominence before that point would become much worse at predicting it afterward.
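
To sketch what that grid might look like in code: assume a word-frequency matrix X, binary labels y (reviewed in a prominent venue or not), and a decade label for each volume. All three names are placeholders, and regularized logistic regression is just one reasonable choice of model, not the only one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_decade_grid(X, y, decades):
    """Train a classifier on each decade and test it on every decade.
    A sharp drop in accuracy off the diagonal at a particular point would
    suggest that the boundary being modeled changed rapidly there."""
    y = np.asarray(y)
    decades = np.asarray(decades)
    labels = sorted(set(decades))
    grid = np.zeros((len(labels), len(labels)))
    for i, train_dec in enumerate(labels):
        model = LogisticRegression(C=0.1, max_iter=1000)
        train = decades == train_dec
        model.fit(X[train], y[train])
        for j, test_dec in enumerate(labels):
            test = decades == test_dec
            grid[i, j] = model.score(X[test], y[test])
    return grid
```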

Conclusion

I haven’t come to any firm conclusion about “The evolution of popular music.” It’s a bold article that proposes and tests important claims; I’ve learned a lot from trying the same thing on literary history. I don’t think I proved that there aren’t any revolutionary growth spurts in the history of the novel. It’s possible (my gut says, even likely) that something does happen around 1848 and around 1890. But I wasn’t able to show that there’s a statistically significant acceleration of change at those moments. More importantly, I haven’t yet been able to convince myself that I know how to measure significance and effect size for Foote novelty at all; so far my attempts to do that produce results that seem different from the results in a paper written by four authors who have more scientific training than I do, so there’s a very good chance that I’m misunderstanding something.

I would welcome comments, because there are a lot of open questions here. The broader task of measuring the pace of cultural change is the kind of genuinely puzzling problem that I hope we’ll be discussing at more length in the IPAM Cultural Analytics workshop next spring at UCLA.

Postscript Oct 5: More will be coming in a day or two. The suggestions I got from comments (below) have helped me think the quantitative part of this through, and I’m working up an iPython notebook that will run reliable tests of significance and effect size for the music data in Mauch et al. as well as a larger corpus of novels. I have become convinced that significance tests on Foote novelty are not a good way to identify moments of rapid change. The basic problem with that approach is that sequential datasets will always have higher Foote novelties than permuted (non-sequential) datasets, if you make the “window” wide enough — even if the pace of change remains constant. Instead, borrowing an idea from Hoyt Long and Richard So, I’m going to use a Chow test to see whether rates of change vary.
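
For readers who are curious, the logic of a Chow test is easy to state: fit one regression to the whole series, fit separate regressions before and after a candidate breakpoint, and ask whether the split reduces the squared error more than chance would predict. A minimal sketch (not the notebook mentioned above, and with the breakpoint supplied by hand):

```python
import numpy as np
from scipy import stats

def chow_test(x, y, breakpoint):
    """F-test for a structural break at a known point in a simple
    linear regression of y on x."""
    def ssr(xs, ys):
        slope, intercept = np.polyfit(xs, ys, 1)
        return np.sum((ys - (slope * xs + intercept)) ** 2)
    k = 2                                  # parameters per regression line
    before, after = x < breakpoint, x >= breakpoint
    pooled = ssr(x, y)
    split = ssr(x[before], y[before]) + ssr(x[after], y[after])
    df = len(x) - 2 * k
    f_stat = ((pooled - split) / k) / (split / df)
    return f_stat, 1 - stats.f.cdf(f_stat, k, df)

# e.g. chow_test(years, dist_from_baseline, 1848) would ask whether the
# trend plotted earlier really bends around 1848
```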

Postscript Oct 8: Actually it could be a while before I have more to say about this, because the quantitative part of the problem turns out to be hard. Rates of change definitely vary. Whether they vary significantly, may be a tricky question.

References:

Jonathan Foote. Automatic audio segmentation using a measure of audio novelty. In Proceedings of IEEE International Conference on Multimedia and Expo, vol. I, pp. 452-455, 2000.

Matthias Mauch, Robert M. MacCallum, Mark Levy, Armand M. Leroi. The evolution of popular music. Royal Society Open Science. May 6, 2015.

Seven ways humanists are using computers to understand text.

[This is an updated version of a blog post I wrote three years ago, which organized introductory resources for a workshop. Getting ready for another workshop this summer, I glanced back at the old post and realized it’s out of date, because we’ve collectively covered a lot of ground in three years. Here’s an overhaul.]

Why are humanists using computers to understand text at all?
Part of the point of the phrase “digital humanities” is to claim information technology as something that belongs in the humanities — not an invader from some other field. And it’s true, humanistic interpretation has always had a technological dimension: we organized writing with commonplace books and concordances before we took up keyword search [Nowviskie, 2004; Stallybrass, 2007].

But framing new research opportunities as a specifically humanistic movement called “DH” has the downside of obscuring a bigger picture. Computational methods are transforming the social and natural sciences as much as the humanities, and they’re doing so partly by creating new conversations between disciplines. One of the main ways computers are changing the textual humanities is by mediating new connections to social science. The statistical models that help sociologists understand social stratification and social change haven’t in the past contributed much to the humanities, because it’s been difficult to connect quantitative models to the richer, looser sort of evidence provided by written documents. But that barrier is dissolving. As new methods make it easier to represent unstructured text in a statistical model, a lot of fascinating questions are opening up for social scientists and humanists alike [O’Connor et al. 2011].

In short, computational analysis of text is not a specific new technology or a subfield of digital humanities; it’s a wide-open conversation in the space between several different disciplines. Humanists often approach this conversation hoping to find digital tools that will automate familiar tasks. That’s a good place to start: I’ll mention tools you could use to create a concordance or a word cloud. And it’s fair to stop there. More involved forms of text analysis do start to resemble social science, and humanists are under no obligation to dabble in social science.

But I should also warn you that digital tools are gateway drugs. This thing called “text analysis” or “distant reading” is really an interdisciplinary conversation about methods, and if you get drawn into the conversation, you may find that you want to try a lot of things that aren’t packaged yet as tools.

What can we actually do?
The image below is a map of a few things you might do with text (inspired by, though different from, Alan Liu’s map of “digital humanities”). The idea is to give you a loose sense of how different activities are related to different disciplinary traditions. We’ll start in the center, and spiral out; this is just a way to organize discussion, and isn’t necessarily meant to suggest a sequential work flow.

A casual map of things you might do with text, and the disciplinary traditions those activities draw on.

1) Visualize single texts.
Text analysis is sometimes represented as part of a “new modesty” in the humanities [Williams]. Generally, that’s a bizarre notion. Most of the methods described in this post aim to reveal patterns hidden from individual readers — not a particularly modest project. But there are a few forms of analysis that might count as surface readings, because they visualize textual patterns that are open to direct inspection.

For instance, people love cartoons by Randall Munroe that visualize the plots of familiar movies by showing which characters are together at different points in the narrative.

Detail from an xkcd cartoon.

These cartoons reveal little we didn’t know. They’re fun to explore in part because the narratives being represented are familiar: we get to rediscover familiar material in a graphical medium that makes it easy to zoom back and forth between macroscopic patterns and details. Network graphs that connect characters are fun to explore for a similar reason. It’s still a matter of debate what (if anything) they reveal; it’s important to keep in mind that fictional networks can behave very differently from real-world social networks [Elson, et al., 2010]. But people tend to find them interesting.

A concordance also, in a sense, tells us nothing we couldn’t learn by reading on our own. But critics nevertheless find concordances useful. If you want to make a concordance for a single work (or for that matter a whole library), AntConc is a good tool.

Visualization strategies themselves are a topic that could deserve a whole separate discussion.

2) Choose features to represent texts.
A scholar undertaking computational analysis of text needs to answer two questions. First, how are you going to represent texts? Second, what are you going to do with that representation once you’ve got it? Most of what follows will focus on the second question, because there are a lot of equally good answers to the first one — and your answer to the first question doesn’t necessarily constrain what you do next.

In practice, texts are often represented simply by counting the various words they contain (they are treated as so-called “bags of words”). Because this representation of text is radically different from readers’ sequential experience of language, people tend to be surprised that it works. But the goal of computational analysis is not, after all, to reproduce the modes of understanding readers have already achieved. If we’re trying to reveal large-scale patterns that wouldn’t be evident in ordinary reading, it may not actually be necessary to retrace the syntactic patterns that organize readers’ understanding of specific passages. And it turns out that a lot of large-scale questions are registered at the level of word choice: authorship, theme, genre, intended audience, and so on. The popularity of Google’s Ngram Viewer shows that people often find word frequencies interesting in their own right.
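
As an illustration, here is how a bag-of-words representation might be built with scikit-learn; the variable texts (a list of strings, one per volume) is a placeholder, and the vocabulary size is arbitrary.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# texts: a list of strings, one per volume (placeholder)
vectorizer = CountVectorizer(max_features=10000)
counts = vectorizer.fit_transform(texts)       # (n_texts, n_words) sparse matrix
vocab = vectorizer.get_feature_names_out()     # the word behind each column

# relative frequencies, so that long and short volumes are comparable
totals = np.asarray(counts.sum(axis=1)).ravel()
frequencies = counts.toarray() / totals[:, None]
```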

But there are lots of other ways to represent text. You can count two-word phrases, or measure white space if you like. Qualitative information that can’t be counted can be represented as a “categorical variable.” It’s also possible to consider syntax, if you need to. Computational linguists are getting pretty good at parsing sentences; many of their insights have been packaged accessibly in projects like the Natural Language Toolkit. And there will certainly be research questions — involving, for instance, the concept of character — that require syntactic analysis. But they tend not to be questions that are appropriate for people just starting out.

3) Identify distinctive vocabulary.
It can be pretty easy, on the other hand, to produce useful insights on the level of diction. These are claims of a kind that literary scholars have long made: The Norton Anthology of English Literature, for instance, supports the claim that William Wordsworth emblematizes Romantic alienation by observing that “the words ‘solitary,’ ‘by one self,’ ‘alone’ sound through his poems” [Greenblatt et al., 16].

Of course, literary scholars have also learned to be wary of these claims. I guess Wordsworth does write “alone” a lot: but does he really do so more than other writers? “Alone” is a common word. How do we distinguish real insights about diction from specious cherry-picking?

Corpus linguists have developed a number of ways to identify locutions that are really overrepresented in one sample of writing relative to others. One of the most widely used is Dunning’s log-likelihood: Ben Schmidt has explained why it works, and it’s easily accessible online through Voyant or downloaded in the AntConc application already mentioned. So if you have a sample of one author’s writing (say Wordsworth), and a reference corpus against which to contrast it (say, a collection of other poetry), it’s really pretty straightforward to identify terms that typify Wordsworth relative to the other sample. (There are also other ways to measure overrepresentation; Adam Kilgarriff recommends a Mann-Whitney test.) And in fact there’s pretty good evidence that “solitary” is among the words that distinguish Wordsworth from other poets.
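
The log-likelihood calculation itself is short enough to show. This is a generic sketch of Dunning’s G2, not the code behind Voyant or AntConc, and the word counts in the example are invented for illustration.

```python
import math

def dunning_g2(count_a, total_a, count_b, total_b):
    """Dunning's log-likelihood for one word: how surprising is its frequency
    in corpus A, given its overall frequency in A and B combined?
    count_a, count_b: occurrences of the word; total_a, total_b: corpus sizes."""
    expected_a = total_a * (count_a + count_b) / (total_a + total_b)
    expected_b = total_b * (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    if count_a > 0:
        g2 += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        g2 += count_b * math.log(count_b / expected_b)
    return 2 * g2

# invented numbers: "solitary" appearing 120 times in 500,000 words of
# Wordsworth versus 300 times in 5,000,000 words of other poetry
print(dunning_g2(120, 500000, 300, 5000000))
```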

Words that are consistently more common in works by William Wordsworth than in other poets from 1780 to 1850. I’ve used Wordle’s graphics, but the words have been selected by a Mann-Whitney test, which measures overrepresentation relative to a context — not by Wordle’s own (context-free) method.

It’s also easy to turn results like this into a word cloud — if you want to. People make fun of word clouds, with some justice; they’re eye-catching but don’t give you a lot of information. I use them in blog posts, because eye-catching, but I wouldn’t in an article.

4) Find or organize works.
This rubric is shorthand for the enormous number of different ways we might use information technology to organize collections of written material or orient ourselves in discursive space. Humanists already do this all the time, of course: we rely very heavily on web search, as well as keyword searching in library catalogs and full-text databases.

But our current array of strategies may not necessarily reveal all the things we want to find. This will be obvious to historians, who work extensively with unpublished material. But it’s true even for printed books: works of poetry or fiction published before 1960, for instance, are often not tagged as “poetry” or “fiction.”

A detail from Fig 7 in So and Long, “Network Analysis and the Sociology of Modernism.”

Even if we believed that the task of simply finding things had been solved, we would still need ways to map or organize these collections. One interesting thread of research over the last few years has involved mapping the concrete social connections that organize literary production. Natalie Houston has mapped connections between Victorian poets and publishing houses; Hoyt Long and Richard Jean So have shown how writers are related by publication in the same journals [Houston 2014; So and Long 2013].

There are of course hundreds of other ways humanists might want to organize their material. Maps are often used to visualize references to places, or places of publication. Another obvious approach is to group works by some measure of textual similarity.

There aren’t purpose-built tools to support much of this work. There are tools for building visualizations, but often the larger part of the problem is finding, or constructing, the metadata you need.

5) Model literary forms or genres.
Throughout the rest of this post I’ll be talking about “modeling”; underselling the centrality of that concept seems to me the main oversight in the 2012 post I’m fixing.

A model treehouse, by Austin and Zak — CC-NC-SA.

A model is a simplified representation of something, and in principle models can be built out of words, balsa wood, or anything you like. In practice, in the social sciences, statistical models are often equations that describe the probability of an association between variables. Often the “response variable” is the thing you’re trying to understand (literary form, voting behavior, or what have you), and the “predictor variables” are things you suspect might help explain or predict it.

This isn’t the only way to approach text analysis; historically, humanists have tended to begin instead by first choosing some aspect of text to measure, and then launching an argument about the significance of the thing they measured. I’ve done that myself, and it can work. But social scientists prefer to tackle problems the other way around: first identify a concept that you’re trying to understand, and then try to model it. There’s something to be said for their bizarrely systematic approach.

Building a model can help humanists in a number of ways. Classically, social scientists model concepts in order to understand them better. If you’re trying to understand the difference between two genres or forms, building a model could help identify the features that distinguish them.

Scholars can also frame models of entirely new genres, as Andrew Piper does in a recent essay on the “conversional novel.”

A very simple, imaginary statistical model that distinguishes pages of poetry from pages of prose.

In other cases, the point of modeling will not actually be to describe or explain the concept being modeled, but very simply to recognize it at scale. I found that I needed to build predictive models simply to find the fiction, poetry, and drama in a collection of 850,000 volumes.

The tension between modeling-to-explain and modeling-to-predict has been discussed at length in other disciplines [Shmueli, 2010]. But statistical models haven’t been used extensively in historical research yet, and humanists may well find ways to use them that aren’t common in other disciplines. For instance, once we have a model of a phenomenon, we may want to ask questions about the diachronic stability of the pattern we’re modeling. (Does a model trained to recognize this genre in one decade make equally good predictions about the next?)

There are lots of software packages that can help you infer models of your data. But assessing the validity and appropriateness of a model is a trickier business. It’s important to fully understand the methods we’re borrowing, and that’s likely to require a bit of background reading. One might start by understanding the assumptions implicit in simple linear models, and work up to the more complex models produced by machine learning algorithms [Sculley and Pasanek 2008]. In particular, it’s important to learn something about the problem of “overfitting.” Part of the reason statistical models are becoming more useful in the humanities is that new methods make it possible to use hundreds or thousands of variables, which in turn makes it possible to represent unstructured text (those bags of words tend to contain a lot of variables). But large numbers of variables raise the risk of “overfitting” your data, and you’ll need to know how to avoid that.
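
Here, for instance, is the standard precaution in sketch form, reusing the placeholder frequencies matrix from earlier along with a binary genre label; regularized logistic regression and ten-fold cross-validation are conventional choices rather than the only ones.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# frequencies: (n_volumes, n_words) matrix; labels: 1 for (say) fiction,
# 0 otherwise (both placeholders)
model = LogisticRegression(C=0.01, max_iter=1000)   # strong regularization
scores = cross_val_score(model, frequencies, labels, cv=10)
print(scores.mean())  # accuracy on held-out volumes: a basic check against overfitting
```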

6) Model social boundaries.
There’s no reason why statistical models of text need to be restricted to questions of genre and form. Texts are also involved in all kinds of social transactions, and those social contexts are often legible in the text itself.

For instance, Jordan Sellers and I have recently been studying the history of literary distinction by training models to distinguish poetry reviewed in elite periodicals from a random selection of volumes drawn from a digital library. There are a lot of things we might learn by doing this, but the top-line result is that the implicit standards distinguishing elite poetic discourse turn out to be relatively stable across a century.

Similar questions could be framed about political or legal history.

7) Unsupervised modeling.
The models we’ve discussed so far are supervised in the sense that they have an explicit goal. You already know (say) which novels got reviewed in prominent periodicals, and which didn’t; you’re training a model in order to discover whether there are any patterns in the texts themselves that might help us explain this social boundary, or trace its history.

But advances in machine learning have also made it possible to train unsupervised models. Here you start with an unlabeled collection of texts; you ask a learning algorithm to organize the collection by finding clusters or patterns of some loosely specified kind. You don’t necessarily know what patterns will emerge.

If this sounds epistemologically risky, you’re not wrong. Since the hermeneutic circle doesn’t allow us to get something for nothing, unsupervised modeling does inevitably involve a lot of (explicit) assumptions. It can nevertheless be extremely useful as an exploratory heuristic, and sometimes as a foundation for argument. A family of unsupervised algorithms called “topic modeling” have attracted a lot of attention in the last few years, from both social scientists and humanists. Robert K. Nelson has used topic modeling, for instance, to identify patterns of publication in a Civil-War-era newspaper from Richmond.
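
To give a sense of scale, a basic topic model can be fit in a few lines; texts is again a placeholder list of documents, and twenty topics is an arbitrary choice.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# texts: a list of strings, one per document (placeholder)
vectorizer = CountVectorizer(max_features=10000, stop_words='english')
counts = vectorizer.fit_transform(texts)

lda = LatentDirichletAllocation(n_components=20, random_state=0)
doc_topics = lda.fit_transform(counts)       # (documents x topics) proportions

# the eight most heavily weighted words in each inferred topic
vocab = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    print(i, ' '.join(vocab[j] for j in weights.argsort()[-8:][::-1]))
```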

But I’m putting unsupervised models at the end of this list because they may almost be too seductive. Topic modeling is perfectly designed for workshops and demonstrations, since you don’t have to start with a specific research question. A group of people with different interests can just pour a collection of texts into the computer, gather round, and see what patterns emerge. Generally, interesting patterns do emerge: topic modeling can be a powerful tool for discovery. But it would be a mistake to take this workflow as paradigmatic for text analysis. Usually researchers begin with specific research questions, and for that reason I suspect we’re often going to prefer supervised models.

* * *

In short, there are a lot of new things humanists can do with text, ranging from new versions of things we’ve always done (make literary arguments about diction), to modeling experiments that take us fairly deep into the methodological terrain of the social sciences. Some of these projects can be crystallized in a push-button “tool,” but some of the more ambitious projects require a little familiarity with a data-analysis environment like Rstudio, or even a programming language like Python, and more importantly with the assumptions underpinning quantitative social science. For that reason, I don’t expect these methods to become universally diffused in the humanities any time soon. In principle, everything above is accessible for undergraduates, with a semester or two of preparation — but it’s not preparation of a kind that English or History majors are guaranteed to have.

Generally I leave blog posts undisturbed after posting them, to document what happened when. But things are changing rapidly, and it’s a lot of work to completely overhaul a survey post like this every few years, so in this one case I may keep tinkering and adding stuff as time passes. I’ll flag my edits with a date in square brackets.

* * *

SELECTED BIBLIOGRAPHY

Elson, D. K., N. Dames, and K. R. McKeown. “Extracting Social Networks from Literary Fiction.” Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, 2010. 138-147.

Greenblatt, Stephen, et al. The Norton Anthology of English Literature. 8th ed. Vol. 2. New York: W. W. Norton, 2006.

Houston, Natalie. “Towards a Computational Analysis of Victorian Poetics.” Victorian Studies 56.3 (Spring 2014): 498-510.

Nowviskie, Bethany. “Speculative Computing: Instruments for Interpretive Scholarship.” Ph.D dissertation, University of Virginia, 2004.

O’Connor, Brendan, David Bamman, and Noah Smith, “Computational Text Analysis for Social Science: Model Assumptions and Complexity,” NIPS Workshop on Computational Social Science, December 2011.

Piper, Andrew. “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel.” New Literary History 46.1 (2015).

Sculley, D., and Bradley M. Pasanek. “Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities.” Literary and Linguistic Computing 23.4 (2008): 409-24.

Shmueli, Galit. “To Explain or to Predict?” Statistical Science 25.3 (2010).

So, Richard Jean, and Hoyt Long, “Network Analysis and the Sociology of Modernism,” boundary 2 40.2 (2013).

Stallybrass, Peter. “Against Thinking.” PMLA 122.5 (2007): 1580-1587.

Williams, Jeffrey. “The New Modesty in Literary Criticism.” Chronicle of Higher Education January 5, 2015.

Measurement and modeling.

If the Internet is good for anything, it’s good for speeding up the Ent-like conversation between articles, to make that rumble more perceptible by human ears. I thought I might help the process along by summarizing the Stanford Literary Lab’s latest pamphlet — a single-authored piece by Franco Moretti, “‘Operationalizing’: or the function of measurement in modern literary theory.”

One of the many strengths of Moretti’s writing is a willingness to dramatize his own learning process. This pamphlet situates itself as a twist in the ongoing evolution of “computational criticism,” a turn from literary history to literary theory.

Measurement as a challenge to literary theory, one could say, echoing a famous essay by Hans Robert Jauss. This is not what I expected from the encounter of computation and criticism; I assumed, like so many others, that the new approach would change the history, rather than the theory of literature ….

Measurement challenges literary theory because it asks us to “operationalize” existing critical concepts — to say, for instance, exactly how we know that one character occupies more “space” in a work than another. Are we talking simply about the number of words they speak? or perhaps about their degree of interaction with other characters?

Moretti uses Alex Woloch’s concept of “character-space” as a specific example of what it means to operationalize a concept, but he’s more interested in exploring the broader epistemological question of what we gain by operationalizing things. When literary scholars discuss quantification, we often tacitly assume that measurement itself is on trial. We ask ourselves whether measurement is an adequate proxy for our existing critical concepts. Can mere numbers capture the ineffable nuances we assume they possess? Here, Moretti flips that assumption and suggests that measurement may have something to teach us about our concepts — as we’re forced to make them concrete, we may discover that we understood them imperfectly. At the end of the article, he suggests for instance (after begging divine forgiveness) that Hegel may have been wrong about “tragic collision.”

I think Moretti is frankly right about the broad question this pamphlet opens. If we engage quantitative methods seriously, they’re not going to remain confined to empirical observations about the history of predefined critical concepts. Quantification is going to push back against the concepts themselves, and spill over into theoretical debate. I warned y’all back in August that literary theory was “about to get interesting again,” and this is very much what I had in mind.

At this point in a scholarly review, the standard procedure is to point out that a work nevertheless possesses “oversights.” (Insight, meet blindness!) But I don’t think Moretti is actually blind to any of the reflections I add below. We have differences of rhetorical emphasis, which is not the same thing.

For instance, Moretti does acknowledge that trying to operationalize concepts could cause them to dissolve in our hands, if they’re revealed as unstable or badly framed (see his response to Bridgman on pp. 9-10). But he chooses to focus on a case where this doesn’t happen. Hegel’s concept of “tragic collision” holds together, on his account; we just learn something new about it.

In most of the quantitative projects I’m pursuing, this has not been my experience. For instance, in developing statistical models of genre, the first thing I learned was that critics use the word genre to cover a range of different kinds of categories, with different degrees of coherence and historical volatility. Instead of coming up with a single way to operationalize genre, I’m going to end up producing several different mapping strategies that address patterns on different scales.

Something similar might be true even about a concept like “character.” In Vladimir Propp’s Morphology of the Folktale, for instance, characters are reduced to plot functions. Characters don’t have to be people or have agency: when the hero plucks a magic apple from a tree, the tree itself occupies the role of “donor.” On Propp’s account, it would be meaningless to represent a tale like “Le Petit Chaperon Rouge” as a social network. Our desire to imagine narrative as a network of interactions between imagined “people” (wolf ⇌ grandmother) presupposes a separation between nodes and edges that makes no sense for Propp. But this doesn’t necessarily mean that Moretti is wrong to represent Hamlet as a social network: Hamlet is not Red Riding Hood, and tragic drama arguably envisions character in a different way. In short, one of the things we might learn by operationalizing the term “character” is that the term has genuinely different meanings in different genres, obscured for us by the mere continuity of a verbal sign. [I should probably be citing Tzvetan Todorov here, The Poetics of Prose, chapter 5.]

Illustration from "Learning Latent Personas of Film Characters," Bamman et. al.

Illustration from “Learning Latent Personas of Film Characters,” Bamman et al.

Another place where I’d mark a difference of emphasis from Moretti involves the tension, named in my title, between “measurement” and “modeling.” Moretti acknowledges that there are people (like Graham Sack) who assume that character-space can’t be measured directly, and therefore look for “proxy variables.” But concepts that can’t be directly measured raise a set of issues that are quite a bit more challenging than the concept of a “proxy” might imply. Sack is actually trying to build models that postulate relations between measurements. Digital humanists are probably most familiar with modeling in the guise of topic modeling, a way of mapping discourse by postulating latent variables called “topics” that can’t be directly observed. But modeling is a flexible heuristic that could be used in a lot of different ways.
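Since topic modeling is the form of latent-variable modeling most digital humanists have already met, a minimal sketch may help; the documents below are placeholders, and nothing about this toy corpus matters. The point is only that the “topics” are never observed: the model infers them as hidden distributions that explain the word counts we can observe.

```python
# Topic modeling as latent-variable modeling: the "topics" are never
# observed directly; LDA infers them as hidden distributions over words
# that best explain the documents we can see.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["placeholder document about ghosts and castles",
        "placeholder document about ships and whales",
        "placeholder document about ghosts and whales"]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each document becomes a mixture of inferred topics; each topic,
# a distribution over the vocabulary.
print(lda.transform(counts))    # document-topic proportions
print(lda.components_.shape)    # (n_topics, n_words)
```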

The illustration above is a probabilistic graphical model drawn from a paper on the “Latent Personas of Film Characters” by Bamman, O’Connor, and Smith. The model represents a network of conditional relationships between variables. Some of those variables can be observed (like the words in a plot summary, w, and external information about the film being summarized, md), but some have to be inferred, like the recurring character types (p) that are hypothesized to structure film narrative.

Having empirically observed the effects of illustrations like this on literary scholars, I can report that they produce deep, Lovecraftian horror. Nothing looks bristlier and more positivist than plate notation.

But I think this is a tragic miscommunication produced by language barriers that both sides need to overcome. The point of model-building is actually to address the reservations and nuances that humanists correctly want to interject whenever the concept of “measurement” comes up. Many concepts can’t be directly measured. In fact, many of our critical concepts are only provisional hypotheses about unseen categories that might (or might not) structure literary discourse. Before we can attempt to operationalize those categories, we need to make underlying assumptions explicit. That’s precisely what a model allows us to do.
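To show what “making underlying assumptions explicit” can look like, here is a deliberately crude generative sketch with the same overall shape as the plate diagram above: an unobserved persona for each character generates the observed words, nudged by observed metadata. It is emphatically not Bamman, O’Connor, and Smith’s model (theirs is a proper Bayesian mixture with Dirichlet priors); every name and number below is invented for illustration.

```python
# Not the authors' model, just a toy generative story with the same shape:
# an unobserved persona p generates the observed words w in a plot summary,
# with observed metadata md (here, the film's genre) nudging the choice.
import random

PERSONAS = {
    "action hero": ["fights", "escapes", "rescues"],
    "love interest": ["meets", "falls", "marries"],
}

def generate_character(md_genre, n_words=5):
    # Latent variable: the persona is drawn first and never observed.
    if md_genre == "action":
        p = random.choices(list(PERSONAS), weights=[0.8, 0.2])[0]
    else:
        p = random.choices(list(PERSONAS), weights=[0.3, 0.7])[0]
    # Observed variables: the words attached to the character.
    w = [random.choice(PERSONAS[p]) for _ in range(n_words)]
    return p, w

# Inference runs the story backwards: given md and w, which persona p
# best explains the words we actually see?
print(generate_character("action"))
```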

It’s probably going to turn out that many things are simply beyond our power to model: ideology and social change, for instance, are very important and not at all easy to model quantitatively. But I think Moretti is absolutely right that literary scholars have a lot to gain by trying to operationalize basic concepts like genre and character. In some cases we may be able to do that by direct measurement; in other cases it may require model-building. In some cases we may come away from the enterprise with a better definition of existing concepts; in other cases those concepts may dissolve in our hands, revealed as more unstable than even poststructuralists imagined. The only thing I would say confidently about this project is that it promises to be interesting.

The imaginary conflicts disciplines create.

One thing I’ve never understood about humanities disciplines is our insistence on staging methodology as ethical struggle. I don’t think humanists are uniquely guilty here; at bottom, it’s probably the institution of disciplinarity itself that does it. But the normative tone of methodological conversation is particularly odd in the humanities, because we have a reputation for embracing multiple perspectives. And yet, where research methods are concerned, we actually seem to find that very hard.

It never seems adequate to say “hey, look through the lens of this method for a sec — you might see something new.” Instead, critics practicing historicism feel compelled to justify their approach by showing that close reading is the crypto-theological preserve of literary mandarins. Defenders of close reading, in turn, feel compelled to claim that distant reading is a slippery slope to takeover by the social sciences — aka, a technocratic boot stomping on the individual face forever. Or, if we do admit that multiple perspectives have value, we often feel compelled to prescribe some particular balance between them.

Imagine if biologists and sociologists went at each other in the same way.

“It’s absurd to study individual bodies, when human beings are social animals!”

“Your obsession with large social phenomena is a slippery slope — if we listened to you, we would eventually forget about the amazing complexity of individual cells!”

“Both of your methods are regrettably limited. What we need, today, is research that constantly tempers its critique of institutions with close analysis of mitochondria.”

As soon as we back up and think about the relation between disciplines, it becomes obvious that there’s a spectrum of mutually complementary approaches, and different points on the spectrum (or different combinations of points) can be valid for different problems.

So why can’t we see this when we’re discussing the possible range of methods within a discipline? Why do we feel compelled to pretend that different approaches are locked in zero-sum struggle — or that there is a single correct way of balancing them — or that importing methods from one discipline to another raises a grave ethical quandary?

It’s true that disciplines are finite, and space in the major is limited. But a debate about “what will fit in the major” is not the same thing as ideology critique or civilizational struggle. It’s not even, necessarily, a substantive methodological debate that needs to be resolved.

One way numbers can after all make us dumber.

[Used to have a more boring title still preserved in the URL. -Ed.] In general I’m deeply optimistic about the potential for dialogue between the humanities and quantitative disciplines. I think there’s a lot we can learn from each other, and I don’t think the humanities need any firewall to preserve their humanistic character.

But there is one place where I’m coming to agree with people who say that quantitative methods can make us dumber. To put it simply: numbers tend to distract the eye. If you quantify part of your argument, critics (including your own internal critic) will tend to focus on problems in the numbers, and ignore the deeper problems located elsewhere.

I’ve discovered this in my own practice, for instance when I blogged about genre in large digital collections. I got a lot of useful feedback on those blog posts; it was probably the most productive conversation I’ve ever had as a scholar. But most of the feedback focused on potential problems in the quantitative dimension of my argument. E.g., how representative was this collection as a sample of print culture? Or, what smoothing strategies should I be using to plot results? My own critical energies were focused on similar questions.
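For anyone wondering what “smoothing strategies” amounts to in practice, one minimal option is a centered moving average over yearly values; the years and numbers below are invented, not results from the project.

```python
# One common smoothing strategy for noisy yearly trends: a centered
# moving average. The values below are invented for illustration.
import pandas as pd

yearly_share = pd.Series(
    [0.12, 0.15, 0.11, 0.18, 0.16, 0.22, 0.19],
    index=range(1840, 1847),   # hypothetical years
)

smoothed = yearly_share.rolling(window=3, center=True).mean()
print(smoothed)
```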

Those questions were useful, and improved the project greatly, but in most cases they didn’t rock its foundations. And with a year’s perspective, I’ve come to recognize that there were after all foundation-rocking questions to be posed. For instance, in early versions of this project, I hadn’t really ironed out the boundary between “poetry” and “drama.” Those categories overlap, after all! This wasn’t creating quantitative problems (Jordan Sellers and I were handling cases consistently), but it was creating conceptual ones: the line “poetry” below should probably be labeled “nondramatic verse.”

Results I think are still basically reliable, although we need to talk more about that word “genre.”

The biggest problem was even less quantitative, and more fundamental: I needed to think harder about the concept of genre itself. As I model different kinds of genre, and read about similar (traditional and digital) projects by other scholars, I increasingly suspect the elephant in the room is that the word may not actually hold together. Genre may be a box we’ve inherited for a whole lot of basically different things. A bibliography is a genre; so is the novel; so is science fiction; so is the Kailyard school; so is acid house. But formally, socially, and chronologically, those are entities of very different kinds.

Skepticism about foundational concepts has been one of the great strengths of the humanities. The fact that we have a word for something (say genre or the individual) doesn’t necessarily imply that any corresponding entity exists in reality. Humanists call this mistake “reification,” and we should hold onto our skepticism about it. If I hand you a twenty-page argument using Google ngrams to prove that the individual has been losing ground to society over the last hundred years, your response should not be “yeah, but how representative is Google Books, and how good is their OCR?” (Those problems are relatively easy to solve.) Your response should be, “Uh … how do you distinguish ‘the individual’ from ‘society’ again?”

As I said, humanists have been good at catching reification; it’s a strength we should celebrate. But I don’t see this habit of skepticism as an endangered humanistic specialty that needs to be protected by a firewall. On the contrary, we should be exporting our skepticism! This habit of questioning foundational concepts can be just as useful in the sciences and social sciences, where quantitative methods similarly distract researchers from more fundamental problems. [I don’t mean to suggest that it’s never occurred to scientists to resist this distraction: as Matt Wilkens points out in the comments, they’re often good at it. -Ed.]

In psychology, for instance, emphasis on clearing a threshold of statistical significance (conventionally, a p-value below .05) frequently distracts researchers from more fundamental questions of experimental design (like, are we attempting to measure an entity that actually exists?). Andrew Gelman persuasively suggests that this is not just a problem caused by quantification, but can be more broadly conceived as a “dangerous lure of certainty.” In any field, it can be tempting to focus narrowly on the degree of certainty associated with a hypothesis. But it’s often more important to ask whether the underlying question is interesting and meaningfully framed.
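A tiny simulation (my own invented illustration, not Gelman’s) makes the limit of that threshold vivid: pure noise, tested repeatedly, will clear p < .05 about five percent of the time, and the number that clears it tells you nothing about whether the entity you set out to measure exists.

```python
# Invented illustration: two groups drawn from the same distribution,
# compared over and over. Roughly 5% of comparisons clear p < .05,
# whether or not the construct being "measured" exists at all.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
trials, false_alarms = 1000, 0
for _ in range(trials):
    group_a = rng.normal(size=30)   # no real difference between groups
    group_b = rng.normal(size=30)
    _, p = ttest_ind(group_a, group_b)
    if p < 0.05:
        false_alarms += 1

print(false_alarms / trials)   # roughly 0.05
```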

On the other hand, this doesn’t mean that humanists need to postpone quantitative research until we know how to define long-debated concepts. I’m now pretty skeptical about the coherence of this word genre, for instance, but it’s a skepticism I reached precisely by attempting to iron out details in a quantitative model. Questions about accuracy can prompt deeper conceptual questions, which reframe questions of accuracy, in a virtuous cycle. The important thing, I think, is not to let yourself stall out on the “accuracy” part of the cycle: it offers a tempting illusion of perfectibility, but that’s not actually our goal.

Postscript: Scott Weingart conveys the point I’m trying to make in a nicely compressed way by saying that it flips the conventional worry that the mere act of quantification will produce unearned trust. In academia, the problem is more often the inverse: we’re so strongly motivated to criticize numbers that we forget to be skeptical about everything else.