Remarks for a panel on data science in literary studies, in 2028

A transcript of these remarks was sent back via time machine to the Novel Theory conference at Ithaca in 2018, where a panel had been asked to envision what literary studies would look like if data analysis came to be understood as a normal part of the discipline.

I want to congratulate the organizers on their timeliness; 2028 is the perfect time for this retrospective. Even ten years ago I think few of us could have imagined that quantitative methods would become as uncontroversial in literary studies as they are today. (I, myself, might not have believed it.)

But the emergence of a data science option in the undergrad major changed a lot. It is true that the option only exists at a few schools, and most undergrads don’t opt for it. But it has made a few students confident enough to explore further. Today, almost 10% of dissertations in literary studies use numbers in some way.

A tenth of the field is not huge, but even that small foothold has had catalytic effects. One obvious effect has been to give literary studies a warm-water port in the social sciences. Without data science, I’m not sure we would be having the vigorous conversations we’re having today with sociologists, legal scholars, and historians of social media. Increasingly, our fields are bound together by a shared concern with large-scale textual interpretation. That shared conversation, in turn, invites journalists to give literary arguments a different kind of attention. There are examples everywhere. But since we’re all reading it, let me just point to today’s article in The Guardian on the Trump Prison Diaries—which couldn’t have been written ten years ago, for several obvious reasons.

But data analysis has also led to less obvious changes. I think even the recent, widely discussed return to evaluative criticism (the so-called “new belletrism”) may have had something to do with data.


The conference venue, easily reached by water taxi.

I know this will seem unlikely. The new belletrism and data analysis are usually presented as opposing currents in the contemporary scene: one artsy, one mathy; one tending to focus on contemporary literature, the other on the longue durée. But I would argue that quantitative methods have made it easier to treat aesthetic arguments as scholarly questions.

It used to be difficult, after all, to reconcile evaluation with historicism. If you disavowed timeless aesthetic judgments, then it seemed you could only do historical reportage on the peculiar opinions of the 1820s or the 1920s.

Work on the history of reception has created an expansive middle ground between those poles—a way to study judgments that do change, but change in practice very slowly, across century-spanning arcs. Those arcs became visible when we backed up to a distance, but in a sense we can’t get historical distance from them; they sprawl into the present and can’t be relegated to the past. So historical scholars and contemporary critics are increasingly forced onto each other’s turf. Witness the fireworks lately between MFA programs and distant readers arguing that the recent history of the novel is really driven by genre fiction.

Most of these fireworks, I think, are healthy. But there have also been downsides. Ten years ago, none of us imagined that divisions within the data science community could become as deep or bitter as they are today. In a general sense data may be uncontroversial: many literary scholars use, say, a table of sales figures. But machine learning is more controversial than ever.

Ironically, it is often the people most enthusiastic about other forms of data who are rejecting machine learning. And I have to admit they have a point when they call it “subjective.”

It is notoriously true, after all, that learning algorithms absorb the biases implicit in the evidence you give them. So in choosing evidence for our models we are, in a sense, choosing a historical vantage point. Some people see this as appropriate for an interpretive discipline. “We have always known that interpretation was circular,” they say, “and it’s healthy to acknowledge that our inferences start from a situated perspective” (Rosencrantz, 2025). Other people worry that the subjectivity of machine learning is troubling, because it “hides inside a black box” (Guildenstern, 2027). I don’t think the debate is going away soon; I fear literary theorists will still be arguing about it when we meet again in 2038.