New methods need a new kind of conversation

Over the last decade, the (small) fraction of articles in the humanities that use numbers has slowly grown. This is happening partly because computational methods are becoming flexible enough to represent a wider range of humanistic evidence. We can model concepts and social practices, for instance, instead of just counting people and things.

That’s exciting, but flexibility also makes arguments complex and hard to review. Journal editors in the humanities may not have a long list of reviewers who can evaluate statistical models. So while quantitative articles certainly encounter some resistance, they don’t always get the kind of detailed resistance they need. I thought it might be useful to stir up conversation on this topic with a few suggestions, aimed less at the DH community than at the broader community of editors and reviewers in the humanities. I’ll start with proposals where I think there’s consensus, and get more opinionated as I go along.

1. Ask to see code and data.

Getting an informed reviewer is a great first step. But to be honest, there’s not a lot of consensus yet about many methodological questions in the humanities. What we need is less strict gatekeeping than transparent debate.

As computational methods spread in the sciences, scientists have realized that it’s impossible to discuss this work fruitfully if you can’t see how the work was done. Journals like Cultural Analytics reflect this emerging consensus with policies that require authors to share code and data. But mainstream humanities journals don’t usually have a policy in place yet.

Three or four years ago, confusion on this topic was understandable. But in 2018, journals that accept quantitative evidence at all need a policy that requires authors to share code and data when they submit an article for review, and to make it public when the article is published.

I don’t think the details of that policy matter deeply. There are lots of different ways to archive code and data; they are all okay. Special cases and quibbles can be accommodated. For instance, texts covered by copyright (or other forms of IP) need not be shared in their original form. Derived data can be shared instead; that’s usually fine. (Ideally one might also share the code used to derive it.)

2. … especially code.

Humanists are usually skeptical enough about the data underpinning an argument, because decades of debate about canons have trained us to pose questions about the works an author chooses to discuss.

But we haven’t been trained to pose questions about the magnitude of a pattern, or the degree of uncertainty surrounding it. These aspects of a mathematical argument often deserve more discussion than an author initially provides, and to discuss them, we’re going to need to see the code.

I don’t think we should expect code to be polished, or to run easily on any machine. Writing an article doesn’t commit the author to produce an elegant software tool. (In fact, to be blunt, “it’s okay for academic software to suck.”) The author just needs to document what they did, and the best way to do that is to share the code and data they actually used, warts and all.

3. Reproducibility is great, but replication is the real point.

Ideally, the code and data supporting an article should permit a reader to reproduce all the stages of analysis the author(s) originally performed. When this is true, we say the research is “reproducible.”

But there are often rough spots in reproducibility. Stochastic processes may not run exactly the same way each time, for instance.

At this point, people who study reproducibility professionally will crowd forward and offer an eleven-point plan for addressing all rough spots. (“You just set the random number seed so it’s predictable …”)

That’s wonderful, if we really want to polish a system that allows a reader to push a button and get the same result as the original researcher, to the seventh decimal place. But in the humanities, we’re not always at the “polishing” stage of inquiry yet. Often, our question is more like “could this conceivably work? and if so, would it matter?”

In short, I think we shouldn’t let the imperative to share code foster a premature perfectionism. Our ultimate goal is not to prove that you get exactly the same result as the author if you use exactly the same assumptions and the same books. It’s to decide whether the experiment is revealing anything meaningful about the human past. And to decide that, we probably want to repeat the author’s question using different assumptions and a different sample of books.

When we do that, we are not reproducing the argument but replicating it. (See Language Log for a fuller discussion of the difference.) Replication is the real prize in most cases; that’s how knowledge advances. So the point of sharing code and data is often less to stabilize the results of your own work to the seventh decimal place, and more to guide investigators who may want to undertake parallel inquiries. (For instance, Jonathan Goodwin borrowed some of my code to pose a parallel question about Darko Suvin’s model of science fiction.)
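To make the distinction concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the toy "archive," the made-up feature, and the function names are assumptions, not anything drawn from a real project. Fixing the random seed gives you reproduction (the same number every time); changing the sample gives you something closer to replication (the same question, posed to different books).

```python
import random

# Hypothetical toy "archive": book IDs paired with a made-up binary feature.
# Nothing here comes from a real corpus.
ARCHIVE = [(f"book_{i}", i % 3 == 0) for i in range(300)]


def share_with_feature(sample):
    """Fraction of sampled books where the made-up feature is present."""
    return sum(flag for _, flag in sample) / len(sample)


def run_experiment(seed, sample_size=50):
    """Draw a random sample of books and measure the feature's share."""
    # A private generator, so the seed fully determines which books are drawn.
    rng = random.Random(seed)
    sample = rng.sample(ARCHIVE, sample_size)
    return share_with_feature(sample)


# Reproduction: same seed, same sample, same number.
original = run_experiment(seed=2018)
reproduced = run_experiment(seed=2018)

# Replication (in miniature): the same question posed to different samples.
replications = [run_experiment(seed=s) for s in range(20)]
spread = max(replications) - min(replications)
```

The interesting quantity is usually `spread`, not the exact match between `original` and `reproduced`: it tells you how much the answer moves when the sample changes.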

I admit this is personal opinion. But I stress replication over reproducibility because it has some implications for the spirit of the whole endeavor. Since people often imagine that quantitative problems have a right answer, we may initially imagine that the point of sharing code and data is simply to catch mistakes.

In my view the point is rather to permit a (mathematical) conversation about the interpretation of the human past. I hope authors and readers will understand themselves as delayed collaborators, working together to explore different options. What if we did X differently? What if we tried a different sample of books? Usually neither sample is wrong, and neither is right. The point is to understand how much different interpretive assumptions do or don’t change our conclusions. In a sense no single article can answer that question “correctly”; it’s a question that has to be solved collectively, by returning to questions and adjusting the way we frame them. The real point of code-sharing is to permit that kind of delayed collaboration.


A broader purpose

The weather prevents me from being there physically, but this is a transcript of my remarks for “Varieties of Digital Humanities,” MLA, Jan 5, 2018.

Using numbers to understand cultural history is often called “cultural analytics”—or sometimes, if we’re talking about literary history in particular, “distant reading.” The practice is older than either name: sociologists, linguists, and adventurous critics like Janice Radway have been using quantitative methods for a long time.

But over the last twenty years, numbers have begun to have a broader impact on literary study, because we’ve learned to use them in a wider range of ways. We no longer just count things that happen to be easily counted (individual words, for instance, or books sold). Instead scholars can start with literary questions that really interest readers, and find ways to model them. Recent projects have cast light, for instance, on the visual impact of poetry, on imagined geography in the novel, on the instability of gender, and on the global diffusion of stream of consciousness. Articles that use numbers are appearing in central disciplinary venues: MLQ, Critical Inquiry, PMLA. Equally important: a new journal called Cultural Analytics has set high standards for transparent and reproducible research.

Of course, scholars still disagree with each other. And that’s part of what makes this field exciting. We aren’t simply piling up facts. New methods are sparking debate about the nature of the knowledge literary historians aim to produce. Are we interpreting the past or explaining it? Can numbers address perspectival questions? The name for these debates is “critical theory.” Twenty years from now, I think it will be clear that questions about quantitative models form an important unit in undergraduate theory courses.

Literary scholars are used to imagining numbers as tools, not as theories. So there’s translation work to be done. But translating between theoretical traditions could be the most important part of this project. Our existing tradition of critical theory teaches students to ask indispensable questions—about power, for instance, and the material basis of ideology. But persuasive answers to those questions will often require a lot of evidence, and the art of extracting meaningful patterns from evidence is taught by a different theoretical tradition, called “statistics.” Students will be best prepared for the twenty-first century if they can connect the two traditions, and do critical theory with numbers.

So in a lot of ways, this is a heady moment. Cultural analytics has historical discoveries, lively theoretical debates, and a public educational purpose. Intellectually, we’re in good shape.

But institutionally, we’re in awful shape. Or to be blunt: we are shape-less. Most literature departments do not teach students how to do this stuff at all. Everything I’ve just discussed may be represented by one unit in one course, where students play with topic models. Reduced to that size, I’m not sure cultural analytics makes any sense. If we were seriously trying to teach students to do critical theory with numbers, we would need to create a sequence of courses that guides them through basic principles (of statistical inference as well as historical interpretation) toward projects where they can pose real questions about the past.

What keeps us from building that curriculum? Part of the obstacle, I think, is the term digital humanities itself. Don’t get me wrong: I’m grateful for the popularity of DH. It has lent energy to many different projects. But the term digital humanities has been popular precisely because it promises that all those projects can still be contained in the humanities. The implicit pitch is something like this: “You won’t need a whole statistics course. Come to our two-hour workshop on topic models instead. You can always find a statistician to collaborate with.”

I understand why digital humanists said that kind of thing eight years ago. We didn’t want to frighten people away. If you write “Learn Another Discipline” on your welcome mat, you may not get many visitors. But a deceptively gentle welcome mat, followed by a trapdoor, is not really more welcoming. So it’s time to be honest about the preparation needed for cultural analytics. Young people entering this field will need to understand the whole process. They won’t even be able to pose meaningful questions, for instance, without some statistics.

Trompe l’oeil faux door mural from http://www.bumblebee

But the metaphor of a welcome mat may be too optimistic. This field doesn’t have a door yet. I mean, there is no curriculum. So of course the field tends to attract people who already have an extracurricular background—which, of course, is not equally distributed. It shouldn’t surprise us that access is a problem when this field only exists as a social network. The point of a classroom is to distribute knowledge in a more equal, less homosocial way. But digital humanities classes, as currently defined, don’t really teach students how to use numbers. (For a bracingly honest exploration of the problem, see Andrew Goldstone.) So it’s almost naive to discuss “barriers to entry.” There is no entrance to this field. What we have is more like a door painted on the wall. But we’re in denial about that—because to admit the problem, we would have to admit that “DH” isn’t working as a gateway to everything it claims to contain.

I think the courses that can really open doors to cultural analytics are found, right now, in the social sciences. That’s why I recently moved half of my teaching to a School of Information Sciences. There, you find a curricular path that covers statistics and programming along with social questions about technology. I don’t think it’s an accident that you also find better gender and ethnic diversity among people using numbers in the social sciences. Methods get distributed more equally within a discipline that actually teaches the methods. So I recommend fusing cultural analytics with social science partly because it immediately makes this field more diverse. I’m not offering that as a sufficient answer to problems of access. I welcome other answers too. But I am suggesting that social-scientific methods are a necessary part of access. We cannot lower barriers to entry by continuing to pretend that cultural analytics is just the humanities, plus some user-friendly digital tools. That amounts to a trompe-l’oeil door.

What the social sciences lack are courses in literary history. And that’s important, because distant readers set out to answer concrete historical questions. So the unfortunate reality is, this project cannot be contained in one discipline.  The questions we try to answer are taught in the humanities. But the methods we use are taught, right now, in the social sciences and data science. Even if it frightens some students off, we have to acknowledge that cultural analytics is a multi-disciplinary project—a bridge between the humanities and quantitative social science, belonging equally to both.

I’m not recommending this approach for the DH community as a whole. DH has succeeded by fitting into the institutional framework of the humanities. DH courses are often pitched to English or History majors, and for many topics, that works brilliantly. But it’s awkward for quantitative courses. To use numbers wisely, students need preparation that an English major doesn’t provide. So increasingly I see the quantitative parts of DH presented as an interdisciplinary program rather than a concentration in the humanities.

In saying this, I don’t mean to undersell the value of numbers for humanists. New methods can profoundly transform our view of the human past, and the research is deeply rewarding. So I’m convinced that statistics, and even machine learning, will gradually acquire a place in the humanistic curriculum.

I’m just saying that this is a bigger, slower project than the rhetoric of DH may have led us to expect. Mathematics doesn’t really come packaged in digital tools. Math is a way of thinking, and using it means entering into a long-term relationship with statisticians and social scientists. We are not borrowing tools for private use inside our discipline, but starting a theoretical conversation that should turn us outward, toward new forms of engagement with our colleagues and the world.

What is the point of studying culture with numbers, after all? It’s not to change English departments, but to enrich the way all students think about culture. The questions we’re posing can have real implications for the way students understand their roles in history—for instance, by linking their own cultural experience to century-spanning trends. Even more urgently, these questions give students a way to connect interpretive insights and resonant human details with habits of experimental inquiry.

Instead of imagining cultural analytics as a subfield of DH, I would almost call it an emerging way to integrate the different aspects of a liberal education. People who want to tackle that challenge are going to have to work across departments to some extent: it’s not a project that an English department could contain. But it is nevertheless an important opportunity for literary scholars, since it’s a place where our work becomes central to the broader purposes of the university as a whole.

It looks like you’re writing an argument against data in literary study …

would you like some help with that?

I’m not being snarky. Right now, I have several friends writing articles that are largely or partly a critique of interrelated trends that go under the names “data” or “distant reading.” It looks like many other articles of the same kind are being written.

This is good news! I believe fervently in Mae West’s theory of publicity. “I don’t care what the newspapers say about me as long as they spell my name right.” (Though it turns out we may not actually know who said that, so I guess the newspapers failed.)

In any case, this blog post is not going to try to stop you from proving that numbers are neoliberal, unethical, inevitably assert objectivity, aim to eliminate all close reading from literary study, fail to represent time, and lead to loss of “cultural authority.” Go for it! Ideas live on critique.

But I do want to help you “spell our names right.” Andrew Piper has recently pointed out that critiques of data-driven research tend to use a small sample of articles. He expressed that more strongly, but I happen to like the article he was aiming at, so I’m going to soften his expression. However, I don’t disagree with the underlying point! For some reason, critics of numbers don’t feel they need to consider more than one example, or two if they’re in a generous mood.

There are some admirable exceptions to this rule. I’ve argued that a recent issue of Genre was, in general, moving in the right direction. And I’m fairly confident that the trend will continue. A field that has been generating mostly articles and pamphlets is about to shift into a lower gear and publish several books. In literary studies, that tends to be an effective way of reframing debate.

But it may be another twelve to eighteen months before those books are out. In the meantime, you’ve got to finish your critique. So let me help with the bibliography.

When you’re tempted to assume that all possible uses of numbers (or “data”) in literary study can be summed up by engaging one or two texts that Franco Moretti wrote in the year 2000, you should resist the assumption. You are actually talking about a long, complex story, and your readers deserve some glimpse of its complexity.

For instance, sociologists, linguists and book historians have been using numbers to describe literature since the middle of the twentieth century. You should make clear whether you are critiquing that work, or just arguing that it is incapable of addressing the inner literariness of literature. The journal Computers and the Humanities started in the 1960s. The 1980s gave rise to a thriving tradition of feminist literary sociology, embodied in books by Janice Radway and Gaye Tuchman, and in the journal Signs. I’ve used one of Tuchman’s regression models as an illustration here.


Variables predicting literary fame in a regression model, from Gaye Tuchman and Nina E. Fortin, Edging Women Out (1989).

<deep breath>

In the 1990s, Mark Olsen (working at the University of Chicago) started to articulate many of the impulses we now call “distant reading.” Around 2000, Franco Moretti gave quantitative approaches an infusion of polemical verve and wit, which raised their profile among literary scholars who had not previously paid attention. (Also, frankly, the fact that Moretti already had disciplinary authority to spend mattered a great deal. Literary scholars can be temperamentally conservative even when theoretically radical.)

But Moretti himself is a moving target. The articles he has written since 2008 aim at different goals, and use different methods, than articles before that date. Part of the point of an experimental method, after all, is that you are forced to revise your assumptions! Because we are actually learning things, this field is changing rapidly. A recent pamphlet from the Stanford Literary Lab conceives the role of the “archive,” for instance, very differently than “Slaughterhouse of Literature” did.

But that pamphlet was written by six authors—a useful reminder that this is a collective project. Today the phrase “distant reading” is often a loose description for large-scale literary history, covering many people who disagree significantly with Moretti. In a recent roundtable in PMLA, for instance, Andrew Goldstone argues for evidence of a more sociological and less linguistic kind. Lisa Rhody and Alison Booth both argue for different scales or forms of “distance.” Richard Jean So argues that the simple measurements which typified much work before 2010 need to be replaced by statistical models, which account for variation and uncertainty in a more principled way.

One might also point, for instance, to Lauren Klein’s work on gaps in the archive, or to Ryan Cordell’s work on literary circulation, or to Katherine Bode’s work, which aims to construct corpora that represent literary circulation rather than production. Or to Matt Wilkens, or Hoyt Long, or Tanya Clement, or Matt Jockers, or James F. English … I’m going to run out of breath before I run out of examples.

Not all of these scholars believe that numbers will put literary scholarship on a more objective footing. Few of them believe that numbers can replace “interpretation” with “explanation.” None of them, as far as I can tell, have stopped doing close reading. (I would even claim to pair numbers with close reading in Joseph North’s strong sense of the phrase: not just reading-to-illustrate-a-point but reading-for-aesthetic-cultivation.) In short, the work literary scholars are doing with numbers is not easily unified by a shared set of principles—and definitely isn’t unified by a 17-year-old polemic. The field is unified, rather, by a fast-moving theoretical debate. Literary production versus circulation. Book history versus social science. Sociology versus linguistics. Measurement versus modeling. Interpretation versus explanation versus prediction.

Critics of this work may want to argue that it all nevertheless fails in the same way, because numbers inevitably (flatten time/reduce reading to visualization/exclude subjectivity/fill in the blank). That’s a fair thesis to pursue. But if you believe that, you need to show that your generalization is true by considering several different (recent!) examples, and teasing out the tacit similarities concealed underneath ostensible disagreements. I hope this post has helped with some of the bibliographic legwork. If you want more sources, I recently wrote a “Genealogy of Distant Reading” that will provide more. Now, tear them apart!

We’re probably due for another discussion of Stanley Fish

I think I see an interesting theoretical debate over the horizon. The debate is too big to resolve in a blog post, but I thought it might be narratively useful to foreshadow it—sort of as novelists create suspense by dropping hints about the character traits that will develop into conflict by the end of the book.

Basically, the problem is that scholars who use numbers to understand literary history have moved on from Stanley Fish’s critique, without much agreement about why or how. In the early 1970s, Fish gave a talk at the English Institute that defined a crucial problem for linguistic analysis of literature. Later published as “What Is Stylistics, and Why are They Saying Such Terrible Things About It?”, the essay focused on “the absence of any constraint” governing the move “from description to interpretation.” Fish takes Louis Milic’s discussion of Jonathan Swift’s “habit of piling up words in series” as an example. Having demonstrated that Swift does this, Milic concludes that the habit “argues a fertile and well stocked mind.” But Fish asks how we can make that sort of inference, generally, about any linguistic pattern. How do we know that reliance on series demonstrates a “well stocked mind” rather than, say, “an anal-retentive personality”?

The problem is that isolating linguistic details for analysis also removes them from the context we normally use to give them a literary interpretation. We know what the exclamation “Sad!” implies, when we see it at the end of a Trumpian tweet. But if you tell me abstractly that writer A used “sad” more than writer B, I can’t necessarily tell you what it implies about either writer. If I try to find an answer by squinting at word lists, I’ll often make up something arbitrary. Word lists aren’t self-interpreting.

Thirty years passed; the internet got invented. In the excitement, dusty critiques from the 1970s got buried. But Fish’s argument was never actually killed, and if you listen to the squeaks of bats, you hear rumors that it still walks at night.

Or you could listen to blogs. This post is partly prompted by a blogged excerpt from a forthcoming work by Dennis Tenen, which quotes Fish to warn contemporary digital humanists that “a relation can always be found between any number of low-level, formal features of a text and a given high-level account of its meaning.” Without “explanatory frameworks,” we won’t know which of those relations are meaningful.

Ryan Cordell’s recent reflections on “machine objectivity” could lead us in a similar direction. At least they lead me in that direction, because I think the error Cordell discusses—over-reliance on machines themselves to ground analysis—often comes from a misguided attempt to solve the problem of arbitrariness exposed by Fish. Researchers are attracted to unsupervised methods like topic modeling in part because those methods seem to generate analytic categories that are entirely untainted by arbitrary human choices. But as Fish explained, you can’t escape making choices. (Should I label this topic “sadness” or “Presidential put-downs”?)

I don’t think any of these dilemmas are unresolvable. Although Fish’s critique identified a real problem, there are lots of valid solutions to it, and today I think most published research is solving the problem reasonably well. But how? Did something happen since the 1970s that made a difference? There are different opinions here, and the issues at stake are complex enough that it could take decades of conversation to work through them. Here I just want to sketch a few directions the conversation could go.

Dennis Tenen’s recent post implies that the underlying problem is that our models of form lack causal, explanatory force. “We must not mistake mere extrapolation for an account of deep causes and effects.” I don’t think he takes this conclusion quite to the point of arguing that predictive models should be avoided, but he definitely wants to recommend that mere prediction should be supplemented by explanatory inference. And to that extent, I agree—although, as I’ll say in a moment, I have a different diagnosis of the underlying problem.

It may also be worth reviewing Fish’s solution to his own dilemma in “What Is Stylistics,” which was that interpretive arguments need to be anchored in specific “interpretive acts” (93). That has always been a good idea. David Robinson’s analysis of Trump tweets identifies certain words (“badly,” “crazy”) as signs that a tweet was written by Trump, and others (“tomorrow,” “join”) as signs that it was written by his staff. But he also quotes whole tweets, so you can see how words are used in context, make your own interpretive judgment, and come to a better understanding of the model. There are many similar gestures in Stanford LitLab pamphlets: distant readers actually rely quite heavily on close reading.

My understanding of this problem has been shaped by a slightly later Fish essay, “Interpreting the Variorum” (1976), which returns to the problem broached in “What Is Stylistics,” but resolves it in a more social way. Fish concludes that interpretation is anchored not just in an individual reader’s acts of interpretation, but in “interpretive communities.” Here, I suspect, he is rediscovering an older hermeneutic insight, which is that human acts acquire meaning from the context of human history itself. So the interpretation of culture inevitably has a circular character.

One lesson I draw is simply that we shouldn’t work too hard to avoid making assumptions. Most of the time we do a decent job of connecting meaning to an implicit or explicit interpretive community. Pointing to examples, using word lists derived from a historical thesaurus or sentiment dictionary: all of that can work well enough. The really dubious moves we make often come from trying to escape circularity altogether, in order to achieve what Alan Liu has called “tabula rasa interpretation.”

But we can also make quantitative methods more explicit about their grounding in interpretive communities. Lauren Klein’s discussion of the TOME interface she constructed with Jacob Eisenstein is a good model here; Klein suggests that we can understand topic modeling better by dividing a corpus into subsets of documents (say, articles from different newspapers), to see how a topic varies across human contexts.
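The general idea of splitting a corpus by human context can be sketched in a few lines of Python. This is not the TOME interface or Klein and Eisenstein's method; it is only a rough illustration, and the source names, the topic shares, and the function `topic_share_by_source` are all hypothetical. It assumes you already have, for each document, the proportion of the document assigned to one topic by some topic model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: one (source, topic_share) pair per document, where
# topic_share is the proportion of that document assigned to a single topic.
# The sources and numbers are invented for illustration.
docs = [
    ("newspaper_a", 0.25),
    ("newspaper_a", 0.75),
    ("newspaper_b", 0.125),
    ("newspaper_b", 0.125),
]


def topic_share_by_source(documents):
    """Average a topic's share of each document, grouped by source."""
    grouped = defaultdict(list)
    for source, share in documents:
        grouped[source].append(share)
    return {source: mean(shares) for source, shares in grouped.items()}


by_source = topic_share_by_source(docs)
```

If the averages differ sharply between sources, that is a hint that the "same" topic is doing different work in different communities, which is where interpretation has to start.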

Of course, if you pursue that approach systematically enough, it will lead you away from topic modeling toward methods that rely more explicitly on human judgment. I have been leaning on supervised algorithms a lot lately—not because they’re easier to test or more reliable than unsupervised ones—but because they explicitly acknowledge that interpretation has to be anchored in human history.

At first glance, this may seem to make progress impossible. “All we can ever discover is which books resemble these other books selected by a particular group of readers. The algorithm can only reproduce a category someone else already defined!” And yes, supervised modeling is circular. But this is a circularity shared by all interpretation of history, and it never merely reproduces its starting point. You can discover that books resemble each other to different degrees. You can discover that models defined by the responses of one interpretive community do or don’t align with models of another. And often you can, carefully, provisionally, draw explanatory inferences from the model itself, assisted perhaps by a bit of close reading.

I’m not trying to diss unsupervised methods here. Actually, unsupervised methods are based on clear, principled assumptions. And a topic model is already a lot more contextually grounded than “use of series == well stocked mind.” I’m just saying that the hermeneutic circle is a little slipperier in unsupervised learning, easier to misunderstand, and harder to defend to crowds of pitchfork-wielding skeptics.

In short, there are lots of good responses to Fish’s critique. But if that critique is going to be revived by skeptics over the next few years—as I suspect—I think I’ll take my stand for the moment on supervised machine learning, which can explicitly build bridges between details of literary language and social contexts of reception.  There are other ways to describe best practices: we could emphasize a need to seek “explanations,” or avoid claims of “objectivity.” But I think the crucial advance we have made over the 1970s is that we’re no longer just modeling language; we can model interpretive communities at the same time.

Photo credit: A school of yellow-tailed goatfish, photo for NOAA Photo Library, CC-BY Dwayne Meadows, 2004.

Postscript July 15: Jonathan Armoza points out that Stephen Ramsay wrote a post articulating his own, more deformative response to “What is Stylistics” in 2012.

Finding the great divide

Last year, Jordan Sellers and I published an article in Modern Language Quarterly, trying to trace the “great divide” that is supposed to open up between mass culture and advanced literary taste around the beginning of the twentieth century.

I’m borrowing the phrase “great divide” from Andreas Huyssen, but he’s not the only person to describe the phenomenon. Whether we explain it neutrally as a consequence of widespread literacy, or more skeptically as the rise of a “culture industry,” literary historians widely agree that popularity and prestige parted company in the twentieth century. So we were surprised not to be able to measure the widening gap.

We could certainly model literary taste. We trained a model to distinguish poets reviewed in elite literary magazines from a less celebrated “contrast group” selected randomly. The model achieved roughly 79% accuracy, 1820-1919, and the stability of the model itself raised interesting questions. But we didn’t find that the model’s accuracy increased across time in the way we would have expected in a period when elite and popular literary taste are specializing and growing apart.

Instead of concluding that the division never happened, we guessed that we had misunderstood it or looked in the wrong place. Algee-Hewitt and McGurl have pretty decisively confirmed that a divide exists in the twentieth century. So we ought to be able to see it emerging. Maybe we needed to reach further into the twentieth century — or maybe we would have better luck with fiction, since the history of fiction provides evidence about sales, as well as prestige?

In fact, getting evidence about that second, economic axis seems to be the key. It took work by many hands over a couple of years: Kyle Johnston, Sabrina Lee, and Jessica Mercado, as well as Jordan Sellers, have all contributed to this project. I’m presenting a preliminary account of our results at Cultural Analytics 2017, and this blog post is just a brief summary of the main point.

When you look at the books described as bestsellers by Publishers Weekly, or by book historians (see references to Altick, Bloom, Hackett, Leavis, below) it’s easy to see the two circles of the Venn diagram pulling apart: on the one hand bestsellers, on the other hand books reviewed in elite venues. (For our definition of “elite venues” see the “Table” in a supporting code & data repository.)


On the other hand, when you back up from bestsellers to look at a broader sample of literary production, it’s still not easy to detect increasing stylistic differentiation between the elite “reviewed” texts and the rest of the literary field. A classifier trained on the reviewed fiction has roughly 72.5% accuracy from 1850 to 1949; if you break the century into parts, there are some variations in accuracy, but no consistent pattern. (In a subsequent blog post, I’ll look at the fiddly details of algorithm choice and feature engineering, but the long and short of that question is — it doesn’t make a significant difference.)
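For readers who want to see what “breaking the century into parts” looks like in practice, here is a minimal sketch. It is not the code from the project repository: the function name accuracy_by_period, the 25-year window, and the toy records are all assumptions made up for illustration. The only idea it borrows from the post is that accuracy is computed separately for each slice of the timeline and then compared.

```python
from collections import defaultdict


def accuracy_by_period(predictions, start=1850, width=25):
    """Group (year, actual, predicted) records into fixed-width periods
    and return the fraction of correct predictions in each period.

    `start` and `width` are illustrative defaults, not values from the study.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, actual, predicted in predictions:
        # Label each period by its first year, e.g. 1850, 1875, 1900 ...
        period = start + ((year - start) // width) * width
        totals[period] += 1
        hits[period] += int(actual == predicted)
    return {period: hits[period] / totals[period] for period in sorted(totals)}


# Invented records: (publication year, was it reviewed?, did the model say so?)
toy_predictions = [
    (1855, True, True),
    (1860, False, True),
    (1880, True, True),
    (1890, False, False),
    (1930, True, False),
    (1940, False, False),
]

period_accuracy = accuracy_by_period(toy_predictions)
print(period_accuracy)
```

With real predictions, a steadily rising sequence of values would be the signature of a widening divide; what we actually saw was variation without a consistent trend.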

To understand why the growing separation of bestsellers from “reviewed” texts at the high end of the market doesn’t seem to make literary production as a whole more strongly stratified, I’ve tried mapping authors onto a two-dimensional model of the literary field, intended to echo Pierre Bourdieu’s well-known diagrams of the interaction between economic and cultural distinction.


Pierre Bourdieu, The Field of Cultural Production (1993), p. 49.

In the diagram below, for instance, the horizontal axis represents sales, and the vertical axis represents prestige. Sales would be easy to measure, if we had all the data. We actually don’t — so see the end of this post for the estimation strategy I adopted. Prestige, on the other hand, is difficult to measure: it’s perspectival and complex. So we modeled prestige by sampling texts that were reviewed in prominent literary magazines, and then training a model that used textual cues to predict the probability that any given book came from the “reviewed” set. An author’s prestige in this diagram is simply the average probability of review for their books. (The Stanford Literary Lab has similarly recreated Bourdieu’s model of distinction in their pamphlet “Canon/Archive,” using academic citations as a measure of prestige.)
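The last step, turning book-level probabilities into an author-level score, is simple enough to sketch. This is an illustration under stated assumptions rather than the project's actual code: the function name author_prestige and the placeholder authors and probabilities are invented, and the probabilities would in practice come from the trained model.

```python
from collections import defaultdict
from statistics import mean


def author_prestige(book_probs):
    """Average the predicted probability of review across each author's books.

    `book_probs` is an iterable of (author, probability) pairs, one per book.
    """
    by_author = defaultdict(list)
    for author, prob in book_probs:
        by_author[author].append(prob)
    return {author: mean(probs) for author, probs in by_author.items()}


# Invented example: two books by one author, one book by another.
toy_books = [
    ("Author A", 0.8),
    ("Author A", 0.6),
    ("Author B", 0.3),
]

prestige = author_prestige(toy_books)
print(prestige)
```

One consequence of averaging is that an author with a single book is scored on a single prediction, so their position on the vertical axis is noisier than that of a prolific author.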


The upward drift of these points reveals a fairly strong correlation between prestige and sales. It is possible to find a few high-selling authors who are predicted to lack critical prestige — notably, for instance, the historical novelist W. H. Ainsworth and the sensation novelist Ellen Wood, author of East Lynne. It’s harder to find authors who have prestige but no sales: there’s not much in the northwest corner of the map. Arthur Helps, a Cambridge Apostle, is a fairly lonely figure.

Fast-forward seventy-five years and we see a different picture.


The correlation between sales and prestige is now weaker; the cloud of authors is “rounder” overall.

There are also more authors in the “upper midwest” portion of the map now — people like Zora Neale Hurston and James Joyce, who have critical prestige but not enormous sales (or not up to 1949, at least as far as my model is aware).

There’s also a distinct “genre fiction” and “pulp fiction” world emerging in the southeast corner of this map, ranging from Agatha Christie to Mickey Spillane. (A few years earlier, Edgar Rice Burroughs and Zane Grey are in the same region.)

Moreover, if you just look at the large circles (the authors we’re most likely to remember), you can start to see how people in this period might get the idea that sales are actually negatively correlated with critical prestige. The right side of the map almost looks like a diagonal line slanting down from William Faulkner to P. G. Wodehouse.

That negative correlation doesn’t really characterize the field as a whole. Critical prestige still has a faint positive correlation with sales, as people over on the left side of the map might sadly remind us. But a brief survey of familiar names could give you the opposite impression.

In short, we’re not necessarily seeing a stronger stratification of the literary field. The change might better be described as a decline in the correlation of two existing forms of distinction. And as they become less correlated, the difference between them becomes more visible, especially among the well-known names on the right side of the map.
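To make “a decline in the correlation” concrete, here is a small sketch that computes a Pearson correlation for two periods. Everything in it is assumed for illustration: the helper pearson is written out by hand, and the sales ranks and prestige scores are invented numbers, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented toy data: five authors per period, sales rank and prestige score.
sales = [1, 2, 3, 4, 5]
prestige_early = [0.2, 0.3, 0.5, 0.6, 0.8]  # rises almost in step with sales
prestige_late = [0.5, 0.2, 0.7, 0.3, 0.6]   # only loosely related to sales

r_early = pearson(sales, prestige_early)
r_late = pearson(sales, prestige_late)
print(round(r_early, 2), round(r_late, 2))
```

In the toy numbers, both correlations are positive, but the later one is much fainter. That is the shape of the change described above: the two forms of distinction drift apart without ever becoming opposed across the field as a whole.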


So, while we’re broadly confirming an existing story about literary history, the evidence also suggests that the metaphor of a “great divide” is a bit of an exaggeration. We don’t see any chasm emerging.

Maps of the literary field also help me understand why a classifier trained on an elite “reviewed” sample didn’t necessarily get stronger over time. The correlation of prestige and sales in the Victorian era means that the line separating the red and blue samples is strongly tilted there, and may borrow some of its strength from both axes. (It’s really a boundary between the prominent and the obscure.)


As we move into the twentieth century, the slope of the line gets flatter, and we get closer to a “pure” model of prestige (as distinguished from sales). But the boundary itself may not grow more clearly marked, if you’re sampling a group of the same size. (However, if you leave The New Republic and New Yorker behind, and sample only works reviewed in little magazines, you do get a more tightly unified group of texts that can be distinguished from a random sample with 83% accuracy.)

This is all great, you say — but how exactly are you “estimating” sales? We don’t actually have good sales figures for every author in HathiTrust Digital Library; we have fairly patchy records that depend on individual publishers.

For the answer to that question, I’m going to refer you to the GitHub repo where I work out a model of sales. The short version is that I borrow a version of “empirical Bayes” from Julia Silge and David Robinson, and apply it to evidence drawn from bestseller lists as well as digital libraries, to construct a rough estimate of each author’s relative prominence in the market. The trick is, basically, to use the evidence we have to construct an estimate of our uncertainty, and then use our uncertainty to revise the evidence. The picture on the left gives you a rough sense of how that transformation works. I think empirical Bayes may turn out to be useful for a lot of problems where historians need to reconstruct evidence that is patchy or missing in the historical record, but the details are too much to explain here; see Silge’s post and my Jupyter notebook.
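The general shape of that trick can be sketched in a few lines. To be clear, this is not the sales model in the notebook. It is the textbook beta-binomial version of empirical Bayes, and I am assuming a method-of-moments fit for the prior, which is a common shortcut and not necessarily the fitting procedure used in the notebook or in Robinson's book. The names fit_beta_prior and shrink, and the (successes, trials) records, are invented for illustration.

```python
from statistics import mean, pvariance


def fit_beta_prior(rates):
    """Estimate Beta(alpha, beta) from observed rates by method of moments.

    Assumes the variance of the rates is smaller than m * (1 - m),
    which holds for any non-degenerate set of rates between 0 and 1.
    """
    m = mean(rates)
    v = pvariance(rates)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def shrink(successes, trials, alpha, beta):
    """Posterior mean of a rate under a Beta(alpha, beta) prior."""
    return (successes + alpha) / (trials + alpha + beta)


# Invented records: (successes, trials) for six hypothetical cases.
records = [(2, 4), (30, 100), (1, 10), (45, 150), (3, 5), (20, 80)]
raw_rates = [s / t for s, t in records]

# Step 1: use all the evidence to estimate how much rates typically vary.
alpha, beta = fit_beta_prior(raw_rates)

# Step 2: use that estimate of uncertainty to revise each piece of evidence.
estimates = [shrink(s, t, alpha, beta) for s, t in records]

for (s, t), raw, est in zip(records, raw_rates, estimates):
    print(f"{s}/{t}: raw {raw:.3f} -> shrunk {est:.3f}")
```

The point to notice is that a case with only four trials gets pulled strongly toward the overall average, while a case with 150 trials barely moves. Thin evidence is trusted less, which is exactly the behavior you want when the historical record is patchy.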

Bubble charts invite mouse-over exploration. I can’t easily embed interactive viz in this blog, but here are a few links to plotly visualizations:


The texts used here are drawn from HathiTrust via the HathiTrust Research Center. Parts of the research were funded by the Andrew W. Mellon Foundation via the WCSA+DC grant, and parts by SSHRC via NovelTM.

Most importantly, I want to acknowledge my collaborators on this project, Kyle Johnston, Sabrina Lee, Jessica Mercado, and Jordan Sellers. They contributed a lot of intellectual depth to the project — for instance by doing research that helped us decide which periodicals should represent a given period of literary history.


Algee-Hewitt, Mark, and Mark McGurl. “Between Canon and Corpus: Six Perspectives on 20th-Century Novels.” Stanford Literary Lab, Pamphlet 9, 2015.

Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, Hannah Walser. “Canon/Archive: Large-Scale Dynamics in the Literary Field.” Stanford Literary Lab, January 2016.

Altick, Richard D. The English Common Reader: A Social History of the Mass Reading Public 1800-1900. Chicago: University of Chicago Press, 1957.

Bloom, Clive. Bestsellers: Popular Fiction Since 1900. 2nd edition. Houndmills: Palgrave Macmillan, 2008.

Hackett, Alice Payne, and James Henry Burke. 80 Years of Best Sellers 1895-1975. New York: R.R. Bowker, 1977.

Leavis, Q. D. Fiction and the Reading Public. 1935.

Mott, Frank Luther. Golden Multitudes: The Story of Bestsellers in the United States. New York: R. R. Bowker, 1947.

Robinson, David. Introduction to Empirical Bayes: Examples from Baseball Statistics. 2017.

Silge, Julia. “Singing the Bayesian Beginner Blues.” data science ish, September 2016.

Unsworth, John. 20th Century American Bestsellers.

Digital humanities as a semi-normal thing

Five years ago it was easy to check on new digital subfields of the humanities. Just open Twitter. If a new blog post had dropped, or a magazine had published a fresh denunciation of “digital humanities,” academics would be buzzing.

In 2017, Stanley Fish and Leon Wieseltier are no longer attacking “DH” — and if they did, people might not care. Twitter, unfortunately, has bigger problems to worry about, because the Anglo-American political world has seen some changes for the worse.

But the world of digital humanities, I think, has seen changes for the better. It seems increasingly taken for granted that digital media and computational methods can play a role in the humanities. Perhaps a small role — and a controversial one — and one without much curricular support. But still!

In place of journalistic controversies and flame wars, we are finally getting a broad scholarly conversation about new ideas. Conversations of this kind take time to develop. Many of us will recall Twitter threads from 2013 anxiously wondering whether digital scholarship would ever have an impact on more “mainstream” disciplinary venues. The answer “it just takes time” wasn’t, in 2013, very convincing.

But in fact, it just took time. Quantitative methods and macroscopic evidence, for instance, are now a central subject of debate in literary studies. (Since flame wars may not be entirely over, I should acknowledge that I’m now moving to talk about one small subfield of DH rather than trying to do justice to the whole thing.)

The immediate occasion for this post is a special issue of Genre (v. 50, n. 1) engaging the theme of “data” in relation to the Victorian novel; this follows a special issue of Modern Language Quarterly on “scale and value.” Next year, “Scale” is the theme of the English Institute, and little birds tell me that PMLA is also organizing an issue on related themes. Meanwhile, of course, the new journal Cultural Analytics is providing an open-access home for essays that make computational methods central to their interpretive practice.

The participants in this conversation don’t all identify as digital humanists or distant readers. But they are generally open-minded scholars willing to engage ideas as ideas, whatever their disciplinary origin. Some are still deeply suspicious of numbers, but they are willing to consider both sides of that question. Many recent essays are refreshingly aware that quantitative analysis is itself a mode of interpretation, guided by explicit reflection on interpretive theory. Instead of reifying computation as a “tool” or “skill,” for instance, Robert Mitchell engages the intellectual history of Bayesian statistics in Genre.

Recent essays also seem aware that the history of large-scale quantitative approaches to the literary past didn’t begin and end with Franco Moretti. References to book history and the Annales School mix with citations of Tanya Clement and Andrew Piper. Although I admire Moretti’s work, this expansion of the conversation is welcome and overdue.

If “data” were a theme — like thing theory or the Anthropocene — this play might now have reached its happy ending. Getting literary scholars to talk about a theme is normally enough.

In fact, the play could proceed for several more acts, because “data” is shorthand for a range of interpretive practices that aren’t yet naturalized in the humanities. At most universities, grad students still can’t learn how to do distant reading. So there is no chance at all that distant reading will become the “next big thing” — one of those fashions that sweeps departments of English, changing everyone’s writing in a way that is soon taken for granted. We can stop worrying about that. Adding citations to Geertz and Foucault can be done in a month. But a method that requires years of retraining will never become the next big thing. Ten years from now, the fraction of humanities faculty who actually use quantitative methods may have risen to 5% — or optimistically, 7%. But even that change would be slow and deeply controversial.

So we might as well enjoy the current situation. The initial wave of utopian promises and enraged jeremiads about “DH” seems to have receded. Scholars have realized that new objects, and methods, of study are here to stay — and that they are in no danger of taking over. Now it’s just a matter of doing the work. That, also, takes time.

Two syllabi: Digital Humanities and Data Science in the Humanities.

When I began teaching graduate courses about digital humanities, I designed syllabi that tried to cover a little of everything.

I enjoyed teaching those courses, but if I’m being honest, it was a challenge to race from digital editing — to maps and networks — to distant reading — to critical reflection on the concept of DH itself. It was even harder to cover that range of topics while giving students meaningful hands-on experience.

The solution, obviously, was to break the subject into more than one course. But I didn’t know how to do that within an English graduate curriculum. Many students are interested in learning about “digital humanities,” because a lot of debate has swirled around that broad rubric. I think the specific fields of inquiry grouped under the rubric actually make better-sized topics for a course, but they don’t have the same kind of name recognition, and courses on those topics don’t enroll as heavily.

This problem became easier to solve when part of my job moved into the School of Information Sciences. Many aspects of digital humanities — from social reflection on information technology to data mining — are already represented in the curriculum here. So I could divide DH into parts, and still have confidence that students would recognize those parts and understand how each part fit into an existing program of study.

This year I’ve taught two courses in the LIS curriculum. I’m sharing syllabi for both at once so I can also describe the contrast between them.

1. The first of the two, “Digital Humanities” (syllabus), is fundamentally a survey of DH as a social phenomenon, with special emphasis on the role of academic libraries and librarians — since that is likely to be a career path that many MLIS students are considering. The course covers a wide range of humanistic themes and topics, but doesn’t go very deeply into hands-on exploration of methods.

2. The second course, “Data Science in the Humanities” (syllabus), covers the field that digital humanists often call “cultural analytics” — or “distant reading,” when it focuses on literature. Although I know its history is actually more complex, I’m characterizing this field as a form of data science in order to highlight its value for a wide range of students who may or may not intend to work as researchers in universities. I think humanistic questions can be great training for the slippery problems one encounters in business and computational journalism, for instance. But as Dennis Tenen and Andrew Goldstone (among others) have rightly pointed out, it can be a huge challenge to cover all the methods required for this sort of work in a single course. I’m not sure I have a perfect solution to that problem yet. The course is only in its third week! But we are aiming to achieve a kind of hands-on experience that combines Python programming with basic principles of statistics and machine learning, and with reflection on the challenges of social interpretation. I believe this may be achievable in a course that doesn’t have to cover other aspects of DH, and when many students have at least a little previous experience, both in programming and in the humanities.

As Jupyter notebooks for the data science course are developed, I’m sharing them in a GitHub repo. In both of the syllabi linked above, I also mention other syllabi that served as models. My thanks go out to everyone who shared their experience; I leaned on some of those models very heavily.

The question I haven’t resolved yet is, How do we connect courses like these to an English curriculum? That connection remains crucial: I chose the phrase “data science” partly because the conversation around data science has explicitly acknowledged the importance of domain expertise. (See Drew Conway’s famous Venn diagram on the right.) I do think researchers need substantive knowledge about specific aspects of cultural history in order to frame meaningful questions about the past and interpret the patterns they find.

Right now, the courses I’m offering in LIS are certainly open to graduate students from humanities departments. But over the long run, I would also like to develop courses located in humanities departments that focus on specific literary-historical problems (for instance, questions of canonicity and popularity in a particular century), integrating distant-reading approaches only as one element of a broader portfolio of methods. Courses like that would fit fairly easily into an English graduate curriculum.

On the other hand, none of the courses I’ve described above can (by themselves) solve the most challenging pedagogical problem in DH, which is to make distant reading useful for doctoral dissertations. Right now, that’s very hard. The research opportunities in distant reading are huge, I believe, but that hugeness itself becomes a barrier. A field where you start making important discoveries after two to three years of initial start-up time (training yourself, developing corpora, etc.) is not ideally configured for the individualistic model of doctoral research that prevails in the humanities. Collective lab-centered projects are probably a better fit for this field. We may need to envision dissertations as being (at least in part) pieces of a larger research project, exploring one aspect of a shared problem.