How much DH can we fit in a literature department?

It’s an open secret that the social phenomenon called “digital humanities” mostly grew outside the curriculum. Library-based programs like Scholars’ Lab at UVA have played an important role; so have “centers” like MITH (Maryland) and CHNM (George Mason) — not to mention the distributed unconference movement called THATCamp, which started at CHNM. At Stanford, the Literary Lab is a sui generis thing, related to departments of literature but not exactly contained inside them.

The list could go on, but I’m not trying to cover everything — just observing that “DH” didn’t begin by embedding itself in the curricula of humanities departments. It went around them, in improvisational and surprisingly successful ways.

That’s a history to be proud of, but I think it’s also setting us up for predictable frustrations at the moment, as disciplines decide to import “DH” and reframe it in disciplinary terms. (“Seeking a scholar of early modern drama, with a specialization in digital humanities …”)

Of course, digital methods do have consequences for existing disciplines; otherwise they wouldn’t be worth the trouble. In my own discipline of literary study, it’s now easy to point to a long sequence of substantive contributions to literary study that use digital methods to make thesis-driven interventions in literary history and even interpretive theory.

But although the research payoff is clear, the marriage between disciplinary and extradisciplinary institutions may not be so easy. I sense that a lot of friction around this topic is founded in a feeling that it ought to be straightforward to integrate new modes of study in disciplinary curricula and career paths. So when this doesn’t go smoothly, we feel there must be some irritating mistake in existing disciplines, or in the project of DH itself. Something needs to be trimmed to fit.

What I want to say is just this: there’s actually no reason this should be easy. Grafting a complex extradisciplinary project onto existing disciplines may not completely work. That’s not because anyone made a mistake.

Consider my home field of literary study. If digital methods were embodied in a critical “approach,” like psychoanalysis, they would be easy to assimilate. We could identify digital “readings” of familiar texts, add an article to every Norton edition, and be done with it. In some cases that actually works, because digital methods do after all change the way we read familiar texts. But DH also tends to raise foundational questions about the way literary scholarship is organized. Sometimes it valorizes things we once considered “mere editing” or “mere finding aids”; sometimes it shifts the scale of literary study, so that courses organized by period and author no longer make a great deal of sense. Disciplines can be willing to welcome new ideas, and yet (understandably) unwilling to undertake this sort of institutional reorganization.

Training is an even bigger problem. People have argued long and fiercely about the amount of digital training actually required to “do DH,” and I’m not going to resolve that question here. I just want to say that there’s a reason for the argument: it’s a thorny problem. In many cases, humanists are now tackling projects that require training not provided in humanities departments. There are a lot of possible fixes for that — we can make tools easier to use, foster collaboration — but none of those fixes solve the whole problem. Not everything can be externalized as a “tool.” Some digital methods are really new forms of interpretation; packaging them in a GUI would create a problematic black box. Collaboration, likewise, may not remove the need for new forms of training. Expecting computer scientists to do all the coding on a project can be like expecting English professors to do all the spelling.

I think these problems can find solutions, but I’m coming to suspect that the solutions will be messy. Humanities curricula may evolve, but I don’t think the majority of English or History departments are going to embrace rapid structural change — for instance, change of the kind that would be required to support graduate programs in distant reading. These disciplines have already spent a hundred years rejecting rapprochement with social science; why would they change course now? English professors may enjoy reading Moretti, but it’s going to be a long time before they add a course on statistical methods to the major.

Meanwhile, there are other players in this space (at least at large universities): iSchools, Linguistics, Departments of Communications, Colleges of Media. Digital methods are being assimilated rapidly in these places. New media, of course, are already part of media studies, and if a department already requires statistics, methods like topic modeling are less of a stretch. It’s quite possible that the distant reading of literary culture will end up being shared between literature departments and (say) Communications. The reluctance of literary studies to become a social science needn’t prevent social scientists from talking about literature.

I’m saying all this because I think there’s a strong tacit narrative in DH that understands extradisciplinary institutions as a wilderness, in which we have wandered that we may reach the promised land of recognition by familiar disciplinary authority. In some ways that’s healthy. It’s good to have work organized by clear research questions (so we aren’t just digitizing aimlessly), and I’m proud that digital methods are making contributions to the core concerns of literary studies.

But I’m also wary of the normative pressures associated with that narrative, because (if you’ll pardon the extended metaphor) I’m not sure this caravan actually fits in the promised land. I suspect that some parts of the sprawling enterprise called “DH” (in fact, some of the parts I enjoy most) won’t be absorbed easily in the curricula of History or English. That problem may be solved differently at different schools; the nice thing about strong extradisciplinary institutions is that they allow us to work together even if the question of disciplinary identity turns out to be complex.

postscript: This whole post should have footnotes to Bethany Nowviskie every time I use the term “extradisciplinary,” and to Matt Kirschenbaum every time I say “DH” with implicit air quotes.

Measurement and modeling.

If the Internet is good for anything, it’s good for speeding up the Ent-like conversation between articles, to make that rumble more perceptible by human ears. I thought I might help the process along by summarizing the Stanford Literary Lab’s latest pamphlet — a single-authored piece by Franco Moretti, “‘Operationalizing’: or the function of measurement in modern literary theory.”

One of the many strengths of Moretti’s writing is a willingness to dramatize his own learning process. This pamphlet situates itself as a twist in the ongoing evolution of “computational criticism,” a turn from literary history to literary theory.

Measurement as a challenge to literary theory, one could say, echoing a famous essay by Hans Robert Jauss. This is not what I expected from the encounter of computation and criticism; I assumed, like so many others, that the new approach would change the history, rather than the theory of literature ….

Measurement challenges literary theory because it asks us to “operationalize” existing critical concepts — to say, for instance, exactly how we know that one character occupies more “space” in a work than another. Are we talking simply about the number of words they speak? or perhaps about their degree of interaction with other characters?

Moretti uses Alex Woloch’s concept of “character-space” as a specific example of what it means to operationalize a concept, but he’s more interested in exploring the broader epistemological question of what we gain by operationalizing things. When literary scholars discuss quantification, we often tacitly assume that measurement itself is on trial. We ask ourselves whether measurement is an adequate proxy for our existing critical concepts. Can mere numbers capture the ineffable nuances we assume they possess? Here, Moretti flips that assumption and suggests that measurement may have something to teach us about our concepts — as we’re forced to make them concrete, we may discover that we understood them imperfectly. At the end of the article, he suggests for instance (after begging divine forgiveness) that Hegel may have been wrong about “tragic collision.”

I think Moretti is frankly right about the broad question this pamphlet opens. If we engage quantitative methods seriously, they’re not going to remain confined to empirical observations about the history of predefined critical concepts. Quantification is going to push back against the concepts themselves, and spill over into theoretical debate. I warned y’all back in August that literary theory was “about to get interesting again,” and this is very much what I had in mind.

At this point in a scholarly review, the standard procedure is to point out that a work nevertheless possesses “oversights.” (Insight, meet blindness!) But I don’t think Moretti is actually blind to any of the reflections I add below. We have differences of rhetorical emphasis, which is not the same thing.

For instance, Moretti does acknowledge that trying to operationalize concepts could cause them to dissolve in our hands, if they’re revealed as unstable or badly framed (see his response to Bridgman on pp. 9-10). But he chooses to focus on a case where this doesn’t happen. Hegel’s concept of “tragic collision” holds together, on his account; we just learn something new about it.

In most of the quantitative projects I’m pursuing, this has not been my experience. For instance, in developing statistical models of genre, the first thing I learned was that critics use the word genre to cover a range of different kinds of categories, with different degrees of coherence and historical volatility. Instead of coming up with a single way to operationalize genre, I’m going to end up producing several different mapping strategies that address patterns on different scales.

Something similar might be true even about a concept like “character.” In Vladimir Propp’s Morphology of the Folktale, for instance, characters are reduced to plot functions. Characters don’t have to be people or have agency: when the hero plucks a magic apple from a tree, the tree itself occupies the role of “donor.” On Propp’s account, it would be meaningless to represent a tale like “Le Petit Chaperon Rouge” as a social network. Our desire to imagine narrative as a network of interactions between imagined “people” (wolf ⇌ grandmother) presupposes a separation between nodes and edges that makes no sense for Propp. But this doesn’t necessarily mean that Moretti is wrong to represent Hamlet as a social network: Hamlet is not Red Riding Hood, and tragic drama arguably envisions character in a different way. In short, one of the things we might learn by operationalizing the term “character” is that the term has genuinely different meanings in different genres, obscured for us by the mere continuity of a verbal sign. [I should probably be citing Tzvetan Todorov here, The Poetics of Prose, chapter 5.]

Illustration from "Learning Latent Personas of Film Characters," Bamman et. al.

Illustration from “Learning Latent Personas of Film Characters,” Bamman et. al.

Another place where I’d mark a difference of emphasis from Moretti involves the tension, named in my title, between “measurement” and “modeling.” Moretti acknowledges that there are people (like Graham Sack) who assume that character-space can’t be measured directly, and therefore look for “proxy variables.” But concepts that can’t be directly measured raise a set of issues that are quite a bit more challenging than the concept of a “proxy” might imply. Sack is actually trying to build models that postulate relations between measurements. Digital humanists are probably most familiar with modeling in the guise of topic modeling, a way of mapping discourse by postulating latent variables called “topics” that can’t be directly observed. But modeling is a flexible heuristic that could be used in a lot of different ways.

The illustration on the right is a probabilistic graphical model drawn from a paper on the “Latent Personas of Film Characters” by Bamman, O’Connor, and Smith. The model represents a network of conditional relationships between variables. Some of those variables can be observed (like words in a plot summary w and external information about the film being summarized md), but some have to be inferred, like recurring character types (p) that are hypothesized to structure film narrative.

Having empirically observed the effects of illustrations like this on literary scholars, I can report that they produce deep, Lovecraftian horror. Nothing looks bristlier and more positivist than plate notation.

But I think this is a tragic miscommunication produced by language barriers that both sides need to overcome. The point of model-building is actually to address the reservations and nuances that humanists correctly want to interject whenever the concept of “measurement” comes up. Many concepts can’t be directly measured. In fact, many of our critical concepts are only provisional hypotheses about unseen categories that might (or might not) structure literary discourse. Before we can attempt to operationalize those categories, we need to make underlying assumptions explicit. That’s precisely what a model allows us to do.

It’s probably going to turn out that many things are simply beyond our power to model: ideology and social change, for instance, are very important and not at all easy to model quantitatively. But I think Moretti is absolutely right that literary scholars have a lot to gain by trying to operationalize basic concepts like genre and character. In some cases we may be able to do that by direct measurement; in other cases it may require model-building. In some cases we may come away from the enterprise with a better definition of existing concepts; in other cases those concepts may dissolve in our hands, revealed as more unstable than even poststructuralists imagined. The only thing I would say confidently about this project is that it promises to be interesting.

Interesting times for literary theory.

A couple of weeks ago, after reading abstracts from DH2013, I said that the take-away for me was that “literary theory is about to get interesting again” – subtweeting the course of history in a way that I guess I ought to explain.

A 1915 book by Chicago's "Professor of Literary Theory."

A 1915 book by Chicago’s “Professor of Literary Theory.”

In the twentieth century, “literary theory” was often a name for the sparks that flew when literary scholars pushed back against challenges from social science. Theory became part of the academic study of literature around 1900, when the comparative study of folklore seemed to reveal coherent patterns in national literatures that scholars had previously treated separately. Schools like the University of Chicago hired “Professors of Literary Theory” to explore the controversial possibility of generalization.* Later in the century, structural linguistics posed an analogous challenge, claiming to glimpse an organizing pattern in language that literary scholars sought to appropriate and/or deconstruct. Once again, sparks flew.

I think literary scholars are about to face a similarly productive challenge from the discipline of machine learning — a subfield of computer science that studies learning as a problem of generalization from limited evidence. The discipline has made practical contributions to commercial IT, but it’s an epistemological method founded on statistics more than it is a collection of specific tools, and it tends to be intellectually adventurous: lately, researchers are trying to model concepts like “character” (pdf) and “gender,” citing Judith Butler in the process (pdf).

At DH2013 and elsewhere, I see promising signs that literary scholars are gearing up to reply. In some cases we’re applying methods of machine learning to new problems; in some cases we’re borrowing the discipline’s broader underlying concepts (e.g. the notion of a “generative model”); in some cases we’re grappling skeptically with its premises. (There are also, of course, significant collaborations between scholars in both fields.)

This could be the beginning of a beautiful friendship. I realize a marriage between machine learning and literary theory sounds implausible: people who enjoy one of these things are pretty likely to believe the other is fraudulent and evil.** But after reading through a couple of ML textbooks,*** I’m convinced that literary theorists and computer scientists wrestle with similar problems, in ways that are at least loosely congruent. Neither field is interested in the mere accumulation of data; both are interested in understanding the way we think and the kinds of patterns we recognize in language. Both fields are interested in problems that lack a single correct answer, and have to be mapped in shades of gray (ML calls these shades “probability”). Both disciplines are preoccupied with the danger of overgeneralization (literary theorists call this “essentialism”; computer scientists call it “overfitting”). Instead of saying “every interpretation is based on some previous assumption,” computer scientists say “every model depends on some prior probability,” but there’s really a similar kind of self-scrutiny involved.

It’s already clear that machine learning algorithms (like topic modeling) can be useful tools for humanists. But I think I glimpse an even more productive conversation taking shape, where instead of borrowing fully-formed “tools,” humanists borrow the statistical language of ML to think rigorously about different kinds of uncertainty, and return the favor by exposing the discipline to boundary cases that challenge its methods.

Won’t quantitative models of phenomena like plot and genre simplify literature by flattening out individual variation? Sure. But the same thing could be said about Freud and Lévi-Strauss. When scientists (or social scientists) write about literature they tend to produce models that literary scholars find overly general. Which doesn’t prevent those models from advancing theoretical reflection on literature! I think humanists, conversely, can warn scientists away from blind alleys by reminding them that concepts like “gender” and “genre” are historically unstable. If you assume words like that have a single meaning, you’re already overfitting your model.

Of course, if literary theory and computer science do have a conversation, a large part of the conversation is going to be a meta-debate about what the conversation can or can’t achieve. And perhaps, in the end, there will be limits to the congruence of these disciplines. Alan Liu’s recent essay in PMLA pushes against the notion that learning algorithms can be analogous to human interpretation, suggesting that statistical models become meaningful only through the inclusion of human “seed concepts.” I’m not certain how deep this particular disagreement goes, because I think machine learning researchers would actually agree with Liu that statistical modeling never starts from a tabula rasa. Even “unsupervised” algorithms have priors. More importantly, human beings have to decide what kind of model is appropriate for a given problem: machine learning aims to extend our leverage over large volumes of data, not to take us out of the hermeneutic circle altogether.

But as Liu’s essay demonstrates, this is going to be a lively, deeply theorized conversation even where it turns out that literary theory and computer science have fundamental differences. These disciplines are clearly thinking about similar questions: Liu is right to recognize that unsupervised learning, for instance, raises hermeneutic questions of a kind that are familiar to literary theorists. If our disciplines really approach similar questions in incompatible ways, it will be a matter of some importance to understand why.

0804784469* <plug> For more on “literary theory” in the early twentieth century, see the fourth chapter of Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (2013, hot off the press). The book has a lovely cover, but unfortunately has nothing to do with machine learning. </plug>

** This post grows out of a conversation I had with Eleanor Courtemanche, in which I tried to convince her that machine learning doesn’t just reproduce the biases you bring to it.

*** Practically, I usually rely on Data Mining: Practical Machine Learning Tools and Techniques (Ian Witten, Eibe Frank, Mark Hall), but to understand the deeper logic of the field I’ve been reading Machine Learning: A Probabilistic Perspective (Kevin P. Murphy). Literary theorists may appreciate Murphy’s remark that wealth has a long-tailed distribution, “especially in plutocracies such as the USA” (43).

PS later that afternoon: Belatedly realize I didn’t say anything about the most controversial word in my original tweet: “literary theory is about to get interesting again.” I suppose I tacitly distinguish literary theory (which has been a little sleepy lately, imo) from theory-sans-adjective (which has been vigorous, although hard to define). But now I’m getting into a distinction that’s much too slippery for a short blog post.

On the novelty of “humanistic values.”

Academics have been discussing a crisis “in” or “of” the humanities since the late 1980s. Scholars disagree about the nature of the crisis, but it’s a widely shared premise that one is located somewhere “in the humanities.”

The crisis of the humanities, as seen in Google Books.The phrase “digital humanities” invites a connection to this debate. If DH is about the humanities, and “grounded in humanistic values” (Spiro 23), then it stands to reason that it ought to somehow respond to any crisis that threatens “the humanities.” This is the premise that fuels Alan Liu’s well-known argument about DH and cultural criticism. “[T]he digital humanities community,” he argues, has a “special potential and responsibility to assist humanities advocacy.”

I think these assumptions need to be brought into conversation with Geoffrey Harpham’s recent, important book The Humanities and the Dream of America (h/t @noeljackson). Harpham’s central point is simple: our concept of “the humanities” emerged quite recently. Although the individual disciplines grouped under that umbrella are older, the umbrella itself is largely a twentieth-century invention — and only became institutionally central after WWII.

Since the beginning of the twentieth century, when administrators at Columbia, Chicago, Yale, and Harvard began to speak fervently of the moral and spiritual benefits of a university education, “the humanities” has served as the name and the form of the link Arnold envisioned between culture, education, and the state. Particularly after World War II, the humanities began to be opposed not just to its traditional foil, science, but also to social science, whose emergence as a powerful force in the American academy was marked by the founding of the Center for Advanced Study in the Behavioral Sciences at Stanford in 1951 (87).

In research for a forthcoming book (Why Literary Periods Mattered, Stanford UP) I’ve poked around a bit in the institutional history of the early-twentieth-century university, and Harpham’s thesis rings true to me. Although the word has a pre-twentieth-century history, our present understanding of “the humanities” is strongly shaped by an institutional opposition between humanities and social sciences that only made sense in the twentieth century. For whatever it’s worth, Google Books also tends to support Harpham’s contention that the concept of the humanities has only possessed its present prominence since WWII.

humanitiesDefenses of “culture,” of course, are older. But it hasn’t always been clear that culture was coextensive with the disciplines now grouped together as humanistic. In the middle of the twentieth century, literary critics like René Wellek fervently defended literary culture from philistine encroachment by the discipline of history. The notion that literary scholars and historians must declare common cause against a besieging world of philistines is a very different script, and one that really only emerged in the last thirty years.

Why do I say all this? Am I trying to divide literary scholars from historians? Don’t I see that we have to hang together, or hang separately?

I understand that higher education, as a whole, is under attack from the right. So I’m happy to declare common cause with people who are working to articulate the value of literary studies and history — or for that matter, anthropology and library science. But I don’t think it’s quite inevitable that these battles should be fought under the flag of the humanities.

After all, Florida governor Rick Scott has been just as critical of “anthropology” as of literary criticism. Humanists could well choose to make common cause with the social sciences, in order to defend shared interests.

Or one could argue that we’d be better off fighting for specific concepts like “literature” and “history” and “art.” People outside the university know what those are. It’s not clear that they have a vivid concept of the humanities. It’s a term of recent and mostly academic provenance.

lithistOn the other hand, there may be good reason to mobilize around “the humanities.” Certainly the NEH itself is worth defending. Ultimately, this is a question of political strategy, and I don’t have strong opinions about it. I’m very happy to see people defending individual disciplines, or the humanities, or higher education as a whole. In my eyes, it’s all good.

But I do want to push back gently against the notion that scholars in any discipline have a political obligation to organize under the banner of “the humanities,” or an intellectual obligation to define “humanistic” methods. The concept of the humanities may well be a recent invention, shaped by twentieth-century struggles over institutional turf. We talk about “humanistic values” as if they were immemorial. But Erasmus did not share our sense that history and literature have to band together in order to resist encroachment by sociology.

More pointedly: cultural criticism and humanities advocacy are fundamentally different things. There have been many kinds of critical, politically engaged intellectuals; only in the last sixty years have some of them self-identified as humanists.

What does all this mean for the digital humanities? I don’t know. Since “the humanities” are built right into the phrase, perhaps it should belong to people who identify as humanists. But much of the work that interests me personally is now taking place in departments of Library and Information Science, which inherit a social science tradition (as Kari Kraus has recently pointed out). So I would also be happy with a phrase like “digital humanities and social sciences.” Dan Cohen recently used that phrase as a course title, and it’s an interesting move.

Added a few hours after posting: To show a few more of my own cards, I’ll confess that what I love most about DH is the freedom to ignore disciplinary boundaries and follow shared problems wherever they lead. But I’m beginning to suspect that the concept of the humanities may itself discourage interdisciplinary risks. It seems to have been invented (rather recently) to define certain disciplines through their collective difference from the social and natural sciences. If that’s true, “digital humanities” may be an awkward concept for me. I’m a literary historian, and I do feel loyalty to the methods of that discipline. But I don’t feel loyalty to them specifically as different from the sciences.

Added a day after initial posting: And, to be clear, I don’t mean that we need a better name than “digital humanities.” There’s a basic tension between interdisciplinarity and field definition — so any name can become constricting if you spend too much time defining it. For me the bottom line is this: I like the interdisciplinary energy that I’ve found in the DH blogosphere and don’t care what we call it — don’t care, in a radical way — to the extent that I don’t even care whether critics think DH is consonant with, quote, “humanistic values.” Because in truth, some of those values are recent inventions, shaped by pressure to differentiate the humanities from the social sciences — and that move deserves to be questioned every bit as much as DH itself does. /done now

References
Harpham, Geoffrey Galt. The Humanities and the Dream of America. Chicago: University of Chicago Press, 2011. (I should note that I may not agree with all aspects of Harpham’s argument. In particular, I’m not yet persuaded that the concept of ‘the humanities’ is as fully identified with the United States in particular as he argues.)

Liu, Alan. “Where is Cultural Criticism in the Digital Humanities.” Debates in the Digital Humanities. Ed. Matthew K. Gold. (Minnesota: University of Minnesota Press, 2012). 490-509.

Spiro, Lisa. “‘This is Why We Fight’: Defining the Values of the Digital Humanities.” Debates in the Digital Humanities. Ed. Matthew K. Gold. (Minnesota: University of Minnesota Press, 2012). 16-35.

What can topic models of PMLA teach us about the history of literary scholarship?

by Andrew Goldstone and Ted Underwood

Of all our literary-historical narratives it is the history of criticism itself that seems most wedded to a stodgy history-of-ideas approach—narrating change through a succession of stars or contending schools. While scholars like John Guillory and Gerald Graff have produced subtler models of disciplinary history, we could still do more to complicate the narratives that organize our discipline’s understanding of itself.

A browsable network based on Underwood's model of PMLA. Click through, then mouse over or click on individual topics.

A browsable network based on Underwood's model of PMLA. Click through, then mouse over or click on individual topics.

The archive of scholarship is also, unlike many twentieth-century archives, digitized and available for “distant reading.” Much of what we need is available through JSTOR’s Data for Research API. So last summer it occurred to a group of us that topic modeling PMLA might provide a new perspective on the history of literary studies. Although Goldstone and Underwood are writing this post, the impetus for the project also came from Natalia Cecire, Brian Croxall, and Roger Whitson, who may do deeper dives into specific aspects of this archive in the near future.

Topic modeling is a technique that automatically identifies groups of words that tend to occur together in a large collection of documents. It was developed about a decade ago by David Blei among others. Underwood has a blog post explaining topic modeling, and you can find a practical introduction to the technique at the Programming Historian. Jonathan Goodwin has explained how it can be applied to the word-frequency data you get from JSTOR.

Obviously, PMLA is not an adequate synecdoche for literary studies. But, as a generalist journal with a long history, it makes a useful test case to assess the value of topic modeling for a history of the discipline.

Goldstone and Underwood each independently produced several different models of PMLA, using different software, stopword lists, and numbers of topics. Our results overlapped in places and diverged in places. But we’ve reached a shared sense that topic modeling can enrich the history of literary scholarship by revealing trends that are presently invisible.

What is a topic?
A “topic model” assigns every word in every document to one of a given number of topics. Every document is modeled as a mixture of topics in different proportions. A topic, in turn, is a distribution of words—a model of how likely given words are to co-occur in a document. The algorithm (called LDA) knows nothing “meta” about the articles (when they were published, say), and it knows nothing about the order of words in a given document.

100 topics from PMLA.
This is a picture of 5940 articles from PMLA, showing the changing presence of each of 100 "topics" in PMLA over time. (Click through to enlarge; a longer list of topic keywords is here.) For example, the most probable words in the topic arbitrarily numbered 59 in the model visualized above are, in descending order:

che gli piu nel lo suo sua sono io delle perche questo quando ogni mio quella loro cosi dei

This is not a “topic” in the sense of a theme or a rhetorical convention. What these words have in common is simply that they’re basic Italian words, which appear together whenever an extended Italian text occurs. And this is the point: a “topic” is neither more nor less than a pattern of co-occurring words.

Nonetheless, a topic like topic 59 does tell us about the history of PMLA. The articles where this topic achieved its highest proportion were:

Antonio Illiano, “Momenti e problemi di critica pirandelliana: L’umorismo, Pirandello e Croce, Pirandello e Tilgher,” PMLA 83 no. 1 (1968): pp. 135-143
Domenico Vittorini, “I Dialogi ad Petrum Histrum di Leonardo Bruni Aretino (Per la Storia del Gusto Nell’Italia del Secolo XV),” PMLA 55 no. 3 (1940): pp. 714-720
Vincent Luciani, “Il Guicciardini E La Spagna,” PMLA 56 no. 4 (1941): pp. 992-1006

And here’s a plot of the changing proportions of this topic over time, showing moving 1-year and 5-year averages:

topic59lineWe see something about PMLA that is worth remembering for the history of criticism, namely, that it has embedded Italian less and less frequently in its language since midcentury. (The model shows that the same thing is true of French and German.)

What can topics tell us about the history of theory?
Of course a topic can also be a subject category—modeling PMLA, we have found topics that are primarily “about Beowulf” or “about music.” Or a topic can be a group of words that tend to co-occur because they’re associated with a particular critical approach.

Here, for instance, we have a topic from Underwood’s 150-topic model associated with discussions of pattern and structure in literature. We can characterize it by listing words that occur more commonly in the topic than elsewhere, or by graphing the frequency of the topic over time, or by listing a few articles where it’s especially salient.

Topic 109 from Underwood's model of 150 topics.
At first glance this topic might seem to fit neatly into a familiar story about critical history. We know that there was a mid-twentieth-century critical movement called “structuralism,” and the prominence of “structure” here might suggest that we’re looking at the rise and fall of that movement. In part, perhaps, we are. But the articles where this topic is most prominent are not specifically “structuralist.” In the top four articles, Ferdinand de Saussure, Claude Lévi-Strauss, and Northrop Frye are nowhere in evidence. Instead these articles appeal to general notions of symmetry, or connect literary patterns to Neoplatonism and Renaissance numerology.

By forcing us to attend to concrete linguistic practice, topic modeling gives us a chance to bracket our received assumptions about the connections between concepts. While there is a distinct mid-century vogue for structure, it does not seem strongly associated with the concepts that are supposed to have motivated it (myth, kinship, language, archetype). And it begins in the 1940s, a decade or more before “structuralism” is supposed to have become widespread in literary studies. We might be tempted to characterize the earlier part of this trend as “New Critical interest in formal unity” and the latter part of it as “structuralism.” But the dividing line between those rationales for emphasizing pattern is not evident in critical vocabulary (at least not at this scale of analysis).

This evidence doesn’t necessarily disprove theses about the history of structuralism. Topic modeling might not reveal varying “rationales” for using a word even if those rationales did vary. The strictly linguistic character of this technique is a limitation as well as a strength: it’s not designed to reveal motivation or conflict. But since our histories of criticism are already very intellectual and agonistic, foregrounding the conscious beliefs of contending critical “schools,” topic modeling may offer a useful corrective. This technique can reveal shifts of emphasis that are more gradual and less conscious than the ones we tend to celebrate.

It may even reveal shifts of emphasis of which we were entirely unaware. “Structure” is a familiar critical theme, but what are we to make of this?

Topic 79 from Underwood's 150-topic model.A fuller list of terms included in this topic would include “character”, “fact,” “choice,” “effect,” and “conflict.” Reading some of the articles where the topic is prominent, it appears that in this topic “point” is rarely the sort of point one makes in an argument. Instead it’s a moment in a literary work (e.g., “at the point where the rain occurs,” in Robert apRoberts 379). Apparently, critics in the 1960s developed a habit of describing literature in terms of problems, questions, and significant moments of action or choice; the habit intensified through the early 1980s and then declined. This habit may not have a name; it may not line up neatly with any recognizable school of thought. But it’s a fact about critical history worth knowing.

Note that this concern with problem-situations is embodied in common words like “way” and “cannot” as well as more legible, abstract terms. Since common words are often difficult to interpret, it can be tempting to exclude them from the modeling process. It’s true that a word like “the” isn’t likely to reveal much. But subtle, interesting rhetorical habits can be encoded in common words. (E.g. “itself” is especially common in late-20c theoretical topics.)

We don’t imagine that this brief blog post has significantly contributed to the history of criticism. But we do want to suggest that topic modeling could be a useful resource for that project. It has the potential to reveal shifts in critical vocabulary that aren’t well described, and that don’t fit our received assumptions about the history of the discipline.

Why browse topics as a network?
The fact that a word is prominent in topic A doesn’t prevent it from also being prominent in topic B. So certain generalizations we might make about an individual topic (for instance, that Italian words decline in frequency after midcentury) will be true only if there’s not some other “Italian” topic out there, picking up where the first one left off.

For that reason, interpreters really need to survey a topic model as a whole, instead of considering single topics in isolation. But how can you browse a whole topic model? We’ve chosen relatively small numbers of topics, but it would not be unreasonable to divide literary scholarship into, say, 500 topics. Information overload becomes a problem.

A browsable image map of 150 topics from PMLA. After you click through you can mouseover (or click) individual topics for more information.

A browsable image map of 150 topics from PMLA. After you click through you can mouseover (or click) individual topics for more information.

We’ve found network graphs useful here. Click on the image of the network on the right to browse Underwood’s 150-topic model. The size of each node (roughly) indicates the number of words in the topic; color indicates the average date of words. (Blue topics are older; yellow topics are more recent.) Topics are linked to each other if they tend to appear in the same articles. Topics have been labeled with their most salient word—unless that word was already taken for another topic, or seemed misleading. Mousing over a topic reveals a list of words associated with it; with most topics it’s also possible to click through for more information.

The structure of the network makes a loose kind of sense. Topics in French and German form separate networks floating free of the main English structure. Recent topics tend to cluster at the bottom of the page. And at the bottom, historical and pedagogical topics tend to be on the left, while formal, phenomenological, and aesthetic categories tend to be on the right.

But while it’s a little eerie to see patterns like this emerge automatically, we don’t advise readers to take the network structure too seriously. A topic model isn’t a network, and mapping one onto a network can be misleading. For instance, topics that are physically distant from each other in this visualization are not necessarily unrelated. Connections below a certain threshold go unrepresented.

Goldstone's 100-topic model of PMLA; click through to enlarge.

Goldstone’s 100-topic model of PMLA; click through to enlarge.

Moreover, as you can see by comparing illustrations in this post, a little fiddling with dials can turn the same data into networks with rather different shapes. It’s probably best to view network visualization as a convenience. It may help readers browse a model by loosely organizing topics—but there can be other equally valid ways to organize the same material.

How did our models differ?
The two models we’ve examined so far in this post differ in several ways at once. They’re based on different spans of PMLA‘s print run (1890–1999 and 1924–2006). They were produced with different software. Perhaps most importantly, we chose different numbers of topics (100 and 150).

But the models we’re presenting are only samples. Goldstone and Underwood each produced several models of PMLA, changing one variable at a time, and we have made some closer apples-to-apples comparisons.

Broadly, the conclusion we’ve reached is that there’s both a great deal of fluidity and a great deal of consistency in this process. The algorithm has to estimate parameters that are impossible to calculate exactly. So the results you get will be slightly different every time. If you run the algorithm on the same corpus with the same number of topics, the changes tend to be fairly minor. But if you change the number of topics, you can get results that look substantially different.

On the other hand, to say that two models “look substantially different” isn’t to say that they’re incompatible. A jigsaw puzzle cut into 100 pieces looks different from one with 150 pieces. If you examine them piece by piece, no two pieces are the same—but once you put them together you’re looking at the same picture. In practice, there was a lot of overlap between our models; on the older end of the spectrum you often see a topic like “evidence fact,” while the newer end includes topics that foreground narrative, rhetoric, and gender. Some of the more surprising details turned out to be consistent as well. For instance, you might expect the topic “literary literature” to skew toward the older end of the print run. But in fact this is a relatively recent topic in both of our models, associated with discussion of canonicity. (Perhaps the owl of Minerva flies only at dusk?)

Contrasting models: a short example
While some topics look roughly the same in all of our models, it’s not always possible to identify close correlates of that sort. As you vary the overall number of topics, some topics seem to simply disappear. Where do they go? For example, there is no exact counterpart in Goldstone’s model to that “structure” topic in Underwood’s model. Does that mean it is a figment? Underwood isolated the following article as the most prominent exemplar:

Robert E. Burkhart, The Structure of Wuthering Heights, Letter to the Editor, PMLA 87 no. 1 (1972): 104–5. (Incidentally, jstor has miscategorized this as a “full-length article.”)

Goldstone’s model puts more than half of Burkhart’s comment in three topics:

0.24 topic 38 time experience reality work sense form present point world human process structure concept individual reader meaning order real relationship

0.13 topic 46 novels fiction poe gothic cooper characters richardson romance narrator story novelist reader plot novelists character reade hero heroine drf

0.12 topic 13 point reader question interpretation meaning make reading view sense argument words word problem makes evidence read clear text readers

The other prominent documents in Underwood’s 109 are connected to similar topics in Goldstone’s model. The keywords for Goldstone’s topic 38, the top topic here, immediately suggest an affinity with Underwood’s topic 109. Now compare the time course of Goldstone’s 38 with Underwood’s 109 (the latter is above):

It is reasonable to infer that some portion of the words in Underwood’s “structure” topic are absorbed in Goldstone’s “time experience” topic. But “time experience reality work sense” looks less like vocabulary for describing form (although “form” and “structure” are included in it, further down the list; cf. the top words for all 100 topics), and more like vocabulary for talking about experience in generalized ways—as is also suggested by the titles of some articles in which that topic is substantially present:

“The Vanishing Subject: Empirical Psychology and the Modern Novel”
“Metacommentary”
“Toward a Modern Humanism”
“Wordsworth’s Inscrutable Workmanship and the Emblems of Reality”

This version of the topic is no less “right” or “wrong” than the one in Underwood’s model. They both reveal the same underlying evidence of word use, segmented in different but overlapping ways. Instead of focusing our vision on affinities between “form” and “structure”, Goldstone’s 100-topic model shows a broader connection between the critical vocabulary of form and structure and the keywords of “humanistic” reflection on experience.

The most striking contrast to these postwar themes is provided by a topic which dominates in the prewar period, then gives way before “time experience” takes hold. Here are box plots by ten-year intervals of the proportions of another topic, Goldstone’s topic 40, in PMLA articles:

Underwood’s model shows a similar cluster of topics centering on questions of evidence and textual documentation, which similarly decrease in frequency. The language of PMLA has shown a consistently declining interest in “evidence found fact” in the era of the postwar research university.

So any given topic model of a corpus is not definitive. Each variation in the modeling parameters can produce a new model. But although topic models vary, models of the same corpus remain fundamentally consistent with each other.

Using LDA as evidence
It’s true that a “topic model” is simply a model of how often words occur together in a corpus. But information of that kind has a deeper significance than we might at first assume. A topic model doesn’t just show you what people are writing about (a list of “topics” in our ordinary sense of the word). It can also show you how they’re writing. And that “how” seems to us a strong clue to social affinities—perhaps especially for scholars, who often identify with a methodology or critical vocabulary. To put this another way, topic modeling can identify discourses as well as subject categories and embedded languages. Naturally we also need other kinds of evidence to produce a history of the discipline, including social and institutional evidence that may not be fully manifest in discourse. But the evidence of topic modeling should be taken seriously.

As you change the number of topics (and other parameters), models provide different pictures of the same underlying collection. But this doesn’t mean that topic modeling is an indeterminate process, unreliable as evidence. All of those pictures will be valid. They are taken (so to speak) at different distances, and with different levels of granularity. But they’re all pictures of the same evidence and are by definition compatible. Different models may support different interpretations of the evidence, but not interpretations that absolutely conflict. Instead the multiplicity of models presents us with a familiar choice between “lumping” or “splitting” cultural phenomena—a choice where we have long known that multiple levels of analysis can coexist. This multiplicity of perspective should be understood as a strength rather than a limitation of the technique; it is part of the reason why an analysis using topic modeling can afford a richly detailed picture of an archive like PMLA.

Appendix: How did we actually do this?
The PMLA data obtained from JSTOR was independently processed by Goldstone and Underwood for their different LDA tools. This created some quantitative subtleties that we’ve saved for this appendix to keep this post accessible to a broad audience. If you read closely, you’ll notice that we sometimes talk about the “probability” of a term in a topic, and sometimes about its “salience.” Goldstone used MALLET for topic modeling, whereas Underwood used his own Java implementation of LDA. As a result, we also used slightly different formulas for ranking words within a topic. MALLET reports the raw probability of terms in each topic, whereas Underwood’s code uses a slightly more complex formula for term salience drawn from Blei & Lafferty (2009). In practice, this did not make a huge difference.

MALLET also has a “hyperparameter optimization” option which Goldstone’s 100-topic model above made use of. Before you run screaming, “hyperparameters” are just dials that control how much fuzziness is allowed in a topic’s distribution across words (beta) or across documents (alpha). Allowing alpha to vary allows greater differentiation between the sizes of large topics (often with common words), and smaller (often more specialized) topics. (See “Why Priors Matter,” Wallach, Mimno, and McCallum, 2009.) In any event, Goldstone’s 100-topic model used hyperparameter optimization; Underwood’s 150-topic model did not. A comparison with several other models suggests that the difference between symmetric and asymmetric (optimized) alpha parameters explains much of the difference between their structures when visualized as networks.

Goldstone’s processing scripts are online in a github repository. The same repository includes R code for making the plots from Goldstone’s model. Goldstone would also like to thank Bob Gerdes of Rutgers’s Office of Instructional and Research Technology for support for running mallet on the university’s apps.rutgers.edu server, Ben Schmidt for helpful comments at a THATCamp Theory session, and Jon Goodwin for discussion and his excellent blog posts on topic-modeling jstor data.

Underwood’s network graphs were produced by measuring Pearson correlations between topic distributions (across documents) and then selecting the strongest correlations as network edges using an algorithm Underwood has described previously. That data structure was sent to Gephi. Underwood’s Java implementation of LDA, as well as his PMLA model, and code for translating a model into a network, are on github, although at this point he can’t promise a plug-and-play workflow. Underwood would like to thank Matt Jockers for convincing him to try topic modeling (see Matt’s impressive, detailed model of the nineteenth-century novel) and Michael Simeone for convincing him to try force-directed network graphs. David Mimno kindly answered some questions about the innards of MALLET.

[Cross-posted: andrewgoldstone.com, Arcade (to appear).]

[Edit (AG) 12/12/16: 10×10 grid image now with topics in numerical order. Original version still available: overview.png.]