
What can topic models of PMLA teach us about the history of literary scholarship?

by Andrew Goldstone and Ted Underwood

Of all our literary-historical narratives it is the history of criticism itself that seems most wedded to a stodgy history-of-ideas approach—narrating change through a succession of stars or contending schools. While scholars like John Guillory and Gerald Graff have produced subtler models of disciplinary history, we could still do more to complicate the narratives that organize our discipline’s understanding of itself.

A browsable network based on Underwood's model of PMLA. Click through, then mouse over or click on individual topics.
The archive of scholarship is also, unlike many twentieth-century archives, digitized and available for “distant reading.” Much of what we need is available through JSTOR’s Data for Research API. So last summer it occurred to a group of us that topic modeling PMLA might provide a new perspective on the history of literary studies. Although Goldstone and Underwood are writing this post, the impetus for the project also came from Natalia Cecire, Brian Croxall, and Roger Whitson, who may do deeper dives into specific aspects of this archive in the near future.

Topic modeling is a technique that automatically identifies groups of words that tend to occur together in a large collection of documents. It was developed about a decade ago by David Blei among others. Underwood has a blog post explaining topic modeling, and you can find a practical introduction to the technique at the Programming Historian. Jonathan Goodwin has explained how it can be applied to the word-frequency data you get from JSTOR.

Obviously, PMLA is not an adequate synecdoche for literary studies. But, as a generalist journal with a long history, it makes a useful test case to assess the value of topic modeling for a history of the discipline.

Goldstone and Underwood each independently produced several different models of PMLA, using different software, stopword lists, and numbers of topics. Our results overlapped in places and diverged in places. But we’ve reached a shared sense that topic modeling can enrich the history of literary scholarship by revealing trends that are presently invisible.

What is a topic?
A “topic model” assigns every word in every document to one of a given number of topics. Every document is modeled as a mixture of topics in different proportions. A topic, in turn, is a distribution of words—a model of how likely given words are to co-occur in a document. The algorithm (called LDA) knows nothing “meta” about the articles (when they were published, say), and it knows nothing about the order of words in a given document.
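As a rough illustration of these definitions (and not of the LDA inference procedure itself), the sketch below starts from token-level topic assignments and derives the two things described above: each document's mixture of topics and each topic's distribution over words. The toy documents, the assignments, and the function name `summarize_assignments` are all invented for this example.

```python
from collections import Counter, defaultdict

def summarize_assignments(assignments):
    """assignments: {doc_id: [(word, topic), ...]} with one topic per token.

    Returns (doc_mixtures, topic_words), both as nested dicts of proportions.
    """
    doc_counts = {}
    word_counts = defaultdict(Counter)
    for doc_id, tokens in assignments.items():
        doc_counts[doc_id] = Counter(topic for _, topic in tokens)
        for word, topic in tokens:
            word_counts[topic][word] += 1

    # A document is a mixture: the share of its tokens assigned to each topic.
    doc_mixtures = {
        doc: {t: n / sum(c.values()) for t, n in c.items()}
        for doc, c in doc_counts.items()
    }
    # A topic is a distribution over words: the share of its tokens per word.
    topic_words = {
        t: {w: n / sum(c.values()) for w, n in c.items()}
        for t, c in word_counts.items()
    }
    return doc_mixtures, topic_words

# Hypothetical toy input; word order within a document plays no role.
toy = {
    "article_a": [("che", 59), ("gli", 59), ("structure", 109), ("che", 59)],
    "article_b": [("structure", 109), ("pattern", 109)],
}
doc_mixtures, topic_words = summarize_assignments(toy)
```

In a real model the assignments come out of the sampler; the point here is only that both the document mixtures and the topics are summaries of the same token-level bookkeeping.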

100 topics from PMLA.
This is a picture of 5940 articles from PMLA, showing the changing presence of each of 100 "topics" in PMLA over time. (Click through to enlarge; a longer list of topic keywords is here.) For example, the most probable words in the topic arbitrarily numbered 59 in the model visualized above are, in descending order:

che gli piu nel lo suo sua sono io delle perche questo quando ogni mio quella loro cosi dei

This is not a “topic” in the sense of a theme or a rhetorical convention. What these words have in common is simply that they’re basic Italian words, which appear together whenever an extended Italian text occurs. And this is the point: a “topic” is neither more nor less than a pattern of co-occurring words.

Nonetheless, a topic like topic 59 does tell us about the history of PMLA. The articles where this topic achieved its highest proportion were:

Antonio Illiano, “Momenti e problemi di critica pirandelliana: L’umorismo, Pirandello e Croce, Pirandello e Tilgher,” PMLA 83 no. 1 (1968): pp. 135-143
Domenico Vittorini, “I Dialogi ad Petrum Histrum di Leonardo Bruni Aretino (Per la Storia del Gusto Nell’Italia del Secolo XV),” PMLA 55 no. 3 (1940): pp. 714-720
Vincent Luciani, “Il Guicciardini E La Spagna,” PMLA 56 no. 4 (1941): pp. 992-1006

And here’s a plot of the changing proportions of this topic over time, showing moving 1-year and 5-year averages:

[Plot: proportion of topic 59 over time.]
We see something about PMLA that is worth remembering for the history of criticism, namely, that it has embedded Italian less and less frequently in its language since midcentury. (The model shows that the same thing is true of French and German.)
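For readers who want to reproduce a plot of this kind, here is a minimal sketch of the two averages. It assumes you already have one (year, proportion) pair per article for the topic in question. The centered window and the function names are our assumptions for illustration; the plotting code used for the post may smooth differently.

```python
from collections import defaultdict

def yearly_means(records):
    """records: iterable of (year, proportion) pairs, one per article."""
    by_year = defaultdict(list)
    for year, proportion in records:
        by_year[year].append(proportion)
    return {y: sum(v) / len(v) for y, v in sorted(by_year.items())}

def moving_average(series, window=5):
    """Centered moving average over a {year: value} dict.

    Years missing from the series are skipped rather than counted as zero.
    """
    half = window // 2
    smoothed = {}
    for year in series:
        values = [series[y] for y in range(year - half, year + half + 1)
                  if y in series]
        smoothed[year] = sum(values) / len(values)
    return smoothed

# Hypothetical proportions of one topic in six articles.
records = [(1940, 0.30), (1940, 0.10), (1941, 0.20),
           (1942, 0.00), (1943, 0.10), (1944, 0.00)]
one_year = yearly_means(records)
five_year = moving_average(one_year, window=5)
```

Averaging within each year first keeps a year with many articles from dominating the smoothed line.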

What can topics tell us about the history of theory?
Of course a topic can also be a subject category—modeling PMLA, we have found topics that are primarily “about Beowulf” or “about music.” Or a topic can be a group of words that tend to co-occur because they’re associated with a particular critical approach.

Here, for instance, we have a topic from Underwood’s 150-topic model associated with discussions of pattern and structure in literature. We can characterize it by listing words that occur more commonly in the topic than elsewhere, or by graphing the frequency of the topic over time, or by listing a few articles where it’s especially salient.

Topic 109 from Underwood's model of 150 topics.
At first glance this topic might seem to fit neatly into a familiar story about critical history. We know that there was a mid-twentieth-century critical movement called “structuralism,” and the prominence of “structure” here might suggest that we’re looking at the rise and fall of that movement. In part, perhaps, we are. But the articles where this topic is most prominent are not specifically “structuralist.” In the top four articles, Ferdinand de Saussure, Claude Lévi-Strauss, and Northrop Frye are nowhere in evidence. Instead these articles appeal to general notions of symmetry, or connect literary patterns to Neoplatonism and Renaissance numerology.

By forcing us to attend to concrete linguistic practice, topic modeling gives us a chance to bracket our received assumptions about the connections between concepts. While there is a distinct mid-century vogue for structure, it does not seem strongly associated with the concepts that are supposed to have motivated it (myth, kinship, language, archetype). And it begins in the 1940s, a decade or more before “structuralism” is supposed to have become widespread in literary studies. We might be tempted to characterize the earlier part of this trend as “New Critical interest in formal unity” and the latter part of it as “structuralism.” But the dividing line between those rationales for emphasizing pattern is not evident in critical vocabulary (at least not at this scale of analysis).

This evidence doesn’t necessarily disprove theses about the history of structuralism. Topic modeling might not reveal varying “rationales” for using a word even if those rationales did vary. The strictly linguistic character of this technique is a limitation as well as a strength: it’s not designed to reveal motivation or conflict. But since our histories of criticism are already very intellectual and agonistic, foregrounding the conscious beliefs of contending critical “schools,” topic modeling may offer a useful corrective. This technique can reveal shifts of emphasis that are more gradual and less conscious than the ones we tend to celebrate.

It may even reveal shifts of emphasis of which we were entirely unaware. “Structure” is a familiar critical theme, but what are we to make of this?

Topic 79 from Underwood's 150-topic model.
A fuller list of terms included in this topic would include “character,” “fact,” “choice,” “effect,” and “conflict.” Reading some of the articles where the topic is prominent, it appears that in this topic “point” is rarely the sort of point one makes in an argument. Instead it’s a moment in a literary work (e.g., “at the point where the rain occurs,” in Robert apRoberts 379). Apparently, critics in the 1960s developed a habit of describing literature in terms of problems, questions, and significant moments of action or choice; the habit intensified through the early 1980s and then declined. This habit may not have a name; it may not line up neatly with any recognizable school of thought. But it’s a fact about critical history worth knowing.

Note that this concern with problem-situations is embodied in common words like “way” and “cannot” as well as more legible, abstract terms. Since common words are often difficult to interpret, it can be tempting to exclude them from the modeling process. It’s true that a word like “the” isn’t likely to reveal much. But subtle, interesting rhetorical habits can be encoded in common words. (E.g. “itself” is especially common in late-20c theoretical topics.)

We don’t imagine that this brief blog post has significantly contributed to the history of criticism. But we do want to suggest that topic modeling could be a useful resource for that project. It has the potential to reveal shifts in critical vocabulary that aren’t well described, and that don’t fit our received assumptions about the history of the discipline.

Why browse topics as a network?
The fact that a word is prominent in topic A doesn’t prevent it from also being prominent in topic B. So certain generalizations we might make about an individual topic (for instance, that Italian words decline in frequency after midcentury) will be true only if there’s not some other “Italian” topic out there, picking up where the first one left off.

For that reason, interpreters really need to survey a topic model as a whole, instead of considering single topics in isolation. But how can you browse a whole topic model? We’ve chosen relatively small numbers of topics, but it would not be unreasonable to divide literary scholarship into, say, 500 topics. Information overload becomes a problem.

A browsable image map of 150 topics from PMLA. After you click through you can mouseover (or click) individual topics for more information.
We’ve found network graphs useful here. Click on the image of the network on the right to browse Underwood’s 150-topic model. The size of each node (roughly) indicates the number of words in the topic; color indicates the average date of words. (Blue topics are older; yellow topics are more recent.) Topics are linked to each other if they tend to appear in the same articles. Topics have been labeled with their most salient word—unless that word was already taken for another topic, or seemed misleading. Mousing over a topic reveals a list of words associated with it; with most topics it’s also possible to click through for more information.

The structure of the network makes a loose kind of sense. Topics in French and German form separate networks floating free of the main English structure. Recent topics tend to cluster at the bottom of the page. And at the bottom, historical and pedagogical topics tend to be on the left, while formal, phenomenological, and aesthetic categories tend to be on the right.

But while it’s a little eerie to see patterns like this emerge automatically, we don’t advise readers to take the network structure too seriously. A topic model isn’t a network, and mapping one onto a network can be misleading. For instance, topics that are physically distant from each other in this visualization are not necessarily unrelated. Connections below a certain threshold go unrepresented.

Goldstone's 100-topic model of PMLA; click through to enlarge.
Moreover, as you can see by comparing illustrations in this post, a little fiddling with dials can turn the same data into networks with rather different shapes. It’s probably best to view network visualization as a convenience. It may help readers browse a model by loosely organizing topics—but there can be other equally valid ways to organize the same material.

How did our models differ?
The two models we’ve examined so far in this post differ in several ways at once. They’re based on different spans of PMLA’s print run (1890–1999 and 1924–2006). They were produced with different software. Perhaps most importantly, we chose different numbers of topics (100 and 150).

But the models we’re presenting are only samples. Goldstone and Underwood each produced several models of PMLA, changing one variable at a time, and we have made some closer apples-to-apples comparisons.

Broadly, the conclusion we’ve reached is that there’s both a great deal of fluidity and a great deal of consistency in this process. The algorithm has to estimate parameters that are impossible to calculate exactly. So the results you get will be slightly different every time. If you run the algorithm on the same corpus with the same number of topics, the changes tend to be fairly minor. But if you change the number of topics, you can get results that look substantially different.

On the other hand, to say that two models “look substantially different” isn’t to say that they’re incompatible. A jigsaw puzzle cut into 100 pieces looks different from one with 150 pieces. If you examine them piece by piece, no two pieces are the same—but once you put them together you’re looking at the same picture. In practice, there was a lot of overlap between our models; on the older end of the spectrum you often see a topic like “evidence fact,” while the newer end includes topics that foreground narrative, rhetoric, and gender. Some of the more surprising details turned out to be consistent as well. For instance, you might expect the topic “literary literature” to skew toward the older end of the print run. But in fact this is a relatively recent topic in both of our models, associated with discussion of canonicity. (Perhaps the owl of Minerva flies only at dusk?)

Contrasting models: a short example
While some topics look roughly the same in all of our models, it’s not always possible to identify close correlates of that sort. As you vary the overall number of topics, some topics seem to simply disappear. Where do they go? For example, there is no exact counterpart in Goldstone’s model to that “structure” topic in Underwood’s model. Does that mean it is a figment? Underwood isolated the following article as the most prominent exemplar:

Robert E. Burkhart, “The Structure of Wuthering Heights,” Letter to the Editor, PMLA 87 no. 1 (1972): 104–5. (Incidentally, JSTOR has miscategorized this as a “full-length article.”)

Goldstone’s model puts roughly half of Burkhart’s comment in three topics:

0.24 topic 38 time experience reality work sense form present point world human process structure concept individual reader meaning order real relationship

0.13 topic 46 novels fiction poe gothic cooper characters richardson romance narrator story novelist reader plot novelists character reade hero heroine drf

0.12 topic 13 point reader question interpretation meaning make reading view sense argument words word problem makes evidence read clear text readers

The other prominent documents in Underwood’s 109 are connected to similar topics in Goldstone’s model. The keywords for Goldstone’s topic 38, the top topic here, immediately suggest an affinity with Underwood’s topic 109. Now compare the time course of Goldstone’s 38 with Underwood’s 109 (the latter is above):

It is reasonable to infer that some portion of the words in Underwood’s “structure” topic are absorbed in Goldstone’s “time experience” topic. But “time experience reality work sense” looks less like vocabulary for describing form (although “form” and “structure” are included in it, further down the list; cf. the top words for all 100 topics), and more like vocabulary for talking about experience in generalized ways—as is also suggested by the titles of some articles in which that topic is substantially present:

“The Vanishing Subject: Empirical Psychology and the Modern Novel”
“Toward a Modern Humanism”
“Wordsworth’s Inscrutable Workmanship and the Emblems of Reality”

This version of the topic is no less “right” or “wrong” than the one in Underwood’s model. They both reveal the same underlying evidence of word use, segmented in different but overlapping ways. Instead of focusing our vision on affinities between “form” and “structure”, Goldstone’s 100-topic model shows a broader connection between the critical vocabulary of form and structure and the keywords of “humanistic” reflection on experience.

The most striking contrast to these postwar themes is provided by a topic which dominates in the prewar period, then gives way before “time experience” takes hold. Here are box plots by ten-year intervals of the proportions of another topic, Goldstone’s topic 40, in PMLA articles:

Underwood’s model shows a similar cluster of topics centering on questions of evidence and textual documentation, which similarly decrease in frequency. The language of PMLA has shown a consistently declining interest in “evidence found fact” in the era of the postwar research university.

So any given topic model of a corpus is not definitive. Each variation in the modeling parameters can produce a new model. But although topic models vary, models of the same corpus remain fundamentally consistent with each other.

Using LDA as evidence
It’s true that a “topic model” is simply a model of how often words occur together in a corpus. But information of that kind has a deeper significance than we might at first assume. A topic model doesn’t just show you what people are writing about (a list of “topics” in our ordinary sense of the word). It can also show you how they’re writing. And that “how” seems to us a strong clue to social affinities—perhaps especially for scholars, who often identify with a methodology or critical vocabulary. To put this another way, topic modeling can identify discourses as well as subject categories and embedded languages. Naturally we also need other kinds of evidence to produce a history of the discipline, including social and institutional evidence that may not be fully manifest in discourse. But the evidence of topic modeling should be taken seriously.

As you change the number of topics (and other parameters), models provide different pictures of the same underlying collection. But this doesn’t mean that topic modeling is an indeterminate process, unreliable as evidence. All of those pictures will be valid. They are taken (so to speak) at different distances, and with different levels of granularity. But they’re all pictures of the same evidence and are by definition compatible. Different models may support different interpretations of the evidence, but not interpretations that absolutely conflict. Instead the multiplicity of models presents us with a familiar choice between “lumping” or “splitting” cultural phenomena—a choice where we have long known that multiple levels of analysis can coexist. This multiplicity of perspective should be understood as a strength rather than a limitation of the technique; it is part of the reason why an analysis using topic modeling can afford a richly detailed picture of an archive like PMLA.

Appendix: How did we actually do this?
The PMLA data obtained from JSTOR was independently processed by Goldstone and Underwood for their different LDA tools. This created some quantitative subtleties that we’ve saved for this appendix to keep this post accessible to a broad audience. If you read closely, you’ll notice that we sometimes talk about the “probability” of a term in a topic, and sometimes about its “salience.” Goldstone used MALLET for topic modeling, whereas Underwood used his own Java implementation of LDA. As a result, we also used slightly different formulas for ranking words within a topic. MALLET reports the raw probability of terms in each topic, whereas Underwood’s code uses a slightly more complex formula for term salience drawn from Blei & Lafferty (2009). In practice, this did not make a huge difference.
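To make the difference between the two rankings concrete, here is a small sketch. It assumes the formula in question is the "term score" from Blei and Lafferty's chapter, in which a word's probability in a topic is multiplied by the log of the ratio between that probability and the geometric mean of the word's probability across all topics. Whether Underwood's code follows that formula exactly is not something this sketch can confirm, and the two-topic model below is invented.

```python
import math

def term_scores(topic_word_probs):
    """topic_word_probs: list of K dicts mapping word -> probability.

    Every word is assumed to have a nonzero (smoothed) probability in
    every topic, so that the logarithms below are defined.
    """
    k = len(topic_word_probs)
    vocab = topic_word_probs[0].keys()
    # Log of the geometric mean of each word's probability across topics.
    log_geo_mean = {
        w: sum(math.log(topic[w]) for topic in topic_word_probs) / k
        for w in vocab
    }
    return [
        {w: p * (math.log(p) - log_geo_mean[w]) for w, p in topic.items()}
        for topic in topic_word_probs
    ]

# Hypothetical two-topic model over a three-word vocabulary.
topics = [
    {"the": 0.50, "structure": 0.45, "che": 0.05},
    {"the": 0.50, "structure": 0.05, "che": 0.45},
]
scores = term_scores(topics)
top_by_probability = max(topics[0], key=topics[0].get)
top_by_score = max(scores[0], key=scores[0].get)
```

In this toy case raw probability ranks "the" first in the first topic, while the term score ranks "structure" first, because "the" is equally probable everywhere and so scores zero. With stopwords already removed, the two rankings tend to agree far more often, which fits our experience that the choice did not make a huge difference.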

MALLET also has a “hyperparameter optimization” option which Goldstone’s 100-topic model above made use of. Before you run screaming, “hyperparameters” are just dials that control how much fuzziness is allowed in a topic’s distribution across words (beta) or across documents (alpha). Allowing alpha to vary allows greater differentiation between the sizes of large topics (often with common words), and smaller (often more specialized) topics. (See “Why Priors Matter,” Wallach, Mimno, and McCallum, 2009.) In any event, Goldstone’s 100-topic model used hyperparameter optimization; Underwood’s 150-topic model did not. A comparison with several other models suggests that the difference between symmetric and asymmetric (optimized) alpha parameters explains much of the difference between their structures when visualized as networks.
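A quick simulation can show what a symmetric versus an asymmetric alpha implies for topic sizes. This is only a sketch of the prior: it draws document-level topic proportions from a Dirichlet distribution using the standard library, and it does not reproduce MALLET's optimization, which estimates alpha from the data. The alpha values and function names are invented for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector of topic proportions from Dirichlet(alpha)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def mean_shares(alpha, n_docs=2000, seed=0):
    """Average topic proportions over n_docs simulated documents."""
    rng = random.Random(seed)
    totals = [0.0] * len(alpha)
    for _ in range(n_docs):
        for i, share in enumerate(sample_dirichlet(alpha, rng)):
            totals[i] += share
    return [t / n_docs for t in totals]

# Symmetric alpha: every topic gets the same prior weight.
symmetric = mean_shares([0.5, 0.5, 0.5, 0.5])
# Asymmetric alpha: the first topic is allowed to be much larger.
asymmetric = mean_shares([2.0, 0.1, 0.1, 0.1])
```

Under the symmetric prior the four topics end up with roughly equal average shares; under the asymmetric one the first topic takes most of the mass, which is the kind of differentiation between large and small topics described above.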

Goldstone’s processing scripts are online in a GitHub repository. The same repository includes R code for making the plots from Goldstone’s model. Goldstone would also like to thank Bob Gerdes of Rutgers’s Office of Instructional and Research Technology for support for running MALLET on the university’s server, Ben Schmidt for helpful comments at a THATCamp Theory session, and Jon Goodwin for discussion and his excellent blog posts on topic-modeling JSTOR data.

Underwood’s network graphs were produced by measuring Pearson correlations between topic distributions (across documents) and then selecting the strongest correlations as network edges using an algorithm Underwood has described previously. That data structure was sent to Gephi. Underwood’s Java implementation of LDA, as well as his PMLA model, and code for translating a model into a network, are on GitHub, although at this point he can’t promise a plug-and-play workflow. Underwood would like to thank Matt Jockers for convincing him to try topic modeling (see Matt’s impressive, detailed model of the nineteenth-century novel) and Michael Simeone for convincing him to try force-directed network graphs. David Mimno kindly answered some questions about the innards of MALLET.
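The general shape of that correlation-to-network step can be sketched as follows. This is not Underwood's code (his is in Java, and his edge-selection algorithm is described elsewhere); the rule used here, keeping each topic's strongest positive correlation above a threshold, is an assumption chosen for brevity, and the topic names and counts are invented.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def strongest_edges(topic_docs, per_topic=1, threshold=0.0):
    """topic_docs: {topic: [weight in doc 0, weight in doc 1, ...]}.

    Keeps, for each topic, its `per_topic` most correlated neighbors,
    dropping any correlation at or below `threshold`.
    """
    r = {frozenset(pair): pearson(topic_docs[pair[0]], topic_docs[pair[1]])
         for pair in combinations(topic_docs, 2)}
    edges = {}
    for topic in topic_docs:
        ranked = sorted((p for p in r if topic in p), key=r.get, reverse=True)
        for pair in ranked[:per_topic]:
            if r[pair] > threshold:
                edges[pair] = r[pair]
    # One (source, target, weight) row per edge, ready to export.
    return [(*sorted(pair), weight) for pair, weight in edges.items()]

# Hypothetical word counts for three topics across four documents.
topic_docs = {
    "structure": [40, 35, 2, 0],
    "form": [30, 38, 5, 1],
    "evidence": [0, 3, 45, 50],
}
edges = strongest_edges(topic_docs)
```

The resulting rows can be written out as a CSV edge list for a tool like Gephi. Note that the "evidence" topic ends up with no edges at all here, which is the thresholding effect mentioned in the body of the post: absence of a link is not the same as absence of a relationship.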

[Cross-posted:, Arcade (to appear).]

[Edit (AG) 12/12/16: 10×10 grid image now with topics in numerical order. Original version still available: overview.png.]

By tedunderwood

Ted Underwood is Professor of Information Sciences and English at the University of Illinois, Urbana-Champaign. On Twitter he is @Ted_Underwood.

64 replies on “What can topic models of PMLA teach us about the history of literary scholarship?”

You two did an absolutely phenomenal job on this, and it will become one of my new go-to examples of easily understandable yet nuanced approaches to interpreting topic models. I do have two questions for Ted regarding the graphs: 1) How exactly did you “count words” to create the node sizes, and 2) what did you do to calculate the “age” of a topic?

Also, a suggestion: It would be great to see a comparison of topics that are not correlated across documents, but correlated across time, comparing correlations in the time series between both of your topic model outputs. Thanks again for a great post.

Thanks for the kind words, Scott. The counting-words part itself is straightforward. Each token in the corpus is actually assigned to some topic, so just add em up and … you have something that is in a real sense a “size.” In my model it ranged from 17198 to 561844. I then normalized by dividing them by the size of the biggest, so it became a scale from .03 to 1. (I think normalizing made no difference, but, whatever.)

Then, for visualizing, I arbitrarily mapped that scale onto a range of node sizes in Gephi (15 to 40, I think). You will rapidly observe that 1/.03 ≠ 40/15, which is why I say node sizes are loosely proportional to topic sizes. Very loosely, esp. at the bottom. But I just didn’t think people wanted to be looking at a pinhead-sized node. It would also complicate labeling.
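A sketch of that sizing step, for anyone who wants to see the arithmetic: normalize topic sizes by the largest, then stretch the result onto a range of node sizes. The linear mapping and the function name are assumptions (in practice Gephi did the mapping), and the middle topic in the example is invented; only the smallest and largest counts are the ones quoted above.

```python
def node_sizes(topic_tokens, min_size=15.0, max_size=40.0):
    """Map raw topic sizes (token counts) onto a range of node sizes."""
    biggest = max(topic_tokens.values())
    # Normalize so the largest topic has size 1.0.
    normalized = {t: n / biggest for t, n in topic_tokens.items()}
    smallest = min(normalized.values())
    span = (1.0 - smallest) or 1.0  # avoid dividing by zero if all are equal
    return {
        t: min_size + (v - smallest) / span * (max_size - min_size)
        for t, v in normalized.items()
    }

sizes = node_sizes({"smallest": 17198, "middle": 200000, "largest": 561844})
```

Because the smallest topic is pinned to 15 rather than to something near 1, the ratio between the largest and smallest nodes is far smaller than the ratio between the topics themselves, which is the "loosely proportional" caveat.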

With age, it’s literally the mean date of an average word in the topic. Or to put that more algorithmically: take the vector of a topic’s distribution across documents (the number of words in that topic for each of D documents). Divide the whole vector by the total size of the topic, so now you’ve got a normalized unit vector saying what percentage of words in the topic come from each document. Now multiply each element of that vector by the date of the corresponding document, and add ’em up. You’ve got a weighted average date for the topic.
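In code, the weighted average described here comes out to a few lines. This is a minimal sketch with an invented function name and invented numbers, not the code used for the graphs.

```python
def topic_mean_date(doc_counts, doc_years):
    """Weighted average publication year for one topic.

    doc_counts: number of the topic's words in each document.
    doc_years:  publication year of each document, in the same order.
    """
    total = sum(doc_counts)
    # Share of the topic's words that comes from each document.
    weights = [count / total for count in doc_counts]
    return sum(w * year for w, year in zip(weights, doc_years))

# Hypothetical topic spread over three articles.
mean_date = topic_mean_date([10, 30, 60], [1930, 1950, 1970])
```

With these numbers the topic's mean date works out to 1960, pulled toward the article that contributes most of its words.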

Correlation across time would be interesting. It’s visually apparent that topics are clustering in a way strongly inflected by time. How strongly, I don’t know.

I’ve thought about the correlation-across-time idea too, Scott. Just didn’t get round to running numbers for this post. With both correlation across documents and corr. across time, I think the large negatives would be as interesting as the large positives: the large positives seem to me likely to tell you which topics might “fuse” if you lowered the number of topics in the model. In that sense you could just as well keep turning the dial marked “N.” But negative correlations might expose antinomies like the one we conjectured in talking about the difference between pre-WW2 “philology” topics and post-WW2 “literary interpretation” topics. Let me try some numbers and see if I can get a quick-and-dirty time-correlation network going.

Factoid: in my hundred-topic model, the average yearly proportions of topic 40 “evidence found fact time note part early professor” and topic 13 “point reader question interpretation meaning” have r = -0.85

Wow, that’s a negative correlation. If it’s even a third as strong as that on an article-to-article basis … Or, heck, at n=5940, it might not even have to be remotely that strong to be interesting.

If you give full weight to both “shocking” and “unsurprising,” I’d agree. This is not actually something that lit people know about our history. We tend to imagine that the constitution of our discipline as “interpretive” rather than “evidentiary” is a time-honored “humanistic value.”

E.g., just try introducing quantitative methods and listen to what people say. They don’t say “hey, we just spent the last century convincing people that humanists talk about interpretation rather than evidence.” They pretend that you’re introducing some unheard-of innovation.

Having trouble with the nesting myself here. Doing everything seat-of-pants, so this could all be GIGO. So per-document correlations are funny because they are so heavily weighted into the origin: most docs have very little of either topic. For my topics 40 and 13 that we’re discussing, the scatterplot looks like a right triangle. The per-doc correlation is r = -0.06. I don’t know whether to say this means we shouldn’t pay too much attention to the per-doc correlation or whether my “factoid” turns out to be misleading. Anyway, need more time to think about the best way to present and interpret.

Also: I am doing correlations of doc proportions, not of feature counts assigned to topics (which is how the networks visualized above are done).

Interesting. It could be that time is the way to do this. E.g., if “point reader interpretation” was replacing “evidence found” within some semi-thematic subset of the corpus, thinking article-by-article would be misleading. Both topics would be associated with the same articles, in a sense. (Let’s say, articles that are kind of argue-y or epistemolog-ish.) But they’d be negatively correlated over time. Cool lead, anyway.

Here’s what those topics whose series of yearly average proportions in documents have r > 0.5 look like as a network: time-corr.png. That’s my 100-topic model, edges weighted by r. Need help thinking about how to visualize large negative r relationships.

Nothing leaps out as strikingly different from the document-by-document correlations used in the networks in the post, but there’s definitely variation. One thing the visualization suggests is a striking periodization of the model, reflected in the two large disconnected components of the network, which, on first glance, look like topics that are prominent either in the 1890-1940 period or in the 1970-1999 period.

Though I sometimes wonder whether LDA is sensitive enough to language change that this kind of thing might emerge even with thematically and generically varying texts.

That’s a pretty cool visualization. I’ll have to try similar kinds of correlation on other corpora (containing multiple genres) to see whether we get the same periodization effect. My guess is that we don’t, and that the division in your viz. is telling us something substantive about 20c literary scholarship.

What’s interesting to me is what *doesn’t* change as well as what does. It’s dramatic that you end up with two completely separate networks, but honestly your first network already sort of looked like it was about to undergo mitosis. It’s actually even more surprising to me that so much of the associative structure is retained when we subtract so much document info. E.g. “moral character” and “god paradise” are still associated with each other — now purely on the basis of their temporal profiles.

But actually this shouldn’t surprise me, because it’s a consequence of what Ryan Heuser and I discovered last year, which is that conceptually related words do actually track each other over time. In fact I demonstrated in my paper for the 2011 Chicago Colloquium that if you topic-model words in one century, words that are in the same topic tend to correlate with each other on the time axis, and significantly so, even in the next century.

There is a correlated topic model variant of the LDA algorithm. I believe it’s implemented in either the “lda” or “topicmodels” R packages. I have only experimented with dynamic topic modeling, which I think is interesting to compare with time-series data from regular LDA models (that’s what John Laudun, Clai Rice, and I are doing with folklore journals in a similar project we’re working on at the moment). But I haven’t tried correlated topic modeling, which seems like it might add something to what is being discussed here.

Yes. Or dynamic topic modeling could help. If my hypothesis above (about “argue-y articles”) was right, DTM might actually show “evidence found” evolving into “point reader.” It’s designed to do that. On the other hand, I have to confess I’m wary of specialized topic modeling algorithms. This might also be a place where it makes sense to let go of numbers and use our eyes. Sometimes I feel we’re like kids who just got a new pair of rollerskates and are so thrilled with them that we also want to wear them inside the house and use them to walk upstairs.

@Ted, a point I’ve made before, I know, so at risk of belaboring a point: There is no epistemological ground to value one instantiation of topic modeling over another, unless there’s some situational performance-based evidence underlying it. The reasons I can see to prefer vanilla LDA are two: Temporal (it came first), and Parsimonious (it’s less complex). I’m not convinced that either of those are good enough reasons to value it over any other model.

A downside of specialized models is the risk of weighing one element (time, correlation, authorship, etc.) over other equally valid elements. However, it seems like an incomplete view via specialized topic models is still legitimate compared to an even more incomplete view via the vanilla implementation.

That being said, I very much agree with your point about it being time to use your eyes. The algorithms themselves are uncertain enough, and still incomplete proxies, that a difference of .2 in the r values really doesn’t say anything. It’s enough to point out something interesting and, as you’ve mentioned before, that’s when it’s time to start using your eyes and your brain to make more nuanced connections.

I think we’re in agreement, Scott, because I don’t pretend to have any epistemological basis for my wariness of specialized algorithms. It’s a rhetorical/practical rather than algorithmic wariness. I just suspect I won’t be able to convince readers to stay with me through the kinds of explanation and argument that would be required. Natalia Cecire said something similar on another thread.

Lots of things here to respond to. I think that disciplinary change is often explained through post hoc generalizations that smooth over a highly contingent and chaotic process and that topic modeling would be likely to reveal this. Your structuralist topic illustrates this in an interesting way, and it also reminds me of my suspicion that most literary critics’ practice is largely unaffected by larger disciplinary trends (except, perhaps, in rote citations and token gestures that topic modeling would be unlikely to detect).

How about your stop-words? Did you use the default MALLET list (Andrew) or one of your own devising (Ted)? Did you try stop-lists in French, German, Norwegian, Italian, etc.? I haven’t done that with mine yet, but I have included a long list of common prepositions and articles in other languages.

Also, Scott, I’m not quite sure I understand your suggestion about correlating topics over time. Are you referring to a network graph?

I’m referring to correlations solely of the time-series frequency data of one topic to the time-series frequency data of another, such that you can see which topics seem to rise and fall together, or which tend to precede which others.

So, a superimposed graph in which your brain does the correlation and not some functional comparison? The big faceted graph isn’t superimposed, but it does allow you to compare trends in topics over time, if that’s what you have in mind.

@Jonathan Doesn’t need to be your brain, it can be a functional comparison, but yes. Algorithmic methods would make it easier to account for differences in scale or slight transpositions across time.
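A minimal sketch of the kind of functional comparison described here, assuming each topic has already been reduced to one proportion per year. The function names and the numbers are hypothetical, not taken from either model. Pearson correlation ignores differences in scale, and trying a range of lags is one simple way to see whether one topic tends to precede another.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag]; a positive lag means b trails a."""
    if lag > 0:
        return pearson(a[:-lag], b[lag:])
    if lag < 0:
        return pearson(a[-lag:], b[:lag])
    return pearson(a, b)

def best_lag(a, b, max_lag):
    """Return (lag, r) with the largest |r| for lags in [-max_lag, max_lag]."""
    scores = [(lag, lagged_correlation(a, b, lag))
              for lag in range(-max_lag, max_lag + 1)]
    return max(scores, key=lambda pair: abs(pair[1]))

# Made-up yearly proportions: topic_b repeats the shape of topic_a
# two years later and on a ten-times larger scale.
topic_a = [0.01, 0.02, 0.05, 0.09, 0.06, 0.03, 0.02, 0.01]
topic_b = [0.10, 0.10, 0.10, 0.20, 0.50, 0.90, 0.60, 0.30]

lag, r = best_lag(topic_a, topic_b, max_lag=3)
print(f"best lag: {lag} years, r = {r:.3f}")
```

With short series, each extra year of lag removes a data point, so large lags deserve little trust; that is one more reason to check the curves by eye afterwards.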

Andrew has included the stoplist he used in his github repository (linked above). It’s not a standard one; e.g., we include a lot of personal names but I believe we were less aggressive in subtracting common/function words than is sometimes the practice in other disciplines. I would put my stoplist up in github, except I actually made this model 4 months ago, and I’ve been adding to the stoplist since then, so it’s no longer totally accurate. E.g., you’ll see “univ.” and “trans.” in some topics, which I have since decided to stoplist-out.

Our stoplists were very close except that I believe Andrew did include common words in French and German, etc., in his stoplist. I didn’t. I’ve since decided that Andrew’s approach is preferable. In a multilingual corpus you want multilingual stopwords.
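To make the difference concrete, here is a small sketch of merging per-language stoplists before filtering tokens. The word lists are tiny illustrative stand-ins, not the stoplists either of us used, and the function names are hypothetical; the "univ" and "trans" extras are the abbreviations mentioned above.

```python
import re

# Tiny illustrative stoplists; real ones would be far longer.
STOPWORDS_BY_LANGUAGE = {
    "en": {"the", "of", "and", "in", "to", "a"},
    "fr": {"le", "la", "les", "de", "et", "un", "une"},
    "de": {"der", "die", "das", "und", "von", "ein", "eine"},
}

def build_stoplist(languages, extra=()):
    """Union the stoplists for the given languages plus any extra words."""
    stop = set(extra)
    for lang in languages:
        stop |= STOPWORDS_BY_LANGUAGE[lang]
    return stop

def tokenize(text):
    """Lowercase the text and keep runs of letters (accented ones included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def remove_stopwords(text, stoplist):
    """Return the tokens of text that are not in the stoplist."""
    return [tok for tok in tokenize(text) if tok not in stoplist]

text = "The structure of the poem et la forme de la phrase; trans. Univ. Press"

english_only = build_stoplist(["en"])
multilingual = build_stoplist(["en", "fr", "de"], extra=("univ", "trans"))

print(remove_stopwords(text, english_only))   # French function words survive
print(remove_stopwords(text, multilingual))   # only content words remain
```

One caveat with merged lists: a stopword in one language can be a content word in another (German "die" versus English "die"), so a multilingual list is worth reading through before use.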

Thanks, Matt. It actually hadn’t occurred to me to publish in print at all, though it’s conceivable. With a problem of this scale, browsing can be more fun than argument. And for browsing, it’s awfully nice to be able to use color, scroll, click to enlarge, and so on. But I suppose one could pull out a lot of specific examples to support a specific argument and refer people to the web for the rest. Andrew and I should obviously confer about it.

Certainly lots of things here that work well as an interactive project; maybe you could make those available as a kind of super appendix to an article?

I like projects, I really do, but it seems to me that they’re not a replacement for formal writeups. *Someone* needs to pore over the thing and see what’s there; if not you guys, who? A good article could also serve as a catalyst for others to spend more time with the data. And of course it would bring more attention to this sort of work among our less methodologically progressive colleagues. My guess would be that PMLA is dying to publish stuff like this.

I agree that we need articles, in mainstream journals, and that’s basically my agenda for the next six months. I’ve started thinking about how to frame them.

My own feeling is that “exploring a topic model” turns out to be a fairly weak frame for an article. Too broad, and too easy for people to fix their attention on parts of the model that aren’t actually surprising. I think I’m more inclined to take a particular strand or two (or three) of a model (like the “problem-question-situation” topic I highlight above) and actually pursue that historical phenomenon in depth, with a thesis that ends up being primarily about that in particular. The article might only spend about 30% of its time pointing out that the original lead came from topic modeling. Anyway, that’s the direction I’m leaning at the moment.

One thing I’m definitely not going to do is spend a lot of time polishing an interactive version of the model (a “project,” as you say). Our visualizations here are rough-edged and have stone-age kinds of interactivity. But I agree with you 100%; DH already has a lot of projects. We need articles.

Looking over the rest of your conversation with Matt, here’s what I see when I put on my Utopian Spectacles of Prophecy: A bunch of scholars of varying skills working together on a given corpus. Some are doing topic models and such on the whole corpus and placing the results in a common (online) repository. Folks look things over and “matters of concern” are identified (to use a phrase from Bruno Latour) and others investigate them by reading a bunch of texts to see what’s actually there. Depending on this, that, and the other, some of this joint work will prove really interesting and articles will be written that combine the corpus computational work with the more traditional sleuthing; those articles will be signed by two, three, a half-dozen or so authors, whatever it takes to get the work done. These articles will be discussed by the wider community etc etc more corpus work more actual reading etc etc and in time some of these matters of concern will make their way toward becoming what Latour calls “matters of fact” (everyone else simply calls them facts).

Meanwhile, still other folks will be developing computational tools that are more fine-grained (‘smart’) than corpus tools, but not so sophisticated as, you know, actual human readers.

Thank you, Matt! Part of the reason we did this was to get some ways to think about a whole topic model out there, because we believe you need to know about a whole model in order to make arguments based on some part of it. So for me any open-ended browsing possibilities are a side benefit; like Ted, I think arguments within a major scholarly conversation are where it’s at. But that also means this post isn’t yet there, and we see multiple possible arguments emerging from this, focused on smaller subgroups of topics. There may be a larger methodological argument like the one Jonathan sketched in his comment (similar to what we say at the start of the post) that could be refined into something that stands on its own.

I think Jonathan is right on about the big point here, and probably in other forms of text mining — it reveals that our histories have been relying on “post hoc generalizations that smooth over a highly contingent and chaotic process.”

Beyond the history of criticism, I might say the same thing about literary history itself. For me the issue is especially that we haven’t had any way of representing/arguing about continuous change. We rely on a discretized rhetoric, most obviously by organizing history into “periods”. But our insistence on discontinuity goes beyond period boundaries to a whole rhetoric of “turns,” “movements,” “formations,” “case studies,” etc. Because we just haven’t had any way to represent continuous change.

E.g. that long slow trend of decline in Andrew’s “evidence found” graph, spanning more than a century. That’s not going to line up with any of the “movements” or “schools” we like to invoke as explanations for change.

Fascinating work, guys. A couple of anecdotal comments.

First, about Underwood’s “structure” topic (109). Forgetting about Goldstone’s (38) for a moment I think that, yes, that’s mostly about New Critical interest in formal matters. The fact is, the structuralist moment was rather brief, a decade at most spanning the later 60s and early 70s, at which time it passed over into deconstruction, post-structuralism, and post-modernism. And, despite a handful of detailed analytical cases (by Jakobson, Barthes, Ruwet, and others) it was never really about close textual analysis. It was more of a philosophical stance.

I’m quite familiar with the Peterson article (“Critical Calculations: Measure and Symmetry in Literature”) because it discusses one of my hobby horses, ring form. Ring form is a kind of parallelism in which items are ordered in the text as follows: A, B, C … C’, B’, A’. In the small it gives you a figure like chiasmus and in the large it gives you a pattern of the sort that people wonder whether it’s really there or whether it’s a critical hallucination. The anthropologist Mary Douglas devoted the last decade of her career to it and discussed its workings in the Old Testament, the Iliad, but also in Tristram Shandy (Thinking in Circles, Yale 2007). That is, it is about temporal experience, and that, I believe, puts us in Goldstone’s 38.

We’ll find Wuthering Heights there as well. As you’ll recall, it’s told through a complex double narration in which Nelly Dean tells the story to Lockwood who, in turn, tells it to us. The result is that the reader’s temporal experience of the events in the story is quite at variance from the order in which they happened. The novel opens quite late in the sequence of events, then goes back to the relatively distant past and brings us back to a ‘present’, from which it then marches to the conclusion of the story. Once again, temporal order.

The temporal nature of literary experience makes temporal order an obvious focus for formal analysis.

A final observation. Perhaps the pre-war interest in evidence reflects the rise of New Criticism, with its emphasis on providing textual evidence for arguments. Once the point had been made and the method had become routine it was no longer necessary to hammer home the methodological point.

As I said, fascinating stuff. I’m looking forward to reading more.

Hi Ted: Your comment seems a bit truncated from the one that arrived in email, and so the stuff I wanted to respond to is missing. If you don’t mind, I’ll respond to the missing material.

You mentioned that you and Andrew had tossed around “philology” in thinking about the pre-war emphasis on philology. I think you should keep tossing it around, as it is apt. Every once in a while I ask myself: When did the profession become obsessed with scouting out the meanings of texts? The last time I did that I called up my old undergraduate mentor, Dick Macksey at Hopkins, and asked him what you’d find in PMLA before WWII and going back into the 19th century. His answer: philology and history, which is the other thing that was in your initial reply and has now disappeared.

It’s really only in the last half-century or so that linguistics has become firmly established as an independent intellectual discipline, and, love them or not, Chomsky’s ideas had a lot to do with that. Before that linguists were philologists or anthropologists. And the discipline of English was philologically oriented, so Macksey told me. I’ve got a book of essays by Leo Spitzer (alas, in storage) where he reviews an early book by Earl Wasserman. The review came out, I believe, in the mid to later 50s. He talks of Wasserman as one of those younger scholars on the cutting edge of combining critical perspectives with philology.

A disciplinary history that focuses, as you say, on successive waves of interpretive methodologies, simply misses a lot of what the discipline was about pre-WWII.

@Bill, regarding the ngrams comment. I was curious how ‘philology’ played out over the years in Literature studies in particular, using JSTOR’s DfR Literature sub-category, which is available here: Philology ratio per year. Though it’s quite a slow decline, it seems to fit your narrative.
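For anyone who wants to reproduce a curve like this, here is a rough sketch. It assumes the ratio is the share of articles in each year that use the term at least once, and that each article is available as a (year, word-count dictionary) pair; both the input shape and the function name are hypothetical, and the sample data is made up.

```python
from collections import defaultdict

def term_ratio_per_year(articles, term):
    """Share of articles per year whose word counts include the term.

    articles: iterable of (year, {word: count}) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, word_counts in articles:
        totals[year] += 1
        if word_counts.get(term, 0) > 0:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Made-up toy data, only to show the expected input shape.
sample = [
    (1900, {"philology": 3, "text": 10}),
    (1900, {"philology": 1}),
    (1950, {"philology": 2}),
    (1950, {"criticism": 5}),
    (2000, {"theory": 4}),
]

for year, ratio in term_ratio_per_year(sample, "philology").items():
    print(year, ratio)
```

Counting articles rather than raw word occurrences keeps one long philological article from dominating a year, though either choice is defensible.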

It’s odd how there’s still this pervasive implicit assumption of ‘revolutions’ (or at least punctuated equilibria) in disciplinary histories, even though slow growths and declines appear every bit as frequently. Those are harder to pack up neatly in narratives, I suppose.

On the question of periodization, I don’t disagree with the renewed emphasis on continuity. But I’d add that small and/or gradual shifts aren’t a priori incompatible with models of punctuated equilibrium. Sometimes small absolute differences really matter. And then there’s what Marx or Jameson would call the reality of the illusion; standing narratives about eras and schools affect the kinds of work we do, even when they don’t line up with the objects they’re supposed to organize.

But none of this is to disagree with the original point, which was that we’re getting a much better feel for the kinds and magnitudes of changes that have happened in past literature and scholarship. W00t!

A philological anecdote from Hopkins: I was at some function at Hopkins and someone pointed to an old man and said, in hushed tones, “that’s Kemp Malone.” Who’s he? From his Wikipedia bio:

Born in an academic family, Kemp Malone graduated from Emory College as it then was in 1907, with the ambition of mastering all the languages that impinged upon the development of Middle English. He spent several years in Germany, Denmark and Iceland. When World War I broke out he served two years in the United States Army and was discharged with the rank of Captain.

Malone served as President of the Modern Language Association, and other philological associations (see link) and was etymology editor of the American College Dictionary, 1947.

Who’d have thought the MLA was a philological organization?

@Bill and Matthew

Yes, your points are well taken about how slow changes can manifest as punctuations, and it fits fairly neatly in the complex systems language we’ve been using. There’s plenty of literature out there showing how even a slight shift in quantity can lead to a shift in quality, as long as the quantity shift is across a phase change point, or a critical mass of some sort.

It seems that not only do these narratives not fit the evolution/revolution and continuous/discontinuous binaries that are often attributed to them, but the binaries themselves are not particularly useful frameworks within which these narratives fit. Perhaps it is because all of these words are metaphors drawn from the natural sciences and mathematical sciences.

@Scott “It’s odd how there’s still this pervasive implicit assumption of ‘revolutions’ (or at least punctuated equilibria) in disciplinary histories, even though slow growths and declines appear every bit as frequently. Those are harder to pack up neatly in narratives, I suppose.”

I wonder. I wonder how long the profession’s talked that way about itself? Certainly when structuralism and post-structuralism came roaring through there was a sense of radical change. I can imagine that that rhetoric just settled in and became the standard for talking about the profession. Anything new had to be revolutionary, otherwise it wouldn’t register. But did people talk that way before then? I don’t know.

I spent time over the weekend writing a very long post centered on this work:

Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution

I’ve posted it at my personal blog:

and at a group blog for linguistics and language evolution:


I’d assume that citation analysis is pretty sophisticated. Some years ago (1981) I visited the Institute for Scientific Information (now owned by Thomson Reuters) in Philadelphia. They’d just developed what they called co-citation analysis. The idea was to associate scholars according to their patterns of citation. In particular, they were looking for scholars who cited many of the same sources but didn’t necessarily cite one another. They figured this might indicate a direction for future research. That is, these independent groups of scholars would discover one another and boom! a new research community is born.

Something like that.

I have no idea what, if anything, came of that.

But, get the whole lit crit literature in a database and do a co-citation analysis to identify groups of scholars who are interacting with one another through the literature. Do a topic analysis of the literature as a whole and now superimpose the co-citation analysis and the topic analysis and see what you get. Do the topic nets and the co-citation nets track one another? Who knows.

Since we’re dealing with literary criticism we probably want to pay particular attention to primary texts. So we have a mapping between primary texts and the co-citation graph and another mapping between primary texts and the topic graph.

Now what?

Of course we want to track time through the whole thing.
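Here is a rough sketch of the first two steps, under loud assumptions: the data is made up, the names are hypothetical, and the measure is the one described above (scholars who share sources without citing one another), which bibliometricians often call bibliographic coupling, as distinct from co-citation in the strict sense of two works being cited together by later ones. The Jaccard overlap at the end is just one simple way to ask whether two networks track one another.

```python
from itertools import combinations

def shared_source_pairs(sources_cited, scholars_cited, min_shared=2):
    """Pairs of scholars sharing >= min_shared sources who do not cite each other.

    sources_cited:  {scholar: set of source ids they cite}
    scholars_cited: {scholar: set of other scholars they cite}
    """
    pairs = []
    for a, b in combinations(sorted(sources_cited), 2):
        shared = sources_cited[a] & sources_cited[b]
        cite_each_other = (b in scholars_cited.get(a, set())
                           or a in scholars_cited.get(b, set()))
        if len(shared) >= min_shared and not cite_each_other:
            pairs.append((a, b, len(shared)))
    # Largest overlaps first; sorted() is stable, so ties stay alphabetical.
    return sorted(pairs, key=lambda p: -p[2])

def jaccard(edges_x, edges_y):
    """Overlap of two undirected edge lists, from 0.0 (disjoint) to 1.0 (identical)."""
    x = {frozenset(edge) for edge in edges_x}
    y = {frozenset(edge) for edge in edges_y}
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Made-up data: four scholars and the sources they cite.
sources_cited = {
    "A": {"s1", "s2", "s3"},
    "B": {"s2", "s3", "s4"},
    "C": {"s1", "s2", "s3"},
    "D": {"s5"},
}
scholars_cited = {"A": {"C"}}  # A already cites C directly

pairs = shared_source_pairs(sources_cited, scholars_cited)
citation_edges = [(a, b) for a, b, _ in pairs]

# Hypothetical topic network: scholars linked when they share a dominant topic.
topic_edges = [("B", "A"), ("C", "D")]

print(pairs)
print(jaccard(citation_edges, topic_edges))
```

Tracking time would mean repeating this per decade (or per sliding window) and watching how the overlap changes.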


However, I’ve been thinking. One of my main hobbyhorses these days is description. Literary studies has to get a lot more sophisticated about description, which is mostly taken for granted and so is not done very rigorously. There isn’t even a sense that there’s something there to be rigorous about. Perhaps corpus linguistics is a way to open up that conversation.

Why? Because corpus techniques ARE descriptive. They tell you what’s there, but it’s up to you to make sense of it. And to do that you have to know something about how the description is done. And if we can get THAT going, then, who knows, maybe grammar’s next. For a grammar is a description of a language, but it’s very different from ‘ordinary’ description, whatever that is. Grammars tend to look like explanations rather than descriptions, but . . . There’s Chomsky’s old stuff on descriptive vs. explanatory adequacy in Aspects of the Theory of Syntax, which still seems relevant to me. Is that kind of discussion still kicking around?
