New models of literary collectivity.

This is a version of a response I gave at session 155 of MLA 2014, “Literary Criticism at the Macroscale.” Slides and/or texts of the original papers by Andrew Piper and by Hoyt Long and Richard So are available on the web, as is another response by Haun Saussy.

* * *

The papers we heard today were not picking the low-hanging fruit of text mining. There’s actually a lot of low-hanging fruit out there still worth picking — big questions that are easy to answer quantitatively and that only require organizing large datasets — but these papers were tackling problems that are (for good or ill) inherently more difficult. Part of the reason involves their transnational provenance, but another reason is that they aren’t just counting or mapping known categories but trying to rethink some of the basic concepts we use to write literary history — in particular, the concept we call “influence” or “diffusion” or “intertextuality.”

I’m tossing several terms at this concept because I don’t think literary historians have ever agreed what it should be called. But to put it very naively: new literary patterns originate somehow, and somehow they are reproduced. Different generations of scholars have modeled this differently. Hoyt and Richard quote Laura Riding and Robert Graves exploring, in 1927, an older model centered on basically personal relationships of imitation or influence. But early-twentieth-century scholars could also think anthropologically about the transmission of motifs or myths or A. O. Lovejoy’s “unit ideas.” In the later 20th century, critics got more cautious about implying continuity, and reframed this topic abstractly as “intertextuality.” But then the specificity of New Historicism sometimes pushed us back in the direction of tracing individual sources.

I’m retelling a story you already know, but trying to retell it very frankly, in order to admit that (while we’ve gained some insight) there is also a sense in which literary historians keep returning to the same problem and keep answering it in semi-satisfactory ways. We don’t all, necessarily, aspire to give a causal account of literary change. But I think we keep returning to this problem because we would like to have a kind of narrative that can move more smoothly between individual examples and the level of the discourse or genre. When we’re writing our articles, the way this often works in practice is: “here’s one example, two examples — magic hand-waving — a discourse!”

Something interesting and multivocal about literary history gets lost at the moment when we do that hand-waving. The things we call genres or discourses have an internal complexity that may be too big to illustrate with examples, but that also gets lost if you try to condense it into a single label, like “the epistolary novel.” Though we aspire to subtlety, in practice it’s hard to move from individual instances to groups without constructing something like the sovereign in the frontispiece for Hobbes’ Leviathan – a homogenous collection of instances composing a giant body with clear edges.

While they offer different solutions, I think both of the papers we heard today are imagining other ways to move between instances and groups. They both use digital methods to describe new forms of similarity between texts. And in both cases, the point of doing this lies less in precision than in creating a newly flexible model of collectivity. We gain a way of talking about texts that is collective and social, but not necessarily condensed into a single label. For Andrew, the “Werther effect” is less about defining a new genre than about recognizing a new set of relationships between different communities of works. For Hoyt and Richard, machine learning provides a way of talking about the reception of hokku that isn’t limited to formal imitation or to a group of texts obviously “influenced” by specific models. Algorithms help them work outward from clear examples of a literary-historical phenomenon toward a broader penumbra of similarity.

I think this kind of flexibility is one of the most important things digital tools can help us achieve, but I don’t think it’s on many radar screens right now. The reason, I suspect, is that it doesn’t fit our intuitions about computers. We understand that computers can help us with scale (distant reading), and we also get that they can map social networks. But the idea that computers can help us grapple with ambiguity and multiple determination doesn’t feel intuitive. Aren’t computers all about “binary logic”? If I tell my computer that this poem both is and is not a haiku, won’t it probably start to sputter and emit smoke?

Well, maybe not. And actually I think this is a point that should be obvious but just happens to fall in a cultural blind spot right now. The whole point of quantification is to get beyond binary categories — to grapple with questions of degree that aren’t well-represented as yes-or-no questions. Classification algorithms, for instance, are actually very good at shades of gray; they can express predictions as degrees of probability and assign the same text different degrees of membership in as many overlapping categories as you like. So I think it should feel intuitive that a quantitative approach to literary history would have the effect of loosening up categories that we now tend to treat too much as homogenous bodies. If you need to deal with gradients of difference, numbers are your friend.
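To make that concrete, here is a minimal sketch of what I mean, using scikit-learn on invented toy texts and labels (my illustration, not anything from the papers discussed above): one classifier per category, each reporting a degree of membership, and nothing preventing a text from belonging to several categories at once.

```python
# A hedged toy example: probabilistic classifiers handle shades of gray.
# The texts, labels, and category names below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "an old pond a frog jumps in the sound of water",
    "the splash of rain on the quiet evening pond",
    "stocks fell sharply in heavy afternoon trading",
    "the market rallied on unexpectedly light volume",
]
# Overlapping binary labels: one column of 0/1 judgments per category.
labels = {
    "haiku-like": [1, 1, 0, 0],
    "financial":  [0, 0, 1, 1],
}

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

new_text = vectorizer.transform(["a frog trades quietly in the old market"])

# Train one classifier per category; report membership as a probability,
# not a yes-or-no verdict. The same text can score high in both.
for category, y in labels.items():
    clf = LogisticRegression().fit(X, y)
    probability = clf.predict_proba(new_text)[0, 1]
    print(f"{category}: degree of membership {probability:.2f}")
```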

Of course, how exactly this is going to work remains an open question. Technically, the papers we heard today approach the problem of similarity in different ways. Hoyt and Richard are borrowing machine learning algorithms that use the contrast between groups of texts to define similarity. Andrew’s improvising a different approach that uses a single work to define a set of features that can then be used to organize other works as an “exotext.” And other scholars have approached the same problem in other ways. Franco Moretti’s chapter on “Trees” also bridges the gap I’m talking about between individual examples and coherent discourses; he does it by breaking the genre of detective fiction up into a tree of differentiations. It’s not a computational approach, but for some problems we may not need computation. Matt Jockers, on the other hand, has a chapter on “influence” in Macroanalysis that uses topic modeling to define global criteria of similarity for nineteenth-century novels. And I could go on: Sara Steger, for instance, has done work on sentimentality in the nineteenth century novel that uses machine learning in a loosely analogous way to think about the affective dimension of genre.

The differences between these projects are worth discussing, but in this response I’m more interested in highlighting the common impulse they share. While these projects explore specific problems in literary history, they can also be understood as interventions in literary theory, because they’re all attempting to rethink certain basic concepts we use to organize literary-historical narrative. Andrew’s concept of the “exotext” makes this theoretical ambition most overt, but I think it’s implicit across a range of projects. For me the point of the enterprise, at this stage, is to brainstorm flexible alternatives to our existing, slightly clunky, models of literary collectivity. And what I find exciting at the moment is the sheer proliferation of alternatives.

OT: politics, social media, and the academy.

This is a blog about text mining, but from time to time I’m going to allow myself to wander off topic, briefly. At the moment, I think social media are adding a few interesting twists to an old question about the relationship between academic politics and politics-politics.

Protests outside the Wisconsin State Capitol, Feb 2011.

It’s perhaps never a good idea to confuse politics with communicative rationality. Recently, in the United States, it isn’t even clear that all parties share a minimal respect for democratic norms. One side is willing to obstruct the right to vote, to lie about scientifically ascertainable fact, and to convert US attorneys (when they’re in power) into partisan enforcers. In circumstances like this, observers of good faith don’t need to spend a lot of time “debating” politics, because the other side isn’t debating. The only thing worth debating is how to fight back. And in a fight, dispassionate self-criticism becomes less important than solidarity.

Personally, I don’t mind a good fight with clearly-drawn moral lines. But this same clarity can be a bad thing for the academy. Dispassionate debate is what our institution is designed to achieve. If contemporary political life teaches us that “debate” is usually a sham, staged to produce an illusion of equivalence between A) fact and B) bullshit — then we may start to lose faith in our own guiding principles.

This opens up a whole range of questions. But maybe the most interesting question for likely readers of this blog will involve the role of social media. I think the web has proven itself a good tool for grassroots push-back against corporate power; we’re all familiar with successful campaigns against SOPA and Susan G. Komen. But social media also work by harnessing the power of groupthink. “Click like.” “Share.” “Retweet.” This doesn’t bother me where politics itself is concerned; political life always entails a decision to “hang together or hang separately.”

But I’m uneasy about extending the same strategy to academic politics, because our main job, in the academy, is debate rather than solidarity. I hesitate to use Naomi Schaefer Riley’s recent blog post at the Chronicle as an example, because it’s not in any sense a model of the virtues of debate. It was a hastily-tossed-off, sneering attack on junior scholars that failed to engage in any depth with the texts it attacked. Still, I’m uncomfortable when I see academics harnessing the power of social media to discourage The Chronicle from publishing Riley.

There was, after all, an idea buried underneath Riley’s sneers. It could have been phrased as a question about the role of politics in the humanities. Political content has become more central to humanistic research, at the same time as actual political debate has become less likely (for reasons sketched above). The result is that a lot of dissertations do seem to be proceeding toward a predetermined conclusion. This isn’t by any means a problem only in Black Studies, and Riley’s reasons for picking on Black Studies probably won’t bear close examination.

Still, I’m not persuaded that we would improve the academy by closing publications like the Chronicle to Riley’s kind of critique. Attacks on academic institutions can raise valid questions, even when they are poorly argued, sneering, and unfair. (E.g., I wouldn’t be writing this blog post if it weren’t for the outcry over Riley.) So in the end I agree with Liz McMillen’s refusal to take down the post.

But this particular incident is not of great significance. I want to raise a more general question about the role that technologies of solidarity should play in academic politics. We’ve become understandably cynical about the ideal of “open debate” in politics and journalism. How cynical are we willing to become about its place in academia? It’s a question that may become especially salient if we move toward more public forms of review. Would we be comfortable, for instance, with a petition directed at a particular scholarly journal, urging them not to publish article(s) by a particular author?

[11 p.m. May 6th: This post was revised after initial publication, mainly for brevity. I also made the final paragraph a little more pointed.]

[Update May 7: The Chronicle has asked Schaefer Riley to leave the blog. It’s a justifiable decision, since she wrote a very poorly argued post. But it also does convince me that social media are acquiring a new power to shape the limits of academic debate. That’s a development worth watching.]

[Update May 8th: Kevin Drum at Mother Jones weighs in on the issue.]

What kinds of “topics” does topic modeling actually produce?

I’m having an interesting discussion with Lisa Rhody about the significance of topic modeling at different scales, and I’d like to follow it up with some examples.

I’ve been doing topic modeling on collections of eighteenth- and nineteenth-century volumes, using volumes themselves as the “documents” being modeled. Lisa has been pursuing topic modeling on a collection of poems, using individual poems as the documents being modeled.

The math we’re using is probably similar. I believe Lisa is using MALLET. I’m using a version of Latent Dirichlet Allocation that I wrote in Java so I could tinker with it.

But the interesting question we’re exploring is this: How does the meaning of LDA change when it’s applied to writing at different scales of granularity? Lisa’s documents (poems) are a typical size for LDA: this technique is often applied to identify topics in newspaper articles, for instance. This is a scale that seems roughly in keeping with the meaning of the word “topic.” We often assume that the topic of written discourse changes from paragraph to paragraph, “topic sentence” to “topic sentence.”

By contrast, I’m using documents (volumes) that are much larger than a paragraph, so how is it possible to produce topics as narrowly defined as this one?


This is based on a generically diverse collection of 1,782 19c volumes, not all of which are plotted here (only the volumes where the topic is most prominent are plotted; the gray line represents an aggregate frequency including unplotted volumes). The most prominent words in this topic are “mother, little, child, children, old, father, poor, boy, young, family.” It’s clearly a topic about familial relationships, and more specifically about parent-child relationships. But there aren’t a whole lot of books in my collection specifically about parent-child relationships! True, the most prominent books in the topic are A. F. Chamberlain’s The Child and Childhood in Folk Thought (1896) and Alice Morse Earle’s Child Life in Colonial Days (1899), but most of the rest of the prominent volumes are novels — by, for instance, Catharine Sedgwick, William Thackeray, Louisa May Alcott, and so on. Since few novels are exclusively about parent-child relations, how can the differences between novels help LDA identify this topic?

The answer is that the LDA algorithm doesn’t demand anything remotely like a one-to-one relationship between documents and topics. LDA uses the differences between documents to distinguish topics — but not by establishing a one-to-one mapping. On the contrary, every document contains a bit of every topic, although it contains them in different proportions. The numerical variation of topic proportions between documents provides a kind of mathematical leverage that distinguishes topics from each other.

The implication of this is that your documents can be considerably larger than the kind of granularity you’re trying to model. As long as the documents are small enough that the proportions between topics vary significantly from one document to the next, you’ll get the leverage you need to discriminate those topics. Thus you can model a collection of volumes and get topics that are not mere “subject classifications” for volumes.
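Here is a minimal sketch of that point, using scikit-learn on a few toy “volumes” (my illustration, not the Java implementation mentioned above): every document gets a mixture of every topic, and the variation in those mixtures is what the model works with.

```python
# A hedged toy example: LDA assigns each document a proportion of every topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented miniature "volumes"; real documents would be book-length.
volumes = [
    "mother child family father little boy poor young children",
    "love soul heart earth death god world men life",
    "mother love child heart family soul father god life",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(volumes)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # each row is one volume's topic mixture; rows sum to 1

# No volume is "about" a single topic; each mixes both in different proportions,
# and those differing proportions are the leverage the model uses.
for title, mixture in zip(["volume 1", "volume 2", "volume 3"], doc_topics):
    print(title, [round(p, 2) for p in mixture])
```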

Now, in the comments to an earlier post I also said that I thought “topic” was not always the right word to use for the categories that are produced by topic modeling. I suggested that “discourse” might be better, because topics are not always unified semantically. This is a place where Lisa starts to question my methodology a little, and I don’t blame her for doing so; I’m making a claim that runs against the grain of a lot of existing discussion about “topic modeling.” The computer scientists who invented this technique certainly thought they were designing it to identify semantically coherent “topics.” If I’m not doing that, then, frankly, am I using it right? Let’s consider this example:


This is based on the same generically diverse 19c collection. The most prominent words are “love, life, soul, world, god, death, things, heart, men, man, us, earth.” Now, I would not call that a semantically coherent topic. There is some religious language in there, but it’s not about religion as such. “Love” and “heart” are mixed in there; so are “men” and “man,” “world” and “earth.” It’s clearly a kind of poetic diction (as you can tell from the color of the little circles), and one that increases in prominence as the nineteenth century goes on. But you would be hard pressed to identify this topic with a single concept.

Does that mean topic modeling isn’t working well here? Does it mean that I should fix the system so that it would produce topics that are easier to label with a single concept? Or does it mean that LDA is telling me something interesting about Victorian poetry — something that might be roughly outlined as an emergent discourse of “spiritual earnestness” and “self-conscious simplicity”? It’s an open question, but I lean toward the latter alternative. (By the way, the writers most prominently involved here include Christina Rossetti, Algernon Swinburne, and both Brownings.)

In an earlier comment I implied that the choice between “semantic” topics and “discourses” might be aligned with topic modeling at different scales, but I’m not really sure that’s true. I’m sure that the document size we choose does affect the level of granularity we’re modeling, but I’m not sure how radically it affects it. (I believe Matt Jockers has done some systematic work on that question, but I’ll also be interested to see the results Lisa gets when she models differences between poems.)

I actually suspect that the topics identified by LDA probably always have the character of “discourses.” They are, technically, “kinds of language that tend to occur in the same discursive contexts.” But a “kind of language” may or may not really be a “topic.” I suspect you’re always going to get things like “art hath thy thou,” which are better called a “register” or a “sociolect” than they are a “topic.” For me, this is not a problem to be fixed. After all, if I really want to identify topics, I can open a thesaurus. The great thing about topic modeling is that it maps the actual discursive contours of a collection, which may or may not line up with “concepts” any writer ever consciously held in mind.

Computer scientists don’t understand the technique that way.* But on this point, I think we literary scholars have something to teach them.

On the collective course blog for English 581 I have some other examples of topics produced at a volume level.

*[UPDATE April 3, 2012: Allen Riddell rightly points out in the comments below that Blei’s original LDA article is elegantly agnostic about the significance of the “topics” — which are at bottom just “latent variables.” The word “topic” may be misleading, but computer scientists themselves are often quite careful about interpretation.]

Documentation / open data:
I’ve put the topic model I used to produce these visualizations on github. It’s in the subfolder 19th150topics under folder BrowseLDA. Each folder contains an R script that you run; it then prompts you to load the data files included in the same folder, and allows you to browse around in the topic model, visualizing each topic as you go.

I have also pushed my Java code for LDA up to github. But really, most people are better off with MALLET, which is infinitely faster and has hyperparameter optimization that I haven’t added yet. I wrote this just so that I would be able to see all the moving parts and understand how they worked.

Digital humanities and the spy business.

Flickr / dunechaser (Creative Commons)

I’m surprised more digital humanists haven’t blogged the news that the US Intelligence Advanced Research Projects Activity (IARPA) wants to fund techniques for mining and categorizing metaphors.

The stories I’ve read so far have largely missed the point of the program. They focus instead on the amusing notion that the government “fancies a huge metaphor repository.” And it’s true that the program description reads a bit like a section of English 101 taught by the men from Dragnet. “The Metaphor Program will exploit the fact that metaphors are pervasive in everyday talk and reveal the underlying beliefs and worldviews of members of a culture.” What is “culture,” you ask? Simply refer to section 1.A.3., “Program Definitions”: “Culture is a set of values, attitudes, knowledge and patterned behaviors shared by a group.”

This seems accurate enough, although the combination of precision and generality does feel a little freaky. “Affect is important because it influences behavior; metaphors have been associated with affect.”

The program announcement is similarly precise about the difference between metaphor and metonymy. (They’re not wild about metonymy.)

(3) Figurative Language: The only types of figurative language that are included in the program are metaphors and metonymy.
• Metonymy may be proposed in addition to but not instead of metaphor analysis. Those interested in metonymy must explain why metonymy is required, what metonymy adds to the analysis and how it complements the proposed work on metaphors.

All this is fun, but the program also has a purpose that hasn’t been highlighted by most of the reporting I’ve seen. The second phase of the program will use statistical analysis of metaphors to “characterize differing cultural perspectives associated with case studies of the types of interest to the Intelligence Community.” One can only speculate about those types, but I imagine that we’re talking about specific political movements and religious groups. The goal is ostensibly to understand their “cultural perspectives,” but it seems quite possible that an unspoken, longer-term goal might involve profiling and automatically identifying members of demographic, vocational, or political groups. (IARPA has inherited some personnel and structures once associated with John Poindexter’s Total Information Awareness program.) The initial phase of the metaphor-mining is going to focus on four languages: “American English, Iranian Farsi, Russian Russian and Mexican Spanish.”

Naturally, my feelings are complex. Automatically extracting metaphors from text would be a neat trick, especially if you also distinguished metaphor from metonymy. (You would have to know, for instance, that “Oval Office” is not a metaphor for the executive branch of the US government.) [UPDATE: Arno Bosse points out that Brad Pasanek has in fact been working on techniques for automatic metaphor extraction, and has developed a very extensive archive. Needless to say, I don’t mean to associate Brad with the IARPA project.]

Going from a list of metaphors to useful observations about a “cultural perspective” would be an even neater trick, and I doubt that it can be automated. My doubts on that score are the main source of my suspicion that the actual deliverable of the grant will turn out to be profiling. That may not be the intended goal. But I suspect it will be the deliverable because I suspect that it’s the part of the project researchers will get to work reliably. It probably is possible to identify members of specific groups through statistical analysis of the metaphors they use.

On the other hand, I don’t find this especially terrifying, because it has a Rube Goldberg indirection to it. If IARPA wants to automatically profile people based on digital analysis of their prose, they can do that in simpler ways. The success of stylometry indicates that you don’t need to understand the textual features that distinguish individuals (or groups) in order to make fairly reliable predictions about authorship. It may well turn out that people in a particular political movement overuse certain prepositions, for reasons that remain opaque, although the features are reliably predictive. I am confident, of course, that intelligence agencies would never apply a technique like this domestically.
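For what it’s worth, here is a hedged sketch of the kind of stylometric prediction I have in mind. The passages, authors, and word list are invented, and scikit-learn stands in for whatever tools an actual analyst would use; the point is only that frequencies of common function words can drive predictions nobody fully understands.

```python
# A hedged toy example of stylometry: predict authorship from the frequencies
# of common function words, without interpreting why those features work.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "with", "for", "was"]

# Invented labeled passages; real stylometry would use much longer samples.
passages = [
    "the letter that he sent to the house was lost in the confusion of the week",
    "it was the tone of the thing, and the manner of it, that troubled the family",
    "to begin with a question is to end with a question, for thought moves in circles",
    "for a moment it seemed that the argument was finished, and with it the evening",
]
authors = ["A", "A", "B", "B"]

vectorizer = CountVectorizer(vocabulary=FUNCTION_WORDS)  # count only function words
X = vectorizer.fit_transform(passages)

classifier = MultinomialNB().fit(X, authors)
unknown = vectorizer.transform(["it was the question of the letter, and of the house, that remained"])
print(classifier.predict(unknown), classifier.predict_proba(unknown))
```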

Postscript: I should credit Anna Kornbluh for bringing this program to my attention.

Why humanists need to understand text mining.

Humanists are already doing text mining; we’re just doing it in a theoretically naive way. Every time we search a database, we use complex statistical tools to sort important documents from unimportant ones. We don’t spend a lot of time talking about this part of our methodology, because search engines hide the underlying math, making the sorting process seem transparent.

But search is not a transparent technology: search engines make a wide range of different assumptions about similarity, relevance, and importance. If (as I’ve argued elsewhere) search engines’ claim to identify obscure but relevant sources has powerfully shaped contemporary historicism, then our critical practice has come to depend on algorithms that other people write for us, and that we don’t even realize we’re using. Humanists quite properly feel that humanistic research ought to be shaped by our own critical theories, not by the whims of Google. But that can only happen if we understand text mining well enough to build — or at least select — tools more appropriate for our discipline.
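To make the hidden assumptions a little more visible, here is a minimal sketch of one common definition of relevance, cosine similarity over tf-idf vectors, using scikit-learn on invented documents (my illustration, not a description of any actual search engine’s internals). Every line of it encodes a design decision someone made for us.

```python
# A hedged toy example: "relevance" as cosine similarity between tf-idf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented miniature "archive" and query.
documents = [
    "a treatise on the education of children",
    "letters concerning the late war in the colonies",
    "an essay on taste and the pleasures of the imagination",
]
query = ["education of taste"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform(query)

# Each score quietly assumes that rare words matter more (idf), that word
# order is irrelevant, that documents are bags of tokens, and so on.
scores = cosine_similarity(query_vector, doc_vectors)[0]
for doc, score in sorted(zip(documents, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```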

The AltaVista search page, circa 1996. This was the moment to freak out about text mining.


This isn’t an abstract problem; existing search technology sits uneasily with our critical theory in several concrete ways. For instance, humanists sometimes criticize text mining by noting that words and concepts don’t line up with each other in a one-to-one fashion. This is quite true: but it’s a critique of humanists’ existing search practices, not of embryonic efforts to improve them. Ordinary forms of keyword search are driven by individual words in a literal-minded way; the point of more sophisticated strategies — like topic modeling — is precisely that they pay attention to looser patterns of association in order to reflect the polysemous character of discourse, where concepts always have multiple names and words often mean several different things.

Perhaps more importantly, humanists have resigned themselves to a hermeneutically naive approach when they accept the dart-throwing game called “choosing search terms.” One of the basic premises of historicism is that other social forms are governed by categories that may not line up with our own; to understand another place or time, a scholar needs to begin by eliciting its own categories. Every time we use a search engine to do historical work we give the lie to this premise by assuming that we already know how experience is organized and labeled in, say, seventeenth-century Spain. That can be a time-consuming assumption, if our first few guesses turn out to be wrong and we have to keep throwing darts. But worse, it can be a misleading assumption, if we accept the first or second set of results and ignore concepts whose names we failed to guess. The point of more sophisticated text-mining techniques — like semantic clustering — is to allow patterns to emerge from historical collections in ways that are (if not absolutely spontaneous) at least a bit less slavishly and minutely dependent on the projection of contemporary assumptions.
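“Semantic clustering” could mean several different techniques; the sketch below is one hedged illustration (k-means over a tf-idf matrix in scikit-learn, with invented toy documents) of letting groupings and their characteristic vocabulary emerge from a collection instead of from the search terms we happened to guess.

```python
# A hedged toy example: let clusters and their characteristic terms emerge
# from the collection itself, rather than from search terms chosen in advance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented miniature documents; a real collection would be far larger.
documents = [
    "a sermon on grace, sin, redemption, and the mercy of god",
    "notes on husbandry, the plough, the harvest, and the price of corn",
    "a discourse of grace, providence, and the salvation of the soul",
    "an almanac of the harvest, cattle, corn, and the weather",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Print the most heavily weighted terms in each cluster centroid; the labels
# come out of the collection, not out of our assumptions about it.
terms = vectorizer.get_feature_names_out()
for i, center in enumerate(kmeans.cluster_centers_):
    top_indices = center.argsort()[::-1][:5]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top_indices))
```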

I don’t want to suggest that we can dispense with search engines; when you already know what you’re looking for, and what it’s called, a naive search strategy may be the shortest path between A and B. But in the humanities you often don’t know precisely what you’re looking for yet, or what it’s called. And in those circumstances, our present search strategies are potentially misleading — although they remain powerful enough to be seductive. In short, I would suggest that humanists are choosing the wrong moment to get nervous about the distorting influence of digital methods. Crude statistical algorithms already shaped our critical practice in the 1990s when we started relying on keyword search; if we want to take back the reins, each humanist is going to need to understand text mining well enough to choose the tools appropriate for his or her own theoretical premises.

A bit more on the tension between heuristic and evidentiary methods.

Just a quick link to this post at cliotropic (h/t Dan Cohen), which dramatizes what’s concretely at stake in the tension I was describing earlier between heuristic and evidentiary applications of technology.

Shane Landrum reports that historians on the job market may run into skeptical questions from social scientists — who apparently don’t like to see visualization used as a heuristic. They call it “fishing for a thesis.”

I think I understand the source of the tension here. In a discipline focused on the present, where evidence can be produced essentially at will, a primary problem that confronts researchers is that you can prove anything if you just keep rolling the dice often enough. “Fishing expeditions” really are a problem for this kind of enterprise, because there’s always going to be some sort of pattern in your data. If you wait to define a thesis until you see what patterns emerge, then you’re going to end up crafting a thesis to fit what might be an accidental bounce of the dice in a particular experiment.
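The worry is easy to simulate. The sketch below (my illustration, with made-up parameters) runs a hundred significance tests on pure noise; roughly five of them will look like findings, which is exactly why a thesis crafted after the fact makes social scientists nervous.

```python
# A hedged toy simulation of a "fishing expedition": test enough hypotheses
# on random noise and some will appear statistically significant by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_obs, alpha = 100, 30, 0.05

false_positives = 0
for _ in range(n_tests):
    group_a = rng.normal(size=n_obs)  # both groups drawn from the same distribution
    group_b = rng.normal(size=n_obs)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

# With alpha = 0.05, about 5 of the 100 comparisons come out "significant"
# even though there is, by construction, nothing to find.
print(false_positives, "apparently significant results out of", n_tests)
```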

Obviously history and literary studies are engaged in a different sort of enterprise, because our archives are for the most part fixed. We occasionally discover new documents, but history as a whole isn’t an experiment we can repeat, so we’re not inclined to view historical patterns as things that “might not have statistical significance.” I mean, of course in a sense all historical patterns may have been accidents. But if they happened, they’re significant — the question of whether they would happen again if we repeated the experiment isn’t one that we usually spend much time debating. So “fishing for patterns” isn’t usually something that bothers us; in fact, we’re likely to value heuristics that help us discover them.