Categories: disciplinary history, interpretive theory, machine learning

Interesting times for literary theory.

A couple of weeks ago, after reading abstracts from DH2013, I said that the take-away for me was that “literary theory is about to get interesting again” – subtweeting the course of history in a way that I guess I ought to explain.

A 1915 book by Chicago’s “Professor of Literary Theory.”

In the twentieth century, “literary theory” was often a name for the sparks that flew when literary scholars pushed back against challenges from social science. Theory became part of the academic study of literature around 1900, when the comparative study of folklore seemed to reveal coherent patterns in national literatures that scholars had previously treated separately. Schools like the University of Chicago hired “Professors of Literary Theory” to explore the controversial possibility of generalization.* Later in the century, structural linguistics posed an analogous challenge, claiming to glimpse an organizing pattern in language that literary scholars sought to appropriate and/or deconstruct. Once again, sparks flew.

I think literary scholars are about to face a similarly productive challenge from the discipline of machine learning — a subfield of computer science that studies learning as a problem of generalization from limited evidence. The discipline has made practical contributions to commercial IT, but it’s an epistemological method founded on statistics more than it is a collection of specific tools, and it tends to be intellectually adventurous: lately, researchers are trying to model concepts like “character” (pdf) and “gender,” citing Judith Butler in the process (pdf).

At DH2013 and elsewhere, I see promising signs that literary scholars are gearing up to reply. In some cases we’re applying methods of machine learning to new problems; in some cases we’re borrowing the discipline’s broader underlying concepts (e.g. the notion of a “generative model”); in some cases we’re grappling skeptically with its premises. (There are also, of course, significant collaborations between scholars in both fields.)

This could be the beginning of a beautiful friendship. I realize a marriage between machine learning and literary theory sounds implausible: people who enjoy one of these things are pretty likely to believe the other is fraudulent and evil.** But after reading through a couple of ML textbooks,*** I’m convinced that literary theorists and computer scientists wrestle with similar problems, in ways that are at least loosely congruent. Neither field is interested in the mere accumulation of data; both are interested in understanding the way we think and the kinds of patterns we recognize in language. Both fields are interested in problems that lack a single correct answer, and have to be mapped in shades of gray (ML calls these shades “probability”). Both disciplines are preoccupied with the danger of overgeneralization (literary theorists call this “essentialism”; computer scientists call it “overfitting”). Instead of saying “every interpretation is based on some previous assumption,” computer scientists say “every model depends on some prior probability,” but there’s really a similar kind of self-scrutiny involved.
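To make the overfitting parallel concrete, here’s a minimal sketch with invented data, assuming scikit-learn and NumPy (nothing in it comes from the textbooks cited below). A ninth-degree polynomial can pass through all ten noisy points exactly and still generalize worse than a humbler cubic; that’s the statistical version of mistaking a local accident for an essential law.

```python
# Toy overfitting demo (hypothetical data; scikit-learn and NumPy assumed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10).reshape(-1, 1)                     # ten observations
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 10)   # noisy "evidence"

modest = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(x, y)
zealous = make_pipeline(PolynomialFeatures(9), LinearRegression()).fit(x, y)

x_unseen = np.array([[0.05], [0.50], [0.95]])   # cases not in the sample
print("degree 3:", modest.predict(x_unseen))    # tracks the broad curve
print("degree 9:", zealous.predict(x_unseen))   # fits the sample perfectly,
                                                # but can swing between points
```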

It’s already clear that machine learning algorithms (like topic modeling) can be useful tools for humanists. But I think I glimpse an even more productive conversation taking shape, where instead of borrowing fully-formed “tools,” humanists borrow the statistical language of ML to think rigorously about different kinds of uncertainty, and return the favor by exposing the discipline to boundary cases that challenge its methods.
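For readers who haven’t seen topic modeling up close, here’s a toy sketch of the “tool” in question, with four invented one-line “documents” and scikit-learn’s LDA implementation assumed. Real corpora are vastly larger, and every choice here (the number of topics, the texts themselves) is already interpretive.

```python
# Toy topic model (invented "documents"; scikit-learn assumed).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "whale ship sea harpoon voyage whale sea",
    "captain whale voyage ship sea",
    "marriage estate inheritance letter visit",
    "letter marriage visit estate parlour",
]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)          # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):    # word weights per topic
    top = weights.argsort()[::-1][:4]
    print(f"topic {k}:", ", ".join(vocab[j] for j in top))
```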

Won’t quantitative models of phenomena like plot and genre simplify literature by flattening out individual variation? Sure. But the same thing could be said about Freud and Lévi-Strauss. When scientists (or social scientists) write about literature they tend to produce models that literary scholars find overly general. Which doesn’t prevent those models from advancing theoretical reflection on literature! I think humanists, conversely, can warn scientists away from blind alleys by reminding them that concepts like “gender” and “genre” are historically unstable. If you assume words like that have a single meaning, you’re already overfitting your model.

Of course, if literary theory and computer science do have a conversation, a large part of the conversation is going to be a meta-debate about what the conversation can or can’t achieve. And perhaps, in the end, there will be limits to the congruence of these disciplines. Alan Liu’s recent essay in PMLA pushes against the notion that learning algorithms can be analogous to human interpretation, suggesting that statistical models become meaningful only through the inclusion of human “seed concepts.” I’m not certain how deep this particular disagreement goes, because I think machine learning researchers would actually agree with Liu that statistical modeling never starts from a tabula rasa. Even “unsupervised” algorithms have priors. More importantly, human beings have to decide what kind of model is appropriate for a given problem: machine learning aims to extend our leverage over large volumes of data, not to take us out of the hermeneutic circle altogether.
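The point about priors can be made concrete, too. Even in scikit-learn’s “unsupervised” LDA (the parameter names below are that library’s; the corpus is hypothetical), the analyst supplies assumptions before the model sees a single word:

```python
# Even an "unsupervised" model begins with prior commitments
# (scikit-learn parameter names; the corpus itself is hypothetical).
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(
    n_components=20,        # the analyst decides how many "topics" exist
    doc_topic_prior=0.1,    # alpha: expect few topics per document
    topic_word_prior=0.01,  # eta: expect each topic to favor few words
    random_state=0,
)
# lda.fit(document_term_matrix)  # supplied by the analyst, not the algorithm
```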

But as Liu’s essay demonstrates, this is going to be a lively, deeply theorized conversation even where it turns out that literary theory and computer science have fundamental differences. These disciplines are clearly thinking about similar questions: Liu is right to recognize that unsupervised learning, for instance, raises hermeneutic questions of a kind that are familiar to literary theorists. If our disciplines really approach similar questions in incompatible ways, it will be a matter of some importance to understand why.

* <plug> For more on “literary theory” in the early twentieth century, see the fourth chapter of Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (2013, hot off the press). The book has a lovely cover, but unfortunately has nothing to do with machine learning. </plug>

** This post grows out of a conversation I had with Eleanor Courtemanche, in which I tried to convince her that machine learning doesn’t just reproduce the biases you bring to it.

*** Practically, I usually rely on Data Mining: Practical Machine Learning Tools and Techniques (Ian Witten, Eibe Frank, Mark Hall), but to understand the deeper logic of the field I’ve been reading Machine Learning: A Probabilistic Perspective (Kevin P. Murphy). Literary theorists may appreciate Murphy’s remark that wealth has a long-tailed distribution, “especially in plutocracies such as the USA” (43).

PS later that afternoon: Belatedly realize I didn’t say anything about the most controversial word in my original tweet: “literary theory is about to get interesting again.” I suppose I tacitly distinguish literary theory (which has been a little sleepy lately, imo) from theory-sans-adjective (which has been vigorous, although hard to define). But now I’m getting into a distinction that’s much too slippery for a short blog post.

By tedunderwood

Ted Underwood is Professor of Information Sciences and English at the University of Illinois, Urbana-Champaign. On Twitter he is @Ted_Underwood.

11 replies on “Interesting times for literary theory.”

In principle I agree. The conversation could very well be a fruitful one. Things will go significantly better if the disciplines of Theory can get over the notion that they’ve always already been there and done that.

I was absolutely interested. That’s dazzlingly ambitious and a perfect example of the kind of theoretical approach I was predicting (without a whole lot of solid evidence) in this post. I’ll definitely keep my eyes peeled for the published version.

There might be machine learning on a body of texts late next year, though just to see what happens rather than as substantiation for the work in the dissertation. (Almost everything I treat as the ‘input space’ for mood consists of features that are unreachably high-level from the point of view of actually existing machine learning, but sometimes ML is good at replicating really abstract distinctions without nailing the intermediary levels.)

I think ‘model’ is right, though the constraints of writing a comparative literature dissertation at a very traditional department oblige me to veer over to metaphor pretty often. There hasn’t been a ton of cog-sci work influenced by work on deep unsupervised feature learning in ML yet, but it’s certainly a prospect that leading DL people like Bengio, Hinton, and recently Michael Nielsen have in mind.

1. By way of motivating your ML approach, or just out of curiosity, you might want to look at some neuroscience folks who model neural systems as very high-dimensional spaces of neurons. A sensory percept in such a space would be a low-dimensional manifold. I’m thinking in particular of the work of Berkeley’s Walter Freeman, whom I consulted while writing a book on music. His papers are freely available on the web.

2. By way of contrast, you might want to look at some old symbolic AI work done by my friend, Brian Phillips and by me. We both worked under David Hays, one of the founders of computational linguistics. Early in the 1970s Hays had advanced the notion that abstract concepts were patterns in a semantic net. Thus, “charity” is when “someone does something nice for someone else without thought of reward.” Any story that matches that pattern is said to be about charity; charity IS the pattern.

Brian was interested in discourse analysis and in the concept of tragedy. He developed a crude definition of tragedy as a pattern in a semantic net. He then collected a bunch of newspaper stories, some of which he judged to be tragic by that definition, others not. He then tested those stories against his model. You can find an account of his work on the web in an old issue of the American Journal of Computational Linguistics (now simply Computational Linguistics): A Model for Knowledge and its Application to Discourse Analysis.

I took Hays’ model and applied it to Shakespeare’s Sonnet 129, Lust in Action. I didn’t run any simulations, but I did develop a rather complex set of diagrams and then traced the poem as a path through those diagrams. I published that work in the Centennial Issue of MLN in 1976, Cognitive Networks and Literary Semantics, and in Language and Style, Lust in Action: An Abstraction (1981). It was also the basis of my dissertation, Cognitive Science and Literary Theory (1978).

3. Now, how do you bridge the gap between these two notions of abstraction, one grounded in high-dimensional spaces and the other in directed graphs (cognitive networks)? Or, how do you realize a cognitive net in a neural net? As far as I’m concerned, that’s one of the major items on the intellectual agenda for cognitive neuroscience in the next couple of decades. Of course, you don’t have to tackle it in your dissertation. But it’s an issue raised by the work you are doing.

Thanks for these references, Bill! The Ashbery section of my dissertation, which implicitly uses Keith Holyoak’s symbolic theory of analogy and schema induction, tries to make some first gestures in this direction. The coming months are going to be devoted to making the case for the literary theoretical merit of it all, but soon after that I’m hoping to go back to my brigade of comp-sci and cog-sci colleagues and refine the technical foundations of the work, so a detailed critique would be wonderfully welcome.

“…the case for the literary theoretical merit of it all…”

I don’t know whether you’ve looked at the cognitive metaphor folks; I’m thinking particularly of Lakoff and Turner, More than Cool Reason. I think the notion of cognitive metaphor has been oversold by a mile, but that’s beside the point I’m making here. First, they have made something of a lit theory case for their work and have been using it in various ways, including textual analysis. It’s all fairly informal and, in some ways, like good old rhetorical analysis, but with a different set of tropes (a criticism that’s been advanced against the work). But if you look in the conceptual background of that work, such as Lakoff’s Women, Fire, and Dangerous Things, you’ll find that older symbolic AI and CL (computational linguistics) literature discussed under the rubric of idealized cognitive models (ICMs).
