A new approach to the history of character?

In Macroanalysis, Matt Jockers points out that computational stylistics has found it hard to grapple with “the aspects of writing that readers care most deeply about, namely plot, character, and theme” (118). He then proceeds to use topic modeling to pretty thoroughly anatomize theme in the nineteenth-century novel. One down, I guess, two to go!

But plot and character are probably harder than theme; it’s not yet clear how we would trace those patterns in thousands of volumes. So I think it may be worth flagging a very promising article by David Bamman, Brendan O’Connor, and Noah A. Smith. Computer scientists don’t often develop a new methodology that could seriously enrich criticism of literature and film. But this one deserves a look. (Hat tip to Lynn Cherny, by the way, for this lead.)

The central insight in the article is that character can be modeled grammatically. If you can use natural language processing to parse sentences, you should be able to identify what’s being said about a given character. The authors cleverly sort “what’s being said” into three questions: what does the character do, what do they suffer or undergo, and what qualities are attributed to them? The authors accordingly model character types (or “personas”) as a set of three distributions over these different domains. For instance, the ZOMBIE persona might do a lot of “eating” and “killing,” get “killed” in turn, and find himself described as “dead.”
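To make the three-part scheme concrete, here is a toy sketch (not the authors’ code; every name, word, and count below is invented for illustration). A persona is three word distributions, and we can score how well a persona explains what’s said about a character:

```python
# Toy sketch of the paper's persona idea: a persona is three word
# distributions (agent verbs, patient verbs, attributes). Invented data.
from collections import Counter
import math

zombie = {
    "agent":     Counter({"eat": 5, "kill": 4, "shamble": 1}),
    "patient":   Counter({"kill": 6, "shoot": 3, "burn": 1}),
    "attribute": Counter({"dead": 7, "rotting": 2, "mindless": 1}),
}

def log_likelihood(persona, evidence, smoothing=0.1):
    """Score how well a persona explains the words attached to a character.

    `evidence` maps each role to the observed words, e.g.
    {"agent": ["eat"], "patient": ["kill"], "attribute": ["dead"]}.
    """
    total = 0.0
    for role, words in evidence.items():
        counts = persona[role]
        denom = sum(counts.values()) + smoothing * len(counts)
        for w in words:
            # Counter returns 0 for unseen words; smoothing keeps log finite.
            total += math.log((counts[w] + smoothing) / denom)
    return total

evidence = {"agent": ["eat"], "patient": ["kill"], "attribute": ["dead"]}
print(log_likelihood(zombie, evidence))
```

In the actual model these distributions are latent and inferred jointly with each character’s persona assignment; the sketch only shows the representation, not the inference.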

The authors try to identify character types of this kind in a collection of 42,306 movie plot summaries extracted from Wikipedia. The model they use is a generative one, which entails assumptions that literary critics would call “structuralist.” Movies in a given genre have a tendency to rely on certain recurring character types. Those character types in turn “generate” the specific characters in a given story, which in turn generate the actions and attributes described in the plot summary.

Using this model, they reason inward from both ends of the process. On the one hand, we know the genres that particular movies belong to. On the other hand, we can see that certain actions and attributes tend to recur together in plot summaries. Can we infer the missing link in this process — the latent character types (“personas”) that mediate the connection from genre to action?

It’s a very thoughtful model, both mathematically and critically. Does it work? Different disciplines will judge success in different ways. Computer scientists tend to want to validate a model against some kind of ground truth; in this case they test it against character patterns described by fans on TV Tropes. Film critics may be less interested in validating the model than in seeing whether it tells them anything new about character. And I think the model may actually have some new things to reveal; among other things, it suggests that the vocabulary used to describe character is strongly coded by genre. In certain genres, characters “flirt,” in others, they “switch” or “are switched.” In some genres, characters merely “defeat” each other; in other genres, they “decapitate” or “are decapitated”!

Since an association with genre is built into the generative assumptions that define the article’s model of character, this might be a predetermined result. But it also raises a hugely interesting question, and there’s lots of room for experimentation here. If the authors’ model of character is too structuralist for your taste, you’re free to sketch a different one and give it a try! Or, if you’re skeptical about our ability to fully “model” character, you could refuse to frame a generative model at all, and just use clustering algorithms in an ad hoc exploratory way to find clues.
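That exploratory alternative requires surprisingly little machinery: attach a bag of words to each character and compare characters by similarity, with no generative story at all. A minimal sketch, with invented characters and counts:

```python
# Ad hoc exploratory sketch (invented data, no generative model):
# find each character's nearest neighbor by cosine similarity over
# the words attached to them.
from collections import Counter
import math

characters = {
    "shaun":  Counter({"flee": 2, "fight": 3, "bitten": 1}),
    "ripley": Counter({"fight": 4, "flee": 1, "survive": 2}),
    "harry":  Counter({"flirt": 3, "dance": 2, "kiss": 2}),
    "sally":  Counter({"flirt": 2, "kiss": 3, "dance": 1}),
}

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

def nearest_neighbor(name):
    others = [n for n in characters if n != name]
    return max(others, key=lambda n: cosine(characters[name], characters[n]))

print(nearest_neighbor("shaun"))   # pairs with the other action character
```

Clues found this way carry no structural assumptions, which is both their weakness and their appeal.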

Critics will probably also cavil about the dataset (which the authors have generously made available). Do Wikipedia plot summaries tell us about recurring character patterns in film, or do they tell us about the character patterns that are most readily recognized by editors of Wikipedia?

But I think it would be a mistake to cavil. When computer scientists hand you a new tool, the question to ask is not, “Have they used it yet to write innovative criticism?” The question to ask is, “Could we use this?” And clearly, we could.

The approach embodied in this article could be enormously valuable: it could help distant reading move beyond broad stylistic questions and start to grapple with the explicit social content of fiction (and for that matter, nonfiction, which may also rely on implicit schemas of character, as the authors shrewdly point out). Ideally, we would not only map the assumptions about character that typify a given period, but describe how those patterns have changed across time.

Making that work will not be simple: as always, the real problem is the messiness of the data. Applying this technique to actual fictive texts will be a lot harder than applying it to a plot summary. Character names are often left implicit. Many different voices speak; they’re not all equally reliable. And so on.

But the Wordseer Project at Berkeley has begun to address some of these problems. Also, it’s possible that the solution is to scale up instead of sweating the details of coreference resolution: an error rate of 20 or 30% might not matter very much, if you’re looking at strongly marked patterns in a corpus of 40,000 novels.
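A back-of-envelope calculation suggests why scale can forgive noise. Suppose (numbers invented) a trait truly attaches to two-thirds of the characters in some genre, and attribution errors flip 25% of labels at random:

```python
# Back-of-envelope check: does a strongly marked pattern survive a
# noisy pipeline? All numbers here are invented for illustration.
import math

p_true, error = 2 / 3, 0.25

# Random label flips attenuate the observed proportion toward 0.5.
p_observed = p_true * (1 - error) + (1 - p_true) * error
print(round(p_observed, 3))   # 0.583: weakened, but still above chance

# With a large corpus, the standard error of that proportion is tiny.
n = 40_000
se = math.sqrt(p_observed * (1 - p_observed) / n)
print(round(se, 4))   # ~0.0025, so the attenuated signal is ~33 SEs from 0.5
```

The error washes out the size of the effect but not its direction, which is often all a comparative claim needs.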

In any case, this seems to me an exciting lead, worthy of further exploration.

Postscript: Just to illustrate some of the questions that come up: How gendered are character types? The article by Bamman et al. explicitly models gender as a variable, but the types it ends up identifying are less gender-segregated than I might expect. The heroes and heroines of romantic comedy, for instance, seem to be described in similar ways. Would this also be true in nineteenth-century fiction?

By tedunderwood

Ted Underwood is Professor of Information Sciences and English at the University of Illinois, Urbana-Champaign. On Twitter he is @Ted_Underwood.

9 replies on “A new approach to the history of character?”

Hi Ted,

Thanks for the thoughtful comments. I’m really happy to see this kind of critical engagement with computational models — I think it’s one of the best ways for computer science and the humanities to interact. I like to think that one of the advantages of generative models like the ones we describe (though the persona regression model is a hybrid that conditions on metadata rather than generating it) is that they make very explicit exactly what the structural assumptions are (in our case these are conditional independencies), which makes it convenient both to question those assumptions outright and to change/adapt the model to reflect different ones.

That’s a great point too re: the potential selection bias of Wikipedia summaries. I noticed this in the actor distribution in Freebase, where the ratio of men to women is very close to 2:1, which either reflects an insane true bias in reality or (maybe more likely, though it’s unclear) captures a measure of the gender gap in the level of attention paid on Freebase (and thereby on Wikipedia, from which it draws). For movies, plot summaries are convenient since they capture character and narrative elements that may only be expressed visually, and not in dialogue. Presumably this wouldn’t be true of books, where all of the information about the narrative is self-contained in the text, but yes, I definitely agree there’s a lot of good work waiting to be done in figuring out how to do that well.


(P.S., we’ll be releasing code for this paper too before ACL in August.)

Thanks! The methodological question about generative models is a fascinating one, and I don’t know what to conclude yet. You’re right that they have the advantage of making assumptions explicit — which means that you can test your assumptions, and change them as needed.

I like that feature of your article, and I think it’s expressing some assumptions about character that are in practice pretty widely shared by audiences and critics. I should also admit that I initially misunderstood the conditional logic you guys are using; it’s a bit more flexible than I thought. The upstream-downstream way you’re able to factor in both metadata and observations from plot summary is very nice — the flexibility of a principled model is really shining there, in a way that I hope literary scholars will appreciate.

On the other hand, I can imagine a situation where modeling wouldn’t be the most efficient approach for domain specialists. If a process were complicated enough, it might turn out to be impractical to model it. In a case like that, you might want to fall back on ad hoc methods. But I really don’t know whether that will be the case here. And in any event, it does no harm to start by attempting to model things in a principled way, as you guys have done. That way, if it turns out later that we have to “fuzzify” certain aspects of the model, at least we’ll know what assumptions we’re relaxing.

Hi, thanks for the comments on our paper. As for generative vs ad-hoc clustering models, I think it’s reasonable to think of our model just as a clustering method, one that maybe is a little more clear about its assumptions than some of the ad-hoc approaches out there. I personally believe too much is made of what generative text modeling means — perhaps the name “generative” is too evocative — especially given that none of our models are super plausible anyway (e.g. topic models). Generative models are just one mathematically convenient approach to specifying structured statistical models, which are a great way to automatically learn latent structures for language.

That said, this modeling approach was very helpful to us in clarifying assumptions when we were tossing around different variations of these models. For example, it’s easy to see that characters are front-and-center in our approach.

Very interesting stuff, lots of problems/challenges, but it’s early days, no? So I’ve got some quick comments. I’ve not read the article, though I’ve looked through it very quickly.

You’ve already raised one issue about working from Wikipedia plot summaries. And, having spent some time reading Wikipedia plot summaries (mostly film and TV) for various purposes, I’d say that there are problems of completeness and accuracy, which are, in any case, fuzzy notions. There may also be problems of consistency, different editors using different terms for the same thing, though some of that is likely to be, or could be, compensated for by suitable techniques.

The thing is, those plot summaries were not, for the most part, written by trained professionals. They were written by ‘civilians’ interested in movies. And the same goes for the entries at TV Tropes.

Now, I could go on to complain about this and insist that they work only from materials prepared by trained professionals. But that’s not where I’m going. For one thing, it will be a cold day in Hell before academic film critics do that sort of thing.

There is such a thing as citizen science, in which non-professionals with particular interests engage in collaborations with professionals. Large chunks of biology have been built on the work of amateur naturalists, who continue to do important (largely observational) work. Bird watchers are the most obvious example. I’d think that amateurs are doing important work in tracking the problems with bees, and, for that matter, bats. Amateur astronomers are also important.

All those Wikipedia plot summaries and all the stuff at TV Tropes, that’s all, in effect, citizen cultural criticism. How can we make use of all that free and interested intellectual labor? It seems to me that THAT question gets pretty near the core of what’s going on, or what could be going on, in college and university education and online courses and digital humanities.

At least some of the students who come through undergraduate courses in literature and film are the sorts of people who make Wikipedia plot summaries and who contribute to TV Tropes and they’ll be doing that even when they’ve graduated. Is there anything we can teach them that will help them to do a better job? For that matter, do we know anything ourselves about doing that kind of work well? How many of us have done plot summaries?

One of the things I’ve been talking about at New Savanna is the need for much better description in our work (that link is to every post I’ve tagged with “description”). In this post, The Key to the Treasure is the Treasure, for example, I outline a four-part program for literary studies: 1) description, 2) naturalist criticism, 3) ethical criticism, and 4) digital humanities. Here I use Heart of Darkness as an example in the project of creating handbook-level accounts of texts of interest; plot summaries would be one aspect of such a handbook. We’re not going to create such handbooks for every text, but surely we’d want to do so for the canonical texts and for selected other texts. And this post is a brief annotated bibliography of posts I’ve done on description.

What I’m suggesting, then, is that we have to go beyond treating those Wikipedia plot summaries and those TV Tropes entries as stuff that just happens to be out there on the web and that we can use. We need to actively engage the people who do that work and make that part of the scholarly conversation.

Ted – this is fascinating. Thanks for writing about it, and thanks guys for doing such interesting work. A couple of thoughts:

First, I think this puts a very practical ball back in the court of literary scholars: what, exactly, do we think ‘character’ is? In the paper, Bamman et al. come up with what, for me, is a pretty good practical formula: it’s attributive terms (descriptive adjectives and nouns) + verbs where the character is the agent (what they do) + verbs where the character is the patient (what gets done to them). Do literary scholars think that’s enough? If not, what should be added, and how should we count it?

Second, the paper (FN8 on page 2), and the comments above, note the 2:1 ratio of male to female characters found in the data sets – and people find this so surprising they suspect it must be a bias of those making the data sets. It’s not surprising to me though – or if it is, I’m surprised it is that equal. Heather Froehlich has done work on male/female ratios in Shakespeare and Early Modern Drama (which also trades on stock character types), and the ratios there are even worse. So I’m guessing this is an accurate reflection of the character ratios in actual films – if not an over-estimation of female characters.

You’re asking a couple of very good questions here. To acknowledge the second one first: Froehlich’s results are striking. If it turns out to be true that there are just more male characters, on average, in stage and screenplays, that’s a huge and significant result. It’s a lot of manual labor, but it might be possible to track that ratio across time: seems to me that it would be interesting if the ratio changes significantly, and also interesting if it doesn’t!

I think your first question is the inevitable one literary critics will raise about this work. “Is a set of three distributions over the lexicon really what we mean by character?” There’s a lot that could be said on that topic, and it’s even conceivable that different models of character might be appropriate to different periods. E.g., the paper by Bamman et al. models character largely in terms of what is explicitly done or said. I could imagine a critic making a case that internal and tacit psychological models of character become more appropriate to the 19c …

But my own preference is to treat the issue nominalistically. That’s why I hint, above, that there might be reasons to treat this as an ad-hoc heuristic rather than a principled model. I bet it’s going to turn out to be impossible for critics to agree about what character “really is.” But it’s probably still true that clustering of the kind pursued in this paper can give us a lot of valuable leads and evidence about the history of character.

Annenberg study of 500 recent films:

Females are grossly underrepresented on screen in 2012 films. Out of 4,475 speaking characters on screen, only 28.4% are female. This translates into a ratio of 2.51 males to every 1 female on screen. 2012 reveals the lowest percentage of on-screen females (28.4%) across the 5-year sample. Only 6% of the top-grossing films in 2012 featured a balanced cast, or females in 45-54.9% of all speaking roles. Just over a quarter of all narrators (27.5%) are female.

Only 16.7% of the 1,228 directors, writers, and producers are female across the 100 top-grossing films of 2012. Women accounted for 4.1% of directors, 12.2% of writers, and 20% of producers. This calculates to a 2012 ratio of 5 males to every 1 female behind the camera. Almost no changes are observed in female employment patterns behind the camera across the 5 years studied. Together, the findings show that the gender needle is not moving on screen or behind the camera in popular films.
