Why it’s hard for syuzhet to be right or wrong yet.

I’ve enjoyed following the exchange between Matt Jockers, Annie Swafford, Jacob Eisenstein, and Dan Piepenbring about Jockers’ R package syuzhet — designed to illuminate plot by tracing the “emotional valence” of narration across the course of a novel.

I’ve found this a consistently impressive and informative conversation; it has taught me literally everything I know about “low-pass filters.” But I have no idea who is right or wrong.

More fundamentally, I’m unsure how anyone could be right or wrong here, because as far as I can tell there’s no thesis under discussion yet. Jockers’ article isn’t published. All we have is an R package, syuzhet, which does something I would call exploratory data analysis. And it’s hard to evaluate exploratory data analysis in the absence of a specific argument.

For instance, does syuzhet smooth plot arcs appropriately? I don’t know. Without a specific thesis we’re trying to test, how would we decide what scale of variation matters? In some novels it might be a scene-to-scene rhythm; in others it might be a long arc. Until I know what scale of variation matters for a particular question, I have no way of knowing what kind of smoothing is “too much” or “too little.”*

The same thing goes, more fundamentally, for the concepts of “plot” and “emotional valence” themselves. As Jacob Eisenstein has pointed out, these aren’t concepts that have a single agreed-upon meaning. To argue about them meaningfully, we’re going to need a particular historical or formal question we’re trying to solve.

It seems to me likely that syuzhet will usefully illuminate some aspects of plot. But I have no way of knowing which aspects until I look at a test involving groups of books that readers perceive as different in some specific way. For instance, if syuzhet reliably discriminates between books with tragic and comic endings, that would already be interesting. It’s not everything we mean by plot, but it’s one important thing.

The underlying issue here is that Matt hasn’t published his article yet. So we don’t actually have a thesis to debate. What we have is a new form of exploratory data analysis, released as an R package. Conversation about exploration can be interesting; it can teach me a lot about low-pass filters; but I don’t know how it could be wrong or right until I know what the exploration is trying to reveal.

I think this holds even for Matt’s claim that he’s identified six (or seven) fundamental plot patterns. That sounds like a thesis, but I would tend to say it’s still description of exploratory analysis — in this case a clustering process. Matt has done the clustering in a principled and careful way, but clustering is still (in my eyes) basically an exploratory method. I’m not sure how to evaluate it until I know what kind of generic or historical evidence would count as confirmation that we’re looking at a coherent “plot pattern.”

There are a range of ways to get that confirmation. Lynn Cherny has explored plot using supervised methods; if you do that, predictive accuracy gives you an easy test. But unsupervised methods can also be great, in cases where tests aren’t so easy to define; it’s just that an unsupervised method needs to be supplemented by historical or formal discussion that tells you what would count as confirmation for this method. I imagine there will be some of that in Matt’s article, when it comes out.

* [Edit March 31: After playing around with some artificial data myself, I have to acknowledge that the low-pass filter option in syuzhet can behave in unintuitive ways where extreme outliers and edges are involved. I think Annie Swafford (in blog posts) and Daniel Lepage (below) have been right to emphasize this. It could be less of an issue with real data; I had to use pretty extreme outliers to “break” the filter; it’s not actually the case that the whole shape is necessarily defined by its single highest point. But my guess is that this sort of filter would only add value if you wanted to build in a strong prior that plot fluctuates on or near a particular “wavelength.” On the other hand, Matt Jockers has alluded to unpublished evidence for that sort of prior (or at least for a particular filter setting). So, after changing my opinion a couple times, I’m still not feeling I have an answer here.]