Finding the great divide

Last year, Jordan Sellers and I published an article in Modern Language Quarterly, trying to trace the “great divide” that is supposed to open up between mass culture and advanced literary taste around the beginning of the twentieth century.

I’m borrowing the phrase “great divide” from Andreas Huyssen, but he’s not the only person to describe the phenomenon. Whether we explain it neutrally as a consequence of widespread literacy, or more skeptically as the rise of a “culture industry,” literary historians widely agree that popularity and prestige parted company in the twentieth century. So we were surprised not to be able to measure the widening gap.

expectedWe could certainly model literary taste. We trained a model to distinguish poets reviewed in elite literary magazines from a less celebrated “contrast group” selected randomly. The model achieved roughly 79% accuracy, 1820-1919,  and the stability of the model itself raised interesting questions. But we didn’t find that the model’s accuracy increased across time in the way we would have expected in a period when elite and popular literary taste are specializing and growing apart.

Instead of concluding that the division never happened, we guessed that we had misunderstood it or looked in the wrong place. Algee-Hewitt and McGurl have pretty decisively confirmed that a divide exists in the twentieth century. So we ought to be able to see it emerging. Maybe we needed to reach further into the twentieth century — or maybe we would have better luck with fiction, since the history of fiction provides evidence about sales, as well as prestige?

In fact, getting evidence about that second, economic axis seems to be the key. It took work by many hands over a couple of years: Kyle Johnston, Sabrina Lee, and Jessica Mercado, as well as Jordan Sellers, have all contributed to this project. I’m presenting a preliminary account of our results at Cultural Analytics 2017, and this blog post is just a brief summary of the main point.

When you look at the books described as bestsellers by Publisher’s Weekly, or by book historians (see references to Altick, Bloom, Hackett, Leavis, below) it’s easy to see the two circles of the Venn diagram pulling apart: on the one hand bestsellers, on the other hand books reviewed in elite venues. (For our definition of “elite venues” see the “Table” in a supporting code & data repository.)

authorfractions2

On the other hand, when you back up from bestsellers to look at a broader sample of literary production, it’s still not easy to detect increasing stylistic differentiation between the elite “reviewed” texts and the rest of the literary field. A classifier trained on the reviewed fiction has roughly 72.5% accuracy from 1850 to 1949; if you break the century into parts, there are some variations in accuracy, but no consistent pattern. (In a subsequent blog post, I’ll look at the fiddly details of algorithm choice and feature engineering, but the long and short of that question is — it doesn’t make a significant difference.)

To understand why the growing separation of bestsellers from “reviewed” texts at the high end of the market doesn’t seem to make literary production as a whole more strongly stratified, I’ve tried mapping authors onto a two-dimensional model of the literary field, intended to echo Pierre Bourdieu’s well-known diagrams of the interaction between economic and cultural distinction.

bourdieufield

Pierre Bourdieu, The Field of Cultural Production (1993), p. 49.

In the diagram below, for instance, the horizontal axis represents sales, and the vertical axis represents prestige. Sales would be easy to measure, if we had all the data. We actually don’t — so see the end of this post for the estimation strategy I adopted. Prestige, on the other hand, is difficult to measure: it’s perspectival and complex. So we modeled prestige by sampling texts that were reviewed in prominent literary magazines, and then training a model that used textual cues to predict the probability that any given book came from the “reviewed” set. An author’s prestige in this diagram is simply the average probability of review for their books. (The Stanford Literary Lab has similarly recreated Bourdieu’s model of distinction in their pamphlet “Canon/Archive,” using academic citations as a measure of prestige.)

field1850noline

The upward drift of these points reveals a fairly strong correlation between prestige and sales. It is possible to find a few high-selling authors who are predicted to lack critical prestige — notably, for instance, the historical novelist W. H. Ainsworth and the sensation novelist Ellen Wood, author of East Lynne. It’s harder to find authors who have prestige but no sales: there’s not much in the northwest corner of the map. Arthur Helps, a Cambridge Apostle, is a fairly lonely figure.

Fast-forward seventy-five years and we see a different picture.

field1925noline

The correlation between sales and prestige is now weaker; the cloud of authors is “rounder” overall.

There are also more authors in the “upper midwest” portion of the map now — people like Zora Neale Hurston and James Joyce, who have critical prestige but not enormous sales (or not up to 1949, at least as far as my model is aware).

There’s also a distinct “genre fiction” and “pulp fiction” world emerging in the southeast corner of this map, ranging from Agatha Christie to Mickey Spillane. (A few years earlier, Edgar Rice Burroughs and Zane Gray are in the same region.)

Moreover, if you just look at the large circles (the authors we’re most likely to remember), you can start to see how people in this period might get the idea that sales are actually negatively correlated with critical prestige. The right side of the map almost looks like a diagonal line slanting down from William Faulkner to P. G. Wodehouse.

That negative correlation doesn’t really characterize the field as a whole. Critical prestige still has a faint positive correlation with sales, as people over on the left side of the map might sadly remind us. But a brief survey of familiar names could give you the opposite impression.

In short, we’re not necessarily seeing a stronger stratification of the literary field. The change might better be described as a decline in the correlation of two existing forms of distinction. And as they become less correlated, the difference between them becomes more visible, especially among the well-known names on the right side of the map.

summary

So, while we’re broadly confirming an existing story about literary history, the evidence also suggests that the metaphor of a “great divide” is a bit of an exaggeration. We don’t see any chasm emerging.

Maps of the literary field also help me understand why a classifier trained on an elite “reviewed” sample didn’t necessarily get stronger over time. The correlation of prestige and sales in the Victorian era means that the line separating the red and blue samples was strongly tilted there, and may borrow some of its strength from both axes. (It’s really a boundary between the prominent and the obscure.)

cheating

As we move into the twentieth century, the slope of the line gets flatter, and we get closer to a “pure” model of prestige (as distinguished from sales). But the boundary itself may not grow more clearly marked, if you’re sampling a group of the same size. (However, if you leave The New Republic and New Yorker behind, and sample only works reviewed in little magazines, you do get a more tightly unified group of texts that can be distinguished from a random sample with 83% accuracy.)

This is all great, you say — but how exactly are you “estimating” sales? We don’t actually have good sales figures for every author in HathiTrust Digital Library; we have fairly patchy records that depend on individual publishers.
bayesiantransform
For the answer to that question, I’m going to refer you to the github repo where I work out a model of sales. The short version is that I borrow a version of “empirical Bayes” from Julia Silge and David Robinson, and apply it to evidence drawn from bestseller lists as well as digital libraries, to construct a rough estimate of each author’s relative prominence in the market. The trick is, basically, to use the evidence we have to construct an estimate of our uncertainty, and then use our uncertainty to revise the evidence. The picture on the left gives you a rough sense of how that transformation works. I think empirical Bayes may turn out to be useful for a lot of problems where historians need to reconstruct evidence that is patchy or missing in the historical record, but the details are too much to explain here; see Silge’s post and my Jupyter notebook.

Bubble charts invite mouse-over exploration. I can’t easily embed interactive viz in this blog, but here are a few links to plotly visualizations:

https://plot.ly/~TedUnderwood/8/the-literary-field-1850-1874/

https://plot.ly/~TedUnderwood/4/the-literary-field-1875-1899/

https://plot.ly/~TedUnderwood/2/the-literary-field-1900-1924/

https://plot.ly/~TedUnderwood/6/the-literary-field-1925-1949/

Acknowledgments

The texts used here are drawn from HathiTrust via the HathiTrust Research Center. Parts of the research were funded by the Andrew G Mellon Foundation via the WCSA+DC grant, and part by SSHRC via NovelTM.

Most importantly, I want to acknowledge my collaborators on this project, Kyle Johnston, Sabrina Lee, Jessica Mercado, and Jordan Sellers. They contributed a lot of intellectual depth to the project — for instance by doing research that helped us decide which periodicals should represent a given period of literary history.

References

Algee-Hewitt, Mark, and Mark McGurl. “Between Canon and Corpus: Six Perspectives on 20th-Century Novels.” Stanford Literary Lab, Pamphlet 9, 2015. https://litlab.stanford.edu/LiteraryLabPamphlet8.pdf

Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, Hannah Walser. “Canon/Archive: Large-Scale Dynamics in the Literary Field.” Stanford Literary Lab, January 2016. https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf

Altick, Richard D. The English Common Reader: A Social History of the Mass Reading Public 1800-1900. Chicago: University of Chicago Press, 1957.

Bloom, Clive. Bestsellers: Popular Fiction Since 1900. 2nd edition. Houndmills: Palgrave Macmillan, 2008.

Hackett, Alice Payne, and James Henry Burke. 80 Years of Best Sellers 1895-1975. New York: R.R. Bowker, 1977.

Leavis, Q. D. Fiction and the Reading Public. 1935.

Mott, Frank Luther. Golden Multitudes: The Story of Bestsellers in the United States. New York: R. R. Bowker, 1947.

Robinson, David. Introduction to Empirical Bayes: Examples from Baseball Statistics. 2017. http://varianceexplained.org/r/empirical-bayes-book/

Silge, Julia. “Singing the Bayesian Beginner Blues.” data science ish, September 2016. http://juliasilge.com/blog/Bayesian-Blues/

Unsworth, John. 20th Century American Bestsellers. (http://bestsellers.lib.virginia.edu)

How quickly do literary standards change?

by Ted Underwood and Jordan Sellers

Part of this project will appear next year — revised and improved — in MLQ. But we’ve decided to release it as a free-standing draft rather than a preprint, because it allows us to use color and to explore some puzzling leads that won’t fit into the physical limits of one journal article.

To understand the aesthetic standards that govern reception, we contrasted two samples of English-language poetry, drawn from different social contexts: 1) a group of 360 volumes that we chose by sampling reviews in prominent periodicals, 1820-1919, and 2) a group of 360 volumes sampled at random from HathiTrust Digital Library, many of them pretty obscure.
LaborOdes
We were curious whether the difference in prestige between these books would be legible in the texts themselves. For instance, could you train a statistical model to predict whether a volume of poetry came from the “reviewed” or “random” sample just by looking at diction? And if you could, what social difference exactly would you be detecting?

Scholars sometimes suggest that high culture hadn’t differentiated from the rest of the literary field very sharply yet in the early 19th century [1: Huyssen 1986]. If so, books of poetry reviewed in prestigious contexts might be hard to identify in that part of the timeline. It might get easier toward the 20th century, as different poetic styles specialized to address (say) “high” and “middlebrow” audiences.

On the other hand, if writers became prominent by occupying the leading edge of a rapidly-moving wave, we might only be able to separate these samples by training a sequence of different models for different periods. For instance, prominent poets in the 1820s might be united by gloomy Byronism; in the 1850s they might share an interest in history; by the 1890s what they had in common might be the word “mauve.” As for the randomly-selected volumes, who knows? Maybe they would share only a tendency to trail thirty years behind the trend.

Since it seemed reasonable to assume that the standards governing reception had been volatile, we began by training a different model of poetic prestige for each twenty-year period. But we found, in practice, that the best way to separate these samples was to treat the whole period 1820-1919 as a single unit organized by a single set of aesthetic standards. You can click on the image that follows to see a slightly larger and clearer version.

plotmainmodelannotate

In the image above, each point is a volume of poetry, colored according to its actual social provenance. The y axis expresses a statistical model’s prediction about that provenance: How likely is it that this volume came from the “reviewed” sample, based only on the words in the volume?

As you can see, the model does a pretty decent job of sorting the two samples. It’s not right all the time, because of course a volume’s reception is determined by a lot of factors other than language (politics, the whims of reviewers, social networks). But the model is right 79.2% of the time, which is often enough to suggest that volumes reviewed in prominent venues had something in common. The sort of poetic language that got reviewed is distinguished from other poetic traditions not just toward the twentieth century, as we had expected, but throughout this period.

What’s even more puzzling is this: reviewed writers seem to have had the same thing in common throughout this century. The model is using essentially the same list of prestigious and banal words to separate Lord Byron from more obscure poets around 1819, and Christina Rossetti from more obscure poets around 1866, and T. S. Eliot from more obscure writers around 1917. That’s starting to sound like an oddly durable set of preferences. And actually, it’s even more durable than the image above suggests. A model trained on a quarter-century of the evidence can predict the other 75 years almost as accurately as a model trained on the whole century.

A model trained only on evidence from 1845-69 makes predictions about the other 75 years in the dataset.

A model trained only on evidence from 1845-69 makes predictions about the other 75 years in the dataset.

So how is it even possible to characterize a whole century of poetic reception — based on fourteen different periodicals from both sides of the Atlantic — with a single set of aesthetic standards? Weren’t there supposed to be a couple of “poetic revolutions” in this century somewhere? W. B. Yeats certainly thought that one happened in the 1890s [2].

There’s another curious detail implied in the image above: why is the boundary between “reviewed” and “random” volumes drifting upward across the timeline? Technically, that’s an error. Volumes are not really “more likely to be reviewed” just because they were published later. But this is an error of an interesting kind. The model doesn’t know when these volumes were published: the dataset drifts upward because words that were more common in reviewed volumes across this period turn out to be more common in all volumes by the end of the period. If you divide the timeline into parts, the same pattern recurs in each part; and — to leak a detail from the next stage of this project — it also happens when we model fiction. That starts to suggest an interestingly general connection between synchronic judgment and diachronic change.

And there’s more. The detailed differences between reviewed and random poetry are interesting. In the article, we examine a haunting passage from Christina Rossetti; it turns out the model likes “haunting.” We also generalize about the theory of representativeness underpinning distant reading, and ask how our contemporary pedagogical canon looks when viewed by nineteenth-century aesthetic standards.

But all this, obviously, is too much to discuss in a blog post. See the article itself for our actual attempt to understand these puzzles.

We’ve released our code and data on Github, and hope readers will find flaws in our reasoning so we can improve the project. But this draft has been bounced off a couple of audiences already; at this point it’s stable enough to be cited and criticized. So, after some reflection, we’ve closed comments on this post in order to encourage a more public sort of critique. If we’re overlooking something, please say so in a blog post. It’s an explicit premise of the project that “being reviewed at all indicates a sort of literary distinction — even if the review is negative.”

[1]: One influential thesis holds that this division crystallized “in the last decades of the 19th century and the first few years of the 20th.” Andreas Huyssen, After the Great Divide: Modernism, Mass Culture, Postmodernism (Bloomington: Indiana UP, 1986), viii.

[2]: W.B. Yeats dated the “revolt against Victorianism” and against “the poetical diction of everybody” to the 1890s. See discussion in Richard Fallis, “Yeats and the Reinterpretation of Victorian Poetry,” Victorian Poetry 14.2 (1976): 89-100.