March 2011 – The Stone and the Shell

William Wordsworth’s claim to have brought poetry back to “the language of conversation in the middle and lower classes of society” gets repeated to each new generation of students (1). But did early nineteenth-century writing in general become more accessible, or closer to speech? It’s hard to say. We’ve used remarks like Wordsworth’s to anchor literary history, but we haven’t had a good way to assess their representativeness.

Increasingly, though, we’re in a position to test some familiar stories about literary history — to describe how the language of one genre changed relative to others, or even relative to “the language of conversation.” We don’t have eighteenth-century English speakers to interview, but we do have evidence about the kinds of words that tend to be more common in spoken language. For instance, Laly Bar-Ilan and Ruth Berman have shown in the journal Linguistics that contemporary spoken English is distinguished from writing by containing a higher proportion of words from the Old English part of the lexicon (2). This isn’t terribly surprising, since English was for a couple of hundred years (1066-1250) almost exclusively a spoken language, while French and Latin were used for writing. Any word that entered English before this period, and survived, had to be the kind of word that gets used in conversation. Words that entered afterward were often borrowed from French or Latin to flesh out the written language.

If the spoken language was distinguished from writing this way in the thirteenth century, and the same thing holds true today, then one might expect it to hold true in the eighteenth and nineteenth centuries as well. And it does seem to hold true: eighteenth-century drama, written to be spoken on stage, is distinguished from nondramatic poetry and prose by containing a higher proportion of Old English words. This is a broad-brush approach to diction, and not one that I would use to describe individual works. But applied to an appropriately large canvas, it may give us a rough picture of how the “register” of written diction has changed across time, becoming more conversational or more formal.

This graph is based on a version of the Google English corpus that I’ve cleaned up in a number of ways. Common OCR errors involving s, f, and ct have been corrected. The graph shows the aggregate frequency of the 500 most common English words that entered the language before the twelfth century. (I’ve found date-of-entry a more useful metric of a word’s affinity with spoken language than terms like “Latinate” or “Germanic.” After all, “Latinate” words like “school,” “street,” and “wall” don’t feel learned to us, because they’ve been transmitted orally for more than a millennium.) I’ve excluded a list of stopwords that includes determiners, prepositions, pronouns, and conjunctions, as well as the auxiliary verbs “be,” “will,” and “have.”
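The measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the graph: the two word lists are tiny hypothetical stand-ins for the real 500-word list and the real stopword list, the tokenizer is deliberately crude, and the choice to count "per thousand" against non-stopword tokens is a guess, since the post does not specify the denominator.

```python
import re

# Hypothetical stand-ins: the real lists (500 pre-twelfth-century words,
# plus a fuller stopword list) are not reproduced here.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "i", "it",
             "be", "is", "will", "have"}
PRE_12C_WORDS = {"can", "think", "do", "self", "way", "need", "know"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def old_word_rate(text, old_words=PRE_12C_WORDS, stopwords=STOPWORDS):
    """Return occurrences of old words per thousand non-stopword tokens.

    Assumption: stopwords are dropped from both the numerator and the
    denominator. A different denominator would rescale every value.
    """
    tokens = [t for t in tokenize(text) if t not in stopwords]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in old_words)
    return 1000.0 * hits / len(tokens)
```

Calling `old_word_rate` on the full text of a book, or on all the text from a given year, would give one point on a curve like the one described here. With real data the word lists matter far more than the code.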

In relative terms, the change here may not look enormous; the peak in the early eighteenth century (181 words per thousand) is only about 20% higher than the trough in the late eighteenth century (152 words per thousand). But we’re talking about some of the most common words in the language (can, think, do, self, way, need, know). It’s a bit surprising that this part of the lexicon fluctuates at all. You might expect to see a gradual decline in the frequency of these words, as the overall size of the lexicon increases. But that’s not what happens: instead we see a rapid decline in the eighteenth century (as prose becomes less like speech, or at least less like the imagined speech of contemporaneous drama), and then a gradual recovery throughout the nineteenth century.

What does this tell us about literature? Not much, without information about genre. After all, as I mentioned, dramatic writing is a lot closer to speech than, say, poetry is. This curve might just be telling us that very few plays got written in the late eighteenth century.

Fortunately it’s possible to check the Google corpus against a smaller corpus of individual texts categorized by genre. I’ve made an initial pass at the first hundred years of this problem using a corpus of 2,188 eighteenth-century books produced by ECCO-TCP, which I obtained in plain text with help from Laura Mandell and 18thConnect. Two thousand books isn’t a huge corpus, especially not after you divide them up by genre, so these results are only preliminary. But the initial results seem to confirm that the change involved the language of prose itself, and not just changes in the relative prominence of different genres. Both fiction and nonfiction prose show a marked change across the century. If I’m right that the frequency of pre-twelfth-century words is a fair proxy for resemblance to spoken language, both kinds of prose became less and less like speech as the century went on.
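The genre comparison amounts to averaging a per-book rate within each genre and time period. The sketch below shows one way to do that, assuming each book has already been reduced to a (genre, year, rate) triple. The function name, the 25-year period length, and the sample figures are all hypothetical and are not drawn from the ECCO-TCP corpus.

```python
from collections import defaultdict


def mean_rate_by_genre_and_period(records, period_length=25):
    """Average per-book rates within each (genre, period) bucket.

    records: iterable of (genre, year, rate_per_thousand) triples.
    Returns a dict mapping (genre, period_start_year) to the mean rate.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, book count]
    for genre, year, rate in records:
        period_start = (year // period_length) * period_length
        bucket = sums[(genre, period_start)]
        bucket[0] += rate
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up numbers, purely to show the shape of the input and output.
sample = [
    ("drama", 1710, 180.0),
    ("drama", 1720, 170.0),
    ("fiction", 1710, 175.0),
    ("fiction", 1790, 150.0),
]
means = mean_rate_by_genre_and_period(sample)
```

Averaging per book gives a short pamphlet the same weight as a long novel. Pooling all tokens within a bucket before computing the rate is a reasonable alternative, and with only about two thousand books the two choices could give noticeably different curves.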

“Fiction” is of course a fuzzy category in the eighteenth century. The blurriness of the boundary between a sensationalized biography and a “novel” is a lot of the point of being a novel. In the graph above, I’ve lumped biographies and collections of personal letters in with novels, because I’m less interested in distinguishing something unique about fiction than I am in confirming a broad change in the diction of nondramatic prose.

By contrast, there’s relatively little change in the diction of poetry and drama. The proportion of pre-twelfth-century words is roughly the same at the end of the century as it was at the beginning.

Are these results intuitive, or are they telling us something new? I think the general direction of these curves probably confirms some intuitions. Anyone who studies eighteenth- and nineteenth-century English knows that you get a lot of long words around 1800. Sad things become melancholy, needs become a necessity, and so on.

What may not be intuitive is how broad and steady the arc of change appears to be. To the extent that we English professors have any explanation for the elegant elaboration of late-eighteenth-century prose, I think we tend to blame Samuel Johnson. But these graphs suggest that much of the change had already taken place by the time Johnson published his Dictionary. Moreover, our existing stories about the history of style put a lot of emphasis on poetry — for instance, on Wordsworth’s critique of poetic diction. But the biggest changes in the eighteenth century seem to have involved prose rather than poetry. It’ll be interesting to see whether that holds true in the nineteenth century as well.

How do we explain these changes? I’m still trying to figure that out. In the next couple of weeks I’ll write a post asking what took up the slack: what kinds of language became common in books where old, common words were relatively underrepresented?

----- references -----
1) William Wordsworth and Samuel T. Coleridge, Lyrical Ballads, with a Few Other Poems (Bristol: 1798), i.
2) Laly Bar-Ilan and Ruth A. Berman, “Developing register differentiation: the Latinate-Germanic divide in English,” Linguistics 45 (2007): 1-35.