How not to do things with words.

In recent weeks, journals published two papers purporting to draw broad cultural inferences from Google’s ngram corpus. The first of these papers, in PLoS One, argued that “language in American books has become increasingly focused on the self and uniqueness” since 1960. The second, in The Journal of Positive Psychology, argued that “moral ideals and virtues have largely waned from the public conversation” in twentieth-century America. Both articles received substantial attention from journalists and blogs; both have been discussed skeptically by linguists and digital humanists. (Mark Liberman’s takes on Language Log are particularly worth reading.)

I’m writing this post because systems of academic review and communication are failing us in cases like this, and we need to step up our game. Tools like Google’s ngram viewer have created new opportunities, but also new methodological pitfalls. Humanists are aware of those pitfalls, but I think we need to work a bit harder to get the word out to journalists, and to disciplines like psychology.

The basic methodological problem in both articles is that researchers have used present-day patterns of association to define a wordlist that they then take as an index of the fortunes of some concept (morality, individualism, etc) over historical time. (In the second study, for instance, words associated with morality were extracted from a thesaurus and crowdsourced using Mechanical Turk.)
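
The pipeline both papers follow can be made concrete in a few lines. Below is a minimal sketch of that naive approach, with an invented wordlist and invented per-million frequencies (none of these numbers come from either paper):

```python
# Sketch of the method under critique: fix a wordlist using
# present-day intuitions, then read the list's aggregate frequency
# as an index of a concept across historical time.

# Hypothetical per-million frequencies by year (invented numbers).
freqs = {
    "virtue":   {1900: 42.0, 1950: 25.0, 2000: 11.0},
    "honesty":  {1900: 18.0, 1950: 14.0, 2000: 12.0},
    "fidelity": {1900: 9.0,  1950: 6.0,  2000: 8.0},
}

# A wordlist chosen by twenty-first-century raters.
moral_words = ["virtue", "honesty", "fidelity"]

def concept_index(words, year):
    """Sum the wordlist's frequencies in a given year."""
    return sum(freqs[w][year] for w in words)

for year in (1900, 1950, 2000):
    print(year, concept_index(moral_words, year))
```

The arithmetic here is unobjectionable; the fallacy lives entirely in the line that defines `moral_words`, because the list itself is a present-day artifact.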

The fallacy involved here has little to do with hot-button issues of quantification. A basic premise of historicism is that human experience gets divided up in different ways in different eras. If we crowdsource “leadership” using twenty-first-century reactions on Mechanical Turk, for instance, we’ll probably get words like “visionary” and “professional.” “Loud-voiced” probably won’t be on the list — because that’s just rude. But to Homer, there’s nothing especially noble about working for hire (“professionally”), whereas “the loud-voiced Achilles” is cut out to be a leader of men, since he can be heard over the din of spears beating on shields (Blackwell).

The laws of perspective apply to history as well. We don’t have an objective overview; we have a position in time that produces its own kind of distortion and foreshortening. Photo 2004 by June Ruivivar.

The authors of both articles are dimly aware of this problem, but they imagine that it’s something they can dismiss if they’re just conscientious and careful to choose a good list of words. I don’t blame them; they’re not coming from historical disciplines. But one of the things you learn by working in a historical discipline is that our perspective is often limited by history in ways we are unable to anticipate. So if you want to understand what morality meant in 1900, you have to work to reconstruct that concept; it is not going to be intuitively accessible to you, and it cannot be crowdsourced.

The classic way to reconstruct concepts from the past involves immersing yourself in sources from the period. That’s probably still the best way, but where language is concerned, there are also quantitative techniques that can help. For instance, Ryan Heuser and Long Le-Khac have carried out research on word frequency in the nineteenth-century novel that might superficially look like the psychological articles I am critiquing. (It’s Pamphlet 4 in the Stanford Literary Lab series.) But their work is much more reliable and more interesting, because it begins by mining patterns of association from the period in question. They don’t start from an abstract concept like “individualism” and pick words that might be associated with it. Instead, they find groups of words that are associated with each other, in practice, in nineteenth-century novels, and then trace the history of those groups. In doing so, they find some intriguing patterns that scholars of the nineteenth-century novel are going to need to pay attention to.

It’s also relevant that Heuser and Le-Khac are working in a corpus that is limited to fiction. One of the problems with the Google ngram corpus is that really we have no idea what genres are represented in it, or how their relative proportions may vary over time. So it’s possible that an apparent decline in the frequency of words for moral values is actually a decline in the frequency of certain genres — say, conduct books, or hagiographic biographies. A decline of that sort would still be telling us something about literary culture; but it might be telling us something different than we initially assume from tracing the decline of a word like “fidelity.”
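
The genre confound is easy to demonstrate with arithmetic. In the invented numbers below, a moral word's within-genre frequency never changes, yet its corpus-wide frequency falls sharply because the genre mix shifts:

```python
# Toy illustration (invented numbers): a word's frequency is constant
# within each genre, but the corpus-wide average still declines
# because the genre proportions change over time.

# Per-million frequency of a moral word within each genre (held fixed).
within = {"conduct_books": 50.0, "fiction": 5.0}

# Hypothetical genre shares of the corpus in two periods.
shares = {
    1900: {"conduct_books": 0.40, "fiction": 0.60},
    2000: {"conduct_books": 0.05, "fiction": 0.95},
}

def aggregate(year):
    """Corpus-wide frequency: share-weighted average over genres."""
    return sum(shares[year][g] * within[g] for g in within)

print(aggregate(1900))  # 23.0
print(aggregate(2000))  # 7.25
```

The apparent "decline of morality" in this sketch is really a decline of conduct books.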

So please, if you know a psychologist, or journalist, or someone who blogs for The Atlantic: let them know that there is actually an emerging interdisciplinary field developing a methodology to grapple with this sort of evidence. Articles that purport to draw historical conclusions from language need to demonstrate that they have thought about the problems involved. That will require thinking about math, but it also, definitely, requires thinking about dilemmas of historical interpretation.

My illustration about “loud-voiced Achilles” is a very old example of the way concepts change over time, drawn via Friedrich Meinecke from Thomas Blackwell, An Enquiry into the Life and Writings of Homer, 1735. The word “professional,” by the way, also illustrates a kind of subtly moralized contemporary vocabulary that Kesebir & Kesebir may be ignoring in their account of the decline of moral virtue. One of the other dilemmas of historical perspective is that we’re in our own blind spot.

15 thoughts on “How not to do things with words.”

  1. Pingback: Weekend Reading « Backslash Scott Thoughts

  2. Hey Ted –

    Great post. Your methodological point is right on: making sure that there is a reliable mapping between the concept that people want to track and the specific n-grams that they use to do it is a big issue. For instance, it’s why the principal non-linguistic studies in our paper examine n-grams corresponding to dates and to people’s names. Although far from perfect (we dedicate many pages of supplemental materials to ensuring a robust association between names and people), the terminology for referring to a person or date tends to be more stable than the terms used for, say, abstract concepts. If everyone understood the points you’re making, the quality of work in this domain would rise.

    That said, I think you go too far when you say that: ‘Humanists are aware of those pitfalls, but I think we need to work a bit harder to get the word out to…disciplines like psychology.’ Some psychologists I happen to know are quite aware of the relevant issues. Others aren’t. The same is true of humanists. (A certain xkcd comes to mind.)

    Let me put it this way. This field is extremely new and is expanding rapidly. No one knows where all this will lead. None of us have more than a small fraction of the answers. As we march forward into this terra incognita, we need to focus on building a global economy of ideas that we all will benefit from, instead of fomenting intellectual nationalism within existing disciplinary boundaries. We all have a lot to learn, from each other, and from whatever lies in the future – both the triumphs and the failures.

    Erez Lieberman Aiden

    • Fair enough. I think I agree with you here. I definitely don’t want to erect walls around the “humanities” or “sciences,” in order to insist that only “natives” get to explore problems within the walls. The enormous success of your collaboration with Google seems to me proof that scientists have a lot to contribute to problem domains that have traditionally been explored by humanists.

      In fact, I don’t even want to suggest (as some others have suggested) that researchers necessarily need to find collaborators as “native guides” to a given discipline. Collaboration is great, and I’m all for it — partly because these problems are just so big and labor-intensive. But it’s also the case that scientists can teach themselves enough about the humanities to do historical work. And humanists can teach themselves enough about data mining to tackle the quantitative angle. It’s not easy in either direction, but it’s doable, and actually I think it’s healthy for us all to get outside our disciplinary comfort zone. You’re right about how much we still have to learn.

      So I’m sorry if I seemed to be picking on psychologists in particular. I don’t mean to bash psychology, or to suggest that only humanists “get it.” But I do feel there’s a need for people who understand these issues (from whatever discipline) to educate a wider public so we can raise standards of evaluation for this kind of study. Right now they seem perilously low.

  3. Pingback: Links of the week – let there be lights edition « ivry twr

  4. I think I’ve been noticing a vague pattern of this kind of thing in the high-profile, general-science journals, across several kinds of work, especially with papers from outside the physical sciences, where reviewing seems stronger. Not just humanities-implicating work, but just about any other area as well. In my field (artificial intelligence), people tend to groan instinctively when a widely publicized AI-related paper comes out in places like Nature or Science or PLoS. They’re sometimes good, but often seriously over-claim, in either their conclusions or their novelty or both. There’s something about attempts at “blockbuster” work that lends itself to that. Not as bad as TED talks, say, but with some of the same flavor of too-strong and too-quick conclusions.

  5. Pingback: Facilitators are More Popular than Dictators: Google Ngram Viewer « Facilitative Leadership & Facilitator Training

  6. Pingback: N-grams, So What? | Digital Modernisms

  7. Pingback: A “big” step | Digital Realism

  8. Pingback: Using Google Ngrams | Digital Tools: Sherlock Holmes's London

  9. Pingback: Strike A Pose: Topic Modeling | digilitpsu

  10. Pingback: What is Distant Reading and What Is It Good For? | Itsjustjulia's Blog

  11. There has been a rash of papers in the last decade that get published in Science or Nature or especially PNAS that purport to do historical linguistics from the perspective of biology or psychology. Many of these papers are riddled with elementary statistical and methodological errors which the historical and general linguistics communities have been at pains to refute, but falsehood goes around the world seven times while truth is getting its boots on.

  12. I don’t have much to say about this other than that it is brilliant! I had to read this for an assignment in my university Advanced English Research Methods course, and I must say, I’m so glad I did. This really solidifies what I’m currently learning, and I think that other disciplines should look at this post as well.

  13. Pingback: How Big Data Creates False Confidence – Teknoids News

  14. Pingback: How Big Data Creates False Confidence in People (Как массивы данных создают в людях ложную уверенность)
