[Update May 6th, 2011: The problem I describe here is solved a bit more effectively in a more recent post.]

It’s fairly easy to visualize a cluster of associated words. But I’d also like to understand how these associations change, and visualizing that is trickier. For one thing, it’s not easy to define what it means to trace “the same” cluster across time; we need an approach that remains open to the possibility that a particular set of associations could simply weaken or dissolve. The video I’ve embedded below is a first, tentative stab at the problem. Move your mouse pointer away after clicking “play” to see the image without cropping.
I’m trying to understand a late-eighteenth-century convergence between the language of temporality and the language of feeling. Two words that seemed particularly strongly connected were “moment” and “felt.” So I proceed five years at a time through a 200-year-long corpus, looking at an 80-year-long window at each step. In each “snapshot,” I select the twelve words that associate most strongly in vector space with a vector that’s composed of both “moment” and “felt.” In order to graph them on a coordinate plane, I also measure their association with each term separately. The y axis is association with “moment,” and the x axis is association with “felt.” The reference terms themselves are also plotted. This gives me a way to visualize strength of association in the whole cluster: basically, as everything gets closer to the upper-right-hand corner, the association is getting stronger. At the same time we can get a general sense of the semantic character of the cluster.
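For readers who want to try something similar, here is a minimal sketch of how one snapshot could be computed. It is not the code behind the video. It assumes that each word is represented as a sparse vector of counts (a dict), that “association” means cosine similarity, and that the composite vector is simply the sum of the two reference vectors; the post itself doesn’t commit to any of those choices. The names `cosine` and `snapshot` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts.

    Assumption: "association" is measured as cosine similarity.
    """
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def snapshot(vectors, x_term="felt", y_term="moment", top_n=12):
    """Return (word, x, y) points for one time window.

    `vectors` maps each word to its sparse vector for that window.
    Assumption: the composite vector is the sum of the two reference vectors.
    """
    composite = Counter(vectors[x_term])
    composite.update(vectors[y_term])  # Counter.update adds values key by key

    # Rank every other word by its association with the composite vector.
    candidates = [w for w in vectors if w not in (x_term, y_term)]
    ranked = sorted(candidates,
                    key=lambda w: cosine(vectors[w], composite),
                    reverse=True)

    # The reference terms themselves are plotted alongside the top words.
    chosen = ranked[:top_n] + [x_term, y_term]
    return [(w,
             cosine(vectors[w], vectors[x_term]),   # x: association with "felt"
             cosine(vectors[w], vectors[y_term]))   # y: association with "moment"
            for w in chosen]
```

Calling `snapshot` once per window and plotting the resulting points as one frame each would give a stop-motion sequence of the kind described here.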
I’m working with a relatively small collection here: 538 works of British fiction stretched out between 1700 and 1900. I have a larger 18th-century collection, but in this case I needed continuity over a longer span of time, and in order to achieve that I had to limit the collection to fiction, which reduced its size. It also means that the selection of words you’ll see here is different from the selection of words you saw in previous posts about the “felt-moment” convergence, which were based on a generically diverse collection.
Some of the things that are awkward about this video are consequences of the small collection size. For instance, I have to choose a pretty long window (80 years out of an overall 200-year span). The window is a bit shorter than that at the beginning of the video, for purely dramatic reasons, so that we don’t reach the “climax” of the clip too rapidly.
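The windowing itself is simple to sketch. The version below is one possible arrangement, not a record of what I actually ran: it assumes each window is identified by its closing year, that windows are truncated at the start of the corpus (which is one way the early windows could come out shorter), and that the first window is at least `min_width` years long. The function name `windows` and the value of `min_width` are hypothetical.

```python
def windows(start=1700, end=1900, width=80, step=5, min_width=60):
    """Yield (first_year, last_year) pairs for a sliding window.

    Assumptions: windows advance by `step` years, are at most `width`
    years long, and are cut off at the start of the corpus, so the
    earliest windows are shorter than `width`. `min_width` is a
    hypothetical lower bound on the length of the first window.
    """
    for last in range(start + min_width, end + 1, step):
        first = max(start, last - width)
        yield first, last


# Print each window; the early ones are clipped at the start of the corpus.
for first, last in windows():
    print(first, last)
```

Each pair of years would then be used to select the works whose dates fall inside the window before computing a snapshot.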
Also, of course, the stop-motion animation is rather jerky. With a larger collection, I think it might actually be possible to watch these terms move across the coordinate plane in a smooth and connected fashion. But given the small collection size, smooth motion would be illusory; the data don’t really support that level of precision.
However, even with all those caveats, I feel I’m learning something from the exercise. I think we are glimpsing the transformation of an associative cluster, and looking at the way it changes across time makes me more than ever suspect that — at the moment when it’s strongest — it has something to do with the way late-eighteenth-century fiction imagines suspense. “Anxiety” and “agitation” are durable presences, often in the upper-right-hand corner of the cluster. This interpretation is also, of course, based on reading some of the relevant works, and I think the next stage in exploring the question will be to go back and read them again. As always, I’m inclined to present text-mining more as an exploratory tool or brainstorming technique than as definitive evidence.
It is also a bit interesting to watch the language of gothic agitation turn into language of middle-class striving as we get into the nineteenth century. The intersection between “moment” and “felt” is increasingly occupied not by trembling but by terms like “energy,” “effort,” and “struggle.” I’m not quite sure what to make of that trajectory. Perhaps it helps explain the dissolution of the earlier cluster.
Another way of visualizing clusters like this might be to group terms in a force-directed graph and animate the evolution of the graph across time.