The history of an association, part two.

Here’s another attempt to animate the history of a cluster of associated words — this time as a force-directed graph that folds and unfolds itself as the window of time moves forward, and changing strengths of association create different tensions in the graph.
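The post doesn't say which layout algorithm produced the clip. As a rough sketch of the general idea, assuming a Fruchterman-Reingold-style spring model: every pair of words repels, and associated pairs are joined by springs whose pull scales with association strength. The hypothetical `spring_layout` function below also accepts the previous frame's positions (`init`). That is one way a graph could fold and unfold continuously from window to window instead of being redrawn from scratch.

```python
import math
import random


def spring_layout(nodes, weights, init=None, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout with weighted springs (a sketch).

    nodes: list of words to place.
    weights: {(a, b): strength}; a larger strength pulls a and b closer.
    init: optional {word: (x, y)} from the previous window, so each frame
          starts where the last one ended instead of being redrawn.
    """
    rng = random.Random(seed)
    pos = {}
    for n in nodes:
        if init and n in init:
            pos[n] = list(init[n])
        else:
            pos[n] = [rng.uniform(-1, 1), rng.uniform(-1, 1)]

    k = 1.0 / math.sqrt(len(nodes))  # ideal spacing between nodes
    temp = 0.1                       # largest step a node may take; cools over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair repels, so unrelated words drift apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f

        # Associated pairs attract in proportion to association strength.
        for (a, b), w in weights.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f

        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.98

    return {n: tuple(p) for n, p in pos.items()}
```

In practice a library routine would normally do this job. networkx's `spring_layout`, for instance, reads edge strengths from a `weight` attribute and accepts starting positions through its `pos` argument.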

I had a lot of fun making this clip, but I don’t want to make exaggerated claims for it. These images might not mean very much to me if I hadn’t also read some of the books on which they’re based. The visualization only took a day to build, though, and I think it might turn out to be a useful brainstorming tool. In this instance the clip got me thinking about the different ways time is imagined in the “terror gothic” and in the “horror gothic.”

Association between words is measured here using a vector space model and a collection of more than five hundred works of British fiction. I realize it may seem strange that associations can form and disappear while an eighty-year search window moves forward only sixty years — at the end of this clip the cluster is disappearing while the window still overlaps with the period where the cluster started to emerge. It’s worth recalling that the model isn’t counting words, but measuring the association between them. An early-eighteenth-century work that didn’t use sentimental language at all would do nothing to dilute the association between sentimental terms. But a group of nineteenth-century works that used the same language differently could rapidly obscure earlier patterns.
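A toy sketch makes the counting-versus-association point concrete. It assumes a simple model in which each word is represented by its counts across documents and association is cosine similarity. The post doesn't specify the weighting its model uses, and the words and counts below are invented. A document that uses neither term adds a zero to both vectors and leaves the cosine untouched. A document that uses one term heavily without the other lowers it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term vectors (one entry per document)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical counts of two sentimental terms in three early works.
tender = [3, 1, 2]
sigh = [2, 1, 2]
baseline = cosine(tender, sigh)

# Add a work that uses neither term: both vectors gain a 0, so the
# dot product and both norms are unchanged.
unchanged = cosine(tender + [0], sigh + [0])

# Add a work that uses "tender" heavily but never "sigh": the norm of
# "tender" grows while the dot product does not, so the association drops.
diluted = cosine(tender + [6], sigh + [0])

print(round(baseline, 3), round(unchanged, 3), round(diluted, 3))
```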

In short, I suspect that the language of temporal immediacy (“moment,” “suddenly,” “immediately,” and so on) is strongly associated with feeling in the eighteenth century partly because gothic novels and novels of sensibility simply get to it first. In the nineteenth century, other kinds of fiction may take up the same temporal language, diluting its specific connection to tremulous feeling. I can’t prove that yet, but the clues I’m seeing do point in that general direction.

The history of an association.

[Update May 6th, 2011: The problem I describe here is solved a bit more effectively in a more recent post.] It’s fairly easy to visualize a cluster of associated words. But I’d also like to understand how these associations change, and visualizing that is trickier. For one thing, it’s not easy to define what it means to trace “the same” cluster across time; we need an approach that remains open to the possibility that a particular set of associations could simply weaken or dissolve. The video I’ve embedded below is a first, tentative stab at the problem.

I’m trying to understand a late-eighteenth-century convergence between the language of temporality and the language of feeling. Two words that seemed particularly strongly connected were “moment” and “felt.” So I’ve proceeded through a 200-year-long corpus five years at a time, looking at an 80-year-long window of the corpus at each step. In each “snapshot,” I select the twelve words that associate most strongly, in vector space, with a composite vector built from both “moment” and “felt.” To graph them on a coordinate plane, I also measure their association with each term separately: the y axis is association with “moment,” and the x axis is association with “felt.” The reference terms themselves are also plotted. This lets me visualize the strength of association in the whole cluster: as the words move toward the upper-right-hand corner, their associations with both reference terms are getting stronger. At the same time, we can get a general sense of the cluster’s semantic character.
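A single snapshot might be sketched as follows. The post doesn't specify how the composite vector is built, so this sketch assumes one reasonable choice: summing the two reference vectors after normalizing each to unit length, then ranking candidate words by cosine similarity to that sum. The function name `snapshot` and its input format (a dictionary from each word to its vector within one window) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit(v):
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def snapshot(vectors, x_term="felt", y_term="moment", k=12):
    """Return {word: (x, y)} for the k words closest to a composite query.

    vectors maps each word to its vector within one time window. The query
    sums the two unit-normalized reference vectors, so neither reference
    term dominates merely because it is more frequent (an assumption).
    """
    query = [a + b for a, b in zip(unit(vectors[x_term]), unit(vectors[y_term]))]
    candidates = [w for w in vectors if w not in (x_term, y_term)]
    top = sorted(candidates, key=lambda w: cosine(vectors[w], query), reverse=True)[:k]

    points = {}
    # The reference terms are plotted too, alongside the top-k words.
    for w in top + [x_term, y_term]:
        points[w] = (
            cosine(vectors[w], vectors[x_term]),  # x: association with "felt"
            cosine(vectors[w], vectors[y_term]),  # y: association with "moment"
        )
    return points
```

Calling `snapshot` on each window's vectors in turn would yield one frame of the animation per window.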

I’m working with a relatively small collection here: 538 works of British fiction spanning 1700 to 1900. I have a larger eighteenth-century collection, but in this case I needed continuity over a longer span of time, and to achieve that I had to limit the collection to fiction, which reduced its size. It also means that the words you’ll see here differ from the words you saw in previous posts about the “felt-moment” convergence, which were based on a generically diverse collection.

Some of the awkward things about this video are consequences of the small collection size. For instance, I have to choose a fairly long window (80 years out of an overall 200-year-long collection). The window is a bit shorter than that at the beginning of the video, for purely dramatic reasons: so that we don’t reach the “climax” of the clip too rapidly.
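The sliding windows themselves take only a few lines to generate. This sketch assumes fixed-width, half-open windows (a window labeled (1700, 1780) covers 1700 through 1779) and, for simplicity, does not reproduce the shorter opening windows the clip uses for dramatic pacing. The function names and the `year` field on each work are hypothetical.

```python
def windows(start=1700, end=1900, width=80, step=5):
    """Yield (first_year, end_year) pairs for overlapping half-open windows.

    Each window covers `width` years; successive windows begin `step`
    years apart, stopping once a window would run past `end`.
    """
    first = start
    while first + width <= end:
        yield (first, first + width)
        first += step


def works_in(works, window):
    """Select the works whose (hypothetical) 'year' falls inside a window."""
    lo, hi = window
    return [w for w in works if lo <= w["year"] < hi]
```

With the defaults this gives 25 windows, the first covering 1700 to 1779 and the last covering 1820 to 1899.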

Also, of course, the stop-motion animation is rather jerky. With a larger collection, I think it might actually be possible to watch these terms move across the coordinate plane in a smooth and connected fashion. But given the small collection size, smooth motion would be illusory; the data don’t really support that level of precision.

However, even with all those caveats, I feel I’m learning something from the exercise. I think we are glimpsing the transformation of an associative cluster, and looking at the way it changes across time makes me more than ever suspect that — at the moment when it’s strongest — it has something to do with the way late-eighteenth-century fiction imagines suspense. “Anxiety” and “agitation” are durable presences, often in the upper-right-hand corner of the cluster. This interpretation is also, of course, based on reading some of the relevant works, and I think the next stage in exploring the question will be to go back and read them again. As always, I’m inclined to present text-mining more as an exploratory tool or brainstorming technique than as definitive evidence.

It is also a bit interesting to watch the language of gothic agitation turn into the language of middle-class striving as we move into the nineteenth century. The intersection between “moment” and “felt” is increasingly occupied not by trembling but by terms like “energy,” “effort,” and “struggle.” I’m not quite sure what to make of that trajectory. Perhaps it helps explain the dissolution of the earlier cluster.

Another way of visualizing clusters like this might be to group terms in a force-directed graph and animate the evolution of the graph across time.
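Building such a graph for each window might start from pairwise associations among the cluster's words. This sketch assumes cosine similarity and an arbitrary cutoff of 0.5; both choices, and the function name, are assumptions. Pairs below the cutoff simply get no edge, so a cluster that weakens in a later window loses the springs that held it together.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def association_edges(vectors, words, threshold=0.5):
    """Return {(a, b): similarity} for word pairs at or above a threshold.

    vectors maps words to their vectors within one time window; the
    threshold (a hypothetical default) decides which pairs get an edge.
    """
    edges = {}
    for a, b in combinations(words, 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            edges[(a, b)] = s
    return edges
```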