topic modeling – Page 2 – The Stone and the Shell

The key to all mythologies.

Well, not really. But it is a classifying scheme that might turn out to be as loopy as Casaubon’s incomplete project in Middlemarch, and I thought I might embrace the comparison to make clear that I welcome skepticism.

In reality, it’s just a map of eighteenth-century diction. I took the 1,650 most common words in eighteenth-century writing, and asked my iMac to group them into clusters that tend to be common in the same eighteenth-century works. Since the clustering program works recursively, you end up with a gigantic branching tree that reveals how closely words are related to each other in 18c practice. If they appear on the same “branch”; they tend to occur in the same works. If they appear on the same “twig,” that tendency is even stronger.

You wouldn’t necessarily think that two words happening to occur in the same book would tell you much, but when you’re dealing with a large number of documents, it seems there’s a lot of information contained in the differences between them. In any case, this technique produced a detailed map of eighteenth-century topics that seemed — to me, anyway — surprisingly illuminating. To explore a couple of branches, or just marvel at this monument of digital folly, click here, or on the illustration to the right. That’ll take you through to a page where you can click on whichever branches interest you. (Click on the links in the right-hand margin, not the annotations on the tree itself.) To start with, I recommend Branch 18, which is a sort of travel narrative, Branch 13, which is 18c poetic diction in a nutshell, and Branch 5, which is saying something about gender and/or sexuality that I don’t yet understand.

If you want to know exactly how this was produced, and contrast it to other kinds of topic modeling, I describe the technique in this “technical note.” I should also give thanks to the usual cast of characters. Ryan Heuser and Ben Schmidt have produced analogous structures which gave me the idea of attempting this. Laura Mandell and 18th Connect helped me obtain the eighteenth-century texts on which the tree was based.