I’m getting ahead of myself with this post, because I don’t have time to explain everything I did to produce this. But it was just too striking not to share.
Basically, I’m experimenting with Latent Dirichlet Allocation, and I’m impressed. So first of all, thanks to Matt Jockers, Travis Brown, Neil Fraistat, and everyone else who tried to convince me that Bayesian methods are better. I’ve got to admit it. They are.
But anyway, in a class I’m teaching we’re using LDA on a generically diverse collection of 1,853 volumes between 1751 and 1903. The collection includes fiction, poetry, drama, and a limited amount of nonfiction (just biography). We’re stumbling on a lot of fascinating things, but this was slightly moving. Here’s the graph for one particular topic.
The circles and X’s are individual volumes. Blue is fiction, green is drama, pinkish purple is poetry, black biography. Only the volumes where this topic turned out to be prominent are plotted, because if you plot all 1,853 it’s just a blurry line at the bottom of the image. The gray line is an aggregate frequency curve, which is not related in any very intelligible way to the y-axis. (Work in progress …) As you can see. this topic is mostly prominent in fiction around the year 1800. Here are the top 50 words in the topic:
But here’s what I find slightly moving. The x’s at the top of the graph are the 10 works in the collection where the topic was most prominent. They include, in order, Mary Wollstonecraft Shelley, Frankenstein, Mary Wollstonecraft, Mary, William Godwin, St. Leon, Mary Wollstonecraft Shelley, Lodore, William Godwin, Fleetwood, William Godwin, Mandeville, and Mary Wollstonecraft Shelley, Falkner.
In short, this topic is exemplified by a family! Mary Hays does intrude into the family circle with Memoirs of Emma Courtney, but otherwise, it’s Mary Wollstonecraft, William Godwin, and their daughter.
Other critics have of course noticed that M. W. Shelley writes “Godwinian novels.” And if you go further down the list of works, the picture becomes less familial (Helen Maria Williams and Thomas Holcroft butt in, as well as P. B. Shelley). Plus, there’s another topic in the model (“myself these should situation”) that links William Godwin more closely to Charles Brockden Brown than it does to his wife or daughter. And LDA isn’t graven on stone; every time you run topic modeling you’re going to get something slightly different. But still, this is kind of a cool one. “Mind feelings heart felt” indeed.