Giving a talk this morning at the MLA. There are two main arguments:
1) The first one will be familiar if you’ve read my blog. I suggest that the boundary between “text mining” and conventional literary research is far fuzzier than people realize. There appears to be a boundary only because literary scholars are pretty unreflective about the way we’re currently using full-text search. I’m going to press this point in detail, because it’s not just a metaphor: to produce a simple but useful topic-modeling algorithm, all you have to do is take a search engine and run it backwards.
2) The second argument is newer; I don’t think I’ve blogged about it yet. I’m going to present topic modeling as a useful bridge between “distant” and “close” reading. I’ve found that I often learn most about a genre by modeling it as part of a larger collection that includes many other genres. In that context, a topic-modeling algorithm can highlight peculiar convergences of themes that characterize the genre relative to its contemporary backdrop.
This is distant reading, in the sense that it requires a large collection. But it’s also close reading, in the sense that it’s designed to reveal subtle formal principles that shape individual works, and that might otherwise elude us.
Although the emphasis is different, a lot of the examples I use are recycled from a talk I gave in August, described here.