This blog began as an attempt to keep track of cool things I was discovering on Google’s ngram viewer. But the ngram dataset proved to be — as Dan Cohen predicted — only a “gateway drug.” Actually, the MONK workbench was the gateway drug several years earlier, and before that Mark Olsen’s work on ARTFL, which I tried to imitate in 1995. In any case, I’m a text-mining addict now, and the blog has become an attempt to explain my addiction to a broader humanistic community. A theme I keep returning to is that humanists have already integrated simple forms of text mining into their research; the question is no longer whether we’re going to do text mining, but how much control we’re going to have over the tools we use.
I mostly focus on the eighteenth and nineteenth centuries. In collaboration with Jordan Sellers, I’ve built a 4,500-volume collection of English-language books from that period. The shareable parts of that collection are available at the end of our JDH article. I’m about to scale up to a 500,000-volume collection, and intend to use machine learning algorithms to enrich its genre metadata. In collaboration with Loretta Auvil, Boris Capitanu, and Ryan Heuser, I developed a website that allows researchers to mine correlations in the eighteenth- and nineteenth-century printed record. The OCR-correction tools originally built for that project are being developed further in a collaborative project on Uses of Scale in Literary Study.
The name of the blog is drawn from a dream described in the fifth book of Wordsworth’s Prelude, where a shell seems to represent poetry, and a stone mathematics — or at any rate, “geometric truth.” Toward the end of the book, Wordsworth observes that
Visionary power
Attends upon the motions of the winds
Embodied in the mystery of words;
There darkness makes abode, and all the host
Of shadowy things do work their changes there
As in a mansion like their proper home.
So, there’s a bit of poetry about changes worked through the mystery of words. Now for some math.
Ted Underwood teaches eighteenth- and nineteenth-century literature in the English department of the University of Illinois, Urbana-Champaign. He can be reached on Twitter at @Ted_Underwood, and by email at firstname.lastname@example.org.