Just a quick note here to acknowledge a collaborative project that I hope will generate some useful resources for scholars interested in text mining. We don’t have many resources up on the website yet, but watch this space.
The project is called The Uses of Scale, and it’s a pilot project for the Humanities Without Walls planning initiative, run by the Illinois Program for Research in the Humanities at the University of Illinois at Urbana-Champaign.
The principal investigators most actively involved in Uses of Scale are Ted Underwood (University of Illinois, Urbana-Champaign), Robin Valenza (University of Wisconsin, Madison), and Matt Wilkens (Notre Dame). All of us have been mining large collections of printed books, ranging from the early modern period to the twentieth century. We’ll be joining forces this year to reflect critically on problems of scale in literary research, including the questions that arise when we try to connect different scales of analysis. But we also hope to generate a few resources that are immediately and practically useful for scholars attempting to “scale up” their research projects (resources, for instance, for correcting OCR). There’s already a bare-bones list of OCR-correction rules on the website, as well as a description of a more ambitious project now underway.