The pace of posts here has slowed, and it may stay pretty slow until I get some new data-slicing tools set up.
I spent the weekend trying to understand when I might want to use a vector space model to compare documents or terms, and when ordinary Pearson’s correlation would be better. Also, I now understand how Ward’s method of hierarchical agglomerative clustering is different from all the other methods.
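The relationship between those two measures is closer than it might look: Pearson's correlation is just cosine similarity (the standard vector-space measure) computed after mean-centering each vector, so the choice comes down to whether you want raw magnitudes to matter. Here's a minimal sketch of that equivalence, using made-up term-frequency vectors for two hypothetical documents:

```python
import numpy as np

def cosine_sim(u, v):
    # vector-space similarity: the cosine of the angle between
    # the two raw count vectors (sensitive to overall magnitude)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_r(u, v):
    # Pearson's r is cosine similarity after subtracting each
    # vector's mean -- i.e., it compares deviations, not raw counts
    return cosine_sim(u - u.mean(), v - v.mean())

# hypothetical term-frequency vectors for two documents
a = np.array([3.0, 0.0, 2.0, 1.0])
b = np.array([4.0, 1.0, 2.0, 0.0])

print(cosine_sim(a, b))
print(pearson_r(a, b))

# sanity check: our centered-cosine matches numpy's own Pearson
assert abs(pearson_r(a, b) - np.corrcoef(a, b)[0, 1]) < 1e-12
```

(Ward's method is a different animal from the other agglomerative linkages in a similar way: instead of merging clusters by a single pairwise distance, it merges whichever pair least increases total within-cluster variance.)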
Aside from the sheer fun of geekery, what I’ve learned is that the digital humanities have become *much* easier to enter than they were in the 90s. I attempted a bit of data mining in the early 90s and published an article containing a few graphs in *Studies in Romanticism*, but didn’t pursue the approach much further, because I found it nearly impossible to produce the kind of results I wanted on the necessary scale. (You have to remember that my interests lean toward the large end of the scale continuum in DH.)
I told myself that I would get back in the game when the kinds of collections I needed began to become available, and in the last couple of years it became clear to me that they were, if not available, at least possible to construct. But I actually had no idea how transparent and accessible things had become. So much information is freely available on the web, and with tools like Zotero and SEASR the web is also becoming a medium in which one can do the work itself. Everything’s frickin interoperable. It’s so different from the 90s, when you had to build things more or less from scratch.