Two syllabi: Digital Humanities and Data Science in the Humanities.

When I began teaching graduate courses about digital humanities, I designed syllabi that tried to cover a little of everything.

I enjoyed teaching those courses, but if I’m being honest, it was a challenge to race from digital editing — to maps and networks — to distant reading — to critical reflection on the concept of DH itself. It was even harder to cover that range of topics while giving students meaningful hands-on experience.

The solution, obviously, was to break the subject into more than one course. But I didn’t know how to do that within an English graduate curriculum. Many students are interested in learning about “digital humanities,” because a lot of debate has swirled around that broad rubric. I think the specific fields of inquiry grouped under the rubric actually make better-sized topics for a course, but they don’t have the same kind of name recognition, and courses on those topics don’t enroll as heavily.

This problem became easier to solve when part of my job moved into the School of Information Sciences. Many aspects of digital humanities — from social reflection on information technology to data mining — are already represented in the curriculum here. So I could divide DH into parts, and still have confidence that students would recognize those parts and understand how each part fit into an existing program of study.

This year I’ve taught two courses in the LIS curriculum. I’m sharing syllabi for both at once so I can also describe the contrast between them.

1. The first of the two, “Digital Humanities” (syllabus), is fundamentally a survey of DH as a social phenomenon, with special emphasis on the role of academic libraries and librarians — since that is likely to be a career path that many MLIS students are considering. The course covers a wide range of humanistic themes and topics, but doesn’t go very deeply into hands-on exploration of methods.

2. The second course, “Data Science in the Humanities” (syllabus), covers the field that digital humanists often call “cultural analytics” — or “distant reading,” when it focuses on literature. Although I know its history is actually more complex, I’m characterizing this field as a form of data science in order to highlight its value for a wide range of students who may or may not intend to work as researchers in universities. I think humanistic questions can be great training for the slippery problems one encounters in business and computational journalism, for instance. But as Dennis Tenen and Andrew Goldstone (among others) have rightly pointed out, it can be a huge challenge to cover all the methods required for this sort of work in a single course. I’m not sure I have a perfect solution to that problem yet. The course is only in its third week! But we are aiming to achieve a kind of hands-on experience that combines Python programming with basic principles of statistics and machine learning, and with reflection on the challenges of social interpretation. I believe this may be achievable in a course that doesn’t have to cover other aspects of DH, and when many students have at least a little previous experience, both in programming and in the humanities.
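To give a concrete sense of the kind of exercise that combines a little Python with basic statistical reasoning, here is a minimal sketch of a Naive Bayes passage classifier. The labels and toy passages are invented for illustration; they aren’t drawn from the actual course notebooks.

```python
import math
from collections import Counter

def train(docs_by_label):
    """Count word frequencies for each label in a toy training set."""
    counts = {label: Counter() for label in docs_by_label}
    for label, docs in docs_by_label.items():
        for doc in docs:
            counts[label].update(doc.lower().split())
    return counts

def predict(counts, doc):
    """Return the label whose smoothed log-likelihood for the words is highest."""
    vocab = {w for c in counts.values() for w in c}
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        score = 0.0
        for w in doc.lower().split():
            # Laplace smoothing keeps unseen words from zeroing out the score.
            score += math.log((c[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = {
    "gothic": ["the castle loomed dark and terrible",
               "a ghost haunted the ruined abbey"],
    "realist": ["the clerk walked to the office each morning",
                "she counted the household accounts carefully"],
}
model = train(training)
print(predict(model, "the dark ruined castle"))  # prints "gothic"
```

A fifteen-line toy like this can anchor a discussion of smoothing, priors, and why word counts alone miss so much — exactly the mix of programming, statistics, and interpretive caution the course aims at.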

As Jupyter notebooks for the data science course are developed, I’m sharing them in a GitHub repo. In both of the syllabi linked above, I also mention other syllabi that served as models. My thanks go out to everyone who shared their experience; I leaned on some of those models very heavily.

The question I haven’t resolved yet is: how do we connect courses like these to an English curriculum? That connection remains crucial: I chose the phrase “data science” partly because the conversation around data science has explicitly acknowledged the importance of domain expertise. (See Drew Conway’s famous data-science Venn diagram.) I do think researchers need substantive knowledge about specific aspects of cultural history in order to frame meaningful questions about the past and interpret the patterns they find.

Right now, the courses I’m offering in LIS are certainly open to graduate students from humanities departments. But over the long run, I would also like to develop courses located in humanities departments that focus on specific literary-historical problems (for instance, questions of canonicity and popularity in a particular century), integrating distant-reading approaches only as one element of a broader portfolio of methods. Courses like that would fit fairly easily into an English graduate curriculum.

On the other hand, none of the courses I’ve described above can (by themselves) solve the most challenging pedagogical problem in DH, which is to make distant reading useful for doctoral dissertations. Right now, that’s very hard. The research opportunities in distant reading are huge, I believe, but that hugeness itself becomes a barrier. A field where you start making important discoveries only after two to three years of initial start-up time (training yourself, developing corpora, and so on) is not ideally configured for the individualistic model of doctoral research that prevails in the humanities. Collective lab-centered projects are probably a better fit for this field. We may need to envision dissertations as being (at least in part) pieces of a larger research project, exploring one aspect of a shared problem.

Syllabus for a graduate seminar.

Sharing the syllabus for a course called “Distant-Reading the Long Nineteenth Century,” in case anyone finds it useful.

I profited a lot from other syllabi in writing this, taking hints in particular from courses designed by Rachel Buurma, James A. Evans, Andrew Goldstone, Lauren Klein, Alan Liu, Andrew Piper, Benjamin Schmidt, and Matthew Wilkens. My goals were especially close to Goldstone’s syllabus for “Literary Data” (Spring 2015), and there’s a lot of borrowing here: like him, I’m teaching R, using texts by Matt Jockers and Paul Teetor.

Although the title says “nineteenth century,” this is definitely a methods course more than a survey of literary history. (I mention a period in the title for truth in advertising, since I don’t have the data to support research projects outside of 1750-1922 yet.) The course will include several occasions for close reading of nineteenth-century literature, but the choices of texts will mostly be made as we proceed and motivated by our distant readings.

Three years ago I taught a very different grad seminar called “Digital Tools and Critical Theory.” That was more about teaching the conflicts; this one focuses on preparing students to do distant reading in their own work.

[Postscript a day later: One thing I’m borrowing from Goldstone, and emphasizing here, is an analogy to sociological “content analysis.” It’s been striking me lately that some useful applications of distant reading don’t require much algorithmic complexity at all — just thoughtful sampling of passages from a large collection.]
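The low-tech sampling that postscript describes really can be only a few lines of code. Here is a hedged sketch (not from the syllabus itself, with an invented miniature corpus): draw a reproducible random sample of passages mentioning a keyword, then hand-code them the way a content analyst would.

```python
import random

def sample_passages(passages, keyword, k, seed=42):
    """Draw a reproducible random sample of passages that mention a keyword,
    for hand-coding in the style of sociological content analysis."""
    hits = [p for p in passages if keyword in p.lower()]
    random.seed(seed)  # fixed seed so collaborators can reproduce the sample
    return random.sample(hits, min(k, len(hits)))

corpus = [
    "The factory whistle sounded across the river.",
    "Her marriage was announced in the county paper.",
    "The new factory employed three hundred hands.",
    "A quiet evening passed at the vicarage.",
]
for p in sample_passages(corpus, "factory", 2):
    print(p)
```

At full scale the corpus would be thousands of volumes rather than four sentences, but the method — filter, sample, read closely — is the same, and the interesting work happens in the reading.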

Syllabus: ENGL581: Digital Tools and Critical Theory.

This syllabus is indebted to just about everyone who has posted a syllabus for a DH course, and especially to Paul Fyfe, from whose draft syllabus I borrowed several readings.

The syllabus itself is here as a .pdf file.

As you’ll see if you download it, this is not a general digital humanities course. At Urbana-Champaign, John Unsworth has been teaching an introduction to digital humanities in the Graduate School of Library and Information Science, and there’s no way I could hope to replicate his breadth of knowledge. Instead I’ve focused on literary and historical applications of text mining, because that’s an area where I feel I can teach skills that a wide range of humanities graduate students will find immediately useful.

I realize the choice of focus may seem odd, since text mining is a relatively controversial subfield of DH, and a technically challenging one. There’s no way to duck the technical challenge: I am going to try to teach enough coding (using R) to empower students to define their own questions and visualize their own results. But I don’t think controversies about quantification need to be a problem, since I approach text mining largely as a discovery strategy. I hope it will turn up insights and clues that students find useful, without necessarily compelling them to add a lot of numbers or graphs to their arguments.

The “tools” and “theory” in the title of the course are not meant to be pitted against each other. The title instead flags a working assumption that practice and theory are fused: our interpretive theories are already shaped by the social/technical infrastructure we use to find and read texts, so reflectively reshaping that infrastructure is a way of “doing theory.”

A course description.

I thought I would share the description of a graduate course I’ll be teaching in Spring 2012. It’s targeted specifically at students in English literature. So instead of teaching an “introduction to digital humanities” as a whole, I’ve decided to focus on the parts of this research program that seem to integrate most easily into literary study. I want to help students take risks — but I also want to focus, candidly, on risks that seem likely to produce useful credentials within the time frame of graduate study.

I think the perception among professors of literature may be that TEI-based editing is the digital tool that integrates most easily into what we do. But where grad students are concerned, I think new modes of collection-mapping are actually more widely useful, because they generate leads that can energize projects not otherwise centrally “digital.” This approach is technically a bit more demanding than TEI would be, but if students are handed a few simple modules (LSA-based topic modeling, Dunning’s log likelihood, collocation analysis, entity extraction, time series graphing) I think it’s fairly easy to reveal discourses, trends, and perhaps genres that no one has discussed. I’ll be sharing my own tools built in R, and an 18-19c collection I have developed in collaboration with E. Jordan Sellers. But I’ll also ask students to learn some basic elements of R themselves, so that they can adapt or connect modules and generate their own visualizations. As we get into problems that exceed the power of the average Mac, I’ll introduce students to the modular resources of SEASR. Wish us luck — it’s an experiment!
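For readers unfamiliar with one of the modules mentioned above: Dunning’s log likelihood compares a word’s frequency in two corpora to identify distinctive vocabulary. My own tools are in R, but the statistic itself is easy to sketch in Python (the word and the corpus sizes below are invented for illustration):

```python
import math

def dunning_g2(count_a, total_a, count_b, total_b):
    """Dunning's log-likelihood (G2) for one word, comparing corpus A
    to corpus B. Higher values mean the word is more distinctive."""
    # Expected counts under the null hypothesis that the word occurs
    # at the same rate in both corpora.
    combined = count_a + count_b
    e_a = total_a * combined / (total_a + total_b)
    e_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    for obs, exp in ((count_a, e_a), (count_b, e_b)):
        if obs > 0:  # the limit of x*log(x) as x -> 0 is 0
            g2 += obs * math.log(obs / exp)
    return 2 * g2

# Suppose "whale" appears 500 times in a 1,000,000-word corpus A
# but only 20 times in a 2,000,000-word corpus B.
print(round(dunning_g2(500, 1_000_000, 20, 2_000_000), 1))  # prints 945.3
```

Run over a whole vocabulary and sorted, scores like these surface the words that most sharply distinguish one discourse from another — which is why the measure is such a handy discovery tool.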

ENGL 581. Digital Tools and Critical Theory. Spring 2012.

Critical practice is already shaped by technology. Contemporary historicism emerged around the same time as full-text search, for instance, and would be hard to envision without it. Our goal in this course will be to make that relationship more reciprocal by using critical theory to shape technology in turn. For example, the prevailing system of “keyword search” requires scholars to begin by guessing how another era categorized the world. But much critical theory suggests that we cannot predict those categories in advance, and there are ways of mapping an archive that don’t require us to.

I’ve found that it does make a difference: when critics build their own tools, they can uncover trends and discourses that standard search technology does not reveal. The course will not assume any technical background, although it does assume willingness to learn a few basic elements of programming and statistics. Many of the tools/collections we need are already available on the web; others I can give you, or show you how to cobble together. We will often take time out from building things to read theory — like Moretti’s Graphs, Maps, Trees (2005), corpus linguistics, and influential critiques and definitions of the digital humanities. But we will not mostly be writing about digital humanities. Instead I’ll recommend writing an ordinary critical essay about literary/cultural history, subtly informed by new tools or new models of discourse. (Underline “subtly.”) Projects on any period are possible, although the resources I can provide are admittedly richest between 1700 and 1900.

*****
By the way, it would be churlish of me not to acknowledge that I’ve learned much of what I know about this topic from grad students, and especially (where methodology is concerned) from Benjamin Schmidt, whose blog posts are an education in themselves and will certainly be on the syllabus. “Graduate education” in this field is a very circular process.

It’s okay not to solve “the crisis of the humanities.”

I read Cathy Davidson’s latest piece in Academe with pleasure and admiration. She’s right that humanists need to think about the social function of our work, and right that this will require self-criticism. Moreover, Davidson’s work with HASTAC seems to me a model of the sort of innovation we need now.

However, Davidson says such kind things about the digital humanities that someone needs to pour in a few grains of salt. And since I’m a digital humanist, it might as well be me.

To reimagine a global humanism with relevance to the contemporary world means understanding, using, and contributing to new computational tools and methods. … Even a few examples show how being open to digital possibilities changes paradigms and brings new ways of reimagining the humanities into the world.

Reading this, I find myself blushing and stammering. And what I’m stammering is: “slow down a sec, because I’m not sure how central any of this is really going to be to our pedagogical mission.”

I’m going to teach a graduate course on digital humanities next semester, because I’m confident that information technology will change (actually, already has changed) the research end of our discipline. But I’m not yet sure about the implications at the undergraduate level. Maybe ten years from now I’ll be teaching text mining to undergrads … but then again, maybe the things undergraduates need most from an English course will still be historical perspective, close reading, a willingness to revise, and a habit of considering objections to their own thesis.

I’m sure that text mining belongs in undergraduate education somewhere. It raises fascinating social and linguistic puzzles. But I’m not sure whether we’ll be able to fit all the puzzles raised by technological change into the undergrad English major. It’s possible that English departments will want to stay focused on an older mission, leaving these new challenges to be scooped up by Linguistics or Computer Science. If that happens, it’s okay with me. It’s not particularly crucial that all the projects I care about be combined in a single department.

I’m dwelling on this because I feel humanists spend way too much time these days arguing about “what we need to do in order to keep the discipline from shrinking.” Sometimes the answer offered is a) return to our core competence, and sometimes the answer is b) boldly take on some new mission. But really I want to answer c) it is not our job to keep the discipline from shrinking, and we shouldn’t do anything purely for that reason. Our job is to make sure that we keep passing on the critical skills that the humanities develop best, at the same time as we explore new intellectual challenges.

Maybe those new challenges require us to expand. Or maybe it turns out that new challenges are relevant mostly at the graduate level, whereas at the undergraduate level we already have our hands full teaching students social history, close reading, and revision. And maybe that means that departments of English do end up shrinking relative to Communications or CompSci. If so, I hope it doesn’t happen rapidly, because I care about the fortunes of particular graduate students. But in the long term, it would not be a tragedy. Ideas matter. Departmental boundaries don’t. Intellectual history is not a contest to see who can retain the most faculty.

UPDATE Dec. 30 2011: I have to admit that my mind is in the process of being changed about this. After participating in a NITLE-sponsored seminar about teaching digital humanities at the undergraduate level, I’m much less hesitant than I was in September. Ryan Cordell, Brian Croxall, and Jeff McClurken presented really impressive digital-humanities courses that were also deeply grounded in the context of a specific discipline. Recording available at the link above.