Two syllabi: Digital Humanities and Data Science in the Humanities.

When I began teaching graduate courses about digital humanities, I designed syllabi that tried to cover a little of everything.

I enjoyed teaching those courses, but if I’m being honest, it was a challenge to race from digital editing — to maps and networks — to distant reading — to critical reflection on the concept of DH itself. It was even harder to cover that range of topics while giving students meaningful hands-on experience.

The solution, obviously, was to break the subject into more than one course. But I didn’t know how to do that within an English graduate curriculum. Many students are interested in learning about “digital humanities,” because a lot of debate has swirled around that broad rubric. I think the specific fields of inquiry grouped under the rubric actually make better-sized topics for a course, but they don’t have the same kind of name recognition, and courses on those topics don’t enroll as heavily.

This problem became easier to solve when part of my job moved into the School of Information Sciences. Many aspects of digital humanities — from social reflection on information technology to data mining — are already represented in the curriculum here. So I could divide DH into parts, and still have confidence that students would recognize those parts and understand how each part fit into an existing program of study.

This year I’ve taught two courses in the LIS curriculum. I’m sharing syllabi for both at once so I can also describe the contrast between them.

1. The first of the two, “Digital Humanities” (syllabus), is fundamentally a survey of DH as a social phenomenon, with special emphasis on the role of academic libraries and librarians — since that is likely to be a career path that many MLIS students are considering. The course covers a wide range of humanistic themes and topics, but doesn’t go very deeply into hands-on exploration of methods.

2. The second course, “Data Science in the Humanities” (syllabus)  covers the field that digital humanists often call “cultural analytics” — or “distant reading,” when it focuses on literature. Although I know its history is actually more complex, I’m characterizing this field as a form of data science in order to highlight its value for a wide range of students who may or may not intend to work as researchers in universities. I think humanistic questions can be great training for the slippery problems one encounters in business and computational journalism, for instance. But as Dennis Tenen and Andrew Goldstone (among others) have rightly pointed out, it can be a huge challenge to cover all the methods required for this sort of work in a single course. I’m not sure I have a perfect solution to that problem yet. The course is only in its third week! But we are aiming to achieve a kind of hands-on experience that combines Python programming with basic principles of statistics and machine learning, and with reflection on the challenges of social interpretation. I believe this may be achievable, in a course that doesn’t have to cover other aspects of DH, and when many students have at least a little previous experience, both in programming and in the humanities.

As Jupyter notebooks for the data science course are developed, I’m sharing them in a github repo. In both of the syllabi linked above, I also mention other syllabi that served as models. My thanks go out to everyone who shared their experience; I leaned on some of those models very heavily.

data_science_vdThe question I haven’t resolved yet is, How do we connect courses like these to an English curriculum? That connection remains crucial: I chose the phrase “data science” partly because the conversation around data science has explicitly acknowledged the importance of domain expertise. (See Drew Conway’s famous Venn diagram on the right.) I do think researchers need substantive knowledge about specific aspects of cultural history in order to frame meaningful questions about the past and interpret the patterns they find.

Right now, the courses I’m offering in LIS are certainly open to graduate students from humanities departments. But over the long run, I would also like to develop courses located in humanities departments that focus on specific literary-historical problems (for instance, questions of canonicity and popularity in a particular century), integrating distant-reading approaches only as one element of a broader portfolio of methods. Courses like that would fit fairly easily into an English graduate curriculum.

On the other hand, none of the courses I’ve described above can (by themselves) solve the most challenging pedagogical problem in DH, which is to make distant reading useful for doctoral dissertations. Right now, that’s very hard. The research opportunities in distant reading are huge, I believe, but that hugeness becomes itself a barrier. A field where you start making important discoveries after two to three years initial start-up time (training yourself, developing corpora, etc) is not ideally configured for the individualistic model of doctoral research that prevails in the humanities. Collective lab-centered projects are probably a better fit for this field. We may need to envision dissertations as being (at least in part) pieces of a larger research project, exploring one aspect of a shared problem.