Categories
cultural analytics, DH as a social phenomenon, teaching

A broader purpose

The weather prevents me from being there physically, but this is a transcript of my remarks for “Varieties of Digital Humanities,” MLA, Jan 5, 2018.

Using numbers to understand cultural history is often called “cultural analytics”—or sometimes, if we’re talking about literary history in particular, “distant reading.” The practice is older than either name: sociologists, linguists, and adventurous critics like Janice Radway have been using quantitative methods for a long time.

But over the last twenty years, numbers have begun to have a broader impact on literary study, because we’ve learned to use them in a wider range of ways. We no longer just count things that happen to be easily counted (individual words, for instance, or books sold). Instead scholars can start with literary questions that really interest readers, and find ways to model them. Recent projects have cast light, for instance, on the visual impact of poetry, on imagined geography in the novel, on the instability of gender, and on the global diffusion of stream of consciousness. Articles that use numbers are appearing in central disciplinary venues: MLQ, Critical Inquiry, PMLA. Equally important: a new journal called Cultural Analytics has set high standards for transparent and reproducible research.

Of course, scholars still disagree with each other. And that’s part of what makes this field exciting. We aren’t simply piling up facts. New methods are sparking debate about the nature of the knowledge literary historians aim to produce. Are we interpreting the past or explaining it? Can numbers address perspectival questions? The name for these debates is “critical theory.” Twenty years from now, I think it will be clear that questions about quantitative models form an important unit in undergraduate theory courses.

Literary scholars are used to imagining numbers as tools, not as theories. So there’s translation work to be done. But translating between theoretical traditions could be the most important part of this project. Our existing tradition of critical theory teaches students to ask indispensable questions—about power, for instance, and the material basis of ideology. But persuasive answers to those questions will often require a lot of evidence, and the art of extracting meaningful patterns from evidence is taught by a different theoretical tradition, called “statistics.” Students will be best prepared for the twenty-first century if they can connect the two traditions, and do critical theory with numbers.

So in a lot of ways, this is a heady moment. Cultural analytics has historical discoveries, lively theoretical debates, and a public educational purpose. Intellectually, we’re in good shape.

But institutionally, we’re in awful shape. Or to be blunt: we are shape-less. Most literature departments do not teach students how to do this stuff at all. Everything I’ve just discussed may be represented by one unit in one course, where students play with topic models. Reduced to that size, I’m not sure cultural analytics makes any sense. If we were seriously trying to teach students to do critical theory with numbers, we would need to create a sequence of courses that guides them through basic principles (of statistical inference as well as historical interpretation) toward projects where they can pose real questions about the past.

What keeps us from building that curriculum? Part of the obstacle, I think, is the term digital humanities itself. Don’t get me wrong: I’m grateful for the popularity of DH. It has lent energy to many different projects. But the term digital humanities has been popular precisely because it promises that all those projects can still be contained in the humanities. The implicit pitch is something like this: “You won’t need a whole statistics course. Come to our two-hour workshop on topic models instead. You can always find a statistician to collaborate with.”

I understand why digital humanists said that kind of thing eight years ago. We didn’t want to frighten people away. If you write “Learn Another Discipline” on your welcome mat, you may not get many visitors. But a deceptively gentle welcome mat, followed by a trapdoor, is not really more welcoming. So it’s time to be honest about the preparation needed for cultural analytics. Young people entering this field will need to understand the whole process. They won’t even be able to pose meaningful questions, for instance, without some statistics.

Trompe l’oeil faux door mural from http://www.bumblebeemurals.com/cool-wall-murals/

But the metaphor of a welcome mat may be too optimistic. This field doesn’t have a door yet. I mean, there is no curriculum. So of course the field tends to attract people who already have an extracurricular background—which, of course, is not equally distributed. It shouldn’t surprise us that access is a problem when this field only exists as a social network. The point of a classroom is to distribute knowledge in a more equal, less homosocial way. But digital humanities classes, as currently defined, don’t really teach students how to use numbers. (For a bracingly honest exploration of the problem, see Andrew Goldstone.) So it’s almost naive to discuss “barriers to entry.” There is no entrance to this field. What we have is more like a door painted on the wall. But we’re in denial about that—because to admit the problem, we would have to admit that “DH” isn’t working as a gateway to everything it claims to contain.

I think the courses that can really open doors to cultural analytics are found, right now, in the social sciences. That’s why I recently moved half of my teaching to a School of Information Sciences. There, you find a curricular path that covers statistics and programming along with social questions about technology. I don’t think it’s an accident that you also find better gender and ethnic diversity among people using numbers in the social sciences. Methods get distributed more equally within a discipline that actually teaches the methods. So I recommend fusing cultural analytics with social science partly because it immediately makes this field more diverse. I’m not offering that as a sufficient answer to problems of access. I welcome other answers too. But I am suggesting that social-scientific methods are a necessary part of access. We cannot lower barriers to entry by continuing to pretend that cultural analytics is just the humanities, plus some user-friendly digital tools. That amounts to a trompe-l’oeil door.

What the social sciences lack are courses in literary history. And that’s important, because distant readers set out to answer concrete historical questions. So the unfortunate reality is, this project cannot be contained in one discipline. The questions we try to answer are taught in the humanities. But the methods we use are taught, right now, in the social sciences and data science. Even if it frightens some students off, we have to acknowledge that cultural analytics is a multi-disciplinary project: a bridge between the humanities and quantitative social science, belonging equally to both.

I’m not recommending this approach for the DH community as a whole. DH has succeeded by fitting into the institutional framework of the humanities. DH courses are often pitched to English or History majors, and for many topics, that works brilliantly. But it’s awkward for quantitative courses. To use numbers wisely, students need preparation that an English major doesn’t provide. So increasingly I see the quantitative parts of DH presented as an interdisciplinary program rather than a concentration in the humanities.

In saying this, I don’t mean to undersell the value of numbers for humanists. New methods can profoundly transform our view of the human past, and the research is deeply rewarding. So I’m convinced that statistics, and even machine learning, will gradually acquire a place in the humanistic curriculum.

I’m just saying that this is a bigger, slower project than the rhetoric of DH may have led us to expect. Mathematics doesn’t really come packaged in digital tools. Math is a way of thinking, and using it means entering into a long-term relationship with statisticians and social scientists. We are not borrowing tools for private use inside our discipline, but starting a theoretical conversation that should turn us outward, toward new forms of engagement with our colleagues and the world.

What is the point of studying culture with numbers, after all? It’s not to change English departments, but to enrich the way all students think about culture. The questions we’re posing can have real implications for the way students understand their roles in history—for instance, by linking their own cultural experience to century-spanning trends. Even more urgently, these questions give students a way to connect interpretive insights and resonant human details with habits of experimental inquiry.

Instead of imagining cultural analytics as a subfield of DH, I would almost call it an emerging way to integrate the different aspects of a liberal education. People who want to tackle that challenge are going to have to work across departments to some extent: it’s not a project that an English department could contain. But it is nevertheless an important opportunity for literary scholars, since it’s a place where our work becomes central to the broader purposes of the university as a whole.

Categories
DH as a social phenomenon, teaching

Two syllabi: Digital Humanities and Data Science in the Humanities.

When I began teaching graduate courses about digital humanities, I designed syllabi that tried to cover a little of everything.

I enjoyed teaching those courses, but if I’m being honest, it was a challenge to race from digital editing — to maps and networks — to distant reading — to critical reflection on the concept of DH itself. It was even harder to cover that range of topics while giving students meaningful hands-on experience.

The solution, obviously, was to break the subject into more than one course. But I didn’t know how to do that within an English graduate curriculum. Many students are interested in learning about “digital humanities,” because a lot of debate has swirled around that broad rubric. I think the specific fields of inquiry grouped under the rubric actually make better-sized topics for a course, but they don’t have the same kind of name recognition, and courses on those topics don’t enroll as heavily.

This problem became easier to solve when part of my job moved into the School of Information Sciences. Many aspects of digital humanities — from social reflection on information technology to data mining — are already represented in the curriculum here. So I could divide DH into parts, and still have confidence that students would recognize those parts and understand how each part fit into an existing program of study.

This year I’ve taught two courses in the LIS curriculum. I’m sharing syllabi for both at once so I can also describe the contrast between them.

1. The first of the two, “Digital Humanities” (syllabus), is fundamentally a survey of DH as a social phenomenon, with special emphasis on the role of academic libraries and librarians — since that is likely to be a career path that many MLIS students are considering. The course covers a wide range of humanistic themes and topics, but doesn’t go very deeply into hands-on exploration of methods.

2. The second course, “Data Science in the Humanities” (syllabus), covers the field that digital humanists often call “cultural analytics,” or “distant reading” when it focuses on literature. Although I know its history is actually more complex, I’m characterizing this field as a form of data science in order to highlight its value for a wide range of students who may or may not intend to work as researchers in universities. I think humanistic questions can be great training for the slippery problems one encounters in business and computational journalism, for instance. But as Dennis Tenen and Andrew Goldstone (among others) have rightly pointed out, it can be a huge challenge to cover all the methods required for this sort of work in a single course. I’m not sure I have a perfect solution to that problem yet. The course is only in its third week! But we are aiming to achieve a kind of hands-on experience that combines Python programming with basic principles of statistics and machine learning, and with reflection on the challenges of social interpretation. I believe this may be achievable in a course that doesn’t have to cover other aspects of DH, and where many students have at least a little previous experience, both in programming and in the humanities.

As Jupyter notebooks for the data science course are developed, I’m sharing them in a github repo. In both of the syllabi linked above, I also mention other syllabi that served as models. My thanks go out to everyone who shared their experience; I leaned on some of those models very heavily.

The question I haven’t resolved yet is, How do we connect courses like these to an English curriculum? That connection remains crucial: I chose the phrase “data science” partly because the conversation around data science has explicitly acknowledged the importance of domain expertise. (See Drew Conway’s famous Venn diagram on the right.) I do think researchers need substantive knowledge about specific aspects of cultural history in order to frame meaningful questions about the past and interpret the patterns they find.

Right now, the courses I’m offering in LIS are certainly open to graduate students from humanities departments. But over the long run, I would also like to develop courses located in humanities departments that focus on specific literary-historical problems (for instance, questions of canonicity and popularity in a particular century), integrating distant-reading approaches only as one element of a broader portfolio of methods. Courses like that would fit fairly easily into an English graduate curriculum.

On the other hand, none of the courses I’ve described above can (by themselves) solve the most challenging pedagogical problem in DH, which is to make distant reading useful for doctoral dissertations. Right now, that’s very hard. The research opportunities in distant reading are huge, I believe, but that hugeness itself becomes a barrier. A field where you start making important discoveries after two to three years of initial start-up time (training yourself, developing corpora, etc.) is not ideally configured for the individualistic model of doctoral research that prevails in the humanities. Collective lab-centered projects are probably a better fit for this field. We may need to envision dissertations as being (at least in part) pieces of a larger research project, exploring one aspect of a shared problem.

Categories
teaching

Syllabus for a graduate seminar.

Sharing the syllabus for a course called “Distant-Reading the Long Nineteenth Century,” in case anyone finds it useful.

I profited a lot from other syllabi in writing this, taking hints in particular from courses designed by Rachel Buurma, James A. Evans, Andrew Goldstone, Lauren Klein, Alan Liu, Andrew Piper, Benjamin Schmidt, and Matthew Wilkens. My goals were especially close to Goldstone’s syllabus for “Literary Data” (Spring 2015), and there’s a lot of borrowing here: like him, I’m teaching R, using texts by Matt Jockers and Paul Teetor.

Although the title says “nineteenth century,” this is definitely a methods course more than a survey of literary history. (I mention a period in the title for truth in advertising, since I don’t have the data to support research projects outside of 1750-1922 yet.) The course will include several occasions for close reading of nineteenth-century literature, but the choices of texts will mostly be made as we proceed and motivated by our distant readings.

Three years ago I taught a very different grad seminar called “Digital Tools and Critical Theory.” That was more about teaching the conflicts; this one focuses on preparing students to do distant reading in their own work.

[Postscript a day later: One thing I’m borrowing from Goldstone, and emphasizing here, is an analogy to sociological “content analysis.” It’s been striking me lately that some useful applications of distant reading don’t require much algorithmic complexity at all — just thoughtful sampling of passages from a large collection.]
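To make the point about sampling concrete, here is a minimal sketch in Python of one way to draw a sample of passages for content analysis. It assumes the collection is a list of records with a `year` and a `text` field, and it stratifies by decade so that well-represented periods don't crowd out the rest. The function name, the record layout, and the choice to stratify by decade are all assumptions made for illustration; they are not taken from the syllabus.

```python
import random
from collections import defaultdict


def sample_passages(passages, per_decade=2, seed=0):
    """Draw up to `per_decade` passages at random from each decade.

    `passages` is assumed to be a list of dicts with "year" and "text" keys
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_decade = defaultdict(list)
    for passage in passages:
        decade = passage["year"] // 10 * 10  # e.g. 1847 -> 1840
        by_decade[decade].append(passage)

    sample = []
    for decade in sorted(by_decade):
        group = by_decade[decade]
        k = min(per_decade, len(group))  # small decades contribute what they have
        sample.extend(rng.sample(group, k))
    return sample


# Toy collection: one invented passage per year, 1800-1829.
toy_collection = [{"year": 1800 + i, "text": f"passage {i}"} for i in range(30)]
for passage in sample_passages(toy_collection, per_decade=2):
    print(passage["year"], passage["text"])
```

The sampled passages would then be read and coded by hand; the code only decides which passages get read.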

Categories
teaching

Syllabus: ENGL581: Digital Tools and Critical Theory.

This syllabus is indebted to just about everyone who has posted a syllabus for a DH course, and especially to Paul Fyfe, from whose draft syllabus I borrowed several readings.

The syllabus itself is here as a .pdf file.

As you’ll see if you download it, this is not a general digital humanities course. At Urbana-Champaign, John Unsworth has been teaching an introduction to digital humanities in the Graduate School of Library and Information Science, and there’s no way I could hope to replicate his breadth of knowledge. Instead I’ve focused on literary and historical applications of text mining, because that’s an area where I feel I can teach skills that a wide range of humanities graduate students will find immediately useful.

I realize the choice of focus may seem odd, since text mining is a relatively controversial subfield of DH, and a technically challenging one. There’s no way to duck the technical challenge: I am going to try to teach enough coding (using R) to empower students to define their own questions and visualize their own results. But I don’t think controversies about quantification need to be a problem, since I approach text mining largely as a discovery strategy. I hope it will turn up insights and clues that students find useful, without necessarily compelling them to add a lot of numbers or graphs to their arguments.

The “tools” and “theory” in the title of the course are not meant to be pitted against each other. The title instead flags a working assumption that practice and theory are fused: our interpretive theories are already shaped by the social/technical infrastructure we use to find and read texts, so reflectively reshaping that infrastructure is a way of “doing theory.”

Categories
methodology, teaching

A course description.

I thought I would share the description of a graduate course I’ll be teaching in Spring 2012. It’s targeted specifically at students in English literature. So instead of teaching an “introduction to digital humanities” as a whole, I’ve decided to focus on the parts of this research program that seem to integrate most easily into literary study. I want to help students take risks — but I also want to focus, candidly, on risks that seem likely to produce useful credentials within the time frame of graduate study.

I think the perception among professors of literature may be that TEI-based editing is the digital tool that integrates most easily into what we do. But where grad students are concerned, I think new modes of collection-mapping are actually more widely useful, because they generate leads that can energize projects not otherwise centrally “digital.” This approach is technically a bit more demanding than TEI would be, but if students are handed a few simple modules (LSA-based topic modeling, Dunning’s log likelihood, collocation analysis, entity extraction, time series graphing) I think it’s fairly easy to reveal discourses, trends, and perhaps genres that no one has discussed. I’ll be sharing my own tools built in R, and an 18-19c collection I have developed in collaboration with E. Jordan Sellers. But I’ll also ask students to learn some basic elements of R themselves, so that they can adapt or connect modules and generate their own visualizations. As we get into problems that exceed the power of the average Mac, I’ll introduce students to the modular resources of SEASR. Wish us luck — it’s an experiment!
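For readers wondering what one of these "simple modules" involves, here is a minimal sketch of Dunning's log-likelihood, written in Python for illustration (the tools described above were built in R, and this is not that code). It uses the common two-term form of the statistic to compare a word's frequency in two corpora, and it adds a sign so that positive scores mark words overrepresented in the first corpus. The function names and the toy token lists are assumptions, not part of the course materials.

```python
import math
from collections import Counter


def log_likelihood(count_a, count_b, total_a, total_b):
    """Signed Dunning log-likelihood (G^2) for one word in two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total tokens in each corpus (assumed to be > 0).
    """
    combined = count_a + count_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)

    g2 = 0.0
    if count_a > 0:  # a zero count contributes nothing to the sum
        g2 += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        g2 += count_b * math.log(count_b / expected_b)
    g2 *= 2

    # Sign convention (an addition for this sketch): positive means the word
    # is relatively more frequent in corpus A.
    return g2 if count_a / total_a >= count_b / total_b else -g2


def distinctive_words(tokens_a, tokens_b, top_n=5):
    """Rank words by how strongly they are overrepresented in corpus A."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {
        word: log_likelihood(counts_a[word], counts_b[word], total_a, total_b)
        for word in set(counts_a) | set(counts_b)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy example with invented token lists.
corpus_a = "the sea the sea the ship".split()
corpus_b = "the hill the hill the farm".split()
for word, score in distinctive_words(corpus_a, corpus_b, top_n=3):
    print(word, round(score, 2))
```

Words shared equally by both corpora score zero, which is why a measure like this is more useful than raw frequency for surfacing the vocabulary that distinguishes one collection from another.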

ENGL 581. Digital Tools and Critical Theory. Spring 2012.

Critical practice is already shaped by technology. Contemporary historicism emerged around the same time as full-text search, for instance, and would be hard to envision without it. Our goal in this course will be to make that relationship more reciprocal by using critical theory to shape technology in turn. For example, the prevailing system of “keyword search” requires scholars to begin by guessing how another era categorized the world. But much critical theory suggests that we cannot predict those categories in advance, and there are ways of mapping an archive that don’t require us to.

I’ve found that it does make a difference: when critics build their own tools, they can uncover trends and discourses that standard search technology does not reveal. The course will not assume any technical background, although it does assume willingness to learn a few basic elements of programming and statistics. Many of the tools/collections we need are already available on the web; others I can give you, or show you how to cobble together. We will often take time out from building things to read theory — like Moretti’s Maps, Graphs, Trees (2005), corpus linguistics, and influential critiques of or definitions of the digital humanities. But we will not mostly be writing about digital humanities. Instead I’ll recommend writing an ordinary critical essay about literary/cultural history, subtly informed by new tools or new models of discourse. (Underline “subtly.”) Projects on any period are possible, although the resources I can provide are admittedly richest between 1700 and 1900.

*****
By the way, it would be churlish of me not to acknowledge that I’ve learned much of what I know about this topic from grad students, and especially (where methodology is concerned) from Benjamin Schmidt, whose blog posts are an education in themselves and will certainly be on the syllabus. “Graduate education” in this field is a very circular process.

Categories
teaching, undigitized humanities

It’s okay not to solve “the crisis of the humanities.”

I read Cathy Davidson’s latest piece in Academe with pleasure and admiration. She’s right that humanists need to think about the social function of our work, and right that this will require self-criticism. Moreover, Davidson’s work with HASTAC seems to me a model of the sort of innovation we need now.

However, Davidson says such kind things about the digital humanities that someone needs to pour in a few grains of salt. And since I’m a digital humanist, it might as well be me.

“To reimagine a global humanism with relevance to the contemporary world means understanding, using, and contributing to new computational tools and methods. … Even a few examples show how being open to digital possibilities changes paradigms and brings new ways of reimagining the humanities into the world.”

Reading this, I find myself blushing and stammering. And what I’m stammering is: “slow down a sec, because I’m not sure how central any of this is really going to be to our pedagogical mission.”

I’m going to teach a graduate course on digital humanities next semester, because I’m confident that information technology will change (actually, already has changed) the research end of our discipline. But I’m not yet sure about the implications at the undergraduate level. Maybe ten years from now I’ll be teaching text mining to undergrads … but then again, maybe the things undergraduates need most from an English course will still be historical perspective, close reading, a willingness to revise, and a habit of considering objections to their own thesis.

I’m sure that text mining belongs in undergraduate education somewhere. It raises fascinating social and linguistic puzzles. But I’m not sure whether we’ll be able to fit all the puzzles raised by technological change into the undergrad English major. It’s possible that English departments will want to stay focused on an older mission, leaving these new challenges to be scooped up by Linguistics or Computer Science. If that happens, it’s okay with me. It’s not particularly crucial that all the projects I care about be combined in a single department.

I’m dwelling on this because I feel humanists spend way too much time these days arguing about “what we need to do in order to keep the discipline from shrinking.” Sometimes the answer offered is a) return to our core competence, and sometimes the answer is b) boldly take on some new mission. But really I want to answer c) it is not our job to keep the discipline from shrinking, and we shouldn’t do anything purely for that reason. Our job is to make sure that we keep passing on the critical skills that the humanities develop best, at the same time as we explore new intellectual challenges.

Maybe those new challenges require us to expand. Or maybe it turns out that new challenges are relevant mostly at the graduate level, whereas at the undergraduate level we already have our hands full teaching students social history, close reading, and revision. And maybe that means that departments of English do end up shrinking relative to Communications or CompSci. If so, I hope it doesn’t happen rapidly, because I care about the fortunes of particular graduate students. But in the long term, it would not be a tragedy. Ideas matter. Departmental boundaries don’t. Intellectual history is not a contest to see who can retain the most faculty.

UPDATE Dec. 30 2011: I have to admit that my mind is in the process of being changed about this. After participating in a NITLE-sponsored seminar about teaching digital humanities at the undergraduate level, I’m much less hesitant than I was in September. Ryan Cordell, Brian Croxall, and Jeff McClurken presented really impressive digital-humanities courses that were also deeply grounded in the context of a specific discipline. Recording available at the link above.