
Should artificial intelligence be person-shaped?

Post author By tedunderwood
Post date June 15, 2024

Since Mary Shelley, writers of science fiction have enjoyed musing about the moral dilemmas created by artificial persons.

I haven’t, to be honest. I used to insist on the term “machine learning,” because I wanted to focus on what the technology actually does: model data. Questions about personhood and “alignment” felt like anthropocentric distractions, verging on woo.

But these days woo is hard to avoid. OpenAI is now explicitly marketing the promise that AI will cross the uncanny valley and start to sound like a person. The whole point of the GPT-4o demo was to show off the human-sounding (and um, gendered) expressiveness of a new model’s voice. If there had been any doubt about the goal, Sam Altman’s one-word tweet “her” removed it.

*Mira Murati, Mark Chen, and Barret Zoph at the GPT-4o demo.*

At the other end of the spectrum lies Apple, which seems to be working hard to avoid any suggestion that the artificial intelligence in their products could coalesce into an entity. The phrase “Apple Intelligence” has a lot of advantages, but one of them is that it doesn’t take a determiner. It’s Apple Intelligence, not “an apple intelligence.” Apple’s conception of this feature is more like an operating system — diffuse and unobtrusive — just a transparent interface to the apps, schedules, and human conversations contained on your phone.

Craig Federighi at WWDC ’24. If you look closely, Apple Intelligence includes “a more personal Siri.” But if you look even closer, the point is not that Siri has more personhood but that it will better understand yours (e.g., when your mother’s flight arrives).

If OpenAI is obsessed with Her, Apple Intelligence looks more like a Caddy from All the Birds in the Sky. In Charlie Jane Anders’ novel, Caddies are mobile devices that quietly guide their users with reminders and suggestions (restaurants you might like, friends who happen to be nearby, and so on). A Caddy doesn’t need an expressive voice, because it’s a service rather than a separate person. In All the Birds, Patricia starts to feel it’s “an extension of her personality” (173).

There are a lot of reasons to prefer Apple’s approach. Putting the customer at the center of a sales pitch is usually a smart move. Ben Evans also argues that users will understand the limitations of AI better if it’s integrated into interfaces that provide a specific service rather than presented as an open-ended chatbot.

Moreover, Apple’s approach avoids several kinds of cringe invited by OpenAI’s demo — from the creepily-gendered Pygmalion vibe to the more general problem that we don’t know how to react to the laughter of a creature that doesn’t feel emotion. (Readers of Neuromancer may remember how much Case hates “the laugh that wasn’t laughter” emitted by the recording of his former teacher, the Dixie Flatline.)

Finally, impersonal AI is calculated to please grumpy abstract thinkers like me, who find a fixation on so-called “general” intelligence annoyingly anthropocentric.

However. Let’s look at the flip side for a second.

The most interesting case I’ve heard for person-shaped AI was offered last week by Amanda Askell, a philosopher working at Anthropic. In an interview with Stuart Ritchie, Askell argues that AI needs a personality for two reasons. First, shaping a personality is how we endow models with flexible principles that will “determine how [they] react to new and difficult situations.” Personality, in other words, is simply how we reason about character. Second, personality signals to users that they’re not talking to an omniscient oracle.

“We want people to know that they’re interacting with a language model and not a person. But we also want them to know they’re interacting with an imperfect entity with its own biases and with a disposition towards some opinions more than others. Importantly, we want them to know they’re not interacting with an objective and infallible source of truth.”

It’s a good argument. One has to approach it skeptically, because there are several other profitable reasons for companies to give their products a primate-shaped UI. It provides “a more interesting user experience,” as Askell admits — and possibly a site of parasocial attachment. (OpenAI’s Sky voice sounded a bit like Scarlett Johansson.) Plus, human behavior is just something we know how to interpret. I often prefer to interact with ChatGPT in voice mode, not only because it leaves my hands and eyes free, but because it gives the model an extra set of ways to direct my attention — ranging from emphasis to, uh, theatrical pauses that signal a new or difficult topic.

But this ends up sending us back to Askell’s argument. Even if models are not people, maybe we need the mask of personality to understand them? A human-sounding interface provides both simple auditory signals and epistemic signals of bias and limitation. Suppressing those signals is not necessarily more honest. It may be relevant here that the impersonal transparency of the Caddies in All the Birds in the Sky turns out to be a lie. No spoilers, but the Caddies actually have an agenda, and are using those neutral notifications and reminders to steer their human owners. It wouldn’t be shocking if corporate interfaces did the same thing.

So, should we anthropomorphize AI? I think it’s a much harder question than is commonly assumed, and maybe not a question that can be answered at all. Apple and Anthropic are selling different products, to different audiences. There’s no reason one of them has to be wrong.

*On Bluesky, Dave Palfrey reminds me that the etymology of “person” leads back through “fictional character” to “mask.”*

More fundamentally, this is a hard question because it’s not clear that we’re telling the full truth when we anthropomorphize people. Writers and critics have been arguing for a long time that the personality of the author is a mask. As Stéphane Mallarmé puts it, “the pure work implies the disappearance of the poet speaking, who yields the initiative to words” (208). There’s a sense in which all of us are language models. “How do I know what I think until I see what I say?”

*This shoggoth could also be captioned “language,” and the mask could be captioned “personality.” Authorship of the image not 100% clear; see the full history of this meme.*

So if we feel creeped out by all the interfaces for artificial intelligence — both those that pretend to be neutrally helpful and those that pretend to laugh at our jokes — the reason may be that this dilemma reminds us of something slightly cringe and theatrical about personality itself. Our selves are not bedrock atomic realities; they’re shaped by collective culture, and the autonomy we like to project is mostly a fiction. But it’s also a necessary fiction. Projecting and perceiving personality is how we reason about questions of character and perspective, and we may end up trusting models more if they can play the same game. Even if we flinch a little every time they laugh.

References

Anders, Charlie Jane. All the Birds in the Sky. Tor, 2016.

Askell, Amanda and Ritchie, Stuart. “What should an AI’s personality be?” Anthropic blog. June 8, 2024.

Gibson, William. Neuromancer. Ace, 1984.

Mallarmé, Stéphane. “The Crisis of Verse.” In Divagations, trans. Barbara Johnson. Harvard University Press, 2007.

Warner Bros. Picture presents an Annapurna Pictures production; produced by Megan Ellison, Spike Jonze, Vincent Landay; written and directed by Spike Jonze. Her. Burbank, CA: Distributed by Warner Home Video, 2014.


Three nasty problems.

Although distant reading always involves work, some parts of the task have an immediate reward. Solving these problems gives you an article with clear disciplinary significance, or at least a sparkly website. You could call these problems the glamorous ones.

Other problems are just nasty. They require a lot of detail work that never sees the light of day, and when you’re done, all you have is an intermediate product that makes further inquiry possible. The research community as a whole benefits, but you don’t get your picture taken with the Mayor. At most, if you’re lucky, you get a couple of data or code citations.

Perspectives vary, of course: problems that look nasty to literary scholars might be glamorous in information science. But untenured scholars understandably want problems that count as glamorous in their home disciplines. Even post-tenure, I only do thankless work in the dead of night if I can get help and/or grant funding, because I’m not a masked billionaire vigilante. For the last several years, I’ve been collaborating with HathiTrust Research Center to create a public dataset of word counts for English-language literature 1800-2000, and we’re only halfway done with that slightly nasty task.

So I’m not volunteering to tackle any of the three problems that follow. I’m just putting up a bat signal, in case there’s a weary private detective out there who might be the heroine Gotham City needs right now.

1: Proving how much ngram information is safe to share.

In order to do research beyond the wall of copyright, we often need to share derived data about books, instead of the original text. The question is, how much information can you share, legally, before it becomes possible to reconstruct the book?

There’s already some good research on this problem, and it might be 90% solved. But it’s a Rasputin-like problem that will keep coming back to life if you only kill it 90% dead. We need someone to bury it, which probably means someone with real CS training who can produce a specific, confident answer you could show a lawyer. In particular, there are tricky questions about interactions between different levels of aggregation. Suppose (for instance) you had page-level counts of single words, plus volume-level counts of trigrams. How much of a book, at most, could you reconstruct, given real-life variations in book length and page length?
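To see why trigram counts in particular raise the question, here is a toy sketch in Python. It is only an illustration, not an answer you could show a lawyer: the function names, the greedy chaining strategy, and the two sample sentences are all assumptions of mine. The idea is that overlapping trigrams can be chained back into running text for as long as the next word is unambiguous.

```python
from collections import Counter

def trigram_counts(tokens):
    """Volume-level trigram counts: the kind of derived data one might share."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def reconstruct(counts, start):
    """Greedily chain trigrams from a known opening bigram.

    Stops as soon as the current two-word context has zero or
    several possible continuations (hypothetical, simplistic rule).
    """
    remaining = Counter(counts)
    out = list(start)
    while True:
        prefix = (out[-2], out[-1])
        candidates = [t for t, n in remaining.items() if n > 0 and t[:2] == prefix]
        if len(candidates) != 1:
            break
        trigram = candidates[0]
        remaining[trigram] -= 1
        out.append(trigram[2])
    return out

# Every bigram is unique here, so the chain never hits an ambiguity.
unique = "call me ishmael some years ago never mind how long".split()
# Here "was the" continues two different ways, so the chain stalls early.
repetitive = "it was the best of times it was the worst of times".split()

rebuilt_unique = reconstruct(trigram_counts(unique), tuple(unique[:2]))
rebuilt_repetitive = reconstruct(trigram_counts(repetitive), tuple(repetitive[:2]))
print(len(rebuilt_unique), "of", len(unique), "tokens recovered")
print(len(rebuilt_repetitive), "of", len(repetitive), "tokens recovered")
```

A real book is far longer and far more repetitive than either sample, so ambiguities multiply. The open question is how much page-level word counts would help an attacker resolve them.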

2: Date-of-first-publication metadata.

We’re beginning to assemble large literary collections. But some of them include a lot of reprints, published decades or centuries after the first edition. That’s not necessarily a bad thing, but we will probably also want datasets that are deduplicated, or limited to first editions.

The beautiful, general solution to this problem probably requires linked data, and FRBR protocols, and a great deal of discussion. But see xkcd on solutions to “general problems.”

The solution Gotham City needs right now is probably more like a list of 50,000 titles, associated with dates of first publication.

This may be the next nasty problem I tackle. I suspect it would benefit from a tiered solution, where you produce a small amount of hand-checked data, plus a large amount of scraped or algorithmically-guessed data with a lower level of confidence.

3: A half-decent eighteenth-century collection.

OCR problems before 1800 are real, but the universe of English literature before 1800 is also small enough that it’s possible to imagine creating a reasonable sample of hand-corrected texts. The eMOP crowd-sourcing initiative may be the solution here. Or a very modest supplement to ECCO-TCP might be sufficient. I don’t know; I understand nineteenth-century digital collections much better.

***

A post like this one may seem to be encouraging people, generally, to tackle more nasty problems, but that’s not what I intend. Actually, I think scholars who work with computers tend to have a temperament that makes us all too willing to work on unglamorous infrastructure & markup problems, in the faith that they will eventually produce a general solution useful to everyone.

Fields that aren’t yet securely central to a discipline sometimes need to emphasize shorter-term thinking. But maybe distant reading is getting secure enough that a few of us can afford to do vigilante work on the side. Or maybe there are computer scientists out there who just need to see a bat signal.

Feel free to suggest more nasty problems in the comments.


New models of literary collectivity.

Post author By tedunderwood
Post date January 14, 2014

This is a version of a response I gave at session 155 of MLA 2014, “Literary Criticism at the Macroscale.” Slides and/or texts of the original papers by Andrew Piper and Hoyt Long and Richard So are available on the web, as is another response by Haun Saussy.

* * *

The papers we heard today were not picking the low-hanging fruit of text mining. There’s actually a lot of low-hanging fruit out there still worth picking — big questions that are easy to answer quantitatively and that only require organizing large datasets — but these papers were tackling problems that are (for good or ill) inherently more difficult. Part of the reason involves their transnational provenance, but another reason is that they aren’t just counting or mapping known categories but trying to rethink some of the basic concepts we use to write literary history — in particular, the concept we call “influence” or “diffusion” or “intertextuality.”

I’m tossing several terms at this concept because I don’t think literary historians have ever agreed what it should be called. But to put it very naively: new literary patterns originate somehow, and somehow they are reproduced. Different generations of scholars have modeled this differently. Hoyt and Richard quote Laura Riding and Robert Graves exploring, in 1927, an older model centered on basically personal relationships of imitation or influence. But early-twentieth-century scholars could also think anthropologically about the transmission of motifs or myths or A. O. Lovejoy’s “unit ideas.” In the later 20th century, critics got more cautious about implying continuity, and reframed this topic abstractly as “intertextuality.” But then the specificity of New Historicism sometimes pushed us back in the direction of tracing individual sources.

I’m retelling a story you already know, but trying to retell it very frankly, in order to admit that (while we’ve gained some insight) there is also a sense in which literary historians keep returning to the same problem and keep answering it in semi-satisfactory ways. We don’t all, necessarily, aspire to give a causal account of literary change. But I think we keep returning to this problem because we would like to have a kind of narrative that can move more smoothly between individual examples and the level of the discourse or genre. When we’re writing our articles the way this often works in practice is: “here’s one example, two examples — magic hand-waving — a discourse!”

Something interesting and multivocal about literary history gets lost at the moment when we do that hand-waving. The things we call genres or discourses have an internal complexity that may be too big to illustrate with examples, but that also gets lost if you try to condense it into a single label, like “the epistolary novel.” Though we aspire to subtlety, in practice it’s hard to move from individual instances to groups without constructing something like the sovereign in the frontispiece for Hobbes’ Leviathan – a homogenous collection of instances composing a giant body with clear edges.

While they offer different solutions, I think both of the papers we heard today are imagining other ways to move between instances and groups. They both use digital methods to describe new forms of similarity between texts. And in both cases, the point of doing this lies less in precision than in creating a newly flexible model of collectivity. We gain a way of talking about texts that is collective and social, but not necessarily condensed into a single label. For Andrew, the “Werther effect” is less about defining a new genre than about recognizing a new set of relationships between different communities of works. For Hoyt and Richard, machine learning provides a way of talking about the reception of hokku that isn’t limited to formal imitation or to a group of texts obviously “influenced” by specific models. Algorithms help them work outward from clear examples of a literary-historical phenomenon toward a broader penumbra of similarity.

I think this kind of flexibility is one of the most important things digital tools can help us achieve, but I don’t think it’s on many radar screens right now. The reason, I suspect, is that it doesn’t fit our intuitions about computers. We understand that computers can help us with scale (distant reading), and we also get that they can map social networks. But the idea that computers can help us grapple with ambiguity and multiple determination doesn’t feel intuitive. Aren’t computers all about “binary logic”? If I tell my computer that this poem both is and is not a haiku, won’t it probably start to sputter and emit smoke?

Well, maybe not. And actually I think this is a point that should be obvious but just happens to fall in a cultural blind spot right now. The whole point of quantification is to get beyond binary categories — to grapple with questions of degree that aren’t well-represented as yes-or-no questions. Classification algorithms, for instance, are actually very good at shades of gray; they can express predictions as degrees of probability and assign the same text different degrees of membership in as many overlapping categories as you like. So I think it should feel intuitive that a quantitative approach to literary history would have the effect of loosening up categories that we now tend to treat too much as homogenous bodies. If you need to deal with gradients of difference, numbers are your friend.
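A minimal sketch of what “degrees of membership” can look like in practice, assuming one independent logistic scorer per category. The feature names, weights, and categories below are invented for illustration; they are not drawn from any of the papers discussed here.

```python
import math

def membership(features, weights, bias):
    """Logistic score in (0, 1): a degree of membership, not a yes-or-no verdict."""
    score = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical hand-set models; a real classifier would learn these from labeled texts.
MODELS = {
    "haiku": ({"three_lines": 1.0, "seasonal_word": 1.5, "seventeen_syllables": 2.0}, -2.0),
    "free_verse": ({"three_lines": -0.5, "seventeen_syllables": -1.5}, 0.8),
}

# A three-line poem with a seasonal word that does not keep the 5-7-5 count.
poem = {"three_lines": 1.0, "seasonal_word": 1.0, "seventeen_syllables": 0.0}

scores = {label: membership(poem, w, b) for label, (w, b) in MODELS.items()}
for label, p in scores.items():
    print(f"{label}: {p:.2f}")
```

Because each category has its own scorer, the memberships do not have to sum to one. The same poem can be somewhat haiku-like and somewhat free-verse-like at once, and nothing emits smoke.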

Of course, how exactly this is going to work remains an open question. Technically, the papers we heard today approach the problem of similarity in different ways. Hoyt and Richard are borrowing machine learning algorithms that use the contrast between groups of texts to define similarity. Andrew’s improvising a different approach that uses a single work to define a set of features that can then be used to organize other works as an “exotext.” And other scholars have approached the same problem in other ways. Franco Moretti’s chapter on “Trees” also bridges the gap I’m talking about between individual examples and coherent discourses; he does it by breaking the genre of detective fiction up into a tree of differentiations. It’s not a computational approach, but for some problems we may not need computation. Matt Jockers, on the other hand, has a chapter on “influence” in Macroanalysis that uses topic modeling to define global criteria of similarity for nineteenth-century novels. And I could go on: Sara Steger, for instance, has done work on sentimentality in the nineteenth-century novel that uses machine learning in a loosely analogous way to think about the affective dimension of genre.

The differences between these projects are worth discussing, but in this response I’m more interested in highlighting the common impulse they share. While these projects explore specific problems in literary history, they can also be understood as interventions in literary theory, because they’re all attempting to rethink certain basic concepts we use to organize literary-historical narrative. Andrew’s concept of the “exotext” makes this theoretical ambition most overt, but I think it’s implicit across a range of projects. For me the point of the enterprise, at this stage, is to brainstorm flexible alternatives to our existing, slightly clunky, models of literary collectivity. And what I find exciting at the moment is the sheer proliferation of alternatives.


OT: politics, social media, and the academy.

Post author By tedunderwood
Post date May 6, 2012

This is a blog about text mining, but from time to time I’m going to allow myself to wander off topic, briefly. At the moment, I think social media are adding a few interesting twists to an old question about the relationship between academic politics and politics-politics.

Protests outside the Wisconsin State Capitol, Feb 2011.

It’s perhaps never a good idea to confuse politics with communicative rationality. Recently, in the United States, it isn’t even clear that all parties share a minimal respect for democratic norms. One side is willing to obstruct the right to vote, to lie about scientifically ascertainable fact, and to convert US attorneys (when they’re in power) into partisan enforcers. In circumstances like this, observers of good faith don’t need to spend a lot of time “debating” politics, because the other side isn’t debating. The only thing worth debating is how to fight back. And in a fight, dispassionate self-criticism becomes less important than solidarity.

Personally, I don’t mind a good fight with clearly-drawn moral lines. But this same clarity can be a bad thing for the academy. Dispassionate debate is what our institution is designed to achieve. If contemporary political life teaches us that “debate” is usually a sham, staged to produce an illusion of equivalence between A) fact and B) bullshit — then we may start to lose faith in our own guiding principles.

This opens up a whole range of questions. But maybe the most interesting question for likely readers of this blog will involve the role of social media. I think the web has proven itself a good tool for grassroots push-back against corporate power; we’re all familiar with successful campaigns against SOPA and Susan G. Komen. But social media also work by harnessing the power of groupthink. “Click like.” “Share.” “Retweet.” This doesn’t bother me where politics itself is concerned; political life always entails a decision to “hang together or hang separately.”

But I’m uneasy about extending the same strategy to academic politics, because our main job, in the academy, is debate rather than solidarity. I hesitate to use Naomi Schaefer Riley’s recent blog post to the Chronicle as an example, because it’s not in any sense a model of the virtues of debate. It was a hastily-tossed-off, sneering attack on junior scholars that failed to engage in any depth with the texts it attacked. Still, I’m uncomfortable when I see academics harnessing the power of social media to discourage The Chronicle from publishing Riley.

There was, after all, an idea buried underneath Riley’s sneers. It could have been phrased as a question about the role of politics in the humanities. Political content has become more central to humanistic research, at the same time as actual political debate has become less likely (for reasons sketched above). The result is that a lot of dissertations do seem to be proceeding toward a predetermined conclusion. This isn’t by any means a problem only in Black Studies, and Riley’s reasons for picking on Black Studies probably won’t bear close examination.

Still, I’m not persuaded that we would improve the academy by closing publications like the Chronicle to Riley’s kind of critique. Attacks on academic institutions can raise valid questions, even when they are poorly argued, sneering, and unfair. (E.g., I wouldn’t be writing this blog post if it weren’t for the outcry over Riley.) So in the end I agree with Liz McMillen’s refusal to take down the post.

But this particular incident is not of great significance. I want to raise a more general question about the role that technologies of solidarity should play in academic politics. We’ve become understandably cynical about the ideal of “open debate” in politics and journalism. How cynical are we willing to become about its place in academia? It’s a question that may become especially salient if we move toward more public forms of review. Would we be comfortable, for instance, with a petition directed at a particular scholarly journal, urging them not to publish article(s) by a particular author?

[11 p.m. May 6th: This post was revised after initial publication, mainly for brevity. I also made the final paragraph a little more pointed.]

[Update May 7: The Chronicle has asked Schaefer Riley to leave the blog. It’s a justifiable decision, since she wrote a very poorly argued post. But it also does convince me that social media are acquiring a new power to shape the limits of academic debate. That’s a development worth watching.]

[Update May 8th: Kevin Drum at Mother Jones weighs in on the issue.]


What kinds of “topics” does topic modeling actually produce?

Post author By tedunderwood
Post date April 1, 2012

I’m having an interesting discussion with Lisa Rhody about the significance of topic modeling at different scales that I’d like to follow up with some examples.

I’ve been doing topic modeling on collections of eighteenth- and nineteenth-century volumes, using volumes themselves as the “documents” being modeled. Lisa has been pursuing topic modeling on a collection of poems, using individual poems as the documents being modeled.

The math we’re using is probably similar. I believe Lisa is using MALLET. I’m using a version of Latent Dirichlet Allocation that I wrote in Java so I could tinker with it.

But the interesting question we’re exploring is this: How does the meaning of LDA change when it’s applied to writing at different scales of granularity? Lisa’s documents (poems) are a typical size for LDA: this technique is often applied to identify topics in newspaper articles, for instance. This is a scale that seems roughly in keeping with the meaning of the word “topic.” We often assume that the topic of written discourse changes from paragraph to paragraph, “topic sentence” to “topic sentence.”

By contrast, I’m using documents (volumes) that are much larger than a paragraph, so how is it possible to produce topics as narrowly defined as this one?

This is based on a generically diverse collection of 1,782 19c volumes, not all of which are plotted here (only the volumes where the topic is most prominent are plotted; the gray line represents an aggregate frequency including unplotted volumes). The most prominent words in this topic are “mother, little, child, children, old, father, poor, boy, young, family.” It’s clearly a topic about familial relationships, and more specifically about parent-child relationships. But there aren’t a whole lot of books in my collection specifically about parent-child relationships! True, the most prominent books in the topic are A. F. Chamberlain’s The Child and Childhood in Folk Thought (1896) and Alice Morse Earle’s Child Life in Colonial Days (1899), but most of the rest of the prominent volumes are novels by, for instance, Catharine Sedgwick, William Thackeray, Louisa May Alcott, and so on. Since few novels are exclusively about parent-child relations, how can the differences between novels help LDA identify this topic?

The answer is that the LDA algorithm doesn’t demand anything remotely like a one-to-one relationship between documents and topics. LDA uses the differences between documents to distinguish topics — but not by establishing a one-to-one mapping. On the contrary, every document contains a bit of every topic, although it contains them in different proportions. The numerical variation of topic proportions between documents provides a kind of mathematical leverage that distinguishes topics from each other.

The implication of this is that your documents can be considerably larger than the kind of granularity you’re trying to model. As long as the documents are small enough that the proportions between topics vary significantly from one document to the next, you’ll get the leverage you need to discriminate those topics. Thus you can model a collection of volumes and get topics that are not mere “subject classifications” for volumes.
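To make the point about proportions concrete, here is a minimal sketch of LDA fit by collapsed Gibbs sampling in plain Python. It is not my Java implementation, and it is not MALLET; the toy “volumes,” the function name `lda_gibbs`, and the hyperparameter values are assumptions chosen only for illustration. What it shows is the shape of the output: one row of topic proportions per document, with every topic getting a nonzero share because of the smoothing prior.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.5, beta=0.1, seed=0):
    """Return per-document topic proportions from a toy collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # all tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight each topic by (its share of this doc) x (its affinity for this word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed proportions: alpha keeps every topic's share above zero.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Hypothetical miniature "volumes": two lean one way each, the third mixes both vocabularies.
TOY_VOLUMES = [
    "mother child father family little boy mother child".split(),
    "love soul death heart earth god love soul".split(),
    "mother child love soul father heart family death".split(),
]

theta = lda_gibbs(TOY_VOLUMES, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each printed row sums to one. It is the variation in those rows from one volume to the next that gives the model its leverage, which is why a document can be much larger than the grain of the topics you hope to find.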

Now, in the comments to an earlier post I also said that I thought “topic” was not always the right word to use for the categories that are produced by topic modeling. I suggested that “discourse” might be better, because topics are not always unified semantically. This is a place where Lisa starts to question my methodology a little, and I don’t blame her for doing so; I’m making a claim that runs against the grain of a lot of existing discussion about “topic modeling.” The computer scientists who invented this technique certainly thought they were designing it to identify semantically coherent “topics.” If I’m not doing that, then, frankly, am I using it right? Let’s consider this example:

This is based on the same generically diverse 19c collection. The most prominent words are “love, life, soul, world, god, death, things, heart, men, man, us, earth.” Now, I would not call that a semantically coherent topic. There is some religious language in there, but it’s not about religion as such. “Love” and “heart” are mixed in there; so are “men” and “man,” “world” and “earth.” It’s clearly a kind of poetic diction (as you can tell from the color of the little circles), and one that increases in prominence as the nineteenth century goes on. But you would be hard pressed to identify this topic with a single concept.

Does that mean topic modeling isn’t working well here? Does it mean that I should fix the system so that it would produce topics that are easier to label with a single concept? Or does it mean that LDA is telling me something interesting about Victorian poetry — something that might be roughly outlined as an emergent discourse of “spiritual earnestness” and “self-conscious simplicity”? It’s an open question, but I lean toward the latter alternative. (By the way, the writers most prominently involved here include Christina Rossetti, Algernon Swinburne, and both Brownings.)

In an earlier comment I implied that the choice between “semantic” topics and “discourses” might be aligned with topic modeling at different scales, but I’m not really sure that’s true. I’m sure that the document size we choose does affect the level of granularity we’re modeling, but I’m not sure how radically it affects it. (I believe Matt Jockers has done some systematic work on that question, but I’ll also be interested to see the results Lisa gets when she models differences between poems.)

I actually suspect that the topics identified by LDA probably always have the character of “discourses.” They are, technically, “kinds of language that tend to occur in the same discursive contexts.” But a “kind of language” may or may not really be a “topic.” I suspect you’re always going to get things like “art hath thy thou,” which are better called a “register” or a “sociolect” than they are a “topic.” For me, this is not a problem to be fixed. After all, if I really want to identify topics, I can open a thesaurus. The great thing about topic modeling is that it maps the actual discursive contours of a collection, which may or may not line up with “concepts” any writer ever consciously held in mind.

Computer scientists don’t understand the technique that way.* But on this point, I think we literary scholars have something to teach them.

On the collective course blog for English 581 I have some other examples of topics produced at a volume level.

*[UPDATE April 3, 2012: Allen Riddell rightly points out in the comments below that Blei’s original LDA article is elegantly agnostic about the significance of the “topics” — which are at bottom just “latent variables.” The word “topic” may be misleading, but computer scientists themselves are often quite careful about interpretation.]

Documentation / open data:
I’ve put the topic model I used to produce these visualizations on github. It’s in the subfolder 19th150topics under folder BrowseLDA. Each folder contains an R script that you run; it then prompts you to load the data files included in the same folder, and allows you to browse around in the topic model, visualizing each topic as you go.

I have also pushed my Java code for LDA up to github. But really, most people are better off with MALLET, which is infinitely faster and has hyperparameter optimization that I haven’t added yet. I wrote this just so that I would be able to see all the moving parts and understand how they worked.

Uncategorized

Digital humanities and the spy business.

Post author By tedunderwood
Post date June 30, 2011
3 Comments on Digital humanities and the spy business.

lego-spy — Flickr / dunechaser (Creative Commons)

I’m surprised more digital humanists haven’t blogged the news that the US Intelligence Advanced Research Projects Activity (IARPA) wants to fund techniques for mining and categorizing metaphors.

The stories I’ve read so far have largely missed the point of the program. They focus instead on the amusing notion that the government “fancies a huge metaphor repository.” And it’s true that the program description reads a bit like a section of English 101 taught by the men from Dragnet. “The Metaphor Program will exploit the fact that metaphors are pervasive in everyday talk and reveal the underlying beliefs and worldviews of members of a culture.” What is “culture,” you ask? Simply refer to section 1.A.3., “Program Definitions”: “Culture is a set of values, attitudes, knowledge and patterned behaviors shared by a group.”

This seems accurate enough, although the combination of precision and generality does feel a little freaky. “Affect is important because it influences behavior; metaphors have been associated with affect.”

The program announcement is similarly precise about the difference between metaphor and metonymy. (They’re not wild about metonymy.)

(3) Figurative Language: The only types of figurative language that are included in the program are metaphors and metonymy.
• Metonymy may be proposed in addition to but not instead of metaphor analysis. Those interested in metonymy must explain why metonymy is required, what metonymy adds to the analysis and how it complements the proposed work on metaphors.

All this is fun, but the program also has a purpose that hasn’t been highlighted by most of the reporting I’ve seen. The second phase of the program will use statistical analysis of metaphors to “characterize differing cultural perspectives associated with case studies of the types of interest to the Intelligence Community.” One can only speculate about those types, but I imagine that we’re talking about specific political movements and religious groups. The goal is ostensibly to understand their “cultural perspectives,” but it seems quite possible that an unspoken, longer-term goal might involve profiling and automatically identifying members of demographic, vocational, or political groups. (IARPA has inherited some personnel and structures once associated with John Poindexter’s Total Information Awareness program.) The initial phase of the metaphor-mining is going to focus on four languages: “American English, Iranian Farsi, Russian Russian and Mexican Spanish.”

Naturally, my feelings are complex. Automatically extracting metaphors from text would be a neat trick, especially if you also distinguished metaphor from metonymy. (You would have to know, for instance, that “Oval Office” is not a metaphor for the executive branch of the US government.) [UPDATE: Arno Bosse points out that Brad Pasanek has in fact been working on techniques for automatic metaphor extraction, and has developed a very extensive archive. Needless to say, I don’t mean to associate Brad with the IARPA project.]

Going from a list of metaphors to useful observations about a “cultural perspective” would be an even neater trick, and I doubt that it can be automated. My doubts on that score are the main source of my suspicion that the actual deliverable of the grant will turn out to be profiling. That may not be the intended goal. But I suspect it will be the deliverable because I suspect that it’s the part of the project researchers will get to work reliably. It probably is possible to identify members of specific groups through statistical analysis of the metaphors they use.

On the other hand, I don’t find this especially terrifying, because it has a Rube Goldberg indirection to it. If IARPA wants to automatically profile people based on digital analysis of their prose, they can do that in simpler ways. The success of stylometry indicates that you don’t need to understand the textual features that distinguish individuals (or groups) in order to make fairly reliable predictions about authorship. It may well turn out that people in a particular political movement overuse certain prepositions, for reasons that remain opaque, although the features are reliably predictive. I am confident, of course, that intelligence agencies would never apply a technique like this domestically.
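To make the stylometric point concrete, here is a minimal sketch of attribution by function-word frequency. Everything in it is an assumption for illustration: the word list, the two toy "group" samples, and the nearest-profile rule are invented, and real stylometry uses far larger samples and more careful distance measures. The point is only that the code never needs to know why "upon" or "on" is distinctive.

```python
from collections import Counter
import math

# Hypothetical feature set: a handful of common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "upon", "on", "by"]

def profile(text):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def attribute(unknown, known):
    """Return the label whose sample profile is closest to the unknown text."""
    target = profile(unknown)
    return min(known, key=lambda label: distance(target, profile(known[label])))

# Toy samples invented for illustration only.
known = {
    "group_a": "we rely upon the word of the elders and upon the law",
    "group_b": "we depend on the word of the elders and on the law",
}
print(attribute("they insist upon the rights of the people", known))
```

The classifier here is deliberately opaque about meaning: it compares counts of small words and nothing else, which is roughly why this route to profiling is simpler than metaphor analysis.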

Postscript: I should credit Anna Kornbluh for bringing this program to my attention.

Uncategorized

Why humanists need to understand text mining.

Post author By tedunderwood
Post date June 29, 2011
8 Comments on Why humanists need to understand text mining.

Humanists are already doing text mining; we’re just doing it in a theoretically naive way. Every time we search a database, we use complex statistical tools to sort important documents from unimportant ones. We don’t spend a lot of time talking about this part of our methodology, because search engines hide the underlying math, making the sorting process seem transparent.

But search is not a transparent technology: search engines make a wide range of different assumptions about similarity, relevance, and importance. If (as I’ve argued elsewhere) search engines’ claim to identify obscure but relevant sources has powerfully shaped contemporary historicism, then our critical practice has come to depend on algorithms that other people write for us, and that we don’t even realize we’re using. Humanists quite properly feel that humanistic research ought to be shaped by our own critical theories, not by the whims of Google. But that can only happen if we understand text mining well enough to build — or at least select — tools more appropriate for our discipline.

altavista460 — The AltaVista search page, circa 1996. This was the moment to freak out about text mining.

This isn’t an abstract problem; existing search technology sits uneasily with our critical theory in several concrete ways. For instance, humanists sometimes criticize text mining by noting that words and concepts don’t line up with each other in a one-to-one fashion. This is quite true: but it’s a critique of humanists’ existing search practices, not of embryonic efforts to improve them. Ordinary forms of keyword search are driven by individual words in a literal-minded way; the point of more sophisticated strategies — like topic modeling — is precisely that they pay attention to looser patterns of association in order to reflect the polysemous character of discourse, where concepts always have multiple names and words often mean several different things.
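A toy sketch may help show the difference. The four "documents," the stopword list, and the function names below are invented for illustration, and the second function is a crude one-step co-occurrence expansion, not topic modeling. It only gestures at what "looser patterns of association" buy you: a document that names the concept differently can still be found.

```python
# Toy corpus invented for illustration only.
docs = [
    "the melancholy poet walked alone in sorrow",
    "spleen and sorrow weighed on the poet",
    "the spleen is a vapour of the idle",
    "the merchant counted his coins",
]
STOPWORDS = {"the", "in", "and", "on", "is", "a", "of", "his"}

def keyword_search(term, docs):
    """Literal-minded search: indexes of documents containing the exact word."""
    return [i for i, d in enumerate(docs) if term in d.split()]

def expanded_search(term, docs, stopwords):
    """Also return documents sharing a content word with the literal hits."""
    neighbours = set()
    for i in keyword_search(term, docs):
        neighbours |= set(docs[i].split()) - stopwords
    return [i for i, d in enumerate(docs) if neighbours & set(d.split())]

print(keyword_search("melancholy", docs))
print(expanded_search("melancholy", docs, STOPWORDS))
```

The literal search finds only the first document; the expanded search also reaches the second, which talks about "spleen" and "sorrow" without ever using the word "melancholy."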

Perhaps more importantly, humanists have resigned themselves to a hermeneutically naive approach when they accept the dart-throwing game called “choosing search terms.” One of the basic premises of historicism is that other social forms are governed by categories that may not line up with our own; to understand another place or time, a scholar needs to begin by eliciting its own categories. Every time we use a search engine to do historical work we give the lie to this premise by assuming that we already know how experience is organized and labeled in, say, seventeenth-century Spain. That can be a time-consuming assumption, if our first few guesses turn out to be wrong and we have to keep throwing darts. But worse, it can be a misleading assumption, if we accept the first or second set of results and ignore concepts whose names we failed to guess. The point of more sophisticated text-mining techniques — like semantic clustering — is to allow patterns to emerge from historical collections in ways that are (if not absolutely spontaneous) at least a bit less slavishly and minutely dependent on the projection of contemporary assumptions.

I don’t want to suggest that we can dispense with search engines; when you already know what you’re looking for, and what it’s called, a naive search strategy may be the shortest path between A and B. But in the humanities you often don’t know precisely what you’re looking for yet, or what it’s called. And in those circumstances, our present search strategies are potentially misleading — although they remain powerful enough to be seductive. In short, I would suggest that humanists are choosing the wrong moment to get nervous about the distorting influence of digital methods. Crude statistical algorithms already shaped our critical practice in the 1990s when we started relying on keyword search; if we want to take back the reins, each humanist is going to need to understand text mining well enough to choose the tools appropriate for his or her own theoretical premises.

Uncategorized

A bit more on the tension between heuristic and evidentiary methods.

Post author By tedunderwood
Post date February 10, 2011
1 Comment on A bit more on the tension between heuristic and evidentiary methods.

Just a quick link to this post at cliotropic (h/t Dan Cohen), which dramatizes what’s concretely at stake in the tension I was describing earlier between heuristic and evidentiary applications of technology.

Shane Landrum reports that historians on the job market may run into skeptical questions from social scientists — who apparently don’t like to see visualization used as a heuristic. They call it “fishing for a thesis.”

I think I understand the source of the tension here. In a discipline focused on the present, where evidence can be produced essentially at will, a primary problem that confronts researchers is that you can prove anything if you just keep rolling the dice often enough. “Fishing expeditions” really are a problem for this kind of enterprise, because there’s always going to be some sort of pattern in your data. If you wait to define a thesis until you see what patterns emerge, then you’re going to end up crafting a thesis to fit what might be an accidental bounce of the dice in a particular experiment.
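The arithmetic behind "rolling the dice" is short enough to sketch. The assumptions here are mine, not the social scientists': each look at the data is treated as an independent test, and 0.05 is used as the conventional significance threshold. Under those assumptions the chance of finding at least one spurious pattern is 1 - (1 - alpha)^n.

```python
def chance_of_spurious_pattern(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    crosses the significance threshold alpha purely by chance."""
    return 1 - (1 - alpha) ** num_tests

# How quickly the risk grows as you keep looking for patterns.
for n in (1, 10, 20, 100):
    print(n, round(chance_of_spurious_pattern(n), 3))
```

With twenty looks at the data the odds of at least one accidental "finding" are already better than even, which is why a thesis defined after the fact looks suspect in an experimental discipline.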

Obviously history and literary studies are engaged in a different sort of enterprise, because our archives are for the most part fixed. We occasionally discover new documents, but history as a whole isn’t an experiment we can repeat, so we’re not inclined to view historical patterns as things that “might not have statistical significance.” I mean, of course in a sense all historical patterns may have been accidents. But if they happened, they’re significant — the question of whether they would happen again if we repeated the experiment isn’t one that we usually spend much time debating. So “fishing for patterns” isn’t usually something that bothers us; in fact, we’re likely to value heuristics that help us discover them.

19c 20c ngrams Uncategorized

The rise of a sensory style?

Post author By tedunderwood
Post date December 21, 2010
No Comments on The rise of a sensory style?

I ended my last post, on colors, by speculating that the best explanation for the rise of color vocabulary from 1820 to 1940 might simply be “a growing insistence on concrete and vivid sensory detail.” Here’s the graph once again to illustrate the shape of the trend.

EnglishFictionColors620 — blue, red, green, yellow, in the English fiction corpus, 1800-2000

It occurred to me that one might try to confirm this explanation by seeing what happened to other words that describe fairly basic sensory categories. Would words like “hot” and “cold” change in strongly correlated ways, as the names of primary colors did? And if so, would they increase in frequency across the same period from 1820 to 1940?

The results were interesting.

HotCold620 — cold, hot, in the English fiction corpus, 1800-2000

“Hot” and “cold” track each other closely. There is indeed a low around 1820 and a peak around 1940. “Cold” increases by about 60%, “hot” by more than 100%.

CoolWarm620 — cool, warm, in the English fiction corpus, 1800-2000

“Warm” and “cool” are also strongly correlated, increasing by more than 50%, with a low around 1820 and a high around 1940 — although “cool” doesn’t decline much from its high, probably because the word acquires an important new meaning related to style.

WetDry620 — wet, dry, in the English fiction corpus, 1800-2000

“Wet” and “dry” correlate strongly, and they both double in frequency. Once again, a low around 1820 and a peak around 1940, at which point the trend reverses.
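For readers who want to check claims like these against their own exports from the ngram viewer, here is a rough sketch of the two measurements involved: how closely two series track each other (Pearson correlation) and how far a word rises from its low to its peak. The numbers below are made up purely to exercise the functions; they are not the values behind the graphs above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def percent_increase(low, high):
    """Percentage rise from a low value to a high value."""
    return 100 * (high - low) / low

# Invented frequencies (say, per million words) at five sample dates.
# These are NOT real ngram data.
wet = [20.0, 26.0, 33.0, 40.0, 34.0]
dry = [30.0, 38.0, 50.0, 60.0, 52.0]

print(round(pearson(wet, dry), 3))
print(percent_increase(min(wet), max(wet)))
```

A word that "doubles in frequency" shows a 100% increase by this measure, and a correlation near 1 is what "track each other closely" amounts to numerically.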

There’s a lot of room for further investigation here. I think I glimpse a loosely similar pattern in words for texture (hard/soft and maybe rough/smooth), but it’s not clear whether the same pattern will hold true for the senses of smell, hearing, or taste.

More crucially, I have absolutely no idea why these curves head up in 1820 and reverse direction in 1940. To answer that question we would need to think harder about the way these kinds of adjectives actually function in specific works of fiction. But it’s beginning to seem likely that the pattern I noticed in color vocabulary is indeed part of a broader trend toward a heightened emphasis on basic sensory adjectives — at least in English fiction. I’m not sure that we literary critics have an adequate name for this yet. “Realism” and “naturalism” can only describe parts of a trend that extends from 1820 to 1940.

More generally, I feel like I’m learning that the words describing different poles or aspects of a fundamental opposition often move up or down as a unit. The whole semantic distinction seems to become more prominent or less so. This doesn’t happen in every case, but it happens too often to be accidental. Somewhere, Claude Lévi-Strauss can feel pretty pleased with himself.

Uncategorized

Efficiency and pleasure

Okay, I’ve already spilled some ink railing against this application of the ngram viewer — using it to stage contests between abstract terms. In fact, I actually made this graph as a joke. But then, I found myself hypnotized by the apparent inverse correlation between the two curves in the 20c. So … shoot … here it is.

EfficiencyPleasure — efficiency, pleasure, in English corpus, 1820-2000

I have to admit that at first glance it appears that Taylorist discourse about efficiency in the 20th century (and perhaps the pressures of war) correlated closely with a sort of embarrassment about mentioning pleasure. But for now, I’m going to treat this kind of contrast the way physicists treat claims about cold fusion. It may be visually striking, but we should demand more confirmation before we treat the correlation as meaningful. When you hold genre constant, by restricting the search to fiction, the correlation is a little less striking, so it may be at least partly a fluctuation in the genres that got published, rather than a fluctuation in underlying patterns of expression.

In any case, there’s a broad decline in “pleasure” from beginning to end that Frederick W. Taylor can hardly explain. To understand that, we still have to consult Lionel Trilling on “The Fate of Pleasure,” and perhaps Thomas Carlyle on “The Gospel of Work.”