The Once and Future Thing: A Response to Stephen Burt
A few years ago, I found myself in conversation with the digital programs staff at the Poetry Foundation about the question of how best to design a “recommendation engine” for poetry—a tool that would help readers discover new poems to read and love amidst the more than 10,000 works in their online archive.
There are several ways to imagine what such a tool would do and how it ought to work. Some recommendation algorithms work by “collaborative filtering”—making predictions about a person’s preferences based on his or her own past choices and the related choices of other users. This is the kind of system familiar to shoppers at Amazon.com, which informs us at every turn that people who bought X also bought Y in the hopes that we’ll be persuaded to do the same. A collaborative filtering system for poetry (letting us know, for example, that many people who like Wordsworth’s “The Solitary Reaper” also like Wallace Stevens’s “The Idea of Order at Key West”) would have the virtue of being relatively straightforward to design—at least for those who know how to design such things—because it wouldn’t require its designers to know anything about poetry. A system that relates X to Y on the grounds that many people have purchased (or otherwise attested to liking) both doesn’t have to “know” anything at all about what X and Y are like in order to get people’s preferences uncannily right; it just needs enough people using the system to transform the record of individual decisions into robust predictions of taste.
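To make the "people who liked X also liked Y" logic concrete, here is a toy sketch of item-based collaborative filtering in Python. It assumes an invented set of readers and their "likes" (not Poetry Foundation or Amazon data), and the function names `co_occurrence` and `also_liked` are hypothetical. The sketch simply counts how often two poems are liked by the same reader and ranks candidates by that count.

```python
from collections import Counter
from itertools import combinations

# Invented "like" records: each hypothetical reader maps to the poems they liked.
likes = {
    "reader_a": {"The Solitary Reaper", "The Idea of Order at Key West"},
    "reader_b": {"The Solitary Reaper", "The Idea of Order at Key West", "Ode to a Nightingale"},
    "reader_c": {"The Solitary Reaper", "The Idea of Order at Key West"},
    "reader_d": {"The Idea of Order at Key West", "The Snow Man"},
}


def co_occurrence(likes):
    """For each poem, count how many readers liked it together with each other poem."""
    counts = {}
    for poems in likes.values():
        for x, y in combinations(sorted(poems), 2):
            counts.setdefault(x, Counter())[y] += 1
            counts.setdefault(y, Counter())[x] += 1
    return counts


def also_liked(poem, likes, n=3):
    """Return up to n poems most often liked by readers who liked `poem`.

    Nothing here inspects the poems themselves; the ranking is built
    entirely from the record of readers' choices.
    """
    counts = co_occurrence(likes).get(poem, Counter())
    # Most co-likes first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:n]]


print(also_liked("The Solitary Reaper", likes))
# → ['The Idea of Order at Key West', 'Ode to a Nightingale']
```

No field describing what a poem is like appears anywhere in the data; the suggestion comes entirely from overlapping choices. Real systems typically normalize these raw counts (for example, with cosine similarity) so that very popular items do not dominate every list.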
By contrast, “content-based” recommendation engines are based on more or less elaborate descriptions of the Xs and Ys in question. Such systems may not be as powerful right off the bat (because they are restricted to domains of description, they don’t have millions of points of data to coordinate; they also can’t tell you what dish detergent readers of Stevens’s poems tend to prefer), but they are, arguably, more sophisticated at targeting the particulars that motivate our tastes. Once a user starts indicating his or her preferences, providing the seed input, the content-based engine can home in on the specific qualities that those objects share (among those qualities that have been tagged), and use that data to point the reader toward other items that are similar in relevant ways.
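A minimal sketch of the content-based approach, again in Python, may help show how seed choices narrow down to shared qualities. It assumes a tiny, invented set of tags (far smaller than any real taxonomy, and not offered as authoritative readings of the poems) and reads "the specific qualities that those objects share" as the plain intersection of the seed poems' tags. The function `recommend` is hypothetical; a production system would weight qualities and measure similarity more subtly.

```python
# Invented tags from a tiny, hypothetical taxonomy; they illustrate the
# mechanism and are not offered as authoritative readings of the poems.
poem_tags = {
    "The Solitary Reaper": {"singer", "listener", "landscape", "solitude"},
    "The Idea of Order at Key West": {"singer", "listener", "sea", "solitude"},
    "Ode to a Nightingale": {"birdsong", "listener", "mortality"},
    "The Snow Man": {"winter", "listener", "landscape", "solitude"},
}


def recommend(seeds, poem_tags, n=3):
    """Rank unseen poems by how many of the seeds' shared qualities they carry.

    Returns (title, matched_qualities) pairs, so each suggestion comes with
    the tagged qualities that motivated it.
    """
    if not seeds:
        return []
    # Qualities common to every seed poem (only those that have been tagged).
    shared = set.intersection(*(poem_tags[title] for title in seeds))
    scored = []
    for title, tags in poem_tags.items():
        if title in seeds:
            continue  # don't recommend what the reader already chose
        matched = shared & tags
        if matched:
            scored.append((title, sorted(matched)))
    # More shared qualities first; ties broken alphabetically by title.
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored[:n]


seeds = ["The Solitary Reaper", "The Idea of Order at Key West"]
for title, reasons in recommend(seeds, poem_tags):
    print(title, "->", ", ".join(reasons))
```

Because the function returns the matched qualities along with each title, every suggestion arrives with its reasons attached: with these invented tags, "The Snow Man" ranks first because it shares "listener" and "solitude" with both seeds.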
You are most likely to have encountered content-based recommendations in the form of the “Music Genome Project”—an elaborate system of musical description designed to direct listeners to new music on the basis of songs they like. As described by one of its creators, Tim Westergren, the Music Genome consists of a taxonomy of over 450 qualities or “genes,” some subset of which may be found in any piece of music (“About the Music Genome Project”). With the taxonomy in hand, and a team of musicologists trained to wield it consistently (with “precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control”), a system based on the genome could categorize any piece of music in ways that would connect it to many others.
Though you may not recognize the Music Genome Project by name, you may well be familiar with Pandora, the music streaming service based on it. The transformation of project into product is fairly typical: while the musical taxonomy was briefly available online, the company has since closed Pandora’s box, and its contents are now a trade secret. What matters about Pandora as a tool is thus what matters to Pandora as a company: that the sorting engine gives listeners what they want—enough to enjoy the party and pay for the subscription—rather than distracting them with the why. But it seemed to me that the open version of a music genome could well be a powerful pedagogical tool. Given a few seeding choices, a hypothetical Poetry Genome Project might do more than just direct you to other poems that you might like (and didn’t already...