Data Mining Fiction

Jeffrey R. Di Leo, Editor and Publisher (bio)

You are being read by what you read. Google knows where you read; Apple knows what you read; and Amazon knows how you read. How does it feel to have the object of reading transferred from your book to your self?

Print books were a grossly inefficient means of collecting data on your reading behavior. The composition of your bookshelf was mostly a private matter, as was whether the books on your shelf were being utilized or just collecting dust. Your dogeared pages and annotations were only known by those with direct access to your library.

But digital books tell all.

As Alexandra Alter outlines in a recent Wall Street Journal article, data mining of your eBooks can determine everything from how long it took you to read a page of the novel you just downloaded to how much of it you actually read. When you re-read a page, data is collected; when you highlight a line or make an annotation, data is collected; even when you give up on a book, data is collected. Virtually everything you do with a digital book can be captured by data mining.

We can now acquire more precise information about what books people are buying and who specifically is buying them. But more significantly, we now know much more about how people are using the digital books they acquire.

Kobo, a major provider of eReaders and eBooks, keeps a record of how long it takes each of its eight million users to read their eBooks and whether they finish them. Kobo knows, for example, that readers of George R. R. Martin's A Dance with Dragons read on average about 50 pages per hour—and that most of the readers who started reading this 1,000 page-plus fantasy novel completed it.

With 2.5 million books in its inventory, Kobo holds reading behavior data that is a treasure trove for publishers hungry for more than just sales information. The capacity to track not just what books people buy, but also whether, and how, they read them provides an entirely new dimension to the book industry.

Jim Hilt, vice president for eBooks at Barnes and Noble, which controls over 25 percent of the digital book market through its Nook eReader, says that his company has "more data than we can use."

"The bigger trend we're trying to unearth is where are those drop-offs in certain kinds of books," said Hilt, "and what [we can] do with publishers to prevent that." His company has already used this data to produce "Nook Snaps," short books on nonfiction subjects created in response to data indicating that Nook readers routinely give up on long works of nonfiction.

Scott Turow's books have sold over 25 million copies and have been translated into 20 languages. Still, he claims that he does not know who buys his books. "I once had an argument with one of my publishers," comments Turow, "when I said, 'I've been publishing with you for a long time and you still don't know who buys my books,' and he said, 'Well, nobody in publishing knows that.'"

So what would the president of the Authors Guild do with this information if he had it?

"If you can find out that a book is too long and you've got to be more rigorous in cutting," says Turow, "personally I'd love to get the information."

But should the shaping of written works be decided not by writers and editors, but by the mean of the median of the responses of their eReading public?

The world doesn't need fifty shades of reading-behavior-data-generated fiction—even though I'm sure that sales-hungry publishers would disagree. "If we can help authors create even better books than they create today," comments Hilt, "it's a win for everybody."

Better books are not necessarily books that better meet the expectations of readers. In fact, one could argue that better books are ones that do not meet reader expectations, but rather defy or challenge them. They may not be as "reader-friendly" or sell...

