
  • Digital Change: The Benefits of Control
  • David Jost (bio)

Houghton Mifflin Harcourt has blazed a unique trail in digital publishing. Houghton digitized its dictionary assets in 1969, long before most other publishers even considered doing so. This meant that the dictionary existed in a typesetters’ tape that could be used for other purposes. The first edition of The American Heritage Dictionary of the English Language was very successful and prophetically included in its front matter an essay by Henry Kučera, professor of linguistics and Slavic languages at Brown University, entitled “Computers in Language Analysis and in Lexicography.” Professor Kučera is well known for creating the Brown Corpus, the granddaddy of computer corpora.

In the late 1970s when Houghton Mifflin brought together computational linguists and lexicographers to exploit the dictionary data in electronic form, one of the crucial contributors was Kučera. He helped to devise the spell-checking algorithms that would be vital to the fortunes of Houghton. At the same time Ilya Kaufmann, an expert in spelling correction and information retrieval, contributed a compression algorithm that allowed Houghton to produce a spell checker with a tiny footprint that could fit into the small amount of space available in applications at the time.

Houghton proceeded to create various electronic products in addition to the spell checker, such as a grammar checker and electronic dictionaries and thesauruses in various sizes. Languages besides English were added, ultimately more than twenty in the case of spell checking. The makers of handheld dictionaries and other products, word processors, and other applications that needed linguistic tools beat a path to Houghton’s door. The approach Houghton took was to license the tools to all comers nonexclusively. Inevitably there was a shakeout among the various players in a given industry, and Houghton was left with licenses to those who had succeeded.

The classic instance of this was in the world of word-processing companies. Microsoft was not the only player in word processing but, as we know, Word took over. Houghton’s technology rode the wave. Microsoft in the end was its biggest and most prestigious customer, but it was not the only customer with deep pockets. The initial plunge into the new technologies paid bigger and bigger dividends on various royalty deals, so much so that in 1994 Houghton spun off a company, Inso.

Its dependence on Microsoft as a main customer meant that Inso’s days were numbered, but it was a good ride for Inso and Houghton. The spin-off gave Houghton enough money to help buy D. C. Heath, for example. All these benefits came Houghton’s way because it took the risk of investing in new technology and also maintained control of the work. This meant that Houghton had its own staff who understood the technology and who could see other ways to apply it to new products. It also meant that a publishing company was creating a staff, including programmers, who knew its products, especially its dictionary, well and therefore did not have to rely on outsiders to understand what could be done or how much it should cost. This knowledge base would continue to benefit Houghton down to the present.

Houghton Mifflin had encoded The American Heritage Dictionary itself in SGML, a markup language, in the late 1980s and early 1990s, using its own staff to do the tagging, which also benefited the publishing and licensing of the dictionary products. Microsoft was a customer for this dictionary as well, both in Encarta and in Bookshelf, before embarking on its own dictionary program. Apparently digital control cuts both ways. At any rate, in 1997 Houghton Mifflin decided to bring digital publishing back inside the company. It did not see an ongoing business in tools like spell checking, which had become commoditized, but it did see a continuing market in licensing its reference titles.

The work done for customers of Houghton Mifflin and Inso produced files, programmers, and skills that could continue to be used and further developed. Content files in SGML were developed for various dictionaries and other reference books. Usually these were simply SGML versions of what the...
