Taming the Wild Beast

Orin Hargraves

It would be going too far to subtitle these remarks "The Tyranny of the Internet," but for anyone whose tracking of new words predates the emergence of that great extension of our mental reach, the job of dealing with neologisms in dictionaries these days very often seems to revolve around a common pattern: an inordinate amount of futzing around online without very satisfying results to show for it. What are to be the limits and parameters of influence of a tool that is now indispensable to work with, but that does not easily yield itself to mastery? This is a question I wrestle with at the micro level very often when trying to pin down the merits of a particular neologism as a candidate to appear in a particular dictionary; perhaps this corner of Dictionaries is an appropriate forum in which to generalize the problem. The Internet lurks somewhere, from periphery to core, in nearly every aspect of neology today: it not only spawns and propagates a huge number of neologisms, but also serves as one of the main vehicles by which we find them, track them, and document their distribution, frequency, and forms. There is no innovation in English today that is not in some way mirrored on the Internet. It is the "in some way" that is the wildcard, the indeterminate and highly variable factor. The Internet (or, to be more accurate, the World Wide Web) does indeed mirror every aspect of English and the development of the English lexicon today; but in some areas of language it is a resonator, while in others it offers only an occluded reflection. It is often left to the lexicographer to decide, for a particular word, what sort of mirror the online lexicon offers, and the tools that would enable us to make this decision wisely are, to my knowledge, not well developed.
Dictionaries: Journal of the Dictionary Society of North America 28 (2007), 139–141

These days all the respectable dictionary publishers provide a corpus for their editors to work with. "The Internet is not a corpus" is the familiar mantra, and the corpus, therefore, serves to correct the shortcomings of Internet searches by being more balanced, more inclusive of obscure or digitally neglected corners of English, and less overloaded with the technological or informal or commercial language that skews Internet searches. But it is interesting that along with the guidelines that publishers provide for the use of their corpora, there is usually a plan B: a set of unofficial guidelines for researching frequency and usage via Google when the corpus gives null, counterintuitive, ambiguous, or otherwise unusable results. No corpus is ever as up-to-the-minute as a Google News or Google Blog Search, and no one can resist the ease with which a hunch about new language can be put to the test using a state-of-the-art search engine that has been developed with hundreds of thousands of big-brained man-hours. So we do, of course, take advantage of the Google search box. Sometimes we get quick-and-easy results that confirm or disconfirm what corpus results or our intuitions may have suggested about a particular new word or usage. But it also often happens (as often as not, I think) that search results are skewed by obvious noise and imbalance. We then start down the tortuous path of refining the search: limit it to this or that domain/file format/time period/country? Confine it to a particular inflection? Disregard particular collocates, or insist on the presence of others? The search string grows longer, the results more desultory.
We long to be able to use natural expressions in the search string; we scratch our heads to find a way to eliminate the obvious noise, without which, we are sure, the imagined usage of our search target would appear in resplendent glory. Another frequently agonized-about feature of incorporating online (or electronic corpus) data into inclusion decisions is what we might call the numbers game. Publishers often develop back-of-the-envelope algorithms for determining what sort of numbers constitute sufficient frequency to boost a...
