Reviewed by:
  • Programming for linguists: Perl for language researchers
  • Tom Cobb
Michael Hammond. Programming for linguists: Perl for language researchers. Oxford: Blackwell. 2003. Pp. x + 219, US$74.95 (hardcover), $39.95 (softcover).

Most people involved in the systematic study of language, whether theoretical linguists, applied linguists, or language educators, are probably already using computation in their work, or wish they knew how they could. The sheer volume and variety of data relevant to any linguistic question makes our field especially suited to computational analysis, and indeed it can be argued that the computer is to language study what the microscope was to biology or the telescope to astronomy—a vital tool heralding a late entry into the empirical age.

Few at present would attempt to produce a professional document without making some use of their word processor's spell-checking, word-counting, or find-and-replace features, but most linguists are aware that the computer's potential extends well beyond this. The problem is how to realize this potential. A common first step for many is to switch, for some purposes, from an all-purpose word processing program (such as Microsoft Word) to a specialized text editor (such as BBEdit for Macintosh or Textpad for PC) which trades text formatting for text processing. These text editors can handle files of more than a million words, or handle several files at once, but their main advantage is that they can find and replace using "regular expressions", or regexes. A simple example is that, in a regex search pattern, a full stop becomes a wildcard, so that searching a text for h.t will locate every hat, hit, hot, or hut; square brackets indicate a constrained wildcard, so that searching for h[aeiou][a-z] matches hat, hip, hit, hop, or hut; and a caret (^) placed first inside square brackets indicates a negative wildcard, so that searching for h[^u]t matches hat, hit, and hot, but not hut.
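
The three patterns just described can also be tried outside a text editor. What follows is a minimal sketch in Python rather than Perl, using Python's standard re module; the sample string and the variable names are invented for illustration and do not come from the book under review.

```python
import re

# Invented sample text containing the six words discussed above.
text = "hat hit hot hut hip hop"

# A full stop matches any single character.
dot_hits = re.findall(r"h.t", text)

# Square brackets restrict the wildcard to the listed characters:
# a vowel, followed by any lowercase letter.
vowel_hits = re.findall(r"h[aeiou][a-z]", text)

# A caret placed first inside the brackets negates the set:
# any single character except u.
not_u_hits = re.findall(r"h[^u]t", text)

print(dot_hits)
print(vowel_hits)
print(not_u_hits)
```

Note that h.t and h[^u]t are broader than the word lists suggest: either would also match a string such as h9t, since the wildcard is not limited to letters.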

To take a real-life example of regular expressions in the workplace: as an instructor I once wanted to change the format of the one-million-word Brown corpus so that I could present parts of it to first-year students, minus some of the distracting coding of the original. The Brown corpus is broken into separate lines, each beginning with a set of codes indicating its location in the corpus and its source.

A01 0010 1 The Fulton County Grand Jury said Friday an investigation

A01 0020 1 of Atlanta's recent primary election produced "no evidence"

A01 0020 9 that any irregularities took place.

A01 0030 5 The jury further said in term-end presentments that

A01 0040 3 the City Executive Committee, which had over-all charge

A01 0050 2 of the election, "deserves the praise and thanks of

A01 0050 11 the City of Atlanta" for the manner in which the election

A01 0060 11 was conducted.

First, MS Word could not even open the entire corpus, which my text editor (Textpad) could do easily. Second, through a careful perusal of Textpad's Help files, I came up with a regular expression that would remove all the coding from the entire corpus in a few minutes. This was to replace \n.\{15\} with nothing, that is to say, to delete the first 15 characters (each matched by a full stop) at the beginning of every line (marked by \n).
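
The same clean-up can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Textpad expression itself: the function name strip_codes is hypothetical, the sample lines are padded so that the codes occupy exactly 15 characters, and the pattern anchors on the start of each line (^) instead of on \n so that the line breaks are kept.

```python
import re


def strip_codes(text, width=15):
    """Delete the first `width` characters of every line (hypothetical helper)."""
    # With re.MULTILINE, ^ matches at the start of each line, so the
    # newline characters themselves are left in place.
    return re.sub(rf"^.{{{width}}}", "", text, flags=re.MULTILINE)


# Assumed layout: location codes padded with spaces to 15 characters.
lines = [
    ("A01 0010  1", "The Fulton County Grand Jury said Friday an investigation"),
    ("A01 0020  1", "of Atlanta's recent primary election produced \"no evidence\""),
    ("A01 0020  9", "that any irregularities took place."),
]
sample = "\n".join(code.ljust(15) + words for code, words in lines)

cleaned = strip_codes(sample)
print(cleaned)
```

Lines shorter than the assumed width are left untouched, because the pattern requires a full 15 characters before it will match.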

The source of these and many more powerful regexes is the programming language Perl (invented by Larry Wall in the 1980s; the name is an acronym for Practical Extraction and Report Language). For the linguist who has moved to a text editor and wishes to exploit the powers of the computer further, Perl is the next logical step, and there is no better place to begin than Michael Hammond's Programming for linguists.

It is curious that a book on Perl "for linguists" is needed at all, since Perl was basically made for text processing and manipulation (most languages prioritize number crunching rather than text crunching). The reason a dedicated volume makes sense is that Perl has become such a widely used language (because it is cross-platform or runs on any computer, it is free...
