- Arabic corpus linguistics ed. by Tony McEnery, Andrew Hardie, and Nagwa Younis
Arabic is a major language spoken natively by c. 300 million people and is one of the six official languages of the United Nations, yet few corpus linguistics (CL) studies have been devoted to it. Arabic corpus linguistics, edited by Tony McEnery, Andrew Hardie, and Nagwa Younis, is a collection of essays intended to begin redressing the balance. It is aimed at three categories of reader: (i) Arabic corpus linguists (currently a tiny group), (ii) corpus linguists and those in allied fields such as text linguistics who have no experience of working with Arabic data, and (iii) Arabic linguists unfamiliar with CL methods and goals. There are ten chapters: a ground-clearing introduction and nine chapters on specific issues. Chs. 2–6 are in essence methodological, covering ways of creating, making accessible, and analyzing Arabic corpora; Chs. 7–10 exemplify the application of these methods and data sets in current CL research on Arabic.
The introductory chapter, written by the editors, begins with some basic facts about Arabic that are relevant to CL, in particular its root-and-template morphology and the peculiarities of its orthography and of the commonly used romanized transliteration systems. It also offers some observations on salient sociolinguistic issues, such as the language's 'diglossic' nature. It then introduces CL as a method of enquiry and outlines the present state of its application to Arabic as an emerging field. An important focus to emerge in CL research in recent decades has been collocation, facilitated by the development of key-word-in-context (KWIC) concordancing tools applied to large corpora of English (see e.g. Sinclair 1991). Great progress has also been made in building corpora of different types and in corpus annotation (tagging), which permits analyses of different kinds (e.g. grammatical, semantic).
Most of this introductory material is well known, but some of the statements about Arabic are poorly phrased or out of date. On p. 2, Modern Standard Arabic (MSA) is said to be 'the most prestigious spoken form' of the language. But MSA is not a 'spoken form' of Arabic in any normally accepted sense, as no Arab speaks it natively; it is an institutionally (and often imperfectly) learned formal variety. Its main spoken use is in reading aloud prepared written texts such as news bulletins, speeches, lectures, and other formal oral performances, and deviations from the prepared text generally result in a 'mixed' form of Arabic closer to the speaker's normal dialectal speech. As for its 'prestige', MSA certainly carries 'overt prestige' as the 'official' language of all Arab states and the vehicle of a valued literary heritage and of an ancient religious culture (Ferguson 1959). But, as more recent Arabic sociolinguistic research from many locations (Jordan, Iraq, and the Gulf States, among others) has demonstrated, it is rarely if ever the source of (Labovian) 'covert prestige' (i.e. influences below the level of conscious manipulation) in normal speech; that source is usually the dialect of the capital city or of some other center of political power and/or social influence. Examples are the dialects of Cairo in Egypt, Baghdad in Iraq, and Damascus in Syria.
Nor are the oft-repeated dicta that 'Egyptian Arabic is "the most influential Colloquial Arabic"' and that 'from the 1950's onwards, the Egyptian media was predominant in the Arabic speaking world' (2–3) any longer true. Over the past quarter-century or so, the internet and the proliferation of regional Arab electronic media have brought about decentralizing changes to the sociolinguistic landscape of the Arabic-speaking Middle East and North Africa. By any measure, the Egyptian media and film industry no longer rule the roost as they did in the 1950s and 1960s, the heyday of Arab nationalism. Times have changed both geopolitically and linguistically, and there are now many new production centers and media outlets, from the countries of the Maghreb to the Levant to the Gulf...