University of Hawai'i Press
  • The "Mystery Aspirates" in Philippine Languages

The reconstruction of Proto-Malayo-Polynesian (PMP) *h as a syllable onset continuing Proto-Austronesian (PAn) *S has depended heavily, though not exclusively, on Central Philippine languages, where it is well supported. In coda position, however, the reconstruction of *h has been more difficult. Word-final *h was initially proposed solely on the basis of phonological alternations in Tagalog, and only later shown to be reflected as a coda by Itbayaten of the Batanes Islands, and less securely by Aklanon in the Bisayas. New evidence for PMP *-h continuing PAn *S is introduced here from Central Luzon languages, Bisayan languages, and Mamanwa of Mindanao. In addition, revised material on Aklanon now shows that it, too, preserves coda /h/. However, not all examples of -h in Philippine languages can be traced to PAn *S. Rather, a surprising number of these segments correspond to zero in most previous reconstructions. More problematically, the cognate sets that contain -h involve multiple sound correspondences, raising difficult questions about their origin.


Laryngeal consonants have a long and troubled history in Austronesian (AN) languages (Brandstetter 1916:86, 248, 273, 282–85; Dempwolff 1934–38; Dyen 1953a, 1962, 1965; Tsuchida 1976; Zorc 1982, 1996; Blust 2013:546–53, 567–74). Unlike other segments that have generated controversy in phonological reconstruction, such as *z (Dyen 1951), *R (Dyen 1953b), *b (Prentice 1974), or *d (Dahl 1976:58ff; Ross 1992:40ff), laryngeals in a small number of languages often correspond to zero in the majority of witnesses, raising questions about whether they have been added in a few languages, or lost in all others.

Dempwolff's reconstruction of the Uraustronesisch laryngeals was methodologically flawed, and in correcting it Dyen (1953a) reconstructed Proto-Austronesian (PAn) *q and *h. Because the second of these consonants has sibilant reflexes in a number of Formosan languages, it was later replaced by *S, and then split into six subvarieties, *S1–*S6 (Dyen 1965; Tsuchida 1976). For more than a quarter of a century, these two segments were known as the PAn "laryngeals," even though *q appears to have been a pharyngeal stop and *S an alveolar sibilant (Blust 2013:553ff).

In a paper notable more for its recognition of numerous new protophonemes on the basis of limited data than for its reconstruction of a plausible phonological system, Dyen (1965:301ff) proposed what he called "The *x-reconstructions" (*x1, *x2, *X), and "The H- and Ɂ-correspondences," with *H assigned to three forms and *Ɂ to two. To a large extent, this approach, which posited new protophonemes without concern for possible phonetic correlates, was disfavored, and for nearly two decades most Austronesianists apart from Dyen's own students (for example, Tsuchida 1976) left the matter alone.

As will be seen below, these proposals were developed on the basis of better evidence on the Formosan languages by Tsuchida (1976). In addition, Zorc (1982, 1996) raised the question whether PAn might have had a set of true laryngeals, noting that these are phonetically the weakest consonants, hence those that are most likely to disappear early in the history of a language family. Rather than follow the lead of Dyen (1965), he struck out in a new direction, proposing PAn *-Ɂ on the basis of a set of fairly richly attested sound correspondences different from that for which either *q or Dyen's fragile *-Ɂ was reconstructed. In addition, he proposed PAn *h based on evidence different from that used to justify *S.

Zorc's proposal for *-Ɂ was taken more seriously by scholars interested in the phonology of PAn because it was based on a conventional application of the comparative method, which uses converging lines of evidence as support for new distinctions, rather than basing these entirely on irregularities in a single language, and in some cases in a single form, as was the case with many of the reconstructions in Dyen (1965). However, despite its promising beginnings, the reconstruction of *-Ɂ has proven problematic (Blust 2013:567–74), and although Zorc also considered reflexes of Dyen's *H as refined by Tsuchida (1976), the Philippine evidence he proposed for *-H rarely included languages with an unambiguous coda -h.

The sections that follow address two categories of word-final aspirates: -h from PAn *S, and -h in a handful of languages corresponding to zero in all others. Although the first of these categories involves a relatively straightforward application of the comparative method, the second presents a methodological conundrum, and these instances of -h will, for lack of a better term, be called the "mystery aspirates." Our primary concern, then, will be with the appearance of word-final -h in certain languages of the Philippines where previously their Proto-Malayo-Polynesian (PMP) bases were thought to end with a vowel. But first, to avoid possible confusion, it will be best to review /h/ in Philippine languages as a reflex of PAn *S.2


The PAn sibilant *S is reconstructed in initial, medial, and final positions. In Proto-Malayo-Polynesian, the immediate ancestor of all AN languages outside the island of Taiwan, *S became *h, as shown in (1):


PAn *Sajek                   PMP *hajek 'to sniff, smell'
Saisiyat s<om>azek           Itbayaten harek
Pazeh sa-sazek               Tagalog halík
Kavalan sanek                Bikol hadók
Amis sanek                   Soboyo hayoɁ

PAn *Sasaq                   PMP *hasaq 'to whet, sharpen'
Bunun sasaq                  Tagalog hásaɁ
Paiwan t<m>ataq (< A)        Soboyo hasa

PAn *CuSuR                   PMP *tuhuR 'to string, as beads'
Kavalan tusuR                Itbayaten tohoy
Bunun ma-tusul               Tagalog túhog
Paiwan tsusu                 Hanunóo túhog

PAn *iSiq                    PMP *ihiq 'urine; to urinate'
Amis isiɁ                    Tagalog íhiɁ
Favorlang isi                Bikol íhiɁ
Rukai (Maga) isii            Hanunóo íhiɁ
Paiwan isiq                  W Bukidnon Manobo ihiɁ

PAn *CiŋaS                   PMP *tiŋah 'food stuck in teeth'
Saisiyat JW                  Itbayaten tiñah
Pazeh siqas                  Tagalog tiŋa
Amis tujas                   Bikol tiŋa
Paiwan tsiqas                Cebuano tiŋa

PAn *CumeS                   PMP *tumah 'clothes louse'
Kavalan tumes                Itbayaten tomah
Saisiyat somsh               Tagalog túma
Amis tumus                   Bikol túma
                             Hanunóo túma
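The two segment changes that separate the PAn and PMP headwords in these cognate sets can be sketched in a few lines of Python. This is only an illustration: the function name `pan_to_pmp` is hypothetical, asterisks are omitted from the input strings, and the sketch models nothing beyond *S > *h and *C > *t as they can be read off the reconstructions above. It ignores vowel changes, so it would not derive *tumah from *CumeS.

```python
# Hypothetical helper for illustration only. It applies the two segment
# changes visible in the headwords above: PAn *S > PMP *h and PAn *C > PMP *t.
# Vowel changes and *S metathesis are deliberately not modeled.
def pan_to_pmp(form: str) -> str:
    """Map a PAn reconstruction (written without the asterisk) to a PMP shape."""
    return form.replace("S", "h").replace("C", "t")


examples = ["Sajek", "Sasaq", "CuSuR", "iSiq", "CiŋaS"]
for pan in examples:
    print(f"*{pan} > *{pan_to_pmp(pan)}")
```
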

At least since Dyen (1953a), the most reliable witnesses for PMP *h have been Tagalog and other Central Philippine languages, together with a few Philippine languages that Dyen did not consider, most notably Itbayaten (Itb), one of the four languages in the Batanes Islands between Taiwan and Luzon, Hanunóo and perhaps other languages of Mindoro, and the Manobo languages of Mindanao. In addition, Blust (1981) found that Soboyo, a Central Malayo-Polynesian language in the central Moluccas of eastern Indonesia, preserves PAn *S- as word-initial h-.

What stands out in data sets (1)–(6) is that even in languages that retain *S as h in onset position, it is lost word-finally in all languages but Itb. The only qualifications required in this statement are that -h appears in Tagalog alternations such as abó 'ash' : abuh-án 'ashpit', and that the West Bisayan language Aklanon (Zorc 1969) sometimes reflects PAn *-S as -h in absolute final position. However, comparative evidence has shown that the thematic -h in Tagalog abuh-án and many other forms is historically secondary, thus negating its value as evidence for PMP *-h in cases like tubó 'sugarcane' : tubuh-án 'sugarcane plantation', where it does reflect PAn *S (cf. PAn *tebuS 'sugarcane').3

The situation is somewhat different in Aklanon as presented in Zorc (1969), where many examples of apparently nonetymological -h are given, as in PAn *apu 'grandparent or grandchild (recipr.)' > apó(h) 'grandchild', *SadiRi > halígi(h) 'column, post, mainstay, support (of house)', *lima > limá(h) 'five', or *ta-telu 'three (of humans)' > tátlo(h) 'to raise or lower to three, make three' (cp. tátlo 'three'); and in still other cases PAn *-S is reflected as zero, as in *CiŋaS > tiŋá 'food particles caught between teeth', or *tebuS > tubó 'sugarcane', ka-túbw-an 'sugarcane plantation'.

Zorc (pers. comm., January 6, 2018) now says that his dictionary contains many errors in the representation of -h, and he has provided an extensive set of forms in a revised orthography which, in accordance with his suggestions, will be followed here. Unlike the case in Tagalog, where -h is commonly added before a suffix whether it is etymologically justified or not, Aklanon -h is said to appear both medially and word-finally as a syllable coda, where it is contrastive.

Given the situation that prevailed until recently, the reconstruction of PMP *-h based on coda /h/ had depended entirely on Itb, which alone of the four Batanic languages (Yami, Itbayaten, Ivatan, Ibatan) reflects PAn *S in all positions, as shown in dataset (2) (Tsuchida, Yamada, and Moriguchi 1987):


PAn *Sapuy 'fire'
PMP *hapuy 'fire'
Yami (Imorod) apoy
Itbayaten hapoy
Ivatan (Isamorong) apoy
Ibatan apoy
PAn *bukeS 'head hair'
PMP *buhek 'head hair'
Yami (Imorod) ovok
Itbayaten vohok
Ivatan (Isamorong) *Sapuy
PAn *CiŋaS 'food particles in teeth'
PMP *tiŋah 'food particles in teeth'
Yami (Imorod) ciŋa
Itbayaten tiñah
Ivatan (Isamorong) tiña
Ibatan tiña

While PAn *S > PMP *h is attested both initially and medially in various Philippine languages, and initially in Soboyo of the central Moluccas, then, it has appeared until now that a nonzero reflex of PAn *-S in Malayo-Polynesian (MP) languages is confined to a single reliable witness, namely Itbayaten of the Batanes Islands, with the small qualification that Cebuano Bisayan reflects coda *S as postconsonantal medial h, as in PAn *kuSkuS 'scrape' (> *kuhkuh > *kuhku) > Cebuano kukhú 'scrape, scratch off something that sticks to a surface'. However, even in Cebuano, *-S is lost in absolute final position.

Given this rather longstanding situation, it comes as a surprise to discover that Itb is only one of several geographically scattered and genetically diverse Philippine languages that reflect PAn *-S as -h. A corollary of this discovery is that even where documentation is relatively dense, as with the Philippines, both minor languages and dialects of major languages may preserve archaic phonological features that have disappeared from all or nearly all of their better-described relatives. In the case of the Philippines, this turns out to be particularly true of languages spoken by the aboriginal Negrito population, a pattern that achieves a special irony in view of the fact that this population must have acquired ancestral forms of the languages they now speak by language shift (Reid 1987).


Ayta Abellen (AyA) is a member of the Central Luzon subgroup of Philippine languages (Stone 2008; Himes 2012), a collection of still poorly described and generally small language communities that includes "Kapampangan (or Pampango), Sinauna (or Sinauna Tagalog), three dialects of Sambal (Bolinao, Tina, and Botolan), and a number of languages spoken by Ayta Negrito populations" (Himes 2012:490). By far the largest language in this group is Kapampangan, but because it plays no part in the following discussion, it will not be discussed further. Rather, the phonologically more conservative languages appear to be those of the Ayta Negrito groups, many of which were displaced after the catastrophic eruption of Mt. Pinatubo on June 15, 1991. Of these, Ayta Abellen, represented by a useful online dictionary (Stone 2007), stands out as particularly valuable for the purpose at hand.

The first clue that this is the case comes in seeing forms such as AyA kokoh 'fingernail, toenail' (PAn *kuSkuS, Tagalog kukó), or toboh 'sugarcane' (PAn *tebuS, Tagalog tubó). However, before reaching any conclusion, it is necessary to determine that this final aspirate (i) is distinctive, unlike the -h that was regularly added in many languages of northern Sarawak (Blust 1969:91), and in at least some dialects of Tboli and Tausug in the southern Philippines (Reid 1971), and (ii) is not a back-formation from a paradigm like Tagalog tubó 'sugarcane' : tubuh-án 'sugarcane plantation'. With only a few exceptions, to be noted below, words reconstructed with a final vowel in PAn have a final vowel in AyA: *lima > lima 'five', *maCa > mata 'eye', *Suaji > ali 'younger sibling', *ba-bahi > babayi 'female', *qaNiCu > anito 'ghost, spirit of the dead', *asu > aso 'dog', and so on. This strongly suggests that the -h in words like kokoh or toboh is a retention of an original final consonant.

The most complete demonstration that AyA reflects PAn *-S as -h is through a tally of all forms with PAn *-S that have reflexes in Itb, AyA, or both in the online open access Austronesian comparative dictionary (Blust and Trussel ongoing). This is shown in table 1.

The data in table 1 fall into two categories: forms that show *S metathesis, and those that do not. *S metathesis is a semi-regular (recurrent) sound change that distinguishes PMP from PAn.4 The essential condition for this innovation is a PAn sequence *-CVS, where C was a stop and *-CVS did not occur in a reduplicated monosyllable such as *kuSkuS 'claw, fingernail' (Blust 1993:178–79). In table 1, this can be seen in nos. 2, 12, 15, 17, and 18, all of which show a change from PAn *-CVS to PMP *-hVC (the metathesis in *kaSiw > Itb kayoh, AyA kayo 'wood, tree' is sporadic, and unconnected with *S metathesis as described here). A sixth example, which lacks reflexes in either of these languages, is PAn *CaqiS > PMP *tahiq 'to sew'. *S metathesis is thus recurrent, although there are unexplained exceptions, as with PAn *CebuS/tebuS > PMP *tebuh (not **tehub) 'sugarcane', and PAn *tiR(e)peS > PMP *tipah (not **tihap) 'spittle'.
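The conditioning of *S metathesis described above can be made concrete with a small sketch. Everything named here is an assumption introduced for illustration: the function names, the inventory of symbols treated as stops, and the test for a reduplicated monosyllable (two identical halves). The sketch applies the rule mechanically and then the changes *S > *h and *C > *t; it makes no claim to capture the lexical exceptions.

```python
# Illustrative sketch only. The stop inventory and helper names are
# assumptions made for this example, not part of the reconstruction itself.
STOPS = set("ptkbdgqC")   # symbols treated as stops (assumed)
VOWELS = set("aeiu")      # the four vowel symbols used in the reconstructions


def is_reduplicated_monosyllable(form: str) -> bool:
    """True for shapes like kuSkuS, whose two halves are identical."""
    half = len(form) // 2
    return len(form) % 2 == 0 and form[:half] == form[half:]


def s_metathesis(form: str) -> str:
    """Apply *-CVS > *-hVC (C a stop, not in a reduplicated monosyllable),
    then the general changes *S > *h and *C > *t."""
    if (
        len(form) >= 3
        and form[-1] == "S"
        and form[-2] in VOWELS
        and form[-3] in STOPS
        and not is_reduplicated_monosyllable(form)
    ):
        form = form[:-3] + "h" + form[-2] + form[-3]
    return form.replace("S", "h").replace("C", "t")


for pan in ["bukeS", "CaqiS", "kuSkuS", "tebuS"]:
    print(f"*{pan} > *{s_metathesis(pan)}")
```

Applied to *bukeS and *CaqiS the sketch yields the metathesized shapes *buhek and *tahiq, and it leaves the reduplicated monosyllable *kuSkuS unmetathesized. For *tebuS it produces the unattested **tehub, which is exactly why that form counts as an unexplained exception to a mechanical statement of the rule.
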

Table 1.


In metathesized forms, PAn *-S is reflected as medial h in many Philippine languages, but evidence for word-final -h from PAn *-S is much harder to find. What is striking about the forms in table 1 that have not undergone *S metathesis is that Itb and AyA agree in three of the four cases where both have relevant data: tomah 'clothes louse', kokoh 'claw, fingernail', and kohkoh 'to scrape or scratch up'. Both the agreement with Itb and the more general correspondence with PAn *-S make it clear that Ayta Abellen is no less a witness for PMP *-h than is Itbayaten. Between these two languages, there are 12 nonborrowed reflexes of PAn forms with *-S that have not undergone *S metathesis, and 10 of them, or over 83 percent, have -h. The two exceptions are AyA kayo 'wood' and ilo 'wipe the anus'. Although the absence of -h in these words is unexplained, the similar problem in the word for 'typhoon' is clearly a result of borrowing, since PAn *R regularly becomes y in all Sambalic languages. The question that this overwhelming agreement inevitably raises is whether there are other witnesses in the Philippines for word-final h as a reflex of PAn *S.


So far, only Itbayaten and Ayta Abellen have provided clear support for PMP *-h as a continuation of PAn *-S. However, since these languages belong to different Philippine microgroups (Blust 1991), it is likely that other Philippine languages preserve the contrast of -V vs. -Vh, and that this distinction has been underrepresented in the descriptive literature. A concerted effort to find such cases has, in fact, proved fruitful.

To date, discussions of the historical phonology of Central Luzon (CL) languages (Stone 2008; Himes 2012) have not recognized the retention of PAn *S as h.5 This oversight is surprising in view of the clear retention of *-S as a word-final aspirate in Ayta Abellen shown in table 1, but it cannot be judged too harshly in view of the limited number of examples and the absence of the contrast in the best-known languages of the group, namely Kapampangan and Botolan Sambal.

Other CL languages that have been investigated to date present a mixed picture: there is some evidence for retention of *-S as an aspirate, but it is limited in the number of forms available, in some cases as a result of phonotactic conditions. The cases uncovered so far are presented in the following subsections.

2.2.1 Tina Sambal

Although the better-known Botolan Sambal invariably reflects *-S as zero, the Tina dialect has two known examples that support the view that PAn *-S became -h (Elgincolin, Goschnick, and Elgincolin 1988):

(3) PAn *kuSkuS 'claw, fingernail' > kokóh 'hoof'
    PAn *tebuS > tobóh 'sugarcane'

Since the final aspirate in these words contrasts with a final vowel in PAn *asu > Tina Sambal aso 'dog', *kuCu > koto 'head louse', or *siku > hiko 'elbow', it is clear that it is not a product of regular sound change, and we have little choice but to assume that PAn *-S was retained as -h in at least these two words, or that it was acquired by borrowing these forms from neighboring languages in which *-S was retained as -h. Needless to say, if we did not have the much richer data from Ayta Abellen, Tina Sambal would provide a very insecure foundation for the claim that PAn *-S was retained as -h in Proto-Central Luzon.

It is likely that this language has more examples of -h than are reported here, but the only dictionary currently available for it is highly provisional, and the lexical categories that it represents are based on English rather than the native system.

2.2.2 Ayta Mag-Antsi

This CL language also provides limited evidence for retention of PAn *-S as h, but although it is limited, it is conditioned in such a way as to leave little doubt that the supporting evidence is valid.

PAn *-S in nonreduplicated bases disappeared without a trace, as seen in the criterial cases in (4):


(4) *tebuS > tubó 'sugarcane'
    *CiŋaS 'food particles caught between teeth' > tsiŋa 'toothpick'

However, in reduplicated monosyllables that had the shape CVSCVS in PAn, the coda is retained as an aspirate in both syllables, as shown in (5):


(5) *kaSkaS > kahkah 'to shave, scrape'
    *kiSkiS > kihkih 'to scrape'
    *kuSkuS > kuhkuh 'to scratch (as a cat scratching)'

Although the number of examples is small, then, there is complete consistency in showing that PAn *S in simplex bases disappeared word-finally in Ayta Mag-Antsi, but was retained in the reflexes of reduplicated monosyllables, whereas in Ayta Abellen PAn *S was retained in both environments.
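The difference between the two languages can be stated as a single conditioned rule with one switch. The sketch below is hypothetical throughout: `final_S_reflex` and its `retains_in_simplex` flag are names invented for this illustration, and only the fate of *S is modeled, so vowel quality and stress in the outputs will not match the attested forms (for example, it returns tebu where Ayta Mag-Antsi has tubó).

```python
# Illustrative sketch. Only the treatment of PAn *S is modeled; vowel
# changes, stress, and all other consonant changes are ignored on purpose.
def is_reduplicated_monosyllable(form: str) -> bool:
    """True for CVSCVS shapes such as kaSkaS, whose halves are identical."""
    half = len(form) // 2
    return len(form) % 2 == 0 and form[:half] == form[half:]


def final_S_reflex(pan_form: str, retains_in_simplex: bool) -> str:
    """Return the form with coda *S reflected as h or zero.

    retains_in_simplex=False models the Ayta Mag-Antsi pattern described
    above; retains_in_simplex=True models the Ayta Abellen pattern.
    """
    if is_reduplicated_monosyllable(pan_form):
        # Both codas survive as h in reduplicated monosyllables.
        return pan_form.replace("S", "h")
    if pan_form.endswith("S"):
        return pan_form[:-1] + ("h" if retains_in_simplex else "")
    return pan_form


for pan in ["kaSkaS", "kiSkiS", "kuSkuS", "tebuS"]:
    print(pan, final_S_reflex(pan, False), final_S_reflex(pan, True))
```
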

The one case that might be considered questionable, but that turns out on closer inspection to further confirm this generalization, is the following:

(6) *kuSkuS > koko 'claw, fingernail or toenail'

Since this word and *kuSkuS 'to scratch' were homophones in PAn, we would expect them to show the same development in MP languages, but as noted briefly in the footnote to table 1, they are consistently different in Philippine languages, as with Ayta Abellen kokoh 'claw, fingernail' vs. kohkoh 'to scrape or scratch up', or Cebuano kukú 'nail, claw' vs. kukhú ~ kalukhú 'scrape, scratch off something that sticks to a surface'.

2.2.3 Bolinao

Although few reflexes have been found so far, data in McFarland (1977) suggest that PAn *-S disappeared in Bolinao without a trace, as seen in (7):


(7) *kuSkuS 'claw, nail' > kuko 'fingernail'
    *CiŋaS > tiŋa 'food caught in teeth'

Relevant material for other CL languages, apart from very limited data for Remontado (Sinauna), has not yet been accessed.6

2.2.4 Waray-Waray

For the northern and eastern Samar dialects of Waray-Waray in the central Philippines, Jason Lobel (pers. comm., August 14, 2017) has reported the occurrence of coda h in reduplicated monosyllables only. Most of the examples he has provided lack known PAn etymologies, but two comparisons show that -h reflects PAn *-S:


(8) *kaSkaS > kahkah 'to scratch an itch'
    *kiSkiS > kihkih 'to shave off'

These dialects of Waray-Waray, thus, show a pattern of retention for PAn *-S that is identical to that in Ayta Mag-Antsi, namely, preservation as -h in reduplicated monosyllables, but loss in simplex bases.

2.2.5 Mamanwa

Mamanwa, spoken by a Negrito population in northeast Mindanao, has not previously been reported as allowing -h (for example, Reid 1971). However, Jason Lobel (pers. comm., October 5, 2017) has recorded a clear contrast of final -h vs. final vowel in at least one dialect of the language, as shown in table 2. For *-V > -V, cf. PAn *maCa > matá 'eye', *lima > limá 'five', *ba-bahi > babazi 'woman', or *batu > bató 'stone'.

Table 2.


2.2.6 Aklanon

Although his 1969 dictionary leaves the matter somewhat in limbo, as noted earlier, Zorc has now rechecked all examples of word-final -h in Aklanon, and finds that the forms given in table 3 occur as true word codas reflecting PAn *S. Of these, items 2, 7, 8, and 11 show *S-metathesis (with subsequent loss of -h- in eusáɁ, for expected eusaháɁ 'nit, louse egg'). The remaining seven words reportedly preserve word-final /h/ intact.

Table 3.


2.2.7 Summary

To summarize, with rare exceptions, PAn *S is reflected consistently in all Philippine languages that have been examined, although the nonzero reflexes fall into four types:

TYPE 1: Languages that reflect PAn *S as h-, -h-, -ØC-, -Ø:

  • Central Philippines: Tagalog, Bikol, Hiligaynon, Tausug

  • Mangyan: Hanunóo

  • Manobo: Ata, Western Bukidnon Manobo, Tigwa Manobo, Binukid

TYPE 2: Languages that reflect PAn *S as h-, -h-, -Ch-, -Ø:

  • Central Philippines: Cebuano

TYPE 3: Languages that reflect PAn *S as h-, -h-, -hC-, -h, but the last two of these only in reduplicated monosyllables:

  • Central Luzon: Ayta Mag-Antsi

  • Central Philippines: Waray-Waray (northern and eastern Samar)

TYPE 4: Languages that reflect PAn *S as h-, -h-, -hC-, -h without qualification:

  • Batanic: Itbayaten

  • Central Luzon: Ayta Abellen, Tina Sambal

  • Central Philippines: Mamanwa, Aklanon
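The four-way typology can also be held as a small lookup table, which makes it easy to ask which witnesses are usable for word-final -h. The sketch below simply transcribes the lists above; the names `REFLEX_TYPES`, `type_of`, and `has_final_h` are hypothetical, and the decision to count Type 3 as having final -h (only in reduplicated monosyllables) is an assumption flagged in the code.

```python
# Transcription of the typology above into a lookup table (illustrative).
REFLEX_TYPES = {
    1: {
        "reflexes": ("h-", "-h-", "-ØC-", "-Ø"),
        "languages": ["Tagalog", "Bikol", "Hiligaynon", "Tausug", "Hanunóo",
                      "Ata", "Western Bukidnon Manobo", "Tigwa Manobo",
                      "Binukid"],
    },
    2: {
        "reflexes": ("h-", "-h-", "-Ch-", "-Ø"),
        "languages": ["Cebuano"],
    },
    3: {
        "reflexes": ("h-", "-h-", "-hC-", "-h"),
        "languages": ["Ayta Mag-Antsi",
                      "Waray-Waray (northern and eastern Samar)"],
    },
    4: {
        "reflexes": ("h-", "-h-", "-hC-", "-h"),
        "languages": ["Itbayaten", "Ayta Abellen", "Tina Sambal",
                      "Mamanwa", "Aklanon"],
    },
}


def type_of(language: str) -> int:
    """Return the reflex type (1-4) under which a language is listed."""
    for type_number, entry in REFLEX_TYPES.items():
        if language in entry["languages"]:
            return type_number
    raise KeyError(f"{language} is not in the typology")


def has_final_h(language: str) -> bool:
    """Assumption: Types 3 and 4 both count, although Type 3 shows final -h
    only in reduplicated monosyllables."""
    return type_of(language) in (3, 4)


print(sorted(l for t in REFLEX_TYPES.values() for l in t["languages"]
             if has_final_h(l)))
```
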



So far, we have found a strikingly regular pattern in which PAn *-S—based on sibilant reflexes in various Formosan languages—predicts -h in Itbayaten and Ayta Abellen with nearly perfect regularity, with additional support from Tina Sambal, Ayta Mag-Antsi, Aklanon, Mamanwa, and some dialects of Waray-Waray. However, there is a puzzling asymmetry in this process: what is the result if we start with -h in a Malayo-Polynesian witness that corresponds to zero in most other languages, and try to trace its origin? Does this segment reliably predict a corresponding sibilant in Formosan languages that reflects PAn *-S? Surprisingly, the answer is 'no', but before discussing the correspondences that I will call the "mystery aspirates," it is necessary to clear away some potentially obscuring features of the problem.

Yamada (2002) lists 294 Itbayaten words with -h. Most of these have no etymology, but those that do show that -h generally reflects PAn *-S, with some exceptions to which we turn shortly. By contrast, Ayta Abellen merged *S and *s as h in coda position (in onset position *s also became h, but *S disappeared). The lenition of *s is seen both in native words (PAn *beRas > beyah 'husked rice', *Sipes > ipeh 'cockroach', *Caŋis > taŋih 'to cry'), and in Spanish loans, including many nouns that were borrowed in their plural forms in Philippine languages (dioh 'God', Spanish Dios; bayawah 'guava', Spanish guayabas; hiboyah 'onion', Spanish cebollas; maih 'corn, maize', Spanish maiz; kamatih 'tomato', Spanish tomates). Once these cases are excluded, we are left with a small number of native words ending with a word-final aspirate that cannot be derived from PAn *-s or *-S. These are shown without reconstructions in table 4, but with an 'X' indicating that a protoform is possible on the stated level with the available data. The problem, in effect, is to solve for 'X'.

Each of these words is reflected in Itb, AyA, or both with a final h that is not explained by the established PMP reconstructions, all of which end with a vowel, and it is clear that some hypothesis must be formulated to account for this departure from expectation. We have essentially two choices: (i) consider the final aspirates innovations, or (ii) consider them reflexes of phonemes that have not previously been reconstructed. The first choice is quickly ruled out on the grounds that many words in both languages have a final vowel where this is expected based on established reconstructions, making the -h in table 4 unconditioned, and hence a violation of the regularity hypothesis. This leaves us with the second choice: the final aspirates are retentions of consonants in Proto-Philippines (PPH), PMP, and in some cases PAn.7 So, what do we reconstruct? The commonsense solution is *-h, but this turns out to be more problematic than first impressions might suggest.

Even before incorporating evidence from other languages in these comparisons, it can be seen that three of the four cognate sets in table 1 that are represented by both languages (that is, nos. 5, 10, and 11) agree in reflecting PAn *-S as -h, but the eight cognate sets in table 4 that are shared by both languages exhibit multiple sound correspondences (C1– C3), as shown in (9):

Table 4.



(9)       Itb   AyA
    C1    h     h     (3, 5)
    C2    0     h     (4, 14, 23)
    C3    h     0     (7, 11, 21)
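The step from individual cognate sets to correspondence classes is a simple grouping operation, sketched below. The sketch assumes that the parenthesized numbers in (9) are item numbers in table 4, and the names `observations` and `group_correspondences` are hypothetical; "0" stands for a zero reflex.

```python
from collections import defaultdict

# (Itbayaten reflex, Ayta Abellen reflex) for each item number given in (9).
# Assumption: the numbers are item numbers in table 4.
observations = {
    3: ("h", "h"), 5: ("h", "h"),
    4: ("0", "h"), 14: ("0", "h"), 23: ("0", "h"),
    7: ("h", "0"), 11: ("h", "0"), 21: ("h", "0"),
}


def group_correspondences(obs):
    """Group item numbers by their (Itb, AyA) reflex pair."""
    groups = defaultdict(list)
    for item, pair in sorted(obs.items()):
        groups[pair].append(item)
    return dict(groups)


for (itb, aya), items in group_correspondences(observations).items():
    print(f"Itb {itb} : AyA {aya} -> items {items}")
```

Three distinct reflex pairs emerge from eight shared cognate sets, which is the pattern summarized as C1 to C3.
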

Since the final aspirates in table 4 do not continue PAn *-S, it is clear that there are at least four different sound correspondences in Philippine languages that appear to reflect some type of glottal spirant. By mechanically mapping sound correspondences onto protophonemes we might posit four reconstructed segments that are reflected as glottal fricatives or zero, as in (10):


(10) *h1: *tebuh1 'sugarcane', *tumah1 'clothes louse', *kih1kih1 'scrape off', etc.
     *h2: *baRah2 'ember', *depah2 'fathom'
     *h3: *bukuh3 'joint, node, knuckle', *kutuh3 'head louse', *tubah3 'derris root fish poison'8
     *h4: *-nuh4 'interrogative marker', *umah4 'kiss', *sikuh4 'elbow'


This is the procedure that was followed by Dyen (1965) and Tsuchida (1976) for PAn, although it has not previously been proposed on the level of PMP, let alone PPH. Where a reflex is known in only one witness for PMP/PPH *-h, and diagnostic Formosan cognates are unknown, the reconstruction is ambiguous, as shown in (11):


(11) *h(2,3): *qapah(2,3) 'empty husk (of grain)'
              *duRih(2,3) 'thorn'
              *qiSuh(2,3) 'shark'
              *Culih(2,3) 'earwax'
              *leqah(2,3) 'sesame'
              *naRah(2,3) 'the narra tree: Pterocarpus indicus'
              *papah(2,3) 'jaw, jawbone'
              *puquh(2,3) 'bunch, cluster'
     *h(2,4): *sasah(2,4) 'cut or collect palm leaves for roofing'
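The ambiguity in (11) follows mechanically once each subscripted segment is paired with a reflex pattern. The sketch below assumes the pairing that the labels in (10) and (11) imply: *h2 is the segment reflected as h in both languages, *h3 as zero in Itbayaten and h in Ayta Abellen, and *h4 as h in Itbayaten and zero in Ayta Abellen. *h1 is left out because it is identified by Formosan sibilant reflexes rather than by the Philippine pattern. The names `PATTERNS` and `candidates` are hypothetical.

```python
# Assumed pairing of labels with (Itb, AyA) reflexes; "0" is a zero reflex.
# *h1 (from PAn *S) is omitted: it is diagnosed by Formosan sibilants.
PATTERNS = {
    "h2": ("h", "h"),
    "h3": ("0", "h"),
    "h4": ("h", "0"),
}


def candidates(itb=None, aya=None):
    """List the labels compatible with the reflexes that are attested.

    Pass None for a language in which no cognate is known.
    """
    result = []
    for label, (p_itb, p_aya) in PATTERNS.items():
        if itb is not None and itb != p_itb:
            continue
        if aya is not None and aya != p_aya:
            continue
        result.append(label)
    return result


print(candidates(aya="h"))            # only an Ayta Abellen form with -h
print(candidates(itb="h"))            # only an Itbayaten form with -h
print(candidates(itb="h", aya="h"))   # both witnesses attested with -h
```

With a single witness the candidate set has two members, which is what the notation *h(2,3) and *h(2,4) records; only a second witness narrows it to one.
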

At this point, it may be worthwhile to consider the Formosan evidence for final aspirates, since Tsuchida (1976) posited just two, namely *-H1 and *H2, which may have been *-h and *x, respectively, as will be shown below. If the Formosan and Philippine evidence is concordant, *h1 in (10) would correspond to PAn *-S, *h2 would correspond to PAn *-h, and *h3 would correspond to PAn *-x, leaving *h4 as the only unexplained exception. However, to determine whether this is the case, we must first integrate the Formosan evidence with the evidence for word-final aspirates in Itbayaten and Ayta Abellen, and conduct a further search for Philippine witnesses for word-final aspirates that do not reflect PAn *-S.


As noted earlier, both Dyen (1962, 1965) and Tsuchida (1976) found evidence in several Formosan languages for final aspirates where Dempwolff (1934–38) had reconstructed final vowels. Given the newly uncovered evidence for similar segments in Philippine languages, it is important to see whether the sound correspondences in these two areas are concordant. Table 5 summarizes the Formosan witnesses identified in Tsuchida (1976) with his reconstruction of PAn word-final aspirates (where 1 = Atayal, 2 = Saisiyat, 3 = Pazeh, 4 = Amis, 5 = Saaroa, 6 = Bunun (Takituduh), and 7 = Seediq).

The cross-linguistic agreement between several Formosan languages (especially Saisiyat, Pazeh, Amis, and Takituduh/Northern Bunun) in distinguishing these correspondences from zero and from one another is consistent enough to warrant the conclusion that PAn had two word-final voiceless spirants in the back of the vocal tract. Only the Atayalic languages (Atayal, Seediq) provide a clue to the possible phonetic difference between them, namely *H1 = *-h, and *H2 = *x. This much is typologically reasonable.

But what picture emerges when we try to integrate the evidence for Tsuchida's *-H1 and *-H2 with the evidence for final aspirates in Philippine languages? Ideally, in accord with general assumptions about sound change, *h2 and *h3 in (10) should correlate with Tsuchida's *-H1 and *-H2, leaving just C3 in (9) unexplained. Table 6 alters Tsuchida's *-H1 and *-H2 to PAn *-h and *-x and marks the distinct sound correspondences with C1– C7. Because no Philippine language is known to reflect PAn *nunuh 'breast', *Caliŋax 'ear', or *wiRix 'left side' with a final consonant, these forms are not mentioned further.

Table 5.


To sum up, PMP must have had *-h from PAn *-S, as shown in table 1. That much is uncontroversial, given the nearly perfect correlation of sibilant reflexes in Formosan languages with -h in either Itbayaten or Ayta Abellen, or in both. But what is the source of the other instances of -h in these two MP languages?9

As already noted, the possibility that these segments are innovations has been rejected on general methodological grounds. However, the possibility that they are retentions is hardly better, since to account for all correspondences with -h, and hence avoid the recognition of an unconditioned phonemic split, we must posit separate distinctions for each of the correspondences seen in (12):

Table 6.



Even if we ignore unique exemplifications, we still are left with C1, C4, and C6. Add to this already overburdened reconstruction the need to assume PMP *-h from PAn *-S, and the challenge to the comparative method becomes painfully obvious. It would be pointless to dignify the treatment of these correspondences with eight subscripted varieties of *h, or even four, as none of the attested languages has more than two fricatives in the glottal or velar region, and most have only one (Atayalic languages have -x and -h, described by Li [2004a:628] as "voiceless velar and pharyngeal fricatives respectively"). But the primary data remain, and the question of how to integrate them into a comprehensive and typologically realistic protolanguage will not go away.

There is, moreover, something about the mystery aspirates in MP languages that distinguishes them from the proposals of Dyen (1965) or Tsuchida (1976), who based their claims on data from Formosan languages that most linguists now assign to separate primary branches of the AN family, with the result that the proliferation of protophonemes in their reconstructive strategies was at the remotest possible level. By contrast, the word-final aspirates that would be required to account for the data in Philippine languages following the same approach will be much closer to the present, since at least C2, C3, C5, C6, and C7 need to be distinguished not only in PMP, but also in PPH.

This result already appears to offer no way out: either we posit PMP *-h1 …-h8 (where *-h1 reflects PAn *-S, and the other aspirates do not), or we recognize widespread irregular change in Philippine languages where -h was added to a final vowel in some lexical items but not others. Both of these options will be considered below, but first we must ask whether Itb and AyA are the only Philippine languages that contain mystery aspirates, since if they are not, the number of sound correspondences that contain -h that is not a reflex of PAn *-S may be even larger.

As noted in an earlier section, the Central Luzon languages Tina Sambal and Ayta Mag-Antsi, as well as Mamanwa, the northern and eastern Samar dialects of Waray-Waray, and Aklanon (Central Philippine languages) all reflect PAn *-S as -h. Given this atypical development, these languages would appear to be plausible candidates as further witnesses for the mystery aspirates. However, as seen in 2.2.7, Ayta Mag-Antsi and the northern and eastern Samar dialects of Waray-Waray are Type 3 languages, which reflect *-S as -h in reduplicated monosyllables, but as zero in simplex bases. In both languages, word-final /h/ is evidently found only in words of the shape C1V2hC1V2h, and since the mystery aspirates have been attested so far only in simplex bases (table 4), the instances of -h in Type 3 languages can have no bearing on the question at hand. This leaves Tina Sambal, Mamanwa, and Aklanon as prospective sources of -h that does not reflect PAn *-S, and each of these will be examined in turn.


Only Itbayaten and Ayta Abellen have been shown to provide fairly robust support for the reconstruction of PMP *-h < PAn *-S, and each of these languages also contains examples of -h that have more obscure histories. In addition, Tina Sambal reflects PAn *-S as -h both in reduplicated monosyllables and in simplex bases. Given this pattern of development for PAn *-S, it is reasonable to expect that Tina Sambal will also provide additional examples of the mystery aspirates.

The four Batanic languages are sufficiently well described that we can be sure only Itbayaten has a word-final glottal spirant that corresponds to zero in most MP witnesses. However, the situation is descriptively quite different for Central Luzon (CL) languages. In approximate order of descending population size these are:


Tina Sambal
Botolan Sambal
Ayta Abellen
Ayta Mag-Antsi
Ayta Mag-Indi
Ayta Ambala
Ayta Bataan


The last six languages ("Ayta" varieties plus Remontado) are spoken by Negrito populations that number fewer than 10,000 individuals, and considerably fewer in most cases. As a result, they have received little attention from linguists as compared to languages that have larger speaker numbers. Moreover, even the largest CL language, Kapampangan, lacks a dictionary comparable to those of most Philippine major languages. It is clear from the available lexical resources that Kapampangan and Botolan Sambal lack -h in cognates of AyA words with the mystery aspirates, but given the data in table 4 there is a pressing need to search for examples of the same type in other languages of this understudied group.

As already noted, Ayta Mag-Antsi can be disposed of quickly. Of the words in table 4 that have a final aspirate in either Itbayaten or Ayta Abellen (or both), Ayta Mag-Antsi has a final vowel in each of the seven forms that have a cognate morpheme: apa 'empty rice husk', baya 'ember', boko 'node, joint', dəpa 'armspan, fathom', diwi 'thorn', koto 'head louse', hiko 'elbow'. There is one partial exception. As will be seen, Tina Sambal reflects Tsuchida's PAn *qabuH2 as abóh 'ash', and the cognate form in Ayta Mag-Antsi is given as abo ~ aboh 'ashes, embers', suggesting that -h in this language is undergoing loss that is complete in some forms, but only partial in others. However, Ayta Mag-Antsi appears to offer no further information regarding the mystery aspirates, and apart from a modified Swadesh 200-word list for what is called "Sinauna" in Greenhill, Blust, and Gray (2003–18) and a small set of functors for Remontado (its proper name) in Lobel (2013), little information is publicly available for other CL languages.


The situation in Tina Sambal (TS) is quite different. Like Itbayaten and Ayta Abellen, TS reflects PAn *-S as -h, as in *CebuS/tebuS > tobóh 'sugarcane', *kuSkuS > kokóh 'claw, fingernail', and *buReS 'spray water from the mouth' > i-bogáh 'to spew'. The last word is particularly interesting, since it is unknown in Itbayaten or AyA, and shows an irregular change *R > g, yet has a final /h/, unlike any other known language in the Philippines. If it is accepted as evidence, it will be the thirteenth case in which *-S > -h in a nonmetathesizing MP witness is regular, while at the same time complicating the claim that AyA bagyo, which does not show the expected -h, is a loan.

However, what is theoretically most troubling about the limited data for TS is the evidence it provides for additional sound correspondences that involve the mystery aspirates, specifically in cases where TS has -h but AyA has a final vowel. These are summarized in (13):


*Culi (?) *? tilo (<M) tolih tolóh 'ear wax'
*qabux *? avo abo abóh 'ash'
*qapeju *? apdo aplo aplóh 'gall'
*qiSu (?) *? iyoh iyóh 'shark'

The sound correspondence x : -Ø : -Ø : -h can be assigned to C4 in table 6, but this invites additional problems, since other TS words that fit this correspondence set, as bató 'stone', or oló 'head', show no final consonant, thus splitting C4 into two subtypes, as shown in (14):

(14)
       PAn   Itb   AyA   TS
C4.1.  *-x   Ø     Ø     -h    (ash)
C4.2.  *-x   Ø     Ø     Ø     (stone; head)
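
The split of C4 can be illustrated with a minimal sketch that groups cognate sets by their final-segment correspondence. The tuple encoding, the witness order (PAn, Itb, AyA, TS), and the function name are assumptions made for illustration only; the three glosses and their patterns are those described in the text, with "Ø" standing for zero.

```python
from collections import defaultdict

# Assumed witness order, following the prose: PAn, Itbayaten, Ayta Abellen, Tina Sambal.
WITNESSES = ("PAn", "Itb", "AyA", "TS")

# Final-segment correspondences as described in the text ("Ø" = zero).
# Encoding each cognate set as a tuple is a hypothetical convenience.
cognate_sets = {
    "ash":   ("x", "Ø", "Ø", "h"),
    "stone": ("x", "Ø", "Ø", "Ø"),
    "head":  ("x", "Ø", "Ø", "Ø"),
}

def group_by_correspondence(sets):
    """Collect the glosses that share an identical correspondence pattern."""
    groups = defaultdict(list)
    for gloss, pattern in sets.items():
        groups[pattern].append(gloss)
    return dict(groups)

subtypes = group_by_correspondence(cognate_sets)
for pattern, glosses in subtypes.items():
    print(" : ".join(pattern), "->", ", ".join(glosses))
```

Any single witness that disagrees, here TS, is enough to turn one class into two, which is how each added witness can multiply the number of correspondence classes.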

The correspondence exemplified by 'gall' is unclear. Saisiyat pæɁzoɁ, Pazeh apuzu, Bunun paqav 'bile, gall' may form a single cognate set with metathesis of the first two consonants either in Saisiyat and Bunun, or in Pazeh, or alternatively they may reflect doublets, *paqeju (for Saisiyat and Bunun) and *qapeju (for Pazeh), with a possible final laryngeal. Both Saisiyat and Pazeh could reflect a form with a final vowel, or PAn *-x. Bunun paqav (< *paqeju with *j > -Ø, *e > a, and *-au > -av) can reflect a form that ended with either PAn *-x or a vowel, unless paqav is from the Takituduh dialect, in which case the PAn form based on Formosan evidence would end in a vowel. Bunun paqav is cited from Tsuchida (1976:224), who gives it as Southern Bunun (Ishbukun), making it ambiguous for final vowel or *-x. As a result of these uncertainties, the correspondence for 'gall' could be either x : Ø : Ø : h or Ø : Ø : Ø : h. If we choose the former, it goes with 'ash'; but if the latter, it forms an entirely new correspondence in which only TS distinguishes it from zero.

In other comparisons, TS has a final vowel where AyA has -h, as in AyA bayah, TS baya 'ember', AyA kotoh, TS koto 'head louse', AyA tobah, TS toba 'derris root fish poison', or AyA diwih, TS dowi 'thorn'. However, the addition of TS data in these comparisons does not produce clear evidence of new sound correspondences, since reference to (12) shows that 'ember' can be assigned to C2, 'head louse' can be assigned to C3, and the other two terms to C6. Finally, TS has a glottal stop in two comparisons that involve final aspirates in PAn or PMP, as shown in (15):


(15)
h : – : Ø : Ɂ : –            x : Ø : Ø : Ɂ : –
PAn  *buguh 'skull'          PAn  *amax 'father'
PMP  *bugu                   PMP  *ama
Itb  –                       Itb  ama
AyA  bogo                    AyA  ama
TS   bogóɁ                   TS   amóɁ

Even without better data for TS, then, it is clear that there are at least nine distinct sound correspondences involving a word-final aspirate in Itbayaten, Ayta Abellen, or Tina Sambal, together with Formosan evidence, namely reflexes of PAn *-S, C1–C7 in table 6, and the further split of C4 shown in (14).


Thanks to the efforts of Jason Lobel, it has become apparent that Mamanwa, a Central Philippine language spoken by a Negrito population in northeast Mindanao, is also a witness for the mystery aspirates. It has already been shown that Mamanwa reflects PAn *-S as -h, both in reduplicated monosyllables and in simplex bases (table 2). However, a number of examples of Mamanwa -h do not reflect *-S. These include the final consonant in pirah 'how much/how many?', depah 'fathom', upah 'rice husk', liŋah 'sesame', sikuh 'elbow', and ubih 'yam'.

Mamanwa data are relatively limited, but they do provide evidence of -h in some words that were not previously known to end in a consonant, as with ubih 'yam', and they agree with the appearance of -h in some other witnesses, as with sikuh next to Itb sichoh 'elbow' for what has traditionally been reconstructed as *siku.


As noted earlier, Zorc (1969) reportedly was compiled before the author was a linguist. Partly for this reason, and partly because his Aklanon informants objected to writing -h in forms where it is phonetically present (but does not occur in the more prestigious national language), final aspirates were written in parentheses, leaving it unclear exactly what was intended. To add to the uncertainty, Spanish loanwords have added -h where it is historically secondary, as in abúno(h) 'fertilizer' (Spanish abono), bála(h) 'bullet' (Spanish bala), dibúho(h) 'drawing, sketch' (Spanish dibujo), or gíya(h) 'to guide, lead' (Spanish guía). Moreover, in many cases, words with final /h/ exist next to vowel-final bases that represent a different part of speech, as seen in Aklanon abúno 'to fertilize', bála 'to load, put bullets into a gun', dibúho 'to draw, sketch', or gíya 'a guide, leader'. To make the final aspirate in Aklanon even harder to interpret, some native forms show a similar pattern of alternation between coda -h and zero, as with tátlo 'three' next to tátlo(h) 'to raise or lower to three, make three', or apó(h) 'grandchild', in-ápo 'grandchildren', ka-apó-apóh-an 'future generations (of grandchildren)'. Add to this the absence of a final aspirate in *CiŋaS > tiŋá 'food particles caught between teeth' or *tebuS > tubó 'sugarcane', ka-túbw-an 'sugarcane plantation' (now regarded as errors for correct tiŋáh and tubóh), and the reader can readily understand the reluctance of some linguists to accept Aklanon as a witness for -h based on the published sources. However, Zorc has rechecked all of his data relating to Aklanon -h, and affirms that there is a contrast between words ending with a vowel, and words ending with an aspirate.


The impact of Tina Sambal, Mamanwa, and Aklanon data on the correspondences that appear in (12) is summarized below. All correspondences represent equivalent sets, whether they are attested in full or partial form. However, some sets are so minimally attested that they are assignable to multiple correspondence classes, and are consequently so ambiguous as to have little value for establishing contrast.

Because the data that must be considered cannot easily be displayed in a tabular format, they are presented here in individual cognate sets. This is shown in table 7, with Tsuchida's *H1 and *H2 replaced by *h and *x (and where Mmn = Mamanwa and Akl = Aklanon). Where Tsuchida lacks a relevant PAn reconstruction, one has been supplied from data in the Austronesian comparative dictionary (Blust and Trussel ongoing), as with *duRi 'thorn'. If Formosan cognates are available but do not include diagnostic witnesses for PAn *-h or *-x, the PAn form is followed by (?), meaning that evidence is inconclusive as to whether it ended in a vowel, or had a word-final consonant that was lost in all surviving forms.

Table 8 attempts to identify the number of distinct sound correspondences in table 7 that are known to have a "mystery aspirate" in Philippine languages. Where available, cognates in Formosan languages are cited in column 1. However, Formosan forms with *-h or *-x are omitted if all known Philippine cognates end in a vowel.

The key to compiling table 8 is to avoid conflicting signals among cognate sets that are assigned to the same correspondence class. Reference to table 7 shows that it is not difficult to find other combinations of cognate sets than those chosen here. For example, sets 6, 7, 9, 23, and 29 are mutually compatible under the assumption that they all derive from a gapless set of the form x : Ø : h : Ø : Ø : h. However, sets 6 and 7 are also compatible with C5, which potentially derives from a gapless set of the form x : Ø : Ø : Ø : Ø : h, but sets 9, 23, and 29 clash with this, since AyA has /h/ in each of these forms. To make this clearer, (16) shows the idealized gapless sets for C1–C13. Only C3, C8, C12, and C13 cannot be generalized in this way, the first because of the unexplained appearance of glottal stop in both Tina Sambal and Aklanon where other languages have zero or /h/, making this correspondence unique; the second because no Mamanwa data are available for any of the three cognate sets used to justify this correspondence; the third because of the absence of unambiguous evidence for a final aspirate in Formosan languages; and the fourth because of gaps in both Mamanwa and Aklanon. Finally, only C10 is represented by a gapless comparison (set 22: 'elbow').

Table 7.

Table 8.



(16)
      PAn  Itb  AyA  TS   Mmn  Akl
C1    h    Ø    Ø    Ø    h    h
C2    h    h    h    Ø    h    h
C3    h    –    Ø    Ɂ    –    Ɂ
C4    x    Ø    Ø    Ɂ    h    h
C5    x    Ø    Ø    Ø    Ø    h
C6    x    Ø    h    Ø    Ø    h
C7    x    h    h    h    h    h
C8    x    Ø    Ø    h    –    h
C9    Ø    h    h    Ø    h    h
C10   x    h    Ø    Ø    h    h
C11   ?    Ø    h    h    h    h
C12   Ø    Ø    Ø    h    h    h
C13   Ø    h    Ø    Ø    –    –

All attempts to reduce the number of distinct correspondence classes by reassignment of cognate sets have proven fruitless, and we are left with an apparently irreducible set of thirteen. Although C3 is justified by only one cognate set (set 3: 'skull'), the number of cognate sets that represent other correspondence classes will vary with the assignment of ambiguous sets. To illustrate, set 26 ('yam': – : Ø : – : – : h : h) is compatible with C1, C4, C8, C11, and C12. If set 26 is arbitrarily assigned to C1, that correspondence class will gain support, and the other four correspondence classes will lose support; and so on with any arbitrary reassignment. For this reason, based on the data currently available, it is impossible to say with certainty how much support a given correspondence class has. However, it can be said with certainty that 33 cognate sets are represented by C1–C13, for an average of about 2.5 forms each, although the assumption of an equal distribution across correspondence classes is arbitrary and perhaps untrue. Add to these the more numerous instances of PAn *-S > -h, and it is hard to escape the conclusion that Philippine languages have at least 12 distinct sound correspondences (all but C3) that involve -h in some languages in correspondence with zero in most others.
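
The ambiguity of a partially attested set can be checked mechanically. The sketch below encodes the thirteen classes of (16) and treats a set as compatible with a class when every witness attested in both shows the same final segment. Several points are assumptions rather than part of the published analysis: the placement of the gaps in C3, C8, and C13 is inferred from the prose description of those classes, the glottal stop is written "Ɂ", "Ø" stands for zero, "?" is kept as given for the PAn slot of C11, and all names in the code are hypothetical.

```python
# Witness order as in (16): PAn, Itbayaten, Ayta Abellen, Tina Sambal, Mamanwa, Aklanon.
WITNESSES = ("PAn", "Itb", "AyA", "TS", "Mmn", "Akl")
GAP = None  # no attestation for this witness

# Correspondence classes of (16); gap positions in C3, C8, and C13 are
# inferred from the prose and are therefore an assumption of this sketch.
CLASSES = {
    "C1":  ("h", "Ø", "Ø", "Ø", "h", "h"),
    "C2":  ("h", "h", "h", "Ø", "h", "h"),
    "C3":  ("h", GAP, "Ø", "Ɂ", GAP, "Ɂ"),
    "C4":  ("x", "Ø", "Ø", "Ɂ", "h", "h"),
    "C5":  ("x", "Ø", "Ø", "Ø", "Ø", "h"),
    "C6":  ("x", "Ø", "h", "Ø", "Ø", "h"),
    "C7":  ("x", "h", "h", "h", "h", "h"),
    "C8":  ("x", "Ø", "Ø", "h", GAP, "h"),
    "C9":  ("Ø", "h", "h", "Ø", "h", "h"),
    "C10": ("x", "h", "Ø", "Ø", "h", "h"),
    "C11": ("?", "Ø", "h", "h", "h", "h"),
    "C12": ("Ø", "Ø", "Ø", "h", "h", "h"),
    "C13": ("Ø", "h", "Ø", "Ø", GAP, GAP),
}

def is_compatible(cognate_set, klass):
    """True if no witness attested on both sides shows a conflicting segment."""
    return all(a is GAP or b is GAP or a == b for a, b in zip(cognate_set, klass))

def compatible_classes(cognate_set):
    """Names of all classes that the cognate set could be assigned to."""
    return [name for name, klass in CLASSES.items() if is_compatible(cognate_set, klass)]

# Set 26 ('yam'): – : Ø : – : – : h : h
yam = (GAP, "Ø", GAP, GAP, "h", "h")
print(compatible_classes(yam))

# Average support per class if the 33 cognate sets were spread evenly.
average_support = round(33 / len(CLASSES), 1)
print(average_support)
```

Under these assumptions the check returns the same five classes that the text lists for 'yam', which illustrates why the amount of support for any one class depends on how the ambiguous sets are assigned.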

All ambiguities in the final consonant of the 33 items in (16) are indicated explicitly in table 9.

Table 9.


It is hard to say how much longer this list might become if more witnesses for the mystery aspirates were discovered, but the data considered so far are already sufficient to show that there is a major comparative puzzle in the historical phonology of Philippine languages waiting to be solved.


What historical events could possibly have produced the range of sound correspondences shown in table 8? Two possibilities come to mind:

  1. The mystery aspirates were added in the separate history of several languages in the Philippines.

  2. The mystery aspirates are retentions from PPH, PMP, and PAn.

Possibility 1 is an a priori violation of the Regularity Hypothesis. To circumvent this problem, we might argue that in many Philippine languages -V alternates with -Vh before a vowel-initial suffix. This may have been an old pattern that led to analogical wrong-division in particular cases, producing word-final -h where before there had been only intervocalic aspirates as a regular phonological feature of suffixation. However, this interpretation cannot easily explain agreements between widely separated languages. For example, if the word for 'elbow' had been *siku in a language ancestral to the modern languages of the Philippines as is suggested by Ayta Abellen, Tina Sambal hiko, there is no obvious explanation why -h would have been added independently in Itbayaten sichoh, and Mamanwa, Aklanon sikuh, but not in words such as Itbayaten avo 'ashes', when Tina Sambal and Aklanon both show -h in the cognate form.

To avoid the unpredictability of -h in this approach, we might, therefore, posit a final consonant in each of the protoforms ancestral to the comparisons in table 7. However, the problem then becomes "What do we reconstruct?" The normal rule of thumb following the comparative method is to posit a distinct protophoneme for each correspondence class that is not in complementary distribution with another. In this case, we would be forced to posit thirteen aspirates *h1 – *h13 word-finally (-h from PAn *-S, and one of each *h for each correspondence class except C3 in table 8). The typological implausibility of this approach is no less objectionable than the unpredictability of approach 1. We are left, then, at an impasse, which is why these aspirates in Philippine languages are "mysterious": they fail to agree with theoretical expectations, and so leave the analyst with no good explanatory alternative.

As a compromise, we might assume that PMP and PPH had two word-final aspirates, *-h and *-x, parallel to the reconstruction of PAn *-H1 and *-H2 in Tsuchida (1976). The first of these reflected PAn *-S, and is consistent across all languages in all comparisons. The second reflected PAn *-h and *-x, and is far less consistent across languages. However, even here we encounter what appear to be insoluble problems. Laryngeals may be inherently unstable, and subject to sporadic loss, but why would this be true of PMP *-x from PAn *-h and *-x, but not of PMP *-h from PAn *-S? Or should we assume that PAn *S became PMP *x, and, hence, was more resistant to lenition than PMP *-h? And what of the subsequent history of PMP *-h: why was it retained in some forms in some languages but lost in others in a pattern that resists any simple analysis?

In short, there seems to be no plausible basis for treating the mystery aspirates as either innovations or retentions, leaving us with a methodological dilemma. Treating them as innovations requires the recognition of widespread sporadic change that inexplicably targeted the same morphemes in widely separated languages, while treating them as retentions would appear to require the reconstruction of as many as thirteen types of *h, violating typological plausibility. It is true that many of the sound correspondences summarized in (16) are represented by single forms, but as already noted, these numbers in some cases are arbitrary, given the multiple possibilities of compatibility for some correspondences with critical gaps in attestation. Moreover, comparisons such as those for 'ember' or 'elbow' cannot easily be explained as innovations, since they agree in widely separated languages in having an unexpected final aspirate.

This raises an important issue of a more general kind. In any branch of science, research does not end in a collection of data. Rather it ends in a set of generalizations about those data: a set of principles that show why the observable data behave the way they do. Given this imperative, it is clear that theory and data may appear irreconcilable, as in the present case. When that happens, there are no generally accepted guidelines that show how the data should be reported. Repeated reactions from referees to manuscripts of my own have shown that there is a substantial faction among linguists that opposes the publication of anomalous data without an accompanying theory, as unexplained data establish no generalization, and, hence, fail to meet the signal requirement of all sciences, that the analysis of data should lead to a formulation of general principles. Unfortunately, this valid concern is easily twisted into an implicit claim that when theory and data conflict, theory must predominate, even to the point of excluding data from consideration. I believe this attitude is seriously mistaken.

In perhaps the earliest generalization made in linguistics, Jacob Grimm in 1822 formulated the sound correspondence that he called the first Germanic consonant shift, a change that subsequently came to be known as "Grimm's Law." What Grimm found was a pattern of correspondence between letters of the alphabet (Buchstaben) representing the stops of other Indo-European languages and their corresponding alphabetic symbols in Germanic, a pattern that was repeated so often that it clearly required a hypothesis of a common origin of the Indo-European languages, and of the unity of the Germanic subgroup.

However, not every word fit this pattern, and rather than sweep nonconforming cases under the proverbial rug, Grimm listed them conscientiously so that four decades later, Hermann Grassmann, and a decade after him, Karl Verner were able to show that these "exceptions" to Grimm's Law are in fact subregularities. It is possible, and, in fact, likely, that the subregularities formulated as Grassmann's Law and Verner's Law would have been discovered even without Grimm's list of exceptions, but his openness in listing nonconforming cases probably accelerated the discoveries that ultimately made Grimm's Law more complete than it was in its original formulation.

In my view, the same principle that governed this classic case should govern others: when theory and data disagree, theory does not trump data, leading to its exclusion from consideration until some undefined future time. Rather, irregularities should be highlighted, not treated as an embarrassment to science, since cases that challenge existing assumptions are the ones most likely to lead to advances in understanding. The anomalous "mystery aspirates" that are reported here are offered with no current explanation, in the hope that in time an explanation will be found, and it is in this spirit that I have brought them to the attention of other linguists who may in time provide a better understanding of them than is currently available.

Robert Blust
University of Hawai'i


Adelaar, Alexander. 2011. Siraya: Retrieving the phonology, grammar and lexicon of a dormant Formosan language. Trends in Linguistics Documentation 30. Berlin: de Gruyter Mouton.
Blust, Robert. 1969. Some new Proto-Austronesian trisyllables. Oceanic Linguistics 8:85–104.
———. 1981. The Soboyo reflexes of Proto-Austronesian *S. In Historical Linguistics in Indonesia, ed. by Robert A. Blust, 21–30. NUSA: Linguistic studies in Indonesian and languages in Indonesia, vol. 10.
———. 1991. The Greater Central Philippines hypothesis. Oceanic Linguistics 30:73–129.
———. 1993. *S metathesis and the Formosan/Malayo-Polynesian language boundary. In Language––a doorway between human cultures: Tributes to Dr. Otto Chr. Dahl on his ninetieth birthday, ed. by Øyvind Dahl, 178–83. Oslo: Novus.
———. 2013 [2009]. The Austronesian languages. Rev. ed. Asia-Pacific Linguistics Open Access Monographs. Canberra: Research School of Pacific and Asian Studies, The Australian National University.
———. 2017. Regular metathesis in Batanic (Northern Philippines)? Oceanic Linguistics 56:491–504.
Blust, Robert, and Stephen Trussel. Ongoing. Austronesian comparative dictionary. Open access online resource available at
Brandstetter, Renward. 1916. An introduction to Indonesian linguistics: Being four essays by Renward Brandstetter. Trans. by C. O. Blagden. London: Royal Asiatic Society Monographs XV.
Dahl, Otto Chr. 1976 [1973]. Proto-Austronesian. 2nd, rev. edition. Scandinavian Institute of Asian Studies Monograph Series, No. 15. London: Curzon Press.
Dempwolff, Otto. 1934–38. Vergleichende Lautlehre des austronesischen Wortschatzes. Zeitschrift für Eingeborenen-Sprachen, Supplement 1. Induktiver Aufbau einer indonesischen Ursprache (1934); Supplement 2. Deduktive Anwendung des Urindonesischen auf austronesische Einzelsprachen (1937); Supplement 3. Austronesisches Wörterverzeichnis (1938). Berlin: Reimer.
Dyen, Isidore. 1951. Proto-Malayo-Polynesian *Z. Language 27:534–40.
———. 1953a. The Proto-Malayo-Polynesian laryngeals. William Dwight Whitney Linguistic Series. Baltimore: Linguistic Society of America.
———. 1953b. Dempwolff's *R. Language 29:359–66.
———. 1962. Some new Proto-Malayopolynesian initial phonemes. Journal of the American Oriental Society 82:214–15.
———. 1965. Formosan evidence for some new Proto-Austronesian phonemes. Lingua 14:285–305.
Elgincolin, Sotero B., Hella E. Goschnick, and Priscilla R. Elgincolin. 1988. Diksyonaryon English–Sambalì Tinà–Pilipino. Manila: Summer Institute of Linguistics, Philippine Branch.
Greenhill, Simon J., Robert Blust, and Russell Gray. 2003–18. The Austronesian basic vocabulary database. Online resource:
Himes, Ronald S. 2012. The Central Luzon group of languages. Oceanic Linguistics 51:490–537.
Li, Paul Jen-kuei. 2004a. Reconstruction of Proto-Atayalic phonology. In Selected papers on Formosan languages, ed. by Paul Jen-kuei Li, 625–92. Language and Linguistics Monograph Series No. C3. Taipei: Institute of Linguistics, Academia Sinica.
———. 2004b. A comparative study of Bunun dialects. In Selected papers on Formosan languages, ed. by Paul Jen-kuei Li, 743–66. Language and Linguistics Monograph Series No. C3. Taipei: Institute of Linguistics, Academia Sinica.
Lobel, Jason William. 2013. Philippine and North Bornean languages: Issues in description, subgrouping, and reconstruction. PhD diss., University of Hawai'i.
McFarland, Curtis D. 1977. Northern Philippine linguistic geography. Studies of the Languages and Cultures of Asia and Oceania monograph series, No. 9: Tokyo: Institute for the Study of Languages and Cultures of Asia and Oceania.
Prentice, D. J. 1974. Yet another PAN phoneme? Oceanic Linguistics 13:33–75.
Reid, Lawrence A. 1982. The demise of Proto-Philippines. In Papers from the Third International Conference on Austronesian Linguistics, vol. 2: Tracking the travellers, ed. by Amran Halim, Lois Carrington, and S. A. Wurm, 201–16. Canberra: Pacific Linguistics.
———. 1987. The early switch hypothesis: Linguistic evidence for contact between Negritos and Austronesians. Man and culture in Oceania 3:41–59.
Reid, Lawrence A., ed. 1971. Philippine minor languages: Word lists and phonologies. Oceanic Linguistics Special Publication No. 8. Honolulu: University of Hawai'i Press.
Ross, Malcolm. 1992. The sound of Proto-Austronesian: An outsider's view of the Formosan evidence. Oceanic Linguistics 31:23–64.
———. 1995. Some current issues in Austronesian linguistics. In Comparative Austronesian dictionary: An introduction to Austronesian studies, Part 1, Fascicle 1, ed. by Darrell T. Tryon, 45–120. Berlin: Mouton de Gruyter.
———. 2005. The Batanic languages in relation to the early history of the Malayo-Polynesian subgroup of Austronesian. Journal of Austronesian Studies 1(2):1–24.
Smith, Alexander D. 2017. The Western Malayo-Polynesian problem. Oceanic Linguistics 56:435–90.
Stone, Roger. 2007. Ayta Abellen dictionary and texts Introduction – Work in Progress. Online resource available through SIL, Philippines.
———. 2008. The Sambalic languages of central Luzon. Studies in Philippine languages and cultures 19:158–83.
Tsuchida, Shigeru. 1976. Reconstruction of Proto-Tsouic phonology. SLCAA Monograph Series 5. Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa.
Tsuchida, Shigeru, Yukihiro Yamada, and Tsunekazu Moriguchi. 1987. Lists of selected words of Batanic languages. Tokyo: Department of Linguistics, Faculty of Letters, University of Tokyo.
Yamada, Yukihiro. 1976. A preliminary dictionary of Itbayaten. Typescript.
Zorc, R. David. 1969. A study of the Aklanon dialect, vol. 2: Dictionary. Kalibo, Aklan: Public Domain.
———. 1982. Where, o where, have the laryngeals gone? Austronesian laryngeals reexamined. In Papers from the Third International Conference on Austronesian Linguistics, vol. 2: Tracking the travellers, ed. by Amran Halim, Lois Carrington, and S. A. Wurm, 111–44. Canberra: Pacific Linguistics.
———. 1986. The genetic relationships of Philippine languages. In FOCAL II: Papers from the Fourth International Conference on Austronesian Linguistics, ed. by Paul Geraghty, Lois Carrington, and S.A. Wurm, 147–73. Canberra: Pacific Linguistics.
———. 1996. The reconstruction and status of Austronesian glottal stop—Chimera or chameleon. In Reconstruction, classification, description: Festschrift in honor of Isidore Dyen, ed. by Bernd Nothofer, 41–72. Abera Network, Asia-Pacific 3. Hamburg: Abera Verlag Meyer & Co.


1. I am grateful to an anonymous referee, and especially to R. David Zorc for his exceptionally thorough review of an earlier version of this paper, which contributed substantially to its improvement. I also wish to thank Jason Lobel, who drew my attention to Mamanwa and the northern and eastern Samar dialects of Waray-Waray as additional witnesses for the "mystery aspirates." Needless to say, any remaining errors of presentation or interpretation are mine alone.

2. The phoneme /h/ in Philippine languages naturally has other sources as well, as PAn *s (Ayta Abellen, Ifugaw, etc.), *j (Casiguran Dumagat), or *l (sporadic in Tagalog). Since these are irrelevant to the present discussion, they will not receive further attention except in passing.

3. As will be seen, the PAn word for 'ashes' may well have had a final laryngeal of some type, but it did not contain *-S. Since morphophonemic -h appears regularly in Tagalog before a vowel-initial suffix to separate two unlike vowels, it has no diagnostic value for the reconstruction of a final laryngeal.

4. Whether metathesis occurred before *S > h or vice versa remains an open question, and if the order was the reverse of that assumed here, we might prefer to call this *-h metathesis. However, preliminary data from Kavalan and Amis suggest that metathesis had begun before sibilant lenition, as seen in the Amis variants toris/tosir 'a line', and tosor/toros 'knee', or Kavalan tusuz 'knee', as compared with reflexes of PAn *tuduS 'knee' in Puyuma (tuɖu), Siraya (turux; Adelaar 2011:389), and the Amis variant toros.

5. Himes (2012:492), who is quite explicit about this, states that "at some time prior to the dispersal of the Central Luzon (LUZC) languages, certain of the PMP phonemes had undergone change." Among these he lists *h > Ø, adding in a footnote that cites a personal communication from Lawrence Reid, that *h may have become Ɂ in initial position, "but *h certainly was lost in all other environments."

6. The nomenclature relating to this language has had a checkered history. While early reports described it as "Sinauna Tagalog," it eventually became clear that it is not a Tagalog dialect, but rather a Central Luzon language spoken by a Negrito population. Lobel (2013, map 1.17) described it as "Remontado Dumagat," but he now (pers. comm., February 7, 2018) states that the people do not call themselves or consider themselves "Dumagats," and that the only appropriate name for it is "Remontado."

7. Not all scholars accept Proto-Philippines (Reid 1982; Ross 1995:73ff, 2005; Smith 2017). Nothing in the current argument depends upon the Proto-Philippines hypothesis, although the presence of over 1,300 Philippine-only etymologies in the online Austronesian comparative dictionary (Blust and Trussel ongoing) is difficult to explain without assuming the reality of a Philippine subgroup as formulated by Zorc (1986) and Blust (1991).

8. Cp. AyA tobah 'a species of shrub from the seeds of which croton oil is extracted', Ilokano túba 'fish poison, Croton sp. plant', tuba-en 'to poison fish in the water', as against reflexes of PAn *tuba 'fish poison: Derris elliptica' in most languages.

9. Both PMP and PPh also reflect PAn nonfinal h in a small number of forms, as in PAn *bahi > Western Bukidnon Manobo, Bukidnon bahi 'female', but considering these is beyond the scope of this paper.
