Can phonological universals be emergent? Modeling the space of sound change, lexical distribution, and hypothesis selection: Online appendices

Rebecca L. Morley

doi:10.1353/lan.2015.0030

In lieu of an abstract, here is a brief excerpt of the content:

Language 91.2, June 2015

PHONOLOGICAL ANALYSIS

REBECCA L. MORLEY, The Ohio State University

APPENDIX A: TWO-HYPOTHESIS COMPETITION: SIMPLE AND VARIABILITY HYPOTHESES

This material is supplemental to §3.2 of the main text. The evaluation metric is the posterior probability that hypothesis $h$ is the correct one, given the data $d$. It is calculated using Bayes's theorem:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{p(d)} \tag{A1}$$

Under the word-independence assumption, the probability of the set $d$ given $h$ and $y$ (where $h$ = GUJARATI*, PENULT, or GUJARATI; $d$ is the set of stressed words; and $y$ is the set of corresponding underlying unstressed forms) can be expanded as the product of the probabilities of each member of $d$, given $h$ and the corresponding member of $y$:

$$p(h \mid d) = \frac{p(h) \prod_{i} p(d_i \mid h, y_i)}{p(d)} \tag{A2}$$

Since one is typically interested only in the relative value of the posterior probability, the ratio of the posteriors for any two hypotheses can be taken to determine the winner. The term $p(d)$ can then be ignored, since it appears in both the numerator and the denominator of the ratio, giving

$$\frac{p(h_i \mid d)}{p(h_j \mid d)} = \frac{p(h_i) \prod_{x} p(d_x \mid h_i, y_x)}{p(h_j) \prod_{x} p(d_x \mid h_j, y_x)} \tag{A3}$$

For a given three-syllable word $y_x$, there are three stress possibilities: 1, initial stress; 2, penultimate stress; and 3, final stress. The set of possible outputs is $C = \{1, 2, 3\}$, and the stress class assigned by $H_i$ is written as a function of the input word: $H_i(y_x) \in C$. In the original simple hypothesis space, each hypothesis predicts exactly one stress position per word; that is, it assigns all probability to one position. Thus, the probability of stress falling in any given position $c$ is either 0 or 1:

$$p(c \mid H_i, y_x) = \begin{cases} 1 & c = H_i(y_x) \\ 0 & \text{otherwise} \end{cases} \tag{A4}$$

The variability versions of the simple hypotheses assign some small probability to the other stress positions. From a production standpoint, the process can be conceptualized as follows.
Stress placement is decided either by rule or at random. The probability that the rule will be used is high; however, the random process will occasionally be chosen instead. For three-syllable words, this random process ($A$, for 'arbitrary') will place stress exceptionally two out of every three times, and will happen to select the same location as $H$ one out of every three times:

$$p(c \mid A, y_x) = \tfrac{1}{3}, \quad \forall c \in C \tag{A5}$$

For the variability hypotheses, the probability of stress in any of the three possible locations $c$ is given as the weighted sum of the contributions from the two processes:

$$p(c \mid H_i^{\alpha}, y_x) = w_i\, p(c \mid H_i, y_x) + w_a\, p(c \mid A, y_x) \tag{A6}$$

Take $3\alpha$ ($= w_a$) to be the probability that stress will be assigned randomly; since $A$ spreads this probability evenly over the three positions, each position receives $3\alpha \cdot \tfrac{1}{3} = \alpha$ from the random process. This leaves $1 - 3\alpha$ ($= w_i$) as the probability with which the normal stress rule is followed. The probability of stress at each possible location is given in A7. In the first case, the two processes agree on the location of stress, at $c_i = H_i(y_x)$. Otherwise, the two processes disagree, and $H_i$ assigns zero probability to each of the remaining locations, $c_{a1}, c_{a2} \neq H_i(y_x)$:

$$\begin{aligned} p(c_i \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\, p(c_i \mid H_i, y_x) + 3\alpha\, p(c_i \mid A, y_x) = (1 - 3\alpha)(1) + 3\alpha\left(\tfrac{1}{3}\right) = 1 - 2\alpha \\ p(c_{a1} \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\, p(c_{a1} \mid H_i, y_x) + 3\alpha\, p(c_{a1} \mid A, y_x) = (1 - 3\alpha)(0) + 3\alpha\left(\tfrac{1}{3}\right) = \alpha \\ p(c_{a2} \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\, p(c_{a2} \mid H_i, y_x) + 3\alpha\, p(c_{a2} \mid A, y_x) = (1 - 3\alpha)(0) + 3\alpha\left(\tfrac{1}{3}\right) = \alpha \end{aligned} \tag{A7}$$

The three cases can be expressed compactly by the following formula, where $H_i^{\alpha}$ denotes the VARIABILITY VERSION OF $H_i$:

$$p(c \mid H_i^{\alpha}, y_x) = \begin{cases} 1 - 2\alpha & c = H_i(y_x) \\ \alpha & c \neq H_i(y_x) \end{cases} \tag{A8}$$

According to the definition of the variability hypotheses in A8, the probability assigned to any particular surface form is $1 - 2\alpha$ if the form is consistent with the categorical version of the given hypothesis, and $\alpha$ if it is inconsistent. It is therefore convenient to divide the data set $d$ into two subsets: (i) the set of stressed words that are consistent with $H$ (e.g. $d_i = G^{*}(y_i)$: the stress that actually appears on word $y_i$ is the same as the stress assigned by hypothesis GUJARATI* to word $y_i$), and (ii) the set of stressed words that are inconsistent with $H$. The likelihood ratio in equation A3 can then be rewritten as

$$\frac{p(d \mid \text{GUJARATI}^{*\alpha})}{p(d \mid \text{GUJARATI}^{\alpha})} = \frac{\prod_{x:\, d_x \neq G^{*}(y_x)} \alpha \; \prod_{x:\, d_x = G^{*}(y_x)} (1 - 2\alpha)}{\prod_{x:\, d_x \neq G(y_x)} \alpha \; \prod_{x:\, d_x = G(y_x)} (1 - 2\alpha)} \tag{A9}$$

If the prior probabilities are equal ($p(\text{GUJARATI}^{*}) = p(\text{GUJARATI})$), then the ratio of likelihoods in A9 equals the ratio of posteriors in A3.

Derivation of equation 6: For any two hypotheses $H_i^{\alpha}$ and $H_j^{\alpha}$, the following variable parameters can be defined: $i$ = the number of data points consistent with $H_i$ AND inconsistent with $H_j$; $j$ = the number of data points consistent with $H_j$ AND inconsistent with $H_i$; $n$ = the number of data points consistent with both hypotheses; and $a$ = the number of data points consistent with neither hypothesis. Assuming uniform priors, rewriting equation A9 in terms of these...
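As a check on equations A5 through A8, the minimal Python sketch below builds the variability distribution as the weighted mixture in A6 and confirms that it reduces to the two-valued form in A8. The helper names (`categorical`, `arbitrary`, `variability`) and the representation of a categorical hypothesis by the single stress position it predicts for a word are illustrative assumptions, not part of the original appendix. Exact fractions are used so that the comparison with $1 - 2\alpha$ and $\alpha$ is not blurred by floating-point error.

```python
from fractions import Fraction

POSITIONS = (1, 2, 3)  # C: 1 = initial, 2 = penultimate, 3 = final stress


def categorical(c, predicted):
    """A4: a simple hypothesis puts all probability on its predicted position."""
    return Fraction(1) if c == predicted else Fraction(0)


def arbitrary(c):
    """A5: the random process A spreads stress evenly over the three positions."""
    return Fraction(1, 3)


def variability(c, predicted, alpha):
    """A6 with w_a = 3*alpha and w_i = 1 - 3*alpha (hypothetical helper).

    alpha must lie in [0, 1/3] for both mixture weights to be valid probabilities.
    """
    w_a = 3 * alpha
    w_i = 1 - w_a
    return w_i * categorical(c, predicted) + w_a * arbitrary(c)


alpha = Fraction(1, 20)
# Distribution over stress positions for a word whose rule-predicted stress is penultimate.
dist = {c: variability(c, predicted=2, alpha=alpha) for c in POSITIONS}
print(dist)  # {1: Fraction(1, 20), 2: Fraction(9, 10), 3: Fraction(1, 20)}
```

With $\alpha = 1/20$, the predicted position receives $1 - 2\alpha = 9/10$ and each of the other two receives $\alpha = 1/20$, matching A8.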

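Equation A9 can also be evaluated directly by checking each word against each hypothesis. The Python sketch below represents a categorical hypothesis (as an assumption) as a function from an underlying form to a stress position and computes the logarithm of the A9 likelihood ratio. With equal priors, this is also the log posterior ratio of A3, so a positive value favors the first hypothesis. Working in log space avoids numerical underflow when the products in A9 run over many words. The rules `initial_rule` and `penult_rule` and the placeholder data are hypothetical stand-ins, not the GUJARATI, GUJARATI*, or PENULT hypotheses of the main text.

```python
import math
from typing import Callable, Sequence

# A categorical hypothesis maps an underlying form to a stress position in {1, 2, 3}.
Hypothesis = Callable[[str], int]


def log_likelihood(h: Hypothesis, forms: Sequence[str], stresses: Sequence[int], alpha: float) -> float:
    """Sum of log p(d_x | H^alpha, y_x) under A8, assuming word independence (A2).

    alpha must lie in (0, 1/3]; alpha = 0 would make log(alpha) undefined.
    """
    total = 0.0
    for y, d in zip(forms, stresses):
        # Consistent words contribute 1 - 2*alpha; inconsistent words contribute alpha.
        total += math.log(1 - 2 * alpha) if d == h(y) else math.log(alpha)
    return total


def log_likelihood_ratio(h_i: Hypothesis, h_j: Hypothesis, forms: Sequence[str],
                         stresses: Sequence[int], alpha: float) -> float:
    """Log of the A9 ratio p(d | H_i^alpha) / p(d | H_j^alpha)."""
    return log_likelihood(h_i, forms, stresses, alpha) - log_likelihood(h_j, forms, stresses, alpha)


# Hypothetical toy rules and data, for illustration only.
def initial_rule(y: str) -> int:
    return 1


def penult_rule(y: str) -> int:
    return 2


forms = ["y1", "y2", "y3", "y4"]   # placeholder underlying forms
stresses = [2, 2, 1, 3]            # observed stress position for each form

llr = log_likelihood_ratio(penult_rule, initial_rule, forms, stresses, alpha=0.05)
print(llr > 0)  # True: the penult rule matches more of the toy data
```

Words consistent with both hypotheses, or with neither, contribute the same factor to the numerator and denominator of A9 and cancel in the log ratio; only the words on which the two hypotheses disagree affect the result.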