Can phonological universals be emergent? Modeling the space of sound change, lexical distribution, and hypothesis selection: Online appendices
- Language
- Linguistic Society of America
- Volume 91, Number 2, June 2015
- pp. s1-s20
- 10.1353/lan.2015.0030
PHONOLOGICAL ANALYSIS

REBECCA L. MORLEY
The Ohio State University

APPENDIX A: TWO-HYPOTHESIS COMPETITION: SIMPLE AND VARIABILITY HYPOTHESES

This material is supplemental to §3.2 of the main text. The evaluation metric is the posterior probability that the hypothesis $h$ is the correct one, given the data $s$; it is calculated using Bayes's theorem:

$$P(h \mid s) = \frac{P(s \mid h)\,P(h)}{P(s)} \tag{A1}$$

Under the word-independence assumption, the probability of the set $s$ given $h$ and $y$ (where $h$ = GUJARATI*, PENULT, or GUJARATI, and $s$ is the set of stressed words, with $y$ being the underlying unstressed forms) can be expanded as the product of the probabilities of each member of $s$ given $h$ and the corresponding member of $y$.

$$P(h \mid s) = \frac{P(h)\prod_{i} P(s_i \mid h, y_i)}{P(s)} \tag{A2}$$

Since one is typically interested only in the relative value of the posterior probability, the ratio of posteriors for any two hypotheses can be taken to determine the winner. Thus, $P(s)$ can be ignored since it appears in both the numerator and the denominator of the ratio, giving

$$\frac{P(h_i \mid s)}{P(h_j \mid s)} = \frac{P(h_i)\prod_{x} P(s_x \mid h_i, y_x)}{P(h_j)\prod_{x} P(s_x \mid h_j, y_x)}. \tag{A3}$$

For a given three-syllable word, $y_x$, there are three stress possibilities: 1: initial stress, 2: penultimate stress, and 3: final stress. The set of possible outputs is given by $C = \{1, 2, 3\}$, and the stress class assigned by $H_i$ is written as a function of the input word: $H_i(y_x) \in C$. For the original simple hypothesis space, each hypothesis predicts exactly one stress position per word; that is, it assigns all probability to one position. Thus, the probability of stress being in any given position $c$ is either 0 or 1.

$$P(c \mid H_i, y_x) = \begin{cases} 1 & c = H_i(y_x) \\ 0 & \text{otherwise} \end{cases} \tag{A4}$$

The variability versions of the simple hypotheses assign some small probability to other stress positions. From a production standpoint, the process can be conceptualized as follows.
Stress placement is decided either via rule or at random. The probability that the rule will be used is high. However, the random process will be chosen instead from time to time. This random process (A, for "arbitrary") will result in exceptional stress placement two out of every three times for three-syllable words, and will randomly select the same location as $H$ one out of every three times.

$$P(c \mid A, y_x) = \frac{1}{3}, \quad \forall c \tag{A5}$$

For the variability hypotheses, the probability of stress in any of the three possible locations $c$ is given as the weighted sum of the contributions from the two processes:

$$P(c \mid H_i^{\alpha}, y_x) = w_r\,P(c \mid H_i, y_x) + w_a\,P(c \mid A, y_x) \tag{A6}$$

Take $3\alpha$ ($= w_a$) to be the probability that stress will be assigned randomly (thus, each position has probability $\alpha$ of being stressed under A). This leaves $1 - 3\alpha$ as the probability with which the normal stress rule is followed ($= w_r$). The probability of stress at each possible location is given in A7. In the first instance, the two processes agree in the location of stress, at $c_i = H_i(y_x)$. Otherwise, the two processes disagree, and $H_i$ assigns zero probability to each of these locations, $c_{j_1}, c_{j_2} \neq H_i(y_x)$:

$$\begin{aligned}
P(c_i \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\,P(c_i \mid H_i, y_x) + (3\alpha)\,P(c_i \mid A, y_x) = 1 - 2\alpha \\
P(c_{j_1} \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\,P(c_{j_1} \mid H_i, y_x) + (3\alpha)\,P(c_{j_1} \mid A, y_x) = \alpha \\
P(c_{j_2} \mid H_i^{\alpha}, y_x) &= (1 - 3\alpha)\,P(c_{j_2} \mid H_i, y_x) + (3\alpha)\,P(c_{j_2} \mid A, y_x) = \alpha
\end{aligned} \tag{A7}$$

The three scenarios can be compactly expressed by the following formula:

$H_i^{\alpha}$: VARIABILITY VERSION OF $H_i$

$$P(c \mid H_i^{\alpha}, y_x) = \begin{cases} 1 - 2\alpha & c = H_i(y_x) \\ \alpha & c \neq H_i(y_x) \end{cases} \tag{A8}$$

According to the definition of the variability hypotheses in A8, the probability assigned to any particular surface form is given as $1 - 2\alpha$ if the form is consistent with the categorical version of the given hypothesis, and $\alpha$ if the form is inconsistent. Thus, it is convenient to divide the data set $s$ into two subsets: (i) the set of stressed words that are consistent with $H$ (e.g.
$s_i = G^{*}(y_i)$: the stress that actually appears on word $y_i$ is the same as the stress assigned by hypothesis GUJARATI* to word $y_i$), and (ii) the set of stressed words that are inconsistent with $H$. Equation A3 can then be rewritten as

$$\frac{P(s \mid \text{GUJARATI}^{*\alpha})}{P(s \mid \text{GUJARATI}^{\alpha})} = \frac{\prod_{s_x \neq G^{*}(y_x)} \alpha \; \prod_{s_x = G^{*}(y_x)} (1 - 2\alpha)}{\prod_{s_x \neq G(y_x)} \alpha \; \prod_{s_x = G(y_x)} (1 - 2\alpha)}. \tag{A9}$$

If the prior probability terms are the same ($P(\text{GUJARATI}^{*}) = P(\text{GUJARATI})$), then the ratio of likelihoods in A9 is equivalent to the ratio of posteriors in A3.

Derivation of equation 6: For any two hypotheses, $H_i^{\alpha}$, $H_j^{\alpha}$, the following variable parameters can be defined: $k$ = the number of data points consistent with $H_i$ AND inconsistent with $H_j$; $l$ = the number of data points consistent with $H_j$ AND inconsistent with $H_i$; $m$ = the number of data points consistent with both hypotheses; and $n$ = the number of data points consistent with neither hypothesis. Assuming uniform priors, rewriting equation A9 in terms of these...
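The variability probabilities in A8 and the likelihood ratio in A9 can be checked numerically. The following is a minimal Python sketch, not the author's implementation: the function names, the toy stress patterns, and the value of α are all hypothetical and chosen only for illustration. It assumes 0 < α < 1/3, so that both 3α and 1 − 3α are valid mixture weights, and it works in log space to avoid underflow on large data sets.

```python
import math

POSITIONS = (1, 2, 3)  # 1: initial, 2: penultimate, 3: final stress


def check_alpha(alpha):
    # Assumption of this sketch: 3*alpha and 1 - 3*alpha must both be valid weights.
    if not 0 < alpha < 1 / 3:
        raise ValueError("alpha must lie strictly between 0 and 1/3")


def mixture_prob(c, predicted, alpha):
    """A6/A7: weighted sum of the rule process and the arbitrary process A."""
    check_alpha(alpha)
    rule = 1.0 if c == predicted else 0.0   # A4: categorical hypothesis
    arbitrary = 1.0 / len(POSITIONS)        # A5: uniform over positions
    return (1 - 3 * alpha) * rule + (3 * alpha) * arbitrary


def variability_prob(c, predicted, alpha):
    """A8: compact form of the same distribution."""
    check_alpha(alpha)
    return 1 - 2 * alpha if c == predicted else alpha


def log_likelihood(observed, predicted, alpha):
    """Log of the product over words, under word independence."""
    return sum(
        math.log(variability_prob(s, p, alpha))
        for s, p in zip(observed, predicted)
    )


def log_likelihood_ratio(observed, predicted_i, predicted_j, alpha):
    """Log of the A9-style ratio; with equal priors this is also the log posterior ratio."""
    return (log_likelihood(observed, predicted_i, alpha)
            - log_likelihood(observed, predicted_j, alpha))


# Hypothetical toy data: observed stress and the predictions of two hypotheses.
observed = [1, 2, 2, 3, 1]
pred_i = [1, 2, 2, 3, 2]   # consistent with 4 of 5 words
pred_j = [1, 2, 3, 3, 2]   # consistent with 3 of 5 words
alpha = 0.05
log_ratio = log_likelihood_ratio(observed, pred_i, pred_j, alpha)
```

In this toy case the two hypotheses differ on a single word that only the first one predicts correctly, so the ratio reduces to (1 − 2α)/α, which is 18 for α = 0.05; words on which the hypotheses agree cancel out of the ratio.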