Theory-driven statistical modeling for semantics and pragmatics: A case study on grammatically generated implicature readings

Computational probabilistic modeling is increasingly popular in linguistics, but its relationship with linguistic theory is ambivalent. We argue here for the potential benefit of theory-driven statistical modeling, based on a case study situated at the semantics-pragmatics interface. Using data from a novel experiment, we employ Bayesian model comparison to evaluate the predictive adequacy of four probabilistic pragmatic models of utterance and interpretation choice, which differ in the extent to which, and the manner in which, they take grammatically generated candidate readings into account. The data provide strong evidence that the full range of potential readings made available by recently popular grammatical approaches to scalar-implicature computation might be needed, and that classical Gricean reasoning may help manage the manifold ambiguity these approaches introduce. The case study thereby shows a way of bridging linguistic theory and empirical data with the help of probabilistic pragmatic modeling as a linking function.


pragmatics, computational modeling, scalar implicature, grammaticalism, Bayesian model selection

1. Introduction

The last two decades have brought an empirical turn to several areas of theoretical linguistics, including syntax (Sprouse 2007), semantics (Bott et al. 2011), and pragmatics (Noveck & Sperber 2004). It is exciting when a discipline with a rich theoretical inventory becomes more experimental. But there are also challenges. An important issue concerns the precise linking of established theoretical ideas to potential experimental observations (Chemla & Singh 2014). Based on a case study on scalar implicature, this research report argues for a benefit of casting semantic theory explicitly into a probabilistic model of pragmatic decisions, by means of which empirical data can be analyzed in a theory-driven way.

The most common template for the empirical testing of a (formal or verbal) theory T is to derive specific predictions from T, in the form of a hypothesis H, ideally such that competitor theories (or, in the absence of such: common sense) would not make H appear very likely. An experiment is then designed to assess H in light of empirical data D. The most frequent method to see whether H is corroborated by D is to use a statistical model MSt, which is chosen from a restricted repertoire of standardized tools appropriate for the kind of experiment and data at hand (e.g. a generalized regression model). Interestingly, the original theory T does not usually inform the statistical model MSt with which the data are analyzed. The model MSt could be used in geology, medicine, or economics. This is particularly troublesome for disciplines with a strong theoretical foundation that could be used to formulate more specific, theory-driven models MT which directly link theory T to data D. Indeed, researchers in several areas of science likewise argue for the integration of more theory-driven statistical modeling, for example, in ecology (McElreath 2016), psychology (Lee & Wagenmakers 2015), and also linguistics (Brasoveanu & Dotlačil 2020).

To demonstrate the possible value of theory-driven modeling in an empirically oriented formal linguistics, this paper considers a case study on the familiar notion of scalar implicature (Geurts 2010, Grice 1975, Horn 2004). In response to certain shortcomings of a traditional Gricean account of implicature, a recently popular and empirically successful alternative approach postulates that potential implicature-like meaning enrichments are generated in the grammar, thereby creating a massive semantic ambiguity (e.g. Chierchia, Fox, & Spector 2012, Fox 2007, Fox & Spector 2018). We here address how these multiple semantic readings affect a speaker’s choice of utterance and a listener’s choice of interpretation, which is something not directly predicted by semantic theory alone, but necessary for linking its predictions to experimental data. We introduce four different models of utterance and interpretation choices, as variants of the recently popular ‘rational speech act’ models (Frank & Goodman 2012, Franke & Jäger 2016, Goodman & Frank 2016). Models differ in the way that semantic ambiguity affects choices of utterance and interpretation (see Fig. 2 below). The key contribution here is a novel model, the so-called ‘global intentions model’, in which speakers choose both an overtly observable utterance and its intended meaning (the latter stemming from the set of grammatically generated implicature meanings). This theory-driven model MT can be linked directly to experimental data from a novel small-scale experiment introduced here. We then use Bayesian model comparison (e.g. Jeffreys 1961, Kass & Raftery 1995) to evaluate these models against the data, thereby demonstrating how theory-driven statistical modeling can serve as an additional, helpful tool for testing linguistic theories more directly against empirical data.

The paper is structured as follows. We first introduce the grammatical approach to scalar implicature (§2) and then develop the relevant probabilistic models for utterance and interpretation choices in the light of multiple semantic ambiguity (§3). We report on a small-scale combined production and comprehension experiment in §4, and §5 uses these data to compare the models.1

2. Grammatically generated scalar-implicature readings

Scalar implicatures are a special kind of quantity implicature, so-called in reference to Grice’s maxim of quantity, which requires speakers to be as informative as possible, given the current goals of the conversation (Grice 1975). For example, speakers who utter 1a may be taken to communicate 1b, because they did not utter the logically stronger, and therefore more informative, alternative sentence in 1c.


(1) a. I own some of Johnny Cash’s albums.

b. I own some but not all of Johnny Cash’s albums.

c. I own all of Johnny Cash’s albums.

d. I own some of Johnny Cash’s albums, and it’s not true that I own all.

Traditional Gricean accounts rationalize scalar implicatures as inferences a listener can draw based on reasoning about the speaker’s actions. The action to be rationalized in the example above, so the mainstream account goes, is that the speaker uttered the complete sentence in 1a. An alternative action of the speaker is an utterance of 1c. The implicature reading in 1b therefore arises by conjoining the literal meaning of the sentence in 1a with the negation of the complete sentential alternative in 1c, resulting in 1d, which is equivalent to 1b.

This logic has difficulty explaining why an utterance of 2a intuitively conveys the meaning in 2b. This inference does not follow from 2d, which is the result of conjoining the negation of the meaning of the alternative sentence 2c with the literal meaning of 2a (but see Geurts 2010 for an alternative solution).


(2) a. A soldier showed some signs of malaria.

b. A soldier showed some but not all signs of malaria.

c. A soldier showed all signs of malaria.

d. A soldier showed some signs of malaria, and it’s not true that a soldier showed all signs.

In response to problems like this, an alternative approach has gained prominence recently (e.g. Chierchia, Fox, & Spector 2012, Fox 2007, Fox & Spector 2018). Grammaticalism assumes that the sentence in 2a is semantically ambiguous. It has a literal reading, where some just means ‘some and maybe all’. But it also has a reading where some is enriched to mean ‘some but not all’ in the scope of a soldier. Grammaticalism thus earns its name by incorporating into the grammar meaning aspects that may appear to be pragmatic in nature.

To illustrate the range of readings made available by grammaticalism, consider the set of nested Aristotelians, which is the focus of this paper. Nested Aristotelians are sentences in which the Aristotelian quantifiers none, some, and all—arguably the most basic quantificational operators—appear once in outer (higher-scope) position and once in inner (lower-scope) position. There are nine nested Aristotelians, exemplified in 3.

(3) {None | Some | All} of the aliens drank {none | some | all} of their water.

In what follows, we use abbreviations for these sentences, indicating the outer and inner quantifier in order, as on the left-hand side of Table 1 below. For example, ‘AS’ is short for All of the aliens drank some of their water.

Figure 1. Examples of world states relevant for the interpretation of nested Aristotelians. Similar pictures are used in the experiment reported in §4.

Arguably, all nested Aristotelians presuppose the existence of a plurality of relevant aliens. What matters to the truth of nested Aristotelians is whether there are (i) aliens that drank none of their water, (ii) aliens that drank some but not all, and (iii) aliens that drank all. We therefore distinguish seven kinds of world states, each of which Figure 1 provides an example of. Each situation in Fig. 1 shows twelve aliens, each with its glass of water. To have a compact representation of relevant world states, we use small pictures of glasses, which indicate which types of aliens exist. If there are aliens that drank none of their water, the pictorial representation for this world state contains a full glass; otherwise, if there are no aliens that drank none of their water (every alien drank at least some), then the corresponding picture shows no full glass. The state depicted by a single half-full glass, for example, refers to the set of worlds in which all of the aliens drank some but not all of their water.
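The seven state types can be enumerated mechanically. The following Python sketch is ours, for illustration only; it represents a state as the nonempty set of alien types it contains:

```python
# Sketch (not from the paper): enumerating the seven world-state types.
# A state records which alien types exist: 'none' (drank none of their
# water), 'some' (drank some but not all), 'all' (drank all).
from itertools import combinations

TYPES = ('none', 'some', 'all')

# Every nonempty subset of the three alien types is a possible state.
STATES = [frozenset(c)
          for r in range(1, len(TYPES) + 1)
          for c in combinations(TYPES, r)]

assert len(STATES) == 7  # 2**3 - 1 nonempty subsets
```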

Grammaticalism assumes that pragmatic readings of sentences are generated by a silent exhaustification operator Exh whose meaning contribution is similar to that of the particle only (e.g. Chierchia, Fox, & Spector 2012). For present purposes, there are three relevant places where an Exh-operator might occur in a nested Aristotelian. As shown in 4, it can occur in matrix position, thereby applying to the whole sentence, and it can apply to the outer or inner quantifier.

(4) ExhM [ExhO(QO) of the aliens drank ExhI(QI) of their water]

A parse determines the reading of a sentence. It determines whether Exh occurs in matrix (M), in outer quantifier (O), or in inner quantifier (I) position. We use a notation where, for example, a parse of a sentence with Exh occurring only in matrix position is denoted as ‘M’, a parse with Exh occurring at all three relevant positions is written as ‘MOI’, and a parse without any Exh operator, thus a literal reading, is represented as ‘lit’. If S is a sentence and p its parse, then ⟦Sp is the reading of S under p. Sentences can have multiple readings based on different parses. Table 1 lists the readings of nested Aristotelians. These readings are derived from what is perhaps the simplest instantiation of grammaticalism (see the appendix for technical details).
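The space of parses is likewise easy to enumerate. In the following sketch (ours), a parse is a subset of the three Exh sites, with the empty subset written ‘lit’ as in the text:

```python
# Sketch: the eight possible parses as subsets of the three Exh sites
# (matrix M, outer quantifier O, inner quantifier I).
from itertools import combinations

SITES = ('M', 'O', 'I')
PARSES = [''.join(c) or 'lit'
          for r in range(len(SITES) + 1)
          for c in combinations(SITES, r)]
# PARSES == ['lit', 'M', 'O', 'I', 'MO', 'MI', 'OI', 'MOI']
```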

Table 1. Readings produced by the instantiation of grammaticalism described in the appendix. For each nested Aristotelian, the table shows whether this sentence is true (1) or false (0) in each world state under a given parse. For example, the first line of the table gives the truth conditions of sentence ‘NN’ for parses without Exh in matrix position, all of which happen to be equivalent.

In sum, grammaticalism stipulates silent grammatical operators, whose effect it is to generate a rich but conventionally predictable semantic ambiguity. It thereby predicts the availability of intuitively attractive readings for cases, like in 2, that are difficult to explain for traditional Gricean accounts. However, it remains controversial whether all of these readings are important for explaining relevant empirical data (e.g. Chemla & Spector 2011, Franke, Schlotterbeck, & Augurzky 2017, Geurts & Pouscoulous 2009, Geurts & van Tiel 2013, Potts et al. 2016). Moreover, it is, to a large extent, an open issue how the massive ambiguity generated in the grammar is resolved in context, by some, perhaps pragmatic, mechanism (for some discussion see e.g. Chierchia, Fox, & Spector 2012, Fox & Spector 2018). Seen in this way, it would be desirable to explore how far a combination of, on the one hand, Gricean ideas about efficient communication in the light of potential ambiguity and, on the other hand, a grammatical approach that generates potential pragmatic readings could solve each approach’s problems. This is the goal of the models introduced next.

3. Models

We consider four models, all of which rely on a general, Grice-inspired picture of a speaker trying to maximize the amount of information conveyed by an utterance and a listener consequently interpreting the speaker’s utterance based on such information-maximizing behavior. Each model predicts a probability with which a speaker chooses each nested Aristotelian to describe any given world state (from Fig. 1) and the degree of belief a listener assigns to each world state after hearing any given nested Aristotelian. The models differ, however, in the way they integrate grammatically supplied ambiguity.

Figure 2 gives a condensed overview of the main conceptual differences. First, the vanilla rational speech act (RSA) model of Frank and Goodman (2012) associates each sentence with its logical semantics only (§3.1). On the other end of the spectrum, the global intentions model (§3.4) considers all of the readings listed in Table 1. In between these two extremes are two models that take increasingly large subsets of the readings listed in Table 1 into account (§3.2, §3.3). Importantly, however, models also differ in the manner in which semantic ambiguity affects speakers’ and listeners’ choices. The lexical uncertainty model of Potts et al. (2016) features a speaker-production part in which each speaker is assumed to have a fixed, inflexible lexicon with exactly one meaning attached to each lexical item (§3.2). In addition, we also introduce two conceptually different, novel models in which all speakers are aware of the full semantic ambiguity generated by either their lexicon (lexical intentions model; §3.3) or grammar (global intentions model; §3.4), and in which they flexibly use the potential this ambiguity creates for the purpose of maximizing information flow in communication.

Figure 2. Schematic representation of main conceptual differences in the speaker-production part of the four models compared here.


3.1. The vanilla rational speech act model

The RSA model defines a speaker production rule and a listener comprehension rule, roughly as described by a classical Gricean approach (see Franke & Jäger 2016, Goodman & Frank 2016 for an overview). The speaker is assumed to follow the Gricean maxims of quality (roughly: be truthful) and quantity (roughly: be informative). The listener tries to infer which meaning a speaker most likely had in mind when producing an utterance, and does so on the assumption that the speaker follows the Gricean postulates of truthfulness and informativity.

The speaker production rule PS(u | t; α) of the vanilla RSA model determines the probability with which the speaker chooses an utterance u ∈ U when the true state is t ∈ T.2 The model parameter α determines how stringently the speaker selects more informative descriptions over less informative ones (see below). The usual definition utilizes the notion of a literal listener (LL) and looks like this.3

(5) PLL(t | u) = P(t | ⟦u⟧) ∝ P(t) δt∈⟦u

(6) PS(u | t; α) ∝ exp(α · log PLL(t | u))

The literal listener, defined in equation 5, interprets each utterance u based on its semantic meaning, by simple Bayesian updating P(t | ⟦u⟧) of prior beliefs P(t) with the proposition ⟦u⟧ ⊆ T that u is true.4

It helps to think of the literal listener as a mere technical construct, whose purpose is to anchor the semantic meaning of utterances (Franke 2009). If the prior probabilities of states are uniform, so that P(t) = P(t′) for all t, t′ ∈ T (an assumption we make throughout), equation 6 can be rewritten to show the two Gricean constraints of truthfulness and informativity at play.5


(7) PS(u | t; α) ∝ δt∈⟦u⟧ · |⟦u⟧|−α

This reformulation makes clear that, as long as there is any true utterance in t, any false utterance has zero probability of being selected in t (since δt∈⟦u⟧ = 0). Between two true utterances (for which the δ-term evaluates to 1), the utterance with a stronger semantic meaning—that is, the one that is true in fewer world states—will be chosen with a higher probability. The bigger α, the more pronounced this preference for informative utterances is.

For example, RSA’s production rule predicts that a speaker would choose utterances to communicate the state in which some aliens drank none of their water and the rest drank some but not all with the following probabilities when we set α = 5 (an arbitrary value chosen here only for illustration; §4’s model comparison does not hinge on specific values for α).


(8) PS(NA | t; 5) ≈ 0.79, PS(SN | t; 5) ≈ 0.19, PS(SS | t; 5) ≈ 0.02; all other utterances have probability 0

In words, RSA predicts that speakers would produce only descriptions that are literally true in this state, namely ‘NA’, ‘SN’, or ‘SS’ (see Table 1). Among these, the speaker’s choice probabilities reflect the semantic strength of the utterances (|⟦NA⟧| = 3, |⟦SN⟧| = 4, and |⟦SS⟧| = 6). Further example predictions for the RSA speaker rule with α = 5 are shown in Figure 3 below.
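The production rule can be sketched in a few lines of Python. This is our illustration, not the authors’ code: the literal truth conditions below are reconstructed from the informal description in §2 (they match the extension sizes |⟦NA⟧| = 3, |⟦SN⟧| = 4, |⟦SS⟧| = 6 cited above) and are an assumption, not a transcription of Table 1.

```python
# Sketch of the RSA production rule: P_S(u | t; a) is proportional to
# (truth in t) * |extension of u|^(-a).  States are encoded by the alien
# types they contain: n = drank none, s = drank some but not all,
# a = drank all.  Truth conditions are our reconstruction from Sec. 2.
STATES = ['n', 's', 'a', 'ns', 'na', 'sa', 'nsa']
UTTERANCES = [o + i for o in 'NSA' for i in 'NSA']  # 'NN', 'NS', ..., 'AA'

def true_in(u, t):
    """Literal truth of a nested Aristotelian u (e.g. 'SS') in state t."""
    outer, inner = u[0], u[1]
    # Which alien types satisfy 'drank <inner> of their water' (literally)?
    satisfies = {'N': {'n'}, 'S': {'s', 'a'}, 'A': {'a'}}[inner]
    k = sum(1 for typ in t if typ in satisfies)  # satisfying types present
    if outer == 'N':
        return k == 0
    if outer == 'S':
        return k > 0
    return k == len(t)  # 'A': every type present satisfies the predicate

def speaker(t, alpha=5):
    """P_S(u | t; alpha): truthful, informativity-maximizing choice."""
    ext = {u: sum(true_in(u, s) for s in STATES) for u in UTTERANCES}
    scores = {u: (ext[u] ** -alpha if true_in(u, t) else 0.0)
              for u in UTTERANCES}
    z = sum(scores.values())
    return {u: p / z for u, p in scores.items()}
```

With α = 5 and the state ‘ns’ (some aliens drank none, the rest some but not all), this yields roughly 0.79 for ‘NA’, 0.19 for ‘SN’, and 0.02 for ‘SS’, reproducing the preference ordering by semantic strength discussed above.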

The comprehension rule of the vanilla RSA model captures a Gricean interpreter. It is Gricean in the sense that it models pragmatic inferences derived from the assumption that the speaker adheres to the Gricean postulates of truthfulness and informativity. Concretely, for each utterance u, the rule PL(t | u; α) assigns a probability to each interpretation t based on the prior probability P(t) of a state and the likelihood PS(u | t; α) that a (truthful and informative) speaker would use the observed utterance for this state, following Bayes’s rule.

(9) PL(t | u; α) ∝ P(t) · PS(u | t; α)

For example, consider the pragmatic listener’s belief after hearing ‘SS’, when fixing α = 5 for illustration.


(10) [equation image: the pragmatic listener’s interpretation probabilities for ‘SS’]

The sentence ‘SS’ is literally false in the state in which every alien drank none of their water, so this state receives zero probability in the pragmatic listener’s interpretation. But most of the other states also receive (almost) zero probability, because the pragmatic listener would expect the speaker to choose a different expression in these states with much higher likelihood (see Fig. 3).
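The comprehension rule in 9 can likewise be sketched. To keep the sketch self-contained, this illustration (ours, not the paper’s materials) uses a minimal two-state, two-utterance scale:

```python
# Minimal sketch of the RSA comprehension rule on a toy some/all scale:
# 'some' is literally true in both states, 'all' only when every alien
# drank all of their water.  Priors over states are uniform.
SEM = {'some': {'some-not-all', 'all'}, 'all': {'all'}}
STATES = ['some-not-all', 'all']

def speaker(t, alpha=5):
    """P_S(u | t; alpha), proportional to truth * |extension|^(-alpha)."""
    scores = {u: (len(ext) ** -alpha if t in ext else 0.0)
              for u, ext in SEM.items()}
    z = sum(scores.values())
    return {u: s / z for u, s in scores.items()}

def listener(u, alpha=5):
    """P_L(t | u; alpha) by Bayes's rule: proportional to P_S(u | t)."""
    scores = {t: speaker(t, alpha)[u] for t in STATES}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}
```

Here `listener('some')` puts about 0.97 on the some-but-not-all state: the familiar scalar implicature falls out of Bayesian inversion of an informative speaker.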

Figure 3. Predictions of the production rules of different models for parameter value α = 5. Rows represent states, columns utterances. An entry in a cell gives the probability assigned to the speaker’s choice of the column-utterance when trying to communicate the row-state.

3.2. The lexical uncertainty model

The lexical uncertainty (LU) model extends the vanilla RSA model by including the listener’s potential uncertainty about the lexical meaning that the speaker assigns to certain expressions (Bergen, Levy, & Goodman 2016). Like Potts et al. (2016), we are interested here in the case where the listener does not know which lexical entry the speaker has for the word some. The listener may nevertheless try to infer what some most likely means (literally) to the current speaker. This inference, intuitively, proceeds as follows: given that the speaker said u and u can be interpreted in such a way that either every occurrence of some means ‘some and maybe all’ or every occurrence means ‘some but not all’, which pair 〈t, l〉 of a state t and mental lexicon l is most likely to have caused this speaker to have produced u?

Conceptually, the idea that listeners may entertain uncertainty about the lexical meaning a speaker assigns to some is quite distinct from the idea that the grammar produces manifold pragmatic readings for sentences containing words like some by variable insertion of Exh-operators. With slight abuse of notation, however, since some is the only Aristotelian quantifier susceptible to local meaning change by an Exh-operator, we can think of a speaker who considers some to mean ‘some but not all’ as a speaker who always, inflexibly, uses parse OI (see Table 1). Similarly, a speaker with a literal meaning for ‘some’ can be thought of as using the parse lit everywhere.

The LU model has listeners reason about the speaker’s mental lexicon l, represented here as parses l ∈ {lit, OI}. Crucially, any given speaker is assumed to have one of the two mental lexica, but not both (see Fig. 2). The speaker selects utterances u by the same mechanism as in the RSA model, but based on a semantic interpretation of utterances influenced by the speaker’s lexicon l.


(11) PS1(u | t, l; α) ∝ exp(α · log PLL(t | u, l))

(12) PLL(t | u, l) = P(t | ⟦u⟧l) ∝ P(t) δt∈⟦u⟧l

The production rule of the LU model conditions the speaker’s utterance choice on that speaker’s fixed lexicon. This will be crucial when comparing the LU model to other models in what follows.

The listener does not know the speaker’s mental lexicon but infers which state-lexicon pairs are likely to have caused the speaker to produce the observed utterance, using Bayes’s rule.


(13) PL1(t, l | u; α) ∝ P(t) · P(l) · PS1(u | t, l; α)

The prior P(l) over lexica l is here assumed to be uniform.

Equation 13 defines the listener’s beliefs for each pair of state t and mental lexicon l. From this, we can derive the listener’s beliefs about only the state t, by taking the weighted sum over the mental lexica (so-called ‘marginalization’).


(14) PL1(t | u; α) = Σl PL1(t, l | u; α)

While Potts et al. (2016) use this latter formula to explain data from a truth-value judgment task, this paper also considers data from an experimental task that is most naturally linked to a production rule. Unfortunately, the production rule PS1 defined above is not ideally suited for this because it makes predictions only for a given lexicon l. We therefore follow Lassiter and Goodman (2017) and define a speaker S2 who reasons about pragmatically adequate utterance choice based on the state-interpretation of L1. Finally, a pragmatic listener L2 who reasons about the latter speaker’s choice of utterances is defined as before in terms of Bayes’s rule. Consequently, the final definitions of the LU model’s production and comprehension rules are as follows.


(15) a. PS2(u | t; α) ∝ [PL1(t | u; α)]α

b. PL2(t | u; α) ∝ P(t) · PS2(u | t; α)

Figure 3 above shows the predictions of the LU model’s production rule (for α = 5). These clearly differ from those of the vanilla RSA model in several places. This also affects the interpretation of utterances. For example, the sentence ‘SS’ is interpreted as follows.


[equation image: the LU model’s interpretation probabilities for ‘SS’]

The most likely interpretation of ‘SS’ is the state for which ‘SS’ is a maximally informative utterance for a speaker with lexicon OI.
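The LU machinery can be sketched on a stripped-down example. The following Python illustration is ours: it uses a single scalar item and the two lexica lit and OI, and implements the joint inference over state-lexicon pairs and the subsequent marginalization (the final S2/L2 layer is omitted for brevity):

```python
# Sketch of the LU model on a one-quantifier toy example (ours): the
# listener is unsure whether the speaker's lexicon maps 'some' to its
# literal meaning or to 'some but not all'.
STATES = ['some-not-all', 'all']
LEXICA = {'lit': {'some': {'some-not-all', 'all'}, 'all': {'all'}},
          'OI':  {'some': {'some-not-all'},        'all': {'all'}}}

def s1(t, lex, alpha=5):
    """Speaker with a FIXED lexicon: the lexicon is an input argument."""
    sem = LEXICA[lex]
    scores = {u: (len(ext) ** -alpha if t in ext else 0.0)
              for u, ext in sem.items()}
    z = sum(scores.values())
    return ({u: s / z for u, s in scores.items()} if z
            else {u: 0.0 for u in sem})

def l1(u, alpha=5):
    """Listener: infer state-lexicon pairs jointly (uniform priors),
    then marginalize out the lexicon to get beliefs about the state."""
    joint = {(t, lex): s1(t, lex, alpha)[u]
             for t in STATES for lex in LEXICA}
    z = sum(joint.values())
    return {t: sum(p for (t2, _), p in joint.items() if t2 == t) / z
            for t in STATES}
```

In this toy setting `l1('some')` concentrates almost all probability on the some-but-not-all state, since that state is compatible with ‘some’ under both candidate lexica while the all-state makes ‘some’ a dispreferred choice.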

3.3. The lexical intentions model

The LU model introduced in the previous section treats potential in-situ enrichments of some as a consequence of a speaker’s lexicalization of a ‘some but not all’ meaning. This entails that any given speaker assigns to all occurrences of some the same lexical meaning (see Fig. 2). Consequently, the LU model considers two possible meanings for the sentence ‘SS’: the standard literal reading ⟦SS⟧lit and the reading ⟦SS⟧OI in 16.

(16) Some but not all of the aliens drank some but not all of their water. (⟦SS⟧OI)

But it is also conceivable that some speakers, when uttering a sentence with multiple occurrences of some, like ‘SS’, might mean to convey a reading that results from different lexical meanings for different occurrences, such as in 17a or 17b, which correspond to readings ⟦SS⟧I and ⟦SS⟧O, respectively.


(17) a. Some (and maybe all) of the aliens drank some but not all of their water. (⟦SS⟧I)

b. Some but not all of the aliens drank some (and maybe all) of their water. (⟦SS⟧O)

The LU model does not contain speakers of this kind. To model speakers that associate ‘SS’ with any of the readings in 17, it is not enough to just include parses I and O as additional values for the variable l in the LU model. The result would be conceptually highly implausible. By equation 11, the speaker has a fixed lexicon l and invariably applies it to interpret whatever sentence comes along. That does make sense when l is instantiated with parses lit and OI, because we can interpret it as speakers who have lexicalized a particular meaning of some and apply it invariably. But if l is a parse like O, equation 11 would model a speaker who inflexibly assigns a logical meaning to some in inner position and a strengthened lexical meaning in outer position, no matter what the sentence and the state to be communicated. This is not a conceptually plausible model of a speaker’s general behavior.

Alternatively, we can treat parses as a choice of the speaker. Suppose that all speakers have an ambiguous lexical entry for some: it might mean ‘some but not all’ or ‘some and maybe all’. Speakers can then choose to mean some in this way or that, depending on whether they deem this beneficial from a communicative point of view. In other words, when carrying an ambiguous lexicon, speakers may utter sentences with different lexical intentions, where the intended meaning for each occurrence of a word might be different. Pragmatic listeners can then try to recover how a speaker may have chosen to mean any single occurrence of some based, as usual, on a model of the speaker’s strategy of choosing utterances and lexical intentions (see Fig. 2).

The lexical intentions (LI) model formalizes these intuitions. It resembles the LU model superficially, but is conceptually different and also simpler. We assume that l ∈ {lit, I, O, OI} and define production and comprehension rules.


(18) PS1(u, l | t; α) ∝ exp(α · log PLL(t | u, l))

(19) PL1(t, l | u; α) ∝ P(t) · PS1(u, l | t; α)

As before, we then take weighted sums to retrieve predictions of utterance and interpretation choice probabilities.


PS1(u | t; α) = Σl PS1(u, l | t; α)    PL1(t | u; α) = Σl PL1(t, l | u; α)


The main difference between equations 11 and 18 is the position of the lexicon parameter l. In the LI model the lexical meaning l is treated not as an argument to be passed into the speaker function, as in the LU model, but as an output of it. Conceptually, this means that the LI model does not model speakers who invariably assign a particular lexical meaning to each occurrence of a word, but speakers who choose utterances and their meanings in tandem. They do so, as usual, in such a way as to maximize the informativity of their utterances. Consequently, speakers choose a pair 〈u, l〉 as a description of t only if t ∈ ⟦u⟧l, and they make this choice with a probability proportional to the relative informativity of ⟦u⟧l; that is, they prefer pairs 〈u, l〉 for which ⟦u⟧l is small.6

The predictions of LI are subtly different from those of the two previous models (see Fig. 3). For example, the LI model predicts that the listener’s interpretation for the sentence ‘SS’ is as in 20.


(20) [equation image: the LI model’s interpretation probabilities for ‘SS’]

According to the LI model, one particular interpretation of ‘SS’ should be available much more readily than predicted by vanilla RSA or the LU model.
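The structural difference from the LU model, with the lexical meaning as an output of the speaker’s choice rather than an input to it, can be sketched as follows (again our toy single-quantifier example, not the paper’s materials):

```python
# Sketch of the LI model's key move: the speaker chooses an utterance
# AND an intended reading for it in tandem.  Toy example (ours): 'some'
# may be meant literally ('lit') or as 'some but not all' ('OI').
STATES = ['some-not-all', 'all']
READINGS = {('some', 'lit'): {'some-not-all', 'all'},
            ('some', 'OI'):  {'some-not-all'},
            ('all',  'lit'): {'all'}}

def s1(t, alpha=5):
    """P_S(u, l | t; alpha): joint choice of utterance and reading,
    preferring true pairs whose reading is more informative."""
    scores = {ul: (len(ext) ** -alpha if t in ext else 0.0)
              for ul, ext in READINGS.items()}
    z = sum(scores.values())
    return {ul: s / z for ul, s in scores.items()}

def s1_marginal(t, alpha=5):
    """Marginal utterance-choice probabilities: sum over readings."""
    out = {}
    for (u, _), p in s1(t, alpha).items():
        out[u] = out.get(u, 0.0) + p
    return out
```

Note that in the some-but-not-all state the speaker strongly prefers the pair 〈some, OI〉 over 〈some, lit〉: the intended enriched reading is more informative, so the flexible choice of readings does real work.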

3.4. The global intentions model

The LI model assumes that speakers have ambiguous lexical entries for some. When pondering the choice of a sentence to communicate the given state, speakers actively choose readings of sentences that make them true and informative. From here it is only a very small step toward a model that includes all readings supplied by grammaticalism. The global intentions (GI) model is exactly like the LI model, but integrates all parses p considered in Table 1 (see Fig. 2).


(21) PS1(u, p | t; α) ∝ exp(α · log PLL(t | u, p)), where p ranges over all parses in Table 1 (the comprehension rules are defined as in the LI model, with p in place of l)

The GI model goes far beyond the LU model of Potts et al. (2016). While the LU model captures certain phenomena, like locally embedded scalar implicatures, in an RSA-style reasoning framework, it does not directly engage with or adopt grammaticalism. The GI model, by contrast, does. Incorporating the full set of grammaticalist readings into the LU model would make little conceptual sense, for the same reasons laid out in the context of the LI model: it would be quite unnatural to imagine speakers who cannot rise above their language’s ambiguity, invariably applying a particular pattern of insertions of Exh-operators, irrespective of the sentence under consideration and the state to be described. The GI model, instead, models speakers who are masters of the rich ambiguities provided by their language’s grammar, using these ambiguities to flexibly intend one reading or another, depending on what serves communication.

The GI model also makes distinct empirical predictions (Fig. 3). For example, the speaker is predicted to use the sentence ‘SS’ with a very high probability in one particular state, because the reading ⟦SS⟧M is available to the speaker: this very strong reading uniquely singles out that world state. Consequently, the listener’s interpretation of ‘SS’ also puts substantial probability on that state.


[equation image: the GI model’s interpretation probabilities for ‘SS’]


4. Experiment

4.1. Design

The models just introduced differ mostly in their quantitative predictions about the likelihood of expression and interpretation choices. Introspection is not reliable enough to assess such fine-grained probabilistic predictions, but experimental data may be. As all models make predictions about expression-choice and state-interpretation probabilities, we collected data from tasks that probe these model predictions directly. We refer to these tasks as a production and an interpretation task.

4.2. Participants

One hundred participants with US IP addresses were recruited via Amazon’s Mechanical Turk, using psiTurk (Gureckis et al. 2016). We excluded data from two participants who did the experiment twice and another three who did not self-identify as native speakers of English.

4.3. Materials

The experiment was couched in a cover story about friendly aliens visiting Earth. The relevant test sentences are the nested Aristotelians in 3. We used quantification over a mass term (water that the aliens drank) for the inner quantifier so as to avoid typicality effects (e.g. Degen & Tanenhaus 2015, van Tiel 2014). The seven relevant states were displayed using pictures as in Fig. 1. All pictures contained twelve aliens with full, half-full, or empty glasses. For situations corresponding to a ‘some but not all’ reading of the outer quantifier we used four or six aliens (as appropriate) out of the total twelve, both of which are fairly natural or typical numbers to be denoted by some (Degen & Tanenhaus 2015, van Tiel 2014).

4.4. Procedure

Participants were first told that the experiment consisted of two parts. Each part started with a background story, which served to introduce the alien scenario and what was expected of participants, as well as our sentence and picture materials. Participants completed seven production trials, one for each world state, in random order. Each trial displayed a picture of the state, and participants selected the outer and inner quantifier from a dropdown menu (see Figure 4a). Participants then completed nine interpretation trials, one for each sentence, in random order. Each trial displayed all seven states, with a short repetition of the task question and a slider bar next to each picture (see Figure 4b). Participants rated how likely they thought it was that the displayed situation was what the speaker had observed.

Figure 4. Illustrations of experimental trials. (NB: in interpretation trials the pictures were actually shown on the left of the text and slider bar, but are here shown vertically stacked for ease of presentation.)


4.5. Results

Figure 5 shows the frequencies with which sentences were selected for each state. Interestingly, sentences with none as outer quantifier were used very infrequently. Also noteworthy is that sentence ‘AS’ was the most frequently used utterance in state inline graphic, which seems to conflict with the predictions of the RSA model plotted in Table 1. Finally, we see that the most frequently used utterance in state inline graphic is ‘SS’, which goes against the predictions of the RSA and the LU models plotted in Table 1, but is consistent with the GI model.

Figure 5. Results from production trials. The plot shows the frequencies with which each sentence (columns) was selected for each state (rows). The black outline indicates cases where a sentence is true in a given state under at least one of the pragmatic readings from Table 1.

Each trial in the comprehension task returns seven slider ratings, one for each state. We normalized each of these seven-placed vectors. Figure 6 shows averages over these normalized vectors. The interpretation of ‘AS’ sentences mirrors the production data in the sense that the (on average) most likely interpretation was state inline graphic, followed by state inline graphic, and finally inline graphic, thereby replicating previous results on interpretation preferences for these sentences (Chemla & Spector 2011, Franke, Schlotterbeck, & Augurzky 2017). Finally, the interpretation of the sentence ‘SS’ seems inconsistent with the interpretation predicted by the RSA model (for α = 5), but it is hard to assess by visual inspection whether this case provides strong evidence for or against any of the other models. This is why we turn to formal model comparison next.

5. Model comparison

In order to quantify the relative evidence provided by the experimental data for or against the models introduced in §3, we look at Bayes factors (Jeffreys 1961, Kass & Raftery 1995). From a Bayesian point of view, a model M consists of a prior P(θ | M) over vectors θ of values for its parameters and a likelihood function P(D | θ, M), which assigns a likelihood to the observed data D for each vector θ. The marginalized likelihood for model M given data D quantifies how likely D is a priori, averaged over all parameter values weighted by their prior probability.

(23) P(D | M) = ∫ P(θ | M) · P(D | θ, M) dθ

The Bayes factor in favor of model M1 over M2 is the ratio of marginalized likelihoods: P(D | M1) / P(D | M2). It quantifies the factor by which our beliefs should shift in favor of M1, relative to M2, given that we have observed D, since by Bayes’s rule:


(24) P(M1 | D) / P(M2 | D) = (P(D | M1) / P(D | M2)) · (P(M1) / P(M2))


Figure 6. Results from interpretation trials. The plot shows the average normalized slider ratings (see main text) that participants assigned to each state (columns) as an interpretation for a given sentence (row). The black outline indicates truth under at least one reading.

Bayes factors are therefore independent of the prior odds of models P(M1) / P(M2), but depend on the priors over parameter values P(θ | Mi) for each model Mi.

We have production Dp and comprehension data Dc and compare models based on how well they explain their conjunction: P(D | θ, M) = P(Dp | θ, M) · P(Dc | θ, M). The following explains how the models defined in §3 give rise to a likelihood function for Dp and Dc, respectively.

The production rule of model M defines the likelihood PS(u | t, θ, M) of a single choice of expression for a state. All of the models from §3 use a single parameter α. On top of this, we include two more parameters in each model to accommodate two general observations about Dp. First, our models predict probability zero for utterances that are false for a given state. However, we do observe false utterance choices (see Fig. 5). We therefore add a constant error term ϵ to all predicted utterance-choice probabilities. Second, since no model by itself accounts for the low choice rates of sentences starting with none, we include a cost term c(u), which is fixed to zero for utterances starting with some or all, but may be positive for utterances starting with none. These costs capture a general dispreference for particular expressions and are not to be confused with ‘processing costs’ from the psycholinguistic literature. Costs are subtracted, following standard practice, so that, for example, for vanilla RSA we obtain the following.

(25) PS(u | t; α, ϵ, c) ∝ ⟦PLL(t | u) − c(u)⟧α + ϵ
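The rule in 25 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: it assumes that ⟦x⟧ denotes truncation at zero, and the function name and numerical values are ours.

```python
def speaker_probs(lit_listener_row, costs, alpha, epsilon):
    """Sketch of the production rule in 25 for a fixed state t:
    P_S(u | t) is proportional to max(P_LL(t | u) - c(u), 0)**alpha + epsilon.
    lit_listener_row holds P_LL(t | u) for each candidate utterance u."""
    scores = [max(p - c, 0.0) ** alpha + epsilon
              for p, c in zip(lit_listener_row, costs)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical literal-listener values for three utterances in one state; the
# third utterance is false there (P_LL = 0) but still receives epsilon-level
# probability, matching the motivation for the error term.
probs = speaker_probs([0.5, 0.25, 0.0], [0.0, 0.0, 0.0], 2.0, 0.01)
```

Note that with ϵ > 0 every utterance, true or false, has nonzero choice probability, which is what lets the model assign positive likelihood to the observed false utterance choices.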

Finally, if Dp consists of counts nij of the number of times utterance uj was chosen in state ti, the likelihood P(Dp | θ, M) is as follows.


(26) P(Dp | θ, M) ∝ ∏i ∏j PS(uj | ti, θ, M)^nij
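Assuming the standard multinomial form with the constant coefficient dropped, the log of this likelihood can be computed as in the following sketch (names are ours):

```python
import math

def production_loglik(counts, pred_probs):
    """Log-likelihood of production counts n_ij under predicted choice
    probabilities P_S(u_j | t_i): the sum over i, j of n_ij * log P_S(u_j | t_i).
    The constant multinomial coefficient is dropped; predicted probabilities
    are strictly positive here thanks to the error term epsilon."""
    return sum(n * math.log(p)
               for row_n, row_p in zip(counts, pred_probs)
               for n, p in zip(row_n, row_p))
```

For example, two choices of one utterance and one of another, each predicted at probability 0.5, yield a log-likelihood of 3 · log 0.5.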

Comprehension rules give a probability distribution over states for each given utterance: PL(t | u, θ, M). Since these are defined in terms of the speaker choice probabilities, the parameterization of the listener rules is the same. The comprehension data Dc consists of probability vectors 〈ci1, …, ci7〉, where cij is the average of normalized ratings assigned to state j for utterance i (as plotted in Fig. 6). We think of each vector 〈ci1, …, ci7〉 as a sample from a Dirichlet distribution whose modal value is the model’s prediction 〈PL(t1 | ui, θ, M), … , PL(t7 | ui, θ, M)〉. To allow for more or less deviation in the realization of observed ratings, we introduce a parameter w: the higher w, the more we expect observations that are very close to the model’s predictions.


(27) 〈ci1, …, ci7〉 ∼ Dirichlet(α), with αj = w · PL(tj | ui, θ, M) + 1
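One standard way to give a Dirichlet the model's prediction as its mode is the concentration αj = w · pj + 1, since the mode of Dirichlet(α) over K outcomes is (αj − 1) / (Σk αk − K), which then equals pj. The sketch below assumes this parameterization; it is consistent with the text but the exact parameterization used by the authors is not recoverable here.

```python
import math

def dirichlet_logpdf(c, pred, w):
    """Log-density of an observed normalized rating vector c under a Dirichlet
    whose mode is the model prediction pred, using concentration
    alpha_j = w * pred_j + 1 (an assumed parameterization). Larger w
    concentrates mass more tightly around pred."""
    alpha = [w * p + 1.0 for p in pred]
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, c))

# Density is highest at the predicted vector itself and falls off elsewhere.
at_mode = dirichlet_logpdf([0.5, 0.5], [0.5, 0.5], 10.0)
off_mode = dirichlet_logpdf([0.9, 0.1], [0.5, 0.5], 10.0)
```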

As for priors over model parameters, we assume that all parameters are independently sampled from flat priors with a sufficiently large support. Priors are the same for all models.


(28) α ∼ Uniform(0, αmax); c ∼ Uniform(0, cmax); w ∼ Uniform(0, wmax), with sufficiently large upper bounds

Notice that all models fix the error parameter ϵ to a single value, chosen to be close to the average, across models, of each model’s maximum-likelihood estimate for ϵ.

Estimates of marginal likelihoods were obtained by grid approximation, with a grid of size twenty for each parameter (e.g. Kruschke 2015:Ch. 10). Figure 7a shows the resulting Bayes factor approximations in favor of each model when compared to the RSA model. We see that all models are better than the baseline vanilla RSA model. The best model is GI. The Bayes factor in favor of GI when compared against the second-best model, LI, is approximately 29; when compared against the third-best model, LU, it is ca. 95. From these results we can also give approximations of a modeler’s posterior beliefs after conditioning unbiased priors over models with the observed data, as in Table 2.
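Grid approximation of a marginal likelihood with flat priors amounts to averaging the likelihood over equally weighted grid points. Here is a minimal sketch on a toy binomial example with a known analytic answer, so the approximation can be checked; the example and names are ours, not the paper's models.

```python
import math

def marginal_likelihood_grid(loglik, grid):
    """Grid approximation of P(D | M) = integral of P(theta | M) * P(D | theta, M):
    with a flat prior over the grid, this is the average likelihood across
    equally weighted grid points."""
    return sum(math.exp(loglik(theta)) for theta in grid) / len(grid)

# Toy check: for k successes in n coin flips with a flat prior on the bias,
# the exact marginal likelihood is 1 / (n + 1), independently of k.
n, k = 10, 7
grid = [(i + 0.5) / 400 for i in range(400)]  # midpoint grid on (0, 1)
loglik = lambda th: (math.log(math.comb(n, k))
                     + k * math.log(th) + (n - k) * math.log(1 - th))
ml = marginal_likelihood_grid(loglik, grid)  # close to 1/11
```

A Bayes factor is then just the ratio of two such estimates, one per model.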

Table 2. Approximations of posterior beliefs.

In words, if each model is a priori equally likely, the posterior probability, after seeing the data, of the GI model is around 0.956. The combined data from production and comprehension provide strong evidence in favor of the GI model, suggesting that, within the extensions of RSA-style pragmatic reasoning models, it is best to include the full range of grammatically generated implicature readings.

Since Bayes factors depend on priors over parameter values, Figure 7b additionally shows the results of model comparison using the Bayesian information criterion (BIC; Schwarz 1978), which relies on the maximum-likelihood estimates of the parameters and is therefore independent of priors over parameter values. A model is better the lower its BIC score. We retrieve the same ordinal result as for the Bayes factor comparison.
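The BIC score is computed from the maximized log-likelihood, the number of free parameters, and the number of observations; the following one-liner (names ours) shows the standard definition:

```python
import math

def bic(max_loglik, num_params, num_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L_hat).
    The model with the LOWER score is preferred."""
    return num_params * math.log(num_obs) - 2.0 * max_loglik

# Two hypothetical models with equal complexity: the one with the higher
# maximized log-likelihood gets the lower (better) BIC.
score_a = bic(-120.0, 3, 700)
score_b = bic(-125.0, 3, 700)
```

The penalty term k · ln(n) only differentiates models when they differ in the number of free parameters; with equal complexity the ordering reduces to the likelihood ordering.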

To understand the results better, we can look at production and comprehension data separately. For only production data, the Bayes factor for the GI model is about 30,600; the LI model is absolutely no competition. If we look at comprehension data only, the Bayes factor in favor of the GI model is ca. 0.011; that is, the LI model is roughly 100 times more likely a posteriori, if we start from unbiased prior beliefs. Consequently, the main advantage of the GI model comes from its superior predictions for the production data, which outweigh its weaker predictions for comprehension.

Figure 7. Results from model comparison.

Zooming in to look at each individual condition from production (i.e. the utterance choices at a single state) and comprehension (i.e. the average slider ratings for a single sentence), Figure 7c shows that the main evidence in favor of the GI model comes from its superior predictions for the production data in state inline graphic. Figure 8 plots the prior predictive distributions for production data under both models. The prior predictive distribution gives the marginalized likelihood for each possible data observation. We see that in most conditions the prior predictives of models coincide, and that the main difference in prior predictions is indeed that the LI model underpredicts the choice of sentence ‘SS’ and overpredicts the choice of ‘SN’ for condition inline graphic. The LI model makes better predictions for states inline graphic and inline graphic, but, as becomes apparent from Fig. 8, its predictive advantage over the GI model is less pronounced than GI’s advantage in condition inline graphic. The reason why the GI model predicts a high frequency of ‘SS’ choices in state inline graphic is that it makes it possible that a rational speaker utters ‘SS’ while intending a global reading. The global reading with M is not available to the LI model, but it serves to perfectly single out state inline graphic from all other states (see Table 1), so it is rational for a speaker to say ‘SS’ and mean inline graphic.

6. Conclusions

We addressed the general question of how multiple ambiguity affects a speaker’s pragmatic choice of utterance and a listener’s interpretation. Focusing on implicature readings of complex sentences, we showcased theory-driven statistical modeling as a method of directly linking linguistic theory to empirical data. We formulated the ‘global intentions’ model, which assigns a conceptually plausible role to the full richness of meanings generated by grammatical approaches to scalar implicature inside of a Gricean model of probabilistic pragmatic reasoning (see Fig. 2). The GI model is not an innocuous synthesis of traditional Gricean and grammatical approaches to scalar implicatures. It is a new breed of its own. It puts general pragmatic reasoning first and embeds grammaticalism as one potential mechanism of generating conventionally associated readings for sentences. It thereby also provides a possible link hypothesis, that is, a way of deriving precise quantitative predictions for empirical data for a wide range of cases. Statistical model comparison in terms of Bayes factors indeed suggests that the GI model might be empirically superior to the investigated alternatives.

Figure 8. Prior predictive distribution for the production data. Black vertical bars indicate the observed counts for each condition.

Theory-driven statistical modeling of the kind executed here can inform linguistic theory in multiple ways. We might, for example, fix the GI model but compare different instantiations of grammaticalism embedded inside of it. Consider Gotzner and Romoli’s (2018) proposal of a construction of the set of sentential alternatives Alt(S, p) different from the one used here. Without going into the details of their account, the Bayes factor in favor of a GI model based on the simpler construction used here (see the appendix), when compared to Gotzner and Romoli’s more elaborate alternative, is about 1,530, suggesting that the simpler construction is empirically superior. This is not meant to be a decisive argument against Gotzner and Romoli’s approach, but an example of how theory-driven modeling opens up new ways to quantitatively compare elaborate and fine-grained theoretical proposals in the light of experimental data.

The main reason for the GI model’s predictive success was identified as the availability of a reading ⟦SS⟧M = { inline graphic} of the sentence ‘SS’ obtained from inserting an Exh-operator at matrix position. This provides suggestive evidence for the idea that the full set of grammatically induced readings is needed. Indeed, the vanilla RSA model does not associate ‘SS’ with inline graphic because ‘SN’ is always a better choice for the speaker in state inline graphic than ‘SS’. This highlights an important implication of the GI model: it uses two different types of sentential alternatives. On the one hand, there are what we could call grammatical sentential alternatives, which are used to derive the grammatically derived readings of candidate utterances, as detailed in the appendix. For example, by common construction ‘SN’ is not a grammatical alternative to ‘SS’, so that the reading ⟦SS⟧M = { inline graphic} can be derived. On the other hand, there is the set of utterance alternatives, which we here equated, naturally, with the set of utterances offered in the experimental design (exactly like Potts et al. 2016). The set of utterance alternatives to ‘SS’ does include ‘SN’. It is not necessary to equate these two sets of alternatives, because it is conceivable, for example, that the grammatical module that generates readings for sentences is (fairly) encapsulated from context. Utterance alternatives might be much more flexibly adapted to the context, as they constitute the set of actual utterances the speaker is aware of when making a choice. Future work should further explore the potential generated by this distinction between grammatical and utterance alternatives.

Future work should also scrutinize different models of pragmatic reasoning. For example, all of the models considered here assume that speakers do not have a veridical view of the listener’s interpretation; they base their choices on a simple Gricean heuristic of preferring semantically stronger utterances; they do not engage actively in audience design. Clearly, exploring other models of pragmatic reasoning in connection with grammatically generated implicature readings, and testing them in experimental designs that take the possibility of active audience design into account, is a worthwhile enterprise for future research.

Although our focus here was conceptual and methodological, more work is clearly necessary on the empirical side as well. The scope of our experimental manipulations is admittedly rather small. Recent experimental papers on scalar implicatures in complex sentences (e.g. Chemla & Spector 2011, Franke, Schlotterbeck, & Augurzky 2017) have focused attention on scalar-implicature triggers in nonmonotonic environments like in 29.

(29) Exactly one of the aliens drank some of its water.

Our design was more restricted in order to keep the set of alternative utterances for global Gricean reasoning confined to what is hopefully a maximally uncontroversial selection. For instance, the modeling of Potts et al. (2016) implicitly assumes that listeners also include 29 as a speaker’s alternative utterance during the interpretation of any nested Aristotelian. To sidestep such controversial assumptions, at least for the time being, this paper’s experimental set-up is deliberately minimal. Nonetheless, our data offer enough grip for a rational experimenter to quite substantially update their beliefs about which model could likely be true.

Michael Franke
University of Osnabrück
Leon Bergen
University of California, San Diego
[Received 12 March 2018;
revision invited 3 June 2018;
revision received 1 March 2019;
revision invited 1 June 2019;
revision received 9 October 2019;
accepted 16 October 2019]

Appendix. Grammatically generated implicature readings

When Exh applies directly to a quantifier Q, the resulting meaning Exh(Q) is that of Q conjoined with the negation of all lexical alternatives that are strictly stronger, that is, which entail Q but are not entailed by it. As usual, we assume that some and all are lexical alternatives of each other. We also assume, following standard practice, that none and not all are, since we analyze none as not some (e.g. Levinson 2000:80).7 Consequently, the effect of applying Exh to Q will be vacuous for none and all, but Exh(some) gives an in-situ enrichment to ‘some but not all’. For example, under a parse I, the sentence ‘SS’ gets the reading ⟦SS⟧I paraphrased in A1, which is true in any state whose pictorial representation includes a half-full glass (see Table 1).

(A1) Some of the aliens drank some but not all of their water.

When Exh applies in matrix position to a sentence S, it takes the meaning of S obtained from insertions of Exh below matrix position and conjoins that meaning with the negation of suitable sentential alternatives. Let S be a sentence and p its parse. Let ⟦S⟧p|−M be the meaning of S under the parse p|−M, which is like p with respect to the outer and inner quantifiers but has no Exh in matrix position. When Exh applies in matrix position to S whose parse is p, the meaning of Exh(S, p) is obtained by conjoining ⟦S⟧p|−M with the negation of all relevant sentential alternatives Alt(S, p) of S, provided this operation is noncontradictory (but see e.g. Gotzner & Romoli 2018 for discussion of more elaborate definitions).


(A2) ⟦Exh(S, p)⟧ = ⟦S⟧p|−M − ∪S′∈Alt(S,p) ⟦S′⟧lit, provided the result is nonempty

We here assume that the set Alt(S, p) is obtained in two steps, a generation and a filtering step. First, we generate a set Alt*(S) of potential alternatives by blindly replacing any occurrence of a scalar item in S with any of its lexical alternatives, as defined above. A second step then filters out sentences from Alt*(S) that do not entail S.


(A3) a. Alt*(S) = {S′ | S′ is derived from S by replacements of lexical alternatives}

b. Alt(S, p) = {S′ ∈ Alt*(S) | ⟦S′⟧lit ⊊ ⟦S⟧p|−M}

For example, consider the sentence ‘SS’ under a parse M. By replacing occurrences of some with its lexical alternatives, we obtain Alt*(SS) = {SS, AS, SA, AA}. Of these, ‘SS’ is filtered out. Applying matrix-Exh consequently yields the conjunction of the negation of all sentences ‘AS’, ‘SA’, and ‘AA’ with the literal meaning of ‘SS’, as in A4. This is a very strong reading, as the only world state that makes this reading true is inline graphic.

(A4) Some of the aliens drank some of their water, and it is not true that some aliens drank all or that all drank some.

As a final example, consider again the sentence ‘SS’ but now under a parse MI. We first consider ⟦SS⟧I, which gives the set of situations in which there is at least one half-full glass. The set Alt*(SS) is as before, but Alt(S, p) is now empty because none of the sentences in Alt*(SS) are, under parse lit, strictly stronger than ‘SS’ under parse I. Consequently, additional insertion of Exh in matrix position is vacuous in this case.
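The generation-and-filtering construction of Alt(S, p) can be sketched as follows. The state labels and toy literal meanings below are hypothetical stand-ins chosen only to respect the entailment pattern among the four sentences; they are not the experiment's actual seven-state space.

```python
from itertools import product

# 'S' (some) and 'A' (all) are lexical alternatives of each other;
# all other symbols are left untouched.
LEXICAL_ALTERNATIVES = {'S': ['S', 'A'], 'A': ['A', 'S']}

def alt_star(sentence):
    """Generation step: blindly replace every scalar item with each of its
    lexical alternatives."""
    options = [LEXICAL_ALTERNATIVES.get(ch, [ch]) for ch in sentence]
    return {''.join(combo) for combo in product(*options)}

def alt(sentence, meaning_without_matrix_exh, literal_meanings):
    """Filtering step: keep only alternatives whose literal meaning is a
    proper subset of (i.e. strictly entails) the meaning of S under the parse
    without matrix-Exh; S itself drops out automatically, since no set is a
    proper subset of itself."""
    return {s2 for s2 in alt_star(sentence)
            if literal_meanings[s2] < meaning_without_matrix_exh}

# Hypothetical toy meanings (sets of state labels) respecting the entailment
# pattern AA < AS < SS and AA < SA < SS.
toy = {'SS': {1, 2, 3, 4}, 'AS': {1, 2}, 'SA': {1, 3}, 'AA': {1}}
```

On this toy space, alt_star('SS') yields all four sentences, and the filtering step retains exactly 'AS', 'SA', and 'AA', mirroring the derivation of A4 above.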


Bergen, Leon; Roger Levy; and Noah D. Goodman. 2016. Pragmatic reasoning through semantic inference. Semantics and Pragmatics 9:20. DOI:10.3765/sp.9.20.
Bott, Oliver; Sam Featherston; Janina Radó; and Britta Stolterfoht. 2011. The application of experimental methods in semantics. Semantics: An international handbook of natural language meaning, vol. 1, ed. by Claudia Maienborn, Klaus von Heusinger, and Paul Portner, 305–21. Berlin: De Gruyter. DOI:10.1515/9783110226614.305.
Brasoveanu, Adrian, and Jakub Dotlačil. 2020. Computational cognitive modeling and linguistic theory. Berlin: Springer.
Chemla, Emmanuel, and Raj Singh. 2014. Remarks on the experimental turn in the study of scalar implicature, parts 1 and 2. Language and Linguistics Compass 8(9).373–86, 387–99. DOIs: 10.1111/lnc3.12081; 10.1111/lnc3.12080.
Chemla, Emmanuel, and Benjamin Spector. 2011. Experimental evidence for embedded scalar implicatures. Journal of Semantics 28.359–400. DOI:10.1093/jos/ffq023.
Chierchia, Gennaro; Danny Fox; and Benjamin Spector. 2012. Scalar implicature as a grammatical phenomenon. Semantics: An international handbook of natural language meaning, vol. 3, ed. by Claudia Maienborn, Klaus von Heusinger, and Paul Portner, 2297–2332. Berlin: De Gruyter. DOI:10.1515/9783110253382.2297.
Degen, Judith, and Michael K. Tanenhaus. 2015. Processing scalar implicature: A constraint-based approach. Cognitive Science 39.667–710. DOI:10.1111/cogs.12171.
Fox, Danny. 2007. Free choice and the theory of scalar implicatures. Presupposition and implicature in compositional semantics, ed. by Uli Sauerland and Penka Stateva, 71–120. London: Palgrave MacMillan. DOI:10.1057/9780230210752_4.
Fox, Danny, and Benjamin Spector. 2018. Economy and embedded exhaustification. Natural Language Semantics 26(1).1–50. DOI:10.1007/s11050-017-9139-6.
Frank, Michael C., and Noah D. Goodman. 2012. Predicting pragmatic reasoning in language games. Science 336(6084).998. DOI:10.1126/science.1218633.
Franke, Michael. 2009. Signal to act: Game theory in pragmatics. Amsterdam: Universiteit van Amsterdam dissertation. Online:
Franke, Michael, and Gerhard Jäger. 2016. Probabilistic pragmatics, or why Bayes’ rule is probably important for pragmatics. Zeitschrift für Sprachwissenschaft 35(1).3–44. DOI:10.1515/zfs-2016-0002.
Franke, Michael; Fabian Schlotterbeck; and Petra Augurzky. 2017. Embedded scalars, preferred readings and prosody: An experimental revisit. Journal of Semantics 34(1).153–99. DOI:10.1093/jos/ffw007.
Geurts, Bart. 2010. Quantity implicatures. Cambridge: Cambridge University Press.
Geurts, Bart, and Nausicaa Pouscoulous. 2009. Embedded implicatures?!? Semantics and Pragmatics 2:4. DOI:10.3765/sp.2.4.
Geurts, Bart, and Bob van Tiel. 2013. Embedded scalars. Semantics and Pragmatics 6:9. DOI:10.3765/sp.6.9.
Goodman, Noah D., and Michael C. Frank. 2016. Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences 20(11).818–29. DOI:10.1016/j.tics.2016.08.005.
Goodman, Noah D., and Andreas Stuhlmüller. 2013. Knowledge and implicature: Modeling language understanding as social cognition. Topics in Cognitive Science 5(1). 173–84. DOI:10.1111/tops.12007.
Gotzner, Nicole, and Jacopo Romoli. 2018. The scalar inferences of strong scalar terms under negative quantifiers and constraints on the theory of alternatives. Journal of Semantics 35(1).95–126. DOI:10.1093/jos/ffx016.
Grice, H. Paul. 1975. Logic and conversation. Syntax and semantics, vol. 3: Speech acts, ed. by Peter Cole and Jerry L. Morgan, 41–58. New York: Academic Press.
Gureckis, Todd M.; Jay Martin; John McDonnell; Alexander S. Rich; Doug Markant; Anna Coenen; David Halpern; Jessica B. Hamrick; and Patricia Chan. 2016. psiTurk: An open-source framework for conducting replicable behavioral experiments online. Behavior Research Methods 48(3).829–42. DOI:10.3758/s13428-015-0642-8.
Herbstritt, Michele, and Michael Franke. 2019. Complex probability expressions & higher-order uncertainty: Compositional semantics, probabilistic pragmatics & experimental data. Cognition 186.50–71. DOI:10.1016/j.cognition.2018.11.013.
Horn, Laurence R. 2004. Implicature. The handbook of pragmatics, ed. by Laurence R. Horn and Gregory Ward, 3–28. Oxford: Blackwell.
Jeffreys, Harold. 1961. Theory of probability. 3rd edn. Oxford: Oxford University Press.
Kass, Robert E., and Adrian E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90(430).773–95. DOI:10.1080/01621459.1995.10476572.
Kruschke, John E. 2015. Doing Bayesian data analysis. 2nd edn. Burlington, MA: Academic Press.
Lassiter, Daniel, and Noah D. Goodman. 2017. Adjectival vagueness in a Bayesian model of interpretation. Synthese 194(10).3801–36. DOI:10.1007/s11229-015-0786-1.
Lee, Michael D., and Eric-Jan Wagenmakers. 2015. Bayesian cognitive modeling: A practical course. Cambridge: Cambridge University Press.
Levinson, Stephen C. 2000. Presumptive meanings: The theory of generalized conversational implicature. Cambridge, MA: MIT Press.
McElreath, Richard. 2016. Statistical rethinking. Boca Raton: Chapman and Hall.
Noveck, Ira A., and Dan Sperber (eds.) 2004. Experimental pragmatics. Hampshire: Palgrave MacMillan.
Potts, Christopher; Daniel Lassiter; Roger Levy; and Michael C. Frank. 2016. Embedded implicatures as pragmatic inferences under compositional lexical uncertainty. Journal of Semantics 33(4).755–802. DOI:10.1093/jos/ffv012.
Schwarz, Gideon. 1978. Estimating the dimension of a model. Annals of Statistics 6(2). 461–64. Online:
Sprouse, Jon. 2007. A program for experimental syntax: Finding the relationship between acceptability and grammatical knowledge. College Park: University of Maryland dissertation. Online:
van Tiel, Bob. 2014. Quantity matters: Implicatures, typicality, and truth. Nijmegen: Radboud Universiteit Nijmegen dissertation.


1. Materials, data, and modeling code can be retrieved from

2. The vanilla RSA model and all other models considered here assume that the speaker knows the true world state, an assumption that is arguably warranted by the design of the experiment described in §4. See Goodman & Stuhlmüller 2013 and Herbstritt & Franke 2019 for extensions to cases with (higher-order) uncertain speakers.

3. Here, δboolean is the delta function that returns 1 if the supplied Boolean expression is true and 0 otherwise. The symbol ‘∝’ (for ‘proportional to’) allows us to define probability distributions more compactly by leaving the normalizing constant implicit: writing P(x) ∝ F(x) for some function F(x) with domain X is shorthand for P(x) = F(x) / Σx′∈X F(x′).

4. The speaker’s choice of expression, defined in equation 6, is derived from two assumptions (see e.g. Franke & Jäger 2016 for details). First, the speaker seeks to minimize the distance (in terms of Kullback-Leibler divergence) between their probabilistic belief and that of the literal listener. Second, the speaker tends toward optimal choices probabilistically (modeled with a so-called softmax function with rationality parameter α).

5. Equation 6 can be rewritten like so: PS(u | t; α) ∝ exp(α · log PLL(t | u)) = [PLL(t | u)]α, which by definition of the literal listener expands to: PS(u | t; α) ∝ [P(t) / Σt′∈⟦u⟧ P(t′)]α. Since P(t) = P(t′) for all t, t′, the prior terms cancel out and we retrieve: PS(u | t; α) ∝ 1 / |⟦u⟧|α for any u true in t.

6. Notice that the LI model, and the following GI model, assumes that the literal listener ‘miraculously’ recognizes the speaker’s intended meaning. Since the literal listener is a technical construct, the implication is that speakers are not assumed to take into account a realistic model of their interlocutor. Speakers are modeled as using a (myopic, conventional) heuristic manner of choosing utterances and intended meanings, which, however, can then be recognized by pragmatic listeners to retrieve the speaker’s intended meaning.

7. The only effect of considering not all a lexical alternative to none is that the sentence ‘NN’ gets an additional reading when Exh applies in matrix position. But this has no noteworthy effect on anything of relevance to this paper’s main concerns.
