Conference Presentations by Dirk Pijpops

In this study, we contrast the Moroccan-Dutch ethnolect with the language use of full native speakers within the framework of Contrastive Interlanguage Analysis (Granger 1996). Our focus will be on their realization of Dutch adjectival morphology. Language users of the Moroccan-Dutch ethnolect may be creatively restructuring Dutch morphology in a number of novel ways. In particular, the adjectival -e inflection has drawn scholarly attention (Van de Velde and Weerman 2014). Here, it is argued that these language users are revitalizing a seemingly defunct inflection system by discarding a number of synchronically unmotivated exceptions. The -e ending may then acquire new functions as (i) a marker of attributive modification and (ii) a boundary marker between the modification and determination zones in the noun phrase. The -e ending is not the only remnant of the once elaborate Dutch adjectival inflection system, however. The so-called partitive genitive construction also harbors an adjectival -s ending that, like the -e, alternates with a zero ending, as in (1) versus (2) (Haeseryn et al. 1997: 421; for the contexts in which either form is used in Present-day Dutch, see Pijpops and Van de Velde 2014).
(1) de hijab is iets moois wat door Marokkaanse wijven helemaal verpest is
the hijab is something beautiful-GEN that by Moroccan women totally ruined is
‘The hijab is something beautiful that is totally ruined by Moroccan women.’
(2) Is dat iets verkeerd
is that something wrong-∅
‘Is that something wrong?’
This -s ending is one of the few surviving remnants of the Dutch genitive case, more specifically the partitive genitive: hence the name of the construction. The partitive genitive construction is a combination of an indefinite pronoun or numeral with a postmodifying adjectival phrase, although the exact theoretical architecture of the construction is still very much up for debate (Schultink 1962: 62; Kester 1996; van Marle 1996; Broekhuis and Strang 1996; Hoeksema 1998; Booij 2010: 223–228; Broekhuis 2013: 419–461). We then ask whether and, if so, how the language use of the Moroccan-Dutch ethnolect differs from full native language use in the utilization of this adjectival -s ending. This may be a difference in absolute numbers, but may also pertain to the number and choice of the factors that determine the appearance of the -s ending. There are four possible options:
(i) As with the -e ending, the users of Moroccan Dutch generalize the -s ending to all instances of the partitive genitive, thereby refunctionalizing this remnant of the Dutch case system as a transparent and reliable construction marker (cf. Booij 2010: 223–228).
(ii) The users of Moroccan Dutch generalize the zero ending, thereby ridding their language of an obsolete fossil from bygone times. This would be a continuation of the deflexion trend apparent in the development of Dutch (van der Horst 2013). The resulting state would be akin to English, where only the zero ending is used.
(iii) The users of Moroccan Dutch employ both the -s and zero ending in exactly the same way as other language users of Dutch, implementing the same factors to determine the choice between both variants. This would indicate that these factors are of a qualitatively different nature than the factors determining the use of the -e ending, as the users of Moroccan Dutch apparently do not or cannot dispose of them.
(iv) The users of Moroccan Dutch employ both the -s and zero ending in the partitive genitive construction, but in a different way than other language users of Dutch. This would indicate that these language users are creatively adapting their language to cater to new or other needs.
To investigate this, we will apply regression modelling to corpus data, as proposed by Gries and Deshors (2014). Gries and Deshors advocate this methodology as a way to fully exploit the potential of Granger's (1996) model of Contrastive Interlanguage Analysis (CIA). As a source of data, we turn to the Moroccorp corpus, which contains chat conversations in the Moroccan-Dutch ethnolect and has already proven its value in studies on Dutch adjectival inflection (Ruette and Van de Velde 2013; Van de Velde and Weerman 2014). We extracted a number of possible partitive genitives, which we manually filtered, and finally retained 1613 genuine partitive genitive instances. These partitive genitives are contrasted with 765 observations of partitive genitives taken from Netherlandic chat conversations in the ConDiv corpus (Grondelaers et al. 2000), adopted from an earlier study by Pijpops and Van de Velde (2014). The Moroccorp corpus was specifically designed to be commensurable with this subsection of ConDiv (Ruette and Van de Velde 2013: 467–470). Finally, we employed mixed logistic regression modelling to investigate how the realization of partitive genitives differs between the two corpora.
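One way to read the four options in regression terms is sketched here. This is an illustrative formulation, not the model specification of the study: Corpus is assumed to be coded 0 for ConDiv and 1 for Moroccorp, $x$ stands for any single conditioning factor, and the by-adjective random intercept $u_a$ is an assumption, since the random-effects structure is not stated above.

```latex
\[
\operatorname{logit} P(\text{-s}) \;=\; \beta_0 \;+\; \beta_1\,\mathrm{Corpus} \;+\; \beta_2\,x \;+\; \beta_3\,(\mathrm{Corpus} \times x) \;+\; u_a,
\qquad u_a \sim \mathcal{N}(0, \sigma_a^2)
\]
```

Under this reading, options (i) and (ii) would roughly correspond to a near-categorical outcome in Moroccorp (an extreme $\beta_1$, positive or negative), option (iii) to $\beta_1$ and $\beta_3$ both close to zero, and option (iv) to a non-zero interaction term $\beta_3$.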
This study will shed light on how early L2/2L1 speakers deal with seemingly defunct morphology in Dutch, and hopes to answer the call of Gries and Deshors (2014) for more elaborate statistical methods in CIA research.

In recent years, theoretical work in construction grammar has often focused on links between constructions and the design of the constructional network or constructicon (Wellens 2011; Van de Velde 2014; Diessel 2015). Regarding these networks, one of the issues on which we have managed to reach consensus is the need for a vertical dimension, ranging from fully abstract to lexically specified constructions (Croft 2001: 25–29; Goldberg 2003; Fried and Östman 2004: 15–18). Still, corpus research only rarely takes this dimension into account explicitly and often restricts itself to one particular horizontal level in the network (e.g. Pijpops & Speelman 2017; for exceptions, see a.o. Boas 2010; Wible & Tsao 2017). While such an approach is certainly justifiable, we will argue that neglecting the multi-level nature of the constructicon has led to three problems of constructional semantics.
At least two of these, which will be called the Problem of Prediction and the Problem of Proliferation, have already been noted in earlier studies. The first pertains to the formulation of specific predictions regarding low-level constructions based on only high-level, abstract semantic notions such as affectedness, involvement or agency (see Lenci 2012: 13–15, and also Broccias 2001; Perek 2015: 90–144). For example, when discussing the influence of affectedness on the argument variation of the Italian verb rimproverare ‘reproach’, Lenci (2012: 14) notes that “this interpretation would require us to stretch the meaning of affectedness well beyond its standard (fairly high) vagueness and polysemy, thereby impairing its reliability as a truly explanatory notion”. The second problem relates to positing ever more concrete constructions, which may draw the critique of non-parsimony (Culicover and Jackendoff 2005; Traugott and Trousdale 2013: 5–11). We will attempt to demonstrate that these problems are caused by a third, more fundamental problem, named the Problem of Precedence. This problem asks at which level in the constructional network speakers primarily employ a construction to communicate meaning, optimize information structure or express lectal distinctions. Next, we will argue that this concern does not constitute a theoretical issue, but rather an empirical question.
Finally, we introduce a methodological approach to deal with this question. To illustrate the approach, we employ as a case study the alternation between the Dutch transitive and prepositional argument constructions, as in (1)-(2). We identify a seemingly motley collection of 102 verbs exhibiting the alternation and map out the relevant region of the constructional network. Fully abstract argument constructions are first put under scrutiny, after which we continue on to more lexically specific constructions. The goal of this procedure is to identify the precedence level at which the alternation is predominantly active, thus solving the Problem of Precedence. It will be demonstrated that doing so will also enable us to tackle both the Problems of Prediction and Proliferation.
(1) Minister Vandenbroucke zoekt (naar) een oplossing.
‘Secretary Vandenbroucke is searching for a solution.’
(2) (Met) hete koffie gemorst.
‘Spilled hot coffee.’

Lectal contamination is the language-external counterpart of what has been described as constructional contamination (Pijpops & Van de Velde 2016). In constructional contamination, various superficially similar constructions within one and the same language variety exert an influence on each other, causing lexically-specific preferences for either of two morphological or syntactic variants, depending on which lexemes the superficially related constructions share. In lectal contamination, by contrast, lexically-specific preferences may arise due to language contact with another variety that shares the same construction. In particular, lexemes that occur more often in one variety will come to prefer the morphosyntactic variant that is preferentially used in that particular variety, even in the speech of language users of a different variety. As a result, what is essentially a language-external factor conditioning a particular form of linguistic variation may become internalized.
As a case study, we zoom in on the Dutch partitive genitive construction. This construction exhibits variation between a form with and without -s ending, as in (1) and (2). The form with the -s ending is predominant in the Netherlandic regiolect, while the form without -s constitutes a marker of the Belgian regiolect (Pijpops & Van de Velde 2014). Because of this distinction between the Netherlands and Belgium, i.e. a language-external factor, partitive genitive types that feature typically Netherlandic lexemes, such as (1), more often appear in the variant with -s, whereas those that contain typically Belgian lexemes, such as (2), will more often appear without the -s. Our hypothesis was that these lexical preferences got entrenched, so that Belgian speakers using Netherlandic lexemes would import the Netherlandic morphological variant and vice versa. In other words: while the formal realisation is straightforwardly regionally stratified, we expect these lexical preferences to hold even within the Netherlandic and Belgian regiolects.
(1) Iets bijzonder(s)
‘Something remarkable’
(2) Iets speciaal(s)
‘Something special’
We tested this prediction on 3018 manually checked observations from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and found it to be confirmed, even when controlling for all other variables known to influence -s omission. Furthermore, we drew geographically-tagged data from Twitter, totaling 1299 manually checked instances, to replicate this finding and to investigate the geographical spread of both lectal contamination and the partitive genitive variation.
The effect of lectal contamination can only be explained if we have a sufficiently precise account of how individual speakers operate in language contact situations (Weinreich, Labov & Herzog 1968). If language contact can, in this way, cause lectal variation to produce lect-internal effects, then a variationist description of a particular regio-, dia-, socio- or ethnolect crucially depends on an understanding of language contact.

1. Introduction
Lieberman et al. (2007) aimed to quantify the evolutionary dynamics of language by investigating the rise of the English regular past tense inflection, which they equated with the weak -ed suffix. Yet, their bold conclusion that “the half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast” (Lieberman et al., 2007, p.713) has successively attracted criticism from scholars in the fields of historical and evolutionary linguistics. First, Carroll, Svare, & Salmons (2012) showed that this constant regularization rate does not hold true for the closely-related German language. Second, Cuskley et al. (2014) found that the rise of the English weak -ed suffix is not driven by forces endogenous to language, such as analogy, but rather by external forces, such as new verbs entering the language through language contact.
We will reassess the constant-rate controversy by (i) extending the methodological scope with agent-based modeling, and (ii) extending the number of languages beyond the German-English distinction by adding Dutch.
Our results show that the constant rate does not hold. If language change is co-determined by external forces resulting in languages adapting to their niches (Lupyan & Dale 2016), this is exactly what one would expect, since English, Dutch and German have endured external pressures to a different degree. We will focus on the influence of demographic change. In particular, we investigate the growth of cities and the resulting koineization due to migration in the three language areas since the Middle Ages. The three different degrees of urbanization have led to different degrees of dialect contact, which could in turn, as we will argue, lead to different regularization rates. To support this claim, we will present both empirical evidence from linguistic and demographic databases, as well as the results of a computational simulation.
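The arithmetic behind the quoted scaling can be checked in a few lines. The sketch below assumes only the square-root relation stated in the quotation; the function names are hypothetical, and nothing here reproduces the data or fitting procedure of Lieberman et al.

```python
import math


def relative_half_life(freq_ratio: float) -> float:
    """Half-life of verb A relative to verb B, given freq_A / freq_B.

    Assumes half-life is proportional to sqrt(frequency), as in the quotation.
    """
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    return math.sqrt(freq_ratio)


def relative_regularization_speed(freq_ratio: float) -> float:
    """How many times faster verb A regularizes than verb B.

    Speed is taken here as the inverse of the half-life.
    """
    return 1.0 / relative_half_life(freq_ratio)


if __name__ == "__main__":
    # A verb that is 100 times less frequent than another one.
    ratio = 1 / 100
    print(relative_half_life(ratio))
    print(relative_regularization_speed(ratio))
```

With a frequency ratio of 1/100, the relative half-life comes out at about one tenth, which is the "10 times as fast" of the quotation.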
2. Empirical data
2.1. Linguistic data
To obtain a clear picture of the linguistic situation, we included the data on English from Lieberman et al. (2007) and the data on German from Carroll et al. (2012), and complemented these with our own Dutch data. This enables us to track the development of the past tense system of these three languages over a 1000-year period (800-1800).
2.2. Demographic data
For the demographic data, we make use of the databases of Bairoch et al. (1988), De Vries (1984), and Mitchell (1998). In particular, we compare the population growth of the largest cities in the English, Dutch and German language areas in each time period from 800 to 1800. Historical research has shown that the exponential growth of urban population cannot be reduced to natural growth, but is driven by immigration as well, both of foreigners and through a rural exodus from the larger agglomeration, leading to dialect contact. We then observed correlations between the success of the weak inflection and the amount of demographic upheaval.
3. Simulation
A correlation between a demographic and a linguistic trend does not automatically entail a causation between the former and the latter, however. To further substantiate our claim, we therefore turn to an agent-based computer simulation. In this simulation, agents store exemplars or tokens of what they hear (cf. Pijpops et al., 2015), rather than type states (cf. Colaiori et al., 2015), and use these to produce novel forms. We find that (i) the weak inflection does not require special status as the single regular inflection in order to explain the tendencies observed in reality; (ii) replacement of verbs can indeed cause a continued rise of the weak inflection, even after a stable equilibrium between weak and strong verbs has emerged, confirming Cuskley et al. (2014); and most importantly (iii) if our current understanding of language, as implemented in the simulation, is correct, demography does indeed affect the rise of the weak inflection.
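To make the idea of token-storing agents concrete, here is a deliberately small toy sketch. It is not the simulation described above: the Zipf-like verb weights, the starting state, the fallback on the agent's whole memory, and the replacement mechanism are all simplifying assumptions, and every name (`simulate`, `replacement_rate`, and so on) is hypothetical. It makes no claim about what the actual simulation found.

```python
import random


def simulate(n_agents=20, n_verbs=30, steps=5000, replacement_rate=0.01, seed=0):
    """Return the share of weak past-tense tokens stored in the population."""
    rng = random.Random(seed)
    # Assumed Zipf-like usage: verb k is chosen with weight 1 / (k + 1).
    weights = [1.0 / (k + 1) for k in range(n_verbs)]

    def founder_memory():
        # Assumed starting state: the more frequent half of the verbs is
        # strong, the rest weak. Each entry is [strong_tokens, weak_tokens].
        return [[5, 0] if k < n_verbs // 2 else [0, 5] for k in range(n_verbs)]

    def empty_memory():
        return [[0, 0] for _ in range(n_verbs)]

    def produce(memory, verb):
        # Sample a form in proportion to the stored tokens of this verb.
        strong, weak = memory[verb]
        if strong + weak == 0:
            # No tokens for this verb: fall back on all stored tokens.
            strong = sum(entry[0] for entry in memory)
            weak = sum(entry[1] for entry in memory)
        if strong + weak == 0:
            return None  # a newcomer who has heard nothing stays silent
        return 1 if rng.random() < weak / (strong + weak) else 0

    agents = [founder_memory() for _ in range(n_agents)]
    for _ in range(steps):
        # Demographic turnover: occasionally a newcomer replaces an agent.
        if rng.random() < replacement_rate:
            agents[rng.randrange(n_agents)] = empty_memory()
        speaker, hearer = rng.sample(range(n_agents), 2)
        verb = rng.choices(range(n_verbs), weights=weights)[0]
        form = produce(agents[speaker], verb)
        if form is not None:
            agents[hearer][verb][form] += 1  # the hearer stores the token

    totals = [sum(m[v][f] for m in agents for v in range(n_verbs)) for f in (0, 1)]
    if sum(totals) == 0:
        return 0.0
    return totals[1] / sum(totals)


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.05):
        print(rate, simulate(replacement_rate=rate))
```

Varying `replacement_rate` is a crude stand-in for demographic change; comparing runs across several seeds would be needed before reading anything into the output.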
Morphology Days Conference, 2017
In everyday language use, two or more structurally unrelated constructions may occasionally give rise to strings that look similar or even identical on the surface. As language users sometimes employ shallow parsing, the processing of these strings may short-circuit (Ferreira, Bailey and Ferraro 2002; Ferreira and Patson 2007; Dąbrowska 2014). As a result of this interference, a subset of instances of one construction may deviate in their formal realization. This effect is called constructional contamination (Pijpops and Van de Velde 2016).

Present-day Dutch has a vestigial partitive genitive morpheme. Adjectives take the genitive -s morpheme when they are used as a dependent of a quantifier (Haeseryn et al. 1997: 863; Broekhuis 2013: 420-426). This is illustrated in (1). The construction comes in two variants: either with an overt -s suffix, or without the suffix.
(1) iets bijzonder(-s)
something special-GEN
‘something special’
While the two variants do not show any observable semantic difference, Pijpops & Van de Velde (2014) applied mixed-model logistic regression and found that the expression of the -s is probabilistically determined by a number of factors. While overall, the [+s] variant is more frequent, the [-s] variant is also fairly common, and is more likely to occur (i) in informal registers, (ii) in low-frequency phrases, and (iii) in the south of the language area (Belgium). There is also a strong main effect for the [-s] variant for adjectives that occur in superficially similar non-partitive constructions. This is illustrated in (2) and (3): though similar in surface form, the context makes clear that (2) is not a partitive construction. The absence of the -s morpheme then spills over to genuine partitives like (3) (see Pijpops & Van de Velde, forthc. for extensive explanation on what they call ‘constructional contamination’).
(2) iets verkeerd geïnterpreteerd
[something]NP [[wrongly]AdvP interpreted]
(3) iets verkeerd gegeten
[something wrong]NP eaten
This suggests that, in line with exemplar-based theories of language, prior use of constructions leaves a (context-rich) trail in the mind of the language users.
In this talk, we want to see whether the same effect also occurs with regard to the regional variable. Can the regional provenance of the lexemes inserted in a construction exert an influence on the morphological realisation of the target construction, even if the construction is used by language users with a different regiolectal background? In our study, southern speakers have a stronger tendency to drop the genitive -s, but less so when they are using ‘northern’ lexemes, and vice versa. This effect holds even if the regional provenance of the lexemes is subtle, and unlikely to be a shibboleth of a regionally recognisable type of speech. Furthermore, we see that while the analogical pull of lexemes with a regional profile is felt everywhere in the language area, the effect is blurrier in cities near the border of the two regions and clearer in the core areas. This finding shows that not only the language-internal context of prior instances is stored in memory, but the ‘language-external’, lectal context as well.

Language is shaped by processing pressures from production, or encoding, and reception, or decoding (Hawkins 2004). Evidence from psycholinguistic experiments indicates that when both pressures counteract one another, the latter generally takes precedence (see a.o. Ferreira and Dell (2000) and references cited therein). In this study, we aim to complement this work with corpus research. The case study concerns the Dutch verb zoeken ‘to search’, where language producers have the choice whether or not to explicitly mark the object using the preposition naar ‘to’, as in (1)-(2).
(1) We zoeken alternatieven. (Sonar corpus, Oostdijk et al. 2013, WR-P-P-G-0000254655.p.11.s.5)
‘We are looking for alternatives.’
(2) Wij zoeken dan wel naar alternatieven. (Sonar corpus, Oostdijk et al. 2013, WR-P-P-G-0000488037.p.6.s.3)
‘We, then, look for alternatives.’
Using data from the Sonar corpus, we find that the likelihood of naar increases as the object becomes more complex (Figure 1). There are at least three possible ways to explain this relation, however. The first is that the strictly unnecessary preposition helps the addressee decode the sentence, and expressing the preposition is therefore especially called for when the object is complex (cf. Rohdenburg's (1996) Complexity Principle). The second is that naar functions as a way to buy time for the producer to formulate a complex object. Finally, the third states that naar is preferred with more complex objects because it allows the producer to extrapose such objects to postfield position. This study will attempt to disentangle these three possible explanations.

The Germanic languages maintain two general strategies to form the finite past tense of a verb. The first and older, named the strong inflection, is based on a vowel alternation, as in English sing-sang, while the second and younger, contrastively called the weak inflection, employs a dental suffix, e.g. the English -ed in kick-kicked (Moonen 1706; Grimm 1819). The competition between both has been going on for thousands of years, and although the precise developments still remain rather unclear, the long-term trend is undeniably one of strong decline and weak ascension (Bailey 1997; Carroll, Svare and Salmons 2012; Van de Velde et al. 2017).
Our goal is twofold. First, in order to get a clear picture of the exact facts, a large-scale empirical study on Dutch has been set up. Second, to enable us to test what may be the causes underlying these observations, we designed an agent-based computer simulation (Gilbert 2008; Steels 2011). In earlier proposals, it is often assumed that the strong inflection is stripped of all regularity and equated with ‘irregular inflection’, while the weak inflection is presented as ‘regular inflection’ (Ball 1968: 164; Bailey 1997: 17; Cuskley et al. 2014; Colaiori et al. 2015; Pijpops and Beuls 2015). We will claim (i) that this assumption is empirically questionable (see also e.g. Knooihuizen and Strik 2014; Fertig in prep.), and (ii) that it is not strictly required to explain the facts. This last point follows from the results of the simulation, where the strong inflection was assumed to be regular, yet developments akin to those observed in reality could still be shown to emerge.

From the earliest attested stages on, Germanic languages have at their disposal two competing strategies for building preterites. One strategy, exemplified by sing-sang, is called the strong inflection. It relies on root apophony (ablaut), and is a reanalysis and extension of an earlier Indo-European aspectual system (Prokosch 1939; Lass 1990). The other strategy, exemplified by work-worked, is called the weak inflection. It does not use apophony, but suffixation, and finds its origin in the morphologisation of an Indo-European stem *dheh1/*dhoh1 (‘do’) added to the verb, eventually turning into a dental suffix (Ball 1968; Tops 1974; Bailey 1997; Hill 2010), though other sources have contributed as well (Heath 1998; Ringe 2007; Hill 2010).
Setting the emergence of a third strategy later in Germanic, namely the analytic perfect (exemplified in Afrikaans werk – het gewerk, lit. ‘has worked’) aside, it has often been observed that despite occasional shifts in the opposite direction, Germanic displays a long-term drift in which the weak inflection takes the upper hand at the expense of the strong inflection, although the strong inflection remains remarkably resilient, and still has not fully succumbed to the overall weakening trend (Van Haeringen 1940). Recent years have seen publications in which this ‘weakening’ drift is cast in quantitative terms. Lieberman et al. (2007) notice that in English, the weakening of the verbs follows a constant rate through time, is only dependent on the frequency of the verb, and neatly scales proportionally to the square root of the frequency of verbs. However, Carroll et al. (2012) replicated the study for German and found no such constant rate, hence casting doubt on the universality of the mathematical regularity that seemed to govern the weakening.
In our talk, we replicate the Lieberman et al. and the Carroll et al. studies for Dutch, allowing a comparison between the three languages in the Van Haeringen (1956) tradition. Our results confirm Carroll et al.’s (2012) critique of the constant rate.
Carroll et al. suggested that underlying the differences between English and German are demographic factors, but they left it to future research to actually dig deeper into the demographic history. In our talk, we pick up this thread and couple the weakening with historical demography. Our results indicate that the differences between these three big West-Germanic languages indeed seem related to population effects. Evidence is drawn from grammars and historical demographic databases. We further support our claims with agent-based computer simulation, extending earlier work by Pijpops et al. (2015).

Processing shapes grammatical organisation, including asymmetric coding with a marked vs. unmarked alternance (Hawkins 2004), but it is unclear whether the processing considerations at issue are those of speakers or of addressees. Hawkins’s model is framed as benefiting the addressee, though he remarks that it equally benefits the speaker (2004: 24-25). Glossing over parsing and production is legitimate as long as speakers’ and addressees’ motivations are aligned, but this is not always the case. The idea that language has to seek an optimal balance between the often opposite demands of both speech act participants is old, harking back at least to Georg von der Gabelentz in the 19th century. So eventually, we will have to decide which of the two speech act participants has the upper hand in the processing-driven organisation of grammar.
On the one hand, there is evidence for an addressee-oriented view: Hawkins’s ‘Minimize Domains’ principle, stating that the syntactic structure should be recognisable in as short a span as possible, benefits the addressee, as the speaker is never unsure about the syntactic structure. Likewise, Rohdenburg’s (1996) Complexity Principle, stating that in complex structures more explicit encoding is used, is only beneficial to the addressee. If the structure is already complex, adding extra grammatical encoding arguably burdens the speaker’s performance even more. On the other hand, it is not self-evident that speakers should be concerned with their addressees’ needs, forfeiting their own. Speaker’s altruism is evolutionarily implausible (Kirby 1999). Levinson (2000) also stresses the speaker’s needs in his neo-Gricean approach. As Levinson points out, the bottleneck in human communication is at the production side: decoding is much faster and more effortless than encoding (Levinson 2000: 28), so that taking inferential short-cuts to add layers of meaning on top of what is truth-conditionally encoded is especially helpful for the speaker. Adding extra material in the overtly coded variant in an alternance (e.g. zero- vs. that-complementation in English) goes against the rationale to prioritize production efficiency over parsing speed. Hawkins’s principle ‘Minimize Forms’ also seems first and foremost to serve the speaker’s comfort. True, reducing forms also adds to the parsing effort, as the form-function pair of the extra encoding has to be stored in the hearer’s brain, but given the ease with which inferencing is accomplished (Levinson 2000), and given the vast storage capacities of the human mind (Dąbrowska 2014: 626), the extra speaker’s efforts outweigh the extra addressees’ efforts.
In our paper, we will adduce quantitative data from a close-up case study that can shed light on the debate over speaker vs. addressee processing. The case study deals with the direct object vs. prepositional object alternance in Dutch verbs, like zoeken (naar) ‘search (for)’. A corpus study reveals that the prepositional variant is used more often when the object is syntactically complex. This can be explained in two ways: first, the preposition can function as a signpost to help the addressee decode the message. This would be in line with Rohdenburg’s Complexity Principle, and would point to a hearer-driven processing account. Second, the use of a preposition allows the object to be extraposed (or ‘exbraciated’). This would be beneficial to the speaker, who can postpone the expression of the complex object to the end of the clause, when all other issues have been resolved, avoiding centre-embedding. On the basis of corpus investigation, we will tease apart both explanations. Of special interest are cases such as (1), where the head noun of the object is not extraposed (to the right of gezocht ‘search-PST.PTCP’), but the submodifying complement clause is. If the use of the prepositional variant is especially favoured in this context, this would be an argument for the first explanation. Here, the processing difficulty of the discontinuous object may be alleviated for the hearer by adding the extra signpost.

The Germanic past tense system relies on two general morphological strategies. The first strategy is called the strong inflection, and switches the vowel of the verb's stem, as in sing-sang and drive-drove. The second strategy is called, by contrast, the weak inflection, and adds a dental suffix to the stem, as in kick-kicked (Harbert 2007). This second strategy has been pushing aside the first over the course of thousands of years. The causes for this growing supremacy of the weak inflection are generally sought in the strong inflection's ostensibly irregular nature, to the point where it is often equated with 'irregular inflection' (Lieberman et al. 2007), and in the weak inflection's frequency dominance (Carroll et al. 2012; Cuskley et al. 2014). However, neither was the case in the earliest stages of the Germanic language, and therefore they cannot explain the initial success of the weak suffix (Ball 1968, p. 164; Bailey 1997, p. 7-8). What we propose instead is that the weak inflection owes its initial rise to its property of general applicability, i.e. its ability to be, in principle, applicable to any verb (Pijpops, Beuls, and Van de Velde 2015). In contrast, each individual strong ablaut class is only applicable to a subset of verbs. Under certain social conditions, this evolutionary advantage may be more decisive than the advantages of the strong inflection, e.g. its shorter verb forms. In order to investigate this, we have built an agent-based simulation. In this simulation, the strong inflection maintains its original regularity, being composed of several transparent ablaut classes, and initially dominates the entire verbal paradigm. Still, the results show that given certain social changes, most notably in cases of demographic turmoil and a rapid replacement of the population, the weak inflection may turn the tables on the strong inflection.
In addition, this rise of the weak inflection in the simulation is accompanied by a Conserving Effect (Bybee 2006). That is, the weak suffix can be shown to take over the low frequency verbs before moving on to the high frequency verbs. Under conditions less favorable to the weak inflection, these high frequency verbs are in fact capable of maintaining their strong forms indefinitely, and a stable equilibrium may evolve. That is, a dual system may develop in which strong and weak each occupy their own niche. Finally, we find an effect of Class Resilience. That is, particular strong ablaut classes seem better capable of protecting even their low frequency verbs against the rise of the weak inflection, while other classes wither away. All of these effects, which we also find in corpus data (Carroll et al. 2012), can be found to emerge merely as a result of the weak inflection's general applicability and demographic changes.

The construction, as a successor to the Saussurian sign, is usually envisaged as a discrete form-meaning pairing, with its meaning founded on formal oppositions to other constructions (de Saussure 1916; Goldberg 2006). In actual language use, however, these oppositions may come under threat from superficial resemblances between constructions.
(1) and (2) are corpus examples of two constructions that are structurally and etymologically unrelated and express different meanings. In (1), the adverb verkeerd (‘wrongly’) modifies the verb geïnterpreteerd (‘interpreted’). In (2), iets (‘something’) and verkeerd (‘wrong’) together form a noun phrase in a partitive genitive construction (Hoeksema 1998; Broekhuis and Strang 1996). In Dutch, partitive genitives may appear both with and without an -s ending on the adjective (cf. (2) and (3), Pijpops and Van de Velde 2014). Conversely, adverbs as in (1) cannot receive this -s ending. This means that only the superficial strings of (1) and (2) look alike, not the underlying syntactic structure. In fact, other instances of these constructions may look very different from one another. Still, we will quantitatively show that the realization of the -s ending in partitive genitives is affected by the frequent occurrence of constructions as in (1).
(1) dat iets verkeerd geïnterpreteerd wordt?
that something wrongly interpreted gets?
‘that something gets wrongly interpreted?’
(2) in begin van de week iets verkeerd gegeten, vandaar
in beginning of the week something wrong-∅ eaten hence
‘I had eaten something wrong at the start of the week, that’s why.’
(3) Ik had iets verkeerd-s gegeten en ik werd beroerd.
I had something wrong-S eaten and I became ill
‘I had eaten something wrong and I became ill.’
Data were drawn from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and were analyzed primarily by means of logistic regression. It turned out that the effect described above even outperforms typically cited regional differences as a predictor of -s presence (van der Horst 2008: 1624–1625; Booij 2010: 224; Broekhuis 2013: 426).
We claim that this contaminating influence is a result of chunking. That is, instead of analyzing utterances to the bone in interpretation, and building them from scratch in production, language users can make use of short-cuts by storing and accessing unanalyzed wholes (Dąbrowska 2012; Ferreira, Bailey and Ferraro 2002; Ferreira and Patson 2007). This then may cause the processing of instances like (1) and (2) to cross paths, resulting in constructional contamination.

The Germanic languages boast two morphological strategies for past tense formation. The strong inflection is based on an ablaut in the verb’s stem (e.g. sing ~ sang, drive ~ drove) and is the older of the two, largely inherited from the Indo-European mother tongue (Harbert 2007). The weak inflection, by contrast, adds a dental suffix to the stem (e.g. laugh ~ laughed), and constitutes a Proto-Germanic innovation. In the history of the Germanic languages, this dental suffix has had considerable success in taking over past tense formation, to the detriment of the strong inflection (Harbert 2007; Lieberman et al. 2007; Cuskley et al. 2014).
To account for this success, three explanations are given in the literature (Ball 1968: 164; Bailey 1997: 7–8). First, while each separate strong ablaut class is only applicable to a subset of verbs, the weak suffix can, in principle, be attached to any verb indiscriminately. Second, some verbs escaped ablaut formation altogether, for instance because they had a vowel that fitted none of the ablauting patterns. Such verbs would then create a safe nest for the nascent weak inflection, free of competing strong forms. Third, the strong inflection was ravaged by the effects of several sound laws, which severely undermined its transparency. This would have rendered it vulnerable to competition from the seemingly more transparent weak inflection.
We will claim that the first explanation is already sufficient to account for the rise of the weak inflection. Moreover, it may explain why the weak inflection first took over the low frequency verbs and low frequency ablaut classes (Carroll, Svare and Salmons 2012). Since we then no longer need the irregularization of the strong inflection to explain these effects, this irregularization may be the result of the rise of the weak inflection, rather than its cause.
To support these claims, we have built an agent-based simulation. In this simulation, computational agents communicate with each other by referring to past events, thereby employing either the strong or weak inflection. The agents preferably use the forms that they hear most often from their fellow agents. The simulation was composed in Babel2, a framework for building agent-based models of language evolution (Steels 2012).
In the simulation, the only difference between the strong and weak inflection lies in the first explanation given above. Any other possible advantages for the weak inflection were excluded from the model. Under such conditions, it can be observed that a rise of the weak inflection will come to pass in both type and token frequency, accompanied by a Conserving Effect of both the verbs and the ablaut classes (Bybee 2006; Carroll, Svare and Salmons 2012). This rise even takes place if the weak dental suffix starts out as inferior in both type and token frequency to any individual strong ablaut class.

I. INTRODUCTION
In Dutch, a number of psych verbs exhibit an alternation between a reflexive (1) and transitive argument construction (2). The present corpus study investigates what factors drive the choice of the language user between these constructions, for the verbs ergeren (‘to annoy’), interesseren (‘to interest’), storen (‘to disturb’) and verbazen (‘to amaze’).
(1) Reflexive construction (experiencer-subject)
Daar erger ik me groen en geel aan. (CGN)
There annoy I myself green and yellow to
‘That greatly annoys me.’
(2) Transitive construction (stimulus-subject)
Dit […] ergerde de Romeinen mateloos. (ConDiv)
This […] annoyed the Romans excessively
‘This […] excessively annoyed the Romans.’
II. HYPOTHESES
A. Agentivity hypothesis
The agentivity hypothesis has been put forward, albeit in varying forms, in quite different theoretical frameworks (among others Dowty 1991; Langacker 1995; Pesetsky 1995). It may be summarized as follows.
For mental states or events, it is not always clear which of the participants, i.e. the stimulus or experiencer, is more agentive. This causes variation in argument realization. The more agentive participant is assigned subject position.
This hypothesis may operate at two levels. At the type level, the agentivity hypothesis states that verbs whose lexical meaning attributes a more agentive role to the experiencer, will be more compatible with experiencer-subject constructions. The operationalization of the agentivity hypothesis at this level is taken over from Van de Velde (2004: 53–55) and embodied by the variable Verb. This operationalization leads us to consider interesseren (‘to interest’) to entail the most agentive experiencer, followed by either ergeren (‘to annoy’) or storen (‘to disturb’), and finally verbazen (‘to amaze’). Preference for the transitive construction is therefore expected to rise from interesseren to either ergeren or storen and finally to verbazen.
The second level is the token level. Here, the agentivity hypothesis predicts that given a particular utterance, the language user will put the currently most agentive participant in subject position. The operationalization at the token level is taken over from Levin and Grafmiller (2012) and embodied by the variable Stimulus-Animacy. It predicts that utterances with animate stimuli will prefer the transitive construction, while inanimate objects, especially abstract entities, will prefer the reflexive construction.
B. Etymology hypothesis
The etymology hypothesis is inspired by Klein and Kutscher (2005), who posit that it is not the psychological meaning of psych verbs that determines their argument construction, but rather their (ties with a former) physical meaning. Etymological inquiry led us to suspect that storen (‘to disturb’) most strongly favors the transitive construction, followed by either ergeren (‘to annoy’) or verbazen (‘to amaze’), and finally interesseren (‘to interest’).
C. Topicality hypothesis
The topicality hypothesis is operationalized through the variables Stimulus- and Experiencer-Topicality. These variables present a scale ranging from the first and second persons, to the third person pronouns, the definite nouns and the indefinite nouns. It is expected that preference for object position rises as we go to the end of this scale.
III. RESULTS
All instances of the four verbs were extracted from the Corpus of Spoken Dutch (CGN, Oostdijk et al. 2002) and the ConDiv corpus (Grondelaers et al. 2000). These instances were manually checked, and a number of them had to be excluded. The resulting dataset contained 1810 occurrences, which were tagged for the hypothesis-driven variables presented above and for a number of nuisance variables. Next, a logistic regression model was built using a stepwise variable selection procedure. The hypothesis-driven variables turned out to be the most important predictors in the model. Their effect plots can be found in Figure 1.
The variable Verb confirms neither the agentivity hypothesis at the type level nor the etymology hypothesis. Conversely, the variable Stimulus-Animacy does more or less confirm the agentivity hypothesis at the token level. The topicality hypothesis is confirmed by Stimulus-Topicality, but Experiencer-Topicality behaves exactly opposite to what was predicted. However, we will show that in retrospect, such behavior might not be as aberrant as it appears at first sight.
IV. CONCLUSIONS
To conclude, we briefly summarize the relevance of this study for theories of argument realization. First, the study has shown that inter- and intralingual generalizations such as the agentivity and topicality hypotheses do seem possible (cf. Levin and Rappaport Hovav 2005). Second, our failure to confirm the type-level agentivity hypothesis means that caution is in order when applying the agentivity hypothesis too rigidly at the type level. Finally, the confirmation of the token-level agentivity hypothesis indicates that argument constructions seem to add meaning to utterances, separately from the meaning of the verb (Goldberg 1995; Colleman and De Clerck 2009).

1. A hostile environment
In most present-day Germanic languages, the weak inflection (work-worked) offers a well-established and regular strategy for past tense formation. In contrast, the strong inflection (sing-sang) currently seems no more than a diminishing rubble of sub-rules and irregularities (Harbert, 2007, p. 277).
Still, things were once different. Language reconstruction shows that around the time of the birth of the weak inflection, the strong inflection is likely to have been both clearly regular and dominant in frequency (Bailey, 1997). To explain the conundrum of how a nascent weak dental suffix could possibly have gained the upper hand in such a hostile environment, researchers usually refer to sound changes undermining the regularity of the strong system (Bailey, 1997, p. 17; Ball, 1968, p. 164). We will claim that this assumption is not needed. Instead, the rise of the weak inflection may initially have been caused by nothing more than its general applicability, i.e. its ability to be, in principle, applied to any verb. In addition, this general applicability proves capable of explaining that the rise of the weak inflection (i) first affects low frequency verbs, and only later high frequency verbs, and (ii) affects particular ablaut classes more heavily than others. In concert, these effects may create the conditions in which a perfectly functioning strong ablaut system can be surrendered to the disruptive influence of sound changes without causing a problem for the language users.
2. Model design and behavior
We ran an agent-based model (Gilbert, 2008), containing the following features:
• There are no irregular verbs, nor ways for verbs or ablaut classes to become irregular.
• The weak dental suffix starts out inferior in both type and token frequency to each individual strong ablaut class.
• All verbs in the model can be conjugated both strongly and weakly.
• The only difference between the strong ablaut classes and weak dental suffix lies in the dental suffix’s general applicability.
• The agents do not show any (socially attributed) preference for one of the variants, neither in acquisition nor use. Instead, they simply prefer the variant that they hear more often.
• Agents age and are gradually replaced.
• The verbs show a realistic, Zipfian frequency distribution (Zipf, 1932).
Under these conditions, it is shown that a gradual rise of the weak dental suffix will take place, first attacking the low-frequency verbs and the low-frequency ablaut classes. Highly frequent ablaut classes prove capable of protecting their low-frequency members against weakening. These effects emerge independently of specific parameter settings.
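The feature list above can be illustrated with a minimal sketch in Python. It is not the model used in the study: the population size, the number of ablaut classes and verbs, the seeding of the initial memories, the replacement schedule, and the random tie-break are all hypothetical choices made for illustration, and no claim is made that the sketch reproduces the reported results. It only shows one way in which general applicability can be the sole asymmetry: evidence for the weak suffix is pooled over every verb, whereas evidence for a strong class comes only from the verbs of that class.

```python
import random
from collections import Counter

WEAK = "weak"          # label for the dental-suffix strategy
N_CLASSES = 3          # hypothetical number of strong ablaut classes
VERBS_PER_CLASS = 10   # hypothetical number of verbs per class
N_AGENTS = 10          # hypothetical population size


def zipf_weights(n):
    """Usage weights proportional to 1/rank (a simple Zipfian profile)."""
    return [1.0 / rank for rank in range(1, n + 1)]


class Agent:
    def __init__(self):
        self.by_verb = Counter()      # (verb, strategy) -> times heard
        self.by_strategy = Counter()  # strategy -> times heard on any verb

    def listen(self, verb, strategy):
        self.by_verb[(verb, strategy)] += 1
        self.by_strategy[strategy] += 1

    def speak(self, verb, verb_class, rng):
        """Use the variant heard more often; no built-in preference."""
        strong = self.by_verb[(verb, verb_class)]
        weak = self.by_verb[(verb, WEAK)]
        if strong == 0 and weak == 0:
            # Verb never heard: fall back on strategy-level evidence.
            # The weak suffix pools evidence from all verbs, a strong
            # class only from its own verbs (general applicability).
            strong = self.by_strategy[verb_class]
            weak = self.by_strategy[WEAK]
        if strong == weak:
            return rng.choice([verb_class, WEAK])  # assumed tie-break
        return verb_class if strong > weak else WEAK


def simulate(steps=2000, replace_every=200, seed=0):
    """Return the share of weak forms produced over all interactions."""
    rng = random.Random(seed)
    verbs = {v: v % N_CLASSES for v in range(N_CLASSES * VERBS_PER_CLASS)}
    weights = zipf_weights(len(verbs))
    agents = [Agent() for _ in range(N_AGENTS)]
    for agent in agents:  # assumed seeding: only strong forms at the start
        for verb, verb_class in verbs.items():
            agent.listen(verb, verb_class)
    weak_uses = 0
    for step in range(1, steps + 1):
        speaker, hearer = rng.sample(agents, 2)
        verb = rng.choices(list(verbs), weights=weights)[0]
        form = speaker.speak(verb, verbs[verb], rng)
        hearer.listen(verb, form)
        weak_uses += form == WEAK
        if step % replace_every == 0:
            # Ageing: one agent is replaced by a newcomer with no memory.
            agents[rng.randrange(N_AGENTS)] = Agent()
    return weak_uses / steps


print(simulate())
```

Calling simulate with different values for replace_every gives a rough way to explore how the speed of population replacement interacts with the share of weak forms in this toy setting.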
Acknowledgements
We would like to thank Remi van Trijp for useful comments about the model.

This study presents a corpus study of the Dutch psychological verbs ergeren (‘to annoy’), interesseren (‘to interest’), storen (‘to disturb’) and verbazen (‘to amaze’). These verbs exhibit a syntactic alternation between their seemingly synonymous reflexive and transitive case frames, as in Elizabeth ergert John vs. John ergert zich aan Elizabeth (both: ‘Elizabeth annoys John’). The data were analyzed through logistic regression modelling. It was found that the alternation was most strongly determined by the choice of verb, as well as by the agentivity and topicality of the participants. That is, within each of the four verbs separately, more agentive and informationally lighter stimuli, as well as informationally lighter experiencers, elicited the use of the transitive case frame. However, contrary to expectations based on theoretical accounts of argument realization (Dowty 1991, Langacker 1991, Croft 1993, Levin & Rappaport-Hovav 2005), verbs whose lexical meaning entailed a more agentive experiencer did not realize this experiencer in subject position more often than other verbs. We were also unable to predict the dominant case frame of a verb based on its historical semantic development.

In present-day English, Dutch, and most of their Germanic siblings, the verbal weak inflection offers a well-established and regular strategy for forming the past tense. In contrast, the strong inflection seems to present no more than a diminishing rubble of vowel sub-rules and irregularities (Harbert 2007: 277). Still, things were once different. As an innovation specific to Proto-Germanic, the fledgling weak dental suffix had to compete with a sturdy strong ablaut system inherited from Proto-Indo-European, which is assumed to have been both clearly regular and dominant in frequency (Bailey 1997: 8).
Earlier computational models of this competition have either focused exclusively on language acquisition (Rumelhart and McClelland 1986; Pinker and Prince 1988; Marcus et al. 1995; Taatgen and Anderson 2002), on the role of acquisition in language change when the weak inflection was already well-established (Hare and Elman 1995; Yang 2002), or have explicitly disregarded the regularity of the strong system (Colaiori et al. 2015; Pijpops and Beuls subm.). However, these models do not address how a nascent weak inflection could possibly have gained enough momentum to overthrow a strong system that was both regular and dominant.
To explain this enigma, several proposals have been put forward in the historical literature (Ball 1968: 164; Bailey 1997: 17). The first states that the dental suffix is in principle applicable to all verbs, while each separate strong vowel alternation is not. That is, the strong system presents a broken formation of several vowel alternations against a single dental suffix. The second holds that even as a whole, the strong system was not applicable to some particular verbs, which would then create a safe nest for the weak inflection to mature. The last posits that the regularity of the strong system was being undermined by sound changes, allowing the weak inflection to take advantage of the created irregularities.
To investigate whether these causes can indeed be responsible for the ascent of the weak inflection, an agent-based model was created in which the focus lies on language use rather than acquisition (cf. Croft 2000; Bybee 2010). The model has been integrated into the Babel2-framework (Loetzsch et al. 2008), and the competing constructions have been implemented in the Fluid Construction Grammar formalism (Steels 2011; van Trijp et al. 2012). The model’s behavior showed that, in the long run, the first explanation alone already suffices to explain the rise of the weak inflection, even if each separate vowel alternation starts out more frequent than the weak inflection. This finding of course does not say that the second and third proposals did not help in creating more optimal conditions for the weak inflection to start its ascent. It does mean, however, that the disintegration of the strong system might be the result and subsequent catalyst rather than the original cause of the rise of the weak inflection. That is, perhaps both are related through a push chain, rather than a drag chain.
Conference Presentations by Dirk Pijpops
(1) de hijab is iets moois wat door Marokkaanse wijven helemaal verpest is
the hijab is something beautiful-GEN that by Moroccan women totally ruined is
‘The hijab is something beautiful that is totally ruined by Moroccan women.’
(2) Is dat iets verkeerd
is that something wrong-∅
‘Is that something wrong?’
This -s ending is one of the few surviving remnants of the Dutch genitive case, more specifically the partitive genitive: hence the name of the construction. The partitive genitive construction is a combination of an indefinite pronoun or numeral with a postmodifying adjectival phrase, although the exact theoretical architecture of the construction is still very much up for debate (Schultink 1962: 62; Kester 1996; van Marle 1996; Broekhuis and Strang 1996; Hoeksema 1998; Booij 2010: 223–228; Broekhuis 2013: 419–461). We then ask whether and, if so, how the language use of the Moroccan-Dutch ethnolect differs from full native language use in the utilization of this adjectival -s ending. This may be a difference in absolute numbers, but may also pertain to the number and choice of the factors that determine the appearance of the -s ending. There are four possible options:
(i) As with the -e ending, the users of Moroccan Dutch generalize the -s ending to all instances of the partitive genitive, thereby refunctionalizing this remnant of the Dutch case system as a transparent and reliable construction marker (cf. Booij 2010: 223–228).
(ii) The users of Moroccan Dutch generalize the zero ending, thereby ridding their language of an obsolete fossil from bygone times. This would be a continuation of the deflexion trend apparent in the development of Dutch (van der Horst 2013). The resulting state would be akin to English, where only the zero ending is used.
(iii) The users of Moroccan Dutch employ both the -s and zero ending in exactly the same way as other language users of Dutch, implementing the same factors to determine the choice between both variants. This would indicate that these factors are of a qualitatively different nature than the factors determining the use of the -e ending, as the users of Moroccan Dutch apparently do not or cannot dispose of them.
(iv) The users of Moroccan Dutch employ both the -s and zero ending in the partitive genitive construction, but in a different way than other language users of Dutch. This would indicate that these language users are creatively adapting their language to cater to new or other needs.
To investigate this, we will apply regression modelling to corpus data, as proposed by Gries and Deshors (2014). Gries and Deshors advocate this methodology as a way to fully exploit the potential of Granger's (1996) model of Contrastive Interlanguage Analysis (CIA). As a source of data, we turn to the Moroccorp corpus, which contains chat conversations in the Moroccan-Dutch ethnolect and has already proven its value in studies on Dutch adjectival inflection (Ruette and Van de Velde 2013; Van de Velde and Weerman 2014). We extracted a number of possible partitive genitives, which we manually filtered, and finally retained 1613 genuine partitive genitive instances. These partitive genitives are contrasted with 765 observations of partitive genitives taken from Netherlandic chat conversations in the ConDiv corpus (Grondelaers et al. 2000), adopted from an earlier study by Pijpops and Van de Velde (2014). The Moroccorp corpus was specifically designed to be commensurable with this subsection of ConDiv (Ruette and Van de Velde 2013: 467–470). Finally, we employed mixed logistic regression modelling to investigate how the realization of partitive genitives differs between the two corpora.
This study will shed light on how early L2/2L1 speakers deal with seemingly defunct morphology in Dutch, and hopes to answer the call of Gries and Deshors (2014) for more elaborate statistical methods in CIA research.
At least two of these, which will be called the Problem of Prediction and the Problem of Proliferation, have already been noted in earlier studies. The first pertains to the formulation of specific predictions regarding low-level constructions based on only high-level, abstract semantic notions such as affectedness, involvement or agency (see Lenci 2012: 13–15, and also Broccias 2001; Perek 2015: 90–144). For example, when discussing the influence of affectedness on the argument variation of the Italian verb rimproverare ‘reproach’, Lenci (2012: 14) notes that “this interpretation would require us to stretch the meaning of affectedness well beyond its standard (fairly high) vagueness and polysemy, thereby impairing its reliability as a truly explanatory notion”. The second problem relates to positing ever more concrete constructions, which may draw the critique of non-parsimony (Culicover and Jackendoff 2005; Traugott and Trousdale 2013: 5–11). We will attempt to demonstrate that these problems are caused by a third, more fundamental problem, named the Problem of Precedence. This problem asks at which level in the constructional network speakers primarily employ a construction to communicate meaning, optimize information structure or express lectal distinctions. Next, we will argue that this concern does not constitute a theoretical issue, but rather an empirical question.
Finally, we introduce a methodological approach to deal with this question. To illustrate the approach, we employ as a case study the alternation between the Dutch transitive and prepositional argument constructions, as in (1)-(2). We identify a seemingly motley collection of 102 verbs exhibiting the alternation and map out the relevant region of the constructional network. Fully abstract argument constructions are first put under scrutiny, after which we continue on to more lexically specific constructions. The goal of this procedure is to identify the precedence level at which the alternation is predominantly active, thus solving the Problem of Precedence. It will be demonstrated that doing so will also enable us to tackle both the Problems of Prediction and Proliferation.
(1) Minister Vandenbroucke zoekt (naar) een oplossing.
‘Secretary Vandenbroucke is searching (for) a solution.’
(2) (Met) hete koffie gemorst.
‘Spilled hot coffee.’
As a case study, we zoom in on the Dutch partitive genitive construction. This construction exhibits variation between a form with and without -s ending, as in (1) and (2). The form with the -s ending is predominant in the Netherlandic regiolect, while the form without -s constitutes a marker of the Belgian regiolect (Pijpops & Van de Velde 2014). Because of this distinction between the Netherlands and Belgium, i.e. a language-external factor, partitive genitive types that feature typically Netherlandic lexemes, such as (1), more often appear in the variant with -s, whereas those that contain typically Belgian lexemes, such as (2), will more often appear without the -s. Our hypothesis was that these lexical preferences got entrenched, so that Belgian speakers using Netherlandic lexemes would import the Netherlandic morphological variant and vice versa. In other words: while the formal realisation is straightforwardly regionally stratified, we expect these lexical preferences to hold even within the Netherlandic and Belgian regiolects.
(1) Iets bijzonder(s)
‘Something remarkable’
(2) Iets speciaal(s)
‘Something special’
We tested this prediction on 3018 manually checked observations from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and found it to be confirmed, even when controlling for all other variables known to influence -s omission. Furthermore, we drew geographically tagged data from Twitter, totaling 1299 manually checked instances, to replicate this finding and to investigate the geographical spread of both lectal contamination and the partitive genitive variation.
The effect of lectal contamination can only be explained if we have a sufficiently precise account of how individual speakers operate in language contact situations (Weinreich, Labov & Herzog 1968). If language contact can, in this way, cause lectal variation to produce lect-internal effects, then a variationist description of a particular regio-, dia-, socio- or ethnolect crucially depends on an understanding of language contact.
Lieberman et al. (2007) aimed to quantify the evolutionary dynamics of language by investigating the rise of the English regular past tense inflection, which they equated with the weak -ed suffix. Yet, their bold conclusion that “the half-life of an irregular verb scales as the square root of its usage frequency: a verb that is 100 times less frequent regularizes 10 times as fast” (Lieberman et al., 2007, p.713) has successively attracted criticism from scholars in the fields of historical and evolutionary linguistics. First, Carroll, Svare, & Salmons (2012) showed that this constant regularization rate does not hold true for the closely-related German language. Second, Cuskley et al. (2014) found that the rise of the English weak ed suffix is not driven by forces endogenous to language, such as analogy, but rather by external forces, such as new verbs entering the language through language contact.
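The arithmetic behind the quoted claim can be spelled out in a short Python sketch. It assumes nothing beyond the square-root relation stated in the quotation; the function name half_life_ratio is hypothetical and introduced here only for illustration.

```python
import math


def half_life_ratio(frequency_ratio):
    """Ratio of two half-lives if half-life scales with sqrt(usage frequency)."""
    if frequency_ratio <= 0:
        raise ValueError("frequency ratio must be positive")
    return math.sqrt(frequency_ratio)


# A verb used 100 times less often than another: frequency ratio 1/100.
ratio = half_life_ratio(1 / 100)  # its half-life relative to the other verb
speedup = 1 / ratio               # how many times as fast it regularizes
print(f"half-life ratio: {ratio:.2f}, regularizes {speedup:.0f} times as fast")
```

Under this relation, a hundredfold drop in frequency shortens the half-life by a factor of ten, which is the figure given in the quotation.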
We will reassess the constant-rate controversy by (i) extending the methodological scope with agent-based modeling, and (ii) extending the number of languages going beyond the German-English distinction, adding Dutch.
Our results show that the constant rate does not hold. If language change is co-determined by external forces resulting in languages adapting to their niches (Lupyan & Dale 2016), this is exactly what one would expect, since English, Dutch and German have endured external pressures to a different degree. We will focus on the influence of demographic change. In particular, we investigate the growth of cities and the resulting koineization due to migration in the three language areas since the Middle Ages. The three different degrees of urbanization have led to different degrees of dialect contact, which could in turn, as we will argue, lead to different regularization rates. To support this claim, we will present both empirical evidence from linguistic and demographic databases, as well as the results of a computational simulation.
2. Empirical data
2.1. Linguistic data
To obtain a clear picture of the linguistic situation, we included the data on English from Lieberman et al. (2007) and the data on German from Carroll et al. (2012), and complemented these with our own Dutch data. This enables us to track the development of the past tense system of these three languages over a 1000-year period (800-1800).
2.2. Demographic data
For the demographic data, we make use of the databases of Bairoch et al. (1988), De Vries (1984), and Mitchell (1998). In particular, we compare the population growth of the largest cities in the English, Dutch and German language areas in each particular time period from 800-1800. Historical research has shown that the exponential growth of urban population cannot be reduced to natural growth, but is driven by immigration as well, both of foreigners and through a rural exodus from the larger agglomeration, leading to dialect contact. We then observed correlations between the success of the weak inflection and the amount of demographic upheaval.
3. Simulation
A correlation between a demographic and a linguistic trend does not automatically entail a causation between the former and the latter, however. To further substantiate our claim, we therefore turn to an agent-based computer simulation. In this simulation, agents store exemplars or tokens of what they hear (cf. Pijpops et al., 2015), rather than type states (cf. Colaiori et al., 2015), and use these to produce novel forms. We find that (i) the weak inflection does not require special status as the single regular inflection in order to explain the tendencies observed in reality; (ii) replacement of verbs can indeed cause a continued rise of the weak inflection, even after a stable equilibrium between weak and strong verbs has emerged, confirming Cuskley et al. (2014); and most importantly (iii) if our current understanding of language, as implemented in the simulation, is correct, demography does indeed affect the rise of the weak inflection.
(1) iets bijzonder(-s)
something special-GEN
‘something special’
While the two variants do not show any observable semantic difference, Pijpops & Van de Velde (2014) applied mixed-model logistic regression and found that the expression of the -s is probabilistically determined by a number of factors. While overall, the [+s] variant is more frequent, the [-s] variant is also fairly common, and is more likely to occur (i) in informal registers, (ii) in low-frequency phrases, and (iii) in the south of the language area (Belgium). There also is a strong main effect for the [-s] variant for adjectives that occurred in superficially similar non-partitive constructions. This is illustrated in (2) and (3): though similar in surface form, the context makes clear that (2) is not a partitive construction. The absence of the -s morpheme then spills over to genuine partitives like (3) (see Pijpops & Van de Velde, forthc. for an extensive explanation of what they call ‘constructional contamination’).
(2) iets verkeerd geïnterpreteerd
[something]NP [[wrongly]AdvP interpreted]
(3) iets verkeerd gegeten
[something wrong]NP eaten
This suggests that, in line with exemplar-based theories of language, prior use of constructions leaves a (context-rich) trail in the mind of the language users.
In this talk, we want to see whether the same effect also occurs with regard to the regional variable. Can the regional provenance of the lexemes inserted in a construction exert an influence on the morphological realisation of the target construction, even if the construction is used by language users with a different regiolectal background? In our study southern speakers have a stronger tendency to drop the genitive -s, but less so when they are using ‘northern’ lexemes, and vice versa. This effect holds even if the regional provenance of the lexemes is subtle, and unlikely to be a shibboleth of a regionally recognisable type of speech. Furthermore, we see that while the analogical pull of lexemes with a regional profile is felt everywhere in the language area, the effect is more blurry in cities near the border of the two regions and more clear in the core areas. This finding shows that not only the language-internal context of prior instances is stored in memory, but the ‘language-external’, lectal context as well.
(1) We zoeken alternatieven. (Sonar corpus, Oostdijk et al. 2013, WR-P-P-G-0000254655.p.11.s.5)
‘We are looking for alternatives.’
(2) Wij zoeken dan wel naar alternatieven. (Sonar corpus, Oostdijk et al. 2013, WR-P-P-G-0000488037.p.6.s.3)
‘We, then, look for alternatives.’
Using data from the Sonar corpus, we find that the likelihood of naar increases as the object becomes more complex (Figure 1). There are at least three possible ways to explain this relation, however. The first is that the strictly unnecessary preposition helps the addressee decode the sentence, and expressing the preposition is therefore especially called for when the object is complex (cf. Rohdenburg's (1996) Complexity Principle). The second is that naar functions as a way to buy time for the producer to formulate a complex object. Finally, the third states that naar is preferred with more complex objects because it allows the producer to extrapose such objects to postfield position. This study will attempt to disentangle these three possible explanations.
Our goal is twofold. First, in order to get a clear picture of the exact facts, a large-scale empirical study on Dutch has been set up. Second, to enable us to test what may be the causes underlying these observations, we designed an agent-based computer simulation (Gilbert 2008; Steels 2011). In earlier proposals, it is often assumed that the strong inflection is stripped of all regularity and equated with ‘irregular inflection’, while the weak inflection is presented as ‘regular inflection’ (Ball 1968: 164; Bailey 1997: 17; Cuskley et al. 2014; Colaiori et al. 2015; Pijpops and Beuls 2015). We will claim (i) that this assumption is empirically questionable (see also e.g. Knooihuizen and Strik 2014; Fertig in prep.), and (ii) that it is not strictly required to explain the facts. This last point follows from the results of the simulation, where the strong inflection was assumed to be regular, yet developments akin to those observed in reality could still be shown to emerge.
Setting the emergence of a third strategy later in Germanic, namely the analytic perfect (exemplified in Afrikaans werk – het gewerk, lit. ‘has worked’) aside, it has often been observed that despite occasional shifts in the opposite direction, Germanic displays a long-term drift in which the weak inflection takes the upper hand at the expense of the strong inflection, although the strong inflection remains remarkably resilient, and still has not fully succumbed to the overall weakening trend (Van Haeringen 1940). Recent years have seen publications in which this ‘weakening’ drift is cast in quantitative terms. Lieberman et al. (2007) notice that in English, the weakening of the verbs follows a constant rate through time, is only dependent on the frequency of the verb, and neatly scales proportionally to the square root of the frequency of verbs. However, Carroll et al. (2012) replicated the study for German and found no such constant rate, hence casting doubt on the universality of the mathematical regularity that seemed to govern the weakening.
In our talk, we replicate the Lieberman et al. and the Carroll et al. studies for Dutch, allowing a comparison between the three languages in the Van Haeringen (1956) tradition. Our results confirm the critique of the constant rate by Carroll et al. (2012).
Carroll et al. suggested that underlying the differences between English and German are demographic factors, but they left it to future research to actually dig deeper into the demographic history. In our talk, we pick up this thread and couple the weakening with historical demography. Our results indicate that the differences between these three big West-Germanic languages indeed seem related to population effects. Evidence is drawn from grammars and historical demographic databases. We further support our claims with agent-based computer simulation, extending earlier work by Pijpops et al. (2015).
On the one hand, there is evidence for an addressee-oriented view: Hawkins’s ‘Minimize Domains’ principle, stating that the syntactic structure should be recognisable in as short a span as possible, benefits the addressee, as the speaker is never unsure about the syntactic structure. Likewise, Rohdenburg’s (1996) Complexity Principle, stating that in complex structures more explicit encoding is used, is only beneficial to the addressee. If the structure is already complex, adding extra grammatical encoding arguably burdens the speaker’s performance even more. On the other hand, it is not self-evident that speakers should be concerned with their addressees’ needs, forfeiting their own. Speaker’s altruism is evolutionarily implausible (Kirby 1999). Levinson (2000) also stresses the speaker’s needs in his neo-Gricean approach. As Levinson points out, the bottleneck in human communication is at the production side: decoding is much faster and more effortless than encoding (Levinson 2000: 28), so that taking inferential short-cuts to add layers of meaning on top of what is truth-conditionally encoded is especially helpful for the speaker. Adding extra material in the overtly coded variant in an alternance (e.g. zero- vs. that-complementation in English) goes against the rationale to prioritize production efficiency over parsing speed. Hawkins’s principle ‘Minimize Forms’ also seems first and foremost to serve the speaker’s comfort. True, reducing forms also adds to the parsing effort, as the form-function pair of the extra encoding has to be stored in the hearer’s brain, but given the ease with which inferencing is accomplished (Levinson 2000), and given the vast storage capacities of the human mind (Dąbrowska 2014: 626), the extra speaker’s efforts outweigh the extra addressees’ efforts.
In our paper, we will adduce quantitative data from a close-up case study that can shed light on the debate over speaker vs. addressee processing. The case study deals with the direct object vs. prepositional object alternance in Dutch verbs, like zoeken (naar) ‘search (for)’. A corpus study reveals that the prepositional variant is used more often when the object is syntactically complex. This can be explained in two ways: first, the preposition can function as a signpost to help the addressee decode the message. This would be in line with Rohdenburg’s Complexity Principle, and would point to a hearer-driven processing account. Second, the use of a preposition allows the object to be extraposed (or ‘exbraciated’). This would be beneficial to the speaker, who can postpone the expression of the complex object to the end of the clause, when all other issues have been resolved, avoiding centre-embedding. On the basis of corpus investigation, we will tease apart both explanations. Of special interest are cases such as (1), where the head noun of the object is not extraposed (to the right of gezocht ‘search-PST.PTCP’), but the submodifying complement clause is. If the use of the prepositional variant is especially favoured in this context, this would be an argument for the first explanation. Here, the processing difficulty of the discontinuous object may be alleviated for the hearer by adding the extra signpost.
(1) and (2) are corpus examples of two constructions that are structurally and etymologically unrelated and express different meanings. In (1), the adverb verkeerd (‘wrongly’) modifies the verb geïnterpreteerd (‘interpreted’). In (2), iets (‘something’) and verkeerd (‘wrong’) together form a noun phrase in a partitive genitive construction (Hoeksema 1998; Broekhuis and Strang 1996). In Dutch, partitive genitives may appear both with and without an -s ending on the adjective (cf. (2) and (3), Pijpops and Van de Velde 2014). Conversely, adverbs as in (1) cannot receive this -s ending. This means that only the superficial strings of (1) and (2) look alike, not the underlying syntactic structure. In fact, other instances of these constructions may look very different from one another. Still, we will quantitatively show that the realization of the -s ending in partitive genitives is affected by the frequent occurrence of constructions as in (1).
(1) dat iets verkeerd geïnterpreteerd wordt?
that something wrongly interpreted gets?
‘that something gets wrongly interpreted?’
(2) in begin van de week iets verkeerd gegeten, vandaar
in beginning of the week something wrong-∅ eaten hence
‘I had eaten something wrong at the start of the week, that’s why.’
(3) Ik had iets verkeerd-s gegeten en ik werd beroerd.
I had something wrong-S eaten and I became ill
‘I had eaten something wrong and I became ill.’
Data were drawn from the ConDiv corpus of written Dutch (Grondelaers et al. 2000) and were analyzed primarily by means of logistic regression. It turned out that the effect described above even outperforms typically cited regional differences as a predictor of -s presence (van der Horst 2008: 1624–1625; Booij 2010: 224; Broekhuis 2013: 426).
We claim that this contaminating influence is a result of chunking. That is, instead of analyzing utterances to the bone in interpretation, and building them from scratch in production, language users can make use of short-cuts by storing and accessing unanalyzed wholes (Dąbrowska 2012; Ferreira, Bailey and Ferraro 2002; Ferreira and Patson 2007). This then may cause the processing of instances like (1) and (2) to cross paths, resulting in constructional contamination.
To account for this success, three explanations are given in the literature (Ball 1968: 164; Bailey 1997: 7–8). First, while each separate strong ablaut class is only applicable to a subset of verbs, the weak suffix can, in principle, be attached to any verb indiscriminately. Second, some verbs escaped ablaut formation altogether, for instance because they had a vowel that fitted in none of the ablauting patterns. Such verbs would then create a safe nest for the nascent weak inflection, free of competing strong forms. Third, the strong inflection was ravaged by the effects of several sound laws, which severely undermined its transparency. This would have rendered it vulnerable to competition from the seemingly more transparent weak inflection.
We will claim that the first explanation is already sufficient to account for the rise of the weak inflection. Moreover, it may explain why the weak inflection first took over the low frequency verbs and low frequency ablaut classes (Carroll, Svare and Salmons 2012). Since we then no longer need the irregularization of the strong inflection to explain these effects, this irregularization may be the result of the rise of the weak inflection, rather than its cause.
To support these claims, we have built an agent-based simulation. In this simulation, computational agents communicate with each other by referring to past events, thereby employing either the strong or weak inflection. The agents preferably use the forms that they hear most often from their fellow agents. The simulation was composed in Babel2, a framework for building agent-based models of language evolution (Steels 2012).
In the simulation, the only difference between the strong and weak inflection lies in the first explanation given above. Any other possible advantages for the weak inflection were excluded from the model. Under such conditions, it can be observed that a rise of the weak inflection will come to pass in both type and token frequency, accompanied by a Conserving Effect of both the verbs and the ablaut classes (Bybee 2006; Carroll, Svare and Salmons 2012). This rise even takes place if the weak dental suffix starts out as inferior in both type and token frequency to any individual strong ablaut class.
In Dutch, a number of psych verbs exhibit an alternation between a reflexive (1) and transitive argument construction (2). The present corpus study investigates what factors drive the choice of the language user between these constructions, for the verbs ergeren (‘to annoy’), interesseren (‘to interest’), storen (‘to disturb’) and verbazen (‘to amaze’).
(1) Reflexive construction (experiencer-subject)
Daar erger ik me groen en geel aan. (CGN)
There annoy I myself green and yellow to
‘That greatly annoys me.’
(2) Transitive construction (stimulus-subject)
Dit […] ergerde de Romeinen mateloos. (ConDiv)
This […] annoyed the Romans excessively
‘This […] excessively annoyed the Romans.’
II. HYPOTHESES
A. Agentivity hypothesis
The agentivity hypothesis is put forward, be it in varying forms, in quite different theoretical frameworks (a.o. Dowty 1991; Langacker 1995; Pesetsky 1995). It may be summarized as follows.
For mental states or events, it is not always clear which of the participants, i.e. the stimulus or experiencer, is more agentive. This causes variation in argument realization. The more agentive participant is assigned subject position.
This hypothesis may operate at two levels. At the type level, the agentivity hypothesis states that verbs whose lexical meaning attributes a more agentive role to the experiencer, will be more compatible with experiencer-subject constructions. The operationalization of the agentivity hypothesis at this level is taken over from Van de Velde (2004: 53–55) and embodied by the variable Verb. This operationalization leads us to consider interesseren (‘to interest’) to entail the most agentive experiencer, followed by either ergeren (‘to annoy’) or storen (‘to disturb’), and finally verbazen (‘to amaze’). Preference for the transitive construction is therefore expected to rise from interesseren to either ergeren or storen and finally to verbazen.
The second level is the token level. Here, the agentivity hypothesis predicts that given a particular utterance, the language user will put the currently most agentive participant in subject position. The operationalization at the token level is taken over from Levin and Grafmiller (2012) and embodied by the variable Stimulus-Animacy. It predicts that utterances with animate stimuli will prefer the transitive construction, while inanimate stimuli, especially abstract entities, will prefer the reflexive construction.
B. Etymology hypothesis
The etymology hypothesis is inspired by Klein and Kutscher (2005), who posit that it is not the psychological meaning of psych verbs that determines their argument construction, but rather their (ties with a former) physical meaning. Etymological inquiry led us to suspect that storen most strongly favors the transitive construction, followed by either ergeren (‘to annoy’) or verbazen (‘to amaze’), and finally interesseren (‘to interest’).
C. Topicality hypothesis
The topicality hypothesis is operationalized through the variables Stimulus- and Experiencer-Topicality. These variables present a scale ranging from the first and second persons, to the third person pronouns, the definite nouns and the indefinite nouns. It is expected that preference for object position rises as we go to the end of this scale.
III. RESULTS
All instances of the four verbs were extracted from the Corpus of Spoken Dutch (CGN, Oostdijk et al. 2002) and the ConDiv corpus (Grondelaers et al. 2000). These instances were manually checked, and a number of them had to be excluded. The resulting dataset contained 1810 occurrences, which were tagged for the hypothesis-driven variables presented above, and a number of nuisance variables. Next, a logistic regression model was composed using a stepwise variable selection procedure. The hypothesis-driven variables turned out to be the most important predictors in the model. Their effect plots can be found in Figure 1.
The variable Verb does not confirm the agentivity hypothesis at the type level, nor the etymology hypothesis. Conversely, the variable Stimulus-Animacy does more or less confirm the agentivity hypothesis at the token level. The topicality hypothesis is confirmed by Stimulus-Topicality, but Experiencer-Topicality behaves exactly opposite to what was predicted. However, we will show that in retrospect, such behavior might not be as aberrant as it appears at first sight.
IV. CONCLUSIONS
To end with, we briefly summarize the relevance of this study for theories of argument realization. First, the study has shown that inter- and intralingual generalizations such as the agentivity and topicality hypotheses definitely seem possible (cf. Levin and Rappaport Hovav 2005). Second, our failure to confirm the type level agentivity hypothesis means that caution may be in order when applying the agentivity hypothesis too rigidly at the type level. Finally, the confirmation of the token level agentivity hypothesis seems to indicate that argument constructions do add meaning to utterances, separately from the meaning of the verb (Goldberg 1995; Colleman and De Clerck 2009).
In most present-day Germanic languages, the weak inflection (work-worked) offers a well-established and regular strategy for past tense formation. In contrast, the strong inflection (sing-sang) currently seems no more than a diminishing rubble of sub-rules and irregularities (Harbert, 2007, p. 277).
Still, things were once different. Language reconstruction shows that around the time of the birth of the weak inflection, the strong inflection is likely to have been both clearly regular and dominant in frequency (Bailey, 1997). To explain the conundrum of how a nascent weak dental suffix could have possibly gained the upper hand in such a hostile environment, researchers usually refer to sound changes undermining the regularity of the strong system (Bailey, 1997, p. 17; Ball, 1968, p. 164). We will claim that this assumption is not needed. Instead, the rise of the weak inflection may be initially caused by nothing more than its general applicability, i.e. its ability to be – in principle – applied to any verb. In addition, this general applicability proves capable of explaining that the rise of the weak inflection (i) first affects low frequency verbs, and only later high frequency verbs, and (ii) more heavily affects particular ablaut classes than others. In concert, these effects may create the conditions in which a perfectly functioning strong ablaut system can be surrendered to the disruptive influence of sound changes without causing a problem to the language users.
2. Model design and behavior
We ran an agent-based model (Gilbert, 2008), containing the following features:
• There are no irregular verbs, nor ways for verbs or ablaut classes to become irregular.
• The weak dental suffix starts out inferior in both type and token frequency to each individual strong ablaut class.
• All verbs in the model can be conjugated both strongly and weakly.
• The only difference between the strong ablaut classes and weak dental suffix lies in the dental suffix’s general applicability.
• The agents do not show any (socially attributed) preference for one of the variants, neither in acquisition nor use. Instead, they simply prefer the variant that they more often hear.
• Agents age and are gradually replaced.
• The verbs show a realistic, Zipfian frequency distribution (Zipf, 1932).
Under these conditions, it is shown that a gradual rise of the weak dental suffix will take place, first attacking the low-frequency verbs and the low-frequency ablaut classes. Highly frequent ablaut classes prove capable of protecting their low-frequency members against weakening. These effects emerge independently of specific parameter settings.
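The listed features can be sketched as a toy simulation in Python. This is a minimal sketch under stated assumptions, not the model reported here (which was built in Babel2): the population sizes, the seeding of founders, probability matching in production, and the rule that an agent without exemplars for a verb generalises over its strategy counts are all assumptions introduced for illustration, as are the names `Agent` and `simulate`. The sketch makes no claim to reproduce the reported results.

```python
import random
from collections import Counter

N_VERBS, N_CLASSES, N_AGENTS = 60, 6, 20  # hypothetical sizes

# Verb i has Zipfian weight 1 / (rank + 1) and fits exactly one strong ablaut class.
ZIPF_WEIGHTS = [1.0 / (rank + 1) for rank in range(N_VERBS)]
VERB_CLASS = [verb % N_CLASSES for verb in range(N_VERBS)]
WEAK = "weak"  # the single, generally applicable dental suffix


class Agent:
    def __init__(self):
        self.by_verb = [Counter() for _ in range(N_VERBS)]  # exemplars per verb
        self.by_strategy = Counter()  # exemplars per strategy, across all verbs

    def hear(self, verb, strategy):
        self.by_verb[verb][strategy] += 1
        self.by_strategy[strategy] += 1

    def produce(self, verb, rng):
        strong = VERB_CLASS[verb]
        # Assumption: use verb-specific exemplars if any, else generalise over
        # strategies. The weak suffix pools tokens from all verbs, while each
        # ablaut class pools only its own members (general applicability).
        counts = self.by_verb[verb] or self.by_strategy
        n_strong, n_weak = counts[strong], counts[WEAK]
        if n_strong + n_weak == 0:
            return None  # nothing to go on: the agent stays silent
        # Assumption: probability matching on heard frequencies.
        return strong if rng.random() < n_strong / (n_strong + n_weak) else WEAK


def simulate(steps=20000, replace_every=500, seed=0):
    """Return the share of weak tokens produced in each replacement period."""
    rng = random.Random(seed)
    # One low-frequency verb per class starts out weak, so the suffix is
    # inferior in type and token frequency to every ablaut class.
    initially_weak = set(range(N_VERBS - N_CLASSES, N_VERBS))
    agents = []
    for _ in range(N_AGENTS):
        agent = Agent()
        for verb in rng.choices(range(N_VERBS), weights=ZIPF_WEIGHTS, k=300):
            agent.hear(verb, WEAK if verb in initially_weak else VERB_CLASS[verb])
        agents.append(agent)
    shares, weak_tokens, total_tokens = [], 0, 0
    for step in range(1, steps + 1):
        speaker, hearer = rng.sample(agents, 2)
        verb = rng.choices(range(N_VERBS), weights=ZIPF_WEIGHTS)[0]
        form = speaker.produce(verb, rng)
        if form is not None:
            hearer.hear(verb, form)
            total_tokens += 1
            weak_tokens += int(form == WEAK)
        if step % replace_every == 0:
            agents.pop(0)           # the oldest agent dies ...
            agents.append(Agent())  # ... and is replaced by a newborn without exemplars
            shares.append(weak_tokens / max(total_tokens, 1))
            weak_tokens = total_tokens = 0
    return shares


if __name__ == "__main__":
    print(simulate())
```

In this sketch the weak suffix and the ablaut classes are treated identically except that weak tokens of any verb count towards the suffix, whereas strong tokens only count towards the class of that verb. Whether and how fast the weak share rises in this toy version depends on the assumed parameters.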
Acknowledgements
We would like to thank Remi van Trijp for useful comments about the model.
Earlier computational models of this competition have either focused exclusively on language acquisition (Rumelhart and McClelland 1986; Pinker and Prince 1988; Marcus et al. 1995; Taatgen and Anderson 2002), on the role of acquisition in language change when the weak inflection was already well-established (Hare and Elman 1995; Yang 2002), or have explicitly disregarded the regularity of the strong system (Colaiori et al. 2015; Pijpops and Beuls subm.). However, these models don’t address how a nascent weak inflection could have possibly gained enough momentum to overthrow a both regular and dominant strong system.
To explain this enigma, several proposals have been put forward in the historical literature (Ball 1968: 164; Bailey 1997: 17). The first states that the dental suffix is in principle applicable to all verbs, while each separate strong vowel alternation is not. That is, the strong system presents a broken formation of several vowel alternations against a single dental suffix. The second holds that even as a whole, the strong system was not applicable to some particular verbs, which would then create a safe nest for the weak inflection to mature. The last posits that the regularity of the strong system was being undermined by sound changes, allowing the weak inflection to take advantage of the created irregularities.
To investigate whether these causes can indeed be responsible for the ascent of the weak inflection, an agent-based model was created in which the focus lies on language use rather than acquisition (cf. Croft 2000; Bybee 2010). The model has been integrated into the Babel2-framework (Loetzsch et al. 2008), and the competing constructions have been implemented in the Fluid Construction Grammar formalism (Steels 2011; van Trijp et al. 2012). The model’s behavior showed that, in the long run, the first explanation alone already suffices to explain the rise of the weak inflection, even if each separate vowel alternation starts out more frequent than the weak inflection. This finding of course does not say that the second and third proposals did not help in creating more optimal conditions for the weak inflection to start its ascent. It does mean, however, that the disintegration of the strong system might be the result and subsequent catalyst rather than the original cause of the rise of the weak inflection. That is, perhaps both are related through a push chain, rather than a drag chain.