Regression Analysis of Lexical and Morpho-Syntactic Properties of Kiezdeutsch

Diego Frassinelli

2021

Key takeaways
This study introduces logistic regression for quantifying lexical and morpho-syntactic differences in Kiezdeutsch.
Kiezdeutsch shows significant syntactic variations, including verb-first declaratives and absence of determiners.
Three studies analyze unigram and trigram POS distributions, revealing distinctive patterns between Kiezdeutsch and standard German.
The multi-ethnic sub-corpus of KiDKo contains 63,604 Kiezdeutsch sentences, while GRAIN contains 14,097 sentences of standard German.
Findings support previous qualitative claims while highlighting Kiezdeutsch's unique linguistic features and slang.

Diego Frassinelli¹, Gabriella Lapesa², Reem Alatrash², Dominik Schlechtweg², Sabine Schulte im Walde²
¹ Department of Linguistics, University of Konstanz
² Institute for Natural Language Processing, University of Stuttgart

Proceedings of the 8th VarDial Workshop on NLP for Similar Languages, Varieties and Dialects, pages 21–27, April 20, 2021. ©2021 Association for Computational Linguistics

Abstract

Kiezdeutsch is a variety of German predominantly spoken by teenagers from multi-ethnic urban neighborhoods in casual conversations with their peers. In recent years, the popularity of Kiezdeutsch has increased among young people, independently of their socio-economic origin, and has spread in social media, too. While previous studies have extensively investigated this language variety from a linguistic and qualitative perspective, not much has been done from a quantitative point of view.

We perform the first large-scale data-driven analysis of the lexical and morpho-syntactic properties of Kiezdeutsch in comparison with standard German. At the level of results, we confirm predictions of previous qualitative analyses and integrate them with further observations on specific linguistic phenomena such as slang and self-centered speaker attitude. At the methodological level, we provide logistic regression as a framework to perform bottom-up feature selection in order to quantify differences across language varieties.

1 Introduction

Over the past 50 years, Europe has seen a substantial increase in the number of immigrants and in the diversity of their origin. A direct consequence of this situation is the rise of the so-called "Urban Youth Languages" (Wiese, 2017): specific linguistic practices used by young people in multi-ethnic urban areas. One example of urban youth languages is Kiezdeutsch (‘hood German’), which is a linguistic variety of German spoken primarily by teenagers from multi-ethnic urban neighborhoods in casual conversations with their peers. Kiezdeutsch first appeared over 30 years ago and has since then developed systematic linguistic structures that identify it as an independent variety of German (Wiese et al., 2009).

Recent studies have shown that the stylistic elements of Kiezdeutsch have spread in the repertoires of many young German speakers without immigrant background (Freywald et al., 2011; Stevenson et al., 2017). At the syntactic level, the main differences with standard German are (see examples below): bare noun phrases (1) lacking determiners or (2) lacking prepositions; (3) lack of copula verbs; (4) verb-first declaratives; and (5) subject-verb-object (SVO) word order in sentences beginning with an adverb.

1. Hast du Problem? (vs. Hast du ein Problem?)
   Have you problem? (Do you have a problem?)

2. Ich geh Kino. (vs. Ich gehe ins Kino.)
   I go cinema. (I go to the cinema.)
   (Wiese and Pohle, 2016)

3. Er aus Kreuzberg. (vs. Er ist aus Kreuzberg.)
   He from Kreuzberg. (He is from Kreuzberg.)

4. Wollte ich keine Hektik machen da drinne. (vs. Ich wollte keine Hektik machen da drinne.)
   Wanted I no hectic make there inside. (I didn’t want to make any hectic in there.)

5. Jetzt ich bin 18. (vs. Jetzt bin ich 18.)
   Now I am 18. (Now, I am 18.)

In previous work, researchers have studied various linguistic aspects of Kiezdeutsch focusing on either qualitative analyses (Tertilt, 1996; Auer, 2003; Keim and Knöbl, 2011; Wiese et al., 2009; Wiese, 2012, 2013; te Velde, 2017; Preseau, 2018) or small-scale quantitative analyses (Fuchs et al., 2010; Jannedy, 2010; Wiese and Pohle, 2016).

In this work we suggest logistic regression as a general framework to perform bottom-up data-driven feature selection in order to quantify differences across language varieties. We use Kiezdeutsch as a test case by comparing against standard German, and we deliberately select simple lexical and morpho-syntactic features that can easily be obtained from standard part-of-speech (POS) taggers and lemmatisers. In this vein, we present three studies: morpho-syntactic variation in terms of part-of-speech unigram distributions (Study 1) and in terms of part-of-speech trigram distributions (Study 2); and lexical variation in the usage of nouns and verbs (Study 3).

2 Previous Work on Kiezdeutsch

In the mid-1990s the release of two books sparked interest in migrant varieties of German in Germany: a collection of semi-fictitious interviews with young men from Turkish backgrounds living in Berlin (Zaimoglu, 1995), and a documentation of the daily activities of a Turkish youth gang (Tertilt, 1996). Although these ethnography-centered analyses were not mainly concerned with language change or variation, they brought to light the notion of groups of young people living in the urban centres of Germany who had developed their own language practices as part of their identity.

With the turn of the century, more language-centered studies of this urban vernacular began to appear. In an effort to gain a holistic understanding of Kiezdeutsch, Androutsopoulos (1998a,b, 2001) used media text analyses, ethnographic observations and interviews to analyze the speech style of teenagers speaking Kiezdeutsch. He then compiled a list of language features on the phonological/phonetic, lexical, and grammatical levels. Moreover, Androutsopoulos studied the various socio-cultural aspects of Kiezdeutsch and their effects on the German language. Similarly, Auer (2003) and Keim and Knöbl (2011) identified features of Kiezdeutsch through analyses of speech sequences which were then linked to their social interactions and functions as well as their discourses, in order to assess the social and linguistic effects of these features on the German language. Their research suggested that Kiezdeutsch speakers exhibited a high level of linguistic proficiency and communicative competence, thus contradicting previous views that grammatical simplifications were due to deficiency in language acquisition. These conclusions were in agreement with studies by Freywald et al. (2011), who considered Kiezdeutsch a multi-ethnolect, and by Wiese (2013), who categorized Kiezdeutsch as an urban dialect.

The introduction of a corpus of spoken Kiezdeutsch (Rehbein et al., 2014) led to research across linguistic levels. For example, te Velde (2017) investigated phonological form using the syntax of verb-second constructions found in the German dialects Kiezdeutsch, Yiddish, Bavarian, Cimbrian, and colloquial German. More recently, the effects of English on Kiezdeutsch constructions were examined by Preseau (2018), who argued that it was necessary to reconsider Kiezdeutsch as a native dialect of German, given the role of English as a Lingua Franca (ELF) in urban Germany and the effect it has on Kiezdeutsch-speaking communities.

The above-mentioned studies illustrate the broad spectrum of qualitative evidence and analyses on Kiezdeutsch. To date, however, only a few studies have provided quantitative evidence on Kiezdeutsch. One such study was conducted by Fuchs et al. (2010), who used a Gaussian mixture model to explore the durational properties of the particle so in various prosodic positions within utterances of Kiezdeutsch speech. The authors supported their findings by predicting distinctions (e.g., utterance-final and phrase-final) from text using punctuation as a marker. The work by Fuchs and her colleagues examined a single test case of a very specific phonological phenomenon using audio signals as the main source of information. Another study by Jannedy (2010) complemented the work by Fuchs and her colleagues by investigating the usage patterns of the particle so using a contingency table and χ² tests. The usage patterns of the same particle were also analyzed quantitatively by Wiese (2012) using χ² tests. While close to our studies, the contributions by Jannedy and Wiese focused on a single test case, whereas we focus on large-scale analyses. Moreover, the studies by Fuchs et al. and Jannedy used a closed-access corpus of speech obtained from interviews with teenagers who speak Kiezdeutsch.

The work by Wiese and Rehbein (2016) combined qualitative and quantitative analyses of several well-established phenomena in Kiezdeutsch with the aim of demonstrating the linguistic coherence of this urban vernacular. In their top-down approach, Wiese and Rehbein started with predefined linguistic patterns and then performed a χ² test on the raw corpus frequencies of these patterns. The results pointed to systematic differences between data from the sub-corpus of multi-ethnic speakers and the sub-corpus of mono-ethnic speakers of German. In contrast, our study takes a bottom-up approach and does not define features a priori, but instead allows them to emerge in a data-driven fashion. Furthermore, we operate on a large scale and consider all patterns that emerge.

To our knowledge, there have been no studies on Kiezdeutsch which employ logistic regression models to gather large-scale evidence regarding the various claims and theories in the literature.

3 Materials

In our study, we use two corpora of transcribed spoken German: the KiDKo corpus containing dialogues in Kiezdeutsch, and the GRAIN corpus containing radio interviews in standard German.

KiDKo. The KiezDeutsch Korpus (KiDKo, Rehbein et al. (2014)) is a collection of casual everyday conversations between teenagers (14-17 years old) from multi-ethnic and mono-ethnic communities in Berlin. The collection took place from 2008 until 2015 using self-recordings in the absence of adults and non-members of their social group. In order to capture the most salient emerging properties in such a dynamic language variety, in our studies we focus on the multi-ethnic sub-corpus (Rehbein and Schalowski, 2013). In total this part of the corpus contains the transcription of 43 hours of conversations with a total of 63,604 sentences (359,000 normalised tokens). Part-of-speech tagging has been performed with a tagger developed specifically for Kiezdeutsch by Rehbein et al. (2014), based on a version of the Stuttgart-Tübingen tagset (STTS) augmented by 11 additional tags tailored to spoken German.

GRAIN. The German RAdio INterviews corpus (GRAIN, Schweitzer et al. (2018)) is a collection of interviews broadcast on the German public radio. The hosts from the radio interviews are professionals talking about social and political topics (e.g., a chairman of a council talking about city pollution). In total, 14,097 sentences (221,000 tokens) have been extracted from 23 hours of recordings. The materials have been automatically POS-tagged with the TreeTagger (Schmid, 1994) according to the STTS tagset.

KiDKo vs. GRAIN. GRAIN was selected as the corpus representative of standard (spoken) German because among the available spoken German corpora it is the most comparable to KiDKo in its size and its collection time frame (see above). Both corpora contain transcriptions of recorded speech; however, the dialogues in KiDKo are spontaneous conversations about everyday topics, whereas the dialogues in GRAIN are more controlled in their content and setting. Moreover, speakers of Kiezdeutsch are teenage students, while the speakers in the GRAIN corpus are adults holding professional roles.

With respect to size, both corpora are relatively small, with KiDKo being one third bigger than GRAIN. Sentence length differs markedly between the two corpora: the average sentence length in KiDKo (8.8 tokens/sentence) is much shorter than the one in GRAIN (26.7 tokens/sentence). As a basis for comparison, we extracted the same number of n-grams (unigrams and trigrams) from GRAIN and KiDKo using a stratified sampling algorithm (Levy and Lemeshow, 2013). In this way, we created a basis for lexical and morpho-syntactic analyses on the individual token level and on a token-sequence level, while leaving unchanged the underlying distribution of the respective n-grams from the original corpora.

4 Logistic Regression Analyses

To identify the most distinctive features in the two corpora we use logistic regression models. In all the models reported below, we predict the categorical variable corpus type (KiDKo vs. GRAIN) using as predictor the presence/absence of one feature at a time. Running a single model including all the lexicalised features would lead to convergence issues; for this reason, we consistently avoid a bag-of-features classifier approach in all three studies.

After fitting the model, we take the z-score corresponding to the predictor. In logistic regression, a z-score is the coefficient estimate divided by its standard error. The larger the absolute z-score, the less uncertain the prediction is and, consequently, the stronger the difference between the feature in the two corpora. Compared to frequency analysis or more traditional estimates like χ², the analysis of the z-scores conveys richer information: the sign of the z-score indicates the direction of the effect, i.e., whether the feature is more predictive of KiDKo (positive sign) or of GRAIN (negative sign); moreover, its absolute value is directly related to the level of uncertainty involved in the prediction: larger numbers indicate more reliable predictions.

For this reason, we systematically look at the largest positive and negative z-scores in the analyses as the most informative ones. In this way, we filter out features that show a comparable distribution in both corpora and consequently have no discriminative power. Finally, by looking at the z-score values it is possible to determine whether the coefficient of a feature is significantly different from zero (i.e., p-value < 0.001). In order to reduce type I errors (false positives due to chance), we correct the alpha values by dividing our significance threshold (0.001) by the total number of models we run.

5 Studies and Results

Study 1: Unigram POS Analysis. The aim of this first study is to compare the unigram POS distributions in the two corpora. We run 10 logistic regression models predicting corpus type given the presence/absence of each POS (such as NOUN). Given the fine granularity of the POS types in the original collections, we decided to use a coarse-grained classification of 10 POS types only, encoding the word class but not the inflectional categories: nouns (NOUN), pronouns (PRON), verbs (VERB), adverbs (ADV), adpositions (ADP), determiners (DET), conjunctions (CONJ), adjectives (ADJ), particles (PRT), and numerals (NUM). [1] Besides alleviating sparsity, such a coarse-grained approach allows us to uncover differences between the two corpora that are systematic across classes and go beyond the idiosyncratic use of extremely corpus-specific tags.

[1] For the full set of STTS part-of-speech tags, see https://www.ims.uni-stuttgart.de/en/research/resources/lexica/germantagsets/.

Table 1 reports the z-scores associated with each POS. Across the ten POS tags under analysis, five are significantly more predictive of GRAIN (negative values) and five of KiDKo (positive values). These results are in line with previous qualitative studies (Wiese and Pohle, 2016): determiners are used significantly less in Kiezdeutsch compared to standard German (see Example (1)); similarly, adpositions (mainly prepositions) are much less used by teenagers than adults (see Example (2)).

GRAIN              KiDKo
POS     z-score    POS     z-score
DET     -54.27     PRT      59.32
NOUN    -44.10     PRON     37.50
ADP     -38.45     ADV      22.10
CONJ    -18.17     VERB     21.92
ADJ     -11.67     NUM       7.30

Table 1: Distribution of z-scores for each coarse-grained POS for standard German (negative) vs. Kiezdeutsch (positive).

Study 2: Trigram POS Analysis. In this study we analyse the distribution of trigrams of consecutive POS (e.g., DET+ADJ+NOUN) that we extracted from each sentence in the two corpora. In this way we approach syntactic structural differences in the language varieties, while still relying on simple POS information. In total we have 1,245 trigram types and, consequently, we run 1,245 logistic regression models where we predict corpus type using each of those trigrams as binary predictors (0/absence vs. 1/presence of each trigram). Significance is reached when the absolute z-score is larger than 3.2 (p-value < 0.0008).

Table 2 lists the most predictive trigrams for GRAIN (left) and for KiDKo (right). Overall, 178 trigrams are highly significant for GRAIN and 181 for KiDKo. Three trigrams of POS have an extremely strong predictive power for Kiezdeutsch: PRON+VERB+PRON, PRON+VERB+ADV, and VERB+PRON+ADV. In line with the evidence from Study 1, we see how trigrams of POS involving verbs and pronouns predominate in KiDKo, while nouns and determiners are more predictive of GRAIN. The clear preference for pronouns in Kiezdeutsch, as opposed to nouns, can be explained by the topics of spontaneous speech being much more related to conversations involving self-reference and reference to further actors present in the scene. Corpus examples for the three most predictive KiDKo trigrams in this respect are ich habe deine ‘I have yours’ (PRON+VERB+PRON); wir reden hier ‘we talk here’ (PRON+VERB+ADV); and machen wir jetzt ‘do we now’ (VERB+PRON+ADV). Nouns, on the other hand, are essential when referring to events far from the proximity of the speech act, as in political interviews, e.g., Gestaltung des Lebens ‘shaping of life’ (NOUN+DET+NOUN); ein Einsatz in ‘a mission in’ (DET+NOUN+ADP); and Menschen in Sorge ‘humans in fear’ (NOUN+ADP+NOUN).

GRAIN                        KiDKo
POS              z-score     POS               z-score
NOUN+DET+NOUN    -25.08      PRON+VERB+PRON    37.79
DET+NOUN+ADP     -23.62      PRON+VERB+ADV     33.92
NOUN+ADP+NOUN    -22.99      VERB+PRON+ADV     29.47
NOUN+ADP+DET     -22.96      VERB+PRON+PRON    21.79
DET+ADJ+NOUN     -22.71      PRON+PRON+VERB    19.78
ADP+DET+NOUN     -21.67      VERB+ADV+ADV      19.54
DET+NOUN+DET     -19.07      VERB+PRON+PRT     19.39
ADJ+NOUN+VERB    -18.33      PRON+VERB+PRT     19.25
ADP+DET+ADJ      -17.88      VERB+ADV+PRT      18.03
ADJ+NOUN+ADP     -17.64      PRON+VERB+ADJ     17.21

Table 2: Distribution of z-scores of the most predictive POS trigrams for GRAIN (left) vs. KiDKo (right).

Figure 1 shows the distributions of POS trigrams sorted by their relative frequencies. As we can see, even though both lines follow a Zipfian distribution, in KiDKo there are three trigrams that are much more frequent than all the rest; we also see a longer tail indicating a higher number of trigrams occurring only once. Moreover, if we look at mid-frequent trigrams (Figure 1), we see how the slope in the distribution from KiDKo is much steeper than the one from GRAIN. KiDKo thus shows a higher number of extremely frequent and extremely rare trigrams, indicating the more idiosyncratic nature of Kiezdeutsch. On the other hand, in GRAIN we can find more mid-frequency trigrams, indicating the more standardised nature of the variety of German used in this corpus.

[Figure 1: line plot of relative frequency (y-axis) against rank (x-axis), one line each for KiDKo and GRAIN.]

Figure 1: Overall distributions of POS trigrams sorted by relative frequency, accompanied by zoom into mid-values frequency scores.

Study 3: Noun and Verb Distributions. In this final study we analyse the distributions of nouns and verbs in the two corpora (Tables 3 and 4). When looking at the most predictive verbs for each corpus, we find that GRAIN contains verbs which are part of more complex structures (such as modal structures requiring the presence of an infinitive form), and more formal nouns.

On the other hand, in KiDKo verbs of needing, having, existence and obligation are the most predictive ones, together with nouns referring to typical topics for young people (school, home, fun, games). It is important to highlight the extremely predictive nature of Alter, the most frequent form of address among members of the younger generation: what the regression model has picked up here is a clear case of slang.

Once more, such distributions highlight the more self-centered type of conversation among teenagers with topics related to everyday life events that affect them, thus directly showing the simplified nature of KiDKo. On the other hand, GRAIN shows the usage of more formal and detached forms (human, question, topic).

GRAIN                      z-score    KiDKo                z-score
Menschen ‘humans’          -8.23      Alter ‘age’          11.19
Frage ‘question’           -6.23      Schule ‘school’       8.11
Thema ‘topic’              -6.20      Euro ‘euro’           7.85
Land ‘country’             -6.02      Stunden ‘hours’       7.33
Herr ‘Mr.’                 -5.99      Spiel ‘game’          7.23
Prozent ‘percent’          -5.65      Hause ‘home’          7.17
Europa ‘Europe’            -5.31      Ahnung ‘idea’         6.88
Jahren ‘years’             -4.70      Spaß ‘fun’            6.72
Gesellschaft ‘society’     -4.29      Minuten ‘minutes’     6.34
Ende ‘end’                 -4.07      Mal ‘times’           6.20

Table 3: The 10 most predictive nouns in GRAIN (left) vs. KiDKo (right) with the corresponding z-scores.

GRAIN               z-score    KiDKo                z-score
habe ‘have’         20.13      werden ‘will be’     -16.81
war ‘was’           11.72      haben ‘have’         -13.68
weiß ‘know’         10.84      wird ‘will’          -12.61
gesehen ‘seen’       6.58      sind ‘are’           -12.35
warte ‘wait’         6.27      müssen ‘must’        -12.27
mache ‘make’         6.18      gibt ‘give’          -11.34
gesagt ‘said’        6.18      können ‘can’          -9.30
bin ‘am’             6.09      wollen ‘want’         -8.69
mach ‘make’          6.05      brauchen ‘need’       -7.08
gemacht ‘made’       5.90      sagen ‘say’           -7.03

Table 4: The 10 most predictive verbs in GRAIN (left) vs. KiDKo (right) with the corresponding z-scores.

6 Conclusion

The aim of this study was to introduce logistic regression as a general framework to perform bottom-up data-driven feature selection in order to quantify differences across language varieties. We applied the framework to Kiezdeutsch in comparison to standard German as a test case, which allowed us to identify significant differences at the level of part-of-speech, part-of-speech sequences, as well as lexical choices.

Our results show consistent trends: on the one hand, we confirm the predictions drawn from the theoretical literature; on the other hand, our automatic bottom-up process results in a multi-faceted set of observations including slang, specific topics and reporting attitudes. Our studies thus confirm our framework as a useful tool to detect and quantify language variation properties, while relying on simple and easy-to-obtain lexical and morpho-syntactic features.

Current work targets both the scope of the experiments and the methodological investigation. We are further experimenting with the introduction of semi-lexicalised patterns (e.g., PRON+VERB+Kino, ‘cinema’ vs. PRON+VERB+Schule, ‘school’) in the regression to investigate whether specific syntactic patterns are more salient in certain domains, e.g., leisure vs. non-leisure activities. Methodologically, we plan to support our insights with a thorough comparison with other feature selection methodologies, such as random forests (Tagliamonte and Baayen, 2012).

The relevance of the framework is not limited to the sociolinguistic issues we address: it also proposes a robust strategy to select distinctive features and to demonstrate their use in concrete feature selection settings. Kiezdeutsch is a spoken variety, but we expect its most salient features to emerge also in less controlled varieties of written language, posing a significant challenge to NLP tools developed for standard German. From this perspective, our work has a straightforward application in social-media use-cases, for example in the detection and handling of German/Kiezdeutsch code-switching on Twitter and forums.

Acknowledgments

We thank the three anonymous reviewers for the very detailed and to-the-point comments, and acknowledge funding from the following institutions: G. Lapesa (German Ministry of Education and Research, project E-DELIB); R. Alatrash (CRETA center funded by the German Ministry for Education and Research); D. Schlechtweg (CRETA center and Konrad Adenauer Foundation).
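As an illustration of the per-feature models described in Section 4, the following is a minimal sketch and not the authors' implementation. It relies on a standard closed-form result: for a logistic regression with an intercept and a single binary predictor, the fitted coefficient equals the log odds ratio of the 2x2 presence/absence table, and its standard error is the square root of the summed reciprocal cell counts. The function names, the choice of one sentence as one observation, and all counts are hypothetical and chosen only for illustration.

```python
import math


def pos_trigrams(tags):
    """Return the set of consecutive POS trigrams present in one sentence."""
    return {"+".join(tags[i:i + 3]) for i in range(len(tags) - 2)}


def wald_z(kidko_with, kidko_without, grain_with, grain_without):
    """z-score (coefficient / standard error) of one binary feature in a
    logistic regression predicting corpus type (1 = KiDKo, 0 = GRAIN).

    Positive values mean the feature is more predictive of KiDKo,
    negative values of GRAIN. All four cells must be positive.
    """
    cells = (kidko_with, kidko_without, grain_with, grain_without)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Coefficient of the single predictor = log odds ratio of the 2x2 table.
    coefficient = math.log(
        (kidko_with * grain_without) / (kidko_without * grain_with)
    )
    # Standard error of the log odds ratio.
    std_error = math.sqrt(sum(1.0 / c for c in cells))
    return coefficient / std_error


def bonferroni_alpha(alpha, n_models):
    """Divide the significance threshold by the number of models run."""
    return alpha / n_models


# Hypothetical toy counts, NOT taken from KiDKo or GRAIN:
# (KiDKo with feature, KiDKo without, GRAIN with, GRAIN without)
toy_counts = {
    "DET": (200, 800, 500, 500),
    "PRT": (600, 400, 300, 700),
}
z_scores = {feature: wald_z(*cells) for feature, cells in toy_counts.items()}

for feature, z in sorted(z_scores.items(), key=lambda item: item[1]):
    print(f"{feature}\t{z:+.2f}")
print("corrected alpha:", bonferroni_alpha(0.001, len(toy_counts)))
```

With real data, the four counts for each feature would come from tallying how many observations in each corpus contain the feature (for trigrams, for instance, via `pos_trigrams`), and a general-purpose regression library would be needed as soon as a model has more than one predictor.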

References

Jannis Androutsopoulos. 1998a. Deutsche Jugend- sprache: Untersuchungen zu ihren Strukturen und Funktionen. Peter Lang, Frankfurt am Main.
Jannis Androutsopoulos. 1998b. Forschungsperspek- tiven auf Jugendsprache: Ein integrativer Überblick. In Jannis Androutsopoulos and Arno Scholz, editors, Jugendsprache -Langue des Jeunes -Young Peo- ple's Language. Peter Lang, Frankfurt am Main.
Jannis Androutsopoulos. 2001. Ultra korregd Alder! Zur medialen Stilisierung und Aneignung von 'Türkendeutsch'. Deutsche Sprache, 29:321-339.
Peter Auer. 2003. "Türkenslang": Ein jugendsprach- licher Ethnolekt des Deutschen und seine Trans- formationen. In Annelies Häcki-Buhofer, edi- tor, Spracherwerb und Lebensalter, pages 255-264. Tübingen: Francke.
Ulrike Freywald, Katharina Mayr, Tiner Özc ¸elik, and Heike Wiese. 2011. Kiezdeutsch as a multiethnolect. Ethnic Styles of Speaking in European Metropolitan Areas, pages 45-73.
Susanne Fuchs, Jelena Krivokapic, and Stefanie Jannedy. 2010. Prosodic boundaries in German: Final lengthening in spontaneous speech. Journal of the Acoustical Society of America, 127(3):1851- 1851.
Stefanie Jannedy. 2010. The usage and distribution of "so" in spontaneous Berlin Kiezdeutsch. ZASPiL Pa- pers from the Linguistics Laboratory, 43(52).
Inken Keim and Ralf Knöbl. 2011. Linguistic vari- ation and linguistic virtuosity of young "ghetto"- migrants in Mannheim. In Friederike Kern and Mar- gret Selting, editors, Ethnic Styles of Speaking in Eu- ropean Metropolitan Areas, pages 239-264. Amster- dam: Benjamins.
Paul S. Levy and Stanley Lemeshow. 2013. Sampling of populations: Methods and applications. John Wi- ley & Sons.
Lindsay Preseau. 2018. Kiezdeutsch, Kiezenglish: En- glish in German Multilingual/-ethnic Speech Com- munities. Ph.D. thesis, UC Berkeley.
Ines Rehbein and Sören Schalowski. 2013. STTS goes Kiez-Experiments on annotating and tagging urban youth language. Journal for Language Technology and Computational Linguistics, 28(1).
Ines Rehbein, Sören Schalowski, and Heike Wiese. 2014. The KiezDeutsch Korpus (KiDKo) Release 1.0.
Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, pages 44-49, Manchester, UK.
Katrin Schweitzer, Kerstin Eckart, Markus Gärtner, Agnieszka Falenska, Arndt Riester, Ina Rösiger, Antje Schweitzer, Sabrina Stehwien, and Jonas Kuhn. 2018. German radio interviews: The GRAIN release of the SFB732 Silver Standard Collection. In Proceedings of the 11th International Conference on Language Resources and Evaluation.
Patrick Stevenson, Kristine Horner, Nils Langer, and Gertrud Reershemius. 2017. The German-speaking world: A practical introduction to sociolinguistic issues. Routledge.
Sali A. Tagliamonte and R. Harald Baayen. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change, 24(2):135-178.
Hermann Tertilt. 1996. Turkish Power Boys. Ethnographie einer Jugendbande. Suhrkamp, Frankfurt am Main.
John R. te Velde. 2017. German V2 and the PF-interface: Evidence from dialects. Journal of Germanic Linguistics, 29(2):147-194.
Heike Wiese. 2012. Kiezdeutsch: Ein neuer Dialekt entsteht. C.H. Beck.
Heike Wiese. 2013. What can new urban dialects tell us about internal language dynamics? The power of language diversity. Linguistische Berichte, 19:208-245.
Heike Wiese. 2017. Urban contact dialects. In Salikoko S. Mufwene and Anna Maria Escobar, editors, Cambridge Handbook of Language Contact. Cambridge: Cambridge University Press.
Heike Wiese, Ulrike Freywald, and Katharina Mayr. 2009. Kiezdeutsch as a test case for the interaction between grammar and information structure. Interdisciplinary Studies on Information Structure. Working Papers of the SFB 632, 12.
Heike Wiese and Maria Pohle. 2016. "Ich geh Kino" oder "... ins Kino"? Zeitschrift für Sprachwissenschaft, 35(2):171-216.
Heike Wiese and Ines Rehbein. 2016. Coherence in new urban dialects: A case study. Lingua, 172:45-61.
Feridun Zaimoglu. 1995. Kanak Sprak. Rotbuch Verlag, Berlin.

FAQs

What morphological differences were found between Kiezdeutsch and standard German?

The analysis reveals that Kiezdeutsch frequently employs bare noun phrases and lacks copula verbs, which contrasts with standard German's syntactic structures.

How do logistic regression results inform understanding of Kiezdeutsch features?

Logistic regression models showed positive predictive power for five POS types indicative of Kiezdeutsch, including pronouns and verbs, while standard German was associated with nouns and determiners.
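As a rough illustration of how coefficient signs in a logistic regression can point to the variety a POS type is associated with, here is a minimal sketch. The data, the three POS features, and the helper `fit_logistic` are invented for this example and are not the paper's corpus, feature set, or implementation; the model is fitted with plain gradient descent rather than any particular statistics package.

```python
import math

# Hypothetical toy data: each row describes one sentence by the relative
# frequency of three POS tags. Label 1 = Kiezdeutsch, 0 = standard German.
# All numbers are invented for illustration only.
FEATURES = ["PRON", "VERB", "NOUN"]
X = [
    [0.40, 0.30, 0.10],
    [0.35, 0.35, 0.05],
    [0.30, 0.40, 0.10],
    [0.10, 0.15, 0.45],
    [0.05, 0.20, 0.40],
    [0.10, 0.10, 0.50],
]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sentence under the current model.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


weights, bias = fit_logistic(X, y)
# Positive coefficient: the feature pushes the prediction toward Kiezdeutsch;
# negative coefficient: toward standard German.
coef = dict(zip(FEATURES, weights))
for name, value in coef.items():
    print(f"{name}: {value:+.2f}")
```

In this toy setting the pronoun and verb features receive positive coefficients and the noun feature a negative one, mirroring the direction of the associations described above. A real analysis would also need significance testing and many more features.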

What role does lexical variation play in Kiezdeutsch compared to standard German?

Kiezdeutsch features verbs related to obligation and existence, while standard German includes more formal nouns and complex structures, exemplifying different conversational contexts inherent to each variety.

What are the implications of slang usage in Kiezdeutsch from a sociolinguistic perspective?

The presence of slang, such as 'Alter' for informal address, highlights the self-referential nature of Kiezdeutsch, reflecting the identity and daily experiences of its teenage speakers.

When was the KiDKo corpus collected and what distinguishes its composition?

The KiDKo corpus was collected from 2008 to 2015 and consists of recordings of spontaneous conversations among teenagers from multi-ethnic communities, differing significantly from the controlled settings of standard German corpora.

Last updated February 03, 2025
