Wiktionary:Language treatment requests


This is the page for proposing changes to Wiktionary's language treatment practices, including language renaming, merging and splitting.

Use this page if you want to propose a non-trivial change to:

For issues pertaining to a single language, such as orthography, start a conversation on the discussion page of the language considerations page (the so-called "About LANG" page), or the beer parlour if no such page exists.

Archiving: Language treatment requests, once closed and (if applicable) acted upon, are archived on Wikipedia-style archive subpages. These can be found at Wiktionary:Language treatment requests/Archives and in the list below:

As can be seen at w:Nkore-Kiga language, Kiga [cgg] should definitely be merged into Nyankore [nyn]. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledge^{discuss/deeds} 05:21, 18 September 2016 (UTC)

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity [of] 78%–96% between Nyankore, Nyoro [nyo], and their dialects; 84%–94% with Chiga [cgg], [...and] 81% with Zinza [zin]" (Kiga, meanwhile, is said to be "77% [similar] with Nyoro [nyo]"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) [...] called Runyakitara [...] taught at the University and used in internet browsing, but [it] is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)

	Input needed
	This discussion needs further input in order to be successfully closed. Please take a look!

I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. --176.23.1.95 20:19, 25 January 2017 (UTC)Reply

I'm supporting it. Traveller Norwegian is sometimes referred to as Tavring, and, to be honest, I've never heard anybody use the term Traveller Norwegian for a language. People rather call it Taterspråk or Fantemål, even though books describe these as derogatory terms. The other problem is that we've in fact got two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)

I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)

OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish one and has often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-based TN variations. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.

PS: there is also Swedish stuff like Knoparmoj and Loffarspråk and more, and they still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)

The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche —Μετάknowledge^{discuss/deeds} 00:49, 7 April 2017 (UTC)Reply

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)

The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the citation needed that follows, that sentence is about accurate, as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)Reply

(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)Reply

Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)Reply

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)Reply

I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)

@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)Reply

A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)

@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)Reply

Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)Reply

Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:

There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".

This, that and the other (talk) 09:25, 1 November 2023 (UTC)

Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is that there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)Reply

I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)

"Classical Mongolian" refers to the literary language of Mongolia used from 17th to 19th century created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept her Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)Reply

Yes. I also find it strange that Wiktionary distinguishes Ottoman Turkish from Turkish; it’s like distinguishing pre-1918 Russian from “Russian”, or like reading about “Ottoman Turks” instead of “Turks”. Also, Kazakh and the other Turkic languages do not get extra codes for Arabic spelling, and that situation is even more comparable. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. It also sounds like the situation with Pali. Am I correct to assume that Classical Mongolian texts get re-edited in Cyrillic script? Then you could base everything on Cyrillic and make Mongolian-script entries soft redirects, because even words that died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)

@Fay Freak, the situation is similar to Turkish, but it creates fewer problems there since the Arabic script Turkish is obsolete and most relevant loans are pre-Republican.

In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.

Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)Reply

This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)Reply

@-sche, yes, the Persian model would be desirable.

It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian: Classical Mongolian is exclusively a written language, and its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.

The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)Reply

That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — [ זכריה קהת ] Zack. — 14:25, 30 November 2018 (UTC)

Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)

I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)

Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply

OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply

@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)

There's no full correspondence between different Mongolian scripts and none of the scripts is totally phonetic. It's not just the spelling; the phonologies are different, but sometimes one script represents the true or historical pronunciation and it's not necessarily Cyrillic, which is strange. There are words that only exist in one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, “sofa”) in Inner Mongolia (from 沙發／沙发 (shāfā)) and софа (sofa, “sofa”) in Outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious whether Classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in Classical Mongolian which are different from modern ones and for which there's no Cyrillic form? I think I saw them.

Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)

Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)

@Atitarev In Mongol khelnii ikh tailbar toli we see that the term уйгуржин бичиг is described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin with its own rules shall not be equated with the later ‘Classical’-Modern script and orthography. I maintain uigurjin (with its specific glyph forms and spelling rules) shall be treated as a term only for Middle Mongolian.

Similarly I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact orthographic standardisations and modifications make written Modern Mongolian quite different from Classical. Personally I’d like to display a historical feature of this language collectively under ‘Classical Mongolian’, as only this term directly interlinks with an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)

We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledge^{discuss/deeds} 05:42, 19 December 2018 (UTC)Reply

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec, sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)

Indeed, the most common name by a longshot is Aguacatec, followed by Awakatek (but these are also names of the people-group), followed by Awakateko, then Aguacateco, and in dead last, our current name of Aguacateca. Can we rename to Aguacatec? - -sche (discuss) 07:02, 28 December 2023 (UTC)Reply

Support renaming to Aguacatec. Also being the name of the "people-group" is hardly an argument against it; the same is true of a huge number of languages including French, Welsh, Manx and the vast majority of language names ending in -ish. —Mahāgaja · talk 07:22, 28 December 2023 (UTC)Reply
Oh, to clarify, I didn't intend that as an argument against using that name, but as a qualification on the data; comparing which term is more common can't easily determine which is the most common name of the language if one term is also used for something else (the name of the people). But Aguacatec seems to be the most common name in e.g. the books about it in Glottolog's bibliography, too. Who has a bot that does renames? This one involves few enough entries that it could be done by hand, but it seems like the tasks that would need to be done are the same for many (all?) language renames, so it should be bottable... - -sche (discuss) 07:51, 28 December 2023 (UTC)Reply

OK, by now "Awakatek" is edging out "Aguacatec", and indeed "Awakateko" is edging them both out: Ngrams Ngrams since 1962, since 2002; Google Scholar (searching only for results published after 2000) says it finds 161 results for "Awakatek" + language, 175 for "Aguacatec" + language, and 375 for "Awakateko" + language. At this point, maybe we consider Awakateko...? - -sche (discuss) 06:01, 25 January 2026 (UTC)Reply
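To make the comparison in the comment above easier to check, here is a minimal Python sketch that ranks the three names by the Google Scholar counts quoted there and works out each name's share of the combined hits. The counts are copied from the comment; the variable and function names are illustrative only, and raw hit counts remain a rough proxy because some of the names also refer to the people-group.

```python
# Counts quoted in the comment above (Google Scholar, results after 2000,
# each searched together with the word "language").
scholar_hits = {"Awakatek": 161, "Aguacatec": 175, "Awakateko": 375}


def rank_names(hits):
    """Return (name, count, share) tuples, most frequent name first."""
    total = sum(hits.values())
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


for name, count, share in rank_names(scholar_hits):
    # Share is the fraction of the combined hits for these three names only.
    print(f"{name}: {count} hits ({share:.1%})")
```

The same helper could be pointed at Ngram-based counts if those were exported, but that would need data not given in this discussion.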

Discussion moved from Wiktionary:Requests for moves, mergers and splits#Retiring Moroccan Amazigh [zgh].

We renamed this code from "Standard Moroccan Amazigh" to "Moroccan Amazigh", but failed to note that the "standard" part was key. This is a standardised register of the dialect continuum of Berber languages in Morocco, promoted by the Moroccan government since 2011 as an official language. Marijn van Putten says this is essentially Central Atlas Tamazight [tzm], but most of the people producing texts in it are native speakers of Tashelhit [shi], so there is a bit of re-koineisation. However, if we move forward with good coverage of the Berber languages, every entry in [zgh] will be a duplicate of [tzm] or else a duplicate of [shi] marked with some sort of dialectal context label. By the way, the fact that there is an ISO code seems to be a political consideration rather than a linguistic one; compare the case of "Filipino", which we merged into Tagalog, or "Standard Estonian", which we merged into Estonian. @Fenakhay, -sche —Μετάknowledge^{discuss/deeds} 21:31, 16 March 2020 (UTC)Reply

Hmm, I see it's a rather recent attempt at standardization, too. I don't feel like I know enough about Tamazight to be confident about what to do, but it does seem like, if this is based on tzm, it could be handled as tzm (perhaps even, instead of putting "non-[ordinary-]tzm" entries at shi+label, they could be tzm+label, unless they're obviously shi words). - -sche (discuss) 15:44, 19 March 2020 (UTC)Reply

Generally, it seems the [shi] words are quite obvious; the main differences between [tzm] and [shi] are lexical (as far as I can tell, [tzm] has more internal diversity w/r/t phonology than differences with [shi]). But they're in a continuum anyway, and WP claims that there's debate on where to draw the dividing line. —Μετάknowledge^{discuss/deeds} 16:35, 19 March 2020 (UTC)Reply

And “Moroccan Amazigh” does not sound like a language name anyway if you have not been told it is one, it seems like “Berber as spoken in Morocco”, another reason to remove it. Fay Freak (talk) 15:59, 21 March 2020 (UTC)Reply

This should be renamed to "Standard Moroccan Amazigh" or "Standard Moroccan Tamazight", not removed. Lankdadank (talk) 17:21, 24 January 2026 (UTC)

I'd like to get consensus to (re-)rename this to "Standard Moroccan Amazigh"; I don't think it is controversial. Lankdadank (talk) 17:56, 20 March 2026 (UTC)

The Constitution of the Republic of Sakha (Yakutia) (https://iltumen.ru/constitution) officially uses язык саха to refer to the language sah. A government decree («О Правилах орфографии и пунктуации языка саха»), which approved the language’s current orthography, uses язык саха instead of якутский язык in its annexe. However, this usage is not mandatorily popularised. I suggest that Sakha be adopted instead of Yakut on the basis of the Constitution reference.

Since atv ‘Northern Altai’ is not a single language/dialect but a group of several (Kumandy, Chelkan & Tubalar), atv should be split into subcodes. Furthermore, Southern Altai is only a classifying term; Altai, as the official term, is suggested for alt.

As for Khamnigan xgn-kha, a transitional dialect (with conservative phonology) between Buryat and Mongolian, its simple name should not create ambiguity.

In addition I also request a code for Soyot. It will help in contrasting Sayan Turkic languages. LibCae (talk) 06:36, 2 September 2021 (UTC)

The Constitution of the Republic of Sakha is not our guide to using English names. In the case of [sah], most scholarly descriptions use "Yakut" (e.g. The Turkic Languages), there are far more raw Google hits for "Yakut language" than "Sakha language", and Google Ngrams show a preference for "Yakut" that has not waned over time (but we don't know past 2008, after which the data are incomplete).

I can't comment on the other code requests, but it would be more convincing if there were some evidence in favour of the need for these codes and their distinctiveness from their closest relatives. —Μετάknowledge^{discuss/deeds} 16:11, 2 September 2021 (UTC)Reply

I don’t see how more information would come to light if we split Northern Altai. Surely Northern Altai and Southern Altai are also the most usual names, in either English or Russian. Given the number of speakers Northern Altai has, how could there be a benefit? The major factor for editors is what sources they use, whether they indicate the sources and whether those are clear about the place of origin. I have had many books about “the Aramaic dialect of [village X]” where I didn’t know which damn language code of Wiktionary it was supposed to belong to, Wiktionary making codes centered around city A and B but not village X; in the end I refrained from adding anything. Fay Freak (talk) 17:00, 2 September 2021 (UTC)

Oppose renaming Yakut

Support splitting atv

Support renaming alt to Altai

Abstain regarding xgn-kha

Support creating a code for Soyot, quite strongly so. Allahverdi Verdizade (talk) 17:13, 2 September 2021 (UTC)

I suspect this request was trying to do too many things, and is now too stale, to be the best place to mention this, but in contrast to what Metaknowledge found years ago, these days Ngram Viewer shows Sakha has been more common for decades (the corpus changed). Google Scholar searches for papers from 2000 - 2026 claim to find 16,700 papers using Sakha and 17,200 using Yakut. The request to add Khamnigan was granted after a different (BP) discussion, because ISO gave it a code. - -sche (discuss) 06:20, 25 January 2026 (UTC)Reply

Wikipedia uses the phrase "Ngul (including Ngwi)" to describe this language, which we currently call "Ngul", but this paper indicates that these are just two of several synonyms, and uses "Ngwi" as the primary name. We should follow suit. —Μετάknowledge^{discuss/deeds} 00:19, 21 December 2021 (UTC)Reply

Support, Ngwi seems to be the common name in recent literature, and in the references listed at Glottolog. - -sche (discuss) 06:24, 25 January 2026 (UTC)

We currently call this language "Hamer-Banna", after two of its dialects; WP uses "Hamer". This hyphenated name is found in the literature, though it excludes the third dialect, Bashaɗɗa. Modern publications, following the lead of Petrollino's grammar, use the spelling "Hamar" for that dialect. As I see it, if we stick with the hyphenated name, we should change it to "Hamar-Banna", but we could also consider elevating the name of the primary dialect to cover the language as a whole, as WP does, though in that case we should use "Hamar" instead. —Μετάknowledge^{discuss/deeds} 07:56, 22 December 2021 (UTC)Reply

We currently have this language, which Wikipedia refers to as the Harappan language, as [xiv]. I suggest that we retire the code, because the language is undeciphered and its script has not been encoded, so there is nothing to add to Wiktionary in the foreseeable future. I also suggest that we retire the script code [Inds], which is only used for this language. @AryamanA —Μετάknowledge^{discuss/deeds} 07:14, 28 December 2021 (UTC)Reply

Currently, we have codes for [mkl] "Mokole" (see Mokole language (Benin)), [cbj] "Ede Cabe", [ica] "Ede Ica", [idd] "Ede Idaca", [ijj] "Ede Ije", [nqg] "Ede Nago", [nqk] "Kura Ede Nago", [xkb] "Manigri-Kambolé Ede Nago", and [ife] "Ifè" (all of which are lumped into Ede language). These lects are all very close to Yoruba proper (which they use for formal and liturgical purposes), and spoken by people who are considered ethnic Yorubas; moreover, they are included in the Global Yoruba Lexical Database. I have added them as dialects of [yo] "Yoruba" in MOD:labels/data/subvarieties, but treating Yoruba as a macrolanguage means we must remove these codes. (Note: the family code [alv-ede] would have to be removed as well.) @AG202, Oniwe, Oníhùmọ̀ —Μετάknowledge^{discuss/deeds} 07:29, 28 December 2021 (UTC)Reply

Merge, obviously again Ethnologue’s fabrications, which were then copied over from Wikipedia and some other “encyclopedias” with their impractical credulity towards this reference. Fay Freak (talk) 07:54, 28 December 2021 (UTC)Reply

If anything I would keep the Ede family code and change the lects to be etymology-only languages (edit: excluding probably Ifè since it is much more documented), but putting them all under Yoruba I unfortunately oppose for now. The Western Ede languages as seen here have a higher degree of separation from Nuclear Yoruba, and it checks out more when comparing, at the very least, the words and phrases of Ifè to nuclear Yoruba: Ifè-French Dictionary, Peace Corps - IFÈ O.P.L. WORKBOOK, J'apprends l'ife: Langue Benue-Congo du Togo. While there are obviously words that are shared due to them being related languages, it doesn't feel like a dialect of Yoruba (to me at least), so I feel uncomfortable grouping it under Yoruba. Though I do admit that I haven't really looked into the other Ede languages nearly as much. Edit: This paper may be helpful and at least shows some of the differences between Ifè & Yoruba and some aspects of the dialect continuum. Obviously some Ede varieties are much closer to Yoruba, but then I wonder what to do about the other ones. AG202 (talk) 15:09, 28 December 2021 (UTC)Reply

@AG202: Thanks for the sources. The question of whether to lump a code is in part based on how much extra work is entailed; would you be willing to work through a subsample to see how much we would just be duplicating Yoruba entries, and how much would be distinct? I'm not sure what you're actually advocating, because making them etymology-only languages (which you say you support) would require merging them (which you say you oppose). —Μετάknowledge (discuss/deeds) 07:18, 29 December 2021 (UTC)

@Metaknowledge Yeah, sorry for that being unclear. I oppose the merger under solely Yoruba. Regarding the etymology-only part, I would support having all the Ede lects (excluding Ifè) under the header "Ede" and then differentiating on the definition line which Ede lect it is, mainly because they have much less coverage than Ifè, and it's harder to tell their mutual intelligibility. (Though as mentioned I'm not as well-versed with the other lects, so I might be entirely wrong about their continuum.) In terms of working through a subsample, I am up to do so, though I am swamped at the moment so it'd definitely take a while, but from what I've seen so far, I'd be worried about putting possible Ifè terms like ɖíɖì (“belt”) or àntã̀ (“chair”) under a Yoruba header and keeping nice clear entries for readers. AG202 (talk) 07:52, 29 December 2021 (UTC)

Looks reasonable. To clarify, my main note relates to the observation that the language names currently in the data are too unnatural to find use and are not even meeting our CFI, which again means there is no entrotopy for those who know the languages to assign material to the designations with little doubt, as there is little to confirm the meanings of the language names, which should be a consideration if you devise new namings, in so far as you would like to not have private language but more or less obvious to new editors what the language codes are for. So I did not mean that there cannot be a split in a different manner, or a smaller merge, but the current ones should be recognized as off the wall, and then there will have to be something that interrelates the remaining codes if one stumbles upon one, else it will be a recurring problem that an editor did not see the distinction of the available language codes. Fay Freak (talk) 01:36, 30 December 2021 (UTC)

Members:

價關: Gansu Dungan
可價: Gansu Dungan
綿魚: Gansu

@Justinrleung, RcAlex36, 沈澄心 —Fish bowl (talk) 05:55, 6 February 2022 (UTC)

@Fish bowl: Gansu means actual Gansu in China, but Gansu Dungan should be its own label perhaps. I'm not sure why those entries are labelled specifically as Gansu Dungan, though, because do we know if it's not used in other varieties of Dungan? Pinging @Mar vin kaiser to know why he chose to label it as Gansu Dungan specifically. — justin(r)leung { (t...) | c=› } 06:03, 6 February 2022 (UTC)

@Justinrleung: There's this website, I can't find the link now, that was like a mini Dungan dictionary, and for some of its words, it has a dialectal label. I think I got it from there. --Mar vin kaiser (talk) 08:39, 6 February 2022 (UTC)

@Mar vin kaiser: This? I know these words are marked as Gansu here, but I wonder if we need to specify it as Gansu specifically when we don't know if other Dungan varieties use it. — justin(r)leung { (t...) | c=› } 09:02, 6 February 2022 (UTC)

@Justinrleung: Oh, I added the label Gansu with the assumption that it's specifying that it's only used in Gansu. Aren't there just two dialects, Gansu and Shaanxi? --Mar vin kaiser (talk) 14:03, 6 February 2022 (UTC)

Following up a long discussion on the Old East Slavic About: page, I'd like to propose the following splits:

Split off Old Ruthenian (zle-ort)
Set Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) as etymology-only descendants and labels of Old Ruthenian
Set Ukrainian (uk), Belarusian (be) and Rusyn (rue) as descendants of Old Ruthenian
Change Old Russian (zle-oru) to Middle Russian (zle-mru) and set this as a label of Russian (ru)

On the final point there was quite some discussion, and I personally support making Middle Russian a full-fledged code, but since we couldn't reach consensus, I propose making that a separate discussion if need be.

The proposed historical borders of the languages are as follows:

Old East Slavic (until the 14th century)
Middle Russian (=Moscow Literary language; 14th century-18th century) [Peter the Great's reforms]
Old Ruthenian (='West Russian' Literary language; 14th century-19th century) [Kotliarevsky's Eneïd]

Pinging @Atitarev, ZomBear, Useigor, Ентусиастъ, Benwing2, Rua, Ogrezem. I apologise if I forgot anyone. Thadh (talk) 12:43, 2 March 2022 (UTC)
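The proposed borders above can be read as a small lookup table. The Python sketch below is only an illustration. The numeric bounds are an assumed reading of "until the 14th century", "14th century-18th century" and "14th century-19th century", the start of Old East Slavic is left open, and `stages_for_century` is a hypothetical helper, not part of any Wiktionary module.

```python
from typing import List, Optional, Tuple

# (stage name, first century, last century); None means no lower bound is stated.
# Treating the bounds as inclusive is an assumption: the proposal does not say
# which stage the 14th century itself belongs to.
PROPOSED_STAGES: List[Tuple[str, Optional[int], int]] = [
    ("Old East Slavic", None, 14),
    ("Middle Russian", 14, 18),
    ("Old Ruthenian", 14, 19),
]


def stages_for_century(century: int) -> List[str]:
    """Return every proposed stage whose period covers the given century."""
    matches: List[str] = []
    for name, first, last in PROPOSED_STAGES:
        if (first is None or century >= first) and century <= last:
            matches.append(name)
    return matches


print(stages_for_century(12))
print(stages_for_century(16))
```

Under this reading a 16th-century attestation matches both Middle Russian and Old Ruthenian, so the date alone cannot place a quotation. The proposal separates the two by literary tradition (Moscow versus 'West Russian'), which a date table like this cannot capture.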

I still support only the introduction of Old Ruthenian, which is missing but as before, I don’t claim to be an expert on the matter. The Russian corpus in the other discussion was helpful. When I filtered on “Middle Russian”, I think I was able to find a couple of words, which are now considered obsolete. The rest were words, which just need to be respelled to find quotes in (early) Modern Russian. I found a few different ways to abbreviate and also numerous misspellings. Overall I sort of feel why these additional splits are not so popular - little strong evidence to work with. Middle Russian may be allowed to be added, let’s just look for good cases.

To make decisions easier, why don’t we add a couple of specific examples for each new language code proposed - something to work with. (They can be vocab, grammar or pronunciation cases). The proponents should have examples in mind to make the case(s) stronger. We can work together on confirming or disputing those cases. --Anatoli T. (обсудить/вклад) 22:57, 2 March 2022 (UTC)

I'll see if I can make a list of features that distinguish Middle Russian from (Modern) Russian. In any case, for the time being, treating Middle Russian like Old East Slavic makes little sense to me, especially if we're splitting off Ruthenian (otherwise we get some kind of Dutch-Afrikaans situation), so we could go ahead with that now and in the meantime continue discussing MR's position as a separate code. Thadh (talk) 23:30, 2 March 2022 (UTC)

(edit conflict) You can use any of the examples already in discussions used as evidence, e.g. онтарь/оньтарь, агистъ, etc. BTW, I see that "Old Russian" was used incorrectly by ZomBear when actually talking about Middle Russian. "Old Russian" = "Old East Slavic". The Russian term for Middle Russian is старору́сский (starorússkij) but Old East Slavic (Old Russian) is древнеру́сский (drevnerússkij). --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)

Quick update: I've found a relevant discussion from three years ago, Wiktionary talk:About Russian#Middle Russian?. Also, The Russian Language before 1700 (Matthews 1953) argues your and Fay Freak's point (that Middle Russian is too similar to modern Russian to warrant a linguistic distinction). Fun point: it also provides съмьрть's accentuation :0. I'll still look for differences in the corpora, but if the languages are too similar I guess I don't mind keeping the two together - as long as the descendants sections don't get too cluttered, I'm fine. Thadh (talk) 00:02, 3 March 2022 (UTC)

BTW, I didn’t get back to you on the concern I have in regards to introduction of word stresses in Old East Slavic. My reason being there are many cases where assumptions can go wrong based on descendants. We should only use referenced data. Well, we don’t have native speakers to prove us wrong, do we? —Anatoli T. (обсудить/вклад) 23:03, 2 March 2022 (UTC)

Sure, but of course we can still use sound laws for words without referencing the specific word's reconstruction. A word like съмь́рть will have the stress on the second syllable, because otherwise the Russian term would be something like **со́мерть rather than сме́рть. However, I wouldn't know where to look for any reference on this specific word, and googling "съмь́рть" returns no results. Thadh (talk) 23:30, 2 March 2022 (UTC)

Of course, there could be strong (?) assumptions on vowels, which became silent (i.e. they are unstressed) but I wouldn't be so sure even on e.g. вода́ (vodá) (if it weren't referenced), since the word is stressed on the first syllable in some Ukrainian dialects, if you know what I mean. --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)

@Thadh: I support your suggestions. Ентусиастъ (talk) 16:19, 3 March 2022 (UTC)

I have already spoken before. I'm for it too.--ZomBear (talk) 00:57, 4 March 2022 (UTC)

@Thadh: Unfortunately, I see that the discussion has stopped again. It's been almost a month since anyone wrote anything. Every day I look forward to a resolution of this issue with the Old Ruthenian language. --ZomBear (talk) 07:32, 21 March 2022 (UTC)

Done. What we need now is to split all pages into either Old East Slavic, Russian (with the Middle Russian label) or Old Ruthenian (with or without the Old Belarusian/Old Ukrainian label). Thadh (talk) 18:43, 21 March 2022 (UTC)

I also removed Old Novgorodian as the child of Old East Slavic. Vininn126 (talk) 08:52, 4 October 2023 (UTC)

@Thadh how about adding more etymology-only language codes? Modern dictionaries use more than just Old Belarusian/Ukrainian. I saw Middle Bulgarian, Old Slovak, Old Slovene, Old Serbian, Old Croatian, Old Serbo-Croatian, Old Bulgarian, Old Upper Sorbian, Old Lower Sorbian. Possibly Middle Czech and Middle Polish would also be useful sometimes. Old Sorbian was also used by Boryś (Old Sorbian peleš as cognate for Polish pielesze), however we can't just link to both Lower and Upper Sorbian at once, so that would require full support for this language (?). Scientific publications mention Old Polabian as the language of the Polabian Slavs in the Middle Ages; it is usually used for proper nouns like given names, theonyms, toponyms, sometimes ordinary words mentioned in Latin texts, and it is always a reconstructed language. I would like to have it tho. Sławobóg (talk) 14:32, 28 May 2022 (UTC)

@Sławobóg What I'll need from you in order to determine if the splits are worth it is:

- Exact boundaries of the languages' stages

- You need to check how much literature there is in the earlier stages of the language.

- You need to check how much the languages differ from their modern stages.

Once you do that, we can continue the conversation about splitting them. It seems pointless to split a language off just because there are two inscriptions in some dusty old book. Thadh (talk) 15:15, 28 May 2022 (UTC)

@Thadh: IMO Middle Polish would benefit greatly from the split.

Boundaries: As it is with extinct languages, there aren't really any exact boundaries, but it's usually defined as between the 16th and the 18th century; Polish Wiktionary has settled on years 1500 to 1750 to account for Doroszewski's dictionary.
Literature: There are two major corpora, accessible on the SPXVI and ESXVII websites.
Differences: I reckon the spelling and pronunciation differences, especially the employment of "slanted vowels" (samogłoski pochylone, I have no idea what their name is in English), should be enough.

Plus, like, this would help with attestation. Hythonia (talk) 11:08, 30 July 2022 (UTC)

Middle Polish is also thusly defined on Wikipedia. I also think it would make more sense to have Middle Polish as an LDL. The alternative would be having a label. If we split, we'd have to add Middle Polish both to Proto-Slavic descendant entries as well as intermediates on etymologies. Vininn126 (talk) 11:52, 30 July 2022 (UTC)

Also pinging @KamiruPL, as an editor for Old Polish. Do you think we should fully split Middle Polish, create a label, or some other alternative? Vininn126 (talk) 13:44, 30 July 2022 (UTC)

@Vininn126: I treat Arabic before the spread of printing in the Arab world, which is from 1800 (Napoléon brought the press to Egypt, which was then a state business that over time was rented by privates who would copy it), as LDL. The reason becomes more obvious for Hebrew where we are eager to include hapax legomena in the Tanakh and due to lacking distinctness of the Modern to the Biblical language, from which the former has been resurrected, have little desire to split. This is in analogy to the split of English from Middle and Old English, where basically the split happens following the new medium of printed books—accordingly if Polish literacy in the same fashion starts only somewhere in the 18th century then we become stricter only then.

Circumventing attestation criteria is no reason to split language headers, as your perception about whether something is another language is the same and only disingenuously modified by that consideration of its description. So more appropriate attestation criteria – and I think of the many carefully collected variants sadly left even unmentioned as a consequence of no sense of proportion applied to the teleology of our rules – by no means should serve motivation to split languages; we can already derive them by the accepted statutory interpretation methods.

To be clear, since legal thinking is unwonted and mysteriously strange to many in spite of people rightly being appointed for it in any society: In this case this is really just systematic interpretation: Since the community authoring the policies was biased towards English but the splits of other languages wrought comparative inconsistency with its situation according to which it has been split by chronolects, we break the criteria down to be suited for the languages they were only roughly devised for. Fay Freak (talk) 09:51, 31 July 2022 (UTC)

In all honesty a label is likely the best option. Vininn126 (talk) 10:05, 31 July 2022 (UTC)

@Hythonia @Sławobóg @KamiruPL I've gone ahead and added Middle Polish as a label. Vininn126 (talk) 12:11, 8 August 2022 (UTC)

I've thought about this more, and I think there might be a case for Middle Polish as an L2. If we agree it should be split, I can help convert the existing entries to Middle Polish.

Here is my reasoning:

Old Polish, Middle Polish, modern Polish, and Silesian are four lects that are hard to separate accurately. Part of this argument hinges on Silesian, which we currently treat as an L2, and I don't see that changing. There are political, historical, and linguistic reasons.

===Why Silesian should be an L2===

Its speakers feel strongly that it is a language, not a dialect. The Polish linguists pushing that it is a dialect include Jan Miodek, a notable prescriptivist who pushes more nationalistic views of how languages should be treated, and I believe that treating Silesian as a dialect is done partially to stifle any sense of individuality and further Polish control. However, I recognize that this theory has some tinfoil-hat conspiracist vibes to it, so I'll stick to the point that its speakers strongly feel it is a language.
Significant linguistic difference: Silesian has a different phonology to Polish, and other grammatical features, such as retaining the Proto-Slavic aorist in an analytical past tense, as opposed to a more agglutinative/morphological one in Polish. It also recently has undergone strong standardization, as can be seen on silling.org and the ślabikŏrzowy szrajbōnek.
Significant lexical differences: Silesian differs quite a bit from Polish in terms of lexical information. Core inherited words are of course similar, but look at other Slavic languages. It's also been heavily "Policized", but so has Kashubian, which we also treat as an L2 and is recognized as a separate minority language in Poland, and both Kashubian and Silesian are recognized by ISO and Glottolog.
Finally, the key point to the overall argument: Silesian is a descendant of Middle Polish. Most claims that it is Czechoslovakian are refuted by Silesian philologists.

===Why Middle Polish should maybe be an L2===

So if we decide that Silesian is an L2, that would give Middle Polish multiple descendants. This would "fix" many inherited etymologies, such as wszystek. This would also fix Latinate borrowings, where Silesian inherited an older pronunciation of Latinate words, and also the chain generally works better as Learned borrowing into Middle/Old Polish -> Polish + Silesian, as opposed to setting multiple Learned borrowings.

Furthermore, Middle Polish was significantly different from Modern Polish in terms of phonology and grammar (I recently updated the Middle Polish Wikipedia page). In terms of lexical content - there were significant shifts, I would say less than the standard differences between Slavic languages, but there were still trends, and dictionaries such as {{R:pl:SXVI}}, {{R:pl:SXVII}}, and occasionally {{R:pl:SJP1807}} or {{R:pl:SJP1900}} would be key in this. Furthermore, Middle Polish is otherwise resource poor, and should be treated as an LDL, label or not. Having it as an L2 is cleaner in terms of citations.

If we agree that this should be done, I would recommend setting the cutoff dates as c. 1500-c. 1780, with a language code of zlw-mpl. Vininn126 (talk) 12:39, 24 April 2023 (UTC)

@Atitarev, @Fay Freak, @Hythonia, @Sławobóg, @Thadh, @ZomBear, @Ентусиастъ Vininn126 (talk) 17:30, 24 April 2023 (UTC)

Update: there is debate as to whether Silesian should be listed as from Old Polish or Middle Polish, which really affects the above argument. Vininn126 (talk) 14:53, 25 April 2023 (UTC)

Just flagging up that it's possible to give Middle Polish an etymology-only language code, and to set it as the ancestor of Polish (and Silesian, if desired). This would be a way to keep its entries under the Polish L2, while allowing etymologies to formally mention it. In turn, Middle Polish could have Old Polish set as its ancestor.

Of note is the fact we already have Middle Russian, Old Ukrainian, Old Belarusian, Middle Bulgarian and Early Modern Czech, which are all currently handled in the same way. Theknightwho (talk) 16:14, 25 April 2023 (UTC)
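To make the etymology-only arrangement described above concrete, here is a rough Python sketch of the two relationships involved: which lect is the ancestor of which, and which L2 header an etymology-only code's entries are filed under. It does not reproduce Wiktionary's actual language data modules. `Lect`, `ancestor_chain` and `entry_header` are hypothetical names. Only `zlw-mpl` is proposed in this thread, so the other codes (`zlw-opl`, `pl`, `szl`) are assumptions for illustration. Silesian's descent from Middle Polish is the optional part ("if desired") and is disputed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lect:
    name: str
    ancestor: Optional[str] = None      # code of the preceding stage, if any
    housed_under: Optional[str] = None  # set only for etymology-only codes


# Illustrative data only; codes other than zlw-mpl are assumed.
LECTS: Dict[str, Lect] = {
    "zlw-opl": Lect("Old Polish"),
    "zlw-mpl": Lect("Middle Polish", ancestor="zlw-opl", housed_under="pl"),
    "pl": Lect("Polish", ancestor="zlw-mpl"),
    "szl": Lect("Silesian", ancestor="zlw-mpl"),  # optional, see the thread
}


def ancestor_chain(code: str) -> List[str]:
    """Names of all earlier stages of a lect, nearest first."""
    chain: List[str] = []
    parent = LECTS[code].ancestor
    while parent is not None:
        chain.append(LECTS[parent].name)
        parent = LECTS[parent].ancestor
    return chain


def entry_header(code: str) -> str:
    """Name of the L2 header under which entries for this code would sit."""
    lect = LECTS[code]
    return LECTS[lect.housed_under].name if lect.housed_under else lect.name


print(ancestor_chain("szl"))
print(entry_header("zlw-mpl"))
```

In this model an etymology can name Middle Polish as an intermediate stage while Middle Polish entries still sit under the Polish header, which is the trade-off the comment describes.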

@-sche, Chuck Entz, Rua, Tropylium, Hekaheka, Surjection, Brittletheories, Mölli-Möllerö

In the previous discussion on this topic ([1]) it seems everyone has agreed that it's best to merge Kven and Meänkieli into Finnish. However, the discussion was closed without actually merging the codes, and currently we (again) have 40 Kven and 30 Meänkieli lemmas, many of which are also duplicated as Finnish for the reasons discussed in the above discussion. Has anyone changed their opinion or does anyone have anything to add to this or can we actually go ahead and merge the languages?

I guess related to this is also the question of how to handle dialectal morphology of Finnish dialects, but maybe that's a bit out of scope for this discussion. Thadh (talk) 16:24, 23 September 2022 (UTC)

The strongest arguments in favour of splitting them are political and should therefore be ignored. Our task is to best present the most information, and that would best be achieved by merging the three lects. The few dozen new dialectal terms will fit in quite well with the 1250 pre-existing ones. brittletheories (talk) 16:49, 23 September 2022 (UTC)

Incubator says "Wikimedia does not decide for itself what is a language and what is a dialect. We follow the ISO 639 standard." This means that it's up to the agency that grants language codes, not to us, right? Meänkieli and Kven have written standards so they should stay as they are. (In my view, Tver Karelian should also be treated as a language so I could add Tver Karelian words without knowing if they're used in the more usual "vienankarjala" dialect.) Mölli-Möllerö (talk) 19:55, 23 September 2022 (UTC)

The Incubator standards are not the same as our standards. Our language treatment does not strictly follow ISO 639. — SURJECTION / T / C / L / 20:33, 23 September 2022 (UTC)

@Mölli-Möllerö: On the Tver Karelian issue, you could also just leave the first parameter of {{krl-regional}} empty or |1=? it, and it will automatically be sorted in Category:Karelian term requests, and I'll be able to add the terms later. Or you could use either {{R:krl:KKS}} or another Viena source, the correspondences are usually quite easy. Thadh (talk) 20:44, 23 September 2022 (UTC)

Wrong. There's a big difference between Wikimedia's administrative needs and the lexical needs of a dictionary. As for written standards: the world is full of languages with multiple written standards: Brazilian and European Portuguese, European and Canadian French, Austrian and German German, etc. We can't let others decide for us: each case needs to be considered on its own. We've chosen to merge languages treated as separate by ISO and recognize languages with no ISO codes. In other cases we've gone with the ISO. Chuck Entz (talk) 20:59, 23 September 2022 (UTC)

For outsiders, Meänkieli (in Sweden) and Kven (in Norway) are languages or rather dialects that have become languages by virtue of being across the border (the Finnish-Swedish border and the Finnish-Norwegian border, respectively). Finnish speakers can easily understand nearly 99% of Meänkieli or Kven, and the main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other).

Linguistically they are 100% dialects, but politically both Sweden and Norway respectively have recognized them as separate languages, which is also what their speakers think. A more cynical person might say that they have deluded themselves into thinking their language is not Finnish in order to avoid persecution of Finnish that was prevalent in Sweden and Norway in the 19th and 20th centuries ("Finnish? what Finnish? we're not speaking Finnish, it's Meänkieli/Kven").

How Wiktionary should best handle cases like these, I don't know. 200 years is not enough for what is generally a phonologically conservative language to become anywhere near unrecognizable. It could be compared to how Karelian is now almost universally treated as a separate language, even though it forms a dialect continuum and has been diverging now for at least about 800 years (ever since the 1323 Treaty of Nöteborg).

Finnish sources almost exclusively consider Meänkieli and Kven to be dialects, even more so when these sources are linguistic-oriented (some other sources take a political stance and recognize that they are considered "minority languages" in their respective countries). — SURJECTION / T / C / L / 20:34, 23 September 2022 (UTC)

"The main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other)"... and the additional Swedish/Norwegian loanwords found in Meänkieli/Kven, of course. But many of these are also found in Finnish dialects. — SURJECTION / T / C / L / 21:37, 23 September 2022 (UTC)

The divergence of Karelian from Finnish, FWIW, almost certainly goes back at least 1200 years (to the archeological / mentioned-in-Novgorod-sources Old Karelian culture). The initial split-off of Northern Finnish dialects is probably about as old too.

What I would think of as the best argument against treating Meänkieli and Kven as languages is that they're not even internally well-defined: typically they're just catch-all terms for "Northern Finnish in Sweden" and "Northern Finnish in Finnmark" with relatively various dialects encompassed by each. There are some efforts (schoolbooks, etc.) towards a "standard" Meänkieli based on the Torne Valley dialect but I don't think it could be called actually standardized just yet. I suppose one thing we could do is to document whatever is done on this specifically under "Meänkieli" and leave anything else as dialectal Finnish, but that might be a bit premature still too. --Tropylium (talk) 07:44, 24 September 2022 (UTC)

I would not say that "everybody" agreed on the merger. I didn't. I can only comment on Meänkieli, but I would not be surprised if similar argumentation also applied to Kven:

The overall small number of Meänkieli words in Wiktionary only proves that we don't have an active editor in Meänkieli. There seem to be some 30,000 entries in this Meänkieli-Finnish-Swedish dictionary [4].
The small sample of words we have proves nothing of similarity of the vocabularies. If you study the dictionary I mentioned (press "tutki") you'll find that there are considerable differences between Finnish and Meänkieli. In addition to vocabulary, conjugation of verbs seems to differ (e.g. Meänkieli: tukeat - Finnish: tuet - English: you support).
This article[5] promotes the opinion that Meänkieli is a dialect. However the writers admit that the two are not readily mutually understandable: Finnish-speakers usually understand Meänkieli relatively well, partly because of their knowledge of Swedish, but for Meänkieli speakers Finnish isn't as easy. If we took a Finn who does not know a word of Swedish, they would be lost with a Meänkieli speaker.
This article [6] starts from the maxim that Meänkieli is a dialect of Finnish but finishes with the conclusion that at the end of the day it is the speakers of a language themselves who decide the status of a language/dialect. Meänkieli speakers have made their opinion clear: they want it treated as a language. How competent are we to second-guess their point of view? Has any of us studied Meänkieli more than superficially?

Here is also a link to a Kven-Norwegian dictionary [7]. --Hekaheka (talk) 09:44, 24 September 2022 (UTC)

To be fair all these points would still hold for Ingrian and Savonian dialects, too, and of Ingrian dialects I'm fairly certain no Finnish speaker would readily understand them much better than, say, Izhorian or Karelian. Thadh (talk) 09:51, 24 September 2022 (UTC)

A clear-cut solution would be to stick to ISO. Ingrian has an ISO code, Savo hasn't. Is Ingrian currently treated as a Finnish dialect? I think it shouldn't be. --Hekaheka (talk) 12:05, 24 September 2022 (UTC)

You're confusing Ingrian (inkeroinen) and Ingrian (inkerin (suomalainen)). The first one is the same as Izhorian and is handled as a distinct language, has an ISO code, and is spoken by the Orthodox Izhorians. The latter one is the same as Ingrian Finnish and is handled as a Finnish dialect, does not have an ISO code, and is spoken by the Lutheran Ingrian Finns. My remark concerned the latter. Thadh (talk) 13:46, 24 September 2022 (UTC)

I would second this, as a speaker of Meänkieli, I do not necessarily understand Finnish. I use the word telefuuni instead of puhelin for phone. It is bizarre to me that because Finns have learned Swedish at school that this means that the significant divergences between our languages have suddenly been nullified. My word for train is tooki, not juna. I would also say rekiunaalin and not alueellinen for regional. Finally for another rather extreme example: joukkoliikenne v. kolektiivitrafiikki (public transit).

These words diverge significantly from each other, and it is rooted in the dissolution of the Swedish Empire. A process of finnicization led to the demotion of Swedish loan words to a dialectal status and at the same time Meänkieli was infused with even more Swedish loan words as the industrial revolution reached us by way of the south. This led to far reaching vocabulary changes for both languages. Meänkieli built a corpus of academic and technical vocabulary on its own and Finnish severed overlaps between our languages. Whilst 200 years is a brief amount of time for a language to diverge, this argument neglects the fact that the last 200 years have been the most impactful and revolutionary years of human history. Terms and innovations related to science, philosophy, psychology, healthcare, technology, politics, and biology were all invented during these 200 years. This represents a significant amount of linguistic change.

Using my previous examples of divergences, I believe I make it obvious that if I, with minimal awareness of Finnish, were to meet a Finnish speaker with minimal awareness of Swedish, I could not possibly navigate a simple discussion where I ask when the next train were to arrive, if that train were a regional or local train, or if there were any public transit alternatives for me to take. This mutual incomprehensibility repeats itself across every aspect requiring any form of specialized vocabulary. I must sincerely ask, in what language would it be considered natural for me to be unable to carry on a political discussion, ask for help with traffic, or understand any moderately advanced discussion past the CEFR level of B1? I see many arguments that Swedish loan words are considered colloquial in Finland, but I believe they confuse the fact that casual language does not compare to a completely separate academic vernacular.

The idea that Meänkieli is a political invention neglects the fact that it was Meänkieli speakers, not the Swedish state, who argued for their own language. The Swedish state tried to teach the previous generation Finnish, but this failed because the divergences were too great. I identify with that failure because I really do not find any interest in learning a separate register for advanced vocabulary which is foreign to me. This means that the push for Meänkieli was made at a grassroots level, not at a state level. Indeed for many of us, our lives would be simpler if we simply learned Finnish, but despite the difficulties in developing our own language, we have embraced them.

Finally, at a practical level, if I were to author articles under Finnish, these would rapidly be deleted, apart from the loanwords which were demoted to a dialectal level in the colloquial category. I greatly doubt that any Finn would resist the impulse to delete articles I would write on: rekiunaali, kolektiivitrafiikki, telefuuni, repypliikki or militääri. The argument would run somewhat as follows: "This is not Finnish, it is broken Finnish using Swedish words with Finnish spelling, please learn Finnish before making any entries." This is how the average Finn would react to me speaking Meänkieli in a formal setting. Dominic-SS-Olofsson-Tuisku (talk) 09:26, 8 January 2026 (UTC)

I've come around to say that I think they should be merged. We don't consider Valencian, Ulster Scots or Lemko (the linguistic case is very similar between those examples and this one) to be their own languages despite political arguments that they should be considered as such (and even some recognition like in the ECRML). We shouldn't do so here either. And don't even mention the whole thing going on with Serbo-Croatian... The general trend on en.wikt seems to be to consider the linguistic argument more important than any political ones (which I can appreciate). — SURJECTION / T / C / L / 11:51, 3 October 2022 (UTC)

As a Norwegian, I find it odd that there is a proposal to merge Kven with Finnish - as Kven is an officially recognized minority language in Norway (Finnish is not). I do not agree with this merge, for the following reasons:

At least in Norway, Kven and Finnish are considered separate languages. You are able to get elementary school education and books in Kven (but not in Finnish, as far as I know) - you can even study Kven at the University of Tromsø and receive a bachelor's and master's degree in the language (there is a Finnish one as well, and they are considered two separate degrees). Kven people are considered a separate ethnicity, along with their language, descended from Finns/Finnish.
Political reasons are of course relevant, not just linguistic ones. The average Kven speaker has never set foot in Finland, never studied any Finnish, nor consumed any part of Finnish culture and media (music, literature, etc.). An argument was that Finnish speakers understand 99% of Kven - as a Norwegian I understand up to 99% of Swedish and Danish, but they are not getting merged into one language called Scandinavian (for political reasons).
If merged, then in theory thousands of new Finnish entries on Wiktionary would emerge, in the form of "dialectal" words which are actually Kven words. If someone bothered to add them all (I, stubbornly, might) - then every Kven word and declension would need to be added under Finnish, and certain words and forms which don't even exist in Finnish dialects in Finland would be present. Every Kven word, even if the nominative singular is identical to Finnish, has a separate declension chart, every single one - there would then need to be a separate template to show these (I think Finnish Wiktionarians would be quite annoyed by this).
Kvens in Norway have fought very hard for their language; they have gotten their own language institute, which promotes literature and culture in the Kven language. Erasing their language from Wiktionary and treating it as a dialect of a language they don't even speak would be a huge slap in the face. Finns in Finland who speak a dialect of Finnish also all know standard Finnish; Kven people do not. If a Kven person handed in an essay at a school in Finland, every other word would be marked as wrong or a typo. Supevan (talk) 22:49, 2 November 2022 (UTC)

This entire argument can be boiled down to "Kven is standardized". So are Valencian and Croatian, but we still don't treat them as separate languages. — SURJECTION ^{/ T / C / L /} 14:57, 5 November 2022 (UTC)

@Surjection: Actually, Kven isn't firmly standardised afaik. Thadh (talk) 14:58, 5 November 2022 (UTC)

We should. Supevan (talk) 17:35, 5 November 2022 (UTC)

@Supevan Most of these points were already raised for Meänkieli. I will try to answer them anyways.

1) First, our standard procedure is to emphasise linguistics over politics, even when much more controversial (see WT:Serbo-Croatian).

2) Secondly, and most importantly, you claim all Kven inflection should be incorporated into Finnish. This is false. There is already a ridiculous amount of variation in the inflection of the various Finnish dialects, and none of it is represented here. We simply do not have the capacity to maintain 30 different tables containing dozens of inflected forms. Additionally, natives do not stick to one variety of Finnish but mix standard Finnish grammar with that from various dialects and registers. It would also be naive to assume that Kven speakers all use one well-defined standard themselves. A language with a morphology as rich as that of Finnish leaves much space for variation.

3) You say, "thousands of new Finnish entries [– –] would emerge, in the form of 'dialectal' words which are actually Kven words", but this is only true if one assumes Kven not to be a collection of Finnish dialects, which is not a popular opinion among linguists. Besides, only a small number of these terms are exclusive to the Ruija dialects.

brittletheories (talk) 13:46, 27 January 2023 (UTC)

In the previous discussion, my impression was that these could be merged, but the number (and conviction) of editors with knowledge of Finnic languages who think these are better kept separate leads me to change my position to abstain (and let other people, ideally ones with knowledge of Finnic languages, deal with this). - -sche (discuss) 23:18, 8 January 2026 (UTC)

Meänkieli and Kven, Separate Languages from Finnish Treatment Request

Meänkieli should be treated as its own language. (Kven too and all my arguments for Meänkieli reflect Kven.)

There are many reasons for this. I shall give a few arguments.

Meänkieli and Finnish separate or not?

There are a lot of arguments from Finnish speakers that boil down to Meänkieli being a political project from the Swedish state to create a separate language from Finnish. The arguments for this are as follows: Meänkieli Swedish loan words are tantamount to the colloquial vernacular of Finnish with large Swedish influences, see Åboland, Uusimaa, and Ostrobothnia. This is true in the sense that the much more popular word for 'to like' in Meänkieli as well as in most Finnish dialects is tykkän. However, what proponents of the dialect argument fail to recognize is that Meänkieli is not distinguished by a colloquial vernacular, rather by its advanced and technical vernacular.

A few of these examples for comparison's sake are as follows (glottolog codes used to separate examples):
- fit: rekiunaali; fin: alueellinen; eng: regional
- fit: tooki; fin: juna; eng: train
- fit: telefuuni; fin: puhelin; eng: phone
- fit: piili; fin: auto; eng: car
- fit: repypliikki; fin: tasavalta; eng: republic
- fit: taatori; fin: tietokone; eng: computer
- fit: biikarbunaatti; fin: bikarbonaatti; eng: bicarbonate
- fit: kolektiivitrafiikki; fin: joukkoliikenne; eng: public transit

The purpose of this is to illustrate that Meänkieli and Finnish differ most greatly not at the vernacular level, with which dialectal words are most closely associated, but rather at the advanced and technical level of the language. The reason for this is quite simple and leads into another argument: Meänkieli speakers were introduced to the results of the industrial revolution and the technological improvements of the last 200 years by way of Sweden and not by way of Finland. Finland also had a fennicization movement at the same time, in which most Swedish loan words in dialects were demoted to vernacular status in favor of more Finnish-seeming words or ex nihilo constructions (which largely explains the difference between standard Finnish and vernacular Finnish). In summary, whereas Finnish was fennicized, Meänkieli was svecocized. This is perhaps the greatest reason to consider Finnish and Meänkieli separate languages: although I can agree that 200 years is a very limited time for separation, the last 200 years have been the most impactful in human history in terms of scientific advancement. With regard to the linguistic divergence of Finnish and Meänkieli, one must also accept that this period represents the greatest possible amount of divergence in vocabulary.

Some Finnish speakers claim that they understand us perfectly and that we Meänkieli speakers therefore speak Finnish. But this ignores many problems. The first is asymmetric intelligibility. During the 1970s it became possible to study Finnish as a mother tongue in Sweden; many people in Norrbotten jumped at this opportunity to learn Finnish, but the previous generation of Meänkieli speakers simply struggled to learn Finnish, as they considered it too different from the language spoken at home (ibid.). Many Finns also consider the lect we Meänkieli speakers speak 'a messy mix of Swedish and Finnish' (en rörig blandning av finska och svenska). Some Meänkieli speakers find it simple to make themselves understood in Finland and to understand Finnish, but many Meänkieli speakers find the opposite to be true. As Elina Kangas points out, this varies with the level of exposure one has had to Finnish or, for a Finnish speaker, with the level of Swedish they have been exposed to. It is also, as Kangas points out, related to the issue of the dialect continuum.

All of this proves one thing: that a significant level of mutual incomprehensibility exists. We are well familiar with the issues relating to dialect and language, but the main requirement for lects to count as dialects of one language is that they are mutually intelligible. Finnish and Meänkieli are at best asymmetrically intelligible, and what seems to matter most in many cases is that Finns, who are obligated to learn Swedish in school, find it easier to understand Meänkieli because they have been exposed to the very thing that drove its significant divergence from Finnish. Lastly, I just wish to reiterate: despite many Finnish speakers arguing that Meänkieli is a political project of the Swedish state, the opposite is the case. We Meänkieli speakers were taught Finnish by the Swedish state as early as the 1970s, but we did not understand it and we wanted to learn the language that we spoke. Saying that we do not speak our native language properly and that we have to supplement it with Swedish is not only offensive; it proves the entire point that we do not speak Finnish.

Practicality and Clarity

My final argument is for the sake of avoiding edit warring and repeated linguistic disputes. Refer back to the words I listed earlier: if I were to add Meänkieli entries to Wiktionary under the Finnish language, this would most likely inhibit our work here at Wiktionary by leading to a multitude of political disputes over the veracity of our vocabulary. If I start writing telefuuni, rekiunaali, kolektiivitrafiikki, then Wiktionary users will have to struggle with the fact that words which are not considered intelligible by the vast majority of monolingual Finnish speakers are suddenly being added. I will most likely be accused of not knowing Finnish (which I do not) and of sabotaging Wiktionary by adding 'fake words' which are too reminiscent of Swedish spelled in Meänkieli orthography. This will clog up talk pages and lead to unnecessary arguments, and those who seek to learn about the Finnish vernacular in Swedish-influenced regions of Finland will most likely be misled into thinking that words such as kolektiivitrafiikki exist and are as accepted in Finnish and in Finland as vernacular loan words like tykkän. Ask yourselves: is it in our interest to mislead language learners into thinking that Meänkieli corresponds to Åbo Finnish or Uusimaa Finnish? Is it in our interest to have thousands of scientific and advanced lemmas, which would never be accepted in a Finnish workplace, added and presented as Finnish? No professor in Finland would accept me writing in the advanced and scientific vocabulary that I know. I cannot say I study at Helsingin ynivärsiteetti; they would expect me to say Helsingin yliopisto (university). I could not work at a Finnish mechanic's shop and say piili instead of auto (which, by the way, would be misunderstood as 'arrow', proving that there are even false friends between our languages).

Conclusion

All of this is to say that pretending that Kven, Meänkieli and Finnish are the same language would not only make our lives harder by causing countless political arguments ad nauseam. It would also lead to conflicting entries which would confuse readers, as in the case of piili; furthermore, it would lead to misunderstandings of Åbo, Uusimaa, and Ostrobothnian Finnish, which could have potentially catastrophic effects for linguists seeking to study the Swedish linguistic influence upon these dialects. And it would make life harder for language learners. It appears to me that treating Kven and Meänkieli as Finnish would be a political act that neglects significant divergences and a low level of overlap, especially in terms of advanced and technical vocabulary. I therefore hope you can support this request so we can add lots of new entries in the respective languages and avoid political arguments. Dominic-SS-Olofsson-Tuisku (talk) 12:54, 9 January 2026 (UTC)

This really seems to boil down to a single reason: lexical differences in technical vocabulary. Usually that is not enough to make a difference between two languages. For example, on the English Wiktionary, Croatian and Serbian are considered varieties of the same language (Serbo-Croatian) rather than separate languages. — SURJECTION ^{/ T / C / L /} 14:06, 9 January 2026 (UTC)

I'll bite:

1. Lexical differences in highly specialised vocabulary are a very, very poor criterion for distinguishing varieties as separate languages. British English and American English have similar differences, as do North Korean and South Korean, France French and Canadian French or Netherlands Dutch and Belgian Dutch. It's counterproductive to separate all national varieties from each other due to differing choices in the naming of technological terms.

2. All the issues named pertain not only to Meänkieli and Kven, but also to Russian Finnish (which is made up of Ingrian Finnish dialects, Isthmus dialects and some other varieties on the border between Finland and the Republic of Karelia). I'm absolutely certain that most speakers of Ingrian Finnish would use Russian borrowings like publitsnoi transportti or telefooni rather than whatever native coinage is used in Finland.

3. Once we go past the realm of 20th-century coinages and Germanic borrowings, the differences between Meänkieli and Finnish dialects on the other side of the border become very minor. Now, this in itself does not mean we should treat the two as one language - after all, there is no objective linguistic criterion to distinguish Ingrian and Finnish or Karelian and Finnish. I think a much better line of argument would be to discuss how the speakers themselves view their language, and whether indeed the majority of Meänkieli speakers consider their language distinct from Finnish or consider it just one of the Finnish varieties. Now, I don't really have any knowledge regarding this myself. Thadh (talk) 14:19, 9 January 2026 (UTC)

3. The sources I provided argue that the majority of Meänkieli speakers view their language as separate from Finnish. This is what preceded the recognition of Meänkieli as a separate language, as the Swedish state repeatedly failed to teach Finnish to children in Norrbotten from the 1970s until 2000. Dominic-SS-Olofsson-Tuisku (talk) 17:53, 9 January 2026 (UTC)

The cases of Ingrian and Karelian are illustrative here because they're actually perfectly well distinguishable from the continuum of Finnish dialects; what would be trouble is arguing purely by linguistics why these should be language rather than dialect group boundaries (and here we then turn into identification). The case with Meänkieli is not the same: it is not distinguishable from the Torne Valley dialects in Finland, except perhaps by some modern Swedish-derived technical vocabulary.

Some of the examples given here are also not even that. I believe e.g. telefuuni would be immediately intelligible to most Finns, even if today for some people only with the aid of school education in Swedish and/or English. We might even find it in outright literary Finnish before the neologism puhelin reaches fixation. At least the form telefooni certainly is attested. Checking the Finnish National Library Newspaper Corpus e.g. shows appearances of telefuuni in Ilta-Sanomat in 1934, Helsingin Sanomat in 1935 or Satakunnan Kansa in 1935. The first two appear as part of dialogue and the third in a role in a theater play, i.e. all probably qualify as colloquial, but there is no indication of these having anything to do with Meänkieli. The HS example is indeed instead a text in the Southwestern dialect and SK comes from, well, Satakunta. Further appearances continue as recently as up to 2019 in Lapin Kansa (perhaps suspicious here) or 2018 in Helsingin Sanomat. Anyway the point here is that I expect average Finnish speakers to be unlikely to know too much about what has or has not existed in even slightly historical or unfamiliar dialectal Finnish. A lot of what is supposedly "unique" about Meänkieli is instead simply just this, once we start digging. --Tropylium (talk) 23:56, 10 January 2026 (UTC)

An aside here, but Ingrian specifically is actually quite difficult to distinguish from the surrounding Finnish dialects, not least because Lower Luga Ingrian has very little in common with Ingrian Proper (Soikkola, Hevvaa, Oredeži), and because the region saw a lot of assimilations back and forth in both religion and language, which gave rise to countless merged varieties. Kurkola Lower Luga Ingrian is practically identical to the surrounding Finnish varieties (the only isogloss distinguishing them that I can think of is the verb läätä), and the main way to distinguish Ingrian Proper from the Ingrian dialects spoken on and around the Soikkola peninsula is by looking for trisyllabic gemination, which is a rather niche change to use as a major criterion for distinguishing two languages.

This is all not to say that Ingrian should be merged with Finnish - Ingrian has a literary language (although currently mostly out of use), it has a very distinguished culture tied to it and overall it has been treated as a language of its own for at least fifty years now, but I'm just pointing out that sometimes the boundary between two languages becomes even more vague. Thadh (talk) 07:51, 11 January 2026 (UTC)

Technically Old Church Slavonic and Church Slavonic should be two separate languages (?), but we only have the former, probably because of the small number of editors. These languages are always treated as two different languages in etymology. For now, in etymologies and Proto-Slavic pages (*viňaga), we work around it as Church Slavonic: {{l|cu|асдф}} or Church Slavonic: {{desc|cu|асдф|nolang=1}}. That is not very convenient; we should have a separate etycode for Church Slavonic.

We should also have an etycode for Czech Moravian, which is also pretty often used in Proto-Slavic pages (and many etymological dictionaries); Serbo-Croatian has templates like that (ckm, sh-kaj, sh-tor). Sławobóg (talk) 12:53, 5 February 2023 (UTC)

@Павло Сарт, Atitarev, Kamen Ugalj, Skiulinamo, Rua, ZomBear, Bezimenen, IYI681, Vininn126 pinging some people that might be interested. Thadh (talk) 13:03, 5 February 2023 (UTC)

Support @Sławobóg I completely agree with you: we need a separate etymological code for the usual Church Slavonic language. I have constantly wondered why it is not there. --ZomBear (talk) 19:32, 5 February 2023 (UTC)

Support for Church Slavonic Безименен (talk) 13:45, 7 February 2023 (UTC)

Oppose for Czech Moravian: there would be 20-30 more regional varieties that could spring if one started Balkanizing Slavic languages + I don't want to give food for thought to Z-Russians. There are already talks for forging Novorussian, Transnistrian, or Lipovan Russian in order to justify their expansive aspirations over former Imperial Russian territories. Безименен (talk) 13:45, 7 February 2023 (UTC)

Support for Church Slavonic AshFox (talk) 11:51, 6 January 2025 (UTC)

Support for Church Slavonic, or at least there should be a concrete way to handle non canonical words. Chihunglu83 (talk) 11:55, 6 January 2025 (UTC)

@Sławobóg @AshFox @Павло Сарт, @Atitarev, @Rua, @ZomBear, @Bezimenen, @IYI681, @Thadh

It's been a long time, but rereading this thread, we can see at least 5 people for splitting Church Slavonic. I propose to split Church Slavonic and give it etymology codes for the two variants, which I think best matches consensus by number of votes, even if there is disagreement within that. As to Moravian, I think it would be a safe split, but we only had 2 people speak up on it. I'd like the input of other Czech editors. I'll also add Wiktionary:Language_treatment_requests#East_Lechitic_typology and say that the dialect groups all got etymology codes and it has not led to more codes and has overall been a massive benefit. Vininn126 (talk) 12:11, 6 January 2025 (UTC)

@Vininn126: What two variants do you mean? Russian and Serbian? Russian and Croatian? I think this was always the issue with splitting, because we don't have enough people that could comment on which varieties can and cannot be handled together. Thadh (talk) 16:07, 6 January 2025 (UTC)

Perhaps those in favor of splitting could comment. Vininn126 (talk) 16:15, 6 January 2025 (UTC)

It appears it should have 4 variants. Vininn126 (talk) 09:48, 7 January 2025 (UTC)

I have made zls-chs at Module:languages/data/exceptional. As for etycodes for the recensions and setting East South Slavic as descendants, this thread should be expanded. At this time an etycode for Moravian needs more input as well. Vininn126 (talk) 10:24, 7 January 2025 (UTC)

> Etym-codes for recensions of Church Slavonic. AshFox (talk) 12:09, 11 January 2025 (UTC)

I also propose to do away with similar problems in the tree of Slavic languages once and for all. I suggest:

1. Add etymological code for Old Serbo-Croatian (zls-osh). With a redirect to modern Serbo-Croatian. Occurs regularly in {{R:sla:ESSJa}}.

2. Add etymological code for Old Slovene (zls-osl). With a redirect to modern Slovene. Occurs regularly in {{R:sla:ESSJa}}.

3. Move the Macedonian language to the descendant of Old Church Slavonic, as it was done some time ago with the Bulgarian language.

4. Add etymological code for Church Slavonic (cu-chu). Perhaps even with a division into Russian Church Slavonic (cu-rcu), Serbian Church Slavonic (cu-scu) and others, if any.

1. Add etymological code for Middle Polish (zlw-mpl). With a redirect to modern Polish or (?). @KamiruPL, Vininn126

2. Add etymological code for Old Slovak (zlw-osk). With a redirect to modern Slovak. It was high time to do it! Occurs regularly in {{R:sla:ESSJa}}. Especially if even Early Modern Czech (cs-ear) was awarded a separate code.

3. Possibly add a family code for Czech–Slovak languages (zlw-csk)? Just as there is one for Lechitic (zlw-lch).

4. It's possible: add etymological code for "Old Sorbian" (see Wendish/Lusatian ?) (zlw-osb)? Perhaps with a redirect to Upper Sorbian or (?).

1. Rename etymological codes Old Ukrainian (zle-ouk) & Old Belarusian (zle-obe) → Middle Ukrainian (zle-muk) & Middle Belarusian (zle-mbe), respectively. A similar request from another user was about six months ago (Wiktionary:Beer parlour/2022/September#“Old Ruthenian” language). Therefore, with "Old" for those languages, these are "parts" of Old East Slavic until the 14th c. (this is indicated on the en.Wikipedia).

2. Probably it is worth removing the Old Novgorod from the descendants of the Old East Slavic. Make it a separate and parallel ancient language in the East Slavic subgroup. --ZomBear (talk) 19:32, 5 February 2023 (UTC)

3. Add etymological code for Pannonian Rusyn with a redirect to Rusyn (rue).

PS: LOL, I'm serious, add an etymological code for "Early Proto-Slavic" (sla-ear) (?) with a redirect to Proto-Balto-Slavic (?). Because Wiktionary "for the standard" uses a rather late version of the Proto-Slavic language. And sometimes in the Etymology section it may be necessary to indicate an earlier form, and the presence of a separate etym-code for "Early PSl." would not be superfluous. --ZomBear (talk) 19:50, 5 February 2023 (UTC)

I don't think any "Old Sorbian" is attested. Both Upper Sorbian and Lower Sorbian are attested only from the 16th century, and they were already distinct at that point. In theory there could be a code for Proto-Sorbian, but it would have to be a full-fledged protolanguage, not an etymology-only language. —Mahāgaja · talk 20:17, 5 February 2023 (UTC)

@Mahagaja Yeah, I'm not sure about "Old Sorbian" either. This suggestion is only possible. I relied on the fact that in {{R:sla:ESSJa}} sometimes there are words with abbreviations "ст.-луж."/"др.-серболуж." ("старолужицкий"/"древнесерболужицкий" = translation "Old Sorbian") without specifying where the word belongs - to the Upper or Lower Sorbian language. --ZomBear (talk) 21:09, 5 February 2023 (UTC)

@ZomBear: I agree with most of your suggestions, except for Old Serbo-Croatian and Old Sorbian. Serbs and Croats never had an organized shared language until the 17th-18th century. One could perhaps talk about an Old Serbo-Croatian stage in the development of the Dinaric Slavic complex, but there never was a common language that could be associated with this period (leaving aside the Bosno-Rascian recension of Church Slavonic or Glagolitic Croatian). The same holds in even greater magnitude for Sorbian. Sorbs may self-identify as one people ethnically, but linguistically their languages are noticeably divergent.

PS I also don't see much educational value in copying all the distinctions that you can find in ESSJa. Note that it often gives old spellings that precede various spelling reforms, dialectal forms which don't follow any orthographic standard, morphological variants (like diminutive forms, etc.) which don't contribute much additional insight, it provides local colloquial meanings which are clearly recent innovations, etc. I personally prefer a more concise and economical presentation for reconstructed terms rather than having 10-15 dialectal spellings of Serbo-Croatian or those monstrosities that are given as dialectal variants of Polish/Bulgarian/Slovenian by ESSJa. In my opinion, such information should go on the respective page of the daughter language, rather than bloating the Proto-Slavic Descendants section.

PS2 Early Proto-Slavic is a useful designation; however, I don't know exactly where one should draw the border between Early, Middle and Late Proto-Slavic and what notation should be applied. Безименен (talk) 13:30, 7 February 2023 (UTC)

As it stands, Middle Polish is listed as a variant of Modern Polish. We do see some significant phonological changes and a few semantic ones as well, however, it's hard to say whether it should have its own code or not. Even if it did, it would certainly be a redirect to Modern Polish, seeing as it's a period of only about 1250 years. (1500-1750). Vininn126 (talk) 13:36, 7 February 2023 (UTC)

@Vininn126: That's 250 years. —Mahāgaja · talk 15:16, 7 February 2023 (UTC)

The one and the two are right next to each other.

Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages.

The page names can simply be renamed, and the lemmas do not need to be changed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer.

So how can this name change be done? Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)

Formerly:

Austroasiatic
- Munda
- Mon-Khmer (which Shorto reconstructed)

Now the consensus is that the tree has a rake-like structure (per Sidwell):

Austroasiatic
- (about a dozen branches including Munda)

That's why Mon-Khmer is an obsolete term now.

Similarly, with Sino-Tibetan, it formerly was:

Sino-Tibetan
- Chinese
- Tibeto-Burman (which Matisoff reconstructed)

Now the consensus among many scholars is that the tree has a rake-like structure with many "fallen leaves" (quoting George van Driem), making Tibeto-Burman obsolete:

Sino-Tibetan
- (dozens of branches including Chinese)

Ngôn Ngữ Học (talk) 22:27, 18 March 2023 (UTC)

Support. If this change happens we should delete Category:Mon-Khmer languages. Benwing2 (talk) 23:41, 18 March 2023 (UTC)

Abstain. I prefer to wait until an actual new reconstruction of Proto-Austroasiatic is published to do the move (see what I wrote at Wiktionary:About Proto-Mon-Khmer), but I do not actually oppose moving now. However, if the move does happen, I would like to see a line like "This reconstruction is from Shorto (2006) for the obsolete concept of Proto-Mon-Khmer, and should not be treated as an actual reconstruction of Proto-Austroasiatic, which as of now has not yet fully materialized, and is simply a "placeholder" for the actual Austroasiatic etymologies" (probably as a template) added as a warning to every reconstruction item. I very much want the same thing to happen to "Proto-Sino-Tibetan", considering a lot of them are nowhere near actual Proto-Sino-Tibetan, and the reconstruction items themselves are "icky", to say the least. PhanAnh123 (talk) 01:52, 19 March 2023 (UTC)

@PhanAnh123: Take a look at Sidwell's Proto-Austroasiatic reconstruction and Shorto's Proto-Mon-Khmer reconstruction. Sidwell's inclusion of Munda and Nicobarese had virtually no impact on his Proto-Austroasiatic reconstruction (versus if he had only included the "Mon-Khmer" languages) because he considered Munda to be highly innovative and restructured, with few original retentions from Proto-Austroasiatic. Furthermore, it would be very confusing to have duplicates for both Proto-Austroasiatic and Proto-Mon-Khmer. I would just merge them as Proto-Austroasiatic. Ngôn Ngữ Học (talk) 19:25, 19 March 2023 (UTC)

I have no intention of keeping Proto-Austroasiatic and Proto-Mon-Khmer separated (I consider Proto-Mon-Khmer to be likely a ghost after all); what I mean is that we should either keep the entries as they are until an actual Proto-Austroasiatic reconstruction comes about, or move the "Proto-Mon-Khmer" items to Proto-Austroasiatic but with the warning added. I know what you mean by "inclusion of Munda and Nicobarese had virtually no impact", because like Sidwell, I do think these branches are quite innovative; however, that does not mean I agree to move Shorto's Proto-Mon-Khmer reconstruction to Proto-Austroasiatic without any warning, since Austroasiatic linguistics has progressed quite a lot even outside of those two branches. The vocalism in Shorto (2006) was reconstructed in a very rudimentary way, which the reconstructions of the descendant branches as well as Sidwell's recent "sneak peek" at the Proto-Austroasiatic reconstruction improved upon; furthermore, the syllable structure itself has also changed slightly: it is now thought that a glottal stop was phonetically present in any Proto-Austroasiatic word that ended in a pure vowel (meaning any word ending in *aːj would still have *aːj, but those ending in **aː would automatically become *aːʔ), plus there is the status of *ʄ-, which very much awaits assessment in the actual reconstruction of Proto-Austroasiatic. Like I said, I don't oppose moving, but there must be strings attached. PhanAnh123 (talk) 01:53, 20 March 2023 (UTC)

@PhanAnh123, Ngôn Ngữ Học Such a warning can be added by bot to the top of all entries if both of you agree. Benwing2 (talk) 03:30, 20 March 2023 (UTC)

@Benwing2: Agree, a warning placed by a bot should be sufficient. Also @PhanAnh123, we can use Sidwell & Rau (2015) for some of the basic Swadesh list words, but a full reconstruction of Proto-Austroasiatic is currently being done by Sidwell. It should come out in a few years. Ngôn Ngữ Học (talk) 10:19, 20 March 2023 (UTC)

We are all in agreement then, so obviously now I support moving. With this Munda cognates can be directly added to the entries. PhanAnh123 (talk) 10:29, 20 March 2023 (UTC)

Agree on the support.

~~Abstain~~ Support. I've seen assertions that Mon and Khmer actually form a subgroup within the traditional Mon-Khmer grouping. Of course, it could be something messy as with Indo-European, where we have at least Indo-Iranian and Balto-Slavonic. --RichardW57m (talk) 16:19, 21 March 2023 (UTC)

There is no such thing as a Mon+Khmer grouping within Mon-Khmer. Some classifications propose Eastern, Southern, and Northern groupings within Mon-Khmer, but none of them put Monic and Khmeric together. Please consult the Austroasiatic languages article on Wikipedia to get a basic refresher of all the major previous classifications. Ngôn Ngữ Học (talk) 15:04, 23 March 2023 (UTC)

The cited articles do show that their crown group is larger than Monic + Khmeric, but it does look as though we don't need to worry about anyone using 'Mon-Khmer' to denote their (weak) association. --RichardW57m (talk) 11:36, 28 March 2023 (UTC)

Discussion moved from Wiktionary:Beer parlour/2023/June.

These are two Ryukyuan languages that we currently call Oki-No-Erabu and Toku-No-Shima, because that’s how they’re spelled in ISO 639. However, literature invariably uses the unhyphenated forms, and they’re also much easier to read.

Could we please therefore rename them to the unhyphenated forms? Theknightwho (talk) 19:39, 4 June 2023 (UTC)Reply

I dislike the EN penchant for glomming Japanese names into long undifferentiated strings, as I find that this instead makes them harder to read, and it erases the distinction between the actual component terms.

In some cases, the resulting interpretation or partial-expansion goes sideways, as we see at w:Tokunoshima, where the English text describes this as "Tokuno Island" -- the no portion is simply the genitive particle の (no), so as Japanese, this is better thought of as "Toku Island".

Name derivation, for those inclined to dive into the details...

The Japanese historical record bears this out, with the first mention in a 699 text as 度感. At the time, this may have been pronounced as something like twokom or dwokom, based on the Middle Chinese readings and known man'yōgana sound values, although some sites render this as toku or doku; it is not clear to me where the ku reading for 感 comes from. At any rate, the no is not part of the base of the name.
For those interested and who can read Japanese, here are several references at the Kotobank aggregator site. Search the page for 度感.
See also this entry at Nihon Jiten, which also lists 度感嶋 as an attested spelling with the pronunciation Toku Shima, further evidence that the base name is simply Toku and that the no is the particle.

That aside, I do see that w:Tokunoshima language lists the alternative rendering "Toku-No-Shima", and the w:Okinoerabu dialect cluster similarly lists the alternative rendering "Oki-no-Erabu". A quick-and-dirty Google hits comparison (including "the" to filter for English hits):

In the English-language web, the allthewordsruntogether renderings appear to be most common. Meanwhile, the Language Subtag Registry based on ISO 639 and maintained by IANA (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) does indeed use the hyphenated descriptors.

Meh. After digging into this some, I realize I just don't care all that much one way or the other. ‑‑ Eiríkr Útlendi │ Tala við mig 22:09, 9 June 2023 (UTC)Reply

Searching on Google Scholar, it seems the unhyphenated forms are more common, but I concur with Eirikr's views that they look worse.

However, I would suggest that if we were to retain the hyphens, the two languages should be renamed to "Oki-no-Erabu" and "Toku-no-shima" (or the rarer "Toku-no-Shima"), since these are more common on Google Scholar, and also because "no" is a particle that shouldn't be capitalised in a proper noun, cf. Southend-on-Sea, Stoke-on-Trent or von, de, etc. in surnames. – Wpi (talk) 11:20, 21 June 2023 (UTC)Reply

Could you correct Juǀ'hoan to Juǀʼhoan, Kwak'wala to Kwakʼwala, and K'iche' to Kʼicheʼ? There's no punctuation in the ethnonyms. If we want to use assimilated English forms, then the latter would be Quiché; I'm not sure about Juǀʼhoan. kwami (talk) 19:16, 13 July 2023 (UTC)Reply

Support. To clarify for people using low-resolution screens: the request is to use the modifier letter apostrophe character ʼ rather than the typewriter apostrophe '; the categories are currently at Category:Juǀ'hoan language (ktz) and Category:K'iche' language (quc). Our usual practice is to use the spelling most common in contemporary English-language discussions of the language. Which is more common in current books and journal articles, Kʼicheʼ or Quiché? —Mahāgaja · talk 19:30, 13 July 2023 (UTC)Reply
Just to be clear, I personally don't care about ASCII substitutions in category names; what I'm concerned about is proper headers in the dictionary entries. But it's fine by me if the two go together.

As for Kʼicheʼ or Quiché, the English-language lit has been moving from the Spanish form to the ethnonym. That's an ongoing trend, though of course not universal (e.g. 'German', 'Greek', 'Armenian' etc.). kwami (talk) 21:15, 13 July 2023 (UTC)Reply

The L2 headers and category names do need to match, at least for readers using tabbed browsing. Otherwise, the categories won't appear in the correct language tab. I think there are also bots that require the L2 header to be the canonical language name in order to work properly. —Mahāgaja · talk 22:20, 13 July 2023 (UTC)Reply

Okay. Works for me. kwami (talk) 22:24, 13 July 2023 (UTC)Reply

@Kwamikagami Normally at Wiktionary we use typewriter apostrophes rather than curly single quotes, and this issue is somewhat controversial, so this change is unlikely to happen without significant further discussion and consensus. Benwing2 (talk) 04:27, 24 July 2023 (UTC)Reply

I'm not requesting quote marks. That would also be incorrect. Rather, since we are attempting to use the endonym, IMO it should be the glottal stop or ejective diacritic that's in the orthography. kwami (talk) 04:41, 24 July 2023 (UTC)Reply

Indeed, no one is advocating curly single quotes. The modifier letter apostrophe is a different character; it's a letter, not a punctuation mark. There are several other language names besides these two that ought to be using it. —Mahāgaja · talk 06:23, 24 July 2023 (UTC)Reply

Sarci, for example, which was just moved to its endonym (minus tone marking). But I thought I'd wait to see how things went before attempting a more comprehensive proposal. kwami (talk) 06:27, 24 July 2023 (UTC)Reply

Support - this isn't a matter of using curly quotes vs straight ones; it's a matter of using the correct letter instead of punctuation. We already do this extensively in entries for languages that use it anyway. Theknightwho (talk) 15:39, 24 July 2023 (UTC)Reply

Going through WT:LOL, these are the languages whose names have the modifier letter apostrophe at Wikipedia but the typewriter apostrophe here:

Other languages with typewriter apostrophe whose Wikipedia article uses a different character include:

gez Ge'ez → Geʽez with ʽ (U+02BD modifier letter reversed comma)
hps Hawai'i Pidgin Sign Language → Hawaiʻi Pidgin Sign Language with ʻ (U+02BB modifier letter turned comma)
num Niuafo'ou language → Niuafoʻou with ʻ (U+02BB modifier letter turned comma)
tct T'en → Tʻen with ʻ (U+02BB modifier letter turned comma)
tsl Ts'ün-Lao → Tsʻün-Lao with ʻ (U+02BB modifier letter turned comma)

I support making all of these changes. —Mahāgaja · talk 19:54, 24 July 2023 (UTC)Reply
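For readers who cannot tell these characters apart on screen, here is a minimal Python sketch (an illustration only, not part of any Wiktionary tooling) that prints the code point, Unicode name and general category of each character mentioned in this thread, using the standard `unicodedata` module. The labels in `CHARS` and the helper name `classify` are made up for this example.

```python
import unicodedata

# Characters discussed in this thread, keyed by an informal label.
CHARS = {
    "typewriter apostrophe": "\u0027",
    "curly single quote": "\u2019",
    "modifier letter apostrophe": "\u02bc",
    "modifier letter turned comma": "\u02bb",
    "modifier letter reversed comma": "\u02bd",
}

def classify(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for label, ch in CHARS.items():
    print(label, *classify(ch))
```

In the general category, `Po` and `Pf` are punctuation classes and `Lm` is "Letter, modifier", which is the distinction being drawn above between the apostrophe and the modifier letters.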

I oppose these changes. What is the actual benefit? From the above discussion, there are at least three different Unicode apostrophe-like characters involved, which are easily confused, and it will make it significantly harder to type the language names into headers, categories and the like. This is going to be a major pain in the ass for people like me who will have to clean up wrongly-typed apostrophes in language headers in innumerable articles created by IP's and other occasional contributors, who are unlikely to be able to type the right character. Furthermore, even with these changes, the language names in many cases will not actually match their endonym spelling; cf. the proposed Oʼodham, which is actually spelled ʼOʼodham natively with two apostrophes. Similarly, as pointed out by User:Kwamikagami, our spelling of the CAT:Tsuut'ina language doesn't include the tone mark that is present in the native orthography, and wouldn't even with the change in apostrophe. I should add that Wikipedia uses these Unicode chars specifically because Kwami went around renaming all the articles (formerly they used the straight apostrophes), and is not consistent, e.g. the article on the name of the people is still at O'odham with a straight apostrophe. Glottolog uses straight apostrophes for O'odham; so does [8], the Endangered Languages Project. In general, our policy is to use the *English* names for languages; we are not forced to use the exact native spelling. While I agree it's a good idea to approximate the spelling (e.g. avoiding exonyms where possible), I disagree we have to take this to the extreme of using the "correct" Unicode apostrophes (which I bet you will find native speakers not using in many cases as well). Benwing2 (talk) 20:22, 24 July 2023 (UTC)Reply

Other people's carelessness in using Unicode is no excuse for us to be careless, and anyway, language names can always be inserted by typing {{subst:\|xyz}}, which doesn't involve any non-ASCII characters. Latin a and Cyrillic а look identical in every font and font style too, but substituting one for the other is an error; it's no different with ' and ʼ. —Mahāgaja · talk 07:05, 25 July 2023 (UTC)Reply
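To make the lookalike comparison concrete, a small Python sketch (illustration only; `describe` is a helper invented for this example) shows that the visually similar characters are distinct code points, so strings containing them do not compare equal.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin_a = "\u0061"
cyrillic_a = "\u0430"

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False

# The same holds for the two spellings of the language name.
print("K'iche'" == "K\u02bciche\u02bc")  # False
```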

I think you're missing the point. We don't include Cyrillic letters in language names, either. Benwing2 (talk) 07:13, 25 July 2023 (UTC)Reply

I know that. My point is that using ' where ʼ belongs is as bad as using Cyrillic letters in Latin-script language names. —Mahāgaja · talk 07:24, 25 July 2023 (UTC)Reply

I would support the changes, but only if they're truly the most used forms in terms of literature. Ideally we'd have people from each community give their opinions here, but alas, we're not afforded that. If the specific respective unicode apostrophe is used in literature, then we can use it here too. I can see the problem with inputting the apostrophes that's been brought up, but let's be real here, how many people are actually working on these languages to where this'd be a serious problem? I feel like this could be fixed with just an about:XYZ page or something. These languages unfortunately don't get enough traction. But again, I'd only support this if it can be proven that they're the forms used in English literature. AG202 (talk) 01:49, 17 August 2023 (UTC)Reply

@AG202 I agree with you, that is one of the points I made above, which has gotten lost in this thread. Benwing2 (talk) 02:08, 17 August 2023 (UTC)Reply

Ahh, got it, missed that, apologies. AG202 (talk) 02:11, 17 August 2023 (UTC)Reply

Hmm... like Benwing, my initial inclination is to oppose this, because the odds of anyone being able to type names with the fancy characters when adding entries is low (and given recent events, I wonder if one or more admins would block people for 'adding wrong language names' if people keep typing the names they're able to type). OTOH, I recognize that we require entries themselves to be input using correct spellings (with accents etc) and not in hacky ways... If we had a system like the French Wiktionary where no-one had to type the language names (instead only typing language codes, which only consist of easily-typeable ASCII characters), then changing the displayed character would be less of a problem (though still hard for navigating to categories, etc). Do we have a template with a simple short name people could subst: to produce the untypeable names, so they could write =={{subst:langname|foo-bar}}== to get ==Fooʾbar==? Or if we took this type of functionality and had a button people could periodically press (hosted on here like that Javascript is, not as a Python script on the computer of a user who might leave the project or be too busy to run it) that would search the database for instances of the typeable names and update them to the untypeable names, then it would be less of a problem (although it'd still be creating an unending maintenance task). - -sche (discuss) 16:22, 16 August 2023 (UTC)Reply
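As a rough illustration of the periodic clean-up idea, here is a minimal Python sketch of rewriting level-2 language headers from the typeable name to the proposed one. Everything in it is hypothetical: the `RENAMES` table only contains the three pairs requested at the top of this thread, `normalise_l2_headers` is an invented name, and no existing bot or gadget is being described.

```python
import re

APOSTROPHE = "'"            # U+0027, the typewriter apostrophe
MODIFIER_APOS = "\u02bc"    # U+02BC, the modifier letter apostrophe

# Typeable name -> proposed name (assumed table, built from the request above).
RENAMES = {
    name: name.replace(APOSTROPHE, MODIFIER_APOS)
    for name in ("Juǀ'hoan", "Kwak'wala", "K'iche'")
}

# A level-2 header on its own line, e.g. "==K'iche'==" (not "===Noun===").
L2_HEADER = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)

def normalise_l2_headers(wikitext: str) -> str:
    """Rewrite L2 headers whose name is in RENAMES; leave everything else alone."""
    def fix(match: re.Match) -> str:
        name = match.group(1)
        if name in RENAMES:
            return "==" + RENAMES[name] + "=="
        return match.group(0)
    return L2_HEADER.sub(fix, wikitext)

print(normalise_l2_headers("==K'iche'==\n\n===Noun===\n"))
```

A table lookup is used rather than a blanket apostrophe replacement, because some language names would presumably keep the typewriter apostrophe; the unending maintenance concern raised above still applies either way.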

We do have {{subst:x2i}} that will convert the string _> to ʼ, but more helpfully we have (as I mentioned above) {{subst:\}}, which converts a language code to its canonical name. —Mahāgaja · talk 21:55, 16 August 2023 (UTC)Reply

Even with these workarounds, it seems extra work for no gain. There is no rule that says we need to follow native orthography to the T in our English names for languages; otherwise we'd have Deutsch in place of German, and русский in place of Russian, etc. I have seen no arguments that indicate why having these special apostrophes in language names gains us anything except some nebulous sense of "correctness". Benwing2 (talk) 23:07, 16 August 2023 (UTC)Reply

Deutsch is the endonym. What we're talking about here is using the proper Unicode characters for whichever name we decide to use. The apostrophe is a punctuation mark, and the glottal stop is not punctuation. Using the letter for glottal stop is analogous to using en-dashes and minus signs rather than hyphens. kwami (talk) 00:28, 17 August 2023 (UTC)Reply

"Deutsch is the endonym"

Yes exactly. The exonym can have apostrophes while the endonym has Unicode whatever. Nothing wrong with that. Benwing2 (talk) 00:56, 17 August 2023 (UTC)Reply

@Benwing2 I think we’re getting too focused on Unicode. The thing we should care about is what character is actually intended, which isn’t necessarily the same as what they actually wrote. To use an analogy: we don’t lemmatise the palochka with the numeral 1 or Latin l, even though both are probably more common than the actual palochka character, and that’s because we all know that the writer intended to use a palochka irrespective of what character they actually wrote in Unicode. Theknightwho (talk) 02:18, 17 August 2023 (UTC)Reply

@Theknightwho I think we'll just have to agree to disagree here. I don't think the analogy you are making here with palochka is very applicable and you're still missing the point made by User:AG202 about what's the most common usage in scholarly and other English sources. Benwing2 (talk) 02:24, 17 August 2023 (UTC)Reply

@Benwing2 The whole reason I brought it up is as an example of when the most common usage isn’t necessarily an indicator of what’s most appropriate. I’ve also seen plenty of typography mistakes in scholarly sources, too, or fonts that map common characters to a glyph of what is actually intended. You can’t just rely on the codepoint. Theknightwho (talk) 02:27, 17 August 2023 (UTC)Reply

Just to be clear, when I said common usage, I meant what character is actually intended, not necessarily parsing specifically based on codepoints. However, this isn't an easy task for sure, unfortunately. AG202 (talk) 02:49, 17 August 2023 (UTC)Reply

Doesn't matter whether it's the endonym or exonym: the apostrophe is a punctuation mark, and these are not punctuation marks. Yes, we can substitute, and that's common enough. We could also use a hyphen for a minus or a double hyphen for an em dash -- those substitutions are common too -- but that doesn't mean we should do that. We could substitute click letters with exclamation marks and pipes. But if we want Wiktionary to look professional, then IMO we should typeset it professionally, and not use ASCII substitutes just because they're easier to type. kwami (talk) 04:06, 17 August 2023 (UTC)Reply

Support I'm surprised this still hasn't happened yet. MedK1 (talk) 19:43, 29 December 2025 (UTC)Reply

Support 0DF (talk) 20:18, 29 December 2025 (UTC)Reply

I read five votes in favour (kwami, Mahāgaja, MedK1, Theknightwho, and myself) and two votes against (Benwing2 and -sche). It is not clear to me how I should read AG202's comments. 0DF (talk) 20:24, 29 December 2025 (UTC)Reply

@0DF First of all, you are trying to close a two-year-old vote directly after two votes came in in favor of the change, after two years of clear no consensus. That won't work and comes across as trying to game the system. You need to wait a good deal longer for further votes to come in. Secondly, we don't just blindly count votes, we consider the reasons given, and neither yours nor @MedK1's vote came with any reason in favor. Note, for example, that I have postponed renaming Byzantine -> Medieval Greek due to your vociferous opposition, even though there is a 4-2 vote in favor. Third, it appears many of the pro votes are voting "because it's the right thing to do" or "because it's kewler that way", ignoring the larger, long-established consensus that we need to establish common usage in English sources of these funky Unicode quotes (this is what @AG202's comments are about). Fourth, this is a very large change and should definitely be brought up in the Beer Parlour, to get a wider discussion going; we might even need a formal vote. Benwing2 (talk) 00:30, 30 December 2025 (UTC)Reply

The Beer Parlour would probably be best, to attract more attention. kwami (talk) 02:25, 30 December 2025 (UTC)Reply

Oppose per Benwing. I see no reason any English-language written works would write these with anything other than an apostrophe ' or quotation mark ’, and indeed there’s been no evidence to the contrary. Who even uses these substitution templates? Language names should not be treated the same as the standard encoding for entry titles. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 12:33, 30 December 2025 (UTC)Reply

@Benwing2: I wasn't trying to close anything; I was just tallying the votes so far. Now the numbers are five in favour, three against, and one unclear. You're right that reasons should always matter more than votes, however. (And re Byzantine Greek, there is a stronger consensus to split that chronolect out of Ancient Greek than there is to rename it, yet both changes are on hold.) I would actually prefer that we call quc Quiché than Kʼicheʼ, but Kʼicheʼ is better than K'iche'. Many discussions languish here because of the relative obscurity of the forum (at twenty-nine months of age, this one is a case in point) and would be better conducted in the Beer parlour. 0DF (talk) 14:34, 30 December 2025 (UTC)Reply

Could we rename Kutenai (kut) to Ktunaxa, and Shuswap (shs) to Secwepemctsín please? The first names are the Anglicized terms for the languages, and are somewhat outdated and/or not in use among speakers. GKON (talk) 22:46, 12 August 2023 (UTC)Reply

@-sche Can you weigh in here? There is nothing wrong per se with having exonyms for languages (we say "German" not "Deutsch" for example), and I note that Wikipedia still uses Kutenai and Shuswap. The main issue in my view is (a) avoid pejorative terms, and (b) use the most common terms as found in English-language sources. Benwing2 (talk) 23:37, 15 August 2023 (UTC)Reply

For Shuswap, almost no-one uses Secwepemctsín in English, either in books overall as tracked by Ngram Viewer, or in reference works about the language at Glottolog. For kut, Kutenai was the main name (in reference works/Glottolog and overall/Ngrams) until a few years ago, when Ktunaxa started to just barely overtake it. - -sche (discuss) 17:45, 16 August 2023 (UTC)Reply

That is true; however, I would argue that for Shuswap, the use of this term is declining, as seen in Ngram. The replacement looks to be Secwepemc, which is another word for the language that is kind of a good middle ground between Shuswap and Secwepemctsín, wouldn't you say? Also, the actual communities in Secwepemc traditional territory mostly use Secwepemc. For example, if there is some quote or phrase on a billboard in Shuswap, the billboard will say that it's in Secwepemc. Another real-life example was a board in Banff town, which had greetings in multiple languages. Among them were Blackfoot, Stoney, Ktunaxa, and Plains Cree; apart from Ktunaxa, these are all Anglicized terms. However, the greeting in Shuswap was said to be in Secwepemc.

Shouldn't we be using this term, seeing as it gets the most use in these modern times? GKON (talk) 17:09, 20 August 2023 (UTC)Reply

Tagging @-sche and @Benwing2 who are likely to be interested in this. Here is a list of languages that currently lack ISO codes, with a brief explanation as to why they probably justify an L2 code. In a couple of cases, we're never likely to have more than a handful of entries for the language in question due to the scant number of attestations we have, but I don't think that should be used as justification for exclusion.

Malamuthan (dra-mal)
A small tribal language related to Malayalam - we have quite a few of these already, and I see no obvious reason to exclude this one.
I'm having trouble finding any reference works about this; Mikhail S. Andronov (in A Comparative Grammar of the Dravidian Languages and A Grammar of the Malayalam Language in Historical Treatment) speaks of "the Malamuttan dialect". Perhaps we should just wait until someone has content they're wanting to add in this lect, to judge how distinct it is. - -sche (discuss) 19:38, 16 January 2024 (UTC)Reply
@-sche I'm not sure if you've seen it, but pages 37 to 39 of Tribal Languages of Kerala has some information about it, which notes a number of distinctive qualities; not least because they have a very strong tradition of isolating themselves from outsiders. That paper cites a 1981 reference work, but I assume it's in Malayalam. Theknightwho (talk) 14:35, 20 February 2024 (UTC)Reply

Kishtwari (inc-kst)
Closely related to Kashmiri (and sometimes classified as a dialect), but only retains partial mutual intelligibility, and (unlike Kashmiri) appears to be written using the Takri script.
Oppose: I have never seen Ka/ishtwari referred to as anything other than a dialect of Kashmiri, alongside Kohistani, Poguli, Rambani, and Siraji. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
@Victar Poguli has an ISO code, so I’m not sure how much value your assertion has. Theknightwho (talk) 08:42, 8 December 2023 (UTC)Reply
And just because an ISO code exists, doesn't mean we on the project should create a language for it. Often times, village dialects have codes just because someone put out a paper on it, not because it's any more unique than any other dialect on the continuum of dialects. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
@Victar It calls into question the value of your statement that you have never seen it referred to as a language, if you’re putting it on the same level as a lect which does, in fact, have a language code. It also directly contradicts your previous statement as to the weight we should put on language codes. There is also the matter of the Takri script. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
It doesn't contradict my opinion at all. In my experience, particularly when it comes to Indo-Iranian, ISO over-assigns language codes, so trying to give a language code to a dialect when even ISO doesn't is saying something. --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
@Victar None of which is relevant to the fact there is evidence it isn’t even written with the same script - please present something more substantive than a personal hunch, or a selective approach to the weight you put on language codes. Theknightwho (talk) 10:29, 8 December 2023 (UTC)Reply
A language written in multiple scripts is practically a hallmark of Indo-Iranian languages and to cite that as a reason to call it a different language would be naive. --{{victar|talk}} 10:39, 8 December 2023 (UTC)Reply
@Victar You’re being highly misleading: when a “dialect” is written in a different script, its speakers do not consider themselves to be speaking the same language, and it’s also highly divergent (to the point where it is tonal, unlike Kashmiri), then it creates a compelling case for separating it out. Theknightwho (talk) 10:44, 8 December 2023 (UTC)Reply
That is such an absurd statement. Script usage is frequently dependent on region and religion. Most literate Kashmiri speakers write in Perso-Arabic but the Hindu population uses Devanagari, regardless of any dialectal differences. Also I can't find any paper that states Kishtwari is any more or less tonal than standard Kashmiri. You're overreliant on a Wikipedia article for your facts. --{{victar|talk}} 11:41, 8 December 2023 (UTC)Reply
@Victar Except this is the Takri script and it is directly related to “dialectal” differences, so your comparison is nonsensical because it shows that script usage in this case is affected by the lect, not other factors like religion. Standard Kashmiri isn’t tonal at all, as you very well know. Theknightwho (talk) 11:48, 8 December 2023 (UTC)Reply
Yes, and the Kishtwari dialect is spoken in the region of the Kishtwar Valley, and the use of Takri is regional. Again, no paper I have read remarks on tone. Unless you can provide a paper, your statement is meaningless. --{{victar|talk}} 11:57, 8 December 2023 (UTC)Reply

@Victar we also have a code for Haryanvi, considered a dialect of Hindi. So should it be removed? Word0151 (talk) 12:48, 8 December 2023 (UTC)Reply
🤷 Plenty of Hindi project users that can decide that. --{{victar|talk}} 01:33, 9 December 2023 (UTC)Reply
Urtsuniwar (inc-unr)
Closely related to Kalasha, but appears to be divergent enough to constitute a separate language with around 70% mutual intelligibility (compare Spanish/Portuguese with 85-90%).
Oppose: Urtsuniwar is a synonym for Kalasha, see Decker (1992). Some speakers just use more Khowar borrowings than others. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
@Victar Patently untrue - numerous references in the sources provided by WP (and elsewhere), and you’ve failed to explain the issue of mutual intelligibility. Theknightwho (talk) 08:45, 8 December 2023 (UTC)Reply
How is it "patently untrue"? Did you read Decker (1992): "Kalasha speakers in the Urtsun Valley sometimes call their language Urtsuniwar." I did explain the "issue of mutual intelligibility" -- speakers of Kalasha use varying degrees of Khowar borrowings. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
@Victar 70% mutual intelligibility is far below the threshold typically used to classify something as a dialect (80-85%) - the fact that one citation says they are the same does not discount the wealth of evidence to the contrary. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
What "wealth of evidence"? The first reference on the Wiki page literally lists Urtsuniwar under "Other Names" for Kalasha, beside Bashgali, Kalashwar, Kalashamon, and Kalash. Shall we make Kalashwar its own language as well? Another reference there is titled, I shit you not, "Kalasha of Urtsun". --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
@Victar Insufficient levels of mutual intelligibility, as stated several times. Theknightwho (talk) 10:32, 8 December 2023 (UTC)Reply

Gorgani (ira-gor)
An extinct Caspian language attested in the 14th century, which appears to have formed a dialect continuum with Mazanderani. Previous discussion here.
Oppose: The few texts we have in Gorgani are almost indistinguishable from Old Tabari, the ancestor of Mazanderani, and should be considered a dialect of it, not its own language. There are actually more differences between Old Tabari and Mazanderani, but, like Classical Persian and Modern Persian, we treat them as the same language, in large part due to their use of an abjad alphabet. @Fay Freak --{{victar|talk}} 19:35, 7 December 2023 (UTC)Reply

@Victar In all seriousness: given you clearly respect the views of Borjian, how do you explain his apparent change in view from the line you quoted from 2004 and his 2008 paper on Gorgani in which he invariably refers to it as a language (not a dialect)? Theknightwho (talk) 22:43, 7 December 2023 (UTC)Reply
By its only being apparent. If you search for such a distinction. I’ve just looked into the 2008 paper again just for you. Normal(ly) people don’t look upon the statistical distribution of the employment of “language” and “dialect” in previous publications to find “changes in view” of linguists. Their views are rarely that sophisticated that one could make meta publications as one does on philosophers, and even then following such a bright shiny object is not an argument. language has multiple languages like sublanguage, including dialect, and one is not only not always anxious to make a distinction, there is usually nothing gained at all from such a “turf war”. All is language and words, rarely isolects or lexemes. Whether or not something should be treated separately is decided long before you realize you could beat the topics of this dichotomy again to fill your publication history.

In this case the talk of “language”, I may argue, is purposefully misleading people, to market one’s publication career. It’s just much more zhoosh to publish about whole “languages” than dialects. But it’s okay to embellish things a bit since the core message of a paper does not hinge on these concepts. All historical sciences use to be much less exact in their design than that of the jurist who has the peculiar task to weigh or find a balance for a final decision. Like how I formulate etymologies in probability terms is secondary to what information is provided, in other words: it is mostly rhetorics to present the material, the related forms, reconstructions, and bibliography—this is the science, the result is of little practical relevance, unlike in the legal art where in the end you get a sentence or recommend an action. There is a principal misunderstanding of what linguistic papers are about here I can make out. Benwing noticed. You take publications of an author and read them with an exactitude that they don’t provide, with “research results” that they didn’t care about. One could enjoy that there are still naive academics whose subjects are recondite enough for their not bewaring of a lawyer around the corner attempting to misinterpret them. Fay Freak (talk) 00:35, 8 December 2023 (UTC)Reply
@Fay Freak This seems like a very cynical answer, and it’s difficult to see how you’re not simply accusing Borjian of academic dishonesty. Also Benwing2 didn’t add anything on this topic - he simply asked for consensus. Theknightwho (talk) 08:48, 8 December 2023 (UTC)Reply

Zemiaki (iir-zem)
Spoken by around 500 people and related to Waigali, but I'm not seeing any indication it should be treated as a dialect in the literature.
Oppose: Morgenstierne (1974) calls it a dialect of Waigali, and Edelman (1999) is unsure, labeling it "jazyk/dialekt". We should play it safe and treat it like a dialect. --{{victar|talk}} 21:46, 7 December 2023 (UTC)Reply

Xiongnu (und-xnu)
Attested ~~only via~~ in Old Chinese records of the language [edit: and potentially some inscriptions - see below], but nevertheless, a handful of terms have been recorded (and we can, at least, make broad reconstructions as to how they would have been read): e.g. the Old Chinese borrowing 谷蠡.

Theknightwho (talk) 16:03, 4 December 2023 (UTC)Reply

Oppose Xiongnu (Old Chinese is Old Chinese). West Galindian is also unattested. Is East Galindian attested outside of borrowings? If not, maybe keep as a substrate language?

Provisional support Zemiaki, Kishtwari, Urtsuniwar, based on the assumption there are no good arguments to keep these together.

Abstain for the others: poorly attested, extinct languages are usually subject to a lot of debate and usually dictionary entries in these don't turn out well, but they at least seem valid. Thadh (talk) 16:25, 4 December 2023 (UTC)Reply

@Thadh The issue with Galindian is that we need to deal with the present situation, since having a single language code for both is simply incorrect. Re Xiongnu, I'm not referring to borrowings - I'm referring to specific records of the Xiongnu language in Old Chinese sources. Theknightwho (talk) 16:30, 4 December 2023 (UTC)Reply

@Theknightwho: Do you mean mentions of terms à la Uindiorix, or do you actually mean texts à la Luwian? Because in the former case, I'm inclined to call it a borrowing rather than an attestation, whereas the second one is fair enough. Thadh (talk) 17:18, 4 December 2023 (UTC)Reply

@Thadh It's a bit tricky - for example, see [9], where Vovin argues (quite convincingly) that they're inscriptions in Xiongnu which used Old Chinese characters for their semantic values, except for terms that needed to be transcribed phonetically, such as titles or personal names. There's obviously precedent for this - compare Japanese, Korean, Vietnamese etc. Theknightwho (talk) 18:01, 4 December 2023 (UTC)Reply

@Thadh: Discussion will be considerably less confusing if people put their Supports, Opposes and Abstains under each individual case rather than grouping them together at the bottom. —Mahāgaja · talk 18:06, 4 December 2023 (UTC)

@Mahagaja: I had quite general remarks: Living languages - split. Unattested languages - no split. Rest - abstain. I think repeating this ten times is a bit overkill. Thadh (talk) 21:12, 4 December 2023 (UTC)

I'm usually sympathetic to adding extinct language X even if it's only attested as quotations/mentions/etc in old records in language Y, as long as we're sure X was a language (and different from, not just a dialect of, Y or another language). With Xiongnu, it seems like no one is sure which of various unrelated ethnolinguistic families the Xiongnu people and language(s) might have been from, or even if it was composed of multiple ethnolinguistic groups. That last part gives me pause. Are scholars generally in agreement that the attested words from the Xiongnu are all in one language, or is this like e.g. "Loup" where it's multiple different languages? (We currently have Category:Loup B language, but this is questionable and it seems good that we don't have any entries.) - -sche (discuss) 21:15, 4 December 2023 (UTC)Reply

@-sche A lot of that lack of certainty comes from two factors:

Because Xiongnu is filtered through Old Chinese characters, any kind of reconstruction therefore relies on us being able to accurately reconstruct the readings of those characters. This is something that is gradually improving, and - for example - we are in a much better position to make this kind of judgment than Pulleyblank was in the 1960s
There’s been a huge amount of (understandable) speculation as to whether the Xiongnu and the Huns were one and the same. If I had to put money on it I’d say they probably were related, but I strongly suspect there was a large dialect continuum involved (just as there was with the Mongolian languages a millennium later). However, I’m certainly not proposing we merge Hunnic with Xiongnu or anything as radical as that. What we do know is that the inscriptions which were found were created by the same Xiongnu who are written about in Old Chinese sources, because they were excavated in the old Xiongnu capital of Longcheng in Mongolia, which was discovered quite recently. The question is whether they’re in Old Chinese or Xiongnu, but I’m inclined to agree with Vovin that the evidence suggests the latter.

Theknightwho (talk) 03:36, 5 December 2023 (UTC)

As the discussion was over 100,000 bytes (and this page as a whole was over a million bytes), and it had not been edited in more than a year, it has been moved to Wiktionary:Language treatment requests/Archives/2020-24#Medieval Greek from Ancient Greek. — This unsigned comment was added by -sche (talk • contribs) at 07:22, 19 January 2026 (UTC).

As variants of Proto-West Germanic. This should hopefully be relatively uncontroversial, since we already have a healthy number of entries in Category:Anglo-Frisian Germanic and Category:North Sea Germanic, and there's a need for these due to both (sub-)families being mentioned in various etymology sections:

No doubt there are many more entries where these could be referred to. Theknightwho (talk) 02:19, 27 February 2024 (UTC)

@Theknightwho Anglo-Frisian is a well-established clade but I'm not so sure about North Sea Germanic. Cf. Wikipedia's comment:

North Sea Germanic, also known as Ingvaeonic /ˌɪŋviːˈɒnɪk/, is a postulated grouping of the northern West Germanic languages that consists of Old Frisian, Old English, and Old Saxon, and their descendants.

Ingvaeonic is named after the Ingaevones, a West Germanic cultural group or proto-tribe along the North Sea coast that was mentioned by both Tacitus and Pliny the Elder (the latter also mentioning that tribes in the group included the Cimbri, the Teutoni and the Chauci). It is thought of as not a monolithic proto-language but as a group of closely related dialects that underwent several areal changes in relative unison.

Benwing2 (talk) 04:36, 27 February 2024 (UTC)

@Victar as a major PWG editor.

Not to mention the fact PWG is already pretty controversial (@Mårtensås had some strong opinions on the topic).

I don't think an etym-only code for either is needed at this time, as the supposed differences were very minor, and we don't represent it in our PWG entries afaik. So while the label signifies a term's distribution, it is still supposedly the same language as any other PWG reconstruction in the model we handle. Thadh (talk) 07:24, 27 February 2024 (UTC)

I've never had a need for either, and North Sea Germanic is generally considered an areal grouping. -- Sokkjō 07:39, 27 February 2024 (UTC)

I can see the argument against NSG, but there is very clearly a need for Proto-Anglo-Frisian based on the etymologies mentioned above. It’s not about whether any particular editor has a need for it themselves, and nobody is suggesting we create separate entries for them outside of PWG. Theknightwho (talk) 11:00, 27 February 2024 (UTC)

@Theknightwho I see you created a category Category:Old Frisian terms derived from North Sea Germanic languages as well as Category:Elfdalian terms derived from North Sea Germanic languages and Category:Elfdalian terms derived from Anglo-Frisian languages. Why did you do that, since this discussion is far from resolved? Benwing2 (talk) 22:29, 27 February 2024 (UTC)

@Benwing2 I've already removed the North Sea Germanic family, as I thought better of it. The question of whether we have an Anglo-Frisian clade is separate from whether we have a protolanguage for it (and that category was created back in November). Theknightwho (talk) 22:35, 27 February 2024 (UTC)

Ignoring the fact that a genetic Anglo-Frisian family is disputed, as far as I'm aware, no one has published "Proto-Anglo-Frisian" reconstructions, not even Boutkan or Siebinga, so we wouldn't even have anyone to cite. -- Sokkjō 00:57, 28 February 2024 (UTC)

@Sokkjo Then someone will need to deal with the etymology sections in those entries. Either we mention Anglo-Frisian reconstructions with a proper language code, or we don't mention them at all. Theknightwho (talk) 01:43, 28 February 2024 (UTC)

Which entries, these: CAT:Anglo-Frisian Germanic? -- Sokkjō 02:11, 28 February 2024 (UTC)

@Sokkjo English welkin (which refers to an "Anglo-Frisian Germanic" term), while Old English hriþer and metegian, Old Frisian hrither, and Saterland Frisian dusse all explicitly give Anglo-Frisian reconstructions. Theknightwho (talk) 02:15, 28 February 2024 (UTC)

Amended. -- Sokkjō 04:16, 28 February 2024 (UTC)

@Sokkjo You should also look at the entries mentioned in the North Sea Germanic list at the top of the thread. Once they're dealt with, I'll close this request as resolved. Theknightwho (talk) 06:44, 28 February 2024 (UTC)

@Theknightwho Before resolving this, we need to clear up whether to let the existing 'Anglo-Frisian' family stand. You created it in November without discussion and it's not clear to me from this discussion whether there's consensus in its favor. Benwing2 (talk) 07:11, 28 February 2024 (UTC)

@Benwing2 To explain the reasoning: I understood it to be an uncontroversial clade, which was reinforced by the existence of Category:Anglo-Frisian Germanic. I may have misunderstood the implications of that category, though. Theknightwho (talk) 07:26, 28 February 2024 (UTC)

@Theknightwho I think what this shows is that all additions of clades, and more generally any addition of languages or families, need discussion beforehand, no matter how uncontroversial they seem. Benwing2 (talk) 07:53, 28 February 2024 (UTC)

@Theknightwho I see you also created the "High German" family back in November. Let me reiterate, you need to not create any more languages or families without discussion. Benwing2 (talk) 01:25, 1 March 2024 (UTC)

Tupinambá has only 3 entries, i, pá and ý, which are already covered by Old Tupi, i, pá and 'y/y. Also, Old Tupi is used as an umbrella term for all Tupi dialects in Wiktionary, so having a separate heading for Tupinambá doesn't make much sense. Trooper57 (talk) 17:11, 9 March 2024 (UTC)

I also wanted to merge Tupinikin (tpk) for the same reason; I just realised there's a page for it. This one is basically blank, except for an empty maintenance category. Trooper57 (talk) 21:15, 9 March 2024 (UTC)

tpw (Old Tupi) got merged into tpn (Tupinambá) in 2022, so we should probably follow suit. I don’t really understand why Tupinikin (tpk) should be merged, though. Theknightwho (talk) 21:52, 9 March 2024 (UTC)

It's the same case of Tupinambá: what they call "Tupinikin language" is the variant of Old Tupi spoken by the Tupinikin people. I called them dialects but the difference is like General American to Southern American English, they differ on pronunciation in some points and call some things by different words, but aren't languages on their own. The category is just gonna stay blank forever as all lemmas will be put in Old Tupi anyway. Also, both Tupinambá language and Tupiniquim language redirect to Tupi language on Wikipedia.

About the code, I chose tpw over tpn because I prefer the name "Old Tupi", since it's neutral. I don't mind changing the code if we keep the name. Trooper57 (talk) 22:44, 9 March 2024 (UTC)

@Trooper57 For reference, ISO merged Old Tupi and Tupinambá to tpn, and the code tpw was deprecated. It also seems that all varieties of Tupi are extinct. If Tupinambá & Old Tupi [tpn] are not significantly different from Tupiniquim [tpk] perhaps they should all be merged into Tupi [tpn]? - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:54, 9 March 2024 (UTC)

It seems theknightwho already said that while I was typing so my comment is now pointless 😞. - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:56, 9 March 2024 (UTC)

Discussion moved from WT:RFM.

It seems that these are the names of the same language. Infovarius (talk) 16:05, 19 July 2025 (UTC)

(Notifying NoKiAthami, RodRabelo7, Trooper57): please discuss. Juwan (talk) 13:24, 20 July 2025 (UTC)

They are the same. Merge Category:Tupinikin language too, which doesn't even exist: it's based solely on Glottolog's list. Trooper57 (talk) 13:58, 20 July 2025 (UTC)

Agree. Tupinamba, Tupinikin, etc., as far as I know, are generally understood to be the same language, with of course different dialects. NoKiAthami (talk) 14:26, 20 July 2025 (UTC)

(Moved from RFM to here. BTW, linking another related discussion, Wiktionary:Beer parlour/2023/September#About the Tupi-Guarani family.) - -sche (discuss) 05:23, 23 July 2025 (UTC)

Pinging @Benwing2. What’s the process to change this? — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 19:43, 17 September 2025 (UTC)

@Polomo We would have to first eliminate uses of the codes to be deprecated. There are no Tupinambá or Tupinikin lemmas, which makes things a lot easier, but there are 186 terms with Tupinambá translations. We need to decide (a) what name to use ("Old Tupi" or just "Tupi"?) and (b) what code to use (keep tpw, or switch to tpn following ISO 639-3? I would suggest the latter). Once we've decided those questions, we do something like this:

Make tpw be an alias of tpn (or vice versa, depending on what is chosen as the canonical code), and fix the small number of requests that reference tpk to reference tpn (or tpw, whatever is canonical).
Delete the category pages for Tupinambá and Tupinikin.
Delete the language entries for Tupinambá and Tupinikin.
Use a bot to switch all uses of tpw to tpn (or vice versa) and change the Translation headings from "Tupinambá" to "Old Tupi" (or whatever name is chosen; if these are really Tupinambá-specific translations, we might want to add an indication of this next to the translation; but they may just be generic Old Tupi terms; someone will have to review them manually). If there are both Old Tupi and Tupinambá translations for the same term, they will have to be cleaned up manually.
Remove the aliases.

I might have the order slightly off here, but it's close enough. I can do most of the steps but some of them require help from someone who knows the language. Benwing2 (talk) 20:02, 17 September 2025 (UTC)
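The alias-then-migrate steps above can be sketched in Python. This is only a minimal sketch on a hypothetical in-memory list of (entry, code, translation) rows: the function name `merge_codes`, the row layout and the sample rows are assumptions made for illustration, not Wiktionary's module data or any bot's real interface. It also assumes tpn ends up as the canonical code, which this thread has not settled.

```python
from collections import defaultdict

def merge_codes(translations, canonical, deprecated):
    """Rewrite deprecated codes to the canonical one and flag clashes.

    translations: iterable of (entry, code, text) rows (hypothetical layout).
    Returns (merged_rows, entries_needing_manual_review).
    """
    # Temporary aliases: every deprecated code resolves to the canonical code.
    alias = {code: canonical for code in deprecated}

    # The "bot" pass: group translation texts under the resolved code.
    by_entry = defaultdict(set)
    for entry, code, text in translations:
        by_entry[(entry, alias.get(code, code))].add(text)

    merged = sorted(
        (entry, code, text)
        for (entry, code), texts in by_entry.items()
        for text in texts
    )
    # An entry that ends up with several distinct canonical translations
    # is one that someone who knows the language has to look at.
    review = sorted(
        entry
        for (entry, code), texts in by_entry.items()
        if code == canonical and len(texts) > 1
    )
    return merged, review


# Made-up sample rows; "entry1"/"entry2" are placeholders, not real pages.
rows = [
    ("entry1", "tpn", "ý"),
    ("entry1", "tpw", "'y"),
    ("entry2", "tpn", "pá"),
    ("entry2", "tpw", "pá"),
]
merged, review = merge_codes(rows, canonical="tpn", deprecated=["tpw", "tpk"])
print(merged)
print(review)
```

Identical translations under both codes collapse into one row, while differing ones are kept and reported, which mirrors the "cleaned up manually" caveat in the bot step.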

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381, Benwing2): Following the various discussions relating to Min in the last month or so, now seems a good time to propose the additional Southern Min varieties which we've been missing:

Zhenan Min (nan-zhn)
Datian Min (nan-dtn)
Longyan Min (nan-lnx) - sometimes grouped as part of Hokkien
Sanxiang Min (nan-zsh) - one of the Zhongshan Min lects; the other two are apparently Eastern Min
Swatow Min (nan-swt) - also known as Shantou
Hoklo Min (nan-hlh) - also known as Hailufeng or Haklau Min; currently etym-only but should be made a full language
Proto-Southern Min (nan-pro) - see Appendix:Proto-Southern Min reconstructions

Although we will want codes for all of these, it might not be desirable to count all of them as separate languages. I also suspect the list is far from complete. Theknightwho (talk) 19:32, 10 March 2024 (UTC)

Support, although (a) are we stuck with the above codes (i.e. are they proposed ISO 639 standard codes)? If not, some of them could stand to be rationalized; (b) we should clarify earlier rather than later whether these should be full or etym codes (although for Chinese I suppose it makes less difference than elsewhere as the L2 header used is always "Chinese"). Benwing2 (talk) 19:37, 10 March 2024 (UTC)

Swatow Min is classified under Teochew, so we do not need additional codes for it. The term "Hoklo" is a bit ambiguous because Hokkien speakers will consider "Hoklo" to refer to Hokkien. The dog2 (talk) 19:44, 10 March 2024 (UTC)

@The dog2 The difficulty with "Teochew" as a name is that it refers to two different things: (1) what Wikipedia calls Chaoshan Min as a whole, and (2) the specific lect as spoken in Chaozhou, which it calls the Teochew dialect. We will still need a code for it either way, but the question is whether it should be an etymology-only code or a full language code. Theknightwho (talk) 19:54, 10 March 2024 (UTC)

The first definition of "Teochew" already has a code for it. It is "zhx-teo". But I'd be open to changing it to be in line with that of the other Southern Min dialects. In Southeast Asia, the term "Teochew" in common parlance is generally understood to refer to the first definition. The dog2 (talk) 20:00, 10 March 2024 (UTC)

@The dog2 Yeah, that makes sense. Just as a side point, the Teochew code was changed to nan-tws with the split of Min Nan, because it makes sense to give all the Southern Min codes the nan prefix, and the pending ISO code is tws. Theknightwho (talk) 20:21, 10 March 2024 (UTC)

@Theknightwho: Thanks for starting this discussion. There are a few issues here.

Zhenan Min might be a confusing name because Southern Zhejiang has both Southern Min and Eastern Min varieties; we may want to look into what other names we can use.
Datian Min might need to split further into Qianlu and Houlu dialects.
Does Longyan Min cover all Southern Min varieties spoken in the prefecture city of Longyan? Otherwise, there are several (sub)varieties of Longyan Min.
Swatow/Shantou should probably not be separate from Teochew - it's rare to consider them different varieties.
I personally prefer Hailufeng over Hoklo for the varieties of Southern Min spoken in Haifeng/Lufeng, since Hokkien may also be called Hoklo.

— justin(r)leung { (t...) | c=› } 20:11, 10 March 2024 (UTC)

@Theknightwho

1. “Zhenan Southern Min” lies within Hokkien, both sociolinguistically & in terms of intelligibility. It’s pretty much an overseas cluster of Hokkien (and not only b/c it arrived by sea), and should be discussed in that context.

2. Yes, but “Datian Min” is not one language. Which “Datian Mins” belong within “Southern Min” (in any meaningful sense) is a question yet to be thoroughly considered.

3. Yes. “Longyan Min” is sociolinguistically not-Hokkien as well as mutually unintelligible vs Hokkien.

4. Yes. (Not sure if the other two are “Eastern Min”, but that’s a whole other ballgame.)

5. Swatow “Min” is part of Teochew, as others have pointed out.

6. Yes, most definitely. BTW, “Hoklo” refers to the language cluster that includes this language, Hokkien, Teochew, Taiwanese, & maybe others. So “Hoklo” & “Haklau” would be cognate non-synonyms, kind of like “Thai” & “Tai”, but not as striking.

7. Maybe the supposed proto-language should be fleshed out first? (+ I apologise if this is obvious, but Kwok’s “reconstructions” seem to be something quite different from what we usually mean by reconstruction. Also note (as with the ONESELF line) how much data it just flat-out ignores or omits (in this case perhaps in order to hang on to the presumed characters-of-etymology 家 & 己). 釆 (talk) 13:45, 11 March 2024 (UTC)Reply

(Notifying Thadh, Tropylium, Surjection): Recently I’ve been adding Beserman Udmurt entries (Category:Beserman Udmurt), and Beserman seems less similar to Udmurt than I initially expected (at least in terms of vocabulary and phonology). Beserman is usually considered to be a 'special' dialect of Udmurt, and it has recently also gained its own written standard. As far as I can see it definitely seems more convenient to create separate Beserman entries. I'm afraid that, if not, Udmurt might get pretty messy, with a Beserman alternative form for most Udmurt entries. A lot of information on the Beserman dialect can be found on http://beserman.ru/. I'll be glad to hear your opinions on this. Илья А. Латушкин (talk) 19:52, 13 March 2024 (UTC)

At minimum most of the Beserman entries so far should not be listed as synonyms. Most are simply the result of a regular sound change from ы /ɨ/ to ө. Currently it seems this is also transcribed on here as /ʌ/ and transliterated as å, where at least the latter seems weird; most often I have seen the sound described as /ə/ (= Finno-Ugric transcription ə̑, which beserman.ru also seems to use). In any case, these could be easily accommodated similar to differences between e.g. English dialects, as alternate pronunciations + spellings (besides, this is not unique to Beserman but is paralleled by other dialects). A few other phenomena also come down to simple systematic pronunciation differences, e.g. the replacement of ӧ by /e/. It is unclear to me (and per current literature, it seems, also to Uralistics at large) how much else really differs between Beserman and even standard Udmurt. --Tropylium (talk) 20:07, 13 March 2024 (UTC)

@Tropylium: The usage of {{synonym of}} stems from my usage of that format in Komi Izhma entries, e.g. асывыы (asyvyy). It's probably indeed a good idea to mark them as altforms, but the issue I have is mostly that Komi Izhma is actually semi-standardised alongside standard Komi, and the same issue is also present in Beserman.

On the differences between it and standard Udmurt, I honestly can't say a lot as I haven't worked too much with the language. It does feature some unique sound changes from the Proto-Permic language that set it apart from the other Udmurt dialects, like being the only Permic lect to (consistently) differentiate between the reflexes of *u and *ü. It also seems to have a national identity separate from other Udmurts. But other than that I would have to refer to Ilya, as they've worked with the language more closely. Thadh (talk) 20:47, 13 March 2024 (UTC)

Sorry, whose *ü and where? Beserman has a few unique-looking cases of /ə/ (< ? *ɨ), but only in words where southeastern Udmurt more generally also shows /ʉ/ (the generally accepted historical scenario is that Beserman arises from the SE dialects of Udmurt, after a migration towards the north leaves them slightly isolated). --Tropylium (talk) 21:03, 13 March 2024 (UTC)

Lytkin's. I'm talking of words like мөнөнө (månånå, “to go”) and зөмөнө (zåmånå, “to dive”). And I do take issue with your identification of the vowel as being a schwa; it most definitely isn't one. If you listen to actual recordings I think you'll agree that it is a low vowel, sometimes even as open as [a]. Thadh (talk) 21:30, 13 March 2024 (UTC)

/ə/ is not my identification but what reference literature insists calling it, e.g. the late Keľmakov's monographs on Udmurt dialectology like Udmurtin murteet (1994), Диалектная и историческая фонетика удмуртского языка (2003). A lot of beserman.ru's recordings do sound more like [ʌ] or [ɐ], I agree. This could be a recent development, also e.g. the loss of ӧ is only post-WW1. --Tropylium (talk) 20:43, 14 March 2024 (UTC)Reply

Overall Permic languages have undergone some shifts in the recent century, also including the delabialisation of ӧ (ö) in practically all varieties of Komi. Since we are primarily a descriptive dictionary of the modern languages (earlier stages are a bonus!) I think we should stick to the modern pronunciation. The transcription of the vowel as å was taken over from Komi-Yazva, which has a very similar vowel written the same way. Thadh (talk) 09:07, 15 March 2024 (UTC)

I know nothing about Udmurt, but I do agree that unless and until Beserman is considered a separate language, its entries should be formatted along the lines of {{alt form|udm|аску|from=Beserman}} rather than as synonyms of primary-dialect forms. —Mahāgaja · talk 21:40, 13 March 2024 (UTC)

@Tropylium I have found some other sound correspondences between Udmurt and Beserman:

1. йырси ~ йөрчө 'hair', кырси ~ көрчө 'son-in-law'

2. кеч ~ кесь 'goat', ӟуч ~ дюсь 'Russian'

3. син ~ синь 'eye', кин ~ кинь 'who', нин ~ нинь 'linden'

4. тэй ~ тей 'louse', дӥсь ~ дись 'clothes', дэрем ~ дерем 'shirt'

5. ӝӧк ~ ӟек 'table', ӝыт ~ ӟөт 'evening', ӝужыт ~ ӟужөт 'high'

6. ньөм ~ ним 'name', йөвор ~ ивор 'news'

7. сылал ~ слал 'salt', плем ~ пилем 'cloud'

Илья А. Латушкин (talk) 18:24, 14 March 2024 (UTC)
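To make the "regular sound change" argument concrete, here is a small Python sketch that applies three character substitutions and checks which of the quoted pairs they account for. The ы → ө and ӧ → е changes are the ones mentioned earlier in this thread; the ӝ → ӟ rule is merely read off correspondence set 5 above. The whole rule set, and the function names `predict_beserman` and `explained`, are assumptions made for illustration, not an established analysis of Beserman.

```python
import unicodedata

# Substitution table, written with escapes so the code points are unambiguous.
RULES = str.maketrans({
    "\u044b": "\u04e9",  # ы → ө
    "\u04e7": "\u0435",  # ӧ → е
    "\u04dd": "\u04df",  # ӝ → ӟ (assumed from set 5)
})

def predict_beserman(udmurt_form):
    """Apply the three substitutions to a standard Udmurt spelling."""
    form = unicodedata.normalize("NFC", udmurt_form)
    return form.translate(RULES)

def explained(udmurt_form, beserman_form):
    """True if the substitutions alone produce the Beserman spelling."""
    return predict_beserman(udmurt_form) == unicodedata.normalize("NFC", beserman_form)

# Pairs quoted in the lists above (Udmurt form, Beserman form).
PAIRS = [
    ("ӝыт", "ӟөт"),
    ("ӝужыт", "ӟужөт"),
    ("ӝӧк", "ӟек"),
    ("син", "синь"),
    ("кеч", "кесь"),
]
for udmurt_form, beserman_form in PAIRS:
    print(udmurt_form, beserman_form, explained(udmurt_form, beserman_form))
```

Pairs that come out unexplained are the ones that would need either further rules or separate treatment, which is the distinction the alternative-form versus separate-entry question turns on.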

FWIW most of this is also within normal phonetic variation for Udmurt dialects, the /Te/ > /Tʲe/ change is the only systematic feature I don't recall seeing reported before (makes sense though, helps for not entirely losing the э/ӧ contrast).

One thing to consider is that even if we created Beserman separately, we'd then still want to note all forms like these in Udmurt entries, just now as etymological cognates rather than pronunciation variants. It might not save substantial work altogether. The etymologist in me at least thinks this would be probably the nicer option though, if you're already creating separate entries anyway. And it would be more consistent also with how we have split Komi-Zyrian and Komi-Permyak, instead of treating them as variants of single "Komi". --Tropylium (talk) 19:43, 14 March 2024 (UTC)

The same thing has come to my mind as well, and at first sight the differences between Komi-Zyrian and Komi-Permyak do not seem to be much larger than those between Udmurt and Beserman.

I've found two more sound correspondences (1. ӟуч ~ дюсь 'Russian', ӟеч ~ десь ‘good’, 2. ньыль ~ ниль ‘four’, выль ~ виль ‘new’) and some Beserman words not found in standard Udmurt (most of them Turkic loanwords), e.g. бикем ‘aunt’, биягам ‘husband's older brother’, бийөм ‘mother-in-law’, ўармиська ‘brother-in-law’, писяй ‘cat’ (also found as ‘писэй’ in dial. Udmurt), … Also some other, more sporadic, vowel correspondences have come up: изьыны ~ узьөнө ‘to sleep’, губи ~ гиби ‘mushroom’, чорыг ~ чорог ‘fish’, сюрес ~ сьөрес ‘road’, бугро ~ бөгра ‘felling’, … Илья А. Латушкин (talk) 08:50, 15 March 2024 (UTC)

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hopefully this ping isn't too noisy. There are two more sources of Chinese lects here at Wiktionary that I have found that may need etym-only codes: qualifiers in thesaurus entries and labels in Module:labels/data/lang/zh. The following table is derived from thesaurus qualifiers (I computed this as part of converting nan codes and qualifiers to appropriate lect codes):

Qualifier	Count	Comment	Wikidata entry (if any)
ACG	1	Does this mean "Anime, Comics, Gaming"? Not a lect.
Anxi Hokkien	2	Need lect code?
Australia	1	Ambiguous
Buddhism	5	Not a lect
Buddhist temple	8	Not a lect
Chinese landscape garden	1	Not a lect
Christianity	1	Not a lect
Classical Chinese or in compounds	1	Ambiguous
Classical Chinese	59	Ambiguous
Classical	8	Ambiguous
Eastern Min; Southern Min	1	Ambiguous
Fuzhou	1	Ambiguous
Guangdong	1	Ambiguous?
Guiyang	1	Need lect code? Per w:Southwestern Mandarin, a subvariety of the Kun-Gui variety of Southwestern Mandarin	Q15911623
Harbin Mandarin	1	Need lect code; a variety of Northeastern Mandarin	Q1006919
Harbin	2	(same as above)
Hong Kong	24	Ambiguous
Hong Kong><tr:pot¹	1	Ambiguous
Hsinchu & Taichung Hokkien	1	??? Do we need two lect codes? Wikidata has a "Taichung Accent" (Q10914070) but it is a variety of Mandarin; can't find Hsinchu Hokkien in Wikipedia or Wikidata
Internet slang	9	Not a lect
Internet	2	Not a lect
Japanese calligraphy	1	Not a lect
Jilu Mandarin	1	Need lect code; primary subdivision of Mandarin	Q516721
Jinhua Wu	1	Need lect code	Q13583347
Korean calligraphy	1	Not a lect
Liuzhou Mandarin	2	Need lect code?	Q7224853
Liuzhou	1	(same as above)
Longyan Min	2	Need lect code (but will likely be transitioning to a full language, see #Additional Southern Min languages); per Wikipedia, a variety of Hokkien, but that may be wrong	Q6674568
Luoyang Mandarin	1	Need lect code; a variety of Central Plains Mandarin	Q3431347
Luoyang	3	(same as above)
Macau	2	a variety of Cantonese? Do we need a lect code?
Mainland China	3	Ambiguous
Mainland	2	Ambiguous
Malaysia	11	Ambiguous
Mandalay Taishanese	1	an overseas variety of Taishanese; Do we need a lect code?
Min	12	Ambiguous
Muping Mandarin	1	Do we need a lect code? This may be a variety of Shandong Mandarin (Q3285432)
Muping	2	(same as above)
Nanchang Gan	1	Need lect code	Q3497239
Northern China	1	Ambiguous
Northern Mandarin	2	Ambiguous
Philippines	1	Ambiguous
Pinghua	1	Ambiguous
Pingxiang Gan	3	Do we need a lect code? A variety of Yiliu Gan Chinese (Q8053438)
Qing Dynasty	1	Not a lect
Sichuanese or Internet slang	1	Sichuanese = zhx-sic; Internet slang = not a lect
Singapore	13	Ambiguous
Son of Heaven	2	What is this? Not a lect.
Southeast Asia; dated or dialectal in Mainland China	1	Ambiguous
Southwestern Mandarin	2	Need lect code	Q2609239
TCM	3	Traditional Chinese Medicine? Not a lect.
Taichung & Tainan Hokkien	1	Do we need a lect code or two? See above under "Hsinchu & Taichung Hokkien" for Taichung Hokkien. Tainan Hokkien is mentioned in Wikipedia as being the prestige dialect of Taiwanese Hokkien but can't find it in Wikidata.
Tainan Hokkien	1	(see above)
Taiwan	24	Ambiguous
Taiwanese	2	Ambiguous
Taiyuan	1	Need lect code? Variety of Jin Chinese	Q10941068
Taoism	1	Not a lect
Thailand	2	Ambiguous
Urumqi	2	Need lect code? Variety of Lanyin Mandarin	Q10878256
Wanrong	1	~~This is a mountain indigenous township in Taiwan; I don't what lect is being referred to, and whether it's even Chinese~~ Refers to Wanrong County in Shanxi; a variety of Central Plains Mandarin, mentioned in the Great Dictionary of Modern Chinese Dialects; apparently a subvariety of Fenhe Mandarin (Q10379509)
Xi'an Mandarin	1	subvariety of Guanzhong Mandarin (Q3431648); not sure if it needs to be distinguished from Guanzhong	Q123700130
Xi'an	1	(same as above)
Xinzhou	3	Need lect code? Variety of Jin Chinese, doesn't seem to have Wikidata entry
Yinchuan	1	Need lect code? Variety of Lanyin Mandarin
Yongchun Hokkien	1	Need lect code?	Q65118728
Yudu Hakka	1	Need lect code?	Q19856416

There are 14 lects among the above qualifiers with Wikidata entries that I could find, and some others apparently without Wikidata entries that might need a code. Benwing2 (talk) 03:12, 18 March 2024 (UTC)
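For anyone wanting to re-tally a table like the one above, here is a minimal Python sketch. It assumes the table is available as tab-separated text with the four columns shown (qualifier, count, comment, optional Wikidata ID); the function names `triage` and `summarize`, the three bucket names and the handful of sample rows copied from the table are illustrative choices, not part of any existing Wiktionary tooling.

```python
from collections import Counter

# A few rows copied from the table above, as tab-separated text.
SAMPLE = """\
Buddhism\t5\tNot a lect
Hong Kong\t24\tAmbiguous
Harbin Mandarin\t1\tNeed lect code; a variety of Northeastern Mandarin\tQ1006919
Jinhua Wu\t1\tNeed lect code\tQ13583347
Taoism\t1\tNot a lect
"""

def triage(comment):
    """Sort a free-text comment into one of three rough buckets."""
    text = comment.lower()
    if "not a lect" in text:
        return "not a lect"
    if text.startswith("ambiguous"):
        return "ambiguous"
    if "lect code" in text:
        return "lect code candidate"
    return "other"

def summarize(tsv):
    """Return (uses per bucket, qualifiers that carry a Wikidata ID)."""
    uses = Counter()
    with_wikidata = []
    for line in tsv.splitlines():
        fields = line.split("\t")
        qualifier, count, comment = fields[:3]
        qid = fields[3] if len(fields) > 3 else ""
        uses[triage(comment)] += int(count)
        if qid:
            with_wikidata.append(qualifier)
    return uses, with_wikidata

uses, with_wikidata = summarize(SAMPLE)
print(uses)
print(with_wikidata)
```

Rows whose comment is only "(same as above)" would land in "other" with this rule and would need to inherit the bucket of the row they refer to.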

@Benwing2 Thanks for putting this together. On Longyan Min in particular, it's likely going to be separated out as a full language as per #Additional Southern Min languages, despite Wikipedia calling it a variety of Hokkien. Theknightwho (talk) 03:27, 18 March 2024 (UTC)

@Theknightwho Ah, I see that now, thanks. Benwing2 (talk) 03:33, 18 March 2024 (UTC)

@Benwing2: Wanrong refers to Wanrong County in Shanxi; this is a variety of Mandarin (Central Plains IIRC). — justin(r)leung { (t...) | c=› } 03:32, 18 March 2024 (UTC)

@Theknightwho, Justinrleung Only pinging the people who responded to part 1 above. Here are the uncoded Chinese varieties with labels in Module:labels/data/lang/zh. As above, some have Wikidata items and some are too unspecific or ambiguous to turn into etym-only lects. Some are also clearly full languages or even families.

Canonical label	Label aliases	Comment	Wikidata item (if any)
`dialectal Cantonese`	—	Not specific enough
`Changzhounese`	`Changzhou dialect`, `Changzhou Wu`	subvariety of Northern (Taihu) Wu	Q1021819
`Chuzhou Wu`	`Chuzhou dialect`, `Lishuinese`, `Lishui dialect`, `Fujian Wu`, `Lishui Wu`	a variety of Chu-Qu Wu, a Southern Wu language; confusable with Quzhou Wu; not in Wikidata?
`Coastal Min`	`coastal Min`	Not specific enough
`Datian Min`	—	likely becoming a full language	Q19855572
`dialectal Eastern Min`	`dialectal Min Dong`	Not specific enough
`Gansu Dungan`	—	basis of the Soviet written standard for Dungan; not in Wikidata?
`dialectal Gan`	—	Not specific enough
`Guangxi Mandarin`	—	This is possibly the same as Guiliu (Gui-Liu) Mandarin (supervariety of Guilin Mandarin)	Q11111664
`dialectal Guangxi Mandarin`	—	Not specific enough
`dialectal Hakka`	—	Not specific enough
`Hong Kong Hakka`	—	Mentioned in the Wikipedia w:Hakka Chinese article	Q2675834
`Huzhounese`	`Huzhou dialect`, `Huzhou Wu`	subvariety of Northern (Taihu) Wu	Q15901269
`Inland Min`	`inland Min`	Not specific enough
`Jianghuai Mandarin`	`Jiang-Huai Mandarin`, `Lower Yangtze Mandarin`, `Huai`	primary branch of Mandarin	Q2128953
`Jiaoliao Mandarin`	`Jiao-Liao Mandarin`	primary branch of Mandarin	Q2597550
`Jilu Mandarin`	`Ji-Lu Mandarin`	primary branch of Mandarin?	Q516721
`dialectal Jin`	—	Not specific enough
`Korean Classical Chinese`	—	Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped
`Linshao Wu`	`Linshao`, `Linshao dialect`, `Lin-Shao Wu`, `Lin-Shao dialect`, `Lin-Shao`	subvariety of Northern (Taihu) Wu; not in Wikidata?
`Liuzhou Mandarin`	—	a variety of Southwestern Mandarin	Q7224853
`dialectal Mandarin`	—	Not specific enough
`Min`	—	Not specific enough
`Nanning Pinghua`	—	a variety of Southern Pinghua Chinese; not in Wikidata?
`North America`	`North American`	Not specific enough
`Pinghua`	—	A family, not a language
`Shaoxing Wu`	`Shaoxingnese`, `Shaoxingese`, `Shaoxing dialect`	variety of Linshao Wu, in turn a variety of Northern (Taihu) Wu	Q7489194
`Shehua`	—	its own branch of Chinese	Q24841605
`Shuangfeng`	—	dialect of Old Xiang	Q10911980
`Siyi`	—	a Yue language? Includes Taishanese	Q2391679
`Southern Min`	`Min Nan`	Not specific enough
`dialectal Southern Min`	`dialectal Min Nan`	Not specific enough
`Southern Wu`	—	appears to be a Wu subfamily, including at least three languages
`Standard Written Chinese`	`SWC`	Per User:justinrleung, this refers to Standard Mandarin = Putonghua, different from Written vernacular Chinese which refers to the standard written vernacular varieties of the Qing and Ming dynasties, as opposed to Classical/Literary Chinese (NOTE: Wikipedia's Standard Written Chinese confusingly redirects to Written vernacular Chinese, and Wikipedia's article on that covers time periods from the Ming dynasty to the present, not just through the end of the 19th century)	Q727694
`Sujiahu`	`Su-Jia-Hu Wu`, `Sujiahu Wu`, `Su-Jia-Hu`	a subvariety of Northern (Taihu) Wu
`Vietnamese Classical Chinese`	—	Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped
`dialectal Wu`	—	Not specific enough
`Wuzhou Wu`	`Jinhua dialect`, `Jinhuanese`, `Wuzhou`, `Wuzhou dialect`, `Jinhua Wu`	one of the Southern Wu languages	Q2779891
`dialectal Xiang`	—	Not specific enough
`Xinjiang`	—	subvariety of Lanyin Mandarin? Includes Urumqi Mandarin (Q10878256)
`Xinqu Wu`	`Quzhounese`, `Quzhou dialect`, `Shangraonese`, `Shangrao dialect`, `Xinzhou dialect`, `Xinzhou Wu`, `Quzhou Wu`, `Shangrao Wu`	a variety of Chu-Qu Wu, a Southern Wu language	Q6112429

Benwing2 (talk) 04:32, 18 March 2024 (UTC)

@Benwing2: Huzhounese is Q15901269. Guangxi Mandarin should be approximately the same as Guiliu Mandarin, which is Q11111664. Hong Kong Hakka is Q2675834. Standard Written Chinese is usually referring to the modern standard, whereas Written Vernacular Chinese seems to refer to written vernacular Mandarin in the Yuan, Ming and Qing dynasties.

BTW, Xinzhou dialect as an alias for Xinqu Wu is problematic, since Xinzhou is ambiguous. Xinzhou Jin is a completely different variety from a different Xinzhou. — justin(r)leung _{{ (t...) | c=› }} 06:19, 18 March 2024 (UTC)Reply

@Justinrleung Thank you for finding those entries! I think we should remove all aliases that read 'Foo dialect' and consider only allowing aliases that include the language name in them. It is unfortunate that Wikipedia puts the primary entries for various Chinese lects under 'Foo dialect' instead of 'Foo Wu', 'Foo Jin', etc. for precisely the reason you mention. Even in the case of the same location mentioned, it's quite possible for a given location to have multiple dialects of different languages. Benwing2 (talk) 07:02, 18 March 2024 (UTC)
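
The alias rule suggested above (drop every alias of the form 'Foo dialect') could be sketched roughly as follows. This is only an illustration in Python, not the actual Lua module code; the function name `drop_dialect_aliases` is hypothetical, and the alias list is the one given for Xinqu Wu in the table.

```python
def drop_dialect_aliases(aliases):
    """Keep only aliases that do not end in the bare word 'dialect'."""
    return [alias for alias in aliases if not alias.endswith(" dialect")]


# Aliases listed for Xinqu Wu in the table above.
xinqu_aliases = [
    "Quzhounese", "Quzhou dialect", "Shangraonese", "Shangrao dialect",
    "Xinzhou dialect", "Xinzhou Wu", "Quzhou Wu", "Shangrao Wu",
]

print(drop_dialect_aliases(xinqu_aliases))
```

The ambiguous 'Xinzhou dialect' is dropped, while 'Xinzhou Wu' survives because it names the language group.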

@Benwing2: Thanks for tabulating these.

re: removing aliases that read 'Foo dialect', there are some dialects whose affiliation is not extremely clear, e.g. Huizhou dialect (not to be confused with Huizhou Chinese which is czh) and so we labelled it as "Huicheng dialect" ("Huizhou dialect" would also work but that will certainly be confused with czh).

Often the labels are used to achieve the desired display text rather than the right categories, which is why there is a relatively large number of |_| in {{lb|zh}}. One slightly extreme example would be 鐳#Etymology 2 sense 3,

{{lb|zh|Malaysia|&|Singapore|_|Cantonese|Hakka|Southern Min|;|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang|_|in|_|Hong Kong Cantonese}}

, which is actually representing a large number of lects but it's not categorised properly due to the limits of {{lb}}. This is why sometimes you will find labels like {{lb|zh|Taiwan Hokkien and Hakka}} so that the desired result is achieved, even though it should actually be {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}}.

I would suggest searching for additional items of the form {{lb|zh|Foo|_|Cantonese}} or {{lb|zh|Bar|_|Wu}}, which should unveil more unencoded dialects, some of which may already be covered in the previous section (e.g. something as mundane as {{lb|zh|Xiamen Hokkien}} isn't a recognized label, so it is often input as {{lb|zh|Xiamen|_|Hokkien}}). (This is also why there is a relative abundance of Wu dialects in the labels data, probably the result of some dedicated user who added them.)
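
One way to run such a search over a wikitext dump might look like the sketch below. It is a rough Python illustration under stated assumptions: the function name `underscore_joined_labels` is made up, and the regular expression only handles {{lb|zh|...}} calls without nested templates.

```python
import re
from collections import Counter

# Matches {{lb|zh|...}} calls that contain no nested templates.
LB_ZH = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")


def underscore_joined_labels(wikitext):
    """Count each 'Foo Bar' pair written as Foo|_|Bar inside {{lb|zh|...}}."""
    found = Counter()
    for match in LB_ZH.finditer(wikitext):
        parts = match.group(1).split("|")
        for i in range(1, len(parts) - 1):
            if parts[i] == "_":
                found[f"{parts[i - 1]} {parts[i + 1]}"] += 1
    return found


sample = (
    "{{lb|zh|Xiamen|_|Hokkien}} x {{lb|zh|Foo|_|Cantonese}} "
    "y {{lb|zh|Xiamen|_|Hokkien}} {{lb|zh|Xiamen Hokkien}}"
)
print(underscore_joined_labels(sample).most_common())
```

The output would still need manual review, since |_| is also used to join ordinary words (as in the 鐳 example above), not only a place name and a group name.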

I'll go over the actual individual lects later. – wpi (talk) 12:55, 18 March 2024 (UTC)

Personally I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code.

Australia, Malaysia, Singapore, Thailand etc.: these may need a code for each lect (as appropriate), e.g. Malaysian Cantonese, Thailand Teochew (Malaysia may need to be further subdivided by location, we already have Penang Hokkien) [see also my previous comment]
Guangdong: usually means Cantonese+Teochew+maybe Taishanese+maybe Leizhou+maybe Hainan, this should be replaced accordingly
Hong Kong, Macau: usually refers to the standard form of Chinese (not necessarily Cantonese, but often somewhat influenced by Cantonese) spoken in HK/Macau respectively [zh-HK and zh-MO?]
Taiwan: similar to above [zh-TW?]
Hsinchu & Taichung Hokkien: there may be some need to create code for the Taiwanese Hokkien dialects, but I'll defer to others for this (but IIRC Hsinchu is predominantly Hakka speaking?)
Mandalay Taishanese: might need a code but probably won't be used much
Shehua: a branch parallel to Neo-Hakka (which we call Hakka/which is the only part of "Hakka" that we have coverage of), "She" is likely the more common academic term (but this clashes with She the Hmong-Mien language, both names share the same etymology). [zhx-she?]
- (the ancestor of Neo-Hakka and She is parallel to Paleo-Hakka, but this is another rabbit hole, plus coverage of it is relatively poor)
Anxi Hokkien, Yongchun Hokkien, Muping Mandarin, Wanrong: seems relatively minor to be assigned a code? I'm not certain however.

Some comments (partly based on my observation of the usage in {{lb|zh}} and also based on our[my] plans to increase coverage of dialects), grouped by branch:

Gan: label-wise we usually have Nanchang [gan-nan?], Lichuan [gan-lic?], Pingxiang [gan-pin?], Taining [gan-tai?], Yongxiu [gan-yon?]. These are all locations rather than subgroups (my understanding is that the subgrouping of Gan is quite undeveloped). It's worth noting that our Gan coverage is extremely lacking (due to both lack of data and lack of motivated editors), and most likely we will only have these four locations in the foreseeable future.
Hakka: Sixian may need to be divided into North Sixian/South Sixian. We might also want to add the rest of the Taiwanese Hakka dialects. Coverage of Yudu Hakka [hak-yud?] and Hong Kong Hakka [hak-HK?] seems OK.
Huizhou: this group is too small to have any meaningful subdivision, I think at most we can assign a code to Jixi [czo-jix?].
Jin: I think we could have Taiyuan [cjy-tai?] and Xinzhou [cjy-xin?]. The other dialects have poorer coverage. (I didn't find any usage of Xinzhou Wu)
Wu: besides the mentioned ones, we may also need Danyang Wu? I'll defer to ND381 and Musetta6729.
Eastern Min: representative dialect is Fuzhou [cdo-fuz?], other possible inclusion would be Fuqing [cdo-fuq?] and maybe Ningde [cdo-nin?]. The rest seems too sporadic.
Xiang: Changsha [hsn-cha?], Shuangfeng [hsn-shu?], Loudi [hsn-lou], Hengyang [hsn-hya] are major dialects. The coverage situation is similar to Gan.
Mandarin: the ones mentioned should be added generally.
Pinghua: Southern Pinghua [csp] is usually considered to be part of Yue. Worth noting Nanning Pinghua and Nanning Cantonese are different though.
Cantonese/Yue: I think we should add Siyi Yue [yue-siy?/zhx-siy?] and demote Taishanese [zhx-tai] to a variety of it. The usage of [yue] to refer to Cantonese or Yue is pending discussion. Other ones that could be added include Yangjiang [yue-yan?/zhx-yan?] and Dongguan [yue-don?], while the rest seems to have relatively poor coverage.
Southern Min is already dealt with elsewhere
Puxian Min: I believe this can have Putian [cpx-put?] and Xianyou [cpx-xia?]?

– wpi (talk) 16:37, 18 March 2024 (UTC)

@Wpi Thank you for all the details! I just realized there is a third source of varieties here at Wiktionary, which is the dialectal data found in the data modules for {{zh-dial}}, specifically Module:zh/data/dial. For example, under 討食 / 讨食 you have a whole set of "dialectal synonyms of 要飯 / 要饭 (yàofàn, “to beg for food”)" in addition to the Thesaurus entries for 乞討 / 乞讨 (qǐtǎo) fetched using {{syn-saurus}}. Ultimately IMO we should probably merge the dialectal data in the {{zh-dial}} modules with the Thesaurus entries, but that is another can of worms. For now I'll just note that the {{zh-dial}} data conveniently comes with links to English or Chinese Wikipedia entries so it should be easy to find the relevant Wikidata items. *HOWEVER*, there are an absolute ton of varieties listed; I count 1,122 of them currently. (Of these, 969 have Wikipedia links, but many of these links are to geographic entries rather than dialectal entries.) I doubt all of these varieties need to be assigned etym-only codes. I think one way to pare them down is to go through the dialectal data and count how many synonyms there are for each variety. This should reveal which varieties are important enough to warrant codes (I imagine a lot of the varieties listed have no synonyms at all in the data). Benwing2 (talk) 22:32, 18 March 2024 (UTC)Reply
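
The paring-down idea (count the synonyms per variety, then keep only the well-attested varieties) could be sketched like this. The data shape is an assumption: a dict mapping each headword to a dict of variety name to synonym list, which only loosely mirrors the real Lua data modules. The function names and the toy data are hypothetical.

```python
from collections import Counter


def count_synonyms_by_variety(syn_tables):
    """Total the non-empty synonyms recorded for each variety."""
    counts = Counter()
    for table in syn_tables.values():
        for variety, synonyms in table.items():
            counts[variety] += sum(1 for s in synonyms if s)
    return counts


def varieties_worth_codes(counts, min_count):
    """List varieties with at least min_count synonyms, most attested first."""
    return [v for v, n in counts.most_common() if n >= min_count]


# Toy data for illustration only.
toy = {
    "要飯": {"Beijing": ["要飯"], "Guangzhou": ["乞食", "討飯"], "Luoyang": [""]},
    "明天": {"Beijing": ["明兒"], "Guangzhou": ["聽日"]},
}
counts = count_synonyms_by_variety(toy)
print(varieties_worth_codes(counts, 2))
```

The threshold is a judgment call; varieties with a count of zero are the obvious candidates to skip.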

Please see User:Benwing2/zh-dialect-counts. This table lists all the varieties/dialects found among the dialectal synonym data along with counts, the Chinese dialect group they're in and the Wikipedia link, if any. (There are 2,787 terms currently listed in the data.) I'm thinking we can start with the first 100 or 200 varieties listed, figure out what to do with them, and go from there. Also, the script I wrote to combine the counts with the variety data in Module:zh/data/dial output the following warnings concerning varieties for which there are synonyms but which aren't in Module:zh/data/dial:

WARNING: Found variety 'Luoyang' not in variety data
WARNING: Found variety 'Zhumadian' not in variety data
WARNING: Found variety 'Pingdingshan' not in variety data
WARNING: Found variety 'Zhoukou' not in variety data
WARNING: Found variety 'Xuchang' not in variety data
WARNING: Found variety 'Nanyang' not in variety data
WARNING: Found variety 'Luohe' not in variety data

Benwing2 (talk) 23:24, 18 March 2024 (UTC)
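
The cross-check that produced the warnings above is not shown in the thread; a comparable check might look like the following Python sketch. The function name and both input shapes are assumptions, with only the warning wording taken from the output above.

```python
def missing_variety_warnings(counts, variety_data):
    """Warn about varieties that have synonyms but no entry in the variety data."""
    return [
        f"WARNING: Found variety '{variety}' not in variety data"
        for variety in counts
        if variety not in variety_data
    ]


# Toy inputs for illustration only.
synonym_counts = {"Beijing": 5, "Luoyang": 3}
variety_data = {"Beijing": {"group": "Mandarin"}}

for line in missing_variety_warnings(synonym_counts, variety_data):
    print(line)
```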

@Wpi In response to some of your comments:

As for 'Foo dialect' issues, I think in cases like 'Huicheng dialect' where the affiliation isn't clear, we should just identify them as 'Huicheng Chinese'. It's true that we usually do that for top-level groups but I think it's better in this case than using "dialect".
I will search for labels specified using _ and such. Hopefully the usage isn't too inconsistent.
Concerning your statement "I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code", what is the alternative you are responding to? Is it further full-language splits (e.g. with Southern Min)?
For zh-HK, zh-MO, you say "standard language". If this is Cantonese, maybe we should use yue-HK, yue-MO?
For the specific lect comments, I don't know enough to respond but it all looks reasonable. User:Theknightwho, what do you think of the proposal to demote Taishanese to a variety of Siyi Yue?

Benwing2 (talk) 05:25, 19 March 2024 (UTC)

In re point #2, see User:Benwing2/zh-label-sets. Benwing2 (talk) 06:41, 19 March 2024 (UTC)

OK, only a few uses of labels involving 'Foo dialect', and only one involving a label actually listed in Module:labels/data/lang/zh, which was 𠀫𠀪 (which, BTW, is being RFV'd) using 'Hangzhou dialect':

  28 Huicheng dialect
   4 eye dialect
   3 ancient Chu dialect
   1 title=zh:Grammaire du dialect
   1 southern dialect
   1 some Mandarin with a Southern Chinese dialect
   1 of one's speech of the local dialect
   1 ancient Qi or Wu dialect
   1 ancient Qi dialect
   1 [[w:Luoyang dialect
   1 Sòng-Lǔ dialect
   1 Sichuan dialect
   1 Shaanxi dialect
   1 Northeastern dialect
   1 Ningyuan dialect
   1 Hangzhou dialect

I changed that one usage to 'Hangzhounese' and deleted all the 'Foo dialect' labels. We might want to add something for the 'Huicheng dialect' labels (cf. your mention above of this). Benwing2 (talk) 08:10, 19 March 2024 (UTC)

@Benwing2:

re #3, I'm referring to when we are assigning the codes, i.e. groups like Siyi will have a full code whereas local dialect points like Taishanese will have etym-only codes.

re #4, it's basically Standard Written Chinese as used in Hong Kong/Macau. It should be "written/used" not "spoken" as I previously mentioned. There's a difference between yue-HK (Hong Kong Cantonese) and zh-HK (Hong Kong), it's a bit like Norwegian Nynorsk vs Norwegian Bokmål.

Also pinging @Justinrleung for comments on specific lects. – wpi (talk) 11:31, 19 March 2024 (UTC)

@Wpi OK thanks. As for #3, I agree with your idea of the separation between full and etym-only languages going along group lines. As for #4, I didn't realize there is this difference, but it makes sense. Benwing2 (talk) 15:04, 19 March 2024 (UTC)

Thoughts on Wu codes (locality codes are just suggestions):

Northern Wu subbranches imo don't really need codes but individual localities would be beneficial. Of which:

Changzhounese wuu-chz

Danyangese wuu-dan

Shaoxingese wuu-shx

are in need of codes (due to relative abundance of data, and will also be gaining zh-pron support soon). Some others to consider may include

Cixinese wuu-cix

Huzhounese wuu-huz

and all the other lects currently in Module:wuu-pron/sandbox. We are currently still working on it so it may be worth delaying the addition of these lect codes until we finish the Northern Wu overhaul.

Currently extant Northern Wu localities (Hangzhounese, Ningbonese, Shadi Wu, Shanghainese, Suzhounese) should all be listed under Northern Wu (wuu-nor) in the family tree on Category:Wu language (and any other system that may handle language families).
Southern Wu wise, I believe these would be helpful to have in the future, as we will be adding pages/making modules for them as soon as possible:

Jinhuanese / Wuzhou Wu wuu-jih

Taizhounese / Taizhou Wu wuu-tai

Lishunese / Chuzhou Wu wuu-lis

Shangraonese / Xinzhou Wu wuu-shr

in descending order of importance. I decided to split "Chuqu Wu" as is described on the chart as there is no clear consensus as to how the non-coastal non-Northern Wu bits should be split, but in general these three areas (Wuzhou, Chuzhou, Xinzhou) can be seen reflected in some way.

A Southern Wu code (wuu-sou) should not be made. It is likely not a familial grouping but rather just a term to use to contrast it with Northern Wu. There have been some preliminary studies that investigate whether it does form a coherent family, but results are mixed and sample sizes are small.

Regarding why there are so many Northern Wu localities, yes, muset & I added them, as unlike Hokkien for instance, the sociolinguistic attitude towards these lects is first and foremost the locality rather than the family (which contrasts with the "Hokkien" identity).

@Musetta6729 - only other active Wu editor: let us know if you have any other/conflicting ideas — nd381 (talk) 19:38, 19 March 2024 (UTC)Reply
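
Since several three-letter suffixes have been floated in this thread (e.g. -tai for Taishanese, Taiyuan, Taining and Taizhounese), a quick sanity check on a proposed code list might be useful. The sketch below is only an illustration in Python: the function name is hypothetical, and the checks (no duplicate codes, every code under the wuu- prefix) are assumptions about what one would want, not an existing Wiktionary rule.

```python
from collections import Counter

# Locality codes suggested above for Wu (suggestions only).
SUGGESTED_WU_CODES = [
    ("wuu-chz", "Changzhounese"),
    ("wuu-dan", "Danyangese"),
    ("wuu-shx", "Shaoxingese"),
    ("wuu-cix", "Cixinese"),
    ("wuu-huz", "Huzhounese"),
    ("wuu-jih", "Jinhuanese"),
    ("wuu-tai", "Taizhounese"),
    ("wuu-lis", "Lishunese"),
    ("wuu-shr", "Shangraonese"),
]


def find_code_problems(pairs, parent="wuu"):
    """Return (duplicate codes, codes lacking the parent prefix)."""
    counts = Counter(code for code, _ in pairs)
    duplicates = sorted(code for code, n in counts.items() if n > 1)
    misplaced = sorted(code for code in counts if not code.startswith(parent + "-"))
    return duplicates, misplaced


print(find_code_problems(SUGGESTED_WU_CODES))
```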

@ND381 Thank you! I will probably take all your suggestions. Benwing2 (talk) 20:26, 19 March 2024 (UTC)

Only just got the chance to look at this thread now - in terms of Wu I definitely agree with everything that ND has said so far, just two things I would like to mention:

First: Having Urban Shanghainese as a variety (maybe under something like wuu-ush) along with simply "Shanghainese" (wuu-sha) might be useful. This is due to a variety of reasons, but mainly that Contemporary "Urban" Shanghainese has showcased more convergent evolution with say, Ningbonese or Suzhounese during the last century, and has become more sociolinguistically and identity-wise distinct from many Non-Urban varieties surrounding it. With only the label "Shanghainese" now it is tricky to disambiguate between categories such as:

Primarily urban inventions not used in non-urban varieties, or that have spread out to non-urban regions as still recognisably "urbanite" speech
Common invention/retention in Non-Urban Shanghai varieties that are rare/obsolete/not used in Urban Shanghainese
Inventions in Non-Urban Shanghainese that are not geographically restricted to one specific region of Shanghai
Usage attested in both 1850s City-Center Shanghainese and contemporary Non-Urban, but not Contemporary Urban Shanghainese

Especially because all of this variance is also deeply interconnected with notions of locality, of new and old, of class, ethnicity and other sociolinguistic variables when looked at from an Urban Shanghainese standpoint. All of this has led to the use of ad hoc labels along with the Shanghainese tag like "old-period", "chiefly non-urban/suburban", "rare or obsolete" etc which is definitely not ideal. By having Urban Shanghainese as a variety I expect that this would be easier to manage - and as we go on to add more coverage on Non-Urban Shanghainese varieties we should hopefully be able to have more specific variety codes for lots of the Non-urban Shanghainese varieties too.

The second thing is a bit more minor - Suhujia (蘇滬嘉 - see linked Chinese Wikipedia article) might be a more commonly used term than Sujiahu (蘇嘉滬), which we seem to have now. The grouping seems to be somewhat areal and vaguely defined to me and I am doubtful of the extent to which having it might be useful, but nevertheless it's a fairly widely accepted grouping so thought I would bring this up in case we end up making the decision to add it. Musetta6729 (talk) 04:38, 24 March 2024 (UTC)Reply

Redid Chinese labels

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho I redid the label structure in Module:labels/data/lang/zh. I added missing labels corresponding to the new lects in Module:etymology languages/data, canonicalized the labels to include the group name (e.g. Xiamen Hokkien instead of just Xiamen), and added shorter aliases. Duplication is avoided in something like {{lb|zh|Xiamen Hokkien|Quanzhou Hokkien|and|Zhangzhou Hokkien}} (or equivalently, {{lb|zh|Xiamen|Quanzhou|and|Zhangzhou}}) by a new Chinese-specific label postprocessing function in Module:labels/data/lang/zh/functions, which attempts to remove duplicate group names as well as duplicate occurrences of "Taiwanese" in {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. Please let me know if you don't like the output in specific situations and I will tweak the function. Note that I removed the label Taiwanese Hokkien and Hakka and all its aliases, after converting all occurrences to use multiple labels like {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. I also changed a few categories to better reflect the lect name, e.g. the label Philippine Hokkien now categorizes into Category:Philippine Hokkien instead of Category:Philippine Chinese. Benwing2 (talk) 00:50, 20 March 2024 (UTC)Reply

@Benwing2: Thanks for setting this up. The function looks like it works well generally, but there are some cases where it might lead to confusion, such as {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}} showing up as "Taiwanese Hokkien, Hakka", which could mean the unintended "Hakka (in general) and Taiwanese Hokkien". Perhaps one way to prevent this is to only remove duplicate group names when there is an "and" somewhere in the chain? Is that something that could be done? — justin(r)leung _{{ (t...) | c=› }} 06:56, 20 March 2024 (UTC)Reply

@Justinrleung Yup, I can do that, thanks for the suggestion. Benwing2 (talk) 17:08, 20 March 2024 (UTC)

@Justinrleung This should be done. Let me know if you see anything else needing fixing. Benwing2 (talk) 03:25, 22 March 2024 (UTC)
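
For readers following along, the behaviour discussed in this exchange (strip a repeated group name only when the chain contains "and") could be approximated as below. This is a simplified Python sketch, not the actual function in Module:labels/data/lang/zh/functions: the names `KNOWN_GROUPS`, `split_group`, `dedupe_groups` and `render` are hypothetical, and the removal of a repeated "Taiwanese" prefix is left out.

```python
# Illustrative subset of group names; the real label data is much larger.
KNOWN_GROUPS = {"Hokkien", "Hakka", "Cantonese", "Wu", "Mandarin"}


def split_group(label):
    """Split 'Xiamen Hokkien' into ('Xiamen', 'Hokkien'); group is None if unknown."""
    head, _, tail = label.rpartition(" ")
    if head and tail in KNOWN_GROUPS:
        return head, tail
    return label, None


def dedupe_groups(labels):
    """Drop a group name when the next lect shares it, but only if 'and' is present."""
    if "and" not in labels:
        return list(labels)
    result = list(labels)
    lect_positions = [i for i, lab in enumerate(labels) if lab != "and"]
    for pos, nxt in zip(lect_positions, lect_positions[1:]):
        head, group = split_group(labels[pos])
        if group is not None and group == split_group(labels[nxt])[1]:
            result[pos] = head
    return result


def render(labels):
    """Join labels with commas, letting an explicit 'and' replace the comma."""
    text = ""
    for i, lab in enumerate(labels):
        if lab == "and":
            text += " and"
        elif i == 0:
            text += lab
        elif labels[i - 1] == "and":
            text += " " + lab
        else:
            text += ", " + lab
    return text


print(render(dedupe_groups(["Xiamen Hokkien", "Quanzhou Hokkien", "and", "Zhangzhou Hokkien"])))
print(render(dedupe_groups(["Taiwanese Hokkien", "Taiwanese Hakka"])))
```

Without an "and", the second chain is left untouched, which avoids the ambiguous "Taiwanese Hokkien, Hakka" reading raised above.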

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): Apologies once again for the wide ping, as I haven't received any responses to some of my other pings. I added a bunch of labels for Yue Chinese lects, but it is revealing some issues:

We correctly classify Yue as a family, but it contains only two languages (Cantonese language and Taishanese language). Meanwhile per Wikipedia and Glottolog there are something like seven primary branches:
1. Yuehai Yue, which is more or less Cantonese proper.
2. Siyi Yue, which includes Taishanese.
3. Goulou Yue, most notably including Yulin dialect and its sublect Bobai dialect.
4. Yongxun Yue, with Nanning Yue as the representative dialect.
5. Gaoyang Yue, most notably including Yangjiang Yue.
6. Wuhua Yue.
7. Qinlian Yue, partly intelligible with standard Cantonese.
We are using the code yue for Cantonese proper and zhx-yue for the Yue family, which is inconvenient and contrary to ISO 639-3 usage.

I propose:

Change to using yue for the family and use some more specific code for Cantonese, either yue-can or yue-yue (for Yuehai Yue).
Create L2 languages for each of the above seven groups. We can reuse the "Cantonese language" for Yuehai Yue. This shouldn't entail any real splitting per se as we already have Yue as a family rather than a language.
Demote Taishanese to an etym-only variety of Siyi Yue and assign it a code yue-tai in place of zhx-tai.

Please also note, in the labels I created, the canonical name for each label has "Cantonese" in it for all sublects of Yuehai Yue but "Yue" for Yuehai Yue itself and for all other lects. Almost everything called "Foo Cantonese" (except for variants of standard Cantonese) has an alias "Foo Yue", but not the other way around. For example, the Dongguan dialect is called "Dongguan Cantonese" because it is a variety of Yuehai Yue, and has "Dongguan Yue" as an alias; but the Yulin dialect is called "Yulin Yue" and does NOT have "Yulin Cantonese" as an alias, since it is a variety of Goulou Yue rather than Yuehai Yue. Benwing2 (talk) 22:17, 28 March 2024 (UTC)Reply
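
The naming convention just described can be stated compactly as a small rule. The Python sketch below is illustrative only: the function name, its parameters and the branch strings are assumptions, and the real labels are defined by hand in the Lua data module.

```python
def yue_label_names(place, branch, standard_variant=False):
    """Return (canonical name, aliases) for a Yue lect under the convention above."""
    if branch == "Yuehai":
        # Sublects of Yuehai Yue are 'Foo Cantonese'; variants of standard
        # Cantonese are the stated exception and get no 'Foo Yue' alias.
        aliases = [] if standard_variant else [f"{place} Yue"]
        return f"{place} Cantonese", aliases
    # Lects of every other branch are 'Foo Yue', with no 'Foo Cantonese' alias.
    return f"{place} Yue", []


print(yue_label_names("Dongguan", "Yuehai"))
print(yue_label_names("Yulin", "Goulou"))
```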

Thanks for the ping. Here are some of my questions, to make sure I understand this better:

What would the categories of a normal entry like 不嬲 look like? I'm asking this because "Cantonese" and "Taishanese" are more recognisable than "Yuehai Yue" and "Siyi Yue" and I'm wondering if these more obscure names would end up in the entry. If this works like the other Chinese splits, I suppose the categories would not change, and just the categories of the categories would change?
We have plans (maybe) to include more Yue languages than just Cantonese and Taishanese, which primarily means expanding the scope of the "pronunciation" section of the entries, and this would also generate more categories. Would your proposal benefit this project because we could more easily categorise the new Yue languages to come?
While normal entries written using Chinese characters have the "Chinese" L2 header, romanisations have their respective header per language, such as xiànglái having the Mandarin L2 header and boán-liân having the Hokkien L2 header. We don't seem to do the same for Cantonese, and the pronunciation sections also don't link to the Cantonese romanisations, and I also can't seem to find any Cantonese L2 header. This might have been decided in an earlier policy that I don't know about, so I guess my question is, would it create problems if you demote Taishanese to an etym-only language?
Per your last point I tried to google "Yulin Yue" but the main results are about someone named Yulin Yue, so I tried to google "Yulin Yue" + language and got 235 hits, while "Yulin Cantonese" got me 73 hits (and "Yulin Cantonese" + language got me only 8 hits). This isn't a question per se, just a comment about how little-known other Yue languages are.
I feel like I just have to insert a comment about the choice of Mandarin exonyms vs. Cantonese exonyms vs. endonyms. I think the first option is generally how we do things (except for the names of the main branches), and I suppose this is just the result of the general scholarship, and I'm not really trying to subvert this practice, but I would just like to raise some awareness to this phenomenon.

The above. Apologies if 1999. --kc_kennylau (talk) 23:01, 28 March 2024 (UTC)

@Kc kennylau Thanks much for the detailed questions! In response to your questions, let me see if I can answer:

There are two types of categories: (1) L2 language categories (e.g. Category:Mandarin lemmas); (2) etym-language categories (e.g. Category:Xi'an Mandarin). Under my proposal, we would probably use "Cantonese" in place of "Yuehai Yue" as the L2 language name, since they seem more or less equivalent; but "Siyi Yue" would be the L2 language subsuming Taishanese. This means that a Taishanese term would be categorized both under Category:Siyi Yue lemmas and Category:Taishanese Yue (or maybe just Category:Taishanese; there is some flexibility in the choice of etym-language categories). So essentially, things like Category:Taishanese lemmas would go away in favor of Category:Siyi Yue lemmas + Category:Taishanese Yue, but Category:Cantonese lemmas would remain (possibly with additional more specific categories like Category:Guangzhou Cantonese or Category:Hong Kong Cantonese, both of which already exist).
This proposal is somewhat orthogonal to how we handle the pronunciation section entries; the ones for Cantonese and Taishanese can remain as-is, but might categorize differently (as explained above).
If there were romanizations under a Taishanese header, they would have to be renamed to have Siyi Yue as the header and a label Taishanese attached, to make it clear that the romanizations are specifically Taishanese. (Similarly, entries like boán-liân used to be under a Min Nan header before Hokkien got split out as an L2 language.) But since we don't seem to have any such romanizations, this issue won't arise (at least for now).
As for the obscurity of Yue varieties other than Cantonese and Taishanese, I completely agree. The terminology isn't well-worked out and the term "Cantonese" is particularly problematic since it variously refers specifically to (a) the speech of Guangzhou specifically; (b) the more general Yuehai Yue language that Guangzhou speech is part of [which is what I'm defining it as]; and (c) the entire Yue family. This issue doesn't seem to come up so much for other groups like Mandarin and Wu.
As for Mandarin vs. Cantonese/Yue naming, I am not wedded to using the Mandarin terms; I just chose them because that is what Glottolog and Wikipedia largely use. If the consensus is to use Cantonese-language terms for all lects or to use native terms (endonyms), we can do that as well. I am guessing the Mandarin terms see more usage just out of a sort of default familiarity (pretty much everyone who works with Chinese languages is familiar with Mandarin but many aren't familiar with Cantonese or other varieties, and several Yue varieties don't even have standard romanization schemes). Benwing2 (talk) 23:50, 28 March 2024 (UTC)Reply
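
To make the first answer above concrete, the proposed categorization could be sketched as a lookup from an etym-only lect to its parent L2 language. This is a hypothetical Python illustration, not how the category modules are written; the mapping name, the function name and the choice of "Category:Taishanese Yue" over "Category:Taishanese" are assumptions.

```python
# Etym-only lect -> parent L2 language under the proposal (illustrative subset).
PARENT_L2 = {"Taishanese": "Siyi Yue"}

# Preferred etym-language category names; the exact names are still open.
ETYM_CATEGORY = {"Taishanese": "Category:Taishanese Yue"}


def proposed_categories(lect):
    """Return the categories a term in the given lect would receive."""
    parent = PARENT_L2.get(lect)
    if parent is None:
        # The lect is itself an L2 language, e.g. Cantonese.
        return [f"Category:{lect} lemmas"]
    return [f"Category:{parent} lemmas", ETYM_CATEGORY[lect]]


print(proposed_categories("Taishanese"))
print(proposed_categories("Cantonese"))
```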

I support the move in general (with a strong preference of using yue-can), however here's a couple of problems I can foresee with this proposal:

Goulou actually forms a dialect continuum with Southern Pinghua language, and therefore nowadays [csp] is usually thought of as part of Yue, but weirdly it has a separate language code. Should [csp] be included as well?
Yongxun is a (quite recent) descendant of Cantonese spoken in the major towns and cities in the Pearl River with minor influences from the substrate Goulou varieties. Personally I don't think it should be a separate branch.
As I mentioned before, there are (at least) two distinct varieties of Yue spoken in Nanning, we currently call them Nanning Cantonese (under Yongxun) and Nanning Pinghua (under Goulou-Southern Ping). How can the two be distinguished if it is renamed to "Nanning Yue"?

– wpi (talk) 04:19, 29 March 2024 (UTC)

@Wpi Thanks very much for responding. In response to your issues:

I don't know enough about Pinghua to answer, but I note that Wikipedia's Pinghua article asserts that Pinghua has been treated as its own dialect group, separate from Yue, in most textbooks and surveys written since the 1980's. As for dialect continuums, there are many places where different branches form dialect continuums with each other but are still separated. (As an example, Western Bulgarian forms a dialect continuum with Torlakian, which in turn forms a dialect continuum with (other varieties of) Serbo-Croatian. Serbo-Croatian is considered a Western South Slavic language and Bulgarian an Eastern South Slavic language; despite what the Wikipedia article on Torlakian says, it's more often considered part of Serbo-Croatian than Bulgarian.) Maybe User:Justinrleung or User:沈澄心 can comment? There's an additional issue that if we group Southern Pinghua with Yue, what do we do with Northern Pinghua?
Likewise I don't know enough about Yongxun Yue to have a firm opinion; in any case it seems like we won't have any lemmas in it, so whether we make it its own L2 or group it with some other L2 (which one? Cantonese or Goulou?) wouldn't make much difference.
I think this is only an issue if (1) we leave Yongxun as its own group and (2) we put Southern Pinghua under Yue. If Yongxun is e.g. grouped with Cantonese and Pinghua left as-is, the current names are fine. If both dialects get considered non-Cantonese Yue, then one solution is to clarify them as 'Nanning Yongxun Yue' and 'Nanning Pinghua Yue' or something.

Benwing2 (talk) 04:55, 29 March 2024 (UTC)

I would prefer to have Southern Pinghua be kept as its own group separate from Yue. It seems that generally speakers of Southern Pinghua would call their varieties Pinghua, distinguished from Baihua (traditionally Yue varieties). The situation in Nanning is a case in point.
I don't have a strong opinion on whether Yongxun should be a branch. The Language Atlas of China does mention a few criteria for separating Yongxun out as its own branch, but it seems like those criteria are retentions rather than innovations (from a cursory glance).

— justin(r)leung _{{ (t...) | c=› }} 18:43, 20 May 2024 (UTC)Reply

There have been some discussions, and for reference this is our current categorization:

Gwangfu Yue (廣府片) / Yuehai Yue (粵海片): the "main" branch of Yue that contains Cantonese (廣東話), which is the dominant language (besides Mandarin) within the Yue Chinese lects. Our current approach is to group other (more recent) descendants as sub-branches of this branch.
1. Guan-Bao Yue (莞寶片/莞寶小片): contains Dongguan Cantonese (東莞話) which is genetically close to Cantonese but might be a bit hard to understand for Cantonese speakers because of the differences in phonology. Some classify it as a sister-branch of Gwangfu, but I think we prefer to group it under Gwangfu.
2. Yong-Xun Yue (邕潯片/邕潯小片): contains Nanning Cantonese (南寧白話). Again this branch is sometimes considered separate from Gwangfu.
3. Sanyi Yue (三邑小片): the Cantonese spoken in Sanyi (literally "three counties") is highly intelligible with Cantonese, but I want to group them together because they share the innovation that their Tone 4 ("light level") is particularly high.
4. Xiangshan Yue (香山小片): contains Shiqi Cantonese (石岐話).
Siyi Yue (四邑片): the second most famous branch of Yue that contains Taishanese (台山話). This branch is particularly distinct within Yue, and there should be no debate over the status of this branch.
Gao-Lian Yue / Gao-Lei Yue (高廉片/高雷片): (the Lian 廉 here refers to the River Lian 廉江, which is unrelated to the Lianzhou 廉州 below, which is 145 km apart.) this branch is a merger of the traditional categories Gao-Yang Yue (高陽片) and Wu-Hua Yue (吳化片). The brief reason for this merge is that Gaozhou Cantonese (高州白話, the Gao of Gao-Yang) is also sometimes classified with Wu-Hua Yue, so I think it's better to just merge the two branches. I chose this name because it was also used in earlier classifications for more-or-less the same span. This covers the Yue lects spoken in the Prefectures Yangjiang (陽江), Maoming (茂名), and Zhanjiang (湛江).
Qin-Lian Yue (欽廉片): this category has more-or-less stayed the same across different classifications, but there are also (scholarly) opinions that this is more a regional grouping instead of a proper genetic branch. The following sub-branches have also been proposed in a paper where Qin-Lian is challenged (where I have removed Qinzhou Cantonese (欽州白話) which we consider to be a descendant of Cantonese instead):
1. Lianzhou Yue (廉州小片)
2. Lingshan Yue (靈山小片)
3. Xiaojiang (小江小片)
4. Liuwanshan (六萬山小片)
Gou-Lou Yue (勾漏片): this category is also quite consistent, with the main distinguishing feature being that voiced stop initials in Middle Chinese tend to become unaspirated. It is also quite distinct among the Yue lects. This lect is primarily spoken in Guangxi instead of Guangdong.
1. Luo-Guang Yue (羅廣小片): this is the Gou-Lou Yue which is spoken in Guangdong. It might be a misnomer because the Luo stands for the city of Luoding (羅定) in the prefecture of Yunfu (雲浮), but there might be no Gou-Lou Yue spoken there.

(Notes for non-Chinese speakers: 片 = branch, 小片 = sub-branch, 話 = dialect.)

There are some remaining problems:

Where does the name "Cantonese" belong? Should the sub-branches of Gwangfu Yue also bear the label "Cantonese"?
I support using yue for the whole branch and yue-can for "Cantonese" proper.
How should we treat sub-branches? Should they have their own codes?
Should the names be A-B Yue or AB Yue?

I am also pinging the Chinese editors again for more opinions. (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): --kc_kennylau (talk) 14:24, 23 May 2024 (UTC)

Note that the proposed tree above is solely proposed by Kenny, and certain parts of it lack any sort of substantial discussion.

I strongly disagree with the proposed "Gao-Lian"/"Gao-Lei" group, as it clearly includes at least two groups with vastly distinct phonological features: Wu-Hua (1) has a three-way contrast with its voiced/implosive stops and (2) pronounces MC affricates (精 series) as dentals, while the Gao-Lei and Liangyang groups (1) only have a two-way contrast and (2) pronounce MC affricates (精 series) as affricates - among many other differences. Note that the reason why Wu-Hua is sometimes described as Gao-Lei (e.g. in Zhan Bowei's 廣東粵方言概要) is most likely due to the lack of data on Wu-Hua. I should also note that Wu-Hua is sometimes considered to be an incoherent group, but regardless that should not result in placing the entirety of Wu-Hua with Gao-Lei. As to the question of whether Liangyang is distinct or not, it seems to me that the arguments for a separate Liangyang group are stronger, especially because it has a tone system distinct from the surrounding dialects and an inflectional personal pronoun system for 1/2/3pl that is much more similar to Siyi.

Essentially, my view is identical to the divisions in Language Atlas of China (but not the classification of certain lects), with the exception of placing Yong-Xun under Guangfu (since the Yong-Xun "features" are also found in a lot of modern Guangfu lects or historical dictionaries/rime books, and it is well known that Yong-Xun is descended from Guangfu) and splitting out Liangyang from Gao-Yang (Yangjiang data is not mentioned at all in the Atlas!), and perhaps also splitting out Guan-Bao and Xiangshan (according to 廣東粵方言概要), but I am uncertain as to their position within the tree.

Moreover, it would be splitting hairs when we go for the subgroups (小片), as research is often lacking beyond first level groups (even if there is research being done, often there is only one work to reference from).

Some further comments:

I think the usage of "Cantonese" among Yue lects should be relatively liberal - the general rule would be to apply it to any Guangfu lect and any dialect described as 白話, e.g. Qinzhou, Gaozhou, Nanning.
Agree with the use of yue for the whole branch and yue-can for Standard Cantonese (i.e. what we are currently using yue for).
Regarding the use of hyphen, it should be present when the name is a combination of two names. Goulou is named after the mountain of Goulou, so there shouldn't be a hyphen.

– wpi (talk) 16:10, 23 May 2024 (UTC)

Thanks, Kenny and Wpi. I generally agree with Wpi's points. Kenny's Gao-Lian/Gao-Lei should be at least two groups: Gao-Yang and Wu-Hua. I don't have a strong opinion on whether Gao-Yang should be split further. As for the structure of the tree, such as whether certain groups belong under certain groups, I feel like we can be agnostic and have them placed under Yue without thinking too much about the internal groupings; this would mean we could have Yong-Xun, Guan-Bao, Xiangshan, etc. as sisters to Guangfu unless we have really strong feelings about the grouping. Luo-Guang seems to be a very erroneous idea that we should not bother adopting at all. — justin(r)leung { (t...) | c=› } 17:38, 23 May 2024 (UTC)

Indeed, I should have emphasized that the tree above is not final, and I only posted it here to attract more discussion. Thank you for bringing that up.

I will talk about the Gao-Lian/Gao-Lei group here first and leave the other points to later replies.

The "three-way contrast" is not as simple as it seems. The evolution of Middle Chinese stops in Wu-Hua is not consistent. According to 粤语“吴化片”商榷 (2016) by 邵慧君, Middle Chinese *b- became /pʰ/ in Wuyang, and in Huazhou it was distributed (irregularly) between /p/ and /pʰ/. Using Jyutdict I was able to verify this (see table below). Note how 婆 became /p-/ in Shangjiang and /pʰ-/ in Xiajiang, and 抱 is the other way round. According to the paper, *p- became /ɓ-/ in Wuyang just like in Huazhou, but even so, since *b- became universally /pʰ-/ in Wuyang, that would only be a two-way contrast. Of course, the "number" of labial plosives isn't the important point here, but rather "how" they correspond with Middle Chinese and with each other. The situation becomes even more complicated if we account for the influence of dominant languages in this area, and I believe that *b- > /pʰ-/ in Wuyang is the effect of Hakka.
In summary, if you take *p- > /ɓ-/ as the defining feature of Wu-Hua, then it fails because it is not universal (even though you might attribute the remaining lects that have /p-/ as Cantonese influence); if you take the evolution of *b- instead, then it also fails because it is inconsistent between the lects.
As for pronouncing 精 as dental, if you look at the map in 醉 in Jyutdict, you will find that indeed the four Wu-Hua languages recorded all have a dental /t-/. However, if you keep going up from there, you will find that the dental initials continue to Yulin (鬱林) of Goulou Yue, and then even to Wuzhou (梧州) of Gwangfu Yue. To the right, though disconnected, you will find that Taishanese and Kaiping (開平) of Siyi Yue also have a dental initial. Indeed, it is possible that the dental initial spread from Wu-Hua to Yulin, just like how the guttural "R" spread all throughout Europe. However, I don't see an argument for why it has to be genetic in Wu-Hua in the first place.
According to the paper, Li Jian (李健) said that "鉴江源出粤西信宜市北部山区,南流经信宜、高州、化州、吴川四市入海。......整个流域粤语不但极为相似,而且南北渐变的痕迹也十分明显。" (paraphrase: the dialects of Xinyi, Gaozhou, Huazhou, and Wuchuan form a continuum). I don't think this observation can be attributed to a "lack of data". While the dialect in Gaozhou seems to me to be highly similar to Cantonese, I did find that interestingly the character 坐 has an /-ɛ/ final in Gaozhou and also in the Wu-Hua lects.
As for the Liangyang group, I have not looked a lot into this, so I will take your side and assume that Liangyang should indeed form a group. However, this does not contradict my proposed Gao-Lei group, where there can simply be a Liangyang sub-branch. I do wonder though how you view the "inflectional personal pronoun system" that you mentioned as being "much more similar to Siyi". Do you think Liangyang split off from Siyi, or do you think Proto-Cantonese had such a system that was lost in other lects, or do you think this feature arose by contact between Liangyang and Siyi?

Character	Middle Chinese initial	Tone Category	Zhanjiang (湛江)	Wuyang (吳陽)	Huazhou Shangjiang (化州上江)	Huazhou Xiajiang (化州下江)
巴	*p-	level (平)	/pa/	/pa/	/ɓa/	/ɓa/
怕	*ph-	departing (去)	/pʰa/	/pʰa/	/pʰa/	/pʰa/
皮	*b-	level (平)	/pʰei/	/pʰei/	/pɛi/	/pɛi/
婆	*b-	level (平)	/pʰɔ/	/pʰɔ/	/pɔ/	/pʰɔ/
抱	*b-	rising (上)	/pʰoɐu/	/pʰoɐu/	/pʰɔu/	/pɔ̯ɒu/
鼻	*b-	departing (去)	/pʰei/	/pʰei/	/ɓɛi/	/pɛi/
白	*b-	entering (入)	/pʰaʔ/	/pʰaʔ/	/ɓak/	/pak/

--kc_kennylau (talk) 19:53, 23 May 2024 (UTC)
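
The correspondences in the table above can also be tallied mechanically. The following is a minimal sketch, assuming the readings are transcribed exactly as in the table; the names `LECTS`, `ROWS`, `initial` and `reflexes` are hypothetical helpers, not part of Jyutdict or any Wiktionary module.

```python
# Readings copied from the table above (slashes dropped).
LECTS = ["Zhanjiang", "Wuyang", "Huazhou Shangjiang", "Huazhou Xiajiang"]
ROWS = [
    ("巴", "*p-", ["pa", "pa", "ɓa", "ɓa"]),
    ("怕", "*ph-", ["pʰa", "pʰa", "pʰa", "pʰa"]),
    ("皮", "*b-", ["pʰei", "pʰei", "pɛi", "pɛi"]),
    ("婆", "*b-", ["pʰɔ", "pʰɔ", "pɔ", "pʰɔ"]),
    ("抱", "*b-", ["pʰoɐu", "pʰoɐu", "pʰɔu", "pɔ̯ɒu"]),
    ("鼻", "*b-", ["pʰei", "pʰei", "ɓɛi", "pɛi"]),
    ("白", "*b-", ["pʰaʔ", "pʰaʔ", "ɓak", "pak"]),
]

# Longest initial first, so that "pʰ" is not misread as plain "p".
INITIALS = ("pʰ", "ɓ", "p")


def initial(reading):
    """Return the labial initial of a transcribed syllable."""
    for candidate in INITIALS:
        if reading.startswith(candidate):
            return candidate
    raise ValueError(f"no known labial initial in {reading!r}")


def reflexes(mc_initial):
    """Map each lect to the set of initials attested for one Middle Chinese initial."""
    result = {lect: set() for lect in LECTS}
    for _character, mc, readings in ROWS:
        if mc == mc_initial:
            for lect, reading in zip(LECTS, readings):
                result[lect].add(initial(reading))
    return result
```

On this sample, `reflexes("*b-")` gives a single reflex for Zhanjiang and Wuyang but three different initials for Huazhou Shangjiang, which is the inconsistency described in the post. Seven characters are of course far too few to settle the classification.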

By the way, we have three Yue lects currently covered by zh-pron (see 五), which are Dongguan Cantonese, Yangjiang Yue, and Yulin Yue. (COI: I added them.) Should we have language codes for these three varieties? Something like yue-dgx, yue-yjx, yue-ylx? --kc_kennylau (talk) 14:58, 25 May 2024 (UTC)

(Addendum: we just removed Yulin Yue) --kc_kennylau (talk) 15:00, 25 May 2024 (UTC)

(You mean in addition to the two lects that have been here longer, so actually a total of four Yue lects now.) — justin(r)leung { (t...) | c=› } 15:12, 25 May 2024 (UTC)

Just to help me understand the "lay of the land", are there papers that specifically group the dialects traditionally classified as Gao-Yang and Wu-Hua together? If so, what is the name they use for such a grouping? (From the way this was described above, it feels a little original-researchy, which we don't want to do.) — justin(r)leung { (t...) | c=› } 15:20, 25 May 2024 (UTC)

(cc @Benwing2) After more discussion, @Justinrleung and @wpi have mostly agreed with the following tree (the codes are added by me):

Guangfu Yue (廣府片) yue-guf
Guan-Bao Yue (莞寶片) yue-gub
Xiangshan Yue (香山片) yue-xis
Yong-Xun Yue (邕潯片) yue-yox
Siyi Yue (四邑片) yue-siy
Liangyang Yue (兩陽片) yue-liy
Gao-Lei Yue (高雷片) yue-gal (defined as Gao-Yang in the Atlas minus Liangyang)
Wu-Hua Yue (吳化片) yue-wuh
Qin-Lian Yue (欽廉片) yue-qil
Goulou Yue (勾漏片) yue-gol

I also mostly agree with this, but I would just like to note that Guan-Bao, Xiangshan, and Yong-Xun (and likely Gao-Lei as well) are descended from Guangfu, and the last four branches (Gao-Lei, Wu-Hua, Qin-Lian, Goulou) are more areal than genetic. From what I can gather, the reason this structure is preferred over a more nested one is that the genetic relationships are currently still not all clear, as Justinrleung explained above.

I also don't know if some of the above branches should have "~ Cantonese" as an alias.

--kc_kennylau (talk) 13:30, 26 May 2024 (UTC)
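
For reference, the flat list above can be written down as data and sanity-checked. This is only a sketch: the codes and names are the ones proposed in the list, while `YUE_GROUPS` and `check_codes` are hypothetical names, and the "family prefix plus three lowercase letters" shape is an assumption read off the proposed codes, not a stated Wiktionary rule.

```python
# The ten proposed first-level groups, all treated as direct children of "yue".
YUE_GROUPS = {
    "yue-guf": "Guangfu Yue",
    "yue-gub": "Guan-Bao Yue",
    "yue-xis": "Xiangshan Yue",
    "yue-yox": "Yong-Xun Yue",
    "yue-siy": "Siyi Yue",
    "yue-liy": "Liangyang Yue",
    "yue-gal": "Gao-Lei Yue",
    "yue-wuh": "Wu-Hua Yue",
    "yue-qil": "Qin-Lian Yue",
    "yue-gol": "Goulou Yue",
}


def check_codes(groups, family="yue"):
    """Return a list of problems found in a code -> name mapping."""
    problems = []
    seen_names = set()
    for code, name in groups.items():
        prefix, _, suffix = code.partition("-")
        if prefix != family:
            problems.append(f"{code}: does not start with {family}-")
        # Assumed shape: exactly three lowercase ASCII letters after the hyphen.
        if not (len(suffix) == 3 and suffix.isascii()
                and suffix.isalpha() and suffix.islower()):
            problems.append(f"{code}: suffix is not three lowercase letters")
        if name in seen_names:
            problems.append(f"{code}: duplicate name {name!r}")
        seen_names.add(name)
    return problems
```

Calling `check_codes(YUE_GROUPS)` returns an empty list for the proposal as posted. Dictionary keys are unique by construction, so duplicate codes cannot occur in this representation.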

Agree with the above list of groups. For Wiktionary purposes, we would simply treat all ten of them as direct descendants of Yue without being specific on their relationship. (yue "Yue" would be a family)

On top of these I think we should have the following full code:

yue-can, "Cantonese", equivalent to (some of) the current use of yue, parent yue-guf

and the following etymology codes:

yue-gzh, "Guangzhou Cantonese", equivalent to existing yue-gua, parent yue-can
yue-hkg, "Hong Kong Cantonese", equivalent to existing yue-HK, parent yue-can
yue-tai or yue-hsv, "Taishanese", equivalent to some of the existing zhx-tai, parent yue-siy

The "Cantonese" suffix could be applied to (dialects of) Guangfu, Guanbao, Xiangshan, Yongxun, and other "Baihua" varieties such as Qinzhou and Gaozhou, all of which are often considered to be related to Standard Cantonese.

– wpi (talk) 14:11, 26 May 2024 (UTC)
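
The parent relations proposed above form short chains, which can be illustrated as follows. This is a sketch under stated assumptions: `PARENT` and `lineage` are hypothetical names, `yue-tai` is used for Taishanese although the post leaves the choice between `yue-tai` and `yue-hsv` open, and the first-level groups are given `yue` as parent as suggested above.

```python
# child code -> parent code, as proposed in the post above.
PARENT = {
    "yue-guf": "yue",      # Guangfu Yue, direct child of the Yue family
    "yue-siy": "yue",      # Siyi Yue, direct child of the Yue family
    "yue-can": "yue-guf",  # Cantonese (full code)
    "yue-gzh": "yue-can",  # Guangzhou Cantonese (etymology code)
    "yue-hkg": "yue-can",  # Hong Kong Cantonese (etymology code)
    "yue-tai": "yue-siy",  # Taishanese (code not yet decided in the post)
}


def lineage(code):
    """Follow parent links from a code up to the top of the tree."""
    chain = [code]
    seen = {code}
    while code in PARENT:
        code = PARENT[code]
        if code in seen:
            raise ValueError(f"cycle in parent links at {code}")
        seen.add(code)
        chain.append(code)
    return chain
```

For example, `lineage("yue-hkg")` walks from Hong Kong Cantonese through Cantonese and Guangfu Yue up to the Yue family.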

Agree. --kc_kennylau (talk) 21:24, 29 May 2024 (UTC)

See also: #Merger into Scandoromani

Lattjo dives! I have started to make some more Scandoromani entries, and there are 4 main problems I need to ask for advice about before I can go on.

Problem 1. As far as I understand, Tavringer Romani is Swedish Scandoromani, also known as Traveller Swedish. Tavring is not something exclusively Swedish, and we already have Traveller Norwegian. Might it be a good idea to rename Tavringer Romani to Traveller Swedish? In any case, there is almost no difference between TS and TN, so might it be an even better idea to merge them into one L2 (Scandoromani)? See also Problem 4 about Månsing.

"orthographies are consistently different, which seems to be the case" - said Theknightwho once about this problem. But is it really a good reason?

Problem 2. A more serious one. Some of my first edits on Wiktionary were in Scandoromani, and I was dumb enough not to include sources in most of the entries I created. Now many of my sources are completely gone from the internet. I also remember that some entries (I don't remember exactly which) are not even from sources; I created them together with my former neighbor, an old drunk guy who spoke the language. I mean, I checked them in dictionaries and found them, but some of them I did not find, and now I don't remember exactly which ones, and some of the dictionaries are gone.

Dictionaries I remember but cannot find: an old web 1.0 Norwegian website with a black background; a long English PDF in an ugly monospaced font comparing Scandoromani and Kalo; a scan of an old Swedish book with big fat letters.

Problem 3. What is the "Tavringer Romani terms in nonstandard scripts" category? The script is unspecified, so why is this category coming up?

Problem 4. What to do with Rodi and Månsing? They are jargons of Swedish and Norwegian, so how should we refer to them? I usually refer to them as jargons, using the code "sv" (Swedish) and specifying that the term is also used in Norwegian. I hope it's OK to do so. Otherwise, we may need them as independent L2s. Tollef Salemann (talk) 19:42, 15 June 2024 (UTC)

Guachí is an extinct language known to have been spoken in Argentina in the 19th century; the only record is a word list of 145 words, from 1845. Apparently, it's usually classified as Guaicuruan, but WP says the data is insufficient to demonstrate that. For reference, we already have Appendix:Guachí word list. Theknightwho (talk) 14:18, 17 September 2024 (UTC)

Hi, in the future I'd recommend not adding a language even if you want to, but no one replies to your suggestion to add it in 10 days. In general you need at least one other person to look over and agree with your suggestion. Please don't take silence as consent. In this case you should have pinged User:-sche, who can give you thoughts. I'm personally a bit skeptical as to whether a single word list is enough data to indicate even that it's a separate language as opposed to either a dialect of an existing language or a mishmash of randomly collected words. Benwing2 (talk) 10:19, 28 September 2024 (UTC)

Same thing goes for Kalašma, which you recently added with a similar "silence = consent" assumption. Benwing2 (talk) 10:20, 28 September 2024 (UTC)

Wiktionary's canonical name for the language kla, spoken by the Klamath and Modoc peoples, is currently "Klamath-Modoc", which reflects the fact that the two peoples spoke different dialects. I propose that it be renamed "Klamath", which is the name predominantly (though not universally) used by sources discussing the language.

The Klamath Tribes themselves call the language "Klamath". (The Modoc Nation could conceivably have a stake in the language being called "Klamath-Modoc", but I can't find any references to the language by name on their website.)
Most of the academic literature I can find about the language identifies it as "Klamath". In particular, the works of Albert S. Gatschet and M. A. R. Barker, who each produced by far the most extensive and most cited documentation of the language, call it "Klamath".
The search string "Klamath language" yields significantly more results in both Google Scholar and JSTOR than the string "Klamath Modoc language".
The English Wikipedia article for the language has been titled "Klamath language" since 2011. Also, almost all sources in that article's bibliography refer to the language as "Klamath".

(In the interest of a fully informed discussion, it's worth noting that the following sources use the name "Klamath-Modoc": SIL International, Ethnologue, Glottolog, OLAC, and the California Language Archive.)

— Äþelwulf (talk) 20:56, 24 September 2024 (UTC)

Is there anything I can do to elicit input on this matter? — Äþelwulf (talk) 20:19, 15 October 2024 (UTC)

@Athelwulf Maybe ping User:-sche, who is often involved in these discussions? -sche, can you ping anyone else who you think might have relevant comments? Benwing2 (talk) 21:20, 15 October 2024 (UTC)

BTW the fact that both Ethnologue and Glottolog use the name "Klamath-Modoc" is significant, although not decisive. Benwing2 (talk) 21:22, 15 October 2024 (UTC)

You are right that "Klamath" is the more common term, and although it is hard to be sure how many uses of it mean the language [that encompasses both 'Klamath' and 'Modoc'] and how many mean the dialect ("Klamath-Modoc" is arguably clearer about the scope), probably our preference for using the most common name should lead us to use Klamath here.
It is interesting that there are almost no uses of the native name. ("Klamath" is derived from the Upper Chinook designation for all the natives of the Klamath River Basin, including the Klamath and Karuk and Shasta and Yurok — Modoc is at least [a clipped rendering of] a Klamath-Modoc word for that variety — and Victor Golla, in California Indian Languages (2022), page 135, notes that after "Gatschet used 'Klamath' as the specific ethnographic name for the Indians of the reservation on Upper Klamath Lake and for their dialect of Klamath-Modoc, [...] this usage soon became standard among anthropologists [but] there was [initially] reluctance, however, to extend the term to the Modocs, who had been treated as a separate tribe since the Modoc War of 1872-1873 and their subsequent removal to Oklahoma.") - -sche (discuss) 21:26, 21 October 2024 (UTC)

Hello, I wrote Wiktionary entries for Azerbaijani words written in the Azerbaijani Abjad (Turco-Perso-Arabic alphabet), but some other Azerbaijani users revert all my edits to those pages because the words are "too old for Azerbaijani"; I constantly run into rollbacks with the explanation "this word does not exist in modern Azerbaijani". This happens because the ancestor of Azerbaijani is not properly defined in Wiktionary: it is defined as Old Anatolian Turkish, but that is too ancient an ancestor. For comparison, Turkish (of the Turkish Republic) has Ottoman Turkish as its ancestor and then Old Anatolian Turkish. This is logical, since Ottoman Turkish was used until the 1920s, and it completely solves the problem for Turkish. There is no such solution for Azerbaijani: its ancestor is given in Wiktionary as Old Anatolian Turkish, which was used until the 14th century at the latest, so Azerbaijani has no ancestor covering the period from the 15th century to the early 1920s. (According to various sources, modern Azerbaijani can be dated from 1922-1923, when the USSR occupied Azerbaijan, or from 1928, when the USSR switched Azerbaijani to the Latin alphabet.)
However, historically, the ancestor of Azerbaijani has been considered to be Ajami Turkish (trk-ajm), the "Turkish of Persia". It was the language of the Qajars, Afshars, Qizilbash, Qashqayi, etc., and it is also the ancestor of the Iraqi Turkmen and Sonqori languages, and possibly of the Khorasani Turkish and Khalaji languages as well. For example, in the book The Turkic varieties of Iran, Christine Bulut says (page 406) that the written language for these languages has been Ajam Turkic since the 16th century. It is a good term. I could write the Azerbaijani entries in the Abjad alphabet under this language so as not to run into restrictions, but as I understand it, that is not possible at the moment. Please help me with this issue: I have a lot of literature and I want to create pages for these words, but I keep running into restrictions from other users.

At the moment Azerbaijani language page says that Azerbaijani language comes from:

but it should be

Please create the language category for Ajami Turkish (https://www.wikidata.org/wiki/Q110812703) and make it the ancestor of Azerbaijani. It will look like this: Azerbaijani comes from Ajami Turkish (trk-ajm), which comes from Old Anatolian Turkish:

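m["trk-ajm"] = {
	"Ajami Turkish", -- canonical name
	110812703, -- Wikidata item (Q110812703)
	"trk-ogz", -- family code
	"fa-Arab", -- script code
	ancestors = "trk-oat", -- Old Anatolian Turkish
	entry_name = {["fa-Arab"] = "ar-entryname"},
}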

Sebirkhan (talk) 19:31, 8 December 2024 (UTC)

Branch from: Church Slavonic and Moravian.

Recently Church Slavonic (zls-chs) was created as L2. Everything is fine, but Church Slavonic (CS) is divided into "dialects" (recensions or redactions), the spelling of which is very different in places. There is obviously a need for etymological codes for different variants of CS. The most famous recensions of CS are Croatian, Serbian and Russian. But in reality there are more. What are your suggestions, what can be done in this situation? What codes could be created? AshFox (talk) 09:00, 11 January 2025 (UTC)

I recently studied the situation in the East Slavic variants of CS. Often the East Slavic variant of CS is simply called "Russian Church Slavonic" (RuCS), but this is a very, very simplified term. Because the East Slavic variant of CS that existed in the times of Rus (around 988‒1450) is very different from RuCS that exists now, which is very developed and whose spelling is extremely different from the archaic spelling of the times of Rus. The rules for using letters and spelling in modern RuCS can be found, for example, in Смирнова А. Е. (2024), Церковнославянский язык в таблицах. For example, the Greek name "Xenia" in modern RuCS orthography is written as Ѯе́нїѧ (Ksénija), while in the times of Rus it would have been *Ѯениꙗ (*Ksenija) < Gr. ξενία (xenía). Another significant difference is the presence of the reduced ъ (ŭ) / ь (ĭ) in RuCS during the Rus times and their complete absence now. And so on... In general, there are many differences between the modern RuCS and the RuCS of the Rus times, which does not allow us to perceive the Eastern Slavic version of CS from 988 AD to the present day as one "Russian Church Slavonic".

Recension of CS used in the times of Rus is called "ru:Древнерусский извод церковнославянского языка", which is roughly literally "Old Russian Church Slavonic". But the term "Old Russian" is now obsolete, so I propose calling it "Old East Church Slavonic" with code (zls-chs-orv), by analogy with Old East Slavic (orv). The term "Old East Church Slavonic" is already used on Wiktionary about 230+ times. "Old East Church Slavonic" will designate CS entries of words that are not Old Slavonic and at the same time have characteristic East Slavic features (ru:Древнерусский извод церковнославянского языка#Характеристика).
Afterwards, the "Old Moscow Church Slavonic" (ru:Старомосковский извод церковнославянского языка) comes from it, which was used in Muscovy approximately in the period 1450‒1650. I don't know, maybe it shouldn't be singled out separately, but considered as part of the next stage ‒ from 1650 to the present day ‒ ru:Синодальный извод церковнославянского языка : "Russian Synodal Church Slavonic" (also called "Новомосковский извод церковнославянского языка", literally "New Moscow Church Slavonic"). This modern version of Russian Church Slavonic is very developed; there is a lot of literature and dictionaries on it, for example {{R:cu:Dyachenko:1900}}, {{R:ru:STsSRJa}} or "Большой словарь церковнославянского языка Нового времени". (Different spelling norms and forms of words do not allow the modern Synodal Church Slavonic to be united into one whole with the CS used in the times of Rus ‒ "Old East Church Slavonic".) I propose to name this particular variant "Russian Church Slavonic" with code (zls-chs-ru), from 1650 ‒ present day. Or you can call it "Russian Synodal Church Slavonic" or "Synodal Church Slavonic".
Also, "Old East Church Slavonic" left behind not only "Old Moscow Church Slavonic" (> "Russian Synodal Church Slavonic"), but also the Church Slavonic used on the territory of modern Belarus and Ukraine. There is very little information about it, but it is often called "Ruthenian Church Slavonic" (ru:Украинско-белорусский извод церковнославянского языка), which is also called "Ukrainian Church Slavonic". For completeness, an etymology code can be added for it as well.

The tree part looks like this:

─┬ Church Slavonic (zls-chs)
 ├───┬ Old East Church Slavonic (zls-chs-orv)
 │   ├──── Russian Church Slavonic (zls-chs-ru)
 │   └──── Ruthenian Church Slavonic (zls-chs-rt)

AshFox (talk) 11:50, 11 January 2025 (UTC)
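
The periodization described above for the Muscovite line can be sketched as a simple lookup by year. This is an illustration only: the boundary years are the approximate ones given in the post, treating each period as starting in its first year and ending just before the next is an assumption, Ruthenian Church Slavonic is left out because no dates are given for it, and `PERIODS` and `recension_for_year` are hypothetical names.

```python
from typing import Optional

# (first year, year the next stage begins or None, name) per the post above.
PERIODS = [
    (988, 1450, "Old East Church Slavonic"),     # proposed code: zls-chs-orv
    (1450, 1650, "Old Moscow Church Slavonic"),  # no code proposed; may be merged into the next stage
    (1650, None, "Russian Church Slavonic"),     # proposed code: zls-chs-ru
]


def recension_for_year(year: int) -> Optional[str]:
    """Return the recension name whose period contains the given year, if any."""
    for start, end, name in PERIODS:
        if year >= start and (end is None or year < end):
            return name
    return None
```

A text dated to 1100 would fall under "Old East Church Slavonic" and one dated to 1900 under "Russian Church Slavonic"; years before 988 return `None`.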

Sounds sound. Fay Freak (talk) 12:57, 11 January 2025 (UTC)

Support. Etymology codes are an easy way to add precision without too much complexity. Vininn126 (talk) 12:59, 11 January 2025 (UTC)

In the future, I think it would be desirable to have codes like this (names and codes themselves can be clarified/changed):

Czech-Moravian Church Slavonic (zls-chs-cs)
Bulgarian Church Slavonic (zls-chs-bg)
Macedonian Church Slavonic (zls-chs-mk)
Serbian Church Slavonic (zls-chs-sr)
Croatian Church Slavonic (zls-chs-cr)
Wallacho-Moldavian Church Slavonic (zls-chs-ro)

AshFox (talk) 18:43, 11 January 2025 (UTC)

This division is slowly devolving into the tumultuous state of affairs from the times of Belić and Mladenov. I struggle to see how one can comprehensively differentiate between all of these renditions. Are we going to follow linguistic, historical, or geographical criteria?

For example, what label should one give, say, to Gregory Tsamblak's writings? Did he write in Bulgarian, in Serbian, in Wallacho-Moldavian, or in Ruthenian ChSl.? After all, in different periods of his life, he worked in different places.

Or what label should be given, e.g., to the Didactic Gospels? Its author is Constantine Preslavsky, but all surviving copies were written in medieval Ruthenia.

IMO, if specification is required, just write the concrete source of the word. It will save us the hassle of splitting hairs.

All in all:

Support for renditions that can be localized (Czecho-Moravian, Croatian, Rascian) and

Oppose for the rest. Безименен (talk) 10:25, 19 February 2025 (UTC)

PS Bulgarian/Macedonian Church Slavonic is split into Literary Schools and into different time periods. It is misleading to clump them all together. For example, a text written by the Preslav Literary School would have more in common with Czecho-Moravian from the same time period than with the Tarnovo Literary School, which emerged later, from the 12th century. Безименен (talk) 10:41, 19 February 2025 (UTC)

Support. Chihunglu83 (talk) 21:45, 14 January 2025 (UTC)

See Wiktionary:Beer_parlour/2025/January#"Old"_and_"Orkhon"_Turkic,_plus_some_more for the discussion leading up to this

I request the creation of the following language headers:

AmaçsızBirKişi (talk) 10:06, 25 January 2025 (UTC)

@Benwing2 can u help with this? Zbutie3.14 (talk) 01:50, 15 February 2025 (UTC)

Are these new L2 languages or etymology variants of existing L2 languages? The former require a lot more consensus than the latter. Benwing2 (talk) 02:06, 15 February 2025 (UTC)

I think they are etymology variants. We already have the tags [xbo] and [otk]. [trk-blg-pro] is an etymology tag that deals with the Bulgaric languages [xbo] and [cv], excluding [zkz]; having the ability to distinguish these would be nice. [trk-ajm] is also an etymological language, for [az] and [qxq].

AmaçsızBirKişi (talk) 14:15, 15 February 2025 (UTC)

Also there are these 2 siberian turkic languages with no code

https://en.wikipedia.org/wiki/Soyot_language

https://en.wikipedia.org/wiki/Dukhan_language Zbutie3.14 (talk) 21:52, 15 February 2025 (UTC)

What is proto-bulgaric for? Isn't trk-ogr what we use for oghuric? Zbutie3.14 (talk) 15:17, 15 February 2025 (UTC)

It's mainly for distinguishing which stage of Bulgar the loanwords into other languages derive from, at least in theory.

Mongolic loans from the Oghur branch would be listed under Oghuric, but Hungarian/Church Slavonic loans would be listed under Bulgaric, for instance:

Oghur [trk-ogr]
- Mongolic (borrowed from Oghur)
- (...) (borrowed from Oghur)
- Proto-Bulgaric [trk-blg-pro]
  - Church Slavonic (borrowed from Bulgar)
  - Hungarian (borrowed from Bulgar)
  - (...)

AmaçsızBirKişi (talk) 16:24, 15 February 2025 (UTC)
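
The intended use of the two stages can be illustrated with a small lookup. This is a sketch only: the stage codes and the three borrowers are the ones in the outline above, while `STAGE_PARENT`, `LOAN_SOURCE`, `is_at_or_below` and `borrowers` are hypothetical names that do not correspond to any existing Wiktionary module.

```python
# Later stage -> earlier stage it continues.
STAGE_PARENT = {"trk-blg-pro": "trk-ogr"}

# Borrowing language -> stage the loans are attributed to in the outline above.
LOAN_SOURCE = {
    "Mongolic": "trk-ogr",
    "Church Slavonic": "trk-blg-pro",
    "Hungarian": "trk-blg-pro",
}


def is_at_or_below(stage, ancestor):
    """True if `stage` is `ancestor` itself or one of its later stages."""
    while stage is not None:
        if stage == ancestor:
            return True
        stage = STAGE_PARENT.get(stage)
    return False


def borrowers(stage, include_later_stages=False):
    """List the languages whose loans are attributed to a given stage."""
    if include_later_stages:
        hits = [b for b, s in LOAN_SOURCE.items() if is_at_or_below(s, stage)]
    else:
        hits = [b for b, s in LOAN_SOURCE.items() if s == stage]
    return sorted(hits)
```

With a single Oghur code, all three borrowers would be listed together; with the split, `borrowers("trk-ogr")` returns only Mongolic unless later stages are explicitly included.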

OK I need a little more help. Etymology variants are always variants of something else (either another etymology variant or an L2 language) and can have their ancestor set separately (e.g. Old Italian is considered an etymology variant of Italian, but Italian has Old Italian as an ancestor). Currently xbo (Bulgar) is an L2 language with trk-pro (Proto-Turkic, another L2 language) as its ancestor. Would trk-blg-pro (Proto-Bulgaric) be an etym variant of trk-pro and would the ancestor chain go trk-pro -> trk-blg-pro -> xbo? And would xbo-dnb (Danube Bulgar) and xbo-vol (Volga Bulgar) be etym variants of xbo (Bulgar) and also have trk-blg-pro as their ancestor? (In a case like this I suspect we don't have to set the ancestor explicitly; xbo-dnb and xbo-vol would automatically have their ancestor as the same as xbo. @Theknightwho for verification.) And since we already have otk (Old Turkic) as an L2 language with trk-pro as its ancestor, would Orkhon Turkic (otk-ork) be an etym variant of otk and have trk-pro as its ancestor? Finally, where does trk-ajm (Ajem Turkic) fit? Currently, Azerbaijani has Old Anatolian Turkish (trk-oat) as its ancestor; presumably Ajem Turkic would slot in between Azerbaijani and Old Anatolian Turkish in the ancestor chain; would trk-ajm be an etym variant of trk-oat or of az? (i.e. which one is it more similar to?) And should the name be "Ajem Turkic" or "Ajami Turkic"? Benwing2 (talk) 22:09, 15 February 2025 (UTC)
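
One possible answer to the Bulgar questions above can be modelled as follows, to make the suspected inheritance rule concrete. This is a sketch under stated assumptions: it does not reflect how Wiktionary's language modules are actually implemented, the class `Lect` and the functions `effective_ancestor` and `ancestor_chain` are hypothetical, and the data encodes just one of the options raised in the comment (Proto-Bulgaric as a variant of Proto-Turkic with Proto-Turkic as its ancestor).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lect:
    code: str
    variant_of: Optional[str] = None  # set for etymology variants
    ancestor: Optional[str] = None    # explicit ancestor, if any


LECTS: Dict[str, Lect] = {
    lect.code: lect
    for lect in [
        Lect("trk-pro"),
        Lect("trk-blg-pro", variant_of="trk-pro", ancestor="trk-pro"),
        Lect("xbo", ancestor="trk-blg-pro"),
        Lect("xbo-dnb", variant_of="xbo"),  # no explicit ancestor
        Lect("xbo-vol", variant_of="xbo"),  # no explicit ancestor
    ]
}


def effective_ancestor(code: str) -> Optional[str]:
    """Explicit ancestor if set, otherwise the ancestor of what this is a variant of."""
    lect = LECTS[code]
    if lect.ancestor is not None:
        return lect.ancestor
    if lect.variant_of is not None:
        return effective_ancestor(lect.variant_of)
    return None


def ancestor_chain(code: str) -> List[str]:
    """Follow effective ancestors until a lect without one is reached."""
    chain = [code]
    while True:
        nxt = effective_ancestor(chain[-1])
        if nxt is None:
            return chain
        chain.append(nxt)
```

Under these assumptions, `xbo-dnb` and `xbo-vol` pick up `trk-blg-pro` from `xbo` without any explicit setting, giving the chain xbo-dnb -> trk-blg-pro -> trk-pro.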

@BurakD53 wrote a paragraph about bulgar in the original thread

Looking at the family tree on here, https://en.wiktionary.org/wiki/Category:Proto-Turkic_language, Classical Azeri (az-cls) is a descendant of Azeri (az) so it goes az -> az-cls. It says that Classical Azeri is the form of Azeri used in the 16th - 20th century. Shouldn't it be the other way around and renamed to Ajem? So it should be: Old Anatolian Turkish -> Ajem -> Azeri Zbutie3.14 (talk) 23:47, 15 February 2025 (UTC)

What is going on with that page? Old Turkic is supposed to descend from South Siberian, Salar is supposed to descend from Oghuz, why are the descendants of Common Turkic not listed as its descendants, and why is there no Sayan or Yenisei under South Siberian? Am I missing something or is the page just garbage? Zbutie3.14 (talk) 00:03, 16 February 2025 (UTC)

It's not garbage; it just needs some ancestors set. But as someone not familiar with the whole Turkic family tree, I need specific settings from you, @AmaçsızBirKişi and @BurakD53 before I make any changes. It sounds like there are some issues still to be worked out. Benwing2 (talk) 00:24, 16 February 2025 (UTC)

I think the structure I've been working on this past week in https://en.wiktionary.org/wiki/User:Zbutie3.14/trtable is the most accurate we have right now, @AmaçsızBirKişi and @BurakD53 please look at it and tell me if anything needs to be changed Zbutie3.14 (talk) 00:44, 16 February 2025 (UTC)

All right but I still need you and @AmaçsızBirKişi to review my suggestions above for how to put this info into language codes. Benwing2 (talk) 00:51, 16 February 2025 (UTC)

Frankly, I'm not sure if there is a difference between Proto-Bulgar and Proto-Oghur. When we look at it, we would reconstruct Proto-Turkic *öküz as Proto-Bulgaric *ökür because in Hungarian, which contains loanwords from Bulgaric, it appears as ökör. Similarly, we would reconstruct the Turkish word yemiş as Proto-Bulgaric *yémilč based on Hungarian and Chuvash, and this would be the same in Proto-Oghur. In this case, I’m not sure if we can speak of two separate languages. If there is a difference between Proto-Bulgar and Proto-Oghur, you (@AmaçsızBirKişi) should explain what that difference is. As things stand, unfortunately, there don’t seem to be any differences.

The Soyot language is considered a dialect of Tofa, and the Dukhan language is considered a dialect of Tuvan. If you believe these languages are distinct enough that they shouldn't be classified as dialects, then you should identify and describe the specific points of divergence. Then we can decide whether they should be considered dialects or not.

Regarding Bulgaric, we know that Danube Bulgar and Volga Bulgar are clearly distinct from each other. I have mentioned this before. Their writing systems, languages, and the cultures they were influenced by are all different. A people who adopted Islam, the Arabic script, and fell under Mongol rule cannot be equated with a people who adopted Christianity, Greek and Cyrillic scripts, and eventually became Slavicized. That’s why, as I said before, the distinction between xbo-vol and xbo-dnb can be made. In fact, we could also add xbo-kbn (Kuban Bulgar), which we will discuss in relation to Hungarian loanwords. However we don't have any inscription in this language.

On the site, Old Turkic includes both Orkhon Turkic and Yenisei Kyrgyz. There is already a separate code for Old Kyrgyz, but one could also be added for Orkhon Turkic. When placing it in the Descendants list, we should not forget that Yenisei Kyrgyz is a continuation of Orkhon Turkic. Even today, the Khakas people, who still bear the name "Kyrgyz" in the region, should be their descendants. After all, when we look at them, the Khakas, just like the Old Kyrgyz, are not Buddhists.

I'm not sure if Ajem Turkic and Classical Azerbaijani Turkic were actually a distinct language. Previously, I argued that they should be considered separate from Old Anatolian Turkish, but when I examined works supposedly written in the Azerbaijani region, I found nothing other than Old Anatolian Turkish. Whichever text I looked at, the language was OAT. This makes sense because there were two literary languages: one was Chagatai Turkic, also known as Eastern Turkic, and the other was Old Anatolian Turkic, also known as Western Turkic, which later evolved into Ottoman Turkish. Writers produced works in these two languages.

In short, I now believe that Azerbaijani should be classified under Old Anatolian Turkish. As for the term Ajem Turkic, it can be used not to indicate a distinct language but rather to refer to both Azerbaijani Turkic and Qashqai, since Ajami means Iranian in our language. Given that it refers to Turkic spoken in the Iranian region, this naming can be justified.

Regarding Classical Azerbaijani, there is no clear-cut distinction between it, Old Anatolian Turkish, and Ottoman Turkish. However, if a logical framework can be established and its distinction from other languages is clearly defined, perhaps a code could be assigned. Personally, I don't see this distinction clearly. Even in Fuzuli’s works, both ben and men appear within the same couplet. Maybe one distinguishing feature could be the use of -em instead of -üm as a suffix. Idk. BurakD53 (talk) 08:54, 16 February 2025 (UTC)Reply

@BurakD53 Thank you very much for your detailed comments. Keep in mind that etym variant codes can be assigned for lects that are not distinct enough to warrant treatment as a separate L2 language but where there is enough of a distinction where it makes sense to make a distinct lect code. As for the five proposed codes above, I think you're saying that Proto-Bulgar isn't needed; xbo-vol and xbo-dnb can be etym variants of xbo; Orkhon Turkic can be an etym variant of otk; and Ajem/Ajami Turkic is not a separate language hence a code isn't needed, or at most it needs to be an etym variant of Ottoman Turkish. Is that right? Benwing2 (talk) 09:03, 16 February 2025 (UTC)Reply

Yes, you're right. I don't think Ajem Turkic is necessary. The Oghuz classification in User:Zbutie3.14/trtable is completely suitable for me. I sincerely thank @AmaçsızBirKişi and @Zbutie3.14 for their efforts, and you as well for your evaluations. BurakD53 (talk) 09:35, 16 February 2025 (UTC)

OK, just to clarify that I have this right:

Proto-Bulgar (or should it be Proto-Bulgaric?): Same as Proto-Oghur, but we have no language for this. Should we create Proto-Oghur and assign it trk-ogr-pro? If so, should this be an L2 language or an etym variant of trk-pro?
Danube Bulgar [xbo-dnb]: Make an etym variant of xbo (Bulgar).
Volga Bulgar [xbo-vol]: Make an etym variant of xbo (Bulgar).
Orkhon Turkic [otk-ork]: Make an etym variant of otk (Old Turkic)? Confusingly, we have Old Turkic and Old Uyghur as separate L2 languages, but Wikipedia says that Old Uyghur is a later dialect of Old Turkic. In that case, what is the difference between Wiktionary's Old Turkic and Old Uyghur lemmas? Should Old Uyghur have Old Turkic as an ancestor? Should Old Uyghur be merged into Old Turkic?
Ajem/Ajami Turkic: Make it an alias of Classical Azerbaijani. Make Classical Azerbaijani the ancestor of modern Azerbaijani and Qashqai.

Benwing2 (talk) 09:55, 16 February 2025 (UTC)

1. Proto-Oghur would be a more inclusive term, allowing us to include the Khazars as well. Additionally, there is a separate Oghur dialect known as the s-dialect, traces of which can be found in Hungarian and some Uralic languages. These loanwords contain sz instead of gy in word-initial position, as they were borrowed from a different dialect.

Support

4. In linguistic literature, Old Turkic includes Orkhon Turkic, Old Kyrgyz and Old Uyghur. However, Orkhon Turkic and Old Kyrgyz use the same script (although Old Kyrgyz has some special letters) and share the same religion, but belong to different regions. Old Uyghur, at least on the site, refers to texts written in the Old Uyghur script and associated with Manichaean or Buddhist traditions. The Old Uyghurs also produced works in the Orkhon script, but we classify those inscriptions under Orkhon Turkic on the site. For example, Irk Bitig, despite being a Manichaean divination book, is categorized as Orkhon Turkic. As far as I know, its language does not differ significantly from Orkhon Turkic. I would classify like this:

Old Turkic:
- Orkhon Turkic: (written in Old Turkic script around the Orkhon Basin between the 7th and 9th centuries)
  - Yenisei Kyrgyz: (written in Old Turkic script with Yenisei variants around the Yenisei Basin between the 8th and 13th centuries)
  - Old Uyghur: (written in Old Uyghur script, which derived from Sogdian script, around the Mongolia, Hami, Turpan and Gansu regions between the 9th and 14th centuries)

Or this:

Old Turkic:
- Orkhon Turkic:
- Old Uyghur:

Support BurakD53 (talk) 10:40, 16 February 2025 (UTC)

Old Turkic entries are the entries of both Orkhon Turkic and Yenisei Kyrgyz. But Old Uyghur has different entries. Old Turkic is used as an umbrella term here, but Old Uyghur entries are treated separately. BurakD53 (talk) 10:45, 16 February 2025 (UTC)

The reason I wanted separate headers for Proto-Bulgar and Proto-Oghur is that they are definitely not the same language. pOghur is thought to have been spoken before the 1st century AD, while Proto-Bulgaric is much more recent (6th-13th centuries).

Proto-Bulgar is also known as West Old Turkic, which was concurrent with the East Old Turkic (i.e. Orkhon, Yenisei, Uyghur, Karakhanid)

For an example, I'd point to *bugday. The ideal way for the Oghuric descendants to be written would be like this:

pTurkic: *bugday
- Early pOghur: *bugday ~ *buday
  - (bor) pMongolic: *buguday
  - (bor) pMongolic: *budagan
    - Late pOghur: *buɣδay
      - Early pBulgaric: *buɣzai̯
        Late pBulgaric: *būza
        (bor) Old Hungarian: buʒa
        
        Old Chuvash (MČ1): *pŭraĭ
        Middle Chuvash (MČ2): *pŭri

Whether or not we need as much detail as this is up for debate, but having two different language codes for Proto-Bulgar and Proto-Oghur seems like a no-brainer to me.

(By the way, I have used 'Old Chuvash' in that entry for Proto-Bulgar, and that page also has some problems, but the desclist I've written above must be correct, here are the sources: ^[1] and ^[2])

^ Agyagási, Klára (2019), Chuvash Historical Phonetics (Turcologica; 117), Wiesbaden: Harrassowitz, page 240

^ Róna-Tas, András; Berta, Árpád; Károly, László (2011), West Old Turkic: Turkic Loanwords in Hungarian (Turcologica; 84), volume 1, Wiesbaden: Harrassowitz Verlag, pages 186-188

AmaçsızBirKişi (talk) 11:28, 16 February 2025 (UTC)

Is the dh > z change mentioned by Kashgari considered Proto-Bulgaric here? Do the Hungarian loanwords follow the dh > z pattern, or is this specific to just this word? BurakD53 (talk) 11:47, 16 February 2025 (UTC)

That δ > z is just a step in the larger Bulgaric sound shift of *-d- > -r-. In the book by Róna-Tas, it's dubbed the "second rhotacism" and the following chain of sound changes are given: pTurkic cluster *-Vgd- leniates to *-Vgδ- > *-Vɣz- > *-V̄z- > *-Vr- and finally to -V̆r-.

I guess it is independent of the *-d- > *-y- sound shift present in other Turkic languages, but they may have affected each other.

The -ɣ- deletion and the lengthening of the previous vowel seem to be a common theme before -d- in Bulgar. I don't know enough to call it regular, but see these for example:

pTurkic: *edgü ("good")
- pOghur: *ed(ɣ?)i ~ -ü
  - pBulgaric: *edV
    - (bor) Old Hungarian: idʲ ("holy") [i > e change is regular]
      - Hungarian: egyház ("church")
pTurkic: *yogur- ("to knead")
- pOghur: *ǯuɣur-
  - pBulgaric: **Cūr- (?)
    - (bor) Old Hungarian: dʲǖr-öd
      - Hungarian: gyúr ("to knead, pug")
- pOghur: **ǯiɣur- (?)
  - pMongolic: *ǯigura-
  - pBulgaric: **Cǖr- (?)
    - (?bor) Hungarian: gyűr ("to crumple")

Source: Same book and volume by Róna-Tas and Berta, pages 307-310, 411

AmaçsızBirKişi (talk) 12:18, 16 February 2025 (UTC)

Forgot to add that these examples should also dispel any doubt as to whether or not to have a distinct Bulgar language code, apart from Oghur. Using Old Chuvash [cv-old] (c. 13th-15th century, following the Volga Bulgar) for this would not be accurate at all. AmaçsızBirKişi (talk) 12:20, 16 February 2025 (UTC)

Since we have adhine > ايرنى "erne" in Volga Bulgar, we can say that there is no trace of this z-shift in VB. Unfortunately, there are no recorded Volga Bulgar words that could serve as examples of this change. We can only confirm that the r-form exists for this specific word. However, if it is claimed that there was an intermediate stage *azne, considered Proto-Bulgaric, then this intermediate phase must have been significant, so we should have a language code. If we accept this, wouldn't Kashgari’s 11th-century record of azak (instead of ayak) for the Bulgars, Yemeks, Suvars, and some Kipchaks be classified as Proto-Bulgaric? But why wouldn’t Kuban Bulgaric *z < Proto-Bulgaric *dh > Volga Bulgaric r be considered a valid transition? Are we certain that Volga Bulgar evolved from an earlier *z? BurakD53 (talk) 12:43, 16 February 2025 (UTC)Reply

I mean why not this:

pTurkic: *bugday
- Early pOghur: *bugday
  - Late pOghur: *buɣδai̯
    - Kuban Bulgaric: *buɣzai̯
      - Late Kuban Bulgaric: *būza
        (bor) Old Hungarian: buʒa
    - Old Chuvash (MČ1): *pŭraĭ
      - Middle Chuvash (MČ2): *pŭri

BurakD53 (talk) 12:53, 16 February 2025 (UTC)

Hungarian and Slavic loanwords from Bulgar have a quite noticeable cut-off date, around the late 10th and early 11th centuries. Volga Bulgar, however, is attested two centuries later. Also, considering that the *-z- we are talking about would probably be a volatile and unstable sound, I don't see a problem with 10th-11th century Bulgar *-z- shifting to 13th-14th century Bulgar *-r-. Agyagási also gives this chain of descendants for irne in Chuvash, for your information:

New Persian āδīna
- (bor) Late Proto-Bulgar: **azinʲa ~ **arʲinʲa
  - Volga Bulgar: ايرنى (érne) [loss of palatalization, perhaps the intervocalic -z- is actually a palatal r, but who knows?]
    - Middle Chuvash (MČ1): *erne

There are good reasons for the palatalization of Proto-Bulgar -r-, and this chain of sound shifts is consistent with what I've given above (*-Vd- > *-Vδ- > *-V/V̄z- > (some intermediary shift) > *-Vr- and finally to -V̆r-).

Source for the New Persian to Chuvash sound shifts: Agyágasi's book I've ref'd above, page 191.

Maybe it's actually the Kuban Bulgar which is responsible for that shift, but I'd like to see some sources on Kuban Bulgar, if we even have any substantial material on that.

AmaçsızBirKişi (talk) 13:11, 16 February 2025 (UTC)


The Kuban Bulgars seem to be the ancestors of the Volga Bulgars because, according to Tekin, they contributed words to Hungarian before the 8th century. We know that the Volga Bulgars migrated to the Volga Bulgar region from the Khazar state around the 9th century. Either the Kuban Bulgars were their ancestors or their cousins. As for Danube Bulgar, considering that the First Bulgarian Empire was founded in the 7th century, we can assume a similar background for them as well. To get the necessary answers, it would be useful to examine the Bulgar loanwords in Old Church Slavonic, which evolved into modern Bulgarian.

However, I want to highlight an important point: foreign languages adapt and adopt sounds that are not present in their own languages. How can we be sure that the Hungarians didn't adapt the δ sound as z in their language? One of the strongest arguments supporting this theory appears to be Kaşgarî’s record. However, since Kaşgarî never actually visited the Bulgar and Suvar lands, this record is generally considered inaccurate.

If all of this points to a proto-language with *z, I conclude that the Proto-Bulgar language should also have a code. Moreover, Proto-Bulgar already seems to refer to Kuban Bulgar. Danube Bulgar and Volga Bulgar must have evolved from it. @Benwing2

Proto-Oghur: *(r,l,lç,dh)
- Proto-Bulgar: *(r,l,lç,z)
  - (bor) Old Hungarian:
  - Volga Bulgar: (r,l,(l)ç,r)
  - Danube Bulgar: *(r,l,?,?)
    - (bor) Old Church Slavonic

BurakD53 (talk) 13:46, 16 February 2025 (UTC)

@AmaçsızBirKişi @BurakD53 OK, there is no code for either Proto-Oghur or Proto-Bulgar(ic). And I'm still not sure what the ask is in terms of L2 languages. Do you want two new L2 langs, one new L2 lang or no L2 langs? Keep in mind that just because there is borrowing at different stages doesn't mean we need different L2 langs in all cases; etym variants may be enough. For example, we currently have no L2 codes for Proto-anything in the Romance family (although there is a pending proposal for Proto-Romanian or similar), and in the Slavic family we have only one L2 code for Proto-Slavic. In Germanic we have two L2 codes, for Proto-Germanic and Proto-West Germanic (although Proto-West Germanic is still somewhat controversial as a concept; it was mainly Victar pushing for PWG as a separate L2 language). Benwing2 (talk) 20:16, 16 February 2025 (UTC)Reply

I'm a bit confused. If oghur and proto-oghur are 2 different codes then shouldn't common-turkic and proto-common-turkic also be 2 different codes? We have a common-turkic code but no proto-common-turkic code. Common-turkic and oghur are both unattested so shouldn't we have only a proto-oghur and proto-common-turkic, no oghur and common-turkic code? Zbutie3.14 (talk) 21:15, 16 February 2025 (UTC)

This is correct; the same situation exists in the Oghuz languages as well. Yes, Proto-Oghuz is a necessity, but if we already have an Oghuz code and can add reconstruction to it in the descendants list, why would we need a separate Proto-Oghuz language code? I think we should add the Proto-Bulgar code, and if necessary, we can add the reconstruction next to the Oghur heading. BurakD53 (talk) 09:14, 17 February 2025 (UTC)

I figured out what the problem is. Right now common-turkic is a language. It should be a family, not a language. proto-common-turkic is the name of the language. This is why I was confused. Same should be done with oghur, oghur is a family and proto-oghur is a language. @Benwing2 first before adding new languages we should fix the stuff that's broken right now, so common-turkic should be made a family, proto-common-turkic should be a language, proto-oghur should be a language, the oghuz/kipchak/karluk/siberian families should be part of the common-turkic family, old turkic should be part of the south siberian family, and salar should be part of oghuz. Zbutie3.14 (talk) 14:21, 18 February 2025 (UTC)Reply

@Zbutie3.14 I went ahead and renamed "Common Turkic" to "Proto-Common Turkic" and changed its code from trk-cmn to trk-cmn-pro, so that trk-cmn can be used as the code for the Common Turkic family (currently it's still an alias for trk-cmn-pro). I realize now I should have pinged @AmaçsızBirKişi and @BurakD53 for confirmation but it seems like an obvious thing to do. Benwing2 (talk) 23:55, 18 February 2025 (UTC)

I tried to convert all of the existing uses of trk-cmn based on the dump file and/or tracking in Special:WhatLinksHere/Wiktionary:Tracking/languages/trk-cmn. I am going to wait a day or two to see if any more uses pop up, and then create a Common Turkic family using the trk-cmn code. I'll deal with the other stuff at that point. Benwing2 (talk) 01:53, 19 February 2025 (UTC)

thanks ur the best! <3 Zbutie3.14 (talk) 02:07, 19 February 2025 (UTC)

I think etym variants will suffice, in a similar vein to the cv-old and cv-mid we already have, @Benwing2.

AmaçsızBirKişi (talk) 12:02, 17 February 2025 (UTC)

@AmaçsızBirKişi OK, can you specify exactly which codes you want and what should be their parent language? Benwing2 (talk) 21:21, 17 February 2025 (UTC)

I think we all agreed about xbo-vol, xbo-dnb, and otk-ork at least.

Oghur: (trk-ogr)¹
- Bulgar: (xbo)²
  - Volga Bulgar: (xbo-vol, etym variant of xbo)
  - Danube Bulgar: (xbo-dnb, etym variant of xbo)

----

Common Turkic:
- Old Turkic: (otk)
  - Orkhon Turkic: (otk-ork, etym variant of otk)
  - Old Kyrgyz/Yenisei Kyrgyz: (otk-kir, etym variant of otk)

To see if they will support it or have any suggestions to solve the problem @Bartanaqa @Yorınçga573 @Ardahan Karabağ @Blueskies006 @Vahagn Petrosyan @Samubert96 @Əkrəm Cəfər

¹@AmaçsızBirKişi thinks Proto-Oghur should come after this. I support.

²@AmaçsızBirKişi thinks this should be Proto-Bulgar, and I support that instead of the reconstruction, because I still think Proto-Bulgar is a -ð- language, not -z-.

I don't know about the extent of the Turkic-internal argumentation, but I'd like to note that Hungarian does not require many of the borrowing scenarios proposed here to be "Proto-Bulgar" rather than Proto-Oghur — because it has itself native sound changes *ð > z (e.g. *kota > *qāðə > Old Hu. *χāzu > modern ház) and *ɣ > V (e.g. *sükśe > *hüɣsə > Old Hu. *hüɣsü > modern ősz). If, say, Proto-Oghur *buɣðay were borrowed into pre-Hungarian, it would clearly still evolve into búza. --Tropylium (talk) 01:03, 11 January 2026 (UTC)Reply

Guys If you are here, plz see also Wiktionary:Language treatment requests#Proto-Oghuz and Proto-Arghu to be able to enter recorded lemmas, if you support or not. Thanks.

BurakD53 (talk) 07:19, 19 February 2025 (UTC)

@BurakD53 Can you redo your table, making the following distinctions:

clearly distinguish full languages, etym languages and families;
include all the intermediate nodes;
boldface the stuff that needs adding;
indicate, when language B is indented under language A, whether A is ancestral to B.

In this case, I take it:

Oghur (trk-ogr) is a family which already exists, but Proto-Oghur does not exist and needs to be added. Proto-Oghur (trk-ogr-pro) would be an etym variant of Proto-Turkic (trk-pro), just like Proto-Oghuz is.
Bulgar is a full language which already exists, and has Proto-Oghur as its ancestor.
Volga Bulgar and Danube Bulgar are etym variants of Bulgar, but there is not an ancestral relationship. NOTE: I am going to use xbo-dan instead of xbo-dnb, for consistency.
Old Chuvash has Volga Bulgar as its ancestor; Middle Chuvash has Old Chuvash as its ancestor; Chuvash has Middle Chuvash as its ancestor. Anatri and Viryal are Chuvash etym variants but there is not an ancestral relationship.
Common Turkic is a family that will be created. Proto-Common Turkic already exists and is an etym variant of Proto-Turkic.
The Oghuz, Kipchak, Karluk and Siberian Turkic families will be placed under the Common Turkic family.
Old Turkic will be placed under the South Siberian Turkic family, which is under Siberian Turkic.
Orkhon Turkic will be created as an etym variant of Old Turkic, as Old Kirghiz already is.
Are there any ancestor/descendant relationships among Old Turkic, Orkhon Turkic, Old Kirghiz and Old Uyghur?
Salar will be placed under the Oghuz family per @Zbutie3.14.
Pecheneg (an L2 language), Salchuq (an L2 language), Khazar (an L2 language) and Arghu (an etym variant of Proto-Turkic, with L2 language Khalaj as its descendant) are currently hanging directly off of Proto-Turkic. Should they be moved elsewhere?

Benwing2 (talk) 07:49, 19 February 2025 (UTC)
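
The distinction drawn in the list above between "etym variant of" and "has as its ancestor" can be sketched in a few lines of Python. This is only an illustration, not Wiktionary's actual language data format: the `LECTS` table, its field names and the `ancestor_chain` helper are hypothetical, and the fallback rule (a variant with no ancestor of its own uses the ancestor of the lect it is a variant of) is an assumption made for the sketch. The codes and relationships are the ones listed in the comment above, with `cv` assumed for Chuvash.

```python
# Hypothetical table, NOT Wiktionary's real data modules.
# "variant_of" = etym variant of; "ancestor" = explicitly set ancestor.
LECTS = {
    "trk-pro": {"name": "Proto-Turkic"},
    "trk-ogr-pro": {"name": "Proto-Oghur", "variant_of": "trk-pro"},
    "xbo": {"name": "Bulgar", "ancestor": "trk-ogr-pro"},
    "xbo-vol": {"name": "Volga Bulgar", "variant_of": "xbo"},
    "xbo-dan": {"name": "Danube Bulgar", "variant_of": "xbo"},
    "cv-old": {"name": "Old Chuvash", "ancestor": "xbo-vol"},
    "cv-mid": {"name": "Middle Chuvash", "ancestor": "cv-old"},
    "cv": {"name": "Chuvash", "ancestor": "cv-mid"},
}


def ancestor_chain(code):
    """Return the ancestors of `code`, nearest first.

    Assumed rule: a variant without its own ancestor borrows the
    ancestor of the lect it is a variant of, without becoming a
    descendant of that lect.
    """
    chain = []
    seen = {code}
    current = code
    while True:
        entry = LECTS[current]
        nxt = entry.get("ancestor")
        if nxt is None and "variant_of" in entry:
            nxt = LECTS[entry["variant_of"]].get("ancestor")
        if nxt is None or nxt in seen:
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


print(ancestor_chain("cv"))
print(ancestor_chain("xbo-dan"))
```

Under these assumptions Chuvash traces back through Middle Chuvash, Old Chuvash and Volga Bulgar to Proto-Oghur, while plain Bulgar [xbo] does not appear in the chain, matching the note that the variants have no ancestral relationship to it.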

Going off of Burak's comment, here is the full descendants list (based on ancestry):

Proto-Turkic: [trk-pro]
- Oghur(ic): [trk-ogr] (FAMILY)
  - Proto-Oghur: [trk-ogr-pro] (ETYM) (#1)
    - Proto-Bulgar: [trk-blg-pro] (ETYM) (#2)
      - Volga Bulgar: [xbo-vol] (ETYM) ([xbo] should also work here #3)
        Old Chuvash: [cv-old] (ETYM)
      - Danube Bulgar: [xbo-dan] (ETYM) ([xbo] should also work here #3)
    - Khazar: [zkz]
- Common Turkic: [trk-cmn] (FAMILY)
  - Siberian Turkic: [trk-sib] (FAMILY)
    - Old Turkic: [otk]
      - Orkhon Turkic: [otk-ork] (ETYM) (#4)
      - Yenisei Turkic: [otk-kir] (ETYM) (#4)
      - Old Uyghur [oui]
    - (...)
  - Arghu: [trk-arg] (FAMILY)
  - Oghuz: [trk-ogz] (FAMILY)
  - Kipchak: [trk-kip] (FAMILY)
  - Karluk: [trk-kar] (FAMILY)

---

/// Footnotes: ///

'#1: Proto-Oghur, like you said, can be an etym variant of Proto-Turkic. It will have Proto-Bulgar and Khazar as its descendants. We might need to add Tuoba, Apar and so on if we reach a consensus or if the need arises. But those are very tentative, so I digress.

'#2: Proto-Bulgar is the theoretical reconstruction of the Bulgaric languages, Danube and Volga (and also Kuban, but that's unattested) Bulgar. Its ancestor is Proto-Oghur and its descendants are Volga and Danube Bulgar variants, alongside the unsplintered Bulgar [xbo].

'#3: The new Volga variant of Bulgar will have Old Chuvash (and the contemporary Chuvash) as its descendants. Danube Bulgar does not need a descendant, since it is a dead branch. It's there mainly because of loanwords into Hungarian, Church Slavonic and Romanian.

'#4: Both Orkhon [otk-ork] and Yenisei Turkic [otk-kir] should have Old Turkic [otk] as their ancestor. We also might need to add Old Uyghur [oui] as a descendant of [otk] too. There is a recurring issue of previous edits confusing Orkhon Turkic and Old Uyghur, and people immediately assume a text to be Orkhon if it has runes, which is simply not the case. For example, almost half of the lemmas in Orkhon Turkic mainspace cite Ïrḳ Bitig, a work in Old Uyghur. Separating these would be more accurate.

---

/// Some more: ///

Arghu is a descendant from the Common Turkic branch, as far as I am aware. The confusion stems from the fact that it is the earliest branch to diverge from other Turkics, but it is firmly in the Common Turkic family.
I don't think it would be appropriate if we placed Old Turkic under South Siberian, that would be anachronistic. Yakuts and Dolgans have not migrated northwards at the time when Old Turkic was spoken.
Orkhon, Yenisei and Uyghur have no ancestral relation to one another. They all stem from Old Turkic, that's all.
Khazar is an Oghuric language. I've already talked about Arghu, and I do not know much about Salchuq or Pecheneg. We don't have any entries in either, so I don't think chopping them off from the family table (for now) is that much of an issue.

Please let me know if I got something wrong!

AmaçsızBirKişi (talk) 11:58, 19 February 2025 (UTC)

About #4: After the collapse of the Göktürk State, the language used in the Old Uyghur runic inscriptions was no different from Orkhon Turkic. It was a continuation of the same written tradition in the same region, around the Orkhon basin. Therefore, I believe that texts written in the Orkhon script, such as Irk Bitig, should not be included under Old Uyghur entries. In academia, Old Uyghur Turkic is often used to refer to texts written in the Old Uyghur script, while Irk Bitig is frequently classified as Old Turkic. In his book Irk Bitig: The Book of Omens, Talat Tekin did not use the term "Uyghur" even once for Irk Bitig. Instead, he simply referred to it as "Old Turkic" and described it as a Manichaean ny dialect. As we know, Orkhon Turkic is also a ny dialect. I think that's why Yorınçga includes it under Old Turkic instead of Old Uyghur. He can explain better. As stated in the source linked, all the Old Turkic texts written in the Orkhon script are referred to as the Manichaean ny dialect. BurakD53 (talk) 15:53, 19 February 2025 (UTC)

Very well. I'll remove the quotations from Ïrḳ Bitig I added for Old Uyghur. Thanks for correcting me!

The Dergipark article you linked is dead, by the way.

AmaçsızBirKişi (talk) 16:18, 19 February 2025 (UTC)

[10] here. After reading a bit about it though, I'm not sure. Perhaps it would be more accurate to add it as Old Uyghur. Although it is written in the ny dialect, there are other differences, for example the use of the -gAy suffix for the future tense and of the ablative suffix -dIn. These are different from Orkhon Turkic. I take my words back. BurakD53 (talk) 19:28, 19 February 2025 (UTC)

I mean, sure why not? It was written in either year 930 or 942, way outside the range of other Turkic inscriptions (8th century).

We can remove the IB from the quotations part and the entries that rely only on IB when we deprecate the [otk] in favor of [otk-ork] and [otk-kir]. For example, yél ("mane") is only attested in IB and nowhere else in the Orkhon script. Entries like that will need removal.

AmaçsızBirKişi (talk) 19:58, 19 February 2025 (UTC)

Probably not all the Runic inscriptions after the collapse of the Gokturk state, but Irk Bitig should be considered as Old Uyghur. BurakD53 (talk) 19:30, 19 February 2025 (UTC)

I support the table.

Support. BurakD53 (talk) 16:05, 19 February 2025 (UTC)

@AmaçsızBirKişi @BurakD53 @Zbutie3.14 OK I tried to implement everything in the above table. Please review the results. Arghu is not currently a family but an etym variant of Khalaj, so I just set its ancestor to Proto-Common Turkic. Also I gave Proto-Bulgar the code trk-bul-pro instead of trk-blg-pro, for consistency. Possibly it should be xbo-pro, but I don't know if it's kosher to have a protolanguage that is "Proto-" of a language rather than a family. Benwing2 (talk) 06:21, 20 February 2025 (UTC)

Salar should be under Oghuz branch, just like Turkmen. Other than that, it's perfect. Thanks for resolving this issue.

AmaçsızBirKişi (talk) 10:55, 20 February 2025 (UTC)

Is Khazar Oghur? According to Wikipedia it's disputed: https://en.wikipedia.org/wiki/Khazar_language Zbutie3.14 (talk) 13:36, 20 February 2025 (UTC)

We are making quite a few requests, but may I ask for one more thing? Could we create three variants for qwm, just like we did for otk?

Proto-Turkic: [trk-pro]
- Proto-Common-Turkic: [trk-cmn-pro]
  - Kipchak: [trk-kip] (FAMILY)
    - Cuman-Kipchak: [trk-kcu]
      - Kipchak: [qwm]
        Cuman: [qwm-cum] (etym variant of qwm) (here what I ask for)
        Crimean Tatar: [crh]
        
        Karachay-Balkar: [krc]
        
        Karaim: [kdr]
        
        Krymchak: [jct]
        
        Kumyk: [kum]
        
        Armeno-Kipchak: [qwm-arm] (etym variant of qwm)
        
        Mamluk-Kipchak: [qwm-mam] (etym variant of qwm)

BurakD53 (talk) 07:30, 20 February 2025 (UTC)

what do you think? @AmaçsızBirKişi BurakD53 (talk) 07:33, 20 February 2025 (UTC)

Armeno-Kipchak must be a descendant of Cuman too, since Cuman was written in Crimea in the 14th century and Armeno-Kipchak was written in Crimea in the 17th century. Mamluk-Kipchak, written in Egypt in the 13th-16th centuries, can't be a descendant of Cuman. I will just edit the table, so as not to cause more confusion. BurakD53 (talk) 07:44, 20 February 2025 (UTC)

I added Cuman as an etym variant of qwm (Kipchak) and put Armeno-Kipchak under it, but I'm not sure about putting Crimean Tatar, Karachay-Balkar, etc. under Cuman. Currently the Kipchak-Cuman family (what you call Cuman-Kipchak) is under (a descendant of) the Kipchak language, whereas your tree above has them reversed. Can you edit your tree and label everything that's a family with the label "FAMILY" so we are completely clear what's going on? Also, Wikipedia asserts that "Cuman" and "Kipchak" are the same thing; see w:Cuman language. Benwing2 (talk) 23:15, 20 February 2025 (UTC)

Also ping @AmaçsızBirKişi @Zbutie3.14. Benwing2 (talk) 23:16, 20 February 2025 (UTC)

Proto-Turkic: [trk-pro]
- Proto-Common Turkic: [trk-cmn-pro]
  - Kipchak: [trk-kip] (FAMILY)
    - Cuman-Kipchak: [trk-kcu] (FAMILY)
      - Kipchak: [qwm]
        Cuman: [qwm-cum] (etym variant of qwm, location Crimea)
        
        Armeno-Kipchak: [qwm-arm] (etym variant of qwm, location Crimea)
        
        Mamluk-Kipchak: [qwm-mam] (etym variant of qwm, location Egypt)
      - Crimean Tatar: [crh] (location Crimea)
        Urum: [uum] (location Southeast Ukraine)
      - Krymchak: [jct] (location Crimea)
      - Karachay-Balkar: [krc] (location Caucasus)
      - Karaim: [kdr] (location Crimea, Poland)
      - Kumyk: [kum] (location Caucasus)

BurakD53 (talk) 08:21, 21 February 2025 (UTC)

@BurakD53 This appears to not properly indicate the ancestor/descendant relationships. Presumably Armeno-Kipchak is a descendant of Cuman? What about Crimean Tatar, Krymchak and/or Karaim? Can you explicitly indicate the ancestor of each lect where it differs from the containment relationships shown in the above table? Benwing2 (talk) 08:56, 21 February 2025 (UTC)

I don't have enough knowledge about these languages, so any comment I make could be incorrect. Yes, one is probably the ancestor or descendant of the other, but I'm saying this just based on location and the period. BurakD53 (talk) 09:03, 21 February 2025 (UTC)

I think [qwm-cum] stands for the language of Codex Cumanicus, right? If so, yes, we need that.

AmaçsızBirKişi (talk) 10:53, 20 February 2025 (UTC)

Just noting, because Salchuq is mentioned a couple of times above, that Salchuq has been removed (from the ISO list of languages and from ours) as spurious per a discussion further down on this page, Wiktionary:Language treatment requests#Retiring Salchuq. - -sche (discuss) 23:55, 22 November 2025 (UTC)

I have never heard of Salchuq before. It is mentioned here a few times so I guess I completely missed it. Zbutie3.14 (talk) 01:29, 23 November 2025 (UTC)

@AmaçsızBirKişi @BurakD53 @Zbutie3.14 @Tropylium Can you indicate if there's anything left to do in this (very long) discussion, or can we archive it? I remember making a bunch of Turkic language changes, esp. adding etym varieties, at one point. Benwing2 (talk) 03:03, 27 January 2026 (UTC)

I guess Salchuq is a dialect of Azerbaijani, but I don't have enough information about this language; I have never seen it. Do we have any record of it? I'm not sure. Azerbaijani Wikipedia says it's a dialect spoken in Iran's Kirman province. That's all I can say. – BurakD53 (talk) 07:14, 27 January 2026 (UTC)

Right now the common-turkic family is beneath the proto-common-turkic language. They should be at the same level. Like if you look at the indo-european language family, language families and their proto-language start at the same horizontal alignment and the proto-language is under the family. Zbutie3.14 (talk) 16:24, 28 January 2026 (UTC)

@Zbutie3.14 Can you clarify what you mean? This is the top of the Turkic family hierarchy currently:

Do you mean for Proto-Common Turkic to go underneath Common Turkic, like Proto-Turkic is underneath Turkic? What would then happen to Arghu? Maybe you can redraw the above diagram the way you want it. Benwing2 (talk) 05:18, 30 January 2026 (UTC)Reply

@Benwing2 I think like one of these? I am trying to copy the formatting of the Indo-European ones. Arghu should be below Proto-Common Turkic, and the Common Turkic family should be above the Proto-Common Turkic language.

Turkic (trk) F
Proto-Turkic (trk-pro)
├[-]┬ Common Turkic (trk-cmn) F
│ │ Proto-Common Turkic (trk-cmn-pro) V
│ ├[-]┬ Arghu (klj-arg) V
│ │ └──── Khalaj (klj)
│ ├[-]┬ Karluk (trk-kar) F
│ │ │ ├──── Ili Turki (ili)
│ │ │ │ └[-]┬ Karakhanid (xqa)

Zbutie3.14 (talk) 16:03, 30 January 2026 (UTC)Reply

OK, the main difference here is that currently we're considering Arghu as (sort of) a sister to Common Turkic but in your proposed family it's definitively a daughter. Is this how it should be? If so it should be pretty easy to fix. Benwing2 (talk) 19:40, 30 January 2026 (UTC)Reply

yes it's a daughter of common turkic. Zbutie3.14 (talk) 20:40, 30 January 2026 (UTC)Reply
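The change agreed on just above (Arghu as a daughter of Common Turkic rather than a sister) comes down to re-pointing a single parent link. A minimal sketch in Python, assuming a plain code-to-parent table; `PARENT`, `ancestors` and `reparent` are hypothetical names for illustration and not Wiktionary's actual language data modules, and the "before" placement of Arghu directly under Turkic is an assumption based on the "(sort of) a sister" description:

```python
# Hypothetical parent table: each code maps to the code directly above it.
# Placing klj-arg directly under trk is an assumed "before" state.
PARENT = {
    "trk-pro": "trk",
    "trk-cmn": "trk",
    "trk-cmn-pro": "trk-cmn",
    "klj-arg": "trk",
    "klj": "klj-arg",
    "trk-kar": "trk-cmn",
}


def ancestors(code, parent):
    """Return the chain of codes above `code`, nearest first."""
    chain = []
    while code in parent:
        code = parent[code]
        chain.append(code)
    return chain


def reparent(parent, code, new_parent):
    """Return a copy of the table with `code` moved under `new_parent`."""
    if new_parent == code or code in ancestors(new_parent, parent):
        raise ValueError(f"moving {code} under {new_parent} would create a cycle")
    updated = dict(parent)
    updated[code] = new_parent
    return updated


proposed = reparent(PARENT, "klj-arg", "trk-cmn")
print(ancestors("klj", PARENT))    # ['klj-arg', 'trk']
print(ancestors("klj", proposed))  # ['klj-arg', 'trk-cmn', 'trk']
```

Returning a copy instead of editing the table in place keeps the current and proposed hierarchies side by side, which makes it easy to compare what each descendant (here Khalaj) would inherit from.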

I would like to request etymology-only codes and dedicated dialect labels (not sure if this is the right place?) for South Sumatran Malayic varieties under the Musi and Central Malay dialect groups. These varieties used to have their own ISO 639-3 codes before they (except [liw], [vkk], and [pel]) were merged into [mui] and [pse] in 2008. Per McDowell & Anderbeck (2020), many of these lects do have their own salient distinguishing features, and they remain treated as separate languages in most Indonesian publications. Specific words from several of these varieties have been borrowed into Indonesian, and they need to be etymologized properly (attested terms only, per Wiktionary:About Indonesian#Regional Languages).

Etymology-only languages currently needed:

[mui-plm] or [mui-plb] Palembang (formerly [plm])
[mui-syu] or [mui-sky] Sekayu (formerly [syu] in Ethnologue 13, pre-ISO)
[mui-lmt] Lematang (formerly [lmt])
[pse-bke] or [pse-ben] Bengkulu (formerly [bke])

Not necessary, but may be useful for tracing etymon reflexes:

[pse-srj] Serawai (formerly [srj])
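One way to keep the proposed codes predictable is to derive each from the current merged code plus the former code. The sketch below, in Python, only illustrates that naming rule; the rule itself and the names `FORMER_CODES` and `candidate_etym_code` are assumptions for illustration, and the request above also lists alternative spellings for some codes that this rule does not produce.

```python
# (current merged code, former code) for each variety, as listed above.
# The "parent-former" naming rule is an assumption, not an agreed convention.
FORMER_CODES = {
    "Palembang": ("mui", "plm"),
    "Sekayu": ("mui", "syu"),
    "Lematang": ("mui", "lmt"),
    "Bengkulu": ("pse", "bke"),
    "Serawai": ("pse", "srj"),
}


def candidate_etym_code(parent, former):
    """Build a candidate etymology-only code from the two existing codes."""
    return f"{parent}-{former}"


CANDIDATES = {
    name: candidate_etym_code(parent, former)
    for name, (parent, former) in FORMER_CODES.items()
}

for name, code in CANDIDATES.items():
    print(f"[{code}] {name} (formerly [{code.split('-')[1]}])")
```

Reusing the former code as the suffix keeps a visible link to older literature that still cites those codes.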

Given the lack of universally accepted standard varieties in both [mui] and [pse] groupings, we also need to carefully label and categorize their entries according to their specific dialectal origin. I propose we adopt the classification given in McDowell & Anderbeck (2020), which retains most of the familiar local "language" labels (in italics).

Musi dialect group [mui]

Upper Musi
- Musi Proper (= Musi, formerly [mui] in the narrow sense)
- Pegagan (often misidentified as a dialect of Ogan [ogn])
- Rawas (formerly [rws])
- Col [liw]
Palembang–Lowland
- Palembang (formerly [plm])
  - Palembang Lama (traditional variety which includes a polite register akin to Javanese krama, taught locally in Palembang schools since 2024)
  - Palembang Pasar (urban koiné used as a regional lingua franca within and beyond the city of Palembang)
  - Pesisir (rural coastal variety, formerly listed under [mly])
- Lowland
  - Belide (formerly under [lmt] and [mly])
  - Lematang Ilir (= Lematang, formerly [lmt])
  - Penesak (formerly [pen])

Central Malay dialect group [pse]

Oganic
- Ogan (formerly [ogn])
- Rambang
- Enim (formerly [eni])
Highland
- Bengkulu (formerly [bke])
- Besemah (formerly [pse] in the narrow sense)
- Lematang Ulu (identical to Besemah)
- Lintang (formerly [lnt])
- Semende (formerly [sdd])
- Benakat
- Serawai (formerly [srj])
  - Talo (*-a > [o], used by Adelaar to reconstruct Proto-Malayic)
  - Manna (*-a > [aw])
- Kaur [vkk]
- Pekal [pel]

Currently I have started using some of these labels in entries, cf. katek, rete, and muanai. At the very least, I think we need dedicated labels and categories for the etymology-only languages proposed above + the already existing [pse-bsm] (Besemah). The category names for dialects of [pse] and [mui] may be appended with "Malay", e.g. Palembang Malay, Musi Malay, Ogan Malay, Semende Malay, etc.

Note that prior to the merger of the codes (and up until now in Indonesia), the term "Palembang Malay" or "Palembang language" (bahasa Palembang) can only refer to the dialects under "Palembang" in particular, while "Musi language" (bahasa Musi) refers to dialects under "Musi Proper". The rest of the dialects are either treated as languages on their own, as dialects of Malay, or occasionally under other umbrella terms such as "Bengkulu language" (bahasa Bengkulu) for Highland [pse] dialects spoken in Bengkulu.

I am indifferent to the issue of whether we should lump together [vkk] and [pel] with [pse], and [liw] with [mui]. In particular, [pel] is sometimes placed closer to [min] than to other [pse] lects (e.g. in Glottolog). Haji [hji] is an isolate within Malayic, sharing only ~60% of its lexicon with neighboring South Sumatran varieties, and is best treated as its own language. All [mui], [pse], and [hji] lects should be written in [Latn] as the default script, but [pse] also uses [Rjng], and [mui] is occasionally written with [ms-Arab]. Swarabakti (talk) 21:21, 25 January 2025 (UTC)Reply

@Swarabakti Apologies that your request got dropped. @-sche Do you know who are the relevant Malayic/Austronesian editors who could comment on this? I don't know the first thing about this family. Benwing2 (talk) 19:42, 30 January 2026 (UTC)Reply

Maybe User:Rex Aurorum or User:Rentangan (who helped write Wiktionary:Proto-Malayic entry guidelines and Wiktionary:Indonesian entry guidelines), or User:Heydari or User:Austronesier (who list high id Babel competence): do any of you have thoughts on these proposed etymology-only languages and dialect {{label}}s?
(There are other active users who list high ms Babel competence, but they have block histories for inept edits so I don't know whether they'd be the most helpful people to ping. Swarabakti is the only user who lists mui or pse competence.) - -sche (discuss) 22:04, 30 January 2026 (UTC)Reply

Unfortunately, I'm unfamiliar with the isolect classification. I agree regarding new etym codes for these isolects. Some Indonesian lexicon is borrowed from these specific lects, so it's beneficial for clarifying etymologies. ―Rex Aurōrum^{｢Disputātiō｣} 22:27, 1 February 2026 (UTC)Reply

For future reference: Austronesier will always be a good choice for any discussion of the classification of (or at the very least the western half of) Austronesian/Malayo-Polynesian due to their familiarity with the literature, whether they're competent in any given language or not. Chuck Entz (talk) 22:19, 30 January 2026 (UTC)Reply

FWIW we did already (partially for Musi) discuss this topic on enwiki. Swarabakti (talk) 10:19, 2 February 2026 (UTC)Reply

The "Musi dialect group" tree above matches what is in Glottolog except with regard to Col (as you note): Glottolog has Col as a separate branch on the same level as "Musi", rather than having it as a sub-branch of Upper Musi. If no-one else comments in, say, a week, I would support setting up the Musi labels as you propose except with regard to Col, which we could be conservative and leave as its own language (or put directly under mui?). It seems like we already have Palembang as a dialect label (or at least category) and I support adding an ety code for Palembang, as it is very, very often mentioned as a dialect or group of dialects.
Re Central Malay, Glottolog also does not appear to split Oganic vs Highland (?), as they have Ogan, Enim, Bengkulu, Besemah / Pasemah, Lintang, Semende / Semendo, Serawai and Kaur all in one group (with all of those except Kaur in a subgroup); they stick Pekal somewhere else entirely, for whatever (possibly geographic) reason. I reckon we could be conservative and have a similarly "flat" tree, all the varieties directly under pse on the same level, except pel, where the identification of it with pse indeed seems uncertain, as you say; since its ISO code is still valid, maybe we just let it keep being its own separate language for now.
Hopefully Austronesier can comment.
I can find scattered mentions of Sekayu, Lematang and Bengkulu as dialects too, and can support adding codes for them too if there are no objections; in fact, I wonder if we should give ety-only codes to all of the dialects that formerly had ISO codes?
BTW, some papers speak of Bengkulu as a dialect of Lembak, which Wikipedia and Glottolog equate to Col, but I infer that these just mean there is a dialect of Lembak/Col spoken in the place named Bengkulu, and they are not saying bke is a variety of Col (?). (E.g. [11], [12].) - -sche (discuss) 20:32, 4 February 2026 (UTC)Reply

Ideally all of the formerly coded dialects should have etym codes since they do have salient distinguishing features and may be cited as such in literature. I was only highlighting the major ones, especially those available as etym labels on the Great Indonesian Dictionary (KBBI).

The sources you cited do not say that Bengkulu is a dialect of Lembak (which is a common name for liw in Bengkulu, but a bit lesser known in South Sumatra). The first one specifically mentions that Lembak is distinct from Bengkulu Malay (i.e. the urban variety formerly coded as bke), while the second one only mentions "Bengkulu" as a placename but does not actually discuss a "Bengkulu dialect".

Typically it's the other way around; several Indonesian-language sources (including the Bengkulu Province Language Agency) do group together Lembak (< liw), the Highland varieties of pse, Kaur (vkk), Pekal (pel), and even Nasal (nsy, which is almost universally considered non-Malayic otherwise) as "dialects" of "Bengkulu language", in an effort to forge a unique regional identity.[13] Similarly, the Bengkulu diaspora of Malaysia, who came from two different regions, recognize their native Lembak (< liw) and Serawai (< pse, formerly srj) varieties as dialects of "Bengkulu", due to a shared ethnic/geographic identity.[14]

But yeah, most studies on SSML are limited in scope, resulting in a lack of investigation into the actual wider affiliations. Many just take language names at face value, and it just so happens that people in this area call their language "(whatever group or river basin they are from) language". The reason I specifically cite McDowell and Anderbeck (2020) is because it's the only comprehensive monograph discussing the classification of all varieties of South Sumatran Malayic (with their dataset provided online for verification if you want to check btw)... Swarabakti (talk) 07:30, 5 February 2026 (UTC)Reply

Common Romanian, also called ‘Proto-Romanian’, is the reconstructed common ancestor of Aromanian, Istro-Romanian, Megleno-Romanian, and Romanian. There is considerable scholarship on the subject. Sala 1976 treats the phonological aspects of the reconstruction in detail.

We already host such reconstructions under ‘Reconstruction:Latin’, which is problematic for a number of reasons:

The name. No scholar refers to this reconstruction as ‘Latin’, and that name can easily mislead our readers.
The orthography. Spellings like *⟨oestricula⟩ are quite out-of-step with reconstructions like /ˈstrekʎe/.

Proposed orthography: the phonemic transcriptions as they are now, except with some other way of indicating stress. For instance *strékʎe.

Pinging @Word dewd544, @Catonif, @Bogdan, @Benwing2 as potentially interested parties.

Nicodene (talk) 20:30, 30 January 2025 (UTC)Reply

No objection here except possibly to the name; "Proto-Romanian" sounds a bit better IMO although I'm not familiar with the scholarship to know what's the most common term. Benwing2 (talk) 21:07, 30 January 2025 (UTC)Reply

Is there any chance we could call it "Proto-Eastern Romance" since we group the languages in question together as the Eastern Romance languages? It gets a good number of Google hits. Also, both "Proto-Romanian" and "Common Romanian" are likely to be perceived as the ancestor of Romanian alone, not the other ones. —Mahāgaja · talk 21:09, 30 January 2025 (UTC)Reply

I agree that it would be useful to have Proto-Romanian (or however we decide to call it), not just for Latin words, but also for borrowings from Albanian. But about this particular case, while I agree that there can be a reconstruction before the split into Romanian and Aromanian pronounced /strekʎe/, the word itself is older, a Late Latin *oestricula must have existed, as the diminutive suffix was no longer productive at the later stage of the language (Proto-Romanian). I also wonder if we can find an obscure descendant of *oestricula in some dialect of Northern Italian, as often happens with Romanian words that are from Late Latin. Bogdan (talk) 23:02, 30 January 2025 (UTC)Reply

I’m not sure we can regard the criterion for Latin as ‘still having a productive reflex of -iculum’ in light of, for instance, Spanish -ejo.

@Mahagaja: Italian is often included under the label Eastern Romance, unfortunately. A possible option without this issue is Proto-Balkan-Romance.

Nicodene (talk) 03:43, 31 January 2025 (UTC)Reply

Even if others include Italian under Eastern Romance, we don't. We already use that term with the label roa-eas for a family consisting of ro, ruo, rup, and ruq. Calling the protolanguage of that family Proto-Eastern Romance would be internally consistent. That other people define the Eastern Romance family differently doesn't really have any relevance to what we call the protolanguage. —Mahāgaja · talk 07:13, 31 January 2025 (UTC)Reply

There has never been a discussion or vote on defining the label Eastern Romance, or using it on Wiktionary to begin with.

The vast majority of the time the term Eastern Romance has a broader scope than those four languages.

Nicodene (talk) 10:21, 31 January 2025 (UTC)Reply

Personally, I would be fine with this if it only implies we would still handle the situation exactly as we do now, the only difference being the language name as "Common Romanian" instead of "Latin" and the orthography more fitting, which are the two issues listed here. But I oppose this if, as I am to understand, this would take the role of a full-fledged language and hence also have terms inherited from attested Latin terms and terms borrowed from Slavic or some other Balkan language. This would increase the number of reconstructions excessively (to approximately two thousand), an immense amount of work for little usefulness provided and greater informational clutter.

Regarding the name, were the first approach I mentioned to go through, I would support "Common Romanian", or if we find it more coherent with the rest of the bunch, "Proto-Romanian". Any mention of "Eastern" or "Balkan Romance" I would vote against. Catonif (talk) 18:15, 31 January 2025 (UTC)Reply

(moved from Wiktionary:Beer parlour/2025/February)

We have a whole host of warnings (17) issued concerning mismatches between proto-languages and families:

Proto-Central Togo (alv-gtm-pro) does not have the expected name "Proto-Ghana-Togo Mountain", even though it is the proto-language of the Ghana-Togo Mountain languages (alv-gtm).
Proto-Arawa (auf-pro) does not have the expected name "Proto-Arauan", even though it is the proto-language of the Arauan languages (auf).
~~Proto-Arawak (awd-pro) does not have the expected name "Proto-Arawakan", even though it is the proto-language of the Arawakan languages (awd). [harmonize under Arawak]~~
~~Proto-Ta-Arawak (awd-taa-pro) does not have the expected name "Proto-Ta-Arawakan", even though it is the proto-language of the Ta-Arawakan languages (awd-taa). [harmonize under Ta-Arawak]~~
Proto-Basque (euq-pro) does not have the expected name "Proto-Vasconic", even though it is the proto-language of the Vasconic languages (euq). [keep as-is]
Proto-Norse (gmq-pro) does not have the expected name "Proto-North Germanic", even though it is the proto-language of the North Germanic languages (gmq). [keep as-is but rename gmq-pro to non-pro]
~~Proto-Kamta (inc-krn-pro) does not have the expected name "Proto-KRNB lects", even though it is the proto-language of the KRNB lects (inc-krn).~~ [rename family to KRDS languages, keep proto-language as-is]
Proto-Chumash (nai-chu-pro) does not have the expected name "Proto-Chumashan", even though it is the proto-language of the Chumashan languages (nai-chu).
Proto-Maidun (nai-mdu-pro) does not have the expected name "Proto-Maiduan", even though it is the proto-language of the Maiduan languages (nai-mdu).
Proto-Mixe-Zoque (nai-miz-pro) does not have the expected name "Proto-Mixe-Zoquean", even though it is the proto-language of the Mixe-Zoquean languages (nai-miz).
Proto-Pomo (nai-pom-pro) does not have the expected name "Proto-Pomoan", even though it is the proto-language of the Pomoan languages (nai-pom).
Proto-Mazatec (omq-maz-pro) does not have the expected name "Proto-Mazatecan", even though it is the proto-language of the Mazatecan languages (omq-maz).
Proto-North Sarawak (poz-swa-pro) does not have the expected name "Proto-North Sarawakan", even though it is the proto-language of the North Sarawakan languages (poz-swa).
~~Proto-Salish (sal-pro) does not have the expected name "Proto-Salishan", even though it is the proto-language of the Salishan languages (sal). [harmonize under Salish]~~
Proto-Samic (smi-pro) does not have the expected name "Proto-Sami", even though it is the proto-language of the Sami languages (smi).
~~Proto-Kuki-Chin (tbq-kuk-pro) does not have the expected name "Proto-Kukish", even though it is the proto-language of the Kukish languages (tbq-kuk). [harmonize under Kuki-Chin]~~
~~Proto-Saka (xsc-sak-pro) does not have the expected name "Proto-Sakan", even though it is the proto-language of the Sakan languages (xsc-sak).~~

We also have four warnings about proto-languages without associated families:

Proto-Amuesha-Chamicuro (awd-amc-pro) has a proto-language code associated with the invalid code "awd-amc".
Proto-Kampa (awd-kmp-pro) has a proto-language code associated with the invalid code "awd-kmp".
Proto-Paresi-Waura (awd-prw-pro) has a proto-language code associated with the invalid code "awd-prw".
Proto-Puroik (sit-khp-pro) has a proto-language code associated with the invalid code "sit-khp".

We also have two weird miscellaneous warnings:

Proto-Rukai (dru-pro) has a proto-language code associated with Rukai (dru), which is not a family.
Kelantan Peranakan Hokkien (mis-hkl) has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.
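For readers unfamiliar with how the first two kinds of warnings arise, the check can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual module code: the tables `FAMILIES` and `PROTO_LANGUAGES` hold only a few sample rows copied from the lists above, and the function name `check_proto_languages` is hypothetical. It assumes a proto-language code is its family's code plus `-pro`, and that the expected name is "Proto-" plus the family name.

```python
# Sample rows only, copied from the warnings above; the real data is larger.
FAMILIES = {  # family code -> family name (without "languages")
    "alv-gtm": "Ghana-Togo Mountain",
    "euq": "Vasconic",
    "smi": "Sami",
}
PROTO_LANGUAGES = {  # proto-language code -> canonical name
    "alv-gtm-pro": "Proto-Central Togo",
    "euq-pro": "Proto-Basque",
    "smi-pro": "Proto-Samic",
    "awd-kmp-pro": "Proto-Kampa",
}

SUFFIX = "-pro"


def check_proto_languages(proto_languages, families):
    """Return warning strings for name mismatches and unknown family codes."""
    warnings = []
    for code, name in sorted(proto_languages.items()):
        if not code.endswith(SUFFIX):
            continue
        family_code = code[: -len(SUFFIX)]
        if family_code not in families:
            warnings.append(
                f'{name} ({code}) has a proto-language code associated '
                f'with the invalid code "{family_code}".'
            )
            continue
        family_name = families[family_code]
        expected = "Proto-" + family_name
        if name != expected:
            warnings.append(
                f'{name} ({code}) does not have the expected name "{expected}", '
                f"even though it is the proto-language of the {family_name} "
                f"languages ({family_code})."
            )
    return warnings


for warning in check_proto_languages(PROTO_LANGUAGES, FAMILIES):
    print(warning)
```

A list of deliberate exceptions (for example, pairs marked "keep as-is") could be passed in and skipped, so that accepted mismatches stop producing warnings.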

I can look into the second miscellaneous warning, but for the others, I mostly don't have enough context. Proto-Norse being the ancestor of the North Germanic languages is a special case because it's attested, but for the other mismatches, I imagine a lot of them are unintentional due to the existence of multiple names for the same family. It should be possible in many cases to rename either the family or proto-language to avoid the mismatch. Pinging @-sche and @Theknightwho who might know something about this; please feel free to ping others. Benwing2 (talk) 04:09, 19 February 2025 (UTC)Reply

In some cases, I think the family uses a different name to avoid having the same exact name as a (non-proto) language (as described in WT:FAM). For example, "Proto-Vasconic" gets only 13 Google Books hits (that actually use that term; the subsequent pages upon pages of results that Google returns don't use the term or sometimes even have any particular relevance — who knows why Google returns them), whereas I find 10+ pages [of ten uses each] of "Proto-Basque", so "Proto-Basque" is clearly the more common name for the language ... but without even checking whether "Basque languages" or "Vasconic languages" is more common for the family, I can see that one benefit to calling them "Vasconic languages" is that if they were called "Basque languages", then things like {{der|en|euq|-}} would display identically to {{der|en|eu|-}}. (That might not matter that much in that particular case, but for larger families it'd be confusing. However, {{der|en|qwm|-}} and {{der|en|trk-kip|-}} do display identically... so maybe we need to rename one of those, or find some way of solving this "same name" issue...)
In some cases, the proto-language and family might really have different common names.
In the case of Salish, it looks like the family could be renamed "Salish" to match the proto-language; "Proto-Salish" gets 11 pages of relevant Google Books results vs only 9 pages for "Proto-Salishan", and "Salish languages" is apparently also more common. - -sche (discuss) 05:04, 19 February 2025 (UTC)Reply
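The "same name" problem described above (a family and a non-proto language that would display identically in templates such as {{der}}) can be detected mechanically. A minimal Python sketch, assuming a flat list of (code, display name, kind) rows; the `ENTRIES` sample and the function `name_collisions` are hypothetical illustrations built from the codes mentioned in this comment:

```python
from collections import defaultdict

# Sample rows only: (code, displayed name, kind). Names follow the discussion
# above, where qwm and trk-kip are said to display identically.
ENTRIES = [
    ("eu", "Basque", "language"),
    ("euq", "Vasconic", "family"),
    ("qwm", "Kipchak", "language"),
    ("trk-kip", "Kipchak", "family"),
]


def name_collisions(entries):
    """Return display names shared by at least one language and one family."""
    by_name = defaultdict(list)
    for code, name, kind in entries:
        by_name[name].append((code, kind))
    return {
        name: items
        for name, items in by_name.items()
        if len({kind for _, kind in items}) > 1
    }


for name, items in name_collisions(ENTRIES).items():
    codes = ", ".join(f"{code} ({kind})" for code, kind in items)
    print(f"{name}: {codes}")
```

Running the same check against a proposed rename (for example, renaming a family to match its proto-language) would show in advance whether the rename introduces a new collision.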

"Ta-Arawak" seems to be marginally more common than "Ta-Arawakan", if we wanted to synchronize that pair: on Google Scholar, "Ta-Arawak" gets 40 hits, "Ta-Arawakan" 26; on Google Books, each one gets about 14 hits (discounting a few which are not in English and are only using ta as a particle while mentioning the Arawak/an languages). "Proto-Ta-Arawakan" gets 1 GBooks hit and "Proto-Ta-Arawak" gets none; "Ta-Arawakan languages" returns 2 copies of 1 book, "Ta-Arawak languages" returns 1 book. On Google , "Ta-Arawakan languages" returns 0 hits while "Ta-Arawak languages" returns 7 (of which 3 are duplicates of a single work). - -sche (discuss) 18:31, 19 February 2025 (UTC)Reply

@-sche What about Proto-Arawak vs. Arawakan? Wikipedia has w:Arawakan languages and w:Ta-Arawakan languages (although the w:Arawakan languages article uses "Ta-Arawak" in reference to the family). Since Ta-Arawakan is a subfamily of Arawakan, it seems we should be consistent in the names of these two families. (Meanwhile, confusingly, Category:Arauan languages is an apparently unrelated family; Wikipedia's article is at w:Arawan languages, which looks more "modern".) Benwing2 (talk) 00:59, 21 February 2025 (UTC)Reply

Although both names seem to be common enough that the Google (Books) Ngram Viewer should be able to plot them (both seem to get well over 40 hits), it doesn't like the hyphens, so this claims no results, and I can't be sure whether this is actually a graph of "Proto-Arawak" or instead of how many books have "Proto" minus "Arawak". Nonetheless it seems like "Arawak" is more common, if we wanted to standardize everything on that. (Google Scholar also claims to find slightly more results for "Proto-Arawak" than "Proto-Arawakan", and significantly more for "Arawak" than "Arawakan".) - -sche (discuss) 18:32, 22 February 2025 (UTC)Reply

For Kamta, I notice there's the added oddity that the language family/category is named "... lects" rather than "... languages", even though the languages in the category are named "Category: ... language". AFAICT, that part of the name should be regularized (from "lects" to "languages"). For the name itself, google books:"KRNB" languages Kamta turns up zilch (and I spy only three Google Scholar hits), but "Kamta languages" also turns up zilch (and if the family were renamed "Kamta" to match the proto-language, we would run into the Kipchak issue where {{der}} etc would return the same name whether the family or the [non-proto] language that's already called "Kamta" was called). Wikipedia uses a third name, "KRDS", which I can find a couple of Google Books and a couple of Google Scholar hits using. There are a couple Google Books and Scholar hits for "proto-Kamta", and none for "Proto-KRNB" or "Proto-KRDS", so maybe we leave the proto-language name as "Proto-Kamta" but change the family from "KRNB lects" to "KRDS languages"? Or maybe some Indian-language editors have better knowledge/ideas: pinging User:AryamanA who created Category:Rajbanshi language (and you already pinged TKW, who created Category:Surjapuri language). - -sche (discuss) 18:32, 22 February 2025 (UTC)Reply

In general, I'd follow the literature; if they generally use a different name for the proto-language vs. the group by which the proto-language is reconstructed, so be it. If it's an even split between multiple names: sure, harmonize it for convenience. However, I have a few suggestions.

Rename "Kukish" to "Kuki-Chin" (Kuki-Chin is more common)
Change the code of Proto-Norse from gmq-pro to non-pro but keep the "Proto-Norse" name (since that's what the literature calls it). It doesn't really make sense for Old Norse to be non but Proto-Norse to have "gmq" instead.

— Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:06, 24 February 2025 (UTC)Reply

Definitely, in cases where one name is more common for the proto-language and another for the group, I agree it's fine for them not to match. - -sche (discuss) 17:50, 25 February 2025 (UTC)Reply

@-sche, Mellohi! I added the results so far in bold. There's a trend here in that so far generally the name of the proto-language has remained and the name of the family changed. I don't know if that applies to the remainder, though. Benwing2 (talk) 20:30, 24 February 2025 (UTC)Reply

For Saka, "Proto-Saka" and "Saka languages" are both considerably more common than the corresponding forms with "Sakan", AFAICT, so those could be standardized on "Saka". - -sche (discuss) 07:25, 24 January 2026 (UTC)Reply

Done for Sakan -> Saka. Benwing2 (talk) 07:46, 24 January 2026 (UTC)Reply

Pending from 2024 (Benwing plan)

Waiting... [I understand that languages where decisions have a turnout of approx. 5 people are difficult]. It would be useful for correctly reviewing etymologies, Cat:Koine Greek and Cat:Modern Greek simultaneously. At the moment I feel 'blocked' because it will be hectic to have to go back to my reviews to re-review them. I have written a MedGr. reminder every January since 2023. The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision. Thank you. ‑‑Sarri.greek ^♫ I 15:39, 19 February 2025 (UTC)Reply

@Sarri.greek: Hello again. Could you explain what you refer to by “The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision.”, please? 0DF (talk) 01:17, 20 February 2025 (UTC)Reply

Proto-Turkic: (trk-pro)
- Proto-Common Turkic: (trk-cmn-pro)
  - Kipchak: (trk-kip) (FAMILY)
    - Kipchak-Bulgar: (trk-kbu) (FAMILY)
      - Volga Turki: (?)
        - Bashkir: (ba)
        - Tatar: (tt)

>>Volga Türki<<

BurakD53 (talk) 18:34, 3 March 2025 (UTC)Reply

Data: Qul Ali Kıssa-i Yusuf and Volga Tatar tombstones (like Volga Bulgar inscriptions). - BurakD53 (talk) 18:37, 3 March 2025 (UTC)Reply

@AmaçsızBirKişi @Zbutie3.14 - BurakD53 (talk) 18:39, 3 March 2025 (UTC)Reply

fine with me Zbutie3.14 (talk) 23:28, 3 March 2025 (UTC)Reply

Why not?

Support.

@Benwing2 (I know we keep pinging you to add new lang codes for Turkic languages, but the thing is the previous ones were very inaccurate and missing.)

AmaçsızBirKişi (talk) 10:08, 4 March 2025 (UTC)Reply

@Benwing2 We need a code for Volga Turki; it is an L2 language under the Kipchak-Bulgar family and it is the parent of Bashkir and Tatar. There are also a lot of other things that need to be changed, but we can deal with those later, I guess. Zbutie3.14 (talk) 19:14, 13 March 2025 (UTC)Reply

My argument here is that treating Fingallian and Yola as languages is inaccurate and that they are better classed as dialects of Early Modern English.

As detailed by Hickey (2005, pp. 196-198), 'the dialect of Fingal' is attested in three 17th century poems which display a small number of features showing the influence of Irish Gaelic and a couple of relatively conservative features (namely Middle English /i:/ and past participle 'y-'). 'The dialect of Forth and Bargy' is attested slightly more substantively from the end of the 18th century with two longer glossaries and some short texts, mostly poems/songs. These display a larger number of divergent or conservative features (2005, pp. 199-202). Hickey (2002, 2005) is essentially the primary scholar on historical varieties of English in Ireland and is clear in referring to these as dialects. No reliable sources make any mention of Yola/Fingallian languages. Similarly, Oxford English Dictionary notes Forth and Bargy words as variant forms under Irish English (Wexford), 1800s such as 'af' and 'av' for 'if' here.

These two dialects give a glimpse into the development of English in Ireland prior to the large-scale language shift that came in the following centuries. Whilst I recognise that the 'language vs. dialect' argument is mostly contrived and relative, it does not make any sense for these two to be classed as languages on Wiktionary, and any entries would be better described as dialectal Early Modern English. My view is that these varieties are not actually that different from contemporary or later dialects of English, e.g. Yorkshire or Cumbrian dialects, which are traditionally quite divergent from varieties of Southern English yet fall under English all the same. Further, the majority of words currently under the heading Fingallian are cited from a glossary of dialectal words from the 20th century and aren't strictly Fingallian anyway.

Sources:

Hickey, R. (2002). A source book for Irish English. J. Benjamins Publishing Company.
Hickey, R. (2005). Dublin English: evolution and change. J. Benjamins Publishing Company.
Oxford English Dictionary, s.v. “if (conj. & n.), Forms,” accessed December 2024,

MolingLuachra (talk) 21:00, 11 March 2025 (UTC)Reply

@MolingLuachra These diverged before Early Modern English (which begins c. 1500), so what is the basis for treating them as part of it? They also have a separate ancestry to modern Irish English, as they developed out of the forms of Middle English brought over centuries earlier, so putting them under the heading "English" (which strictly refers to Early Modern English onwards) feels contrived. The fact that many Fingallian entries are wrong isn't really relevant, either - those entries just need to be corrected.

Note also that Wiktionary is not Wikipedia - we aren't limited by whatever reliable sources choose to describe as a language. We merge some traditionally treated as separate (e.g. Serbo-Croatian, Catalan and Valencian), and separate others that are usually grouped together (e.g. Low German is split into Dutch Low Saxon and German Low German). Theknightwho (talk) 12:14, 12 March 2025 (UTC)Reply

@Zff19930930 as a prolific Yola editor. —Mahāgaja · talk 06:22, 13 March 2025 (UTC)Reply

Yola is much more conservative than Early Modern English. For instance, baake (bake) /baːk/ was heard and recorded in A Modern Glossary of the Dialect of Forth and Bargy, page 154. Thus, Yola can't be classified as a dialect of Early Modern English.

There is a comment about Fingallian in A NORTH-COUNTY DUBLIN GLOSSARY, page 262.

This district, Fingal, had in former times a dialect based on the 13th century colonial South-Western English of the Pale. Fingallian, of which we have only the slightest records, must have closely resembled the Forth dialect, recorded by Poole early in the last century; but, owing no doubt to its nearness to the capital, it did not keep its peculiarities so long. One naturally looks for traces of this ancient speech in a North-Dublin glossary, but they are few and doubtful.

Some words give a flavour of Fingallian, particularly forms like fat for "what", fen for "when", ame for "them" or plack-keet for "placket". Fingallian did exist, and was extinct by the mid-19th century.

I will clean some Irish English under the heading Fingallian. Zff19930930 (talk) 13:01, 13 March 2025 (UTC)Reply

You could get fat, fen, etc. (pronounced with voiceless bilabial [ɸʲ], the Irish slender /f´/) in most of rural Ireland up to the 20th century, hence e.g. making fun of phwat is yer nam?! in An Béal Bocht by Myles na gCopaleen (and I wouldn’t be extremely surprised if there were still some old people with that in the strongest Gaeltacht areas). // Silmeth ^@talk 19:13, 13 March 2025 (UTC)Reply

A single conservative feature is not nearly enough to object to 'Yola' being a dialect of English. The conservative lack of vowel shift /iː/ → /ai/ and /aː/ → /eː/ is interesting and I'm not sure about that in particular but all of the features of the dialects in Forth and Bargy/Fingal are widely attested elsewhere or are clear substrate features of Irish. As I said, scholarly consensus is unambiguous in referring to these as dialects of English and my contention is that the classification for Wiktionary's purposes as 'languages' makes no sense when other much better attested and much more divergent varieties of historical and modern 'Englishes' are not 'languages'. My argument is essentially that 'Early Modern English' as used by Burnley (1992) refers to a period of the language's history c. 1500-1800. As 'Fingallian' and 'Yola' are dialects of English attested during this period, I think it makes sense to call them 'dialects of Early Modern English', especially given that this is the convention taken with other instances of dialectal or historical variation in Old English, Middle English and English such as here where variant spellings reflecting regional or historical differences are given as 'Alternative forms' under the headword 'fader'.

I'm not sure what the relevance of the quote is but the examples you give are sort of beside the point. As Silmeth pointed out, the substitution of English /ʍ/ for Gaelic /ɸ/ is not limited to Fingal/F&B and continued to be a feature of Irish English until very recently if not still amongst some people. The shift in stress in a word like 'placket' is a feature of F&B not Fingal (argued by O'Rahilly 1932 to be a result of Norman influence). You say that 'Fingallian' 'was extinct by the mid-19th century'; do you have any source for that? No reliable sources I can find say anything but that the dialect was only attested in three poems in the 17th century.

MolingLuachra (talk) 15:00, 14 March 2025 (UTC)

Coming here from Wikipedia, if such dialects as traditional Somerset and Dorset English aren't considered separate, then neither should the dialect of Forth and Bargy be. Many authors, even while it was alive, talked about how similar they are. Fingallian even less so. Not to mention half the Fingallian etymologies are just wrong and made up by someone who clearly doesn't know much about Irish or the English of Ireland. I'm after correcting one that very clearly comes from Irish, and there's several others.

Also, as @MolingLuachra said, Fingallian is attested by three poems in the 17th century and likely didn't last out the century; indeed, it's unknown if those poems were even written by speakers or rather people making fun of it. There's really absolutely no reason it should be considered separate here. Sionnachnaréaltaí (talk) 19:02, 14 March 2025 (UTC)

Support. Dialects form a continuum, and it makes no sense to categorize them solely based on point of divergence. In this instance, it seems clear that Yola and Fingallian belong to the broader English continuum; they differ from prestige varieties, but so do most other traditional dialects, especially in the region. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 22:47, 17 March 2025 (UTC)

Support. As I said earlier, there's no real reason to consider it separate apart from comparing it solely with the prestige language. Even the people who wrote about it while it was alive or recently deceased never considered the Forth and Bargy dialect (Yola) or Fingallian to be separate (and we have scant information on Fingallian to begin with). If we consider these two separate on linguistic features, we might as well consider every traditional dialect of English as a separate language; if we consider them separate based on geographic location, why are American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English, et al. not considered separate languages? Really, it seems the differences are exaggerated because the nearest dialects to it in the continuum aren't spoken as strongly anymore, and because we compare it to modern prestige English, not to the continuum of its time.

Sionnachnaréaltaí (talk) 14:07, 18 March 2025 (UTC)

Comparing it to the differences between American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English seems like a pretty big exaggeration, given they are all widely mutually-intelligible with each other. Theknightwho (talk) 14:35, 18 March 2025 (UTC)Reply

All the more reason to not consider it separate. As far as we know it was mutually intelligible to other dialects on the spectrum. If they're considered a single dialect continuum because of mutual intelligibility, then the Forth and Bargy dialect should be too based on what we know. Sionnachnaréaltaí (talk) 19:44, 18 March 2025 (UTC)Reply

Modern prestige BrEng is completely intelligible to me but many traditional Scottish and Irish lects aren't. Hell, many English lects aren't, either, and as a US Southerner I still often struggle to understand rural AAVE. Are these all separate languages? ;P Chances are, most everyone can understand people from the next town over (at least as far as traditional dialectal boundaries go); that's what a dialect continuum is, no? 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 20:33, 18 March 2025 (UTC)Reply

Prestige dialects of English (those you have mentioned, pretty much) have developed in parallel for a long time with consistent cross-pollination, thus in this era we cannot consider geography as the sole factor in the continuum. We instead need to examine each individual lect and compare them to all other lects that share similar features; in this case, as has been raised above, both Fingal and Yola share many features with other contemporaneous Irish lects, putting them in a similar position to other traditional regiolects; that is to say, I don't think anyone would be opposed to treating these as separate languages if e.g. traditional Yorkshire were also treated as such, but it isn't. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 20:38, 18 March 2025 (UTC)

(Would you also merge Scots under ==English==?) I'm somewhat ambivalent, but am inclined to keep Yola separate; it has a divergent history (like Scots, which past discussions have also strongly though not unanimously kept separate), and has an ISO code, distinguishing it from some of the dialects people have pointed to above. Furthermore, it seems like handling it as ==English== would in practice mean deleting coverage of it, since it's not clear to me it would meet the criteria for inclusion (three uses-not-mentions) that English words are subject to. Fingallian OTOH (added without discussion) seems more dubious, particularly because it seems the only works "attesting" it may be parodies of it rather than actual records of it, and thus not reliable bases for entries (consider the differences between African-American English and parodies of it). - -sche (discuss) 03:22, 19 March 2025 (UTC)Reply

Oppose merging for now for the above reasons. We have pretty strict inclusion criteria for English per WT:CFI & WT:WDL, so if merging means losing the coverage we have, I cannot support it. And FWIW the last time we tried to make a carve-out for variants of WDLs, it unfortunately did not pass: Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's. AG202 (talk) 03:50, 19 March 2025 (UTC)Reply

Comment: in a change dated to 2025-10-15, the ISO (SIL) ineptly merged yol into Middle English. Because we (and they!) define 1500 as the cutoff for Middle English, and Yola was spoken into the 19th century, we are unable to follow suit. (We could merge yol into modern English, but then we'd run into the problems discussed above.) - -sche (discuss) 05:34, 14 January 2026 (UTC)

While discussions here are not strict headcounts, I would say that after most of a year with no more comments apart from mine a few days ago, and with an even 3-3 split of MolingLuachra, Lunabunn and Sionnachnaréaltaí supporting a merger of Yola into English, and me, AG202, and (if I understand correctly) TKW opposing, there is no consensus to merge Yola into English.
With MolingLuachra, Lunabunn, Sionnachnaréaltaí, and me (though I suppose my position may not have been clear until now) inclined to merge Fingallian into English, and I believe TKW and AG202 opposing, there might be consensus to merge Fingallian (there does not seem to have been consensus to create it in the first place, it was created without discussion), but so much of this discussion focused on Yola, and the discussion is so stale, that I will probably just start a new subsection here soon specifically about Fingallian in the hopes of getting more opinions. - -sche (discuss) 03:51, 24 January 2026 (UTC)Reply

I support merging Fingallian with English. MolingLuachra (talk) 16:44, 24 January 2026 (UTC)

I occasionally encounter reconstructions for Proto-Luwian. Kloekhorst has some, Dunkel also has at least one. So shouldn't this be added as a language? Exarchus (talk) 14:06, 23 March 2025 (UTC)

I think generally Proto-Luwian doesn't have a lot of difference from Proto-Anatolian, and the number of different lexemes is very limited. The classification of lower-branch Anatolian is also unclear, so we'll have a problem with deciding which languages are Luwian and which are not. Thadh (talk) 16:52, 28 March 2025 (UTC)Reply

We already have a category Luwic languages. But it seems both 'Proto-Luwic' and 'Proto-Luwian' are in use, and Kloekhorst differentiates between them here: "This means that Lycian stems from a sister language to Proto-Luwian and that both can be regarded as distinct daughters of Proto-Luwic." But Kloekhorst in his earlier dictionary apparently considers Lycian part of 'PLuw.', given as "Proto-Luwian" on page xii. So I'm indeed not sure how established this classification is. Exarchus (talk) 17:24, 28 March 2025 (UTC)Reply

Wikipedia claims that E language is a "Tai–Chinese mixed language".

Luo & Deng (1998) (which argues for that mixed language stance) identifies 53 out of 98 Swadesh list vocabulary as Kra-Dai (which is still a majority of the vocabulary), and 33 out of 98 as Sinitic, but the latter includes several words that are miscategorised e.g. sɔŋ¹ (purportedly from 雙 / 双) is from *soːŋᴬ and ultimately from Middle Chinese 雙 (sraewng), ku¹ (purportedly from 孤 (gū)) is from *kuːᴬ.

{{R:eee:Wei & Wei 2011}} suggests that the many supposedly Sinitic features in E identified by Luo & Deng (1998) can also be found in other Tai or Kra-Dai languages. They further propose that E actually constitutes the third group of Zhuang, but I don't find this extremely convincing.

Overall my impression is that E is just a Tai language with a very strong Sinitic influence, and I suggest that we simply set the parent of E (eee) to Tai (tai), with the added benefit of only having to link to the Proto-Tai entry instead of listing cognates. – wpi (talk) 18:34, 12 April 2025 (UTC)

Meh. It seems impossible to tell whether it is underlyingly a Tai language which has heavily mixed with Chinese, or a Tai-Chinese mixed language. I have no strong feelings, but it seems like it should be possible (and if it is not currently possible, we should make it possible) to say that a term in a mixed (e.g. Tai-Chinese) language derives from a (e.g. Tai) protolanguage root—and link there for cognates—even if we don't reclassify the language as a descendant of solely that protolanguage. - -sche (discuss) 03:59, 19 April 2025 (UTC)Reply

@-sche: I don't have very strong feelings for this, but I simply find that (a) from a linguistic perspective, the mixed language argument is less convincing than the other – by the same logic one could say that English is a mixed language due to its large French/Latinate vocabulary and French-influenced morphology (well, there are some who take such a view but the general consensus is that English is a Germanic language), and (b) from an editing perspective, it will be easier to work on etymologies for a normal language (as opposed to a mixed language), see for example the often inconsistent etymology template usage in our pidgin and creole entries. – wpi (talk) 13:49, 20 April 2025 (UTC)

Support. With languages lacking historical documentation there is often not a solid line to be drawn between mixed language vs. creole vs. heavy loanage vs. substratum effect, et cetera, but if a reasonably clear leaning can be discerned and is being commented on in literature, there should be no issue (for our purposes) treating it as straight inheritance. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 08:29, 21 April 2025 (UTC)Reply

Wikipedia has the language at Upper Morehead language, which is likely to be more distinct of a name than Wára (there are two other Wara languages given in Wikipedia), but even more to the point, the name Wára actually refers to one of the dialects of this language, not to the language itself. Although Ethnologue (and hence ISO 639-3) appears to use the term Wára for the language as a whole, Glottolog calls it Anta-Komnzo-Wára-Wérè-Kémä based on the five identifiable dialects. The only comment in Wikipedia about this language is:

Upper Morehead, also known as Wára, is a Papuan language of New Guinea. Varieties are Wára (Vara), Kómnjo (Rouku), Anta, and Wèré (Wärä); these are divergent enough to sometimes be listed as distinct languages.

So maybe at some point some of the dialects will be split into separate languages but at this point given the single ISO code and Glottolog's view, I would keep as a single language and use a term that does not match any individual dialect. @-sche? Benwing2 (talk) 05:53, 14 April 2025 (UTC)Reply

Oof, the fact that not one but two dialects/lects of this language are sometimes spelled Wara (give or take some diacritics) seems confusing, but Wikipedia says "Upper Morehead" is also polysemous, sometimes denoting Arammba instead. I will try to find out how commonly it denotes this language vs Arammba. (Exonymic placename language names like "Upper Morehead" always feel a little bit kludgy to me, but sometimes it can't be avoided.) - -sche (discuss) 03:59, 23 April 2025 (UTC)Reply

Hanlao language (漢佬話 or 旱澇話 in Chinese, both of which romanise to Hanlao) is spoken in the northern parts of Qinzhou, Guangxi, China. The primary sources are Luo (2016) (Bulletin of Chinese Linguistics #9, pp. 121-150, accessible via https://www.academia.edu/59519239/) and the Qinzhou City Annals.

Its affiliation is unclear; most sources claim either that it is a Zhuang-ised Sinitic language or that it is a mixed language between Tai and Sinitic. However, based on the description in Luo (2016) (e.g. 57 out of 97 Swadesh list words are cognates with Zhuang and 22 out of 97 with Sinitic), I believe the case is likely similar to the E language above, i.e. Hanlao is a heavily Sinitised Tai language. At any rate, it is clearly distinct from other Sinitic or Tai languages.

There is no ISO code, so I propose tai-han. – wpi (talk) 18:37, 14 April 2025 (UTC)Reply

Support on adding the language,

Weak support on Tai inheritance; agree in principle per E above but this specimen seems to have received less relevant coverage in literature. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 08:30, 21 April 2025 (UTC)

Support adding it; neutral on how to classify it. I tried to do my part / due diligence and look for sources about it (so things don't just sit on this page getting little input); the few mentions of it I could find do support that it is a language (as opposed to e.g. a dialect of another Zhuang language; a main concern whenever anyone proposes to add a new language is to be sure it isn't already / better covered as another language). Luo's paper suggests that more central vocabulary is Zhuangic and more peripheral words come from Cantonese, Pinghua, and Hakka, yes? which would perhaps suggest it is indeed underlyingly Tai. - -sche (discuss) 23:03, 21 April 2025 (UTC)Reply

The Podlachian language is the East Slavic language spoken between the Narew and the Bug. It has its own website and an article on Wikipedia, but there isn't anything about it in Wiktionary. Could I create some entries for it here? PGałązka (talk) 10:34, 22 April 2025 (UTC)

@Underfell Flowey @AshFox @Ssvb @Sławobóg @Thadh @Benwing2 as users with any sort of regular contact with East Slavic. Vininn126 (talk) 10:53, 22 April 2025 (UTC)Reply

Ok. I have any sort of regular contact with East Slavic too, I'm a half-Podlashuk. PGałązka (talk) 13:27, 22 April 2025 (UTC)Reply

@PGałązka: One of the biggest problems is that the w:Podlachian language doesn't seem to have the ISO 639-3 code yet (look at the "Proposal for several languages without ISO codes" topic above). Also there are questions about the availability of citations in durably archived sources and about the number of potential Wiktionary contributors in this language. If it's just you alone and you eventually lose interest, then the Podlachian content may become a liability. Additionally, your "half-Podlashuk" self-assessed status is not very reassuring, as there were some hot topics Wiktionary:Beer_parlour/2025/April#Prohibit_AI-generated_content and Wiktionary:Beer_parlour/2025/April#Formally_allowing_removal_of_Babel_boxes_by_other_users_if_proficiency_is_contradicted recently. These are the details that would be useful to clarify. --Ssvb (talk) 14:24, 22 April 2025 (UTC)Reply

"Once again they want to divide Ukrainian dialects into separate micro languages...". Actually I'm not against adding Podlachian, and even West Polesian... but East Slavic languages should be tidied up... the tree of East Slavic languages should ideally look like this (see below), which would maximally please reality... under such conditions I am only for adding Podlachian and West Polesian... and not any other inadequate options with "attempts to deduce Podlachian from the times of Kievan Rus". — AshFox (talk) 14:35, 22 April 2025 (UTC)

* East Slavic:
** Old East Slavic:
*** Middle Russian: [etym-only]
**** Russian:
*** Old Ruthenian:
**** Middle Belarusian: [etym-only]
***** Belarusian:
**** Middle Ukrainian: [etym-only]
***** Carpathian Rusyn:
***** Podlachian:
***** Ukrainian:
***** West Polesian:
** Old Novgorodian:
*** Old Pskovian: [etym-only]

OK. But I don't understand. Can I add this language or not? PGałązka (talk) 17:28, 23 April 2025 (UTC)

I'm pretty sure only template editors and admins can add a language. —Mahāgaja · talk 20:18, 23 April 2025 (UTC)Reply

I don't think that's what they were asking. @PGałązka: On a technical level, if you want to add Podlachian entries, it needs to be added to Module:languages (which, yeah, only a template editor or admin can do), but you need consensus from other people who make entries in East Slavic languages before it's added, hence the pings above. You should wait for their input; in my experience though it might be a little difficult to split off a new language code, since you'll also have to go through Ukrainian entries and decide whether they fall under Podlachian or not. Saph (talk) 21:59, 23 April 2025 (UTC)

Sorry to revive this nearly-month-old thread, but my two cents as someone who does stuff with Carpathian and Pannonian Rusyn (formerly also Belarusian), has no ethno-linguistic dog in the fight, and has had a bit of contact with Podlachian media: I wouldn't classify Podlachian under Ukrainian or Belarusian. Old Ruthenian is as far as I'd confidently go in terms of ancestors, but Podlachian displays both features of Belarusian and Ukrainian, most notably akanie and /d͡zʲ/ from the Belarusian perspective, but also a greater prominence of /ɫe/ from the Ukrainian perspective. Not to mention that different varieties classified under the broad umbrella of Podlachian display different degrees of Ukrainian and Belarusian characteristics. The Maksymiuk standard of Podlachian largely doesn't take akanie into account for example (like Ukrainian), but Niczos from Sw@da x Niczos sings in a variety that has akanie (like Belarusian).

Nonetheless I think separate classification is still a good idea, precisely because of this etymological ambiguity as to whether it belongs under Belarusian or Ukrainian or both or neither. In addition, Podlachian seems to be written in both the Latin and Cyrillic scripts (contrary to Maksymiuk's best efforts), and classifying them under either Belarusian or Ukrainian would just clog up the "Belarusian/Ukrainian terms spelled with X" categories, as it already is doing. Instead, one could look towards Serbo-Croatian as an example, and list Podlachian as written in both Cyrillic and Latin so it doesn't generate a million "spelled with" categories. Of course it does need to have a distinct ISO code first, or some code needs to be invented for classification within Wiktionary.

@Ssvb: about availability of citations: the Wikipedia page indicates that there are several texts in Podlachian being published regularly, as well as novels, poetry and memoirs. That's more potential citations than the entirety of Solombala English, which seems to rely entirely on a small handful of sentences from the 1800s for actual usage. My concern is that relying on the Svoja.org website too much would create a disproportionate image and under-represent certain varieties of Podlachian.

But that's just my two cents. Insaneguy1083 (talk) 11:22, 19 May 2025 (UTC)Reply

@Insaneguy1083 Thanks for taking a look and posting your opinion. I also have done my own research of the available public information, but I still would like @PGałązka to first provide a lot more details (their self-assessed language competence and the geographical location of the place where they learned it, since there are many local variants), and then make a practical proposal for their vision of how things should be preferably handled. Ssvb (talk) 04:54, 20 May 2025 (UTC)Reply

“Kulon–Pazeh” refers to a linguistic subgroup containing Pazeh and the extinct Kulon (if it ever existed as a language). AFAIC this is obsolete terminology. The Kaxabu people are culturally related to the Pazeh. Should we use the modern term Pazeh–Kaxabu, as on Wikipedia, instead? I find Kulon-Pazeh a bit confusing. Chihunglu83 (talk) 02:22, 10 May 2025 (UTC)

Update: code uun is now a retired code; it was split into pzh and uon in 2022, while Kaxabu has not been recognized as a language. Chihunglu83 (talk) 15:09, 14 May 2025 (UTC)

(Other discussions: 1, 2.) The codes for the varieties have been added following the ISO, but uses of "uun" remain, e.g. in tarisi, tåli. We need to switch these to (as I understand it) pzh, and then uun can be completely retired (following the ISO). Maybe I can do this with AWB later or maybe Benwing can bot it. - -sche (discuss) 08:32, 26 January 2026 (UTC)Reply

@-sche I added tracking for the code uun; over time all instances will show up in Special:WhatLinksHere/Wiktionary:Tracking/languages/uun. Benwing2 (talk) 08:40, 26 January 2026 (UTC)Reply

Out of curiosity, any idea why Template:U:Finnic telicity or Module:ja-numeral are showing up in that Whatlinkshere? I think I have cleaned up the actual uses and the rest are ghosts of that sort. - -sche (discuss) 22:30, 6 February 2026 (UTC)Reply

It turns out this is because both of them have documentation that uses fam:... in place of the language code to indicate that the template applies to a specified family. The code that processes this calls getDescendants() to fetch the members of the family, and this is implemented in an extremely inefficient way in that it iterates over every single language, etym language and family to check if it has the specified family as an ancestor. I think the reason for doing this is that we don't have any cached mappings from families to members of that family, so there's no alternative but to check every language. Luckily the places where getDescendants() is called are (currently) not time-sensitive, so we're not running into problems from this. If the number of such "ghosts" is high, we could add a flag to getByCode() in Module:languages to tell it not to track the language in this case. Benwing2 (talk) 03:28, 7 February 2026 (UTC)Reply
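For readers following the performance point, here is a minimal Python sketch of the difference between scanning every code and walking a cached family-to-members mapping. It is only an illustration under stated assumptions: the `PARENT` table and all function names are hypothetical toy stand-ins, not the actual data or code in Module:languages, which is written in Lua.

```python
from collections import defaultdict

# Hypothetical toy data: code -> parent family code (None at the root).
# These are NOT Wiktionary's real language or family tables.
PARENT = {
    "fam-a": None,
    "fam-b": "fam-a",
    "lang-1": "fam-a",
    "lang-2": "fam-b",
    "lang-3": "fam-b",
    "lang-4": None,
}


def has_ancestor(code, family):
    """Walk up the parent chain and report whether `family` is on it."""
    parent = PARENT.get(code)
    while parent is not None:
        if parent == family:
            return True
        parent = PARENT.get(parent)
    return False


def descendants_by_scan(family):
    """The approach described above: test every known code, one by one."""
    return sorted(code for code in PARENT if has_ancestor(code, family))


def build_children_index():
    """One pass over the data to build a parent -> direct children mapping."""
    index = defaultdict(list)
    for code, parent in PARENT.items():
        if parent is not None:
            index[parent].append(code)
    return index


def descendants_by_index(family, index):
    """Collect descendants by walking down the cached mapping only."""
    found = []
    stack = list(index.get(family, []))
    while stack:
        code = stack.pop()
        found.append(code)
        stack.extend(index.get(code, []))
    return sorted(found)
```

With the index built once, a lookup only touches the members of the requested family, whereas the scan touches every code on every call. Whether building and storing such an index is worthwhile depends on how often the lookup runs, which the comment above suggests is currently not often.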

Spoken in Southern Khuzestan, Iran, as a branch from Iraqi Arabic with influences from Gulf Arabic, Luri and Persian. Around half a million speakers.

Wikipedia article: Khuzestani Arabic
No ISO code, acm-IR in IETF but we could use acm-ira like fa-ira

Saam-andar (talk) 17:24, 25 May 2025 (UTC)

IMO we have too many Arabic L2's already and don't need another one, esp. as Wikipedia explicitly describes this as a dialect of Gilit Mesopotamian Arabic and not its own language. Benwing2 (talk) 19:34, 6 June 2025 (UTC)Reply

Hmmm actually are you proposing this to be an etym-only language? That's probably OK. Benwing2 (talk) 19:36, 6 June 2025 (UTC)Reply

@Benwing2 Fair enough, would be thankful to have it as an etym-only. Saam-andar (talk) 12:11, 7 June 2025 (UTC)Reply

I've already responded to this request on Discord. It is merely a subdialect of Gelet-type dialects which are grouped under Iraqi Arabic. — Fenakhay ^{(حيطي · مساهماتي)} 23:36, 6 June 2025 (UTC)Reply

@-sche @Fenakhay can clarify but AFAIK this request was refused. Benwing2 (talk) 08:22, 26 January 2026 (UTC)Reply

OK, I was going to ask, User:Saam-andar do you still want an ety-only code for mentioning Khuzestani cognates in etymologies and such, and if so, Fenakhay are you opposed to that? Not making it a full language code, just an ety-only code? The Khuzestani (sub)dialect is mentioned in literature, so even if it's completely mutually intelligible with Iraqi Arabic, it might still be worth mentioning sometimes (I don't know) in the same way we have ety-only codes for things like California English which are definitely not separate languages from other English. - -sche (discuss) 08:36, 26 January 2026 (UTC)

I’m fine with it as etym-only, not as an L2. — Fenakhay ^{(حيطي · مساهماتي)} 22:03, 29 January 2026 (UTC)Reply

@Saam-andar Added as an etymology-only variety of "acm", with code "acm-khu". - -sche (discuss) 22:35, 6 February 2026 (UTC)Reply

According to the book Ohlone/Costanoan Indians of the San Francisco Peninsula and their Neighbors, Yesterday and Today, the term Utian is derived from Proto-Costanoan uţxi ("two") + -ian, created by William F. Shipley in 1978. I wanted to add that to the Etymology section on Wiktionary, but Wiktionary doesn't have an appropriate language code.

"Proto-Costanoan", more accurately Proto-Ohlone, is a lower-order reconstruction of Proto-Utian (nai-utn-pro), and the reconstructed proto-language of the Ohlone languages; its numerals were in 1990 reconstructed by Catherine A. Callaghan, who has also referenced it in her other publications. Callaghan uses the name "Proto-Costanoan", but the standard modern-day term for the family is Ohlone, and the term "Proto-Ohlone" is often used nowadays.

With that in mind, I'd like to request that the Proto-Ohlone language be added to WT:LOL/S, with the code nai-ohl-pro. The Ohlone language family is not currently on Wiktionary, so I'd also like to request that a corresponding code be added to WT:LOF, with the code nai-ohl. For completeness, I also request the addition of the Miwok family (nai-miw) and Proto-Miwok (nai-miw-pro, also worked on by Callaghan), the other major subdivision of the Utian languages. Ookap (talk) 19:40, 12 June 2025 (UTC)Reply

@Ookap I moved your requests to WT:LTR as that is where these sorts of requests are normally made. I don't know anything about Ohlone or Miwok or even who to ping other than @-sche; you might poke around to see who has edited terms in these families and ping them. Benwing2 (talk) 20:43, 12 June 2025 (UTC)Reply

Thanks! I've updated Help:Adding and removing languages to mention this page instead of BP. Ookap (talk) 20:51, 12 June 2025 (UTC)Reply

@Benwing2: I'm certainly not an expert (my main interest is ethnobiology), but I'm familiar with pretty much all of the languages of California in very general terms, so I will often at least have an opinion on them. California has been inhabited for a long time, has lots of geological barriers (not to mention covering a very large area) and has been out of reach of all the pre-Columbian civilizations of the Americas, so the historical linguistics is very complicated and hard to resolve into anything large-scale. You have some families that are better known elsewhere, like Uto-Aztecan, Na-Dene and Algic, but then you have a number of isolates and smaller families. There were a couple of ambitious proposals in the early days, the Hokan languages and Penutian languages, that still haven't been proven on the highest level, but there's been some progress on demonstrating the validity of many of the parts.

The Utian languages are one of those "Penutian" parts where progress has been made. The Miwokan languages have always been accepted as a valid group and the Ohlone languages (I've always known them as Costanoan) as well (with some debate as to whether the latter are languages or dialects). I don't know much on the substance, but it seems to me like Utian should be worthy of Wiktionary recognition, and maybe the Yok-Utian languages. — This unsigned comment was added by Chuck Entz (talk • contribs) at 04:59, 13 June 2025 (UTC).

Adding family codes for Miwok and Costanoan/Ohlone seems reasonable, and adding Proto-Miwok and Proto-Costanoan—unfortunately, I can find very few sources calling it "Proto-Ohlone" (which may mean we should also call the family "Costanoan" for consistency). - -sche (discuss) 03:03, 14 June 2025 (UTC)Reply

IMO given the weight of sources we should be using "Costanoan" unless there is strong evidence of a recent shift towards "Ohlone", and the name of the proto-language needs to match the name of the family unless there's a really good reason for the divergence (which I don't see here). Benwing2 (talk) 03:18, 14 June 2025 (UTC)Reply

I agree with Chuck Entz in that these are pretty accepted groupings. People aren't completely sure about Penutian and Yok-Utian (and I find at least Penutian a bit dubious), but Utian has long been very proven and accepted (and, along with Proto-Utian, is in fact already on Wiktionary). Similarly, the Miwok (or Miwokan) and Ohlone language families are very clearly accepted groupings, and for me should be on Wiktionary.

With regard to whether to use "Ohlone" or "Costanoan"...unfortunately, almost all sources on Proto-Ohlone (Proto-Costanoan) are from the 1990s, meaning they use the name "Costanoan". Living in the area nowadays, I can say that the name "Costanoan" has completely fallen out of use for the ethnic group and language family, perhaps partially as part of recent efforts to revitalize their culture and languages. Most people here would likely not know what "Costanoan" meant, but know "Ohlone" well, and from what I know Ohlone people, while they might know the word, disclaim it as a colonizer term—even Wikipedia lists the ethnic group as "formerly known as Costanoan". With that said, given formal linguistic sources, most of which are older, mostly calling the language family Costanoan, I suppose I can understand why Wiktionary might want to call it that. My personal preference having grown up in the homeland of the Ohlone leans heavily toward "Proto-Ohlone" and "Ohlone languages", but my more important personal preference is that the proto-language is added, no matter the name. Ookap (talk) 08:07, 20 June 2025 (UTC)Reply

Several Ainu entries (such as プリ and アㇷ゚ト) show derivations from Proto-Ainu, but there's no corresponding code on WT:LOL/S so they can use templates as is the norm. Proto-Ainu, the reconstructed proto-language of the various Ainu dialects (or of the Ainuic family, already in Wiktionary as qfa-ain), has been reconstructed by Vovin (if not others), and we even have an appendix of reconstructions. Therefore, I request a code, likely ain-pro or qfa-ain-pro, be added, so Ainu entries can use proper Wiktionary formatting. Ookap (talk) 19:52, 12 June 2025 (UTC)Reply

(moved from User talk:Theknightwho)

Hi, could you update the data for these two languages to add a bit more flexibility?

For Ewe (ee), it’d be great if diacritics were stripped at the entry level, specifically acute, grave, caron and circumflex. They correspond to high, low, rising and falling tones respectively.

For Krio (kri), likewise, remove diacritics (but only acute, grave, circumflex) and also add a sort key with the following order:

ɛ after e
gb after g
kp after k (digraphs gb and kp are both treated as separate phonemes)
ɔ after o

Cheers heaps — ^{oi yeah nah mate} amazing JUSSO ... [ɡəˈdæɪ̯]! 01:11, 9 June 2025 (UTC)Reply
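To make the request above concrete, here is a minimal Python sketch of the two behaviours asked for: stripping the named tone marks from entry names, and sorting Krio with ɛ after e, gb after g, kp after k and ɔ after o. It is not how Wiktionary's language data modules are written. The function names and the full `KRIO_ORDER` alphabet are assumptions for illustration; only the four placements come from the request.

```python
import unicodedata

# Combining marks named in the request.
GRAVE, ACUTE, CIRCUMFLEX, CARON = "\u0300", "\u0301", "\u0302", "\u030c"
EWE_TONE_MARKS = {GRAVE, ACUTE, CIRCUMFLEX, CARON}
KRIO_TONE_MARKS = {GRAVE, ACUTE, CIRCUMFLEX}


def strip_marks(text, marks):
    """Decompose, drop the listed combining marks, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


# Hypothetical alphabet order: only the positions of ɛ, gb, kp and ɔ
# are taken from the request; the rest is a plain Latin ordering.
KRIO_ORDER = ["a", "b", "d", "e", "ɛ", "f", "g", "gb", "h", "i", "j", "k",
              "kp", "l", "m", "n", "o", "ɔ", "p", "r", "s", "t", "u", "v",
              "w", "y", "z"]
KRIO_RANK = {letter: rank for rank, letter in enumerate(KRIO_ORDER)}


def krio_sort_key(word):
    """Sort key that ignores tone marks and treats gb/kp as single letters."""
    word = strip_marks(word.lower(), KRIO_TONE_MARKS)
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in KRIO_RANK:   # digraph gb or kp
            key.append(KRIO_RANK[pair])
            i += 2
        else:                                      # unknown letters sort last
            key.append(KRIO_RANK.get(word[i], len(KRIO_ORDER)))
            i += 1
    return key
```

With this sketch, `sorted(["gba", "go", "ga"], key=krio_sort_key)` places `gba` after both `g` words, and stripping with the Krio set leaves a caron untouched because the request excludes it for Krio.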

Also pinging user @Fenakhay — ^{oi yeah nah mate} amazing JUSSO ... [ɡəˈdæɪ̯]! 22:44, 11 June 2025 (UTC)Reply

Before implementing this, I'd like to hear some confirmation from other knowledgeable editors that these changes are correct, or at the very least, sources showing that (a) these diacritics are used in dictionaries, (b) the diacritics are not used in running text outside of dictionaries. Benwing2 (talk) 03:20, 14 June 2025 (UTC)Reply

@Benwing2 For the Ewe language, the tone markings are based on Nuseline's Ewe-English dictionary and Basic Ewe for Foreign Students. In the latter source, it says "Note that native speakers of Ewe often leave the marking of tones aside. For learners of the language, however, the marking of tones is essential".

For the Krio entries, the tones and letter orders are based on A Krio-English dictionary by Clifford Nelson Fyle. The Wikipedia article also says for the tones: "Three tones can be distinguished in Krio and are sometimes marked with grave (à), acute (á), and circumflex (â) accents over the vowels for low, high, and falling tones respectively but these accents are not employed in normal usage." — ^{oi yeah nah mate} amazing JUSSO ... [ɡəˈdæɪ̯]! 23:08, 14 June 2025 (UTC)Reply

It seems like there isn't any update on this so far — feel free to have a look at these sources for reference @Benwing2 — ^{oi yeah nah mate} amazing JUSSO ... [ɡəˈdæɪ̯]! 01:22, 20 June 2025 (UTC)Reply

@AmazingJus as far as Ewe is concerned, I found these Bible translations [15] [16] that contain only the occasional acute and grave accent (perhaps only in word-initial mí- and wò-). Could you comment on this?

I don't see any issue with the Krio side of things. I'll add that to the modules unless Benwing2 offers further objections. This, that and the other (talk) 11:04, 21 January 2026 (UTC)Reply

@This, that and the other No objections. Keep in mind that some Bible translations esp. more poor quality ones may not correctly mark tones and such. The question IMO is more are these used in dictionaries, since we're a dictionary and should follow the practice of other dictionaries. Benwing2 (talk) 20:53, 21 January 2026 (UTC)Reply

@This, that and the other Some texts mark tones for disambiguation; in this case the mí- and wò- are both pronominal prefixes attached to verbs to distinguish them from other prefixes. Here, the accent marks signify first person plural and second person singular respectively, as opposed to second person plural and third person plural. Those Bible translations are of relatively good quality, but most native speakers know the tone by context; here, with pronominal morphemes, it gets marked as it's important information. — ^{oi yeah nah mate} amazing JUSSO ... [ɡəˈdæɪ̯]! 23:46, 21 January 2026 (UTC)Reply

@Benwing2 @AmazingJus it seems clear to me that Ewe dictionaries (at least modern ones) mark tones. My concern is that Ewe texts seem to consistently place accents on those prefixes. This is explicitly stated in [17] page 38: "Note that pronouns – in order to facilitate differentiation – are always written with all their tone marks, i.e., the low tone is marked as well as the high tone." I can also find evidence to back it up (e.g. the Bible translations, this book cover, and even this old text in an obsolete orthography, which doesn't appear to be marking tones in general, but can still be seen to mark mí-, wò- etc).

It seems that we should strip accents everywhere but those pronominal prefixes. I'm not sure if our system is capable of that. This, that and the other (talk) 01:14, 23 January 2026 (UTC)Reply

@This, that and the other I do think we can set this up using the remove_exceptions field. Take a look at Bulgarian for an example of doing this; we strip all grave accents except on the single word ѝ (a pronoun), which is consistently written with a grave accent to distinguish it from the conjunction и (i, “and”). Benwing2 (talk) 01:20, 23 January 2026 (UTC)Reply
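As a rough illustration of the idea behind an exceptions list (not the actual `remove_exceptions` implementation, whose details are not shown in this thread), the Python sketch below strips a set of combining marks from every word except those listed as exceptions. The function names are hypothetical, and whole-word matching is an assumption; handling attached prefixes such as Ewe mí- and wò- would need a finer-grained rule.

```python
import re
import unicodedata

GRAVE = "\u0300"


def strip_marks(text, marks):
    """Decompose, drop the listed combining marks, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


def strip_with_exceptions(text, marks, exceptions):
    """Strip marks from each word unless the whole word is an exception."""
    protected = {unicodedata.normalize("NFC", word) for word in exceptions}
    pieces = re.split(r"(\s+)", text)  # keep the whitespace separators
    out = []
    for piece in pieces:
        if unicodedata.normalize("NFC", piece) in protected:
            out.append(piece)          # e.g. the Bulgarian pronoun ѝ
        else:
            out.append(strip_marks(piece, marks))
    return "".join(out)
```

Calling `strip_with_exceptions(text, {GRAVE}, {"ѝ"})` keeps the standalone pronoun while removing grave accents elsewhere, which mirrors the Bulgarian case described above.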

@AmazingJus added for Krio: Special:Diff/89311321/89319618 This, that and the other (talk) 00:53, 23 January 2026 (UTC)Reply

I found out from here that the split between North and South Levantine Arabic was dissolved by the ISO two and a half years ago. The 2023 discussion about merging them on Wiktionary didn't end up going anywhere because of a lack of available contributors, and I think that situation is even worse today, as the South Levantine Arabic project has left Wiktionary and North Levantine Arabic never sees much concerted activity.

I think it's good for the Wiktionary merger to happen, but I have some concerns. I want to solicit ideas for making the work manageable, keeping in mind that it could easily fizzle out this time too.

After merging, I don't get how to account for all of Levantine. The expansion to Levantine Arabic adds a burden on everyone to know things about Levantine varieties they probably don't have knowledge of in order to make an entry complete.

This burden is technically there now too, but I feel like the old ISO split creates small enough halves that it feels okay to get away with only focusing on one subvariety within those halves: North Levantine seems to mostly have had Lebanese Arabic contributors (just with a translit convention that's kind of? inclusive of Damascene) and South Levantine focuses on Palestinian because it was part of User:AdrianAbdulBaha's push to improve online resources for Palestinian Arabic. This feels harder to handwave away now.

I do feel like focusing on as small of a comprehensive area as possible is how you get things done, which is why I'm concerned about expanding the scope of Levantine.

Relatedly I'm worried about the module/template infrastructure growing unmanageably large compared to the amount of people available to maintain it (let alone maybe being unreadably spammy on transclusion). I'm lazily working on Module:User:Still, when you think about it/apc-IPA to account for some of the variation in what "North Levantine" was supposed to cover, where South Levantine Arabic never had something similar, but it seems like now it'd be good for an {{apc-IPA}} to also cover Palestinian and Jordanian varieties that I don't have very much knowledge of how to divide. The same goes double for {{ajp-conj}}.

Supposing the merger does go ahead, South Levantine Arabic has 3016 entries (128 non-lemmas) and North Levantine Arabic has 505 entries (9 non-lemmas). I feel like all of those entries will need to be checked for whether they're exclusively Palestinian/South Levantine, exclusively North Levantine, or shared in order to add the right term, sense, or accent labels.

I think this has to be done by checking terms against published references (even for those of us currently active who do speak Levantine Arabic — at least for my part I don't know all of what's used and not used outside of my own dialect). Would it be of use to add some kind of "warning, this term needs to be assigned to a location — you can help by locating it in these references and then removing this warning" template under the L2 header of all 3k-ish merged entries to allow the effort to continue even as contributors come and go over time? (Just saw that this is what User:A455bcd9 suggested during the initial conversation as well)
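For what the warning idea might look like mechanically, here is a small Python sketch that places a maintenance template directly under a language header in a page's wikitext and later removes it. The template name `{{hypothetical-needs-region}}` is made up for illustration, and the functions work on plain strings rather than through any bot framework.

```python
import re

WARNING = "{{hypothetical-needs-region}}"  # placeholder name, not a real template


def add_region_warning(wikitext, header="Levantine Arabic", template=WARNING):
    """Insert the warning on the line after the L2 header, once only."""
    pattern = re.compile(rf"^=={re.escape(header)}==[ \t]*\n", flags=re.M)
    match = pattern.search(wikitext)
    if match is None or template in wikitext:
        return wikitext
    return wikitext[:match.end()] + template + "\n" + wikitext[match.end():]


def remove_region_warning(wikitext, template=WARNING):
    """Drop the warning line once an editor has assigned region labels."""
    return wikitext.replace(template + "\n", "", 1)
```

Because the insert is idempotent, a bot could run it repeatedly over the merged entries without stacking duplicate warnings.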

Might be overthinking. Pinging some Arabic contributors, including old ajp editors that may still be around: User:Fayçalmf, User:Fenakhay, User:Benwing2, User:SarahFatimaK

Possible references

J. Elihay's 2004 Olive Tree Dictionary of Palestinian Arabic
Lughatuna's [S] [P] [L] tags for Syrian/Palestinian/Lebanese Arabic
- يَاسِين عَبْد الرَّحِيم، مَوْسُوعَةُ العَامِّيَّةِ السُّورِيَّة (yāsīn ʕabd ar-raḥīm, mawsūʕatu l-ʕāmmiyyati s-sūriyya) from 2012 for (urban?) Syrian Arabic, also available on Lughatuna
Anis Freiha's 1947 Lebanese Arabic dictionary ({{R:apc:Freiha:1947}})
Roger Makhlouf's 2018 Lebanese/English lexicon
Maybe Barthélemy's 1890 Aleppine dictionary
Jordanian references??

Still, when you think about it (talk) 18:40, 15 June 2025 (UTC)

@Still, when you think about it In my experience, mergers are always harder than splits, and you're running up against this reality. I think in practice it's fine to have a warning indicating that a given term was originally North Levantine or South Levantine and hasn't been assigned appropriate labels. I also think focusing on a limited set of dialects is sufficient, maybe just urban Syrian, Lebanese and Palestinian, or the three + Jordanian (I don't know how different Jordanian Levantine is from Palestinian Levantine). As for designing the templates themselves, we can maybe follow the approach of Occitan, which has been able to handle several dialects under one L2, and design the templates so that if someone knows the correct inflections for only a limited set of dialects, only those dialects get displayed. I can help with the coding aspects. There's also the Richard Harrell series of Syrian Arabic grammar and dictionaries, I know these are a bit old but generally I have found the series reliable. Benwing2 (talk) 18:55, 15 June 2025 (UTC)Reply

Also, {{pt-IPA}} is able to handle several different Brazilian and European Portuguese dialects, and might produce some ideas as to how to handle the pronunciation differences. The general approach followed is to prefer a single spec that gives the maximal information (e.g. I know that some Levantine dialects have merged short ĭ and ŭ but others keep them apart; the "maximal information" would distinguish these two and the underlying code would merge them appropriately for the dialects that merge them), but allow different specs for different dialects. Benwing2 (talk) 18:58, 15 June 2025 (UTC)Reply
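The "maximal information" approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how {{pt-IPA}} is implemented: the dialect names, the merge target (schwa) and the function names are all assumptions. One spec keeps short ĭ and ŭ apart, each dialect applies its own merge rules, and a dialect can still be given its own spec when the default derivation is wrong.

```python
I_SHORT = "\u012d"  # ĭ
U_SHORT = "\u016d"  # ŭ
SCHWA = "\u0259"    # placeholder merge target, chosen for illustration

# Hypothetical dialect classes: one keeps the two vowels apart, one merges them.
MERGE_RULES = {
    "distinguishing": {},
    "merging": {I_SHORT: SCHWA, U_SHORT: SCHWA},
}


def render(spec, dialect):
    """Apply a dialect's merge rules to a maximal-information spec."""
    rules = MERGE_RULES[dialect]
    return "".join(rules.get(ch, ch) for ch in spec)


def all_dialects(spec, overrides=None):
    """Derive every dialect from one spec, unless a dialect has its own spec."""
    overrides = overrides or {}
    return {d: render(overrides.get(d, spec), d) for d in MERGE_RULES}
```

An editor who only knows the maximal form supplies one spec; someone with dialect-specific knowledge can add an override later without touching the rest.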

Finally, there is the issue of how to represent the script. All of the resources I'm familiar with use transcription, but Wiktionary prefers using the original script. I don't even know if there's a standard for how to represent the various dialects of Levantine Arabic in Arabic script, much less the specifics of how this works if it exists. Maybe you can help me understand this. Benwing2 (talk) 19:00, 15 June 2025 (UTC)Reply

There's no top-down standard for dialects specifically, but in real life people write their dialects in the Arabic script, which I say is good for a non-specialist dictionary to reflect. (The exception is mostly Lebanese speakers in their early 30s and younger, who practically exclusively write in 3arabizi online, which would be good to document but has too many random variables). Descriptively/impressionistically, spelling matches Standard Arabic spelling, except for

stopped interdentals (always spelled with the plosive letter)
emphatics that all dialects have deemphasized (always spelled with the plain letter, not the emphatic letter, like ركد (rakad, “to run”))
feminine -i (almost always spelled ـي to match morphological reanalysis, as in ـكي ـتي)
3ms -o, which Lebanese often spell ـو instead of ـه
other sound shifts that collapse a distinction that the Arabic script is supposed to indicate, where it's more correct to match the Fusha spelling but commonplace to respell phonetically (e.g. ق، ـوا، ـة, assibilated interdentals, emphatics in dialects that are losing emphasis across the board, etc)

Still, when you think about it (talk) 14:25, 16 June 2025 (UTC)

Speaking of standards, this reminds me that translits are an issue I forgot about. We don't have room to make them show different variants like we do with IPA. Would it honestly be acceptable to just do without translits? If not, I'm imagining a weird amalgam of different pronunciations, and it's a little awkward because even though they're trans-"liter"-ations they kind of suggest pronunciation info nevertheless:

بيض (bayð̣, “eggs”)
In terms of pronunciation, the combo of -ay- and interdentals is rare, but in terms of translit that's the highest-info representation of what these letters spell
تلة (talle, “hill”)
There are dialects with invariable ة (/⁠-a⁠/), but you can always derive that from the form of a dialect with ة (/⁠-a, -e⁠/) and not vice versa, hence the symbol -e here. (Or in the true spirit of translit do we want a special symbol only for ة?)
قبضاي (qabaḍāy, “macho man”)
This is from an Ottoman Turkish /d/, which formally was loaned as /dˤ/, but some speakers with interdentals went on to associate this /dˤ/ with their /ðˤ/. Is the proper form قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/)? (I would prefer قبضاي (qabaḍāy /⁠qabaḍāy, qabað̣āy⁠/) but I don't want /q/ in |ts=)
تعتير (taʕtīr, tiʕtīr, “miserable situation”) ~ تعثير (taʕṯīr, tiʕṯīr)
This interdental seems fine. The ت only spells t and the ث is only used for dialects with interdentals, in which it can spell ṯ. But I'm not sure how to smooth over the templatic variation in the first vowel. I guess a real "transliteration" would be تعتير (tʕtyr) ~ تعثير (tʕṯyr) (same goes for all examples, of course), but I don't think anyone would want that...
قاظان (qāẓān, “water heater”)
(I actually thought nobody said this with an interdental, the IPA on that page is new to me. If they do, then قاظان (qāẓān /⁠-ẓān, -ð̣ān⁠/, “water heater”) in the same vein as قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/, “macho man”) above?)
ظروف (ẓrūf, “circumstances”)?
Or ظروف (ẓrūf, ð̣rūf, “circumstances”), or ظروف (ẓrūf /⁠ẓrūf, ð̣rūf⁠/, “circumstances”)?
صغير (zḡīr, ẓḡīr, “small”) ~ زغير (zḡīr, ẓḡīr)

Anything stand out as super wrong? The q feels a bit unfortunate because it's impossible not to try to pronounce it (and it's a minority pronunciation). I'm not sure how useful it is to have to make up our own WT:AR TR system like this vs. just not doing translits, if Wiktionary would allow that, and leaning only on the IPA available in entries. Still, when you think about it (talk) 15:32, 16 June 2025 (UTC)Reply

We can remove the interdental pronunciation on قاظان if you're unsure about it. I'm not 100% either.

About transliteration, we should probably stick to a standard. Probably one that matches urban South Levantine/urban Syrian. Alternate pronunciations can be represented by IPA, but having a standard would probably be more helpful. The amalgamation approach would probably be confusing. Fayçalmf (talk) 15:47, 16 June 2025 (UTC)Reply

Okay, I think that's a better idea. Potential translit guidelines:

No imala, only ا (ā)
Only ق (ʔ) in native vocab
No interdentals, only ظ ذ ث (ẓ z s) and ض د ت (ḍ d t)
Distinguish lax from tense -e -i and -o -u as urban Syrians or urban Palestinians/Jordanians do (I guess by majority rule, e.g. I know a coastal Syrian guy with sani for سنة (“year”) but the default ought to be سنة (sine, sane, “year”))
- What to do about -y -w? My own dialect has /maʃe/ مَشي (“walking”), /raʔe/ رأي (“opinion”), /ħelo/ حلو (“sweet, pretty, nice”), /ʒaru/ جرو (“puppy”) (MSA loan), but my understanding is these are all /-i -u/ in dialects that distinguish lax from tense final vowels. The Olive Tree Dictionary also gives ḥil^ew as an option for حلو, apparently.
  - Can we do مَشي (mašy, “walking”), رأي (raʔy, “opinion”), حلو (ḥilw, “sweet, pretty, nice”), جرو (jarw, “puppy”)?
  - Or just مَشي (maši, “walking”), رأي (raʔi, “opinion”), حلو (ḥilu, “sweet, pretty, nice”), جرو (jaru, “puppy”) and leave the details to the IPA? I actually like this better visually.
No diphthongs, except in cases where a dialect like Damascene will have them, like of course أو (ʔaw, “or”) and I think elatives like أوضح (ʔawḍaḥ, “clearer, clearest”) instead of ʔōḍaḥ
- Can we actually just do diphthongs unconditionally? It wouldn't be faithful to most dialects' pronunciation but I'm wondering if it's an OK tradeoff.
  - بيت (bayt, “house”), بيضة (bayḍa, “egg”), لون (lawn, “color”), لوقا (lawʔa, “crooked”), فير (fēr, “hair straightener”), بوش (bōš, “nada”), بنطلون (banṭalōn, banṭalawn)?
    - I think still شلون (šlōn, “how”) though (see last bullet below)
  - Or just بيت (bēt, “house”), بيضة (bēḍa, “egg”), لون (lōn, “color”), لوقا (lōʔa, “crooked”), فير (fēr, “hair straightener”), بوش (bōš, “nada”), بنطلون (banṭalōn)?
Violate these guidelines if the word itself is in violation of them, especially e.g. if it's restricted to or in imitation of a dialect that has differing features

Still, when you think about it (talk) 16:30, 16 June 2025 (UTC)

I like these guidelines.

I agree with maši, ḥilu instead of mašy, ḥilw.

I'm not sure what to do about diphthongs. It could be a case by case basis like what Wikipedia does with British v American spellings & just leave it how the person who wrote the article put it in or we could standardise (only diphthongs vs no diphthongs except where they are in Damascene). Fayçalmf (talk) 17:17, 16 June 2025 (UTC)Reply

Forgot one more thing to consider that's a whole headache of its own, which is the treatment of kasra and damma. I think when they're medial and closed or stressed we'd do best by trying to adhere to original i/u, as in:

متل (mitl, “like”), ضفر (ḍufr, “fingernail, toenail”), صدر (ṣidr, “chest”), جملة (jumle, “sentence”), خلص (xiliṣ, “he finished”)?

Quick review of other options, though:

Schwa them both like a lot of Lebanese and Syrian references do, but importantly not a lot of Palestinian Arabic references:
متل (mətl, “like”), ضفر (ḍəfr, “fingernail, toenail”), صدر (ṣədr, “chest”), جملة (jəmle, “sentence”), خلص (xəleṣ, “he finished”)
Same but with ⟨i⟩ to be more inclusive of non-schwa-ing lects (like my idiolect, so this is just a personal pet peeve). This gets a bit strained when it comes to terms like بكرة ("bikra", bukra) that are still predominantly with u.
متل (mitl, “like”), ضفر (ḍifr, “fingernail, toenail”), صدر (ṣidr, “chest”), جملة (jimle, “sentence”), خلص (xileṣ, “he finished”)
Adhere to one's own lect, but at least in my case this isn't of much use: I mostly merge to kasra outside of some sporadic retentions of damma, I systematically round this vowel around emphatics, and I don't feel like I have schwa.
- If I were using ⟨i u⟩ I'd transcribe my dialect as: متل (mitl, “like”), ضفر (ḍufr, “fingernail, toenail”), صدر (ṣudr, “chest”), جملة (jumle, “sentence”), خلص (xuliṣ, “he finished”)
- Otherwise if it were totally up to me my dialect would be: متل (metl, “like”), ضفر (ḍofr, “fingernail, toenail”), صدر (ṣodr, “chest”), جملة (jomle, “sentence”), خلص (xoleṣ, “he finished”)
- This seems like something to care about in the IPA section, though, not in the translit.

There's also -iC -uC in final syllables, which Cowell (Damascene) and the South Levantine project on here represent with e o. I would prefer i u for symmetry with the above and because as a bonus it's more inclusive of a common type of Lebanese variety that really does have [-iC] (which I believe often although not universally comes with -uC as well). We can separately figure out how to get more granular with lax/tense kasra and damma in terms of the IPA, though.

خلص (xiliṣ, “he finished”)

Lastly, there's the epenthetic vowel, which despite the fact that I sorta believe it's phonemic in many varieties I still believe shouldn't be represented in translit:

متل (mitl, “like”)

May have still forgotten other stuff, which I'll try to add as it comes to mind (+I'll ask anyone reading to do the same too!).

Still, when you think about it (talk) 17:43, 16 June 2025 (UTC)

I'm not a Levantine speaker but my vote is to maintain the i and u in transliteration according to conservative dialects that still maintain the original distinction clearly, except for words that don't exist in such dialects, where either i or schwa is fine. Dialects that merge the two can just ignore the distinction in pronunciation. This is problematic for dialects like yours where some u have merged with i but not all; either just ignore those dialects or show two transliterations, one with i and one with u. Hope this makes sense. Benwing2 (talk) 17:56, 16 June 2025 (UTC)Reply

I'm not completely opposed to a merger, but I'm not really for it either. It's mostly personal bias because I do like them being split, but if most contributors agreed to merging, I wouldn't have an issue with that. Fayçalmf (talk) 22:50, 15 June 2025 (UTC)

Regarding IPA for different dialects, we could do what the main Arabic articles do when listing other dialect pronunciation & have the "main" pronunciation along with subdialectal pronunciations listed under it

Example using قاضي
IPA^(key): /ʔaː.dˤi/, [ˈʔɑːdˤɪ]
- (Druze, Coastal Syria) IPA^(key): /qaː.dˤi/, [ˈqɑːdˤɪ]
- (Bedouin) IPA^(key): /ɡaː.dˤi/, [ˈɡɑːdˤɪ]
- (Fellahi) IPA^(key): /kˤaː.ðˤi/, [ˈkˤɑːðˤɪ]

This would allow showing diversity in pronunciation while not needing contributors to have extensive knowledge of different Levantine dialects. Fayçalmf (talk) 02:23, 16 June 2025 (UTC)
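The layout in the قاضي example could be modelled roughly as below. This is a Python sketch of the data shape only, with hypothetical class and field names: one main pronunciation, plus any number of labelled regional ones that contributors can add later.

```python
from dataclasses import dataclass, field


@dataclass
class Pronunciation:
    phonemic: str
    phonetic: str
    labels: tuple = ()      # e.g. ("Druze", "Coastal Syria")


@dataclass
class PronunciationSet:
    main: Pronunciation
    regional: list = field(default_factory=list)

    def lines(self):
        """Main pronunciation first, regional ones nested underneath."""
        out = [f"IPA: /{self.main.phonemic}/, [{self.main.phonetic}]"]
        for p in self.regional:
            out.append(f"- ({', '.join(p.labels)}) IPA: /{p.phonemic}/, [{p.phonetic}]")
        return out


# Values copied from the example above.
qadi = PronunciationSet(
    main=Pronunciation("ʔaː.dˤi", "ˈʔɑːdˤɪ"),
    regional=[
        Pronunciation("qaː.dˤi", "ˈqɑːdˤɪ", ("Druze", "Coastal Syria")),
        Pronunciation("ɡaː.dˤi", "ˈɡɑːdˤɪ", ("Bedouin",)),
        Pronunciation("kˤaː.ðˤi", "ˈkˤɑːðˤɪ", ("Fellahi",)),
    ],
)
```

An entry with only the main pronunciation filled in still renders, which is the point of the proposal: the regional list can start empty.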

@Fayçalmf Yes, this is very similar to how {{pt-IPA}} handles Portuguese pronunciations. We have a "general Brazilian" pronunciation (reflecting an amalgam of the most common features cross-dialectally, and approximately the way newscasters in Brazil speak) and a "general Portugal" pronunciation (approximately reflecting a cultured Lisbon pronunciation), and nested underneath each are specific Brazil and Portugal regional pronunciations. This is also similar to how {{es-IPA}} works. So this approach is definitely feasible. Benwing2 (talk) 02:28, 16 June 2025 (UTC)Reply

I also like the split just because it gives us a smaller area to work with, but I can see why it's arbitrary and you can come up with other isoglosses to create whatever other split you would like. Relatedly, the other day I wanted to edit North Levantine Arabic منشان to add Western Neo-Aramaic miššōn- as a descendant and found that that page only has a South Levantine Arabic entry, and it felt bad to duplicate that whole thing to North Levantine Arabic just to add one tangential note. So this is my personal thinking. Still, when you think about it (talk) 14:30, 16 June 2025 (UTC)Reply

Yeah. Realistically speaking, after the initial hurdle of tidying everything up post-merge, it would be really nice to have everything contained into one Levantine Arabic section. Would it be possible to have categories for terms that are either exclusively South or North Levantine like we already have for Lebanese Arabic, Syrian Arabic, Palestinian & Jordanian? Things like هم can be placed into the South Levantine category & هن in the North?

Like:

هم • (homme) (enclitic form ـهم (-hom))

(South Levantine) they

هن • (hinne) (enclitic form ـهن (-hon, -yon, -on))

(North Levantine, Galilee) they

Fayçalmf (talk) 15:34, 16 June 2025 (UTC)

@Fayçalmf Yes we can easily create such categories. Benwing2 (talk) 17:23, 16 June 2025 (UTC)

@Still, when you think about it @Fayçalmf I moved this topic to WT:LTR, which is where we normally handle language splits and mergers. In order for this topic not to stall, it would help if one of you could create a list of what is needed compared with what we currently have, and think about drafting a plan of action. I can help with the latter, but am somewhat unsure about the former as I have not studied Levantine Arabic much (I took a couple of years of MSA classes back awhile ago when I was in school, and have studied Egyptian and Moroccan Arabic on my own in fair depth). Benwing2 (talk) 05:12, 17 June 2025 (UTC)Reply

Of course. Could I get a little more elaboration on "a list of what is needed compared with what we currently have?" Fayçalmf (talk) 05:22, 17 June 2025 (UTC)

@Fayçalmf Ultimately what we want is a specific plan of action regarding steps to take to implement the merger. See for example Wiktionary:Grease_pit/2023/January#apc_and_ajp_merged, where I enumerated a possible plan of action for merging Levantine Arabic, and Wiktionary:Language_treatment_requests/Archives/2020-24#RFM_discussion:_February–March_2024, which has a similar but more recent plan of action for splitting Khanty into separate languages that was actually put into practice. Part of the work will be creating new modules and templates to handle the combined language, and we need at least a preliminary working version of these modules and templates before we put a lot of work into actually merging the lemmas. In order to create those templates, we need to know how they should behave, and this requires some input from Levantine speakers. The current North and South Levantine Arabic headword templates appear to be based on the Standard Arabic templates that @Fenakhay and I (among others) put together, but there are also South Levantine Arabic verb conjugation templates (there don't seem to be any such templates for North Levantine Arabic). The current templates are not designed for a multi-dialect language, so there will need to be some thinking about how to design them to handle the differences among Levantine dialects. One relatively simple way of handling different dialects is to have one headword line per dialect; see for example Galician querer, which has a line for the standard norm and another line for the reintegrationist norm, and similarly has two conjugation tables. Another approach is to not have anything in the headword; see for example Occitan alenar, which has 5 conjugation tables but nothing in the headword. This latter approach might not make sense for adjectives and nouns, because it requires a declension table for each adjective and noun, which might be overkill (e.g. for nouns, all you need to list is the plural).
So what I would need as a start is a specific design for the noun, verb and adjective headword templates, with some examples of what the input would be and how it might display. I would start with nouns (which are easier than adjectives) and start with examples, rather than trying to come up with a design right away. Pick some common nouns and think about how to best display them, and then come up with a template syntax for specifying the relevant forms. I can help with the template syntax if I have several examples of the nouns and their plurals (both in Arabic script and transliteration). After that we can tackle adjectives, and then verbs. Benwing2 (talk) 06:00, 17 June 2025 (UTC)Reply
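One possible shape for a multi-dialect headword, sketched in Python rather than as a real template or module: forms are supplied per dialect label, and labels with no known forms are simply left out of the display. The function name and the output format are assumptions; the sample non-past forms yitruk and yitrik are the ones cited elsewhere in this discussion.

```python
def format_forms(label, forms_by_dialect):
    """Show only the dialects for which at least one form was supplied."""
    parts = []
    for dialect, forms in forms_by_dialect.items():
        if forms:  # dialects with unknown forms are omitted, not guessed
            parts.append(f"{' or '.join(forms)} ({dialect})")
    return f"{label}: " + "; ".join(parts) if parts else ""


line = format_forms("non-past", {
    "Lebanese, urban Syrian": ["yitruk"],
    "Palestinian, Jordanian": ["yitrik"],
    "Aleppine": [],          # not yet known, so not displayed
})
```

The same function would serve nouns if the label were "plural", so a design worked out on a handful of common nouns could carry over to adjectives and verbs.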

I might as well throw in a couple of other multistandard systems: آب has Urdu (sister lect to Hindi) and Persian (Classical Persian/Iranian Persian/Dari/Tajik) entries where you can see different approaches and examples of infrastructure used to present the different scripts and pronunciations. Note that some of these have their own language codes, scripts and L2 headers, but there are templates that tie them together. Not that I'm specifically recommending any of these for the case at hand, but it may spark some ideas. Chuck Entz (talk) 06:55, 17 June 2025 (UTC)

- Create a bot to turn all ajp articles into apc, alongside changing headers to ==Levantine Arabic==. We either have it leave lemmas with both to be dealt with manually like the original thread suggests, or we manually merge any terms in both ajp & apc to apc before running the bot.

- I'm not really sure how the ajp conjugation table works. If it can handle variations of the same form already (i.e. اطلع for South Levantine & طلاع for North), then great. If not, there need to be accommodations made.

- New categories created for 'North Levantine Arabic' & 'South Levantine Arabic' to hold region exclusive terms à la the country categories.

- There has to be an agreement regarding ر in IPA. ajp tends to use r/rˤ, while apc uses r/ɾ. I suggested earlier having a "standard" for IPA with regional variations below it, so we could use r for the standard and ɾ and rˤ for the regional pronunciations.

Those are some bullet points I have for now. I'd still like to hear input from @Still, when you think about it as well as what to do with tables & modules that already exist for apc/ajp to make a finalised plan for merging. I'll add more if I think of more things we need later on. Fayçalmf (talk) 11:31, 17 June 2025 (UTC)Reply

For sure, the ajp-conj template can be used as a base, but it needs updating to be able to handle variation. I like the Occitan example with multiple tables and I'll try to think about how to implement something similar.

About the categories, I'm wondering if we can avoid recreating the North/South Levantine split. Would it be possible to stick to "chiefly Syrian, Lebanese", "chiefly Palestinian, Jordanian", alongside transitional areas? I was convinced by User:A455bcd9's reasoning in the ISO proposal that said that the division was somewhat arbitrary and not derived from literature.

For IPA, I want to give all major pronunciation standards equal weight instead of deciding on one standard ourselves. This doesn't solve the issue of how to transcribe ر, but it does leave it up to individual accents to have it transcribed in their own way without interfering with ours. I'm coming up blank for now on what to do about it, though. Still, when you think about it (talk) 07:43, 18 June 2025 (UTC)Reply

Not sure countries are the best boundaries. Rural vs urban is often bigger than country A vs B. There are also sectarian differences (esp. Druze). So unless we know that a word is only widespread inside one country's border, it's better to stick to traditional areas ("Jerusalem", "Damascus", "Beqaa Valley", etc.). A455bcd9 (talk) 07:52, 18 June 2025 (UTC)Reply

Thanks! It's more daunting but you're right. Sorry about the two username mentions, I had mistaken you for inactive. Still, when you think about it (talk) 10:57, 24 June 2025 (UTC)

A standard for IPA doesn't have to be based on one country/region's pronunciation. It can be a generalisation, then any deviations from the generalised pronunciation can be accounted for as well in the IPA underneath the "standard" one.

Some examples:

تقيل / ثقيل

برداية

شطرنج

IPA^(key): /-/, [ʃɑtˤˈɾˤɑnʒ], [ʃɑ.tˤɑ.ɾanʒ]

I like this method personally because contributors can just put in the most basic form of the pronunciation & nuance can be added in later by other contributors. If we do go with this method, then we have to figure out an order for them to go in so they're not random in every article. Fayçalmf (talk) 13:32, 18 June 2025 (UTC)

Draft of a plan to merge North & South Levantine Arabic:

1. Rename apc to "Levantine Arabic"

2. Merge tables. Edit declension table to be able to accommodate North Levantine as well. (Note: I can't code, so I don't know the logistics of this step.)

3. Create a bot to merge ajp into apc. Leave articles with both apc & ajp entries alone to be dealt with manually.

4. Once everything is converted to apc, delete ajp from the language list.

5. Levantine-speaking contributors will have to work on tidying things like IPA & formatting to meet new standards.

@Still, when you think about it, @Benwing2, @A455bcd9, @Fay Freak

Any thoughts/criticisms? Anything that needs to be added or something else that needs to be taken into account that I missed? Fayçalmf (talk) 20:20, 21 June 2025 (UTC)Reply
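Step 3 of the draft plan could start from something like the Python sketch below. It works on a page's wikitext as a plain string and is not tied to any bot framework. The function names are hypothetical, and the assumption that the language code appears as a `|ajp|` or `|ajp}}` template argument is a simplification; template names such as {{ajp-conj}} are deliberately left alone. Pages with both headers are returned unchanged for manual handling, as the plan proposes.

```python
import re

OLD_HEADERS = ("North Levantine Arabic", "South Levantine Arabic")
NEW_HEADER = "Levantine Arabic"


def l2_headers(wikitext):
    """Return the level-2 language headers found on a page."""
    return re.findall(r"^==([^=\n]+)==[ \t]*$", wikitext, flags=re.M)


def merge_page(wikitext):
    """Return (new_text, status); status is 'merged', 'manual' or 'skipped'."""
    present = [h for h in l2_headers(wikitext) if h.strip() in OLD_HEADERS]
    if len(present) > 1:
        return wikitext, "manual"    # both apc and ajp: leave for a human
    if not present:
        return wikitext, "skipped"   # nothing Levantine on this page
    text = re.sub(r"^==[ \t]*(?:North|South) Levantine Arabic[ \t]*==[ \t]*$",
                  f"=={NEW_HEADER}==", wikitext, flags=re.M)
    # Rewrite the language code only where it is a template argument.
    text = re.sub(r"(?<=\|)ajp(?=[|}])", "apc", text)
    return text, "merged"
```

A real run would also need to re-sort the renamed section among the page's other language headers and deal with ajp-prefixed templates and categories, which this sketch does not attempt.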

Looks like you guys can do it. I generally only have bookish knowledge so only added a few, often obsolete, Levantine terms, to finish some etymologies or circumstantial curiosity, when I could not think of much argument to sneak them in under the general Arabic header, and depending on the source it was sometimes left open which Levantine a term was gathered from (e.g. {{R:ar:Berggren}} claiming to have both Damascus and Jerusalem). Fay Freak (talk) 20:42, 21 June 2025 (UTC)Reply

Hey, sorry I dodged this. Randomly don't have the free time I've had for the last month or so. Will do my best to keep on top of this nonetheless because I don't want to leave it half-finished. These steps look good to me, I'm just still hung up on the small details: the specifics of IPA formatting (and getting my apc-IPA template to not be broken, although that's more of a side project) and, annoyingly, the i/u thing when it comes to verb headwords, as the "North Levantine" dialects I'm aware of leveled almost all Form I verbs to yiCCuC vs yiCCaC* whereas "South Levantine" dialects seem to maintain a robust distinction between yiCCiC and yiCCuC that's completely impenetrable to me. Fortunately this doesn't affect the lemma (which will be the past form), but I guess this means either spamming multiple headwords per entry or just not doing headword lines? I prefer the former.

Verb conjugation tables may be easier to deal with. The different systems I'm aware of are

1. Coastal Syrian katbit, katbīto
- Also up in the mountains they use originally imperative forms like طلاع as the base of the 1sg
2. Nearby Lebanese katbit, katbíto due to these dialects' a-elision, not as a purely morphological thing (do these dialects also have katbto?)
3. North-ish Lebanese ʕaṭyit, staḥyit, ḡilyit (typically w/ nonpast 3fs+2fs taʕṭe, tistíḥe, tiḡle but somewhat rarely 2fs tistiḥye); ḥkī "speak!" (احكي not احكيه), ktōb, kōl
4. Typical Lebanese and urban Syrian ʕaṭit, štarit, ḡilyit; katabit, katabíto; bimši, bḡanni, biḡannu (ignoring -i -u vs -e -o); tistíḥi~tistáḥi; stʕart, ḵtart; yitruk; ʔiḥki, ktōb, kōl
5. South Lebanese tistḥi; stʕirt, ḵtirt
6. Transitional South Lebanese~Galilean katabit~katabat, katabáto, kátabato; ʔiktub~ʔuktub, kōl
7. Palestinian/Jordanian and Aleppine bamši, baḡanni, b(i)ḡannu (last one also found in Lebanese areas); tistáḥi~tistaḥi
8. Palestinian and Jordanian yitrik; ʔuktub, kul "eat!"
9. Jordanian and regional Palestinian? katbato

There's some stuff I don't know the details of like the Palestinian distribution of eg ramaw vs ramu or what dialects do ḡilit.

Conjugation tables technically don't need to show connecting forms so we can ignore the -o stuff to start with. I would like to represent 4 and 7, of course, plus 3 and 5 if possible. (My knowledge of lesser-used forms is poor the further south in the region we get.) This actually seems doable with minimal to no Lua, unless we want some logic to automatically show multiple tables.

Still, when you think about it (talk) 10:55, 24 June 2025 (UTC)

Regarding the i/u thing, it's possible to put both in the transliteration or just leave it as whatever the contributor entered. بكرة is already just (bukra), so leaving it as is would be fine, for example. Fayçalmf (talk) 11:47, 24 June 2025 (UTC)

I really dislike multiple headwords on the same word. It's ugly. I think we could simply have the transliteration reflect the different pronunciations.

مشى • (maša) (non-past بمشي (bamši, bimši))

The ajp article for أخد has 2 declension tables for بوخد & باخد. Perhaps we could do something similar?

===Conjugation===

[Mock table, regular حكى conjugation for most dialects]

====Chiefly Lebanon====

[Mock table, represents forms like حكيوا (ħakyu)]

Just a suggestion. The current ajp table did a good enough job with إجا on apc (with much more coding), so this approach could work. Any other ideas? Fayçalmf (talk) 22:47, 24 June 2025 (UTC)

I see your point about multiple headwords -- maybe whenever it's needed we can equally just add a new L3 with {{alternative form of}} (just did this at جاتوه) -- and the fact that small variations seem easy enough to represent within one tr:

كبس • (kabas) (non-past يكبس (yikbis, yikbus), active participle كابس (kābis))

The one last tr-related thing on my mind is when it comes to usexes and quotes. I believe the trans-"lit" for quotes should also follow pronunciation, like I did for Salam el-Rassi at أما (or to a lesser extent the yṣaḥḥ at واوا). I think usex translits can also just be in whatever dialect the usexer is most comfortable using or transcribing, especially because I don't see a reason to want to change the translits for all the ajp usexes. Can we enforce the use of an accent qualifier for usexes and quotes, like the (Lebanon) at the bottom of عبكرة?

Also, that last part and the IPA business seem to mean it's worth sitting down and figuring out an acceptable set of discrete sub-accents/dialects to enforce consistent representations of, which should be a priority but not block the merger from happening to start with. Still, when you think about it (talk) 17:59, 26 June 2025 (UTC)

I agree with your points about translit.

Druze/Coastal Syria is already being represented in apc, and Bedouin & Galilee pronunciations in ajp. We could represent Fellahi accents too, and then for anything else have cities to represent them if needed (which Galilee already is doing).

دكتور

IPA(key): /dokˈtoːɾ/, /dʊkˈtoːr/

Should Imāla be its own subsection? I think it should be, with the exception of Lebanon-only words like ڤيتاس.

I mentioned order before, should the subsections go alphabetically or do you think there's a better way to arrange them? No matter what, if we have multiple city-specific pronunciations, those should definitely go alphabetically. Fayçalmf (talk) 11:40, 27 June 2025 (UTC)

Does the order they're in really matter? Most words won't require more than 3 variations anyway. Fayçalmf (talk) 02:35, 29 June 2025 (UTC)

Actually, in terms of making a template, it does. How about the "standard," then Imāla, Druze/Coastal Syria (separated if need be), Bedouin, Fellahi, then anything else like hyperforeignisms or Galilee can be manually added underneath.

Fayçalmf (talk) 03:53, 30 June 2025 (UTC)
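For the template side, a fixed order like the one proposed here could be sketched as a sort key. This is a minimal illustration in Python rather than the Lua a real module would use; the label strings and the function name `order_accents` are assumptions for the example, and anything outside the fixed list simply falls back to alphabetical order.

```python
# Fixed priority taken from the proposal above; any other label sorts after
# these, alphabetically. The exact label strings are illustrative.
PRIORITY = ["standard", "Imāla", "Druze/Coastal Syria", "Bedouin", "Fellahi"]


def order_accents(labels):
    """Return labels in the proposed fixed order, with the rest alphabetical."""
    rank = {label: i for i, label in enumerate(PRIORITY)}
    return sorted(labels, key=lambda label: (rank.get(label, len(PRIORITY)), label))


print(order_accents(["Galilee", "Fellahi", "standard", "Aleppo", "Bedouin"]))
# ['standard', 'Bedouin', 'Fellahi', 'Aleppo', 'Galilee']
```

A different ordering (for instance one with no "standard" form at all) would only require changing the `PRIORITY` list.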

Personal preference: no base form, each variant we list goes next to the others, but we put the more-urban options up top like you're doing here. Damascene, metro Lebanese, ?urban central Palestinian/Jordanian?, and then Druze, coastal Syrian, Beqaa/Qalamoun, Fellahi, and others? I found this classification of Palestinian and Jordanian dialects by Palva that may help decide on representative forms from down there, although it seems a bit outdated (on the one hand 1984 isn't at all long ago but on the other hand it says Galilean dialects predominantly preserve interdentals and /q/, which I know exists but I'm not sure it's predominant?).

I am wondering if we can get by without the imala tag. I see the merit in referencing the common name for the phenomenon visible in some pronunciations, but it'll also add clutter.

I'm admittedly dragging my feet on looking into botting the ajp->apc conversion but I believe that the only things we'll need in order to get started are that and maybe updated declension tables (since that infrastructure already exists). IPA pronunciations (since not much infrastructure already exists for them) can maybe be left as is to start with, with "Palestinian" appended to the current ajp accent quals and "Damascene" added as an accent qual for the current unlabeled apc pronunciations? Still, when you think about it (talk) 16:35, 1 July 2025 (UTC)

I can do without a base form & leaving the translit to be the "generalised" pronunciation instead. The Imāla tag would essentially be the same as "metro Lebanese," so if we're doing the latter, we don't need the former.

I agree with the last part about adding quals. I think it would be helpful to have on articles before we get to manually adjusting things. Fayçalmf (talk) 18:55, 1 July 2025 (UTC)

+ We'll have to specify in the Levantine Arabic terms with /ɡ/ category that it's only for words that are pronounced with it in the majority of dialects. Otherwise, almost every word with ق would be viable to include. Also adding pre-existing ajp terms to the category that fit the criteria like جمبري, جول, أغورة, مزچان, etc.

Same with /p/ (e.g. دبرس) and /v/ (e.g. فيديو) Fayçalmf (talk) 19:02, 1 July 2025 (UTC)
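The "majority of dialects" criterion for these categories could be checked along these lines. This is a rough sketch in Python, assuming each entry lists one IPA string per dialect label; the function name `belongs_in_category`, the dialect labels and the sample transcriptions are all invented placeholders, not real Levantine data.

```python
def belongs_in_category(pronunciations, phoneme="ɡ"):
    """True if `phoneme` occurs in more than half of the listed dialect pronunciations.

    `pronunciations` maps a dialect label to an IPA string (hypothetical format).
    """
    if not pronunciations:
        return False
    hits = sum(1 for ipa in pronunciations.values() if phoneme in ipa)
    return hits > len(pronunciations) / 2


# Invented placeholder transcriptions, not real words.
mostly_g = {"dialect_1": "ɡaba", "dialect_2": "ɡaba", "dialect_3": "ʔaba"}
rarely_g = {"dialect_1": "ɡaba", "dialect_2": "ʔaba", "dialect_3": "qaba"}

print(belongs_in_category(mostly_g))  # True
print(belongs_in_category(rarely_g))  # False
```

The same check would work for /p/ or /v/ by passing a different `phoneme`.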

Sorry that I'm adding another topic on Pannonian after such a short interval, but whose idea was it to add nonvirile as a separate noun gender? None of the Pannonian dictionaries that I use specifically define nonvirile as opposed to masculine pluralia tantum. There's masculine p.t., there's feminine p.t., and neuter p.t. in Pannonian. Not even Czech or Slovak use nonvirile on here. Did someone follow the Polish model a little too hard? If anyone can show me specific and definitive Pannonian documentation that nonvirile is defined as a noun gender, then fine, but otherwise I'll be reverting all the existing NV nouns into their respective pluralia tantums. Insaneguy1083 (talk) 16:27, 19 June 2025 (UTC)

@Insaneguy1083 Before you just revert everything, see who added them and ping them to get their views. Maybe they had some reason, maybe not. Benwing2 (talk) 17:37, 20 June 2025 (UTC)

@Thadh Hi, I've removed the nonvirile noun gender for Pannonian Rusyn nouns, since none of the dictionaries I use specifically mention nonvirile as a gender as opposed to just pluralia tantum. Even череґи (čeregi), the noun which you specifically changed to be NV, is listed in the dictionary as, and I quote, ж. мн. (ž. mn.). And there are neuter pluralia tantum like уста (usta) which are listed as с. мн. (s. mn.) in the same dictionary. Czech and Slovak don't use NV on here either, nor any other Slavic languages outside of the immediate Polish-sphere. Insaneguy1083 (talk) 18:11, 20 June 2025 (UTC)

Did you specifically ignore what I said? I said ping them before reverting. Benwing2 (talk) 20:51, 20 June 2025 (UTC)

Well, I had already reverted before you sent the initial message. It's not as if there are that many NV nouns anyway. There's like 14 of them or something, if even that, and it's just a matter of changing a few characters in the rsk-noun template if there exists an actual justification to use NV as opposed to just pluralia tantum. Insaneguy1083 (talk) 21:05, 20 June 2025 (UTC)

@Insaneguy1083, Benwing2: Unlike Czech and standard Slovak, Pannonian Rusyn and afaik Eastern Slovak have a completely different gender system, where masculine human nouns have a different inflection than masculine animate, masculine inanimate, feminine or neuter:

я жем желєного мужу // я жем желєних мужох

я жем желєного коня // я жем желєни конї

я жем желєни лимун // я жем желєни лимуни

я жем желєне яблуко // я жем желєни яблука

я жем желєну вишню // я жем желєни вишнї

Now, I don't know if you notice this, but this is exactly the same system as in Polish. And as in Polish, it is impossible to tell from agreement whether a plural-only noun is masculine non-human, feminine or neuter, except for its inflection class, where mixed classes are still present. Now, it's nice that the Rusyn dictionaries you use have decided on some arbitrary gender for these nouns, but unfortunately we should be able to document any Pannonian Rusyn noun, which includes those that do not have an earlier dictionary entry. Furthermore, just like in Polish, there is nothing that makes череґи inherently feminine rather than masculine or neuter unless a singular *череґа exists. The third-person singular pronoun is the same for all genders, as are verbal endings.

I would appreciate it if you did not unilaterally remove such things from modules without first understanding the motivation behind it. Thadh (talk) 23:43, 20 June 2025 (UTC)

@Thadh: That adjectival declension separating masculine personal (i.e. virile) and all others was already implemented in rsk-decl-adj. And if you had checked the referenced 2010 dictionary, you'll find that there does, in fact, exist череґа (čerega). To quote directly from the 2010 Rusyn-Serbian dictionary:

череґи ж. мн. (єд. череґа) кул. листови, мафиши

As you can see, it does point out the existence of a череґа (čerega), which on Wiktionary we can decline fully using rsk-decl-noun-f. And personally, I feel like if a Rusyn dictionary, written by Rusyns, indicates a singular form with a specified gender, then maybe we should take their word for it and implement this word as a feminine noun (arguably not even pluralia tantum to be honest, more like a feminine noun that is chiefly in the plural).

I've read the 1997 dictionary's grammar section, and I've also read the entire nouns and adjectives sections of the 2005 edition of the dedicated Pannonian Rusyn grammar book Ґраматика руского язика (quote from page 35: §28.1. Меновнїки можу буц хлопского, женского и стреднього род. (§28.1. Menovnjiki možu buc xlopskoho, ženskoho i strednʹoho rod.)). By all indications, even Rusyns themselves writing about Rusyn grammar do not specifically differentiate a "non-masculine-personal" gender for any context, other than pointing out that the plural accusative form of adjectives has a different form based on whether the noun is masculine personal.

It's nice that you'd like to document any Pannonian Rusyn noun, "which includes those that do not have an earlier dictionary entry". But the 2010 dictionary is pretty comprehensive (other than proscribed colloquial words like да (da)), and gives a gender for every pluralia tantum. And I feel like specifying the gender, e.g. to harmonize with etymology and cognates in the case of уста (usta), is rather important even if the resultant declension is the same as that of, say, a feminine p.t. or masculine inanimate p.t. Insaneguy1083 (talk) 08:28, 21 June 2025 (UTC)

@Insaneguy1083: How don't you see that the fact череґа exists is the reason the noun is feminine? Not all plural nouns have a singular though, even hypothetically. Those are the nonvirile nouns. Thadh (talk) 08:48, 21 June 2025 (UTC)

@Thadh: Bottom line, languages in the immediate Polish-sphere use nonvirile as a grammatical gender because it is specifically laid out as one (niemęskoosobowy) in the official Polish grammatical canon. For Pannonian Rusyn, nonvirile is NOT in itself defined as its own gender, there doesn't exist any *хлопскоособови (*xlopskoosobovi), and Rusyn dictionaries do as much to provide the gender of pluralia tantum like уста (usta) or дзвери (dzveri), even if the declension for non-masculine-personal nouns are all the same in the plural, even if there doesn't exist a singular form. It seems disingenuous (and frankly unnecessary) to group a series of nouns using the noun gender system of a completely different paradigm, just because there are perceived similarities to the Polish system. If Pannonian Rusyns themselves decide one day that they will start using the nonvirile classification and classifying nouns as such in their own dictionaries, fine. But for the time being, differentiating the adjectival declension using rsk-decl-adj seems very much sufficient to me to address the differences between virile and nonvirile nouns. I'm just following the official line here.

@Vininn126 @Sławobóg as people who interact more with Polish-related entries, what are your thoughts on this? Insaneguy1083 (talk) 09:36, 21 June 2025 (UTC)

I think it's disingenuous to follow Slovak grammar (which the dictionaries in question in this case seemingly follow) to explain Pannonian Rusyn grammar. If a word is a pluralia tantum, not attested in the singular, and uses the same case agreement in the nominative and accusative, then it is simply not part of any gender other than "not masculine personal". There's no way to otherwise see what the gender is, and using etymology or other languages is not only not sustainable, it's dishonest. Thadh (talk) 09:48, 21 June 2025 (UTC)

(Notifying RichardW57, Arafsymudwr, Llusiduonbach, Linguoboy, Silmethule, Brutal Russian, Mellohi!, Silmethule, AryamanA, Caoimhin ceallach, Exarchus, Mellohi!, Pulimaiyi, Victar): Although we have some lemmas in Cumbric in main space, the language is in fact totally unattested, only reconstructed. And it's not even reconstructed on the basis of an attested daughter language, but solely on the basis of place names in England and Scotland. Not enough is known about the language for us to say with any certainty how it differed from Proto-Brythonic, so I propose that we change Cumbric from being a full-fledged L2 language to being an etymology-only variant of Proto-Brythonic. Thoughts? —Mahāgaja · talk 07:15, 3 July 2025 (UTC)

For what it's worth, Jackson (1994, Language and history of early Britain, 4th ed.) states that three words are definitely Cumbric: kelchyn, galnes/galnys, and mercheta. That brings us to the argument we've had before here of whether we should consider these true Cumbric words or Latin words with Cumbric roots (because they occur in Latin texts). I'm not sure where we stand on this. —Caoimhin ceallach (talk) 22:50, 6 July 2025 (UTC)

Oh, that's true. I had forgotten about those. I don't think we've ever come to a consensus about how to handle words in barely attested languages that are only mentioned (not used) in a text in another language. Personally, I'm willing to keep Cumbric as a full language for the sake of these four terms (three different lexemes). But I do still think that any other Cumbric words should be listed as (reconstructed) Proto-Brythonic rather than reconstructed Cumbric, as we just don't know enough about it to distinguish reconstructed Cumbric from PBr. —Mahāgaja · talk 08:21, 7 July 2025 (UTC)

As I've stated in some earlier discussions, I am of the opinion that languages that are only attested through another language are unattested. An etym-only code seems fine specifically for these terms that seem to be borrowed from the language, but the attestation through another language also means the language is reshaped so much that any analysis becomes tricky.

On that note, I think we have a bunch of other languages in a similar situation. For instance CAT:Thracian lemmas and CAT:Dacian lemmas seem to be filled with reconstructions that are based on borrowings, even though the attested material is so scarce that a good reconstruction seems very difficult. I would also like to treat such terms as basically substrate terminology. Thadh (talk) 11:38, 7 July 2025 (UTC)

Initial reaction: I agree with Mahagaja's second comment, keep the code for those words. I suppose it depends on how those words are attested. A Latin author giving a short list of Cumbric words, or mentioning that the Cumbric word for 'foo' is bar, is not so different from an English author giving a wordlist of some obscure language or other, but a Latin author saying the equivalent of "and the Cumbrians eat two bars every evening" is more debatable. As suggested on Talk:gangaba and mentioned at Wiktionary:Etymology scriptorium/2025/April#ναρί, I like the idea of an appendix for any such things that aren't included in mainspace. - -sche (discuss) 07:50, 25 January 2026 (UTC)

Hello everyone, I propose that Balinese and Javanese (and other regional languages of Indonesia) be lemmatized in Latin script. Both languages today are mostly written in Latin script by their speakers; the use of traditional script is rare and limited to special occasions, for example on certain road signs. Almost all modern dictionaries of both languages support this, for example this Javanese dictionary and this official Balinese dictionary; Google Translate and the wikis for both languages are written in Latin script too (unlike Hindi, Urdu or Bengali). Even if there is a concern that the use of Latin script is ambiguous, this can easily be resolved by using additional diacritical letters (e.g. "é", "è", "ò" for Javanese; or e.g. "é" for Balinese). Rentangan (talk, contribs) 08:39, 23 July 2025 (UTC)

Tagging @Austronesier @Rex Aurorum, @Swarabakti, @Sponge2490, @Wiktionarian89 and @Udaradingin for opinions. Rentangan (talk, contribs) 08:42, 23 July 2025 (UTC)

Totally agree. Udaradingin (talk) 12:36, 23 July 2025 (UTC)

Agree. I think it's better if their entries are standardized along with other Indonesian languages. Acehnese and Sundanese main entries are already written in the Latin script with diacritics. It makes sense for Balinese and Javanese to follow the same style. The regional script entry can be created with a spelling template, just like how Sundanese uses the {{su-hana}} template. Sponge2490 (talk) 10:15, 24 July 2025 (UTC)

Agree per the above comments. If anything, it is Old Javanese and Old Sundanese that should be lemmatized in their (attested) non-romanized forms, though IDK how practical (?) that would be. Swarabakti (talk) 14:26, 24 July 2025 (UTC)

Yeah, if the traditional scripts are really important, then they should be only lemmatized for the old(er) stage of the languages (for example, I think Classical Malay should be always linked in Jawi), although as you said before, it might not be practical sometimes. Tagging @Xbypass and @Suku Melayu for additional opinions. Rentangan (talk, contribs) 20:32, 24 July 2025 (UTC)

I support lemmatization in Latin script. Suku Melayu (talk) 20:37, 24 July 2025 (UTC)

After considering the Malay Arabic script aka Jawi (which has limited vowel representation), I support lemmatization in Latin script for Malay entries. For Classical Malay, while I would prefer to have it in Malay Arabic script, I have no knowledge of how the language sounded (i.e. how the Malay Arabic script was pronounced). Xbypass (talk) 08:46, 26 July 2025 (UTC)

In the case of Old Javanese, it is written in the Kawi/Old Javanese, Javanese, and Balinese scripts. While I would prefer to make the Kawi/Old Javanese form the main entry and the others soft-redirect entries, the Kawi/Old Javanese script has technical issues with font and entry support (Kawi script was only added to the Unicode Standard in version 15.0, in September 2022), so lemmatization in Latin can act as a practical temporary solution until those technical issues are resolved. Xbypass (talk) 08:53, 26 July 2025 (UTC)

I agree, and I recall this has been brought up before. Personally, I don’t have much say in the matter since I'm not a native speaker, but I've noticed that there are some users who might be native speakers who strongly prefer using the traditional scripts as the main entry forms, which can make it difficult to reach consensus. Wiktionarian89 (talk) 02:26, 25 July 2025 (UTC)

Tagging @Mrachmad59 for opinion. Rentangan (talk, contribs) 03:46, 26 July 2025 (UTC)

Agree, and this will be easier for users from a usability perspective. Mrachmad59 (talk) 03:20, 27 July 2025 (UTC)

If it is lemmatised in Latin script with diacritics, the concerns about this idea are:

1. The Latin standard is written without diacritics.

2. The proposed Latin spelling templates, such as {{su-hana}}, do not redirect to a specific sense, but to an ambiguous page. Different senses have different, non-interchangeable traditional spellings, but share the same Latin spelling.

In regard to Indonesian, the Indonesian entry has such problem to differentiate the sound and makes such Indonesian entry clustered with different pronunciation and made such long winded entry. While I appreciate the goal of being "standardized along with other Indonesian languages," I think this depends on the specific circumstances of the particular language.

If it is lemmatised in traditional script, with the Latin entries using romanisation templates, the Latin-script entry can redirect to the specific traditional-script entry and senses, and the entry does not become long-winded.

As "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form" and a "good dictionary is characterized by its clarity, accuracy, and comprehensive coverage of language," I prefer the traditional script to the Latin script, as it keeps the clarity of spelling in the traditional script while keeping the accuracy of redirection from the commonly used Latin script. It is comparable to the Wiktionary vote to keep Chinese entries in traditional script instead of the commonly used simplified script (Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese).

Nevertheless, I know that there are not many people who understand the traditional scripts, but that can be an extra point in Wiktionary's favour in comparison to modern dictionaries. Xbypass (talk) 08:39, 26 July 2025 (UTC)

Okay, I kind of see your points but still, if we really persist on using traditional scripts, it feels very weird, why? These languages are not like Bengali or Hindi where there is active significant usage of traditional scripts for them. And no, just because we lemmatized them (Indonesian regional languages) in Latin script, doesn't mean we just completely remove the alternate traditional script spellings. Also continuing to insist on lemmatizing in traditional script even though most speakers are only proficient in Latin script does not seem neutral and may only make it more difficult for native speakers who really want to look up words and definitions in a dictionary (I even think the soft redirection mechanism you mentioned earlier is impractical for such a language context). To overcome ambiguity due to the use of Latin, it is very easy, just use additional diacritical letters, and if there is a nonstandard diacriticless spelling then just create a new stub entry or etymology section and add for example {{nstd sp|..|..}}. Also, the template {{su-hana}} can be specified in meaning with the example {{su-hana|kolot|t=old}}... and yes I have to admit that these templates cannot redirect to a specific sense/etymology, but they could probably be improved in the future. The comparison you provided doesn't seem to apply to this case. Rentangan (talk, contribs) 10:50, 26 July 2025 (UTC)

If it's written without diacritics, it can easily be replaced using the {{nstd sp}} template, as shown in an entry added by Zayn Kauthar. Alternatively, both languages can be lemmatized in Latin spelling without diacritics since most speakers don’t typically write with diacritics, while listing the standard form (with diacritics) under the 'Alternative forms' section, though this will result in an "entry clustered with different pronunciation and made such long winded entry".

Also, if the traditional script can’t currently be linked to a specific sense, the template can be modified to redirect to a sense marked with the {{senseid}} tag. Or we could just check which definition matches the traditional script listed in the head template. Sponge2490 (talk) 13:39, 26 July 2025 (UTC)

If the template {{su-hana}} can be specified in meaning with the example {{su-hana|kolot|t=old}}, then it defeats the purpose of "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form", as the traditional script entry has to maintain information (at least the sense).

As these templates at this moment do not give redirection to a specific sense in Latin script, and I have not seen that "the template can be modified to redirect to a sense marked with the {{senseid}} tag" or "we could just check which definition matches the traditional script listed in the head template" in a real situation, I still prefer to lemmatise in traditional script. I know that "they could probably be improved in the future," but we are holding this discussion now, not in the future.

The point is that diacritical letters are not the standard. The problem lies in maintaining clear convertibility between Latin and traditional script while keeping a single main entry. If lemmatization is done in Latin script, the traditional script entry has to include a "sense", which breaks the single-main-entry rule. If lemmatization is done in traditional script (as with Chinese entries), then no problem arises except for "practicality". Xbypass (talk) 00:07, 27 July 2025 (UTC)

FYI, the Balinese {{ban-bali}} template already has support for the |id= parameter that can lead to a specific sense given that a {{senseid}} is included in the definition. For example, adding {{senseid|ban|honey}} in the second definition of madu while modifying the ᬫᬥᬸ definition with {{ban-bali|madu<id:honey>}} or {{ban-bali|madu|id=honey}} will redirect it to the honey definition and subsequently highlight it. Though I don't see similar functionality with the Javanese template. Sponge2490 (talk) 09:17, 27 July 2025 (UTC)
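To make the two notations in this comment concrete, here is a small sketch showing that `{{ban-bali|madu<id:honey>}}` and `{{ban-bali|madu|id=honey}}` carry the same information (a term plus a sense id). It is written in Python purely for illustration; the function `parse_ban_bali` is hypothetical and is not how the actual template or its module is implemented.

```python
import re


def parse_ban_bali(call):
    """Extract (term, sense id) from a {{ban-bali|...}} call written in either form.

    Illustrative parser only; it handles just the two notations shown above.
    """
    inner = call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    name, *args = inner[2:-2].split("|")
    if name != "ban-bali":
        raise ValueError("not a ban-bali call")
    term, sense_id = None, None
    for arg in args:
        if arg.startswith("id="):
            sense_id = arg[3:]  # named-parameter form: |id=honey
        elif term is None:
            # inline-modifier form: madu<id:honey>
            match = re.fullmatch(r"(.*?)(?:<id:([^<>]*)>)?", arg)
            term = match.group(1)
            sense_id = match.group(2) or sense_id
    return term, sense_id


print(parse_ban_bali("{{ban-bali|madu<id:honey>}}"))  # ('madu', 'honey')
print(parse_ban_bali("{{ban-bali|madu|id=honey}}"))   # ('madu', 'honey')
```

Whether the link then lands on the right definition depends on a matching {{senseid}} being present in the Latin-script entry, as described in the comment.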

Nevertheless, it results in a long entry at madu and requires adding a definition at ᬫᬥᬸ, such as {{m|ban|ᬫᬥᬸ|t=honey}}. Meanwhile, adding {{senseid|ban|honey}} and {{ban-bali|madu|id=honey}} fails to jump to the correct position of the honey definition (landing at "to fight" instead), although it does correctly highlight the honey definition.

Hence, as these templates at this moment do not give redirection to a specific sense in Latin script, and I have not seen that "the template can be modified to redirect to a sense marked with the {{senseid}} tag" or "we could just check which definition matches the traditional script listed in the head template" in a real situation, I still prefer to lemmatise in traditional script. Xbypass (talk) 11:52, 31 July 2025 (UTC)

The traditional/simplified characters dichotomy in Chinese entries is not quite comparable to the traditional scripts/Latin orthographies dichotomy in the regional languages of Indonesia, as the latter have developed largely independently of each other. That is to say, the traditional script spelling conventions often have had little influence on their Latin counterparts, though for Javanese in particular there have been attempts to harmonize orthographies in both writing systems. In any case, it is rather misleading to speak of the Latin forms commonly used by speakers of Indonesian regional languages as "romanisations" of the traditional scripts.

Actually, I think the convention of lemmatizing the traditional forms in Chinese entries is the perfect analogy for the usage of {{nstd sp}} to soft-redirect entries with diacriticless Latin to the ones with diacritics; they are both basically the same writing system, with the former being "simplified" forms combining many of the latter forms.

Of course, as you said, there is still the problem that the use of diacritics in regional languages of Indonesia is hardly standardized, but at least for larger languages there have been official guidelines outlining spelling conventions published by language regulators, in addition to various dictionaries for smaller languages that may also be used to derive ad hoc orthographies (if necessary).

Also, I fail to see why disambiguating the senses in the traditional script entries would defeat the purpose of having the main entries with most (not all!) information at the Latin forms. It is not much different from e.g. the way Vietnamese chu Nom entries are directed to the relevant quoc ngu forms. Obviously, there is more to the entries than just simple glosses... Swarabakti (talk) 12:32, 27 July 2025 (UTC)

...which does not happen in the traditional scripts, as they are pretty standardised, similar to Chinese. However, Chu Nom is not more standardised than the Latin orthography, which is not the case for the Indonesian traditional scripts. In the case of chữ Nôm, there are three ways to write the name itself in chữ Nôm (字喃, 𡨸喃, or 𡦂喃). So Vietnamese has a different problem from the Indonesian traditional scripts.

However, the Vietnamese case is similar to the needs of Old Javanese, so, as I wrote: "While I prefers to make the Kawi/Old Javanese as main entry while makes others as soft direction entries, the Kawi/Old Javanese script has technical issue (as Kawi script was added to the Unicode Standard 15.0 in September 2022) on font and entry support, so lemmatization on Latin can act as practical temporary solution until Kawi/Old Javanese has no technical issue." Xbypass (talk) 11:42, 31 July 2025 (UTC)

So, just because of a technicality of the templates, we suddenly ignore the fact that speakers of these languages use the Latin alphabet as their everyday, commonplace script? I don't think that's fair, and labeling Latin spellings as "romanization" creates the false impression that they're used like Thai or Burmese. Come on, even dictionaries out there, including the official ones, use the Latin alphabet. I must say it again, we can still add traditional script spellings when relemmatizing entries to the Latin script. Rentangan ^{(talk, contribs)} 00:41, 28 July 2025 (UTC)Reply

Well, you ignore the problem that relemmatizing these entries in Latin does not solve the clarity of conversion from traditional script to Latin and vice versa. So, come on, people want a clear dictionary... not just another Latin one. Xbypass (talk) 11:34, 31 July 2025 (UTC)

I know that you want a unique dictionary that utilizes both Latin and traditional script to resolve this kind of ambiguity, and it's also elegant, right? Just like how I was (and probably still am) stubborn about Indonesian transitive verbs (although I decided not to bother with this topic anymore; anyone can freely edit any Indonesian verbs), but the more I consider it, the more I think it's best to stick with common usage, right? And yeah, I shouldn't have jumped to certain decisions too quickly. Rentangan (talk, contribs) 11:48, 31 July 2025 (UTC)

Well, Wiktionary, to be honest, has an advantage in this kind of convertibility issue, which is sacrificed in other dictionaries. Lemmatization in Latin orthography makes this capability go to waste. However, I agree with your point that people write these languages in Latin and that we should accommodate this in Wiktionary. Hence, I have a suggestion: while lemmatization is done in the traditional script, the Latin entry is allowed to have pronunciation sections (IPA, homophones, etc.), the entry headers use the normal ones (not the -form or romanisation ones), and the definitions use soft redirection via romanisation templates.

So, the entry on the madu page would look like this:

==Balinese==
===Pronunciation===
* {{IPA|ban|/ma.du/}}
* {{rhymes|ban|du|s=2}}
* {{hyphenation|ban|ma|du}}
===Noun===
{{head|ban|noun}}
# {{romanization of|ban|ᬫᬤᬸ}}
# {{romanization of|ban|ᬫᬥᬸ}}
===Verb===
{{head|ban|verb}}
# {{romanization of|ban|ᬫᬤᬸ}}

Xbypass (talk) 12:10, 31 July 2025 (UTC)
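As a rough illustration of the layout proposed above, the following Python sketch generates the same kind of soft-redirect wikitext from a small data structure. The function name `render_soft_redirect_entry` and its parameters are hypothetical and are not part of any Wiktionary module; the template names (`head`, `romanization of`) are simply the ones used in the example above.

```python
def render_soft_redirect_entry(lang_code, lang_name, pronunciation, lemmas_by_pos):
    """Build wikitext for a Latin-script page whose senses point at traditional-script lemmas.

    Hypothetical helper for illustration only; it mirrors the proposed "madu" layout.
    """
    lines = ["==%s==" % lang_name, "===Pronunciation==="]
    # One bullet per pronunciation template, copied through unchanged.
    lines += ["* " + item for item in pronunciation]
    for pos, lemmas in lemmas_by_pos.items():
        lines.append("===%s===" % pos.capitalize())
        lines.append("{{head|%s|%s}}" % (lang_code, pos))
        # Each definition line is a soft redirect to one traditional-script lemma.
        for lemma in lemmas:
            lines.append("# {{romanization of|%s|%s}}" % (lang_code, lemma))
    return "\n".join(lines)


madu = render_soft_redirect_entry(
    "ban",
    "Balinese",
    ["{{IPA|ban|/ma.du/}}"],
    {"noun": ["ᬫᬤᬸ", "ᬫᬥᬸ"], "verb": ["ᬫᬤᬸ"]},
)
print(madu)
```

The point of the sketch is only that, under this proposal, the Latin page carries pronunciation plus one redirect line per traditional spelling, while the senses themselves would live on the traditional-script pages.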

I don't like this compromise; it still gives a false impression that the languages are used like Thai (and not even close to Hindi) even though they're always written in Latin orthography by native speakers. If we actually want lemmatization in a traditional script, apply it for the old stage of the languages instead, not the modern one. Also, @Mrachmad59, a native Javanese speaker, agrees with lemmatization in Latin. Rentangan (talk, contribs) 05:30, 3 August 2025 (UTC)

And I don't like the proposal to lemmatise in Latin, as it is unclear. Do you think that Wiktionary's lemmatization in Traditional Chinese means that the majority uses Traditional Chinese? Xbypass (talk) 03:00, 6 August 2025 (UTC)

No, many Chinese speakers are also competent in writing the traditional characters. And do you remember when you were linking Malay entries in Jawi? Sorry, I suspect that the only reason you prefer lemmatization in traditional scripts is personal desire. If you still want to edit Wiktionary, why not create audio pronunciations or check entries in Category:Indonesian entry maintenance? Rentangan (talk, contribs) 04:05, 6 August 2025 (UTC)

Sorry @Xbypass if I sounded harsh. So, you're welcome to contribute in any area, even in both languages. I must admit that sometimes your edits can be very helpful, but please use Latin orthography (and again, we can still provide the traditional spelling), and besides, do readers come here more interested in having entries written in traditional script than Latin?

To address your concern, what if ambiguity arises if Latin orthography is used? If you mean to distinguish a particular vowel (e-é, for example), then we could standardize the spelling directly—it's more legible to readers than the traditional script—or if we don't want to use Latin orthography with diacritics, we could simply mark the diacritics in the headword. What if your intention is ambiguous because it would "make it difficult" for readers to find the intended meaning for terms with diverse etymologies? Let's pause and think twice: is that the only reasoning we use to prioritize traditional scripts (of which many native speakers are incompetent) over Latin? After all, almost all readers, if they're genuinely curious about a term, they would scroll from the top to the end of the entry and find the intended meaning. Isn't it a good thing to have very long entries (in a reasonable way), because we're indirectly entertaining readers with a wealth of information that might otherwise be overlooked?

We should reconsider that as responsible editors. I want this dictionary to be accessible to a reasonable audience; I don't want it to treat these languages like museum pieces. And don't forget that we're talking about Javanese and Balinese (and other regional languages in Indonesia), not Thai or Hindi. Because all native speakers are proficient in Latin script today, but only a small minority are truly literate in their traditional scripts. Rentangan ^{(talk, contribs)} 03:28, 11 August 2025 (UTC)Reply

No, many Chinese speakers are also competent in writing the traditional characters.

Of course, many Chinese speakers can write in Traditional, but most Chinese writing is done in Simplified.

To address your concern, what if ambiguity arises if Latin orthography is used? If you mean to distinguish a particular vowel (e-é, for example), then we could standardize the spelling directly—it's more legible to readers than the traditional script—or if we don't want to use Latin orthography with diacritics, we could simply mark the diacritics in the headword.

So, to resolve inconsistency and ambiguity in the Latin orthography, Wiktionary would add a "new standard" that is only used in Wiktionary. Personally, I see this as adding more unclarity. We should reconsider that as responsible editors. Basically, lemmatising in Latin orthography will result in one entry with multiple traditional spellings, and the soft redirects in traditional orthography will have to include the senses, which then have to be maintained.

What if your intention is ambiguous because it would "make it difficult" for readers to find the intended meaning for terms with diverse etymologies?

That has been accommodated by the homophone templates.

Let's pause and think twice: is that the only reasoning we use to prioritize traditional scripts (of which many native speakers are incompetent) over Latin?

Moreover, as "many native speakers are incompetent", it adds to the importance of maintaining the traditional orthography, as Wiktionary has the capability to maintain a clear conversion from traditional orthography to Latin orthography, but that is not the main reason.

After all, almost all readers, if they're genuinely curious about a term, they would scroll from the top to the end of the entry and find the intended meaning.

Sure, if they are curious, they will click or hover over the link for the definition. Nevertheless, soft redirection from traditional orthography needs to maintain senses in the soft redirection page, while soft redirection from Latin orthography does not.

We should reconsider that as responsible editors. I want this dictionary to be accessible to a reasonable audience; I don't want it to treat these languages like museum pieces. And don't forget that we're talking about Javanese and Balinese (and other regional languages in Indonesia), not Thai or Hindi. Because all native speakers are proficient in Latin script today, but only a small minority are truly literate in their traditional scripts.

That is another reason to do lemmatisation in traditional script, as it unlocks the traditional script for a wider readership, but that is not the main point. The main point is that the traditional orthography (in the Javanese and Balinese cases) is more consistent and clearer than the Latin orthography. A good dictionary should be comprehensive and easy to read, containing definitions that are clear and up-to-date; hence it should be lemmatised in the clearest orthography (for Javanese and Balinese, the traditional one). Xbypass (talk) 23:41, 11 August 2025 (UTC)

Again, the Latin orthographies are not mere conversion of the traditional scripts (or vice-versa for that matter—unless you're following KBJ '92 Javanese script orthography). There is nothing unclear about having the traditional script entries soft-redirect to the relevant Latin ones, with disambiguating glosses for different senses if necessary. In fact it would probably be more helpful for readers to find out the different etymologies (and thus different traditional script spellings) for terms that are homonymic in Latin, if they are discussed at once in the Latin entries.

One more thing to consider: while languages of Indonesia are not considered well-documented on the internet for the purpose of WT:ATTEST, forms in traditional scripts are especially harder to find quotations for. If traditional script forms remain the lemmas, we also need to clarify whether quotations of the Latin forms (in any spelling) suffice to verify these lemmas. Even then, IMO it would be pretty confusing to have quotations for entries with traditional script headwords given in their Latin forms. Alternatively, we can separate quotations by forms, but this is potentially even more confusing since most quotations available for these languages will likely be found in Latin entries instead of the main traditional script entries.

If we lemmatize these languages in the Latin forms, we can provide Latin quotations (which are much more readily available) under the same entries as definitions and other information. Of course, many of the Latin materials available are not following the exact same standardized orthographies (neither are actual attested texts in traditional scripts btw). But this can be solved by having them respelled if necessary (cf. Pabaru Cina and anggur sempani). I'd say that respelling quotations in the same writing system is still more intuitive to readers than having them used to verify forms written in different scripts altogether. Swarabakti (talk) 15:21, 31 July 2025 (UTC)Reply

All quotations should be in one place. If you have quotations in Aksara Jawa, they should be in the lemma entry, no matter if that lemma is in Latin script. Suku Melayu (talk) 15:27, 31 July 2025 (UTC)Reply

Adding disambiguating glosses to the soft redirect pages means that we maintain both entries, which defeats the purpose of "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form". Xbypass (talk) 03:04, 6 August 2025 (UTC)

I'd like to put my two cents to the discussion.

I've almost never used the Sundanese script in my day-to-day life. While it's great that Wiktionary has entries for the Sundanese script variants of words, having them as the main entry while relegating the more widely used Latin script to mere "Romanisation" can be convoluted and discouraging to people who want to use the website. I can't say the same for Javanese, but as a native Sundanese speaker, I think it's better for the Latin script to be the main entry rather than the Sundanese script, for the same reason that Malay entries don't use Jawi as their main form. Zayn Kauthar (talk) 13:59, 26 July 2025 (UTC)

Should we start a vote for this matter? (also ping @User:DDG9912, @User:Ekirahardian). Rentangan (talk, contribs) 12:05, 31 July 2025 (UTC)

@Rentangan: I think yes, but where will we put this vote? DDG 9912 12:08, 31 July 2025 (UTC)

This should be uncontroversial. Overall the fact we still have Qiangic languages as an actual group is a bit iffy. Thadh (talk) 18:30, 26 July 2025 (UTC)Reply

Hi, I would like a code for the Luri language family encompassing CAT:Northern Luri language, CAT:Bakhtiari language, and CAT:Southern Luri language as it's sometimes not specified further in etymologies. (also needed in descendants) Saam-andar (talk) 09:32, 10 August 2025 (UTC)Reply

@Benwing2 Can you add this? Saam-andar (talk) 09:48, 12 August 2025 (UTC)

@Saam-andar I think this needs more discussion given that Wikipedia unequivocally asserts that Luri is a single language, not a family. I'm not sure there was even a discussion concerning whether to add the separate Luri lects as L2 languages; someone may have added the languages out of process. WT:LT has no mention of Luri anywhere. Benwing2 (talk) 03:46, 13 August 2025 (UTC)Reply

@Benwing2 Sorry for the late response, but most sources point to it being 2-3 languages, [18] [19] which are apparently often mutually unintelligible.

Two other things:

Judeo-Tat (jdt) currently descends from fa-cls, and Tat (ttt) from fa, when both should descend from fa-ear [20]
Saranamd informed me that Judeo-Persian (jpr) doesn't make sense as an L2, since it is just Classical Persian written in Hebrew script (and most JP entries are already under the Persian L2). There is Early Judeo-Persian, which could replace the current L2, maybe like this:
- Late Middle Persian (pal-lat) V
  - Early New Persian (fa-ear) V
    - Caucasian Tat F
      - Tat (ttt)
      - Judeo-Tat (jdt)
    - Classical Persian (fa-cls) V
      - Judeo-Persian (jpr) V
      - [others]
  - Early Judeo-Persian (jpr-ear)
    - ~~Judeo-Persian (jpr) Variety of fa-cls~~

Saam-andar (talk) 13:24, 16 August 2025 (UTC)
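The proposed tree above can be sketched as a simple parent map, which makes it easy to check what each code would descend from. This is only an illustration in Python, not Wiktionary's actual language data format; the key `caucasian-tat` is a hypothetical placeholder, since no code for that family is given above, and the struck-through line is left out.

```python
# Parent of each code in the proposed tree (child -> parent).
PARENT = {
    "fa-ear": "pal-lat",         # Early New Persian under Late Middle Persian
    "caucasian-tat": "fa-ear",   # hypothetical key: the family has no code above
    "ttt": "caucasian-tat",      # Tat
    "jdt": "caucasian-tat",      # Judeo-Tat
    "fa-cls": "fa-ear",          # Classical Persian
    "jpr": "fa-cls",             # Judeo-Persian as a variety of Classical Persian
    "jpr-ear": "pal-lat",        # Early Judeo-Persian directly under Late Middle Persian
}


def ancestors(code):
    """Return the chain of ancestors of `code`, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


print(ancestors("jdt"))
print(ancestors("jpr"))
print(ancestors("jpr-ear"))
```

Under this layout, both Tat and Judeo-Tat reach fa-ear without passing through fa-cls, and Early Judeo-Persian is a sibling of Early New Persian rather than a descendant of it.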

Lumping Judeo-Iranian as mere etymology-languages is valid, I and Saranamd already took the stance on Wiktionary:Beer parlour/2023/December § Deprecate Judeo-Persian?, there were but too few Iranian editors to take any measures. Fay Freak (talk) 17:31, 17 August 2025 (UTC)Reply

Only Judeo-Tat remains, which could also be made a variant of Tat, as many sources support a linguistic geographical distinction rather than a religious one. Saam-andar (talk) 11:41, 18 August 2025 (UTC)

@Saam-andar IMO it doesn't make a lot of sense for Early Judeo-Persian to be an L2 but Judeo-Persian to be an etym variety, unless Judeo-Persian is not a descendant of Early Judeo-Persian but something else entirely. Benwing2 (talk) 20:30, 18 August 2025 (UTC)Reply

Yeah, I probably misrepresented that.

<Judeo-Persian> is Classical Persian in the Hebrew script, while <Early Judeo-Persian> is a different dialect at the time of Early New Persian. Saam-andar (talk) 20:44, 18 August 2025 (UTC)Reply

OK that makes sense. How distinct is Early Judeo-Persian from Early New Persian? Is it distinct enough to merit an L2, and how many words do we have attested in this language? Benwing2 (talk) 00:10, 19 August 2025 (UTC)Reply

EJP developed from dialect(s) different from the Khorasan dialects of ENP which in turn developed into the current New Persian.

According to this (pp. 58-59), it shares more features with Middle Persian than with ENP (although it's similar to some ENP dialects in the south, like in the Quran-e Qods)

The EJP corpus is about 600 pages ([21] p. 241) Saam-andar (talk) 11:13, 19 August 2025 (UTC)Reply

As mentioned above by @Babr, the spelling Sauraseni instead of Shauraseni or Śauraseni is misleading. I don't have a strong opinion as to which of those two it should be. @Svartava Exarchus (talk) 16:00, 17 August 2025 (UTC)Reply

Apparently we already give the spelling "Śaurasenī", for example at 𑀅𑀅𑀁 Exarchus (talk) 16:16, 17 August 2025 (UTC)

Ditto for Category:Kasmiri Apabhramsa and Category:Maharastri Apabhramsa then. I suppose we have a slight preference for the Unicode forms, one major consideration being that h in the English digraph sh tends to be read as an aspiration sign in these words, and the Indian editors are actually more annoyed by it than I am. This historical linguistics subject is so specialist that common usage – which is also largely restricted to drive-by mentions in lexica we copied our language lists from originally – can hardly be regarded as weighty. Fay Freak (talk) 17:21, 17 August 2025 (UTC)Reply

Ditto again for Paisaci Prakrit. IDK why the Prakrits are spelt like that, but they should use digraphs or diacritics. Until I saw the Wikipedia page, I genuinely thought Shauraseni/Śauraseni was pronounced with a [s-]. — BABR・kurwa? 17:53, 17 August 2025 (UTC)

@Babr, Exarchus: This was raised before: Category talk:Prakrit language#Renaming a few lects but there was a preference against using ‘sh’ and ‘ch’. Additionally, WT:LANGNAME specifically advises against using diacritics in the canonical name. – Svārtava (t ɕ) 09:58, 18 August 2025 (UTC)

Lots of languages use diacritics. We even have the ǁAni language.

Or Pará Gavião, ancestor: Proto-Northern Jê. Exarchus (talk) 10:44, 18 August 2025 (UTC)Reply

Hmm, renaming to the diacriticized spelling makes sense then. @Kutchkutch: Thoughts? – Svārtava (t ɕ) 11:13, 18 August 2025 (UTC)Reply

There were only a couple of people who disliked sh and ch and not for good reasons IMO. I would much prefer the use of sh and ch to diacritics, in accordance with WT:LANGNAME. Wikipedia also uses sh in e.g. w:Shauraseni Prakrit. The use of ch for a palatal affricate is fairly standard in Indian city and language names already. The use of diacritics in some really obscure language names sometimes does occur, esp. for languages that are normally spelled in the Latin script with diacritics, but that doesn't really apply here and I would advise against it. Benwing2 (talk) 20:20, 18 August 2025 (UTC)Reply

The ones with ü alone that I found are: Lü, Khün, Mündü, Wichí Lhamtés Güisnay, Mün Chin, Natügu, Sabüm, Tai Nüa, Tübatulabal, San Pablo Güilá Zapotec, Güenoa, Volapük, Nüpode Huitoto. There are as many with ö and a few with ä.

Note also the contradiction with Ashokan Prakrit.

Some day the concern with typing Unicode will be wholly irrelevant, when we won't type the language name into L2 anymore but fetch by templates, as other Wiktionaries do. Fay Freak (talk) 14:32, 18 August 2025 (UTC)Reply

Wikipedia also uses sh in e.g. Shauraseni Prakrit […] The use […] is fairly standard in Indian city and language names already.

In general, Wikipedia usage should not be used as a measure of how common a particular romanisation is especially for understudied languages.
It is appropriate to anglicise Indian city and language names because they are everyday words used by ordinary people.
However, the names of Prakrit lects are not everyday words used by ordinary people.

I would much prefer the use of sh and ch to diacritics […] There were only a couple of people who disliked sh and ch and not for good reasons IMO.

sh, ch; they are being used by the government [as Hunterian transliteration], … not in linguistic works … Sanskrit and Prakrit are well-established English words, whereas the names of the Prakrit lects are more recent transliterations.
“h” followed by a consonant is interpreted as an aspiration sign (even if “s” itself is not aspirated) rather than being a digraph.
The issue with “h” can even be seen in the language name “Kutchi”, which is clearly anglicised. This name could confusingly be spelled as “Kacchi” with a single “h” for aspiration or “Kachchhi” with a doubled “hh” for aspiration.

in accordance with WT:LANGNAME … The use of diacritics in some really obscure language names sometimes does occur, esp. for languages that are normally spelled in the Latin script with diacritics

Even if Latin is not a canonical script for the Prakrit lects, WT:LANGNAME does not definitively rule out diacritics in language names.
In this case, there is no single prevailing common English name.

@Babr: I genuinely thought Shauraseni/Śauraseni was pronounced with a [s-]

“s” and “sh” can be merged as a single sound in many contexts.
Thus, the “ś” in “Śauraseni” could be pronounced as either “s” or “sh” depending on the speaker’s background even if it is etymologically “sh”.
Furthermore, the “Śauraseni” lect itself does not have the “sh” sound, so using the digraph “sh” is potentially misleading.
This historical linguistics subject is so specialist that common usage [as “sh”] […] can hardly be regarded as weighty.

The names of Prakrit lects are confined to history, linguistics, Jainism and other scholarly fields that prefer IAST transliteration over anglicisation.

@Fay Freak: The contradiction with Ashokan Prakrit is because Ashok is a common male given name, and Ashoka is a well-known historical figure with the English adjectival form Ashokan.

@Exarchus, Svartava:

Therefore, renaming to the diacriticized spelling of “Sauraseni” seems to be more appropriate compared to “sh” even if having diacritics in the canonical name is generally not preferable.
“ś” would serve as a compromise between the “s” and “sh” variants, in addition to being a variant that is used in English.

Ditto for Category:Kasmiri Apabhramsa and Category:Maharastri Apabhramsa

Kashmiri is an established term in English with several senses (even though the Kashmiri language is not descended from Category:Kasmiri Apabhramsa), so this situation would be comparable to “Ashokan Prakrit”.
However, “Maharashtri” is not an established term in English as the adjectival form of Maharashtra. The English adjectival form of Maharashtra is Maharashtrian (see diff). User:Equinox probably created the entry for Maharashtri (which has no non-linguistic referent) by looking at Wikipedia.

Kutchkutch (talk) 13:22, 19 August 2025 (UTC)

For the record I don't think anyone got your ping, but I think you are being overly pedantic.
Furthermore, the “Śauraseni” lect itself does not have the “sh” sound, so using the digraph “sh” is potentially misleading.
English generalizes sounds all the time; most languages don't have an "r" sound, but that's not an argument that we shouldn't use an "r" in transliteration. Using digraphs in transliteration is extremely common, and while scientific works tend to prefer single letters, many of them do use digraphs as well. It's not as crazy as you are claiming it is. But on that note, I'm not necessarily opposed to using diacritics (but it's not my preference); I would just like to change the name to literally anything else that's more clear. — BABR・kurwa? 05:35, 3 September 2025 (UTC)

"I would just like to change the name to literally anything else that's more clear."

my thoughts exactly Exarchus (talk) 07:17, 3 September 2025 (UTC)

We already have Appendix:Proto-Ainu reconstructions. There are also a number of (ill-formatted) Ainu etymology sections referring to Proto-Ainu (see Special:Search/insource:"Proto-Ainu"). – wpi (talk) 11:23, 31 August 2025 (UTC)Reply

I see that Church Slavonic (zls-chs) has been created as an L2 language distinct from Old Church Slavonic (cu). Any objection to adding etym-only variants for the different recensions? I encountered a Serbo-Croatian term described by Matasović as having a cognate specifically in Russian Church Slavonic (Church Slavonic достизати (dostizati) vs. Serbo-Croatian dostizati) and it would be nice to have a corresponding etym-only code rather than having to write Compare Russian {{cog|zls-chs|...}}. (Maybe zls-chs-RU or zls-chs-ru or zls-chs-rus? Per Wikipedia there's also an Old Moscow recension.) Ping @Sławobóg @AshFox @ZomBear, @Bezimenen, @IYI681, @Thadh @Vininn126 as some people who may have opinions about this and/or be able to list out the recensions that are deserving of etym-only codes. Benwing2 (talk) 04:12, 10 September 2025 (UTC)Reply

Fine by me. In general I find ety-codes to be safe, but these in particular as well. One reason they might not have been is that there was debate if some should be L2's. Vininn126 (talk) 08:01, 10 September 2025 (UTC)Reply

I don't object--IYI681 (talk) 07:16, 10 September 2025 (UTC)

I was already against creating a single Church Slavic code in the first place. It makes no sense to have Church Slavic separate from Old Church Slavic but still handle the different Church Slavic recensions under one code. Doesn't make anything easier, just increases clutter and difficulty.

By the way, "Old East Slavic Church Slavonic" is about as vague as can be and needs a very thorough description making it distinct from Old East Slavic itself, since borderline cases are now treated as Old East Slavic (basically third-person verbal endings are the major difference between the two from what I can tell). Thadh (talk) 09:44, 10 September 2025 (UTC)Reply

I raised this topic 6 months ago. Many supported the addition of etymological codes for Church Slavonic, but that was the end of it. AshFox (talk) 01:20, 11 September 2025 (UTC)

Just a list I wrote on the Discord server:

zls-chs-orv Old East Slavic Church Slavonic
- zls-chs-ru Russian Church Slavonic ✅
- zls-chs-uk Ukrainian Church Slavonic ✅
zls-chs-cs Czech Church Slavonic
zls-chs-ro Romanian Church Slavonic
zls-chs-bg Bulgarian Church Slavonic
zls-chs-mk Macedonian Church Slavonic
zls-chs-hr Croatian Church Slavonic
zls-chs-sr Serbian Church Slavonic

AshFox (talk) 01:26, 11 September 2025 (UTC)

Is the term Old East Slavic Church Slavonic used anywhere in scientific journals? Unsure about that. Otherwise Czech, Croatian, Macedonian CS and such are pretty different. Chihunglu83 (talk) 04:17, 13 September 2025 (UTC)Reply

OK, since we seem to have a consensus (with one dissenter), I added etym-only codes for the Russian, Ukrainian (aka Rusyn, Belarusian) and Old Moscow recensions. The remainder from Czech Church Slavonic down to Serbian Church Slavonic are commented out for now (using HR for Croatia and RS for Serbia, consistent with their official country codes), as they are not well-described in Wikipedia (except for Serbian Church Slavonic, which is described in a confusing fashion) and don't seem to have Wikidata codes. I entirely left out Old East Slavic Church Slavonic and Old Ruthenian Church Slavonic pending clarification of whether these really exist and are used in scholarly journals. Benwing2 (talk) 04:56, 13 September 2025 (UTC)Reply

@Benwing2 hi, could you make these Etymological codes in the form of a tree? Because these recensions go from one to another.

Old East Slavic Church Slavonic zls-chs-orv — 10th‒14th century (RusWiki)
- Old Moscow Church Slavonic zls-chs-omo — 14th‒15th century (RusWiki, EngWiki)
  - Russian Church Slavonic zls-chs-ru — 16th / 17th century ‒ present (RusWiki, EngWiki), other names: "Synodal Church Slavonic"
- Ukrainian Church Slavonic zls-chs-ua — 14th‒18th century / present (UkrWiki, RusWiki, EngWiki), other names: "Kiev Church Slavonic", "Ruthenian Church Slavonic".
- Belarusian Church Slavonic zls-chs-be — 15th‒17th century (UkrWiki, RusWiki)

Here is the most precise scheme of development of all East Slavic revisions of the Church Slavonic language. If you want precision on Wiktionary, this is it. But if you consider such detail unnecessary, then you can skip some revisions (I will tell you which ones, just tell me). I also suggest being consistent and using codes in lowercase letters zls-chs-RU/zls-chs-UA ➜ please change to zls-chs-ru/zls-chs-uk. AshFox (talk) 17:53, 13 September 2025 (UTC)Reply
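To make the dating in the tree above easier to compare, here is a small Python sketch that lists which recension codes could apply to a text from a given century. It is only an illustration built on assumptions: the open-ended ranges are simplified (Russian is taken to start in the 16th century and to continue to the present, Ukrainian to end in the 18th), the Ukrainian code is written `zls-chs-uk` as requested at the end of the comment, and the function name `candidates` is hypothetical.

```python
# (code, first century, last century); None means "still in use".
# The ranges are simplified from the tree above and are assumptions, not settled data.
RECENSIONS = [
    ("zls-chs-orv", 10, 14),    # Old East Slavic Church Slavonic
    ("zls-chs-omo", 14, 15),    # Old Moscow Church Slavonic
    ("zls-chs-ru", 16, None),   # Russian (Synodal) Church Slavonic
    ("zls-chs-uk", 14, 18),     # Ukrainian Church Slavonic
    ("zls-chs-be", 15, 17),     # Belarusian Church Slavonic
]


def candidates(century):
    """Return the codes whose date range covers the given century."""
    return [
        code
        for code, first, last in RECENSIONS
        if first <= century and (last is None or century <= last)
    ]


for century in (12, 14, 16, 20):
    print(century, candidates(century))
```

With these ranges, a 12th-century East Slavic Church Slavonic text is covered only by zls-chs-orv, which is the gap the proposal for a separate code is meant to fill.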

If this tree is too redundant, it can be reduced/combined to the 3 most common recensions:

Old East Slavic Church Slavonic zls-chs-orv — 10th‒14th century
- Russian Synodal Church Slavonic zls-chs-ru — 16th century ‒ present
- Ukrainian Church Slavonic zls-chs-uk — 14th‒18th century

AshFox (talk) 18:01, 13 September 2025 (UTC)

@Benwing2, in fact it seems doubtful that the Old Moscow recension should be singled out with a separate code... Firstly, it existed for a very short time, ~200 years. Secondly, it differs little from the modern Russian (Synodal) Church Slavonic ‒ list of differences. The existence of a separate code for the Church Slavonic of the Eastern Slavs of the 10th-14th centuries is much more reasonable. I already mentioned it in my answer to Chihunglu83, but the Russian Wikipedia has a separate article about "Old East Slavic Church Slavonic" with dates and its distinctive features. Regarding "Ruthenian Church Slavonic"... under what code can we unite Belarusian Church Slavonic and Ukrainian Church Slavonic? The Belarusian recension, like the Old Moscow recension, was also short-lived, and its distinctive features are not described anywhere. The same cannot be said about the Ukrainian recension, which is still partly used today.

There are three main East Slavic recensions of Church Slavonic: "Old East Slavic (aka Old Russian)", "Russian (aka Synodal or New Moscow)" and "Ukrainian (aka Kievan)". The other two are very small: "Old Moscow" and "Belarusian". AshFox (talk) 20:07, 13 September 2025 (UTC)

@AshFox OK I am happy to remove "Old Moscow Church Slavonic". My main concern about "Old East Slavic Church Slavonic" is essentially the same issue brought up by @Thadh: if this variant existed from the 10th to 14th centuries, it overlapped substantially with Old East Slavic itself and OCS, so (a) should it instead be considered a variant of OCS not CS, and (b) is it distinctive enough from Old East Slavic to have its own code? Benwing2 (talk) 21:24, 13 September 2025 (UTC)Reply

@Benwing2 please add at least 3 more etymological codes that are currently first in line for necessity:

zls-chs-cs Czech Church Slavonic (EnWiki), aka Moravian-Czech Church Slavonic
zls-chs-hr Croatian Church Slavonic (EnWiki)
zls-chs-sr Serbian Church Slavonic (EnWiki)

Controversial and, for now, second-in-line recensions:

zls-chs-orv Old East Church Slavonic
zls-chs-ro Romanian Church Slavonic, aka Wallachian-Moldavian Church Slavonic
zls-chs-mk Macedonian Church Slavonic
zls-chs-bg Bulgarian Church Slavonic (?)

Currently, the first 4 (Czech, Croatian, Serbian, Old East) are actively used/mentioned on Wiktionary. AshFox (talk) 14:29, 18 September 2025 (UTC)Reply

@Chihunglu83 yes. Here is the article on Russian Wikipedia; its title is Древнерусский извод церковнославянского языка, which is literally "Old East Slavic Church Slavonic". The difference from modern Russian Church Slavonic is the presence of reduced vowels. This is simply Church Slavonic with (Old) East Slavic elements. Compare:

"Real" Russian Church Slavonic has no reduced sounds and is very modern. Its main source is "Большой словарь церковнославянского языка Нового времени" (just created a module for it Module:bibliography/data/zls-chs). If you see that some word is labeled "Russian Church Slavonic", but at the same time there are reduced sounds and the word forms themselves are archaic ‒ this is in fact "Old Russian Church Slavonic" = "Old East Slavic Church Slavonic". The term "Old Russian" is obsolete, therefore it is replaced by "Old East Slavic". AshFox (talk) 16:51, 13 September 2025 (UTC)

@AshFox: Both OES and OCS could have влъкъ as a variant: OES graphically (metathesis happened quite often with these, possibly because it was more-or-less pronounced as a syllabic liquid) and OCS due to vowel harmonic tendencies. So such a feature would not be sufficient.

{{RQ:orv:IS2}} can be said to be Church Slavonic with East Slavic features but we on Wiktionary consider it OES since the difference between Church Slavonic with East Slavic features and Old East Slavic with Church Slavonic features is almost non-existent. Thadh (talk) 06:02, 14 September 2025 (UTC)Reply

"Russian Church Slavonic" of the Rus times (10-14th century) and "Russian Church Slavonic" that emerged in the 16/17th century and is currently used in Russia are completely different recensions of Church Slavonic. They cannot be labeled with the same code. Here is a dictionary of the new Russian (Synodal). It has a completely different spelling than that of the Kievan Rus times. There are no reduced sounds (except for the final ъ by tradition), strong reduced sounds have all gone into ъ/ь>о/е, weak reduced sounds are not written at all ъ/ь>Ø, non-etymological use of the letters ѧ/ѫ, use of new letters ї/й, etc. In order to have the opportunity in (!) exceptional cases to indicate a Church Slavonic word of the 10-14th century in East Slavic area, the code of the modern Russian (Synodal) will not work, it will be wrong. Therefore, I believe that in exceptional cases the code for "Древнерусского извода церковнославянского" (Old East Slavic Church Slavonic) is still needed for the ancient period. If we don't have a separate code for East Slavic Church Slavonic then it turns out that we will ignore the existence of Church Slavonic in the region of Rus' (modern Russia/Ukraine/Belarus) until the 14th century. AshFox (talk) 07:23, 14 September 2025 (UTC)Reply

@AshFox: I never proposed that the CS from the Middle Ages be treated the same as the modern Russian CS, I'm just saying it's pretty difficult (or even impossible) to distinguish between the East Slavic CS of the Middle Ages and Old East Slavic. You still have not addressed that. Thadh (talk) 09:37, 14 September 2025 (UTC)

Moved from Wiktionary:Beer_parlour/2025/September#Category:Linear_A_language

Is this redundant to Category:Minoan language? Should it be deleted and its subcats reconfigured to be subcats of Minoan? —Justin (koavf)❤T☮C☺M☯ 02:38, 14 September 2025 (UTC)Reply

@Koavf Could you move this discussion to WT:LTR? BTW it definitely sounds like "Minoan language" and "Linear A language" refer to the same thing but since Ethnologue/ISO 639-3 accepted codes for both, it would be enlightening to see the justification for creating whichever one was created later. Benwing2 (talk) 20:15, 14 September 2025 (UTC)Reply

—Justin (koavf)❤T☮C☺M☯ 20:18, 14 September 2025 (UTC)Reply

Confusing label, claiming language status of a script, as if Cretan hieroglyphs were surely of a different language. Result of a technical mixup, to be deleted. Catonif (talk) 17:15, 16 September 2025 (UTC)

I haven't spotted any documentation (in the usual places on the SIL website) explaining the ISO's addition of Minoan. The request forms that led to them adding Linear A are extremely brief, and cite two references (one of which calls the language Minoan), though the person applying for a code for Linear A does say: "There is no code that even approximates the language that Linear A represents, nor is there any code with which it may be confused. Linear B is used as a synonym for Mycenaean Greek, but Linear A, a related script, writes a language which is clearly completely unrelated to Greek." (Whether this person was actually aware of the Minoan code is not clear to me.) One possible argument for a Linear A language code is that we need to mention it in situations where using the script code does not work: for example, in an entry like 𐀲, the main point to be made in the etymology is that the Linear B symbol 𐀲 is similar to and almost certainly derived from the Linear A symbol 𐘳; what language happened to be using that symbol seems to be of secondary importance; but a language code has to be used, not a script code. Also, the inability to decipher it means, as Wikipedia mentions, that "it is not certain that the texts are all in the same language": "Minoan" is a placeholder for the one or more languages the script is assumed to encode. Nonetheless, I saw that 𐀲 and most of the other symbols just used the Minoan code, so I tentatively updated the 6 entries which were using the Linear A code (𐀷, 𐀱, 𐀇, 𐀁, 𐠀, 𐠊) to also use the Minoan code, before wondering whether that was ultimately the right direction to go in. I'm not sure which code it would be best to keep: "Linear A" is the name of a script more so than a language, and "Minoan" is kind of a placeholder more so than a language (though some names are known through comparison with Linear B). I wonder if they should just be treated as Undetermined (und), like e.g. Buyla inscription stuff, but that seems silly when there's a conventional name and code available... - -sche (discuss) 07:08, 16 January 2026 (UTC)

Hi everyone, here are my requests:
1. Split Southern Altai into ‘Altai’, Telengit, and Teleut;
2. Split Northern Altai into Kumandin, Tubalar, and Chelkan;
3. Make Olguya Ewenki a distinct language.

All of the new languages have unique grammatical features, so we need independent inflection tables. Currently we are forced to add every entry in these languages as a ‘dialectal form’. LibCae, or ‘Lithuanian Lime’ 14:44, 22 September 2025 (UTC)

@LibCae Whether you need to create more than one inflection table for a given form is not a criterion for splitting a language. Plenty of languages do this just fine. The issue is, are these generally considered separate languages by the relevant academic communities, or dialects of the same language? AFAICT, the consensus of Wikipedia, Glottolog and Ethnologue is that Southern Altai and Northern Altai are each a single language, not several languages. Based on this, I would oppose this proposal. Benwing2 (talk) 06:09, 24 September 2025 (UTC)

Morphologically, Buriat and Kalmyk should not be considered descendants of Classical Mongolian. Both preserve the final -n of Middle Mongol, in contrast to Classical Mongolian, where the -n is hidden, e.g. MM usun > CM usu, but Buriat uhan, Kalmyk us°n.

The list in Budaev (1992, Бурятские диалекты, pp. 36–37) shows lexical similarities between Buriat and Kalmyk. Oirat chronicles mention that the Buriats were part of the Oirats. I suggest creating a code for a new term ‘Oiratic’, the ancestor of Buriat and Kalmyk and a descendant of MM.

Classical Mongolian chronicles from steppe dumas already clarified that written CM had been introduced into Transbaikalia quite late. And there is even little relation between Cisbaikalian dialects and CM. LibCae (talk) 11:15, 29 September 2025 (UTC)Reply

@Theknightwho We need to discuss this. LibCae (talk) 11:16, 29 September 2025 (UTC)

According to Lieberherr (2015), Puroik is not "one language", but multiple; the Bulu and Chayangtajo/Sanchu "dialects" have little-to-no mutual intelligibility. Thus, Puroik on Wiktionary should be split into separate languages and "Puroik" (suv) proper be redefined as a family. Following Lieberherr's study, we should split Puroik into at least 3 languages:

Bulu Puroik (suv-bul)
Kojo-Rojo Puroik (suv-krj)
Chayangtajo Puroik (suv-cht)

If this split goes ahead, Category:Proto-Puroik language should have its code changed from sit-khp-pro to suv-pro. — mellohi! (Goodbye!) 00:09, 10 October 2025 (UTC)Reply

@AryamanA, Thadh, -sche Pinging for thoughts. — mellohi! (Goodbye!) 00:13, 10 October 2025 (UTC)Reply

Not opposed per se, but I'd like to see more than one source making this claim, to establish some sort of consensus in the field, rather than just trusting what one source says. Keep in mind that mergers are a lot harder to execute than splits, so splitting a language is akin to a one-way-door decision (once you make it, it's hard to undo). Also, is there anyone actually working on this language/group or are you just proposing this for theoretical correctness? Benwing2 (talk) 06:36, 10 October 2025 (UTC)

I made a few entries a while back. I think the split makes sense, and, as Thadh mentioned, reconstructing Proto-Puroik is non-trivial. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 00:00, 12 October 2025 (UTC)Reply

If we have Proto-Puroik it only makes sense to have multiple Puroik languages. But I'm not very familiar with this language group. Thadh (talk) 14:32, 10 October 2025 (UTC)

I have recently talked with administrator @Polomo about this. My initial suggestion here is to split Kariri into its attested varieties, but only as etymology-only languages. This is mainly because I am not yet convinced treating them as fully independent languages is the best solution here, and I am apparently the only one interested.

Most recent works seem to treat them as separate languages rather than dialects of a single one. This is especially true for the best attested varieties, Dzubukuá (a catechism and a manuscript with some sentences) and Kipeá (a grammar and a catechism). Bernard de Nantes, author of the works in Dzubukuá, drew a parallel between their linguistic differences and those between Portuguese and Spanish (“the Kariris called Dzubukuá whose language is as different from that of the Kariris called Kipeá as Portuguese is from Castilian”). As for Pedra Branca and Sabujá, they are attested only in vocabularies collected by von Martius (see here and here, with some additional words scattered throughout the work).

As an example, the cognate of English maize is recorded as madiki (Dzubukuá), masichí / masikí (Kipeá), mosiccih (Pedra Branca), and maschicöh (Sabujá). Here, although they are cognates, madiki apparently actually meant manioc, while masichí really meant corn. Splitting them into etymology-only languages would make it easier to handle such cases in the Etymology section rather than Alternative forms. In the entry masichí, for example, we could have: “Ultimately from Proto-Arawak *marikɨ (‘corn’) as a Wanderwort. Cognate with Dzubukuá madiki, Pedra Branca mosiccih, and Sabujá maschicöh.” Something similar happens with badzé, which in Kipeá is attested with the meanings tobacco and to divine, but in Dzubukuá Badze appears only as the name of a god.

I propose the codes kzw-dzu, kzw-kip, kzw-ped, and kzw-sab.

(P.S. 1: As for masichí, there is even a reconstruction of this term for Proto-Kariri: *masiki. I am not suggesting the creation of Proto-Kariri for now though, since that is the only reconstruction I have found.) (P.S. 2: There are also three vocabularies recorded in the past century, but from a time when no closely related languages were still spoken. For now I prefer to disregard these, except perhaps for occasional mentions in the Etymology section when they help clarify a meaning, as done in yacá, where these vocabularies seem to show a semantic shift occurred from fox to dog.) (P.S. 3: I have been using here and hopefully henceforth for Dzubukuá the orthography from the appendix of Queiroz's (modern) grammar, and everywhere for Kipeá I follow Mamiani's instructions, which are easy to grasp.) Yacàwotçã (talk) 05:25, 19 October 2025 (UTC)Reply

Pinging @Trooper57 in case they're interested. Any suggestions are welcome. Yacàwotçã (talk) 05:26, 19 October 2025 (UTC)

How would having these etymology-only codes help? You can’t exactly say a word in “Kariri” is cognate with one in “Pedra Branca”. What you may want is a similar treatment to (also Chinese), which are listed under one heading... I don't really know why.

Then you could lemmatize, say, Kipeá, and list the other languages’ words as alt forms. But I’m not sure this would work, given, say, madiki never meant manioc, only its alt form.

In practice, I believe this could only be handled by having multiple lemma pages for each set of cognates, and at that point they should be under different headings anyway. We have a separate code for languages like São Paulo Kaingáng.

In any case, probably best to look into other possibilities. Hope someone can explain how Prakrit is organized. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 14:59, 19 October 2025 (UTC)Reply

@Polomo I'm not sure if the languages you mentioned are really comparable to this case. I was thinking of Coptic, where cognates of different varieties are treated as alternative forms, but it seems to me as a layperson that those are actually much closer to each other than the Kariri varieties, and also much better attested. Anyway, just so I understand, are you suggesting splitting them into separate languages? Or are you just questioning my idea so we can refine it better? Personally I'd be fine with treating them as independent languages (and it would make my life considerably easier) but I wonder if there's any issue with that or if reverting to the current state later would be too much of a hassle. Yacàwotçã (talk) 05:52, 20 October 2025 (UTC)

Yeah, I’m suggesting treating them as separate languages. It can’t get much clearer than when an Italian missionary in the early 18th century recognizes it. Also, it seems this comparison with Portuguese and Spanish is pretty common... that guy Couto de Magalhães used it for Nheengatu vs. Guarani. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 10:41, 20 October 2025 (UTC)Reply

I also support treating them as separate languages; they seem different enough. Trooper57 (talk) 19:07, 23 October 2025 (UTC)

Support adding them as etymology-only languages at a minimum. I would also support treating them as separate full languages, but I may not be aware of the technical problems that may cause. Apparently, it's a lot easier to split than it is to merge (I don't know why; ask Benwing2 if you want an explanation). 0DF (talk) 19:52, 23 October 2025 (UTC)Reply

Polomo, given your statement and that of Trooper57, I agree with the division as separate languages. When you do so, if you indeed do, please let me know so that I can adjust the current entries. Thanks, Yacàwotçã (talk) 10:17, 26 October 2025 (UTC)Reply

I lack the technical knowledge of what needs to be done in these cases. Either Mr. ’Wing or another of our more technical editors needs to pitch in. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 14:58, 26 October 2025 (UTC)Reply

@Polomo @Yacàwotçã I have no issue adding etymology codes for the different varieties, but if we are proposing a full L2 split I feel that needs a bit more discussion, especially given that all these lects are extinct and some are known only from single word lists. Pinging @-sche for thoughts. Benwing2 (talk) 20:41, 3 November 2025 (UTC)

Yeah, actually, are there no modern-day studies on this matter? It seems pretty reasonable to say Kipeá and Dzubukuá are separate languages even without them, but for the remaining two it’s more iffy. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 20:52, 3 November 2025 (UTC)Reply

It gives me pause that Wikipedia says they are "generally considered dialects of a single language"; if you are correct that more recent works tend to regard them as separate languages, it'd be good to update our sibling project. After that... when he was active, Metaknowledge took (and I was persuaded to accept) the view that for extinct wordlist-only languages, it can actually be tidier and more conservative to give each one its own (full) code, if there's no clear case that they are the same language (and if they are mostly referred to under their separate names in literature, rather than as one language) : that way, people can just enter and link to each one under the name it's most known by, rather than entering it under an umbrella language name + labels, and even if we later decide to (re)merge them, the number of words in most of them being so small and well-defined means that shouldn't be nearly as difficult as merging e.g. large living languages. - -sche (discuss) 02:37, 6 November 2025 (UTC)Reply

Makes sense. I guess it depends, as you said, on whether there's general consensus that it's the same language being referred to in different wordlists. Benwing2 (talk) 02:55, 6 November 2025 (UTC)

@-sche, Benwing2, if not an inconvenience, could you then add them as etymology-only languages, at least for now—on an experimental basis? I am not very mentally motivated to revive this discussion, and there seemed to be consensus on this as the minimum measure. I have the impression this would make the management of the Alternative forms section easier, as it would then be reserved for historical forms within the same language (which is especially useful for Dzubukuá, with its somewhat irregular orthography, but also for Kipeá), while cognates would, appropriately, remain in the Etymology section.

If you would like an example, see madiki / masichí / mosiccih / maschicöh. It is one of the very few words attested in all four varieties. All occurrences of masichí in Kipeá appear with a different spelling (although Mamiani’s grammar standardized it quite well); in any case, there are certainly two distinct forms, /masit͡ʃi/ and /masiki/. The approach so far, rather conservative (only in badzé), has been to present these historical forms in a table, but I think it would generally be more appropriate to place them directly in the Alternative forms section.

All this, of course, is simply to say adding them as etymology-only languages would make my life easier. Perhaps this will make more sense later. I can let you know if you wish. Jacaguoçãrana (talk) 04:23, 27 February 2026 (UTC)Reply

Add "Early Modern Portuguese" as an etymology-only language, like we already do for Early Modern English (en-ear) and Early Modern Spanish (es-ear); macau is surely one term that comes to mind. Trooper57 (talk) 04:08, 26 October 2025 (UTC)Reply

Support. leixar and the dixer verb forms come to mind (though we have entries for neither right now). — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 15:15, 26 October 2025 (UTC)Reply

Though I’m not sure how this is supposed to be added in an entry. Some English entries have [[Category:Early Modern English]] and only label the term as “obsolete”; others have {{lb|en|EME|obsolete}}; others yet have just {{lb|en|EME}}. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 15:20, 26 October 2025 (UTC)Reply

I'd only label it as Early Modern; obsolete is already implied. Trooper57 (talk) 15:29, 26 October 2025 (UTC)

But then it won’t categorize under Category:Portuguese obsolete terms/forms; that cat would need to be manually added. It seems to me like it’s most sensible to add both labels, actually. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 16:20, 26 October 2025 (UTC)Reply

You can make it add both categories with a single label in Module:labels/data/lang/pt. Trooper57 (talk) 16:23, 26 October 2025 (UTC)Reply

Then it wouldn't work with "obsolete forms", hmmm. Trooper57 (talk) 16:31, 26 October 2025 (UTC)

Support. This would help standardize the weird spellings from the 16th–17th centuries. MedK1 (talk) 19:34, 26 December 2025 (UTC)Reply

@MedK1: I think the category already suffices for that; the etymology-only code would be for borrowings. Trooper57 (talk) 19:45, 26 December 2025 (UTC)

@Benwing2, AshFox, Sławobóg, Vininn126

Recently, Church Slavonic has been split off from Old Church Slavonic. In my opinion, the current state of affairs is a tangled mess that is worse than the original situation:

Pre-split we just called all Church Slavonic varieties 'Old Church Slavonic'. Agreed, that naming is probably not the best, and could be changed, but the handling of these various lects was still consistent - we had a defined set of language varieties that were closer to each other than to other languages under one L2.

What we have now is the following: Canonical Church Slavonic has been split off, with the rest still under one L2 under the name 'Church Slavonic'. There are many problems with this:

The canonical language is almost or completely identical to the language of many following centuries of Church Slavonic, and so basically the entirety of the canonical OCS can be duplicated into CS.
The modern varieties are more distinct from each other than they are from (canonical) OCS.

This makes the split both arbitrary and more of a mess than before the split happened. To make an analogy, imagine we had Belarusian, Carpathian Rusyn and Ukrainian starting from the 1500s under one L2 named 'Ruthenian'. The CS split was basically like splitting off Old Ruthenian specifically from 1500 to 1550, while keeping Belarusian, Carpathian Rusyn and Ukrainian as well as Old Ruthenian from 1550 to the 1800s as one single language.

So, I propose either of the following two solutions:

(the easiest option) We just re-merge all Church Slavonic varieties into one L2, name it 'Church Slavonic', and proceed with finding regional/temporal labels for anything we want.
(the better, but more complex option) We actually do the homework and find out what varieties are distinct enough from each other to warrant their own L2, leading to multiple CS varieties (likely OCS, RusCS and SerbCS, with a well-defined cutoff date from OCS to the different modern varieties).

Since in the last few years nobody has tried to do the latter, and I personally definitely don't have time for that, I suppose the former option is more realistic, and we should probably go ahead with it before the time we can split the language properly. Thadh (talk) 11:48, 27 October 2025 (UTC)Reply

@Thadh I support either one of your options; the split out of OCS seems similar to splitting Classical Latin out from "all other Latin varieties", which doesn't make much sense since "all other Latin varieties" is not a clade and Classical Latin is closer to Late Latin than Late Latin is to Medieval Latin. Benwing2 (talk) 20:45, 3 November 2025 (UTC)

@Thadh Support. The Russian (Synodal) recension "... is a highly codified and standardized tradition of Church Slavonic..." (page 7 [5]), "...has very rigid typographic and orthographic norms..." (page 40 [38]). See "The Use and Pronunciation of the Letters" (page 23). I don't mind having a separate heading for it. ПростаРечь (talk) 15:38, 17 December 2025 (UTC)

Oppose. 1 - I don't see how CS + labels is better than an OCS-CS split + labels. 2 - It's overkill; we don't need 10 вода or богъ CS lemmas. Having separate lemmas for canon and non-canon is good enough. I don't think anything should be changed. Sławobóg (talk) 21:31, 9 November 2025 (UTC)

I won't let my vote count on an issue of languages I don't edit, but for what it's worth I am also a firm believer that a good labelling infrastructure under a single header would be the best solution in terms of both practical management and (trusting your judgement) linguistic accuracy. Splitting, with all its benefits, still seems a nearsighted decision, and the earlier we revert it the less work you guys will have to do. To avoid the issue of "polluting" the original corpus, which as far as I understand is one of the main reasons for the split of Middle Armenian, there should also be some categorisation in place to contain terms which are attested already from canonical OCS. Catonif (talk) 16:47, 30 November 2025 (UTC)

Following Halfmann (2021) - Terminological Proposals for the Nuristani Languages and Halfmann (2024) - A Grammatical Description of the Katë Language (Nuristani), I propose renaming Kamkata-viri (the designation used by Richard Strand) to Katë. Kwékwlos (talk) 20:22, 25 November 2025 (UTC)Reply

EDIT: The Nuristani languages shouldn't be classified into "Northern" and "Southern" branches; each language should be a primary branch as per Halfmann (2021). Kwékwlos (talk) 22:51, 27 November 2025 (UTC)Reply

As explained in Drechsel 2008 (available for download here: [22]), Mobilian "proper" and Mobilian Jargon—misleadingly—are not the same languages. The former was a (poly)synthetic vernacular of uncertain origin spoken by the Mobile people in the environs of Mobile Bay. The latter was an analytic pidgin and lingua franca with a mostly Choctaw/Chickasaw lexicon spoken widely throughout the modern American South as a second language. Wiktionary's nascent coverage of "Mobilian" to this point is entirely made up of Mobilian Jargon entries. I propose a rename for the sake of accuracy and to prevent confusion in the future. In my view, "Mobilian Jargon" is the most appropriate option, as the term is well-represented in the literature and much more widely recognizable than any of various endonymic variants (e.g. Yama, Yamá, Yamma, etc.). Monsuu (talk) 21:31, 30 November 2025 (UTC)Reply

(Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg, Silmethule, Rakso43243, Darellur): and also @Benwing2, @Thadh I just finished adding information on standard Kurpian, see w:Kurpie dialect.

I am writing this thread not to suggest splitting Kurpian, but rather just to get consensus on how to handle it.

Reasons not to split:
Despite sharing a lot with Masurian, there are still some major linguistic differences, where several additional sound changes in Masurian are not present in Kurpian, and mutual intelligibility still remains high. Furthermore, Kurpians consider themselves Polish and are bi-ethnic (Masurians consider themselves tri-ethnic). Also, Polish dialects are already considered LDLs, so attestability is not an issue. I have a copy of {{R:pl:Gadomski:2022}} that I'd be able to use as a source.

Reasons to adapt this standard as it is:
This standard has seen fairly widespread success in the region, and numerous publications using this orthography and grammar have been made, including The Little Prince, some non-translated books, and an online and print dictionary.

I might not personally be a fan of this orthography, but it does see use, and if we want to be descriptive, we have to describe what is used.

What I propose:
I propose to create several Kurpian specific templates, to be suffixed with -kur after pl for alternative forms/spellings and inflection. The inflectional templates may take some time to set up, but given all the leveling that has happened, it should be more straightforward.

We will also have to update MOD:zlw-lch-IPA, but I'm not sure the current orthography causes that many problems, strangely enough. We need a few extra letters, and we might want to change how decomposed bilabials syllabify (this will also be an issue for Warmian and Masurian). Either way, test cases should be made.

I also propose to normalize all existing Kurpian entries (not just those in CAT:Kurpie Polish, but those with |kur= in {{pl-pr}} as well) to the current orthography. I wouldn't be surprised if all existing entries have an entry in Gadomski's dictionary, anyway. Vininn126 (talk) 10:59, 8 December 2025 (UTC)Reply

Just for full transparency, this standard sees use mostly by Związek Kurpiów, a local organization, but even that is way more actual publication in any standard than basically any other Lechitic dialect other than Silesian or Kashubian (barring standard Polish, of course). Vininn126 (talk) 13:53, 9 December 2025 (UTC)Reply

Minor update - Gadomski has given permission (as far as I can tell by his response) to create even a large number of entries based on his dictionary. Vininn126 (talk) 09:37, 3 April 2026 (UTC)

Solresol is a constructed language that has 7 phonemes based on the solfège: do, re, mi, fa, sol, la and si. There is an online Solresol dictionary that has many words listed. Netizen3102 (talk) 01:32, 10 December 2025 (UTC)Reply

Currently, sw and bnt-swh both expand to “Swahili”, which is confusing. Could bnt-swh be renamed to something else? Like “Swahili languages” or “Macro-Swahili”. I think the first term is most common on the internet, but personally I prefer the second. @Tbm, Smashhoof, HeavenlyAestheticist for opinions. MuDavid 栘𩿠 (talk) 07:40, 10 December 2025 (UTC)Reply

Given the lack of opinions after almost a month, could someone (@Benwing2, Theknightwho, -sche) take a decision? MuDavid 栘𩿠 (talk) 03:38, 6 January 2026 (UTC)Reply

@MuDavid Sorry, this is the first time I've seen this. This is not the only case by any means where a family has the same name as a language. For example, we have the Arabic language (a language), the Arabic languages (a family) and the Arabic script. The word "languages" appears automatically in certain contexts, such as the appropriate category (Category:Swahili languages). We could rename it (Swahilic? Macro-Swahili?) but I would oppose doing so if the only reason is to make it different from Swahili; we need to be using the most common name for the family, and if "Swahili languages" is it, that's how it should be named. Benwing2 (talk) 03:53, 6 January 2026 (UTC)

As User:Benwing2 says, this is not the only case like this, and while in some cases the family has other just-as-common names that can be used to avoid ambiguity, in other cases there is no less ambiguous name to use. I agree it's not ideal that something like "From Swahili" in an etymology section could mean either the language or the family, and elsewhere on this page (raising this same general issue) I suggested that perhaps we should change our templates / modules like {{der}} etc so that, either in the case of any and all family codes (which would probably be the easiest thing to code), or at least in the case of family codes that have the same names as language codes (which I believe we can detect because I believe WT:FSCK already does this or something similar to this, detecting that e.g. "Literary Chinese (lzh-lit) has a canonical name that is not unique; it is also used by the code lzh"), the template would output something that would make the distinction between e.g. {{der|en|sw}} and {{der|en|bnt-swh}} clear: perhaps "From {{der|en|sw}}" could produce "From Swahili" whereas "From {{der|en|bnt-swh}}" could produce "From a Swahili language" or "From one of the Swahili languages" or something? - -sche (discuss) 23:34, 8 January 2026 (UTC)
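To make the detection idea above concrete, here is a minimal Python sketch. It is not how the actual Lua modules or WT:FSCK work: the two dictionaries are a tiny hand-written sample, and the function names and the `family_wording` parameter are hypothetical. It only shows that a family code can be flagged as ambiguous by checking whether its canonical name is also a language's canonical name, and that only those families would then be reworded.

```python
# Illustrative sample data only (a hypothetical subset); the real mappings
# live in Wiktionary's Lua data modules and are much larger.
LANGUAGE_NAMES = {"sw": "Swahili", "en": "English"}
FAMILY_NAMES = {"bnt-swh": "Swahili", "ine": "Indo-European", "zhx": "Sinitic"}


def ambiguous_family_codes(language_names, family_names):
    """Return the family codes whose canonical name is also a language name."""
    taken = set(language_names.values())
    return sorted(code for code, name in family_names.items() if name in taken)


def display_name(code, family_wording="one of the {name} languages"):
    """Hypothetical display rule: reword only families with an ambiguous name."""
    if code in LANGUAGE_NAMES:
        return LANGUAGE_NAMES[code]
    name = FAMILY_NAMES[code]
    if code in ambiguous_family_codes(LANGUAGE_NAMES, FAMILY_NAMES):
        return family_wording.format(name=name)
    return name


print(display_name("sw"))       # language: plain name
print(display_name("bnt-swh"))  # family sharing a language's name: reworded
print(display_name("ine"))      # unambiguous family: left as-is
```

Keeping the wording as a parameter means the choice between "a Swahili language", "one of the Swahili languages" and similar phrasings stays a single setting rather than being spread across the code.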

@-sche There is the immediate issue at hand and there's a larger issue that some languages, families, and/or scripts might want to display differently in different contexts. This comes up currently with "translingual" (mul), "taxonomic name(s)" (mul-tax) and "undetermined" (und), for example, which variously need to show up capital or lowercase and singular or plural; similarly with the scripts "flag semaphore" and "Morse code". There's some hacky code in various places like Module:etymology#L-138 to deal with this, and other hacks in other places, but it needs to be rethought. For this specific case, we'll run into the capital vs. lowercase issue with almost any replacement for bare "Swahili" or "Indo-European", so we need to deal with this more generally. But in this case, do you think it makes more sense to change things generally or just when the family matches the name of a language (which is easy to detect because there's a mapping from language names to codes and we just have to look up in this mapping to see if a given language name exists)? So for example, which of the following for From {{der|en|ine}}.?

From Indo-European.
From an Indo-European language. [ignore the fact that we'll have to handle the a vs. an issue; that's not hard]
From the Indo-European family.
From the Indo-European family of languages.
From Indo-European languages.
From the Indo-European languages.
From one of the Indo-European languages.

If you have an answer for that, does it work as well with e.g. the "Albanian languages" family?

From Albanian.
From an Albanian language.
From the Albanian family.
From the Albanian family of languages.
From Albanian languages.
From the Albanian languages.
From one of the Albanian languages.

My instinct BTW is that "From one of the Indo-European languages" might be misleading because a borrowing that is known to be Indo-European but can't be further identified might come from a mixture of more than one language; this is especially common with Middle Iranian borrowings into Old Armenian, which is why we have the horrid "etymology families" Middle Iranian languages and Old Iranian languages, even though neither is a clade. OTOH for a small family like Albanian, "From one of the Albanian languages" might be perfectly reasonable. I suspect "From the Foo family [of languages]" will work in all cases, though. Benwing2 (talk) 00:18, 9 January 2026 (UTC)Reply

Re whether to change this for all families or only for ones where the name is ambiguous, I don't have a strong preference, so hopefully other people will comment if they do; otherwise, my instinct is to change as little as possible — so, only change things for families that have the same name as languages, but leave things like "Indo-European" as-is — because for every case where we change what a code displays, we have to check all the current etymologies using that code, which may have been written with the current behaviour/display in mind (e.g. "from a {{der|ar|zhx}} language" in طوفان would need to be changed before we made {{der|ar|zhx}} itself display something like that), and changing just a few families thus requires less work than changing every family. But maybe some people think it would be preferable for all families to display the same way, consistently; hopefully more people weigh in. At first blush, I like "an Albanian language", which seems like a natural phrasing, but if there are situations where we're saying "{{der|en|FOO}}" and meaning that something is from more than one of the FOO languages simultaneously, something like "the Albanian languages" or (wordier) "the Albanian family of languages" may be more robust / correct in more situations, as you say. (I trust that "from" will not be part of the text the language code itself generates, because we need to be able to interpolate wording like "a native name in" or "a term for 'horse' in" between the "from" and the family name; in e.g. typhoon we mention a specific term, "{{der|en|zhx}} {{zh-l|大風|tr=-}}", so we need some way of writing something like "from a term {{zh-l|大風|tr=-}} in {{der|en|zhx}}" to produce something like "from a term 大風 in a Sinitic language" or "from a term 大風 in one of the Sinitic languages" or "from a term 大風 in the Sinitic family of languages", whatever we decide.) - -sche (discuss) 02:31, 9 January 2026 (UTC)Reply

@-sche Indeed, the "From" is text added by the user; it wouldn't make sense for the template to generate it, as you note. Using anything like "an Albanian language" (or any of the options other than "Albanian" or "Albanian languages") brings up potential casing issues, because some people write just {{der|en|ine}}. as the etymology. My instinct for handling that is to introduce a cap=1 param to capitalize the first letter of the output if people really want to write {{der|en|ine}}.; IMO it's better to write From {{der|en|ine}}.

BTW it didn't occur to me that there are uses like "from a {{der|ar|zhx}} language" but now that you point this out, I'm leaning towards changing only the cases where there's ambiguity, like with "Arabic" and "Swahili". Benwing2 (talk) 02:45, 9 January 2026 (UTC)Reply
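
To make the two ideas above concrete (changing the display only for ambiguous family names, plus an optional cap=1 behaviour), here is a rough Python sketch. It is not the actual module code, which is written in Lua. The function name `family_display`, the set `AMBIGUOUS_FAMILIES` and the output wording "the Foo family of languages" are assumptions for illustration only, since the wording has not been decided.

```python
# Hypothetical sketch, not Wiktionary's real implementation.
# Family names that coincide with a language name get a disambiguating
# display form; all other family names are displayed unchanged.
AMBIGUOUS_FAMILIES = {"Arabic", "Swahili"}  # the two examples named in the discussion


def family_display(name: str, cap: bool = False) -> str:
    """Return the display text for a family name.

    `cap` mimics the proposed cap=1 parameter: it upper-cases the first
    letter of the output so the text can start a sentence.
    """
    if name in AMBIGUOUS_FAMILIES:
        text = f"the {name} family of languages"  # assumed wording
    else:
        text = name
    if cap:
        text = text[:1].upper() + text[1:]
    return text
```

Under these assumptions, `family_display("Arabic", cap=True)` gives "The Arabic family of languages", while `family_display("Indo-European")` stays "Indo-European", so existing wording such as "from a {{der|ar|zhx}} language" would be unaffected for unambiguous families.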

Since Terengganu Malay (poz-ter) and Sarawak Malay (poz-sml) are already available, I would like to request for adding code for Pahang Malay and Perak Malay. Here's my suggestion:

Pahang Malay - poz-pah
Perak Malay - poz-per

Thanks in advance. Mirlim (talk) 17:27, 10 December 2025 (UTC)Reply

We have a noticeably flat structure for the ca. fifty Cushitic languages so far. I have been developing a research interest in this field for the last few years, and I think we could use several updates to proto-language and subgroup treatment. For now I will avoid trying to cite every point in detail, but feel free to ask for any that seem contentious. I imagine some points here should go over without trouble, while others are just meant to open a discussion.

For review, our current protolanguages:

Proto-Cushitic, cus-pro: a dozen-ish entries and much more in Appendix:Proto-Cushitic reconstructions.
- Suggestion: Keep, but establish inclusion criteria for reconstruction entries.
There are not many primary sources covering all of Cushitic. Our current main source {{R:cus:Ehret 1987}} should not be presumed reliable if an etymology is not found in any other reference. Other sources do exist, such as Dolgopolsky 1973, Lamberti 1986 and Bender 2020, though none of these is reliable by itself either. Overlap between these has to date not been systematically reviewed by any Afroasiaticist or Cushiticist to my knowledge, but such overlap would probably be generally enough to establish reliability, as would appearance in any work on sub-branch etymology. Perhaps even appearance in general Afroasiatic etymology works would do (when not from the same author, since both Ehret and Dolgopolsky are also known for proposing their own PAA reconstructions).

Some researchers doubt the general validity of Cushitic, including me, but that's neither here nor there as long as the discussion is ongoing. Any work we do on this could be later on just moved to Proto-Afroasiatic entries if the maximal skeptics turn out to be right and there was no Proto-Cushitic; while if it turns out some rump sense remains valid, then that would mainly mean rewriting some reflexes as being likely loanwords.
- - Oppose. Since lower-branch reconstructions are mostly still in their preliminary phases and as you mention all of the main sources for Proto-Cushitic are unreliable, it is very difficult to come up with a single overarching reconstructed model to base our entries on so that they are coherent. While we promote original research, I'm afraid a reconstruction of Proto-Cushitic basically from nothing is a bit outside of the scope of this project. Thadh (talk) 07:37, 15 December 2025 (UTC)Reply
I find from review that all the individual models have a large amount of agreement in terms of sound correspondences per se. Most disagreement is in finer details of reconstruction, and on all the primary sources having their own contentious sets of marginal etymologies (that would allegedly demonstrate some of the former details). But the about 30–50% of the material that is of better quality has agreement on it, and it seems to me this agreement is enough to claim also reliability for the existence of an etymon. The hairier questions would be on how would we reconstruct cases where people agree that it's an etymon but not on what to reconstruct for it, but they are not all that common. (Mainly this just applies to the gaggle of proposed front emphatics like *tʼ *tsʼ *čʼ *tɬʼ *ɗ *ʄ.) --Tropylium (talk) 17:39, 15 December 2025 (UTC)Reply

~~Proto-Somaloid, cus-som-pro: a half a dozen entries, many more could be sourced from literature.~~
- Suggestion: Keep, no major updates needed, though I might quibble about some entry formatting details eventually.

~~Proto-Highland East Cushitic, cus-hec-pro: a handful of entries, many more could be added.~~
- Suggestion: Keep, but will require entry guidelines eventually. The major source Hudson 1989 lacks basic groundwork in comparative phonology and probably includes many arealisms (he is happy to propose even obviously anachronistic reconstructions like 'rifle' or '(chili) pepper').
  - Support (also as creator), but I think the major question is when Proto-HEC was spoken. The languages seem similar to such an extent that a proto-language within the last five hundred years does not seem unthinkable, so it's a good question which terms are actually anachronistic and which are simply due to the young age of the branch. In any case, the overall correspondences identified by Hudson seem fine as a starting point. Thadh (talk) 07:37, 15 December 2025 (UTC)Reply
    On the contrary I find Proto-HEC at 500 BP to be clearly absurd. Lexicostatistic coherence is in the 50–60% range, which is suggestive of millennia at least; and since these languages remain a close areal, much of the current similarity could be late convergence. Blažek has a glottochronological spitball date of 3300 BP (but maybe overblown depending on what's the deal with Burji, which clearly has at least a lot of convergence to Oromo). --Tropylium (talk) 17:39, 15 December 2025 (UTC)Reply
    Okay, reviewing Hudson again it seems I did misremember how similar the languages were. I would definitely disagree with 3300 BP however. Thadh (talk) 17:58, 15 December 2025 (UTC)Reply
    I don't believe Wiktionary needs any official position about these sort of things, but my rough personal impression is that if we cleaned off all the obviously numerous Oromo loanwords and other such secondary reasons for divergence, it has about the same or slightly lower effective diversity as e.g. West Germanic or North Finnic (or, I guess, Ethiopian Semitic…) and suggests an age in a 2000–1000 BP bracket. --Tropylium (talk) 18:34, 15 December 2025 (UTC)Reply
Striking as kept, then. - -sche (discuss) 22:28, 5 February 2026 (UTC)Reply

Proto-South Cushitic, cus-sou-pro: a handful of reconstructions that will require cleanup.
- Suggestion: Rename to Proto-Rift, and, perhaps, also rename our South Cushitic, cus-sou to Rift.
The six languages we have currently under this group are conventionally considered to form the "Rift" languages, while "South Cushitic" is strictly speaking the hypothesis (Greenberg, Fleming, Ehret) combining Rift, Ma'a and Dahalo — of which the 2nd is now generally agreed to be Bantu with copious Cushitic loanwords, and the 3rd has no consensus of if the Eastern-like or Rift-like layer of lexicon is the native one (accordingly we currently have it as an independent 5th branch of Cushitic). Several other substrates to Bantu, some with independent names like "Taita" or "Tale", have been also assigned to South Cushitic however, and maybe "South Cushitic" should be maintained as a synonym.

The only work on this maximal Proto-South Cushitic is {{R:cus:Ehret 1980}} which is widely considered to be of poor quality. Ehret however reconstructs Proto-SC and Proto-Rift proper as separate nodes. The latter has tentative agreement in literature on this being at least a real subgroup, has some work by others too, and to my impression even Ehret's work on it is less likely to contain obvious nonsense. See below for the further West Rift subgroup.
- - Support renaming to Rift and re-purposing the proto-language, although I don't know which standard works on Proto-Rift there are, so perhaps it's a good idea to create some kind of overview what we'll base our local model on. Thadh (talk) 07:37, 15 December 2025 (UTC)Reply
    There is no "standard Proto-Rift" really, it's at the stage of some number of Proto-West Rift reconstructions having also obvious Aasax and Kw'adza cognates, often simply identical or only with trivial innovations. We'd need hands-on case review of Ehret to decide how much of his work we want to take at face value when something more complicated happens. --Tropylium (talk) 17:39, 15 December 2025 (UTC)Reply
Elsewhere on this page, there is a section concerned with standardizing the names of proto-languages and their families, Wiktionary:Language treatment requests#harmonizing families and proto-languages, and other proto-language warnings, so I would be reluctant to introduce a difference in the naming of the proto-language (Rift) vs the family (South Cushitic) unless it is truly most common in literature to name the family and its proto-language two different things. AFAICT from Ngrams and Google Scholar, the two forms are not that far apart in commonness (but South Cushitic seems more common). No objection to renaming both the proto-language and the family together, if that's what you want to do, and/or to narrowing the scope of our category if modern literature tends to think the valid group is smaller than our current category. - -sche (discuss) 22:28, 5 February 2026 (UTC)Reply

Well reconstructed protolanguages that need adding:

Proto-Agaw (Appleyard 2006), cf. Appendix:Proto-Agaw reconstructions and Category:Central Cushitic languages.
- Suggestion: Add cus-cen-pro, primary name Proto-Agaw, synonym Proto-Central Cushitic.
There has been not much work on this from anyone other than Appleyard, but this work is sound and well-received. Our coverage of the Agaw languages is so low that it may be long before anyone creates actual Proto-Agaw entries, but a code would at least help in formatting some of the clear general Proto-Cushitic entries like *ʔil- (“eye”), or for that matter Ethiopian Semitic etymologies (where there are many Agaw loanwords).

Proto-West Rift, cf. {{R:KiesslingMous 2003}}.
- Suggestion: Add cus-wrf-pro (or -wri-?) and also assign Iraqw, Gorowa, Alagwa and Burunge to a new family West Rift, cus-wrf. (I do not recommend anything like "Southwest", no one calls it that.)
Another well-received proto-language, for the shallow and evident grouping of I G A B. There is here too only one main monograph source so far, but there are slightly more people actively working on these languages from various angles.

A complementary "East Rift" for the other two Rift languages, the extinct Aasax aas + Kw'adza wka, has some reconstruction work sketched by Ehret, but is not well agreed to be even valid and is better kept out for now.

Other well agreed on subgroups that may need adding:

Arboroid / Galaboid / West Omo–Tana: comprises Arbore, El Molo and Daasanach.
- Suggestion: Add, primary name Arboroid, family code perhaps cus-arb, synonyms as mentioned.
"Galaboid" appears in a decent amount of literature but is based on an obsolete exonym of Daasanach, while "West Omo-Tana" seems to me confused about the point that "Omo–Tana" is meant to imply Arboroid as an "Omo" group vs. Somaloid as a "Tana" group.

No work seems to exist on reconstructing Proto-Arboroid in actual detail, and no protolanguage should be created at this time.
Dullay / Werizoid / Qawko: comprises Tsamai / Tsamakko, Gawwada and several small varieties we do not currently cover (at least Dobase, Gollango and Harso have decent documentation).
- Suggestion: Add, primary name Dullay, family code perhaps cus-dul, synonyms as mentioned.
No work targeted on Proto-Dullay seems to exist, though literature on wider East Cushitic sometimes notes common Dullay or common East Dullay (= non-Tsamakko) reconstructions. Probably no current need for a protolanguage entry. We may need at some point also some thinking on how to deal with East Dullay, "Gawwada" is only one of its dialects although they're all claimed to be broadly mutually intelligible. "Ale" meaning "Upland(er)" has been proposed in some recent papers as a cover term. For now we can watch how things develop.
Saho + Afar.
- Suggestion: Add. Everyone agrees these are closely related, maybe even to the point of a dialect continuum (and they also seem to be noticeably distant from everything else in East Cushitic). But should this be Afar-Saho with the larger member first, or Saho-Afar which seems for some reason preferred in tertiary literature (Wikipedia, Glottolog, Encyclopedia Britannica, etc.)? Family code should depend on this as well.
Not much work on reconstruction seems to exist yet, and currently probably no protolanguage needs to be created.
Oromoid: comprises Oromo, Khonso / Konso, Bussa, Dirasha and some small varieties we don't currently cover.
- Suggestion: Add?. Family code perhaps cus-orm? A possible synonym is Oromo-Konsoid, but this may be unnecessary.
The non-Oromo languages are also usually placed together in a Konsoid subgroup, but it does not seem entirely clear if this is a subgroup or an areal, and adding this seems unnecessary as long as we have little to no coverage of any of these. Proto-Oromoid also seems unlikely to be currently necessary, though at least a PhD thesis Black 1975 has a phonological sketch and several lexical reconstructions.

At least one paper exists that is explicitly skeptical of this grouping, and the grouping does not seem highly obvious. Out of the four groups I list here, we could easily do without this one, although it is regardless almost consensus in overview works.

In this area, the southernmost Orma–Waata 'dialect' group of Oromo is also quite divergent and could be eventually entered as a language or two of its own, though it seems no linguistics work has been done on these in decades.
- - Oppose for now - I think we should first figure out how we want to split Oromo in the first place (or if we want to have a single Oromoid macrolanguage) before we decide on higher grouping - an 'Oromoid' family with only one member is not what we want but might be what we get. Thadh (talk) 07:37, 15 December 2025 (UTC)Reply

This lastly brings me to the question of general East Cushitic. There has been much partial work since the 70s on reconstructing this cluster from various angles (and more is currently ongoing). Much of this has been actually attributed to large subgroups like "Lowland East" (all except Highland and Dullay) or "Omo–Tana" (Arboroid + Somaloid), but more on the basis of data being used than due to these being thought to be clearly valid (this is sometimes stated explicitly). At least one proto-language is clearly needed here. My suggestion is to not get hung up on the still labile high-order subgrouping of the up to ca. ten known basic daughter groups (Highland, Saho-Afar, Oromo, Konsoid, Arboroid, Somaloid, Dullay; Yaaku, Boon, ?Dahalo) and to just make a general Proto-East-Cushitic cus-eas-pro, no new subgroupings, which could serve for all the various reconstructions (to which it is often easy to add more data from subgroup reconstructions or from new documentation of individual languages). Entry guidelines will require discussion, but that will be a different topic elsewhere if we can first agree on adding the protolanguage per se. --Tropylium (talk) 21:44, 14 December 2025 (UTC)

Oppose a Proto-East Cushitic for mainly the same reasons as Proto-Cushitic - if there is no way to determine what languages belong to this branch, and no way to devise a good phonological model as a result, what are we even doing? We have this problem elsewhere (eg Proto-Afroasiatic) but I don't think it's a good idea to propagate it further. Thadh (talk) 07:37, 15 December 2025 (UTC)Reply
No, that was what I was saying: "East Cushitic" is the level where there is rough agreement on what languages belong in it (the one major exception is Dahalo). It's the lower intermediate levels where this is more unclear. Phonology too has a long-established core, the standard reference being Sasse 1979. Here too there are phonetic reconstruction questions but we would not ban, say, Proto-Indo-European just because there isn't consensus on what *h₂ or *h₃ were phonetically. --Tropylium (talk) 17:39, 15 December 2025 (UTC)Reply

If you will, if you think "there is no way to determine what languages belong to this branch", why is it that we have such a branch anyway then? (To be clear I would oppose any dismantling because it's been entirely stable in literature that the languages we have there currently are too in an East Cushitic group.) --Tropylium (talk) 17:43, 15 December 2025 (UTC)Reply
@Tropylium: Our classification has no real effect on what will appear on other pages - the only effect is the fact that borrowings could be called 'borrowed from an East Cushitic language', which is indifferent to whether Dahalo is part of the branch or not. A reconstructed proto-language, however, is wholly dependent on whether Dahalo is included or not, since any correspondences have to account for all languages, not just a majority (and if a correspondence does not account for Dahalo, while it is included in the descendants, then we should not use that model). I just really don't want to move our reconstructions fifty times just because we weren't sure how to reconstruct the proto-language and on which languages to base that reconstruction. Thadh (talk) 17:53, 15 December 2025 (UTC)
There are zero problems of this kind actually resulting from Dahalo, though: what material appears there is known to be derivable from the same reconstruction as everything else. (I personally find this to be an argument in favor of them being probably loans, because the language is in overall appearance plenty divergent.) Plus to reiterate, we currently have Dahalo as an independent unplaced Cushitic language; I do not propose or support moving it into East Cushitic (nor into a South Cushitic). --Tropylium (talk) 18:26, 15 December 2025 (UTC)

I added Proto-Dangari as inc-dng-pro per {{R:inc-dng:Liljegren}}. This is a subfamily under Shina which has been reconstructed to some degree. In general, we need to figure out the subclassing of "Dardic", and this is a step towards doing so where we have a source available. @Victar —Aryaman^A ^{(मुझसे बात करें • योगदान)} 03:14, 17 December 2025 (UTC)

Discussion moved from Wiktionary:Grease_pit/2026/January#Classification_of_Gagauz.

According to "Ildiko Beller-Hann The Oghuz split: the emergence of Turc Ajämi as a written idiom // Materialia Turcica", it is generally accepted that the Gagauz language is a special development of Ottoman. According to Ludmila Alexandrovna Pokrovskaya, a linguist who studied Gagauz language, Balkan period/Old Gagauz emerged in 17th century (see: Языки мира. Тюркские языки, 1997. С. 225./Languages of the world. Turkic Languages, 1997. p. 225.). Dobruja, where Gagauz people lived, was under Ottoman control from 1420s to the 19th century. Thus, Ottoman Turkish should be classified as the ancestor of Turkish and Gagauz languages (Wiktionary classifies Old Anatolian Turkish as the direct ancestor of the Gagauz language). 고위얗믖후 (talk) 08:40, 13 January 2026 (UTC)Reply

There might be some benefit to adding a label and etymology-only code for early Ottoman Turkish, the language of the early modern Ottoman Empire. I'm not sure. I only look at late Ottoman texts. Vox Sciurorum (talk) 11:00, 14 January 2026 (UTC)Reply

Linguists usually refer to a variety/dialect of Old Anatolian Turkish as Early Ottoman. It would be nice to add a label and etymology-only code for something like Middle Ottoman Turkish (which was an actual independent language). By the way, yeah, I need it for the etymologies of the words that are not attested in Old Anatolian Turkish, but are present in Ottoman/Turkish and Gagauz languages, not just for the sake of correct classification, although that would be nice too. For example, see Gagauz "kaybetmää", which is partly Arabic-derived and is so widespread that it is used in idioms/expressions, it doesn't look like the word is borrowed from Turkish. 고위얗믖후 (talk) 13:10, 14 January 2026 (UTC)Reply

Inviting "ağalar" to share their views too: @Benwing2 @Surjection @AmaçsızBirKişi @Rttle1 @Ardahan Karabağ @Bartanaqa @Allahverdi Verdizade on a flying visit @Yerkishisi @Lingnerdy @Əkrəm @BurakD53 고위얗믖후 (talk) 15:34, 14 January 2026 (UTC)

Gagauz should be listed under Ottoman Turkish. Bartanaqa (talk) 15:39, 14 January 2026 (UTC)Reply

Gagauz should be classified under Old Anatolian Turkish and not under Ottoman Turkish. Gagauz split off in the 13th century, together with the language of Kaykaus, before Ottoman Turkish had fully emerged. Historically, it may have been influenced by Ottoman Turkish due to being under Ottoman rule; however, it retains archaic features. These include both the presence of only a limited number of long vowels and the preservation of lexical items inherited from Old Anatolian Turkish but not used in Ottoman Turkish, such as taaramaa. Therefore, although it was influenced by the variety of Ottoman Turkish spoken in the Balkans and also exerted influence on it, I think it should be considered separate from Ottoman Turkish. – BurakD53 (talk) 16:02, 14 January 2026 (UTC)Reply

"Keykays" version is just a hypothesis, I think we should use serious works of linguists. I would cite Oleg Mudrak as an example, who thinks that Gagauz is almost a dialect of Turkish, but his works are not that good compared to the works of Beller-Hann and of Pokrovskaya. 고위얗믖후 (talk) 16:12, 14 January 2026 (UTC)Reply

It is a good one, a good hypothesis.

In Gagauz, archaic vowel length is particularly evident in environments involving the consonant r. However, in some cases lengthening may also occur purely due to stress. For example, geeri “back side”, beeri “near side”, geerimää "to clip" are lengthened entirely as a result of stress, whereas words such as aaramaa, aaraştırmaa “to search”, koorumaa “to protect”, aarı “bee”, taaramaa “to win”, yaanmaa “to threaten”, and iiri “large” exhibit vowel length that is inherited from Proto-Oghuz. That said, this does not occur in all cases: a vowel must follow the consonant r for length to surface. For instance, although vowel length is present in Turkmen in words such as armaa “to get tired” and argın, it is not found in their Gagauz cognates. On the other hand, it should be noted that length is preserved in kaar “snow” and that in some usages vowel length is preserved in words such as aaç “hungry”, aad “name”, and aaz “few”.
However, vowel length is not always original: the loss of the phoneme g may also cause compensatory lengthening of the preceding vowel, as in aarımaa “to ache”, daatmaa “to scatter”, and iiri “curve” in Gagauz. In this respect, Gagauz shares the same feature with the dialects of Balkan Turkish and the generally accepted pronunciation of Istanbul Turkish. In fact, even though it is often regarded as incorrect, yarın is pronounced with a long vowel in Turkish dialects, as it is in Gagauz. In neither Anatolian nor Balkan dialects is vowel length in native Turkic words directly preserved in the way it is in Arabic and Persian loanwords (with a doublet exception: daahi "even", doublet of daha). So Gagauz is distinctive in its vowel lengths.

– BurakD53 (talk) 16:41, 14 January 2026 (UTC)Reply

Another one is that their name means "Gök Oğuz", which is supported by Oleg Mudrak too, but as I said his works are not good enough. Yet, there is a version that it means "Gagan uz (olsun)", which is a folk etymology supported by Gagauz Orthodox "Ultras" since the 19th century because of the emergence of Bulgarian and Greek nationalism. 고위얗믖후 (talk) 16:50, 14 January 2026 (UTC)

This folk etymology is not acceptable and lacks any scientific basis. There is no word in Gagauz that supports such a phonological change: Gök Oğuz cannot become Gagauz; at most, it could be pronounced as Gökooz/Göyooz or in similar forms. – BurakD53 (talk) 16:55, 14 January 2026 (UTC)Reply

If you place it under Ottoman Turkish, I won’t object; my strongest basis for considering it Old Anatolian Turkish is the story of Keykavus. Length and archaic elements are only there to support that justification. – BurakD53 (talk) 22:03, 14 January 2026 (UTC)Reply

Kinda late, but I think Gagauz is better represented under OAT rather than OT. (I once heard an opinion that the Gagauz are descended from the Cumans. I can't say how historically accurate that is, but if we consider that the Gagauz settled in the Moldavia (Boğdan) region before the Ottoman period, it is normal for them to show OAT features.)

AmaçsızBirKişi (talk) 20:53, 3 February 2026 (UTC)Reply

Also, courtesy link to @Yorınçga573, maybe he is also interested in this.

AmaçsızBirKişi (talk) 20:56, 3 February 2026 (UTC)Reply

Of course Gagauz should be a descendant of Ottoman, lol. There are dialects of Azerbaijani that also (partially) preserve historical length. Big deal. Are you actually Korean? Allahverdi Verdizade on a flying visit (talk) 20:44, 14 January 2026 (UTC)

Discussion moved from Wiktionary:Grease pit/2026/January#request: list of ISO codes we don't use and haven't discussed at WT:LT.

Most ISO-coded languages, we include. Most ISO codes that we don't include have been discussed at WT:LT. But over the years, some additions of codes by the ISO have escaped notice here. My thinking for how to find them is: can someone

take a list of all valid ISO language codes, e.g. from here,
remove the codes that are also valid on Wiktionary (whether as full or etymology-only languages) (accounting for cases where a 639-1 code exists so we use it instead of the 639-3 code),
remove the codes which are not valid here but are mentioned on WT:LT, on the theory that we know about these and our exclusion of them is intentional,
and be left with a list of valid ISO codes which we lack and have not discussed on WT:LT, which are likely to be true oversights that we need to discuss, like rrm

? - -sche (discuss) 06:59, 14 January 2026 (UTC)Reply
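
The filtering steps described above amount to a set difference with one wrinkle (the 639-1 substitution). A minimal Python sketch follows, assuming the inputs are already available as plain collections. The function name `undiscussed_codes` and all four parameter names are hypothetical, and how the code lists are actually obtained (the ISO tables, Wiktionary's language data, the text of WT:LT) is left out.

```python
def undiscussed_codes(iso_codes, wiktionary_codes, discussed_codes, iso3_to_iso1):
    """Return ISO 639-3 codes that are neither used on Wiktionary nor
    mentioned on WT:LT, i.e. likely oversights.

    iso_codes        -- iterable of all valid ISO 639-3 codes
    wiktionary_codes -- set of codes valid on Wiktionary (full or etymology-only)
    discussed_codes  -- set of codes mentioned on WT:LT
    iso3_to_iso1     -- dict mapping a 639-3 code to its 639-1 code, where one exists
    """
    remaining = set()
    for code in iso_codes:
        # Where a 639-1 code exists, Wiktionary uses it instead of the 639-3 code.
        local_code = iso3_to_iso1.get(code, code)
        if code in wiktionary_codes or local_code in wiktionary_codes:
            continue  # already included here
        if code in discussed_codes:
            continue  # exclusion is presumably intentional
        remaining.add(code)
    return sorted(remaining)
```

With toy inputs (the code "xyz" is a made-up placeholder), `undiscussed_codes({"eng", "deu", "rrm", "xyz"}, {"en", "de"}, {"xyz"}, {"eng": "en", "deu": "de"})` leaves only "rrm", the kind of oversight mentioned above.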

Here, BTW, is the counterpart list of 74 codes the ISO lists as deprecated but which we still use. I will go through that later to see in which cases we are intentionally deviating from ISO and in which cases we simply have failed to notice and need to discuss the ISO code's removal. - -sche (discuss) 07:03, 14 January 2026 (UTC)Reply

@-sche Here you go:

Good luck making any use of this lengthy list! This, that and the other (talk) 09:24, 14 January 2026 (UTC)Reply

Thanks! I've set about checking which of these 144 have been discussed — it turns out all but two of the As were discussed and intentionally excluded here, just not recorded on WT:LT — and which need to be brought up for discussion. - -sche (discuss) 17:43, 14 January 2026 (UTC)Reply

Some of these were already discussed and intentionally excluded, just not recorded on WT:LT, and others have now been brought up for discussion; I've been going through a copy of this list in my userspace and have so far brought it down to 116 that are still to be accounted for. (Some others are the result of SIL recently splitting languages.) - -sche (discuss) 04:49, 19 January 2026 (UTC)Reply

@-sche When you have a chance, can you strike through the ones that are dealt with and add any notes to the others that you deem fit? E.g. many of the sign languages are being discussed elsewhere; you could note that. Benwing2 (talk) 03:58, 5 February 2026 (UTC)Reply

I'm going through these in userspace and bringing them here as I look into them; of the 144 codes above, the ~88 I've yet to raise for discussion are these. Anyone with knowledge of particular languages who wants to go ahead and create a discussion about them, please feel free, please beat me to it! I'm getting down to the more questionable or obscure ones. - -sche (discuss) 05:51, 5 February 2026 (UTC)Reply

Currently we have an L2 language "Low German", as well as two daughter L2 languages "Dutch Low Saxon" and "German Low German". "German Low German" in turn has an L2 daughter language "Plautdietsch" (which is also known as "Mennonite Low German"). Under Dutch Low Saxon are a whole series of etym languages (Achterhoeks, Drents, Gronings, Sallands, Stellingwerfs, Twents and Veluws), none of which have "Low German" or "Low Saxon" in their name, contrary to normal practice. Under German Low German are three etym languages, East Frisian Low German, Low Prussian and Westphalian, only one of which has "Low German" in its name. On top of this, we have a separate category Category:Westphalian German, which supposedly consists of High German terms as used in Westphalia; and we also have a separate category Category:Prussian German, which is identified with Wikipedia's High Prussian dialect, which is described as "a group of East Central German dialects in former East Prussia", which means if made into an etym language it might properly be an etym language of the East Central German language (which is a separate L2 on Wiktionary); but it might be, as advertised, standard High German terms as used by speakers who formerly lived in East Prussia, or a mixture of standard High German terms and East Central German terms. And then of course we have the Category:Old Prussian language, which is Baltic and not German at all, but could easily be confused with a not-further-qualified "Low Prussian". And to top off the confusion, there is no category Category:Low Prussian to match the etym language, but there does exist a category Category:Low Prussian Low German, which appears to be intended to be the same as unqualified "Low Prussian".

I propose the following:

Somehow, all the terms identified under the overarching "Low German" L2 language need to be moved into either Dutch Low Saxon or German Low German. I don't know enough about these lects to do it myself, but I can hazard a guess that most of them are Dutch Low Saxon, since that language is more viable currently than German Low German.
Rename all the etym languages under "Dutch Low Saxon" to have "Low Saxon" in their name. Hence Achterhoeks -> Achterhoeks Low Saxon, Gronings -> Gronings Low Saxon, etc.
Rename the unqualified etym languages under "German Low German" to have "Low German" in their name. Low Prussian -> Low Prussian Low German (despite the weird-sounding name) is a no-brainer since the latter category already exists and the former doesn't. The only other one is Westphalian, where "Westphalian Low German" will at least be less confusing given the existence of "Westphalian German" as well (which needs to be investigated further).
Consider renaming "Plautdietsch" to "Mennonite Low German", depending on which term is more common in English (endonyms be damned; there is no rule that says endonyms must be preferred if they are not the most common English term for the lect).
Maybe consider merging "Plautdietsch" and "German Low German". I don't know enough about the situation here to comment on whether this makes sense, but the nested hierarchy of L2 languages needs rethinking for sure. From what I gather, Plautdietsch/Mennonite Low German is actually more conservative than typical German Low German dialects, having been separated for 200-300 years (?) and not subject to the simplification that comes with dying languages (hence the opposite of the Dutch vs. Afrikaans situation; but since German Low German is not standardized, it seems it could easily function as an umbrella language that includes Plautdietsch/Mennonite Low German).

Pinging @-sche (heeeeeeelllllllppppp!!!!!!!) and @Thadh who I suspect may have some insight here. I'm not sure who else to ping; someone or several someones have entered all these etym varieties and the terms under them, but I don't know who they are. Benwing2 (talk) 00:37, 16 January 2026 (UTC)
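
The two renaming steps in the proposal above follow one simple rule: append the parent's qualifier unless the name already ends with it. A small Python sketch of that rule, with the helper name `qualified_name` being hypothetical and the actual renames of course needing to be made in the language data modules by hand or by bot:

```python
def qualified_name(name: str, qualifier: str) -> str:
    """Append `qualifier` to an etymology-language name unless it is
    already present at the end of the name."""
    if name.endswith(qualifier):
        return name
    return f"{name} {qualifier}"


# Names taken from the proposal above.
dutch_low_saxon = ["Achterhoeks", "Drents", "Gronings", "Sallands",
                   "Stellingwerfs", "Twents", "Veluws"]
german_low_german = ["East Frisian Low German", "Low Prussian", "Westphalian"]

renames = {n: qualified_name(n, "Low Saxon") for n in dutch_low_saxon}
renames.update({n: qualified_name(n, "Low German") for n in german_low_german})
```

Here "Low Prussian" becomes "Low Prussian Low German" and "East Frisian Low German" is left unchanged, matching the proposal.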

There have been a lot of prior discussions of this over the years (partial list). To my understanding, there are two reasonably well-grounded, defensible approaches.
(1) One is to split Dutch Low Saxon, German Low German, and Plautdietsch on the basis that existing in different nations in contact with different other languages (Dutch, German, English) has led them to develop different ... if not 'standards', at least 'norms' for spelling, and different loanwords, etc.
(2) The other is to group Dutch Low Saxon and German Low German as Low German. (Plautdietsch may still merit being separate, as it has been separated from the DLS/GLG/LG speech communities in Europe for some time, although some users have argued for merging it too.)
Unfortunately, because this is such a thorny tangle and there is no one objectively right approach, it has never been sorted out. I was not initially in favour of it, but over time I have come to think that the second approach, merging DLS and GLG into LG, as User:Korn [kʰũːɘ̃n] was perhaps the most recently-active advocate of, may be the most linguistically defensible. Either way there is the question, do we centralize content, or does every dialect's form host definitions in its own entry? The latter is a nightmare to maintain if the definitions are the same, but occasionally they differ. It was at one time argued (in favour of the latter approach) that dialects differ in not only spelling but pronunciation and sometimes grammar; of course, we could handle the pronunciation differences in one centralized entry's ===Pronunciation=== section, and grammatical/inflectional differences by notes in inflection tables, or if necessary having multiple inflection tables. (And to solve the issue of where to centralize content, IIRC Korn had suggested lemmatizing MLG spellings whenever possible.)
"Prussian German" (as used here) is de with Prussian regionalisms, some of which derive from Low German. If you want to be really confused, know that when it was a more living language, one of its dialects was Breslausch. (Breslauisch, in Breslau, OTOH, was a variety of Silesian German.) IME the terms in Category:Low German lemmas are often GLG, BTW, probably as a product of which language community's speakers were/are active here. - -sche (discuss) 01:55, 16 January 2026 (UTC)Reply

@-sche Ugh. Thanks. In terms of whether to centralize content, I am strongly in favor of doing so, and it does seem to be the norm at Wiktionary, e.g. in Occitan, Romansh and Franco-Provençal, among others, with soft redirects at the other spellings (pace the current mess at Serbo-Croatian, which I will try to straighten out at some point using an {{sh-tcl}}-type approach). I will defer to you concerning whether to merge Dutch Low Saxon, German Low German and/or Plautdietsch. In the meantime, what do you think of my proposals (2) and (3) above concerning renaming the etym varieties to include "Low Saxon" (if they're on the Dutch side) or "Low German" (if they're on the German side) in their names? This seems doable now and should reduce some of the confusion. Benwing2 (talk) 02:07, 16 January 2026 (UTC)Reply

The Dutch Low Saxon subvarieties should not have 'Low Saxon' in them: the names given currently are the ones used pretty much everywhere and by anyone, and nobody would ever call Gronings/Groenings 'Gronings Low Saxon', that would be akin to saying 'Russian Slavic' or 'Italian Romance'.

I'm not familiar enough with the languages to say anything useful on how to split them, though. Thadh (talk) 05:39, 16 January 2026 (UTC)

I made a proposal several years ago, but it didn't go anywhere: Wiktionary:Beer_parlour/2020/August#Low_German_revisited --{{victar|talk}} 01:41, 17 January 2026 (UTC)

I read your proposal and for the most part it seems sensible to me. Certainly, eliminating nds as a language and making it a family code is the right thing to do. I can't speak to the split between East and West Low German but in general it seems the right thing to do to split based on linguistic and not political boundaries. You got a strong negative response from @Korn, but I think their biggest issue was not the particular split you chose but the fact that you split Low German at all. Could there be a situation with Low German similar to that with Levantine Arabic? The fact that there's a dialect continuum from Northwest Turkey all the way down to western Jordan with no clear boundaries is what prompted the merger of North and South Levantine Arabic, because it was eventually acknowledged that although people in Hatay (in Turkey) might have difficulty understanding people in Aqaba (in southwest Jordan), generally there is broad comprehensibility across much of the region and no intelligibility boundary in the middle where the North/South boundary was formerly drawn. Benwing2 (talk) 07:54, 17 January 2026 (UTC)Reply

I stand by everything I said back then, and I will summarise again curtly: LG is a language without a standardised spelling and otherwise a smooth dialectal continuum. (With the notable exception being the Westphalian region, which stands out a tad more.) The only major difference between Dutch and German dialects is that the people in the Netherlands use a more traditional spelling (Dutch influence) while Germans tend to ape the random spelling of standard High German cognates. I don't think distributing language codes on the basis of orthographic habits is conducive to the purposes of a dictionary, especially if thereafter each language code still covers 2-5 different orthographic standards.

And since Benwing2 has misread my prior posts, I will be impolitely blunt: Any split between "West Low German" and "East Low German" as proposed before is a delusion, invented by confused people who don't know what they're talking about. It doesn't exist. Literally every example and feature which I have ever read to be supposedly "west" or "east" of some imagined border either exists on both sides or only in one specific dialect instead of the entire region. Korn [kʰũæ̃n] (talk) 11:13, 17 January 2026 (UTC)

I am fine with a merger of the current L2 concepts of Dutch Low German, German Low German, and Plautdietsch which should be labelled Mennonite Low German, which can well be handled by labels, exactly because I don't like to create delusional realities. For distinction, it is more important that entries will (practically) require heavy sourcing anyway, even if only an editor mentions where he heard something; we won't meet anyone creating Mennonite Low German without that given its bearers are 85+-years-olds in secluded sectarian communities, the rest Sovietized, Americanized, Prussianized, from the second half of the 20th century.

Then, as Korn said, I am pretty sure Westphalian is more divergent than these three which were created with less insight by Americans from American databases, surely weighing ethnofactors more heavily than linguistic material, I mean that in a pure US horizon you have this powder-keg of idpol of national and racial communities that is essentialized even in acknowledgement of traces of so-called heritage and a different risk profile of the librarian is a real reason for ghost language divisions.

Analogously for Westphalian, it is challenging to find any unstilted situation of the mutual intelligibility of it being tested – theoretically soundly proven – because due to the peculiar accent it was penalized in this (until the 1950s) rural region, so nobody would actually complain about it being comprehended as Low German, and anyhow I never found anyone being offended by being called German simply, so no “Dutch Low German” and probably write-only “Plautdietsch” needed or even separatist “Westphalian”. We are toning down artificial terminology that is practically of little use. Fay Freak (talk) 17:39, 17 January 2026 (UTC)Reply

Regarding whether to split our LG content between DLS and GLG, or merge our DLS and GLG content into LG, if this were a !vote with multiple options, my first choice would be to throw my hands up in despair because it's a thorny tangle and anything we do or don't do has not-insignificant downsides; my second and actionable choice would be to grit my teeth and support merging DLS and GLG, west and east, into one Low German language. (Whether to merge Plautdietsch or not I am on the fence about; I can see arguments for either approach; in its origin it is obviously Low German, but it has been separated from the other Low German speech communities for a long time. After exposure and acquaintance, it is intelligible to other Low German varieties, but it does have [as Wikipedia puts it] "several developments and sound shifts not found in any other Low German dialect"; of course, lots of Low German dialects have things that they do differently from various other Low German dialects. If people want to merge it, I don't oppose that.)
(Wiktionary is probably just not the best place to most clearly record any one Low German variety: We can record the various varieties together under one Low German header to accomplish our own goal of recording Low German words, but any variety which has the goal of recording its own words and preserving itself is probably better served by making its own separate dictionary of itself that lemmatizes its own forms, because lemmatizing some other form — as we will probably of necessity do in most cases, from the perspective of most dialects, because even T:tcling definitions into a bunch of entries instead of having them actually written out in a bunch of entries, while conceivable, still sounds like a maintenance headache — creates a situation like An Caighdeán, Rumantsch Grischun, Quechua, etc, where many speakers don't see their forms lemmatized in the reference work.) - -sche (discuss) 08:40, 18 January 2026 (UTC)Reply

Regarding the names, "Twents" vs "Twents Low Saxon" or "Twents Low German" (and this also serves as a reply re Talysh), I agree with Thadh and (re Talysh) Victar: "Twents", "Sallands" etc are the names these lects usually have (including in works that don't seem particularly focussed on Low German, e.g. [23]). Sure, if you don't know what "Twents" is, the name doesn't tell you, but if you don't know what "Koreguaje" or "Alamblak" is, those names don't tell you either. (And unlike Twents, "Koreguaje" and "Alamblak" as full languages occur in places where context can't help, like translations tables where — although we could change this — language names aren't wikilinked.) Since Twents, Sallands, Drents et al. and Asalemi, Karganrudi et al. are small, etymology-only languages, they're not mentioned in many places: Sallands occurs as a label on definitions in some entries where the language header clarifies things (and "Sallands" is wikilinked to more info), and as a qualifier in translations tables where again the overarching language is specified. It could also be mentioned in etymologies, where it would be a blue wikilink pointing to an article about it and about what language it belonged to. On the whole, I'm not convinced that there is a need to rename these. (With "Demotic", the problem as I understood it was that it commonly means two things, i.e. the problem was not that someone wouldn't have any idea of what "Demotic" meant, but that they would have too many plausible ideas of what it could be. With "Twents" or "Koreguaje", someone either knows what it is, or goes "huh, guess this is another lect I've never heard of" and either ignores it or clicks the link to learn more.) - -sche (discuss) 08:57, 18 January 2026 (UTC)Reply

OK thanks, I won't press this issue any more. Benwing2 (talk) 09:21, 18 January 2026 (UTC)

Merge Dutch Low Saxon (nds-nl) and German Low German (nds-de) into Low German (nds)

OK, not wanting to see yet another discussion of this peter out without resolution, I'm pinging anyone I can think to (and you can too!) to weigh in on the proposal Merge ==Dutch Low Saxon== nds-nl and ==German Low German== nds-de into ==Low German== nds.
(Whether to merge ==Plautdietsch== pdt is a separate question IMO, because we can only discuss whether or not to merge pdt into either nds or nds-de after first deciding which of those is wanted.)
To my understanding, in the section above, Korn, Fay Freak, and now I are inclined to merge nds-nl + nds-de → nds, while Victar does not want one nds language (though he wants it bifurcated differently than it's currently bifurcated); feel free to (re)state your positions / !votes in this subsection if you want.
Excluding users who are already in and aware of this discussion, I'm now pinging all active editors with nds, nds-de or nds-nl Babel competence or who I've seen create entries, all still-active participants of prior discussions about this, and the members of the German and Germanic workgroups: User:Phillipm0703, User:Stardsen, User:Yoursmile, User:Tlustulimu, User:Hans-Friedrich Tamke, User:Holtseti, User:Jan Tietje, User:Nyxomniac, User:RocketPlaysCPI, User:Rua, User:Mahagaja, User:Matthias Buchmeier, User:Jberkel, User:Fytcha, User:Helrasincke, User:PhoenicianLetters, User:Mnemosientje, User:The Editor's Apprentice, User:Hazarasp. - -sche (discuss) 02:53, 26 January 2026 (UTC)Reply

I would like nds-nl and nds-de to be merged, as they are basically the same language, merely divided by a national border. East Frisian (nds-de) and Gronings (nds-nl) are more closely related to each other than East Frisian and Mecklenburgian (both nds-de). It would be better to create a dialect chart similar to the Chinese or Ainu ones, using the Template:dialect synonyms. There are more than enough sources for dialectal Low German words; however, most of them are in German. This would solve the problem of “Which lemma do we use?”. Therefore, I am in favor of merging both, provided that all Low German dialects are adequately represented. Phillipm0703 (talk) 10:37, 26 January 2026 (UTC)Reply

I have no objection to merging nds-nl and nds-de into nds, although the different orthographic conventions in the two countries may cause some trouble. I am much more reluctant to include Plautdietsch in the merger, since it's been evolving separately both from the other Low German varieties and from the two standard languages for several centuries. —Mahāgaja · talk 15:33, 26 January 2026 (UTC)Reply

I support merging all the Low German dialects together under nds. I personally think Plautdietsch is also a valuable part of this picture, especially given the fate of most of the other eastern varieties, but I agree that there are ways to merge it that would do injustice to its unique development. I also agree with the suggestion from @Phillipm0703 of developing a good strategy to adequately represent dialect information. Helrasincke (talk) 16:45, 26 January 2026 (UTC)

I'm in favour of merging nds-nl and nds-de. I have no view on how Plautdietsch should be treated. —Caoimhin ceallach (talk) 13:23, 30 January 2026 (UTC)Reply

I can agree to such a merge, but I doubt we have enough editors to do this properly at this point. Thadh (talk) 13:56, 30 January 2026 (UTC)

It looks like this is similar to the Levantine Arabic situation; we have consensus on merging but not currently the editor resources to do so. For reference there are:

665 "Low German" lemmas.
860 "German Low German" lemmas.
124 "Dutch Low Saxon" lemmas.

It was mentioned above that the "Low German" lemmas tend to be German Low German rather than Dutch Low Saxon, so we have (at least) two approaches for merging:

First merge Low German with German Low German. If most of the Low German entries are GLG, this should mostly be a deduplication problem. Then merge in the Dutch Low Saxon lemmas.
Or, do it the other way around. This has the advantage that with the smaller number of Dutch Low Saxon lemmas, we can more easily merge them and then delete Dutch Low Saxon as a language, which will give the satisfaction of having made concrete progress; OTOH, the presence of Dutch Low Saxon entries in the resulting Low German lemmas may complicate the merging of the GLG entries.

I imagine that ChatGPT or a similar system may be able to help with the deduplication.

Benwing2 (talk) 04:21, 5 February 2026 (UTC)Reply

I'll chip away at what I can; I merged ~40 entries so far. - -sche (discuss) 07:46, 7 February 2026 (UTC)

I chipped away another ~40, and in the process noticed that the translation adder doesn't allow adding nds translations: now that we're merging things back into nds, the line if (txt == 'nds') return error("Please use the code nds-de for German Low German or nds-nl for Dutch Low Saxon"); should be removed. - -sche (discuss) 05:36, 8 February 2026 (UTC)

@-sche I've deleted this special case and also enabled the display of gender checkboxes for nds. This, that and the other (talk) 07:14, 8 February 2026 (UTC)

Thanks. I've knocked out another ~70 GLG/DLS terms. Maybe I'll prioritize handling the DLS terms (65 left at the moment); we could potentially bot-reheader/recode the GLG terms to Low German (except in any cases where a page with a GLG section already has a LG section: if there are any — I hope not! — those can be handled manually). There would still be some duplication when that's done, because there is already duplication — I've found several cases where people entered every alt form (or just some of them) as a full entry — but it would probably be a net positive. - -sche (discuss) 04:19, 9 February 2026 (UTC)Reply

I'm almost done editing DLS (12 entries left). What seems logical to me as a next step is to ask Benwing or someone else with a bot to change ==German Low German== entries to ==Low German==, and change their headword templates etc from nds-de to nds (which entails changing what parameters they input gender etc with!!), but first: can anyone foresee that causing any problems? (Some templates, e.g. Template:nds-de-cardinal, also need to be renamed, and eventually updated to mention lemmatized forms first.) My thinking is: changing GLG headers/codes to Low German will make it easier to later de-duplicate cases where definitions, etymology etc are repeated across different dialects' forms' entries, if we only have to go through one category looking for such cases, and don't have to update L2s while reducing definitions to {{altform}}. BTW, sometimes definitions are repeated even when each dialect's form is the same: entries just have multiple POS sections in a row repeating the definitions. I tentatively revised cases I found in this way, but a better solution may be to remove the forms from the headword line and have inflection tables for each dialect. Do we even have a Low German noun inflection table template, or does one need to be created?
Another task is to deal with translations tables and descendants lists. Ideally we'd go through those manually to replace with nds + dialect labels, updating links to reflect where entries are now lemmatized, but there are about a thousand nds-de / nds-nl translations and 2,200 {{desc}}s so we might need to consider ways of bot-replacing them with nds + labels. (I tentatively created Dutch Low Saxon and German Low German labels: ultimately we may aspire to replace them with more specific dialect labels, but for now they let nds-de and nds-nl be merged without losing information.) - -sche (discuss) 03:49, 11 February 2026 (UTC)Reply
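
The re-headering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot that was actually run: the function name `reheader` and the status strings are hypothetical, the sketch only swaps the L2 header and the nds-de code where it appears as a template argument inside that one section, and it does not attempt the gender-parameter changes or template renames mentioned above. Pages that already have a ==Low German== section are left untouched and flagged for manual handling.

```python
import re

# Hypothetical constants for this sketch; the header names and codes come from the discussion.
OLD_HEADER = "German Low German"
NEW_HEADER = "Low German"
OLD_CODE = "nds-de"
NEW_CODE = "nds"

# Matches level-2 headers only (==Name==), not ===Noun=== and deeper levels.
L2_RE = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)
# Matches the old code used as a template argument, e.g. {{head|nds-de|noun}}.
CODE_RE = re.compile(r"\|" + re.escape(OLD_CODE) + r"(?=[|}])")


def reheader(wikitext):
    """Return (text, status); status is 'unchanged', 'manual' or 'converted'."""
    matches = list(L2_RE.finditer(wikitext))
    names = [m.group(1).strip() for m in matches]
    if OLD_HEADER not in names:
        return wikitext, "unchanged"
    if NEW_HEADER in names:
        # Both sections exist on the page: leave it for a human to merge.
        return wikitext, "manual"
    out = [wikitext[:matches[0].start()]]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        if names[i] == OLD_HEADER:
            body = CODE_RE.sub("|" + NEW_CODE, wikitext[m.end():end])
            out.append("==" + NEW_HEADER + "==" + body)
        else:
            # Other languages' sections are copied through unmodified.
            out.append(wikitext[m.start():end])
    return "".join(out), "converted"
```

Restricting the code replacement to the one section keeps other languages' sections on the same page untouched, and the 'manual' status corresponds to the case, raised earlier in the thread, of a page that has both a GLG and an LG section.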

@-sche I can do the bot conversion of GLG to LG entries. I think what might make the most sense for translation tables and descendant lists is just to convert nds-de and nds-nl to etym codes and leave the translation and descendant entries as-is. That way we don't need to bot-convert the entries to labels. Benwing2 (talk) 04:28, 11 February 2026 (UTC)Reply

I converted Dutch Low Saxon to an etym variety, dealt with any errors coming up and deleted the L2-lang-specific categories. Benwing2 (talk) 01:03, 14 February 2026 (UTC)

I did the bot conversion and converted German Low German to an etym variety. Benwing2 (talk) 05:35, 14 February 2026 (UTC)

Thanks. And good point about leaving translations sections and {{desc}}s as-is in the short term. Over time, I will try to review all the Low German entries, translations, and {{desc}}s to de-duplicate them, update links (currently they don't always link to the most common forms or the forms that exist and are lemmatized: [24]), and revise the labels to make them either more specific or, if a form is in general use, more general. - -sche (discuss) 07:31, 14 February 2026 (UTC)Reply

BTW, while I created "Dutch Low Saxon" and "German Low German" labels so our existing content under those L2s could quickly be merged under one L2 without loss of information (and can now be given more specific labels at leisure), I know some people have over the years criticized the use of those as linguistic units/clades and I want to be clear I'm not wedded to them if anyone has proposals for better (for lack of a better word) taxonomies / labels. The concepts have some obvious basis but also some obvious shortcomings: it's apparent that the varieties in the Netherlands vs Germany, surrounded by Dutch vs German, have been influenced in different ways (including loanwords and, significant to how Wiktionary is structured, capitalization preferences) ... but there are also varieties spoken in other countries. So, for example, treating Pomeranian as a German variety of Low German (="German Low German") makes sense, and treating Brazilian Pomeranian as a subset of Pomeranian also makes sense, but I wonder whether or not it would also make sense to additionally categorize the Brazilian variety as one of the top-level divisions of Regional Low German alongside Dutch and German Low German (?), since it (like the others) has been pulled in a different direction by the nation it exists in and language it is surrounded by: for example, it appears that it may (like Dutch varieties) tend to uncapitalize nouns (unless this is just a quirk of the one main scholar who writes about it? more data needed!). - -sche (discuss) 18:53, 16 February 2026 (UTC)Reply

There's also the fact that the linguistic divide doesn't necessarily correspond to the national borders, e.g. Wikipedia says the Urkers dialect of Urk in Flevoland, even though it's in the Netherlands, is actually part of the Westphalian dialect group, which we have under German Low German. I haven't yet made any such changes that would move any of the DLS lects under GLG but we might consider doing so (I think @Victar's proposal may have had something similar in it). Also feel free to remove the GLG labels when you feel there's some other indication of dialect; I added them fairly promiscuously to all GLG content I merged unless it was labeled with a specific other dialect (and even then I may have messed up and added it regardless in a few cases). Benwing2 (talk) 20:19, 16 February 2026 (UTC)Reply

On a different note, I added category support for etym-only reference template categories like Category:German Low German reference templates. This kind of goes against the idea that langname categories with a prefixed langname are only for L2 langs, but it seems useful as a lot of references cover a specific dialect rather than the language as a whole, and there were quite a lot of existing categories like Category:Hazaragi reference templates that weren't in the category tree. This is perhaps another reason to rename "etym-only language" to "language variety" (or "language variant" or similar), as had been agreed upon before but which I never carried out. Benwing2 (talk) 20:23, 16 February 2026 (UTC)Reply

I think that's a good idea. There are also some Baltic [bat] references. {{victar|talk}} 20:31, 16 February 2026 (UTC)Reply

"Sama" is another ambiguous language name. Our "Sama language" is not the most well-known Sama, which refers to a cluster of Austronesian languages (the Category:Sama-Bajaw languages), three of which have "Sama" in their name (Central Sama, Southern Sama and Pangutaran Sama). Our "Sama language" is a Bantu language spoken in Angola; I would suggest renaming it to "Sama (Bantu)" except that there is another Bantu Sama language, spoken in Gabon, which we call CAT:Osamayi language and Wikipedia calls Samay language. As a result I suggest "Sama (Angola)", as Wikipedia does. Benwing2 (talk) 01:33, 17 January 2026 (UTC)Reply

This particular Chatino variety is comparatively well-studied, by people I happen to know from my time at UT Austin, and probably for this reason we have 1,072 lemmas, which is quite a lot for a minority language variety and is in fact a lot more than the sum of all the lemmas for all other Chatino varieties at Wiktionary. Yet the classification is all messed up. The page for the lect, CAT:San Juan Quiahije Chatino language, claims it's a sign language, apparently confusing it with the unrelated San Juan Quiahije Chatino Sign Language (which Wikipedia simply calls Chatino Sign Language despite the fact that there are likely other Chatino sign languages; probably because it's the only one that's been studied). Furthermore, per Glottolog and ISO 639-3 it's not a separate language, but a variety of the CAT:Western Highland Chatino language; and per Wikipedia, Western Highland Chatino itself is merely a dialect of Eastern Chatino (aka Highland Chatino). The Western Highland Chatino language has only 31 lemmas, of which 14 are tone marking characters and only 17 are actual words. There are no templates per se for Western Highland Chatino; the two in existence are both redirects to templates for San Juan Quiahije Chatino. As a result I suggest we merge San Juan Quiahije Chatino into Western Highland Chatino; or failing that, recategorize the former as a daughter of the latter. Benwing2 (talk) 01:59, 17 January 2026 (UTC)Reply

Oof, what a jumble. It looks like the equation of San Juan Quiahije Chatino with (San Juan Quiahije) Chatino Sign Language was made in August 2024, after entries (for the spoken language??) were created (starting in May 2024). But it's unclear to me what the source of the entries is; spot-checking some, e.g. NaF riqC / NaLM riqM, JlyoI, I can't find anywhere else on the web or in print using them, but maybe I am missing something or a different orthography is better attested...? Benwing2, are you or your UT Austin colleagues able to tell if our existing San Juan Quiahije Chatino entries are correct / real? If so, then please migrate them as y'all deem fit (e.g. changing them to ==Western Highland Chatino== + {{lb}}). If they are not attestable, we might need to mass-RFV and nuke them. Either way, if we want to include Chatino Sign Language we need a separate code for it. - -sche (discuss) 04:48, 25 January 2026 (UTC)Reply

OK, I'll ask one of my former colleagues who is a native speaker to spot check and see if the lemmas make sense. Benwing2 (talk) 06:06, 25 January 2026 (UTC)

Any word on whether our entries look right? Are they supposed to start with capital letters, like they currently do, or has someone done the equivalent of entering Words In Uppercase? I gather that the other capital letters indicate tones. I can't find sources for the entries as spelled, but could believe that they encode words that might be attested in other orthographies. - -sche (discuss) 00:32, 7 February 2026 (UTC)Reply

As far as San Juan Quiahije Chatino Sign Language, shall we add it? (sgn-sjq, or what?) - -sche (discuss) 00:32, 7 February 2026 (UTC)Reply

We have three languages CAT:Karelian language, CAT:Livvi language and CAT:Ludian language, which Wikipedia says are "supradialects" of a Karelian macrolanguage, and refers to them respectively as Karelian Proper language, Livvi-Karelian language and Ludic language (aka Ludic Karelian). Wikipedia says that the larger (Macro-)Karelian is a clade; Glottolog agrees and in fact considers it a single language "Karelian". We put the three lects directly under the Finnic languages; if we agree that (Macro-)Karelian is a clade, we should create a corresponding family "Karelian languages". Benwing2 (talk) 03:43, 17 January 2026 (UTC)Reply

The actual clade amongst these is just Karelian + Livvi, probably not enough to make much of real use out of. Ludian is not part of a common "macrolanguage" with these by any usual sense of the term: it is known in Russian research as a "Karelian language" for geographic adjacency, but is taxonomically instead a sister of (or even, a grade of varieties closely related to) CAT:Veps language. Glottolog on my checking looks to have some even weirder probably-miscited analysis, where they group together Livvi, Ludian and Veps (Livvi has a Ludian-Veps substrate but to my knowledge all Fennicists would agree it's closest related to Karelian proper).

If anyone feels CAT:Finnic languages seems to have "too many" direct daughters, the best-received larger subgroups are probably North Finnic (= Finnish, Meänkieli, Kven, Ingrian, Kukkuzi, Karelian, Livvi, Ludian, Veps) and Gulf-of-Finland Finnic ("Neva" in Glottolog = all except Livonian and South Estonian / Võro). --Tropylium (talk) 20:54, 18 January 2026 (UTC)Reply

OK thanks! I'll defer to you as to whether you think we should create North Finnic and Gulf-of-Finland Finnic families. Benwing2 (talk) 20:59, 18 January 2026 (UTC)

I'd also ask our various currently active Finnic etymology editors whether they're interested in working much on any details of these - I may not be really up to speed on what's the benefit to fine-grained categories, outside of cases like West Germanic that have their own substantial reconstruction schemes in existence. Finnic is meanwhile enough of a dialect continuum that it seems something about the subclassification still gets reworked every few decades. --Tropylium (talk) 21:09, 18 January 2026 (UTC)

Northern Finnic seems fine to me (in fact, we have several reconstructions that are marked as such), Neva/Gulf I'm not sure. But please, no Old Karelian (in case anyone was thinking about it).

@Surjection might have some thoughts on this as well. Thadh (talk) 17:31, 21 January 2026 (UTC)Reply

I don't really have strong opinions either way, but I'd tend to oppose having a separate Proto-Northern Finnic. We should instead start grouping the descendants under the Proto-Finnic entries and can give a 'Proto-Northern Finnic' reconstruction there if we so wish. I have some old proposals set up for something like this. — SURJECTION / T / C / L / 19:21, 21 January 2026 (UTC)

Why? Таёжный лес (talk) 14:23, 28 February 2026 (UTC)Reply

@Таёжный лес: Why what exactly? Thadh (talk) 15:27, 28 February 2026 (UTC)Reply

Why Old Karelian shouldn't be included as the historical/reconstructed language. Таёжный лес (talk) 16:43, 28 February 2026 (UTC)Reply

Because it is based on a couple of rather trivial sound changes that aren't even found in all the varieties supposedly deriving from it and on the -loi- plural, which likewise doesn't show up everywhere, and very little research on the relative chronology of these sound changes to others that affected the languages in question has been done. Thadh (talk) 07:01, 1 March 2026 (UTC)

Old Karelian in any of the possible senses (= with or without Ingrian, SE Finnish, Savo Finnish) is indeed neither historical, i.e. is not attested, nor has it been reconstructed to any real extent. I'd probably support adding it if there was an actual reconstruction, but trying to freestyle content for "conceptual" intermediate protolanguages like these is not a good idea. Currently questions like the exact extent of Finnish loanwords in Karelian or Karelian loanwords in East Finnish dialects are very underresearched. --Tropylium (talk) 18:53, 2 March 2026 (UTC)Reply

Last year, I had plans to write entries for Old Karelian words from birch bark letters. Aren't they eligible for inclusion? Таёжный лес (talk) 12:29, 7 March 2026 (UTC)

Although Wikipedia refers to the languages as Jicarilla, Lipan and Mescalero-Chiricahua, Wikipedia says the corresponding ethnic groups are the Jicarilla Apache, Lipan Apache, Mescalero Apache and Chiricahua Apache. Furthermore, both Ethnologue/ISO 639-3 and Glottolog include the word "Apache" in all of the Apachean languages except for Navajo, whereas we do it only for Western Apache and Plains Apache (which Ethnologue and Glottolog call "Kiowa Apache"). As all the ethnic groups speaking these languages identify as Apache except for the Navajo, I suggest we rename the three languages to include the word "Apache" in them, and follow Glottolog and Ethnologue in using the name Mescalero-Chiricahua Apache to reflect the two ethnic groups speaking this language. Benwing2 (talk) 05:31, 17 January 2026 (UTC)

There are definitely sources that don't use Apache, e.g. Campbell's The Indigenous Languages of the Americas, but Jicarilla Apache definitely seems to be far more commonly used.

However, for Mescalero-Chiricahua, the tribal website (https://mescaleroapachetribe.com/nde-bizaa/) appears to refer to the language exclusively as "Apache," and "Chiricahua" doesn't appear anywhere on the page, so I wonder if "Mescalero Apache" would be more appropriate. Vergencescattered (talk) 19:48, 20 January 2026 (UTC)

Actually, it seems that the Chiricahua also use "Apache" to refer to it, so I guess Mescalero-Chiricahua Apache would be the compromise? Unless we're willing to just use "Apache" for it. Vergencescattered (talk) 19:52, 20 January 2026 (UTC)

@-sche Do you support these renames? Benwing2 (talk) 21:00, 24 January 2026 (UTC)

On one hand, as discussed elsewhere about not just parenthetical disambiguators (Ma'di, etc) but also Sallands etc, the longer we make the names, the more we're making people type each time they enter a word. On the other hand, disambiguation can be helpful, and the long forms ("Jicarilla Apache", etc) can be found, and as Vergencescattered found, the flipside to some people using just the bare names ("Lipan", etc) is that some people just call each of these bare "Apache". (Indeed, "Jicarilla Apache" is impressively common, pretty much exactly half as common as bare "Jicarilla" in a raw Ngram without trying to filter out hits that aren't about the language, and it's used in the titles of almost all the reference works listed by Glottolog. OTOH, others are not so common: Mescalero Apache is about a third as common as the bare word, Lipan Apache about a sixth, Chiricahua Apache about a seventh lately.) The only name I can't find much use of is "Mescalero-Chiricahua" (with or without "Apache"), but it's easy to find "Mescalero language" / "Mescalero+Apache+language" as well as "Chiricahua [Apache]", so calling it just one of those does seem suboptimal. Ultimately, the cases for one vs the other seem to me balanced enough that neither compels me and I don't have a strong preference one way or the other: it's a meh (abstain) from me. - -sche (discuss) 22:47, 24 January 2026 (UTC)

The Sisaala languages form the following clade:

├[-]┬ Sisaala (nic-sis) F

Under Wikipedia, Sissala language redirects to Sisaala language so I suspect "Sissala" is an older spelling. For consistency I suggest renaming Sissala -> Sisaala. We might also consider renaming Paasaal -> Paasaal Sisaala or similar; Wikipedia's intro to this language says

Paasaal, or Pasaale Sisaala (Southern Sisaala) is ...

For reference, under Glottolog there are 8 English references using "Sissala" but all of them are from the same author (Regina Blass), while there are 4 references using "Sisaala" from 3 or 4 different authors (hard to tell exactly). Benwing2 (talk) 23:35, 18 January 2026 (UTC)

Support per nom (although this will cause the minor Kipchak problem, of language and family having the same name if used in {{der}} etc). - -sche (discuss) 07:40, 24 January 2026 (UTC)

Yeah we need a more general solution to that, maybe I'll disambiguate when necessary, as in the other thread where this was brought up. Benwing2 (talk) 07:44, 24 January 2026 (UTC)

Done renaming Sissala -> Sisaala. I didn't rename Paasaal -> Paasaal(e) Sisaala, but I'm curious to know what you think of that. Benwing2 (talk) 21:06, 24 January 2026 (UTC)

In 2025, after many years of work by various people to revitalize the Digor language (less common and suppressed in the 1930s in favour of Iron), including the launch of a Digor-language news agency in North Ossetia and the publication of textbooks in the language, the ISO added a code for Digor, osd. For clarity, they also renamed oss to specify that it is, as it generally was in practice, only for Iron. Depending on who you ask, Digor and Iron are either "hardly mutually intelligible", "not mutually intelligible", "claimed not to be mutually intelligible", nonetheless "apparently sufficiently cohesive to be regarded as a single language", or "actually barely mutually intelligible". The following represents every English-language book and paper I could find that used the words "Digor", "Iron", and either "mutually intelligible" or "mutual intelligibility" (I did not make an effort to exhaustively quote lists of similarities or differences) :

Statements about the (non-)mutual intelligibility of Digor vs Iron

David Erschler, "Iron Ossetic", chapter 14 of The Oxford Handbook of Languages of the Caucasus (2020, ed. Maria Polinsky), page 641:
Ossetic is a cover term for two closely related but not mutually intelligible East Iranian languages, Iron and Digor. Much of the literature call them dialects of a single language. They are spoken in the Central Caucasus, in North Ossetia—Alania, […]
- Amber Lubera, Sensitivity to complex onsets in Iron Ossetian (2024):
  [Ossetian has] two dialects: Iron and Digor. While similar, they are not mutually intelligible (Erschler 2018, [...]

Fridrik Thordarson, "Ossetic", chapter 4.2.5 in Compendium Linguarum Iranicarum (1989, ed. Rüdiger Schmitt), page 456:
Present-day Ossetic falls into two distinct main dialects: Digor (D. Digoron [I. Dɨguron] ævzag) or West Ossetic, mainly spoken in the western parts of North Ossetia, and Iron (Iron ævzag) or East Ossetic (formerly often named Tagauric, from the tribal name of the Tagaurs), which is the language of the majority of the nation. The language of South Ossetia is a sub-dialect of Iron. The local idiom of the Uællagkom region of East Digoria is a kind of transitional dialect, basically Digor in its structure, but sharing some features with Iron. There is some local variation within each dialect. The literary language is used on Iron. In vocabulary as well as in particulars of phonology and grammatical structure the two dialects diverge from each other to a considerable extent, so that they are mutually barely intelligible. Some of these divergences may date from ancient times and reflect older dialectal differentiation; the inflection of the verb "to be" (4.2.5.3.3.8), some personal endings (4.2.5.3.3.1.3) and the system of demonstrative pronouns (4.2.5.3.2.7.3) can be quoted as possible examples. But in all essentials both dialects are closely related as regards both their historical development and their basic grammatical structure , and there is convincing evidence to show that they are descended from a protodialect that has remained fairly uniform [...]
- Fridrik Thordarson, "Ossetic Literature", chapter 8 in Oral Literature of Iranian Languages (2010, eds. Ulrich Marzolph, Philip Kreyenbroek):
  Ossetic is an Iranian language spoken by about half a million people […] Ossetic has two dialects, which are hardly mutually intelligible: Iron (east Oss.) and Digor (west Oss.). The idiom of southern Ossetia (Georgia) is a variant of Iron; the literary language is based on Iron; very few literary works have been published in Digor.
- Carina Jahani, Standardization and Orthography in the Balochi Language (1989), page 54:
  [...] Digor (the western dialect). The idiom of South Ossetia (in the Georgian SSR) is a variant of Iron. According to Thordarson, Iron and Digor are hardly mutually intelligible, Digor showing a more archaic stage of development than Iron.

Željko Bošković, "Syntax and Prosody of V2 and Clitic Second", in Rethinking Verb Second (2020, eds. Rebecca Woods, Sam Wolfe), page 505:
[…] Ossetic, an Iranian language with two distinct main dialects (they are actually barely mutually intelligible), Iron and Digor. They differ regarding articles: Digor has a definite article but Iron does not […] Iron and Digor also differ regarding second-position cliticization: […]

Ronald Wixman, Peoples of the USSR: An Ethnographic Handbook (2017), page 83:
The Iron speak one of the three dialects of Ossetian, and the Iron dialect forms the basis of the Ossetian literary language. Until 1939 both Iron and Digor had separate literary languages (in that year Digor was abolished).

Yury Makarov, Mark Gibson, Final devoicing across contexts: the case of two varieties of Ossetic (2025):
Iron and Digor are varieties of Ossetic, a minority Iranian language spoken in the Caucasus by approximately 400,000 and 100,000 people, respectively. Although these varieties are claimed not to be mutually intelligible, they exhibit significant similarities in basic vocabulary and phoneme inventories. To the best of our knowledge, the phonetic aspects of voicing in Ossetic have not been studied since the mid-20th century. [...]

Christopher Moseley (ed.), Atlas of the World's Languages in Danger (2010), page 40:
The only Eastern Iranian language in the area under review is Ossete, with two characteristic dialect groups, Digor and Iron, but apparently sufficiently cohesive to be regarded as a single language.

Wikipedia, citing RFE/RL, says "The phonetic, morphological, and lexical differences between the two dialects are greater than between Chechen and Ingush"

(I presume additional sources exist in Russian.) Our entries carefully specify which dialect they are in (and admittedly, it seems the two are rather often homographic or near-homographic). Pinging @Sorjam, Vahagn Petrosyan, კვარია, Hk5183, Arturgudiev93, Уикиредактор as users I have seen create Ossetian entries: Should we make Digor a separate language from Iron, or continue to treat Digor and Iron as dialects of one Ossetian language? - -sche (discuss) 00:47, 19 January 2026 (UTC)

Just to say, whichever way the decision goes, I will not object, and if the decision is to split, I can help (and I imagine @Surjection can too), esp. since the individual entries are marked as to which lect they belong to. Benwing2 (talk) 00:52, 19 January 2026 (UTC)

By default I'm a lumper, until actual editors of the language request the code for their needs, volunteer to split the content we already have (BTW I disagree our Ossetian is carefully labelled, certainly not in translations and etymologies), and teach us which dictionaries and which corpora are for which variety going forward. None of that has happened, so I am against the splitting. Vahag (talk) 20:42, 19 January 2026 (UTC)

The differences are highly predictable and most words are identical. I think keeping them as dialects of one another is the better route. --{{victar|talk}} 00:06, 20 January 2026 (UTC)

I agree with @Vahagn Petrosyan and @Victar. I would rather regard Iron and Digor as dialects and not as separate languages until the community of speakers decides otherwise. Sorjam (talk) 14:56, 24 January 2026 (UTC)

Can't say anything as I don't know much about Ossetian. Уикиредактор (talk) 18:51, 31 January 2026 (UTC)

OK, I have updated WT:LT to note that we have decided not to split Ossetian at this time, and to group Iron and Digor as one Ossetian language. (The link there to this discussion will need to be updated once this discussion is archived.) - -sche (discuss) 20:28, 6 February 2026 (UTC)

I have no objection to replacing our internal etym-only codes [os-iro] and [os-dig] with [oss] and [osd], respectively. Sidenote: Oftentimes, I suspect dialects are classified as languages to receive protected status. --{{victar|talk}} 20:31, 7 February 2026 (UTC)

@Victar Yes, I have made essentially the same point as in your sidenote in #Las Delicias Zapotec, that (it seems especially in Europe), classifying a dialect as a language gives it various sorts of benefits (e.g. protected status), which can create incentives to do this even when it's not linguistically warranted. @-sche @Vahagn Petrosyan assuming no objections, I will replace [os-iro] and [os-dig] with [oss] and [osd], as Victar proposes. Benwing2 (talk) 20:42, 7 February 2026 (UTC)

I support replacing [os-dig] with [osd]. Because in the ISO standard [os] = [oss] = [os] = the same thing (interchangeable), I wonder if it would be confusing for us to be using them to mean two different things; as you can see from how I updated WT:LT, I actually forgot that we aren't currently using [oss] to mean Ossetian (because we aren't currently using [oss] at all), so if it becomes possible to use [oss], I wonder if people will assume it means the same thing as [os]. Then again, now that the ISO has redefined it, maybe people will not expect it to mean [pan-dialectal] Ossetian, and will expect it to mean Iron. Meh... if the editors above who edit Ossetian don't see a problem with replacing [os-iro] with [oss], then I don't object to it either. - -sche (discuss) 00:16, 8 February 2026 (UTC)

I thought about reusing [oss] for the common ancestor of Iron and Digor, following Alanic [xln] -- a stage that is academically reconstructed and commonly referred to as Old Ossetic (the Zelenčuk inscription). However, I would probably instead opt for Proto-Ossetic [os-pro], which is also widely used. --{{victar|talk}} 06:22, 8 February 2026 (UTC)

Also, at some point [os-iro] and [os-dig] were renamed from Iron and Digor to Iron Ossetian and Digor Ossetian; however, those names are not commonly used in the academic literature, so I would see them renamed back to Iron and Digor, in line with ISO. --{{victar|talk}} 06:45, 8 February 2026 (UTC)

I would oppose this change. Wikipedia uses Digor Ossetian and Iron Ossetian. The thing is, academic literature is almost always situated in a particular context (Ossetic studies or at least Iranian studies), where the context provides the necessary clarification about what Digor and Iron mean. Wiktionary doesn't usually have that context; e.g. an etymology in English that says it was "borrowed from Iron фоо" will be opaque to the typical reader in a way that "borrowed from Iron Ossetian фоо" is not. That is why Wiktionary usually includes the language name in dialect terms. Benwing2 (talk) 20:05, 8 February 2026 (UTC)

I would venture to guess that most people familiar with Ossetian are also familiar with Iron and Digor. If these had been treated as separate languages, would you still argue for the names Digor Ossetian and Iron Ossetian? --{{victar|talk}} 09:24, 10 February 2026 (UTC)

FWIW I would oppose "using [oss] for the common ancestor of Iron and Digor"; using [os] for the whole modern language and not (as the ISO defines it) just Iron is an acceptable level of divergence, but to then use a synonym of [os] ([oss]) to mean neither the thing [oss] means (in the ISO standard) nor the thing we're re-using [os] to mean, but instead some unprecedented third thing, would be too confusing IMO. If we need a code for "Old Ossetic", the ISO provides [oos] "Old Ossetic". (Perhaps that is even what you meant and [oss] is just a typo?) If we need a code for a different "Old Ossetic" in addition to that oos "Old Ossetic", then we need to think of distinct names for them, and then create an exceptional code for the new one. If we need a code for "Proto-Ossetic", then recreating [os-pro] seems like the way to go, if the reasons for which you earlier requested its deletion no longer apply. (I seem to recall that "Proto-Ossetic" meant different things to different scholars, i.e. they differed in which languages they were using "Proto-Ossetic" to mean the ancestor of; is that right? If we want to reconstruct an ancestor of the Ossetian languages Iron and Digor, would "Proto-Ossetian" be any clearer as a name, or no?) - -sche (discuss) 21:02, 8 February 2026 (UTC)

Whoops, my mistake. I did indeed conflate the code for Old Ossetic [oos] with [oss], the set 2 code for Ossetian [os]. Please disregard my comments about those codes.

Yes, the labels Old Ossetic, pre-Ossetic, and Proto-Ossetic are indeed used inconsistently in the literature. Following Cheung (2002), I use the name Proto-Ossetic for the intermediate stage between Alanic and the attested Ossetian dialects. However, one could reasonably argue that this stage simply corresponds to Late Alanic, so I haven't pushed for creating [os-pro]. --{{victar|talk}} 23:34, 9 February 2026 (UTC)

This is a hot mess. We have four languages, all spoken in Chad:

Gula: Bagirmi family under Bongo-Bagirmi under Central Sudanic, possibly under Nilo-Saharan.
Tar Gula: Also under Bongo-Bagirmi but not closely related to plain Gula.
Bon Gula: Mbum-Day family, ultimately Atlantic-Congo.
Zan Gula: Mbum-Day family, ultimately Atlantic-Congo, closely related to Bon Gula. Bon Gula, Zan Gula and Kulaal (possibly also with Fania) form a clade per Wikipedia.

There's also:

a Gulay language under Bongo-Bagirmi;
Kulaal also goes by Gula Iro (Wikipedia's name);
the clade containing Bon Gula, Zan Gula and Kulaal is called Gula by Wikipedia;
the Gola language of Liberia, not related to the others except being in Atlantic-Congo and sometimes called Gula;
of course, the Gullah language.

Given this, plain Gula language seems too confusing and it feels like it needs a disambiguator. The only one that seems to work here is Bagirmi, hence maybe "Gula (Bagirmi)"?

Benwing2 (talk) 03:12, 19 January 2026 (UTC)

No idea how to disambiguate glu, but allow me to put in a vote against the option "Sara Gula" used on Wikipedia, since "Sara" is itself an ambiguous term used for most of the more southwestern Bongo-Bagirmi branches; always including the group we call just "Sara languages" and Glottolog calls "Sara Central", but sometimes also others. --Tropylium (talk) 11:58, 19 January 2026 (UTC)

Hmm, this is indeed tricky. My concern about "Gula (Bagirmi)" is that Bagirmi is (also) a different language, and I can foresee people seeing "Gula (Bagirmi)" and thinking it means either that Gula = Bagirmi, or that Gula is a dialect of Bagirmi. "Gula (Bagirmic)" might be marginally better, but both it and "Gula (Bagirmi)" would be Wiktionary neologisms with no use anywhere else AFAICT. Literature appears to just call it Gula, or occasionally Gula Sara, which has the problems Tropylium notes (Gula is not one of the Sara languages, so renaming to that would not make it any less ambiguous; it, like "Gula (Bagirmi)", would just laterally change the way in which it was ambiguous). Some sources use "Gula (Chad)", which might be adequate: it would distinguish this from Liberia's Gula/Gola (and from Gullah), though other than that it offers little other benefit over "Gula". Ultimately, "Gula" is already distinct from Chad's Gulay and Tar/Bon/Zan Gula by having a different name ("Gula" vs "Tar Gula" etc); as I said about Mon, I don't know if we can ultimately afford to let one language being named Foo block another language from being named Bar Foo, or vice versa. After all, if the concern is that someone may think that a mention they've found in some paper of a "Tar Gula" word means there's a dialect named "Tar" of a language "Gula", then renaming "Gula" to "Gula (Bagirmi)" or "Gula (Bagirmic)" or "Gula (Chad)" or anything else does not seem to particularly prevent that misinterpretation, since I suspect the person would figure out that the parenthetical was a Wiktionary disambiguator and not a part of the language's actual name that they should expect the paper they're reading to include. (If they're smart enough to realize that Tar Gula is not in the Bagirmic family, I'd hope they are smart enough to realize it's also a language named "Tar Gula" and is not "Gula".) - -sche (discuss) 21:05, 6 February 2026 (UTC)

South Lembata is a geographic designation. Wikipedia refers to this language as East Atadei. Glottolog's identifier is sout2896 but they call it Eastern Atadei, meaning they changed the name at some point. The one newish citiation they have (2016) calls it Eastern Atadei; the other citation is from 1978, is in Indonesian, and appears to call it Painara. I don't know why Wikipedia truncated Eastern to East. Benwing2 (talk) 03:34, 19 January 2026 (UTC)

Weak support for "Eastern Atadei"; this appears to be the most common name, being found in Glottolog, Krauße 2016 (the "newish citiation"^[sic] you mention), Krauße 2018, and Sinnemäki and Ahola 2023, although "Atadei Painara" appears to be nearly as frequent, being found in Fricke 2019, Fricke 2020 and Klamer et al. 2021. Krauße 2018 notes that "Lerek" was also employed (as is often the case, this is apparently a locality where the variety is spoken). Hazarasp (parlement · werkis) 03:08, 31 January 2026 (UTC)

These are two different Mbembes that aren't related except both being in the Benue-Congo family and both spoken in (different parts of) Nigeria; Tigon Mbembe is in fact spoken on both sides of the Cameroon-Nigeria border. Cross River Mbembe is a Cross River language while Tigon Mbembe is a Jukunoid language. Wikipedia renamed Tigon Mbembe -> Tigon and Cross River Mbembe -> Mbembe, which seems not terribly satisfactory as confusion can still arise. Looking at the literature, from what I can tell, both languages are most frequently called just Mbembe, which suggests maybe Mbembe (Cross River) vs. Mbembe (Jukunoid)? Benwing2 (talk) 04:32, 19 January 2026 (UTC)

Three issues here: (1) postposed "South", (2) inconsistent spelling of Marg(h)i, (3) having two different languages with the same name, one qualified and one not. Glottolog calls Margi "Marghi Central" and maybe we can adopt a variant of that name, e.g. Central Marg(h)i. Marghi South is in the "Kilba-South Margi" family (go figure) and Marghi Central is under Marghic, which is under Bura-Marghi, which is under Margi-Mandara-Mofu (again, go figure). So I'd suggest South Marg(h)i and Central Marg(h)i, depending on whether the h-full or h-less variant is more common in recent literature. Benwing2 (talk) 05:30, 19 January 2026 (UTC)

North Fali and South Fali form the Fali family, which is not closely related to the Fali language. Baissa Fali is a totally different language, extinct and unclassified except under Benue-Congo. I don't really know how to sort this out but at least we should rename the Fali language to have a disambiguating tag. Benwing2 (talk) 06:02, 19 January 2026 (UTC)

Oh another one of these "mountain-dweller" terms that surely need disambiguation. fli (our and Glottolog's "Fali") is "Fali of Mubi" on Wikipedia, which seems workable. Theoretically there would be "Bata Fali" after its immediate family, but this might not be in any actual use. --Tropylium (talk) 11:47, 19 January 2026 (UTC)

Wikipedia considers Dungra (Bhil) and Noiri as part of the Bhilori languages and says they are mutually intelligible, so we should consider merging them. But in the meantime, Dungra Bhil is not closely related to Sindhi Bhil; the only connection is that they're both spoken by the Bhil people. Wikipedia thinks the lect we call Dungra Bhil is just Dungra, and I think we should rename accordingly to avoid the implication that Dungra Bhil is close to Sindhi Bhil. Benwing2 (talk) 07:02, 19 January 2026 (UTC)

Per Wikipedia, Sota Kanum is not closely related to the other Kanum languages Bädi Kanum, Ngkâlmpw Kanum and Smärky Kanum (which I think should be renamed to eliminate the accents), and Wikipedia refers to Sota Kanum as Nggarna. I propose we do the same (BTW Wikipedia refers to Smärky Kanum as just Smerky, FWIW). Benwing2 (talk) 07:11, 19 January 2026 (UTC)

Support As with so many other aspects of Wikipedia's treatment of Papuan lects, the treatment of Sota Kanum is presumably from Usher; Glottolog and Carroll 2021 group it with the other Kanum varieties, and we should probably follow them unless someone can corroborate Usher. In any case, "Nggarna" appears to be the name most commonly used in the specialist literature (e.g. in Evans and Usher); on the same grounds, the other Kanum varieties should be named as "Bedi Ngkolmpu", "Ngkontar Ngkolmpu", and "Smerki" (the first two should either go in a "Ngkolmpu" family or be merged into "Ngkolmpu"). Hazarasp (parlement · werkis) 23:17, 30 January 2026 (UTC)

As above: rename to Western Wee, Southern Wee. Wee is what we call the family, and using it avoids an accent. FWIW Wikidata also uses the names "Southern Wee" and "Western Wee". Wikipedia refers to both combined as the Guere language, but when splitting them up, calls them "Southern Wee" and "Western Wee" as well. Benwing2 (talk) 08:54, 19 January 2026 (UTC)

For language(s) spoken by 400,000 people, these are surprisingly obscure in English (and German) sources. I have not managed to find any print uses of the ISO name "Wè Western" at all. The least-rare names for that one seem to be "Western Wè" and "Western Guere"; I can also find the occasional use of "Western We" and "Western Wee" (I wouldn't call any of the names "most common" because I wouldn't call any of them "common"); the situation with the Southern variety is similar. I can't find much use of "Wee languages", either. To me it looks like we should rename the family from "Wee" to "Wè" and the languages from "Wè Western", "Wè Southern" to "Western Wè", "Southern Wè", but if anyone else wants to weigh in please do. - -sche (discuss) 04:35, 26 January 2026 (UTC)

We have three Samo languages (Maya Samo, Matya Samo, Southern Samo) that are located in Burkina Faso and a Samo language located in New Guinea that need disambiguation. Maybe we can rename the Samo language to Samo (New Guinea), as Wikipedia does. Benwing2 (talk) 19:05, 19 January 2026 (UTC)

Needless to say, there are many Berber languages spoken in Tunisia, and many spoken in the Northern Sahara. Hence the language names "Tunisian Berber" and "Northern Saharan Berber" are unhelpful. I suggest renaming Tunisian Berber to Sened (the name used in Wikipedia, ISO 639-3 and Glottolog) and Northern Saharan Berber to Tumzabt (the name used in ISO 639-3 and Glottolog) or Mozabite (the name used in Wikipedia). Benwing2 (talk) 20:09, 19 January 2026 (UTC)

OK I have crossed this out because I didn't realize the names chosen here were intentional. See also the recent discussion Wiktionary:Language_treatment_requests#Splitting_Northern_Saharan_Berber. Benwing2 (talk) 00:13, 25 January 2026 (UTC)

Northwestern Fars (faz) is considered spurious by Glottolog ("all likely candidates in the area already have ISO codes") although still maintained by ISO 639-3. Southwestern Fars (fay, ISO 639-3's name) is a group of disparate dialects that goes by "Fars dialects" in Glottolog and "Kuhmareyi" in Wikipedia. Since Southwestern Fars is an area with several languages, using Kuhmareyi (presumably the endonym) seems better. Benwing2 (talk) 20:30, 19 January 2026 (UTC)

~~Support~~; I can only find "Northwestern Fars" in giant lists of languages that copied it from other lists of languages, and as a place; Ethnologue agrees it is spurious: "Northwestern Fars was thought to be the name of a language in Iran and was thus assigned a three-letter code (faz) in the ISO 639 standard. However, the editors of Ethnologue have not been able to identify any evidence for the existence of this language and are moving ahead with proceedings to have it removed from the ISO 639 standard." Pinging @Victar as the person who added the only mentions of Northwestern Fars we seem to have (here), which will need to be updated to use some other language code(s): Wikipedia says at least Somghani and Papuni are Kuhmareyi i.e. fay; is Buringuni also fay?
@Eeranee (though no longer active) as the creator of our Southwestern Fars entries, if you have thoughts on the language name. I can only find one book and one scholarly paper using "Southwestern Fars" as a language name, and no books and just one paper using "Kuhmareyi" (so, commonness-in-English does not appear to be a usable factor in deciding which name to use); two other alt spellings I ran across online (Kohmare, Kohmareh) also see no use in print AFAICT; but the Persian version of Kuhmareyi seems to be in use, so it seems fine to go with that. - -sche (discuss) 22:57, 19 January 2026 (UTC)

According to Lecoq,^[1] the Fars dialects can be split into 3 groups: 1. (of Kazerun) Buringuni, Māsarmi, Pāpuni, and Somghuni; 2. (of Northwest of Shiraz) Ardakāni, Kalāti, and Khullāri; 3. (of Northwest of Sivand) Kondāzi and Davāni. I don't ever trust ~~Glottolog~~ Ethnologue, and Wikipedia appears to be just parroting them. --{{victar|talk}} 01:06, 20 January 2026 (UTC)

I don't know anything much about Fars dialects but presumably the editors of Glottolog and Ethnologue know about this source; it doesn't look especially obscure. So I'm hesitant to simply dismiss them in favor of a source from 1989. Benwing2 (talk) 01:25, 20 January 2026 (UTC)

@-sche, what paper used Kuhmareyi? I see @Kwamikagami is the one that moved Davani dialect to Kuhmareyi language back in 2013. --{{victar|talk}} 03:39, 20 January 2026 (UTC)

OK, I'm temporarily striking my support for any particular change while we work out what the best change here is. Some Iranian-language-speaking editors I would otherwise ping to ask if Farsi-language reference works say anything about this have not been active recently, perhaps because of the events (and internet shutdowns) in Iran, but hopefully in a few weeks that's something that can be looked into. (With some discussions on this page dating to a decade ago, there is no immediate deadline.) - -sche (discuss) 07:31, 30 January 2026 (UTC)

Under the Lalo languages, we have Dongshanba Lalo, Xishanba Lalo, Eastern Lalu and Western Lalu. I don't know if there's any significance to the Lalo vs. Lalu distinction, but if not I suggest renaming one or the other pair for consistency. Benwing2 (talk) 21:10, 19 January 2026 (UTC)

We are missing ISO 639-3 code nsf, which is considered the most divergent Nisu language and is called Northwestern Nisu by ISO 639-3 and Wikidata but Far Northwestern Nisu by Glottolog.

Meanwhile, nsv (Southwestern Nisu) is apparently nonexistent or "seriously confused" per Glottolog:

E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 has a Southwestern Nisu [nsv] as well as a Southern Nisu [nsd] said to be spoken in the same (or more) locations and with the same dialect names. The map shows S Nisu [nsd] to be spoken in the region described for both (Pu'er subprefecture), whereas SW Nisu is put much further southwest in Xishuangbanna (Dai) Prefecture. SW Nisu, as a distinct variety spoken in Pu'er or in Xishuangbanna prefecture, is not taken up in Nisu dialect surveys (Chan, Ken and Bai Bibo and Yang Liujin 2008, Yang, Cathryn 2009). The SW Nisu entry is therefore probably spurious, or seriously confused. See also: Southern Nisu [nsd].

Benwing2 (talk) 22:46, 19 January 2026 (UTC)

We have both "Mandingo" [man] and "Mandinka" [mnk] languages. According to ISO 639-3, the code man is a macrolanguage code for several individual Mande languages (although the hierarchy per Glottolog is quite different), and Glottolog has no record of code 'man' at all. Since our Mandingo language has no lemmas, I'd suggest either deleting it or reusing it as a family code for the Manding languages (currently assigned family code dmn-man). Benwing2 (talk) 23:00, 19 January 2026 (UTC)

Support reusing man for the Manding family; most uses of the term "Mandingo language" apparently refer to the Manding languages or some subset of them considered as a macrolanguage. Hazarasp (parlement · werkis) 01:09, 30 January 2026 (UTC)

Through failing to notice or update our own modules when new ISO codes were added over the years, we are missing codes for several sign languages. I have researched them all and can confirm these are real:

~~rsn Rwandan Sign Language — well-documented, a dictionary exists and there are videos online~~ (added)
sqx Kufr Qassem Sign Language, a.k.a. KQSL — some literature, and a dictionary exists online
~~lsn Tibetan Sign Language — a dictionary exists~~ (added)
~~lsv Sivia Sign Language — literature including a grammatical sketch exists~~ (added)
~~csx Cambodian Sign Language (also known, but less preferentially, as Khmer Sign Language) — well-documented, videos online~~ (added)
ajs Algerian Jewish Sign Language (used in Israel) — at least some literature about it
dsz Mardin Sign Language — at least some literature about it, photos showing how to sign "18" (subtractively) here
ehs Miyakubo Sign Language — at least some literature about it
lsb Burundian Sign Language — little literature, but videos online
lsw Seychelles Sign Language — little literature, but recently recognized governmentally and there are videos online
rib Bribri Sign Language — at least some literature about it, transcribing some signs
rnb Brunca Sign Language — at least some literature about it
wbs West Bengal Sign Language — some literature about it, and videos online
ysm Myanmar Sign Language (also known as Burmese Sign Language) — some literature exists, videos online (both names seem reasonably common, perhaps Myanmar Sign Language is somewhat more common and takes the edge as it is the ISO name)

These are more obscure but still seem to be real:

lsc Albarradas Sign Language (in Spanish Lengua de señas Albarradas) — a little literature exists ([25])
jks Koniya Sign Language (also known as Amami Koniya Sign Language) — used in Japan, very little-discussed in Western literature, but enough to confirm it's real (and endangered)
yhs Yan-nhaŋu Sign Language (also Yan-nhangu Sign Language) — little-discussed in literature, but there is some
szs Solomon Islands Sign Language — some literature about it exists; emergent language, developed at a school that taught Auslan, where students mixed in other signs and created new ones

Assuming there are no objections, I will add all of these, as we seem to be missing them through nothing but oversight. - -sche (discuss) 03:47, 20 January 2026 (UTC)
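
For anyone wanting to repeat this kind of audit, the check amounts to a set difference between the ISO code list and the codes our data modules already know. The following is only a rough sketch: the two inputs are hand-typed samples taken from the list above (with the struck-through codes treated as already added), not a real export of ISO 639-3 or of our modules, and `missing_codes` is a made-up helper name.

```python
# Hypothetical sample of ISO 639-3 codes and names, copied from the list above.
iso_codes = {
    "rsn": "Rwandan Sign Language",
    "sqx": "Kufr Qassem Sign Language",
    "lsn": "Tibetan Sign Language",
    "ajs": "Algerian Jewish Sign Language",
    "ehs": "Miyakubo Sign Language",
}

# Hypothetical set of codes already present in our language data
# (here: the two sample codes marked "(added)" above).
module_codes = {"rsn", "lsn"}


def missing_codes(iso, known):
    """Return the ISO codes (with names) that are absent from `known`, sorted by code."""
    return {code: name for code, name in sorted(iso.items()) if code not in known}


missing = missing_codes(iso_codes, module_codes)
for code, name in missing.items():
    print(code, name)
```

A real run would need the full ISO table and the actual module data as inputs; each hit would still need the kind of manual verification described above before being added.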

No objections. Benwing2 (talk) 04:38, 20 January 2026 (UTC)

In 2023, the ISO retired zkb (Koibal), saying here something also discussed in our own prior discussion of it, that it's not a language but an ethnonym for a Samoyedic Kamas group who formerly spoke xas (Kamas) and later shifted to kjh (Khakas). ~~It looks like all uses were cleaned up after the 2023 BP discussion (though please double check), so we just have to actually remove the code from the module.~~ (No, there are still RC pages using it.)

- -sche (discuss) 06:58, 20 January 2026 (UTC)

@-sche: Koibal is not a synonym of Kamassian. It refers to a certain lect that seems to have been closely related to Kamassian, but due to the very small corpus it is hard to say for sure. Combining the two into one is creating scientific consensus where there is none, and not enough data to form one. Neither historical grammatical descriptions nor modern revitalisation attempts use Koibal data for Kamassian, and similarly Koibal revitalisation events (as few as they are, but there are a few comic books translated into the language iirc) do not use Kamassian as a basis. Thadh (talk) 12:36, 20 January 2026 (UTC)

Are you objecting to the removal of zkb (which ISO merged into kjh and not xas), or only to the ISO's comment about which languages that ethnic group spoke? If we need a code, I wonder if it'd be clearer if we devised our own, instead of using a code that the ISO treats as a variety of kjh for a relative of xas...? Or is it best to keep using this code? @Tropylium. - -sche (discuss) 17:30, 20 January 2026 (UTC)

Revisiting the 2023 discussion still reads to me as retiring the code on the basis that it's been used for both Samoyedic Koibal and Turkic Koibal (spoken by the same ethnic group before and after language shift). If we do not have this confusion, there's no problem with continuing to keep Samoyedic Koibal. It isn't highly distinct from xas, I know of maybe four systematic differences, but definitely exists / existed as its own variety. --Tropylium (talk) 17:43, 20 January 2026 (UTC)

I objected to both, although I don't have a particular issue with using our own code for Koibal rather than the retired ISO code. Thadh (talk) 17:25, 21 January 2026 (UTC)

There are several pairs (and a few triplets) of extremely similarly named languages. The following is a list of the languages with the same sort key per my bot script's sort key for sorting translation tables, which is nearly identical to the get_L2_sort_key() function in Module:headword/page used for determining whether the languages on a page are correctly ordered. ~~I will be proposing a remedy for each pair to avoid confusion; for now I'm just listing the pairs.~~ I have proposed remedies below for each pair or triplet.

Lang 1	Lang 2, Lang 3 (if applicable)	Proposed disposition
Abu (ado)	Abu' (aah)	Both are spoken in Papua New Guinea. Abu is a Ramu language called Adjora by Wikipedia but Abu by ISO 639-3 and Glottolog; Wikipedia seems to claim that Abu is a dialect of Adjora. Abu' is a Torricelli language (Arapesh subfamily) called Abu' Arapesh by ISO 639-3, Glottolog and Wikipedia. Abu' is the endonym but Arapesh (the family name) was added in the ISO submission specifically to disambiguate the name, and adopted by Glottolog and Wikipedia. I suggest Abu (Ramu) and Abu' (Torricelli) as both are in the same country.
Ache (yif)	Aché (guq)	Ache is a Yi language of China; Aché is a Guarani language of Paraguay. Wikipedia calls the former Ache Yi but ISO 639-3 and Glottolog use just Ache. I straightforwardly suggest Ache (China) and Aché (Paraguay).
Ap Ma (kbx)	Apma (app)	Ap Ma is a Keram language of Papua New Guinea. Apma is an Oceanic language of Vanuatu. Wikipedia calls the former Kambot but ISO 639-3 and Glottolog use Ap Ma. I suggest Ap Ma (New Guinea), Apma (Vanuatu).
Arua (aru)	Aruá (arx)	Both languages are (or were) spoken in Brazil, aru in Amazonas state and arx in Rondônia and Mato Grosso. aru goes by Arawá in Wikipedia; Arawá (Amazonas State) in Glottolog and Aruá (Amazonas State), Arawá in ISO 639-3, which changed it from Arua to Aruá to Arawá with the note "This [Arawá] is the name used for the language in all the recent literature." The note in question said to use Arawá as primary and maintain Aruá as an alternative name, but it ended up backwards. arx goes by Aruá (Rodonia State) [sic] in ISO, Aruá (Rondonia State) [sic] in Glottolog and Aruá language (Rondônia) in Wikipedia. To make things even more confusing, there's an extinct Aruã language also formerly spoken in Brazil, on two islands in Pará state. This language is coded in Glottolog as Aruan but not in ISO. aru is in what we call the Arauan family but elsewhere is called the Arawan family; arx is Tupian; and Aruã is Arawakan. I suggest renaming aru to Arawá and arx to Aruá (Rondônia) following Wikipedia, since with this change there's only one Arawá but several languages called Aruá or similar. I also suggest changing the Arauan family to Arawan and Proto-Arawa to Proto-Arawan.
Bari (bfa)	Barí (mot)	Bari is Nilotic, spoken in South Sudan, northwest Uganda and small parts of the DRC. Barí is a Chibchan language spoken in Colombia and a disconnected portion of Venezuela. Wikipedia, Glottolog and ISO all agree on the names Bari and Barí. I suggest Bari (East Africa) and Barí (South America). (To add to the confusion, the Bai language of South Sudan can also go by Bari.)
Ga'dang (gdg)	Gadang (gdk)	Ga'dang (gdg) is a Philippine language of the Philippines and Gadang (gdk) is a Chadic language of Chad. Wikipedia, Glottolog and ISO all agree on the names Ga'dang and Gadang, although Wikipedia mentions an alternative name Gâdang for the Philippine language. I suggest Ga'dang (Philippines) and Gadang (Chad). (To make matters worse, there is a Gaddang language [gad] that is closely related to Ga'dang.)
Ma (msj)	Mạ (cma), Mebu (mjn)	Ma (msj) is Ubangian, spoken in the DRC. Mạ (cma) is Mon-Khmer, spoken in Vietnam. msj goes by Ma language (unqualified) in Wikipedia but Ma (Democratic Republic of Congo) in ISO and Glottolog. cma goes by Maa in ISO and Glottolog, and in Wikipedia (listed as the Maa dialect group under the w:Koho language). To make things even more confusing, there's mjn, which we and Wikipedia call Mebu but ISO and Glottolog call Ma (Papua New Guinea), which is a Finisterre (Trans-New Guinea) language. Even in Wikipedia, under w:Mebu language, the lede begins "Ma (Ma Wam), or Mebu, is one of the Finisterre languages of Papua New Guinea" and the box on the right gives Ma as the primary name, which suggests to me that they're only using Mebu for disambiguation purposes. For this reason, I'd suggest msj becomes Ma (Congo), cma becomes Maa or possibly Maa (Vietnam), and mjn becomes Ma (New Guinea).
Ma'di (mhi)	Madi (grg), Jamamadí (jaa)	Ma'di (mhi) is a Central Sudanic language of Uganda and South Sudan. Madi (grg) is a Finisterre language of Papua New Guinea. Ma'di (mhi) is called as such by Wikipedia, ISO and Glottolog; Madi (grg) is called Madi by ISO, Madi (Papua New Guinea) by Glottolog, and Gira by Wikipedia, although Wikipedia acknowledges Madi as an alternative name and gives it as the preferred name in the navbox to the right. Finally, there is what we call Jamamadí (jaa), an Arawan language of Amazonas, Brazil, which goes by Madí in Wikipedia, Madi in Glottolog (although their code is jama1261, suggesting it was formerly Jamamadí or similar) and Jamamadí in ISO. In 2006, ISO expanded the scope of Jamamadí, and the note for it says "Jamamadi, Jaruára, and Banawa are speech varieties of the same language." Wikipedia similarly says that Jamamadí is properly one of the dialects of Madí. To sort this out, I'd suggest mhi -> Ma'di (East Africa), grg -> Madi (New Guinea) and jaa -> Madí (Brazil).
Mbe (mfo)	Mbe' (mtk)	Mbe (mfo) is an Ekoid (Atlantic-Congo) language of Nigeria. Mbe' is a Grassfields (also Atlantic-Congo) language of Cameroon. ISO, Glottolog and Wikipedia agree on the name Mbe for mfo. ISO and Glottolog agree on Mbe' for mtk, but Wikipedia calls it Mbəʼ; this, naturally, is User:Kwamikagami's doing (really now?!). I would suggest mfo -> Mbe (Nigeria) and mtk -> Mbe' (Cameroon).
Nga La (hlt)	Ngala (nud)	Nga La (hlt) is a Kuki-Chin language of Chin State, Myanmar and Mizoram, India. Ngala (nud) is a Sepik language of Papua New Guinea. hlt goes by Nga La in Glottolog and Wikipedia, and formerly also in ISO, but in 2011 ISO changed the name to Matu Chin, saying "Nga La is a village/variety name within the language group called Matu Chin." Also, Wikipedia's lede for Nga La says "Matu, also known as Matu Chin, is a Kuki-Chin spoken in Matupi township, Chin State, Myanmar, and also in Mizoram, India by the Matu people." with no mention of Nga La. Ngala (nud) goes by Ngala in Glottolog and ISO, but by Nggala in Wikipedia; however, Wikipedia's lede says "Ngala, or Sogap, is one of the Ndu languages of Sepik River region of northern Papua New Guinea." with no mention of the spelling Nggala anywhere in the article. The name Ngala is ambiguous, however, per w:Ngala language, which mentions at least 5 languages that can be called this. In addition to nud, there is (1) mpi (Mpade), which Glottolog appears to split into Mpade proper and Ngala of Lake Chad (https://glottolog.org/resource/languoid/id/ngal1301), an extinct Chadic language; (2) a Zande language of the DRC (originally Central African Republic) also called Ngala, which Glottolog calls Ngala-Santandrea (https://glottolog.org/resource/languoid/id/ngal1296), after Stefano Santandrea, the author who described it, with the comment "The Ngala language (a Zande-group language originally in the Central African Republic) discovered by Santandrea around 1950, is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 (see Santandrea, Stefano 1952 , data appears in Santandrea, Stefano 1965 ). It is, in fact, missing from all listings of African, Sudanese, or CAR languages and with only some 50 refugee speakers half a century ago, it is presumably extinct now. 
I wish to thank Raymond Boyd for discussion on the classification of this language."; (3) Lingala (lin) and its mixed-language derivative Bangala (bxg), which are sometimes called Ngala. I would rename hlt -> Matu after the name of the ethnic group, as this (surprisingly) is unambiguous and Matu Chin is merely the ethnic group compounded with the language family, and maintain nud as Ngala, as the other two extinct Ngala languages are incredibly obscure; or if we think we might end up coding one or both of those others, rename nud -> Ngala (New Guinea).
Voro (vor)	Võro (vro)	Voro (vor) is a Volta-Congo language of Nigeria. Võro (vro) is a well-known Finnic language of Estonia. Voro is called Voro language (Adamawa) in Wikipedia and simply Voro in Glottolog and ISO. Given how well-known Võro [vro] is, I'd keep it at Võro and rename vor -> Voro (Nigeria).
Wara (wbf)	Wára (tci)	Wara (wbf) is apparently a Gur language of Burkina Faso. It is called Samwe in Wikipedia but Wara in ISO. Glottolog calls it Samue but their code is wara1292, suggesting they changed it from Wara. Wára (tci) is called Upper Morehead in Wikipedia, Wára in ISO, and Anta-Komnzo-Wára-Wérè-Kémä (ugh) in Glottolog. This language is already being discussed elsewhere in WT:LTR, where I proposed renaming it to Upper Morehead because Wára is properly the name of one or more of the dialects of the language (hence the compound name in Glottolog). To complicate things, Upper Morehead is an alternative name of the closely related Arammba (Aramba) language. I suggest renaming wbf -> Samwe or Samue, whichever spelling seems more common in the literature. As for tci, there seem to be no good names. Perhaps a literature search would reveal some better names, but absent that I would reluctantly propose Upper Morehead as less ambiguous than Wára; alternatively we could pick a name like Macro-Wára, but I am not sure this name can be found elsewhere.

Benwing2 (talk) 09:14, 20 January 2026 (UTC)
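
To illustrate how pairs like these end up sharing a sort key, here is a minimal sketch. It is not the actual `get_L2_sort_key()` logic; it simply assumes the key is formed by decomposing the name, dropping diacritics, apostrophes and spaces, and lowercasing. The `sort_key` and `collisions` helpers and the sample dictionary are hypothetical.

```python
import unicodedata
from collections import defaultdict


def sort_key(name):
    """Assumed normalization: decompose, keep only letters and digits, lowercase."""
    decomposed = unicodedata.normalize("NFD", name)
    # Combining marks, apostrophes, spaces and brackets are not alphanumeric,
    # so they are dropped here.
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


def collisions(languages):
    """Group language codes whose canonical names share a sort key."""
    groups = defaultdict(list)
    for code, name in languages.items():
        groups[sort_key(name)].append(code)
    return {key: sorted(codes) for key, codes in groups.items() if len(codes) > 1}


# Sample of code -> canonical name pairs taken from the table above,
# plus one unrelated language (Khakas) that should not collide with anything.
languages = {
    "vor": "Voro", "vro": "Võro",
    "hlt": "Nga La", "nud": "Ngala",
    "msj": "Ma", "cma": "Mạ",
    "mhi": "Ma'di", "grg": "Madi",
    "kjh": "Khakas",
}

for key, codes in collisions(languages).items():
    print(key, codes)
```

Under this assumed normalization, a parenthetical disambiguator such as "Voro (Nigeria)" produces a different key from "Võro", which is one reason the "(Country)" style of remedy removes the collision.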

I personally would prefer not to use the "(Country)" tags anywhere unless strictly necessary. To me, it looks like unnecessary clutter, and people who work with these languages know the diacritics that should be used.

In some cases, a rename could be argued for however. For instance, for Võro, an alternative name "South Estonian" might be a good idea, since we group all of South Estonian together anyway, and biasing one variety over the other is maybe not a great thing. But even as things are now I think "Voro" vs "Võro" is a better idea than "Voro (Nigeria)" vs "Võro". Thadh (talk) 17:21, 21 January 2026 (UTC)

In general I agree with Benwing2. The assumption that "people who work with these languages know the diacritics that should be used" seems to me to be overly editor-centric. It's easy to see how even an expert linguist reading Wiktionary could be confused by a translation table containing a translation only into "Wara" with no neighbouring "Wára" translation (assuming the languages are vaguely similar in orthography and syllable structure; if not, substitute any other pair of languages in the table!). The only way for them to resolve their confusion, assuming the relevant Wara entry doesn't exist, would be to look at the page source and check the ISO code. Not intuitive at all! This, that and the other (talk) 01:39, 22 January 2026 (UTC)

I think you are mistaken. By "working with", I did not mean editors, I meant literally anyone involved with the language in any meaningful way - learners, linguists... Nobody would expect to find "Voro" instead of "Võro", since that is simply never how the language name is spelled and doing so would discredit us automatically. Similarly, anyone familiar with the Ma'di language would never expect the name to be written without the apostrophe.

I think trying to adapt for those that would not trust us enough to get the diacritics right by increasing the number of symbols in the name is not a net benefit. Thadh (talk) 05:49, 22 January 2026 (UTC)

But Voro vs. Võro is a particularly clear case. For many of these languages, multiple forms with and without diacritics are found in the literature (e.g. Mạ vs. Ma; Madí vs. Madi; Aruá can refer to two or three languages; etc.). So it's hardly the case that "anyone familiar" with these languages would expect them to be written the way they are. Benwing2 (talk) 05:58, 22 January 2026 (UTC)

Okay, in that case let me revise my statement: I would oppose renaming those languages where the diacritic-ed form is widely used and so confusion with a language without a diacritic is minimal. This is the case for Võro and Ma'di, but I'm not familiar with all the languages. Thadh (talk) 09:06, 23 January 2026 (UTC)

For aah, what if we use "Abu' (Arapesh)" for consistency with (but better adherence to our preference for disambiguation being in parentheses than) the places like Wikipedia / Glottolog and occasional scholarly writings that call it "Abu' Arapesh" / "Abu Arapesh"? For the other Abu, I will try (or someone else can!) to look into whether it is more commonly called Abu or Adjora. - -sche (discuss) 01:46, 22 January 2026 (UTC)

On one hand, as I said elsewhere, I am sympathetic to the point that if we don't have to disambiguate, we should be reluctant to, because (when we disambiguate) we're making people type a lot more on every entry they enter. (In that vein, I agree with apparently everyone here that Võro is well enough known to be fine as-is.) OTOH, I agree with TTO and Benwing that if names are ambiguous (e.g. one spelling is often used for both languages) and the languages themselves are also confusable (e.g. two Bantoid languages that are both written in Latin script), it'd be helpful to distinguish them; we were probably doing people a disservice by being so cavalier for a long time about "sure Pyu means two things, but we found a way to call one of them Tircul so it's not a problem"; we probably did a good thing by renaming "Tircul" to its most common name of "Pyu" + disambiguator a while ago.
Unfortunately, it can be hard to judge how commonly a given spelling is used for a given language when it's also used for other things (I'm looking at you, "Ma", but even "Madi" and "Madang" are used as first and last names, which is chaff when searching); in my own looking-around, the Google Scholar searches I tried (combining "Madi" vs "Ma'di" with "Uganda" or various other keywords to bring up the African rather than PNG Madi) suggested to me that people omit the apostrophe fairly often, but Thadh has the opposite experience, perhaps helped by knowing where to look.
"Mbe" is difficult to search for because both Mbes are Bantoid languages in Africa (and searching together with country names helps less than I would have hoped), but in the papers I can find, the apostrophe-like character is often omitted, so there does seem to be large potential for confusing them. "Ma" is even harder to search for (because it means so many things in so many languages), but I've found about a dozen papers using "Ma" in reference to the Mon-Khmer language, another dozen using "Maa", and only a couple using "Mạ" in English (Vietnamese-language papers do use diacritics), and in general I agree the tiny dot below is too little to distinguish these. Unfortunately, most uses that I can find of bare "Maa" as a language name are in reference to a Nilotic, Kenyan language instead of the Mon-Khmer language (have a look through e.g. [26], much of it is mas), so simply switching cma to bare "Maa" would not really disambiguate it. I could get behind mfo→"Mbe (Nigeria)", mtk→"Mbe' (Cameroon)", msj→"Ma (Congo)", and cma either →"Ma (Vietnam)" or →"Maa (Vietnam)" ("Maa" might be slightly more common, or it might just be easier to search for...). But maybe someone can think of a user/editor-friendlier way to disambiguate than our current system of typing parentheticals after the names of the languages themselves (e.g. JavaScript that automatically adds the disambiguators the way flags are added, but also to translation tables and etymologies, not just L2s, albeit people don't have to type the names in these cases, since templates and scripts do that)? I can't find enough information about mjn yet to get a sense of what to do about it: when I search for "Ma" + Guinea, what turns up is "Ap Ma". - -sche (discuss) 02:37, 24 January 2026 (UTC)

I've found that in many cases where you look up Ma'di online without the apostrophe, there are websites that do omit the apostrophe, but once you look at the actual printed/published text that is given on that website, it is there. Both scholarly literature (cf. Glottolog) and native texts (the various Christian sermons and whatnot) overwhelmingly use the apostrophe in the name. Thadh (talk) 07:24, 25 January 2026 (UTC)

@Benwing2 For the case of Gâdang, that is because the Commission on the Filipino Language (which governs all languages of the Philippines) recommends that when a glottal stop falls between a vowel and a consonant, the vowel takes a ◌̂. The use of the ' is due to foreign linguists who first documented these indigenous languages (see also T'boli and I'wak).

So Gâdang is the standard spelling in Tagalog/Filipino, and it is up to you guys whether it should also be adopted in English. — 🍕 Yivan000 ^view_talk 08:27, 2 February 2026 (UTC)

Wikipedia says about this language:

Ewarhuyana, listed in Campbell (2012), is an alternate name for Tiriyó. [cited to Denise Fajardo Grupioni (2005). Tiriyó - Indigenous Peoples in Brazil. Instituto Socioambiental (ISA).]

There are only scattered tertiary sources mentioning this language, presumably all taking from Campbell. Benwing2 (talk) 04:24, 21 January 2026 (UTC)

Interesting; I can find very little about this at all; this (F) and this (AB) mention the Ewarhuyana people as a group that exist in the same general area as the Karuyana / Kaxuyana and Tiriyó, without saying whether they speak the same or different languages; this (Az) says (translated from Portuguese) "The Tiriyó, also known as Trio and Tarëno, live together with other groups such as the Katxuyana, Ewarhuyana and Akuriryó, all Carib-speaking. The Tiriyó are distributed across the north of the state of Pará and Suriname. [...]", without specifying whether they speak the same Carib(an) language or not. It looks like Metaknowledge and I were swayed by Wikipedia, which had an article about it as a language at the time we added this code. Since Wiktionary has no entries in it, it seems fine to remove the code until such time as someone actually shows up with words in it and we determine whether they're Tiriyó or not. - -sche (discuss) 22:55, 21 January 2026 (UTC)

In 2021, ISO added ocm for Old Cham, a language attested in some inscriptions; we already have some entries which refer to this language (Phan Rang - Tháp Chàm, Phan_Thiết#Vietnamese, Phan Rang, ribu#Malay) and would benefit from adding the code here too. - -sche (discuss) 06:28, 21 January 2026 (UTC)

Support. Sometimes we treat Old variants of languages as etym variants, but this language dates back to the 4th century AD, which suggests it's probably quite different from modern Cham (and written in a different script, the Pallava script, which doesn't seem to be encoded in Unicode and which we don't have in Module:scripts/data). Benwing2 (talk) 08:59, 21 January 2026 (UTC)

Pallava script was proposed in 2018 [27], but before it can be encoded, it must first have a code in ISO 15924. Many languages once shared this same script. We're still waiting... --Octahedron80 (talk) 04:18, 23 January 2026 (UTC)

I strongly believe we need to merge the two Norwegian varieties, but in the meantime, we have complaints from the data consistency checker due to the way these two languages are set up. Specifically, Bokmål is placed directly under North Germanic and given two ancestors from different subfamilies (Middle Norwegian and Danish), and likewise Caribbean Hindustani is placed directly under the Indo-Aryan languages and given two ancestors from different subfamilies (Bhojpuri and Awadhi). We already consider the newly added Shetland language, with two Germanic ancestors from different subfamilies, as mixed, so I suggest doing the same here. Benwing2 (talk) 20:31, 21 January 2026 (UTC)
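
For readers unfamiliar with the check being described, the rule seems to be roughly "a language with two or more ancestors from different subfamilies must be marked as mixed". The sketch below is an assumption about that rule, not the actual data consistency checker: the `Lang` record, the `flag_inconsistent` helper and the subfamily labels are all illustrative, and the codes are used only as dictionary keys.

```python
from dataclasses import dataclass, field


@dataclass
class Lang:
    name: str
    family: str                      # illustrative subfamily label, not real module data
    ancestors: list = field(default_factory=list)
    mixed: bool = False


def flag_inconsistent(langs):
    """Return codes of non-mixed languages whose ancestors span more than one family."""
    flagged = []
    for code, lang in langs.items():
        if lang.mixed or len(lang.ancestors) < 2:
            continue
        ancestor_families = {langs[a].family for a in lang.ancestors}
        if len(ancestor_families) > 1:
            flagged.append(code)
    return sorted(flagged)


# Hypothetical data set modelled on the two cases described above.
langs = {
    "gmq-mno": Lang("Middle Norwegian", "West Scandinavian"),
    "da": Lang("Danish", "East Scandinavian"),
    "nb": Lang("Norwegian Bokmål", "North Germanic", ["gmq-mno", "da"]),
    "bho": Lang("Bhojpuri", "Bihari"),
    "awa": Lang("Awadhi", "Eastern Hindi"),
    "hns": Lang("Caribbean Hindustani", "Indo-Aryan", ["bho", "awa"]),
}

print(flag_inconsistent(langs))

# Marking both as mixed, as proposed above, clears the complaints.
langs["nb"].mixed = True
langs["hns"].mixed = True
print(flag_inconsistent(langs))
```

Under these assumptions the first call reports both languages and the second reports none, which matches the proposal to treat them like Shetland.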

Would it be possible and/or desirable for any language with two or more immediate ancestors to be automatically classified as a mixed language? —Mahāgaja · talk 09:38, 22 January 2026 (UTC)

I think that is largely hypothetical, because these two cases are the only ones flagged by the data consistency checker (meaning all other cases are in fact considered mixed languages or are given a single parent). Benwing2 (talk) 03:31, 25 January 2026 (UTC)

I do want to see the two Norwegian varieties (nb and nn) merged into one code (no); after all, we do have Serbo-Croatian (sh) for Croatian (hr), Bosnian (bs), Serbian (sr) and Montenegrin (cnr), so it's possible with Norwegian. Adamnewwikipedianaccount (talk) 02:21, 25 February 2026 (UTC)

Atemble [ate] -> Mand

Again. ISO has a pending request to change to Mand:

Back in the 1920's, this language was named by linguist Evan Stanley by the village where it is spoken, calling it "Atemble." It was apparently named after what the Catholic missionaries called the village. In 2012, linguist Don Daniels did fieldwork in this language, which had only 8 speakers at the time. He found that speakers of this language called their language "Mand," after the word that means "no." (Daniels 2015:5)

Because languages in the Sogeram area generally do have a name by which they are known, I have decided, at the risk of further multiplying the number of language names in the Papuanist literature, to use the names by which speakers refer to their languages instead of the names by which Z’graggen originally referred to them. The names he used, with the exception of Gants, are village names that do not refer to a language or a kind of speech. Rather, when speakers wish to refer to a language, they often refer to it by means of a salient word in that language, often 'no.' Thus Mand, Nend, Manat, Apalɨ, Magɨ, and Mabɨŋ are named after the word for 'no' in each of those respective languages. (Daniels 2015:p.5)

Since "Atemble" has never been the autonym of this language, this language name should be changed to "Mand."

Benwing2 (talk) 05:45, 23 January 2026 (UTC)

Weak support; seemingly not as well-established as "Kursav" (though this is difficult to verify as searching "Mand" yields results for e.g. "demand" even with qualifiers), but used in e.g. Usher (2020) and Greenhill (2024). Hazarasp (parlement · werkis) 02:11, 28 January 2026 (UTC)

Kamberataro [kbv] -> Dera

Dera is the name used by Glottolog, ISO and Wikipedia. Kamberataro (Kamberatoro, Komberatoro) is a toponym where the Dera language is spoken (it's also spoken in several other locations). Benwing2 (talk) 20:27, 25 January 2026 (UTC)

Weak support; while "Dera" is found in most references (also Foley 2018), "Dla" is arguably more common due to the widely-cited The Menggwa Dla language of New Guinea (Sousa 2006) and should be adopted as an alternative. Sousa makes the following remark worth noting: "In Dutch documents the Dla tribe was referred as Dĕra. In Dla proper, dla [d(ɨ)ɺa] means ‘name’. In Dutch-Malay orthography, ĕ represents a schwa [ə]; the ĕ in Dĕra is a rendition of the epenthetic vowel in Dla proper, which ranges from [ɨ] to [ə] (§1.4.2). The r in Dera is a rendition of the liquid phoneme in Dla proper, which is usually realised as an alveolar lateral flap [ɺ]." Hazarasp (parlement · werkis) 03:03, 28 January 2026 (UTC)

Since there are several Glottolog-only languages I'm finding, I'm grouping them under a single header.

add Molet language?

Glottolog has a Molet language (Finisterre family) [28] listed as a sister of Asaro'o (mtv). They state:

Molet, an Trans New Guinea language of the Warup subgroup was discovered on survey. It is closely related to, but not intelligible with Asaro'o [mtv] ( Carter, John and Carter, Katie and Grummitt, John and MacKenzie, Bonnie and Masters, Janell 2012 ). It is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28. See also: Asaro'o [mtv].

Should we add this? If so, the code should presumably be ngf-mol. Benwing2 (talk) 04:25, 23 January 2026 (UTC)

Weak support: As hinted by Glottolog, the idea of Molet's separateness from and unintelligibility with Asaro'o was raised by Carter et al. 2012 on the grounds of it having only 65-67% lexical similarity to it, despite speakers viewing it as the "same language" and mutually intelligible with it (they also mention "Alalo" and "Groi" as alternative names, though the latter might be a confused reference to Grukoi, a language that was supposedly formerly spoken in the area); presumably Pawley-Hammarström 2018 treat Molet separately following them, and I tentatively concur. Hazarasp (parlement · werkis) 01:40, 28 January 2026 (UTC)

add Kor[a/u]pun-Bromley language under some name?

Glottolog has a Korapun-Bromley language [sic] [29] separate from Korupun-Sela [kpq]. Their comment is this:

Volker Heeschen 1978: 9 C. L. Voorhoeve 1975 (The recognition of the separateness from Korapun-Sela of the Korapun data of Bromley is due to Tim Usher)

I don't know whether Korapun or Korupun is more correct, and naming a language after a researcher seems a bit strange, but this does seem to be a separate language. Benwing2 (talk) 04:59, 23 January 2026 (UTC)

add Bai-Maclay language??

Glottolog has an extinct Bai-Maclay language [30] not in ISO. Wikipedia mentions it under Dumun language:

Dumun is reported to go by the name Bai, but evidently this is a distinct (though related) language, or at least a variety called Bai recorded by Maclay was distinct.

Glottolog's comment is merely a reference to Malcolm Ross 2019 "A fragment of Papua New Guinea philology. Language and Linguistics in Melanesia 37. 42-60." Benwing2 (talk) 06:12, 23 January 2026 (UTC)

In addition to Mandobo Atas and Mandobo Bawah (which I've previously proposed renaming to Upper Mandobo and Lower Mandobo), Glottolog lists a Kokenop Mandobo language ([31]) with three dialects (Agayop, Bukit Kokenop and Kogonop; I assume Kogonop is an alternative spelling of Kokenop). They give three references but no specific comments:

van den Heuvel, Wilco and Sebastian Fedden 2014: Greater Awyu and Greater Ok: inheritance or contact?. Oceanic Linguistics 53(1). 1-35.
Lourens de Vries and Ruth Wester and Wilco van den Heuvel 2012: The Greater Awyu language family of West Papua. In Harald Hammarström and Wilco van den Heuvel (eds.), History, contact and classification of Papuan languages, 269-312. Port Moresby: Linguistic Society of Papua New Guinea.
Hong-Tae Jang 2003: Survey Report on Languages of Southeastern Foothills in Papua in Papua Merauke Regency of Papua, Indonesia. 3rd Draft. 53pp.

Benwing2 (talk) 01:23, 24 January 2026 (UTC)

Glottolog lists an extinct Sar language of Indonesia [32] in the Timor-Alor-Pantar family. It seems closely related to Teiwa. Glottolog gives the following references:

Klamer, Marian. 2010. A Grammar of Teiwa. (Mouton Grammar Library, 49.) Berlin: Mouton de Gruyter. xviii+540pp. doi: 10.1515/9783110226072.
Kaiping, Gereon A. & Marian Klamer. 2022. The dialect chain of the Timor-Alor-Pantar language family: A new analysis using systematic Bayesian phylogenetics. Language Dynamics and Change 12. 274-326.

Benwing2 (talk) 04:23, 24 January 2026 (UTC)

Glottolog gives this language [33], which seems simply missing in ISO:

Awiakay, with its offshoot Karimba in Imboin, is an Arafundi language not intelligible to any of the other Arafundi lects ( Darja Hoenigman 2007: 132-144 ).

It's also mentioned in Campbell, Lyle and Lee, Nala Huiying and Okura, Eve and Simpson, Sean and Ueki, Kaori 2022 (The Catalogue of Endangered Languages). Benwing2 (talk) 04:45, 24 January 2026 (UTC)Reply

Another uncoded Arafundi language [34], sourced to:

Haberland, Eike. 1966. Zur Ethnographie der Alfendio-Region (Südlicher Sepik-Distrikt, Neuguinea). Jahrbuch des Museums für Völkerkunde zu Leipzig XXIII. 33-67.
Kassell, Alison, Bonnie MacKenzie & Margaret Potter. 2017. Three Arafundi Languages: A Sociolinguistic Profile of Andai, Nanubae, and Tapei. (SIL Electronic Survey Reports 2017-003.) Dallas, Texas: SIL International. 60pp.

Benwing2 (talk) 04:48, 24 January 2026 (UTC)

A pair of Sepik Hill languages, both nearly extinct [35]:

The Sepik Hill language Nigilu is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28. It is not intelligible to the varieties subsumed under Bahinemo [bjh] or Berinomo [bit] ( Dye, T. Wayne and Dye, Sally Folger 2012: 38 ), nor to Wagu. See also: Bahinemo [bjh], Berinomo [bit]. Benwing2 (talk) 05:35, 24 January 2026 (UTC)Reply

The Sepik Hill language Wagu is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 ( Wayne T. Dye 1990 ). It is not intelligible to the varieties subsumed under Bahinemo [bjh] or Berinomo [bit] ( Dye, T. Wayne and Dye, Sally Folger 2012: 27-29 , 38, Eike Haberland and Siegfried Seyfarth 1974: 28 ), nor to Nigilu. See also: Bahinemo [bjh], Berinomo [bit]. Benwing2 (talk) 05:37, 24 January 2026 (UTC)Reply

A Sko/Skou language [36]:

Leitre, a Skou language spoken in to the east of Vanimo on the north coast of Papua New Guinea, is a distinct language from Vanimo [vam] ( Mark Donohue 2002 , Mark Donohue and Melissa Crowther 2005 , Mark Donohue and San Roque, Lila 2002: 6-7 and p.c. Matthew Dryer 2022) and missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28. I wish to thank Matthew Dryer for bringing this language to my attention. See also: Vanimo [vam].

Benwing2 (talk) 06:26, 24 January 2026 (UTC)

Strong support (as paa-lei?); treated in most sources (e.g. Foley 2018:399, Usher 2020) as a distinct language, which appears justified; from a quick glance at Donohue 2004, it doesn't look closer to Vanimo than Vanimo is to Wutung (Glottolog actually makes Wutung and Vanimo more closely related by grouping them into "West Coast Skouic", though this doesn't appear to be followed elsewhere). Hazarasp (parlement · werkis) 05:13, 28 January 2026 (UTC)

Done as this is a glaring oversight by Ethnologue/ISO. Hazarasp (parlement · werkis) 06:58, 9 February 2026 (UTC)

Per Glottolog [37]:

Avaipa, a South Bougainville language not intelligible to neighbouring languages ( Jason Brown and Melissa Irvine 2021 ) is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28.

Per Glottolog [38]:

Mansim/Borai, a language related to Hatam on the northeastern Bird's Head of Indonesian Papua, is presented as a different language (rather than dialect) from Hatam in Ger P. Reesink 2002: 304-305 and comparisons of old wordlists (e.g. von der Gabelentz, Georg and Meyer, Adolf Bernard 1882 ) readily confirm this difference. Mansim has been assumed to be extinct ( Ger P. Reesink 2002 ), but persistent rumours in the Manokwari area suggest that there are still as many as 50 senior speakers (own field work, 2010). The existence of remaining speakers would also be consistent with the last known published population survey of half a century ago ( A. E. M. J. Pans 1960 ). Mansim is missing from E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28.

Wikipedia has an entry on this language (Mansim language). Hatam appears to be an isolate; with Mansim, it would form a small family (called Hatam-Mansim by Glottolog). Benwing2 (talk) 04:30, 25 January 2026 (UTC)Reply

Weak support for Avaipa. We again have a situation where a potential new language is detailed in basically one linguistic source (Brown-Irvine 2021) and speakers identify it as a variety of a related language (Simeku) while a researcher (though in this case a distinct one: [39]) denies its mutual intelligibility, and again following this source seems judicious until further research is performed. Hazarasp (parlement · werkis) 08:48, 28 January 2026 (UTC)Reply

Support for Mansim; aside from the evidence adduced by Glottolog, it appears to be treated as a distinct language in most recent sources (Arnold 2023, Heuvel 2006, Holton-Klamer 2018, Vossen 2016, and Usher 2020 as "Moi Brai"). Since Gasser 2020 mentions the issue of whether two varieties are dialects of the same language or different languages, as with Hatam and Mansim, or Manikion and Sougb, she evidently believes the separateness of Mansim isn't so evident. Hazarasp (parlement · werkis) 09:24, 28 January 2026 (UTC)

Per Glottolog [40]: Another extinct Papuan language, in the Yam family. Closest to [nmx] "Nama (Papua New Guinea)", which we don't yet have in Wiktionary. Glottolog's only comment/reference is:

Evans, Nicholas, Wayan Arka, Matthew Carroll, Yun Jung Choi, Christian Döhler, Volker Gast, Eri Kashima, Emil Mittag, Bruno Olsson, Kyla Quinn, Dineke Schokkin, Philip Tama, Charlotte van Tongeren & Jeff Siegel. 2018. The Languages of Southern New Guinea. In Bill Palmer (ed.), Papuan Languages and Linguistics, 641-894. Berlin: Mouton.

So I suspect it won't be easy to find info on this language. Benwing2 (talk) 03:03, 26 January 2026 (UTC)Reply

aside about nmx

Interesting, not only do we not have nmx (as mentioned above), it also wasn't on my list of codes that we're missing and haven't discussed/documented... because it's mentioned on WT:LT as undiscussed but potentially part of a Nambu language (which lacks a single code) along with "nkm (Namat), ncm (Nambo), mxw (Namo, Dorro), nex (Neme), nqn (Nen)", all of which have codes, just not nmx yet... because it would've had a naming conflict (or I and everyone else forgot to add it after [41]), but that is not an issue anymore AFAICT, so it looks like we should add nmx. And classify it and the other aforementioned members into the Nambu languages family. - -sche (discuss) 02:50, 28 January 2026 (UTC)Reply

(OK, nmx has been dealt with. - -sche (discuss) 07:09, 30 January 2026 (UTC))

Per Glottolog [42]: A moribund Papuan language in the Yam family. Closest to [mxw] "Namo" (the Yam family has separate languages Nama, Namat, Nambo, Namo, Neme, Nen and Len; what a terrible confusion). Glottolog's comment on subclassification is merely a reference (same as for Dre):

However, unlike for Dre, Glottolog provides 6 other references, one of which (Gore, 2023) is even titled "A phonological description of Nä and Len: Two undocumented Yam languages of Papua New Guinea", which sounds promising. Benwing2 (talk) 03:24, 26 January 2026 (UTC)Reply

Per Glottolog [43]:

Amam is a Golialan language of Papua New Guinea. Its territory falls under the Weri [wer] entry in E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 but it is impossible that they are the same/intelligible languages given the data of Aki, Mambu and Pennington, Ryan 2013 . Thus, Amam should ideally have its own separate entry as a language related to Weri [wer]. See also: Weri [wer].

The one reference given is that named above:

Aki, Mambu & Ryan Pennington. 2013. Tentative Grammar Description for the Amam Language. Ms. 67pp.

Benwing2 (talk) 05:03, 26 January 2026 (UTC)

Per Glottolog [44]. Extinct. The comment on the separateness of the language from the Anim language Foia-Foia/Minanibai is merely a reference to Usher, Timothy & Edgar Suter. 2015. The Anim Languages of Southern New Guinea. Oceanic Linguistics 54(1). 110-142. But they cite two other references, one a "Vocabulary of Mahigi".

Per Glottolog [45]. Extinct, closest to Bauwaki [bwk]. Their comment is:

O'oku, a presumed extinct language, Mailuan or Yareban affiliation, is missing ( Sidney H. Ray 1938 , W. M. Strong 1911 ).

The two references are Ray, Sidney H. 1938. The Languages of the Eastern and South-Eastern Division of Papua. Journal of the Royal Anthropological Institute of Great Britain and Ireland 68. 153-208. and Strong, W. M. 1911. Notes on the languages of the north-eastern and adjoining divisions. Annual Report for the year ending 30th June 1911. 203-217.

Per Glottolog [46]:

Binahari Ma is arguably a distinct language from Binahari-Neme so ideally the Binahari [bxz] in E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 should be split ( Tom Dutton 1999: 93 ). See also: Binahari-Neme [bxz].

The link above is Dutton, Tom. 1999. From Pots to People: Fine-tuning the prehistory of Mailu Island and Neighbouring Coast, South-East Papua New Guinea. In Roger M. Blench and Matthew Spriggs (eds.), Archaeology and Language, III, 90-108. London & New York: Routledge. This seems rather dubious, but I'm including it because Glottolog considers it a separate language.

There is some serious confusion going on with the name "Arandai" and the language code jbj. Wikipedia says Arandai is a small family consisting of two languages, jbj "Dombano" and bzp "Kemberano". ISO 639-3 disagrees, calling jbj "Arandai", which we follow. Glottolog calls jbj "Dombano" like Wikipedia and says that Arandai is a dialect of Dombano, not the other way around. Wikipedia's Arandai family grouping jbj and bzp is called "Kemberanic" by Glottolog. To make matters worse:

Glottolog's code for Dombano is aran1237 and their code for Arandai is domb1240, showing that they originally adopted a naming scheme similar to ISO and later reversed the meaning of the two terms.
Wikipedia says Ethnologue (hence ISO) is all confused in the head:
The treatment at Ethnologue appears to be inconsistent. ISO codes are assigned to two languages, "Arandai" and "Kemberano", the latter of which is also called Arandai. They are said to have 85% lexical similarity, which would make them dialects of one language. However, the two dialects given for Arandai, also called Kemberano and Arandai (a.k.a. Tomu and Dombano), are said to have only 71% lexical similarity, making them different languages. Dialects of Kemberano (Weriagar) are listed as Weriagar (Kemberano) and Barau.

I'm not completely sure how to straighten this out, but I would suggest we rename jbj -> Dombano because the meaning of Arandai seems so ambiguous, whereas the meaning of Kemberano [bzp] seems straightforward. Benwing2 (talk) 03:50, 24 January 2026 (UTC)Reply

We have several dozen languages whose parent is listed as "paa". There are at least those listed as direct subcategories of Category:Papuan languages, plus any others for which a language category has not yet been created. (Unfortunately the family tree cannot be seen directly from a family, only from the proto-language of the family; this is an oversight I am planning on fixing.) Since "Papuan languages" is not a clade, it seems anything but kosher to list it as a family. I suggest (a) declassifying all such languages (likewise for any other family codes that aren't families), and (b) adding a data consistency check (and maybe eventually an error) that flags any languages whose parent is a pseudo-family that's not a clade, except maybe in certain cases where we've deemed this acceptable (e.g. constructed languages, creoles/pidgins, substrates). Benwing2 (talk) 04:07, 24 January 2026 (UTC)Reply
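
As a rough illustration of check (b), here is a minimal Python sketch (the real check would presumably live in Module:data consistency check, in Lua). The function name, the set of non-clade family codes and the set of tolerated exceptions are all assumptions made for the example, and the sample language codes are made up.

```python
from typing import Dict, List, Optional, Set

# Assumed list of family codes that are groupings of convenience, not clades.
NON_CLADE_FAMILIES: Set[str] = {"paa", "nai", "sai", "sgn", "crp", "art"}

# Assumed exceptions where a pseudo-family parent is deemed acceptable
# (e.g. creoles/pidgins and constructed languages).
ALLOWED_PSEUDO_PARENTS: Set[str] = {"crp", "art"}


def find_pseudo_family_children(parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted language codes whose parent is a non-clade
    pseudo-family that is not on the allowed-exceptions list."""
    flagged = [
        code
        for code, parent in parents.items()
        if parent in NON_CLADE_FAMILIES and parent not in ALLOWED_PSEUDO_PARENTS
    ]
    return sorted(flagged)


# Made-up sample data: language code -> parent family code (None = no parent).
sample_parents = {
    "xx-one": "paa",    # flagged: "Papuan" is not a clade
    "xx-two": "crp",    # tolerated exception
    "xx-three": "ngf",  # parent assumed to be a genuine family
    "xx-four": None,    # already declassified
    "xx-five": "sgn",   # flagged unless "sgn" is added to the exceptions
}

print(find_pseudo_family_children(sample_parents))
```

Keeping the exceptions in a separate set means a case such as "sgn" can be moved from warning to accepted without touching the check itself.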

Indeed, checking for any other geographic non-genetic families (the ones I could think of to check were Caucasian, Australian, "nai" and "sai"), it seems we don't use them as families or categories. Ideally, we should try to assign any Papuan languages that belong to smaller, genetic families to those families, and ... categorize any isolates as isolates, I guess.
We do use some non-geographic non-genetic families as families, e.g. "sgn", which may be OK for now unless there is a better way to fill Category:All sign languages (which should probably be renamed to be consistent with other families), though when possible we should try to subcategorize there too when languages are related or belong to other categories, e.g. Solomon Islands Sign Language is a sort of mixed language or creole where one of the ancestors is Auslan, and various others could go in the "Language isolates" category unless we want to put them in a separate "Sign language isolates" category. (I also wonder whether we should have a qfa-iso, crp, etc style not-a-family code for "village sign languages" or whether the borders of that would be too murky.) - -sche (discuss) 18:22, 24 January 2026 (UTC)Reply

See Wiktionary:Language_treatment_requests#remaining_languages_coded_as_paa, where I went through the remaining languages coded as family "paa". Benwing2 (talk) 05:06, 26 January 2026 (UTC)Reply

BTW, on the subject of unusual families we have, we have Category:Not a family languages containing Category:Undetermined language which we could possibly special-case to go in Category:Unclassified languages instead; would that be desirable? And we have Category:Jewish languages as a manual category containing Hebrew, Ladino and Yiddish (and missing various things like Judeo-Tat and Category:Jewish Babylonian Aramaic); do we want this? - -sche (discuss) 18:27, 24 January 2026 (UTC)Reply

I agree with your suggestion above about putting Undetermined in Unclassified languages. I don't know much about the state of sign language cladistics except that I know that ASL is descended from an older version of French Sign Language, and there are presumably other sign languages descended from ASL. I think it's fine to create sign language families where known and leave the rest under sgn. I think creating a "village sign languages" category might be rather murky esp. since it may not be known in several cases how extensively a given sign language is used. I think Category:Jewish languages is OK because the phenomenon of special Jewish-specific languages has occurred quite a lot, but it should contain all such languages and language varieties (whether or not full languages), not just the obvious ones. As for recategorizing paa languages, I'm thinking of going through them and proposing a list of missing families (which hopefully you can vet fairly quickly) and putting all the rest under qfa-unc (or just declassifying them?) unless their isolate status is particularly clear, because from what I can tell, a lot of putative Papuan "isolates" are that way merely through lack of data. I just discovered that Glottolog does make a distinction between isolate and unclassifiable (see https://glottolog.org/resource/languoid/id/uncl1493) but I suspect the latter only has particularly clear cases. Benwing2 (talk) 19:13, 24 January 2026 (UTC)

On further consideration, some ==Undetermined== words are generally classified, e.g. the prevailing opinion is that the words of the Buyla inscription are Turkic, so it might be suboptimal to call them unclassified. "Category:Not a family languages" still seems a bit silly. Maybe we special-case it to be near the top of Category:All languages and not in Category:Languages by family? And/or maybe we create specific codes for specific undetermined languages (that we're reasonably certain aren't the same), e.g. "Buyla inscription", "Linear A"? Actually, ISO did the latter, as we're discussing elsewhere on this page. But maybe that's too weird. - -sche (discuss) 23:30, 24 January 2026 (UTC)Reply

I think adding things like "Buyla inscription language" is reasonable; we already have various substrate languages as well as Xiongnu and such, and IMO it's better than dumping them all under "Undetermined". (This would suggest that when merging Linear A and Minoan we should use "Linear A language" at least as the name; although per Wikipedia the "Minoan language" also includes Cretan hieroglyphs.) Benwing2 (talk) 00:07, 25 January 2026 (UTC)Reply

BTW, whatever Category:Unclassified languages has that causes it to be sorted as * near the top of Category:Languages by family, we probably want to add to Category:Unclassifiable languages. - -sche (discuss) 23:30, 24 January 2026 (UTC)Reply

ok will do. Benwing2 (talk) 23:45, 24 January 2026 (UTC)

Glottolog claims there is no such Anasi language, and merges them into Nisa-Anasi.

Anasi [bpo] is listed in E16/E17/E18/E19/E20/E21/E22/E23/E24/E25/E26/E27/E28 as a separate language on the left-hand side of the Lower Mamberamo, but the most detailed survey of this area ( Jones, Larry B. 1987: 3 , cf. Detiger, J. G. 1935 ) finds Nisa [njs] be spoken at Anasi. The Anasi entry was, in the first place, based on the incomplete information -- just the name and location -- in Galis, Klaas Wilhelm 1955 . See also: Nisa-Anasi [njs].

Benwing2 (talk) 05:19, 24 January 2026 (UTC)

This is obscure. The only thing I can find is this, which in a couple places says "Berdasarkan hasil penghitungan dialektometri, isolek Nobuk (Kwerba) merupakan sebuah bahasa dengan persentase perbedaan berkisar 91% - 100% jika dibandingkan dengan bahasa de sekitarnya, misalnya bahasa Armati Sarma, Dubu (Tepi), Anasi, Baedate (Nisa), Airo," and "[...] bahasa dengan persentase perbedaan berkisar 92,25% - 100% jika dibandingkan dengan bahasa lain de sekitarnya, misalnya bahasa Manua (Eritai), Airo, Anasi, Armati, Sarma, Towe, dan Baedate (Nisa).", which seems to just be saying that Nobuk and then something else snippet view won't let me see are unintelligible to speakers of those other lects, without saying much about whether they understand each other. (About Baedate/Nisa it says "Bahasa Baedate (Nisa) dituturkan oleh suku Baedate, Kampung Bariwaro, Distrik Waropen Atas, Kabupaten Mamberamo Raya, Provinsi Papua. Sebelah timur Kampung Bariwaro adalah Kampung Bensor, sebelah barat adalah ..."; it has an entry for Anasi but I can't find it in snippet view.) Wikipedia goes along with Glottolog. Abstain. - -sche (discuss) 00:01, 25 January 2026 (UTC)

We have "Literary Chinese" as an etymology-only language, lzh-lit, in Module:etymology languages/data. We also have "Literary Chinese" as a full language, lzh, in Module:languages/data/3/l. Pick (or rename) one. This has been being detected by Module:data consistency check as an error for a while now. - -sche (discuss) 17:18, 24 January 2026 (UTC)Reply
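
For readers unfamiliar with the check, the following is a minimal Python sketch of how such a name collision can be detected. It is not the actual logic of Module:data consistency check (which is written in Lua); the function name and the "xx-demo" entry are invented for the example, and only the two "Literary Chinese" entries come from this discussion.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_names(
    full_languages: Dict[str, str], etym_only: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each canonical name shared by more than one code to those codes."""
    codes_by_name = defaultdict(list)
    for table in (full_languages, etym_only):
        for code, name in table.items():
            codes_by_name[name].append(code)
    # Keep only names claimed by two or more codes.
    return {
        name: sorted(codes)
        for name, codes in codes_by_name.items()
        if len(codes) > 1
    }


# Sample data: code -> canonical name. "xx-demo" is a made-up placeholder.
full_languages = {"lzh": "Literary Chinese", "xx-demo": "Demo language"}
etym_only = {"lzh-lit": "Literary Chinese"}

print(find_duplicate_names(full_languages, etym_only))
```

Renaming or removing either code makes the returned mapping empty, which is the condition the check is waiting for.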

I think I may have been responsible for this, when I was trying to clean up the ad-hoc lect codes used in {{zh-x}} and convert them to etym varieties (see Module:zh-usex/data). This never got finished because the main Chinese community editors couldn't agree on how to handle the multitude of literary Chinese varieties (in the broad sense). I can't find the relevant discussions on this page any more; I think you may have archived them, but they stagnated rather than coming to a conclusion. Pinging the Chinese workgroup: (Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): . Benwing2 (talk) 23:59, 24 January 2026 (UTC)Reply

Ah, that discussion is Wiktionary:Language treatment requests/Archives/2020-24#etymology codes for remaining Chinese varieties in Module:zh-usex/data. - -sche (discuss) 00:22, 25 January 2026 (UTC)

Speaking of which, shouldn't we merge this into Classical Chinese? At least in common parlance, the terms "literary Chinese" and "classical Chinese" are used to refer to the same thing. The dog2 (talk) 05:07, 25 January 2026 (UTC)Reply

Using lzh in {{zh-x}} outputs "Classical Chinese", while lzh-lit outputs "Literary Chinese".

I see four options to resolve the issue at hand:

1. Remove lzh-lit and leave lzh alone (Literary Chinese).
2. Remove lzh-lit and rename lzh to Classical Chinese in keeping with Wikipedia and Ethnologue.
3. Remove lzh-lit, leave lzh alone (Literary Chinese) and create lzh-cla for a Classical Chinese.
4. Leave lzh-lit alone and rename lzh to Classical Chinese.

If lots of entries distinguished lzh and lzh-lit, I would favour option 3 or 4. However, only 2 entries use lzh-lit. This, combined with The dog2's comment, suggests we don't need the distinction at all. The question then is whether to pick option 1 or option 2. Presumably we don't want this lect called one thing in etymologies and another thing in usexes. The choice between options 1 and 2 seems like a matter for zh editors to decide. This, that and the other (talk) 08:05, 25 January 2026 (UTC)Reply

Although only two entries use "lzh-lit", it looks like Module:zh-usex/data considers "CL-L" an equivalent code, and that appears to be used 600 times. Nonetheless, if no-one else expresses an opinion soon, I am inclined to change the two uses of "lzh-lit" to "lzh", and remove "lzh-lit" as an ety-only code in favour of lzh. Will this break anything? - -sche (discuss) 03:08, 28 January 2026 (UTC)Reply

Hmmm ... the reason I introduced lzh-lit in the first place was so that all the ad-hoc codes in {{zh-x}} could be converted to actual language codes. If you remove lzh-lit, then CL-L won't have any corresponding language code. If we believe that CL (Classical Chinese) and CL-L (Literary Chinese) are different, we should maintain both codes and maybe rename lzh to "Classical Chinese"; otherwise we should merge the two in {{zh-x}}. It seems the Chinese editors couldn't agree on whether CL and CL-L referred to the same thing, so I'd like to try to straighten that out before simply removing lzh-lit. Benwing2 (talk) 04:02, 28 January 2026 (UTC)Reply

And I should add, finally convert {{zh-x}} to use language codes instead of ad-hoc codes. Benwing2 (talk) 04:03, 28 January 2026 (UTC)

If we want to keep the current distribution of names, such that in any place where one name or the other is displayed in an entry, that display will remain unchanged and available for a future discussion to make decisions about changing or merging, then something like TTO's proposal 3 seems like the way to go: to handle the places that want to display "Classical Chinese", create a code like "lzh-cla" = "Classical Chinese", and take any cases which are currently using {{zh-x}} to replace the canonical name of "lzh" ("Literary Chinese") with a non-canonical name "Classical Chinese", and switch those to just use "lzh-cla"; then to handle the places that want to display "Literary Chinese", make "CL-L" just = "lzh" = "Literary Chinese" (instead of specialcasing "lzh" to have a different name in usexes than in etymologies). - -sche (discuss) 04:24, 28 January 2026 (UTC)Reply

Looking through insource:"lzh", the uses are in etymologies, inter-project links, etc... are there also uses in {{zh-x}}? If not, it seems like it would be simple to just (create "lzh-cla" and) reassign "CL" => "lzh-cla", and then let "CL-L" = "lzh", leaving any discussion of whether to merge "Classical" and "Literary" to another time. But the fact that this was not done in the first place, and instead "lzh-lit" was created, leads me to wonder if I'm missing something...? - -sche (discuss) 19:44, 28 January 2026 (UTC)Reply

I'm pretty sure I added lzh-lit, and it was done simply to preserve the existing structure of {{zh-x}}, with CL-L ("Literary Chinese") nested under CL ("Classical Chinese"). From reading Wikipedia's entry on Classical Chinese, it appears that Classical Chinese refers approximately to a written language derived from spoken Old Chinese, which was maintained as a literary language long after it ceased being spoken (indeed, up through the early 20th century). The term "Literary Chinese" refers to Classical Chinese composed during the period when Old Chinese was no longer spoken (approximately the 2nd century AD and later). In this respect it reminds me a lot of Classical Latin (a written language based on the spoken language of c. 50 BC) vs. Medieval Latin (a literary language developed from Classical Latin and used at a time when Latin was no longer spoken natively). The main difference is that the term "Classical Chinese" ambiguously stands for either the entire period of literary use of the language (from the 5th century BC to the 20th century AD, corresponding approximately to our use of "Latin"), or just the period when the spoken language and the written language were close enough that the written language approximated the spoken language (from the 5th century BC to the 2nd century AD, much like our use of "Classical Latin" and "Late Latin"). I am guessing that the use of "Classical Chinese" (CL) in {{zh-x}} corresponds to the narrower use of the term and generally refers to works written between the 5th century BC and the 2nd century AD, while Literary Chinese (CL-L) in {{zh-x}} refers to later works that imitated the style, grammar, vocabulary of the earlier works. 
If that's the case, I'd suggest either using the equations CL = lzh = "Classical Chinese" and CL-L = "lzh-lit" = "Literary Chinese" (and renaming "lzh" to "Classical Chinese") or using the equations CL = lzh-cla = "Classical Chinese proper" and CL-L = lzh-lit = "Literary Chinese", keeping lzh for the overarching genre over the entire period and renaming it to "Classical Chinese". Either way, I'd suggest keeping lzh-lit as "Literary Chinese" and renaming lzh to "Classical Chinese".

Does this make sense to you? Benwing2 (talk) 20:11, 28 January 2026 (UTC)

If we rename "lzh" (actually rename it, changing its canonical name in Module:languages, instead of just having Module:zh-usex/data output a different-than-canonical name for it in usexes), that will silently change any uses of it in etymologies etc, so anywhere that someone wrote "from Literary Chinese foo" or "compare Literary Chinese foo" (e.g. August Thearch, guest star) will now display "...Classical Chinese foo"; by my quick count, this would affect about 381 entries (372 derivations, 9 mentions of cognates).
If that's OK (and I certainly don't personally have any reason to object to it), you could do as you suggest.
I will remark that if we're keeping the existence of a distinction between "Literary Chinese" and "Classical Chinese" (keeping both of these as distinct lects that something could derive from), then it seems at least a little bit odd to silently switcheroo etymologies that mention one to now mention the other instead, which is why I was suggesting something to let each current use of "Literary Chinese" or "Classical Chinese" continue displaying as it has been unchanged (changing the codes, rather than the displayed forms), but again, I don't object to switching 'em up. (I suppose, if desired, a rename of "lzh" to "Classical Chinese" could even be accompanied by a replacement of "lzh"→"lzh-lit" in etymologies, so that the display is unchanged, and people have the option to later simplify the etymologies to just "lzh" = "Classical Chinese" at their leisure in any cases where they deem that correct.) - -sche (discuss) 21:11, 28 January 2026 (UTC)Reply
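
To make the "lzh"→"lzh-lit" replacement idea concrete, here is a minimal Python sketch of the kind of text substitution a bot might perform. It is only an illustration under stated assumptions: it covers just {{der}}, {{bor}} and {{inh}} (source language as the second parameter) and {{cog}} (language as the first parameter), ignores template aliases and nested templates, and the function name is invented.

```python
import re

# Templates whose source language is the 2nd positional parameter (assumed subset).
_SOURCE_SECOND = re.compile(
    r"(\{\{\s*(?:der|bor|inh)\s*\|[^|{}]*\|)lzh(?=\s*[|}])"
)
# Templates whose language is the 1st positional parameter (assumed subset).
_SOURCE_FIRST = re.compile(r"(\{\{\s*cog\s*\|)lzh(?=\s*[|}])")


def retarget_lzh(wikitext: str, new_code: str = "lzh-lit") -> str:
    """Replace the bare code lzh with new_code in the handled templates.

    Codes that merely start with "lzh" (such as lzh-lit) are left alone,
    because the lookahead requires "|" or "}" right after the code.
    """
    wikitext = _SOURCE_SECOND.sub(lambda m: m.group(1) + new_code, wikitext)
    return _SOURCE_FIRST.sub(lambda m: m.group(1) + new_code, wikitext)


print(retarget_lzh("From {{der|en|lzh|foo}}; compare {{cog|lzh|bar}}."))
```

With a substitution of this kind run before the rename, the affected etymologies would keep displaying "Literary Chinese".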

OK, that makes sense, I can rename lzh to "Classical Chinese" in Module:languages/data/3/l and change references to lzh to lzh-lit if necessary. Pinging people who took part in the previous discussion about adding etym-only codes for {{zh-x}} codes: @The dog2 @Justinrleung @Fish bowl @Wpi @ND381 @Theknightwho We are proposing:

1. renaming the code lzh to "Classical Chinese" in the language modules (Module:languages/data/3/l), corresponding to code CL in {{zh-x}};
2. keeping the existing etym-only code lzh-lit "Literary Chinese" (code CL-L in {{zh-x}}, which I'm guessing refers to later Literary Chinese imitating the original Classical works, approximately from the 2nd century AD on);
3. possibly adding an etym-only code lzh-cla "Classical Chinese proper" to cover the period of the original Classical works (5th century BC - 2nd century AD?; if added, we can consider using this for original Classical works currently tagged with code CL in {{zh-x}});
4. possibly renaming uses of the code lzh in etymologies to use lzh-lit, so that the etymologies continue to read "From Literary Chinese Foo" instead of silently switching to read "From Classical Chinese Foo".

(1) is the only change that I think really needs to be done, because currently both lzh and lzh-lit are named "Literary Chinese". (3) and/or (4) are independent of each other and can be done as necessary if the Chinese editors think this is a good idea.

Any sort of consensus on this should unblock the rename of ad-hoc {{zh-x}} codes to standard language codes. Benwing2 (talk) 22:13, 28 January 2026 (UTC)Reply

I'm fine with either way as long as there's proper distinction between the two codes. I would prefer making more splits but I digress. My impression is that most borrowings occur during the later (perhaps Western Han period at the earliest) "Literary Chinese" period (or from it), so it would be more ideal to keep that wording for now.

On the topic of language codes for {{zh-x}}, the Classical, Literary, and WVC ones are wrongly named as Classical [Lect], Literary [Lect], and Written Vernacular [Lect] (where [Lect] includes Mandarin, Cantonese, Taishanese, etc.) in the language module but [Lect] here should always simply be "Chinese" (the names with [Lect] would refer to a totally different thing). Besides, one could probably make a distinction between all three of them per each lect (and the modern spoken language) but we seem to have used them interchangeably sometimes, so that's another thing to sort out before renaming the codes. – wpi (talk) 15:43, 29 January 2026 (UTC)

Just FYI @-sche I pinged @Wpi on Discord asking about their comment here, as I figured I might be able to resolve my questions faster that way (but so far no answer). I am hoping if I can come to understand what the confusion and issues are, I can come up with a good plan pretty quickly. I think so far it's agreed that plain lzh should be Classical Chinese and lzh-lit should be Literary Chinese, but there are some more things to work out. Benwing2 (talk) 03:32, 4 February 2026 (UTC)Reply

Thanks. What are your questions? I take wpi's comment "My impression is that most borrowings occur during the later (perhaps Western Han period at the earliest) "Literary Chinese" period (or from it), so it would be more ideal to keep that wording for now." as supporting your proposed change #4 (so that etymologies keep the wording "Literary Chinese"), and being consistent with your #2. (I am also fine with your proposals 1, 2, and 4, and have no opinion on the merits of 3, other than that I would prefer to avoid introducing "proper" into language and family names unless strictly necessary.) It seems like these changes can be done independently of #3 (and AFAICT independently of any issues with other zh-x labels), which can be put off and discussed at leisure...? - -sche (discuss) 23:45, 4 February 2026 (UTC)

My questions were about what "Written Vernacular Chinese" means, what the Mandarin, Cantonese and Taishanese varieties mean and how they should be named and coded, since @Wpi commented that the current names made no sense, and I couldn't figure out whether they were supporting my proposals. I wrote this:

Hi can you help me understand the distinction between the various zh-x lect codes?

you said onsite that "Written Vernacular Mandarin/Cantonese/Taishanese" is wrong and it should be "Written Vernacular Chinese"

but what is the difference between WVC, WVC-C and WVC-C-T?

should WVC-C be called "Cantonese Written Vernacular Chinese" or something?

currently we code WVC as cmn-wvc, WVC-C as yue-wvc and WVC-C-T as zhx-tai-wvc. Should instead we have zh-wvc, zh-wvc-yue and zh-wvc-tai or something?

Eventually @Justinrleung responded:

WVC shows pinyin, WVC-C shows jyutping, and WVC-C-T is supposed to show Taishanese romanization. These are supposed to be regional flavours of "Standard Written Chinese". I'm not exactly sure how WVC-C differs from C-LIT, though C-LIT is probably a misleading (but more widely used) label

And a discussion ensued about when the cutoff for WVC is (apparently something like pre-1920). I just wrote the following:

OK, then, if "Written Vernacular Chinese" represents pre-"modern era" Standard Written Chinese, do constructs like "Cantonese Written Vernacular Chinese" like I proposed above make sense? I'd like to proceed as follows:

Rename lzh to "Classical Chinese" to eliminate having two codes that both read "Literary Chinese", and probably change all occurrences of lzh in etymologies to lzh-lit so that the wording stays as "Literary Chinese" rather than switching to "Classical Chinese".
Fix the names and possibly codes of the zh-x-equivalent language codes to address things that are (obviously) wrong, like "Written Vernacular Cantonese".
Do a bot run to convert all zh-x codes to proper language codes without any loss of information.
Figure out which language codes can be merged, and merge them.

I've postponed the merger of duplicate codes to the end because I assume it's the most contentious.

Does this make sense?

So, basically I agree that we can do the zh-x code conversion after eliminating the duplicate "Literary Chinese" issue, but I'm trying to avoid things getting bogged down in a rabbit-hole "we need to do something but we're not exactly sure what" type of discussion by proposing doing things in steps and letting the contentious parts be done last. Benwing2 (talk) 02:02, 5 February 2026 (UTC)Reply

OK, I have done step (1) in the four-step procedure laid out under "I'd like to proceed as follows" (the last such procedure listed). I left lzh as plain lzh whenever it was clear that Classical Chinese of < 200 AD was indeed being referred to, and in all other cases converted to lzh-lit, and added an {{attn|lzh|...}} note indicating that this needs to be reviewed except in cases where it was clear that Literary Chinese of >= 200 AD was being referred to (e.g. Tang, Song, Qing etc. dynasty poets, Buddhist works, translations of Mongolian works, etc.). Hopefully this didn't overconvert any categories; if so feel free to rename them back to "Literary Chinese". Benwing2 (talk) 05:25, 11 February 2026 (UTC)Reply
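
For illustration only, the decision rule described in step (1) can be sketched as below. This is a minimal sketch, not the actual bot code: the function name `convert_lzh`, the use of a single approximate year, and the `None` convention for "unclear" are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Boundary between Classical and Literary Chinese as described above (200 AD).
CUTOFF_YEAR = 200


def convert_lzh(year: Optional[int]) -> Tuple[str, bool]:
    """Return (language code, needs_review) for a reference currently coded lzh.

    `year` is the approximate date of the text being referred to (negative
    for BC), or None when the period is not clear from context.
    Hypothetical helper; not part of any Wiktionary module.
    """
    if year is None:
        # Period unclear: convert to lzh-lit and flag for review,
        # corresponding to adding an {{attn|lzh|...}} note.
        return ("lzh-lit", True)
    if year < CUTOFF_YEAR:
        # Clearly Classical Chinese: keep plain lzh.
        return ("lzh", False)
    # Clearly Literary Chinese (e.g. Tang, Song, Qing): no review note.
    return ("lzh-lit", False)
```

For example, a Tang-era source (say, year 750) would come out as `lzh-lit` with no review flag, while an undated reference would come out as `lzh-lit` flagged for review.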

OK there is now an error with the category Category:Classical Chinese, which shouldn't exist but has 389 entries in it. I think they should all be moved to Category:Classical Chinese lemmas but I'm not completely sure. Pinging the Chinese workgroup (Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): for help. Benwing2 (talk) 05:28, 11 February 2026 (UTC)Reply

As things stand, Northern Saharan Berber (also called Mzab-Wargla Berber) is used as a very broad umbrella for a large number of Berber varieties. Although here on wiktionary it uses the language code mzb, normally reserved for Mzab Berber, it is here applied to all Northern Saharan Berber varieties, including South Oran and Figuig Berber, Gourara Berber, Tidikelt and Tuat Berber, Ouargli Berber, and Oued Righ Berber.

These varieties form a related cluster, but they are spread across a wide geographic area from Morocco to eastern Algeria. Given the diversity, and the substantial independent resources available for some varieties, it is more appropriate to treat some of them as separate languages.

In particular, Mzab Berber and Figuig Berber stand out. Both are frequently treated in the literature as distinct languages, and both have substantial independent resources, including grammars, dictionaries, and text collections.

Because of this, I propose that at least Mzab Berber and Figuig Berber be split off and assigned their own language codes. Lankdadank (talk) 17:19, 24 January 2026 (UTC)Reply

Berber languages have been the subject of several discussions linked on WT:LT trying (and perhaps failing) to strike a balance between over-splitting and over-lumping; the most recent was Wiktionary:Language treatment requests/Archives/2020-24#RFM discussion: August 2020–August 2021, which noted ISO's failure to encode Figuig and led to merging these varieties as Northern Saharan Berber on grounds of high mutual intelligibility. From your perspective, how often do the words in these languages overlap i.e. have the same spelling, or have predictable differences, vs how often are the spellings (and hence, how often would the dictionary entries here be) unpredictably different? - -sche (discuss) 17:36, 24 January 2026 (UTC)Reply

I wasn't aware that this had been discussed before, thanks for pointing that out. Most Zenati Berber languages, including the Northern Saharan varieties, have a high degree of mutual intelligibility. For reference, here is a brief comparison of some cognates:

"to walk"

Northern Saharan: Figuig: uyur; Mzab: iǧur; Ouargli: igur
Other Zenati: Tarifit: uyur

"man"

Northern Saharan: Figuig: argaz; Mzab: arǧaz; Ouargli: argaz
Other Zenati: Tarifit: aryaz, argaz

As you can see, there are definitely similarities, but also quite some variation. Two main factors suggest treating these varieties as separate languages:

Geographic separation: Figuig and Oued Righ, for example, are several hundred miles apart, and speakers do not consider themselves part of a single larger language.
Practical standardization issues: If Northern Saharan Berber is treated as a single language, it is unclear which variety should serve as the main reference for entries, and which as "dialectal variation".

Lankdadank (talk) 18:11, 24 January 2026 (UTC)Reply

Neither of these seems to me like a particularly strong argument for splitting. Generally we try to go by mutual intelligibility even if the languages are geographically separated or have no single standard (cf. English, Portuguese ...). English and Portuguese have been handled in a fairly random fashion, essentially whoever created the word first determined the form of the main entry, typically based on their own variety. That may not work here, so we could go by whichever form occurs most commonly or whichever is the most conservative form, either on a word-by-word basis or consistently for one variety; e.g. if one variety is clearly more conservative than the others I'd recommend using that one, otherwise maybe whichever one is spoken the most. Benwing2 (talk) 23:51, 24 January 2026 (UTC)Reply

I’m generally fine with that approach if there isn’t broader support for splitting. Mutual intelligibility is a tricky criterion to apply in the Berber context, though: taken on its own, it could easily lead to treating much of Northern Berber as a single dialect continuum rather than as distinct languages. That said, I agree that this also means we need to be cautious about oversplitting, and I understand now why it might be preferable to keep these varieties together. Also, I'd say the most spoken varieties by far are Figuig in Morocco and Mzab in Algeria. Lankdadank (talk) 00:11, 25 January 2026 (UTC)Reply

In 2020, ISO retired cca (Cauca) saying there is no evidence of it: "It is presumed to be from the Cauca Valley, but no such language is known, unless it's the undemonstrated Quimbaya, which is attested in about 10 words [...] all subsequent efforts to confirm the existence of such a language [as Cauca] have failed". I see we already have a code for Quimbaya, so nothing is lost by removing Cauca. (The code deprecation request actually goes on to impugn Quimbaya, saying "there is too little data to assert that the Quimbaya spoke a language different from all its neighbors (Hammarstrom 2015)", but since it's an extinct few-word wordlist-only language, it seems OK to me to keep that one to house the few words it has.) - -sche (discuss) 00:18, 25 January 2026 (UTC)Reply

@-sche As I'm going through Papuan languages I'm finding several that are in the "might be this, might be that, might be an isolate" category like Yele and Fasu. These are living languages, often with considerable research done on them, so it seems a bit strange to list them as unclassifiable; it's more that the languages are divergent so there isn't a consensus among researchers. Sometimes this lack of consensus goes on for a long time, e.g. whether the Wiyot and Yurok languages were related to the Algonquian languages was debated for decades before a conclusion was finally reached (yes they're related). Technically we *could* say they're unclassifiable at the present time due to insufficient research, but I feel maybe it would be better to have a qfa-unk ("Unknown affiliation languages") or qfa-noc ("No consensus languages"). Glottolog does have an "Unclassifiable languages" pseudo-family https://glottolog.org/resource/languoid/id/uncl1493, but it seems they only put languages that are unclassifiable due to lack of data (usually but not always extinct, e.g. they include Traveller Scottish as Unclassifiable although it's listed as threatened rather than extinct) instead of lack of consensus. Benwing2 (talk) 20:44, 25 January 2026 (UTC)Reply

I believe that people would not, in practice, understand and maintain whatever three-way distinction we might intend between "Unclassifiable languages", "Unclassified languages", and "Unknown affiliation languages" (in particular, the latter two sound like they cover the same ground).
One idea, which I did not suggest in the other discussion — because when we were only trying to distinguish "has not been classified" from "linguists think this cannot be classified", I felt "Unclassifiable" more clearly captured how the scope of the second one was different from the first one — is to rename the category that gets generated when a classification is omitted from our modules, from "CAT:Unclassified languages" to something more like ~"Languages with no family code set" (or if anyone can come up with a better name, please do). I think a three-way distinction between ~"Languages with no family code set" (where maybe its classification is perfectly well accepted, but no-one has gotten around to adding the code yet), "Unclassifiable languages" (where linguists think it cannot be classified, because there is too little data), and then maybe something like "Languages with disputed affiliation" or "Languages with debated affiliation" (wordsmithing / better ideas welcome!) might work...? What do you think? - -sche (discuss) 21:29, 25 January 2026 (UTC)Reply

I agree about the term "unknown affiliation" being problematic, so tentatively before reading this message I started assigning languages and families (offline only so far) into a qfa-noc ("no consensus languages") family. However, I think "disputed affiliation" sounds better (and a bit clearer than "debated affiliation"), and I completely agree with renaming "Unclassified languages" to something else; either "languages with no family code set" as you propose, or "unassigned languages" (simply because the code currently adds "languages" onto the end of all families, and a name like "languages with no family code set" will require a bit of code hacking; but if you think this name is better than "unassigned languages", I will go ahead and do the coding to make it all work). So we'd have a distinction between:

unclassifiable languages
disputed affiliation languages (or "languages with disputed affiliation", whichever one you think sounds better)
unassigned languages (or "not-yet-assigned languages" or "languages with no family code set", whichever one you think sounds better)

I think, with the right names, the distinction between the three should be clear. Benwing2 (talk) 21:49, 25 January 2026 (UTC)Reply
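
As a rough illustration of the naming constraint mentioned above (the code appends "languages" to every family name), here is a minimal sketch. The function and dictionary names are hypothetical and do not come from the actual modules; only the three family names are taken from the list above.

```python
# Hypothetical mapping from the situation to the proposed pseudo-family name.
PSEUDO_FAMILIES = {
    "too little data to classify": "unclassifiable",
    "no consensus among researchers": "disputed affiliation",
    "no family code set yet": "unassigned",
}


def category_name(family_name: str) -> str:
    """Build a category name by appending "languages" to a family name.

    Assumes the simple convention described above: capitalize the first
    letter of the family name and add the word "languages" at the end.
    """
    return family_name[0].upper() + family_name[1:] + " languages"


for situation, family in PSEUDO_FAMILIES.items():
    print(f"{situation}: Category:{category_name(family)}")
```

Under this convention a name like "languages with no family code set" does not fit without special-casing, which is why the shorter "unassigned languages" is easier to support.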

@Benwing2 Whatever we arrive at, is there any way to disable the Wikipedia link for such "families"? I cringe every time I see "Wikipedia has an article on Not a family languages." I can understand always following the lead of Wikidata as to which Wikipedia page to link to when Wikidata has that information, but there should be some override for weird exceptions like these. There are also cases where we have a family but Wikipedia has a language and vice versa- but that's for another thread... Chuck Entz (talk) 22:18, 25 January 2026 (UTC)Reply

I suppose it's not worth complicating the code to allow names with alternative word order; "unclassifiable languages", "disputed affiliation languages" and "unassigned languages" (+ text on the category pages that explains what these mean) is probably fine. @Chuck, where are you seeing the Wikipedia link? On the page of the category, like Category:Not a family languages, or somewhere else? On category pages the link can be changed or suppressed with |setwiki=, no? - -sche (discuss) 07:09, 26 January 2026 (UTC)Reply

@-sche I already fixed the issue, that's why you're not seeing it :) ... Benwing2 (talk) 07:13, 26 January 2026 (UTC)Reply

Thanks! I'm just saying, even before you made the module / template know on its own not to add the link, it was already possible to change or suppress such links using the |setwiki= param that {{auto cat}} mentions, AFAICT. - -sche (discuss) 03:12, 28 January 2026 (UTC)Reply

|setwiki= is actually for languages; the problem here was with families, where families with no corresponding Wikipedia article were showing bogus Wikipedia links. I didn't change how Wikipedia links for individual languages were handled so the problem might still exist there, although generally it's less noticeable since most individual languages on Wiktionary do have corresponding Wikipedia articles. Benwing2 (talk) 03:32, 28 January 2026 (UTC)Reply

Some families (I'm looking at you, Nilo-Saharan; also Niger-Congo, Dene-Yeniseian and others) are considered "questionable" or "controversial" but we still have them. One solution is just to get rid of them (I know @Thadh would prefer this; he even wants to get rid of Afroasiatic), but a less drastic solution would be to add support for a "questionable" flag that can be set on families. The effect of this would be to add a question mark after the name of the family in family tree diagrams and infoboxes (like Wikipedia does), and to add verbiage to the family category indicating that the family is considered controversial and not demonstrated to everyone's satisfaction. Possibly also the vertical line in the family tree that connects the family to its children could be colored red or some other color to distinguish it. (Ideally IMO, those vertical lines should be solidly connected; then we could use a dashed line to indicate a controversial connection. But this may not be possible as long as we draw family trees using "ASCII art". @This, that and the other do you have any ideas on how to make family tree lines solid using CSS or HTML? See for example the family tree under Category:Proto-Nilo-Saharan_language. I'm sure this is possible with SVG but generating and rendering SVG on the fly seems likely to be beyond MediaWiki capabilities.) Benwing2 (talk) 22:07, 25 January 2026 (UTC)Reply

@Benwing2 the ability to generate SVGs on-the-fly from Lua was recently added by some volunteer MediaWiki developers: see m:Tech/News/2025/45. However, it's not possible to create links inside the SVG, so not that useful for us. An alternative would be to create a series of images to generate the "tree" components, rather like Wikipedia's railway diagrams.

However, I'm not sure that's needed. I think a lot can be done using pure CSS, including using a box drawing character like ┊ combined with a different colour to indicate questionable connections.

I've done an initial pass through the CSS of Module:family tree to try and improve the look (e.g. use the page's regular font size for the lect names, and remove the gaps between lines). Let's see if anyone complains - personally I think it's a real improvement.

(Incidentally, do you know why family trees are not included on family category pages?) This, that and the other (talk) 04:20, 26 January 2026 (UTC)Reply

Thanks, that looks a lot better! I don't know why family trees are not on family category pages; recently, I've been meaning to add that, which shouldn't be hard (or you can do it). Benwing2 (talk) 04:27, 26 January 2026 (UTC)Reply

They were added by TKW in 2023 but self-reverted due to memory errors. Shouldn't be an issue now. Added anew. This, that and the other (talk) 05:09, 26 January 2026 (UTC)Reply

Did you revert the CSS changes? I don't see them any more. Benwing2 (talk) 05:35, 26 January 2026 (UTC)Reply

Re the title question: in theory, marking "disputed" families sounds like a fine idea; in practice I worry it would encourage people who want to add truly controversial groupings ("why can't we add Nostratic and just mark it as controversial?") or cast doubt on rather well-accepted groupings (Afroasiatic). - -sche (discuss) 06:10, 26 January 2026 (UTC)Reply

I have the same issue - what constitutes a controversial grouping? To me, anything Greenberg did can basically automatically be called controversial, but not to everyone. In Russia, "Uralic-Yukaghir" is generally accepted as truth, as is Altaic, yet we probably don't want those either. So I wonder how arbitrary this would be. Thadh (talk) 07:29, 26 January 2026 (UTC)Reply

I was thinking more or less that any grouping that we have that is marked with a ? in Wikipedia is controversial; or alternatively, any grouping we have that is not accepted by Glottolog is controversial (I suspect there is fairly strong overlap between the two, although from going through the New Guinea languages, Glottolog is generally more conservative than Wikipedia). I do get your point, and @-sche's point, but at the same time it feels like we should somehow indicate the controversial groupings. But maybe a Boolean flag isn't sufficient and it's better just to have text on each family page indicating how accepted a given grouping is. Benwing2 (talk) 08:07, 26 January 2026 (UTC)Reply

Apparently, according to the Wikipedia pages of "North Levantine Arabic" and "South Levantine Arabic", "In 2023, South Levantine Arabic and North Levantine Arabic were merged into a single Levantine Arabic in the ISO,[6] based on the high mutual intelligibility between Arabic varieties spoken by sedentary populations across the Levant and the lack of clear distinctions between variants along national borders."

Should we merge them into one "Levantine Arabic" or what? Adamnewwikipedianaccount (talk) 01:34, 26 January 2026 (UTC)Reply

There have already been several discussions about this. The consensus is to merge but it's *HARD* and currently no one simultaneously has the will and technical expertise to do so. The most recent discussion is on this very page: Wiktionary:Language_treatment_requests#Levantine_merger. The person who submitted the successful ISO proposal to merge the two was active for a while as an editor here, but went inactive before they were able to contribute much expertise towards merging. It may prove that we need to essentially start over in order to merge; that happened with Masurian, which for a while was split out of Polish and then re-merged. When it was re-merged, it was easier to delete the 700 or so lemmas that had been created and start over than try to merge the existing lemmas. Benwing2 (talk) 03:00, 26 January 2026 (UTC)Reply

Seems reasonable

Basically, they'll stay split until further notice Adamnewwikipedianaccount (talk) 15:52, 26 January 2026 (UTC)Reply

Papi [ppe] to Baiyamo

Although Glottolog, Usher 2020, and Wikipedia refer to ppe as "Papi", Foley 2018 employs the name "Baiyamo", and there is a pending ISO change request with the following explanation:

The language under the ISO code [ppe] was originally called “Paupe” or “Papi” in the 1970s, named after one of the villages that speaks this language (Conrad and Dye 1975; Laycock and Z’graggen 1975). However, this is not the self-designation that the people use, for themselves or their language. Recent research from Lohmann (who worked with these people during his work among the Abasano) and Skrzypek (who worked among these people in the last 10 years) found that they called themselves “Baiyamo” or “Paiyamo,” and that they gave their language the same name (Lohmann 2014, Skrzypek 2021). As such, the reference name for this language should be “Baiyamo” (which also appears in Hammarström 2010), but the additional name “Papi” should be kept for historical record.

Yaul [yla] to Ulwa (New Guinea)

Glottolog and Usher 2020 refer to yla as "Ulwa (Papua New Guinea)" and "Ulwa" respectively; a pending ISO change request provides the following explanation:

Linguist Donald Laycock named this language after one of the villages where it is spoken. Linguist Russell Barlow did fieldwork in the 2010s, and found that its speakers call their language Ulwa. This is what he says in his grammar (Baker 2023:39):
Speakers from all four villages where the language is spoken agree upon Ulwa as a glottonym. When Laycock conducted his survey work of the Sepik area between 1970 and 1971, he recorded the name of this language as “Yaul”, which is the name of one of the four villages. In doing so, Laycock (1973: 3) seems to have contravened one of his principles in choosing language names: “The name should not be that of a village, clan or locality that is significantly smaller than the language area, or that is not accepted by the whole group without feelings of rivalry”. This name lent itself to the formation of the ISO 639-3 code [yla] and the glottocode [yaul1241]. Nevertheless, I do not use it to refer to the language described by this grammar, since it is not the preferred name for the language among its speakers. Furthermore, the term “Yaul” creates confusion between reference to the village (and dialect) of that name and reference to the language as a whole. That is, I agree with the principle of not naming a language for a village, particularly in cases such as this one, in which the language is spoken in multiple villages.

However, Wikipedia and Foley 2018 maintain "Yaul(-Dimiri)", though the latter work predates Barlow's grammar. If this change is adopted, ulw should be renamed to "Ulwa (Nicaragua)"; in any case, it should be set as a Misumalpan language. Hazarasp (parlement · werkis) 06:15, 28 January 2026 (UTC)Reply

Support, given the ambiguity between the dialect vs language, and because Ulwa seems to have become the common name. Looking at the Google Scholar results for the last ten years and searching together with "New Guinea" to try to exclude the American language, I can find just two groups of authors (one is Mattiola + different co-authors on two different papers) who use Yaul without (either in the same work or in their later works) treating Ulwa as the primary name. I found twelve using Ulwa as the primary name (Barlow; Foley; Yang et al; Levow, et al; Virk, Hammarström, et al; McCarthy, Price; Hanink, Koontz-Garboden; Lee; Bossuyt; Keen; Norcliffe, Majid) before I stopped looking. - -sche (discuss) 06:41, 28 January 2026 (UTC)Reply

Some more alternative names for Ulwa and the other Keram and Ramu languages are listed in Barlow 2023:30 (the grammar mentioned in the change request). For Ulwa/Yaul, the most important is "Yaul-Dimiri" (presumably an earlier attempt at disambiguating from the dialect, used in Foley 2018 and Jodar-Sanchez 2023); others such as "Andjilowa" and "Unamama" appear to lack recent usage upon investigation. Hazarasp (parlement · werkis) 07:15, 28 January 2026 (UTC)Reply

Done with [yla]. Benwing2 (talk) 03:42, 5 February 2026 (UTC)Reply

ISO 639-3 splits the Tobo-Kube dialect group into two languages, Tobo and Kube, but Wikipedia has a single Tobo-Kube language and so does Glottolog, under the name Kulungtfu-Yuanggeng-Tobo https://glottolog.org/resource/languoid/id/kube1244, which suggests they renamed it from Kube-Tobo or similar. Wikipedia says:

They [Kube and Tobo] are mutually intelligible and 95% lexicostatistically cognate.

This is sourced to Hammarström (2015) Ethnologue 16/17/18th editions: a comprehensive review: online appendices. Wikipedia also says that Kurungtufu (aka Kulungtfu) and Yoangen (aka Yoanggeng, Yuanggeng) are dialects of Kube. Glottolog's new name suggests they are being more agnostic about the relationship among the three lects. Benwing2 (talk) 21:05, 28 January 2026 (UTC)Reply

Support: We find the following comment in Suter 2018, which facially implies some degree of separateness: "Yongseop and Hyunsook Lee studied the Mongi (aka Kube) language, producing a grammar (Lee 1993) and a dictionary. Tobo is being studied by an NTM team around Chad Mankins, who wrote a phonology and a grammar sketch as well as a dictionary". However, in a later edition of the grammar mentioned there (Lee 2014), we find the following comment:

The Tobo language may also be a dialect of Kube. Tobo is currently classified as a member of West Huon Family, but based on a recent survey by the present author Tobo appears to be a dialect of Kube. […] According to the study of word lists (consisting of about 200 words and simple sentences), the two languages are about 95% cognate. In addition the mutual intelligibility is very high. (Most Tobo children could understand Kube stories very easily.???)

This implies that Kube and Tobo should be merged and the merged language should probably be called Kube (and thus receive the code kgf); Wikipedia's "Tobo-Kube" and Glottolog's "Kulungtfu-Yuanggeng-Tobo" can be added as aliases. The claim in Hooley-McElhanon 1970 that Tobo has phonological similarities to the "Burum and Mindik languages" despite its lexical connection to Kube simply suggests that Tobo is a dialect with heavy Burum/Mindik/Somba-Siawari adstratal influence. Hazarasp (parlement · werkis) 23:54, 28 January 2026 (UTC)Reply

@Dragonoid76, @Kutchkutch, @Pulimaiyi Following recent discussions on the treatment of reconstructed (Ashokan) Prakrit, which weren't satisfactorily concluded, I have a proposal: let's treat reconstructed Prakrit and reconstructed Ashokan Prakrit similarly to Category:Proto-Romance. We can have etym-only language codes for reconstructed Prakrit with the label "Proto-New Indo-Aryan" (matching Dragonoid's category Category:Prakrit Proto-New Indo-Aryan) and for Ashokan Prakrit we can do "Proto-Middle Indo-Aryan". Then we can also use standard etymology templates for these terms (and categorise them separately from inherited Prakrit terms), instead of {{ncog}} as Dragonoid has been doing. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 04:31, 29 January 2026 (UTC)Reply

@AryamanA This is exactly what I'd suggested in the older discussion but it didn't come to fruition at that time. I would still support this. Dragonoid76 (talk) 08:31, 29 January 2026 (UTC)Reply

Cool, added etymology-only codes pra-pro and inc-ash-pro, and the categories CAT:Proto-New Indo-Aryan and CAT:Proto-Middle Indo-Aryan. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 17:58, 29 January 2026 (UTC)Reply

This language seems to refer to nothing: Blackings & Fabb (2003) in their grammar of Ma'di list the Moru-Ma'di group as containing Ma'di (in Uganda and Sudan), Lugbara (in Uganda, DRC and Sudan), Moru, Kaliko (in Sudan and DRC), Avokaya (in Sudan and DRC), Lolubo, Logo (in Sudan and DRC). What 'Southern Ma'di' refers to is unclear - it could either be Lugbara (lgg, which is occasionally called 'Ma'di' as well), the southern dialects of Ma'di (mhi, which we do and should treat as part of Ma'di), or any of the other languages, but it seems we already have codes for all of these.

The only term found in CAT:Southern Ma'di lemmas seems to be the same as in CAT:Lugbara lemmas, so I think it's fine to simply delete the page and the language altogether. Thadh (talk) 08:52, 29 January 2026 (UTC)Reply

Forgot the pings: @Benwing2, -sche. Thadh (talk) 08:53, 29 January 2026 (UTC)Reply

No objection from me but I don't know much about this language group. Maybe @-sche can comment when they have a chance. Benwing2 (talk) 21:13, 1 February 2026 (UTC)Reply

Despite the long-term consensus that there is no single "Baltic" family grouping East Baltic and West Baltic to the exclusion of Slavic, we still have it. Can we get rid of it once and for all, or at least agree that this is the correct approach? Under Category:Proto-Baltic there are no lemmas but unfortunately there's Category:Terms derived from Proto-Baltic by language, which has 30 subcategories, some with lots of entries; in particular, Category:Latvian terms derived from Proto-Baltic has 369 entries, Category:Proto-Finnic terms derived from Proto-Baltic has 194, Category:Finnish terms derived from Proto-Baltic has 127 and Category:Lithuanian terms derived from Proto-Baltic has 54; all the rest have <= 27, which seems a lot more manageable. I don't know enough about Proto-Balto-Slavic to know whether the reconstructions given as "Proto-Baltic" are actually Proto-Balto-Slavic or are some weird outdated concoction, but one simple solution is to convert them all to Proto-Baltic {{m|und|foo}} and add an {{attn}} note by them requesting cleanup to modern standards. Pinging the Proto-Slavic and PIE workgroups (since there's no Balto-Slavic workgroup) (Notifying Atitarev, Bezimenen, Useigor, PUC, Fay Freak, Vorziblix, AshFox, Chernorizets, Silmethule, AryamanA, Caoimhin ceallach, Exarchus, Mellohi!, Pulimaiyi, Victar): , as well as @Thadh and of course @-sche, who may have input. Benwing2 (talk) 05:57, 30 January 2026 (UTC)Reply

Support: I have to agree. It's annoying to police. --{{victar|talk}} 06:08, 30 January 2026 (UTC)Reply

Support: Yes, I think we need to get rid of "Proto-Baltic" and replace it everywhere with Proto-Balto-Slavic. In my opinion, it's simply an outdated term. And the widespread use of the term "Proto-Baltic" in Latvian lemmas is a consequence of the transfer of information from the etymological dictionary LEV, where this outdated term was used. In other places (etymology of Proto-Finnic lemmas, etc.) the situation is similar. All reconstructions as Proto-Baltic are "simplified" reconstructions of Proto-Balto-Slavic, for example with the absence of the use of "acuteness" ˀ < PIE laryngeals. AshFox (talk) 06:30, 30 January 2026 (UTC)Reply

At the same time, a problem arises... in the "2-thousand-year void" from the moment of the split of Proto-Balto-Slavic into Proto-Slavic and the Baltic languages around 1500 BC ‒ and until 500 AD, when East Baltic and West Baltic began to be distinguished separately. What happened from 1500 BC to 500 AD with the Baltic languages? To assume that for the Baltic languages, Proto-Balto-Slavic continued until 500 AD, while early Proto-Slavic separated 2-thousand years earlier? AshFox (talk) 06:47, 30 January 2026 (UTC)Reply

I'm fine with getting rid of Proto-Baltic, but I'm not quite sure about the family.

Although yes, people now generally gravitate towards there not being a single proven node of 'Baltic', I wouldn't call it 'long-term consensus', since there's still no clarity whether West Baltic is then closer to East Baltic or to Slavic (and the majority of people prefer branching trees, not exploding ones). Furthermore, I'm pretty sure there are still some schools (in the US?) that don't recognise Balto-Slavic as a valid grouping.

This gets even more complicated with the Baltic substrate features in Finnic, Saamic and Mordvinic languages, of which it is not clear what kind of Baltic language it was, but as it's pre-Slavic it is clear that it is not Slavic. Thadh (talk) 07:13, 30 January 2026 (UTC)Reply

Is a 3-way "exploding tree" a problem? Indo-European is typically defined as having a 10-way (or so) branching because no one can agree on a branching model. AFAIK the only top-level splits that are well agreed upon are Anatolian followed by Tocharian. Benwing2 (talk) 18:01, 30 January 2026 (UTC)Reply

But in fact, every scholar will have their own pet theory on Indo-European branching. One of the most common ones is the one where you have Anatolian/Tocharian/Germanic/Italo-Celtic/rest in that order splitting off, but there are endless variations on that (in fact, the Tocharian split-off is also becoming more and more debated). The issue is that in Balto-Slavic, there are only three possible variations, and I'm not sure anyone believes West Baltic is the first of the three to split off. This means that about half (if not more) people would believe in a Baltic node. Thadh (talk) 18:07, 30 January 2026 (UTC)Reply

So overall I'm fine with getting rid of Proto-Baltic and keeping Baltic as a node, although it would be a bit strange to do so. Mostly I am concerned about having "Proto-Baltic" reconstructions that are known to be wrong presented in etymologies. Benwing2 (talk) 18:08, 30 January 2026 (UTC)Reply

I think that's more than fine considering a well-reconstructed 'Proto-Baltic' is not a thing anyway. So it's like the majority of our groupings, where we don't have a proto-language because nobody has been able to reconstruct it yet. Thadh (talk) 18:10, 30 January 2026 (UTC)Reply

OK, what do you think of my proposal above for converting the existing Proto-Baltic forms into the "Undetermined" language with an {{attn}} added? Benwing2 (talk) 18:19, 30 January 2026 (UTC)Reply

I think it's probably manageable to do this more specifically by language. I remember how the Latvian etymologies work, and their 'Proto-Baltic' is some type of pre-Latvian taking into account Lithuanian and Slavic - so basically wrongly reconstructed PBS. So I would propose the following:

For Latvian and Lithuanian, transform it to Proto-Balto-Slavic without the term itself, and add an {{attn}}.

For Finnish and Proto-Finnic, probably something like {{der|LANG|bat|-}} with a text like "from a Baltic language" is best, but @Surjection might be of more help with this, as he likely either wrote or has come across these. Thadh (talk) 08:23, 31 January 2026 (UTC)Reply

Finnic etymological sources do still regularly use "Baltic" as a source for borrowings, so something like that is needed, as I see it. — SURJECTION / T / C / L / 08:39, 31 January 2026 (UTC)

I'm not necessarily opposed to this as my knowledge of Baltic is quite superficial, but if we're going to go through with this I think we should include evidence in this debate. When I learned about Baltic I don't remember the non-existence of it as a subgroup ever being emphasized. Petit (2010) Untersuchungen zu den baltischen Sprachen uses the family tree model when describing Baltic, although he does say: "Ich bin aber davon überzeugt, daß das geographische Modell für eine Beschreibung der baltischen Dialekte wohl realistischer wäre" [I am convinced, however, that the geographical model would probably be more realistic for a description of the Baltic dialects] (p. 4), by which he is referring to the wave model of language change. But it's not that there is a lack of features which distinguish the Baltic families as a whole from Slavic (pp. 6-11). What exactly is the basis for not wanting to reconstruct Proto-Baltic? I think it has to be established that having Proto-Baltic as a subgroup will render many features unreconstructable. —Caoimhin ceallach (talk) 07:31, 30 January 2026 (UTC)

Well, either there are innovations that characterize all of Baltic (both East and West) and don't characterize Slavic, in which case the Baltic node is valid, or there are no such innovations, in which case the Baltic node isn't valid. When you say there are features that distinguish "the Baltic families as a whole" from Slavic, what are these? Are they innovations or conservative features? Benwing2 (talk) 17:55, 30 January 2026 (UTC)Reply

If I remember correctly, the issue with 'Baltic' as a node is not that there are no common innovations between East and West Baltic that are not present in Slavic, it's that in addition to those there are also common innovations between East Baltic and Slavic not found in West Baltic and even a few common innovations between West Baltic and Slavic not found in East Baltic. Thadh (talk) 18:00, 30 January 2026 (UTC)Reply

Ahh right, I know about the n- becoming d- in the numeral 9 characterizing East Baltic and Slavic; I always figured that had to be explained as an areal transfer. Benwing2 (talk) 18:04, 30 January 2026 (UTC)Reply

I'd like to keep "Proto-Baltic" but redefine it as the common ancestor of Latvian and Lithuanian. P U C – 17:40, 30 January 2026 (UTC)Reply

That would be Proto-East Baltic, a language that is yet to be reconstructed by anyone since untangling the sound changes there is a pain. Thadh (talk) 18:03, 30 January 2026 (UTC)Reply

Support. Anatoli T. (обсудить/вклад) 21:42, 30 January 2026 (UTC)

I'm in a similar place to Caoimhin: my impression had been that there were a variety of ideas (rather than a consensus) about what things in between PIE and the attested Baltic and Slavic languages should look like. If we were to deprecate (Proto-)Baltic, there are practical issues to resolve first, as Thadh and AshFox have noted, like how to handle loans from pre-Slavic Baltic (into Finnic, etc) and how to handle loans into Slavic from e.g. Dnieper Baltic / Golyad, which after a couple of discussions here—Ctrl-F "Golyad" to find most of them—are currently set as varieties of Proto-Baltic. - -sche (discuss) 06:33, 31 January 2026 (UTC)Reply

Support. Chihunglu83 (talk) 06:37, 31 January 2026 (UTC)Reply

Proto-Baltic is very much a thing on Wikipedia... Exarchus (talk) 10:34, 31 January 2026 (UTC)Reply

OK I took a look at it. Their noun paradigms look like garbage to me; they're essentially identical to Proto-Balto-Slavic except with the accents missing, but we know the accents were preserved in any putative Proto-Baltic, so essentially there's no difference between the two. Benwing2 (talk) 17:38, 31 January 2026 (UTC)Reply

As I said above, all Proto-Baltic reconstructions are essentially Proto-Balto-Slavic reconstructions but without "acut" ˀ (leaving only long vowels) and also use the more outdated *u̯ and *i̯ instead of *w and *j, respectively.

PS: As I understand it, one way or another, Wiktionary uses the concept described in this quote from Wikipedia: "... some linguists like Frederik Kortlandt or Rick Derksen proposed that Proto-Balto-Slavic split into three language groups — East Baltic, West Baltic and Proto-Slavic — without a Proto-Baltic stage ...". AshFox (talk) 04:24, 1 February 2026 (UTC)Reply

In 2012, ISO split out Yotti, formerly considered a dialect of Yendang, into a separate language. For unknown reasons, in the process they gave Yendang proper a new code [ynq] and retired [yen]; Yotti was given the code [yot]. Glottolog follows suit. The relevant justification follows:

Yotti is listed as a dialect of Yendang. This was a mistake, as they are linguistically and socially distinct. In an interview in Mayo Lope in 2007, the Yotti individuals said that they have two origins: some clans come from Yoro (linking them with Mumuye [mzm]) and others from Bolki (linking them with Bacama [bcy]). They report that many are bilingual in Mumuye [mzm] and Yendang from an early age, but there are some Yotti (particularly the Nyagbalaŋ and Yɛyɪpte clans) who do not understand Yendang well.

Intelligibility testing with children in Bonding (a Yotti village) showed very poor understanding of a text recorded in Kpankwai (a Yendang village).

Lexical similarity: 35% with Bali [bcn], 35% with Kpasham [pbn], 13% with Yendang [yen].

Population: Yotti have about eight villages compared with many more than thirty Yendang villages.

Seems like we should follow suit as well. Benwing2 (talk) 20:34, 31 January 2026 (UTC)Reply

Update: User:-sche, it looks like you decided not to split this in 2014 in Wiktionary:Beer_parlour/2014/February#splitting_Yendang_and_Yotti, although with no reason given. Do you remember the reason? Benwing2 (talk) 20:57, 31 January 2026 (UTC)Reply

Back then, not having enough information to justify making a change, and seeing no comments from anyone, I just left things as they were (here), "not split at this time". (Also, it was my impression from other discussions that prevailing mood here at that time was that the SIL, who decide on these codes for the ISO, was too splittist, splitting things based more e.g. on who needed a Bible using their dialectal spellings in order to be receptive to conversion than on actual comprehension issues, so people were more sceptical of following the SIL/ISO in making splits.) Seeing now that Glottolog too has been persuaded to split these, and that there are substantial intelligibility issues, it seems reasonable to make the split. - -sche (discuss) 22:15, 31 January 2026 (UTC)Reply

BTW, there is not much literature about these lects, but seemingly a majority of the literature that does exist uses the spelling "Yoti". But there is overall so little literature that I don't know if that amounts to enough literature using "Yoti" to justify us using it and deviating from the spelling ISO and Glottolog use (vs just listing it as an alias). - -sche (discuss) 22:29, 31 January 2026 (UTC)Reply

In 2012, ISO renamed Yeskwa -> Nyankpa, and Glottolog followed suit. ISO's justification is rather spare:

Yeskwa is an outsider name for this language group. Nyenkpa is their own name for their language.

But the following additional information is provided:

(a) First-hand knowledge. Describe: Through sociolinguistic survey of the language area. Muniru, John, Luther Hon, Carol Magnusson, and Zacharaiah Yoder. forthcoming. A sociolinguistic survey of the Nyenkpa dialects.
(b) Knowledge through personal communication. Describe: This email from Jonathan Barnhoorn, Advisor to the Nyenkpa Language Program, dated 26 July 2010, explains that Nyenkpa has become the standard written form of the language name in the language. "The additions recommended below look good for the Nyenkpa. Roger, you were asking about the pronunciation of the name. It is a compound or combination of 2 words. Nyi – to know and angkpa – leaves (pl). I agree that when they are pronounced together it does sound more like “nyiangkpa” or “nyangkpa”. However, when I tried to change the spelling I was met with resistance. They didn’t want to change it because it had been that way for so long. Apparently the pronunciation comes from Mada or something." Jonathan Barnhoorn
(c) Knowledge from published sources (please give complete bibliographical references): Blench, Roger. 2009. The Nyankpa [=Yeskwa] language of Central Nigeria. Draft posted on Roger's website, which I can't access at the moment, so I can't give the full URL. Roger gives an alternate spelling of the name, identifying it as the correct name for the language.

Benwing2 (talk) 20:43, 31 January 2026 (UTC)Reply

So, just to be clear: you want to say no to yes...;) Chuck Entz (talk) 20:58, 31 January 2026 (UTC)Reply

All of Glottolog, Ethnologue, Wikipedia and ISO 639-3 use the name "Naukan Yupik". "Naukanski" is obviously the Russianized version; I'm not sure how we ended up with this name. Benwing2 (talk) 21:12, 31 January 2026 (UTC)Reply

Weak support: both "Naukan (Yupik)" (e.g. in Aralova et al. 2025, Budyanskaya et al. 2025, Chekin 2025, Gruzdeva 2025, Koryakov 2024, Hunt and Schreiner 2023, Pakendorf 2025, Pavlova 2025, Piispanen 2022, Pupynina and Koryakov 2024, Pupynina et al. 2025, Salaberri 2022) and "Naukanski (Yupik)" (e.g. in Ash 2025, Berge 2023, Berge 2025, Chen 2023, Chlenov and Krupnik 2024, Compton 2024, Hunt 2025, Fortescue and Vajda 2022, Kantarovich 2024, Krupnik 2022, Mithun and Olsen 2024, Panova 2024, Piispanen 2022) appear roughly equally common in the recent literature, though it could be asserted that "Naukan"'s numbers are being artificially propped up since it is disproportionately attested in English abstracts of Russian-language papers. However, I am inclined to prefer "Naukan" since sources that employ "Naukanski" use "Sirenikski" rather than our "Sirenik" (unless of course the name of "Sirenik" is altered). Hazarasp (parlement · werkis) 03:16, 11 February 2026 (UTC)Reply

Given that both "Adyghe" and "Western Circassian" or "Kabardian" and "Eastern Circassian" are common these days, would anyone be opposed to marking these languages as "Adyghe (Western Circassian)", or vice versa? The terms are used in academia alongside "Adyghe" and "Kabardian".

The thing that is especially confusing to anyone who speaks these languages but is not familiar with the semantics of linguistics is that both of these languages call themselves "Adyghe". In fact, in Russian "Adyghe" is actually "Adygean" (Aдыгейский), as in, the language of the Republic of Adygea. "Adyghe" in English was almost certainly a mistranslation of this Russian term. In Russian, the wider "Circassian languages" are also called "Adyghe languages" - with "Western Adyghe (Adygean)" and "Eastern Adyghe (Kabardian)" being the two languages (Circassian is officially used in Russia to refer to the Besleney subgroup living in Karachay-Cherkessia, so they are treated as branches of Adyghe and called West/East Adyghe rather than Circassian). Which makes it even more confusing if you are from Russia, since Adyghe and Kabardian being separate makes no sense even to a Russian person. So it really irks me that this supposed English name of the language that was just a translation error from English to begin with appears everywhere...

(See: https://ru.wikipedia.org/wiki/Адыгейский_язык, https://ru.wikipedia.org/wiki/Кабардино-черкесский_язык)

If a Kabardian speaker is searching for a word in Wiktionary, they will search it as "Circassian" or "Adyghe". Not as "Kabardian". Not only because both of these languages call themselves "Adyghe" but also because they are just called "Circassian" or "West/East Circassian" when referred to in other languages by Circassians themselves. So very few people know that "Adyghe" is supposed to only include the language of Adygea. I've always had to explain this difference to Circassians who either don't understand why or get angry and say "What, so you say I'm not an Adyghe??" In fact, Kabardian learning material in Turkey has been published as "Adyghe language" material, which is annoying to an Adyghe learner since you expect Western Circassian but are met with Kabardian.

I'd argue the term "Kabardian" is a wrong term anyway because Besleney did not originate as a Kabardian sub-dialect, but rather they both came from a common ancestor. If you tell a Besleney speaker that they are speaking Kabardian, they might just vomit. Kabardian just happens to be the largest language in the Eastern group. That's like calling the Austrian dialect of German the Bavarian dialect, or vice versa.

With all this in mind, is there a good reason to not include "West/East Circassian" in the language labels? Уикиредактор (talk) 21:21, 31 January 2026 (UTC)Reply

@Adamsa123 I'm pinging you since you are undoubtedly the biggest contributor for Adyghe and Kabardian. What do you think? Уикиредактор (talk) 20:47, 1 February 2026 (UTC)Reply

@Уикиредактор We generally avoid adding parenthetical qualifiers (e.g. Adyghe (Western Circassian) in place of just Adyghe or Western Circassian) into language labels except for disambiguation purposes e.g. Mor (Austronesian) vs. Mor (Papuan) to distinguish two different languages, both of which are called Mor and both are spoken in New Guinea. If the terms Adyghe (unqualified) and Kabardian are problematic, I would suggest we change the names either to Western Adyghe and Eastern Adyghe or Western Circassian and Eastern Circassian, without parenthetical qualifiers. That said, I know little about Adyghe/Circassian and I don't know if my suggested terms are problematic either linguistically or politically. Benwing2 (talk) 21:11, 1 February 2026 (UTC)Reply

Western/Eastern Circassian is totally fine, in my opinion. Let's look at the usage of the terms. Well, this is anecdotal evidence, but I can confirm that within the Circassian community, the terms "Western Circassian" and "Eastern Circassian" are the most commonly used terms to refer to these languages. The terms are also used by prominent linguists of the Circassian languages such as John Colarusso. Searching the terms in Google Scholar Labs yields many results. This is how Colarusso divides up the Circassian languages:^[1]

Circassian

A. West Circassian or Kyakh [Kyakh means "West". Here, Colarusso is using this to refer to our "Adyghe language"]

1) Natukhay

2) Shapsegh

3) Hakuchi

4) Abdzakh or Abadzakh

5) Bzhedukh

6) Khatukay

7) Chemgvi or Temirgoy [basis for literary standard]

B. East Circassian [Our "Kabardian" language, Colarusso calls it "East Circassian" and considers Besleney and Kabardian to be dialects of East Circassian, rather than Besleney being a dialect of "Kabardian", which again is a wrong English translation. The Russian term is Kabardino-Cherkess, with Cherkess referring to Besleney.]

1) Kabardian

2) Besleney or Besney

3) Kubano-Zelenchuk

He also says:^[2] Kabardian, the eastern form of Circassian, is a member of the Northwest Caucasian language family, which includes the Western Circassian or Adyghé dialects, the transitional Besleney Circassian, the distinct Abkhaz and its closely related sister, Abaza, and Ubykh, transitional between Circassian and Abkhaz-Abaza.

George Hewitt also used the terms West/East Circassian while referring to these languages.^[3] It seems to me that these linguists understood how ridiculous the terms "Adyghe" and "Kabardian" are and tried to provide an alternative.

Turkish-language scholars such as Murat Topçu have also utilised the term. In fact, it seems to me that the usage of "Adyghe" and "Kabardian" are mostly limited to channels such as Wikipedia. Most sources seem to use "West/East Circassian" and put (Adyghe/Kabardian) in brackets or vice versa.

The Adyghe Wiki page for the Adyghe language calls itself "Кӏах Адыгабзэ" (Western Circassian/Adyghe).

Also I guess relevant: https://glottolog.org/resource/languoid/id/adyg1241 Уикиредактор (talk) 21:55, 1 February 2026 (UTC)Reply

FWIW, if you look in Glottolog, under the Circassian family, there are two languages "West Circassian" (= Adyghe) and "Kabardian". I dunno why they chose the names this way. Benwing2 (talk) 22:04, 1 February 2026 (UTC)Reply

So, are we going ahead with "West/East Circassian"? Уикиредактор (talk) 10:53, 3 February 2026 (UTC)Reply

I asked on Discord for opinions and so far the editors there are neutral, so I will probably go ahead in a couple of days if I don't hear any opposition. Benwing2 (talk) 21:23, 3 February 2026 (UTC)Reply

I agree with @Уикиредактор, I also think the terms "Western Circassian" and "Eastern Circassian" are better. These terms are also as commonly used as "Adyghe" and "Kabardian". They don't have parentheses. The terms "Adyghe" and "Kabardian" are a bit problematic as Kabardians refer to themselves as "Adyghe" as well; therefore, I think the best solution is "Western/Eastern Circassian": they're common terms, straightforward, and cause no confusion. Adamʂa123 (talk) 22:14, 1 February 2026 (UTC)

I don't know much about Adyghe or Kabardian, but I can verify that these have always had a very high rate of confusion and mismatches between language code and language name, in both entries and translations. Whatever we do, it will take a lot of work to clean up the mess. Chuck Entz (talk) 21:20, 1 February 2026 (UTC)Reply

While we're on the topic: I improved a lot of Adyghe (Western Circassian) entries offline, and I want to re-add them all to Wiktionary. I can write a script to convert my data to Wiktionary format; the problem is how to update all these entries non-manually. Is there a way to update existing Wiktionary entries programmatically (the Wiktionary API, perhaps)? I also have over 10000 Kabardian (Eastern Circassian) entries that I want to add, but again, I don't want to do it manually. Does anyone know someone at Wiktionary I can contact who can help me with this? I'm a developer, so I just need some direction. @Adamʂa123 (talk) 22:20, 1 February 2026 (UTC)

@Adamʂa123

I do this all the time with a bot. You can either get a bot account (which takes a two-week vote, and usually passes) and use pywikibot yourself, or I can do it for you using my scripts. The way I do this is as follows:

Use a script to download all the entries I want to change into a single file, conventionally ending in .orig.
Copy the file to a new file without the .orig suffix.
Make all the changes to the new file.
Use another script to push the changes to Wiktionary, giving it both the original and changed file.

The reason for having the original file around is to make sure no one else made changes to the entries in the meantime; if that happens, the script that pushes the changes will skip that particular entry. You can then either make the changes manually, or if there are a lot of changed entries, re-download all the entries and use git to merge your changes with the other changes made online.
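The skip-on-conflict step described above can be sketched roughly as follows. This is only an illustration, not the actual push script: the function name `split_safe_and_conflicting` and the `fetch_live` callback are hypothetical, and in practice `fetch_live` would presumably wrap whatever is used to read the current page text (for example via pywikibot).

```python
def split_safe_and_conflicting(original, edited, fetch_live):
    """Decide which edited pages can be pushed safely.

    original   -- dict of page title -> text as downloaded (the ".orig" file)
    edited     -- dict of page title -> text after local changes
    fetch_live -- hypothetical callable returning the current online text
    Returns (to_push, skipped): a dict of pages safe to save and a list
    of titles that someone else changed in the meantime.
    """
    to_push = {}
    skipped = []
    for title, new_text in edited.items():
        old_text = original.get(title)
        if old_text is None:
            # No baseline to compare against, so do not risk overwriting.
            skipped.append(title)
            continue
        if new_text == old_text:
            # Nothing was changed locally; nothing to push.
            continue
        if fetch_live(title) != old_text:
            # The page was edited online since the download: skip it.
            skipped.append(title)
            continue
        to_push[title] = new_text
    return to_push, skipped


if __name__ == "__main__":
    original = {"псы": "old text", "шы": "old text 2"}
    edited = {"псы": "new text", "шы": "new text 2"}
    live = {"псы": "old text", "шы": "edited by someone else"}
    to_push, skipped = split_safe_and_conflicting(original, edited, live.get)
    print(to_push)
    print(skipped)
```

The skipped titles are the ones that would then need manual handling or a merge.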

The format I use for storing entries into a file is extremely simple, like this:

Page 1 Module:languages/data: -------- begin text --------
[contents of file]
-------- end text --------
Page 2 Module:languages/data/2: -------- begin text --------
[contents of file]
-------- end text --------
[etc.]

The page numbers themselves aren't actually important; they could all be 0 and things would still work. It's just the names that are significant. If you want, I can send you a dump of all the Kabardian entries in this format, and you can update it and send me the updated file, and I'll push the changes. Benwing2 (talk) 22:37, 1 February 2026 (UTC)Reply
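For anyone writing their own tooling around this, a minimal sketch of reading and writing the file format shown above might look like this. The function names `parse_pages` and `format_pages` are made up for illustration, and the sketch assumes that page text never contains a line that is exactly the end marker.

```python
import re

# A begin line looks like: "Page 1 Module:languages/data: -------- begin text --------"
# The page number is ignored; only the title matters. Titles may contain colons.
BEGIN_RE = re.compile(r"^Page \d+ (.+): -------- begin text --------$")
END_LINE = "-------- end text --------"


def parse_pages(dump):
    """Parse a dump string into a dict of page title -> page text."""
    pages = {}
    title = None
    body = []
    for line in dump.split("\n"):
        if title is None:
            match = BEGIN_RE.match(line)
            if match:
                title = match.group(1)
                body = []
        elif line == END_LINE:
            pages[title] = "\n".join(body)
            title = None
        else:
            body.append(line)
    return pages


def format_pages(pages):
    """Serialize a dict of page title -> page text back into the dump format."""
    chunks = []
    for number, (title, text) in enumerate(pages.items(), start=1):
        chunks.append(
            f"Page {number} {title}: -------- begin text --------\n"
            f"{text}\n{END_LINE}\n"
        )
    return "".join(chunks)
```

Parsing the output of `format_pages` should give back the same dictionary, which makes it easy to check a converted file before sending it on.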

Oh, if they are new entries, that can also be done; I have another script for that :) ... Benwing2 (talk) 22:39, 1 February 2026 (UTC)Reply

I assume this bot can also be used to mass-rename language headers if we decide to go ahead with West/East after all? Уикиредактор (talk) 22:43, 1 February 2026 (UTC)Reply

yes, I just did that a few minutes ago with Iriga Bicolano -> Rinconada Bikol and Northern/Southern Catanduanes Bicolano -> Northern/Southern Catanduanes Bikol. You also have to rename all the categories and translation table entries, but it's not a big deal. Benwing2 (talk) 22:48, 1 February 2026 (UTC)Reply
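As a rough illustration of what such a rename involves at the level of a single entry, here is a hedged sketch. It is not the bot script mentioned above: the function name `rename_language` is hypothetical, and it only handles level-2 language headers and translation-table lines of the form `* Name: ...`; categories and anything generated by templates are out of scope.

```python
import re


def rename_language(text, old, new):
    """Rename a language in one entry's wikitext.

    Handles the level-2 header ("==Adyghe==") and translation-table
    lines ("* Adyghe: {{t|ady|...}}"). Language codes are left untouched.
    """
    # Level-2 header on its own line; "===...===" headers do not match.
    header_re = re.compile(
        r"^==[ \t]*" + re.escape(old) + r"[ \t]*==[ \t]*$", re.MULTILINE
    )
    text = header_re.sub(lambda m: f"=={new}==", text)

    # Translation-table line starting with "* Name:" (or "*: Name:" for nested items).
    trans_re = re.compile(
        r"^(\*:?[ \t]*)" + re.escape(old) + r":", re.MULTILINE
    )
    text = trans_re.sub(lambda m: m.group(1) + new + ":", text)
    return text


if __name__ == "__main__":
    sample = "==Adyghe==\n\n===Noun===\n{{head|ady|noun}}\n"
    print(rename_language(sample, "Adyghe", "West Circassian"))
```

Using a function as the replacement avoids any surprises if the new name ever contains a backslash.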

I have the collections in JSON format, I can convert it to any format Wiktionary wants programmatically, but there might be conflicts because some contributors updated some of the entries ever since. I think I need to first download the existing entries from Wiktionary in both West and East Circassian, then write a script which will modify them, thus, instead of overriding them, I will add to them, this way there won't be any conflicts. @Adamʂa123 (talk) 14:45, 2 February 2026 (UTC)Reply

I have used git for that; it works great. You don't necessarily need a script, just create two branches, one with your changes and one with the other contributors' changes and rebase your changes on top of the other contributors' changes. Benwing2 (talk) 17:24, 2 February 2026 (UTC)Reply

^[1] John Colarusso, The Northwest Caucasian Languages: A Phonological Survey (Routledge Library Editions: Linguistics), Hoboken: Taylor and Francis, →ISBN
^[2] John Colarusso, Kabardian: East Circassian (Languages of the World Materials), München: LINCOM Europa, →ISBN
^[3] 2005 January, George Hewitt, “North West Caucasian”, in Lingua, volume 115, numbers 1-2, →DOI, pages 91–145

This discussion has gotten a bit sidetracked by technical discussions about how to make bot changes. I'd like to bring it back to the original topic of renaming. User:-sche do you have any thoughts? To summarize:

So far we have two editors who work on Circassian languages who are advocating the rename Adyghe -> West Circassian and Kabardian -> East Circassian.
I pinged the #caucasian channel on Discord and got two responses: User:Vahagn Petrosyan was neutral and User:Thadh expressed vague reservations due to the fact that the names in Russian are almost always the Russian equivalent of Adyghe(an) and Kabardian and not usually West/East Circassian. However, we're talking about a rename in English and not Russian, and it appears both names Adyghe/West Circassian and Kabardian/East Circassian are in common use in English.
The two editors advocating for the change say it's confusing because (a) all Circassians identify as Adyghe, so using Adyghe for West Circassian specifically is incorrect, and (b) Kabardian is properly just one of the East Circassian dialects, so using it for East Circassian as a whole is also incorrect.
If you look up Circassian in Glottolog (https://glottolog.org/resource/languoid/id/circ1239), you see two languages under this family, which Glottolog (confusingly) calls "West Circassian" and "Kabardian". The code for West Circassian is adyg1241, which implies that they formerly used Adyghe or similar as the name and renamed it at some point. I am guessing they renamed it for precisely the reason identified above, that Adyghe can be interpreted as an endonym for the family as a whole. What's also strange is that under West Circassian are listed 6 dialects, but under Kabardian is listed only 1 dialect "Greater Kabardian". This seems somewhat garbled; I would expect Greater Kabardian to be at a higher level than Kabardian rather than vice-versa. Possibly they mean to say "Kabardian proper" (which would correspond to the other point made above, that Kabardian proper is one of the dialects of East Circassian aka "Kabardian").
I am inclined to support this rename based on the points made above, and I would argue if we adopt Glottolog's "West Circassian" naming, we should likewise adopt "East Circassian" as its counterpart. But I'd like to get more thoughts before proceeding with the rename since it's a big rename (about 5,500 Adyghe lemmas and 1,000 Kabardian lemmas).

Benwing2 (talk) 21:48, 8 February 2026 (UTC)Reply

The main thing that gives me pause is that, overall, "Adyghe" and "Kabardian" are much more common than "West Circassian" and "East Circassian" in Ngrams, 'raw' Google Scholar comparisons, etc. However, I appreciate that "Adyghe" and "Kabardian" cause confusion (Chuck's point also demonstrates that), and I see that "Circassian" is in turn much more common than "Adyghe" and "Kabardian". I also notice that all of the recent (1980s to present) English-language works in Glottolog's Adyghe bibliography use "Circassian" rather than Adyghe, and while their bibliography is certainly not exhaustive, I notice a similar trend on Google Scholar: a search for "Adyghe" just anywhere in any paper finds that it's an order of magnitude or two more common than "West Circassian" overall, but searching for papers that use those terms in their titles—and perusing the very much reduced results to see which ones are linguistic works specifically about the language—it's a surprisingly tighter race (so to speak). I have no objection to the proposed renames. - -sche (discuss) 05:47, 9 February 2026 (UTC)Reply

Yes, diacritics are super-*kewl*, but not necessary. Glottolog uses the undiacriticked form "Yaroame"; ISO uses the partly-diacriticked form "Yaroamë". It's very hard to tell if there is a "most common" name in English; Glottolog has 7 citations: 5 in Portuguese, one in French and one in English (the ISO creation request for the language, which provides no justifications for their choice of spelling). The Portuguese citations variously use spellings like Yãroamë, Yãroamī, Yawari, Yawaripë, etc. Benwing2 (talk) 21:26, 31 January 2026 (UTC)Reply

It's impressive how little-mentioned this is. Google Scholar finds just one paper where the snippet that Google shows actually does use Ỹaroamë in English, and no papers using Yaroame (nor Yaroamë); searching for uses on .edu websites, I find one more English paper using Ỹaroamë and two using Yaroame; there are also scattered uses of Yaroamë (in both English and Portuguese); it's basically a wash in English. In Portuguese, Ỹaroamë is more common than Yaroame. The language has a lot of vowels, which is where I suspect the diacritics come from, and anyone entering words in the language is probably going to have to enter diacritics already for that reason. The argument for Ỹaroamë, as I see it, would then be that it's probably the more familiar spelling (roughly equally common in English, more common in Portuguese, and probably reflective of the native vowel qualities, which are numerous). The argument for Yaroame is that it's easier to type (though this seems somewhat weakened by the fact that anyone entering terms is probably going to have to figure out how to type diacritics to spell the terms themselves). It doesn't look like we can avoid the question by a (more substantial) rename, either, as Wikipedia says the name Yawari / Jawari is also used by Ninam, so it'd be more ambiguous, and Yawaripë is a dialect, so again less optimal. Meh. Abstain. - -sche (discuss) 00:19, 3 February 2026 (UTC)Reply

OK, I said I would stop posting but this rename seems particularly clear. This is called Quechan in ISO, Glottolog and Wikipedia, and from looking at the numerous citations in Glottolog, it seems that all uses of the exonym Yuma were <= 1976, at which point Kwtsaan and Quechan started to be used instead, with Quechan predominating in recent citations. Benwing2 (talk) 22:26, 31 January 2026 (UTC)Reply

My only reservation is that, although it's spelled "Quechan" pretty much everywhere, it's pronounced quite differently- the initial consonant is "kw", both vowels are definitely low central/"a", and the middle consonant is definitely "ts" rather than "ch". It's also easy to confuse with the South American Quechuan, which is completely unrelated. On the other hand, most of our site visitors will be looking for the "Quechan" spelling rather than "Kwatsáan". Chuck Entz (talk) 23:50, 31 January 2026 (UTC)Reply

This may be disapproved because it hasn't gained popularity yet, but Tausūg refers to the people (note tau meaning "person" in Austronesian languages), and Bahasa Sūg refers to the language. It is a known misconception to call the language Tausūg as well, considering the language code is also "tsg". However, renaming Tausūg to Bahasa Sūg would be more correct. So should we keep the popular name or the correct name? 𝄽 ysrael214 (talk) 05:15, 1 February 2026 (UTC)

In 2022, the ISO added a code phj for "Pahari", an endangered Newaric (Tibeto-Burman) language spoken by about 3,500 people in central Nepal. "Pahari is closely related to Newar, and has until recently been treated in the linguistic literature as a dialect of it. Pahari shares 55–65% of its basic vocabulary with Newar, which suggests the two are not mutually intelligible, and their speakers consider them to be separate languages."
Unfortunately, "Pahari" just means ~"hill (language)", and is the name used for various (groups of) Indo-Aryan languages, as seen e.g. here, and for the Pahari-Potwari language, and according to Wikipedia it's also used to denote some Dogri varieties, a variety of Bilaspuri, and the Nepali language, among other things, so we probably need to use a disambiguator. "Pahari (Nepal)" would reduce, though not eliminate, the number of things it could be mistaken for. I believe "Pahari (Newaric)" would succeed at ruling out all of the other languages (including ones spoken in Nepal!) which are also sometimes called Pahari, for anyone who is familiar with the word "Newaric", but that might not be many people. Thoughts? - -sche (discuss) 02:40, 4 February 2026 (UTC)Reply

Glottolog uses Pahari Newari and we might as well follow suit. It's somewhat similar to the case of Abu' Arapesh (which I have come around to accepting). Benwing2 (talk) 03:24, 4 February 2026 (UTC)

I should also add, it reminds me of the various "Fali" languages which I brought up in a discussion farther up, where it seems "Fali" just means "mountain dweller" or similar. Benwing2 (talk) 03:26, 4 February 2026 (UTC)

I am going through the translation tables in preparation for converting them to a world where the language name is auto-generated by the {{t}} template and the syntax is changed to allow multiple translations in a single {{t}} template, with inline modifiers (as was the consensus in the BP in December when this was proposed). FWIW there's a discussion going on at User_talk:This,_that_and_the_other#converting_translation_templates_to_new_syntax between me and User:This, that and the other about how to go about making the switch, particularly in a way that minimizes disruption while the switch is happening and allows old revisions to display more or less correctly. I am encountering several repeated subvarieties that might be worth making into etym langs; even if there may not be too many terms borrowed from these particular lects, having codes for them is useful if they show up a lot in Translations or Descendants tables. In particular (in general for all these cases, categories and labels already exist):

For Welsh: Category:North Wales Welsh and Category:South Wales Welsh.
For North Frisian: Category:Föhr-Amrum North Frisian, Category:Halligen North Frisian, Category:Heligolandic North Frisian, Category:Mooring North Frisian and Sylt North Frisian.
For Azerbaijani: At least Category:South Azerbaijani.
For Ladin: Category:Badiot Ladin, Category:Fascian Ladin, Category:Gherdëina Ladin and maybe Fodom Ladin (the latter has only 3 category members and shows up only twice in translation tables AFAICT).
For Assamese: At least Category:Central Standard Assamese and Category:Eastern Standard Assamese. (These are called "Central Assamese" and "Eastern Assamese" in the translation tables. I don't know why the qualifier "Standard" is present in the categories.)
For North Levantine Arabic: Category:Lebanese North Levantine Arabic and Category:Syrian North Levantine Arabic.
For South Levantine Arabic: maybe Category:Palestinian South Levantine Arabic, Category:Galilean South Levantine Arabic and Category:Jordanian South Levantine Arabic.
For Kabuverdianu: maybe Category:São Vicente Kabuverdianu, Category:Santiago Kabuverdianu and a code for ALUPEC (this is not a region but the government-promoted writing system, which is not in wide use).
Finally, although they don't seem to occur a lot in translation tables, I would suggest etym codes for the main Romansch varieties (Category:Puter Romansch, Category:Surmiran Romansch, Category:Sursilvan Romansch, Category:Sutsilvan Romansch, and Category:Vallader Romansch) as well as for "Rumantsch Grischun".

Benwing2 (talk) 04:23, 4 February 2026 (UTC)

Seems reasonable to me. - -sche (discuss) 05:29, 5 February 2026 (UTC)

In 2021, the ISO added code glb for Belning (also known as Belneng, Belnəng, and occasionally as Bəlnəng), a previously (until c. 2019) undocumented language. It is related to anc Ngas (Angas), and some Belning speakers consider their language to be Ngas, while others regard Ngas as a different language; some intercomprehension appears to be due to exposure, and Belning speakers said "it is easier for them to understand Ngas than for the Ngas to understand Belning". This long profile of the language says "comparison of the Belning wordlists with the Ngas wordlist reveals considerable difference (62–70% similarity) between the languages [...] SIL Nigeria uses 70 percent lexical similarity as a standard threshold to differentiate languages from dialects (Bergman 1989:8.1.5–8.1.6). Lexical similarity below 70 percent corresponds with inadequate comprehension between the compared varieties. With lexical similarity of 62 to 70 percent, we expect that, to their speakers, the languages seem quite similar. This also supports the claims that Belning and Ngas are closely related, but we still expect that there is a significant loss of comprehension when they hear each other’s language." Should we add Belning? - -sche (discuss) 20:56, 4 February 2026 (UTC)

In 2021, the ISO split emz Mbessa off from Kom, accepting a request which provided new data asserting low mutual intelligibility (explicitly disputing some previous research which claimed high, 87% mutual intelligibility), and the use of different orthographies. The change request states that "From a sample of 100 Mbessa native speakers we surveyed, only 18 people could understand or speak Kom without learning it [...] The vast majority of Kom people do not understand Mbessa and cannot understand it unless they learn it and vice versa." (The code requester presents this as "18% mutual intelligibility", which seems incorrect.) This paper too says there is some, but "a low degree of[,] intelligibility between Mbessa and Kom. [... A] Kom speaker and a Mbessa speaker can be conversing, each speaking his language, but, at a certain point, intelligibility would be lost. Again at certain points in the conversation both would start to figure out what the other person must have said or must be saying." Should we split Mbessa off from Kom? - -sche (discuss) 21:27, 4 February 2026 (UTC)

I think the argument for splitting Mbessa from Kom is stronger than the argument for splitting Belning from Ngas, which seems borderline. Both the (apparently) lower mutual intelligibility of Mbessa~Kom vs. Belning~Ngas and the orthography difference seem to be arguments in favor of splitting the former. I do note that Glottolog splits both languages, but I don't know if they are simply following ISO in this regard or if they have their own opinion. Benwing2 (talk) 02:06, 5 February 2026 (UTC)

We still have CAT:Coastal Kadazan language and CAT:Tambunan Dusun language. Their codes have been deprecated by ISO, but are still included by Wiktionary. Should we just get rid of these two cats and label them as Coastal (if Coastal Kadazan forms ≠ Central Dusun forms)? I've tried to contact @Wiktionarian89 who made these lemmas but the editor has not been online for a while. Chihunglu83 (talk) 07:45, 5 February 2026 (UTC)

I've been pondering this myself: in fact the ISO deprecated all four (but we currently still have all four) of ktr Kota Marudu Tinagas, kzj Coastal Kadazan, kzt Tambunan Dusun, tdu Tempasuk Dusun, because (some) speakers prefer to consider them one language, dtp Kadazan-Dusun or Kadazandusun, seemingly partially for political reasons, despite some linguistic differences. My impression (from e.g. [47], [48]) is that it is a classic dialect continuum in the sense that speakers can understand adjacent varieties extremely well, but varieties from different ends of the continuum may not be able to understand one another, but a standardized variety was created in the 1990s to bridge the continuum. Is that right? It does seem like a sufficiently critical mass of speakers want the lects treated as one language that they persuaded the ISO to merge them, so I, knowing relatively little about the languages, am not going to stand in the way. But when we merge all of these into dtp, we need to rename it from "Central Dusun" to "Kadazandusun", yes? not only to correctly identify the language (I read that some speakers don't identify with Dusun at all, and others do not identify with Kadazan at all, hence the blend word being used for the standardized, bridging / merging / all-encompassing variety), but also because "Kadazandusun" is also significantly more common than "Central Dusun" or the other spellings. - -sche (discuss) 21:46, 5 February 2026 (UTC)

All Central Dusun dialects are mostly mutually intelligible when conversing (Coastal Dusun has /h/, /v/, /z/ + adopted many loanwords, particularly from Malay and other northern Borneo languages). Most locals now use Kadazandusun; I suppose Central Dusun is purely a linguistic term. I personally prefer Kadazandusun and it's obviously more common, but I'll let others decide. Chihunglu83 (talk) 02:29, 7 February 2026 (UTC)

OK, unless anyone comes forward with objections, let's follow ISO's merger of the four codes into dtp + rename dtp to "Kadazandusun". - -sche (discuss) 23:13, 8 February 2026 (UTC)

As they are regularly referred to in both our entries (e.g. mond, noeus, proue, schéuggio, vuruntai) and in research (e.g. Ciconte 2015, Danesi 1977, D'Antuono 2023, Garzoni 2021, Mensching 2023, Michelotti 2008, Wolfe 2015), having the labels listed below would be convenient; they should presumably be set as ancestors of the corresponding modern languages analogously to the treatment of Old Franco-Provençal and Old Italian.

[Gallo-Italic]
- [Emilian-Romagnol]
  - Old Emilian (egl-old)
  - Old Romagnol (rgn-old)
- Old Ligurian (lij-old)
- Old Lombard (lmo-old; Category:Old Lombard already exists)
- Old Piedmontese (pms-old)
[Italo-Romance]
- Old Neapolitan (nap-old)
- Old Sicilian (scn-old)
[Rhaeto-Romance] (the utility of these is less apparent)
- Old Friulian (fur-old)
- Old Romans(c)h (rm-old)
[Southern Romance]
- Old Venetan (vec-old)

Hazarasp (parlement · werkis) 03:31, 6 February 2026 (UTC)

@Hazarasp No particular objection except that I'm not sure there is such a thing as "Old Romansh"; instead there are old varieties of each major dialect group (also it would use rm- not rms-). For example, in Harris and Vincent The Romance Languages, they state that Old Sursilvan (uniquely of the Romansh varieties? I think this was 17th-18th century) possessed a two-case system, and there are remnants of this in modern Sursilvan, where masculine predicate nouns (or adjectives? I don't remember) take an unexpected -s in the singular, which is a continuation of the old nominative.

A related issue is whether Old Catalan should be treated as its own L2 language or an etym-only variety. Native Catalan dictionaries tend to list Old Catalan terms and meanings as merely obsolete forms of modern Catalan rather than a distinct language, and it could be argued that esp. given the conservative nature of Catalan spelling, Old Catalan isn't distinct enough to merit being a separate L2. Benwing2 (talk) 03:54, 6 February 2026 (UTC)

The term "Old Romansh" appears to be reasonably established (e.g. in Haiman and Benincà 1992, Loporcaro et al. 2014, Monteverdi 1995, Ricardelli 2025, and Videsott and Casalicchio 2019); of course more specific labels such as "Old Vallader" also exist, but these are on a level with references to "Old Bolognese", "Old Genoese", "Old Navarrese", "Old Picard", "Old Valencian" instead of "Old Emilian", "Old Ligurian", "Old Navarro-Aragonese", "Old French", and "Old Catalan" (i.e. both can be used depending on context, and it seems advisable to create the less specific varieties first then drill down as needed). Hazarasp (parlement · werkis) 04:49, 6 February 2026 (UTC)

OK makes sense. No objections then. What is the time frame of Old Romansh? Benwing2 (talk) 05:44, 6 February 2026 (UTC)

While I haven't found an explicit definition, the labels "Old Romansh" or "Old [variety of Romansh]" are apparently mostly used for pre-1700 texts, but 1750 would probably be the best cutoff since there is some reference to material from the 18th century, especially its first half (e.g. the 1718 Bibla da Cuera is "Old Sursilvan"). Hazarasp (parlement · werkis) 06:22, 6 February 2026 (UTC)

Done except for converting manually-entered variety names to the codes. Hazarasp (parlement · werkis) 11:57, 6 February 2026 (UTC)

@Hazarasp I (tried to) fix all the variety codes that you didn't fix by grepping through the most recent dump file. Benwing2 (talk) 03:03, 7 February 2026 (UTC)

Strong oppose. What kind of discussion is this? Eleven new codes in a day and nobody was pinged! Are you even editors of any of those languages? This is the kind of thing I would take ages to discuss. Catonif (talk) 14:25, 9 February 2026 (UTC)

This is not the correct modus operandi. When adding a language code I need to know what it applies to. What are the characteristics of Old Venetan? When is the cutoff date? Which works are written in it and which ones aren't? If an Albanian entry comes from Venetan, do I now have to go and rewrite the etymologies distinguishing between the ones that are from Venetan or Old Venetan? Why did you only raise this issue with Romansch? These codes are not convenient at all, they mess up the language hierarchy and categorisation for something that can be much more fittingly handled, both conceptually and practically, by the labels module.

Etymology-only codes are the least efficient solution to the least pressing of problems. Listing a bunch of links with people using the label proves nothing, just like finding attestations of red apple does not make it eligible for inclusion. Ciconte also has Old Romanesco and Old Umbrian, Garzonio lists Old Venetian, Old Paduan and Old Veronese, D'Antuono distinguishes between Old Romagnol, Modern Romagnol and Contemporary Romagnol, Michelotti also has Old Perugino, Old Bolognese. Knowing this, it is apparent that these are not claims of these being languages by any means, and are nothing more than qualitative labels attached to a language name, as much as "regional Italian" or "colloquial French" would be. It just so happens that the label is "obsolete". Old Italian should be deleted as well for these same reasons.

I urge you both to somehow revert yourselves. This is not how things should be carried out on a wiki. Catonif (talk) 15:07, 9 February 2026 (UTC)

The major rationale behind the creation of these etymology-only languages was that the overwhelming majority of the varieties covered by them had already been clunkily manually inputted within etymologies, lists of descendants, and other locations which labels don't cover (e.g. Old {{desc|lmo|lmo|mondo}} at mond); their preexisting nature also obviates the criticism that the codes lack precise definitions, as they are formalising existing practice rather than creating new concepts from thin air.

As for the claim that the varieties covered by these codes don't constitute separate "languages", the name "etymology-only language" is somewhat misleading as many of them are better classed as mere dialects or varieties rather than separate languages (e.g. enm-wmi West Midland Middle English, es-PH Philippine Spanish, xno-law Law French); in fact, many varieties have been relegated to etymology-only languages precisely because they proved to be part of another language (e.g. ont Ontenu, sno Snohomish).

Furthermore, I cannot agree marking the relevant terms as {{lb|language|obsolete}} is always sufficient; a distinction between terms that fell out of use a few decades ago and those which haven't been used in centuries seems like a useful affordance to me, and it seems others agree given that many editors entered terms such as "Old Lombard" manually despite us having no affordances for them.

Finally, your claim that e.g. Michelotti using "Old Bolognese" and "Old Perugino" disproves the notion of "Old Emilian" and "Old Italian" being coherent linguistic entities seems no better than claiming that his frequent use of "Bolognese" and "Sammarinese" disproves the notion of "Emilian" and "Romagnol" (analogously to your claim that the use of "old" before e.g. "Romagnol" is a mere qualifier, one could claim that e.g. "Romagnol" is an ad-hoc formation from "Romagna" referring to whatever is spoken in the region rather than a particular variety). Hazarasp (parlement · werkis) 16:45, 9 February 2026 (UTC)

Yeah, it's hard to come up with a better name for the module, but obviously both parts of its name ("etymology-only" when these are used outside etymology sections, and "languages" when almost none of them are languages) have confused many people over the years; there have been suggestions of renaming the module to something like "variety codes" or something, but at this point a lot of people are used to "etymology-only languages" and it doesn't really cause actual problems (other than periodic misunderstandings like this). @Catonif I would just echo Hazarasp's point that these have not been added as languages (in terms of how the codes work, they are defined as varieties of other L2s, and e.g. "Old Romagnol" can only be entered as ==Romagnol==), and you may notice that among the other things in that module are "California English" (which we're not treating as a language) and "Pre-Greek" (which we're not even treating as a singular language). In my experience we give out "ety-only" codes like candy to anything that's a distinct lect. - -sche (discuss) 17:38, 9 February 2026 (UTC)

The major rationale behind the creation of these etymology-only languages was that the overwhelming majority of the varieties covered by them had already been clunkily manually inputted [...] they are formalising existing practice. You are formalising existing bad practice. If an entry in an underrepresented language on Wiktionary does something, you cannot simply assume it to be right. I am not aware of there ever being a discussion agreeing to use any of these labels. [...] which labels don't cover. Are you aware of the existence of the parameter |ll= in {{desc}} and any etymology templates? [T]heir preexisting nature also obviates the criticism that the codes lack precise definitions. That's great, give me those precise definitions then.

Regarding other uses of etymology-only languages, I assume that many of those are actually particularly useful in etymologies, like Law French. These would not be useful in etymologies, just an added hassle. Now we have to go through all Maltese lemmas and change them to perhaps Old Sicilian, for no additional benefit and with oftentimes no real way of making that claim. I cannot agree marking the relevant terms as obsolete is always sufficient; a distinction between terms that fell out of use a few decades ago and those which haven't been used in centuries seems like a useful affordance to me. Terms that fell out of use a few decades ago do not use the "obsolete" label, but "dated". I agree that there is indeed value to more precise dating; we have {{defdate}} for that. How is splitting something in two any more precise than the obsolete label, especially if you formalise nowhere which are the time periods the labels are supposed to stand for?

[A]nalogously to your claim that the use of "old" before e.g. "Romagnol" is a mere qualifier, one could claim that e.g. "Romagnol" is an ad-hoc formation from "Romagna" referring to whatever is spoken in the region rather than a particular variety. Of course, if the only thing you have is the existence of the label, you could claim that Romagnol is not a language, but we have linguistic reasons besides the existence of the label that allow us to say that that would be wrong. For you to use analogously, you must take my reasoning to be "if a label exists, then it's never a language". What I mean is that the existence of a label essentially means nothing in terms of ety-code eligibility; you need linguistic evidence. My point with Old Perugino is to show that there are a lot of just as used labels that you would not consider worthy of ety-codes either.

In my experience we give out "ety-only" codes like candy to anything that's a distinct lect. Yes, but these are my two questions, are these distinct lects? and is that a good idea? The first requires linguistic insight that was not tackled in this discussion at all. The second one I would be glad to talk about, but seems out of scope. Please let's not just do things by analogy of other things. What research have these people done in the languages' historical stages, besides some Googling of labels? Catonif (talk) 03:24, 10 February 2026 (UTC)

As you acknowledge, the use of these varieties is "existing practice", and not all or even most existing practice at Wiktionary emerges through formal discussion or agreement.
Using |ll= cannot ensure consistency in either the terms used or their definitions; once the category for the etymology-only language is created, a definition can be provided at that category (compare Category:Early Modern English, which is defined as "English as spoken from the late 15th to the mid-17th centuries").
I do not believe it's necessary to review every entry that might be borrowed at the Old Sicilian stage; compare e.g. Category:Terms borrowed from Early Modern English by language or Category:Old English reflexive verbs, which hardly contain all terms borrowed at the Early Modern English stage or all Old English reflexive verbs. Personally, I would suggest only treating terms as borrowed from these older varieties if they display specific phonological or morphological features that are not found in their modern descendants (compare Category:Middle English terms inherited from Anglian Old English).
As defined in Appendix:Glossary, the distinction between "dated", ("archaic"), and "obsolete" is based on the current status of the term, not when it became obsolescent. A term can rapidly progress through this sequence of designations; contrarily, it might remain "dated" or "archaic" for an extended duration if it sees continued usage in limited contexts (e.g. thou).
My use of analogy aimed to demonstrate that the mere existence of hyponymic designations (such as "Old Emilian" → "Old Bolognese") cannot disprove that a label constitutes a coherent variety, as one can adduce these for labels that denote modern varieties which are universally seen as distinct (e.g. "Emilian" → "Bolognese") and therefore "disprove" the existence of the varieties; i.e. it was not meant to demonstrate your reasoning in itself, but the logical extension of it if it was applied consistently. But to return to your current line of argumentation, any complaint that specific varieties are ineligible for etymology-only language codes must be tempered by the absence of a formal standard for "ety-code eligibility" (as far as I know); I would suggest drafting a vote on the matter if you wish to institute one.

Therefore I would have no objection to creating one for "Old Perugino" if distinguishing specific forms from that variety is felt necessary, as etymology-only language codes are fundamentally tools to be employed if the name of a variety is needed in "Etymology" and "Descendants" sections rather than pronouncements about the level of distinctiveness or coherence of a variety.

Hazarasp (parlement · werkis) 12:31, 10 February 2026 (UTC)

[N]ot all or even most existing practice at Wiktionary emerges through formal discussion or agreement. No, but formalisations of language codes do. So if you want to make language codes you need a proper discussion and the fact that it is an existing practice, if unformalised, means nothing.

[O]nce the category for the etymology-only language is created, a definition can be provided at that category. I know, I'm asking you to provide them. Both the dates, and valid and convincing arguments why those dates are linguistically significant.

Using |ll= cannot ensure consistency in either the terms used or their definitions and As defined in Appendix:Glossary, the distinction between "dated", ("archaic"), and "obsolete" is based on the current status of the term, not when it became obsolescent. You are demanding precision that your ety-code cannot give either. Use {{defdate}} if you want to be precise. Ultimately, words fade away rather than die in a day, so a vague system is good as well. Ety-codes cannot be justified as merely a way to be able to represent dating.

My use of analogy aimed to demonstrate that the mere existence of hyponymic designations [...] cannot disprove that a label constitutes a coherent variety. I'm not saying that the existence of Old Bolognese automatically disproves the existence of Old Emilian. Nothing is automatically proven or disproven by just labels. Evidence comes from linguistic data, and all you have given me is labels.

[A]bsence of a formal standard for "ety-code eligibility". Indeed, regarding the vote, I personally do not particularly wish to institute one; opinion varies greatly from language community to language community.

[E]tymology-only language codes are fundamentally tools to be employed if the name of a variety is needed in "Etymology" and "Descendants" sections rather than pronouncements about the level of distinctiveness or coherence of a variety. Surely you could say that, but that's (1) bad infrastructure, as you would need an etymology code for essentially every variety in the labels module, and ll exists, (2) unnecessarily complicates categorisation (we now have "Venetan terms inherited from Old Venetan", please get rid of this absurdity) and (3) not what the reader parses it as. If as a reader I see "Old Italian" in a descendants section or etymology I'm going to think it's a language just like Old French or Old English, and perhaps that's what you believe as well? It in fact very much isn't. We can pretend we mean whatever we want, but we are still misleading. Catonif (talk) 13:32, 10 February 2026 (UTC)

Again replying to each of your points in turn:

As etymology-only codes don't constitute separate languages, I don't believe they require the same level of scrutiny as languages do, especially when they reflect extant unformalised practice (I reject your contention that unformalised practice is immaterial and irrelevant).
If I provide dates, it's not unreasonable to assume that you'll disagree with some of them, and I'd prefer to wrap this argument up before starting another about the definition of e.g. "Old Venetan". As for the idea that divisions between linguistic periods must be "linguistically significant", it has never seemed necessary to me and doesn't tally with how they are empirically employed. For instance, the choice of 1500 as the division between Middle and modern English is arguably arbitrary; the English of 1450 resembles that of 1700 more than that of 1200, but it is grouped with the latter. To me, linguistic periodisation is like the division of folders in a file cabinet or volumes of an encyclopedia; i.e. it doesn't need to reflect any underlying cleavages in the material (although it can if they are especially evident), but can instead be an essentially arbitrary convention for dividing it into digestible chunks.
I'm somewhat mystified by your suggestion regarding {{defdate}}, since etymology-only codes are used for the etymology and descendants sections of entries rather than in the body of an entry like that template.
As I pointed out above, linguistic periodisation sometimes reflects the more-or-less arbitrary practice of the research community rather than anything clearly established by the "linguistic data", so that shouldn't necessarily be a requirement.
I don't disagree here.
If Wiktionary's modules were being rebuilt from scratch, it would probably make sense to combine the functions of Module:etymology languages and Module:labels in a unified module (presumably by making it possible to use certain language-specific labels in etymologies, descendants, etc.), but merging the two would be a herculean task at this point. Moving on to your point that having e.g. Old French and Old Italian treated equivalently in a descendants section might mislead since the latter is far more similar to the modern language, this cannot function as an argument against etymology-only codes since there is variation in the closeness of the older language to the modern even among varieties where the former is coded separately; for instance, Old Spanish is far closer to modern Spanish than Old English is to modern English.

Hazarasp (parlement · werkis) 18:40, 10 February 2026 (UTC)

Pinging the members of the Romance workgroup (apart from Benwing who is already here) and also our decently- and recently-active editors who know these languages, @Ungoliant MMDCCLXIV, Brutal Russian, Koavf, Nicodene, Gloria sah, MGorrone, IvanScrooge98, Oigolue, Underseapineappleu, Àncilu, Samubert96, Sartma, do you have thoughts on whether any or all of these unexpectedly controversial etymology-only codes for older stages of languages should exist or not? (I commented above that in general the threshold for giving something an etymology-only code seems to be quite low, but I abstain and defer to our Romance editors on whether these particular codes should exist or not.) - -sche (discuss) 20:11, 10 February 2026 (UTC)

It seems like a tricky issue. It is very practical for sure, in my opinion even more for etymology sections of other languages that borrowed terms from older stages of these (e.g. English celery, from an Italian-like plural in Lombard that nowadays is no longer the norm in most dialects, if not all). However, I don’t think I have the sufficient knowledge to assess if all of those languages have an old attested form that is distinct enough to have its own etymology code. Count mine as a weak support. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 20:30, 10 February 2026 (UTC)

Etym varieties are very lightweight entities whose main existence is (a) to simplify etymologies, (b) to provide more fine-grained "Foo terms derived from Bar" categories, (c) to simplify Descendant tables, (d) to simplify Translation tables in the upcoming world where {{t}} auto-generates the language name, and (e) to allow certain language properties to differ between etym varieties and L2 varieties (e.g. translation schemes, the not-yet-implemented pseudo-family categorization, etc.). In response to @Catonif, I would say that (b) is strictly optional; similarly to how it's fine to derive a term just from "Latin" rather than specifically from Late Latin, Classical Latin, Medieval Latin, etc. it's not required to derive a term from Old Lombard or whatever if Old Lombard etc. is an etym variety of Lombard. Also IMO the issue of whether Old Italian should exist at all should be divorced from the other Old Foo lects; Italian is a special case where the literary language is essentially modernized Old Italian, which AFAIK isn't the case for any other Italo-Romance or Gallo-Italic varieties. As for unifying labels and etym varieties, I have partly unified them in that the properties of etym-variety categories can be taken from the specs in the label modules, and the text of such a category will report both the labels that categorize into it and the corresponding etym code if both exist. However, the two serve somewhat different purposes, so I'm not sure we could completely unify them even if restarting from the ground up. Benwing2 (talk) 01:39, 11 February 2026 (UTC)

Please forgive me for adding another long section to this already long page!

As I noted some time ago, there are some obvious and widely-accepted groupings of Pama-Nyungan languages which we simply don't have, or which are utterly incomplete.

Here, I'm proposing three families - single-level for the time being to avoid overcategorisation. I believe these families are totally uncontroversial. I initially derived the structures from information at Wikipedia and AIATSIS Austlang, and belatedly checked Glottolog to find my structures are essentially a flattened version of what they have.

To keep things focused, I'm not proposing any mergers or splits of the languages themselves here. There are several dialect continua where mergers could be argued for (e.g. Western Desert/Wati). However, I would generally be somewhat wary of merging Pama-Nyungan lects - happy to expand on my thoughts if anyone is interested. In any event, the classification of the lects into families will assist if the matter of mergers does arise in future.

New family: Kulin languages (code aus-kul) - following Dixon (2002)'s Ta subgroup (yes, Dixon uses the term "areal group" for the overarching T group, but I don't think there is any real doubt the languages in Ta are related)
- dgw - Daungwurrung
- wyi - Woiwurrung (Do we really need so many random 19th-century spellings listed as "other names"? If nobody objects I will pare that list back quite a bit.)
- wth - Wathaurong
  - Rename to Wadawurrung to align with the orthography used for the other lects. (Note that the Wathaurong Aboriginal Co-operative seems to use the form "Wathaurong" to refer to the organisation and "Wadda wurrung" to refer to the people.)
- xww - Wemba Wemba (includes Wergaia and a few other lects)
- dja - Djadjawurrung
- dmd - Madhi Madhi
- llj - Ladji-Ladji
  - rename to Ladji Ladji for consistency with its sibling languages
- xwd - Wadi Wadi
- tjw - Chaap Wuurong
  - We call it Chaap Wuurong, a weird form that seems rarely used (I've never seen it "in the wild"). The people are always called Djab Wurrung or Djabwurrung. Wikipedia and Ethnologue use the latter - let's follow them and rename to Djabwurrung.
New family: Western Desert languages (code aus-wat for "Wati")
- ant - Antakarinya
- mpj - Martu Wangka
- ktd - Kokata
- kux - Kukatja
- ntj - Ngaanyatjarra
- pti - Pintiini
- piu - Pintupi-Luritja
- pjt - Pitjantjatjara
- tjp - Tjupany
  - We lack this lect - if we have all the other Western Desert lects, I see no reason not to follow ISO and add this as a language.
- wbt - Warnman (traditionally included in Western Desert, including by Glottolog and AIATSIS Austlang. This paper argues it should be placed separately, but I am keeping it in its traditional grouping for now)
- kdd - Yankunytjatjara
Add to Arandic languages (which we already have as aus-rnd but only containing 1 language):
- gbb - Kaytetye (already in family)
- adg - Andegerebinha
- amx - Anmatyerre
- aer - Eastern Arrernte
- aly - Alyawarr
- are - Western Arrernte
- axe - Ayerrerenge
- axl - Lower Southern Aranda (this should probably be renamed, but I don't know enough to make a proposal)

I'm new to these types of proposals, so please let me know if more detail or information is required. This, that and the other (talk) 09:37, 6 February 2026 (UTC)
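
For anyone who wants to sanity-check the groupings above, here is a rough sketch of the proposal as plain data. It is only an illustration: the dictionary layout and the helper names (`family_of`, `duplicated_members`) are made up for this comment and do not reflect how Wiktionary's language data modules are actually structured. The codes and names are copied from the lists above, including the family code aus-wat exactly as proposed.

```python
# Hypothetical sketch: this is NOT the format of Wiktionary's real data modules.
# Family codes, language codes and names are copied from the proposal above.
PROPOSED_FAMILIES = {
    "aus-kul": {
        "name": "Kulin",
        "members": ["dgw", "wyi", "wth", "xww", "dja", "dmd", "llj", "xwd", "tjw"],
    },
    "aus-wat": {  # family code as proposed in the comment above
        "name": "Western Desert",
        "members": ["ant", "mpj", "ktd", "kux", "ntj", "pti",
                    "piu", "pjt", "tjp", "wbt", "kdd"],
    },
    "aus-rnd": {  # existing family, to be filled out
        "name": "Arandic",
        "members": ["gbb", "adg", "amx", "aer", "aly", "are", "axe", "axl"],
    },
}

# Proposed renames: code -> (current name, proposed name)
PROPOSED_RENAMES = {
    "wth": ("Wathaurong", "Wadawurrung"),
    "llj": ("Ladji-Ladji", "Ladji Ladji"),
    "tjw": ("Chaap Wuurong", "Djabwurrung"),
}


def family_of(code):
    """Return the proposed family code for a language code, or None."""
    for family_code, data in PROPOSED_FAMILIES.items():
        if code in data["members"]:
            return family_code
    return None


def duplicated_members():
    """Return language codes that appear in more than one proposed family."""
    seen = set()
    duplicates = []
    for data in PROPOSED_FAMILIES.values():
        for code in data["members"]:
            if code in seen:
                duplicates.append(code)
            seen.add(code)
    return duplicates
```

The point of keeping it single-level is visible here: each language code maps to exactly one family, so a check like `duplicated_members()` should come back empty.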

Support adding Kulin and Western Desert groupings, Tjupany language, renaming to Ladji Ladji (which indeed appears to be more common than Ladji-Ladji), and renaming to Wadawurrung (which is also more common). Ngrams has Djab Wurrung being even more common than Djabwurrung, and Google Scholar results are similar (199 for Djabwurrung, 532 for Djab Wurrung), even if I restrict the search to only the last 20 or last 10 years, so probably we should use Djab Wurrung.

For Woiwurrung, are the other names displayed anywhere other than the language category? (That they are listed there very inefficiently one per line, thus taking up a lot of screen space, seems like a solvable problem.) To my understanding, the point of having languages' other names is so that people who find materials that use those other names can find which code they are supposed to include the material under. I suppose this could be solved by simply listing the alt names in a comment in the module, rather than as actual data, but I don't see the harm in listing them as actual data. (Maybe some are only used of the people and not the language and could be pared.)

For Warnman, if there is dispute over whether it is in Western Desert, but people agree it is in Wati (yes? or no?), we could be conservative and just put it in the higher-level Wati family...?

All the Arandic languages seem to already be categorized (since 1 February) except Ayerrerenge, which, yes, we should categorize. (Wikipedia suggests it could even be merged under Andegerebinha, but AIATSIS seems less convinced of this.) - -sche (discuss) 19:31, 6 February 2026 (UTC)Reply

@-sche thanks for having a look.

As for alternative names of Woiwurrung, the alternative names are only displayed on the category page (and WT:LOL), but I don't really see the value in keeping so many variant spellings which any search engine would be able to resolve. (For what it's worth, I suspect most would fail to meet our CFI.) I maintain that, at the very least, obviously defective spellings like "Oorongie" and "Bunurowrung" (can't find outside lists of alternative names) or "Wairwaioo" (used once by William Buckley in reference to the people, not the lect) ought to be removed.

And as for Warnman, the published literature doesn't appear to maintain a consistent distinction between "Wati" and "Western Desert". AIATSIS and Wikipedia do both treat Wati as a higher-level grouping than Western Desert (with Warnman in Wati but outside Western Desert), but Glottolog has Wati only (with Warnman grouped alongside Martu Wangka), and Dixon (2002) has Western Desert only. On this basis I'd be reluctant to add a Wati family just to account for Warnman. But I would not be opposed to leaving Warnman out of Western Desert for now. This, that and the other (talk) 02:34, 7 February 2026 (UTC)Reply

I don't know much about Pama-Nyungan grouping, but I support adding clearly uncontroversial nodes. I would prefer the code for Western Desert be something like aus-wde rather than being an abbreviation of a different name, esp. if there's any possibility that the different name could actually end up its own node (as seems at least possible here). I also support removing obsolete/uncommon/defective spellings listed as aliases; as you point out, a search engine could probably resolve these automatically given a bit of context (e.g. at one point I forgot the Javanese term for the Javanese script and Googled "Carawak script"; it immediately corrected to "Carakan script"). Benwing2 (talk) 02:48, 7 February 2026 (UTC)

Yes, I hadn't noticed that code, but if we're adding Western Desert, I agree with Benwing that we should use a code that abbreviates Western Desert, not a code that abbreviates some other name (especially not when that other name may denote a different grouping that could, theoretically, also exist here). - -sche (discuss) 01:03, 8 February 2026 (UTC)

While we're on the subject of Australian languages, it's not just tjp Tjupany which we failed to notice (and follow) ISO adding. They also added:

jbi Badjiri, an extinct language, which neither Glottolog nor Wikipedia considers to be a dialect of anything else. AIATSIS notes that "the closest language to Badjidi (D31) is Kalali (44% cognate count)", not mutually intelligible. My suggestion: Add.
nwg Ngayawung, an extinct language. It does not seem to be a dialect of anything else which already has a code; indeed, AIATSIS notes that Ngangaruku / Nganguruku (which lacks an ISO code AFAICT, and which Wikipedia groups with nwg) shows significant vocabulary differences and "the Mobile Language Team treats Ngayawang, Ngangaruku and Ngawadj S10 as distinct languages", so it might even be appropriate to add our own code for Ngangaruku / Nganguruku, but certainly my suggestion is to Add nwg.
tgz Tagalaka (Tagalag, Takalak, Dagalag), an extinct, sparsely documented (~small-wordlist-only) language. My suggestion: Add.
xnm Ngumbarl, an extinct, sparsely documented (~small-wordlist-only) non-Pama-Nyungan language. My suggestion: Add.
gll, which is most commonly called Kalali (then Galali; then Kullilla; then the name AIATSIS uses, Kullilli; and then the name ISO used, Garlali). The language went extinct, but there are efforts to revive it. The Wanggumara people formerly spoke one of its dialects, then switched to speaking a dialect of the "Wilson River language", for which reason Waŋkumara / Wanggumara occurs as the name of a dialect of both. Nonetheless, Kalali is its own language different from the "Wilson River language" (cf. Glottolog); as AIATSIS says, "Previously these two language varieties were conflated under the same code, D30, based on McDonald and Wurm's suspicion that they were originally the same. However, given that this is not established fact and that, synchronically, the two are sufficiently different to be classified as belonging to different language families, they are now distinguished in this database and the Thesaurus." My suggestion: Add.

And:

tjj Tjungundji. Glottolog calls this Yangathimri instead, and considers it part of an Anguthimri-Yangathimri language, and AIATSIS says "Crowley states that the language of the Tjungudji (Y14) people is known as Yangathimri"; both names (in various spellings, e.g. Tjunguntji) can be found in literature but Tjungundji seems to be more common. There were ten speakers left in 2005, the last year for which I've found anyone offering a number. Wikipedia groups it with aid Alngith, lnj Linngithigh, and awd Mpakwithi; it seems to be through mere oversight that we have those three lects and not this one: should we add this one, or consider merging some of those?
wkr, calling it "Keerray-Woorroong", the language of the Girai wurrung; it is related to gjm, which the ISO calls Gunditjmara. (Wikipedia groups it together with gjm as a dialect of Dhauwurd Wurrung language. Glottolog calls it "Kurnkopanut" which AIATSIS says is not correct, and puts it as a subvariety of gjm which they call "Warrnambool".)

(Also, someone with a bot/script could probably effect the rename of mep more easily than I could, though it's not that many entries.) - -sche (discuss) 18:37, 6 February 2026 (UTC)Reply

I support adding nwg as the only one I have any awareness of. I see no reason not to add the other four.

The situation with Gunditjmara and other Warrnambool-area languages seems to be quite uncertain. They appear to be related to the Kulin languages which I propose to group together above, but I refrained from including them pending further research. (The confusion probably stems from the w:Eumeralla Wars, which led to significant loss of Aboriginal lives and displacement of Aboriginal people in this area of Victoria before the languages were documented.) As for wkr, the Victorian Aboriginal Corporation for Languages describes it as poorly documented and having "87 percent common vocabulary" with gjm. This "dictionary of Keerraywoorroong" may have some useful info in its introduction etc; I'll look and report back when I have time. This, that and the other (talk) 02:11, 7 February 2026 (UTC)Reply

I can't comment on this particular case but I generally prefer lumping over splitting when there's a question of doing one or the other, simply because when you have microsplits you either end up with a lot of duplication (if people bother to add entries to each language) or you have scattered info that should be consolidated (e.g. the word for terms A and B might be the same in all the split lects, but someone might add the word for A under one of the lects and the word for B under a different lect). Benwing2 (talk) 02:53, 7 February 2026 (UTC)

OK, unless TTO finds anything to the contrary, I am fine with treating wkr as a variety of gjm. Since it has its own dictionary, I suppose we should add a dialect label for it so terms/forms from that dictionary can be labelled, and perhaps make the ISO code into an ety-only code. - -sche (discuss) 00:28, 8 February 2026 (UTC)Reply

@-sche okay, finally got up to doing this. Despite its title, the dictionary is really a dictionary of all "Warrnambool" lects. It covers seven lects: Keerraywoorroong, Peekwoorroong, Koornkopanoot, Wooloowoorroong, Keewoorroong, Tyagoortwoorroong and Thawoortwoorroong. A sample entry: "yoorrook iguana, blue tongue […] yuurok K, yuuruuk P, urook (n) KR, yuruk (m) P, yuruk (r) T, erook (d) W", where the letters in brackets refer to sources. No explanation is offered as to why Keerraywoorroong has been given primacy in the book's title or introductory text, although one of the maps in the book shows that this lect was spoken over a wider area than all others. The foreword by Barry J Blake is titled "The Warrnambool language in perspective" and uses that name for the lect, again without comment.

Overall I see no issue whatsoever with merging, but we need to think carefully about the merged name. (This will be a recurring, and persistently thorny, issue if/when we decide to merge other Australian Aboriginal lects.)

- We (and ISO) call gjm Gunditjmara. I am not sure this is the best choice. The Dictionary of Keerraywoorroong and Blake and Reid (1998) both state Gunditjmara is the name of the peoples of the region.
- Wikipedia covers both gjm and wkr in the article on the Dhauwurd Wurrung language.
- AIATSIS discusses all the lects under "Dhauwurd Wurrung^". Does anyone know what the ^ indicates?
- Glottolog and Blake's foreword mentioned above call this group of lects Warrnambool.
- The Victorian Aboriginal Corporation for Languages only includes wkr and not gjm. I contacted them a couple of weeks ago to ask them why this is, but I haven't heard back. A VACL-produced map from 2016 appears in Bowern (2023) and includes Dhauwurd Wurrung alongside Keerraywoorroong (I didn't write down what spellings it uses).
- Quite a number of online sources use "Dhauwurd Wurrung" - I wonder how many of them are blindly following Wikipedia.

I'm struggling to pick a name. Any thoughts? This, that and the other (talk) 07:09, 24 February 2026 (UTC)

I have added jbi as Badjiri (the most common of the various spellings). I have added nwg as Ngaiawang (noticing that Ngaiawang is more common than Ngayawung or other spellings, also in Google Scholar); if there are issues with this spelling let me know. Have not dealt with the others yet. - -sche (discuss) 20:36, 10 February 2026 (UTC)Reply

In 2022, the ISO added zcd Las Delicias Zapotec. We have a lot of Zapotec languages; this is the only ISO-coded one we don't have, AFAICT. Like many of the Zapotec languages, it is similar to some other Zapotec languages and dissimilar to others. Wikipedia, citing the ISO code addition request, notes that "Las Delicias and Yagallo have approximately 75% similarity", but it is not clear to me what code Yagallo belongs to, and whether it is part of Rincón or not (the code request also compares it to Yalalag, which is not part of Rincón); Wikipedia and Glottolog group Las Delicias with Rincón, but it is not clear to me whether they are mutually intelligible. I'm not sure what to do here; it seems weird that we have all of the (ISO-coded) Zapotecs but one. @Hk5183, Thadh, you created some Zapotec entries, do you have any thoughts on adding Las Delicias Zapotec? Aside from you and me, most of the creators of our Zapotec entries are long inactive (Marrovi was globally locked long ago, Ptcamn stopped editing, some entries were created by an IP, and @Lvovmauro sadly hasn't edited here in a while). - -sche (discuss) 19:58, 6 February 2026 (UTC)Reply

FWIW, when addressing the misnamed Central Mahuatlán Zapoteco I wrote this:

There are nearly 100 Zapotec varieties given their own L2 languages, which I am highly skeptical of.

And I remain highly skeptical that we need so many Zapotec varieties (contrast the situation e.g. with Levantine Arabic, where one ISO code covers something like 1000km north to south). So I would prefer we try to merge some of the most related lects to get them down to a more manageable number. But if there's no appetite for that, I would (reluctantly) suggest we just add the missing code. Benwing2 (talk) 03:00, 7 February 2026 (UTC)Reply

By my count, we have 58 "Foo Zapotec" languages (not counting proto-languages), some in the Zapotec family and some in the Zapotecan family. (We also have 8 Chatino languages.) In fairness to them, the time depth at which the Zapotec languages have been diverging is about 1,700 years,^[1] and the time depth at which the Zapotecan languages have been diverging has been estimated at 2,400 years.^[1] They've been compared to the Romance languages in this regard,^[2] which we have 64 of despite those diverging much more recently and sometimes being so inter-intelligible that my passive comprehension of one has let me read another. In this case, the ISO code request seems amateurish, there is no-one asking to add words in the lect to Wiktionary, and the case for including or excluding it seems unclear, so if no-one comes forward here with knowledge of it, I am fine with just leaving it unincluded at this time. But I would not necessarily expect to be able to (sensibly) reduce the Zapotec family by that many more languages than we could reduce e.g. Romance by. - -sche (discuss) 04:42, 7 February 2026 (UTC)Reply

OK, maybe I exaggerated the number a bit since it seemed quite large. In terms of the Romance languages, I thought the point of divergence was about 500 AD (and certainly much earlier in some places, like Sardinia), which would imply 1,500 years of divergence, not so different from the 1,700 years of Zapotec divergence. Part of my sense of why there may be too many Zapotec splits is that the family is spread over a significantly smaller geographic area than the Romance languages (although if the terrain is highly mountainous that may negate this issue somewhat). As for Romance splits I definitely think there are too many, esp. in the Iberian peninsula (which is the area I know best), but probably elsewhere as well. I also think there's a lot of language politics going on here, which (at least in Europe?) tends to incentivize speakers of a given lect to exaggerate the number of speakers and the extent to which their lect is a "language" not a "dialect". (now off the soapbox) Benwing2 (talk) 05:09, 7 February 2026 (UTC)Reply

I'm afraid I have no idea - I don't even remember creating those entries and definitely don't remember whatever I might have known about Zapotec back then. Thadh (talk) 09:36, 7 February 2026 (UTC)

^[1] Terrence Kaufman, "Areal Linguistics and Middle America", in (among other places) Native Languages of the Americas, volume 2 (ed. by Thomas Sebeok, 2013), page 65. Or 1,500 years per an article in Graduate Quarterly: News and Information for UCLA Graduate Students, volume 12 (2003): "Zapotec languages are assumed to go back to an ancestral language, Proto-Zapotec, which is estimated to have been spoken around 1,500 years ago."
^[2] E.g. [2], [3].

There is a nine-year-old discussion at the top of this page, #Yenish, but it is so stale (two of its four contributors left, one is me, and the last is User:Prosfilaes, who may have some thoughts on this new proposal), that I don't think it'd be appropriate to do anything on the basis of it and am starting a new discussion: (Notifying Matthias Buchmeier, Jberkel, Mahagaja, Fay Freak, Fytcha, Helrasincke, PhoenicianLetters):
My sense, in 2017 and now, is that yec Yenish is not a separate language, but a cant overlaid onto speakers' main languages: German-speaking users use Yenish words when speaking German, but examples of Yenish from e.g. Alemannic German- or Bavarian-speaking areas are often clearly Alemannic German or Bavarian (with cant words), not German. In this respect, it is similar to CAT:Rotwelsch (which as the {{rfm}} on that category notes should probably be split by language), with which it often overlaps (e.g. Jom bes is also Yiddish and Rotwelsch per the Westjiddisches Wörterbuch), and to e.g. the Gayle cant used in both English and Afrikaans and some other languages.
I propose we deprecate Yenish as a language, and assign Yenish words to whatever language(s) they are attestable in (Bavarian, German, etc) like we do with Rotwelsch. Alemannic German examples, Bavarian examples, etc will continue to meet CFI just as much as they do now since those are all LDLs, but because German is a WDL, this does mean that Yenish terms which can only be found in de-German wordlists may not meet CFI: this is an unfortunate thing we also faced when we merged Polari into English, but in cases where the words are currently only sourced to mentions and wordlists and lack evidence of use, this does not seem like the worst thing: for example, our current source for Baitz is a wordlist, and when looking for uses, I find that that spelling seems atypical and rare — the word may be more attestable as Beiz or Beis (other spellings include Bais and obsolete Baiß) — and it often means specifically a Gasthaus, not just any Haus, i.e. both the form and the definition that we currently have (due to sourcing it only to a wordlist instead of actual use) seem to be not quite right.
- -sche (discuss) 23:49, 6 February 2026 (UTC)

FWIW, @Vininn126 advocates making an exception to the WDL rule for non-standard dialects of WDL languages. The example he gives is Polish lects like Masurian that don't have a lot written in them, so meeting the 3-citation rule for many words is difficult and just seems (IMO) unnecessary. Maybe we could do the same thing here for Yenish. Benwing2 (talk) 03:07, 7 February 2026 (UTC)Reply

Treating Middle Polish and Polish dialects as LDLs within a WDL has definitely been a good decision. Vininn126 (talk) 09:51, 7 February 2026 (UTC)

But such an overlay might be described for multiple languages, “abusing” L2 headers for vocabularies that do not conclude a language. I don't know how much Yenish has the like vocabulary across German dialects but either way we might distinguish restricted terms by labels as regionalisms or dialect-dependent. This also makes attestation restrictions much more handleable, in spite of recent ideas relaxing thresholds for niches within WDL. This of course supposes we don't want to lump German lects, or at least High German dialects, like Chinese at some point. Only red-teaming here. Fay Freak (talk) 18:37, 7 February 2026 (UTC)Reply

While ISO, Wiktionary (undoubtedly following ISO), and sources focusing on its synchronic description (e.g. Hall 2022) deem Saterland Frisian (stq) one of the three modern Frisian "languages" alongside North Frisian (frr) and West Frisian (fy), this classification is problematic since while Saterland Frisian is the only East Frisian variety which survives today, at least four others are attested after the Old Frisian period (ofs; 1275-1600): the three "Weser Frisian" varieties (Harlingerland Frisian, Wangerooge Frisian, and Wursten Frisian), and Upgant Frisian, which belongs to an "Ems Frisian" group with Saterland Frisian. On the other hand, writings on the remaining varieties and the diachronic linguistic literature (e.g. Gregersen 2023) often appear to treat all five varieties as dialects of one "East Frisian" language, but this is also unsatisfactory: the Weser Frisian varieties are quite different from Saterlandic (the distinction between Ems and Weser Frisian dates to the Old Frisian period), while users of our Saterland Frisian entries would be confused by having their "language" lumped in with a bunch of extinct dialects as "East Frisian".

Therefore I tentatively suggest that a language code is created for "Weser Frisian", with etymology-only codes for its three varieties sitting underneath it (compare "dialects of Weser Frisian" in Smith 2020), while Upgant Frisian is similarly subsumed under Saterland Frisian:

[Saterland Frisian] (stq, language)
- Upgant Frisian (stq-upg, etymology-only)
Weser Frisian (gmw-wef, language)
- Harlingerland Frisian (gmw-haf, etymology-only)
- Wangerooge Frisian (gmw-waf, etymology-only)
- Wursten Frisian (gmw-wuf, etymology-only)

A second option would be to treat all five varieties as separate languages (following e.g. Gregersen 2024 calling Wangerooge Frisian a "language") and Ems and Weser Frisian as families:

Ems Frisian (gmw-emf, family)
- [Saterland Frisian] (stq, language)
- Upgant Frisian (gmw-upf, language)
Weser Frisian (gmw-wef, family)
- Harlingerland Frisian (gmw-haf, language)
- Wangerooge Frisian (gmw-waf, language)
- Wursten Frisian (gmw-wuf, language)

While we're at it, it might be worth creating etymology-only codes for Old Ems Frisian, Old Weser Frisian, Old West Frisian, and the unattested Old North Frisian, as these varieties often appear in the literature and modern Frisian lects' inherited vocabulary sometimes descends from a variant particular to one of them. Hazarasp (parlement · werkis) 12:13, 9 February 2026 (UTC)Reply
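
To make the practical difference between the two options easier to see, here is a small illustrative sketch that lists each option as flat data and counts how many new codes of each kind it would require. The tuple layout and the function name `new_nodes_by_kind` are invented for this comparison and are not how Wiktionary's data modules are organised; the codes and names are the ones given in the two options above, and stq is treated as already existing.

```python
from collections import Counter

# Codes that already exist and therefore would not need to be created.
EXISTING = {"stq"}

# Each node: (code, name, kind, parent code or None). Hypothetical layout.
OPTION_1 = [
    ("stq", "Saterland Frisian", "language", None),
    ("stq-upg", "Upgant Frisian", "etymology-only", "stq"),
    ("gmw-wef", "Weser Frisian", "language", None),
    ("gmw-haf", "Harlingerland Frisian", "etymology-only", "gmw-wef"),
    ("gmw-waf", "Wangerooge Frisian", "etymology-only", "gmw-wef"),
    ("gmw-wuf", "Wursten Frisian", "etymology-only", "gmw-wef"),
]

OPTION_2 = [
    ("gmw-emf", "Ems Frisian", "family", None),
    ("stq", "Saterland Frisian", "language", "gmw-emf"),
    ("gmw-upf", "Upgant Frisian", "language", "gmw-emf"),
    ("gmw-wef", "Weser Frisian", "family", None),
    ("gmw-haf", "Harlingerland Frisian", "language", "gmw-wef"),
    ("gmw-waf", "Wangerooge Frisian", "language", "gmw-wef"),
    ("gmw-wuf", "Wursten Frisian", "language", "gmw-wef"),
]


def new_nodes_by_kind(option):
    """Count the nodes an option would add, grouped by kind."""
    counts = Counter(
        kind for code, _name, kind, _parent in option if code not in EXISTING
    )
    return dict(counts)
```

Under these assumptions, the first option adds one full language plus four etymology-only codes, while the second adds two families and four full languages, each of which would need its own L2 header.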

@Hazarasp As I remember it, we chose stq to represent all of East Frisian because ISO decided to give that name to a code for a Low Saxon lect. We need to revisit this in a more systematic way with a broader discussion. Chuck Entz (talk) 19:37, 9 February 2026 (UTC)

I'm pretty sure ISO uses stq to refer to Saterland Frisian, which isn't Low Saxon except geographically; you might be getting confused with frs, which ISO used to designate Frisian Low Saxon but we mistakenly used for East Frisian. Hazarasp (parlement · werkis) 21:59, 9 February 2026 (UTC)

@Hazarasp: No, that's precisely what I was talking about. We needed a code for East Frisian and ISO decided to use frs for the Low Saxon lect, so there was no ISO code available for Frisian East Frisian. We decided to use the only available ISO code for an East Frisian lect, stq, even though it was only part of East Frisian, whether considering the latter as a single language, a subfamily, or a dialect continuum within the Frisian branch of West Germanic. We're a lot more free with creating non-ISO codes than we were then, so we ought to rethink how we handle Frisian as a whole. Chuck Entz (talk) 00:27, 10 February 2026 (UTC)

My bad; I read your previous message while half-asleep and interpreted "give that name to a code for" as "give that code to". Hazarasp (parlement · werkis) 01:11, 10 February 2026 (UTC)

How about renaming stq "East Frisian" and having Saterland, Upgant, Harlingerland, Wangerooge, and Wursten all be regional varieties of it (with or without etym-only codes)? —Mahāgaja · talk 09:53, 10 February 2026 (UTC)Reply

I don't like that - we have an okay coverage and fully developed infrastructure of Saterland Frisian, but nothing for the others. Having to introduce labels would decrease the usability of our Saterland Frisian section (belonging to a more-or-less standardised language), just to accommodate some historical unstandardised lects.

I think a better way to handle this is to do what we do with Võro (vro), which is to treat all other South Estonian varieties as 'dialectal' Võro. While the terminology is a bit innovative, and people searching for Seto words might not immediately think of looking under Võro, it is much more workable and useful for the majority of users.

In the end, we do this kind of thing for the majority of our standardised languages, where we define dialects in their relation to the standard. It's just that in this case, the standard would be one specific dialect (Saterlandic) and the dialects would be the rest. Thadh (talk) 11:32, 10 February 2026 (UTC)Reply

While Upgant being lumped in with Saterland Frisian (standing in for Ems Frisian) is fine, I believe the Weser Frisian varieties should be treated together as a distinct language (as I detailed above); there is some precedent for this in that a threefold division of Old Frisian into Old West Frisian, Old Ems Frisian, and Old Weser Frisian is sometimes employed. Furthermore, as someone who has read a fair amount, the notion of e.g. Wangerooge Frisian being lumped underneath Saterland Frisian seems bizarre, while Upgant being placed under it isn't ideal but is perhaps the least bad option since it doesn't appear in the literature as often as the combined Weser Frisian varieties. Hazarasp (parlement · werkis) 13:12, 10 February 2026 (UTC)

On second thought, another solution would be to treat East Frisian as a language covering the non-Saterlandic varieties with all the non-Saterlandic varieties as etymology-only codes, while having Saterlandic as a separate language descending from it since it enjoys a special prominence, analogously to how we have Hunsrik and Luxembourgish descend from Central Franconian or Plautdietsch descend from (German) Low German. Hazarasp (parlement · werkis) 13:17, 10 February 2026 (UTC)Reply

I am weakly inclined to agree with Thadh that it'd be preferable not to disrupt our Saterland Frisian coverage (switching it all from being under its own name ==Saterland Frisian== to being ==East Frisian== + {{label}}) just to incorporate extinct siblings, especially since "East Frisian" also denotes a Low German variety and so doesn't seem like the first or clearest name someone would think to look for Saterland Frisian under (compared to ==Saterland Frisian==). Because all the other (Frisian) East Frisian varieties are extinct and there is thus an obvious division between them and Saterland, I think either Thadh's idea of subsuming them all under the de facto standard, the surviving variety (which as Thadh notes would be similar to other cases; for example, there are [extinct and living] lects on the peripheries of England that the forms of English that we lemmatize can hardly be said to be based on, just as Saterland Frisian is not based on Wursten, which we nonetheless handle as ==English==) or Hazarasp's idea of adding an "East Frisian" language to cover those varieties while leaving the Saterland variety separate (which, as he notes, is also similar to other cases) could work. Adding East Frisian (for the others, and leaving Saterland) would probably be clearer, one L2 for all the extinct lects and one L2 for the living language, unless either (1) the amount of "duplication" (entries having "redundant" Saterland and East Frisian sections) would be very large, or conversely (2) the lexical similarity / intelligibility between Weser and Ems Frisian is so low as to make grouping them under one ==East Frisian== L2 incomprehensible. - -sche (discuss) 20:51, 10 February 2026 (UTC)Reply

I kind of like @Hazarasp's second proposal of having Saterland Frisian as a language and all the others as etym varieties of East Frisian or something similar; but if we're going to create another L2 I'd instead suggest this:

East Frisian (family)
- Saterland Frisian (L2 lang)
  - Upgant Frisian (etym variety)
- Weser Frisian (L2 lang)
  - Harlingerland Frisian, Wangerooge Frisian, Wursten Frisian (etym varieties)

This keeps Saterland Frisian as an L2 language and considers it the "standard" with Upgant Frisian a dialectal variation (per @Thadh's point about Võro) while treating the more distinct Frisian varieties as etym varieties of their own L2 lang. This would only really require labels for non-Saterland forms. Benwing2 (talk) 01:27, 11 February 2026 (UTC)Reply

Re your first sentence, aren't we talking about creating another L2 either way? If we are going to treat Upgant Frisian, Wangerooge Frisian, Wursten Frisian etc as etym varieties of East Frisian (which is what I am inclining towards as well), we have to create ==East Frisian==. (There are many attested words from the non-Saterland East Frisian lects, so they need to go under some L2.) I think putting Upgant Frisian, Wangerooge Frisian, Wursten Frisian etc under ==East Frisian== and leaving Saterland under its existing L2 (like Luxembourgish etc as Hazarasp says) seems like the way to go, unless the divergence between Ems and Weser was great enough to hamper mutual intelligibility. - -sche (discuss) 03:10, 11 February 2026 (UTC)

I think what @Hazarasp was saying was precisely this, that the difference between Ems and Weser was enough to make them separate languages; that's why I suggested having a Weser Frisian language. But I'm not completely sure and I'll let Hazarasp respond for themself. Benwing2 (talk) 03:15, 11 February 2026 (UTC)

Although mutual comprehension between the Ems and Weser Frisian varieties would've probably been limited at best, that isn't quite what I was saying: my point was that Weser Frisian should be treated as a distinct language code.

But as evidenced by my Luxembourgish-style proposal, I'm not completely committed to having a distinct Weser Frisian code since mutual intelligibility isn't everything (we e.g. have North Frisian as one language even though not all varieties are interintelligible). Hazarasp (parlement · werkis) 04:17, 11 February 2026 (UTC)Reply

To be fair, we should also probably split North Frisian, since that's an enormous mess. So I wouldn't base anything on how we treat that.

I am fine with both my proposal, having Weser Frisian separate and having some kind of "Middle East Frisian" (or however we want to call it) for pre-modern East Frisian varieties. I would prefer the first two however - I can see how a separate code for all of East Frisian except Saterlandic would lead to a mess without standardisation or clear descendant sections or etymologies. Thadh (talk) 07:11, 11 February 2026 (UTC)Reply

My preference is the last two of your listed possibilities, so the middle option (separating out Weser Frisian) seems to be the way forwards unless @Benwing2 or @-sche object. Hazarasp (parlement · werkis) 08:02, 11 February 2026 (UTC)Reply

Just to make sure I understand, of the various proposals, the one you're talking about is: make Weser Frisian a full L2-having language, enter Harlingerland Frisian, Wangerooge Frisian, and Wursten Frisian terms as {{label}}ed dialects of Weser Frisian (and mention them in etymologies using to-be-created etymology codes), and enter Upgant Frisian as an etymology-only variety of Saterland Frisian (and enter words from the one attested Upgant text as {{lb|stq|Upgant}} ==Saterland Frisian==)? (Or something else?) OK, seems reasonable. (Since it's an extinct language with a very small number of attested words, even giving Upgant Frisian its own L2 doesn't strike me as the worst idea, but entering it as {{lb|stq|Upgant}} ==Saterland Frisian== also works.) - -sche (discuss) 07:28, 14 February 2026 (UTC)

That'd be correct. Hazarasp (parlement · werkis) 12:24, 16 February 2026 (UTC)

Glottolog and Wikipedia both use the spelling "Kanakanavu". Ngram[49] shows that "Kanakanabu" used to be more popular but "Kanakanavu" overtook it around 1985, and is now 2.5 times more common. Officially it is also spelled like this.[50] Chihunglu83 (talk) 08:24, 10 February 2026 (UTC)

ISO uses the spelling Kanakanabu, but I have no particular objection to renaming. Benwing2 (talk) 02:58, 11 February 2026 (UTC)

I wrote a script to find languages that are prefixes of other languages (e.g. Dutch being a prefix of Dutch Low Saxon) and in the process I came across the languages "Yao" [yao] and "Yao (South America)" [sai-yao]. The latter is extinct and only attested in a single word list from the 1600s, but nonetheless I wonder if we shouldn't convert plain Yao to Yao (East Africa), in keeping with our general policy of adding disambiguators whenever two languages have the same spelling. There is also possible confusion with the two Yau languages of New Guinea (Yau (Finisterre) [yuw] and Yau (Torricelli) [yyu]), but from what I can tell, the Yao languages always have an o in their spelling and the Yau languages always have a u (@-sche maybe you can verify this). Benwing2 (talk) 04:49, 11 February 2026 (UTC)
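For readers curious what such a prefix check might look like, here is a minimal sketch. It is not the script referred to above; the function name `find_prefix_pairs` and the sample list are assumptions made for illustration, and a real run would read the full list of canonical language names instead.

```python
def find_prefix_pairs(names):
    """Return (shorter, longer) pairs where `shorter` is a whole-word prefix of `longer`."""
    ordered = sorted(set(names))
    pairs = []
    for i, short in enumerate(ordered):
        for long_name in ordered[i + 1:]:
            if not long_name.startswith(short):
                # In sorted order, names sharing a prefix are contiguous,
                # so the first mismatch ends the search for this name.
                break
            # Require a space after the prefix so that only whole-word
            # prefixes are reported ("Dutch" in "Dutch Low Saxon").
            if long_name[len(short)] == " ":
                pairs.append((short, long_name))
    return pairs


# Hypothetical sample, using names mentioned in this discussion.
SAMPLE_NAMES = [
    "Dutch",
    "Dutch Low Saxon",
    "Yao",
    "Yao (South America)",
    "Yau (Finisterre)",
    "Yau (Torricelli)",
]

for short, long_name in find_prefix_pairs(SAMPLE_NAMES):
    print(f"{short!r} is a prefix of {long_name!r}")
```

On this sample the sketch reports the Dutch and Yao pairs but not the two Yau names, since neither Yau name is a prefix of the other.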

The SIL / ISO must've felt that (for their purposes) they only needed to disambiguate the other (extinct) one, and let the living language have the short name (since they don't disambiguate it, and we just copied their undisambiguated name back in 2004). Since we have entries in both languages, I agree we should disambiguate both. Just "Yao (Africa)" would work (shorter is easier to type), unless there are other African languages with similar names that would make it useful to be more specific. - -sche (discuss) 05:09, 11 February 2026 (UTC)Reply

The only reason I chose (East Africa) as the disambiguator is that all other languages with an Africa disambiguator specify the part of Africa the language is spoken in (North, East, West, Central, Southern). If we use a plain Africa disambiguator here we should consider doing the same for other languages that have a region in them where it isn't strictly necessary. Benwing2 (talk) 05:17, 11 February 2026 (UTC)

I agree. It looks like the only uses of "East Africa" and "West Africa" are the "Aja" languages, a West African "Pana" that needs to be distinguished from a Central African one, and a "Koro" that AFAICT could be renamed to just use "Africa". It looks like the only lect using "North Africa" or "South Africa" is Lala, which is probably using "South Africa" as a country name rather than a region; while we could rename it, leaving it as-is using the country name would have the perhaps-useful side effect of also distinguishing it from the Lala-Foobar languages spoken in the Congo and Nigeria. - -sche (discuss) 07:06, 11 February 2026 (UTC)Reply

One advantage to "Yao (Africa)" is that we don't have to commit ourselves to putting Mozambique and Malawi in either East Africa or Southern Africa. The UN geoscheme considers them Eastern Africa; but the African Union considers them Southern Africa, and they are part of the Southern African Development Community but not the East African Community. —Mahāgaja · talk 12:04, 11 February 2026 (UTC)Reply

Done. I went ahead and renamed the language to Yao (Africa) as it seems fairly uncontroversial and there appears to be consensus on this name. Benwing2 (talk) 02:06, 12 February 2026 (UTC)

Looking through WT:LOL, there are more languages with some sort of "Africa" as a disambiguator:

Aja (West Africa) vs. Aja (East Africa)
Aka (Central Africa) vs. Aka (Sudan)
Bodo (Central Africa) vs. Bodo (India)
Kare (Central Africa) vs. Kare (New Guinea)
Koro (West Africa) vs. Koro (Vanuatu) vs. Koro (India)
Pana (West Africa) vs. Pana (Central Africa)
Yao (Africa) vs. Yao (South America)
Lala (South Africa) vs. Lala (New Guinea)

In a discussion that's since been archived, I listed all the (then) languages with disambiguators and gave some principles for the disambiguators, which I think are to prefer countries over multi-country regions over families over sub-country regions. By this logic, Lala (South Africa) is correct, and as you point out, there are Lala-Bisa and Lala-Roba that are both in Africa and sometimes go by just "Lala". Bodo (Central Africa) is spoken only in the Central African Republic, but in this case Bodo (Central African Republic) is quite long so maybe we can make an exception and use Bodo (Africa). (Wikipedia uses Bodo language (Bantu)).

In the case of Kare (Central Africa), there's another Kare in Central Africa [kbj] that we call Kari; in this case the Central Africa tag doesn't accomplish anything over just Africa so we may as well use Kare (Africa). In the case of Koro (West Africa), it's only spoken in Ivory Coast so we might call it Koro (Ivory Coast), which would also help distinguish it from the multiple Koro languages of Nigeria ([mgi], [uji], [vkn], [vkz], [bqv]), where Koro is an ethnic group, and which Wikipedia respectively calls Jili, Jijili, Koro Nulu, Koro Zuba and Koro Wachi, and which we call Jili, Tanjijili, [not in Wiktionary], [not in Wiktionary] and Begbere-Ejar.

In sum, I'd propose:

Bodo (Central Africa) -> Bodo (Africa)
Kare (Central Africa) -> Kare (Africa)
Koro (West Africa) -> Koro (Ivory Coast).

Benwing2 (talk) 02:22, 13 February 2026 (UTC)Reply
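The preference order described above (countries over multi-country regions over families over sub-country regions) can be sketched as a small lookup. This is only an illustration under assumptions: the function name, the field names and the attribute values assigned to each language are hypothetical stand-ins, not data taken from Wiktionary's modules.

```python
# Kinds of disambiguator, most preferred first, per the principle stated above.
PREFERENCE = ("country", "region", "family", "subnational")


def pick_disambiguators(lang_a, lang_b):
    """Return the first kind (and the two labels) on which the languages differ."""
    for kind in PREFERENCE:
        a, b = lang_a.get(kind), lang_b.get(kind)
        if a is not None and b is not None and a != b:
            return kind, a, b
    raise ValueError("no attribute distinguishes the two languages")


# Illustrative attribute dictionaries (assumed values, for demonstration only).
ngombe_1 = {"country": "Central African Republic", "region": "Central Africa"}
ngombe_2 = {"country": "Congo", "region": "Central Africa"}

yau_1 = {"country": "Papua New Guinea", "family": "Finisterre"}
yau_2 = {"country": "Papua New Guinea", "family": "Torricelli"}

print(pick_disambiguators(ngombe_1, ngombe_2))
print(pick_disambiguators(yau_1, yau_2))
```

A strict ranking like this does not capture the case-by-case judgment (such as preferring a shorter label) that the discussion also calls for.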

I agree, let's rename Kare, and Koro. For "Bodo", depending on the outcome of #Renaming Bodo (India), we could potentially drop the parenthetical.
FWIW I think most current language names follow WT:LANG (prefer to disambiguate by "place"; fall back on linguistic family), written after this old discussion; the word "place" was used as there was uncertainty over whether "country" or "region" was preferred, but people seemed to agree shorter was better: IIRC mnh was "Mono (Democratic Republic of the Congo)", but as agreed then, "Congo" is enough.
Do any languages use subnational units besides "Mono (California)"? We could rename it "Mono (America)" to be consistent with the apparent tendency to prefer "country-level or bigger" regions.
For "Kari", I notice most papers in Glottolog's bibliography call it "Kare", but (OTOH) they're older and often not English. In the past, people here prioritized "avoid parenthetical disambiguators" even over "use the language's common / recognizable name", leading to things like "Tircul" that have fortunately more recently been renamed (early on, I went along with those priorities, but soon had doubts); I was worried this might be another such case, but it looks like "Kari" may indeed be at least as common as "Kare", if not more common. - -sche (discuss) 17:43, 13 February 2026 (UTC)

I checked and we have the following languages using subnational units:

Arára (Pará) and Arára (Mato Grosso): both are states in Brazil
Boano (Sulawesi) and Boano (Maluku): both are islands or archipelagoes in Indonesia
Okpe (Southwestern Edo) and Okpe (Northwestern Edo): Edo is a state in Nigeria
Mono (California) and Mono (Congo)

All the others besides Mono use subnational units within a country that has two languages of the same name in the same country. However, I would not like Mono (America) because Trump has created negative associations with "America" as a placename for many Americans like me (even if the intent is to refer to the Americas as a whole vs. the United States). If we want to rename this language I would suggest Mono (United States), which is more parallel in any case with Mono (Congo) (even though "Congo" can potentially refer to two countries).

For reference, we also have two languages using "Central African Republic" as disambiguators: Ngombe (Central African Republic) vs. Ngombe (Congo) and Ngando (Central African Republic) vs. Ngando (Congo). Benwing2 (talk) 23:01, 13 February 2026 (UTC)Reply

Maybe "Mono (North America)" for the continental unit, as a parallel with "Yao (South America)"? Chaotic Enby (talk) 14:54, 15 February 2026 (UTC)Reply

Ah, I see that those two "(Central African Republic)"s have to stay as-is because the Congo is also in Central Africa (so they have to use the country and not the region).
Sulawesi and Maluku I (at least for my part) could accept as region names (islands make nice clearly distinct regions); I would not necessarily be opposed to renaming them to use linguistic families but I worry that their linguistic families are obscure, less known (and so less useful as disambiguators) than "what island is it on". "Southwestern Edo" and "Northwestern Edo" in relation to Okpe seem linguistic family names (I guess they could be renamed "...Edoid" to more clearly be family names), since only one of them is in Edo state (?). So besides the Californian Mono there are only two which are clearly deviating from the "use countries or regions" rule by using explicitly political subdivisions of the same country, the Aráras... and I guess they are disambiguated that way rather than (as we would normally do for two languages spoken in the same country) by linguistic family because one of them is of uncertain family affiliation? Hmm. I guess this is OK...? (Or we could rename them "Arára (Cariban)" and "Arára (unclassified)"?) Should I add a third bullet point to WT:LANG to document this?

If languages go by the same name and are spoken in the same country and at least one of them is of uncertain family affiliation, then the subnational units (e.g. states or provinces) where each is spoken are appended after the languages' names, as in the case of "Arára (Pará)" (code: aap) and "Arára (Mato Grosso)" (code: axg), both of which are spoken in Brazil, and the second of which is linguistically unclassified.

- -sche (discuss) 22:23, 15 February 2026 (UTC)Reply

Hmm, maybe? Maybe when you get down to choosing between families and subnational regions, it's just case-by-case, whatever seems less obscure? This would accord with Yau (Finisterre) vs. Yau (Torricelli), because Finisterre and Torricelli are among the more well-known families of Papuan languages (granted, Papuan families in general are obscure) and because the subnational regions of Papua New Guinea seem even more obscure (I don't know what subnational regions these languages are spoken in, or if they cross a subnational border; I've never heard of any subnational regions of Papua New Guinea and I don't even know what the term for the subnational regions of Papua New Guinea is, whereas I've heard of Sulawesi and vaguely of Maluku and could guess that a language of Sulawesi and hence of Maluku is spoken in Indonesia). Benwing2 (talk) 22:46, 15 February 2026 (UTC)Reply

ISO retired the codes Coyaima coy and Natagaimas nts by merging them into Pijao pij in 2016; both Glottolog and Wikipedia have followed this. The rationale provided in the change request was as follows:

In Loukotka (1968: 218--219), it is stated that Pijao (Pinao) [pij], classified as a Cariban language, was "once spoken ... in the villages of Orrega, Coyaima and Natagaima." Loukotka indicates that Coyaima (Tupe) [coy] is an extinct Carib language in the Yupe (Yukpa group). Campbell (1997: 202, 205) indicates that Pijao is unclassified and that Coyaima is an extinct Cariban language in the Yukpa group. Adelaar (2007: 38,53) does not mention Coyaima, but provides evidence that Pijao is unclassified. Any mention of Natagaimas [nts] is missing from all the major sources on South American languages. Hammarstrom (2015) believes that both Coyaima [coy] and Natagaimas [nts] are simply place names and should be merged into Pijao, citing Durbin and Seijas 1973.

Hazarasp (parlement · werkis) 03:19, 12 February 2026 (UTC)Reply

Thinking a bit, I no longer think it makes sense to treat Chungli Ao and Mongsen Ao as the same language. They are incredibly divergent in virtually every aspect to the point that Ao (code njo) needs to be redefined as a family. Burling flat-out says in {{R:sit:PCN|7}} that Mongsen and Chungli Ao "border on mutual unintelligibility".

Basic morphology and morphosyntax are vastly different between Mongsen and Chungli. The examples I list below are not exhaustive.
- A bimoraic constraint applies only to nouns in Mongsen but only to verbs in Chungli. Violators of the constraint are prefixed with a-. Thus, many verbs beginning with a- in Chungli have no a- in Mongsen, and several nouns beginning with a- in Mongsen do not have a- in their Chungli cognates.
- Mongsen uses a negative preterite suffix -la which simply does not exist in Chungli.
- Mongsen's causative suffix is /-(p)iʔ-/, while Chungli's is the clearly unrelated -daktsü.
- Plural pronouns are mutually exclusive between the two varieties, so Chungli asenok, onok, nenok, parnok do not exist in Mongsen (which has completely different pronouns for these roles).
- Conjunction use seems to be quite different between Chungli and Mongsen. For example, Chungli aser (“and”) coordinates both noun phrases and clauses, but this conjunction does not exist in Mongsen; noun coordination in Mongsen is done with /kʰə/ while clause coordination is handled by a complex converb system.
- The two varieties of Ao have entirely different case marking systems; for example, agentive in Mongsen is /nə/, while in Chungli this role is filled in by the clearly different -i. Dative and locative are handled by nem and nung in Chungli but by -li and ku respectively in Mongsen. In fact, the only shared case marker between the two varieties is den (“with (a person)”).
- Mongsen has a system of marking topics with the particle la which does not exist in Chungli.
The two languages have gotten quite phonetically divergent to the point that {{R:sit:PCN}} outright has to reconstruct "Proto-Ao" in order to advance further Central Naga comparisons. Examples of divergences include:
- Chungli completely merged aspirates into non-aspirates, while Mongsen keeps a strict distinction between them.
- Mongsen also has devoiced sonorants that Chungli lost.
- Certain coronal syllables have chaotically divergent outcomes between the varieties; there's a bunch of l vs. z correspondences, /tʃʰ/ merger with /s/ in Chungli, m-to-n correspondences, etc.
Chungli is written (and has been for over a century now) and Mongsen remains unwritten. Due to the phonetic divergences between Chungli and Mongsen, and unusual conventions in Chungli's orthography (Chungli interchangeably uses voiced and voiceless letters for the exact same consonant phoneme), transcribing Mongsen with Chungli-like orthography is impractical.

Given how morphologically, morphosyntactically and phonetically different Chungli Ao and Mongsen Ao are and the fact that different orthographic considerations are required for the two languages, I now believe that turning Ao [njo] into a family and splitting it into at least two languages (tentatively [njo-jgl] for Chungli and [njo-mng] for Mongsen) is warranted. The split is not a complex endeavour, since I only created mostly Chungli entries and the few Mongsen entries I created can be deleted entirely to start over. Thus it can tentatively be handled like a language rename. — mellohi! (Goodbye!) 07:30, 13 February 2026 (UTC)Reply

I guess my main concern is on how we would then handle Mongsen. I really don't want to end up with a language lemmatised at IPA because we lack a better solution. Are there really no proposed orthographies out there? Thadh (talk) 08:23, 13 February 2026 (UTC)Reply

What about the orthography in Coupe (2003)? It seems rather straightforward... Thadh (talk) 08:28, 13 February 2026 (UTC)Reply

@Thadh Despite Coupe's work, Temsunungsang and Changkija (2025) state in no uncertain terms that Mongsen speakers have never adopted an orthography (they instead are in diglossia with Chungli for literary purposes, it seems). Closely related Changki Ao does have an orthography, but I can't access their dictionary to verify how it works. — mellohi! (Goodbye!) 08:40, 13 February 2026 (UTC)

The Bodo language in India should be renamed Boro as a more accurate name; the name itself in Boro has a rhotic (apparently converted to a retroflex stop when borrowed into Hindi), and apparently the r-spelling is common in linguistic literature. Both major sources (Burling, and Wood's thesis) on the family's proto-language Proto-Boro-Garo also use the r-spelling. The Hindi-influenced form "Bodo" is highly misleading since it could easily be misinterpreted as containing a non-retroflex consonant, a problem which the r-spelling avoids while paying better respect to how Boros pronounce their own ethnic/language name. — mellohi! (Goodbye!) 11:45, 13 February 2026 (UTC)

Hmm, the main thing that gives me pause is that "Bodo" seems significantly more common even in recent linguistics literature specifically about the language (contra the claim on Wikipedia / in Das & Mahanta to the contrary): e.g. sorting Glottolog's bibliography by date, all of the works from the last fifteen years (by nine different authors, almost all of whom seem to be Indian) use "Bodo" (and among earlier works it looks like a pretty even split); Google Scholar similarly has "Bodo language" India several times more common than "Boro language" India, even if looking at just the last 5 or 10 (or 15 or 25) years. But commonness isn't everything, and (similar to Adyghe) I don't want to stand in the way of a rename if the speakers and editors of this language reject the current name. (If this is renamed, we also need to rename xxb from "Boro" to "Boro (Ghana)" i.e. swap its current main and alt names around.) - -sche (discuss) 19:34, 13 February 2026 (UTC)

I had (wrongly) assumed they were equally common, hence me not saying "most common"; I only meant to say that I thought the d/r spellings were both common. — mellohi! (Goodbye!) 19:39, 13 February 2026 (UTC)Reply

Pinging @AryamanA, Msasag, who created some entries in this language, if you have a preference for what to name it. - -sche (discuss) 19:34, 13 February 2026 (UTC)Reply

In diff, Shidailun73 (talk • contribs) notes that "The language is generally referred to as Seediq, as in the title of the citation, not Taroko, which is a Japanese transliteration." Indeed it seems that Seediq has become significantly more common than Taroko in general use in the last ~10 years, and looking at Glottolog's list of works (and sorting them by date) Seediq seems to have been the main name used by works about the language for even longer. (Google Scholar results superficially appear to have the two words being about equally common for the last ten years, but that's because Taroko is also a place name and many uses are of that rather than the language name.) - -sche (discuss) 08:02, 19 February 2026 (UTC)

Oh, thanks! I recognize you as the creator of the page for mqedin. Taroko is indeed a place name, in the Taroko Gorge, and a Japanese transliteration of Truku, the people who live in that area and one of the three main topolects of what's come to be known as Seediq or Sediq (the pronunciation of person/people in two of the dialects). In addition to correcting Taroko to Seediq I'd like to contribute a bunch of words in the three dialects. Shidailun73 (talk) 10:05, 19 February 2026 (UTC)Reply

@-sche I am generally okay with that, BUT since Seediq and Taroko ("Seediq-Truku dialect") are now both standardized and are separate groups, should we use a label to note Truku under Seediq like we have now, or create a separate code (ZH wiki uses map-trv)? And are the old Taroko entries Seediq proper? This requires a discussion and further cleanup. Chihunglu83 (talk) 05:52, 7 April 2026 (UTC)

I am bringing up again the topic that has previously been discussed here—namely, the idea that Gagauz should be regarded as a descendant of Ottoman Turkish. At the same time, I am questioning why the Tuvan and Tofa languages are not considered descendants of Old Uyghur.

We generally assume that Western Yugur is a descendant of Old Uyghur and distinguish it from the other South Siberian Turkic languages on that basis. However, despite existing differences, Tofa and Western Yugur are remarkably similar to each other—I realized this while attempting to transcribe them into Cyrillic. Western Yugur's similarity appears stronger and more direct than the common ground each shares with Khakas or Tuvan.

Additionally, while the Khakas and Shor peoples have historically been Shamanist Turks, we know that the Tuvans and Western Yugurs, like the Old Uyghurs, adhered to Buddhism. The term Khakas derives directly from the Old Turkic word Kyrgyz. Furthermore, the Fuyu Kyrgyz are known to have split from the Khakas at a later period, and they still identify themselves as Kyrgyz. The Khakas, like the Old Kyrgyz, were traditionally Shamanist.

So, to summarize my position:

1) I have withdrawn my earlier objection to viewing Gagauz as descending from Ottoman Turkish. Since this is already the prevailing opinion, it should indeed be regarded as a descendant of Ottoman Turkish.

2) The Tuvan and Tofa languages—whose ethnonyms reflect different pronunciations of the same original name—and the dialects of Tofa should, like Western Yugur, be considered descendants of Old Uyghur.

3) The Khakas and Shor languages should be regarded as descendants of Old Kyrgyz. – BurakD53 (talk) 14:47, 19 February 2026 (UTC)Reply

Hi! I will not take up much of your time. @AmaçsızBirKişi, Yorınçga573, Allahverdi Verdizade, Bartanaqa, Bababashqort, Rttle1, Ardahan Karabağ, Əkrəm, Chihunglu83

Just as in the Salar language, the internationally accepted Common Turkic alphabet should also be adopted for Western Yugur. In fact, this should apply to all Common Turkic languages that do not have a modern written standard—of which I believe there are very few besides these two, since the Soviets assigned writing systems to most of them.

Additionally, in languages where it is not an allophone but phonemically distinctive—such as in Tofa transcriptions and in Tenishev’s work—the /ʰ/ sound occurring before consonants should be represented with the apostrophe (').

I would be grateful if you could let me know, even briefly, whether you agree with this proposal by adding {{support}}. If you have any suggestions for revision, I would appreciate it if you could indicate them in your response.

If you’re interested, you can also take a look at the previous proposal.

Thank you in advance. – BurakD53 (talk) 08:49, 21 February 2026 (UTC)Reply

Support

AmaçsızBirKişi (talk) 14:55, 4 March 2026 (UTC)Reply

@BurakD53: Where you write “the /ʰ/ sound occurring before consonants should be represented with the apostrophe (')”, do you mean the punctuation character U+0027 APOSTROPHE or the letter U+02BC MODIFIER LETTER APOSTROPHE? 0DF (talk) 16:50, 4 March 2026 (UTC)Reply

Honestly, I didn’t have a particular opinion about which one should be used. As far as I can see, U+0027 is preferred in the context of contractions, whereas U+02BC is assigned to represent the /ʔ/ phoneme in various languages. Moreover, I see that in Kildin Sami it is used to represent /ʰ/. I assume you were aiming to draw attention to this very point when you asked your question. Thank you. Yes, U+02BC seems to be a more consistent choice. – BurakD53 (talk) 00:41, 5 March 2026 (UTC)Reply
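The practical difference between the two code points can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the test string below is a made-up placeholder, not a real Tofa or Western Yugur word.

```python
import re
import unicodedata

APOSTROPHE = "\u0027"           # punctuation apostrophe
MODIFIER_APOSTROPHE = "\u02bc"  # modifier letter apostrophe


def describe(ch):
    """Return the code point, Unicode name and general category of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


print(describe(APOSTROPHE))           # category Po (punctuation)
print(describe(MODIFIER_APOSTROPHE))  # category Lm (modifier letter)

# Because U+02BC is a letter, a word containing it stays one token under \w+,
# whereas U+0027 splits the word in two. "a?t" is a placeholder string.
for ch in (APOSTROPHE, MODIFIER_APOSTROPHE):
    print(re.findall(r"\w+", "a" + ch + "t"))
```

Being classed as a letter is one reason U+02BC tends to behave better in page titles, searches and sorting than the punctuation character.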

Per LT, we merge Yaitmathang (ISO xjt) into Dhudhuroa (ddr) on the basis of this 2013 discussion. However, the available evidence points to Yaitmathang being a variety of Ngarigo (xni - we call it Ngarigu) and not at all closely related to Dhudhuroa. Glottolog places Dhudhuroa and Yaitmathang in completely separate branches of Southeastern Pama-Nyungan. See [51] and the Clark article referenced by AIATSIS (see first result here).

I propose to re-merge Yaitmathang into Ngarig[o/u] instead.

All our Dhudhuroa lemmas are referenced to Mathews (1909), "The Dhudhuroa Language of Victoria", and we don't appear to have any translations into Dhudhuroa, so the change wouldn't have any effect on our entries.

Also, I wonder if we should rename the lect. The name Jaitmathang gets twice as many hits on Google Scholar as Yaitmathang. Jaitmathang was the form used by a Traditional Owners Corporation when they applied for official recognition [52]. (This is, of course, a moot point if it's merged into xni.) This, that and the other (talk) 00:40, 25 February 2026 (UTC)Reply

1) Southeastern Kolami > Naiki, the common name used by authors like Bhadriraju, also widely regarded as a distinct language and not a dialect of Kolami.

2) Merging Kumarbhag Paharia/Sawria Paharia into Malto and Korra Koraga/Mudu Koraga into Koraga, or treating the former as dialects, since they are very rarely regarded as distinct languages. All major Dravidian works, such as Bhadriraju's, consider each of them a single language, and word lists only have terms for Malto/Koraga; currently Malto terms are registered under Kumarbhag Paharia and Koraga terms under Korra Koraga. ~2026-13163-15 (talk) 10:47, 28 February 2026 (UTC)

Elfdalian, a form of Dalecarlian/Dalmål, part of the Swedish dialect group/Sveamål, is currently set as a descendant of Old East Norse, while it should be a descendant of Old Dalecarlian, a subvariant of Old Swedish. Linguistically, the chronology of Elfdalian is problematic, since it shows signs of being a middle child between East and West Norse, while also doing its own thing. However, practically, it's a descendant of Old Swedish. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 15:33, 8 March 2026 (UTC)

Support. While it is true that Elfdalian is not, for instance, a direct descendant of the Old Swedish of Äldre Västgötalagen, neither is Finland Swedish or other non-Västergötland dialects. Moreover Dalecarlian is part of the Swedish part of the North Germanic dialect continuum, and Old Swedish encompasses all North Germanic in medieval Sweden apart from Gotland.

The situation is analogous to modern English and Old English; while ModEng is not a direct descendant of the West Saxon OE on which our norm is based, we still list it as a descendant of Old English which is understood to encompass all early medieval English dialects. Mårtensås ᛭ Proto-Norsing ᛭ AMA 18:28, 14 March 2026 (UTC)Reply

I see we have the same issue with Jamtish language. There is a thing called Old Jamtish, and it shows themes from both East and West Norse, yet is currently set as a descendant of Old East Norse. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 01:38, 15 March 2026 (UTC)Reply

Done. Hazarasp (parlement · werkis) 12:18, 10 April 2026 (UTC)

Ty. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 11:56, 13 April 2026 (UTC)Reply

@Blockhaj: On the map to the right, what dialects are shown in gray and marked "Norska mål" but on the Swedish side of the border? Isn't that Dalecarlian? Does this map consider Dalecarlian a variety of Norwegian rather than Swedish? —Mahāgaja · talk 14:22, 13 April 2026 (UTC)Reply

Dalecarlian is area C. The grey area is Jamtish and "Härjedalska". The latter is technically fully "East Norwegian", but it's not that simple. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 01:34, 14 April 2026 (UTC)

No, Jamtish is for some reason blue on the map. Herjedalian is not East Norwegian, it is very close to Jamtish in many aspects, otherwise close to upper-Østerdal/Gudbrandsdal in Eastern Norway. Tollef Salemann (talk) 21:36, 15 April 2026 (UTC)

To be honest, the previous topics didn't reach a conclusion and didn't receive much attention, but there are a few points I'd like to address that would be appropriate to evaluate:

1. Why is there a preference for not using the infinitive form in Chuvash verbs? Do we have any consistent reason for this?

2. While this is preferred in academia, why don't we write the suffix pages in Proto-Turkic reconstructions as -GAn, in a way that indicates they are two-fold suffixes? (As we discussed previously).

3. Why don't we denote nominal suffixes with + and verbal suffixes with -, as is the standard practice in academia? For example, +lAr "plural suffix" and -(I)g "deverbal noun suffix". (As we discussed previously).

Thank you for your attention. – BurakD53 (talk) 00:07, 19 March 2026 (UTC)Reply

"Suruí Do Pará" (literally "Suruí of Pará") should, at the very least, be changed to "Suruí do Pará" – unequivocally, without the need for discussion. Capitalized "Do" makes no sense in Portuguese (and even in English) in this context, rather it should be "do", and the term in English is a direct borrowing from Portuguese. The Wikipedia page reflects this reasoning. Pinging @Polomo. Jacaguoçãrana (talk) 01:14, 20 March 2026 (UTC)Reply

Yeah, this capitalization wouldn’t be used in Portuguese under any system, and I don’t think there’s any reason for it to be capitalized in English, especially considering it’s unadapted. — Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 01:50, 20 March 2026 (UTC)Reply

This was probably originally an error based on ignorance of do being a Portuguese preposition. I support renaming this language to Suruí do Pará. 0DF (talk) 02:08, 22 March 2026 (UTC)Reply

Hi. One month ago, we kind of talked about this at https://en.wiktionary.org/wiki/Wiktionary:Language_treatment_requests#Circassian but it was in a discussion unrelated to the topic, so I want to bring this up again in its own topic. I have 100,000 JSON files that are Circassian->English, and I want to add them all to Wiktionary programmatically or via a bot. I'm a programmer myself, so I can write a script to format these JSON files to fit Wiktionary's edit format. From my understanding, if an entry already exists for a certain word, I need to take the existing entry, modify it, and give the Wiktionary API both the old and new versions. While I don't need help writing the script, I need help with Wiktionary itself and the Wiktionary API. Can somebody send me a dump of all West + East Circassian entries so I can modify them via a script? Adamʂa123 (talk) 16:19, 25 March 2026 (UTC)

@Adamsa123: On Wiktionary, we pride ourselves on having high-quality content. Will the entries you will create with your bot be stubs (i.e. only contain the part of speech, headword, and definition/gloss) or are you actually adding some other information as well (such as pronunciation, inflection, etymology, and references(!))? If you're proposing the former, I would strongly oppose such a bot job, since that would just lessen our overall quality of entries. If it's the latter, then that does sound exciting.

I think @Surjection, Vahagn Petrosyan are probably interested in this discussion as well. Thadh (talk) 16:59, 25 March 2026 (UTC)Reply

Adamsa123, from what source did you compile those files? And, can you give us an example of an entry? Vahag (talk) 18:21, 25 March 2026 (UTC)Reply

Good question, I'm happy you brought up the source of the files. I want to be 100% transparent here, so as not to break any Wiktionary rules: I have a lot of West/East Circassian to Russian/Arabic/Turkish dictionaries. Using AI that analyzed all these dictionaries, I was able to generate a Circassian to English dictionary. As a West-Circassian native speaker, I reviewed some of these generated entries myself and the result looks good. In terms of quality, the existing entries as of now on Wiktionary in both West & East are not perfect, so, if there's a concern that the AI generated content would decrease the quality, that's not the case. If AI generated content is an absolute no-no, then there's no point in talking about how to format the JSON files. But if it's allowed, then I think we can further talk about how to reformat the JSON files to fit Wiktionary's format. In any case, I'm proposing the "former", meaning, part of speech, headword, definition and examples. Pronunciation can be added programmatically, as the Circassian language is written phonologically. Adamʂa123 (talk) 21:37, 25 March 2026 (UTC)Reply

Jesus Christ, Adamʂa123, how could you think it's a good idea? AI-generated slop is an absolute no-no. Quality aside, cannibalizing those dictionaries may be a copyright infringement. Vahag (talk) 09:24, 26 March 2026 (UTC)Reply

Yeah, you're absolutely right. Because Circassian is a dying language, as a sort of desperate move, I thought maybe it could work because the results of the AI seem good to me. Adamʂa123 (talk) 13:16, 26 March 2026 (UTC)Reply

lol, it's like a modern version of the clueless revival of Cornish! ~2026-18647-55 (talk) 13:19, 26 March 2026 (UTC)Reply

The best way to combat language loss is increasing the prestige of a language. Low-quality entries, where you cannot be sure which are correct and which are not, are not likely to do so. A high-quality dictionary with inflection, quotations, usage examples, references and pictures is, on the other hand, much more likely to have any impact at all. Thadh (talk) 13:43, 26 March 2026 (UTC)Reply

Alright, thanks for the insight. I might create a high quality Circassian to English dictionary with all the features you mentioned, but it will take a lot of time, this is not for the near future. Adamʂa123 (talk) 13:52, 26 March 2026 (UTC)Reply

Wikipedia generally calls this language by its autonyms Innu or the more specific Innu-aimun, and French Wiktionary also does that (with lowercase innu). Montagnais is the French exonym and the (today) less preferred name for the language as well as the people. TagaSanPedroAko (talk) 00:39, 29 March 2026 (UTC)Reply

I request Loglan be added to WT:LOL with lang code: qlo per the ConLang Code Registry. This is to establish proper etymology for Lojban terms. There are at least 4 core conjunctions that are inherited from Loglan and I anticipate more. If this language would be expanded, it would definitely be an Appendix-only language. TranqyPoo [💬 | ✏️] 18:13, 3 April 2026 (UTC)Reply

Support I do support the addition. If Loglan is added, it will be given a Wiktionary language code beginning with art- (the ISO 639-5 code for constructed languages), such as art-log. If Loglan ever gets an official ISO 639-3 code, Wiktionary will switch to using that code. Netizen3102 (talk) 06:57, 17 April 2026 (UTC)Reply

@TranqyPoo: this is a good example of why we don't use codes from non-ISO registries. There is already an ISO code "glo", which refers to Galambu, a West Chadic language. We avoid this kind of mess by adding to the end of the closest existing ISO code, as mentioned in the previous message. Chuck Entz (talk) 18:34, 17 April 2026 (UTC)Reply

Gotcha, I was unaware of ISO 639-5 and I concur with Netizen. Minor comment: the requested code is with a Q, not a G. TranqyPoo [💬 | ✏️] 18:45, 17 April 2026 (UTC)Reply

Oops! While qlo doesn't exist, everything else I said stands. I vaguely seem to remember some kind of reason ISO hasn't been adding 3-letter q- codes, but there are a few. Better to avoid any possibility of a conflict. Chuck Entz (talk) 19:01, 17 April 2026 (UTC)Reply
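
As a rough sketch of the convention discussed above (build the code from an existing ISO prefix plus a suffix, and refuse anything that collides with a code already in use), one could write something like the following. The function name and the tiny `taken` set are hypothetical and stand in for whatever registry of codes is actually consulted.

```python
def propose_code(prefix: str, suffix: str, taken: set[str]) -> str:
    """Build a code such as 'art-log' and reject collisions with known codes."""
    code = prefix + "-" + suffix
    if code in taken:
        raise ValueError("code already in use: " + code)
    return code


# Hypothetical registry: only the one code mentioned in this thread.
taken_codes = {"glo"}
loglan_code = propose_code("art", "log", taken_codes)
```

Because the result always contains a hyphen, it cannot clash with a bare three-letter ISO 639-3 code, which is the point made in the replies above.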

I would like to propose a rule for PBS whereby we will not create entries for a word in this language if it has no Baltic descendants. The only possible exceptions should be words that already existed in Proto-Indo-European, or words that have some derivatives. Currently, we have a number of words that have no Baltic descendants; sometimes a Slavic descendant is a Proto-Slavic innovation. Especially @AshFox and @ɶLerman like creating pages like that, often making up references that allegedly support the PBS reconstruction. AshFox often uses {{R:ine-bsl:Trautmann}}, but Trautmann often reconstructs PBS without any Baltic (or Slavic) descendants. Boryś says: "It has often been argued that only those lexemes that existed in both language groups should be considered true Balto-Slavic lexemes, and that Trautmann’s dictionary, which has already played an important role in the study of Balto-Slavic, Baltic and Slavic lexicon, should be replaced with a more modern dictionary..." If we agree with that, the first exception should be ignored, and we should always skip the PBS reconstruction for Slavic-only words. It is unclear whether PBS was actually a single language or the result of intense linguistic interaction between the early Balts and early Slavs.

Words that should be deleted: *bérgas / *álśis / *baršinan (?) / *brā́ˀtiškas / *bikelā́ˀ / *bī́ˀtei / *déntsnāˀ / *dúsdjus / *dṓˀnis / *gléistan / *kā́ˀmō / *kšwárštan / *maréitei / *mangjás / *munagas / *mḗˀsinkas / *péktis / *paktás (maybe?) / *rébran / *skaujás / *stíltei / *wájas / *źwēˀrīˀnas / *źwēˀrīˀnāˀ / *źénˀtis / *źálˀtan. Sławobóg (talk) 15:55, 9 April 2026 (UTC)Reply

What do we have here, doubts about the existence of Proto-Balto-Slavic? To me, these claims are absurd. It's like claiming that Italian arose directly from Proto-Italic, skipping the Latin stage. Or that Old English arose directly from Proto-Germanic, skipping the Proto-West Germanic stage. I don't think that situations where not all descendants of a proto-language preserve a lexeme are a reason to ignore the reconstruction of this proto-language. It is impossible for Proto-Indo-European *bʰérǵʰos to just develop directly into Proto-Slavic *bȇrgъ without going through the Proto-Balto-Slavic *bérgas stage, which is very easily reconstructed without any problems. @Sławobóg I have always treated you with respect and supported you, but with this initiative of yours, I do not agree with you. AshFox (talk) 17:04, 9 April 2026 (UTC)Reply

I would say the argument is not so much that Proto-Balto-Slavic *bérgas didn't exist as that having a separate page for it brings no added value, since it doesn't tell us anything that [[RC:Proto-Slavic/bergъ]] doesn't. —Mahāgaja · talk 18:25, 9 April 2026 (UTC)Reply

Oppose as proposed, I agree with AshFox here. The standard of having a proto-word entry is reasonable confidence that the word existed at that proto-stage. This can be demonstrated by what I call "daughter + daughter ⇒ mother" (being attested in both Slavic and Baltic) or it can be demonstrated by "aunt + niece ⇒ mother" (being attested in Slavic + another non-Balto-Slavic cognate that proves inheritance from a PIE word, which has to pass through the Balto-Slavic stage to get to Slavic). We should still allow Baltic-free Proto-Balto-Slavic entries where inheritances from fully-formed PIE words can be demonstrated via non-Balto-Slavic cognates. — mellohi! (Goodbye!) 18:51, 9 April 2026 (UTC)Reply
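
The two criteria in the comment above can be written out as a small decision rule. This is only a sketch: the function and parameter names are hypothetical, and treating a Baltic-only word with an outside cognate the same way as a Slavic-only one is an assumption that goes slightly beyond the Slavic example given.

```python
def pbs_entry_warranted(baltic: bool, slavic: bool, outside_cognate: bool) -> bool:
    """Decide whether a Proto-Balto-Slavic entry is supported by the evidence."""
    # "daughter + daughter => mother": attested in both Baltic and Slavic.
    both_branches = baltic and slavic
    # "aunt + niece => mother": one branch plus a non-Balto-Slavic cognate
    # that demonstrates inheritance from PIE.
    one_branch_plus_outside = (baltic or slavic) and outside_cognate
    return both_branches or one_branch_plus_outside


# A Slavic-only word with a PIE cognate elsewhere passes; one without does not.
slavic_with_cognate = pbs_entry_warranted(False, True, True)
slavic_only = pbs_entry_warranted(False, True, False)
```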

Agree. It is common practice here for other reconstructions as well, and may be useful in the longer term. Tollef Salemann (talk) 21:31, 15 April 2026 (UTC)Reply

You can barely argue that a specific requirement for Proto-Balto-Slavic is needed. Creating proto-language entries with only one descendant is already exceptional and I agree with the specifics of AshFox and Mellohi! on exceptions. In other cases I am confident that our editors in this particular language group are able and willing to argue down entries that have insufficient merits. At the same time the proposer himself finds exceptions, so devises a bad rule. Fay Freak (talk) 20:52, 9 April 2026 (UTC)Reply

Notifying @NoKiAthami, @Jacaguoçãrana and @Ovey 56, and @GuaraniNeeha if they're still alive (maybe @Polomo and @Juwan too who like sticking their nose). I've recently added tup to etymon's text allowlist seeing its use recently, but, as victar pointed out, this should have a written consensus for future reference in accordance with the last vote. Since there are very few active editors (most of the languages don't even have a page yet), I felt it would be more productive to ask for the whole Tupian family instead of each language.

So: do you agree in using {{etymon}} and |text= in entries in Old Tupi, Paraguayan Guarani and any other Tupian language? Trooper57 (talk) 02:08, 11 April 2026 (UTC)Reply

Tupian includes some 82 languages. I recommend limiting the scope. --{{victar|talk}} 02:29, 11 April 2026 (UTC)Reply

Removing subfamilies and redlinks (like Wayoró), there are 67 languages on wiki. Most of them have a single lemma or so, added a decade ago, that I'm afraid wouldn't even pass RFV. I'm pretty sure I've pinged the whole active community for these languages, but I could limit it to, say, Old Tupi, Paraguayan Guarani, Mbya Guarani and the protolanguages, which seem to be edited the most. Trooper57 (talk) 02:50, 11 April 2026 (UTC)Reply

There could be even fewer actually, some of the blue links are totally empty (Category:Tuparí language) and others only have maintenance categories (Category:Urumi language). That would be an interesting investigation. Trooper57 (talk) 02:58, 11 April 2026 (UTC)Reply

@Trooper57: Which codes would that be then? --{{victar|talk}} 05:14, 11 April 2026 (UTC)Reply

Support FWIW. Could we also include Kariri (kzw)? I'm kinda the only editor ever of it. Jacaguoçãrana (talk) 04:36, 11 April 2026 (UTC)Reply

That would be a separate request you could start. --{{victar|talk}} 05:17, 11 April 2026 (UTC)Reply

Support as nose-sticker. Juwan 🕊️🌈 04:59, 11 April 2026 (UTC)Reply

Notifying @Apisite, Stríðsdrengur, Iohanen (@Montoya2002 maybe?).

Seeing the result of the last vote on this matter, I must ask if y'all agree on adding hrx to etymon's text allowlist and having |text= in Hunsrik entries. I see it's been used to some extent already. Trooper57 (talk) 03:45, 11 April 2026 (UTC)Reply

@Trooper57: Sorry, I haven't been active here. What would that be? Iohanen (talk) 04:09, 11 April 2026 (UTC)Reply

@Iohanen: basically, there needs to be community consensus for using {{etymon|text=+}} in etymologies, like in Fakong. There's been some dispute around using this template, so I gotta ask if it's fine to add Hunsrik to the allowlist. Trooper57 (talk) 04:28, 11 April 2026 (UTC)Reply

I see. I'm fine with it :) Iohanen (talk) 06:02, 11 April 2026 (UTC)Reply

@Trooper57: I'm not aware of the dispute. What's the harm people fear? Korn [kʰũæ̃n] (talk) 08:58, 12 April 2026 (UTC)Reply

@Korn: the main arguments against it I remember seeing are:

The simplification of etymologies, since you have less control over the output than just plain text;
The replacement of established templates {{inh}}, {{der}}, {{bor}} etc. Same controversy we have with the {{bor+}} family, basically (the |text= function also automatically adds a glossary link, so there's that)
The supposed gamification of adding the template to entries;
The module and format being constantly edited (I think they've settled down for the moment)
The overall usefulness of having etymology trees (separate issue imo).

You can see some of the rationales here:

Trooper57 (talk) 14:51, 12 April 2026 (UTC)Reply

I also err on the side of caution in adopting {{etymon}}, as it is objectively more difficult to use than the {{der}} templates, thereby raising the barrier to entry for editing etymologies. Accordingly, it is important to ensure that members of the community understand what they are agreeing to. --{{victar|talk}} 05:47, 13 April 2026 (UTC)Reply

If the 3 main editors (Trooper, Iohanen and I) know how the template works and agree with its use, then the matter is resolved, there's nothing more to discuss. Stríðsdrengur (talk) 11:38, 13 April 2026 (UTC)Reply

I'm fine with it. Stríðsdrengur (talk) 15:02, 11 April 2026 (UTC)Reply

Tagging active German(ic) editors: @Caoimhin ceallach, DerRudymeister, Hazarasp, Korn, Leasnam, Mahagaja, Mnemosientje. --{{victar|talk}} 06:38, 12 April 2026 (UTC)Reply

For what reason? We are the only active Hunsrik editors. Every time one needs to make even the slightest change to the Polish language layout, do I have to mention all the Slavic editors? Stríðsdrengur (talk) 12:02, 12 April 2026 (UTC)Reply

Hunsrik is part of a lect continuum with High German with extensive overlap. --{{victar|talk}} 00:10, 13 April 2026 (UTC)Reply

Stríðsdrengur, you say that as if Hunsrik operates in isolation from other Germanic languages. As a long-time Germanic editor myself, even I have contributed to the etymologies of Hunsrik entries. I would also gently note that neither you nor Iohanen have edited a Hunsrik entry in some time, which is why I chose to tag additional editors who are currently active in Germanic. --{{victar|talk}} 16:03, 13 April 2026 (UTC)Reply

I'm sorry but adding a handful of etymologies doesn't mean much. I may not be the most active editor at the moment, but I was the one who started cleaning up the Hunsrik entries, created the appendices, the first reference templates, encouraged the organization of various categories, and also led to the creation of the pronunciation module. And at the moment I am also trying to recruit native Hunsrik speakers who are willing to help us with audio pronunciations and words not found in Piter Keo's dictionary. After a while editing the language, I asked @Trooper57 to replace me indefinitely. And @Iohanen is the creator of the conjugation module. It is true that Hunsrik technically does not operate separately from the other Germanic languages, but it has its own base of editors who have a consensus among themselves. Stríðsdrengur (talk) 21:39, 13 April 2026 (UTC)Reply

I'm not claiming to be a Hunsrik editor; currently Trooper57 is the only user still actively editing Hunsrik. But with only 1,682 Hunsrik entries, every edit matters.

My concern is that implementing {{etymon|text=1}} will unintentionally discourage contributions from other Germanic editors, like myself, who are uninclined to work with {{etymon|text=1}}. Larger projects like Polish may have enough active editors to absorb that cost, but Hunsrik does not have that luxury.

--{{victar|talk}} 02:40, 14 April 2026 (UTC)Reply

Abstain I am against the usage of |text= for the reasons you accurately summarised but I don't want to impose that view on languages I don't edit in. —Caoimhin ceallach (talk) 23:08, 13 April 2026 (UTC)Reply

I'm taking out my support vote. I must agree with @Victar when he says contributors should know what they are agreeing with in the first place, and I don't. I do not know the consequences of using etymon nor the usability of it. I'll stay neutral until I have time to read about it and the discussions around it. Iohanen (talk) 05:11, 14 April 2026 (UTC)Reply

I noticed that User:AshFox added Dacian entry links to an etymology. I was under the impression that Dacian entries are discouraged, if not explicitly prohibited. Searching for prior discussions on Dacian, I came across the following:

@Chuck Entz, Benwing2, Thadh, -sche, Nicodene, Fay Freak, Urszag, PUC, Tollef Salemann, Mnemosientje --{{victar|talk}} 07:46, 13 April 2026 (UTC)Reply

In that particular case it should not have been linked, if I understand correctly that the morphology of Dacian is speculative—I mean it is present in the same etymology in two wholly different forms, can we even distinguish chronolects of Dacian 🙄?—, but I never really sought out the literature on that Trümmerlanguage. In attested cases (e.g. plant names mentioned as Dacian), we can suffer entries warning against themselves with noted places of attestation, as the best efforts. Maybe we should devise a rule such as: for languages like that, always give the evidence. Fay Freak (talk) 15:12, 13 April 2026 (UTC)Reply

To be fair, it wasn't Fenakhay who added the link (although this is part of the problem with editing etymologies of languages you know nothing about, which is why I discourage users from doing so!). But that etymology does look like bullshit - our knowledge of Dacian isn't even close to good enough to make such reconstructions. Thadh (talk) 16:33, 13 April 2026 (UTC)Reply

"like bullshit"?... Address all claims to Oleg Trubachyov. AshFox (talk) 17:27, 13 April 2026 (UTC)Reply

What's not to like? I didn't just make up the etymology out of thin air. I've included links to the sources I got the etymology from: ESSJa (1978) and Anikin (2020). AshFox (talk) 17:30, 13 April 2026 (UTC)Reply

Thank you for sourcing the etymology, but that does not place it beyond critical scrutiny. Even looking past the unreconstructability of Dacian, there's no reason to assume Proto-Slavic borrowed the term from Proto-Sarmatian via Dacian. --{{victar|talk}} 05:44, 14 April 2026 (UTC)Reply

The problem I have with Dacian and Thracian is that they're all scattered mentions in running text of other languages- 'the Dacians call that "[ ]"'. Can we even guess what a Dacian sentence would look like? How do we keep it from becoming an etymological Rorschach ink blot test? Chuck Entz (talk) 06:16, 14 April 2026 (UTC)Reply

Five years ago I created entry w/ this reconstruction Dacian *skuia... you can delete it if something is wrong. AshFox (talk) 11:59, 14 April 2026 (UTC)Reply

Currently, the page κινούβοιλα (kinoúboila) claims that it is recorded by Pedanius Dioscorides in the 1st century CE. This is false: These words are first attested in the Recensio Vindobonensis, which is a particular branch of the manuscript tradition for the De materia medica.^[1] Thus, these quotations are instead attributable to Pseudo-Dioscorides, a later writer who edited a manuscript of the original work by Dioscorides. According to the classicist Andrew Dalby, this Pseudo-Dioscorides perhaps intended to provide Roman doctors with linguistic information to help them communicate with Dacian-speaking herbalists, implying that the text was composed after the Roman conquest of Dacia by Trajan.^[2] Pseudo-Dioscorides also provides numerous Egyptian glosses, some of which apparently have been correlated with attested Egyptian terms, such as the form μίθ (míth) and Egyptian mꜣtt.^[3] If the Egyptian forms are reliable, that could imply that many of the Dacian terms are also accurate.

Most of our Dacian entries don't appear to fully abide by the existing attestation of plant names. The majority of these entries appear to be taken from Pseudo-Dioscorides, who actually recorded the terms in the Ancient Greek script. For instance, here is the original attestation of Dacian χόδελα (khódela), which is currently recorded on Wiktionary as khodela. However, the Dacian term κινούβοιλα (kinoúboila) is actually recorded on Wiktionary in the Greek script. Moreover, Wiktionary does not transliterate Paeonian terms in the Ancient Greek script, which are also only attested in mentions from Ancient Greek authors. Graearms (talk) 22:52, 15 April 2026 (UTC)Reply

@Graearms, if you feel up to the challenge, it would be great if you could clean up those entries to the best of your ability. --{{victar|talk}} 04:03, 16 April 2026 (UTC)Reply

@Victar I've gone and cleaned up several entries, but there are still a bunch left.

We have this duplicate page amalusta, which treats the Dacian gloss as a Latin term. I'm not sure if Wiktionary should simultaneously take this gloss as both a Latin and Dacian form, so one of these pages should perhaps be deleted. However, we also record Paeonian glosses as both Paeonian and Greek, such as on the page μόναπος (mónapos). Graearms (talk) 19:27, 16 April 2026 (UTC)Reply

Please add an etymology-only language code for Proto-Ob-Ugric urj-obu-pro or urj-oug-pro (Ob-Ugric languages) as part of Proto-Ugric urj-ugr-pro. Proto-Ob-Ugric is a descendant of Proto-Ugric and an ancestor of Proto-Khanty kca-pro and Proto-Mansi mns-pro. This code will be useful for many etymologies, for example Proto-Khanty lemmas and not only, also in Old Novgorodian югъра (jugŭra). It would also be nice to see the code for Proto-Hungarian urj-hun-pro ‒ Hungarian language of the period 500-1000 AD (before the appearance of Old Hungarian ohu around 1000), and this period is omitted from the history of the Hungarian language in Wiktionary. AshFox (talk) 18:16, 13 April 2026 (UTC)Reply

@AshFox: Ob-Ugric is an impossible-to-reconstruct, likely invalid branch. Definitely a very, very bad idea to have a code for it. You should really ask users with more knowledge of Uralic languages to help you with etymologies like this. Old Novgorodian had very little chance of having any contact with Ob-Ugric at all. Thadh (talk) 18:57, 13 April 2026 (UTC)Reply

Agreed. Both terrible ideas. Ob-Ugric is a highly disputed grouping, so creating a code for it would just hardcode a shaky hypothesis as fact, even as an etymology-only code. "Proto-Hungarian" isn't a clearly defined or reconstructable stage, so adding a code would just create a fake sense of precision without adding real etymological value. --{{victar|talk}} 21:31, 13 April 2026 (UTC)Reply

@Victar Since you say that this is bad, let me remind you that the term Proto-Ob-Ugric (with reconstruction) is used on Wiktionary ‒ 127 times! And the term Proto-Hungarian ‒ 8 times. If you conclude that the presence and use of the etym-only code Proto-Ob-Ugric is unacceptable, I will be forced to remove all 127 (not made by me) reconstructions and mentions of Proto-Ob-Ugric from Wiktionary. Just to be consistent. AshFox (talk) 02:46, 14 April 2026 (UTC)Reply

You're absolutely correct -- all those instances should be removed. --{{victar|talk}} 03:56, 14 April 2026 (UTC)Reply

@Thadh In the article by Vladimir Napolskikh (2005) the name of the Nenets people is reconstructed in Proto-Ob-Ugric for the ancestors of Khanty and Mansi. Since you say that the reconstruction of Proto-Ob-Ugric is not allowed, I will be forced to use it as a reconstruction at a level below Proto-Ugric (which is on Wiktionary), and in doing so I will have to remove 127 mentions and reconstructions of Proto-Ob-Ugric that already exist and were added by someone on Wiktionary.

You should really ask users with more knowledge of Uralic languages to help you with etymologies like this. ‒ Why? I wrote the etymology section citing linguistic sources, not just making something up out of thin air. Or am I doing something wrong by using a scientific article as a source and directly citing it? Why should other users know more about the etymology of a term than I do, who directly cites the work of Finno-Ugrist and a linguist.
Old Novgorodian had very little chance of having any contact with Ob-Ugric at all. ‒ What are you talking about? The Yugra region (in the northern Ural Mountains and the lower reaches of the Ob River) was administratively part of the Novgorod Republic as a volost (волость). Novgorodians collected tribute from the local population and hunted fur animals there. Of course, they interacted with the local population and readily adopted the term for the region and the people, a term which, according to the etymology article, goes back to the Proto-Ob-Ugric name for the Nenets.

AshFox (talk) 03:16, 14 April 2026 (UTC)Reply

Napolskih is a very good public speaker, but his work is often extremely controversial. This same person also claims that "Uralic-Yukaghir is proven beyond any doubt" and often plays fast and loose with the comparative method.

The problem with creating etymologies for languages you don't know (or, in this case, involving languages you don't know) means that you cannot judge which sources are good and which are absolute hogwash, as is evidenced in this discussion. I can find you published sources stating that Russian borrowed its term for Moscow from Arabic through Sumerian, or that some obscure village in the Leningrad Oblast is named after a Chinese emperor in the second century. That doesn't mean that this is a good etymology.

Old Novgorodian culture indeed spread far and wide, but we have little evidence of linguistic contact between Slavic-speaking Novgorodians and most other languages in their cultural sphere. Tribute could easily have been collected by the Komi for instance. The feudal system of the middle ages did not require the lord himself traveling to his vassals' vassals' vassals. Thadh (talk) 08:24, 14 April 2026 (UTC)Reply

I don’t quite agree, but it would be safer for now to refer to Napolskikh in the etymology sections by his name. Like "according to Napolskikh, blah blah blah", it would be a good way to mention him without including doubtful reconstructed language codes. He is not a science freak as far as I know, so it may be a good idea to mention him in some cases? Tollef Salemann (talk) 21:25, 15 April 2026 (UTC)Reply

Probably. We refer to Altaic in this fashion, and disputed stages of Proto-Mongolic (→ *hünegen), Turkic and Tungusic. Probably there is a need for crazy people in so difficult a material and in Russia. Fay Freak (talk) 13:37, 16 April 2026 (UTC)Reply

Do not support either. Proto-Ob-Ugric is maybe conceivably real but has little consensus about its reconstruction. "Proto-Hungarian" is not well-defined as a concept and the very few indirect early medieval attestations of Hungarian can be tagged as Old Hungarian without problems. However, I would suggest that most POUg. mentions should be probably unpacked into the Proto-Mansi and Proto-Khanty forms instead of being projected further back into an also hypothetical Proto-Ugric. Mentions on etyma of Uralic origin such as Proto-Khanty *jĕj (“night”) seem to me superfluous entirely.

(Tangentially: our current crop of Proto-Khanty etyma seems like an unfortunate mess in vowel transcription that mixes together Honti's, Helimski's, Zhivlov's and even Steinitz' systems, as well as seems terribly confused between if lax vs. tense vowels would be *a *ā or *ă *a or maybe even *ă *ā. FWIW personally I recommend the 2nd option.)

— In the case of Yugra, the main locus for discussion should be probably primarily Proto-Khanty, as the Mansi reflexes are limited to Northern and Eastern, show multiple variants in Northern (a form *jārɣən should not give ё̄рн; also e.g. "ё̄рын" is mis-orthographicized from Munkácsi's jå̄rėn and simply stands for this same word), and they could be instead borrowings from Northern Khanty or even Komi. Though it is in fact reconstructed to POU already by Honti. We should probably check out and maybe cite also Hajdú 1950, "Die benennungen der Samojeden" which I believe was the initial combing-together of this data. --Tropylium (talk) 13:24, 17 April 2026 (UTC)Reply

I would like to start a follow-up to Wiktionary:Beer parlour/2025/August#Proposal to render reconstructed Proto-Norse in Latin script. To summarize the rationale:

Scholarly literature consistently uses Latin script for Proto-Norse reconstructions.
Runic orthography shows variation and ambiguity, while Latin script is explicit.
Greater readability, searchability, and accessibility for readers.

@Fay Freak, Mahagaja, Vahagn Petrosyan, Mårtensås, Benwing2, Exarchus, Chuck Entz, Thadh, Blockhaj --{{victar|talk}} 08:00, 16 April 2026 (UTC)Reply

Boo, runes are more fun :( Jokes aside, the practicality of latinization is obvious, but it also comes with the problem of shoving away runic and period ambiguity. Language is never precise, and the ambiguity of runic script is good for this. For the same reason we do not re-spell English to be more phonetic, I prefer current practice. Each runic entry still should have a latinized bi-entry. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 09:20, 16 April 2026 (UTC)Reply

I agree with Blockhaj. I am of the opinion that if a language is attested in one consistent script, it is best to also record reconstructions in that exact script. One main reason why printed publications do not do this is because of font/printing issues, which is a non-issue for us. Another is scholarly consensus, but we don't really need to follow it if it's not beneficial to us. We can always make hard/soft redirects from Latin-script reconstructions if need be. Thadh (talk) 09:42, 16 April 2026 (UTC)Reply

The transcription script is tantamount to reconstruction and conlanging for coarse driveby editors who are invited to create entries despite being clueless. Correct me if I am wrong, so far Runic script warded these fabrications off. The alternative would be copying-from-Pokorny style with hyphens for morphological or other often outdated schemes: I don't think it would achieve what you like to believe it would achieve. Proto-Norse editors don't want to be role models for that, that's why they stick to what original speakers would have written, so the nature as an actual language is being kept in mind during the creation of entries.

Accessibility will not be consistent if attested entries use the original script. Fay Freak (talk) 13:33, 16 April 2026 (UTC)Reply

Reconstructions are based on phonological analysis, so presenting them in an ambiguous script defeats their purpose. It obscures the very distinctions the reconstruction is meant to capture. And to what benefit, aesthetics?

The font/printing argument is wholly unconvincing. Even if it had some relevance in the past, it certainly doesn't for a paper published in 2018. And no, using a script like runic does not filter out bad editors; we already have clear evidence of that.

There's no practical upside to using the runic script for reconstructions, and it actively makes them worse.

--{{victar|talk}} 09:26, 17 April 2026 (UTC)Reply

There is deeper thought at times in runic reconstructions, and knowledge around this is hampered if we don't use it. It also looks confusing if we use two separate writing systems for primary entries, one of which is unhistorical. I only see this proposal as throwing a spanner in the works if it goes through. A better proposal would be to look for better technical solutions, like being able to feature two searchable writing systems in the same entry. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 09:53, 17 April 2026 (UTC)Reply

"Deeper thought"? We talking about runic magic here? 🤣 --{{victar|talk}} 07:50, 18 April 2026 (UTC)Reply

No. Certain runic grammar, like the addition of epentheses, supplementative characters (for the lack of a better term), etc. ᛒᛚᚮᚴᚴᚼᛆᛁ ᛭ 𝔅𝔩𝔬𝔠𝔨𝔥𝔞𝔧 17:12, 19 April 2026 (UTC)Reply

Seems like we miss this small language under Siloid languages. Can we add it under the family tree? According to the latest research (2015), it is not intelligible with Akeu. Chihunglu83 (talk) 05:58, 22 April 2026 (UTC)Reply

Hi! I would like to make a request to add Tugunese (alternative name: Mardjiker Creole, Batavian Portuguese Creole, Tugunese Creole, ISO 639-3: [tvg]) to Wiktionary. The information regarding the language can be accessed here https://iso639-3.sil.org/request/2025-016. Nandusia (talk) 04:21, 23 April 2026 (UTC)Reply


Meänkieli and Kven, Separate Languages from Finnish Treatment Request

Redid Chinese labels

Merge Dutch Low Saxon (nds-nl) and German Low German (nds-de) into Low German (nds)

Atemble [ate] -> Mand

Kamberataro [kbv] -> Dera

add Molet language?

add Kor[a/u]pun-Bromley language under some name?

add Bai-Maclay language??

Papi [ppe] to Baiyamo

Yaul [yla] to Ulwa (New Guinea)