A speculative ethnolinguistic family tree of non-African (and some African) humanity; or, notes toward Proto-World, part 1
Featuring: cowboys and Indians on the Eurasian steppe.
Recently I stumbled across the work of Gerhard Jäger, a German professor involved in computational historical linguistics: the project of using computer programs to automate the kinds of analysis that historical linguists do to reconstruct how languages have evolved. The conventional historical-linguistics rule of thumb is that, beyond five or maybe ten thousand years into the past, connections between languages become difficult or impossible to rigorously uncover. But could computers do better, identifying related language families and families of families, maybe going all the way back to some original, Edenic “Proto-Human” or “Proto-World” language? An initial step would be a global language phylogeny. A 2016 paper that Jäger co-wrote included this programmatically derived “world tree of languages”:
Tantalizingly, the researchers’ methods managed to automatically pick up on well-known language families like Indo-European and Afroasiatic (which includes Semitic). But, in properly sober academic fashion, the paper warned that the tree should not be taken at face value: in addition to including some glaring anomalies (e.g. the language of the Sioux is grouped with Southeast Asian languages), it conflates similarities arising merely from intergroup contact with similarities arising from true common descent. Elsewhere, Jäger and his collaborators have acknowledged that “automated methods for cognate detection” still underperform old-fashioned expert judgment, at least as of 2018.
Nonetheless, I was intrigued. I knew that past attempts at constructing broad families of language families (Nostratic, Eurasiatic, Altaic, Transeurasian…) were regarded by many linguists with deep suspicion, if not outright disdain. For whatever reason, this topic attracts almost as many cranks as the topic of my next essay: the historical accuracy of the Bible. But, as an ancient-DNA fanboy, I also knew that science had made great strides in recent years toward understanding human genetic history. While ancestry and language do not, of course, always line up, it is still the case that — especially in the time before mass media, formalized schooling, and fast telecommunications — the main way for people to learn a language has been to hear one or both of their parents speak it.
So I thought it might be feasible to manually smoosh together historical linguistics and ancient DNA into one big phylogeny — at least outside of Africa. The problem with Africa is that human history runs so deep there (660,000 years for Homo sapiens,1 at least three million years if you count all known stone-tool makers2) that it seems less obvious to me that all the languages are even related. Over a few million years, is it possible for language itself to have been independently invented multiple times by widely dispersed small groups? Maybe? In contrast, the out-of-Africa migration that gave rise to present-day non-Africans was a single discrete event, perhaps involving just a couple thousand people.3 I have to think that they all spoke the same language and that all the languages spoken by their cultural descendants across the world trace back to this same Proto-Out-of-Africa tongue.
What I didn’t fully appreciate was that, although the big picture of human genetic history seems well understood, some of the fussier medium-scale details that might be relevant to language-family relationships are still in the midst of being worked out. Also, as an amazing 2023 paper demonstrated, the complex admixture graphs that appear in many ancient-DNA studies are probably unreliable or at least shouldn’t be taken literally. The space of possible admixture graphs is so vast that it’s usually possible to find multiple graphs that fit the data well, but simulations suggest that these are often overfitted and inaccurate.
Unable to grab a comprehensive family tree off the shelf, I made my own by hand. And since it is my own, be warned that it’s a medley of longstanding consensus ideas, aggressive interpretations of recent papers and preprints, and a few idiosyncratic theories that made sense to me.
I’m calling this an “ethnolinguistic” family tree to try to convey that it’s genetic-ish but not exactly genetic. Genetically, human groups don’t form a “clean” tree structure: they’ve repeatedly mixed, diverged, and mixed again. Cultures blend; languages borrow words and acquire areal features. Nonetheless, I think it’s meaningful to say that, for instance, the United States is, overall, an ethnolinguistic descendant of Great Britain, even if British ancestry no longer represents the bulk of the gene pool and even if non-British cultural influences have also mattered.
Below I’ll present the tree. Then I’ll discuss some of the thinking behind it. I’ll leave the actual reconstruction of the Proto-Out-of-Africa language, or proof that such reconstruction is impossible, as an exercise for the reader.
In praise of ecoregions
Before I get into the details of the phylogeny, I want to offer thanks to the creators of the “ecoregion” concept, which I think goes back to a 2001 paper:
We define ecoregions as relatively large units of land containing a distinct assemblage of natural communities and species, with boundaries that approximate the original extent of natural communities prior to major land-use change.
You can view these ecoregions (as slightly redefined in 2017) overlaid on Google Maps, but, as a typically geographically ignorant American, I usually find it easier to think in terms of the 52 larger, simpler “subrealms” defined by the One Earth Bioregions 2023 framework:
While I haven’t bothered trying to analyze this rigorously, I’ve been struck by how often ancient human groups seem to match up with subrealms and ecoregions — which makes sense if you assume that hunter-gatherers and farmers will tend to specialize in plants and animals that may not cross the borders of such regions. Anyway, if I refer to something like the “Central East Asian Forests” subrealm below, that’s what I’m talking about.
Rise of the rhino boys
Once upon a time, 83,000 years ago,4 in the Sub-Saharan Afrotropic subrealm, a group of Homo sapiens crossed the Nile and headed toward the Horn of Africa. Some 19,000 years later — i.e. around 64,000 years ago5 — a small group of the descendants of these East Africans crossed the Red Sea at the strait of Bab-el-Mandeb (“the Gate of Grief”!) and took up residence in southern Arabia.6 This was not the first Homo sapiens out-of-Africa migration, but it was the one that had by far the biggest long-term impact (especially if you were part of the megafauna community).
By 58,000 years ago, some of these out-of-Africa people expanded north into the Levant, where they encountered and “mixed” with Neanderthals.7 An early foray out of the Levant and into Anatolia and then Europe mostly fizzled out,8 but the next one stuck, creating the first major split in the out-of-Africa family tree. On one side was the western branch that (initially) stayed home in the Near East; on the other side was the eastern branch that colonized the lands beyond. Around 54,000 years ago,9 the eastern vanguard moved beyond the Urals into East Asia and then, amazingly quickly, into Southeast Asia (Sunda), Wallacea, and Greater Australia (Sahul) — all within a few millennia.10 Overall, the route looked something like this:11
But then something weird happened: perhaps 37,000 years ago,12 a group of ancient Southeast Asians (genetically more akin to today’s Andaman Islanders than, say, today’s Vietnamese people, most of whose forebears only showed up later) broke off and headed back north. Some of these people, I suspect, ended up turning west into the Himalayas, becoming the ancestors of the Kusunda, a tiny hunter-gatherer group in Nepal that speaks a “language isolate” (that is, a language with no well established relatives, although what I’m trying to claim here is that it actually does have relatives).13 But those who didn’t turn into the Kusunda kept going north. Somewhere around the Altai-Sayan region in southern Siberia they encountered some of their long-lost western cousins, who had started to make their own forays past the Urals and into Asia. The resulting fusion of eastern and western ancestry created what researchers call the Ancient North Eurasians (ANE). One recent model put their western ancestry at 76% of the total,14 but — in a preview of things to come — it looks like all the surviving male lineages were eastern, falling into the Q and R Y-chromosomal haplogroups. I have to guess that this was more of a patriarchal conquest than a pleasant potluck. Though the ANE seemed to have settled on the Altai-Sayan region, from Denisova Cave to Lake Baikal, as their home base, a breakaway group actually went as far north as the Arctic Ocean. They had, as the kids say, that dog in them.
Why did the pre-proto-ANE make this strange journey from Southeast Asia to the Arctic? I have a guess: rhinoceroses. They were rhino hunters. The Arctic ANE archaeological site is, in fact, known as the “Yana Rhinoceros Horn Site.” Today, Sumatran, Javan, and Indian rhinoceroses live in Southeast Asia and at the foot of the Himalayas, but, not so long ago, various rhinoceros species occupied a much wider area, with the now extinct woolly rhinoceros and Merck’s rhinoceros (close relatives of the Sumatran rhinoceros15) roaming across most of Eurasia. And don’t forget the Siberian unicorn. Because this macho lineage was, perhaps, chasing after rhinos, I have come to think of them as the rhino boys. (No one else calls them this.)
Ice Age, Collision Course
But then, from 26.5 to 19 thousand years ago, it got really cold. Pedants call it the Last Glacial Maximum, not the Ice Age, because technically Earth has gone through many different “ice ages,” but at least one paper with David Reich listed as the final author just calls it “the Ice Age” anyway, and that’s good enough for me.
During the Ice Age, the ANE continued to hole up in the Altai-Sayan region. But by the time it ended, they had fragmented into two groups. One eventually went west, toward Eastern Europe; their descendants contributed the bulk of what came to be known as Eastern (European) Hunter-Gatherer (or Sidelkino) ancestry, which in turn contributed about half of the gene pool (and nearly all of the Y chromosomes) of the Proto-Indo-European-speaking Yamnaya on the western steppe. And not just them: I think the western ANE also moved into the Tian Shan mountains and from there to what is now Pakistan, giving rise to the Burusho people, whose Burushaski language is regarded as an isolate. Many Burusho males carry Y chromosomes from the relatively rare R2 haplogroup, signaling their link to the rhino-boy ancestors.
The other ANE group went east, toward what was then a single land mass north of Honshu, Japan: the Paleo-Sakhalin-Hokkaido-Kuril (PSHK) peninsula. There (or maybe on the way), the eastern ANE “mixed” extensively with Northeast Asians, but once again nearly all the male lineages that emerged were rhino-boy lineages, not Northeast Asian ones.16 In turn, these PSHK people, perhaps under pressure from incipient Jomon expansion into Japan, split into two groups. To the east, probably hugging the Pacific coast and not spending much time on the Beringian land bridge itself, some PSHKers made it all the way to the Americas and became Native Americans. To the west, other PSHKers stayed in Asia but spread both into the taiga and back toward the old Altai-Sayan homeland.17 The ancient-DNA literature usually calls these people “Ancient Paleosiberians,” but I will (a bit cheekily) call these closest Asian relatives of Native Americans “Asian Americans.”
Thus, by the onset of the Holocene epoch, the descendants of the rhino boys had spread all across the globe — before the Yamnaya were even a twinkle in a Sidelkino eye. The later explosive expansion of Indo-European-speaking peoples was, in a way, part of a 30,000-year-old tradition.
This explains why there are so many intriguing historical-linguistic hints of a special bond between, for example, the Na-Dene Native American language family and the Yeniseian language family (represented today by the Kets, who are “Asian American” in my sense18), and between Burushaski and Indo-European,19 and between Yeniseian and Burushaski,20 and between Na-Dene, Yeniseian, Burushaski, and Kusunda.21 All these language families are in fact genealogically related; it’s just that so much time has passed since they diverged from each other that the evidence of that relationship has been greatly attenuated.
Oddly, if this theory is correct, then the most widely accepted of those linguistic relationships — between Na-Dene and Yeniseian — is real but somewhat misleading. Yeniseian would be the sister of the common ancestor of all Native American languages, not just Na-Dene, which might simply be unusually conservative in its evolution.22
As I pondered all this, I realized that the defining conflict of the Western genre (cowboys vs. “Indians”) pitted long-lost ethnolinguistic cousins against each other: the cowboys could trace their male ancestors back to the Corded Ware and, before that, the Yamnaya and the Sidelkino and the western ANE; the “Indians” could trace their male ancestors back to the Paleo-Sakhalin-Hokkaido-Kuril peninsula and the eastern ANE. Why can’t rhino boys just get along?!
Even stranger, the Wild West was not the first time and place that cowboys confronted “Indians”: it also happened in the Late Bronze Age, at the eastern edge of the Kazakh steppe. Around 1300 BCE, Iranian speakers who could trace their male ancestors back to the Indo-European Corded Ware and Sintashta cultures, and who rode horses and herded cattle (🤠), expanded east into the Altai-Sayan and “mixed” with the Asian Americans there, creating the demographic core of the Iron Age’s Scythians.23 It appears that the highest elite of the later Xiongnu and Huns were also primarily derived, on their fathers’ sides at least, from Scythians (suggesting, incidentally, that they spoke an Eastern Iranian language), though some Asian American males did marry into the upper echelons.24 The first great steppe empire: see what cowboys and Indians can accomplish when they work together?
Next time, probably: Turks, snails, and Mesopotamians. [See Part 2]
The estimated date of the divergence of Homo sapiens from the common ancestor of Neanderthals and Denisovans, as per Schlebusch et al. 2017 (TT method).
John Hawks has a very helpful recent post about this: “All the hominins made tools.”
I got this figure from Rito et al. 2019, though I must admit I haven’t tried hard to find other estimates.
Schlebusch et al. 2020, Table S7.2, “Eastern Afr vs. Non-Afr.” Y-chromosome-based estimates look similar: Hallast et al. 2023 dates the most recent common ancestor of haplogroup CT to 77.8 kya.
Climate models from 2021 suggest that “there was a sizeable window of sufficiently wet climate between 65k and 30k years ago” for a so-called southern route out of Africa. This date accords with a mitochondrial DNA phylogeny that dates the most recent common ancestor of non-African haplogroups M and N to 64.3 kya (Rito et al. 2019, Fig. 1). Essel et al. 2023, Supplementary Figure 5.2, looks similar. I would just say 65 kya, but it seems rude not to give my ancestors at least a millennium to acclimate themselves to the new climate.
I don’t feel like getting into why I favor the southern route over the northern route, but it doesn’t really matter for my purposes here.
I’m going with the point estimate from Marchi et al. 2022. I think everyone agrees that the big Neanderthal admixture took place some time between 50 and 60 kya. By the way, I suspect that “basal Eurasians” didn’t really exist as a distinct group — they were just the out-of-southern-Arabia rear guard, who ended up with a smaller share of Neanderthal ancestry. See Quilodrán et al. 2023. A Razib Khan blog post is what first sparked my suspicions.
This is my interpretation of the Zlatý kůň basal out-of-Africa lineage. Some Zlatý kůň–like ancestry may have persisted into later groups (see Bennett et al. 2023), but not much.
What I take to be the eastern Y-chromosome haplogroup (upstream of NO and QR) formed 54.5 kya and split 52 kya, according to Hallast et al. 2023. The earlier Hallast et al. 2021 dated the divergence of the western LT (K1) clade from the eastern K2 clade to 54.4 kya.
Hallast et al. 2021 dates the most recent common ancestor of the MS Australopapuan haplogroup to 51 kya, and the earliest uncontroversial evidence of human occupation in Australia dates to 50 kya.
I’ve often seen it suggested that the out-of-Africa expansion traveled along the southern edge of Iran and India, but as far as I can tell there’s no clear empirical evidence for this theory. Also, I think it’s contradicted by the way that the Ancient North Eurasian QR Y-chromosome haplogroup is nested within a larger Southeast Asian haplogroup; its nearest relative, P*, was discovered in an ancient sample from the Andaman Islands (Moreno-Mayar et al. 2018, supplementary materials, pp. 16-17 and Figure S9)! Meanwhile, Neanderthals and Denisovans clearly made it to Denisova Cave, way out in the Altai, and Denisovans ended up out in East Asia and Sunda, so why not assume H. sapiens took the same path?
Y haplogroup QR first split into Q and R 36.1 kya, according to Hallast et al. 2023, and the Yana ANE individuals are dated to 32 kybp, so 37 is just a guess but seems plausible. It also roughly lines up with the end of Heinrich Event 4, a global cold period, as pointed out in Bennett et al. 2023, so maybe the proto-rhino boys only began to migrate once the climate got pleasant enough.
Tellingly, “some Kusunda patrilineal clans still belong to the root of haplogroup Q1*” (van Driem 2021), which formed ~34 kya as per the Hallast et al. 2021 dates. Osada and Kawai 2021 (Supplementary Table 3) reported unexpected genetic affinities linking the Kusunda, the Jehai (a Malaysian indigenous group), and Utahns of Northern European descent. I suspect that the common element is the ancient, Onge- or Hoabinhian-like Southeast Asian ancestry that wound up in the ANE and later the Yamnaya. Linguists have actually found respectable evidence of a genealogical relationship between Kusunda, Burushaski, Yeniseian, and Na-Dene, though the “Dene-Kusunda hypothesis” is certainly not a consensus view.
Maier et al. 2023, Figure 4 — source data 4, b and c (and others). To be fair, I am cherry-picking here: other well fitting admixture graphs show ANE as entirely western. This is why it’s hard to rely on admixture graphs! For what it’s worth, the main admixture graph in Vallini et al. 2022 shows ANE as 50/50 eastern/western.
See Liu et al. 2021, Figure 2.
The one exception is the rare Y haplogroup C-MPB373, which is one of the founding Native American male lineages (see Sun et al. 2021), but a large majority of unadmixed Native American males belong to Y haplogroup Q.
Zeng et al. 2023 (a preprint as of this writing) distinguishes between “Route 1” and “Route 2” Ancient Paleosiberians. Route 1 APS (associated with the Cisbaikal_LNBA cluster) are, in my terminology, Altai-Sayan Asian Americans. Route 2 APS (associated with Kolyma, the Syalakh-Belkachi cluster, Paleo-Eskimos, etc.) are taiga Asian Americans.
See Zeng et al. 2023.
From van Driem 2021: “The finding that Y-chromosomal haplogroup R2 (M479) is the most frequently occurring paternal lineage amongst the Burusho turns out to dovetail neatly with Ilija Čašule’s theory of a deep linguistic relationship between Burushaski and Indo-European.”
See Starostin et al. 2021 (preprint), which finds a seemingly statistically significant lexical similarity between Yeniseian and Burushaski (and a less statistically significant lexical similarity between Yeniseian and Na-Dene).
See Gerber 2017 (“The Dene-Kusunda Hypothesis: A Critical Account”).
I think this idea is supported by the finding in Zeng et al. 2023 that ancient Athabaskans were not admixed with Paleo-Eskimos, contrary to previous claims. Also, Agil et al. 2023 showed that the basal-Na-Dene-speaking Tlingit are quite similar to other northern Native American groups like Haida, Salishan, and Tsimshianic speakers, without any evident “Asian American” ancestry. It would strike me as weird for Na-Dene speakers to be genetically nested inside the NNA clade while their language was Asian American–like. Such things do happen, but I know of no good evidence that they happened here.
See Keyser et al. 2021. I suspect that the non-Scythian ancestral component of the Xiongnu — the Ulaanzukh/Slab Grave people — were ethnolinguistically “Asian American” but not exactly the same group as Cisbaikal_LNBA/Yeniseian. For one thing, they belonged to a different Y-chromosomal haplogroup (Q1a instead of Q1b); for another, they were much more admixed with “normal” Northeast Asian ancestry (Lee et al. 2023 models them as 24% Cisbaikal_LNBA-like, 76% Amur River Basin-like, while earlier papers didn’t pick up on the Asian American component at all). I think the proto-Ulaanzukh branched off relatively early from the Cisbaikal_LNBA, perhaps taking up residence in the Ordos Plateau and creating the Scythian- and Xiongnu-linked Ordos culture. But ultimately they were a different variation on the Asian American theme. Incidentally, this might explain the evidence used to argue that the Xiongnu spoke a Yeniseian language: while the elite were speaking Eastern Iranian, some of the grunts might have spoken the Yeniseian-like tongue of their birth.