October 10, 2003

This is not a handbook; it is a collection of articles, and their subject is not historical linguistics - which would limit their scope to the written past - but language change and linguistic reconstruction. Now what is that? Imagine a tree-shaped transmission network; a message is fed in at its root, from where it is broadcast to its leaves ("terminal nodes" in jargon), travelling along its branches ("arcs"). Transmission is imperfect and subject to error. Your mission, on the sole evidence of the garbled versions of the original message collected at the terminal nodes, is to reconstitute the tree, the original message and the transmission errors. Here, the message is a language, the transmission its evolution through time, resulting in new languages that can be as different as English and Hindi.

If you think that this is similar to piecing together the truth out of the contradictory evidence of second-, third- and fourth-hand witnesses, you are right. The tree model, however, is only a convenient approximation. In reality, the graph of the evolution of the message, be it a language, be it a testimony, be it a DNA sequence, is seldom a tree, but contains circuits - "bridges" between branches. This happens when a version of the message is influenced by another version on a neighbouring branch. In terms of language evolution it is when, for instance, English took "sky" from Danish, or took "pork" from French.

Very little specialist knowledge is required to solve this linguistic puzzle. Take, for instance, Bernhard Karlgren's introduction to his Analytical Dictionary of Chinese and Sino-Japanese, a classic reprinted in an affordable Dover edition. In just 30 pages, Karlgren explains how it was possible to reconstitute the pronunciation of archaic Chinese (500BC). It is hard going in places, like any good whodunnit, but a student of Chinese would be at no particular advantage over a layman. And like any good whodunnit, it is hard to put down. When you eventually do put it down, it is with the exhilarating feeling of having solved a difficult case. You will have learnt a great deal about linguistic reconstruction into the bargain, and about Chinese too. All that out of a paltry 30 pages.

What do you stand to reap out of the pages of this Handbook of Historical Linguistics? Having waded through it, still at a loss to form a coherent opinion, I turned to the bibliography. Almost 100 pages long, with some 2,400 references, it should list everything and everyone relevant to the subject. Robert Blust is the foremost modern expert on the comparative study (call it "historical" if you really insist) of the Austronesian language family that extends halfway across the globe, from Madagascar to Easter Island, covering the Malaysian and Indonesian archipelagos, the Philippines, Taiwan, Micronesia and more. No trace of Blust.

No trace of his predecessor either, Otto Dahl. No trace of Otto Dempwolff, the father of comparative Austronesian studies. No Arthur Capell, who kicked off the comparative study of Australian Aboriginal languages. Jacob Grimm, the 19th-century scholar of Grimm's law fame who articulated the regularity of sound changes, barely scrapes in with one single reference.

But... just what is Stephen Hawking doing here, with his Illustrated Brief History of Time, The Theory of Everything and The Universe in a Nutshell? And what is the relevance of Albert Einstein's On the Electrodynamics of Moving Bodies and The Foundation of the General Theory of Relativity? From puzzling to hair-raising now, here comes Stephen Jay Gould with 18 publications, the most recent one dated 2002 (this is important, I will come back to it later). Does historical linguistics extend back to the dinosaurs?

A bibliography that amounts to little more than an exercise in name-dropping, so be it. Still, a line should be drawn somewhere. Would a handbook of archaeology allow Erich von Däniken and Zecharia Sitchin into its bibliography? This Handbook of Historical Linguistics does, in the person of Merritt Ruhlen, whose works are to linguistics precisely and exactly what Sitchin's are to archaeology and palaeo-astronomy (Däniken is a shining scholar compared with Sitchin). The fact that Ruhlen was published by Stanford University Press is not an excuse, only a reflection on how little their editors apparently know, or care. No, Ruhlen is not mentioned approvingly, but he is mentioned, and by three different contributors, none of whom gives the slightest hint that something is sorely amiss. Those who cannot, or will not, tell Galileo from Madame Blavatsky should stick to reading tea leaves.

Is anything salvageable from this "handbook"? Little, and the little is cause for frustration. Thus, in "Phonetics and historical phonology" John Ohala presents some statistics on speech misperception, showing the syllable ki misheard as ti or pi 47 per cent and 15 per cent of the time respectively, and correctly heard as ki only 38 per cent of the time.

Rather than "listeners occasionally mak[ing] errors in perceiving speech" as Ohala writes, they appear to do so with great frequency, and if those figures are correct, then we should expect ki to disappear within a few generations, replaced by ti. Heady stuff indeed.

But are those figures correct? Perhaps not. Ohala writes: "The average rate of misperception is .173", when in fact, calculated from the table given, it is 0.34. Further, those are secondhand figures taken from a 1972 article by "Winitz et al", and as Ohala does not bother to say what language was studied, nor how the statistics were obtained, beyond that "nonsense syllables are presented to listeners for identification", the truly inquisitive reader has no choice but to go hunting for the article.

Happily, its title gives the mystery away: "Identification of stops and vowels for the burst portion of /p, t, k/ isolated from conversational speech". Ohala has misled himself and the reader: speech segments much shorter than syllables were used, and it takes only the barest knowledge of acoustic phonetics to grasp the deception. But it takes more knowledge, of speech recognition this time, to realise the experiment was meaningless. Understandably, since speech recognition was not even in its infancy back in 1972.

Frustration strikes again when Lyle Campbell writes: "One needs only contemplate Ruhlen's proposed Proto-Amerind etymon *t'ana 'child, sibling' to see how easy it is to find similarities by chance." Not only is that as severe a debunking as you will ever find here of Ruhlen's travesty of the comparative method, but it begs for a chapter on chance resemblances, their likelihood, their effect on reconstruction. Don Ringe has published on the subject, but his single contribution is about internal reconstruction.

Particularly riling is that Ringe was one of the examiners for Olav Kuhn's MA thesis Computational Analysis of Language Relationships, which covered what has been botched here, or just plain ignored. I was another examiner, and gave Kuhn 98 per cent as his work was easily worth a PhD cum laude.

Kuhn's thesis does not rate a mention anywhere, although highly relevant.

Is that because, submitted in August 2001, it came too late? Unlikely: Gould's last mentioned publication is as recent as 2002.

As an example of the botch-ups that are laudably absent from Kuhn's monograph, take Gregory Guy's paper "Variationist approaches to phonological change". A 3 x 3 frequency matrix is submitted to the chi-square test. Of its nine cells, only two have expected frequencies above 5, the minimum for the test to be valid; two are marginally acceptable (4.9 and 4.6); the remaining five are unacceptable, with expected frequencies ranging from 1.6 to 3.3. The solution is to merge some rows and columns until all cells have expected frequencies of at least 5.

This is elementary statistics. It has not been done.
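For readers who want to see what the check involves, here is a minimal sketch in Python. The counts are hypothetical (they are not the data from Guy's paper, which is not reproduced in this review), and the function names are my own. It computes the expected frequency of each cell under independence, lists the cells that fall below 5, then pools two rows and two columns and checks again.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def low_cells(expected, minimum=5.0):
    """List (row, column, value) for every expected count below the minimum."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


def merge_rows(table, i, j):
    """Pool rows i and j into one row, appended after the untouched rows."""
    pooled = [a + b for a, b in zip(table[i], table[j])]
    return [row for k, row in enumerate(table) if k not in (i, j)] + [pooled]


def merge_columns(table, i, j):
    """Pool columns i and j by transposing, merging rows, and transposing back."""
    transposed = [list(col) for col in zip(*table)]
    return [list(col) for col in zip(*merge_rows(transposed, i, j))]


# Hypothetical 3 x 3 counts, for illustration only.
observed = [
    [20, 4, 3],
    [6, 3, 2],
    [5, 2, 3],
]

too_small = low_cells(expected_frequencies(observed))
print(len(too_small), "cells have expected frequencies below 5")

# Pool the two sparse rows, then the two sparse columns, and check again.
merged = merge_columns(merge_rows(observed, 1, 2), 1, 2)
print(merged, low_cells(expected_frequencies(merged)))
```

Which rows and columns may sensibly be pooled is a linguistic decision, not a mechanical one; the sketch only shows that the validity check itself takes a few lines.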

Susan Pintzuk's "Variationist approaches to syntactic change" is another example of statistical abuse. Her data consist of 14 paired counts of the position of the verb in main and in subordinate clauses, drawn from Old English texts dating from 884 to 1100AD. The two sets of points, which form two vaguely S-shaped curves, are then submitted to an ad hoc logarithmic transform to turn them into straight lines (nowhere shown in the paper).

This is standard procedure, its only purpose being to facilitate the computation of regression coefficients (but none of that is explained).

Then, because the slopes of the regression lines for main and subordinate clauses are almost identical (0.519 and 0.525), Pintzuk concludes that "the frequency of verb-second order is increasing at the same rate in main clauses as in subordinate clauses". This is nonsense, as much a parody of statistics as Ruhlen's publications are of linguistics. A cursory glance at the diagram suggests that the frequencies of verb-second constructions in main clauses and in subordinate clauses are quite independent. I went to the trouble of computing the correlation coefficient and I was not surprised at the result: r = 0.376. With only 12 degrees of freedom in the data, this does not differ significantly from zero at the 90 per cent confidence level, let alone at the minimum required in practice: 95 per cent.
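The significance claim can be checked by hand or with a few lines of Python. The sketch below assumes the usual two-tailed t-test of a correlation coefficient against zero, with n - 2 degrees of freedom for n paired observations; the critical values are copied from a standard Student's t table for 12 degrees of freedom, and the function name is my own.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a correlation r against zero, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


# Two-tailed critical values of Student's t for 12 degrees of freedom,
# taken from a standard table: 10 per cent and 5 per cent significance levels.
T_CRITICAL_DF12 = {0.10: 1.782, 0.05: 2.179}

# r = 0.376 from 14 paired counts, as stated in the text.
t, df = t_from_r(0.376, 14)
print(f"t = {t:.3f} with {df} degrees of freedom")

for alpha, critical in T_CRITICAL_DF12.items():
    verdict = "significant" if abs(t) > critical else "not significant"
    print(f"{(1 - alpha) * 100:.0f} per cent level: {verdict} (critical t = {critical})")
```

The t statistic comes out at about 1.41, short of both critical values, which agrees with the conclusion stated above.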

Such depths of ignorance are, alas, the norm with linguists. Exactly 30 years ago, before I knew much of statistics and computational methods, I came across an article by Wilhelm Milke that aroused my suspicions and prompted me to seek the advice of our consulting statistician at the Australian National University. As she read the article a smile formed on her face, and became wider until she burst into peals of uncontrollable laughter.

Once recovered, but still with tears in her eyes, she said: "Is that what you people believe?" Humbling words indeed. But yes, Yvonne, that is what most of us people believed then, what they still believe now, and what publications such as this only serve to enshrine.

So is there really nothing worth scavenging from this Handbook of Historical Linguistics? No. What is not waffle or nonsense has already been said elsewhere, and better.

One must question one's judgement when it is so extreme and, doubting mine, I asked a fellow comparativist who happens to have been awarded two entries in the bibliography, both with "historical linguistics" in their titles. He never made it beyond half-way through the introduction. Lost interest. Could not see where it was all leading. "An introduction should not be so long," he commented. And long it is, 131 pages followed by 50 pages of notes. In just a miserly 168 pages, Henry Hoenigswald presented an account of language change and linguistic reconstruction, clear, detailed and thorough, in 1960. A true handbook. It is old, yes, but no progress has been made since. Or rather, advances have been made, such as those shown in Kuhn's thesis, but they have remained unpublished or ignored. And you can find a used copy of Hoenigswald's Language Change and Linguistic Reconstruction for as little as a fiver. Highly recommended.

Jacques B. M. Guy is a computer scientist interested in natural language understanding. He holds a PhD in linguistics from the Australian National University.

