I speak and write in a language which calls itself English. It's a member of a group of languages with a common ancestor, roughly the language of the folk who lived in England in the aftermath of the Wars of the Roses (but the other British nations all have versions of the language which had diverged by this time). Within that group of languages, the word English is used to mean of or pertaining to [the people of] England except when it is used as the name of a language, when it is used to refer to the whole group of languages, notwithstanding its propagation all round the planet (and beyond). I propose the name Anglic for the group, so as to liberate the adjective English from this exception.

Throughout the anglic group of languages, where there is a group of anglic speakers with a consciously separate cultural identity, those anglophones are collectively described by an adjective whose meaning is of or pertaining to them. This adjective is then used as the name of their particular variant of anglic (except in the case of English, which I'm trying to render capable of conforming to this pattern). Thus the people of Canada speak Canajan and the New Zealanders speak Kiwi; and, in each case, are politely proud of their difference and respectful of those of their peers.

Inconveniently, the group is woefully short of an adjective for of or pertaining to [the people of] the United States of America; which says something, it's not immediately clear what, about the cultural identity of that nation. For the present, I'll use the emerging term USAish when I need such an adjective (although one former colleague suggested gringo as an alternative with an established user-base – perhaps the right remedy is to use this until the gringos get round to offering an acceptable replacement). The adjective American is widely used in this rôle, but collides horribly with the adjective for of or pertaining to [the people of] the pair of continents called America: a Mexican, Peruvian or Canadian is indisputably American (as is the afore-mentioned former colleague), yet is not USAish, as the U.S.A.'s immigration officers routinely insist.

The U.S.A. (despite the afore-mentioned issues with its cultural identity) is presently being the world's cultural imperialist (a rôle lately relinquished by Britain, substantially through recognition that no-one has the right to so mis-treat the world) so its member of the English-derived group of languages has wide currency: but lacks a name for itself other than the generic, English. This has led to the silly situation where the language I speak and write gets to be referred to as British English to distinguish it from American English when, given the above, they might properly be called English and USAish without ambiguity.

I contend that the addition of one word, anglic, would greatly reduce the amount of idiocy presently surrounding the word English. It's a natural word to introduce, given that anglophone is a word (in English, as well as in various other languages) for someone fluent in an English language; it also fits the form for Hispanic and Germanic (also applied, in anglic, to groups of languages), is short and rolls easily off an anglophone's tongue.

On the cladistics of the Anglic family

While the evolution and classification of languages is significantly messier than that of species (if nothing else, a language need not arise from only one parent, or even only two, as any half-way decent pidgin or creole will show you), it remains appropriate to employ the methodology of cladistics (grouping several species into a clade if they have a common ancestor – or, for those who have problems with the idea of species evolving, commonalities of morphology and genetic material such as would make shared ancestry seem credible if only you'd get over your problems) when devising nomenclature for languages. In these terms, the Anglic group clearly forms a clade. It is then worth considering how this clade may be sub-divided. The good folks at MIT have a project to discover your dialect of English, for those who speak it.

British English and Commonwealth English

One can make a fairly good argument for a clade, within the group, comprising the English of England and those of its erstwhile colonies outside Europe. It would be fair to charge that 'Strine owes so much to the English of Ireland as to undermine its inclusion in this clade; but then, the North American branch of this clade is similarly indebted to a host of other (mostly European) languages; but, to the extent to which cladistics is a reasonable model for language classification, this will do fairly well. I'll refer to this clade as Tudor-derived, since its members diverged from England's English after the reigns of the Tudors.

Crucially, the English of Scotland (sometimes called Lallands) diverged significantly from that of England before the separation of English and USAish: and it is significantly distinct from the Tudor-derived clade – as anyone reading Robbie Burns' poems in the original will realise. It may fairly be argued that Scotland's English is a richer and more expressive language: in any case, it clearly owes much more to Norwegian than does the Tudor-derived clade. The status of the English spoken in Ireland, Wales and Cornwall is a bit more complex: all were well-established dialects, diverged from England's version before the world had to endure English colonies, but their speakers (particularly on the mainland) were subjected to heavier pressure to conform to the language of their rulers (so that the analogy with cladistics becomes a bit furry). I am inclined to regard the assorted forms of Anglic spoken in the British nations with Celtic heritage as separate individuals, within the whole group but outside the Tudor-derived clade – I do not expect them to form a clade of their own (just as lizards can only be construed as a clade by counting mammals and birds as lizards).

The primary import of this is that, when it comes to classifying the Anglic languages, the only clade which includes the English and Scottish vernaculars is the whole group – these two were already evolving separately before Columbus sailed the ocean blue. This makes the term British English ridiculous. Those who use the term seem to actually mean England's English (and, at that, the posh forms of it), thereby revealing a failure to understand that Britain and England are by no means the same entity. Indeed, the ancient Celtic peoples ruled by the Romans were the aboriginal British who were displaced by the peoples who formed the English nation, so it seems inappropriate to use the term British unless at least they (and not necessarily the English) are to be understood as being referenced. To use it where they are specifically excluded is to fail to use the adjective English where it is clearly the correct word. Britain is a group of islands, whose peoples constitute several nations, within each of which are to be found significant dialect variations.

Within the Tudor-derived clade, there is some sense in separating the English spoken in the U.S.A. from that spoken in the commonwealth countries that fall within this clade. This might fairly be described as Commonwealth English and it includes the English spoken in India and much of Africa, among other places. However, the term Commonwealth English is presently in use to refer to the sub-clade, within this, comprising the parts of The Commonwealth where descendants of European colonists predominate.

Educated English

One can salvage both of the above usages by the somewhat undemocratic expedient of excluding the vernacular from discussion. Since the educated Irish, Welsh or Scots generally (albeit out of kindness to ignorant peers of other nationalities) defer to English usage, in public discourse, anglophones outside Britain can be forgiven for supposing a homogeneity within these islands. None the less, this would better be described as British Educated English.

Likewise one can salvage Commonwealth English: even in the countries of The Commonwealth where aboriginal peoples predominate (and, in this, we could include Scotland, Ireland and Wales), the educated classes employ a version of English which falls within a clade shared with the educated classes of England and the colonist-dominated commonwealth. This is, indeed, separable from the dialect of the educated classes of the U.S.A. – but, as for British English, this should really be described as Commonwealth Educated English.

The thing to note from both salvagings is that Educated English is what is actually being sub-divided. Indeed, Educated English is a clade within the Anglic family. Though it diverged only after (at least) Scottish – the educated classes were, that early, using Latin rather than their mother tongue – it is fair to say that the educated classes throughout the anglophone world have, since abandoning Latin, evolved a shared dialect of English which stands apart from the vernacular in any of their home countries. We may thus identify a sub-clade of the Tudor-derived clade, which may be described as Educated English. This is the language used (when any form of English is) by the community of letters and in most published writing; when they are not talking legalese, it is also the language of courts, laws and legislatures. It has become the international language of science (and, to varying degrees, other academic disciplines).

Within each Anglic-speaking community, Educated English lives alongside the language of the people – called the vernacular – and the educated folk who use the former also at least understand the latter and generally speak it, without necessarily thinking of it as a separate language. These various languages are what I mean (and am confident any serious student of languages would mean) by the languages of the assorted countries: the English of the U.S.A. is the USAish vernacular, notwithstanding the presence of an indigenous branch of the Educated English clade.

Naturally, usages from the vernacular forms of English sneak into what their speakers contribute to the evolution of Educated English. None the less, the strongly international character of this dialect has caused it to fragment relatively little (though it has evolved a lot – in particular, over the last two centuries, casting off the legacy of Latin, which shaped its early grammar). That, in turn, has caused it to diverge significantly from the vernacular in each community. Within Educated English, even USAish Educated English differs from the rest – Commonwealth Educated English – in little more than its spellings (and even these are slowly leaching into the rest of the group).

The important thing to understand about this is that the sub-divisions within Educated English do not parallel those within the English group as a whole. While the spelling of the written forms of English exhibits a split which does run across from the educated form to the vernacular – separating the U.S.A. from the rest – this is just a consequence of spelling falling primarily under the control of the educated, even when writing in the vernacular. While Educated English has become the international language of science and English has become the international language generally, the local variations in how English is spoken by the general population are separate from the variations in how it is spoken by the educated classes.

Language codes and the web

The web, being world-wide, includes infrastructure for naming languages: this is based on the naming scheme of a standard called ISO 639. The specification of HTTP elaborates on this by providing for language tags which are hyphen-joined sequences of tokens, such as fr (for French), en-US (for USAish), la-UK-legal (which I just made up, but it means the form of Latin used in the legal jargon of the UK's court system). The specification says that, if the first token (fr, en and la in the examples above) in the sequence is exactly two letters long it must be an ISO 639 code; also that, if there is a second (US and UK) and it is two letters long, then this must be an ISO 3166 country code (in which case it indicates that country's version of the language specified by the first token). The HTTP specification further countenances (though with a warning that this does not guarantee intelligibility) using one language as an acceptable surrogate for another if the former's tag is obtained from the latter's by hyphen-joining some more tokens on its right – thus, if a visitor to a web site has indicated they want a document in fr (French) and the site has the document in fr-CA (Québécois), the latter is suitable to meet the visitor's request.
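The surrogate rule just described can be sketched in a few lines of Python. This is only an illustration, not anything taken from the HTTP specification's text: the function names tokens and is_acceptable_surrogate are my own, and I am assuming that tags are compared without regard to case and token by token (so that en is not mistaken for a prefix of some longer first token).

```python
def tokens(tag: str) -> list[str]:
    """Split a language tag into lower-cased tokens (assumed case-insensitive)."""
    return tag.lower().split("-")


def is_acceptable_surrogate(requested: str, available: str) -> bool:
    """True if `available` is `requested` itself, or `requested` with
    further tokens hyphen-joined on its right."""
    want, have = tokens(requested), tokens(available)
    return have[:len(want)] == want


print(is_acceptable_surrogate("fr", "fr-CA"))   # True: fr-CA may serve a request for fr
print(is_acceptable_surrogate("fr-CA", "fr"))   # False: the rule only runs one way
print(is_acceptable_surrogate("en", "eng-US"))  # False: whole tokens must match
```

Note that the rule is deliberately one-way: a request for fr-CA is not satisfied by plain fr, which is what lets the tags behave like nested clades.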

This tag truncation rule effectively provides for encoding of the cladistics of languages, albeit without guaranteeing that the encoding will be reliable. To encode the cladistics we would need each clade to have a name and each of its sub-clades' names to be obtained from it by adding at least one more token on the right. This would mess up the nation-part naming implicit in the ISO 3166 reading of two-letter tokens in the second position: at least for the nations whose versions of English fit into some clade, the country code would have to come after the clade identifier, rather than as second token – e.g. en-colonial-AU rather than en-AU.

Furthermore, the more clades we identify the longer we'll have to make the tags; we have a Tudor-derived clade, so let's call that (somewhat inappropriately, but this is only for illustrative purposes) en-tudor; this contains en-tudor-US, en-tudor-educated and the clade of Commonwealth English, en-tudor-empire (say); within this last, in so far as the colonist-dominated form a clade, we get en-tudor-empire-colonial and only within that do we get en-tudor-empire-colonial-AU. In practice, no-one wants to be so complex about language codes, so we don't try to encode so much of the cladistics in them: 'Strine is actually identified by en-AU.
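To see how quickly such tags grow, here is a small Python sketch that builds the illustrative tag from its chain of clades and then recovers every enclosing clade by truncation. The helper names tag_for and enclosing_clades are hypothetical, and the clade tokens (tudor, empire, colonial) are only the made-up ones used above, not registered codes.

```python
def tag_for(path):
    """Join clade names, outermost first, into a hyphen-joined tag."""
    return "-".join(path)


def enclosing_clades(tag):
    """List the tag of every clade that strictly contains `tag`, outermost first."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(1, len(parts))]


strine = tag_for(["en", "tudor", "empire", "colonial", "AU"])
print(strine)
# en-tudor-empire-colonial-AU
print(enclosing_clades(strine))
# ['en', 'en-tudor', 'en-tudor-empire', 'en-tudor-empire-colonial']
```

Each level of the classification costs one more token, which is the practical reason for flattening the scheme down to en-AU.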

While it's eminently practical – and not necessarily in conflict with the cladistics – to eliminate a lot of the cladistic information, we are still left with en-educated as a pertinent clade: its Australian version, to take an example, is distinct from the local vernacular, so should be identified as en-educated-AU. Crucially, Educated English was imported to Australia alongside the vernacular, and has evolved primarily in harmony with the world-wide Educated English, so it shouldn't be regarded as a dialect of en-AU, i.e. it isn't en-AU-educated – that would denote a dialect of Australia's English vernacular ('Strine) identified with the educated classes (if, in fact, such a dialect exists): after a day writing a paper in en-educated-AU for some academic journal, the Professor goes home, throws a shrimp on the barbie, cracks open a tube of beer and uses en-AU-educated when chatting with the neighbours; it may sound a bit posh to them, but it's definitely 'Strine and any visiting Poms would be apt to find some of it rather confusing.
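The difference between the two orderings can be made concrete by asking which clade two tags share, reading each tag as a path from the whole group inwards. The following Python sketch is mine (the name common_clade is hypothetical) and assumes, as before, that tokens are compared whole and without regard to case.

```python
def common_clade(a, b):
    """Return the longest run of leading tokens shared by two tags,
    i.e. the smallest clade (in the tag encoding) containing both."""
    shared = []
    for x, y in zip(a.split("-"), b.split("-")):
        if x.lower() != y.lower():
            break
        shared.append(x)
    return "-".join(shared)


print(common_clade("en-educated-AU", "en-AU-educated"))  # en
print(common_clade("en-educated-AU", "en-educated-GB"))  # en-educated
print(common_clade("en-AU-educated", "en-AU"))           # en-AU
```

On this reading the Professor's two languages have nothing closer in common than en itself, while the journal paper sits in the same clade as its counterparts elsewhere in the educated world.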

Now, it happens that there are no ISO 3166 codes for the several nations of The United Kingdom (although other parts of nations, and subject territories, do have them; for example the Falkland Islands, the Faroe Islands, Svalbard and Jan Mayen, Gibraltar, Greenland, Hong Kong and the United States Minor Outlying Islands: but not nations, like Tibet, that have been robbed of their independence). Thus there is no standard language tag (made of two two-letter tokens) for the forms of English spoken in these nations, but the tag en-GB does fit this pattern. That doesn't mean it's actually meaningful (any more than cy-JP, which would be a dialect of Welsh peculiar to Japan), for all that the HTTP specification uses it as an example and describes it as British English. Note, incidentally, that en-IE is Irish English and Ireland is part of Britain (though not of Great Britain, which technically means the biggest of the British Isles, though it's commonly used as a synonym for The United Kingdom of Great Britain and Northern Ireland, which includes several of the smaller islands in the group). Thus calling en-GB British English is a further infelicity, since it omits en-IE, which is indisputably British (but not UKish).

Given the above discussion of cladistics, en-educated-GB would make sense – and is what is generally meant by en-GB – but as to vernacular English we would need en-England, en-Scotland and so on, just as we have en-US and en-IE. Now, of course, as a written medium, the web does generally use the educated form of any language, so an implicit -educated in en-GB is fairly harmless for writing: but the web is increasingly a multi-media domain, including the use of voice. An internet radio station would have good cause to advertise the language of its content, which may well be the vernacular: the lack of a standard form with which to identify that would cause complications. The late lamented John Ravenscroft was entirely fluent in en-educated-GB but is best known for radio broadcasts (under the pseudonym John Peel) in en-scouse.

