Primitive Parsing

Generally, I take it as given that the reader reads text according to the usual manner for text in the natural language I write (here, English). I augment the natural language by specifying meanings for various forms of text that would not otherwise be meaningful in it and I refine it by giving specialised meaning to some of its words and forms of expression.

See also my discussion of context – and formalised natural language, of which this page is (at least partially) a reworking.

Words and symbols

When I introduce a specialised meaning for a word, that meaning supersedes any meaning that word normally has. I may specify, when defining it to take the specialised meaning, that it only does so for the purposes of the present formula, paragraph, section or page; or that it only does so in particular contexts (e.g. when used in a particular phrasing or applied to an entity of some particular type); otherwise, it tacitly takes that meaning throughout my writings. In so far as I am disciplined, other pages relying on such a specialised meaning shall link to the definition, or to a glossary or bestiary entry that links to the definition, at least when first they do so. When a definition binds such a specialised meaning to a word, to the extent that natural language derives other forms of the word for use in diverse contexts of the language's grammar, I tacitly extend the specialised meaning also to these derived forms.

For example, when defining a relation is less than on some kind of number, I tacitly use the same relation to give meaning to the least of several such numbers. In primitive discussions I may take the trouble to introduce is greater than separately (typically to something that shall prove to be the reverse of is less than); but, otherwise, I may tacitly use the reverse in this rôle. In a context with both defined, I'll use standard idioms such as at least, at most, more than, maximum, minimum; along with the parallel use of higher for greater and lower for less. Thus an upper bound on some numbers is a number which is greater than or equal to each of the given numbers; and a least upper bound is an upper bound which is less than or equal to every other upper bound. In this particular case, I should also note that I don't use bigger and smaller as simple synonyms for greater and less; instead, these compare the magnitudes or absolute values of things – all negative values are less than all positive ones, while a low positive value is small, as is its negation, and a high positive value is large, as is its negation; thus a big negative value is less than a small positive one, yet bigger than it.
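The distinction between less and smaller can be sketched for ordinary numbers as follows. This is only an illustration, assuming Python's built-in ordering and absolute value stand in for the relations discussed; the function names are hypothetical.

```python
def is_less(a, b):
    """Order comparison: every negative value is less than every positive one."""
    return a < b


def is_bigger(a, b):
    """Magnitude comparison: compares absolute values, ignoring sign."""
    return abs(a) > abs(b)


big_negative = -1000
small_positive = 1

# A big negative value is less than a small positive one, yet bigger than it.
print(is_less(big_negative, small_positive))    # True
print(is_bigger(big_negative, small_positive))  # True
```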

As well as refining the meanings of existing words, I give meaning to texts that would not otherwise be meaningful, at least in the vernacular – although English (like many other languages) has now been in use for several centuries by cultures in which at least a fairly basic level of mathematical education has been ubiquitous, with the result that there is at best a fuzzy boundary between refining existing meaning and giving meaning to texts that would otherwise have no meaning. Such texts use symbols and punctuators in specialised ways and commonly associate values, for the duration of the text, with words or texts having form loosely similar to that of words.

For the purposes of giving specialised meaning to one, a word is here just a text, each character of which is a word character, not immediately preceded or followed by word characters. The word characters are, for these purposes, the letters of the alphabet, the decimal digits and the hyphen character, -, as used in such compound words as mid-point. (Note that the hyphen is a quite different character from the em-dash, the en-dash or the minus sign.) To illustrate all of the word-characters valid in English text, the text between quotes in "abcdefghijklmnopqrstuvwxyz-0123456789" uses every word character. (Translations to another language might sensibly, however, use quite other characters as well as these or instead of them.) In particular, the space character which appears between words is not a word character, nor are the usual punctuation characters used normally in non-specialised English. Thus a sequence of letters between spaces is a word, for my purposes, even if there is no dictionary of English that recognises it as such and no opponent in a game of Scrabble® who would allow it into play.
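As a rough sketch of this definition, a word can be picked out as a maximal run of word characters. The code below assumes that upper-case letters count as letters of the alphabet (the text only lists lower-case ones); the names WORD and words are hypothetical.

```python
import re

# A word is a maximal run of word characters: letters, decimal digits and
# the hyphen. Upper-case letters are included here as an assumption.
WORD = re.compile(r"[A-Za-z0-9-]+")


def words(text):
    """Return the words of text, in order of appearance."""
    return WORD.findall(text)


print(words("the mid-point of xyzzy, 42"))
# ['the', 'mid-point', 'of', 'xyzzy', '42']

# The en-dash is not the hyphen, so it separates words rather than joining them.
print(words("left\u2013right"))
# ['left', 'right']
```

Note that xyzzy counts as a word here although no dictionary need recognise it.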

In contrast to words, a symbol is a single rendered (i.e. in visual media it is made visible, in audio it is voiced; it is not simply the gap between other things) character, other than the word characters, to which context has given meaning as such. Some of these are letters from other alphabets; while a Greek translation of my pages might use αλφα as a word, in English I shall not, but shall make use of individual letters of the (ancient) Greek alphabet as symbols. Other symbols are specialised symbols provided in Unicode (and related standards) to match common mathematical usage; for example, × may look somewhat like x or &on; may look somewhat like o but, in each case, the first is a symbol that looks somewhat like an alphabetic letter.

In the raw text of the HTML or XHTML I write, many symbols are actually represented as sequences of word characters between an ampersand (&) and a semicolon (;); such a sequence is known as a character entity and should normally be rendered as a single character; thus I type &alpha; to include the symbol α in the rendered text. It is also possible to use numeric character entities, in which the word between & and ; is replaced by a # sign followed by the Unicode code point number for the symbol; thus I can type &#945; to get α rendered (hopefully the same as α in the previous). I am not, however, particularly good at memorising haphazard mappings from numbers to symbols, where I can typically remember a symbol's name, if I am familiar with its orthodox use.
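The equivalence of a named and a numeric character entity can be checked with Python's standard html module. This is a small sketch of what a renderer does with each form, not a description of any particular browser.

```python
import html

named = html.unescape("&alpha;")    # named character entity
numeric = html.unescape("&#945;")   # numeric entity: '#' then the code point

print(named, numeric)       # both render as the Greek letter alpha
print(named == numeric)     # True
print(ord(named))           # 945, the Unicode code point of the symbol
```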

One of the reasons I use XHTML for many of my pages is that it supports the inclusion of a preamble that lets me associate names of my choosing with character entities. This lets me have an easily-read (in the raw text) and easily-remembered text to represent characters I cannot type directly. I shall not remember that &#8746; is the Unicode symbol for union, nor would a reader viewing that token as such (if their user agent fails to identify (or find) ∪ as its proper display-form) have any clue that it means unite. I could use the HTML entity name &cup; for it, but this name is a metaphor for its shape, where I prefer to name the operator it denotes. For ℏ, I am not aware of an entity name; in HTML &#8463; is the only way I know to write it; and I doubt I shall ever remember that 8463 is the right number. So I type &unite;, or a similar name of my own for ℏ, and rely on my XHTML header to map it to the right entity for rendering purposes. I have, thus far, only tested that Opera (version 8, and later) copes with the result (and its bold &on;, where used, isn't obviously bold): it is possible other browsers will have problems with this. Please let me know if so … and be aware that even browsers that can cope may fail if needed fonts are not installed.
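The mechanism can be sketched with an XML parser: an entity declared in the document's internal preamble is expanded wherever its name is used. The entity name unite is the one mentioned above; the element p and the sample text are hypothetical, and the sketch assumes the name maps to code point 8746 (the union symbol).

```python
import xml.etree.ElementTree as ET

# A tiny document whose preamble (the part between '[' and ']>') declares a
# custom entity name. Only the name 'unite' comes from the text above; the
# rest is made up for illustration.
doc = (
    '<!DOCTYPE p [<!ENTITY unite "&#8746;">]>'
    "<p>A &unite; B</p>"
)

rendered = ET.fromstring(doc).text
print(rendered)                # A, the union symbol, then B
print(ord(rendered[2]))        # 8746
```

A parser that ignores the preamble would instead fail on, or pass through, the unknown name, which is the kind of browser problem described above.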

One of the common ways I give a word meaning is as a name by which to refer to some value; I use some symbols the same way, most obviously the letters from other alphabets. In particular, I follow orthodoxy in having π stand for the ratio of circumference to diameter of a circle in a Euclidean plane. This last is an example of a meaning used everywhere (I doubt I ever use π to mean anything else), but other namings of values are commonly quite localised, typically to one instantiation of a template; see below.

Where symbols are used other than as names for values or relations (which some contexts might not regard as values), they are commonly used as binary operators or as the literal elements of templates. In any case, it is context's responsibility to establish their meanings; so responsible authors should link to something that explains those meanings when first they make use of a symbol.

Enclosing sub-texts

When reading a text, the reader is tacitly expected to decompose it into a hierarchy of sub-texts – paragraphs made up of sentences made up of words; derivations made up of equality assertions made up of expressions (to assert equal); proofs made up of assertions and inferences. The templates I define here serve to augment that hierarchy; they specify how, on recognising some parts of the text as matching the components of a template, arranged in the manner prescribed by the template, to read these components collectively as a use of the template. In making sense of a sub-text, the surrounding text contributes to its context; for example it may give meaning to some words and symbols, e.g. as names and operators. The reader conceptually replaces each sub-text with an atom of meaning that encapsulates the meaning of the sub-text; the enclosing text's meaning is arrived at by using this atom in place of the sub-text.

However, that can get complicated. We might have a sequence of ten tokens (words, symbols, etc.) in a text and find that the first seven of them match some template which gives them meaning, collectively, as a value; and that this, taken with the remaining three, matches another template that gives the whole meaning. So far, so good: but what if we also find that the last six tokens match some template, that gives them meaning as a value and that this, taken with the preceding four, matches a further template that gives the whole meaning? Of course, if the two meanings of the whole coincide, all is well; but we have no guarantee that this shall always be the case. Such a text would be ambiguous: while I'm keen to be able to support ambiguity where it's intended, it's important to also enable authors to eliminate it where it isn't.
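A much smaller case than ten tokens shows the same problem. The sketch below uses five tokens and ordinary subtraction as a stand-in for a template; this choice is an assumption made for illustration, not one of the templates defined on this page.

```python
tokens = [8, "-", 3, "-", 2]


def minus(a, b):
    """Meaning given to a 'value - value' sub-text."""
    return a - b


# Reading 1: the first three tokens form a value, combined with the rest.
first_three_grouped = minus(minus(tokens[0], tokens[2]), tokens[4])

# Reading 2: the last three tokens form a value, combined with what precedes.
last_three_grouped = minus(tokens[0], minus(tokens[2], tokens[4]))

print(first_three_grouped)  # 3
print(last_three_grouped)   # 7
```

The two readings disagree, so the text is ambiguous unless something constrains how it decomposes into sub-texts.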

To this end, I define various enclosures of text (matching entirely orthodox rules of parsing); these are bounded by matched start and end delimiters and constrain the decomposition of text into sub-texts. The basic rule is that every sub-text that contains the start of an enclosure must contain the matching end enclosure, and vice versa. For that to be well-defined, I must specify what matching means, in this context. I'll describe a text as balanced precisely if: either it includes no use of { ( ; { ; [ ; ] ; } ; ) }; or it is of one of the following forms:

( text )
{ text }
[ text ]
[ text …]

with each text being balanced. I refer to the { ( ; { ; [ } in which a text of one of the first three types begins as opening an enclosure and the { ] ; } ; ) } in which it ends as closing that enclosure; I describe each as matching the other. Notice that text may include some opening and closing parentheses, braces or brackets, just so long as they appear in appropriately matched pairs; such enclosures are said to be nested within the primary enclosure of a particular match to one of these templates. Even when they use a character, at start or end, identical to the opening or closing of the template, the template's uses of the opening and closing match only each other, not any of those of nested enclosures.
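A check for balance can be sketched with a stack, following the orthodox reading of these rules. The sketch assumes that text outside and between enclosures is allowed so long as every opening is closed by its own kind in the right order; it treats … as an ordinary character, and the function name is hypothetical.

```python
# Each opening delimiter and the closing delimiter that matches it.
PAIRS = {"(": ")", "{": "}", "[": "]"}
CLOSERS = set(PAIRS.values())


def is_balanced(text):
    """True if every opening is matched by the right closing, properly nested."""
    expected = []  # closings still owed, innermost last
    for ch in text:
        if ch in PAIRS:
            expected.append(PAIRS[ch])
        elif ch in CLOSERS:
            # A closing must match the most recently opened enclosure.
            if not expected or expected.pop() != ch:
                return False
    return not expected  # nothing may be left open


print(is_balanced("f(a, [b, {c}])"))  # True
print(is_balanced("no enclosures"))   # True
print(is_balanced("(a]"))             # False
print(is_balanced("(a"))              # False
```

Because the stack always pairs a closing with the innermost open enclosure, a nested enclosure that reuses the outer delimiter character matches its own partner and not the outer one.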

For the purposes of my meta-denotations for templates, each literal text is treated as if it were balanced (so that { … } or [ … ] can enclose it and be balanced) even if (as actually exercised in the specification above) the literal in question is in fact one of the openings or closings. Each template as a whole shall use literal openings and closings in such ways that any text matching the template (as a whole) is indeed balanced. Every text denoted by a word in any template is constrained to be balanced.

I also use quotes and other standard features of the grammar of natural language (such as punctuation and phrasing) to delimit sub-texts; these operate just the same as a template's delimiting of its parts with literals, splitting the text into sub-texts. It is only necessary to use a formal enclosure, as just described, where such natural grammar would leave ambiguity – although it may, sometimes, be clearer to include some enclosures even when not needed.

Quantifying over names

Each use of a name in a formulaic text requires its context to give that name meaning, typically as a denotation for a value; by default, each denotational template propagates this requirement, from the sub-texts it combines, to whatever text is using the compound as a sub-text matched by the template. However, a denotational template may specify that each component sub-text it describes may quantify over or quantifies over names used in some of its components, in so far as context has not given them meaning.

For each use of each name, we can identify the smallest text around that use, in the hierarchy of sub-texts discussed above, for which that use appears within one of the may quantify components of the denotational template describing the given text (i.e. the first step back up the parse-tree at which the name can be quantified); call this sub-text a value-provider for that use of that name; if this is a sub-text of the value-provider for some other use of that name, call it a forwarding value-provider of that name; each use of each name then appears in a unique non-forwarding value-provider for that name, which quantifies over that name. Only {} denotations for collections, denotations for relations and [for] {a;an;any;some} … texts do quantify over any names.

All quantification over names is, in any reading of the text, subject to assertions imposed by sub-texts exercising that name in the given reading. Note, however (e.g. in an expression sequence, below) that some constraints may be ignored. A text which quantifies over any names has meaning for each choice of meaning, for each such name, under which the text satisfies all constraints imposed by context and its sub-texts. The particular template which allows or imposes the quantification indicates how it combines the diverse meanings this gives it, typically collecting them together in some way.

Written by Eddy.