I have been an emacs user for nearly a quarter of a century now (as of 2015): my spinal column knows how to drive it without help from my brain. There's support in emacs for all manner of forms of data and ways of displaying it, and for fetching data from (and storing it to) a marvellous assortment of resources via various protocols – for example, the ange-ftp mode of emacs let me treat the entire internet (such as it then was) as my file-system way back in the early '90s. I can say much more in favour of emacs – yet it's not how I'd do the job if I started from scratch today (this was true in 2003, when I first wrote this, and remains true in 2015).
I could also say plenty against emacs, most of it rants about the defects of elisp as a configuration and extension language. (For example, if you're an emacs user, load up SGML-mode and open a file with its HTML-mode: read the documentation for sgml-tag-alist and skeletons (if you can find the latter) and see if you can make any sense at all of what they're saying; try to match up that documentation with the value of html-tag-alist; choose an enhancement to the way the mode works – e.g. add lang="en" to the HTML element when it is inserted, or put a newline after each LI inserted by an OL or UL; I actually managed the latter, but the former defeated me – and see if you can work out how to implement it.) But this is neither the time nor the place for that.
This has led me to think about what an editor should be, how it should be cut up into parts, what those parts are and how they should work together. The parts should be so specified as to allow modular replacement – i.e. different suppliers of software competing in each of the niches implied, via an open standard that serves the consumer. The result, inevitably, intrudes a long way into what the operating system should be, to support the editor; indeed, I leave no crisp boundary between which parts are the editor and which parts are the operating system. Some components integral to the editor will obviously be integral to other tools, so we might call them parts of the operating system – but then the editor is a pretty crucial part of the operating system, and some components of the editor may later emerge as crucial components of non-editor tools as yet un-dreamed of.
While I'll generally presume that the operating system is something essentially kindred to GNU/Linux in its form and foundation, it is worth noting that, from the point of view of the generality of users, the essential parts of an operating system are (enough infrastructure to let them log in and have this be a secure process if they so desire, and) graphical tools to edit documents, to read and send mail, and to browse the web.
Specialist users will, naturally, have other needs: but these are the basics (as identified in an interesting article I read early in 2003) everyone needs. The given functionalities may be supplied by one program; or by several that cut across the classification above (e.g. my mailer is part of my editor; many folk use mailers embedded in their web browsers); but the user must have easy access to these three kinds of functionality.
Users should be able to specify private mime.types files, in a potentially cascading chain (embracing one shared with peers and a system one, for example); likewise, it would be nice to have a generic mechanism for specifying key-bindings so that the user can use any participating application without having to find out what its particular keyboard short-cuts are, and without having to configure each application to understand the given user's key-bindings.
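As a minimal sketch of how such a cascade might be consulted, here is some Python using the standard mimetypes module; the file paths are assumptions for illustration, not a specification of where such files should live.

    import mimetypes
    import os

    # Hypothetical cascade: the user's private file, then one shared with
    # peers, then the system-wide file; earlier entries win on conflict.
    CASCADE = [
        os.path.expanduser("~/.mime.types"),   # assumed per-user file
        "/usr/local/share/mime.types",         # assumed shared/site file
        "/etc/mime.types",                     # typical system file
    ]

    def build_mime_map(cascade=CASCADE):
        """Return a MimeTypes object seeded from whichever files exist.

        Files later in the cascade are read first, so that earlier
        (more personal) files override their entries.
        """
        mt = mimetypes.MimeTypes()
        for path in reversed(cascade):
            if os.path.exists(path):
                mt.read(path)
        return mt

    if __name__ == "__main__":
        mt = build_mime_map()
        print(mt.guess_type("notes.html"))   # e.g. ('text/html', None)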
Fundamentally, following the Unix tradition, a file is a sequence of octets (a.k.a. bytes); it may be stored on disk, it may be a stream coming from a physical device (with or without storage as its source there), it may be a stream going to a physical device, it may be a sequence of states of an oscillator, it may be many things, but it is a sequence of octets. The means by which one accesses a file (or byte-stream, or whatever you want to call it) include the means to identify how those bytes are to be understood. There may be several layers to that understanding: each layer builds upon the one below, but the octet-stream is the foundation on which all else is built.
The first layer of understanding transforms the byte-stream into a sequence of tokens understood as characters. In ASCII, each byte is itself a character (with a redundant bit, which may be used as a parity bit to enable simple detection of mis-transmission); in ISO 8859's various encodings, each byte encodes a member of a collection of characters, with each such collection typically sufficing to describe the writing system of some particular culture; in Unicode, more sophisticated encoding is used to facilitate expression of every character known to any language whose users have yet taken it into their heads to ask to be included in the catalogue. For an octet stream to be useful, it needs to come with some (possibly implicit) information about how it is to be transformed into characters – this information is known as its encoding.
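To illustrate this first layer, here is a small Python sketch that turns an octet stream into characters given a declared encoding; the fallback list is a stand-in of my own, not a real charset-detection scheme.

    def bytes_to_text(octets, declared=None):
        """Turn an octet stream into a character sequence.

        The declared encoding (from an HTTP header, an XML declaration,
        a byte-order mark, ...) is tried first; the fallbacks here are
        illustrative guesses, not a real charset-detection algorithm.
        """
        for enc in (declared, "utf-8", "iso-8859-1"):
            if not enc:
                continue
            try:
                return octets.decode(enc)
            except UnicodeDecodeError:
                continue
        # Unreachable in practice: iso-8859-1 maps every possible byte.
        raise ValueError("no usable encoding found")

    # The same octets mean different characters under different encodings:
    raw = b"caf\xe9"
    print(bytes_to_text(raw))                  # 'café' via the iso-8859-1 fallback
    print(bytes_to_text(raw, "iso-8859-1"))    # 'café' directly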
This should not be confused with encryption or compression, which may be used to package the octet-stream in some form from which the octet stream may be recovered. For my purposes, an encrypted or compressed octet stream is a separate octet-stream whose encoding, grammar and semantics are prerequisites of unpacking the encryption or compression to obtain the packaged octet-stream. Since such packaging need not be applied to an octet-stream – it can as readily be applied to a sequence of characters, lexical tokens, grammatical productions or, at least in principle, semantic atoms – it is best understood as a semantic layer whose comprehension yields the content it packages (which must then be parsed and, in its turn, comprehended).
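A tiny Python illustration of the point: the compressed data is itself a separate octet stream, and only once the gzip layer has been comprehended does the packaged stream get decoded (and parsed, and made sense of) in its own right.

    import gzip

    inner_text = "the packaged document, a café menu say"
    inner_octets = inner_text.encode("utf-8")     # characters encoded as octets
    packaged = gzip.compress(inner_octets)        # a separate octet stream

    # Comprehending the gzip layer yields the packaged octet stream,
    # which must then be decoded and understood in its own right.
    recovered = gzip.decompress(packaged).decode("utf-8")
    assert recovered == inner_text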
The resulting sequence of characters is then classically understood as a text in some language conforming to a grammar characterized by various patterns. A sequence of characters matching a suitable pattern gets construed into a single entity, a fragment of text characterized by the pattern. A sequence of fragments of text matching a pattern (now attending to the type of pattern characterizing each of the text fragments) gets construed as a single entity likewise; and these likewise serve as constituents in larger text fragments and so on, up to the entire document. [Contiguous sequences of letters in the text you are now reading are construed as words; punctuators group these words into clauses and the clauses into sentences; HTML mark-up groups the sentences into paragraphs; and so on.] This pattern-matching process is known as parsing and provides a decomposition of the document into a hierarchy of sub-texts, associated with one another by the patterns by which they are combined and individually characterized by the pattern their constituents matched to form them.
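For instance, Python's standard HTML parser exposes exactly such a hierarchy as it matches patterns; this little sketch merely prints the nesting it discovers.

    from html.parser import HTMLParser

    class OutlineParser(HTMLParser):
        """Print the hierarchy of elements as they are parsed."""
        def __init__(self):
            super().__init__()
            self.depth = 0
        def handle_starttag(self, tag, attrs):
            print("  " * self.depth + tag)
            self.depth += 1
        def handle_endtag(self, tag):
            self.depth -= 1

    OutlineParser().feed("<ol><li>one</li><li>two <em>words</em></li></ol>")
    # ol
    #   li
    #   li
    #     em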
The classic formalism of parsing goes via a layer called lexical analysis, which decomposes the sequence of characters into lexical tokens or lexemes (roughly filling, in formal grammars, the rôle of words in the grammars of European and similar languages). The resulting sequence of lexemes is then analyzed to establish its grammatical structure. While this division is often convenient for those implementing parsers, it is so mainly because the designers of the languages to be so parsed have deliberately made them amenable to some such low-level simplification; and the two phases of parsing are commonly designed and specified as a single whole. I shall, thus, treat lexical structure as part of grammar, and lexical analysis as part of parsing.
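By way of illustration, here is a toy tokenizer in Python in which the lexical patterns are simply the bottom layer of one grammar specification rather than a separate formalism; the token names and the toy expression are my own.

    import re

    # The lexical level of a toy grammar, expressed as named patterns.
    TOKEN = re.compile(r"""
          (?P<NUMBER> \d+ )
        | (?P<NAME>   [A-Za-z_]\w* )
        | (?P<OP>     [+\-*/=()] )
        | (?P<SKIP>   \s+ )
    """, re.VERBOSE)

    def lex(text):
        for m in TOKEN.finditer(text):
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    print(list(lex("x = 40 + 2")))
    # [('NAME', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]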
Thus a file, interpreted via an encoding as a character sequence, needs an associated grammar to characterize the structure of that sequence. The grammar is standardly communicated via a MIME type; in the case of some MIME types, the grammar is enriched by a further specification of patterns to be matched, refining the hierarchy implied by the MIME type (thus an XML document also has a schema or a DTD to specify more detail about its grammar: matching the patterns of the plain XML grammar is described as well-formedness, but matching the DTD is a stronger condition, known as validity, and the DTD thereby conveys richer information). For many MIME types (specifically, those which do imply a hierarchical grammar) it should be possible to specify a reversible transformation between any valid document of the given type and an XML document conforming to some DTD associated with the type; consequently, I generally treat all grammars as reducible to XML and a suitable DTD.
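In XML terms, the weaker condition is easy to check with Python's standard library, as sketched below; checking the stronger condition, validity against a DTD or schema, needs something outside the standard library (lxml's DTD support, for example).

    import xml.etree.ElementTree as ET

    def well_formed(text):
        """Check the weaker condition: the text matches the plain XML grammar."""
        try:
            ET.fromstring(text)
            return True
        except ET.ParseError:
            return False

    print(well_formed("<p>fine</p>"))       # True
    print(well_formed("<p>broken</q>"))     # False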
I'm here mainly concerned with text documents: for an audio or video stream, there are analogous stages of processing on the way from an octet stream to some representation amenable to play-back or display. That may involve an intermediate format, such as a family of MIDI streams for audio, that might be more amenable to editing. I'm not primarily thinking about such data types, though, so I shall leave the audio-visual aficionados to think about how much, if any, of my analysis is applicable to their work. Still, these days, even text documents tend to come with embedded images or audio-visual material, so I cannot ignore this entirely.
Finally, though usually integrated with the process of parsing, there may be some layer of making sense of the structured text obtained by parsing the sequence of characters – this will generally be some form of processing or analysis of the data taking into account its meaning. (Note that this is where there is information – separated from the data by a MIME type, an encoding and whatever is making sense of the result of parsing.) As noted above, the sense made of the result of parsing may, itself, merely be to interpret it as the result of compressing or encrypting a further sequence of octets: in this case, the system making sense of the parsed data may simply transform it into a fresh byte-stream (or character stream, or even a parse tree). In any case, the thing which makes sense of the result of parsing may reasonably be presumed to always be a program (though a family of inter-operating programs may all interpret the parsed data the same way), interpreting the document in a manner typically specified with the MIME type, DTD or similar document; this is known as the semantics of the document.
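One way to picture this, as a rough Python sketch (the registry and handler names are mine, not any particular system's): the sense made of a document is delegated to a handler selected by its MIME type, and for a packaging type that handler simply yields a fresh octet stream to be processed in its turn.

    import gzip, json

    HANDLERS = {}    # hypothetical registry: MIME type -> handler

    def handles(mime_type):
        def register(fn):
            HANDLERS[mime_type] = fn
            return fn
        return register

    @handles("application/gzip")
    def unpack_gzip(octets):
        # The sense made of a packaging layer is simply a fresh octet stream.
        return gzip.decompress(octets)

    @handles("application/json")
    def load_json(octets):
        return json.loads(octets.decode("utf-8"))

    def make_sense(mime_type, octets):
        return HANDLERS[mime_type](octets)

    inner = make_sense("application/gzip", gzip.compress(b'{"answer": 42}'))
    print(make_sense("application/json", inner))   # {'answer': 42}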
The file-system should provide a service that amounts to the internet – a virtual disk which is managed by a daemon which talks the protocols (as client), handles caching, transforms file opening into suitable-protocol requests, etc.; this should not be a piece of functionality of any particular application (editor, web browser, helper application launched by either) but a service shared by them all.
Its handling of caching issues requires that applications have some simple way of telling it when they take an interest in a file and when they lose that interest (rather than the web browser launching a helper application, then either keeping the cached document for ever, or flushing it from cache before the user has finished viewing it).
The daemon will need to understand, if using a local proxy, which resources it can rely on the proxy to provide (e.g. when several colleagues read some cartoon daily, some shared machine should cache it in a proxy, rather than each of them caching the image locally; but when I book an airline ticket, my local daemon should record the response I got from the server and not request it again unless I actually click on the relevant button of the page that accessed it). Note that this daemon is not a proxy in the terms of the W3C specs: it is (in those terms) part of the user agent. As such, it needs to be able to distinguish between a request to re-fetch and an application merely wanting to re-display what was last fetched; yet, it probably should also know about caching aspects of the protocols, so that it will transparently fetch any resource for which that is appropriate, even without being asked to re-fetch. It also needs to be capable of recognising that POST, PUT and DELETE actions on a resource may invalidate any cached value for it; in particular, user applications should let it know when they take such actions.
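The shape of such a service might look something like the following Python sketch; the class and method names are hypothetical, intended only to show the interest/release interface and the invalidation of cached values by unsafe actions.

    class FetchDaemon:
        """Sketch of the shared fetch-and-cache service (names hypothetical)."""
        UNSAFE = {"POST", "PUT", "DELETE"}

        def __init__(self, fetch):
            self.fetch = fetch        # callable(url) -> bytes, e.g. over HTTP
            self.cache = {}           # url -> last response body
            self.interest = {}        # url -> count of interested applications

        def open(self, url):
            """An application takes an interest; fetch unless already cached."""
            self.interest[url] = self.interest.get(url, 0) + 1
            if url not in self.cache:
                self.cache[url] = self.fetch(url)
            return self.cache[url]

        def redisplay(self, url):
            """Re-display what was last fetched, without touching the network."""
            return self.cache[url]

        def close(self, url):
            """An application loses interest; drop the cache once nobody cares."""
            self.interest[url] -= 1
            if self.interest[url] == 0:
                del self.interest[url]
                self.cache.pop(url, None)

        def act(self, method, url):
            """Perform a user action; unsafe methods invalidate the cached value."""
            if method in self.UNSAFE:
                self.cache.pop(url, None)
            # ... actually issue the request via the relevant protocol ...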
The document exists, within the editor, in two forms: the buffer contains the byte-stream (and may well carry much of a parse tree with it); this may be displayed via several portals (each of which may, in turn, show up in several windows, to which I'll return). Each portal has a way of understanding the buffer – for example, if the buffer's contents are HTML encoded in Unicode, one portal can display the raw bytes, one can display the text they encode (but still show the HTML tags and so on) and one can display the web page expressed by the HTML document. There may also be developer-mode portals, such as a view of an HTML document in which selecting a particular rule in an associated style sheet highlights the elements affected by that rule, or a view of source code in which activity in a debugger session (in its own portal) will highlight the currently active code and permit interaction with variables in the code to display their current values.
Each portal has its own history of significant positions recently visited (in emacs parlance: point-and-mark chain); and each may be read-only or mutable, independently of other portals on the same document (though having only one mutable portal may be prudent). All portals on a document share one copy of the document, however; one operating-system resource (file, whatever) might be opened as several documents, though this will tend to create complications and won't be the normal policy. Auto-save policy and strategy, if any, will be a property of the document (not the portal, nor the OS resource).
One portal may be displayed in several windows at a single time – e.g. so that one can scroll to different positions in the portal to view them side-by-side. Indentation and margins are portal properties, although the saved document may have its own properties describing this at each of the various levels of processing from the byte-stream upwards. Paragraph re-flow policy will be a property of the portal; for read-only portals this is reflow-to-display, but a mutable portal may have a (potentially separate) policy for where the saved form of the document shall contain line-breaks and how it is indented. Long-line truncation-or-reflow policy will be a property of the window, but the portal may provide default settings for this on window-creation. For an HTML document, in a text-level portal showing the HTML tags, the indentation and line-break policies are matters of actual line-breaks and spacing at the start of each line. For an HTML-view of the same document, they'll be CSS properties, with all the complications that implies for how they are expressed via CSS (on a local element, or on all elements matching a given selector; in the HTML file, or in which of the cascade of imported CSS resources).
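To make the division of labour concrete, here is a rough Python sketch of the document/portal/window split described above; all the class and attribute names are illustrative guesses, not a design.

    class Document:
        """One shared copy of the document: the byte-stream, maybe a parse tree."""
        def __init__(self, octets, autosave_policy=None):
            self.octets = octets
            self.autosave_policy = autosave_policy   # a document property
            self.portals = []

    class Portal:
        """One way of understanding a document: raw bytes, tagged text, rendered page."""
        def __init__(self, document, mode, mutable=False):
            self.document = document
            self.mode = mode                  # e.g. "raw", "source", "rendered"
            self.mutable = mutable            # independent of other portals
            self.positions = []               # history of significant positions
            self.indentation = 2              # a portal property
            self.reflow = "keep-line-breaks" if mutable else "to-display"
            document.portals.append(self)

    class Window:
        """One on-screen view of a portal; several windows may show one portal."""
        def __init__(self, portal, truncate_long_lines=False):
            self.portal = portal
            # a window property, though the portal could supply the default
            self.truncate_long_lines = truncate_long_lines

    doc = Document(b"<html>...</html>")
    source = Portal(doc, "source", mutable=True)
    left, right = Window(source), Window(source)   # side-by-side views of one portal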
Most of the important sophistication of the editor will focus on portals. As emacs has editing modes, so edyt must have classes of portal. I believe the kindest way to let users re-configure portals to match user preferences is to use python (or something very similar) as the configuration language, with each type of portal relating to a particular class derived from a portal base-class. Customizing a portal-type would then be achieved by sub-classing an existing portal-type and having the sub-class tweak relevant attributes and behaviours.
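Such customization might look something like the following; the portal type, its attributes and its method are invented for illustration, but the two tweaks shown are the very ones the emacs rant above found so hard to make.

    class HTMLPortal:
        """Hypothetical portal type shipped with the editor."""
        tag_attributes = {}      # extra attributes added to inserted elements
        child_separator = ""     # what to put between inserted child elements

        def insert_element(self, tag, children=()):
            attrs = "".join(f' {k}="{v}"'
                            for k, v in self.tag_attributes.get(tag, {}).items())
            body = self.child_separator.join(children)
            return f"<{tag}{attrs}>{body}</{tag}>"

    # A user's customization: subclass and tweak attributes and behaviours.
    class MyHTMLPortal(HTMLPortal):
        tag_attributes = {"html": {"lang": "en"}}
        child_separator = "\n"   # newline after each LI inserted into an OL or UL

    print(MyHTMLPortal().insert_element("html"))
    # <html lang="en"></html>
    print(MyHTMLPortal().insert_element("ol", ["<li></li>", "<li></li>"]))
    # <ol><li></li>
    # <li></li></ol>

Whether inheritance really is the kindest configuration interface is open to debate, but it does keep each customization small, local and expressed in a language users may already know.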