Late in 1999, a discussion on the python community's types-sig led me to make some suggestions, motivated to some degree by my much earlier work for Malcolm Sabin at Fegs, for a self-bootstrapping class structure based on a minimalist conception of namespaces, with magic attributes to implement particular functionalities, which I view as the heart of python's goodness. The python community went a different way, for good reasons, and I've been left with the ideas to play with. As a tip of the hat towards python's naming, and towards four of the finest comedians in history, I've chosen the name GOON for the language that's evolving out of my chasing up these ideas. It's generic, it's object-oriented, and namespaces are the fundamental idiom that makes it all work.
My thoughts on GOON are preliminary as yet (Spring 2003) and build
on what I wrote up in the course of thinking about python
and its potential for liberalisation. I intend that GOON do some form of
primitive boot-strapping which it can, at least, pretend it did in GOON; that
it support some structure for telling it, in the course of parsing a module,
to change the rules by which it parses the module (ideally, with enough
flexibility that it can flip into parsing TeX, XML, python, C, Lisp or ALGOL,
but perhaps FORTRAN would be asking a bit much) and even the semantics of what
it parses; that it support documentation strings in XML, complete with an apt
schema/DTD suitably tied to the introspection
data of the entities
whose documentation it is; and that the language (in general, not just the
docstrings) be canonicalisable to XML in some guise, with its plaintext
form being merely a natural presentation (e.g. via a style sheet) of its
canonical XML form.
I intend GOON's primitive form (which might be over-rideable by telling it
new parsing and semantics) to use a more extreme one-namespace variant on the
simple two-namespaces approach of python before 2.2; I intend to
unify all types, notably making class not be a key-word, but merely a
particular creation operator for which various alternates may readily be
defined, with perturbed semantics.
In python, any piece of code comes equipped with two namespaces: one
termed local, the other global. The local namespace pertains to the immediate
piece of code being executed, the global belongs to its context - almost
always the module in which the code appears. This is, in some ways, limiting:
at the same time, however, it lends a clarity and simplicity which is
immensely powerful. In python 2, this has been abandoned in favour of nested
scopes; I do not intend that GOON follow suit - rather, it shall solve various
related problems by dealing with various issues differently.
One of the issues motivating nested scopes is the desirability of granting
(for example) a function access to some values from the context in which it is
defined. A standard python-programming idiom for dealing with this is to give
the function a formal parameter with a default value - the context-supplied
value in question - and to document that the function's callers should not
supply a value for the given parameter; such a parameter is known by various
names, but I call it a tunnel between the context and the function. Crucially,
it enriches the local namespace into which it tunnels; it has no effect on the
global namespace.
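The default-parameter idiom can be illustrated in present-day python. The names make_scaler and scale are invented for this sketch; its last line shows how a caller who supplies the extra argument silently replaces the tunnelled value.

```python
def make_scaler(factor):
    # 'factor=factor' tunnels the context's value into scale's locals;
    # callers are asked (by documentation only) never to pass a second argument.
    def scale(x, factor=factor):
        return x * factor
    return scale

double = make_scaler(2)
intended = double(5)       # uses the tunnelled factor, 2
clobbered = double(5, 10)  # a stray argument over-rides the tunnel
```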
The problem with tunnelling via the parameter list is that it can get messed up by argument passing, hence complicates what can and can't be done with calls to functions exercising it. My 1999 thoughts on python proposed some syntactic complications to the existing python argument list as a means of getting round this: the more brutal approach I intend to take with GOON is to provide a mechanism for explicit tunnelling. While this imposes a burden on the author of code, it should make it easier for maintainers to see which parts of one context are, or might be, accessed by a nested one.
As an example of how this could be handled, consider extending python 1 by replacing
function-definition: 'def' name '(' parameter-list ')' ':' suite
with
function-definition: 'def' name '(' parameter-list ')' [ '(' tunnel-list ')' ] ':' suite
in which the specification of a tunnel-list looks a lot like that for a parameter-list, albeit with different semantics. Use of an empty tunnel-list should be equivalent to leaving it out altogether (it was optional) and leave the function definition meaning what it would in python 1. By separating the tunnels from the parameters (whose defaults, where given, will serve to import further values; but these may be over-ridden by arguments) we ensure that the tunnels don't get messed up by arguments.
The other problem with tunnelling, central to the motivation for nested scopes
in python 2, is that it's a one-way mechanism for passing
values into the function; it doesn't provide for the function to rebind any
names from its context. While it is possible to get round this - use a
one-way tunnel to pass in a mutable object (list, dictionary, or instance) so
that context (or other functions defined in it and passed the same object) can
consult that object for modifications - it would certainly be nicer to provide
access to context's namespace directly by the use of its names as if they were
locals of the function. It thus makes sense to so specify the meaning
of tunnel-list that it provides for this kind of access to context's
namespace.
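The work-around with a mutable object might look like the following in python; make_counter, bump and cell are names assumed for illustration. The list is tunnelled one-way, and the function changes its contents rather than rebinding any name in its context.

```python
def make_counter():
    count = [0]  # mutable cell owned by the context

    # One-way tunnel of the list object itself; the function can
    # modify what the list holds, but cannot rebind 'count'.
    def bump(cell=count):
        cell[0] += 1
        return cell[0]

    return bump, count

bump, count = make_counter()
bump()
bump()  # the context can now consult count[0] to see the changes
```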
We thus have two types of tunnelling to support: a one-way mechanism,
providing a snapshot value obtained from context when the function was
defined, in exactly the same manner as a parameter with default, save that it
cannot be over-ridden by callers of the function; and a two-way mechanism,
providing mutable access to context's name-space, allowing the
function's suite to re-bind names visible outside the function. It is
kinder to the compiler (or, more specifically, the garbage-collector and the
optimiser) to include an explicit statement of which names are tunnelled in
the second manner (so that the interpreter can tell when some of
context's names are never going to be referenced again, so can
be del'd). It is thus entirely natural to have plain names in
the tunnel-list (resembling positional parameters with no default)
specify two-way tunnelling of the given names,
allowing name=expression tunnels (resembling parameters
with defaults) to provide one-way tunnels. It is desirable to segregate the
two types, so let us require all two-way tunnels to appear before all one-way
ones, which will make tunnel-list's syntax agree with that
of parameter-list, except that the *name
and **name forms are not supported in
a tunnel-list.
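The two kinds of tunnel can be approximated in python 3, though not in the tunnel-list syntax proposed here. In this sketch, with invented names, a default parameter stands in for the one-way snapshot (so, unlike a true tunnel, callers could still over-ride it) and the nonlocal statement stands in for a two-way tunnel.

```python
def context():
    total = 0
    step = 1

    # One-way: 'step=step' snapshots the value when 'add' is defined.
    # Two-way: 'nonlocal total' lets 'add' rebind the context's own name.
    def add(step=step):
        nonlocal total
        total = total + step
        return total

    step = 100  # too late to matter: the snapshot already holds 1
    add()
    add()
    return total, step

result = context()
```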
The above probably makes parsing a bit tiresome; an alternative approach would be to define
tunnel-spec: 'tunnel' '(' tunnel-list ')'
and prefix both function-definition and class-definition with an optional tunnel-spec, as
function-definition: [ tunnel-spec ] 'def' name '(' parameter-list ')' ':' suite
and likewise for class definitions. One possible way to do this would be to have tunnel-spec take the form of a magic decorator, e.g. @tunnel(tunnel-list).
A possible implementation of this last provides another potential approach: allow each namespace to have a magic attribute __context__ to which the namespace falls back for handling names it doesn't, itself, bind. This could then proceed recursively, falling back to the __context__ of each namespace found in this way that has this attribute. The result could function much like the lexical scoping presently defined in python, with the module and each function providing its own namespace as __context__ to each function defined (possibly nested within arbitrarily many layers of class statements) in its scope. Then @tunnel() would merely be setting the __context__ of the function it wraps to a namespace it constructs, substituted in place of that of its lexical context.
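A minimal sketch of the __context__ fall-back, written as an ordinary python class rather than as interpreter machinery. The Namespace class and the example bindings are assumptions made for illustration; only the attribute name __context__ comes from the text.

```python
class Namespace:
    """Hypothetical namespace that defers unknown names to __context__."""

    def __init__(self, context=None, **bindings):
        self.__dict__.update(bindings)
        if context is not None:
            self.__context__ = context

    def __getattr__(self, name):
        # Only reached when the namespace does not itself bind 'name'.
        context = self.__dict__.get("__context__")
        if context is None:
            raise AttributeError(name)
        return getattr(context, name)  # recurses up the chain of contexts

module = Namespace(x=1, y=2)
outer = Namespace(module, y=20)   # shadows the module's y
inner = Namespace(outer, z=300)

# What a tunnel decorator might do: substitute a constructed context
# that exposes only the chosen names.
restricted = Namespace(x=module.x)
tunnelled = Namespace(restricted, z=3)
```

Here inner sees x from the module, y from outer and its own z, while tunnelled sees x but has no access to y.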
It remains to decide what, if any, semantics to give to the *name and **name pseudo-tunnels, analogous to the equivalent pseudo-parameters. Note that my 1999 ruminations provided for parameter-lists using name-less variants on these; no parameter appearing after * can get its value from a positional argument, no parameter appearing after ** can get its value from a keyword argument; so nameless variants make it a type-error to supply more positional arguments than there are parameters before the nameless *, or to supply keyword arguments whose name doesn't match a (non-*) parameter before the nameless **.
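For comparison, python 3 later adopted a name-less * in parameter lists with much the same effect as described for it here: parameters after it can only be supplied by keyword, and surplus positional arguments raise a TypeError. Python has no name-less **, since a function without a **name parameter already rejects unknown keywords. The function connect is invented for this sketch.

```python
def connect(host, port, *, timeout=10):
    # 'timeout' follows the bare '*', so it can only be given by keyword.
    return (host, port, timeout)

ok = connect("example.org", 80, timeout=5)

try:
    connect("example.org", 80, 5)  # a third positional argument
except TypeError:
    rejected = True
else:
    rejected = False
```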
Now, clearly, the * and ** pseudo-tunnels should serve
to specify tunnelling of all names from context into the function: albeit the
optimiser may well thin this down to only those names actually referenced by
the function; and it will presumably skip any names used in
the parameter-list or elsewhere in the tunnel-list. One
pseudo-tunnel will provide for mutable two-way access to them, the other will
freeze a snap-shot of context's namespace, when the function is defined, with
which to initialise, each time the function is called, the namespace in which
its suite is executed. The former is equivalent to listing all
context's names early in the tunnel-list; the latter is equivalent to
including name=name, for each of
context's names, later in the tunnel-list. The latter is just like a
from context import * statement, so let
the name-less * pseudo-tunnel support it. This naturally
leaves the ** pseudo-tunnel to provide two-way access to the whole of
context's name-space.
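The difference between the two pseudo-tunnels can be mimicked with plain dictionaries standing in for namespaces; this is only an analogy, and every name in it is invented. The * analogue copies the context once, at definition time, and initialises fresh locals from that copy on each call; the ** analogue works on the context's namespace itself.

```python
context = {"a": 1, "b": 2}

snapshot = dict(context)  # '*' analogue: frozen copy at definition time
live = context            # '**' analogue: the namespace itself

def run_one_way():
    local = dict(snapshot)  # fresh locals, initialised on every call
    local["a"] += 10
    return local["a"]

def run_two_way():
    live["a"] += 10  # rebinding reaches the context
    return live["a"]

first, second = run_one_way(), run_one_way()
untouched = context["a"]
third, fourth = run_two_way(), run_two_way()
```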
It must be a syntax error to bind any name in both the parameter-list and the tunnel-list, or to bind any name twice in either list; thus any name bound explicitly in either must be skipped from the names implicitly bound by any pseudo-tunnel, of either kind. For the one-way pseudo-tunnel, this should be no problem: any name from context whose value the function needs to access can always be given an alias via an explicit one-way tunnel. However, for the two-way pseudo-tunnel, this may present problems - for example, class APIs may require certain methods to take keyword parameters, thereby requiring that the relevant names appear in parameter lists; such a method, as context, cannot get out of using that name and, if it defines a function to be used in similar manner, must use that name as a parameter of the function also; yet it may need the function to have two-way access to its own use of that name (one can get out of this by use of a named ** pseudo-parameter and a little hard work; but that's irksome in its own way).
This situation can readily be handled by supporting a named version of the pseudo-tunnel, providing a dictionary or object to package context's namespace and binding that to the pseudo-tunnel's name. Since this would be mutable, it is only really suitable for the two-way pseudo-tunnel; fortunately, as noted above, it is unnecessary for the one-way pseudo-tunnel.
There remains the issue of which names the name-less **
pseudo-tunnel propagates: since it doesn't tell us which names it contributes
to our inner context, any reference or re-binding in the inner
context to a name that isn't a parameter or named tunnel could be construed as
referencing the outer name-space, so that only the parameters and
named tunnels would genuinely be local names of the inner suite. This would
appear somewhat excessive. So, instead, restrict ** to only
those names actually bound (in one way or another) by context itself; save
that we must now, in addition to assignment and del, count ordinary
two-way tunnels as binding operations - by the context executing the statement
with the tunnel - on the names tunnelled.
As discussed in my 1999 ruminations, a python class is created after its suite has been executed; and this suite sees the module's name-space as globals, giving it no means, if defined within a function, to access locals of the function. I would certainly change both of these: create the class first, then execute the suite using the class' namespace as locals; I proposed previously that this suite should use, as globals, the locals of the most closely-enclosing function suite, if any, else the module as in python 1. An alternative to that proposal (either scrapping globals altogether or retaining python 1's choice of module as globals) would be to provide for the class definition to be able to tunnel values into the class namespace. To this end, it would make sense for the class statement to allow an optional '('tunnel-list')' following the existing '('bases-list')', which should probably be required in this case (omitting it is equivalent to supplying an empty bases-list, which would now be required explicitly), even if we're only allowing the tunnel-list to contain one-way tunnels - i.e. name=expression entries and/or the * pseudo-tunnel.
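The existing order of events, suite first and class afterwards, can be imitated by hand in python. This is a rough sketch of what the class statement does, not its actual implementation; the suite text and the name Point are invented.

```python
suite = (
    "x = 1\n"
    "def double(self):\n"
    "    return 2 * self.x\n"
)

namespace = {}
exec(suite, {}, namespace)            # step 1: run the suite into a fresh dict
Point = type("Point", (), namespace)  # step 2: only now does the class exist

doubled = Point().double()
```

The proposal in the text reverses these two steps, so that the suite runs with the already-created class's namespace as its locals.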
Is there any sense to two-way tunnels into a class (or kindred) statement? This would give the class a rather odd relationship with the namespace in which it is defined: this namespace serves somewhat like a base, in that some of its names - albeit only a specified subset, unless the ** pseudo-tunnel is used - are visible in the class namespace, with the curious twist that attempting to re-bind these names in the class namespace (or, presumably, as attributes of the class) would actually change the relevant name's value in context's namespace, rather than concealing that value behind a value in the class's own namespace (as when re-binding an attribute of an instance conceals a like-named attribute of the class from which it is derived). This potentially confusing complication might fairly be taken as an argument against two-way tunnels into classes.
However, if a method of a class is to be able to hold a two-way tunnel out to the context which defined the class, as would appear desirable, we must allow the class to have pulled in the two-way tunnel if only so that it can forward it to the method.
Particularly when a class only takes in a two-way tunnel so as to forward
it to some methods, but possibly in other contexts, it may be desirable to
have tunnels go away once used. This can't be implemented by
applying del to the tunnelled name, since the whole point of a
two-way tunnel is to let re-binding (including un-binding) operations on the
name apply themselves to the context from which the name was tunnelled. So,
if we don't want the names tunnelled through a class namespace to be left over
in the class as a side-effect of tunnelling, we need some other way of making
them go away.
Furthermore, while rebinding a two-way tunnelled name in the suite of the class is properly redirected to affect context's version of the name, my main dislike of two-way tunnels into a class arose from the fact that, as attributes of the class, they would behave very strangely; re-binding the given name on the class after execution of the class statement would take its effect on the remnant of context's name-space that got preserved to carry the names two-way tunnelled out of it. In all other situations, re-binding a name on a class affects the attribute dictionary of the class itself: if a class inherits some attribute from a base, deleting that attribute from the class does not affect the base. Having a strange exception to that might be fascinatingly useful, but it would also certainly be confusing (and liable to make for some bugs which would be very hard to track down).
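The normal behaviour that two-way tunnelled class attributes would violate is easy to check in python; Base, Derived and colour are names invented for this sketch.

```python
class Base:
    colour = "red"

class Derived(Base):
    colour = "blue"

del Derived.colour          # removes only Derived's own binding
after_del = Derived.colour  # lookup now falls back to the base's value

try:
    del Derived.colour      # nothing left in Derived's own dictionary
except AttributeError:
    base_protected = True   # the base's attribute cannot be deleted this way
else:
    base_protected = False
```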
These two problems may be resolved by one simple expedient: make two-way tunnels affect the suite into which they tunnel without affecting any persistent namespace being created by it. Inner suites into which a two-way tunnelled name is forwarded as a two-way tunnel will (in any case) be accessing the name in the origin namespace from which the name's two-way tunnelling journey begins (i.e. the context of the outermost statement in the nested chain of two-way tunnels leading to it), not from the immediate parent who only had access to the name via a two-way tunnel; so there is no conflict with the forwarding of tunnels (whether via classoids or functions), except in the case where the **name pseudo-tunnel is used. This would, with the given resolution, only provide name's name-space with entries for the actual locals of context, without any two-way tunnels context received. This would be at odds with the name-less variant, which would forward all two-way tunnels context received. I'm not quite sure what to make of this, in all honesty, but it doesn't worry me much.
This would make a class statement's suite see the two-way tunnels of the tunnel-list, and enable it to forward them to methods as required, without contributing those names to the class namespace once constructed.
Sanity check: suppose a method, of a class conforming to some API, defines a class that must conform to that API, so (as discussed when introducing the named pseudo-tunnel) the inner class has a method with a parameter of the same name as a parameter of the method which defined the class; if the inner variant on the method needs two-way access to the outer variant's parameter with the relevant name, it has to get it via a named pseudo-tunnel; can it? We can two-way tunnel the name directly into the class but then we can't named-pseudo-tunnel it into the method; so we'll need to named-pseudo-tunnel it into the class, then forward the name used for this pseudo-tunnel to the method - either as a one-way tunnel or as a two-way one - which will work, so we win.
If a class has a base with an attribute which coincides with a name two-way tunnelled into the class, the base's attribute will be concealed for the duration of the class statement (which will be unable to over-ride the base's value for that name) but will subsequently provide the relevant attribute for the class. If a class needs to define a method with two-way access to what context provides under the same name as the method must have, the class will have to use a named pseudo-tunnel to bring in all context's names, then forward the pseudo-tunnel's name to the method (as either kind of tunnel) so that the method can access its eponymous attribute off the named pseudo-tunnel object.
The two-way form of tunnel should not depend on the relevant name having yet been bound (when the function tunnelling it is defined) in the context which defines the function; otherwise, defining recursive functions will require special treatment, and defining a mutually-recursive collection of functions would be severely tiresome. With this proviso, however, it becomes possible to entirely do away with the global namespace.
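Python's late lookup of global names gives a feel for why this proviso matters; it stands in here for the two-way tunnel, which python lacks. In this sketch, with invented names, the mutually-recursive pair works because neither name is looked up until call time, whereas a snapshot taken at definition time fails because the name is not yet bound.

```python
def is_even(n):
    # 'is_odd' is not yet bound when this definition is executed;
    # that is harmless, since the name is only looked up when called.
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

try:
    # A default value is evaluated at definition time, so this fails.
    def early(n, later=defined_later):
        return later(n)
except NameError:
    snapshot_failed = True
else:
    snapshot_failed = False

def defined_later(n):
    return n
```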
It is perhaps sensible to retain the built-in namespace; but even that
could be delivered via some base-class from which module, class and maybe
even function evaluation namespace are derived, so that every
module can see the built-in names simply as part of this common heritage. Any
suite whose locals do not inherit the built-ins would need even these names
tunnelled in explicitly; arranging for every suite to be able to see the
built-ins (either by having them as a separate name-space, or by having all
namespaces inherit from a built-in-carrying base) is justified by
functionality delivered by the built-ins being universal, generic and
ubiquitously needed. The advantage of having the built-ins provided to suites
during execution by the interpreter, rather than having them inherited off a
base of every name-space, is that the latter would cause the built-ins to
appear as attributes of every object, which may fairly be deemed
over-kill.
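Python itself takes the first of these routes, as a short check suggests; uses_len is a name invented for this sketch. The built-ins are visible to every suite as a separate namespace, yet they do not turn up as attributes of arbitrary objects.

```python
import builtins

def uses_len(seq):
    # 'len' is bound neither locally nor in this module;
    # the interpreter finds it in the built-in namespace.
    return len(seq)

found_in_builtins = hasattr(builtins, "len")
leaks_onto_objects = hasattr(object(), "len")
```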