Generic Object-Oriented Namespaces

Late in 1999, a discussion on the python community's types-sig led me to make some suggestions, motivated to some degree by my much earlier work for Malcolm Sabin at Fegs, for a self-bootstrapping class structure based on a minimalist conception of namespaces, with magic attributes to implement particular functionalities, which I view as the heart of python's goodness. The python community went a different way, for good reasons, and I've been left with the ideas to play with. As a tip of the hat towards python's naming, and towards four of the finest comedians in history, I've chosen the name GOON for the language that's evolving out of my chasing up these ideas. It's generic, it's object-oriented, and namespaces are the fundamental idiom that makes it all work.

My thoughts on GOON are preliminary as yet (Spring 2003) and build on what I wrote up in the course of thinking about python and its potential for liberalisation. I intend that GOON do some form of primitive boot-strapping which it can, at least, pretend it did in GOON; that it support some structure for telling it, in the course of parsing a module, to change the rules by which it parses the module (ideally, with enough flexibility that it can flip into parsing TeX, XML, python, C, Lisp or ALGOL, but perhaps FORTRAN would be asking a bit much) and even the semantics of what it parses; that it support documentation strings in XML, complete with an apt schema/DTD suitably tied to the introspection data of the entities whose documentation it is; and that the language (in general, not just the docstrings) be canonicalisable to XML in some guise, with its plaintext form being merely a natural presentation (e.g. via a style sheet) of its canonical XML form.

I intend GOON's primitive form (which might be over-rideable by telling it new parsing and semantics) to use a more extreme, one-namespace variant on the simple two-namespace approach of python before 2.2; I intend to unify all types, notably making class not be a keyword, but merely a particular creation operator for which various alternates may readily be defined, with perturbed semantics.

One Namespace

In python, any piece of code comes equipped with two namespaces: one termed local, the other global. The local namespace pertains to the immediate piece of code being executed, the global belongs to its context - almost always the module in which the code appears. This is, in some ways, limiting: at the same time, however, it lends a clarity and simplicity which is immensely powerful. In python 2, this has been abandoned in favour of nested scopes; I do not intend that GOON follow suit - rather, it shall solve various related problems by dealing with various issues differently.

One of the issues motivating nested scopes is the desirability of granting (for example) a function access to some values from the context in which it is defined. A standard python-programming idiom for dealing with this is to give the function a formal parameter with a default value - the context-supplied value in question - and to document that the function's callers should not supply a value for the given parameter; such a parameter is known by various names, but I call it a tunnel between the context and the function. Crucially, it enriches the local namespace into which it tunnels; it has no effect on the global namespace.
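The default-parameter idiom can be sketched as follows; the names make_scalers, scale and factor are invented purely for illustration. The default is evaluated once, when the def statement runs, so each function carries its own snapshot of the context's value.

```python
def make_scalers(factors):
    """Build one scaling function per factor."""
    scalers = []
    for factor in factors:
        # 'factor=factor' is the tunnel: the default is evaluated when the
        # def statement runs, so each function keeps its own value.
        # Callers are expected to leave 'factor' alone.
        def scale(value, factor=factor):
            return value * factor
        scalers.append(scale)
    return scalers


double, triple = make_scalers([2, 3])
```

Here double(10) and triple(10) use different tunnelled values of factor, even though both functions came from the same def statement.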

Tunnels and Tunnelling

The problem with tunnelling via the parameter list is that it can get messed up by argument passing, hence complicates what can and can't be done with calls to functions exercising it. My 1999 thoughts on python proposed some syntactic complications to the existing python argument list as a means of getting round this: the more brutal approach I intend to take with GOON is to provide a mechanism for explicit tunnelling. While this imposes a burden on the author of code, it should make it easier for maintainers to see which parts of one context are, or might be, accessed by a nested one.
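To illustrate how argument passing can mess up such a tunnel, here is a small sketch (make_logger, log and prefix are hypothetical names): a stray extra argument silently replaces the value the context meant to supply.

```python
def make_logger(prefix):
    # 'prefix=prefix' tunnels the context's value into the function.
    def log(message, prefix=prefix):
        return prefix + ": " + message
    return log


log = make_logger("db")
intended = log("connected")            # uses the tunnelled prefix
clobbered = log("connected", "oops")   # a second argument overrides the tunnel
```

Nothing in the call distinguishes the tunnel from an ordinary parameter, which is why the function's documentation has to ask callers not to touch it.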

As an example of how this could be handled, consider extending python 1 by replacing

   function-definition: 
	'def' name '(' parameter-list ')' ':' suite

with

   function-definition:
	'def' name '(' parameter-list ')' [ '(' tunnel-list ')' ] ':' suite

in which the specification of a tunnel-list looks a lot like that for a parameter-list, albeit with different semantics. Use of an empty tunnel-list should be equivalent to leaving it out altogether (it was optional) and leave the function definition meaning what it would in python 1. By separating the tunnels from the parameters (whose defaults, where given, will serve to import further values; but these may be over-ridden by arguments) we ensure that the tunnels don't get messed up by arguments.

The other problem with tunnelling, central to the motivation for nested scopes in python 2, is that it's a one-way mechanism for passing values into the function; it doesn't provide for the function to rebind any names from its context. While it is possible to get round this - use a one-way tunnel to pass in a mutable object (list, dictionary, or instance) so that context (or other functions defined in it and passed the same object) can consult that object for modifications - it would certainly be nicer to provide access to context's namespace directly by the use of its names as if they were locals of the function. It thus makes sense to so specify the meaning of tunnel-list that it provides for this kind of access to context's namespace.
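The work-around via a mutable object might look like the following sketch, in which make_counter, state, increment and current are names made up for the example. The tunnel itself stays one-way; only the contents of the shared object change.

```python
def make_counter():
    # A mutable object that both functions receive through one-way tunnels.
    state = {"count": 0}

    def increment(state=state):
        # Mutates the shared object; the name 'state' itself is never rebound.
        state["count"] += 1
        return state["count"]

    def current(state=state):
        return state["count"]

    return increment, current


increment, current = make_counter()
increment()
increment()
```

After the two calls, current() reports the modifications made by increment, but neither function can rebind a name in the context directly.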

We thus have two types of tunnelling to support: a one-way mechanism, providing a snapshot value obtained from context when the function was defined, in exactly the same manner as a parameter with default, save that it cannot be over-ridden by callers of the function; and a two-way mechanism, providing mutable access to context's name-space, allowing the function's suite to re-bind names visible outside the function. It is kinder to the compiler (or, more specifically, the garbage-collector and the optimiser) to include an explicit statement of which names are tunnelled in the second manner (so that the interpreter can tell when some of context's names are never going to be referenced again, so can be del'd). It is thus entirely natural to have plain names in the tunnel-list (resembling positional parameters with no default) specify two-way tunnelling of the given names, allowing name=expression tunnels (resembling parameters with defaults) to provide one-way tunnels. It is desirable to segregate the two types, so let us require all two-way tunnels to appear before all one-way ones, which will make tunnel-list's syntax agree with that of parameter-list, except that the *name and **name forms are not supported in a tunnel-list.
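The contrast between the two kinds of tunnel can be approximated in present-day python, on the assumption that a parameter default stands in for a one-way tunnel and that python 3's nonlocal statement stands in for a two-way one. That analogy is mine, not part of the proposal, and the spellings shown in the comments are only what the tunnel-list syntax above would suggest.

```python
def context():
    total = 0

    # One-way: a snapshot of 'total' taken when the function is defined.
    # Hypothetical tunnel-list spelling: def snapshot()(total=total):
    def snapshot(total=total):
        return total

    # Two-way: the name is declared explicitly, and rebinding it inside
    # the function rebinds it in the context.
    # Hypothetical tunnel-list spelling: def add(amount)(total):
    def add(amount):
        nonlocal total
        total += amount
        return total

    add(5)
    add(7)
    return snapshot(), total


frozen, live = context()
```

The snapshot still holds the value total had when snapshot was defined, while the context sees both rebindings made by add.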

The above probably makes parsing a bit tiresome; an alternative approach would be to define

   tunnel-spec:
	'tunnel' '(' tunnel-list ')'

and prefix both function-definition and class-definition with an optional tunnel-spec, as

   function-definition: 
	[ tunnel-spec ] 'def' name '(' parameter-list ')' ':' suite

and likewise for class definitions. One possible way to do this would be to have tunnel-spec take the form of a magic decorator, e.g. @tunnel(tunnel-list).
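As a rough sketch of the decorator form, restricted to one-way tunnels, one could rebuild the function over a copy of its globals with the tunnelled values overlaid. The decorator tunnel below, along with make_pricer, with_tax and percent, is hypothetical; it assumes the tunnelled names are not locals or closure variables of the function, and it freezes the rest of the module's names too, which a real implementation would not want.

```python
import functools
import types


def tunnel(**snapshots):
    """Hypothetical decorator: bind each keyword as a one-way tunnel."""
    def decorate(func):
        # Copy the function's globals and overlay the tunnelled values, so
        # the body sees them as plain names that no caller can override.
        namespace = dict(func.__globals__)
        namespace.update(snapshots)
        rebuilt = types.FunctionType(
            func.__code__, namespace, func.__name__,
            func.__defaults__, func.__closure__)
        if func.__kwdefaults__:
            rebuilt.__kwdefaults__ = dict(func.__kwdefaults__)
        return functools.update_wrapper(rebuilt, func)
    return decorate


def make_pricer(percent_rate):
    @tunnel(percent=percent_rate)
    def with_tax(price):
        # 'percent' is neither a parameter nor a local: it arrives by tunnel.
        return price + price * percent // 100
    return with_tax


with_tax = make_pricer(25)
```

Since percent is not a parameter, a call such as with_tax(200, 50) is rejected outright rather than quietly replacing the tunnelled value.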

A possible implementation of this last provides another potential approach: allow each namespace to have a magic attribute __context__ to which the namespace falls back for handling names it doesn't, itself, bind. This could then proceed recursively, falling back to the __context__ of each namespace found in this way that has this attribute. The result could function much like the lexical scoping presently defined in python, with the module and each function providing its own namespace as __context__ to each function defined (possibly nested within arbitrarily many layers of class statements) in its scope. Then @tunnel() would merely be setting the __context__ of the function it wraps to a namespace it constructs, substituted in place of that of its lexical context.
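The fall-back chain can be modelled with a small class; Namespace here is a hypothetical stand-in for whatever GOON would use, and attribute access plays the part of name lookup.

```python
class Namespace:
    """Hypothetical namespace that defers unknown names to __context__."""

    def __init__(self, context=None, **bindings):
        self.__context__ = context
        self.__dict__.update(bindings)

    def __getattr__(self, name):
        # Only called when the namespace does not bind the name itself.
        context = self.__dict__.get("__context__")
        if context is None:
            raise AttributeError(name)
        return getattr(context, name)   # recurse up the chain


module = Namespace(limit=10, name="module")
function = Namespace(module, name="function")
nested = Namespace(function)

# A tunnel would substitute a constructed namespace for the lexical one.
tunnelled = Namespace(Namespace(limit=3))
```

Looking up nested.limit walks two steps up the chain to the module, whereas nested.name stops at the function's namespace, which binds that name itself.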

Pseudo-Tunnels

It remains to decide what, if any, semantics to give to the *name and **name pseudo-tunnels, analogous to the equivalent pseudo-parameters. Note that my 1999 ruminations provided for parameter-lists using name-less variants on these; no parameter appearing after * can get its value from a positional argument, no parameter appearing after ** can get its value from a keyword argument; so nameless variants make it a type-error to supply more positional arguments than there are parameters before the nameless *, or to supply keyword arguments whose name doesn't match a (non-*) parameter before the nameless **.
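For comparison, python 3 has since adopted a bare * in parameter lists with essentially the meaning described for the name-less *: parameters after it can only be supplied by keyword, and surplus positional arguments raise TypeError. The name-less ** has no python counterpart that I know of. The function resize below is invented for the example.

```python
def resize(width, height, *, keep_aspect=False):
    # 'keep_aspect' follows the bare *, so it can only be given by keyword.
    return (width, height, keep_aspect)


accepted = resize(640, 480, keep_aspect=True)

try:
    resize(640, 480, True)   # a third positional argument has nowhere to go
    rejected = False
except TypeError:
    rejected = True
```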

Now, clearly, the * and ** pseudo-tunnels should serve to specify tunnelling of all names from context into the function: albeit the optimiser may well thin this down to only those names actually referenced by the function; and it will presumably skip any names used in the parameter-list or elsewhere in the tunnel-list. One pseudo-tunnel will provide for mutable two-way access to them, the other will freeze a snap-shot of context's namespace, when the function is defined, with which to initialise, each time the function is called, the namespace in which its suite is executed. The former is equivalent to listing all context's names early in the tunnel-list; the latter is equivalent to including name=name, for each of context's names, later in the tunnel-list. The latter is just like a from context import * statement, so let the name-less * pseudo-tunnel support it. This naturally leaves the ** pseudo-tunnel to provide two-way access to the whole of context's name-space.

It must be a syntax error to bind any name in both the parameter-list and the tunnel-list, or to bind any name twice in either list; thus any name bound explicitly in either must be skipped from the names implicitly bound by any pseudo-tunnel, of either kind. For the one-way pseudo-tunnel, this should be no problem: any name from context whose value the function needs to access can always be given an alias via an explicit one-way tunnel. However, for the two-way pseudo-tunnel, this may present problems - for example, class APIs may require certain methods to take keyword parameters, thereby requiring that the relevant names appear in parameter lists; such a method, as context, cannot get out of using that name and, if it defines a function to be used in similar manner, must use that name as a parameter of the function also; yet it may need the function to have two-way access to its own use of that name (one can get out of this by use of a named ** pseudo-parameter and a little hard work; but that's irksome in its own way).

This situation can readily be handled by supporting a named version of the pseudo-tunnel, providing a dictionary or object to package context's namespace and binding that to the pseudo-tunnel's name. Since this would be mutable, it is only really suitable for the two-way pseudo-tunnel; fortunately, as noted above, it is unnecessary for the one-way pseudo-tunnel.
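A named pseudo-tunnel might be modelled as an object whose attributes are the context's names. In this sketch ContextView, outer_method and inner_method are hypothetical, and an explicit dictionary stands in for the context's namespace, since python does not let a function's locals be shared in this way.

```python
class ContextView:
    """Hypothetical object packaging a context's namespace."""

    def __init__(self, namespace):
        # Bypass our own __setattr__ while storing the wrapped mapping.
        object.__setattr__(self, "_namespace", namespace)

    def __getattr__(self, name):
        try:
            return self._namespace[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._namespace[name] = value   # rebinding reaches the context


def outer_method(key):
    context = {"key": key}              # stand-in for the context's namespace
    view = ContextView(context)

    # The inner function must also call its parameter 'key', yet it can
    # still reach, and rebind, the outer 'key' through the named view.
    def inner_method(key, outer=view):
        outer.key = outer.key + "/" + key
        return outer.key

    inner_method("child")
    return context["key"]


result = outer_method("root")
```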

There remains the issue of which names the name-less ** pseudo-tunnel propagates: since it doesn't tell us which names it contributes to our inner context, any reference or re-binding in the inner context to a name that isn't a parameter or named tunnel could be construed as referencing the outer name-space, so that only the parameters and named tunnels would genuinely be local names of the inner suite. This would appear somewhat excessive. So, instead, restrict ** to only those names actually bound (in one way or another) by context itself; save that we must now, in addition to assignment and del, count ordinary two-way tunnels as binding operations - by the context executing the statement with the tunnel - on the names tunnelled.

Classes (and their kin)

As discussed in my 1999 ruminations, a python class is created after its suite has been executed; and this suite sees the module's name-space as globals, giving it no means, if defined within a function, to access locals of the function. I would certainly change both of these: create the class first, then execute the suite using the class' namespace as locals; I proposed previously that this suite should use, as globals, the locals of the most closely-enclosing function suite, if any, else the module as in python 1. An alternative to that proposal (either scrapping globals altogether or retaining python 1's choice of module as globals) would be to provide for the class definition to be able to tunnel values into the class namespace. To this end, it would make sense for the class statement to allow an optional '('tunnel-list')' following the existing '('bases-list')', which should probably be required in this case (omitting it is equivalent to supplying an empty bases-list, which would now be required explicitly), even if we're only allowing the tunnel-list to contain one-way tunnels - i.e. name=expression entries and/or the * pseudo-tunnel.
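One-way tunnels into a class can be roughly imitated in python 3, assuming keyword arguments of the class statement serve as the tunnel-list and a metaclass pre-populates the namespace in which the suite runs. TunnelledClass, make_shape_class and side_count are hypothetical names, and python still creates the class after the suite has run, so this is only an approximation of the proposal.

```python
class TunnelledClass(type):
    """Hypothetical metaclass: keyword arguments act as one-way tunnels."""

    @classmethod
    def __prepare__(mcs, name, bases, **tunnels):
        # The mapping returned here is the namespace the class suite runs in.
        return dict(tunnels)

    def __new__(mcs, name, bases, namespace, **tunnels):
        return super().__new__(mcs, name, bases, namespace)

    def __init__(cls, name, bases, namespace, **tunnels):
        super().__init__(name, bases, namespace)


def make_shape_class(sides):
    class Shape(metaclass=TunnelledClass, side_count=sides):
        # 'side_count' was tunnelled in from the enclosing function.
        interior_angle_sum = (side_count - 2) * 180
    return Shape


Triangle = make_shape_class(3)
```

In this sketch the tunnelled name remains behind as an attribute of the class, so Triangle.side_count is 3 after the statement completes.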

Is there any sense to two-way tunnels into a class (or kindred) statement? This would give the class a rather odd relationship with the namespace in which it is defined: this namespace serves somewhat like a base, in that some of its names - albeit only a specified subset, unless the ** pseudo-tunnel is used - are visible in the class namespace, with the curious twist that attempting to re-bind these names in the class namespace (or, presumably, as attributes of the class) would actually change the relevant name's value in context's namespace, rather than concealing that value behind a value in the class's own namespace (as when re-binding an attribute of an instance conceals a like-named attribute of the class from which it is derived). This potentially confusing complication might fairly be taken as an argument against two-way tunnels into classes.

However, if a method of a class is to be able to hold a two-way tunnel out to the context which defined the class, as would appear desirable, we must allow the class to have pulled in the two-way tunnel if only so that it can forward it to the method.

Two-Way Tunnels Are Transient

Particularly when a class only takes in a two-way tunnel so as to forward it to some methods, but possibly in other contexts, it may be desirable to have tunnels go away once used. This can't be implemented by applying del to the tunnelled name, since the whole point of a two-way tunnel is to let re-binding (including un-binding) operations on the name apply themselves to the context from which the name was tunnelled. So, if we don't want the names tunnelled through a class namespace to be left over in the class as a side-effect of tunnelling, we need some other way of making them go away.

Furthermore, while rebinding a two-way tunnelled name in the suite of the class is properly redirected to affect context's version of the name, my main dislike of two-way tunnels into a class arose from the fact that, as attributes of the class, they would behave very strangely; re-binding the given name on the class after execution of the class statement would take its effect on the remnant of context's name-space that got preserved to carry the names two-way tunnelled out of it. In all other situations, re-binding a name on a class affects the attribute dictionary of the class itself: if a class over-rides some attribute from a base, deleting that attribute from the class does not affect the base. Having a strange exception to that might be fascinatingly useful, but it would also certainly be confusing (and liable to make for some bugs which would be very hard to track down).
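The ordinary behaviour, against which a two-way tunnelled attribute would be the strange exception, is easily demonstrated; Base, Derived and colour are names chosen for the example.

```python
class Base:
    colour = "grey"


class Derived(Base):
    pass


# Binding on the derived class only touches Derived's own dictionary.
Derived.colour = "red"
shadowed = (Derived.colour, Base.colour)

# Deleting it again reveals the base's value, which was never disturbed.
del Derived.colour
revealed = (Derived.colour, Base.colour)
```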

These two problems may be resolved by one simple expedient: make two-way tunnels affect the suite into which they tunnel without affecting any persistent namespace being created by it. Inner suites into which a two-way tunnelled name is forwarded as a two-way tunnel will (in any case) be accessing the name in the origin namespace from which the name's two-way tunnelling journey begins (i.e. the context of the outermost statement in the nested chain of two-way tunnels leading to it), not from the immediate parent who only had access to the name via a two-way tunnel; so there is no conflict with the forwarding of tunnels (whether via classoids or functions), except in the case where the **name pseudo-tunnel is used. This would, with the given resolution, only provide name's name-space with entries for the actual locals of context, without any two-way tunnels context received. This would be at odds with the name-less variant, which would forward all two-way tunnels context received. I'm not quite sure what to make of this, in all honesty, but it doesn't worry me much.

This would make a class statement's suite see the two-way tunnels of the tunnel-list, and enable it to forward them to methods as required, without contributing those names to the class namespace once constructed.

Sanity check: suppose a method, of a class conforming to some API, defines a class that must conform to that API, so (as discussed when introducing the named pseudo-tunnel) has a method with a parameter with the same name as one of the method which defined the class; if the inner variant on the method needs two-way access to the outer variant's parameter with the relevant name, it has to get it via a named pseudo-tunnel; can it? We can two-way tunnel the name directly into the class but then we can't named-pseudo-tunnel it into the method; so we'll need to named-pseudo-tunnel it into the class, then forward the name used for this pseudo-tunnel to the method - either as a one-way tunnel or as a two-way one - which will work, so we win.

If a class has a base with an attribute which coincides with a name two-way tunnelled into the class, the base's attribute will be concealed for the duration of the class statement (which will be unable to over-ride the base's value for that name) but will subsequently provide the relevant attribute for the class. If a class needs to define a method with two-way access to what context provides under the same name as the method must have, the class will have to use a named pseudo-tunnel to bring in all context's names, then forward the pseudo-tunnel's name to the method (as either kind of tunnel) so that the method can access its eponymous attribute off the named pseudo-tunnel object.
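The transience, and the temporary concealment of a base's attribute, can be imitated with a python 3 metaclass. This sketch models only the visibility of tunnelled names during the suite and their absence afterwards; it does not model re-binding in the context. TransientTunnels, Base, Derived and label are hypothetical names, and class-statement keywords again stand in for the tunnel-list.

```python
class TransientTunnels(type):
    """Hypothetical metaclass: tunnelled names vanish once the suite ends."""

    @classmethod
    def __prepare__(mcs, name, bases, **tunnels):
        # The suite sees the tunnelled names as if they were its own.
        return dict(tunnels)

    def __new__(mcs, name, bases, namespace, **tunnels):
        # Drop the tunnelled names so they never become class attributes.
        kept = {key: value for key, value in namespace.items()
                if key not in tunnels}
        return super().__new__(mcs, name, bases, kept)

    def __init__(cls, name, bases, namespace, **tunnels):
        super().__init__(name, bases, namespace)


class Base:
    label = "from base"


class Derived(Base, metaclass=TransientTunnels, label="from context"):
    seen_in_suite = label   # the tunnel conceals Base.label here
```

During the suite, label refers to the tunnelled value; once the class exists, Derived.label is supplied by Base again.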

No globals

The two-way form of tunnel should not depend on the relevant name having yet been bound (when the function tunnelling it is defined) in the context which defines the function; otherwise, defining recursive functions will require special treatment, and defining a mutually-recursive collection of functions would be severely tiresome. With this proviso, however, it becomes possible to entirely do away with the global namespace.
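The difference shows up with mutually-recursive functions. In this sketch (all names invented) python's call-time name resolution plays the part of the two-way tunnel, while parameter defaults play the part of one-way tunnels, which must be evaluated before the second function exists.

```python
def parity_checkers():
    # Names are resolved when the functions run, so is_odd need not be
    # bound yet when is_even is defined.
    def is_even(n):
        return True if n == 0 else is_odd(n - 1)

    def is_odd(n):
        return False if n == 0 else is_even(n - 1)

    return is_even, is_odd


def snapshot_checkers():
    # The default is evaluated at once, before is_odd has been bound.
    def is_even(n, is_odd=is_odd):
        return True if n == 0 else is_odd(n - 1)

    def is_odd(n, is_even=is_even):
        return False if n == 0 else is_even(n - 1)

    return is_even, is_odd


is_even, is_odd = parity_checkers()

try:
    snapshot_checkers()
    snapshot_failed = False
except NameError:
    snapshot_failed = True
```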

It is perhaps sensible to retain the built-in namespace; but even that could be delivered via some base-class from which module, class and maybe even function evaluation namespaces are derived, so that every module can see the built-in names simply as part of this common heritage. Any suite whose locals do not inherit the built-ins would need even these names tunnelled in explicitly; arranging for every suite to be able to see the built-ins (either by having them as a separate name-space, or by having all namespaces inherit from a built-in-carrying base) is justified by functionality delivered by the built-ins being universal, generic and ubiquitously needed. The advantage of having the built-ins provided to suites during execution by the interpreter, rather than having them inherited off a base of every name-space, is that the latter would cause the built-ins to appear as attributes of every object, which may fairly be deemed overkill.
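The first option, with the built-ins as a separate name-space consulted after a suite's own names, can be sketched with the standard library's ChainMap; suite_namespace and Point are hypothetical names for the example.

```python
import builtins
from collections import ChainMap


def suite_namespace(local_names):
    # Lookups try the suite's own names first, then fall back to the
    # built-ins; writes only ever go to the suite's own mapping.
    return ChainMap(local_names, vars(builtins))


class Point:
    pass


names = suite_namespace({"origin": Point()})
names["answer"] = 42
```

The suite can see len and its kin through names, yet the object bound to origin has gained no len attribute, and the new binding for answer has not leaked into the built-ins.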

Written by Eddy.