In python there's a lovely construct:
from namespace import names
This transcribes the requested names from the nominated namespace into the local namespace of the suite in which it appears. As it stands, the namespace has to be a module or package, but I want to treat all namespaces equally. Certainly the transcription process can work for arbitrary namespace: it remains to decide how to make sense of a dot-joined sequence of identifiers (which is all that has ever appeared to date as the namespace in a from statement) in some way that leaves it meaning the same as existing code needs it to.
The process by which namespace is turned into a namespace is the same as that used by `import namespace'. This tries using the given dot-joined sequence of identifiers as a key into sys.modules: on success, the thing imported is the module object found for that key. On failure, it does a real import and installs the imported module against that key in sys.modules so that it subsequent lookups will get early success hence cheap import. This is a classic lazy lookup, but I'll return to that.
Now, package all that lookup process as a namespace object, which I'll refer to as `the importer'. An import statement, aside from the current local, global and builtin namespaces of its suite, has access to the importer's namespace. Attempts at reading this namespace may involve serious activity under the bonnet (to load a module), so I expect its place in the pecking order to be later than the globals, possibly after or in place of the builtins. Values are read from this namespace (possibly in order to find some other namespace from which to read values, etc.) and transcribed into the local namespace.
Dissect this namespace in order to know what it should be doing. Now, sys.modules has the odd property that it contains some dotted names as keys; and a namespace doesn't get asked about these - they're there to facilitate optimisations under the bonnet. So my namespace object has, in effect, a __dict__ which provides access to the undotted entries in sys.modules. (Pause to consider, though:
def tryattrs(obj, keys):
i = len(keys)
while i > 0:
key = string.joinfields(keys[:i], '.')
try: new = getattr(obj, key)
except AttributeError: i = i - 1
else: return new, keys[i:]
raise AttributeError, (obj,) + tuple(keys)
def getattrs(obj, *keys):
while keys:
obj, keys = tryattrs(obj, keys)
return obj
as an alternative to the simple familiar of getattr: it's effectively part of what the lazy lookup in sys.modules is doing). Looking up a sub-module of some package must be done by attribute lookup on the package: which requires the package to support lazy lookup in a manner closely related to that of the raw importer. So, time to take a look at laziness.
Take an example: a class whose instances represent physical quantities. These support the usual arithmetic, keeping track of the physical dimensions of the resulting quantities. `Most' instances will be created on-the-fly during computations and only ever have their arithmetic functionality exercised. Suppose, however, that their representation is expensive to compute (I want to coerce numbers into `power of 1000' times a number in some fixed range, e.g. 1 to 1000 or .1 to 100; and representation of the units requires care, ideally spotting when to say `J' rather than `kg.m.m/s/s' and similar). Some quantities may get represented several times, so we want to cache the result of computing the representation, but most quantities never need it so we don't want to compute the representation during `normal' initialisation.
So a quantity's __repr__ looks to see whether it has the cached value, returning that if so; if not, it computes and caches the result. Since representation is implemented by a method interface, we can use a hidden attribute, so this works fine. On the other hand, we might want, for instance, to make the units-part and number-part accessible as part of the public interface of a quantity: the above would oblige us to provide access to them by calling a method which returns the cached value. Such a method returns the same value every time, computing it the first time and caching it thereafter. Once we've computed the cached value, we'd far sooner be just reading that instead of calling the method; but we have to call the method every time.
To get out of that, we can arrange for __getattr__ (which gets invoked after all other attribute lookup has failed on the given key) to do half of the administration of this - compute the value, store it, return it - while removing the need for a `try: return self._repr ... except AttributeError:' wrapper on one-off computation, since it is exactly when the attempted attribute lookup would have failed that __getattr__ is being called. Illustration:
class Lazy:
# ...
def __getattr__(self, key):
# ...
if key and key[-1] != '_':
try: method = getattr(self, '_lazy_get_' + key + '_')
except AttributeError: pass
else:
result = method(key)
setattr(self, key, result)
return result
# ...
raise AttributeError, (self, key)
class Value (Lazy):
# ...
def __repr__(self): return self._repr
def _lazy_get__repr_(self, key):
# compute self's representation and return it.
This works much better for a wide variety of things: a class whose objects represent filenames might support attributes stem, extension, basename, dirname, etc. this way (so code using the object just accesses these attributes in the normal way, automatically computing each value the first time. Note: of course, _lazy_get_stem_ and _lazy_get_extension_ could be one method and set both values, though it would have to do this itself - Lazy's __getattr__ only does the one it was asked for - and return the right one for the key it has been given:
class Path(Lazy):
# ...
def _lazy_get_stem_(self, key):
stem, ext = os.path.splitext(self.name)
self.stem, self.extension = stem, ext
try: return { 'stem': stem, 'extension': ext }[key]
except KeyError:
raise AttributeError, (self, key)
_lazy_get_extension_ = _lazy_get_stem
and similar for basename and dirname.) Insert Lazy's protocol for secondary name lookup into our chain of lookups for some object. The lazily-computable parts of its namespace will always be computed (just) in time for their first uses, but will only be computed once. The relevant computation can be any kind of lookup, including the one done by the importer. (A rather more sophisticated version, than the above (real python 1) implementation in terms of a class Lazy, may be found in lazy.py, based on the infrastructure of python.py discussed elsewhere.)
Lazy hangs its lazy lookup protocol off the standard __getattr__ protocol of python, which in turn hangs off getattr. It thereby manages to arrange that a new object `behaves as if it had been created with' its lazily-computable data. Thus, a package might arrange that its sub-packages get imported on need (without needing to use an explicit import statement in the code needing them) if we could add a lazy lookup to the namespace of the package.
The import engine, on the other hand, isn't triggered by __getattr__ in the normal course of events: importing a package performs the package initialisation, but there may be sub-modules which can be imported but which the package leaves its user to decide whether or not to import. If the package itself is imported, these last won't be visible in its namespace unless (or until) something does import them.
Thus the import statement will need to work with
def getattr(obj, key, rawget=__builtins__.getattr):
# first check for cached value:
try: return rawget(obj, key)
except AttributeError: pass
try: meth = rawget(obj, '__import__')
except AttributeError:
raise AttributeError, (obj, key)
result = meth(key) # yielding a module or package object
setattr(obj, key)
return result
in place of the usual getattr (which it accesses as rawget). In reality, of course, import does some interesting things with local and global namespaces: the __import__ call expects more arguments that I'm giving it (so where I've used the name __import__ a real implementation would want to chose some name not conflicting with that builtin, but hey). The actual relevance of these further arguments comes down to providing a contribution for use (effectively) in sys.path, so would be packaged externally to the substituted getattr.
Make packages instances of a class providing an __import__ method which does the right things with __path__ (etc.) and calls the builtin __import__ with suitable arguments, yielding the module thus imported (with a single-identifier name, so the builtin's subtleties with dotted names don't come into play). The importer, using the above substitute for getattr, thus makes import of sub-modules happen as if it were an ordinary lazy lookup. Meanwhile, code executed by modules being imported is still using the builtin getattr (except, of course, inside any import statements it executes) without exercising this lazy lookup.
Now, putting an __import__ method onto an object will enable the importer to treat that object as if it were a package. Indeed, providing the object with lazy lookups for the relevant attributes would achieve the same effect: and packages could indeed integrate their importing of sub-modules into their lazy lookup, leaving nothing for their import method to do. Still, any time it is desirable to `have to ask for' access to some value `explicitly', rather than just using it in the ordinary way and relying on it to bring itself lazily into being, this desideratum can be packaged into import.
Equally, one could give __builtins__ some lazy lookup machinery to make some (or, indeed, all) of the importer's namespace automagically available as builtins. The plain form of import statement would then be fatuous, while the from form would simply take, as its source namespace, any name (identifier or dot-joined sequence thereof) on which the ordinary rules of lookup succeed.
I now have three components to the import statement:
The first is separately accessible via certain hooks in the python engine. Even with only packages supporting the second, the semantics of the last could be extended to allow import from an arbitrary namespace into the current local namespace. The purpose of the second is to provide a standard idiom for lazy lookup, `triggered' by the import statement: using this, the third is able to find names bound in the `source' namespace from which it does its transcription.
Written by Eddy.