One of my on-going projects is the development of a body of python code for studying mathematical and scientific questions. The package includes pythonic utilities that may be generally useful, some raw mathematical infrastructure and scientific data along with the machinery for representing it faithfully. This may lead to me splitting it into several packages at some later date; but, for the present, a sub-package hierarchy within one package is good enough for me.
On the scientific side, my main emphasis has been the faithful representation of scientific quantities: while software for scientific computation generally concerns itself with computing answers as fast as possible with the best precision available (ignoring the imprecision of the input data), I have concerned myself more with having the numeric types involved keep track of the uncertainties in data and the units in which things are measured.
Units are generally ignored in scientific computation; the system of units in use implies the units associated with each quantity so, the reasoning goes, one only needs to know each quantity's value. There are two flaws with this, from a software maintenance (i.e. we know there exist bugs) point of view: omitting conversion factors (e.g. when one converts from electron-Volts to Joules, or gallons per minute to cubic feet per second) is all too easy, and hard to notice when trying to work out why the code is producing wrong answers; and quantities with different units should not be added together. Having quantities know their own units makes it possible for a program to raise an error in the latter case, while eliminating the former issue. It also provides a handy check, when one does a quick computation in an interactive session, that one has computed what one expected – if the units aren't right, something went wrong !
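The idea can be sketched in a few lines; this is a hypothetical, minimal illustration (the names and the dict-of-powers representation here are mine, not those of the package's actual study.value types):

```python
# Minimal sketch, not the package's implementation: a quantity that knows
# its units (as a mapping from unit name to power) and refuses to add
# quantities whose units disagree.
class Quantity:
    def __init__(self, value, units):
        self.value = value
        self.units = dict(units)  # e.g. {'m': 1, 's': -1} for a speed

    def __add__(self, other):
        if self.units != other.units:
            raise ValueError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        # Multiply values; add powers of each unit, dropping zero powers.
        units = dict(self.units)
        for name, power in other.units.items():
            units[name] = units.get(name, 0) + power
        return Quantity(self.value * other.value,
                        {k: p for k, p in units.items() if p})

second = Quantity(1.0, {'s': 1})
speed = Quantity(3.0, {'m': 1, 's': -1})
distance = speed * second  # the seconds cancel, leaving metres
try:
    distance + second  # metres plus seconds: caught at run-time
except ValueError:
    pass
```

The payoff is exactly the maintenance benefit described above: a dropped conversion factor shows up as a unit mismatch rather than as a silently wrong number.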
Since python supports emulation of numeric types via magic methods, it's entirely practical to ensure that quantities behave as required, even to the extent of getting nice display. When displayed (i.e. when repr or str is called on it), a number only displays one more decimal place than is justified by its precision. Thus, when it displays Hubble's constant, study.chemy.physics.Cosmos.Hubble, as 2.27 * atto / second, it's saying that there's at least (according to the data at my disposal) a fifty percent likelihood that the value lies between 2.25 aHz and 2.35 aHz (values in this interval, including 2.27, would all round to 2.3 aHz; in which a is the abbreviation for the quantifier atto = 1e-18 and Hz is the short form for Hertz, an alternate name for 1/second) and the available data favour a value close to 2.27 aHz, within that range, but there's less than a fifty percent likelihood of the value falling between 2.265 aHz and 2.275 aHz.
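The rounding rule can be sketched as follows; this is my own illustration of the principle, not the package's actual display code, and the function name and error-bar convention are assumptions:

```python
import math

# Hypothetical sketch of precision-aware display: show one more decimal
# place than the value's error bar justifies. A decimal place d is
# "justified" when rounding to d places keeps the value within its error
# bar, i.e. the rounding half-interval 0.5 * 10**-d is at least the error.
def display(value, error):
    justified = math.floor(-math.log10(2 * error))
    # One more decimal place than is justified by the precision:
    return f"{value:.{max(justified, 0) + 1}f}"

# Hubble's constant, roughly 2.27 attoHertz with an error bar near
# 0.05 aHz: one decimal place is justified, so two are shown.
print(display(2.2695, 0.05))  # 2.27
```

So a tighter error bar of 0.005 aHz would earn the same value a third decimal place, just as the paragraph above describes.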
I'm also a big fan of lazy evaluation; most of the types used in the package support attributes which are only computed when they are first referenced. When specifying one (or a few) of an object's attributes can suffice to determine various others, this makes it possible for objects to know many attributes as soon as enough are specified. This carries the added advantage of simplifying some cases where specifying enough of a set of attributes suffices to determine the rest of the set, without it mattering (much) which attributes in the set were specified. For example, specifying momentum, energy, frequency or wavelength of a photon is sufficient to determine all of the others.
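The photon case can be sketched like this; the class below is a hypothetical illustration of the principle (the package's actual photon machinery lives elsewhere and is richer), using the standard relations E = hf, λ = c/f and p = E/c:

```python
# Illustrative sketch only: whichever one of frequency, wavelength,
# energy or momentum is specified, the others follow from it lazily.
PLANCK = 6.62607015e-34   # Planck's constant, J.s
LIGHTSPEED = 299792458.0  # speed of light, m/s

class Photon:
    def __init__(self, **kw):
        # Accept exactly one of the four equivalent attributes:
        (name, value), = kw.items()
        setattr(self, '_' + name, value)

    @property
    def frequency(self):
        try:
            return self._frequency
        except AttributeError:
            pass
        if hasattr(self, '_wavelength'):
            self._frequency = LIGHTSPEED / self._wavelength
        elif hasattr(self, '_energy'):
            self._frequency = self._energy / PLANCK
        else:
            self._frequency = self._momentum * LIGHTSPEED / PLANCK
        return self._frequency

    @property
    def wavelength(self):
        return LIGHTSPEED / self.frequency

    @property
    def energy(self):
        return PLANCK * self.frequency

    @property
    def momentum(self):
        return self.energy / LIGHTSPEED

green = Photon(wavelength=5.32e-7)  # a green laser pointer
# green.energy, green.frequency and green.momentum all follow on demand.
```

It doesn't matter (much) which of the four was specified: construct the photon from its energy instead and the same wavelength comes back out.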
The package includes various decorators, including some that support lazy attributes: just write a function to compute the desired attribute value for the instance, using the name you want for the attribute as the function name, and apply @lazyprop or one of its siblings (see study.cache.property) to ensure that function is called just once, the first time the attribute is called for, with the value being cached thereafter. The method's doc-string is duly propagated to the property. There are variants that hold the value via a weak reference (and recompute it if it's asked for after it's been garbage-collected; see study.cache.weak) and/or allow direct setting of the attribute value. (There's also an older lazy attribute infrastructure in use, predating properties, study.snake.lazy, but I aim to phase that one out.) The implementations of these decorators are, themselves, facilitated by a suite of decorators (see study.snake.decorate) designed to help with writing decorators.
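The essence of the idea can be sketched as below; this is a minimal stand-in, not the actual implementation in study.cache.property (which is considerably more capable), and the cache-attribute naming is an assumption of mine:

```python
import functools
import math

# Minimal sketch of a lazyprop-style decorator: compute the value once, on
# first access, and cache it on the instance thereafter.
def lazyprop(method):
    attr = '_lazy_' + method.__name__  # hypothetical cache-slot name

    @functools.wraps(method)  # propagates the method's doc-string
    def wrapper(self):
        try:
            return getattr(self, attr)
        except AttributeError:
            value = method(self)
            setattr(self, attr, value)
            return value
    return property(wrapper)

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @lazyprop
    def area(self):
        "The circle's area, computed on first access and cached."
        return math.pi * self.radius ** 2
```

As described above, the function's name becomes the attribute's name and its doc-string survives onto the resulting property, since property takes its documentation from the getter.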
Alongside the lazy property infrastructure I've also got some incomplete infrastructure for caching data to disk: this exists as part of an on-going re-implementation of the tuple of prime numbers. The crude version available in the presently released code makes the hideous mistake of trying to hold all of its data in memory at once, even when most of that data has been saved to disk. The new objects – deployed in the (as yet unreleased) new version – load data from disk but allow the garbage collector to discard the data once no longer actively in use. The class hierarchy of the infrastructure for the new cache is currently rather complex: I should probably analyse it some more and simplify it !
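The recompute-on-demand behaviour can be sketched with the standard library's weakref module; everything here is a hypothetical stand-in for the approach (the real machinery is in study.cache.weak and the primes cache), not its actual classes:

```python
import weakref

class Chunk(list):
    "A list subclass: plain lists don't support weak references."

# Sketch of holding a computed chunk of data via a weak reference: the
# garbage collector may reclaim it, and asking again recomputes it (in
# the package's case, reloading the chunk from its disk cache).
class WeakCached:
    def __init__(self, compute):
        self._compute = compute   # e.g. loads a chunk of primes from disk
        self._ref = lambda: None  # sentinel: no value held yet

    def get(self):
        value = self._ref()
        if value is None:  # never computed, or collected since
            value = self._compute()
            self._ref = weakref.ref(value)
        return value
```

While anyone still holds the chunk, repeated get() calls return the same object; once all strong references are gone, the memory is reclaimable and a later get() quietly recomputes.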
The mathematical infrastructure includes various standard functions, probability distributions and approximations (e.g. using continued fractions). It provides classes for polynomials and permutations, an implementation of the find-unite algorithm for partitioning graphs, a lazily-evaluating implementation of the tuple of prime numbers, a solution of the general N queens problem in chess, (an extended form of) Pascal's triangle and diverse tools used in assorted parts of this web-site, notably including various families of polynomials implicated in particular solutions of Schrödinger's equation. For a fuller list of what's available, install the package as sub-directory study of some directory in your sys.path (usually initialized from environment variable PYTHONPATH) and type: print study.maths.__doc__
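To give the flavour of the lazily-evaluating tuple of primes: the sketch below is far simpler than study.maths.primes (no disk cache, no discarding of data) and its names are mine, but it shows the core trick of extending the sequence only as far as indexing demands:

```python
# Illustrative sketch only: a sequence of primes that grows on demand.
class Primes:
    def __init__(self):
        self._known = [2, 3]  # extended lazily, as indexing requires

    def _grow(self):
        # Trial division by known primes up to the square root suffices,
        # since every prime up to the candidate is already in _known.
        candidate = self._known[-1] + 2
        while any(candidate % p == 0
                  for p in self._known if p * p <= candidate):
            candidate += 2
        self._known.append(candidate)

    def __getitem__(self, index):
        while index >= len(self._known):
            self._grow()
        return self._known[index]

primes = Primes()
primes[9]  # forces computation up to the tenth prime, 29
```

Nothing is computed until an index is asked for, after which everything up to that index stays known; the package's real version adds the cache machinery described above on top of this.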
The current version of this software is what evolved out of my messing around with wanting to represent error bars and units. I sporadically dream of doing a major re-write which resolves assorted issues I'm lumbered with by how I got to where I am; however, it may be some time before I even complete thinking through the design of that (I can fairly be accused of a bad case of the second-system effect), let alone have anything better to show for it than what I have now. In the mean time, what I've got works and I sporadically improve it; if anyone else wants to play with it, I have a bzip2-ed tar-ball available; I (as copyright holder) grant permission for anyone to download it (last update: 2012/September/1st, based on git commit 0df3ebae6ba446856c090e8a98783ee92eebf90d; this may be the last time I'll bother, as github provides an up-to-date download service) and use it under the terms of the GNU project's General Public License. If you want to do anything more than just play with a snapshot of it, pull a fully up-to-date version from github. Letting me know would be prudent, as I might otherwise take a somewhat cavalier attitude towards changes in the git repository (to be specific: non-fast-forward pushes) while entertaining the delusion that no-one else is affected.
For documentation, print the __doc__ attributes of relevant objects (as illustrated above for sub-module study.maths – or just read doc-strings in the source). Either clone the git repo to a directory called study or unpack the tar-ball (using the command tar xjf study.tjz; everything it contains is in a directory called study) as a sub-directory of some directory in your $PYTHONPATH (or simply in sys.path) and, in a python session, import study or assorted sub-objects of it. The package gives an over-view of the sub-packages it contains; each sub-package explains what it contains. I make extensive use of hierarchical name-spaces, each step of which indicates what it makes available.
Note that: the dir built-in doesn't always list all the attributes actually available (and .__dict__.keys() is even less complete), but each method with a name of the form _lazy_get_name_ contributes an attribute with the given name. import of any major component of study.value (this may be triggered by imports from study.space or study.chemy) is apt to take quite a long time; don't worry, it's just got a lot to set up ! study.maths.primes may lead to a cache directory being created (set environment variable STUDY_FLATPRIME_PATH to control where it goes); otherwise, I can't think of anything this package attempts to modify on your hard disk – however,