This is an ancient, incomplete page (last meaningful update 1997) and a mess; it also uses an old notation. There's more related material elsewhere, of similar antiquity. Maintenance is lacking here…
Integration turns a function, from the domain of integration
to some
scalar domain, into a scalar value. If you partition the domain of integration,
integrate the function over each of the parts and add up the resulting scalars,
you get the function's integral over the whole. The integral of a sum of
functions is a sum of their several integrals. Multiply a function by a
(constant) scalar and the result's integral is just the product of the scalar
and the original function's integral. Generally, we use an integration with a
special relationship to some notion of distance we're using: but, for the
moment, let's just see what structure we've got already.
Given any scalar domain S and any S-linear space V, there is a natural sense
in which the S-linear structure of V is inherited by {mappings (N| :V)}
for any collection N. Specifically, for r in S, u, v in {(N|:V)}, we can
define r.u = (N| n-> r.u(n) :V) and u+v = (N| n-> u(n)+v(n) :V), i.e.
point-wise scaling and addition.
It follows immediately from the S-linear structure of V that the given scaling and addition constitute an S-linear structure on {(N|:V)}. Just as V has dual(V) = {linear (V| :S)}, we can consider a linear map, m, from {(N|:V)} to S. For any n in N, the mapping ({(N|:V)}| m-> m(n) :V) is linear (from the definitions): and the mapping eval = (N| n-> ({(N|:V)}| m-> m(n) :V) :dual({(N|:V)})) is linearly independent: no finite non-zero-scaled sum of its outputs gives the zero of dual({(N|:V)}). Furthermore, if N is finite, eval spans dual({(N|:V)}), which makes it a basis: in general, I expect its span to be dense in the dual.
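(As a concrete sketch, not from the original text: take S to be the reals and N a small finite collection; the names below are mine, and Python stands in for the abstract structure. For finite N, the eval mappings are just the coordinate functionals, which span the dual.)

    # A minimal sketch: the S-linear structure {(N|:S)} inherits point-wise
    # from S, with S = float and N a finite collection; eval(n) is then the
    # n-th coordinate functional.
    N = ["a", "b", "c"]

    def scale(r, u):
        # r.u = (N| n-> r.u(n) :S)
        return {n: r * u[n] for n in N}

    def add(u, v):
        # u+v = (N| n-> u(n)+v(n) :S)
        return {n: u[n] + v[n] for n in N}

    def eval_at(n):
        # eval(n) = ({(N|:S)}| m-> m(n) :S), a member of dual({(N|:S)})
        return lambda m: m[n]

    # A linear functional on {(N|:S)} expressed as a scaled sum of evals:
    phi = lambda m: 2 * eval_at("a")(m) - eval_at("c")(m)
    u = {"a": 1.0, "b": 4.0, "c": 5.0}
    assert phi(scale(3.0, u)) == 3.0 * phi(u)   # linearity under scaling
    assert phi(add(u, u)) == 2.0 * phi(u)       # linearity under addition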
What has this got to do with integration ? Integration over N is given
above to turn mappings (N|:S) into members of S: any sum of such integrates to
the sum of its several integrals; applying a constant scaling to a mapping
(N|:S) has the effect of scaling the integral by the same factor. So
integration is linear ({(N|:S)}: :S). Note that not all functions (N|:S) are
necessarily integrable (e.g. when the integral would be infinite), so
integration need not be ({(N|:S)}| :S). I'll refer to a linear map ({(N|:S)}:
:S) as an S-valued distribution on N: so integration is a
distribution, as is every member of dual({(N|:S)}). I'll use the
verb integrate
for a distribution's action (as a mapping) on any given
(N|:S).
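(A sketch of my own, with S the reals and N the naturals: a weighting supplies a distribution by summation, defined only where the sum behaves.)

    # Integration over N = {0, 1, 2, ...} against the weighting w(n) = 1/2**n,
    # as a partial linear map ({(N|:S)}: :S) -- partial, since the sum may
    # diverge (e.g. for f(n) = 2.0**n), so it need not be ({(N|:S)}| :S).
    def integrate(f, w=lambda n: 0.5 ** n, terms=200):
        return sum(w(n) * f(n) for n in range(terms))  # truncated tail is tiny

    f = lambda n: float(n)      # integrates to 2.0 (the sum of n/2**n)
    g = lambda n: 3.0           # constant(3) integrates to 3 * 2 = 6.0
    total = integrate(lambda n: f(n) + g(n))
    assert abs(total - (integrate(f) + integrate(g))) < 1e-9
    assert abs(integrate(lambda n: 5 * f(n)) - 5 * integrate(f)) < 1e-9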
For any (S-valued) distribution, m, on N and any function (N|f:S), we can
define f*m to be the distribution ({(N|:S)}: g-> m(N| n-> f(n).g(n) :S)
:S) obtained by using f to point-wise-scale an arbitrary (N|:S) and m to
integrate the result. It is not hard to see that f*m(g) = g*m(f), provided
multiplication is abelian, and that f*m(constant(1)) = m(f) – indeed,
constant(1)*m is simply m. I'll say that one distribution, d, is compatible
with
another, m, iff d = f*m for some (N|f:S). Note that if f is zero
anywhere, m might not be compatible with d, so the relation isn't symmetric. It
is, however, transitive: d = f*m and e = g*d yield e = (f.g)*m where (f.g) is (N|
n-> f(n).g(n) :S).
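(Another sketch of my own, continuing the toy setting above:)

    # Point-wise scaling a distribution by a density: f*m.
    N = range(5)
    m = lambda g: sum(g(n) for n in N)      # a distribution: plain summation

    def density(f, m):
        # f*m = ({(N|:S)}: g-> m(N| n-> f(n).g(n) :S) :S)
        return lambda g: m(lambda n: f(n) * g(n))

    f = lambda n: n + 1.0
    g = lambda n: 2.0 * n
    fm = density(f, m)
    assert fm(g) == density(g, m)(f)        # f*m(g) = g*m(f)
    assert fm(lambda n: 1.0) == m(f)        # f*m(constant(1)) = m(f)
    # transitivity of compatibility: g*(f*m) = (f.g)*m
    fg = lambda n: f(n) * g(n)
    assert density(g, fm)(f) == density(fg, m)(f)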
One can use the above, given one sufficiently well-formed distribution m, to
express plenty of other distributions in the form f*m, making it sometimes more
convenient to discuss the scalar functions, f, rather than the
distributions. I'll refer to the scalar function f as the density
function
of the distribution f*m with respect to
m. [When dealing
with a continuum, a delta function
is the density function of a discrete
distribution with respect to a continuous one: neither is actually compatible
with the other – the delta isn't a function – but the delta
serves as a syntactic token for "integrate this term with respect to the discrete distribution rather than the background continuum".]
So integration over N turns functions (N| :{scalars}) into scalars, and it goes about it linearly. The other thing I said in opening was that if you partition N, integrate some function over each part and sum the results, you get the same answer as simply integrating the function over N. That strictly needs to be qualified by: provided the function is integrable over all the domains over which we thus integrate it. In particular, we've implicitly supposed that there's some way of inferring, from our integration over N, integration on at least some sub-sets of N.
I'll describe a sub-domain, U, of N as measurable by some distribution on N precisely if the distribution does induce an integration on U. I can define addition of mappings (N::S) as: (f+g) maps n to f(n)+g(n) when n is in (|f) and (|g); to f(n) when n is not in (|g); to g(n) when n is not in (|f); and reject any input which isn't in either (|f) or (|g). That feels sensible and makes the following easier. In particular, it lets me add (N| constant(0) :S) to any (N::S) and get an (N|:S) which is zero wherever the other wasn't defined.
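(A sketch of my own, rendering that addition rule for partial mappings, with mappings as Python dicts:)

    # (f+g) is defined wherever at least one of f, g is; inputs in neither
    # (|f) nor (|g) are rejected.
    def add_partial(f, g):
        h = {}
        for n in set(f) | set(g):
            if n in f and n in g:
                h[n] = f[n] + g[n]
            elif n in f:
                h[n] = f[n]
            else:
                h[n] = g[n]
        return h

    f = {1: 10.0, 2: 20.0}                   # (|f) = {1, 2}
    g = {2: 1.0, 3: 3.0}                     # (|g) = {2, 3}
    assert add_partial(f, g) == {1: 10.0, 2: 21.0, 3: 3.0}
    # adding (N| constant(0) :S) extends any (N::S) to all of N by zero:
    N = {1, 2, 3, 4}
    zero = {n: 0.0 for n in N}
    assert add_partial(f, zero) == {1: 10.0, 2: 20.0, 3: 0.0, 4: 0.0}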
For some given distribution, m, on N, consider any m-measurable domain,
U. Let i(U) = (U: constant(1) :) + (N| constant(0) :) so that i(U)*m integrates
functions (N|f:S) but only pays attention to
(U:f:). At least for
functions (U|f:S) which can be extended to (N|f:S), this gives us integration
over U. Thus, in practice, U is m-measurable iff ((|m)| f-> f.i(U) :(|m)) – i.e. point-wise scaling by i(U) carries m-integrable functions to m-integrable functions.
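(Continuing the toy setting, my own sketch of i(U)*m:)

    # i(U)*m integrates over all of N but only pays attention to (U:f:).
    N = range(10)
    U = {2, 3, 5, 7}
    m = lambda g: sum(g(n) for n in N)            # integration over N
    i_U = lambda n: 1.0 if n in U else 0.0        # (U: constant(1) :) + zero
    m_U = lambda g: m(lambda n: i_U(n) * g(n))    # i(U)*m

    f = lambda n: n * n
    assert m_U(f) == sum(f(n) for n in U)         # integration over U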
I can do something with (|, (:, |) and :) along these lines: anywhere ? appears, it is either nothing or something definite, but either way it stays the same when repeated. If it is something definite but being introduced by the given denotation, its role is to be the thing you can slip in, in place of nothing, and get information only about it, not the other parties.
(A|f:?) says that (|f:?) subsumes A, and denotes (A:f:?)
   when f is a mapping, this says that ?, if present, subsumes (A:f|)
   when f is so introduced, it asserts (|f) = A
(?:f|B) says that (?:f|) subsumes B, and denotes (?:f:B)
   when f is so introduced, it asserts (f|) = B
what's (|f:B) ... ?
When I introduce … I mean
(A:f:B) A subsumes (|f:B), B subsumes (A:f|)
(A|f:B) (|f:B) subsumes A, B subsumes (A:f|)
(A:f|B) A subsumes (|f:B), (A:f|) subsumes B
(A|f|B) (|f:B) subsumes A, (A:f|) subsumes B
where several parties are introduced, they take up the slack so that each
satisfies, for the others, the constraints it imposes on them.
Thus, with A given, … introduces f, B with …
(A:f:B) A subsumes (|f) and B subsumes (f|)
(A|f:B) (|f) = A and B subsumes (f|)
(A:f|B) A subsumes (|f) and B = (f|)
(A|f|B) (|f) = A and B = (f|)
with B given, … introduces f, A with …
(A:f:B) (f|) subsumed by B and A subsuming (|f)
(A|f:B) (f|) subsumed by B and A = (|f)
(A:f|B) (f|) = B and A subsuming (|f)
(A|f|B) (f|) = B and A = (|f)
with f given, … introduces A, B with …
(A:f:B) A subsumes (|f) and B subsumes (f|)
(A|f:B) A=(|f) and B subsumes (f|)
(A:f|B) A subsumes (|f) and B=(f|)
(A|f|B) A=(|f) and B=(f|)
with A and B given, … introduces f with
(A:f:B) (|f) subsumed by A and (f|) subsumed by B
(A|f:B) (|f)=A and (f|) subsumed by B
(A:f|B) A subsumes (|f) and (f|)=B
(A|f|B) (|f)=A, (f|)=B
with A and f given, … introduces B with
(A:f:B) B subsumes (A:f|)
(A|f:B) (|f:B) = A, B subsumes (A:f|)
(A:f|B) (A:f|) = B, A subsumes (|f:B)
(A|f|B) (A:f|) = B, A = (|f:B)
with f and B given … introduces A with
(A:f:B) A subsumes (|f:B), B subsumes (A:f|)
(A|f:B) A = (|f:B), B subsumes (A:f|)
(A:f|B) A subsumes (|f:B), (A:f|)=B
(A|f|B) A = (|f:B), (A:f|) = B
with A, B and f given, … asserts that
(A:f:B) it's a perfectly good denotation for a restriction of f
(A|f:B) A=(|f:B)
(A:f|B) (A:f|)=B
(A|f|B) all of the above.
There's a function (N| i(U) :{0,1}) which maps any given n in N to 1 if n is in U, otherwise to 0 (and a function ({measurables}| i :{(N|:{0,1})}), whose value at our given U this exhibits). If we consider i(U)*m, we find that it is a measure on N which ignores everything but U.
A measure, on some space, is a way of assigning a value to each subspace of that space within some collection (known as the measurable subspaces) in such a way that, whenever subspaces A, B have meaningful intersection and union, the measures of A and B may be added, as may be those of the intersection and union: and the results of these additions are equal. This requires of the domain in which the measure takes values that one can, at least in the cases given, perform addition. The measure, in this case, may be used in a rôle which corresponds to volume (or, more faithfully, charge contained within the volume).
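(A toy check of that additive identity, my own, using the counting measure on a finite space:)

    # m(A) + m(B) = m(union) + m(intersection), for the counting measure.
    m = len
    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    assert m(A) + m(B) == m(A | B) + m(A & B)     # 4 + 3 == 5 + 2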
When it is meaningful to say, of the values taken by the measure, that they lie between 0 and 1, with the measure of the entire space being 1: then we refer to the measure as a probability measure. The common case of this has the measure taking values in the real interval [0, 1]; however, entirely similar structure is to be found if one has, for instance, a measure taking values in some commuting sub-algebra of the hermitian projectors of some Hilbert space. [All such projectors are non-negative in the sense that each is the conjugate-square of something (itself, in fact) and are at most 1 in the sense that each has a complement (1 minus it) which is also a hermitian projector and, thus, non-negative. Furthermore, a conventional (Real) probability measure may be obtained from such a measure by taking the trace of its result times an arbitrary hermitian operator whose trace is one.]
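(A numeric sketch of my own for that last aside, with hypothetical names, using diagonal projectors so that the sub-algebra commutes:)

    import numpy as np

    # A projector-valued 'probability measure' on subsets of {0,1,2,3},
    # reduced to a Real one by tracing against a trace-1 hermitian operator.
    def projector(subset, dim=4):
        d = np.zeros(dim)
        d[list(subset)] = 1.0
        return np.diag(d)                     # hermitian, squares to itself

    rho = np.diag([0.4, 0.3, 0.2, 0.1])       # hermitian, trace 1
    P = projector({0, 1})
    p = float(np.trace(P @ rho))              # an ordinary probability: 0.7
    assert 0.0 <= p <= 1.0
    # the complement 1 - P is again a hermitian projector; probabilities add:
    q = float(np.trace((np.eye(4) - P) @ rho))
    assert np.isclose(p + q, 1.0)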
In practice, a measure is at its most useful when it can be used to integrate some class of functions: for this, it suffices to have a meaningful product between the domain in which the measure takes its values and the domains in which the functions to be integrated take theirs.
I am perennially interested in how little I can depend on the category Set while describing structures to which I was first introduced (excellently, I might add) in the context of sets. If nothing else, I wish to understand which properties of Set we actually depended on when we were being taught measure theory. This text is too many years from completion for me to be worrying about nice distinctions between research, development and publication.
So, tentatively, let's suppose we have some domain
D; topology and
measure theory require us to discuss sub-domains
of D and collections of
such sub-domains. It is at least expedient to discuss a collection Sub(D)
of all
the sub-domains of D; but all that's really needed is that Sub(D)
subsumes all the interesting collections of sub-domains of D and certain
collections which may be obtained from them (by constructions which
substantially characterize the relevant branches of mathematics). The basic
requirements of Sub(D) are closure under intersection and union. Whether this
is to be applied to arbitrary intersection and union (i.e. taking the union (or
intersection) of any collection of sub-domains of D yields a sub-domain of D)
matters hugely in proofs of existence in Set – but the distinction matters
less to the definitions.
We can also ask for Sub(D) to be closed under complementation – that
is, any A, B in Sub(D) imply a C in Sub(B) disjoint from A whose union with A is
B; C = {b in B: b not in A}. This involves the notion disjoint, which means we
have an empty member of Sub(D) and that empty is in Sub(A) for every A in
Sub(D); C and A disjoint
means their intersection is empty. We also
typically require the union of all members of Sub(D) to be a member of Sub(D);
and if it wasn't actually D we'd use it in place of D or replace D with the
equivalence class of all domains having the same union of all sub-domains as D
has. So, in practice, D is the union of Sub(D).
Such a collection, Sub(D), can be characterized in terms of the poSet of embeddings, in domains, of sub-domains. This is a category; in it, there is precisely one morphism from each domain to each domain of which it is a sub-domain. A morphism from A to B means that A is a sub-domain of B; composing it with the one from B to any C of which B is a sub-domain gives the one morphism from A to C – embedding A in B then embedding B in C just trivially gives the embedding of A in C. In particular, A in Sub(B) and B in Sub(C) imply that A's embedding in C can be factorized via B's embedding in C.
That last unassuming observation paves the way to define intersections and unions (in terms of the poSet): characterizing any A in Sub(D) by its embedding in D, A is a sub-domain of any domain via whose embedding in D this can be factorized. The collection of sub-domains to be intersected or united then appear as a collection of embeddings at D: each of them can be factorized via the union's embedding, and any embedding in D via which each can be factorized can, in turn, be factorized via the union. The intersection's embedding, correspondingly, factorizes via each of the given embeddings; and anything else which does this factorizes via it.
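(A small sketch of my own, modelling the poSet by subset inclusion – A's embedding in D factorizes via B's precisely if A is a subset of B:)

    # Union and intersection characterized by factorization in the poSet,
    # rather than by element-chasing.
    D = frozenset(range(6))
    SubD = [frozenset(s)
            for s in [set(), {0}, {0, 1}, {2, 3}, {0, 1, 2, 3}, D]]

    def factors_via(a, b):       # A's embedding in D factorizes via B's
        return a <= b

    def union(parts):            # least member via which each part factorizes
        return min((u for u in SubD
                    if all(factors_via(p, u) for p in parts)), key=len)

    def intersection(parts):     # greatest member factorizing via each part
        return max((v for v in SubD
                    if all(factors_via(v, p) for p in parts)), key=len)

    # With SubD not closed under set-union, the poSet union of {0} and
    # {2,3} is {0,1,2,3}, not their set-union {0,2,3}:
    assert union([frozenset({0}), frozenset({2, 3})]) == frozenset({0, 1, 2, 3})
    assert intersection([frozenset({0, 1}), frozenset({2, 3})]) == frozenset()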
While I shall leave aside the description of Sub(D) as a poSet, I shall be trying to discuss what follows in terms which can be expressed in such terms, so as to give the tools of category theory full sway to support the structure if I find it necessary to escape from Set.
Within Sub(D) we now turn to look at sub-collections. For a collection, S, of sub-domains of D: if the union of all the domains in S is D, I'll describe S as covering D.
Note that D need not be in S, merely equal to its union; nor need empty be in S. This is part of a policy of avoiding discussion of infinity and taking seriously the measure-theoretic sense in which empty and other measure zero domains are ignorable.
I'll be taking it as read that Sub(D) is all of the above !
For a domain D with Sub(D) as above and a cancellable Abelian binary operator, described as addition, (R×R| + :R) on R, a (raw) R-measure on D is (Sub(D):m:R) with (|m) a pre-measure space and: for A, B in (|m) with union U in Sub(D) and intersection, N, either empty or in (|m): U is in (|m) and m(A) + m(B) = m(U) + m(N), the m(N) term being omitted when N is empty.
Thus, whenever an intersection of finitely many members of (|m) is itself in (|m), so is the associated union, with measures adding up just like areas…
I have quite deliberately not insisted either on the empty member of Sub(D) being in (|m), or on R having an additive identity. If empty is in (|m), then m(empty) must be an additive identity for (:+:R): however, it suffices to be able to describe empty's ignorability rather than needing to mirror that with an ignorable member of R – and zero gets in the way if we want R to be multiplicatively cancellable.
Note that I have required R's addition to be cancellable, so any such r is unique.
This is just the natural match to the last: collectively, these last two cases get us as close as we can hope for to complementation, B = {u in U: u not in A}, which is written U\A.
The members of (|m) are described as measured (sub-domains of D). A member, Q, of Sub(D) is called ignorable precisely if: for every P in Sub(Q) and A in (|m), the union, U, of P and A is in (|m) and m(U)=m(A). The empty collection is trivially ignorable. The members of Sub(D) which are either ignorable or measured are called measurable (because the additive completion of R will allow us to measure the ignorables with measure 0 and, otherwise, preserve all the structure).
If R has no solutions to b+d+a=a except possibly with b=d an additive identity, we get a partial ordering on it, defined by: for any r, s in R, r+s is greater than or equal to r. If R has no additive identity, r+s is always greater than r. We can use this ordering to show that: whenever A in (|m) and Q in Sub(D) have ignorable intersection (e.g. they're disjoint), if their union, U, is in (|m) with m(U)=m(A), then Q is ignorable. When R has no solutions to b+d+a=a, I'll describe m as positive definite; when it has no solutions to e+a=a, I'll describe m as positive. Crucially, whenever A is a subset of B in (|m), m(A) is less than or equal to m(B).
Given a topology on R, I'll describe a positive measure (Sub(D):m:R) as
If you do this with D some R-vector space with a metric, R the non-negative reals and all cuboids in D having the volumes we expect, you get the usual Riemann integration for that vector space.
You have to be able to cope with the possibility that this last sum is infinite, but we're in a positive domain, so this is reasonably well behaved.
Whenever A, C in (|m) have m(A) = m(C) and B in Sub(D) lies between them – i.e. B in Sub(C) and A in Sub(B) – B is also in (|m) with m(B) = m(A).
This last has a delightful simplicity to it, and it enabled Lebesgue and those who've come since him to build a theory of measures which is powerful, flexible and expressive.
If the addition, (:+:R), on (Sub(D):m|) is in fact the addition of a scalar
domain, we can discuss R-linear spaces: since these are additive domains we can
also look at measures (Sub(D):v:V) with V an R-linear space. Integration is the
process of taking a function (D:f:V) and integrating
it over members of
Sub((|f))'s intersection with (|m), to obtain a measure (Sub((|f)):v:V). To
match up with our idea of what integration is, we need any m-measurable A in
Sub((|f)) to have v(A) = m(A).k(A) for some (Sub((|f)): k :V) which can sensibly be
thought of as delivering a plausible average, k(A), of f in A. If (|m)
contains small
enough sub-domains everywhere
in D, we can expect f
to be near enough
constant on each, thus supplying a value for k on these
– from which we can expect the machinery of v, as a measure, to imply v's
values.
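(A sketch of my own, with m the usual length on intervals and V the real plane, building v(A) = m(A).k(A) from pieces on which f is near enough constant:)

    import numpy as np

    f = lambda x: np.array([np.cos(x), np.sin(x)])   # (D:f:V), D = [0, pi]

    def v(a, b, pieces=1000):
        # sum m(A).k(A) over small sub-domains A on which f barely varies
        edges = np.linspace(a, b, pieces + 1)
        total = np.zeros(2)
        for lo, hi in zip(edges[:-1], edges[1:]):
            m_A = hi - lo                    # the measure (length) of A
            k_A = f((lo + hi) / 2)           # a plausible average of f on A
            total += m_A * k_A
        return total

    # v behaves as a measure: additive over a partition of the domain…
    assert np.allclose(v(0, np.pi), v(0, 1.0) + v(1.0, np.pi), atol=1e-5)
    # …and delivers the expected integrals of cos and sin over [0, pi]:
    assert np.allclose(v(0, np.pi), [0.0, 2.0], atol=1e-4)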
The trick with which to identify whether a sub-domain of D is small
enough
is to require f to vary little. Among the properties we expect of
integration is that the average of f over some region A should be in the convex
hull of the values f takes on A. If f scarcely varies over A, this ties k's
value tightly. With that, I can define at least the preliminaries of
integration, using a positive scalar measure to integrate a vector function. It
is of note that the measure used to perform this integration has to be positive,
or the convex hull of f's values isn't guaranteed to contain the integral.
I'll work with a positive scalar domain, R, on which positivity induces the ordering r<r+s (for all r,s in R), which (in turn) induces a topology: the open sets are the (arbitrary) unions of sets of form between(s,t)= {r in R: s<r<t} with s,t in R. [This, in turn, induces a topology on {(I::R)} for any set I: the open sets are unions of sets of form I-between(s,t)= {(I:f:R): for each i in I, s(i)<f(i)<t(i)} with s, t in {(I::R)}.] Any topology on R induces one on any R-linear space, V, (by deciding that linear maps must be continuous): the open sets are (arbitrary) unions of finite intersections of sets of form (|w:S) with (V|w:R) linear and S open in R. [This coincides with the topology just given for {(I::R)}, when viewed as an R-linear space.]
I shall describe a subset, U, of an R-linear space, V,
as convex precisely if {au+cv: u,v in U, a,c in R with a+c=1}
is a subset of U. It is easy to show that an arbitrary intersection of convex
subsets is convex; and that both empty and V are convex. A subset, W, of V,
which need not be convex, is a subset of some convex subsets of V: if we
intersect all the convex subsets of V which have W as a subset, we get
a minimal
convex subset of V which subsumes W. This is called the convex
hull of W: name it convexHull(W).
For a function, (:f:V), I shall write convexHull(f) for convexHull((f|)): you can equally regard convexHull(W) as convexHull((W| w->w :W)). We can show that convexHull(W) is the closure of {∑(: i-> r(i).v(i) :V): (:r:R) and (:v:W) with (|r) = (|v) and ∑(r) = 1}. This implicitly requires that ∑, the bulk action of addition, is defined on (|r) = (|v): this is guaranteed when (|r) is finite and may be possible beyond that; but taking the closure of the finite sums suffices.
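(A sketch of my own: sampling members of convexHull(W) as finite sums with weights summing to 1, R being positive:)

    import numpy as np

    rng = np.random.default_rng(0)
    W = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

    def convex_combination(points):
        r = rng.random(len(points))
        r /= r.sum()                         # sum(r) = 1, each r(i) > 0
        return sum(ri * vi for ri, vi in zip(r, points))

    p = convex_combination(W)                # lands in the triangle hull(W)
    assert p[0] >= 0 and p[1] >= 0 and p[0] + p[1] <= 1.0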
I'll say that a measure (Sub(D):v:V) integrates a function
(D:f:V) with respect to a measure (Sub(D):m:R), with V an R-linear space and R
positive, precisely if: (|v) subsumes the intersection of Sub((|f)) and (|m)
and, for every U in this intersection, there is some u in (¿ the closure
of ?) convexHull(U:f|) for which m(U).u =v(U). It is then
necessary to discover the circumstances under which, or the extent to which,
such a v is the only
measure which integrates f with respect to m.
When such a measure exists, we say that f is integrable with respect to m and denote the measure which does the job, v, as m∫(f); its restriction to the integral over some sub-domain, U, of (|f) is then m∫(U:f:), the integral of f's restriction to U.
There is a standard measure induced on any positive scalar domain, defined as the minimal Lebesgue measure, m, for which m(between(s,s+r)) = r for every s, r in R. It should be noted that any other measure, n, satisfying these conditions then has ((|m):n:), its restriction to (|m), equal to m (so, in particular, (|m) is a subset of (|n)).
[This induces one on functions from finite sets to R, by giving measure ∏(t-s) to the set given as between(s,t). This, in turn, induces measures on any R-linear space isomorphic to such a {(finite::R)}: the measure, d, induced by an (R-linear) isomorphism, ({(n|:R)}:e:V), is the minimal Lebesgue measure for which a subset, U, of V is measurable precisely if (|e:U) is, and has its measure. [Strictly, what I care about is that e is invertible.] Thus d(U) = m(|e:U), for a given measure (Sub({(n|:R)}):m:R), with n finite.
Comparing the measure d, induced by invertible e, with the measure, h, induced by some parallel invertible, ({(n|:R)}:j:V) with inverse (V:i:{(n|:R)}), we will find that d(U) and h(U) are proportional to one another, in the same ratio as the determinants of e and j. We have d(U) = m(|e:U), h(U) = m(|j:U), with (|j:U) = (U:i|), so (:e:U) is composable after i, yielding ((|e:U): e o i :(|j:U)). The important fact that comes next is that the composite, e o i, is defined on all of (|e) = (|j) = (i|) = {(n|:R)}, not just (|e:U). The study of the standard (Lebesgue) measure on (|e) then reveals that its measures of (|e:U) and (|j:U) are proportional to one another, with ratio equal to the determinant of e o i.
Now, e o i is a linear automorphism on {(n::R)}: its determinant is defined as the linear map it induces on the n-antisymmetric self-product space of this, n∧{(n::R)}. This is a 1-dimensional R-linear space, and the determinant of the identity on {(n::R)} is its identity isomorphism. It is thus natural to describe determinants of (general) automorphisms in terms of the scalar which, when multiplied by this natural unit, yields the actual determinant. One can, equally, define the determinant of any linear ({(n::R)}:f:W), for n finite but arbitrary R-linear W, via a standard antisymmetric product induced on arbitrary vector spaces.
For (W|f:X) linear and n finite, n∧f is the linear map from n⊗W (W's n-fold self-tensor product space) to n⊗X (well, actually to its linear subspace, n∧X) induced, linearly, from ⊗(n|w:W)-> ∧(n| f o w :X), with ∧(n|x:X) defined to be ∑(permutations(n): s-> sign(s).⊗(x o s)/n! :). Here, n! denotes the factorial of n, defined inductively by factorial = (: 0->1, 1+i-> (1+i).i! :), and ⊗ denotes the usual tensor product. The construction, ({linear (W|:X)}| f-> n∧f :{linear (n⊗W| :n⊗X)}), respects epic and iso, but not monic. There is a natural composition between {linear (W|:X)} and {linear (X|:Y)}: likewise between {linear (n∧W| :n∧X)} and {linear (n∧X| :n∧Y)}. For linear (W|f:X) and (X|g:Y), composing n∧f and n∧g yields n∧(f o g). [Making that work depends on dividing by the n! in the definition: n has to be finite.] A nice fact: if mutually composable f, g have f o g invertible then g is monic and f is epic (I fear this depends on things I don't want to wot). This means that if (|g) is one-dimensional, then f and g are invertible (so (|f) and (g|) are also 1-dimensional).
When we return to our invertible e and j, linear ({(n::R)}|:V), with i inverse to j (remember them ? and measures ?): we can, in fact, equally define det(i) to be a linear map from the n-antisymmetric self-product of V to that of {(n::R)}, and it is not hard to show that it is the inverse of det(j) – so long as your antisymmetric product is scaled as above, ∧(p|x:W) = ∑(permutations(p): s-> sign(s).⊗(x o s)/p! :), with p an arbitrary natural number, p! denoting its factorial, and W an arbitrary R-linear space. When we consider our determinant of e o i, we see that it is the composite of the determinants of e and i, in the same order: that of i is the inverse of that of j, so we have e's measure of U and j's measure of U in the same ratio as the determinants of e and j.]
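(A numeric sketch of my own for that change-of-variables claim, comparing the measures induced by two invertible linear maps; the names and the two-dimensional example are mine:)

    import numpy as np

    e = np.array([[2.0, 1.0], [0.0, 1.5]])   # invertible, det = 3.0
    j = np.array([[1.0, 0.0], [1.0, 0.5]])   # invertible, det = 0.5

    def induced_measure(f, U):
        # d(U) = m(|e:U): here U is a parallelepiped spanned by the columns
        # of a matrix, so its preimage under f is spanned by inv(f) @ U.
        return abs(np.linalg.det(np.linalg.inv(f) @ U))

    U = np.array([[1.0, 0.2], [0.0, 2.0]])   # a parallelepiped in V, volume 2
    d_U = induced_measure(e, U)              # 2 / 3
    h_U = induced_measure(j, U)              # 2 / 0.5 = 4
    # d(U) and h(U) are in a fixed ratio, set by the determinants of j and e
    # (inverted relative to a push-forward convention, since d pulls back)…
    assert np.isclose(d_U / h_U, np.linalg.det(j) / np.linalg.det(e))
    # …i.e. the ratio is the determinant of the relating automorphism:
    assert np.isclose(h_U / d_U, abs(np.linalg.det(e @ np.linalg.inv(j))))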
Written by Eddy.