# Measure Theory: the general theory of integration

This is an ancient, incomplete page (last meaningful update 1997) and a mess; it also uses an old notation. There's more related to this elsewhere, of similar antiquity. Maintenance is lacking here…

Integration turns a function, from the domain of integration to some scalar domain, into a scalar value. If you partition the domain of integration, integrate the function over each of the parts and add up the resulting scalars, you get the function's integral over the whole. The integral of a sum of functions is the sum of their several integrals. Multiply a function by a (constant) scalar and the result's integral is just the product of the scalar and the original function's integral. Generally, we use an integration with a special relationship to some notion of distance we're using: but, for the moment, let's just see what structure we've got already.

## Linear structure

Given any scalar domain S and any S-linear space V, there is a natural sense in which the S-linear structure of V is inherited by {mappings (N| :V)} for any collection N. Specifically, for r in S, u, v in {(N|:V)}, we can define

r.u = (N| n-> r.u(n) :V)
u+v = (N| n-> u(n) + v(n) :V)

It follows immediately from the S-linear structure of V that the given scaling and addition constitute an S-linear structure on {(N|:V)}. Just as V has dual(V) = {linear (V| :S)}, we can consider a linear map, m, from {(N|:V)} to S. For any n in N, the mapping ({(N|:V)}| m-> m(n) :V) is linear (from the definitions): and the mapping eval = (N| n-> ({(N|:V)}| m-> m(n) :V) :dual({(N|:V)})) is linearly independent: no finite non-zero-scaled sum of its outputs gives the zero of dual({(N|:V)}). Furthermore, if N is finite, eval spans dual({(N|:V)}), which makes it a basis: in general, I expect its span to be dense in the dual.
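For a finite N, the pointwise structure and the eval functionals can be sketched in a few lines of Python, with dicts standing in for mappings (N|:V) and real numbers for V; the helper names `scale`, `add` and `eval_at` are mine, not the text's:

```python
def scale(r, u):
    """r.u = (N| n-> r.u(n) :V): pointwise scaling."""
    return {n: r * u[n] for n in u}

def add(u, v):
    """u+v = (N| n-> u(n) + v(n) :V): pointwise addition."""
    return {n: u[n] + v[n] for n in u}

def eval_at(n):
    """eval(n): the linear map ({(N|:V)}| m-> m(n) :V)."""
    return lambda m: m[n]

u = {"a": 1.0, "b": 2.0, "c": 3.0}
v = {"a": 0.5, "b": -1.0, "c": 4.0}

w = add(scale(2.0, u), v)          # 2.u + v, computed pointwise
assert w == {"a": 2.5, "b": 3.0, "c": 10.0}

# Each eval_at(n) respects the structure just defined, i.e. is linear.
f = eval_at("b")
assert f(add(u, v)) == f(u) + f(v)
assert f(scale(2.0, u)) == 2.0 * f(u)
```

For finite N these eval functionals form a basis of the dual, as the text notes; the sketch only exhibits their linearity.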

What has this got to do with integration ? Integration over N is given above to turn mappings (N|:S) into members of S: any sum of such integrates to the sum of its several integrals; applying a constant scaling to a mapping (N|:S) has the effect of scaling the integral by the same factor. So integration is linear ({(N|:S)}: :S). Note that not all functions (N|:S) are necessarily integrable (e.g. when the integral would be infinite), so integration need not be ({(N|:S)}| :S). I'll refer to a linear map ({(N|:S)}: :S) as an S-valued distribution on N: so integration is a distribution, as is every member of dual({(N|:S)}). I'll use the verb integrate for a distribution's action (as a mapping) on any given (N|:S).
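For a finite N, any weighted sum is a distribution in this sense; a minimal sketch (the weights are purely illustrative) checking the two linearity properties:

```python
N = ["a", "b", "c"]
weight = {"a": 2.0, "b": 3.0, "c": 5.0}   # illustrative weights, not from the text

def integrate(g):
    """A distribution: a linear map ({(N|:S)}: :S)."""
    return sum(weight[n] * g[n] for n in N)

f = {"a": 1.0, "b": 0.0, "c": 2.0}
g = {"a": 0.5, "b": 1.0, "c": 1.0}

# The integral of a sum is the sum of the integrals...
fg_sum = {n: f[n] + g[n] for n in N}
assert integrate(fg_sum) == integrate(f) + integrate(g)
# ... and constant scalings pass straight through.
scaled = {n: 7.0 * f[n] for n in N}
assert integrate(scaled) == 7.0 * integrate(f)
```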

For any (S-valued) distribution, m, on N and any function (N|f:S), we can define f*m to be the distribution ({(N|:S)}: g-> m(N| n-> f(n).g(n) :S) :S) obtained by using f to point-wise-scale an arbitrary (N|:S) and m to integrate the result. It is not hard to see that f*m(g) = g*m(f), provided multiplication is abelian, and that f*m(constant(1)) = m(f) – indeed, constant(1)*m is simply m. I'll say that one distribution, d, is compatible with another, m, iff d = f*m for some (N|f:S). Note that if f is zero anywhere, m might not be compatible with d, so the relation isn't symmetric. It is, however, transitive: d = f*m and e = g*d yield e = (f.g)*m where (f.g) is (N| n-> f(n).g(n) :S).
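The f*m construction and the identities just claimed can be checked directly for a finite N; in this Python sketch `star` is my name for the * operation and the weights are illustrative:

```python
N = ["a", "b", "c"]
weight = {"a": 2.0, "b": 3.0, "c": 5.0}

def m(g):
    """An illustrative distribution on N."""
    return sum(weight[n] * g[n] for n in N)

def star(f, dist):
    """f*dist: point-wise scale by f, then integrate with dist."""
    return lambda g: dist({n: f[n] * g[n] for n in N})

f = {"a": 1.0, "b": 2.0, "c": 0.5}
g = {"a": 4.0, "b": 1.0, "c": 2.0}
h = {"a": 1.0, "b": 2.0, "c": 3.0}
one = {n: 1.0 for n in N}

# f*m(g) = g*m(f), since multiplication of reals is abelian...
assert star(f, m)(g) == star(g, m)(f)
# ... f*m(constant(1)) = m(f), and constant(1)*m is just m.
assert star(f, m)(one) == m(f)
assert star(one, m)(g) == m(g)
# Transitivity of compatibility: g*(f*m) = (f.g)*m.
fg = {n: f[n] * g[n] for n in N}
assert star(g, star(f, m))(h) == star(fg, m)(h)
```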

One can use the above, given one sufficiently well-formed distribution m, to express plenty of other distributions in the form f*m, making it sometimes more convenient to discuss the scalar functions, f, rather than the distributions. I'll refer to the scalar function f as the density function of the distribution f*m with respect to m. [When dealing with a continuum, a delta function is the density function of a discrete distribution with respect to a continuous one: neither is actually compatible with the other – the delta isn't a function – but the delta serves as a syntactic token for integrate this term with respect to the discrete distribution rather than the background continuum.]

## Subsets

So integration over N turns functions (N| :{scalars}) into scalars, and it goes about it linearly. The other thing I said in opening was that if you partition N, integrate some function over each part and sum the results, you get the same answer as simply integrating the function over N. That strictly needs to be qualified by: provided the function is integrable over all the domains over which we thus integrate it. In particular, we've implicitly supposed that there's some way of inferring, from our integration over N, integration on at least some sub-sets of N.

I'll describe a sub-domain, U, of N as measurable by some distribution on N precisely if the distribution does induce an integration on U. I can define addition of mappings (N::S) as: (f+g) maps n to f(n)+g(n) when n is in (|f) and (|g); to f(n) when n is not in (|g); to g(n) when n is not in (|f); and reject any input which isn't in either (|f) or (|g). That feels sensible and makes the following easier. In particular, it lets me add (N| constant(0) :S) to any (N::S) and get an (N|:S) which is zero wherever the other wasn't defined.
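A sketch of this addition on partial mappings, with Python dicts standing in for (N::S) and absent keys playing the role of rejected inputs (`padd` is my name for it):

```python
def padd(f, g):
    """(f+g): f(n)+g(n) where both are defined, else whichever is;
    inputs in neither domain are simply absent from the result."""
    out = {}
    for n in set(f) | set(g):
        if n in f and n in g:
            out[n] = f[n] + g[n]
        elif n in f:
            out[n] = f[n]
        else:
            out[n] = g[n]
    return out

f = {"a": 1.0, "b": 2.0}          # undefined at "c"
g = {"b": 10.0, "c": 3.0}         # undefined at "a"
assert padd(f, g) == {"a": 1.0, "b": 12.0, "c": 3.0}

# Adding (N| constant(0) :S) extends any (N::S) by zero over all of N.
N = ["a", "b", "c"]
zero = {n: 0.0 for n in N}
assert padd(f, zero) == {"a": 1.0, "b": 2.0, "c": 0.0}
```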

For some given distribution, m, on N, consider any m-measurable domain, U. Let i(U) = (U: constant(1) :) + (N| constant(0) :) so that i(U)*m integrates functions (N|f:S) but only pays attention to (U:f:). At least for functions (U|f:S) which can be extended to (N|f:S), this gives us integration over U. Thus, in practice, U is m-measurable iff ((|m)| f-> f.i(U) :m).

I can do something with (|, (:, |) and :) along these lines. Anywhere ? appears, it is either nothing or something definite, but either way it stays the same when repeated. If it is something definite but being introduced by the given denotation, its role is to be the thing you can slip in in place of nothing and get information only about it, not the other parties. Thus:

- (A|f:?) says that (|f:?) subsumes A, and denotes (A:f:?); when f is a mapping, this says that ?, if present, subsumes (A:f|); when f is so introduced, it asserts (|f) = A.
- (?:f|B) says that (?:f|) subsumes B, and denotes (?:f:B); when f is so introduced, it asserts (f|) = B.

(What's (|f:B), then?) When I introduce all three parties at once, I mean:

- (A:f:B): A subsumes (|f:B) and B subsumes (A:f|).
- (A|f:B): (|f:B) subsumes A and B subsumes (A:f|).
- (A:f|B): A subsumes (|f:B) and (A:f|) subsumes B.
- (A|f|B): (|f:B) subsumes A and (A:f|) subsumes B.

Where several parties are introduced, they take up the slack so that each satisfies, for the others, the constraints it imposes on them.

Thus, with A given, each denotation introduces f and B with:

- (A:f:B): A subsumes (|f) and B subsumes (f|).
- (A|f:B): (|f) = A and B subsumes (f|).
- (A:f|B): A subsumes (|f) and B = (f|).
- (A|f|B): (|f) = A and B = (f|).

With B given, each introduces f and A with:

- (A:f:B): (f|) subsumed by B and A subsuming (|f).
- (A|f:B): (f|) subsumed by B and A = (|f).
- (A:f|B): (f|) = B and A subsuming (|f).
- (A|f|B): (f|) = B and A = (|f).

With f given, each introduces A and B with:

- (A:f:B): A subsumes (|f) and B subsumes (f|).
- (A|f:B): A = (|f) and B subsumes (f|).
- (A:f|B): A subsumes (|f) and B = (f|).
- (A|f|B): A = (|f) and B = (f|).

With A and B given, each introduces f with:

- (A:f:B): (|f) subsumed by A and (f|) subsumed by B.
- (A|f:B): (|f) = A and (f|) subsumed by B.
- (A:f|B): A subsumes (|f) and (f|) = B.
- (A|f|B): (|f) = A and (f|) = B.

With A and f given, each introduces B with:

- (A:f:B): B subsumes (A:f|).
- (A|f:B): (|f:B) = A and B subsumes (A:f|).
- (A:f|B): (A:f|) = B and A subsumes (|f:B).
- (A|f|B): (A:f|) = B and A = (|f:B).

With f and B given, each introduces A with:

- (A:f:B): A subsumes (|f:B) and B subsumes (A:f|).
- (A|f:B): A = (|f:B) and B subsumes (A:f|).
- (A:f|B): A subsumes (|f:B) and (A:f|) = B.
- (A|f|B): A = (|f:B) and (A:f|) = B.

With A, B and f all given, each asserts that:

- (A:f:B): nothing – it's a perfectly good denotation for a restriction of f.
- (A|f:B): A = (|f:B).
- (A:f|B): (A:f|) = B.
- (A|f|B): all of the above.

There's a function (N| i(U) :{0,1}) which maps any given n in N to 1 if n is in U, otherwise to 0 (and a function ({measurables}| i :{(N|:{0,1})}) which this demonstrates at our given U). If we consider i(U)*m, we find that it is a measure on N which ignores everything outside U.
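The indicator construction can be sketched for a finite N; here `i`, `star` and the weights are illustrative stand-ins for the text's notation:

```python
N = ["a", "b", "c", "d"]
weight = {"a": 2.0, "b": 3.0, "c": 5.0, "d": 7.0}

def m(g):
    """An illustrative distribution on N."""
    return sum(weight[n] * g[n] for n in N)

def i(U):
    """(N| i(U) :{0,1}): 1 on U, 0 elsewhere."""
    return {n: (1.0 if n in U else 0.0) for n in N}

def star(f, dist):
    """f*dist: point-wise scale by f, then integrate with dist."""
    return lambda g: dist({n: f[n] * g[n] for n in N})

U = {"b", "d"}
g = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
# i(U)*m ignores everything outside U: only b and d contribute.
assert star(i(U), m)(g) == weight["b"] + weight["d"]
```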

# Old version. I am in the midst of re-writing this page.

## Introduction

A measure, on some space, is a way of assigning a value to each subspace of that space within some collection (known as the measurable subspaces) in such a way that, whenever subspaces A, B have meaningful intersection and union, the measures of A and B may be added, as may be those of the intersection and union: and the results of these additions are equal. This requires of the domain in which the measure takes values that one can, at least in the cases given, perform addition. The measure, in this case, may be used in a rôle which corresponds to volume (or, more faithfully, charge contained within the volume).

When it is meaningful to say, of the values taken by the measure, that they lie between 0 and 1, with the measure of the entire space being 1: then we refer to the measure as a probability measure. The common case of this has the measure taking values in the real interval [0, 1]; however, entirely similar structure is to be found if one has, for instance, a measure taking values in some commuting sub-algebra of the hermitian projectors of some Hilbert space. [All such projectors are non-negative in the sense that each is the conjugate-square of something (itself, in fact) and are at most 1 in the sense that each has a complement (1 minus it) which is also a hermitian projector and, thus, non-negative. Furthermore, a conventional (Real) probability measure may be obtained from such a measure by taking the trace of its result times an arbitrary hermitian operator whose trace is one.]

In practice, a measure is at its most useful when it can be used to integrate some class of functions. If a meaningful product can be defined between the domain in which the measure takes value and the domains in which the functions to be integrated take their values, then one may use the measure to integrate functions.

### Sub-domains

I am perennially interested in how little I can depend on the category Set while describing structures to which I was first introduced (excellently, I might add) in the context of sets. If nothing else, I wish to understand which properties of Set we actually depended on when we were being taught measure theory. This text is too many years from completion for me to be worrying about nice distinctions between research, development and publication.

So, tentatively, let's suppose we have some domain D; topology and measure theory require us to discuss sub-domains of D and collections of such sub-domains. It is at least expedient to discuss a collection Sub(D) of all the sub-domains of D; but all that's really needed is that Sub(D) subsumes all the interesting collections of sub-domains of D and certain collections which may be obtained from them (by constructions which substantially characterize the relevant branches of mathematics). The basic requirements of Sub(D) are closure under intersection and union. Whether this is to be applied to arbitrary intersection and union (i.e. taking the union (or intersection) of any collection of sub-domains of D yields a sub-domain of D) matters hugely in proofs of existence in Set – but the distinction matters less to the definitions.

We can also ask for Sub(D) to be closed under complementation – that is, any A, B in Sub(D) imply a C in Sub(B) disjoint from A whose union with A is B; C = {b in B: b not in A}. This involves the notion disjoint, which means we have an empty member of Sub(D) and that empty is in Sub(A) for every A in Sub(D); C and A disjoint means their intersection is empty. We also typically require the union of all members of Sub(D) to be a member of Sub(D); and if it wasn't actually D we'd use it in place of D or replace D with the equivalence class of all domains having the same union of all sub-domains as D has. So, in practice, D is the union of Sub(D).

Such a collection, Sub(D), can be characterized in terms of the poSet of embeddings, in domains, of sub-domains. This is a category; in it, there is precisely one morphism from each domain to each domain of which it is a sub-domain. A morphism from A to B means that A is a sub-domain of B; composing it with the one from B to any C of which B is a sub-domain gives the one morphism from A to C – embedding A in B then embedding B in C just trivially gives the embedding of A in C. In particular, A in Sub(B) and B in Sub(C) imply that A's embedding in C can be factorized via B's embedding in C.

That last unassuming observation paves the way to define intersections and unions (in terms of the poSet): characterizing any A in Sub(D) by its embedding in D, A is a sub-domain of any domain via whose embedding in D this can be factorized. The collection of sub-domains to be intersected or united then appears as a collection of embeddings at D: each of them can be factorized via the union's embedding, and any embedding in D via which each can be factorized can, in turn, be factorized via the union. The intersection's embedding, correspondingly, factorizes via each of the given embeddings; and anything else which does this factorizes via it.

While I shall leave aside the description of Sub(D) as a poSet, I shall be trying to discuss what follows in terms which can be expressed in such terms, so as to give the tools of category theory full sway to support the structure if I find it necessary to escape from Set.

Within Sub(D) we now turn to look at sub-collections. For a collection, S, of sub-domains of D: if the union of all the domains in S is D, I'll describe S as

an open topology
for D ⇔ it is closed under countable intersection and arbitrary union and its members include both D and the empty collection.
a closed topology
for D ⇔ it is closed under arbitrary intersection and countable union and its members include both D and empty.
a monotone class
of D ⇔ for any function, (natural:f:S) from a countable ordinal to S: if i, j in (|f) with i<j implies f(i) in Sub(f(j)), then the union of (f|) is in S; if i, j in (|f) with i<j implies f(j) in Sub(f(i)), then the intersection of (f|) is in S. [This is only interesting when (|f) is infinite.]
a pre-measure space
for D ⇔ for any two members of Sub(D) whose intersection is either empty or in S, consider the two members and their union (all in Sub(D)): if two of these three are in S, so is the third.

Note that D need not be in S, merely equal to its union, nor need empty. This is part of a policy of avoiding discussion of infinity and taking seriously the measure-theoretic sense in which empty and other measure zero domains are ignorable.

a Jordan field
for D ⇔ it is a pre-measure space closed under pairwise intersection (hence union and difference).
a Borel (or σ-) field
for D ⇔ it is closed under pairwise difference and under countable (that means infinite, though of the smallest kind) union and intersection. (Given pairwise differences, either of union and intersection implies the other.)

I'll be taking it as read that Sub(D) is all of the above !
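For a finite D, the closure conditions are easy to check mechanically. A sketch verifying the Borel-field conditions for a small collection (for finite collections, countable and pairwise closure coincide); `closed_under` is my helper name:

```python
from itertools import permutations

D = frozenset({1, 2, 3, 4})
# A small collection of sub-domains of D, as frozensets.
S = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), D}

def closed_under(coll, op):
    """Is coll closed under the binary operation op, both ways round?"""
    return all(op(a, b) in coll for a, b in permutations(coll, 2))

assert closed_under(S, lambda a, b: a - b)   # pairwise difference
assert closed_under(S, lambda a, b: a | b)   # pairwise union
assert closed_under(S, lambda a, b: a & b)   # hence intersection too
```

As the text notes, given pairwise differences, closure under either of union and intersection implies the other.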

## Measure

For a domain D with Sub(D) as above and a cancellable Abelian binary operator, described as addition, (R×R| + :R), on R, a (raw) R-measure on D is (Sub(D):m:R) with (|m) a pre-measure space and: for A, B in Sub(D) with union U in Sub(D) and intersection, N, either empty or in (|m);

N, A, B in (|m)
⇒ m(U) + m(N) = m(A) + m(B).

Thus, whenever an intersection of finitely many members of (|m) is itself in (|m), so is the associated union, with measures adding up just like areas…

A, B in (|m), N empty
⇒ m(U) = m(A) + m(B).

I have quite deliberately not insisted either on the empty member of Sub(D) being in (|m), or on R having an additive identity. If empty is in (|m), then m(empty) must be an additive identity for (:+:R): however, it suffices to be able to describe its ignorability rather than needing to mirror that with an ignorable member of R – zero becomes unignorable if we want R to be multiplicatively cancellable.

A, U in (|m), N empty
⇒ if some r in R satisfies m(A)+r = m(U) then B is in (|m) and m(B) = r.

Note that I have required R's addition to be cancellable, so any such r is unique.

A, U, N in (|m)
⇒ if some r in R satisfies m(A)+r = m(U)+m(N), then B in (|m) with m(B)=r.

This is just the natural match to the last: collectively, these last two cases get us as close as we can hope for to complementation, B = {u in U: u not in A}, which is written U\A.
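The counting measure on a finite domain is the simplest witness of these identities (a sketch, with `len` playing the role of m):

```python
A = {1, 2, 3}
B = {3, 4}
U = A | B            # union
N = A & B            # intersection

m = len              # counting measure: positive and additive as required
assert m(U) + m(N) == m(A) + m(B)          # 4 + 1 == 3 + 2

# With N empty, this reduces to m(U) = m(A) + m(B).
A2, B2 = {1, 2}, {3, 4, 5}
assert (A2 & B2) == set()
assert m(A2 | B2) == m(A2) + m(B2)
```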

The members of (|m) are described as measured (sub-domains of D). A member, Q, of Sub(D) is called ignorable precisely if: for every P in Sub(Q) and A in (|m), the union, U, of P and A is in (|m) and m(U)=m(A). The empty collection is trivially ignorable. The members of Sub(D) which are either ignorable or measured are called measurable (because the additive completion of R will allow us to measure the ignorables with measure 0 and, otherwise, preserve all the structure).

If R has no solutions to b+d+a=a except possibly with b=d an additive identity, we get a partial ordering on it defined by: for any r, s in R, r+s is greater than or equal to r. If R has no additive identity, r+s is always greater than r. We can use this ordering to show that: whenever A in (|m) and Q in Sub(D) have ignorable intersection (e.g. they're disjoint), if their union, U, is in (|m) with m(U)=m(A) then Q is ignorable. When R has no solutions to b+d+a=a, I'll describe m as positive definite; when it has no solutions to e+a=a, I'll describe m as positive. Crucially, whenever A is a subset of B in (|m), m(A) is less than or equal to m(B).
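A sketch of monotonicity and ignorability for the counting measure, which is positive in the sense just defined:

```python
m = len              # counting measure on finite sets
A = {1, 2}
B = {1, 2, 3, 4}
# Monotonicity: a subset never out-measures its superset.
assert A <= B and m(A) <= m(B)

# Q disjoint from A with m(A | Q) == m(A) forces Q ignorable; for the
# counting measure only the empty set manages that.
Q = set()
assert m(A | Q) == m(A)
Q2 = {9}             # non-empty, disjoint from A
assert m(A | Q2) > m(A)
```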

Given a topology R, I'll describe a positive measure (Sub(D):m:R) as

a Riemann measure
⇔ (|m) is a Jordan field and, whenever, for some C in Sub(D),
• ({A in (|m): C in Sub(A)}:m|)'s closure (upper bounds on m(C)) and
• (Sub(C):m|)'s closure (lower bounds on m(C))
have intersection {r} for some r in R, C is in (|m) with m(C)=r.

If you do this with D some R-vector space with a metric, R the non-negative reals and all cuboids in D having the volumes we expect, you get the usual Riemann integration for that vector space.

a Borel measure
⇔ (|m) is a Borel field and the union of any countable collection of disjoint members of (|m) has, as its measure, the sum of the measures of the disjoint constituents.

You have to be able to cope with the possibility that this last sum is infinite, but we're in a positive domain, so this is reasonably well behaved.

a Lebesgue measure
⇔ it is a Borel measure and, whenever A, C in (|m) with m(A) = m(C) and some B in Sub(D) lies between them – i.e. B in Sub(C) and A in Sub(B) – B is also in (|m) with m(B) = m(A).

This last has a delightful simplicity to it, and it enabled Lebesgue and those who've come since him to build a theory of measures which is powerful, flexible and expressive.

## Integration

If the addition, (:+:R), on (Sub(D):m|) is in fact the addition of a scalar domain, we can discuss R-linear spaces: since these are additive domains we can also look at measures (Sub(D):v:V) with V an R-linear space. Integration is the process of taking a function (D:f:V) and integrating it over members of Sub((|f))'s intersection with (|m), to obtain a measure (Sub((|f)):v:V). To match up with our idea of what integration is, we need any m-measurable A in Sub((|f)) to have v(A)=m(A).k(A) for some (Sub((|f)): k :V) which can sensibly be thought of as delivering a plausible average, k(A), of f on A. If (|m) contains small enough sub-domains everywhere in D, we can expect f to be near enough constant on each, thus supplying a value for k on these – from which we can expect the machinery of v, as a measure, to imply v's values.

The trick with which to identify whether a sub-domain of D is small enough is to require f to vary little. Among the properties we expect of integration is that the average of f over some region A should be in the convex hull of the values f takes on A. If f scarcely varies over A, this ties k's value tightly. With that, I can define at least the preliminaries of integration, using a positive scalar measure to integrate a vector function. It is of note that the measure used to perform this integration has to be positive, or the convex hull of f's values isn't guaranteed to contain the integral.
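A quick numeric sketch of why positivity matters: with positive weights the average lies in the convex hull of f's values (for real values, between the minimum and maximum), while a signed weighting (chosen here purely for illustration) can escape it:

```python
A = ["a", "b", "c"]
weight = {"a": 2.0, "b": 3.0, "c": 5.0}    # a positive measure on A
f = {"a": 1.0, "b": 4.0, "c": 2.0}

mA = sum(weight[n] for n in A)             # m(A)
vA = sum(weight[n] * f[n] for n in A)      # v(A), the integral of f over A
k = vA / mA                                # the average k(A) of f on A
assert min(f.values()) <= k <= max(f.values())

# A signed weighting lets the "average" escape the hull of f's values.
signed = {"a": 2.0, "b": -3.0, "c": 2.0}
k2 = sum(signed[n] * f[n] for n in A) / sum(signed[n] for n in A)
assert not (min(f.values()) <= k2 <= max(f.values()))
```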

### Definitions

I'll work with a positive scalar domain, R, on which positivity induces the ordering r<r+s (for all r,s in R), which (in turn) induces a topology: the open sets are the (arbitrary) unions of sets of form between(s,t)= {r in R: s<r<t} with s,t in R. [This, in turn, induces a topology on {(I::R)} for any set I: the open sets are unions of sets of form I-between(s,t)= {(I:f:R): for each i in I, s(i)<f(i)<t(i)} with s, t in {(I::R)}.] Any topology on R induces one on any R-linear space, V, (by deciding that linear maps must be continuous): the open sets are (arbitrary) unions of finite intersections of sets of form (|w:S) with (V|w:R) linear and S open in R. [This coincides with the topology just given for {(I::R)}, when viewed as an R-linear space.]

I shall describe a subset, U, of an R-linear space, V, as convex precisely if {au+cv: u,v in U, a,c in R with a+c=1} is a subset of U. It is easy to show that an arbitrary intersection of convex subsets is convex; and that both empty and V are convex. A subset, W, of V, which need not be convex, is a subset of some convex subsets of V: if we intersect all the convex subsets of V which have W as a subset, we get a minimal convex subset of V which subsumes W. This is called the convex hull of W: name it convexHull(W).

For a function, (:f:V), I shall write convexHull(f) for convexHull((f|)): you can equally regard convexHull(W) as convexHull((W|w->w:W)). We can show that convexHull(W) is the closure of {∑(: i-> r(i).v(i) :V): (|r:R)=(|v:V) and ∑(r)=1}. This implicitly requires that ∑, the bulk action of addition, is defined on (|r)=(|v): this is guaranteed when (|r) is finite and may be possible beyond that: but taking the closure of the finite sums suffices.

I'll say that a measure (Sub(D):v:V) integrates a function (D:f:V) with respect to a measure (Sub(D):m:R), with V an R-linear space and R positive, precisely if: (|v) subsumes the intersection of Sub((|f)) and (|m) and, for every U in this intersection, there is some u in (¿ the closure of ?) convexHull(U:f|) for which m(U).u =v(U). It is then necessary to discover the circumstances under which, or the extent to which, such a v is the only measure which integrates f with respect to m.

When such a measure exists, we say that f is integrable with respect to m and denote the measure which does the job, v, as m∫(f); its restriction to the integral over some sub-domain, U, of (|f) is then m∫(U:f:), the integral of f's restriction to U.

There is a standard measure induced on any positive scalar domain, defined as the minimal Lebesgue measure, m, for which m(between(s,s+r))=r for every s,r in R. It should be noted that any other measure, n, satisfying these conditions then has ((|m):n:), its restriction to (|m), equal to m (so, in particular, (|m) is a subset of (|n)). [This induces one on functions from finite sets to R, by giving measure ∏(t-s) to the set given as between(s,t). This, in turn, induces measures on any R-linear space isomorphic to such a {(finite::R)}: the measure, d, induced by an (R-linear) isomorphism, ({(n|:R)}:e:V), is the minimal Lebesgue measure for which a subset, U, of V is measurable precisely if (|e:U) is, and has its measure. [Strictly, what I care about is that e is invertible.] Thus d(U) = m(|e:U), for a given measure (Sub({(n|:R)}):m:R), with n finite. Comparing the measure d, induced by invertible e, with the measure, h, induced by some parallel invertible, ({(n|:R)}:j:V) with inverse (V:i:{(n|:R)}), we will find that d(U) and h(U) are proportional to one another, in the same ratio as the determinants of e and j. We have d(U) = m(|e:U), h(U) = m(|j:U), with (|j:U) = (U:i|), so (:e:U) is composable after i, yielding ((|e:U): e o i :(|j:U)). The important fact that comes next is that the composite, e o i, is defined on all of (|e)= (|j)= (i|)= {(n|:R)}, not just (|e:U). The study of the standard (Lebesgue) measure on (|e) then reveals that its measures of (|e:U) and (|j:U) are proportional to one another, with ratio equal to the determinant of e o i. Now, e o i is a linear automorphism on {(n::R)}: its determinant is defined as the linear map it induces on the n-antisymmetric self product space of this, ∧{(n::R)}. This is a 1-dimensional R-linear space, and the determinant of the identity on {(n::R)} is its identity isomorphism.
It is thus natural to describe determinants of (general) automorphisms in terms of the scalar which, when multiplied by this natural unit, yields the actual determinant. One can, equally, define the determinant of any linear ({(n::R)}:f:W), for n finite but arbitrary R-linear W, via a standard antisymmetric product induced on arbitrary vector spaces. For (W|f:X) linear and n finite, n∧f is the linear map from n⊗W (W's self n-tensor product space) to n⊗X (well, actually to its linear subspace, n∧X) induced, linearly, from ⊗(n|w:W)-> ∧(n| f o w :X), with ∧(n|x:X) defined to be ∑(permutations(n): s-> sign(s) . ⊗(x) / n! :). Here, n! denotes the factorial of n, defined inductively by factorial= (: 0->1, 1+i-> (1+i) . i! :), and ⊗ denotes the usual tensor product. The construction, ({linear (W|f:X)}| n∧ :{linear (n⊗W| :n⊗X)}), respects epic and iso, but not monic. There is a natural composition between {linear (W|:X)} and {linear (X|:Y)}: likewise between {linear (n∧W| :n∧X)} and {linear (n∧X| :n∧Y)}. For linear (W|f:X) and (X|g:Y), composing n∧f and n∧g yields n∧(f o g). [Making that work depends on dividing by the n! in the definition: n has to be finite.] A nice fact: if mutually composable f, g have f o g invertible then g is monic and f is epic (I fear this depends on things I don't want to wot). This means that if (|g) is one-dimensional, then f and g are invertible (so (|f) and (g|) are also 1-dimensional). When we return to our invertible e and j, linear ({(n::R)}|:V), with i inverse to j (remember them ? and measures ?), we can, in fact, equally define det(i) to be a linear map from the n-antisymmetric self-product of V to that of {(n::R)} and it is not hard to show that it is the inverse of det(j) – so long as your antisymmetric product is scaled as ∧(p|w:W) = ∑(permutations(p): s-> sign(s) . ⊗(w) / p! :), with p an arbitrary natural number, p! denoting its factorial, and W an arbitrary R-linear space.
When we consider our determinant of e o i, we see that it is the composite of the determinants of e and i, in the same order: that of i is the inverse of that of j, so we have e's measure of U and j's measure of U in the same ratio as the determinants of e and j.]
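The 2×2 case of the composite-determinant argument can be checked directly (a pure-Python sketch; `det2`, `mul2` and `inv2` are my helper names, and the matrices are arbitrary invertible examples):

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c

def mul2(m, n):
    """2x2 matrix product m o n."""
    return [[sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def inv2(m):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    dm = det2(m)
    return [[d / dm, -b / dm], [-c / dm, a / dm]]

e = [[2.0, 1.0], [0.0, 3.0]]     # det(e) = 6
j = [[1.0, 0.0], [1.0, 2.0]]     # det(j) = 2
i = inv2(j)                      # i inverse to j, so det(i) = 1/det(j)

# Determinants compose: det(e o i) = det(e) . det(i) = det(e) / det(j),
# the ratio in which the induced measures of any U differ.
assert abs(det2(mul2(e, i)) - det2(e) / det2(j)) < 1e-12
```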

Written by Eddy.