The Hilbert space formalism of Quantum Mechanics

[the concept of a Hilbert space]… offers one of the best mathematical formulations of quantum mechanics. In short, the states of a quantum mechanical system are vectors in a certain Hilbert space, the observables are hermitian operators on that space, the symmetries of the system are unitary operators, and measurements are orthogonal projections.

Wikipedia, Hilbert space, as seen in 2009 on April the 6th, but transformed to use links that'll work from here.


Quantum mechanics has enough weirdness to it that its formalisation requires a significantly richer structure than the intuitively tractable three-dimensional space of real displacements that serves us so well in the description of the macroscopic world we inhabit. The Hilbert space is a structure rich enough to support the full panoply of quantum complications, yet retains as much as can be hoped for of the intuitive tractability of our familiar real three-dimensional geometry. Let me start with a very brief over-view of all the technical jargon used in Wikipedia's account; then I can devote sections to the parts thereof. Along the way, I'll get the chance to restate orthodoxy in my preferred forms.

A Hilbert space is a topological continuum which is also a vector space over the complex numbers, equipped with the most sensible length-like notion a complex vector space can have, a continuous positive-definite hermitian product. This last can be encoded as an invertible mapping, called the metric, from the Hilbert space to its dual, which enables us to divide any other hermitian product by the metric; the result, known as a hermitian operator, is a linear map from the Hilbert space to itself. The natural equivalent of an isometry (length-preserving transformation) in this context is a unitary operator.

For any given hermitian operator, the Hilbert space can be decomposed into orthogonal sub-spaces on each of which the operator acts simply as a scaling; each such sub-space is described as an eigenspace of the operator, the associated scaling is known as the eigenvalue for that space and each non-zero vector in the space is termed an eigenvector of the operator. This eigenspace decomposition lets us write any vector in the space as a sum of eigenvectors; one can define an orthogonal projection onto any eigenspace as a mapping which decomposes its input vector in this way, discards eigenvectors not in the selected eigenspace and returns what remains.

In quantum mechanics, the Hilbert space generally arises as a sub-space of a more general space of wave functions of a system, namely the span of those that are solutions of the dynamical equation – archetypically Schroedinger's equation – governing the system. Solutions of the system's dynamical equations are identified with unit vectors in the Hilbert space. Superpositions of possible solutions are represented by linear combinations of their corresponding unit vectors, followed by re-scaling to obtain a unit vector to represent the superposition.

Each real-valued quantity one can measure (e.g. total energy, or a single component of its momentum) on the system corresponds to a hermitian operator on the Hilbert space. Actually measuring such a quantity forces the system into a state represented by an eigenvector of the measured quantity's associated operator – such a state is called an eigenstate of the operator, associated with the same eigenvalue as the eigenvector in question – and yields the associated eigenvalue as measured value. The action of observing such a measurement projects the prior state vector orthogonally onto the eigenspace for the observed value; this selects the component of the prior state's unit vector, when decomposed into a sum of eigenvectors, in the given eigenspace. The probability of observing any given value is simply the squared magnitude of this component. A positive real scaling can then be applied to this component to make it a unit vector once more.

I shall deliberately gloss over the topological details that make a Hilbert space behave relatively sanely, compared to infinite-dimensional vector spaces in general; in calling it a continuum I tacitly assert that it is complete in the necessary sense, but all that actually matters is that it behaves enough like a finite-dimensional vector space to be intelligible.

The hermitian product

Our Hilbert space, S, is a vector space over the complex numbers; this means we can add members of our Hilbert space and/or scale them by arbitrary complex numbers; the results shall always be likewise members of the Hilbert space. As for real vector spaces, we can define linearity of mappings from the Hilbert space to some other (complex) linear space in terms of the mapping respecting addition and scaling: (:f|S) is linear precisely if, for every u, v in S and complex k, f(u+k.v) = f(u) +k.f(v). However, the complex numbers also support a conjugation (the real-linear map induced by preserving the real line but swapping the square roots of −1), *k ←k, allowing us to define a related (and, as it turns out, quite useful) partner notion to go with linearity: a mapping (:f|S) is described as antilinear precisely if, for every u, v in S and complex k, f(u+k.v) = f(u) +*k.f(v). (In that last term, the * only applies to k – there isn't necessarily any meaning to conjugating f(v) – so *k.f(v) should be read as (*k).f(v); I won't clutter the text with the extra parentheses.)

Given linearity we can, as for real vector spaces, define the dual of a Hilbert space to be the space of linear maps from it to its scalars, the complex numbers; so dual(S) = {linear maps ({complex}:|S)}. We could also define an anti-dual to be the corresponding space of antilinear maps; however, since each member of dual(S) is a mapping ({complex}::), simply composing * after it turns it into an antilinear map ({complex}:|S); likewise, as * is self-inverse, composing * after any such antilinear map makes it linear, so turns it into a member of dual(S). Thus "compose * after" identifies dual(S) with S's anti-dual, which we can thus simply write as {*∘w: w in dual(S)}, making a separate name for it uninteresting. The mapping *∘w ←w is antilinear; as it identifies the anti-dual and dual, it's iso, so it's an antiisomorphism.

In a real linear space, we obtain a length notion in terms of a mapping from our space to its dual; after consuming its first input we have a member of the dual which can consume a second input to be construed as inner product of the two inputs. We require it to be symmetric – that is, swapping the order of the inputs shouldn't change the final real output – and, in the simple case, to always give a positive value if the same non-zero vector is supplied as both inputs; that positive value's square root is the length of the vector. We can do similar for our Hilbert space: but note that, where the real space's equivalent consumed each input linearly, we can now choose between linear and anti-linear for each input. If we can arrange for g(u,u) to be a positive real for some u, we'll also want g(k.u, k.u) to be a positive real for any non-zero complex scalar k. If g consumes both inputs linearly we get k.k.g(u,u), if it consumes both antilinearly we get *k.*k.g(u,u); neither of these is real except for a limited range of values for k. However, k.*k is always real and positive for non-zero k, so if g consumes one input linearly and the other antilinearly, we win. If it consumes the second input antilinearly, we have a linear map from S to its anti-dual; composing this with the usual antiisomorphism, noted above, from there to the dual turns this into an antilinear map to the dual, which was the other possibility for consuming one input linearly and one antilinearly. So, if we want a length-like notion, we need an anti-linear map from S to its dual.
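To make the choice concrete, here is a small worked case with k = i, a square root of −1, writing \bar{k} for *k. The labels g_{LL}, g_{AA} and g_{AL} are introduced only for this illustration, to name a product that consumes both inputs linearly, both antilinearly, or one each way; they are not notation used elsewhere in this text.

```latex
\begin{align*}
g_{LL}(iu, iu) &= i \cdot i \cdot g_{LL}(u, u) = -g_{LL}(u, u), \\
g_{AA}(iu, iu) &= \bar{i} \cdot \bar{i} \cdot g_{AA}(u, u) = -g_{AA}(u, u), \\
g_{AL}(iu, iu) &= \bar{i} \cdot i \cdot g_{AL}(u, u) = g_{AL}(u, u).
\end{align*}
```

So, of the three, only the mixed case keeps the self-product of i.u positive whenever that of u is.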

In the real case, we required swapping the order of inputs to not change the final real output. However, if we consume one input linearly and the other antilinearly, this constraint becomes infeasible: if we scale one of the inputs by some complex k, the final output is scaled by k if the input is consumed linearly but *k if antilinearly; swapping the order of inputs changes which of these scalings is applied. So the appropriate symmetry we can require is that swapping the order of inputs should conjugate the answer. In particular, when the same vector is used as both inputs, swapping order of inputs can't change the value, since it hasn't changed the expression, but does conjugate it; so the result is self-conjugate, i.e. real. Thus selecting this conjugate-symmetry automatically gives us real self-products. We can then define positive definite, just as in the real case, to mean that the final output must be positive if the same non-zero vector is used as both inputs.

So define a hermitian product to be an antilinear map (dual(S): x |S) for which x(u, v) = *x(v, u). We can then use a positive-definite hermitian product g to define our notion of length, just as in a real linear space, via length = ({non-negative reals}: √g(u, u) ←u |S).
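As a minimal sketch of such a product, assume S is the space of lists of n complex numbers and take the standard product as g; the function names g and length below are chosen for this illustration. The first input is the one consumed antilinearly, matching the antilinear map from S to its dual.

```python
def g(u, v):
    """Standard hermitian product: antilinear in u, linear in v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def length(u):
    """Length induced by g: the square root of the (real) self-product."""
    return g(u, u).real ** 0.5


u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
k = 2 + 3j

swapped = g(v, u)                        # conjugate of g(u, v)
scaled_first = g([k * a for a in u], v)  # picks up *k, the conjugate of k
scaled_second = g(u, [k * b for b in v]) # picks up k itself
self_product = g(u, u)                   # real and positive for non-zero u
```

Comparing g(u, v) with swapped, and scaled_first with scaled_second, shows the conjugate-symmetry and the two kinds of scaling discussed above.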

The zero member of dual(S) maps every vector in S to 0 and we know each g(u) maps at least u to non-zero, unless u is zero, so the only input to g which yields the zero member of dual(S) is the zero member of S; consequently, distinct inputs to g yield distinct outputs (as g respects addition, so maps the difference between inputs to the difference between outputs), making g a monic mapping. One consequence of the Hilbert space being a continuum (formally, topologically complete) is that any monic continuous linear or antilinear map between it and its dual is invertible (this would be automatic if the space's dimension were finite, as S and its dual would be isomorphic). We thus get an inverse (S:|dual(S)) for g; this is trivially antilinear, like g. We can embed S in dual(dual(S)) as (: ({scalars}: w(v) ←w |dual(S)) ←v |S), enabling us to interpret g's inverse as an antilinear map from dual(S) to its dual; this is in fact conjugate-symmetric, like g, so g's inverse is a hermitian product on dual(S); it is also positive-definite, although this won't interest me much here. I'll use g\f to denote the composite of g's inverse after (i.e. to the left of) any suitable f and h/g for the matching composite on the right, for any suitable h; if I need to refer to g's inverse directly, I'll thus use 1/g or g\1, interpreting 1 as a relevant identity linear map in each case.

So now consider any other hermitian product, x, on S; composing g's inverse on its left we get g\x. For u in S and complex k, antilinear x maps k.u to *k.x(u) in dual(S) and going backwards through antilinear g maps this to k.(g\x)(u) in S. Since both x and g's inverse respect addition we can infer that so does their composite, hence g\x is simply linear (S:|S). Composing g after (i.e. on the left of) this linear map gets us x back. These linear (S:|S) which, when composed before g, give hermitian products, encode the observables of our system: they are termed hermitian operators. The attentive might notice that we should properly term them g-hermitian operators, since their specification depends on g.
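The dependence on g can be seen in a small sketch, assuming S is two-dimensional and that each form is given by a matrix of co-ordinates with x(u, v) = sum of *u(i).X(i, j).v(j); the names form, apply, G, X and F are all chosen for this illustration. With a metric that isn't the standard one, g\x need not look hermitian as a matrix, even though composing it before g recovers the hermitian product x.

```python
def form(matrix):
    """Build x with x(u, v) = sum over i, j of conj(u[i]) * matrix[i][j] * v[j]."""
    def x(u, v):
        return sum(u[i].conjugate() * matrix[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return x


def apply(matrix, v):
    """Apply a linear map, given as a matrix, to a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


G = [[2 + 0j, 0j], [0j, 1 + 0j]]     # the metric g: diagonal, positive definite
X = [[1 + 0j, 1j], [-1j, 1 + 0j]]    # some other hermitian form x
# Co-ordinates of g\x: since G is diagonal, just divide row i of X by G[i][i].
F = [[X[i][j] / G[i][i] for j in range(2)] for i in range(2)]

g, x = form(G), form(X)
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

recovered = g(apply(F, u), v)        # (g after F)(u, v), which should be x(u, v)
looks_hermitian = F[0][1] == F[1][0].conjugate()   # False for this G
```

Here F composed before g gives back x, yet F's own matrix is not equal to its conjugate transpose; it is g-hermitian rather than hermitian for the standard product.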

The hermitian transpose

Above, I've defined an antilinear (dual(S): x |S) to be a hermitian product precisely if every x(u, v) = *x(v, u); now let's pause to consider a general antilinear (dual(S): x |S), not necessarily a hermitian product, and the induced (: ({scalars}: *x(v, u) ←v |S) ←u |S) that would be equal to x if it were a hermitian product. I'll refer to this induced mapping as the hermitian transpose of x and denote it †(x); so we have

†(x, u, v) = *(x(v, u)) for all u, v in S,

which makes †(x) antilinear (dual(S): |S) precisely when x is. (The notation here adapts one of several ways, x†, that orthodoxy denotes the hermitian transpose; I've also seen x^H.) A hermitian product (also called a hermitian form) is thus an antilinear map from a Hilbert space to its dual that's equal to its hermitian transpose.

Now, for some invertible antilinear (dual(S): g |S), consider a linear map (S: f |S); for it to be g-hermitian, we would need †(g) = g and †(g∘f) = g∘f. The latter can be re-written as f = g\†(g∘f); it's only really of interest when g is a hermitian product, but it's defined whenever g is invertible. By analogy with the g-hermitian operator we get when it's symmetric, I'll refer to g\†(g∘f) as the g-hermitian transpose of f; like f, it is a linear map (S: |S).

With linear maps, transpose(h∘k) is transpose(k)∘transpose(h); so we can hope to be able to re-write †(g∘f) in terms of †(g) – regardless of whether this is equal to g – and some sort of transpose of f. I can, in fact, apply the above definition of † to f, since I carefully left out constraints, provided I select the right spaces to take the two inputs from; to be able to write *f(v, u), we need v to be in S, putting f(v) in S; for u to be an input to it, we need to interpret f(v) as in dual(dual(S)), as above, so u must be in dual(S); in which case the output f(v, u) is indeed scalar, so can be conjugated. So †(f) = (: ({scalars}: *u(f(v)) ← v |S) ←u |dual(S)) = (: *∘u∘f ←u :); but it's antilinear in both inputs, so antilinear from dual(S) to the antidual of S. So †(f) isn't a good candidate for f's factor in a decomposition of †(g∘f); it doesn't accept †(g)'s outputs as inputs, it would produce the wrong outputs (in the anti-dual) if it did and (when composed after antilinear †(g)) would give a linear map rather than the antilinear †(g∘f).

So let's look at the constraints †(f) violated there: we want to compose it after †(g), so it needs to accept inputs from dual(S); we want the composite to be, like †(g), antilinear; and we need the outputs to be in dual(S). So what we need is a linear (dual(S): |dual(S)) – and that's exactly what the transpose of f is, once we use the usual trick of reading S via its embedding in dual(dual(S)); just leave off the conjugation in †(f) and we have transpose(f) = (dual(S): w∘f ←w |dual(S)). So let's see if that gets what we want: for given u, v in S,

transpose(f, †(g, u), v)
= f(v, †(g, u))

in which f(v), in S, is being given †(g, u), in dual(S), as input; so we interpret f(v) as being in dual(dual(S)), giving

= †(g, u, f(v))
= *(g(f(v), u))
= *((g∘f)(v, u))

so, indeed, transpose(f)∘†(g) = †(g∘f). So we can write our g-hermitian transpose of f, g\†(g∘f) as g\transpose(f)∘†(g); and, when g is a hermitian form, this just reduces to g\transpose(f)∘g. Note, in passing, that the co-ordinates of this will (thanks to the antilinearity of g) be conjugated co-ordinates of f (when our co-ordinates arise from a basis that diagonalises g), even though we've only simply transposed f.

This last leads to orthodoxy, which focusses rather more on co-ordinates, denoting f's g-hermitian transpose in some way that looks like it's applied the same operator, to f, that corresponds to † as applied to g; see above for how that isn't valid within the present expression of the subject matter (since †(f) can sensibly be defined and isn't the same thing as either g\†(g∘f) or transpose(f), although it looks like the latter, with its co-ordinates transposed, when g is diagonal).


In any linear space V over {scalars}, for any set B of members of V, we can define the span of B to be:

span(B) = {sum(: h(i).b(i) ←i :n): n natural, ({scalars}: h :n), monic (B: b :n)}

i.e. the set of values of finite sums of scalar multiples of members of B. A member of the span might arise more than one way if sum(h.b) = sum(k.b) for two distinct ({scalars}: :n) h and k and a single (B: b :n) with n natural; subtracting one from another, we then get sum((h −k).b) = 0 with ({scalars}: h −k :n) non-zero on at least some inputs in n; any such mapping ({scalars}: r :n), with at least some non-zero output, for which sum(: r(i).b(i) ←i :n) = 0 for some monic (B: b :n) is known as a linear dependence among the members of B (specifically, the b(i) corresponding to non-zero r(i) values); and B is described as linearly independent if there are no linear dependencies among its members.
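A tiny instance, assuming V is the plane of pairs of numbers and B has three members, shows a linear dependence and the two ways of writing one member of the span that it induces; the names combine, B, r, h and k below are chosen for this illustration.

```python
def combine(coefficients, vectors):
    """Finite sum of scalar multiples: the sum of r(i).b(i) over the indices i."""
    dimension = len(vectors[0])
    return [sum(r * b[n] for r, b in zip(coefficients, vectors))
            for n in range(dimension)]


B = [[1, 0], [0, 1], [1, 1]]     # three distinct members of a 2-dimensional space
r = [1, 1, -1]                   # not all zero, yet the combination below vanishes
dependence = combine(r, B)       # the zero vector, so r is a linear dependence

# Two distinct coefficient lists that name the same member of span(B);
# their difference, entry by entry, is r.
h = [2, 3, 0]
k = [1, 2, 1]
same_member = combine(h, B) == combine(k, B)
```

Dropping any one member of B leaves a linearly independent set with the same span.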

Note that, if we have ({scalars}: h :n), (B: b :n), ({scalars}: k :m) and (B: c :m) with equal sum(h.b) = sum(k.c), we can rearrange these as a single ({scalars}: j :n+m), (B: a :n+m), e.g. for i in n, j(i) = h(i) and a(i) = b(i) while, for i in m, j(i+n) = k(i) and a(i+n) = c(i). This might not make a be monic, even if b and c were, as b and c might have an output in common. We can, of course, factor a via some monic (B: x :p) with p natural (and smaller than n+m) that has the same outputs, by mapping each index in n+m to some index in p. Provided each of b and c is monic, each of h and k can then be represented by a ({scalars}: :p) mapping, that uses the same re-indexing as b or c, via j's indices, to p's. This reduces our case with distinct (B: b :n) and (B: c :m) to one with a shared (B: x :p), as used in the specifications. If we try to construct a ({scalars}: y :p) corresponding to ({scalars}: j :n+m) by mapping each index in n+m to the same index i in p, y(i) needs to be equal to each entry in j at an index in n+m that comes via i; which can only happen if j's entries are equal whenever a's are; avoiding this complication is the reason for insisting on b being monic in the specifications, above.

A linear space is finite-dimensional if there is some finite set of its members whose span is the whole space; and one can then show that there is some subset of this that is furthermore linearly independent; and that any two finite linearly independent spanning sets of a vector space have the same number of members; this number is known as the dimension of the space. Each such linearly-independent spanning set is called a basis of the space. In the case of our Hilbert space, however, some of the reasoning of that breaks down; even when we have a (countably) infinite subset of the space, the best we can hope for is that the span is dense in the space – that is, for any vector in the space and any positive distance (no matter how small), some member of the span is within that distance of the given vector. So, in the infinite-dimensional case, we define a basis to be a linearly independent countable set of vectors whose span is dense in the Hilbert space. That it's countable means we can express the set as the set of outputs of some function (V: b :{naturals}) and we choose to use such a b for which, for each right value n of b, every member of n (i.e. every natural less than n) is also a right value of b; thus (:b|) is either a natural or the whole of {naturals}. So, although a basis is formally a set of vectors, we invariably deal with it as a sequence of vectors (V: b |N) with N either a natural or the whole of {naturals}. In support of this, we extend the definition of span, above, to accept not just sets of members of V but also mappings to V, so that span(b) = span((|b:)), the span of the outputs of a mapping.

Dual bases and tensor products

When we have a basis (:b|N) of a vector space V, a member of its span is always sum(: h(i).b(i) ←i :n) for some (: h :n) with n a natural subsumed by N; and, since b is linearly independent, there is exactly one such h for any given vector in span(b); this lets us define, for each i in N, a mapping from span(b) to scalars, which maps any given vector sum(h.b) in span(b) to h(i). The continuum properties of the Hilbert space ensure that this mapping is continuous and thus that we can extend it to the whole of V, not just span(b), so that we get a member of dual(V) associated with each b(i), that maps each member sum(h.b) of span(b) to h(i); let (dual(V): q |N) be the mapping whose q(i) is this member of dual(V), for each i in N. This mapping q is known as the dual of b; and, indeed, b is then the dual of q, equally. (Note that, though related, this is not the same dual as dual(V) is of V.) In span(b), we have q(i, sum(h.b)) = h(i), so each v in span(b) has v = sum(V: q(i, v).b(i) ← i |N); although (when our Hilbert space has infinite dimension) this is an infinite sum (over all of N), we know v is sum(h.b) for some (:h:n) with n finite, so all but finitely many terms in our infinite sum are zero. For the case of general v in (infinite-dimensional) V, not necessarily in span(b), however, we do indeed get an infinite sum; but continuity of our metric then ensures that this infinite sum is indeed (in a suitable sense) convergent and we can, with a clean conscience, write v = sum(: b(i).q(i, v) ←i |N) for all v in V.

The tensor product gives us a multiplication between members of V and its dual for which u×w = (V: u.w(v) ←v |V); this is a linear map (V: |V). When we write the above v = sum(: b(i).q(i, v) ←i :) in this form, we find v = sum(: b(i)×q(i) ←i |N)(v) and thus sum(b×q) is the identity on V.
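The following sketch checks sum(b×q) against the identity in the simplest non-trivial setting, assuming V is two-dimensional and using a deliberately non-orthogonal basis; the helper names inverse_2x2, apply and resolve are chosen for this illustration. In co-ordinates, the dual basis is read off as the rows of the inverse of the matrix whose columns are the basis vectors.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(w, v):
    """Give the vector v to the dual member w, yielding a scalar."""
    return sum(wi * vi for wi, vi in zip(w, v))


basis = [[1, 0], [1, 1]]          # b(0) and b(1), not orthogonal to one another
columns = [[basis[j][i] for j in range(2)] for i in range(2)]
dual = inverse_2x2(columns)       # rows are q(0) and q(1)


def resolve(v):
    """Apply sum(b×q) to v: the sum over i of b(i).q(i, v)."""
    coefficients = [apply(q, v) for q in dual]
    return [sum(coefficients[i] * basis[i][n] for i in range(2))
            for n in range(2)]


pairings = [[apply(dual[i], basis[j]) for j in range(2)] for i in range(2)]
v = [3 + 1j, 2]
rebuilt = resolve(v)              # should reproduce v
```

The pairings table is the identity matrix, which is just the statement that each q(i) picks out the b(i) co-ordinate.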

The tensor product also lets us multiply members of V's dual together in the same way; now, both w and u are in dual(V) so u×w = (dual(V): u.w(v) ←v :V) now produces its outputs in dual(V) rather than V. That'll let us form, for a given basis b with dual q, a mapping sum(q×q) which maps each b(i) to the matching q(i) and extends this mapping linearly to map the whole of V to (as it happens) the whole of dual(V); it should be noted, however, that this mapping (unlike sum(b×q) as identity) depends on choice of basis; if we choose a different basis in place of b the tensor-square of its dual, sum(q×q), will be a different linear map.

Now, for u, w in dual(V), u×w is linear (dual(V): |V); but we're more interested in hermitian forms, which are antilinear (dual(V): |V); for these, we need to conjugate the (scalar) output of w, so we'd want u×(*∘w) = (dual(V): u.*(w(v)) ←v |V). Since * (presently) only acts on scalars, I can overload it as a binary operator whose left operand is any vector or tensor quantity and whose right operand is any mapping with scalar outputs, so that u*w is this u×(*∘w). With this, any basis b with dual q yields a sum(q*q) which maps b(i) to q(i) and extends this anti-linearly to map the whole of V to (as it happens) the whole of dual(V). When we apply this to two members u, v of V we get sum(q*q)(u, v) = sum(dual(V): q(i).*(q(i, u)) ←i :)(v) = sum({scalars}: q(i, v).*(q(i, u)) ←i :), which is manifestly linear in v and antilinear in u. If we conjugate its output, we switch which of q(i, v) and q(i, u) is conjugated and thus get sum(q*q)(v, u), thereby showing that sum(q*q) is conjugate-symmetric. If we give it u = v, we get sum(: q(i, v).*(q(i, v)) ←i :) in which each term is h.*h for some scalar h; this is real and ≥ 0, with equality only if h is 0. So sum(q*q)(v, v) is real and ≥ 0, with equality only if every q(i, v) is zero, i.e. when v is zero; thus sum(q*q) is positive-definite.
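A short sketch of sum(q*q), assuming V is two-dimensional with the non-orthogonal basis b(0) = (1, 0), b(1) = (1, 1), whose dual works out by hand as q(0) = (1, −1), q(1) = (0, 1); the names q and metric below are chosen for this illustration.

```python
basis = [[1, 0], [1, 1]]      # b(0), b(1)
dual = [[1, -1], [0, 1]]      # q(0), q(1), found by hand for this basis


def q(i, v):
    """q(i, v): the b(i) co-ordinate of v."""
    return sum(w * c for w, c in zip(dual[i], v))


def metric(u, v):
    """sum(q*q)(u, v): the sum over i of q(i, v) times the conjugate of q(i, u)."""
    return sum(q(i, v) * q(i, u).conjugate() for i in range(2))


u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

symmetric = metric(u, v) == metric(v, u).conjugate()
self_product = metric(u, u)                 # real and positive
between_basis = metric(basis[0], basis[1])  # zero: b is orthonormal for this form
```

The basis that built the form is orthonormal with respect to it, even though b(0) and b(1) are not perpendicular for the standard product on pairs; this is the dependence on choice of basis noted above.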

Just as sum(q*q) is a positive-definite hermitian form on V, sum(b*b) is a positive-definite hermitian form on dual(V). Since the former is an antilinear map (dual(V): |V) and the latter is antilinear (V: |dual(V)), we can compose them in either order; a composite of two antilinears is linear, so the composite is, in each case an identity, one way round on V, the other way round on dual(V). Consequently, sum(q*q) and sum(b*b) are mutually inverse.


We have sum(b×q) as the identity on V; dually, sum(q×b) is the identity on dual(V); so let's compose these before and after some general hermitian form g; we'll get

g = sum(q×b)∘g∘sum(b×q)
= sum(: q(j)×b(j) ←j :)∘g∘sum(: b(i)×q(i) ←i :)
= sum(: (q(j)×b(j))·g·(b(i)×q(i)) ←[i, j] :)

in which g·b(i) is just g(b(i)); as g acts antilinearly on its first argument, any scaling we apply to b(i) gets conjugated; and b(i)×q(i) is a linear map that does indeed apply a scaling (q(i)'s output) to b(i); so g·(b(i)×q(i)) is just g(b(i))×(*∘q(i))

= sum(: b(j)·g(b(i)).q(j)×(*∘q(i)) ←[i, j] :)
= sum(: g(b(i), b(j)).q(j)*q(i) ←[i, j] :)

in which each g(b(i), b(j)) is a scalar. This, indeed, just shows that the q(j)*q(i) for various i, j equip us with a basis of hermitian forms on V.

For a general hermitian form g and basis b with dual q, the expression of g in terms of this basis of hermitian forms involves q(j)*q(i) for diverse i, j in N – in contrast to our sum(q*q), which only has terms with i = j. A hermitian form is described as diagonal with respect to a basis b, with dual q, precisely if it's in the closure of the span of q*q, i.e. it's sum(h.q*q) for some sequence ({real}: h :N); note that, if a ({scalars}: h :N) has g = sum(h.q*q) hermitian, it is necessarily a ({real}: h :N), since each h(i) = g(b(i), b(i)) is real, thanks to g being hermitian. The fun thing about hermitian forms is that, for every hermitian form, there is some basis with respect to which it is diagonal; when the form is positive definite, each h(i) is positive and we can scale each b(i) by 1/√h(i) (and so each q(i) by √h(i)) to get a revised basis whose sum(q*q) is the form. Better yet, if you have a positive definite hermitian form and any other hermitian form, there is some basis with respect to which both are diagonal (and the positive definite one is sum(q*q), once we do some simple rescaling). I'm not going to prove that here, as it's one of the Big Theorems of linear algebra; I'm just going to use it.

That last has the important consequence that we can always put any hermitian operator also in a diagonal form; if its composite before the metric is sum(h.q*q) for some basis q of the dual, dual to a basis b of V, with sum(q*q) as the metric, then the operator is sum(q*q)\sum(h.q*q) = sum(b*b)·sum(h.q*q) = sum(h.b×q) = sum(: h(i).b(i)×q(i) ←i :). The trick, then, is to find the right basis with respect to which to describe our hermitian operator.

Note, however, that this (potentially) uses up our freedom to simplify by choice of basis. (We may have a little left, if the h(i) aren't all distinct values.) As a result, the basis that puts one hermitian operator in diagonal form (while the metric is in unit-diagonal form) needn't do the same for another; when the basis that diagonalises one hermitian operator does also diagonalise another, the two operators commute (if you compose either after the other, you get the same linear map; it's also diagonal, for the given basis, with each diagonal entry being the (scalar) product of the corresponding diagonal entries of the two composed operators). Indeed, the converse shall also prove to be true: if they commute, then they can indeed be simultaneously diagonalised.

Co-ordinates of hermitian transposes

If we have an antilinear (dual(S): g |S) and a basis b of S, with dual q, the co-ordinate form has g = sum(: g(b(i), b(j)).q(j)*q(i) ←[i, j] :) while that of †(g) has, as q(j)*q(i) co-ordinate, †(g, b(i), b(j)) = *(g(b(j), b(i))), swapping the indices of the co-ordinate of g and conjugating. Of course, when g is a hermitian form this makes no difference, since g = †(g). When we go on to put it into diagonal form, all the co-ordinates are real (zero off the diagonal, equal to own conjugate on the diagonal), hiding the conjugation entirely.

Given a hermitian form g, a linear (S: f |S) gave us (dual(S): transpose(f) |dual(S)) and f's g-hermitian transpose just uses g to bridge between S and its dual at either end of this, to get back to something (S: |S). Now let's look at the coordinates – given a basis b of S with dual q, that diagonalise g as sum(q*q):

g\†(g∘f) = g\transpose(f)∘g
= sum(b*b)·transpose(f)·sum(q*q)
= sum(b*b)·sum(: transpose(f)·q(i)*q(i) ←i :)
= sum(: b(j)*b(j) ←j :)·sum(: (q(i)·f)*q(i) ←i :)
= sum(: b(j)*b(j)·(q(i)·f)*q(i) ←[i,j] :)
= sum(: *(q(i)·f·b(j)).b(j)×q(i) ←[i,j] :)

Comparing this with f = sum(: (q(j)·f·b(i)).b(j)×q(i) ←[i, j] :), we see that the co-ordinates of f's g-hermitian transpose are conjugates of those from f, with the basis-indices swapped. So, as remarked above, although the g-hermitian transpose of f uses the plain transpose of f (not †(f)), its co-ordinates are conjugated as well as flipped.
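This can be checked numerically in a small sketch, assuming S is two-dimensional, g is the standard product (so the standard basis diagonalises it as sum(q*q)) and f is given by its matrix of co-ordinates; the names g, apply and conjugate_transpose are chosen for this illustration. The defining property g∘h = †(g∘f) of the g-hermitian transpose h of f amounts to g(h(u), v) = g(u, f(v)) for a hermitian g.

```python
def g(u, v):
    """Standard hermitian product: antilinear in u, linear in v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(matrix, v):
    """Apply a linear map, given as a matrix, to a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def conjugate_transpose(matrix):
    """Swap the two indices and conjugate each co-ordinate."""
    return [[matrix[j][i].conjugate() for j in range(len(matrix))]
            for i in range(len(matrix[0]))]


F = [[1 + 1j, 2 + 0j], [3j, 4 - 1j]]   # co-ordinates of some linear f
H = conjugate_transpose(F)             # candidate for f's g-hermitian transpose

u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

left = g(apply(H, u), v)               # g(h(u), v)
right = g(u, apply(F, v))              # g(u, f(v))
```

For these inputs left and right agree, as the co-ordinate computation above says they must.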

Orthodoxy generally deals with the hermitian conjugate in a context of matrices of co-ordinates with respect to a given basis and its dual, that diagonalise the metric, both of which are taken for granted and (mostly) left unmentioned. In such a context, the g-hermitian transpose of a matrix is (as noted above) simply obtained by turning rows into columns and columns into rows while conjugating each co-ordinate; however, in the case of endomorphisms, this hides the implicit change from one space to another, while eliding mention of the two applications of g that change the result back to the space it started in. As noted above, this conjugate and flip action on matrices (of co-ordinates) doesn't simply correspond to the hermitian transpose I describe above, although (when each is correctly worked through and applied) the two approaches end up describing the same things; in particular, what I mean by transpose is always a swap of the parameters of some function, regardless of how it transforms any particular representation of that function (for orthodoxy, the parameters are indices in a matrix; for the present treatment, the parameters are the vectors acted on by linear maps). The co-ordinate description works – but at the expense of obscuring important aspects of what's happening in the linear spaces. The (matrix) hermitian transpose is also known variously as the conjugate transpose, adjoint, transjugate or hermitian adjoint.

Observables: hermitian operators

We're now ready to return to our Hilbert space S in which our quantum states are represented by the unit sphere of some hermitian form. As specified above, a hermitian operator is a linear map (S: f |S) which, when composed before our hermitian product g, gives a hermitian composite, g∘f.

Now let's look at two eigenvectors, u and v with eigenvalues h and k: so k.v = f(v) and h.u = f(u). Consider *h.g(u, v) = g(f(u), v) = (g∘f)(u, v); but g∘f = †(g∘f), so *h.g(u, v) = *((g∘f)(v, u)) = *(g(f(v), u)) = g(u, f(v)), as g = †(g), whence *h.g(u, v) = k.g(u, v); thus either g(u, v) is zero or *h = k. The case u = v has h = k, so gives *h = k = h, which tells us that every eigenvalue is real. The space of eigenvectors with any given real eigenvalue is referred to as the eigenspace of that eigenvalue; it may have many independent directions; but any two eigenvectors with distinct eigenvalues are perpendicular.
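As an illustration, assume S is two-dimensional with the standard product as g, so that a hermitian operator is one whose matrix equals its conjugate transpose; the matrix F and its eigenvectors below were worked out by hand for this example (trace 5, determinant 4, hence eigenvalues 1 and 4) and the function names are chosen for this sketch.

```python
def g(u, v):
    """Standard hermitian product: antilinear in u, linear in v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(matrix, v):
    """Apply a linear map, given as a matrix, to a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


F = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]   # equal to its conjugate transpose

u, h = [1 - 1j, -1 + 0j], 1                # eigenvector with eigenvalue h
v, k = [1 - 1j, 2 + 0j], 4                 # eigenvector with eigenvalue k

u_scaled = apply(F, u) == [h * c for c in u]   # f(u) = h.u
v_scaled = apply(F, v) == [k * c for c in v]   # f(v) = k.v
overlap = g(u, v)                              # zero, as h and k differ
```

Both eigenvalues are real and, since they differ, the two eigenvectors are perpendicular.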

For each eigenvalue, we can thus choose an orthonormal basis of its eigenspace; by combining these from all eigenvalues, we can obtain an orthonormal basis of the whole Hilbert space, described as an eigenbasis for the given hermitian operator; this diagonalises both the metric and the hermitian operator.

Suppose we have two observables U, V and that they commute, that is U∘V = V∘U; consider an eigenvector v of V with eigenvalue k: k.U(v) = U(V(v)) = V(U(v)), so U(v), when non-zero, is also an eigenvector of V with eigenvalue k. Thus U preserves each eigenspace of V; and, conversely, V preserves each eigenspace of U. By considering U's restriction to each V-eigenspace, we can construct a U-eigenbasis of each V-eigenspace; this U-eigenbasis is then also a V-eigenbasis of that V-eigenspace; combining these across all V-eigenspaces we get a mutual eigenbasis for U and V; which we could equally have obtained by gathering V-eigenbases of U-eigenspaces. Such a basis puts both U and V into diagonal form, where it is self-evident they commute (since multiplying diagonal hermitian matrices just multiplies reals in corresponding positions, which gets the same result in either order); indeed, any two observables that are both diagonalised by some choice of basis do in fact commute. Thus observables commute precisely if they are simultaneously diagonalisable.

Measurement: orthogonal projection

When we measure some quantity, the quantity measured is an observable and the measured value is an eigenvalue of that observable; after the measurement, the system's state is an eigenvector in the eigenspace of that eigenvalue. The act of measuring effectively decomposes the prior state into components, each in a single eigenspace of the observable, and selects one of these components; the probability of each component being selected (and thus its eigenspace's eigenvalue being observed) is proportional to the metric's squared length of that component. We normally represent a state by a unit vector (with respect to our given metric), which leads to renormalising the selected component after; however, this is purely a convention of representation, for convenience (with which the probabilities, during an observation, are just the squared magnitudes of the components, which sum to one, saving the need to scale them proportionately to get probabilities).
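A minimal sketch of this bookkeeping, under simplifying assumptions: the state is a unit vector whose co-ordinates are already given with respect to an orthonormal eigenbasis of the observable, and eigenvalues[i] is the eigenvalue belonging to the i-th basis vector (repeats mark a shared eigenspace). The function name measure_outcomes and the sample numbers are chosen for this illustration.

```python
import math


def measure_outcomes(eigenvalues, state):
    """Map each eigenvalue to (probability, renormalised post-measurement state)."""
    outcomes = {}
    for value in set(eigenvalues):
        # Orthogonal projection: keep only the components in this eigenspace.
        component = [c if ev == value else 0j
                     for ev, c in zip(eigenvalues, state)]
        probability = sum(abs(c) ** 2 for c in component)
        if probability > 0:
            # Positive real rescaling back to a unit vector.
            scale = math.sqrt(probability)
            outcomes[value] = (probability, [c / scale for c in component])
    return outcomes


# Eigenvalue 1 has a two-dimensional eigenspace; eigenvalue 2 a one-dimensional one.
outcomes = measure_outcomes([1, 1, 2], [0.6, 0, 0.8j])
total = sum(probability for probability, _ in outcomes.values())
```

Because the prior state is a unit vector, the probabilities sum to one (up to rounding) without any further scaling; an outcome whose component is zero is simply never observed, so it is left out.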

For each eigenspace of the observable measured, we can define a hermitian operator (so, in fact, an observable) that acts as the identity on that eigenspace while mapping to zero all vectors in the observable's other eigenspaces. Such a mapping only has outputs in the eigenspace it selects, on which it acts as the identity, so composes with itself to give itself; it is transitive. Indeed, if a mapping f is transitive, so relates x to z whenever it relates x to y and y to z, we can state this as f(z) = f(f(z)) for all z, hence f acts as the identity on its outputs. Transitive mappings are (in general) known as idempotent and (at least on linear spaces) as projections.

Consider a linear map (V: f |V) on a linear space that acts as the identity on its outputs; these form a linear sub-space Im(f) of V. We also have Ker(f) = {v in V: f(v) = 0}, the kernel of f. Since f(f(v)) = f(v) for all v in V, f acts as the identity on Im(f), so Im(f) is f's eigenspace with eigenvalue 1; while Ker(f) is its eigenspace with eigenvalue 0. For any v in V, consider u = v −f(v); by linearity of f this has f(u) = f(v) −f(f(v)) and transitivity of f gives f(f(v)) = f(v), so f(u) = f(v) −f(v) = 0. Thus u is in Ker(f), f(v) is in Im(f) and we have v = u +f(v) as a sum of two vectors, one in Ker(f), the other in Im(f). Thus every projection has just two eigenspaces, Im(f) with eigenvalue 1 and Ker(f) with eigenvalue 0 (either of which may be trivial, as for the identity or the zero map); these span the whole space.
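As a small check on that decomposition, here is a sketch in Python with an idempotent matrix of my own choosing (deliberately not an orthogonal one, since nothing above has yet involved the metric). The names f, split and apply are illustrative only.

```python
def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

# A hypothetical projection on a two-dimensional real space: it maps
# (a, b) to (a + b, 0), so composing it with itself changes nothing.
f = [[1, 1],
     [0, 0]]

def split(v):
    """Write v as u + f(v), returning the pair (u, f(v))."""
    image_part = apply(f, v)
    kernel_part = [a - b for a, b in zip(v, image_part)]
    return kernel_part, image_part

v = [5, 7]
u, fv = split(v)

f_of_u = apply(f, u)     # u is in Ker(f), so this should be zero
f_of_fv = apply(f, fv)   # f acts as the identity on Im(f)
recombined = [a + b for a, b in zip(u, fv)]
```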

Now, without assuming our projection (S: f |S) is associated with an eigenspace of a hermitian operator, consider its interaction with our metric, (dual(S): g |S). If there are u in Ker(f) and v in Im(f) for which g(u, v) is non-zero, then g(f(u), v) = 0, as f(u) is zero, but g(u, f(v)) = g(u, v), as f(v) = v; so g(f(u), v) isn't conjugate to g(u, f(v)), so f isn't g-hermitian. Otherwise, every u in Ker(f) and v in Im(f) have g(u, v) = 0; two general members of S are u +v and x +y with u, x in Ker(f) and v, y in Im(f); these have g(u +v, f(x +y)) = g(u +v, y) = g(u, y) +g(v, y) = g(v, y) as u in Ker(f), y in Im(f) imply g(u, y) = 0; likewise, g(v, x) = *(g(x, v)) = *0 = 0 so g(f(u +v), x +y) = g(v, x +y) = g(v, x) +g(v, y) = g(v, y) and we have g(u +v, f(x +y)) = g(f(u +v), x +y), so f is g-hermitian. So a projection is g-hermitian precisely if its image is g-perpendicular to its kernel; I'll describe such a linear map as a projector for g or as a g-projector; and, as ever, skip the reference to g when context makes clear which metric is implicated, which it usually shall. A projector may also be referred to orthodoxly as an orthogonal projection.
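The contrast between a mere projection and a projector can be sketched in Python. I assume, for illustration only, that g is the standard hermitian product on co-ordinates, antilinear in its first argument, and I test hermiticity on the standard basis (which suffices because g is additive in each argument and f is linear). The two matrices and all the names are hypothetical examples.

```python
def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def g(u, v):
    """Standard hermitian product, antilinear in u (an assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def is_g_hermitian(m):
    """Check g(m(a), b) == g(a, m(b)) for all pairs of standard basis vectors."""
    n = len(m)
    basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return all(g(apply(m, a), b) == g(a, apply(m, b))
               for a in basis for b in basis)

# Idempotent but not hermitian: image spanned by (1, 0), kernel by (1, -1).
oblique = [[1, 1],
           [0, 0]]

# Idempotent and hermitian: image spanned by (1, 1), kernel by (1, -1).
orthogonal = [[0.5, 0.5],
              [0.5, 0.5]]

oblique_overlap = g([1, -1], [1, 0])     # kernel vector against image vector
orthogonal_overlap = g([1, -1], [1, 1])  # kernel vector against image vector

oblique_is_projector = is_g_hermitian(oblique)
orthogonal_is_projector = is_g_hermitian(orthogonal)
```

The oblique case has a non-zero overlap between kernel and image and fails the hermitian test; the orthogonal case has zero overlap and passes it.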

In the case of projectors onto eigenspaces of some hermitian operator, the eigenspaces are all mutually orthogonal so the projections are indeed projectors. Each such projector commutes with the operator, the composite acting simply as scaling by the eigenvalue on the given eigenspace while mapping all other eigenspaces to zero; two such projectors, for distinct eigenspaces of the same operator, have composite zero either way round, so also commute. The projectors onto eigenspaces of a given hermitian operator are all simultaneously diagonalised by any basis that diagonalises the operator.
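These properties can be checked exactly for a small example. The following Python sketch uses a hypothetical operator H with eigenvalues 3 and 1, whose eigenspaces are spanned by (1, 1) and (1, −1) respectively; the matrices P3 and P1 are the projectors onto those eigenspaces, written with exact fractions so the comparisons are not at the mercy of rounding.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices, each given as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(k, m):
    """Multiply every entry of the matrix m by the scalar k."""
    return [[k * x for x in row] for row in m]

def add(a, b):
    """Entry-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

h = Fraction(1, 2)

# Hypothetical hermitian operator and the projectors onto its eigenspaces.
H = [[2, 1], [1, 2]]
P3 = [[h, h], [h, h]]     # onto the eigenspace for eigenvalue 3
P1 = [[h, -h], [-h, h]]   # onto the eigenspace for eigenvalue 1

zero = [[0, 0], [0, 0]]

commutes_with_operator = matmul(H, P3) == matmul(P3, H)
composite_is_scaling = matmul(H, P3) == scale(3, P3)
distinct_compose_to_zero = matmul(P3, P1) == zero and matmul(P1, P3) == zero
rebuilt = add(scale(3, P3), scale(1, P1))
```

The last line rebuilds the operator as the sum of each eigenvalue times its projector, which is one way to read the eigenspace decomposition.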

Symmetries: unitary operators

There are many bases that put any given (positive-definite symmetric hermitian) metric into diagonal form; when we change between such bases, our co-ordinate representation of any given vector in the Hilbert space is transformed by multiplying it by a square matrix (albeit of possibly infinite dimension, so square is here used in a rather technical sense), each row and each column of which has 1 as its sum of squared moduli of entries. Each entry in such a matrix (describing a change of representation, without any change of the thing represented) is the contraction of a basis member of the initial basis with a member of the final dual basis; this is equally (depending on exactly how you represent each, possibly the transpose of) the matrix that describes the linear map, from the Hilbert space to itself, that maps each member of the new basis to a corresponding member of the old basis. Since swapping the two bases merely transposes the matrix, it doesn't actually matter which of the things represented we consider; so I chose to discuss the linear map rather than the change of co-ordinate representations.

Now, a linear map f that maps one basis to another, each of which diagonalises our metric g, necessarily maps each vector to one to which g ascribes the same length; indeed, it even maps any two input vectors u, v to two outputs f(u), f(v) whose value of g is the same, g(f(u), f(v)) = g(u, v). Now, g(x, f(v)) = †(g, f(v), x) = †(g∘f, v, x) so (: (: g(f(u), f(v)) ← v :) ← u :) = †(g∘f)∘f; so, when f maps one basis that diagonalises g to another, we get †(g∘f)∘f = g. When f satisfies this, I'll describe it as unitary for g, or g-unitary. We'll usually meet this where g is a hermitian product, so †(g∘f)∘f = transpose(f)∘†(g)∘f = transpose(f)∘g∘f.
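A numerical sketch of what being g-unitary amounts to, in Python: I assume g is the standard hermitian product on co-ordinates (antilinear in its first argument) and pick a particular 2×2 complex matrix f whose rows and columns each have unit sum of squared moduli. The matrix, the test vectors and the contrasting non-unitary map named stretch are all illustrative choices of mine; since the entries are floating-point, comparisons are made to within a small tolerance.

```python
def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def g(u, v):
    """Standard hermitian product, antilinear in u (an assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

# A hypothetical g-unitary map on a two-dimensional complex space.
f = [[0.6, 0.8j],
     [0.8j, 0.6]]

# For contrast, a map that stretches one axis and so is not g-unitary.
stretch = [[2, 0],
           [0, 1]]

u = [1 + 2j, 3]
v = [-1j, 2 - 1j]

before = g(u, v)
after_f = g(apply(f, u), apply(f, v))
after_stretch = g(apply(stretch, u), apply(stretch, v))

# Sum of squared moduli along each row and each column of f.
row_sums = [sum(abs(z) ** 2 for z in row) for row in f]
column_sums = [sum(abs(row[j]) ** 2 for row in f) for j in range(len(f))]
```

Up to rounding, after_f agrees with before while after_stretch does not, and every entry of row_sums and column_sums is 1.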

As above for hermitian transposes of linear maps, orthodox treatments are apt to express f being g-unitary in terms of something that looks like it means †(f) composed on the left of f, possibly via g, being equal to g; as before, this arises from orthodoxy focussing more on co-ordinates, where either transpose(f)'s or f's co-ordinates need to be conjugated when computing transpose(f)∘g∘f. In the present treatment, that conjugation is implicit in g being antilinear.

Hasty rumination

The phase-space of our system is encoded as a sub-manifold of the unit sphere in some Hilbert space. The Hilbert formalism describes its mechanics in any terms that apply to all vectors in the Hilbert space, implicating the model only to the tune of some metric – a positive-definite hermitian form whose unit sphere has our phase-space as a sub-manifold. The actual choice of which positive-definite hermitian form you use makes little practical difference, because the Hilbert space model implicitly renormalises all the time, with respect to the chosen metric. This means the manifolds you get from different choices of mutually-diagonalisable metrics (which would be any pair of them in the finite-dimensional case, but our Hilbert space may be infinite-dimensional; however, being a Hilbert space has some continuity consequences, which may mean all continuous metrics are mutually-diagonalisable; I can't remember) are homeomorphic under a linear map (diagonalised by some choice of basis that has both metrics diagonal) of the Hilbert space in which they are nominally embedded. So the Hilbert space's mechanics – of arithmetic with vectors, and thus of linear maps from the Hilbert space to itself, and how they interact with the metric in use – is largely orthogonal to the question of what co-ordinates it makes sense to use in our sub-manifold phase-space. Our choice to encode those via eigenvalues of eigenvectors of linear endomorphisms brings in constraints on endomorphisms in terms of how they relate to the metric, that ensure the endomorphisms have a symmetry with respect to the metric which, in turn, ensures they have eigenvalues that can encode the co-ordinates of our phase-space.

It would be worth examining carefully which bits of that (if any) are physics and which are abstract theory, developed in terms that could be understood regardless of whether they were capable of describing the physics.

Written by Eddy.