[the concept of a Hilbert space]… offers one of the best mathematical formulations of quantum mechanics. In short, the states of a quantum mechanical system are vectors in a certain Hilbert space, the observables are hermitian operators on that space, the symmetries of the system are unitary operators, and measurements are orthogonal projections.
Wikipedia, Hilbert space, as seen in 2009 on April the 6th, but transformed to use links that'll work from here.
Quantum mechanics has enough weirdness to it that its formalisation requires a significantly richer structure than the intuitively tractable three-dimensional space of real displacements that serves us so well in the description of the macroscopic world we inhabit. The Hilbert space is a structure rich enough to support the full panoply of quantum complications, yet retains as much as can be hoped for of the intuitive tractability of our familiar real three-dimensional geometry. Let me start with a very brief over-view of all the technical jargon used in Wikipedia's account; then I can devote sections to the parts thereof. Along the way, I'll get the chance to restate orthodoxy in my preferred forms.
A Hilbert space is a topological continuum which is also a vector space
over the complex numbers, equipped
with the most sensible length-like notion a complex vector space can have, a
continuous positive-definite hermitian product. This last
can be encoded as an invertible mapping, called the metric, from the Hilbert
space to its dual, which enables us to divide
any other hermitian product
by the metric; the result, known as a hermitian operator, is a linear map from
the Hilbert space to itself. The natural equivalent of an isometry
(length-preserving transformation) in this context is a unitary operator.
For any given hermitian operator, the Hilbert space can be decomposed into orthogonal sub-spaces on each of which the operator acts simply as a scaling; each such sub-space is described as an eigenspace of the operator, the associated scaling is known as the eigenvalue for that space and each non-zero vector in the space is termed an eigenvector of the operator. This eigenspace decomposition lets us write any vector in the space as a sum of eigenvectors; one can define an orthogonal projection onto any eigenspace as a mapping which decomposes its input vector in this way, discards eigenvectors not in the selected eigenspace and returns what remains.
In quantum mechanics, the Hilbert space generally arises as a sub-space of a more general space of wave functions of a system, namely the span of those that are solutions of the dynamical equation – archetypically Schroedinger's equation – governing the system. Solutions of the system's dynamical equations are identified with unit vectors in the Hilbert space. Superpositions of possible solutions are represented by linear combinations of their corresponding unit vectors, followed by re-scaling to obtain a unit vector to represent the superposition.
Each real-valued quantity one can measure (e.g. total energy, or a single
component of its momentum) on the system corresponds to a hermitian operator on
the Hilbert space. Actually measuring such a quantity forces the system into a
state represented by an eigenvector of the measured quantity's associated
operator – such a state is called an eigenstate
of the operator,
associated with the same eigenvalue as the eigenvector in question – and
yields the associated eigenvalue as measured value. The action of observing
such a measurement projects the prior state vector orthogonally onto the
eigenspace for the observed value; this selects the component of the prior
state's unit vector, when decomposed into a sum of eigenvectors, in the given
eigenspace. The probability of observing any given value is simply the squared
magnitude of this component. A positive real scaling can then be applied to
this component to make it a unit vector once more.
I shall deliberately gloss over the topological details that make a Hilbert
space behave relatively sanely, compared to infinite-dimensional vector spaces
in general; in calling it a continuum I tacitly assert that it
is complete
in the necessary sense, but all that actually matters is that
it behaves enough like a finite-dimensional vector space to be
intelligible.
Our Hilbert space, S, is a vector space over the complex numbers; this means we can add members of our Hilbert space and/or scale them by arbitrary complex numbers; the results shall always be likewise members of the Hilbert space. As for real vector spaces, we can define linearity of mappings from the Hilbert space to some other (complex) linear space in terms of the mapping respecting addition and scaling: (:f|S) is linear precisely if, for every u, v in S and complex k, f(u+k.v) = f(u) +k.f(v). However, the complex numbers also support a conjugation (the real-linear map induced by preserving the real line but swapping the square roots of −1), *k ←k, allowing us to define a related (and, as it turns out, quite useful) partner notion to go with linearity: a mapping (:f|S) is described as antilinear precisely if, for every u, v in S and complex k, f(u+k.v) = f(u) +*k.f(v). (In that last term, the * only applies to k – there isn't necessarily any meaning to conjugating f(v) – so *k.f(v) should be read as (*k).f(v); I won't clutter the text with the extra parentheses.)
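To make the linear/antilinear contrast concrete, here is a minimal numpy sketch (the three-dimensional complex space and the particular functional w are merely illustrative stand-ins of my own choosing, not part of the formalism):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3) + 1j * rng.normal(size=3)   # a fixed member of dual(C^3)
    u = rng.normal(size=3) + 1j * rng.normal(size=3)
    v = rng.normal(size=3) + 1j * rng.normal(size=3)
    k = 2.0 - 3.0j

    f = lambda x: w @ x             # linear: f(u +k.v) = f(u) +k.f(v)
    a = lambda x: np.conj(w @ x)    # * composed after f: antilinear

    assert np.isclose(f(u + k * v), f(u) + k * f(v))
    assert np.isclose(a(u + k * v), a(u) + np.conj(k) * a(v))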
Given linearity we can, as for real vector spaces, define
the dual of a Hilbert space to be the space of linear maps
from it to its scalars, the complex numbers; so dual(S) = {linear maps
({complex}:|S)}. We could also define an anti-dual
to be the
corresponding space of antilinear maps; however, since each member of dual(S)
is a mapping ({complex}::), simply composing * after it turns it into an
anti-linear map ({complex}:|S); likewise, as * is self-inverse, composing * after any such anti-linear map makes it linear, so turns it into a member of dual(S). Thus composing * after a map identifies dual(S) and S's anti-dual, which we can thus simply write as {*&on;w: w in dual(S)}, making a separate name for it uninteresting. The mapping *&on;w ←w is antilinear; as it identifies the anti-dual and dual, it's iso, so it's an anti-isomorphism.
In a real linear space, we obtain a length notion in terms of a mapping
from our space to its dual; after consuming its first input we have a member
of the dual which can consume a second input to be construed as inner
product
of the two inputs. We require it to be symmetric – that is,
swapping the order of the inputs shouldn't change the final real output
– and, in the simple case, to always give a positive value if the same
non-zero vector is supplied as both inputs; that positive value's square root
is the length
of the vector. We can do similarly for our Hilbert space: but note that, where the real space's equivalent consumed each input linearly, we can now choose between linear and anti-linear for each input. If we can
arrange for g(u,u) to be a positive real for some u, we'll also want g(k.u,
k.u) to be a positive real for any complex scalar k. If g consumes both
inputs linearly we get k.k.g(u,u), if it consumes both antilinearly we get
*k.*k.g(u,u); neither of these is real except for a limited range of values
for k. However, k.*k is always real and positive, so if g consumes one input
linearly and the other antilinearly, we win. If it consumes the second input
antilinearly, we have a linear map from S to its anti-dual; composing this
with the usual antiisomorphism, noted above, from there to the dual turns this
into an antilinear map to the dual, which was the other possibility for
consuming one input linearly and one antilinearly. So, if we want a
length-like notion, we need an anti-linear map from S to its
dual.
In the real case, we required swapping the order of inputs to not change
the final real output. However, if we consume one input linearly and the
other antilinearly, this constraint becomes infeasible: if we scale one of the
inputs by some complex k, the final output is scaled by k if the input is
consumed linearly but *k if antilinearly; swapping the order of inputs changes
which of these scalings is applied. So the appropriate symmetry we can
require is that swapping the order of inputs should conjugate the
answer. In particular, when the same vector is used as both inputs, swapping
order of inputs can't change the value, since it hasn't changed the
expression, but does conjugate it; so the result is self-conjugate,
i.e. real. Thus selecting this conjugate-symmetry automatically gives us real
self-products. We can then define positive definite, just as in the
real case, to mean that the final output must be positive if the same non-zero
vector is used as both inputs.
So define a hermitian product to be an antilinear map (dual(S): x |S) for which x(u, v) = *x(v, u). We can then use a positive-definite hermitian product g to define our notion of length, just as in a real linear space, via length = ({non-negative reals}: √g(u, u) ←u |S).
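For a concrete (finite-dimensional, purely illustrative) sketch of such a product and the length it yields, using C^3 in place of S and a matrix G of my own making:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    G = A.conj().T @ A + np.eye(n)          # an illustrative positive-definite hermitian matrix

    def g(u, v):                            # the product: antilinear in u, linear in v
        return u.conj() @ G @ v

    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)

    assert np.isclose(g(u, v), np.conj(g(v, u)))        # x(u, v) = *x(v, u)
    self_product = g(u, u)
    assert np.isclose(self_product.imag, 0) and self_product.real > 0
    length_of_u = np.sqrt(self_product.real)            # √g(u, u)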
The zero member of dual(S) maps every vector in S to 0 and we know each g(u) maps at least u to non-zero, unless u is zero, so the only input to g which yields the zero member of dual(S) is the zero member of S; consequently, distinct inputs to g yield distinct outputs (as g, though antilinear, still maps the difference between inputs to a difference in outputs), making g a monic mapping. One consequence of the Hilbert space being a continuum (formally, topologically complete) is that any monic continuous linear or antilinear map between it and its dual is invertible (this would be automatic if the space's dimension were finite, as S and its dual would be isomorphic). We thus get an inverse (S:|dual(S)) for g; this is trivially antilinear, like g. We can embed S in dual(dual(S)) as (: ({scalars}: w(v) ←w |dual(S)) ←v |S), enabling us to interpret g's inverse as an antilinear map from dual(S) to its dual; this is in fact conjugate-symmetric, like g, so g's inverse is a hermitian product on dual(S); it is also positive-definite, although this won't interest me much here. I'll use g\f to denote the composite of g's inverse after (i.e. to the left of) any suitable f and h/g for the matching composite on the right, for any suitable h; if I need to refer to g's inverse directly, I'll thus use 1/g or g\1, interpreting 1 as a relevant identity linear map in each case.
So now consider any other hermitian product, x, on S; composing g's inverse on its left we get g\x. For u in S and complex k, antilinear x maps k.u to *k.x(u) in dual(S) and going backwards through antilinear g maps this to k.(g\x)(u) in S. Since both x and g's inverse respect addition we can infer that so does their composite, hence g\x is simply linear (S:|S). Composing g after (i.e. on the left of) this linear map gets us x back. Those linear maps (S:|S) which, when composed before g, give hermitian products encode the observables of our system: they are termed hermitian operators. The attentive might notice that we should properly term them g-hermitian operators, since their specification depends on g.
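In the same finite-dimensional sketch, with G standing in for g and X for x (both hypothetical matrices), g\x is just a matrix division; a quick numpy check that the result composes back to x and is self-adjoint with respect to g:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    G = A.conj().T @ A + np.eye(n)          # the metric g, as a matrix
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    X = X + X.conj().T                      # another hermitian product x, as a matrix

    F = np.linalg.solve(G, X)               # g\x: the g-hermitian operator

    assert np.allclose(G @ F, X)            # composing g after it gives x back
    g = lambda u, v: u.conj() @ G @ v
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.isclose(g(F @ u, v), g(u, F @ v))   # F is self-adjoint with respect to g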
Above, I've defined an antilinear (dual(S): x |S) to be a hermitian
product precisely if every x(u, v) = *x(v, u); now let's pause to consider a
general antilinear (dual(S): x |S), not necessarily a hermitian product, and
the induced (: ({scalar}: *x(v, u) ←v |S) ←u |S) that would be equal
to x if it were a hermitian product. I'll refer to this induced mapping as
the hermitian transpose
of x and denote it †(x);
so we have †(x) antilinear (dual(S): |S) precisely when x is. (The notation here adapts one of several ways, x†, that orthodoxy denotes the hermitian transpose; I've also seen x^H.) A hermitian product (also called a hermitian form) is thus an antilinear map from a Hilbert space to its dual that's equal to its hermitian transpose.
Now, for some invertible antilinear (dual(S): g |S), consider a linear map
(S: f |S); for it to be g-hermitian, we would need †(g) = g and
†(g&on;f) = g&on;f. The latter can be re-written as f =
g\†(g&on;f); it's only really of interest when g is a hermitian
product, but it's defined whenever g is invertible. By analogy with the
g-hermitian operator we get when it's symmetric, I'll refer to
g\†(g&on;f) as the g-hermitian transpose
of f;
like f, it is a linear map (S: |S).
With linear maps, transpose(h&on;k) is transpose(k)&on;transpose(h); so we can hope to be able to re-write †(g&on;f) in terms of †(g) – regardless of whether this is equal to g – and some sort of transpose of f. I can, in fact, apply the above definition of † to f, since I carefully left out constraints, provided I select the right spaces to take the two inputs from; to be able to write *f(v, u), we need v to be in S, putting f(v) in S; for u to be an input to it, we need to interpret f(v) as in dual(dual(S)), as above, so u must be in dual(S); in which case the output f(v, u) is indeed scalar, so can be conjugated. So †(f) = (: ({scalars}: *u(f(v)) ← v |S) ←u |dual(S)) = (: *&on;u&on;f ←u :); but it's antilinear in both inputs, so antilinear from dual(S) to the antidual of S. So †(f) isn't a good candidate for f's factor in a decomposition of †(g&on;f); it doesn't accept †(g)'s outputs as inputs, it would produce the wrong outputs (in the anti-dual) if it did and (when composed after antilinear †(g)) would give a linear map rather than the antilinear †(g&on;f).
So let's look at the constraints †(f) violated there: we want to compose it after †(g), so it needs to accept inputs from dual(S); we want the composite to be, like †(g), antilinear; and we need the outputs to be in dual(S). So what we need is a linear (dual(S): |dual(S)) – and that's exactly what the transpose of f is, once we use the usual trick of reading S via its embedding in dual(dual(S)); just leave off the conjugation in †(f) and we have transpose(f) = (dual(S): w&on;f ←w |dual(S)). So let's see if that gets what we want: for given u, v in S,
(transpose(f)&on;†(g))(u, v) = transpose(f)(†(g, u), v) = f(v, †(g, u))
in which f(v), in S, is being given †(g, u), in dual(S), as input; so we interpret f(v) as being in dual(dual(S)), giving
f(v, †(g, u)) = †(g, u, f(v)) = *(g(f(v), u)) = *((g&on;f)(v, u)) = †(g&on;f)(u, v)
so, indeed, transpose(f)&on;†(g) = †(g&on;f). So we can write our g-hermitian transpose of f, g\†(g&on;f) as g\transpose(f)&on;†(g); and, when g is a hermitian form, this just reduces to g\transpose(f)&on;g. Note, in passing, that the co-ordinates of this will (thanks to the antilinearity of g) be conjugated co-ordinates of f (when our co-ordinates arise from a basis that diagonalises g), even though we've only simply transposed f.
This last leads to orthodoxy, which focusses rather more on co-ordinates, denoting f's g-hermitian transpose in some way that looks like it's applied, to f, the same operator that corresponds to † as applied to g; see above for how that isn't valid in the present expression of the subject matter (since †(f) can sensibly be defined and isn't the same thing as either g\†(g&on;f) or transpose(f), although it looks like the latter, with its co-ordinates conjugated, when g is diagonal).
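In co-ordinates (a finite-dimensional numpy sketch, with G standing for g and F for f, both made up for illustration), g\transpose(f)&on;g works out as matrix-dividing G into the conjugated-and-transposed matrix of f composed with G; the g-hermitian operators are then exactly the fixed points of that operation:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 3
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    G = A.conj().T @ A + np.eye(n)                       # a hermitian product's matrix
    F = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # any linear map (S: |S)

    Fdag = np.linalg.solve(G, F.conj().T @ G)            # g\transpose(f)&on;g in co-ordinates

    g = lambda a, b: a.conj() @ G @ b
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.isclose(g(Fdag @ u, v), g(u, F @ v))       # the adjoint relation this transpose encodes

    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    X = X + X.conj().T                                   # a hermitian form ...
    F2 = np.linalg.solve(G, X)                           # ... so F2 is a g-hermitian operator
    assert np.allclose(np.linalg.solve(G, F2.conj().T @ G), F2)  # equal to its g-hermitian transpose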
In any linear space V over {scalars}, for any set B of members of V, we
can define the span
of B to be:
{sum(: h(i).b(i) ←i :n): n natural, ({scalars}: h :n), monic (B: b :n)},
i.e. the set of values of finite sums of scalar multiples of members
of B. A member of the span might arise more than one way if sum(h.b) =
sum(k.b) for two distinct ({scalars}: :n) h and k and a single (B: b :n) with
n natural; subtracting one from another, we then get sum((h −k).b) = 0
with ({scalars}: h −k :n) non-zero on at least some inputs in n; any
such mapping ({scalars}: r :n), with at least some non-zero output, for which
sum(: r(i).b(i) ←i :n) = 0 for some monic (B: b :n) is known as a linear
dependence
among the members of B (specifically, the b(i) corresponding to
non-zero r(i) values); and B is described as linearly
independent
if there are no linear dependencies among its members.
Note that, if we have ({scalars}: h :n), (B: b :n), ({scalars}: k :m) and (B: c :m) with equal sum(h.b) = sum(k.c), we can rearrange these as a single ({scalars}: j :n+m), (B: a :n+m), e.g. for i in n, j(i) = h(i) and a(i) = b(i) while, for i in m, j(i+n) = k(i) and a(i+n) = c(i). This need not make a monic, even if b and c were, as b and c might have an output in common. We can, of course, factor a via some monic (B: x :p) with p natural (and no bigger than n+m) that has the same outputs, by mapping each index in n+m to some index in p. Provided each of b and c is monic, each of h and k can then be represented by a ({scalars}: :p) mapping, that uses the same re-indexing as b or c, via j's indices, to p's. This reduces our case with distinct (B: b :n) and (B: c :m) to one with a shared (B: x :p), as used in the specifications. If we try to construct a ({scalars}: y :p) corresponding to ({scalars}: j :n+m) by mapping each index in n+m to the same index i in p, y(i) needs to be equal to each entry in j at an index in n+m that comes via i; which can only happen if j's entries are equal whenever a's are; avoiding this complication is the reason for insisting on b being monic in the specifications, above.
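A quick numeric illustration of a linear dependence (the particular vectors are mine, chosen so that one is a combination of the other two):

    import numpy as np

    B = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0],
                  [0.0, 0.0, 0.0]])          # columns are b(0), b(1), b(2); b(2) = 2.b(0) +3.b(1)

    assert np.linalg.matrix_rank(B) < B.shape[1]        # so the three columns are linearly dependent
    r = np.array([2.0, 3.0, -1.0])                      # a linear dependence: sum(r(i).b(i)) = 0
    assert np.allclose(B @ r, 0)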
A linear space is finite-dimensional if there is some finite set of its
members whose span is the whole space; and one can then show that there is
some subset of this that is furthermore linearly independent; and that any two
finite linearly independent spanning sets of a vector space have the same
number of members; this number is known as the dimension of the space. Each
such linearly-independent spanning set is called a basis
of the space. In the case of our Hilbert space, however, some of that reasoning breaks down; even when we have a (countably) infinite subset
of the space, the best we can hope for is that the span is dense in
the space – that is, for any vector in the space and any positive
distance (no matter how small), some member of the span is within that
distance of the given vector. So, in the infinite-dimensional case, we define
a basis to be a linearly independent countable set of vectors whose span is
dense in the Hilbert space. That it's countable means we can express the set
as the set of outputs of some function (V: b :{naturals}) and we choose to use
such a b for which, for each right value n of b, every member of n (i.e. every
natural less than n) is also a right value of b; thus (:b|) is either a
natural or the whole of {naturals}. So, although a basis is formally a set of
vectors, we invariably deal with it as a sequence of vectors (V: b |N) with N
either a natural or the whole of {naturals}. In support of this, we extend the
definition of span, above, to accept not just sets of members of V but also
mappings to V, so that span(b) = span((|b:)), the span of the outputs of a
mapping.
When we have a basis (:b|N) of a vector space V, a member of its span is always sum(: h(i).b(i) ←i :n) for some (: h :n) with n a natural subsumed by
N; and, since b is linearly independent, there is exactly one such h for any
given vector in span(b); this lets us define, for each i in N, a mapping from
span(b) to scalars, which maps any given vector sum(b.h) in span(b) to h(i).
The continuum properties of the Hilbert space ensure that this mapping is
continuous and thus that we can extend it to the whole of V, not just span(b),
so that we get a member of dual(V) associated with each b(i), that maps each
member sum(h.b) of span(b) to h(i); let (dual(V): q |N) be the mapping whose
q(i) is this member of dual(V), for each i in N. This mapping q is known
as the dual of
b; and, indeed, b is then the dual of q,
equally. (Note that, though related, this is not the same dual as dual(V) is
of V.) In span(b), we have q(i, sum(h.b)) = h(i), so each v in span(b) has v
= sum(V: q(i, v).b(i) ← i |N); although (when our Hilbert space has
infinite dimension) this is an infinite sum (over all of N), we know v is
sum(h.b) for some (:h:n) with n finite, so all but finitely many terms in our
infinite sum are zero. For the case of general v in (infinite-dimensional) V,
not necessarily in span(b), however, we do indeed get an infinite sum; but
continuity of our metric then ensures that this infinite sum is indeed (in a
suitable sense) convergent and we can, with a clean conscience, write v =
sum(: b(i).q(i, v) ←i |N) for all v in V.
The tensor product gives us a multiplication between members of V and its dual for which u×w = (V: u.w(v) ←v |V); this is a linear map (V: |V). When we write the above v = sum(: b(i).q(i, v) ←i :) in this form, we find v = sum(: b(i)×q(i) ←i |N)(v) and thus sum(b×q) is the identity on V.
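Concretely (a numpy sketch on C^4, with the basis vectors as the columns of an invertible matrix B of my choosing), the dual basis shows up as the rows of B's inverse, and sum(b×q) really is the identity:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 4
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # columns: a basis b of C^n
    Q = np.linalg.inv(B)                                          # rows: its dual q

    assert np.allclose(Q @ B, np.eye(n))                 # q(i, b(j)) = 1 if i = j, else 0
    ident = sum(np.outer(B[:, i], Q[i, :]) for i in range(n))     # sum(b×q)
    assert np.allclose(ident, np.eye(n))                 # ... is the identity on V
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.allclose(sum((Q[i, :] @ v) * B[:, i] for i in range(n)), v)   # v = sum(q(i, v).b(i))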
The tensor product also lets us multiply members of V's dual together in the same way; now, both w and u are in dual(V) so u×w = (dual(V): u.w(v) ←v :V) now produces its outputs in dual(V) rather than V. That'll let us form, for a given basis b with dual q, a mapping sum(q×q) which maps each b(i) to the matching q(i) and extends this mapping linearly to map the whole of V to (as it happens) the whole of dual(V); it should be noted, however, that this mapping (unlike sum(b×q) as identity) depends on choice of basis; if we choose a different basis in place of b, the tensor-square of its dual, sum(q×q), will be a different linear map.
Now, for u, w in dual(V), u×w is linear (dual(V): |V); but we're more interested in hermitian forms, which are antilinear (dual(V): |V); for these, we need to conjugate the (scalar) output of w, so we'd want u×(*&on;w) = (dual(V): u.*(w(v)) ←v |V). Since * (presently) only acts on scalars, I can overload it as a binary operator whose left operand is any vector or tensor quantity and whose right operand is any mapping with scalar outputs, so that u*w is this u×(*&on;w). With this, any basis b with dual q yields a sum(q*q) which maps b(i) to q(i) and extends this anti-linearly to map the whole of V to (as it happens) the whole of dual(V). When we apply this to two members u, v of V we get sum(q*q)(u, v) = sum(dual(V): q(i).*(q(i, u)) ←i :)(v) = sum({scalars}: q(i, v).*(q(i, u)) ←i :), which is manifestly linear in v and antilinear in u. If we conjugate its output, we switch which of q(i, v) and q(i, u) is conjugated and thus get sum(q*q)(v, u), thereby showing that sum(q*q) is conjugate-symmetric. If we give it u = v, we get sum(: q(i, v).*(q(i, v)) ←i :) in which each term is h.*h for some scalar h; this is real and ≥ 0, with equality only if h is 0. So sum(q*q)(v, v) is real and ≥ 0, with equality only if every q(i, v) is zero, i.e. when v is zero; thus sum(q*q) is positive-definite.
Just as sum(q*q) is a positive-definite hermitian form on V, sum(b*b) is a positive-definite hermitian form on dual(V). Since the former is an antilinear map (dual(V): |V) and the latter is antilinear (V: |dual(V)), we can compose them in either order; a composite of two antilinears is linear, so the composite is, in each case an identity, one way round on V, the other way round on dual(V). Consequently, sum(q*q) and sum(b*b) are mutually inverse.
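Continuing the same co-ordinate sketch, here are sum(q*q) and sum(b*b) written out as (antilinear) mappings and checked to be conjugate-symmetric, positive and mutually inverse:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # columns: basis b
    Q = np.linalg.inv(B)                                          # rows: dual basis q

    qq = lambda u: np.conj(Q @ u) @ Q      # sum(q*q): antilinear from V to dual(V), as a row
    bb = lambda w: B @ np.conj(w @ B)      # sum(b*b): antilinear from dual(V) to V

    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.isclose(qq(u) @ v, np.conj(qq(v) @ u))     # conjugate-symmetric
    val = qq(u) @ u
    assert np.isclose(val.imag, 0) and val.real > 0      # real, positive for non-zero u
    assert np.allclose(bb(qq(u)), u)                     # the two forms are mutually inverse
    assert np.allclose(qq(bb(qq(v))), qq(v))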
We have sum(b×q) as the identity on V; dually, sum(q×b) is the identity on dual(V); so let's compose these before and after some general hermitian form g; we'll get
g = sum(q×b)·g·sum(b×q) = sum(: (q(j)×b(j))·g·(b(i)×q(i)) ←[i, j] :)
in which g·b(i) is just g(b(i)); as g acts antilinearly on its first argument, any scaling we apply to b(i) gets conjugated; and b(i)×q(i) is a linear map that does indeed apply a scaling (q(i)'s output) to b(i); so g·(b(i)×q(i)) is just g(b(i))×(*&on;q(i)); so
g = sum(: g(b(i), b(j)).q(j)*q(i) ←[i, j] :)
in which each g(b(i), b(j)) is a scalar. This, indeed, just shows that the q(j)*q(i) for various i, j equip us with a basis of hermitian forms on V.
For a general hermitian form g and basis b with dual q, the expression of
g in terms of this basis of hermitian forms involves q(j)*q(i) for diverse i,
j in N – in contrast to our sum(q*q), which only has terms with i = j.
A hermitian form is described as diagonal
with respect
to a basis b, with dual q, precisely if it's in the closure of the span of
q*q, i.e. it's sum(h.q*q) for some sequence ({real}: h :N); note that, if a
({scalars}: h :N) has g = sum(h.q*q) hermitian, it is necessarily a ({real}: h
:N), since each h(i) = g(b(i), b(i)) is real, thanks to g being hermitian.
The fun thing about hermitian forms is that, for every hermitian form, there
is some
basis with respect to which it is diagonal; when the form is
positive definite, we can scale each b(i) by 1/√h(i) (equivalently, each q(i) by √h(i)) to get a revised
basis whose sum(q*q) is the form. Better yet, if you have a positive definite
hermitian form and any other hermitian form, there is some basis with respect
to which both are diagonal (and the positive definite one is
sum(q*q), once we do some simple rescaling). I'm not going to prove that
here, as it's one of the Big Theorems of linear algebra; I'm just going to use
it.
That last has the important consequence that we can always put any hermitian operator also in a diagonal form; if its composite before the metric is sum(h.q*q) for some basis q of the dual, dual to a basis b of V, with sum(q*q) as the metric, then the operator is sum(q*q)\sum(h.q*q) = sum(b*b)·sum(h.q*q) = sum(h.b×q) = sum(: h(i).b(i)×q(i) ←i :). The trick, then, is to find the right basis with respect to which to describe our hermitian operator.
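Here's a finite-dimensional sketch of that diagonalisation (G and X are illustrative matrices for the metric and the other hermitian form; the Cholesky-then-eigh route is just one standard way to realise the Big Theorem numerically):

    import numpy as np

    rng = np.random.default_rng(6)
    n = 4
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    G = A.conj().T @ A + np.eye(n)                       # metric: positive-definite hermitian
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    X = X + X.conj().T                                   # any other hermitian form

    L = np.linalg.cholesky(G)                            # G = L . conjugate-transpose(L)
    Linv = np.linalg.inv(L)
    h, U = np.linalg.eigh(Linv @ X @ Linv.conj().T)      # a standard hermitian eigenproblem
    B = Linv.conj().T @ U                                # columns: a basis diagonalising both forms

    assert np.allclose(B.conj().T @ G @ B, np.eye(n))    # the metric becomes sum(q*q)
    assert np.allclose(B.conj().T @ X @ B, np.diag(h))   # the other form becomes sum(h.q*q), h real
    F = np.linalg.solve(G, X)                            # the hermitian operator g\x
    assert np.allclose(F @ B, B @ np.diag(h))            # its eigenvectors are the b(i), eigenvalues h(i)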
Note, however, that this (potentially) uses up our freedom to simplify by choice of basis. (We may have a little left, if the h(i) aren't all distinct values.) As a result, the basis that puts one hermitian operator in diagonal form (while the metric is in unit-diagonal form) needn't do the same for another; when the basis that diagonalises one hermitian operator does also diagonalise another, the two operators commute (if you compose either after the other, you get the same linear map; it's also diagonal, for the given basis, with each diagonal entry being the (scalar) product of the corresponding diagonal entries of the two composed operators). Indeed, the converse shall also prove to be true: if they commute, then they can indeed be simultaneously diagonalised.
If we have an antilinear (dual(S): g |S) and a basis b of S, with dual q, the co-ordinate form has g = sum(: g(b(i), b(j)).q(j)*q(i) ←[i, j] :) while that of †(g) has, as q(j)*q(i) co-ordinate, †(g, b(i), b(j)) = *(g(b(j), b(i))), swapping the indices of the co-ordinate of g and conjugating. Of course, when g is a hermitian form this makes no difference, since g = †(g). When we go on to put it into diagonal form, all the co-ordinates are real (zero off the diagonal, equal to their own conjugates on the diagonal), hiding the conjugation entirely.
Given a hermitian form g, a linear (S: f |S) gave us (dual(S): transpose(f) |dual(S)) and f's g-hermitian transpose just uses g to bridge between S and its dual at either end of this, to get back to something (S: |S). Now let's look at the co-ordinates – given a basis b of S with dual q, that diagonalise g as sum(q*q):
g\transpose(f)&on;g = sum(b*b)·transpose(f)·sum(q*q) = sum(: *(q(j)·f·b(i)).b(i)×q(j) ←[i, j] :)
Comparing this with f = sum(: (q(j)·f·b(i)).b(j)×q(i) ←[i, j]), we see that the co-ordinates of f's g-hermitian transpose are conjugates of those from f, with the basis-indices swapped. So, as remarked above, although the g-hermitian transpose of f uses the plain transpose of f (not †(f)), its co-ordinates are conjugated as well as flipped.
Orthodoxy generally deals with the hermitian
conjugate in a context of matrices of co-ordinates with respect to a given basis
and its dual, that diagonalise the metric, both of which are taken for granted
and (mostly) left unmentioned. In such a context, the hermitian g-transpose of
a matrix is (as noted above) simply obtained by
turning rows into columns and columns into rows while conjugating each
co-ordinate; however, in the case of endomorphisms, this hides the implicit
change from one space to another, while eliding mention of the two applications
of g that change the result back to the space it started in. As noted above,
this conjugate and flip
action on matrices (of coordinates) doesn't
simply correspond to the hermitian transpose I describe above, although (when
each is correctly worked through and applied) the two approaches end up
describing the same things; in particular, what I mean by transpose
is always a swap of the parameters of some function, regardless of how
it transforms any particular representation of that function (for orthodoxy, the
parameters are indices in a matrix; for the present treatment, the parameters
are the vectors acted on by linear maps). The co-ordinate
description works
– but at the expense of obscuring important
aspects of what's happening in the linear spaces. The (matrix) hermitian
transpose is also known variously as
the conjugate
transpose, adjoint, adjugate or hermitian adjoint.
We're now ready to return to our Hilbert space S, in which our quantum states are represented by points of the unit sphere of some hermitian form. As specified above, a hermitian operator is a linear map (S:|S) which, when composed before our hermitian product g, gives a hermitian composite.
Now let's look at two eigenvectors, u and v with eigenvalues h and k: so
k.v = f(v) and h.u = f(u). Consider *h.g(u, v) = g(f(u), v) = (g&on;f)(u, v);
but g&on;f = †(g&on;f), so *h.g(u, v) = *((g&on;f)(v, u)) = *(g(f(v),
u)) = g(u, f(v)), as g = †(g), whence *h.g(u, v) = k.g(u, v); thus
either g(u, v) is zero or *h = k. The case u = v has h = k, so gives *h = k =
h, which tells us that every eigenvalue is real. The space of eigenvectors
with any given real eigenvalue is referred to as
the eigenspace
of that eigenvalue; it may have many
independent directions; but any two eigenvectors with distinct
eigenvalues are perpendicular.
For each eigenvalue, we can thus choose an orthonormal basis of its
eigenspace; by combining these from all eigenvalues, we can obtain an
orthonormal basis of the whole Hilbert space, described as
an eigenbasis
for the given hermitian operator; this
diagonalises both the metric and the hermitian operator.
Suppose we have two observables U, V and that they commute, that is U&on;V = V&on;U; consider an eigenvector v of V with eigenvalue k: k.U(v) = U(V(v)) = V(U(v)), so U(v) is also an eigenvector of V with eigenvalue k. Thus U preserves each eigenspace of V; and, conversely, V preserves each eigenspace of U. By considering U's restriction to each V-eigenspace, we can construct a U-eigenbasis of each V-eigenspace; this U-eigenbasis is then also a V-eigenbasis of that V-eigenspace; combining these across all V-eigenspaces we get a mutual eigenbasis for U and V; which we could equally have obtained by gathering V-eigenbases of U-eigenspaces. Such a basis puts both U and V into diagonal form, where it is self-evident they commute (since multiplying diagonal hermitian matrices just multiplies reals in corresponding positions, which gets the same result in either order); indeed, any two observables that are both diagonalised by some choice of basis do in fact commute. Thus observables commute precisely if they are simultaneously diagonalisable.
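A sketch of that construction in numpy (using the ordinary complex dot product as metric; U and V are made to commute by building both from one hermitian H, which is merely a convenient way to manufacture an example):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    H = A + A.conj().T
    U, V = H @ H, H @ H @ H + 2 * H          # two observables built from H, so they commute
    assert np.allclose(U @ V, V @ U)

    # Diagonalise V, group its eigenvectors by (nearly) equal eigenvalue, then
    # diagonalise U restricted to each V-eigenspace, as in the argument above.
    vals, vecs = np.linalg.eigh(V)
    basis = []
    i = 0
    while i < len(vals):
        j = i
        while j < len(vals) and np.isclose(vals[j], vals[i]):
            j += 1
        P = vecs[:, i:j]                     # orthonormal basis of one V-eigenspace
        _, W = np.linalg.eigh(P.conj().T @ U @ P)
        basis.append(P @ W)                  # a U-eigenbasis of this V-eigenspace
        i = j
    B = np.hstack(basis)                     # a mutual eigenbasis for U and V

    assert np.allclose(B.conj().T @ B, np.eye(4))                                 # orthonormal
    assert np.allclose(B.conj().T @ U @ B, np.diag(np.diag(B.conj().T @ U @ B)))  # U diagonal
    assert np.allclose(B.conj().T @ V @ B, np.diag(np.diag(B.conj().T @ V @ B)))  # V diagonal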
When we measure some quantity, the quantity measured is an observable and the measured value is an eigenvalue of that observable; after the measurement, the system's state is an eigenvector in the eigenspace of that eigenvalue. The act of measuring effectively decomposes the prior state into components, each in a single eigenspace of the observable, and selects one of these components; the probability of each component being selected (and thus its eigenspace's eigenvalue being observed) is proportional to the metric's squared length of that component. We normally represent a state by a unit vector (with respect to our given metric), which leads to renormalising the selected component afterwards; however, this is purely a convention of representation, for convenience (with which the probabilities, during an observation, are just the squared magnitudes of the components, which sum to one, saving the need to scale them proportionately to get probabilities).
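A small simulation of that, again with the ordinary dot product as metric (the observable and state here are arbitrary; with degenerate eigenvalues one would sum the components over each whole eigenspace rather than eigenvector by eigenvector):

    import numpy as np

    rng = np.random.default_rng(8)
    n = 4
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    obs = A + A.conj().T                       # an observable (hermitian, ordinary metric)
    vals, vecs = np.linalg.eigh(obs)           # possible measured values and an eigenbasis

    state = rng.normal(size=n) + 1j * rng.normal(size=n)
    state = state / np.linalg.norm(state)      # represent the state by a unit vector

    amps = vecs.conj().T @ state               # components of the state in the eigenbasis
    probs = np.abs(amps) ** 2                  # probability of observing each eigenvalue
    assert np.isclose(probs.sum(), 1.0)        # unit state: probabilities sum to one

    picked = rng.choice(n, p=probs)            # simulate one measurement
    outcome = vals[picked]                     # the measured value: an eigenvalue
    new_state = vecs[:, picked] * amps[picked] # orthogonal projection onto that eigenvector's span ...
    new_state = new_state / np.linalg.norm(new_state)   # ... rescaled to a unit vector once more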
For each eigenspace of the observable measured, we can define a hermitian
operator (so, in fact, an observable) that acts as the identity on that
eigenspace while mapping to zero all vectors in the observable's other
eigenspaces. Such a mapping only has outputs in the eigenspace it selects, on
which it acts as the identity, so composes with itself to give itself; it is
transitive. Indeed, if a mapping f is transitive, so relates x to z whenever
it relates x to y and y to z, we can state this as f(z) = f(f(z)) for all z,
hence f acts as the identity on its outputs. Transitive mappings are (in
general) known as idempotent
and (at least on linear spaces)
as projections.
Consider a linear map (V: f |V) on a linear space that acts as the identity on its outputs; these form a linear sub-space Im(f) of V. We also have Ker(f) = {v in V: f(v) = 0}, the kernel of f. Since f(f(v)) = f(v) for all v in V, f acts as the identity on Im(f), so Im(f) is f's eigenspace with eigenvalue 1; while Ker(f) is its eigenspace with eigenvalue 0. For any v in V, consider u = v −f(v); by linearity of f this has f(u) = f(v) −f(f(v)) and transitivity of f gives f(f(v)) = f(v), so f(u) = f(v) −f(v) = 0. Thus u is in Ker(f), f(v) is in Im(f) and we have v = u +f(v) as a sum of two vectors, one in Ker(f), the other in Im(f). Thus every projection has exactly two eigenspaces, Im(f) with eigenvalue 1 and Ker(f) with eigenvalue 0; these span the whole space.
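A numeric sketch of such an idempotent map (the particular projection, onto two of an arbitrary invertible M's column-directions along the other two, is my own example):

    import numpy as np

    rng = np.random.default_rng(9)
    n = 4
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # an invertible map, columns as directions
    D = np.zeros((n, n))
    D[0, 0] = D[1, 1] = 1                      # keep two directions, kill the rest
    f = M @ D @ np.linalg.inv(M)               # acts as the identity on Im(f), zero on Ker(f)

    assert np.allclose(f @ f, f)               # idempotent (transitive as a mapping)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    u = v - f @ v                              # the Ker(f) part of v
    assert np.allclose(f @ u, 0)               # eigenvalue 0
    assert np.allclose(f @ (f @ v), f @ v)     # f(v) is in Im(f): eigenvalue 1
    assert np.allclose(u + f @ v, v)           # v decomposes as (Ker part) + (Im part)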
Now, without assuming our projection (S: f |S) is associated with an
eigenspace of a hermitian operator, consider its interaction with our metric,
(dual(S): g |S). If there are u in Ker(f) and v in Im(f) for which g(u, v) is non-zero, then g(f(u), v) = 0, as f(u) is zero, but g(u, f(v)) = g(u, v), as f(v) = v; so g(f(u), v) isn't equal to g(u, f(v)), so f isn't g-hermitian. Otherwise, every u in Ker(f) and v in Im(f) have g(u, v) = 0; two general members of S are u +v and x +y with u, x in Ker(f) and v, y in Im(f); these have g(u +v, f(x +y)) = g(u +v, y) = g(u, y) +g(v, y) = g(v, y) as u in Ker(f), y in Im(f) imply g(u, y) = 0; likewise, g(v, x) = *(g(x, v)) = *0 = 0 so g(f(u +v), x +y) = g(v, x +y) = g(v, x) + g(v, y) = g(v, y) and we have g(u +v, f(x +y)) = g(f(u +v), x +y), so f is g-hermitian. So a projection is
g-hermitian precisely if its image is g-perpendicular to its kernel; I'll
describe such a linear map as a projector
for g or as a
g-projector; and, as ever, skip the reference to g when context makes clear
which metric is implicated, which it usually shall. A projector may also be
referred to orthodoxly as an orthogonal projection.
In the case of projectors onto eigenspaces of some hermitian operator, the eigenspaces are all mutually orthogonal so the projections are indeed projectors. Each such projector commutes with the operator, the composite acting simply as scaling by the eigenvalue on the given eigenspace while mapping all other eigenspaces to zero; two such projectors, for distinct eigenspaces of the same operator, have composite zero either way round, so also commute. The projectors onto eigenspaces of a given hermitian operator are all simultaneously diagonalised by any basis that diagonalises the operator.
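With the ordinary dot product as metric, the projector onto a (one-dimensional, for simplicity) eigenspace is just the outer product of a unit eigenvector with its conjugate; a quick check of the claims above:

    import numpy as np

    rng = np.random.default_rng(10)
    n = 4
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    H = A + A.conj().T                         # a hermitian operator (ordinary metric)
    vals, vecs = np.linalg.eigh(H)

    P0 = np.outer(vecs[:, 0], vecs[:, 0].conj())   # projector onto the first eigenvector's span
    P1 = np.outer(vecs[:, 1], vecs[:, 1].conj())   # projector onto the second's

    for P in (P0, P1):
        assert np.allclose(P @ P, P)               # a projection ...
        assert np.allclose(P, P.conj().T)          # ... whose image and kernel are perpendicular
    assert np.allclose(P0 @ P1, 0) and np.allclose(P1 @ P0, 0)   # distinct eigenspaces: composite zero
    assert np.allclose(H @ P0, P0 @ H)             # each projector commutes with the operator
    assert np.allclose(H @ P0, vals[0] * P0)       # the composite scales the eigenspace by its eigenvalue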
There are many bases that put any given (positive-definite
hermitian) metric into diagonal form; when we change between such bases, our
co-ordinate representation of any given vector in the Hilbert space is
transformed by multiplying it by a square matrix (albeit of possibly infinite
dimension, so square
is here used in a rather technical
sense), each row and each column of which has 1 as its sum of squared
moduli of entries. Each entry in such a matrix (describing a change of
representation, without any change of the thing represented) is the
contraction of a basis member of the initial basis with a member of the final
dual basis; this is equally (depending on exactly how you represent each,
possibly the transpose of) the matrix that describes the linear map, from the
Hilbert space to itself, that maps each member of the new basis to a
corresponding member of the old basis. Swapping the two bases transposes the matrix, so it doesn't actually matter which of the things represented we consider; hence I chose to discuss the linear map rather than the change of
co-ordinate representations.
Now, a linear map f that maps one basis to another, each of which diagonalises our metric g, necessarily maps each vector to one to which g ascribes the same length; indeed, it even maps any two input vectors u, v to two outputs f(u), f(v) whose value of g is the same, g(f(u), f(v)) = g(u, v). Now, as g = †(g), g(x, f(v)) = *(g(f(v), x)) = *((g&on;f)(v, x)) = †(g&on;f, x, v), so (: (: g(f(u), f(v)) ←v :) ←u :) = †(g&on;f)&on;f; so, when f maps one basis that diagonalises g to another, we get †(g&on;f)&on;f = g. When f
satisfies this, I'll describe it as unitary
for g, or
g-unitary. We usually meet this where g is a hermitian product, so
†(g&on;f)&on;f = transpose(f)&on;†(g)&on;f =
transpose(f)&on;g&on;f.
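In the same co-ordinate sketch as before, †(g&on;f)&on;f = g says that conjugating and transposing f's matrix, then sandwiching g's matrix between that and f's, gives back g's matrix (reducing to the familiar unitarity condition when the basis diagonalises g); here's a check, with illustrative matrices of my own, that a map taking one g-orthonormal basis to another satisfies it:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 4
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    G = A.conj().T @ A + np.eye(n)                    # the metric: positive-definite hermitian
    L = np.linalg.cholesky(G)                         # G = L . conjugate-transpose(L)
    Linv = np.linalg.inv(L)

    def g_orthonormal_basis():
        # columns form a basis b with g(b(i), b(j)) = 1 if i = j, else 0
        Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
        return Linv.conj().T @ Q

    B1, B2 = g_orthonormal_basis(), g_orthonormal_basis()
    F = B2 @ np.linalg.inv(B1)                        # the linear map sending basis B1 to basis B2

    assert np.allclose(F.conj().T @ G @ F, G)         # g-unitary: transpose(f)&on;g&on;f = g
    g = lambda a, b: a.conj() @ G @ b
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.isclose(g(F @ u, F @ v), g(u, v))       # it preserves the product, hence lengths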
As above for hermitian transposes of linear maps, orthodox treatments are apt to express f being g-unitary in terms of something that looks like it means †(f) composed on the left of f, possibly via g, being equal to g; as before, this arises from orthodoxy focussing more on co-ordinates, where either transpose(f)'s or f's co-ordinates need to be conjugated when computing transpose(f)&on;g&on;f. In the present treatment, that conjugation is implicit in g being antilinear.
The phase-space of our system is encoded as a sub-manifold of the unit
sphere in some Hilbert space. The Hilbert formalism describes its mechanics in
any terms that apply to all vectors in the Hilbert space, implicating the model
only to the tune of some metric
– a positive-definite hermitian
form whose unit sphere has our phase-space as a submanifold. The actual choice
of which positive-definite hermitian form you use makes little practical difference, because the Hilbert space model implicitly renormalises all the time,
with respect to the chosen metric. This means the manifolds you get from
different choices of mutually-diagonalisable metrics (which would be any
pair of them in the finite-dimensional case, but our Hilbert space may be
infinite-dimensional; however, being a Hilbert space has some continuity
consequences, which may mean all continuous metrics are
mutually-diagonalisable; I can't remember) are homeomorphic under a linear map
(diagonalised by some choice of basis that has both metrics diagonal) of the
Hilbert space in which they are nominally embedded. So the Hilbert space's
mechanics – of arithmetic with vectors, and thus of linear maps from the
Hilbert space to itself, and how they interact with the metric in use – is
largely orthogonal to the question of what co-ordinates it makes sense to use in
our sub-manifold phase-space. Our choice to encode those via eigenvalues of
eigenvectors of linear endomorphisms brings in constraints on endomorphisms in
terms of how they relate to the metric, that ensure the endomorphisms have a
symmetry with respect to the metric which, in turn, ensures they have
eigenvalues that can encode the co-ordinates of our phase-space.
It would be worth examining carefully which bits of that (if any) are physics and which are abstract theory, developed in terms that could be understood regardless of whether they were capable of describing the physics.
Written by Eddy.