The tensor product is a multiplicative combiner on linear spaces and their members; it gives rise to diverse linear isomorphisms among linear spaces and provides the context for a general algebra of combination and factorisation among vector and matrix quantities.

Given a space V linear over some algebra of scalars, its dual is the space
of linear maps from V to {scalars}: dual(V) = {linear maps ({scalars}:|V)}.
Under point-wise addition, a+b = (: a(v) +b(v) ←v |V) for a, b in dual(V),
and scaling, k.a = (: k.a(v) ←v |V) for k in {scalars} and a in dual(V),
this forms a linear space, as may readily be verified, given the additive and
multiplicative structures on {scalars}. Each member v of V induces a map from
dual(V) to {scalars} via (: u(v) ←u |dual(V)); with the given linear
structure on dual(V), this is naturally a linear map. This gives us a natural
embedding of V in dual(dual(V)); when V is finite-dimensional, this is in fact
an isomorphism; so that the relationship between V and dual(V) is mutual (hence the duality that gives rise to its name). There is a natural contraction which combines members of mutually dual spaces by
letting either act on the other as a mapping to scalars; given mutually dual
spaces U and V and members u and v, respectively, of them, the result of
contracting them is written u·v or v·u and stands simply for u(v),
when u is read as a member of U = dual(V), or v(u), when v is read as a member
of V = dual(U); the value is the same either way round. Since each space's
members act linearly on the other, contraction is linear in each of its two
operands, in much the same way that multiplication is. Thus far, contraction is
Abelian (i.e. the order of operands doesn't matter); but I'll extend it, below,
to a more general binary operator, linear in each operand, for which the order
of operands and operations does matter.
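
As a concrete sketch (my assumption here, not part of the definitions above: V modelled by coordinate n-tuples, so that members of dual(V) are likewise n-tuples of coefficients), contraction becomes the familiar sum of coordinate-wise products, symmetric and linear in each operand:

```python
# Model V as n-tuples of scalars; a member of dual(V) is then also an
# n-tuple, acting by the sum of coordinate-wise products.  (This is an
# illustrative coordinate model, not the text's own construction.)
def contract(u, v):
    """u.v = v.u: let either act on the other as a map to scalars."""
    return sum(ui * vi for ui, vi in zip(u, v))

u = (1, 2, 3)   # read as a member of dual(V)
v = (4, 0, -1)  # read as a member of V

# The value is the same either way round:
assert contract(u, v) == contract(v, u) == 1

# Linear in each operand, in much the same way multiplication is:
w = (2, 2, 2)
lhs = contract(u, tuple(3 * vi + wi for vi, wi in zip(v, w)))
assert lhs == 3 * contract(u, v) + contract(u, w)
```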

There is an older treatment of this subject that I (perhaps) haven't fully folded into this one. Hopefully this one is more coherent, though …

Given linear spaces V and W, we can define a multiplication which combines a member w of W with a member u of dual(V) to produce a linear map w×u = (W: w.u(v) ←v |V), scaling w by the (scalar) output that u yields from the given input. This is multiplicative in the sense that it is linear in each of w and u; first, suppose we have another member of dual(V), y:

- w×(y+u)
- = (: w.((y+u)(v)) ←v :)
- = (: w.(y(v) +u(v)) ←v :)
by the point-wise definition of addition among functions

- = (: w.y(v) +w.u(v) ←v :)
by linearity of scalar multiplication in W

- = (: w.y(v) ←v :) +(: w.u(v) ←v :)
by the point-wise definition of addition among functions

- = w×y +w×u

Next, suppose we have another member of W, z:

- (w+z)×u
- = (: (w+z).u(v) ←v :)
- = (: w.u(v) +z.u(v) ←v :)
by linearity of scalar multiplication in W

- = (: w.u(v) ←v :) +(: z.u(v) ←v :)
by the point-wise definition of addition among functions

- = w×u +z×u

Finally, given a scalar k:

- (k.w)×u
- = (: (k.w).u(v) ←v :)
- = (: k.(w.u(v)) ←v :)
because multiplying by scalars, k on the left and u(v) on the right, is associative

- = k.(: w.u(v) ←v :)
by the definition of how scalar constants multiply functions whose outputs lie in linear spaces

- = k.(w×u)
but equally, going back two steps to (: k.(w.u(v)) ←v :),

- = (: w.(k.u(v)) ←v :)
because scalar multiplication is abelian (i.e. we can swap the order of w and k) as well as associative (we can shift the parentheses)

- = (: w.((k.u)(v)) ←v :)
by the definition of how a scalar constant multiplies the function u

- = w×(k.u)

In particular, this last ensures that scalar multiplication passes freely between operands of ×.
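
In coordinates (again an illustrative model, not part of the definitions above: w and u as tuples, w×u as the outer-product matrix), all three identities can be checked numerically:

```python
# A coordinate sketch of the multiplication x: with w in W and u in
# dual(V) modelled as tuples, w x u is the outer-product matrix.
def times(w, u):
    return [[wi * uj for uj in u] for wi in w]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

w, z = (1, 2), (3, 4)        # members of W
u, y = (5, 6, 7), (1, 1, 0)  # members of dual(V)

# w x (y+u) = w x y + w x u
assert times(w, tuple(a + b for a, b in zip(y, u))) == \
    mat_add(times(w, y), times(w, u))

# (w+z) x u = w x u + z x u
assert times(tuple(a + b for a, b in zip(w, z)), u) == \
    mat_add(times(w, u), times(z, u))

# (k.w) x u = w x (k.u): scalars pass freely between operands
k = 3
assert times(tuple(k * a for a in w), u) == times(w, tuple(k * a for a in u))
```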

Given a finite basis (V:b|n) of V (when such exist; i.e. when V has finite dimension) we can obtain the dual basis (dual(V):p|n), for which p(i)·b(j) is 1 if i = j, else 0. Since b is a basis, it spans V, so every member of V is sum(: h(i).b(i) ←i :) for some ({scalars}:h:n); contracting any such with p(j), for given j in n, gives h(j), from the h(j).b(j) term in the sum, because all other terms include a factor of 0 = b(i)·p(j) with i≠j. Thus each v in V is sum(: (v·p(i)).b(i) ←i |n) and, applying our definition of × to the case V = W, sum(: b(i)×p(i) ←i |n) is the identity linear map (V:|V), a.k.a. V. Given an arbitrary linear (W:f|V) we have each f(b(i)) in W and p(i) in dual(V) so we can × these and sum over i in n to get sum(: f(b(i))×p(i) ←i |n) as a linear map (W:|V), with:

- sum(: f(b(i))×p(i) ←i |n)
- = (: sum(: f(b(i)).(p(i)·v) ←i |n) ←v :)
by application of linearity among linear maps (W:|V), then of the point-wise nature of addition and scaling of functions,

- = (: f(sum(: b(i).p(i)·v ←i |n)) ←v :)
by linearity of f

- = (: f(v) ←v :)
- = f

hence every linear map (W:|V) is expressible as a sum of terms of form w×u with (f(b(i)) as) w in W and (p(i) as) u in dual(V). In particular, given a basis of W, since each f(b(i)) is in its span, × combines the members of our basis of W with those of our basis of dual(V) to produce a set of linear maps (W:|V) that spans the space of such maps; and linear independence of the bases used to construct these ensures their linear independence, so that these products in fact form a basis of the given space of linear maps.
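
A small numerical sketch of this, under the same coordinate model: a non-standard basis b of a two-dimensional V, its dual basis p, and checks that sum(: b(i)×p(i) ←i :) is the identity while sum(: f(b(i))×p(i) ←i :) recovers an arbitrary linear f:

```python
# Coordinate sketch (not the text's own notation): vectors as tuples,
# linear maps as matrices, w x u as an outer product.
def times(w, u):
    return [[wi * uj for uj in u] for wi in w]

def mat_sum(ms):
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*ms)]

def apply(f, v):  # a matrix f acting on a vector v
    return tuple(sum(a * x for a, x in zip(row, v)) for row in f)

b = [(1, 1), (0, 1)]    # a (non-standard) basis of a 2-dimensional V
p = [(1, 0), (-1, 1)]   # its dual basis: p(i).b(j) is 1 if i = j, else 0
for i in range(2):
    for j in range(2):
        assert sum(x * y for x, y in zip(p[i], b[j])) == (1 if i == j else 0)

# sum(: b(i) x p(i) <-i :) is the identity map on V:
assert mat_sum([times(b[i], p[i]) for i in range(2)]) == [[1, 0], [0, 1]]

# and sum(: f(b(i)) x p(i) <-i :) recovers an arbitrary linear (W:f|V):
f = [[2, 3], [5, 7]]
assert mat_sum([times(apply(f, b[i]), p[i]) for i in range(2)]) == f
```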

Thus × effectively enables us to transform the analysis of linear maps from finite-dimensional linear spaces into the analysis of sums of products of members of simple linear spaces, where we never have to directly consider linear maps to any more complicated space than {scalars}.

Thus far, the right operand of × has necessarily been in the dual of some linear space; however, as noted above, every linear space can at least be embedded in the dual of its dual, so we can naturally apply the above to linear spaces W and U to combine w in W and u in U as w×u, a linear map (W: |dual(U)).

The analysis using a basis depended on V being finite-dimensional; more generally, when V has no finite basis, we don't necessarily obtain the whole of {linear maps (W:|V)} as the span of the products of form w×u with w in W and u in dual(V). However, we still can form such products – so define a binary operator &tensor; on linear spaces by:

- W&tensor;U = span({w×u: w in W, u in U}) ⊆ {linear maps (W:|dual(U))}

modulo some equivalences I'll come to below, by which various spaces
that can be built using &tensor; are conflated with one another via natural
isomorphisms. A space constructed using &tensor; to combine linear spaces is
known as a **tensor space**, with &tensor; and × being known as tensor product operators on, respectively, linear spaces and their members.
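
One point worth a concrete check is that the span genuinely matters: a sum of products need not itself be a single product. In the coordinate model (products as outer-product, hence rank-one, matrices):

```python
# Sketch in coordinates: members of W&tensor;U are sums of products
# w x u; not every such sum is itself one product.
def times(w, u):
    return [[wi * uj for uj in u] for wi in w]

def mat_sum(ms):
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*ms)]

# The identity is a sum of two products ...
t = mat_sum([times((1, 0), (1, 0)), times((0, 1), (0, 1))])
assert t == [[1, 0], [0, 1]]

# ... but no single product equals it: every 2x2 product w x u has
# determinant w[0].u[0].w[1].u[1] - w[0].u[1].w[1].u[0] = 0.
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

assert det2(t) == 1
assert det2(times((2, 3), (5, 7))) == 0
```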

Notice that × is linear in each of its two operands; when a mapping
accepts two inputs and is linear in each, it is described
as **bilinear**. One of the central properties of × is
that it can be used to represent any other bilinear map as a linear map from
products obtained using ×.

Suppose we have linear spaces U, V, W and a linear map ({linear maps (U:|V)}: f |W); since f is linear and each output of f is linear, f is bilinear. A typical member of V&tensor;W is T = sum(: y(i)×z(i) ←i |n) for some mappings (V:y|n) and (W:z|n) from finite n; given such mappings, we can also obtain sum(: f(z(i),y(i)) ←i |n). If we can show that the latter depends linearly on T – and, in particular, does not depend on which y and z we chose, so long as they yield the same T – we shall be able to re-write f in terms of a linear map (U: |V&tensor;W). In the finite-dimensional case, this can be done quite trivially by taking bases in W and V, which yield a basis of V&tensor;W, which we can use to linearly decompose each y(i), z(i) and y(i)×z(i). However, in general, we can't use bases (or, to put it another way, I'd like to show that this is true even for infinite-dimensional spaces).

So suppose we have y, z as above and (V:r|m), (W:s|m) likewise with m finite and sum(: r(i)×s(i) ←i |m) = T. Even if V and W are infinite-dimensional, we can restrict attention to the finite-dimensional spans of (|r:)&unite;(|y:) and (|s:)&unite;(|z:). Let p be a basis of the former and q of the latter; then each r(i) and y(j) is expressible as a linear combination of the members of p; likewise, s and z are expressed in terms of q. With suitable functions we thus obtain r(i) = sum(: R(i,a).p(a) ←a :), y(j) = sum(: Y(j,a).p(a) ←a :), s(i) = sum(: S(i,c).q(c) ←c :), z(j) = sum(: Z(j,c).q(c) ←c :). Linear independence among the q(c) and among the p(a) imply that the p(a)×q(c), for diverse a and c, are linearly independent, so the two expressions we get for T as a linear combination of them must have equal coefficients, sum(: R(i,a).S(i,c) ←i :) = sum(: Y(j,a).Z(j,c) ←j :), for each a and c. But then (remembering that all sums involved are finite, so we can re-order them freely)

- sum(: f(z(j),y(j)) ←j :)
- = sum(: sum(: sum(: f(Z(j,c).q(c),Y(j,a).p(a)) ←a :) ←c :) ←j :)
- = sum(: sum(: sum(: Y(j,a).Z(j,c) ←j :).f(q(c),p(a)) ←a :) ←c :)
- = sum(: sum(: sum(: R(i,a).S(i,c) ←i :).f(q(c),p(a)) ←a :) ←c :)
- = sum(: sum(: sum(: f(S(i,c).q(c),R(i,a).p(a)) ←a :) ←c :) ←i :)
- = sum(: f(s(i),r(i)) ←i :)

This being true whenever any two sums of products yield the same member of V&tensor;W, we can infer that the relation defined by

- F = (U: sum(: f(v(i),w(i)) ←i :) ← sum(: v(i)×w(i) ←i :) |V&tensor;W)

is a mapping (U: |V&tensor;W). Scaling its input is equivalent, as applied to the typical input sum(: v×w :), to scaling either all outputs of v or all outputs of w and thus, by linearity of f in whichever of its two parameters you chose to scale, scales the output by the same factor. Adding two inputs is always expressible as adding two sums of products, which simply yields a longer sum of first the products from one sum, then those from the other; linearity of summation then ensures our output is a similar sum made up of the terms in the sums the two inputs would have produced separately. Hence F is linear.

This has the result of making × a **universal bilinear
operator** in the sense that any bilinear f can be expressed in terms
of some linear F, as above, as f(v,w) = F(v×w); that is, bilinearity can
always be subsumed into ×, leaving the rest of a bilinear operator's
action to a linear map from the resulting tensor product space. Note,
however, that the order of factors is not determined by the construction; I
could just as readily have defined (U: F |W&tensor;V) above, swapping V and W,
and proved an equivalent result.
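
A numerical sketch of this universality, in the coordinate model: a bilinear f given by an (arbitrarily chosen) coefficient array, its linearisation F on tensors, and a check that two distinct sums of products representing the same tensor yield the same value:

```python
# Coordinate sketch of f(v,w) = F(v x w).  The coefficients c are an
# arbitrary illustrative choice, not anything from the text.
def times(v, w):
    return [[vi * wj for wj in w] for vi in v]

def mat_sum(ms):
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*ms)]

c = [[1, 2], [3, -1]]

def f(v, w):  # a bilinear map: f(v, w) = sum of c[i][j].v(i).w(j)
    return sum(c[i][j] * v[i] * w[j] for i in range(2) for j in range(2))

def F(t):     # the induced linear map on (coordinate) tensors
    return sum(c[i][j] * t[i][j] for i in range(2) for j in range(2))

v, w = (1, 2), (3, 4)
assert F(times(v, w)) == f(v, w)

# Two different sums of products yielding the same tensor give the
# same value of F, as the decomposition-independence argument requires:
t1 = mat_sum([times((1, 0), (1, 1)), times((0, 1), (1, 1))])
t2 = mat_sum([times((1, 1), (1, 1)), times((0, 0), (5, 7))])
assert t1 == t2
assert F(t1) == F(t2)
```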

Closely related to this, I define span, not on sets, but on
relations by: given a relation (U:r:V) between linear spaces U and V, span(r)
relates u in U to v in V precisely if there exist a natural n and mappings
({scalars}: h :n), (U: x :n), (V: y :n) for which u = sum(: h(i).x(i) ←i
:), v = sum(: h(i).y(i) ←i :) and, for each i in n, r relates x(i) to
y(i). The resulting span(r) is always necessarily linear; when r is a mapping
and respects linearity it will be a linear map. Given this, I can exploit
universality to define mappings from a tensor space by linear extension from their actions on simple tensor products; for example, the mapping F above can be written simply as span(: f(v,w) ← v×w :) and defining it by linear extension from f(v,w) ← v×w is just another way of saying the same thing.

As noted above, linear ({linear (U:|V)}: f |W) can be expressed equally in terms of span(U: f(w,v) ← w×v :W&tensor;V) or of span(U: f(w,v) ← v×w :V&tensor;W) with W and V swapped in the input space. There is no particular reason to choose one order over the other. This bears particularly on the dual of a tensor product of spaces.

Given linear spaces V and W, a member of dual(V)&tensor;dual(W) is, from
the definition of ×, a linear map (dual(V): |W) and hence a bilinear map
from W and V to scalars, so universality lets us express it in terms of a
linear map from W&tensor;V or V&tensor;W to scalars. Thus we can identify
dual(V)&tensor;dual(W) with dual(V&tensor;W) or dual(W&tensor;V); or,
equivalently, we can identify dual(V&tensor;W) with dual(V)&tensor;dual(W) or
dual(W)&tensor;dual(V), again with no particular reason to prefer either
order. Contrast the fact that dual(V)&tensor;dual(W) definitely *does*
(with my definitions) prefer to be read as {linear maps (dual(V): |W)} rather
than as {linear maps (dual(W): |V)}, even though these two are
(via transposition)
naturally isomorphic.

One can, of course, opt to resolve the ambiguity
by *fiat*, but it remains that I have yet to see the need and I am quite
sure different readers are apt to have contrary opinions as to which order
should be chosen. Each camp shall doubtless point to how its choice fits so
nicely with everything else; but, as discussed below, each choice does indeed
fit nicely with everything *that matters* and, for everything else,
each camp's opinions about what looks nice are sure to coincide with what that
camp's choice actually produces.

Our binary operator &tensor; is obviously not Abelian (i.e. W&tensor;U and U&tensor;W are not, in general, the same space); but is it associative ? Consider, then, a three-way product: given linear spaces U, V and W, we can build the product spaces (U&tensor;V)&tensor;W and U&tensor;(V&tensor;W); as specified, these are distinct linear spaces, but how are they related to one another ? Given members u of U, v of V and w of W, a typical member of the first is:

- (u×v)×w
- = (: (u×v).w(m) ←m |dual(W))
- = (: (U: u.v(n).w(m) ←n |dual(V)) ←m |dual(W))

Modulo two applications of the above-mentioned choice of ordering of factors, this bilinear map from dual(W) and dual(V) to U can be expressed in terms, first, of a linear map from dual(W)&tensor;dual(V) to U and, hence, of a linear map from dual(V&tensor;W) to U. In this last form, it is exactly a member of U&tensor;(V&tensor;W); to understand it as such, we need to observe that v×w = (: v.w(m) ←m :dual(W)) acts on dual(V)&tensor;dual(W) as span(: v(n).w(m) ← n×m :); sure enough, expressing the mapping in this way

- (: u.v(n).w(m) ← n×m |dual(V)&tensor;dual(W))
- = (: u.(v×w)(n×m) :dual(V&tensor;W))
- = u×(v×w)

This makes it possible to identify U&tensor;(V&tensor;W) with (U&tensor;V)&tensor;W, making &tensor; an associative operator.
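
In coordinates the identification is transparent: both groupings carry the same three-index array of products u(a).v(b).w(c), as a quick check illustrates (a sketch in the same coordinate model as before):

```python
# Both (u x v) x w and u x (v x w) carry the same three-index array of
# products; associativity of &tensor; identifies the two groupings.
u, v, w = (1, 2), (3, 4, 5), (6, 7)

left  = [[[(ua * vb) * wc for wc in w] for vb in v] for ua in u]
right = [[[ua * (vb * wc) for wc in w] for vb in v] for ua in u]
assert left == right
```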

Doing so involved reading a bilinear ({(U: |dual(V))}:
|dual(W)) as a linear (U: |dual(V&tensor;W)); as it happens, this
involves *two* uses of the choice of order in universality, so favours
neither order. If we chose to express the former as a linear (U:
:dual(V)&tensor;dual(W)), each of its inputs is a linear (dual(V): |W) hence
bilinear ({({scalars}:|V)}: |W) which the same choice will lead us to express
as a linear ({scalars}: |V&tensor;W) in dual(V&tensor;W), as
required. Equally, however, if we chose to express our linear ({linear (U:
|dual(V))}: |dual(W)) as a linear (U: |dual(W)&tensor;dual(V)), each of its
inputs is a linear (dual(W): |V) hence bilinear ({({scalars}: |W)}: |V) and
our new choice expresses this as a linear ({scalars}: |V&tensor;W) in
dual(V&tensor;W), again as required. All the same, we do see one piece of
definiteness: if a bilinear map's first input is in dual(W) and second in
dual(V), then we do want it expressed as a linear map from the dual of
V&tensor;W rather than from the dual of W&tensor;V.

Given associativity, we can proceed to a bulk action, bulk(&tensor;), which takes any list (:L|n) of linear spaces and combines the entries to produce a tensor space whose typical member is obtained, correspondingly, by applying bulk(×) to a list (:k|n) with k(i) in L(i) for each i in n. Any permutation of n induces, by composition before L and each such k, a linear isomorphism from the resulting tensor space to another, with the same factors in permuted order. When L is not monic (i.e. it has two equal entries) there are permutations of n which, composed before L, yield L; the isomorphisms these induce are from the tensor space to itself; they are not, in general, the identity on it and, indeed, warrant much study in their own right.
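
A sketch of bulk(×) and a permutation-induced isomorphism, in the coordinate model (indexing a tensor's entries by tuples of indices):

```python
from itertools import product
from math import prod

# bulk(x) applied to a list k of vectors: the entry at multi-index
# (i, j, ...) is k[0][i].k[1][j]...  (coordinate model, as before)
def bulk_times(k):
    return {idx: prod(v[i] for v, i in zip(k, idx))
            for idx in product(*(range(len(v)) for v in k))}

k = [(1, 2), (3, 4, 5), (6, 7)]
t = bulk_times(k)

# The permutation s of the factor list induces a linear isomorphism to
# the tensor space with factors in permuted order:
s = [2, 0, 1]
permuted = bulk_times([k[i] for i in s])
assert all(permuted[tuple(idx[p] for p in s)] == val
           for idx, val in t.items())
```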

In particular, when n is 2, L is [U,V] for some linear
spaces U and V and we obtain U&tensor;V; the permutation [1,0] induces the
isomorphism span(V&tensor;U: v×u ← u×v :U&tensor;V) and this
is referred to as transposition.

The roots of that nomenclature are
tied to permutations that just swap two entries being commonly referred to as
transpositions; I call them swaps to avoid confusion with my consistent use
of transpose to mean
reversing the order of the first two inputs to a function whose outputs, when
given a first input, are all functions (and the corresponding transformation
of a relation, that generalises this). As it happens, the use of transpose
for span(: v×u ← u×v :), despite its provenance, is actually
compatible with my usual meaning, since we can read u and v as linear maps
from dual(U) and dual(V) to {scalars}, hence u×v as a function which
takes a first input in dual(V) and produces, as output, a function which takes
a second input in dual(U); when read this way, the transpose of u×v is
indeed v×u.
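
In the coordinate model, this transposition on U&tensor;V is exactly the familiar matrix transpose of the outer product:

```python
# span(: v x u <- u x v :) in coordinates: transposing the matrix of
# the outer product u x v yields the matrix of v x u.
def times(u, v):
    return [[ui * vj for vj in v] for ui in u]

def transpose(t):
    return [list(col) for col in zip(*t)]

u, v = (1, 2, 3), (4, 5)
assert transpose(times(u, v)) == times(v, u)
```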