]> The tensor product

The tensor product

The tensor product is a multiplicative combiner on linear spaces and their members; it gives rise to diverse linear isomorphisms among linear spaces and provides the context for a general algebra of combination and factorisation among vector and matrix quantities.

Given a space V linear over some algebra of scalars, its dual is the space of linear maps from V to {scalars}: dual(V) = {linear maps ({scalars}:|V)}. Under point-wise addition, a+b = (: a(v) +b(v) ←v |V) for a, b in dual(V), and scaling, k.a = (: k.a(v) ←v |V) for k in {scalars} and a in dual(V), this forms a linear space, as may readilly be verified, given the additive and multiplicative structures on {scalars}. Each member v of V induces a map from dual(V) to {scalars} via (: u(v) ←u |dual(V)); with the given linear structure on dual(V), this is naturally a linear map. This gives us a natural embedding of V in dual(dual(V)); when V is finite-dimensional, this is in fact an isomorphism; so that the relationship between V and dual(V) is mutual (hence the duality that gives rise to its name). There is a natural contraction which combines members of mutually dual spaces by letting either act on the other as a mapping to scalars; given mutually dual spaces U and V and members u and v, respectively, of them, the result of contracting them is written u·v or v·u and stands simply for u(v), when u is read as a member of U = dual(V), or v(u), when v is read as a member of V = dual(U); the value is the same either way round. Since each space's members act linearly on the other, contraction is linear in each of its two operands, in much the same way that multiplication is. Thus far, contraction is Abelian (i.e. the order of operands doesn't matter); but I'll extend it, below, to a more general binary operator, linear in each operand, for which the order of operands and operations does matter.

There is an older treatment of this subject that I (perhaps) haven't fully folded into this one. Hopefully this one is more coherent, though …

The fundamental product operator

Given linear spaces V and W, we can define a multiplication which combines a member w of W with a member u of dual(V) to produce a linear map w×u = (W: w.u(v) ←v |V), scaling w by the (scalar) ouput u yields from the given input. This is multiplicative in the sense that it is linear in each of w and u; first, suppose we have another member of dual(V), y:

w×(y+u): = (: w.((y+u)(v)) ←v :); = (: w.(y(v) +u(v)) ←v :)
by the point-wise definition of addition among functions; = (: w.y(v) +w.u(v) ←v :)
by linearity of scalar multiplication in W; = (: w.y(v) ←v :) +(: w.u(v) ←v :)
by the point-wise definition of addition among functions; = w×y +w×u

Next, suppose we have another member of W, z:

(w+z)×u: = (: (w+z).u(v) ←v :); = (: w.u(v) +z.u(v) ←v :)
by linearity of scalar multiplication in W; = (: w.u(v) ←v :) +(: z.u(v) ←v :)
by the point-wise definition of addition among functions; = w×u +z×u

Finally, given a scalar k:

(k.w)×u: = (: (k.w).u(v) ←v :); = (: k.(w.u(v)) ←v :)
because multiplying by scalars, k on the left and u(v) on the right, is associative; = k.(: w.u(v) ←v :)
by the definition of how scalar constants multiply functions whose outputs lie in linear spaces
= k.(w×u): but equally, going back two steps to (: k.(w.u(v)) ←v :),; = (: w.(k.u(v)) ←v :)
because scalar multiplication is abelian (i.e. we can swap the order of w and k) as well as associative (we can shift the parentheses); = (: w.((k.u)(v)) ←v :)
by the definition of how a scalar constant multiplies the function u
= w×(k.u)

In particular, this last ensures that scalar multiplication passes freely between operands of ×.

Given a finite basis (V:b|n) of V (when such exist; i.e. when V has finite dimension) we can obtain the dual basis (dual(V):p|n), for which p(i)·b(j) is 1 if i = j, else 0. Since b is a basis, it spans V, so every member of V is sum(: h(i).b(i) ←i :) for some ({scalars}:h:n); contracting any such with p(j), for given j in n, gives h(j), from the h(j).b(j) term in the sum, because all other terms include a factor of 0 = b(i)·p(j) with i≠j. Thus each v in V is sum(: (v·p(i)).b(i) ←i |n) and, applying our definition of × to the case V = W, sum(: b(i)×p(i) ←i |n) is the identity linear map (V:|V), a.k.a. V. Given an arbitrary linear (W:f|V) we have each f(b(i)) in W and p(i) in dual(V) so we can × these and sum over i in n to get sum(: f(b(i))×p(i) ←i |n) as a linear map (W:|V), with:

sum(: f(b(i))×p(i) ←i |n): = (: sum(: f(b(i)).(p(i)·v) ←i |n) ←v :)
by application of linearity among linear maps (W:|V), then of the point-wise nature of addition and scaling of functions,; = (: f(sum(: b(i).p(i)·v ←i |n)) ←v :)
by linearity of f; = (: f(v) ←v :); = f

hence every linear map (W:|V) is expressible as a sum of terms of form w×u with (f(b(i)) as) w in W and (p(i) as) u in dual(V). In particular, given a basis of W, since each f(b(i)) is in its span, × combines the members of our basis of W with those of our basis of dual(V) to produce a set of linear maps (W:|V) that spans the space of such maps; and linear independence of the bases used to construct these ensures their linear independence, so that these products in fact form a basis of the given space of linear maps.

Thus × effecitvely enables us to transform analysis of linear maps from finite-dimensional linear spaces into the analysis of sums of products of members of simple linear spaces, where we never have to directly consider linear maps to any more complicated space than {scalars}.

Thus far, the right operand of × has necessrily been in the dual of some linear space; however, as noted above, every linear space can at least be embedded in the dual of its dual, so we can naturally apply the above to linear spaces W and U to combine w in W and u in U as w×u, a linear map (W: |dual(U)).

The analysis using a basis depended on V being finite-dimensional; more generally, when V has no finite basis, we don't necessarily obtain the whole of {linear maps (W:|V)} as the span of the products of form w×u with w in W and u in dual(V). However, we still can form such products – so define a binary operator &tensor; on linear spaces by:

W&tensor;U = span({w×u: w in W, u in U}) ⊆ {linear maps (W:|dual(U))}

modulo some equivalences I'll come to below, by which various spaces that can be built using &tensor; are conflated with one another via natural isomorphisms. A space constructed using &tensor; to combine linear spaces is known as a tensor space, with &tensor; and × being known as tensor product operators on, respectively, linear spaces and their members.

Bilinear universality

Notice that × is linear in each of its two operands; when a mapping accepts two inputs and is linear in each, it is described as bilinear. One of the central properties of × is that it can be used to represent any other bilinear map as a linear map from products obtained using ×.

Suppose we have linear spaces U, V, W and a linear map ({linear maps (U:|V)}: f |W); since f is linear and each output of f is linear, f is bilinear. A typical member of V&tensor;W is T = sum(: y(i)×z(i) ←i |n) for some mappings (V:y|n) and (W:z|n) from finite n; given such mappings, we can also obtain sum(: f(z(i),y(i)) ←i |n). If we can show that the latter depends linearly on T – and, in particular, does not depend on which y and z we chose, so long as they yield the same T – we shall be able to re-write f in terms of a linear map (U: |V&tensor;W). In the finite-dimensional case, this can be done quite trivially by taking bases in W and V, which yield a basis of V&tensor;W, which we can use to linearly decompose each y(i), z(i) and y(i)×z(i). However, in general, we can't use bases (or, to put it another way, I'd like to show that this is true even for infinite-dimensional spaces).

So suppose we have y, z as above and (V:r|m), (W:s|m) likewise with m finite and sum(: r(i)×s(i) ←i |m) = T. Even if V and W are infinite-dimensional, we can restrict attention to the finite-dimensional spans of (|r:)&unite;(|y:) and (|s:)&unite;(|z:). Let p be a basis of the former and q of the latter; then each r(i) and y(j) is expressible as a linear combination of the members of p; likewise, s and z are expressed in terms of q. With suitable functions we thus obtain r(i) = sum(: R(i,a).p(a) ←a :), y(j) = sum(: Y(j,a).p(a) ←a :), s(i) = sum(: S(i,c).q(c) ←c :), z(j) = sum(: Z(j,c).q(c) ←c :). Linear independence among the q(c) and among the p(a) imply that the p(a)×q(c), for diverse a and c, are linearly independent, so the two expressions we get for T as a linear combination of them must have equal coefficients, sum(: R(i,a).S(i,c) ←i :) = sum(: Y(j,a).Z(j,c) ←j :), for each a and c. But then (remembering that all sums involved are finite, so we can re-order them freely)

sum(: f(z(j),y(j)) ←j :): = sum(: sum(: sum(: f(Z(j,c).q(c),Y(j,a).p(a)) ←a :) ←c :) ←j :); = sum(: sum(: sum(: Y(j,a).Z(j,c) ←j :).f(q(c),p(a)) ←a :) ←c :); = sum(: sum(: sum(: R(i,a).S(i,c) ←i :).f(q(c),p(a)) ←a :) ←c :); = sum(: sum(: sum(: f(S(i,c).q(c),R(i,a).p(a)) ←a :) ←c :) ←i :); = sum(: f(s(i),r(i)) ←i :)

This being true whenever any two sums of products yield the same member of V&tensor;W, we can infer that the relation defined by

F = (U: sum(: f(v(i),w(i)) ←i :) ← sum(: v(i)×w(i) ←i :) |V&tensor;W)

is a mapping (U: |V&tensor;W). Scaling its input is equivalent, as applied to the typical input sum(: v×w :), to scaling either all outputs of v or all outputs of w and thus, by linearity of f in whichever of its two parameters you chose to scale, scales the output by the same factor. Adding two inputs is always expressible as adding two sums of products, which simply yields a longer sum of first the products from one sum, then those from the other; linearity of summation then ensures our output is a similar sum made up of the terms in the sums the two inputs would have produced separately. Hence F is linear.

This has the result of making × a universal bilinear operator in the sense that any bilinear f can be expressed in terms of some linear F, as above, as f(v,w) = F(v×w); that is, bilinearity can always be subsumed into ×, leaving the rest of a bilinear operator's action to a linear map from the resulting tensor product space. Note, however, that the order of factors is not determined by the construction; I could just as readily have defined (U: F |W&tensor;V) above, swapping V and W, and proved an equivalent result.

Closely related to this, I define span, not on sets, but on relations by: given a relation (U:r:V) between linear spaces U and V, span(r) relates u in U to v in V precisely if there exist a natural n and mappings ({scalars}: h :n), (U: x :n), (V: y :n) for which u = sum(: h(i).x(i) ←i :), v = sum(: h(i).y(i) ←i :) and, for each i in n, r relates x(i) to y(i). The resulting span(r) is always necessarily linear; when r is a mapping and respects linearity it will be a linear map. Given this, I can exploit universality to define mappings from a tensor space by linear extension from their actions on simple tensor products; for example, the mapping F above can be written simply as span(: f(v,w) ← v×w :) and defining it by linear extension from f(v,w) ← v×w is just another way of saying the same thing.

Universality and duality

As noted above, linear ({linear (U:|V)}: f |W) can be expressed equally in terms of span(U: f(w,v) ← w×v :W&tensor;V) or of span(U: f(w,v) ← v×w :V&tensor;W) with W and V swapped in the input space. There is no particular reason to chose one order over the other. This bears particularly on the dual of a tensor product of spaces.

Given linear spaces V and W, a member of dual(V)&tensor;dual(W) is, from the definition of ×, a linear map (dual(V): |W) and hence a bilinear map from W and V to scalars, so universality lets us express it in terms of a linear map from W&tensor;V or V&tensor;W to scalars. Thus we can identify dual(V)&tensor;dual(W) with dual(V&tensor;W) or dual(W&tensor;V); or, equivalently, we can identify dual(V&tensor;W) with dual(V)&tensor;dual(W) or dual(W)&tensor;dual(V), again with no particular reason to prefer either order. Contrast the fact that dual(V)&tensor;dual(W) definitely does (with my definitions) prefer to be read as {linear maps (dual(V): |W)} rather than as {linear maps (dual(W): |V)}, even though these two are (via transposition) naturally isomorphic.

One can, of course, opt to resolve the ambiguity by fiat, but it remains that I have yet to see the need and I am quite sure different readers are apt to have contrary opinions as to which order should be chosen. Each camp shall doubtless point to how its choice fits so nicely with everything else; but, as discussed below, each choice does indeed fit nicely with everything that matters and, for everything else, each camp's opinions about what looks nice are sure to coincide with what that camp's choice actually produces.

Extending the product

Our binary operator &tensor; is obviously not Abelian (i.e. W&tensor;U and U&tensor;W are not, in general, the same space); but is it associative ? Consider, then, a three-way product: given linear spaces U, V and W, we can build the product spaces (U&tensor;V)&tensor;W and U&tensor;(V&tensor;W); as specified, these are distinct linear spaces, but how are they related to one another ? Given members u of U, v of V and w of W, a typical member of the first is:

(u×v)×w: = (: (u×v).w(m) ←m |dual(W)); = (: (U: u.v(n).w(m) ←n |dual(V)) ←m |dual(W))

Modulo two applications of the above-mentioned choice of ordering of factors, this bilinear map from dual(W) and dual(V) to U can be expressed in terms, first, of a linear map from dual(W)&tensor;dual(V) to U and, hence, of a linear map from dual(V&tensor;W) to U. In this last form, it is exactly a member of U&tensor;(V&tensor;W); to understand it as such, we need to observe that v×w = (: v.w(m) ←m :dual(W)) acts on dual(V)&tensor;dual(W) as span(: v(n).w(m) ← n×m :); sure enough, expressing the mapping in this way

(: u.v(n).w(m) ← n×m |dual(V)&tensor;dual(W)): = (: u.(v×w)(n×m) :dual(V&tensor;W)); = u×(v×w)

This makes it possible to identify U&tensor;(V&tensor;W) with (U&tensor;V)&tensor;W, making &tensor; an associative operator.

Doing so involved reading a bilinear ({(U: |dual(V))}: |dual(W)) as a linear(U: |dual(V&tensor;W)); as it happens, this involves two uses of the choice of order in universality, so favours neither order. If we chose to express the former as a linear (U: :dual(V)&tensor;dual(W)), each of its inputs is a linear (dual(V): |W) hence bilinear ({({scalars}:|V)}: |W) which the same choice will lead us to express as a linear ({scalars}: |V&tensor;W) in dual(V&tensor;W), as required. Equally, however, if we chose to express our linear ({linear (U: |dual(V))}: |dual(W)) as a linear (U: |dual(W)&tensor;dual(V)), each of its inputs is a linear (dual(W): |V) hence bilinear ({({scalars}: |W)}: |V) and our new choice expresses this as a linear ({scalars}: |V&tensor;W) in dual(V&tensor;W), again as required. All the same, we do see one piece of definiteness: if a bilinear map's first input is in dual(W) and second in dual(V), then we do want it expressed as a linear map from the dual of V&tensor;W rather than from the dual of W&tensor;V.

Given associativity, we can proceed to a bulk action, bulk(&tensor;), which takes any list (:L|n) of linear spaces and combines the entries to produce a tensor space whose typical member is obtained, correspondingly, by applying bulk(×) to a list (:k|n) with k(i) in L(i) for each i in n. Any permutation of n induces, by composition before L and each such k, a linear isomorphism from the resulting tensor space to another, with the same factors in permuted order. When L is not monic (i.e. it has two equal entries) there are permutations of n which, composed before L, yield L; the isomorphisms these induce are from the tensor space to itself; they are not, in general, the identity on it and, indeed, warrant much study in their own right.

In particular, when n is 2, L is [U,V] for some linear spaces U and V and we obtain U&tensor;V; the permutation [1,0] induces the isomorphism span(V&tensor;U: v×u ← u×v :U&tensor;V) and this is referred to as transposition. The roots of that nomenclature are tied to permutations that just swap two entries being commonly referred to as transpositions; I call them swaps to avoid confusion with my consistent use of transpose to mean reversing the order of the first two inputs to a function whose outputs, when given a first input, are all functions (and the corresponding transformation of a relation, that generalises this). As it happens, the use of transpose for span(: v×u ← u×v :), despite its provenance, is actually compatible with my usual meaning, since we can read u and v as linear maps from dual(U) and dual(V) to {scalars}, hence u×v as a function which takes a first input in dual(V) and produces, as output, a function which takes a second input in dual(U); when read this way, the transpose of u×v is indeed v×u.

Extending contraction

Written by Eddy.