Tensors

This page is due to be retired in favour of a newer treatment of the same material … but that may take some time.

The Tensor Product

For any linear spaces U, V, there is a natural mapping (: ({linear (U:|V)}: u×w = (U: u.w(v) ←v |V) ←w |dual(V)) ←u |U) which may be construed as a multiplication between members of U and members of V's dual. The span of the products × produces is denoted U⊗dual(V), which I'll intermittently write as U&tensor;dual(V) until I kick the habit or discover how to make it as likely to display sensibly as is ⊗. Unless both U and V are infinite-dimensional, U&tensor;dual(V) is the whole of {linear (U:|V)}.
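
In finite dimensions, with bases chosen, u×w is just the familiar outer product. Here is a quick numpy sketch (numpy, and the particular vectors and dimensions, are my illustrative choices, not part of the formalism above):

import numpy as np

u = np.array([1.0, 2.0])        # a member of U = R^2
w = np.array([3.0, -1.0, 4.0])  # a member of dual(V), with V = R^3
v = np.array([0.5, 1.0, 2.0])   # a member of V

# u×w as a matrix: the linear map sending v to u scaled by the scalar w(v).
uxw = np.outer(u, w)
assert np.allclose(uxw @ v, u * (w @ v))

The span claim is then the familiar fact that every matrix is a finite sum of such outer products.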

Define multi = (: (: multi(s,W) ←W :{linear spaces}) ← ({linear spaces}:s:) :{non-empty lists}) by: multi([U],W) = {linear (W: |U)}, multi([U]@s,W) = {linear (multi(s,W): |U)}. Then multi([U,...,V],W) is the space of multilinear maps from U,... and V to W and unite(|multi([U,...,V]):) is the collection of multilinear maps from U,... and V.
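
In coordinates, the nesting that multi describes is just currying. A rough numpy sketch (numpy assumed; encoding a member of multi([U,V],W) as a 3-index array B is my own choice):

import numpy as np

rng = np.random.default_rng(0)
# B encodes a member of multi([U,V],W): feeding it u in U leaves a
# linear map from V to W; here W = R^4, U = R^2, V = R^3.
B = rng.normal(size=(4, 2, 3))

def feed(B, u):
    # Supply u as input, leaving a linear map from V to W as a 4x3 matrix.
    return np.einsum('kij,i->kj', B, u)

u, v = np.ones(2), np.arange(3.0)
# Feeding the arguments one at a time agrees with feeding them together.
assert np.allclose(feed(B, u) @ v, np.einsum('kij,i,j->k', B, u, v))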

Consider a linear space U and some u in U. For any linear space W this induces a linear map (W: f(u) ←(W:f|U) :{linear}). So, for each linear space U, `supply as input to' gives a linear map (: (W: f(u) ←(W:f|U) :{linear}) ←u |U) for each linear space W. Likewise, for any list, [U,...,V], of linear spaces and any linear space W, any [u,...,v] with u in U, ..., v in V induces a linear map (W: w(u,...,v) ←w |multi([U,...,V],W)).

The foregoing shows that each list [u,...,v], with u in U, ..., v in V, induces a linear map (W: w(u,...,v) ←w :multi([U,...,V],W)) for each linear space W. For given linear spaces U, ... and V we have multilinear maps from U, ... and V to any linear space W; thus any such list [u,...,v] induces a polymorphic `supply as inputs to' action, defined on multilinear maps from U, ... and V regardless of their space of outputs.

Define supply = (: (: x(u,...,v) ←x :) ←[u,...,v] :{lists}) i.e. supply([a]) = (: x(a) ←x :) for any value a and supply(s@t) = (: supply(t, supply(s, x)) ←x :) for all lists s and t, with @ defined by [a,...,m]@[n,...,z] = [a,...,m,n,...,z]. Define (: M :{non-empty lists of linear spaces}) by: M([U]) = {linear (: :U)}, M([U]@s) = {linear (M(s): :U)} for any linear space U and non-empty list, s, of linear spaces: M([U,...,V]) is the union, over linear spaces W, of the space of multilinear maps from U, ... and V to W. We can then examine, for given linear spaces U, ..., V,

(: supply([u,...,v]) :M([U,...,V]))

which restricts supply([u,...,v]) down to its action on multilinear maps from U, ... and V; on this space, supply([u,...,v]) acts as a linear map, mapping any linear (...{linear (W: :V)}...: w :U) to w(u,...,v), which depends linearly on w (for fixed u, ..., v), for fixed choice of W (we can't add a multilinear with outputs in W to a multilinear with outputs in some other linear space).

Now return to the simplest case, the mapping (U×dual(V)| [u,w]-> u×w :{linear (V|:U)}) defined by u×w = (V| v-> u.w(v) :U). Evaluating w(v) gives a scalar, which we multiply by u to get a member of U. The span of this embedding's outputs is denoted U⊗dual(V) - in which ⊗ wants to be a × in a circle - and known as the tensor product of [U,dual(V)]: I'll also write ⊗([U,dual(V)]) for the same thing, leading on towards ⊗(W) for at least some mappings (: W :{linear spaces}). Every member of U⊗dual(V) may be written as sum(n| i-> u(i)×w(i) :{linear (V|:U)}) for some (n|u:U) and (n|w:dual(V)).

Notational aside: I'll sometimes combine linear spaces using ×, the Cartesian product (but, much more frequently, I'll combine them with ⊗): I'll often combine their members with this new operator ×. In most contexts, it'll be clear which entities are being dealt with as linear spaces and which as their members, making it unnecessary to distinguish the two uses of ×: in such contexts (primarily when only one of them would have been in use), I may use × for both and leave the reader to work out which meaning it has in each context. Both ⊗ and × are known as `the tensor product', the first on linear spaces, the second on their members. There is, again, seldom any opportunity for confusion: indeed, it is more conventional for them to share the symbol on which ⊗ is based, ideally using a bigger circle for the operator between linear spaces (but the symbol × is familiar to more browsers than is ⊗, and I haven't found ⨷ yet). On some pages I use &tensor; for ⊗ - I pronounce ⊗ as `tensor', so that's the entity name I remember.

Theorem: if u×v is zero, one of u, v is zero; otherwise, if u×v = x×y, there is some scalar k for which u=k.x and k.v=y. In this, u, x are taken to be members of some S-linear space, U, and v, y to be in V, also S-linear.

Proof: if u×v is zero then, for any w in dual(V), 0 = 0(w) = (u×v)(w) = u.v(w) so either u=0 or, for every w in dual(V), v(w)=0, which makes v=0.

Otherwise, u×v = x×y isn't zero so there's some w in dual(V) for which u.v(w) = x.y(w) isn't zero. For any such w, v(w) is a non-zero scalar so y(w) = k.v(w) for some scalar k. This makes u.v(w) = x.k.v(w) with v(w) non-zero so cancellable, leaving u=x.k. Now, x.y(w) is non-zero, hence so is x: whence, for every t in dual(V), x.y(t) = u.v(t) = x.k.v(t) and x is cancellable, leaving y(t) = k.v(t); so y=k.v.
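
The proof's recipe for recovering k is easy to watch in action. A numeric spot-check (numpy assumed, vectors arbitrary), not a substitute for the proof:

import numpy as np

x = np.array([1.0, -2.0])
y = np.array([2.0, 1.0, 0.5])
k = 3.0
u, v = k * x, y / k                  # so u = k.x and k.v = y

# u×v and x×y are then the same rank-one map ...
assert np.allclose(np.outer(u, v), np.outer(x, y))

# ... and k is recovered as in the proof: pick w with v(w) non-zero,
# whereupon y(w) = k.v(w).
w = np.array([1.0, 0.0, 0.0])        # a member of dual(V)
assert not np.isclose(v @ w, 0.0)
assert np.isclose((y @ w) / (v @ w), k)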

If I have linear spaces U and W but no linear space V with W=dual(V), I can still define U⊗W: formally, this is done by defining U⊗W′, where W′ is the subspace of dual(dual(W)) spanned by W's natural embedding in dual(dual(W)) - typically all of dual(dual(W)). In such a case I'll generally write U⊗W′ as U⊗W, unless I'm being fussy: W and W′ are isomorphic.

Most of the power of the tensor product lies in the fact that we can nearly always get away with discussing `typical' members of U⊗V, of form u×v, and inferring what happens to `general' members, obtained by adding together several typical members, from linearity. Note: the identity linear map (V|:V) is a member of V⊗dual(V) but is not typical (unless V is one-dimensional). On the other hand, scaling a typical member always gives a typical member (because k.(u×v) = (k.u)×v, which is typical): the span of a collection of values, U⊗V = span({u×v: u in U, v in V}), is defined in terms of scaling some of those values and summing the scaled values, but we can discuss U⊗V purely in terms of typical values without having to scale these.
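
For instance (a numpy check of that note, with coordinates my own choice), the identity on R^2 is a sum of two typical members but, having rank two, is no u×v itself:

import numpy as np

e0, e1 = np.eye(2)
# The identity on V = R^2 is a sum of two typical members of V⊗dual(V) ...
assert np.allclose(np.outer(e0, e0) + np.outer(e1, e1), np.eye(2))
# ... but no single u×v will do: a typical member has matrix rank at most
# one, while the identity has rank two.
assert np.linalg.matrix_rank(np.eye(2)) == 2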

Interaction with linear maps

Consider any (V| u×t :U) and linear (W|a:V), so u×t is composable after a. (V|t:scalars) linear gives t&on;a linear (W|:scalars), so t&on;a is in dual(W). We obtain

(u×t)&on;a
= (W| w-> (u×t)(a(w)) = u.(t(a(w))) :U)
= u×(t&on;a).

Likewise, for linear (U|b:X) and (V| u×t :U), we have

b&on;(u×t)
= b&on;(V| v-> u.t(v) :U)
= (V| v->
b(u.t(v)), each t(v) is a scalar and b is linear, so this
= b(u).t(v)
:X)
= b(u)×t,

so the tensor product interacts straightforwardly with composition of linear maps. In particular, a's interaction with the tensor product is entirely described by its interaction with t: u is ignored and left in peace: in this sense, when we compose a linear (V|:U) after a linear (W|:V), they need know little more about each other than what dual(V) and V have to know about each other.
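
Both identities are easy to spot-check in coordinates. A numpy sketch (numpy, the dimensions and the random seed are all illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=4)            # in U = R^4
t = rng.normal(size=3)            # in dual(V), V = R^3
a = rng.normal(size=(3, 5))       # a linear map from W = R^5 to V
b = rng.normal(size=(2, 4))       # a linear map from U to X = R^2

# (u×t)&on;a = u×(t&on;a): composing after a only touches t ...
assert np.allclose(np.outer(u, t) @ a, np.outer(u, t @ a))
# ... and b&on;(u×t) = b(u)×t: composing before b only touches u.
assert np.allclose(b @ np.outer(u, t), np.outer(b @ u, t))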

In the special case where W is just scalars, u×(t&on;a) is just u.(t&on;a), as t&on;a, in dual(W), is just a scalar. Indeed, in general, any linear map (scalars|f:X) is synonymous with its value at 1, since f(r) = f(r.1) = r.f(1) then gives its value for any other scalar input, r. This gives a natural isomorphism between {linear (scalars|:X)} and X, which follows readily from the definition: (W|a:V) is now (scalars|a:V), which means a = (: r-> r.a(1) :V) is entirely determined by linearity and a(1), which can be any member of V: so, FAPP, a is a(1) in V and t&on;a is likewise (t&on;a)(1) = t(a(1)); as above, (u×t)&on;a = u.(t&on;a) as mappings (scalars|:U); and their values at 1 are both u.t(a(1)), by which they are wholly determined.

In general, for a composable list of linear maps, eg b&on;(u×t)&on;a, with a tensor product as one member of the list, the composition separates around it, to give b(u)×(t&on;a). A second tensor product in a composable list, eg c&on;(u×t)&on;b&on;(v×w)&on;a, has t&on;b&on;v scalar, giving (c&on;u)×(t&on;b&on;v).(w&on;a), so that b's part in proceedings is entirely described by t&on;b&on;v, as a simple linear scaling on c&on;(u×w)&on;a.
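
A coordinate spot-check of the two-tensor chain (numpy assumed, dimensions picked arbitrarily): b's only contribution is the scalar t&on;b&on;v.

import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=(3, 2))   # a linear map from A = R^2 to B = R^3
w = rng.normal(size=3)        # in dual(B)
v = rng.normal(size=4)        # in C = R^4
b = rng.normal(size=(5, 4))   # a linear map from C to D = R^5
t = rng.normal(size=5)        # in dual(D)
u = rng.normal(size=6)        # in E = R^6
c = rng.normal(size=(2, 6))   # a linear map from E to F = R^2

chain = c @ np.outer(u, t) @ b @ np.outer(v, w) @ a
# The whole chain is c&on;(u×w)&on;a scaled by the scalar t&on;b&on;v.
assert np.allclose(chain, (t @ b @ v) * np.outer(c @ u, w @ a))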

The important property of the tensor product is that, for all practical purposes, it implies that we can separate discussion of {linear (U|:V)} into discussion of dual(U) and V: the basic premise of tensor algebra is that U⊗dual(V) is, at least for all practical purposes, all of {linear (V|:U)}; likewise, that W's natural embedding spans all of dual(dual(W)). When V and U are finite-dimensional (making dual(dual(V)) = V), U⊗dual(V) = {linear (V|:U)}; and when they aren't, U⊗dual(V) is, at least, `dense' in {linear (V|:U)}, and we generally keep to contexts in which linear maps are continuous and, hence, wholly determined by their values on any dense subspace.

Bilinearity

For fixed w in dual(V), (U| u-> u×w :U⊗dual(V)) is a linear map, since (V| v-> u.w(v) :U) depends linearly on u; likewise, for w, x in dual(V) and k in S:

u×(w + k.x) = (V| v-> u.w(v) + k.u.x(v) :U) = u×w + k.(u×x)

Consequently, (U| u-> (dual(V)| w-> u×w :U⊗dual(V)) :{linear (dual(V)| :U⊗dual(V))}) is `linear in both arguments'. For any S-linear U, V, W and mapping (U| f :{(V| :W)}), we can ask whether f is linear (using the linear structure on U and on {(V|:W)}) and we can ask whether f's outputs are linear: these are separate questions, but when both f and its outputs are linear, then f is described as a bilinear map from U and V to W. Clearly the tensor product is, thus, a bilinear map from the two input spaces to their tensor product space, (U| :{linear (V| :U⊗V)}). Indeed, as I'll now show, one can factorise any bilinear map from U and V as a composite of the tensor product before some (U⊗V| :W).
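
In coordinates this is the familiar bilinearity of the outer product; a brief numpy check (numpy and the dimensions assumed):

import numpy as np

rng = np.random.default_rng(3)
u, z = rng.normal(size=(2, 4))    # members of U = R^4
w, x = rng.normal(size=(2, 3))    # members of dual(V), V = R^3
k = 2.5

# Linear in the U argument for fixed w ...
assert np.allclose(np.outer(u + k * z, w),
                   np.outer(u, w) + k * np.outer(z, w))
# ... and linear in the dual(V) argument for fixed u.
assert np.allclose(np.outer(u, w + k * x),
                   np.outer(u, w) + k * np.outer(u, x))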

Naturality

I now want to show how the tensor product serves as a natural description of bilinearity. This amounts to showing that {bilinear from U then V to W} is, in effect, {linear (U⊗V| :W)}, by exhibiting a natural isomorphism between these two linear spaces. First, observe that any (U⊗V| F :W) induces (U| u-> (V| v-> F(u×v) :W) :) which is bilinear from U then V to W. It turns out to be a little trickier to construct the inverse of this.

Given f bilinear from U then V to W, I need to construct `that (U⊗V| F :W) for which f(u,v) = F(u×v)'. This requires F to be a mapping, which turns out to be the hard thing to prove. At the outset, we can define F on `typical' members of U⊗V using our goal, f(u,v) = F(u×v); but we need to define F on all of U⊗V, which includes sums of typical members; and we need to show that when two such sums are equal as members of U⊗V, they give equal answers in W. Thus the proof of the naturality of the tensor product amounts to a proof that we can define a mapping on `typical' members of U⊗V and, provided it is bilinear, know that we can extend it linearly to get consistent answers on all of U⊗V.

To define F on a sum(n| i-> u(i)×v(i) :) of typical members, we need its value to be sum(n| i-> f(u(i),v(i)) :). Now, if we have sum(n| u×v :) = sum(m| x×y :), as may arise, we need to be sure that sum(f(u,v)) = sum(f(x,y)). This is easy for n, m in {1, 0}, aka 2, but requires proof at each step to expand that to {2, 1, 0} = 3, then to {3, 2, 1, 0} = 4 and so on. The proof at each step, expanding it to 1+N = {N,...,0}, has, however, a common form: it exploits the knowledge from previous steps - that it works for N, ..., and 0 - to enable us to prove that it works for 1+N also.

So, for some natural N, suppose we're far enough through the proof already that we know that equality of sums shorter than N always works: if this enables us to show that sums of length at most N all work, we can move on to N+1 knowing that all sums shorter than N+1 will work.

So our generic sum(n| u×v :) = sum(m| x×y :) is given to yield sum(f(u,v)) = sum(f(x,y)) whenever n and m are both members of some natural N, and we need to show that the same still holds when n and m are at most N. In this, the only relevant cases are: n = N = m; and one of n, m equal to N with the other in N. To prove these cases from the given cases, n and m both in N, we need to break up the sums, infer a shorter pair of sums and exploit the fact that we can already cope with all shorter sums.

So consider:

sum(n| i-> u(i)×v(i) :) = sum(m| i-> x(i)×y(i) :)

and see what happens if we allow n and m to be either members of N, as given, or equal to N: say m = N and n is either in N or equal to N.

Given f bilinear from U and V to W, we can define a relation (U⊗V: F :W) by:

for each natural n, (n|u:U) and (n|v:V), F relates sum(n| i-> u(i)×v(i) :) to sum(n| i-> f(u(i),v(i)) :).

For any u, x in U, v, y in V and scalar k, we have some mapping (U⊗V: G :W) which is a sub-relation of F and for which

G(u×v) = f(u,v), G(x×y) = f(x,y) and G(k.(u×v)) = G((k.u)×v) = k.f(u,v) = G(u×(k.v)).

Now, suppose x×y = u×v. If either u or v is zero, this makes one of x, y zero; hence both f(u,v) = 0 = f(x,y). Otherwise, we have some scalar k with u=k.x, k.v=y giving f(u,v) = f(k.x,v) = k.f(x,v) = f(x,k.v) = f(x,y). Consequently, whichever case we see, u×v = x×y implies f(x,y) = f(u,v). In particular, F relates 0 to 0.

Now, U⊗V is the span of `for u in U, v in V, map [u,v] to u×v', so every member of it is a finite sum of terms of form u×v. We need to show that if two such sums are equal, the values to which each sum tells us F relates the sum are also equal and, ultimately, that F only relates a given value in U⊗V to one member of W.

For any natural number N, introduce a relation (U⊗V: E(N) :W) which, for any (n|u:U) and (n|v:V) with n at most N: E(N) relates sum(n| i-> u(i)×v(i) :U⊗V) to sum(n| i-> f(u(i),v(i)) :W). Observe that F is unite({naturals}| n-> E(n) |) and each n<N has E(n) a sub-relation of E(N): furthermore, E(0) and E(1) are mappings, from the discussion above (which explicitly deals with E(1): and E(0) is just F's action on the empty sum, whose value is zero, which E(0) maps to zero by definition; which fortunately agrees with E(1) mapping 0×v and u×0 to zero).

For any natural number N for which we are given that E(n) is a mapping for each n in N (eg N = 0, 1 or 2, so far: but suppose N is at least 1), consider any (N|u:U), (N|v:V), (n|x:U) and (n|y:V) with n either equal to N or in N and sum(N| i-> u(i)×v(i) :) = sum(n| i-> x(i)×y(i) :). This equality, applied to any w in dual(V), gives us a linear dependence between the outputs of x and those of u, which we can re-arrange to express, for some j in N, u(j) as a weighted sum of N+n-1 terms, u(j) = sum(N: i-> k(i).u(i) :U) - sum(n| i-> h(i).x(i) :U) for some (n| h :scalars) and (N: k :scalars) for which (|k) is all but j of N.

Since j isn't in (|k), but everything else in N is, there is some monic (N-1| m :N), with (|k) as its collection of outputs, composable before k; using this we can define (N-1| r :V) to map i-> v(m(i))+k(m(i)).v(j). Defining s = (n| i-> y(i)+h(i).v(j) :V) we now find that

sum(N-1| i-> u(m(i))×r(i) :)
= sum(N-1| i-> u(m(i))×(v(m(i))+k(m(i)).v(j)) :)
= sum(N| i-> u(i)×v(i) :) - u(j)×v(j) + sum(N: i-> u(i).k(i) :)×v(j)
= sum(n| i-> x(i)×y(i) :) + sum(n| i-> h(i).x(i) :)×v(j)
= sum(n| i-> x(i)×(y(i)+h(i).v(j)) :)
= sum(n| i-> x(i)×s(i) :).

We thus have a `shorter' pair of equal tensors: with xnew = u&on;m, ynew = r, unew = x and vnew = s, we find

sum(N| i-> f(u(i),v(i)) :) - sum(n| i-> f(x(i),y(i)) :)
= sum(N-1| i-> f(u(m(i)),v(m(i))) :) + f(u(j),v(j)) - sum(n| i-> f(x(i),y(i)) :)
= sum(N-1| i-> f(u(m(i)),v(m(i))) :)
+ sum(N-1| i-> f(k(m(i)).u(m(i)),v(j)) :)
- sum(n| i-> f(h(i).x(i),v(j)) :)
- sum(n| i-> f(x(i),y(i)) :)
= sum(N-1| i-> f(u(m(i)), v(m(i))+k(m(i)).v(j)) :) - sum(n| i-> f(x(i), y(i)+h(i).v(j)) :)
= sum(N-1| i-> f(xnew(i),ynew(i)) :W) - sum(n| i-> f(unew(i), vnew(i)) :W)

While n might be N, N-1 is definitely in N: if n is also in N, E(N-1) accepts unew×vnew and xnew×ynew, which have equal sums, and our inductive hypothesis tells us that E(N-1) is a mapping, making the sum of f(xnew,ynew) equal to the sum of f(unew,vnew): the above difference is zero, so the sum of f(u,v) must equal the sum of f(x,y); so the two answers E(N) could have given, for the sums of u×v and of x×y, agree. Our inductive hypothesis was that if two tensor sums shorter than N are equal, the resulting sums of f's outputs are equal: we now have that if either of the sums is shorter than N and the other no longer, the f-sums are again equal.

The remaining case n=N gives us equal sums for xnew×ynew and unew×vnew, one of which is shorter than N, the other still being only as long as N, but we now know that that makes the two sums of f's outputs equal; so, in turn the same holds for the sums of f(u,v) and f(x,y). So, finally, we find that E(N) is a mapping, just like each E(n) with n in N. Consequently 1+N is a natural number for which every n in 1+N has E(n) a mapping: the inductive hypothesis stated for N is thence deduced to hold for 1+N. Since we knew it was true for N = 0, 1, 2, we now know it for all natural N and, since each E(N) subsumes each E(n) with n in N, the union({naturals}| n-> E(n) |) = F must also be a mapping. That it is linear follows promptly from its definition.
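
What the induction buys can be spot-checked in coordinates: two quite different decompositions of one member of U⊗V must feed through a bilinear f to the same answer. A numpy sketch (numpy assumed; for simplicity W is the scalars, with f encoded by an arbitrary matrix A):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 3))
f = lambda u, v: u @ A @ v                # a bilinear map from U and V

M = rng.normal(size=(4, 3))               # a member of U⊗V, U = R^4, V = R^3

# Decomposition 1: basis members of U against the rows of M.
dec1 = [(e, M[i]) for i, e in enumerate(np.eye(4))]
# Decomposition 2: rank-one pieces from the SVD of M.
p, s, q = np.linalg.svd(M, full_matrices=False)
dec2 = [(s[i] * p[:, i], q[i]) for i in range(len(s))]

for dec in (dec1, dec2):
    assert np.allclose(sum(np.outer(ui, vi) for ui, vi in dec), M)
# Both decompositions give the same f-sum: F(M) is well defined.
assert np.isclose(sum(f(ui, vi) for ui, vi in dec1),
                  sum(f(ui, vi) for ui, vi in dec2))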

This gives us a mapping, ({linear (U| :{linear (V|:W)})}: f-> F :{linear (U⊗V| :W)}) and we find that ({linear (U⊗V| :W)}| F-> (U| u-> (V| v-> F(u×v) :W) :) :) is inverse to it, producing outputs exclusively in {linear (U| :{linear (V|:W)})}, so we find that {linear (U| :{linear (V|:W)})} is isomorphic to {linear (U⊗V| :W)}: a bilinear map from U and V to W can always be factorised via the standard bilinear map from U and V to U⊗V.
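
In coordinates the factorisation is a contraction. A numpy sketch (numpy assumed; W = R^2 and the 3-index encoding of f are my choices):

import numpy as np

rng = np.random.default_rng(5)
# f, bilinear from U = R^4 and V = R^3 to W = R^2, as a 3-index array:
B = rng.normal(size=(2, 4, 3))

def F(T):
    # The induced linear map on U⊗V: contract away both tensor indices.
    return np.einsum('kij,ij->k', B, T)

u, v = rng.normal(size=4), rng.normal(size=3)
# f factors through the tensor product: f(u,v) = F(u×v).
assert np.allclose(F(np.outer(u, v)), np.einsum('kij,i,j->k', B, u, v))

Flattening the two tensor indices into one would exhibit F as an ordinary matrix on a dim(U).dim(V)-dimensional space, which is all the isomorphism amounts to in coordinates.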

Examine dual(U⊗V) = {linear (U⊗V| :S)}, which the above shows equivalent to {linear (U| :{linear (V| :S)})} = {linear (U| :dual(V))} which is dual(V)⊗dual(U). More concretely, from dual(V)⊗dual(U), we have the mapping w×x-> (U⊗V| u×v-> w(v).u(x) :S) whose outputs are manifestly in dual(U⊗V): and any F in dual(U⊗V) gives us (U| u-> (V| v-> F(u×v) :S) :dual(V)).
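
A coordinate rendering of that pairing (numpy assumed, vectors arbitrary): the functional w×x eats a matrix M representing a member of U⊗V:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=4)            # in dual(U), U = R^4
w = rng.normal(size=3)            # in dual(V), V = R^3
u, v = rng.normal(size=4), rng.normal(size=3)

# The member w×x of dual(V)⊗dual(U), as a functional on U⊗V:
G = lambda M: x @ M @ w
# On a typical member it gives w(v).x(u), as above.
assert np.isclose(G(np.outer(u, v)), (w @ v) * (x @ u))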

Now, let's look at (U⊗dual(V))⊗dual(W) = {linear (W| :{linear (V| :U)})} and U⊗(dual(V)⊗dual(W)), which we've now seen equivalent to U⊗dual(W⊗V) = {linear (W⊗V| :U)}, which is, in turn, equivalent to {linear (W| :{linear (V| :U)})}, so that U⊗dual(V)⊗dual(W) is a reasonable name for these isomorphic spaces. Any given u×x×y in it appears as

(W| w-> (V| v-> u.x(v).y(w) :U) :{linear (V| :U)}) or as (W⊗V| w×v-> u.x(v).y(w) :U),

so discussions of tensor products do need to indicate which meaning of each term to use: however the ambiguities are easy to avoid and to resolve when unavoidable.
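
In coordinates u×x×y is a 3-index array and the isomorphic readings are just different contraction orders; a final numpy sketch (numpy assumed):

import numpy as np

rng = np.random.default_rng(7)
u = rng.normal(size=4)                 # in U
x = rng.normal(size=3)                 # in dual(V)
y = rng.normal(size=2)                 # in dual(W)
T = np.einsum('i,j,k->ijk', u, x, y)   # u×x×y as a 3-index array

v, w = rng.normal(size=3), rng.normal(size=2)
# Read T as a map from W to {linear (V|:U)}: feed w, then v ...
step = np.einsum('ijk,k->ij', T, w) @ v
# ... or read T as a map from W⊗V to U: feed w×v in one go.
once = np.einsum('ijk,j,k->i', T, v, w)
assert np.allclose(step, once)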

Written by Eddy.