Lagrange-Hamilton optimization on a smooth manifold

Lagrange-Hamilton optimization

The formalism Lagrange and Hamilton devised, for generalizing Fermat's least time formulation of optics, can be expressed in terms of selecting, among available trajectories between a given pair of end-points, those that extremize the integral along the trajectory of a function which, at each point on the trajectory, depends only on the position and rate of variation thereof on the trajectory. Although it is normally stated more concretely, this looser form of words allows me to interpret it on a smooth manifold, without reference to co-ordinates.

On a smooth manifold, M, a trajectory is a smooth function, from some contiguous segment of the real line, to the manifold. The intrinsic differential operators of the smooth manifold, whence spring its tangent and gradient bundles, assign a tangent to the trajectory at each point. If the trajectory is (M: x :{reals}) then its derivative at input t, x'(t), is a tangent at the position x(t) in M. The integrand needed for the Lagrangian formalism is a function of position and tangent; applying currying as usual, it is thus a function from position to functions from the space of tangents at that position to scalars. (If this last is, at each position, a linear function, then it is a gradient and we actually have a gradient-valued tensor field; but there is no requirement of linearity in Lagrange's formalism.) So we have a function (:L:M) and, for each m in M, L(m) is a function ({scalars}:L(m)|T(m)), where T(m) is the space of tangents to M at m. We can thus, given a trajectory (M:x:{reals}), evaluate L(x(t),x'(t)) at each point on the trajectory; and we can integrate this along the trajectory to get a scalar. We can then study the properties of trajectories that extremize this integral.

What makes this subtler, on a smooth manifold, as compared to the vector space tacitly assumed by Lagrange and Hamilton, is that L's second parameter doesn't have meaning without its first – and the classical solution to the problem relies on discussing L(m,v) as if its dependencies on m and v were entirely independent. On a smooth manifold, for any given v, we cannot vary m – because v is a tangent at a specific point, m, on the manifold, so is not even an acceptable input to L(n) when n is not m. This need not be a fatal problem, but it does force us to pay attention to details. Consequently, before analyzing the problem in co-ordinate-free terms (as one should), I shall begin by following orthodoxy and looking at it through the eyes of a chart.

Analysis using a chart

Using a chart naturally reduces the problem to the case where M is, in effect, a neighbourhood in a vector space, albeit one whose geometry might not be simple. Even if we want to study trajectories, on our manifold, that extend beyond the domain of any single chart in our available atlas, the trajectories which extremize our integral must do so equally under perturbations within any given chart through whose domain the trajectory passes, leaving the parts of the trajectory outside that chart unchanged; so every result of the per-chart analysis must indeed apply to the whole trajectory, in so far as it can be stated in terms independent of the choice of chart. Since extremization is concerned only with the limiting behaviour under arbitrarily small perturbations, and any global perturbation of our trajectory can be built up as a composite of perturbations, each of which only changes it within a single chart, the constraints imposed by the per-chart analyses must in fact suffice to characterize the extremal trajectories, at least when every trajectory lies in the union of the domains of finitely many charts. It thus, in particular, suffices to consider perturbations that only change our trajectory on bounded intervals.

So take M to simply be an open neighbourhood in a vector space V; as before, we have a trajectory (M:x:{reals}) but now there is, at each point of M, a natural isomorphism between tangents to M there and vectors in V, with the result that L effectively becomes a function of two vector inputs that we can vary independently. For each m in M, L(m) is then a function ({scalars}:|V) and we can discuss its derivative in the usual way for such functions; let D1L(m,v) stand for L(m)'s derivative at input v in V (i.e. ∂L(m,v)/∂v). Likewise, for any v in V, we have L's transpose at v, ({scalars}: L(m, v) ←m |M), with M an open neighbourhood in vector space V, so can discuss its derivative in the usual terms; let D0L(m,v) be its derivative at input m (i.e. ∂L(m,v)/∂m). For each m in M and v in V, both D0L(m,v) and D1L(m,v) are linear maps ({scalars}:|V), i.e. members of dual(V).

Given trajectory (M:x:{reals}) and reals a<b in (:x|), write U = {real t: a≤t≤b} and let I = ({scalars}: integral(: L(y(t), y'(t)) ←t |U) ←y :{trajectories in M}). Let (V:h|U) be an arbitrary smooth trajectory in V whose value and (all) derivatives tend to zero in the limit as one approaches either end-point. For any real s we have a perturbed trajectory x+s.h defined explicitly on the interval from a to b and implicitly extended to the whole of x's domain outside the interval by just using x. Since there is no change outside the interval, we can ignore the integral outside the interval U. As long as x lies in M's interior, there must be, in {reals}, some neighbourhood of zero for which, whenever s is within this neighbourhood, x+s.h lies entirely within M and thus is a trajectory for which I(x+s.h) is defined. For x to be extremal for I, dI(x+s.h)/ds must be zero at s = 0 for every h meeting the conditions given above. Since the bounds of the integral do not vary, this derivative is:

= d(integral(: L(x(t)+s.h(t), x'(t)+s.h'(t)) ←t |U))/ds
= integral(: d(L(x(t)+s.h(t), x'(t)+s.h'(t)))/ds ←t |U)
= integral(: h(t)·D0L(x(t)+s.h(t), x'(t)+s.h'(t)) +h'(t)·D1L(x(t)+s.h(t), x'(t)+s.h'(t)) ←t |U)

Now, the second term here is one term in the derivative of h·D1L(x+s.h, x'+s.h'), so we can re-write this as:

= integral(: h(t)·D0L(x(t)+s.h(t), x'(t)+s.h'(t)) +d(h(t)·D1L(x(t)+s.h(t), x'(t)+s.h'(t)))/dt −h(t)·d(D1L(x(t)+s.h(t), x'(t)+s.h'(t)))/dt ←t |U)

and the integral of the exact derivative is just the change in h·D1L(x+s.h,x'+s.h') between the end-points; and h is zero at both end-points, so the change is zero. We are thus left with:

= integral(: h(t)·(D0L(x(t)+s.h(t), x'(t)+s.h'(t)) −d(D1L(x(t)+s.h(t), x'(t)+s.h'(t)))/dt) ←t |U)

and this must be zero, at s = 0, for all h that satisfy our constraints. For any given t (strictly) between a and b and any desired value for h(t), we can construct an h which does indeed satisfy our given smoothness and boundary requirements and take the given value for h(t); furthermore, we can do so while having h zero outside any arbitrarily narrow interval around t. Consequently, I can only be extremal at x if:

D0L(x(t), x'(t)) = d(D1L(x(t), x'(t)))/dt

for all t, i.e.

∂L(x,x')/∂x = d(∂L(x,x')/∂x')/dt

This is the conventional solution to the general Lagrangian problem.
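As a numerical sanity check on this condition, here is a minimal sketch that takes V to be the real line and assumes one particular Lagrangian, L(m, v) = v²/2 − m²/2 (the harmonic oscillator). That choice, the function names and the step sizes are all assumptions made for illustration, not part of the argument above. It estimates D0L and D1L by central differences and evaluates the residual D0L − d(D1L)/dt along x = sin, which should satisfy the condition, and along x(t) = t², which should not.

```python
import math

def L(m, v):
    # Assumed example Lagrangian: harmonic oscillator on the real line.
    return 0.5 * v * v - 0.5 * m * m

def D0L(m, v, eps=1e-5):
    # Central-difference estimate of L's derivative in its position input.
    return (L(m + eps, v) - L(m - eps, v)) / (2 * eps)

def D1L(m, v, eps=1e-5):
    # Central-difference estimate of L's derivative in its tangent input.
    return (L(m, v + eps) - L(m, v - eps)) / (2 * eps)

def residual(x, dx, t, eps=1e-4):
    # D0L(x, x') - d(D1L(x, x'))/dt at time t; zero on extremal trajectories.
    p = lambda s: D1L(x(s), dx(s))
    dp_dt = (p(t + eps) - p(t - eps)) / (2 * eps)
    return D0L(x(t), dx(t)) - dp_dt

times = [0.1 * k for k in range(1, 30)]
# x = sin, x' = cos: solves the oscillator's equation of motion.
on_shell = max(abs(residual(math.sin, math.cos, t)) for t in times)
# x = t*t, x' = 2*t: a smooth trajectory that is not extremal.
off_shell = max(abs(residual(lambda t: t * t, lambda t: 2 * t, t)) for t in times)
print(on_shell, off_shell)
```

The first printed number should be tiny (finite-difference noise only), while the second should be of order one or larger; the trajectory is supplied together with its derivative so that the only numerical differentiation is the one the condition itself calls for.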

Interpretation on a smooth manifold

Now, for given m in M, L(m) is a function from T(m) to scalars so we can, in the usual way, differentiate it at v in T(m) to obtain D1L(m,v) in G(m) = dual(T(m)), essentially as we did above with T(m) = V. However, as noted above, for given v, we can't ask how L(m,v) varies with m (since v is in T(m) for only one m in M), so D0L(m,v) isn't immediately meaningful.

So consider a vector field v, defined in some neighbourhood of m in M, and look at how the scalar field H(v) = (: L(m,v(m)) ←m :) varies. In the eyes of any differential operator that considers v constant, the gradient of this scalar field, dH(v), is what we wanted to use as D0L. More generally, given a differential operator D, its derivative of v is Dv, a tensor of rank G⊗T and we expect this to contribute Dv(m)·D1L(m,v(m)) to dH(v,m); so our candidate value for D0L(m,v(m)) is dH(v,m) −Dv(m)·D1L(m,v(m)); but how does this depend on our choice of v and D ? We can more or less tolerate a dependency on differential operator (because we usually work in a context which has already chosen one differential operator to use), but dependency on choice of vector field (i.e. how v varies away from m) would make it impossible to define D0L(m) as a function ({scalars}:|T(m)). So: can we be sure that, for vector fields u and v, defined in some neighbourhood of m in M, and a differential operator D,

u(m) = 0 implies Du(m)·D1L(m,v(m)) = dH(v+u,m) −dH(v,m) ?

Now, the right-hand side is just the gradient at m of ({scalars}: L(p,v(p)+u(p))− L(p,v(p)) ←p :M) which, at least for small u, is well-approximated by (: u(p)·D1L(p, v(p)) ←p :); and u(m) = 0 tells us that u is indeed small near m, so this approximation should be good enough where we actually need it. Differentiating it, we get Du(p)·D1L(p,v(p)) plus a term in u(p) contracted with D(D1L(p,v(p))); evaluation at p = m, with u(m) = 0, annihilates this last term, leaving the first, which is indeed the required Du(m)·D1L(m,v(m)). Thus we can use, for any vector field v with v(m) = w, D0L(m,w) = dH(v,m)−Dv(m)·D1L(m,w) as a working definition of D0L(m,w).
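The working definition can be checked numerically in the simplest setting. The sketch below assumes a one-dimensional chart, takes D to be the chart's ordinary derivative and uses an arbitrary example L that is not linear in its tangent input; the Lagrangian, the two vector fields and all names are assumptions made purely for illustration. It computes dH(v,m) −Dv(m)·D1L(m,w) for two different vector fields that agree at m, and compares both with the chart's partial derivative of L in its position input.

```python
import math

def L(p, v):
    # Assumed example Lagrangian on a 1-D chart; deliberately not linear in v.
    return math.cos(p) * v * v + p * v

def diff(f, at, eps=1e-5):
    # Central-difference derivative of a scalar function of one real input.
    return (f(at + eps) - f(at - eps)) / (2 * eps)

def D0L_via_field(v, m):
    # dH(v, m) - Dv(m).D1L(m, v(m)), with D the chart's plain derivative.
    w = v(m)
    dH = diff(lambda p: L(p, v(p)), m)
    Dv = diff(v, m)
    D1L = diff(lambda u: L(m, u), w)
    return dH - Dv * D1L

m, w = 0.7, 1.3
# Two vector fields with the same value, w, at m but different derivatives.
constant = lambda p: w
wiggly = lambda p: w + 3 * (p - m) + math.sin(p - m) ** 2

a = D0L_via_field(constant, m)
b = D0L_via_field(wiggly, m)
exact = -math.sin(m) * w * w + w  # the chart's partial derivative of L in p
print(a, b, exact)
```

All three printed values should agree to within finite-difference error, which is what independence from the choice of vector field amounts to in this special case. It says nothing about dependence on the differential operator, since only one is used here.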

Note that the difference between two differential operators acts on each rank as a linear map at each point, so Du(m) won't depend on choice of differential operator (since the difference is a linear map applied to u(m) = 0); but Dv(m) typically shall (the linear map is applied to v(m) = w). Nor is there any a priori reason to suppose contraction with D1L(m,w) shall necessarily annihilate such differences, so we must suppose that D0L depends on choice of differential operator.

When we substitute that back into our solution, above, we get:

dH(v,x(t)) −Dv(x(t))·D1L(x(t),x'(t)) = d(D1L(x(t),x'(t)))/dt

for any (and every) vector field v for which v(x(t)) = x'(t). One further wrinkle remains: D1L(x(t),x'(t)) is in G(x(t)), the space of gradients at x(t), and this is a different space at each t, which obliges us to take more care over d/dt. Fortunately, given that we already need a v with v(x(t)) = x'(t), we can use v to define (: D1L(m, v(m)) ←m :M) as a gradient-valued tensor field in (at least) some neighbourhood of our trajectory; differentiating this and contracting with x'(t) then gives us an appropriate interpretation of d/dt applied to it, so we obtain:

dH(v,x(t)) = Dv(x(t))·D1L(x(t),x'(t)) +x'(t)·D(D1L(x(t),x'(t)))

subject to the reading that D(D1L(x(t),x'(t))) really means D(: D1L(m, v(m)) ←m :) evaluated at m = x(t), with v(x(t)) already given to be x'(t). So we can at least express the general Lagrangian solution in a smooth manifold's terms, for all that this requires some interpretation, includes dependencies on our choice of differential operator and needs a proof (where I have merely sketched reasons to believe) of invariance under choice of the vector field used.

Now, the right-hand side has, by now, taken on a form that looks a lot like the product law applied to D(: v(m)·D1L(m,v(m)) ←m :) and evaluated at m = x(t); however, in the second term, x' (i.e. v) is contracted with the gradient rank arising from the newly-applied D rather than with the one in D1L arising from L's variation in its second input. Still, the difference is just x'(t) contracted with twice the antisymmetric part of D(: D1L(m,v(m)) ←m :M); and antisymmetrizing the derivative of a gradient field yields a result independent of our choice of differential operator. (This is important: we started out with a problem posed in terms of integrating a scalar along a trajectory; this contained no dependency on, or even reference to, any differential operator or even any reason for preferring one over another; the solution should not depend on choice of differential operator.) So we can eliminate D and refine our equation to:

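d(: L(m,v(m)) −v(m)·D1L(m,v(m)) ←m :) = v·A

in which A is twice the antisymmetric part of the derivative of the gradient field (: D1L(m,v(m)) ←m :M) and v is contracted with the rank arising from that differentiation; both sides are evaluated (after doing the differentiation they entail) at m = x(t), with v(x(t)) constrained to be x'(t). Notice that, on the left, L(m,v(m)) −v(m)·D1L(m,v(m)) is the estimate of L(m,0) one obtains by linear extrapolation from L(m)'s values near v(m). (When construed as L −x'·∂L(x,x')/∂x', negating it yields the Hamiltonian.)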
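The parenthetical remark about the Hamiltonian is easy to illustrate in one dimension. The sketch below assumes a unit-mass particle in the potential U(m) = m⁴/4, so L(m, v) = v²/2 − m⁴/4; this Lagrangian and the function names are assumptions chosen for illustration. It forms L(m,v) −v·D1L(m,v), the value at 0 of the tangent line to L(m) at v, and compares its negation with kinetic plus potential energy.

```python
def L(m, v):
    # Assumed example: unit-mass particle in the potential U(m) = m**4 / 4.
    return 0.5 * v * v - 0.25 * m ** 4

def D1L(m, v, eps=1e-5):
    # Central-difference estimate of L's derivative in its tangent input.
    return (L(m, v + eps) - L(m, v - eps)) / (2 * eps)

def extrapolated(m, v):
    # Value at 0 of the tangent line to L(m) at v: L(m, v) - v.D1L(m, v).
    return L(m, v) - v * D1L(m, v)

def energy(m, v):
    # Kinetic plus potential energy, the usual Hamiltonian for this L.
    return 0.5 * v * v + 0.25 * m ** 4

m, v = 0.8, 1.5
print(-extrapolated(m, v), energy(m, v))
```

The two printed values should agree up to rounding. Because this L is not linear in v, the extrapolated value differs from the true L(m, 0) = −U(m) by the kinetic term, which is why negating it yields the full energy and not just the potential.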

Written by Eddy.