The differential calculus quantifies rates of variation of functions by providing, for a given function evaluated at a specified input, a linear map from small changes in the input to an estimate of the resulting change in the output. It does this only where such an estimate can be highly accurate near the specified input; specifically, where the scale of its errors shrinks significantly faster than the scale of the small change in input, as the input draws closer to the specified input.

Orthodoxy generally introduces differentiation in the context of functions
from the real number line to itself; in this context, a linear map from (small
changes in) input to (small changes in) output can faithfully be represented by
a real number, namely the change in output divided by the change in input. This
leads to deriving derivatives of functions between vector spaces via partial
derivatives, which quantify how one co-ordinate of the output (given some
co-ordinate system there) varies with one co-ordinate of the input (given
likewise), while keeping all other co-ordinates of the input fixed. From this
it becomes possible to *synthesize* a linear map from inputs to outputs.
However, in general, a derivative (of a function between linear spaces) is
indeed a linear map from inputs to outputs; and it is entirely possible to
introduce the subject in such terms, without going via any co-ordinate systems
or giving primacy to the real derivative. In my opinion the result is indeed
clearer, for all that it requires an understanding of linear spaces, by virtue
of making clear the distinction between input space, output space and (gradient
as) linear map from the former to the latter.

This requires, of course, the theory of linear
spaces; in particular, we'll need to be able to infer a *gradient*
for any chord, between inputs at the vertices of a
voluminous simplex, that maps the vectors
along each edge of the simplex to the exact differences between the outputs at
the edge's ends; and we'll need a notion of *scale* of difference by
which to demand that differences between these linear maps shrink significantly
faster than the scale of a simplex in whose interior our specified input lies.
The latter can certainly be done by the use of hermitian forms; I suspect it can
also be done by judicious use of the scaling properties of simplices, at least
in the case of spaces linear over (some sub-field of) the real (rather than
complex) numbers. The gradient replaces the real analysis's ratio of change in
input to change in output; the notion of scale quantifies the sense in which the
derivative provides sufficiently accurate estimates of the changes in
output.

When differentiation is introduced in terms of functions from reals to reals, we divide a (possibly zero) change in output by a (definitely non-zero, but typically small) change in input to get a gradient or rate of change between two evaluations of the function; on a graph, this is represented by a straight line between two points on the curve representing the function; this is a chord of the curve. The slope of the chord gives us our gradient. All chords reasonably close to a given point on the curve have, provided the curve is differentiable there, gradients reasonably close to one another and to that of the tangent to the curve at our given point. If we multiply a gradient by the change in input between a chord's end-points, we get a tolerably good estimate of the change in output between those end-points.
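As a minimal sketch of this (in Python; the function f here is an illustrative choice of mine, not one from the text):

```python
# Chord gradient of a real function between two sample points, and the
# estimate it yields for the change in output across a small change in input.
def f(x):
    return x * x  # an illustrative smooth function

def chord_gradient(f, a, b):
    """Slope of the chord of f between distinct inputs a and b."""
    return (f(b) - f(a)) / (b - a)

# Near x = 1, short chords have slopes close to the tangent's slope, 2:
g = chord_gradient(f, 1.0, 1.001)
# Multiplying the gradient by a change in input estimates the change in output:
estimate = g * 0.001
actual = f(1.001) - f(1.0)
print(g, estimate, actual)
```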

So, what can we multiply by displacements near our point, to estimate
changes in output of a function evaluated at both ends of the displacement, when
the displacements aren't necessarily all in the same direction ? The
answer is, of course, a linear map; and a linear map is determined by its value
on a basis of its input space; so we need as many linearly independent
displacements as the dimension of our input space. That means we need to
evaluate our function not just at two points but, when the input space has
dimension n, at 1+n points. As long as the displacements from one of these
points to each of the rest are linearly independent, they form a basis; which
has a dual, each member of which we can tensor multiply by the change in output
between the end-points of the corresponding displacement; summing over the whole
basis, we get a linear map which maps each displacement from the given one point
to any of the others to exactly the change in our function's value between the
two points in question. This linear map, then, is our estimate of the
derivative, derived from a set of points, presumed to be near the input at which
we sought a derivative.

Indeed, any 1+n points in an n-dimensional input space suffice, provided the
displacements from any one to each of the others are linearly independent, to
determine a linear map that we can think of as the gradient of the function
among those points; so this set of 1+n points serves in place of the two
end-points of our chord, in the n = 1 case, and the linear map they induce
corresponds to the slope of the chord. We thus generalise our 1-dimensional
chord, with two ends, to an n-dimensional chord, with 1+n vertices, that
suffices to determine a gradient of our function. We can then contrive one
mechanism or another for finding what gradient we get in the limit as the
1+n points draw in on the point at which we were trying to determine a
derivative.
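The construction just described can be sketched numerically. Here I take n = 2 and an illustrative (in fact linear, so the chord-gradient is exact) function f of my own choosing; the linear map is recovered by solving against the displacements out of one vertex:

```python
# Chord-gradient of a function R^2 -> R from 1+n = 3 sample points: the
# linear map G with G(v(i) - v(2)) = f(v(i)) - f(v(2)) for i in {0, 1}.
def f(x, y):
    return 3 * x + 5 * y  # linear, so the recovered gradient is exact

v = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]  # displacements from v[2] are independent

# Displacements from v[2], and the matching changes in output:
e = [(v[i][0] - v[2][0], v[i][1] - v[2][1]) for i in range(2)]
d = [f(*v[i]) - f(*v[2]) for i in range(2)]

# Solve G.e[i] = d[i] for the row vector G = (gx, gy), by Cramer's rule:
det = e[0][0] * e[1][1] - e[0][1] * e[1][0]
gx = (d[0] * e[1][1] - d[1] * e[0][1]) / det
gy = (d[1] * e[0][0] - d[0] * e[1][0]) / det
print(gx, gy)  # -> 3.0 5.0
```

For larger n one would solve the analogous n-by-n system; the dual-basis-and-tensor description in the text amounts to exactly this solve.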

Now, we have 1+n points in an n-dimensional linear space; this is exactly what it takes to define a simplex, which generalises point = 0-simplex, line = 1-simplex, triangle = 2-simplex, tetrahedron = 3-simplex. For each natural n, there's a canonical n-simplex

- S(n) = {mapping ({positives}: p :1+n): sum(p) = 1}

(this even gives us an S(−1) = {}, the empty set of empty lists
whose sum is 1). Note that ({positives}: p :1+n) only lets p(i) be meaningful
for i in 1+n, but doesn't guarantee every i in 1+n; the i in 1+n for which there
is no p(i) can be treated as if p(i) were 0 (which isn't positive). We can
express any 1+n points in an n-dimensional space N as a list (N: v |1+n); the
simplex which has the v(i) as its vertices is just {sum(p.v): p in S(n)}, where
sum(p.v) = sum(: p(i).v(i) ←i :); each v(i) arises from a ({positives}: p
:1+n) that's actually 1←i, with p(i) = 1 and no p(j) defined other than i =
j. Any p for which (:p|) isn't all of 1+n ignores at least one of the v(j), for
some j for which there is no p(j), and thus is on a face of our simplex,
which ensures that there are points *outside* our simplex as close to
sum(p.v) as we care to ask for; such a sum(p.v) is on the boundary of our
simplex.

Now, I mentioned the displacements from any one of our vertices to each of the others, that I want to have linearly independent. Let's pause to notice it doesn't matter which of those vertices we fix as start-point for all those displacements. Given (N: v |1+n), we can infer e(i) = (: v(j) −v(i) ←j :1+n) and restrict it so that e(i, j) is only defined for j and i distinct (e(i, i) would just be zero, after all); each e(i) is then a partial list of n displacements in our space N. Now, a little arithmetic reveals that we can obtain any e(j) from e(i) as e(j, i) = −e(i, j) while, for all k other than i, j, e(j, k) = e(i, k) −e(i, j). One entry in e(j) is a non-zero-scaled version of an entry in e(i); all the other entries in e(j) are just the remaining entries in e(i) displaced by a multiple of that one entry; in such a case, e(j) is linearly independent precisely if e(i) is. So the displacements out of one vertex, to each of the others, are linearly independent, or not, independent of which one vertex we chose to start from. It is thus convenient to use e(n), which I'll now refer to simply as E = (: v(i) −v(n) ←i |n).
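The change-of-base-vertex identities can be checked numerically; the vertices below are an illustrative choice of mine:

```python
# Check the identities e(j, i) = -e(i, j) and, for k distinct from i and j,
# e(j, k) = e(i, k) - e(i, j), for three vertices in the plane.
v = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]  # illustrative vertices

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def e(i, j):
    return sub(v[j], v[i])  # displacement from vertex i to vertex j

i, j, k = 0, 1, 2
assert e(j, i) == tuple(-c for c in e(i, j))
assert e(j, k) == sub(e(i, k), e(i, j))
print("identities hold")
```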

I'm fairly sure this is only (directly) applicable to the case
of linear spaces over (some sub-field of) the reals, although there may be some
way to salvage it for the complex numbers. At the very least, one can do the
real analysis on a linear space over complex, construed as the twice-dimensional
linear space over reals, then reason about whether the inferred real-linear
derivative *does* represent a complex-linear map. None the less, I'll
pose the following in terms of hermitian forms; if we're limited to reals, with
their fatuous conjugate, these shall simply reduce to quadratic forms.

So, back to our simplex {sum(p.v): p in S(n)} with sum(p.v) on the boundary unless ({positives}: p |1+n). All displacements within our simplex are in the span of E; if this isn't the whole of our linear space N, there is some direction u in N for which, for every sum(p.v) in our simplex, sum(p.v) +t.u isn't in our simplex for any non-zero scalar t; as a result, there are points outside our simplex arbitrarily close to any point of our simplex; the whole simplex is thus boundary; it has no interior. Conversely, if the n vectors in E do span our n-dimensional N, they must be linearly independent and form a basis; consider any x = sum(p.v) with p(i) positive for all i in 1+n; as n is finite and each p(i) is positive, there is a positive q, the least entry in p, for which any sum(r.v), with sum(r) = 1 and every r(i) within q of p(i), lies inside the simplex. We can define the obvious positive-definite hermitian form derived from our basis E by summing the tensor-squares of the members of its dual; every point within q of x in the induced distance of this hermitian form lies within our simplex. For any positive-definite hermitian form on N, we can now simultaneously diagonalise this and our E-derived hermitian form, with the former unit-diagonalised; since E is a basis, the E-derived form is positive-definite too, so its diagonal entries are all positive; select a positive r for which r.r is not less than any diagonal entry of the E-derived form, when the other is unit-diagonalised; then every point of N within q/r of x, in the distance induced from the other hermitian form, is within q of x in the E-derived one, hence inside the simplex. Consequently, for the topology induced from the distance determined by any positive-definite hermitian form on N, sum(p.v) is in the interior of our simplex. So our simplex has an interior precisely if its edges, out of any one vertex, are linearly independent; and that interior consists exactly of those sum(p.v) for which (:p|) is 1+n.

Note that this sense of interior can be defined entirely in
terms of simplices and linear independence, without reference to orthodox
topology or hermitian forms; the reasoning above merely establishes that the
notion of interior that it gives shares enough, for my purposes, in
common with those that arise from orthodoxy.

I describe a simplex with an interior, in this sense,
as voluminous (since it does indeed have a (non-zero)
volume, as determined by any metric induced from a positive-definite hermitian
form). I describe such a simplex as being about any
point in its interior.

Each simplex about a point x provides us with a chord of any function
defined throughout that simplex; and that chord supplies us with a gradient for
the function, between the simplex's vertices; discussion of differentiability
then comes down to whether all sufficiently small simplices about x give
chord-gradients sufficiently close to some particular gradient to allow
us to accept that gradient as the derivative of the function at x. I'll
come back to a more orthodox approach to this, but first let me sketch how we
can do it, at least for linear spaces over an ordered field, using nothing but
simplices: we need them for the chords, so why not use them also for the notion
of nearness ?

If the complex case can be salvaged, it'll be by using a 2.n-simplex in the complex n-dimensional space, since it's a 2.n-dimensional space over reals; and the edges of this simplex must real-span the input space for the simplex to be voluminous. This provides the simplices whose interiors do the work of topology; for chords, we'll still be using lists of 1+n points, whose differences span the complex space.

Since we naturally involve simplices in order to get chords across which to obtain gradients, the question naturally arises: do we need anything else ? At least for the (sub-field of) real case, it turns out we don't; this might generalise to the complex case if the last subsection's reservations can be overcome.

Suppose we have some sub-ringlet R of {reals}, whose positives are dense in those of reals; and a (not necessarily linear) mapping (M: f |N) from n-dimensional R-linear space N to R-linear space M. Let {positives} be the collection of R's positive values.

Given a simplex, characterised by vertex list (N: v |1+n), about a point x, we can rescale the simplex towards or away from x simply by replacing v with (: x +t.(v(i) −x) ←i |1+n) for some positive t. The ({positives}: p |1+n) in S(n), so sum(p) = 1, for which x = sum(p.v) then has sum(: p(i).(x +t.(v(i) −x)) ←i |1+n) = sum(p).x +t.(sum(p.v) −sum(p).x) = x since sum(p).x = x = sum(p.v). Thus the same p that makes x an interior point of v's simplex also gets x as its interior point for each rescaled x +t.(v −x) simplex.
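A quick numerical check of this rescaling claim, for an illustrative triangle and p of my own choosing:

```python
# The p with x = sum(p.v) and sum(p) = 1 also has x = sum(p.w) for the
# rescaled vertices w(i) = x + t.(v(i) - x), for any positive scalar t.
v = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # illustrative vertices
p = (0.5, 0.25, 0.25)  # positive coefficients summing to 1
x = tuple(sum(p[i] * v[i][c] for i in range(3)) for c in range(2))

for t in (0.5, 0.1, 0.01):
    w = [tuple(x[c] + t * (v[i][c] - x[c]) for c in range(2)) for i in range(3)]
    y = tuple(sum(p[i] * w[i][c] for i in range(3)) for c in range(2))
    assert all(abs(y[c] - x[c]) < 1e-12 for c in range(2))
print("x stays at the same coefficients p under rescaling")
```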

Since the space of linear maps (M: |N) is an R-linear space, just like M and N, we can (at least when M is finite-dimensional) form simplices in this space of linear maps and ask whether, for any simplex G about a putative derivative of f at x, there is some simplex I about x for which every simplex about x whose vertices lie in I has chord-gradient inside the specified simplex G. We can, furthermore, ask whether there are some natural h and positive T for which, for every positive t < T and every voluminous simplex about x within I power(h+1, t)-scaled towards x, the chord-gradient of this simplex lies within the power(h+1, t)-scaled version of G. If such h and T exist for any simplices I about x and G about the putative gradient, we can definitely accept the given putative gradient as the derivative of f at x.

Alternatively, given (M: f :N) and a putative gradient f'(x) for it at x, we can define err(u) = f(u) −f(x) −f'(x)·(u −x) as the error in extrapolating f from x to u and ask whether, for some simplex K about 0 in M and H about 0 in N, err(u) lies within K whenever u−x is in H; and, as before, we can ask whether this remains true when we scale K towards zero faster than we scale H towards zero. If we can, for any H and K, then we can confidently accept f'(x) as a derivative for f at x.
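A sketch of this error criterion for an illustrative f from the plane to the reals (the function and point are my own choices): the error shrinks like the square of the scale of the displacement, one power faster than the displacement itself.

```python
# err(u) = f(u) - f(x) - f'(x).(u - x) shrinks faster than the displacement:
# here f(x, y) = x.y, whose gradient at (a, b) is the row vector (b, a).
def f(x, y):
    return x * y

x = (1.0, 2.0)
grad = (x[1], x[0])  # the putative derivative f'(x), as a row vector

def err(u):
    du = (u[0] - x[0], u[1] - x[1])
    return f(*u) - f(*x) - (grad[0] * du[0] + grad[1] * du[1])

for t in (0.1, 0.01, 0.001):
    u = (x[0] + t, x[1] + t)  # displacement of size proportional to t
    print(t, err(u))  # the error scales like t.t, faster than t
```

For this f the error works out exactly: f(1+t, 2+t) − f(1, 2) − (2.t + t) = t.t.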

By such means, one may define differentiability of f at x in N, with derivative linear from N to M. The special case where N and M are R then simply exploits the usual isomorphism between {R-linear (R: |R)} and R itself.

In particular, we can recast this by taking an intersection. A first naïve approach would be to intersect the collections {gradient(S, f): simplex S within B} over all simplices B about p, with gradient(S, f) being the linear map we get from using the vertices of S to sample the values of f; we might hope for the intersection to be {f'(p)}. However, it is easy enough to find cases where the intersection is empty yet it really makes sense for f to be differentiable: for example, consider cube'(0), where cube is (: power(3) :{reals}); every chord of cube has gradient (power(3, u) −power(3, v))/(u −v) for some reals u, v; this is equal to both u.u +u.v +v.v and power(2, u +v) −u.v; when one of u, v is positive and the other is negative, the latter is necessarily positive; otherwise, the former is necessarily positive unless u = 0 = v; so distinct u, v give a positive gradient for their chord. The only value cube'(0) could have is 0, but no chord actually delivers it; although short enough chords about 0 do give gradients arbitrarily close above zero. Orthodoxy would formalise this by taking the closure of each of the sets intersected; which, with simplices, we can do by including the boundaries of simplices and constructing, for function (U: f :V) defined throughout some simplex F ⊂ V about some point p of the linear space V, with U also linear:

- slope(f, p) = intersect({simplices in linear(U: |V)}: S ←B; S subsumes {gradient(R, f): simplex R within B} :{simplices about p in V})

where, for purposes of the relation intersected, the simplex S is a set of linear maps (U: |V), delineated by its vertices. Since each simplex S (in {linear maps (U: |V)}) includes its boundary, we will pick up a gradient that's a limiting value, like 0 = cube'(0), and since we take an intersection, we avoid including more than this needs. If slope(f, p) has exactly one member, we describe f as differentiable at p, with that one member as the derivative of f at p.
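The cube example can be checked numerically: every chord straddling 0 has a positive gradient, and those gradients shrink towards the limiting value 0 that only the closure supplies.

```python
# Chords of cube about 0 give positive gradients arbitrarily close above
# zero, but no chord actually delivers the gradient 0 = cube'(0).
def cube(x):
    return x ** 3

def chord_gradient(u, v):
    return (cube(u) - cube(v)) / (u - v)  # = u.u + u.v + v.v

for h in (1.0, 0.1, 0.01):
    g = chord_gradient(-h, h)  # a chord straddling 0; its gradient is h.h
    print(h, g)  # positive, shrinking towards (but never reaching) 0
```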

Now, since I'm not sure I can make that work for complex vector spaces, let's look at how orthodoxy can use simplices to do its usual limit process to define differentiation in the linear spaces themselves, rather than in terms of components and differentiation of functions from the reals to themselves.

This can be done essentially the same way as the second alternative above, but using the unit balls of hermitian forms instead of simplices about the origins in the input and output space. As before, in an n-dimensional input space, we can get a chord off any n-simplex whose edges span the input space (I'm just not so sure we get a sensible notion of interior out of such a simplex, in the complex case). As before, we can define err(u) = f(u) −f(x) −f'(x)·(u−x) for u in N, using the putative derivative for f at x as f'(x).

If we have any hermitian forms on N and M for which err(u) lies in M's power(h+2, t)-radius ball whenever u−x lies in N's power(h+1, t)-radius ball, for all positive t < some positive T, then we can accept the f'(x) used to compute err(u) as a derivative for f at x.
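A sketch of this ball form of the criterion with h = 0, for an illustrative real function of my own choosing (exp at x = 0, whose derivative there is 1): the error ball has radius t.t while the input ball has radius t.

```python
# Ball form of the criterion with h = 0: whenever |u - x| <= t (the
# power(1, t)-radius ball), the error lies in the power(2, t) = t.t ball.
import math

x = 0.0
fprime = 1.0  # putative derivative of exp at 0

def err(u):
    return math.exp(u) - math.exp(x) - fprime * (u - x)

for t in (0.5, 0.1, 0.01):
    # err is convex with minimum 0 at x, so its extremes over the input
    # ball are at the ball's end-points:
    worst = max(abs(err(x + t)), abs(err(x - t)))
    assert worst <= t * t
print("error ball shrinks one power of t faster than the input ball")
```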

Written by Eddy.