]>
When we have a function f from a one-dimensional space, we can use a chord of it between two inputs x and y to obtain an estimate (f(y) −f(x))/(y −x) of the gradient of f between x and y. All we can really say for this chord's slope is that it's the average of the function's gradient between the two inputs, from which we might even be able to prove that, under certain well-behavedness conditions, the function's derivative does take this slope as its value at some point in the interval. However, for the purposes of estimation, we can locate it nominally, at least, at (x +y)/2, the mid-point of the interval. It may seem strange to rewrite the slope as f(y)/(y −x) +f(x)/(x −y), but notice that in this form the symmetry between x and y is more evident.
Now suppose we know f at a third input, z, as well as x and y; we can, of course, form an estimate of f's gradient by pairing z with either x or y, in the same way, and then use the same construction between this estimate of the slope and the previous to obtain an estimate of the second derivative, the rate of change of f's gradient. Let's pair y with z for this: our second estimated slope is f(z)/(z −y) +f(y)/(y −z) nominally located at (y +z)/2, implying a rate of change of slope
and, once again, we have a formula symmetric between the three inputs (even though we used y in both derivative estimates, but x and z only in one each). This, for n = 2, obtains the n-th derivative of f from n+1 distinct inputs and f's outputs at them, by dividing each output by the product of differences between its input and the various other inputs; summing the results and scaling by n! got the result. So let's tentatively define
for which we've just seen slope(f, p) gives an n-th derivative estimate at average(p) when monic p has length n+1, for n in {1, 2}. Pause to verify that it also works for n = 0, understanding the function as it's own zeroth derivative; the list has only one value in it, 0! and the empty product we're to divide by are both one, so slope(f, [x]) = f(x). Now suppose we have accepted slope(f, p), for any monic (:p|n+1), as n-th derivative estimate at average(p), for all n in m, for some natural m (e.g., initially, m = 3 = {0, 1, 2}) and consider the m-th derivative estimate we get from some monic (f:p:m+1) by using the (m−1)-th derivative estimates from (:p:m) and p&on;suc (where suc = (: n+1 ←n :{naturals}) is the intrinsic successor function of the naturals). This works out as:
The denominator's two averages, each over m values, have all the entries of (:p:m)&on;suc in common; these thus cancel in the difference leaving only (p(m) −p(0))/m as denominator.
In the numerator, the two sums each have entries for f(p(i)) for i from 1 through m−1, divided by most of the same terms in the product, save that the first has p(i) −p(m) where the second has p(i) −p(0)
The f(p(m)) term in that is divided by a product of p(m)'s difference from each p(j) for j from 1 through m −1; the overall denominator is just the j = 0 case missing from the product slope(f, p) divides f(p(m)) by. Likewise, the f(p(0)) term's product has p(0)'s difference from p(j) for the same j; transfering the term's negation to the overall denominator turns that into the missing j = m factor its product has in slope(f, p). Finally, 1/(p(i) −p(m)) −1/(p(i) −p(0)) = (p(m) −p(0))/(p(i) −p(m))/(p(i) −p(0)) supplies a numerator that cancels the overall denominator, while contributing the denominator factors the f(p(i)) term's product is missing. So we finally reduce to
giving us a symmetric combination of f's values at the entries in p, even though we obtained it from a pair of estimates at f's (m−1)-th derivative that used p(0) and p(m) differently than the others; so, again, it makes sense to associate this estimate at the m-th derivative with the average of p.
We thus inductively obtain a way to estimate arbitrarily high derivatives from just enough distinct atoms of a function, used symmetrically.
Notice that the factorial is exactly the factor by which Taylor's Series for f will divide its term in the relevant derivative. So if we use the above to obtain all derivatives at some central value, those will cancel one another, leaving each power(n) of the difference from the central value scaled by a sum(: f(p(i))/product(: p(i) −p(j) ←j; j≠i :1+n) ←i :1+n) for some (:p|1+n) whose average(p) is the central value. Each output of p in that is divided by n differences between inputs, one for each factor of the difference from the central value; and, aside from the n = 0 term (which is just f's value at the central value), the sum of the weights of the outputs of f is zero.
If we're given f's values at some input k and at k±(i +1).h for all i in some natural m, for some h, we can select a symmetric (about k) middle subsequence of these (including k for odd length, excluding it for even) as our list of each length, to obtain estimates of the first 2.m derivatives of f at k, from which to build a Taylor Series approximation to f near k. Indeed, the inputs at which we're given f needn't even be evenly-spaced, just as long as they're symmetrically placed around a central value.
Written by Eddy.