Factorial

The function factorial = (: n! ←n |{naturals}) is defined inductively by

yielding 1! = 1, 2! = 2, 3! = 6, 4! = 24, 5! = 120, 6! = 720, 7! = 5040 and so on; notice that, after a slow start, it grows very rapidly. In particular, for any scalar x, (: power(n, x) ←n |{naturals}) grows more rapidly than factorial for n smaller than x (i.e. abs(x) > n) but, since there is some natural larger than x and all subsequent naturals are larger yet, factorial ultimately grows faster than (: power(n, x) ←n :). Consequently, (: power(n, x) / n! ←n :) ultimately stops increasing and starts decreasing; furthermore, since each successive n is bigger than the previous one, each decrease is by a successively bigger factor n/x, so ultimately (: power(n, x) / n! ←n :) tends to zero as n grows large. Thus factorial grows faster than any (power, hence any) polynomial function. Indeed, consider even an exponential (: A.exp(k.n) ←n :) for some constants A, k; as n increases by 1, this grows by a factor exp(k); but there is some natural N > exp(k), and factorial grows by successively larger factors > exp(k) for n > this N; so factorial even grows faster than any exponential.

Factorial shows up in many branches of mathematics, often by way of combinatorics – the theory of combinations. When n distinct objects are considered, the number of orderings of those objects is n!, as may be seen by observing that we may chose any of the n objects first; after which we have n−1 among which to chose as second; then n−2 for third and so on until we are left with three, two and finally one choice. The same argument implies that there are n! permutations of any set of n items. We'll see similar arguments, below, when considering the number of ways of choosing a specified number of members from a finite set.

The factorial of n is the number of permutations of n. Many of those permutations have fixed points – that is, some members of the permuted set are mapped to themselves, while all the others get rearranged. A derangement is a permutation with no fixed points: a derantement of n is an iso (n| f |n) for which {i in n: f(i) = i} is empty. The number of these, for a given n, is called the subfactorial of n, written !n. Since it grows as n!/exp(1) (in fact, it's the nearest integer to that) I'll not be devoting much attention to it.

Wilson's theorem

One of the ways to define prime numbers is as those with no non-fatuous factorisations; for these purposes, the general truth n×1 = n = 1×n is, for these purposes, considered fatuous. A second way to say that is that a number is prime precisely if its only divisors are itself and 1. This, in turn, is equivalent so saying no number between 1 and it is a divisor of it. With this definition, 1 is deemed prime, which some other definitions of prime exclude; in particular, I favour one that does exclude it, so I want to avoid using this meaning of prime. Numbers with non-fatuous factorisations are also described as composite, so I'll here use the term non-composite to describe numbers whose factorisations are all fatuous.

If you multiply together all the primes in any finite set of primes, then add 1 to the result, you get a number that is necessarily not a multiple of the primes in your finite set, hence there is no finite set of primes that suffices to factorise all the natural numbers. This is the essence of the proof that there are as many primes as there are natural numbers. Likewise, given any number, including a prime, we can take its factorial and add one to get a number whose prime factors are all greater than our given number; the least possible prime factor of n! +1 is n +1. This leads to the question of when n! +1 is a multiple of n +1. Let's start by looking at the values of n! +1 modulo n +1, for the first few naturals.

n +11234567891011
n! +12237251217215041403213628813628801
modulo00030101110

Now, all natural numbers are equal modulo 1, so the first column isn't particularly interesting. Aside from that 3 in the fourth column, we get a 1 for every composite and a 0 for the rest. Wilson's theorem says that we get 0 exactly when n +1 is prime; it was first proved by Laplace. I'm not going to prove that here, but I will pause to show how we can reason out some of the details, mainly as an exercise in showing how we can reason about prime-ness in connection with factorials.

The smallest number with a non-fatuous factorisation is 2×2 = 4 = 3 +1 and 3! +1 = 7 is −1 (or, equivalently, 3) modulo 4. Any other pefect square, k.k = n +1, has k > 2 hence n +1 = k.k > 2.k > k so that k and 2.k are distinct terms in n!'s product; and their product, 2.k.k, is a multiple of n +1. Any other non-fatuous factorisation of n +1 is as a product of two distinct numbers stricly between 1 and n +1, hence two distinct terms in n!'s product. Consequently, for any non-fatuous factorisation other than 2×2 of n +1, we find n! a multiple of n +1 and hence n! +1 equal to 1 modulo n +1. This shows that, modulo any composite n +1, n! +1 is ±1, with −1 only arising in the case n +1 = 4; in particular, no composite n +1 is a divisor of n! +1.

For natural n, then, n +1 is composite and n +1 divides n! +1 are mutually exclusive; at least one of them is false. Wilson's theorem goes beyond that to say that exactly one of them is true; they are complementary. Proving that depends on understanding which numbers are self-inverse modulo a prime, which is too much of a digression for the present (but see the link above).

The Gamma function, Γ

There's also a function on the complex plane which turns out to coincide with factorial at integer inputs. It's most orthodox to specify a shifted version of this function, known as the Gamma function or Γ (that's the Greek letter roughly corresponding to G), defined by:

(in which exp(ln(t).(x −1)) effectively implements power(x −1, t) even when x isn't a positive real) for which we can show that

Γ(x+1)
= integral(: exp(−t).power(x, t) ←t |{positives})

to which we can apply integration by parts, interpreting exp(−t) as −exp'(−t) so that the integrand is one term in the derivative of −exp(−t).power(x, t), yielding

= change(: −exp(−t).power(x, t) ←t |{positives}) −integral(: −exp(−t).x.power(x−1, t) ←t |{positives})

but, when x > 0, power(x, 0) = 0 makes one end of the change() zero; while the other is dominated by exp(−t) tending to zero as t grows, so is also zero, making the whole change zero; leaving

= x.integral(: exp(−t).power(x−1, t) ←t |{positives})
= x.Γ(x)

from which, by induction, we can infer that Γ(n+1) = n!.Γ(1) for each positive natural n. (I've only really shown the result for positive real x; but it turns out to work for x elsewhere in the complex plane.) Now, Γ(1) is just integral(: exp(−t) ←t |{positives}) which is change(: −exp(−t) ←t |{positives}) = 0 − (−1) = 1, whence Γ(1) = 1 and Γ(1+n) = n! for every natural n. It thus makes sense to extend the definition of factorial to allow n! to mean Γ(1+n) = integral(: exp(n.ln(t) −t) ←t |{positives}) even when n is not natural.

While evaluating Γ directly is generally non-trivial (though see below for a family of exact values and some good approximations) the above enables us to determine its value at positive integer inputs; from Γ(v) it's also possible to use the recurrence relation Γ(x+1) = x.Γ(x) to infer Γ's values at all integer+v inputs, for any non-integer v (at least when it's a positive real). In particular, this implies that we never need to compute Γ for any input with real part outside a range of width 1, e.g. from 0 to 1 or from 1 to 2.

Evaluating Γ at half-integer inputs

To obtain the integer+1/2 values, consider

Γ(1/2) = (−1/2)!
= integral(: exp(−t)/√t ←t |{positives})

substitute u = √t, u.u = t, 2.u.du = dt, 2.du = dt/√t, giving

= 2.integral(: exp(−u.u) ←u |{positives})

and integrating this over negatives would necessarily yield the same as integrating it over positives (and the integrand is smooth at zero), so twice the latter is simply the integral over all reals;

= integral(: exp(−u.u) ←u |{reals})

we can now apply a standard trick for integrating Gaussians, via computing the square root of the square of the integral:

= √(integral(: exp(−x.x) ←x |{reals}).integral(: exp(−y.y) ←y |{reals}))

interpret x and y as Cartesian co-ordinates in two dimensions:

= √integral(: exp(−x.x−y.y) ←[x, y] |{({reals}:|2)})

now switch to polar co-ordinates in two dimensions, with x = r.Cos(θ), y = r.Sin(θ), r in {positives}, θ ranging from −turn/2 to turn/2; the transformation's Jacobian is r/radian, yielding

= √integral(: integral(: exp(−r.r).r/radian ←r |{positives}) ←θ :{angles modulo turn})

but the integrand doesn't involve θ so integrating over its range simply multiplies the inner integral by θ's range, turn; pulling out the division by radian from the Jacobian thus gives a factor of turn/radian or 2.π. Meanwhile, we can substitute v = r.r, dv = 2.r.dr to simplify the inner integral:

= √(2.π.integral(: exp(−v)/2 ←v |{positives}))
= √(π.change(: −exp(−v) ←v |{positives}))
= √(π.(−0 −(−1)))
= √π

from which we can infer exact values for Γ at all odd multiples of 1/2; (1/2)! = (√π)/2, (3/2)! = 3.(√π)/4 and so on. Interestingly, we can wind backwards as well; (n−1)! = n!/n, so (−1/2)! = √π, (−3/2)! = −2.√π, (−5/2)! = 4.(√π)/3 and so on. (Note that we couldn't do this for integers because Γ(0) = (−1)! requires us to divide Γ(1) = 0! by 0; indeed, as a function on the complex plane, Γ has a simple pole at each non-positive integer.) For (n+1/2)! we need to compute

(n+1/2)! / (−1/2)!
= (n+1/2).(n−1/2)…(3/2).(1/2)
= (2.n+1).(2.n−1)…(3).(1)/power(n+1, 2)
= (2.n+1).(2.n).(2.n−1)…(4).(3).(2).(1)/(2.n)/…/4/2/power(n+1, 2)
= (2.n+1)!/n!/power(2.n+1, 2)

whence

(n+1/2)! = Γ(n+3/2)
= (√π).(2.n+1)!/n!/power(n, 4)/2

and

2.power(n, 4.π).n!/(2.n+1)!
= power(n+1/2, π)/(n+1/2)!

the last of which enables a unification of the formulae for the volumes of spheres of various dimensions.

Combinatorial choosing

If we have a set of n objects and want to chose a subset with i members in the subset, we find there are n!/i!/(n−i)! such subsets: as for choosing orderings of our whole set, we have n choices for the first item, n−1 for the second and so on down to n+1−i choices for the i-th member of our subset; but this strategy of choosing will give any particular i-member subset in each of its i! orderings, and a subset is characterized only by the members it has, not the order in which we collected them; so we must divide the number of ordered choices, n.(n−1)…(n+1−i) = n!/(n−i)!, by the number of times each subset shows up in different orders, i!, giving n!/i!/(n−i)!.

More formally, define a function, chose, by: chose(n, i) is the number of distinct subsets of size i in a set of size n. Thus it's the number of distinct ways of choosing i things when n are available. Since chose(n, i) is necessarily natural and is plainly zero for n<i or i<0, we effectively have a mapping (:chose|{naturals}) each output of which is ({naturals}:chose(n)|1+n). Since selecting i items among n is equivalent to choosing the n−i to not select, we can infer chose(n, i) = chose(n, n−i) for all n, i. It should also be clear that chose(n, 0) = 1 = chose(n, n) for all n.

Now, suppose we know chose(n, i) for all i, for some n: consider chose(1+n, i) for some i. Chose any one member, M, of the set of 1+n objects; we are thus choosing among n un-named objects and one that we've arbitrarily named M. Any sub-set of size i either omits M, in which case it's a sub-set of the n un-named objects, or includes M, in which case it's the union of {M} and a sub-set of size i−1 drawn from among the un-named objects. There are chose(n, i) ways to do the former and chose(n, i−1) to do the latter. We thus infer chose(1+n, i) = chose(n, i) +chose(n, i−1), at least when 0<i≤n.

Now, observe that chose(n, i) = n!/i!/(n−i)! at least when i = 0 and i = n; in particular, this formula correctly gives chose(n, i) for all i in 1+n for n = 0. Suppose it gives the right answers for all i in 1+n for some given n (e.g. 0). Now consider chose(1+n, i) for any i. If i = 0 or i = 1+n we already know the formula holds. Otherwise, 0<i≤n and we obtain

chose(1+n, i)
= chose(n, i) +chose(n, i−1)
= n!/(n−i)!/i! +n!/(n+1−i)!/(i−1)!
= n!.(n+1−i)/(n+1−i)!/i! +n!.i/(n+1−i)!/i!
= n!.((n+1−i) +i)/i!/(n+1−i)!
= n!.(n+1)/i!/(n+1−i)!
= (n+1)!/i!/(n+1−i)!

so we find that the formula correctly gives chose(1+n, i) for all i as well; since the formula does work for all i from 0 to n for n = 0, we can now induce that it works for all i from 0 to n for each natural n.

The function chose shows up in many places. Notably, if we consider the binomial power(n, x+y), expanded as a polynomial in x and y, we observe that the term in power(i, x).power(j.y) can only have non-zero co-efficient if i+j = n; it arises from selecting x from i of the n factors of x+y and y from the remaining j = n−i factors; and there are just chose(n, i) ways to do this, so

Using Γ (or otherwise) we can extend chose to be a function of two complex values instead of two natural numbers. Addition of gamma-distributed random variates implies Γ(X).Γ(Y)/Γ(X+Y) = integral(: power(Y−1, 1−t).power(X−1, t) ←t; t≤1 :{positives}) for all X, Y. Comparing this with chose(n+m, n) = (n+m)!/n!/m! and Γ(1+i) = i!, it makes sense to substitute X = n+1, Y = m+1 and obtain:

Approximating (: chose(2.n, n) ←n :)

We can also obtain, from above,

(n −1/2)!/n! = Γ(n+1/2)/Γ(n+1)
= (√π).(2.n)!/n!/n!/power(n, 4)
= (√π).chose(2.n, n)/power(n, 4)

Using Stirling's formula (see below) lead me to expect this to be well approximated by 1/√n; so I took a Python session and defined a function to evaluate chose(2.n, n)/power(n, 4) as product(: 1 −1/2/(i +1) ←i :n), and divided 1 by π times that product's square. The results approximated n+1/4 fairly well, so I subtracted that and observed the remainders to be decreasing, so inverted them and got near 32 times input; dividing by 32 got results close to n+1/4 again, so I subtracted that again. Once more, the remainders were decreasing; on multiplying by n+1/4 the results were all respectably close to 9/64, with tiny differences fluctuating haphazardly, so I suppose I've reached the limits of my (64-bit) computer's precision. Inverting all of that, I obtain a function F = (: x +1/32/(x +9/64/x) ←x :), for which:

and the approximation is really quite accurate even for small n. (At n = 2, the fractional error is barely ten parts in a million; by n=11 it is below one in a milliard. F's answer is low at n = 93 but not otherwise before n = 100, after which it's low almost as often as it's high. I suspect the lows are artefacts of the imprecision of floating-point arithmetic.) In particular (: chose(2.n, n) ←n :{naturals}) grows merely exponentially (slightly slower than (: power(n, 4) ←n :) but faster than (: power(n, 4−u) = exp(ln(4−u).n) ←n :) for any u>0), rather than combinatorially. Re-arranging the above, this yields:

We can re-arrange F(x) = x +1/32/(x +9/64/x) as x.(1 +2/(9 +(8.x)2)). (Switching to computing F this way is sufficient to change how often its answer is high or low, but not the general pattern of being positive only slightly more often than negative.) Of course – since F(n+1/4).F(n−1/4) is not simply n.n (the difference is (15/4).(15/4)/((64.n.n +5).(64.n.n +5) +144), which dies off very rapidly with n, matching the observations above), as implied by multiplying n!/(n −1/2)! by (n −1/2)!/(n −1)! – this remains merely a convenient approximation to the given ratio of factorials; it is not an exact identity of functions.

Stirling's formula

For large n, log(n!) is well approximated by the formula

which is attributed to Stirling. For chose(2.n, n), this implies:

log(chose(2.n, n))
= log(Γ(2.n +1)) −2.log(Γ(n+1))
≈ (2.n+1/2).log(2.n) −2.n +log(2.π)/2 −1/24/n −(2.n+1).log(n) +2.n −log(2.π) +1/6/n
= n.log(4) −log(π.n)/2 +1/8/n

yielding exp(1/8/n)/√n as an approximation to (n −1/2)!/n! (see above).

Apparently, there's an easy and elegant way to derive Stirling's form. I should also derive the sinc(x.π).Γ(1+x).Γ(1−x) = 1 result.

There's a closely related approximation given by approximating log∘factorial via:

for some natural N, ({reals}:b|N) and reals a, g (which also depend on N); approximations of this form are attributed to Lanczos. With N = 6, g = 5, a = 1.000000000190015 and b = [76.18009172947146, −86.50532032941677, 24.01409824083091, −1.231739572450155, 0.1208650973866179e−2, −0.5395239384953e−5], an approximation of this form can be accurate to two or three parts in a million million for modest values of z.

David MacKay, in his book Information theory, Inference and Learning Algorithms (p2) gives a rather nice approach to deriving all but the last term of Stirling's formula. The Poisson distribution for a natural-valued random variate with mean λ has probabilities for its outcomes proportional to (: power(i, λ)/i! ←i :{naturals}); since these sum to exp(λ), the constant of proportionality is exp(−λ). If we add two such random variates with means λ and μ the result has probability distribution

(: sum(: exp(−λ).exp(−μ).power(j,λ).power(i−j, μ)/j!/(i−j)! ←j :1+i)
= exp(−λ).exp(−μ).sum(: power(j,λ).power(i−j,μ).chose(i, j)/i! ←j :1+i)

but the sum here is just the binomial formula, above, divided by i!, so

= exp(−(λ+μ)).power(i,λ+μ)/i!
←i :{naturals})

which is just the Poisson distribution again, but now with parameter λ+μ. The central limit theorem says that if we sum a large number of independent identically-distributed random variates with finite variance then the distribution of the sum tends, as the number of terms increases, to be well approximated by the normal distribution, at least where its probabilities are significant. The Poisson distribution with parameter λ has mean and variance λ; in particular it has finite variance. Adding large numbers of Poisson distributed variables yields a Poisson distributed variable with parameter equal to the sum of the parameter-values of the distributions summed.

Thus the Poisson distribution, when its parameter is large, is well approximated by the Gaussian distribution, at least where either has non-negligible probability; for example, near their mean. The Gaussian with mean and variance λ (to match the Poisson distribution with parameter λ) has probability density (: exp((x−λ).(x−λ)/2/λ) ←x :)/√(2.π.λ) so we infer, for n near λ, that exp(−λ).power(n,λ)/n! is well approximated by the integral of this distribution across the interval between n±½ – an interval of width one, in which the density's second derivative is small, so the integral is well approximated by the interval's width times the density at its middle, i.e. exp((n−λ).(n−λ)/2/λ)/√(2.π.λ). So now consider the case n = λ and we have

whence n! ≅ power(n+½, n).exp(−n).√(2.π), whence log(n!) = (n+½).log(n) −n +log(2.π)/2.

Now let's return to our earlier study of chose(2.n, n) and use Stirling's formula to see what we get, only let's look at chose(n.(k+1), n) for some general constant k.

log(chose(n.(k+1), n)
= log((n.(k+1))!) −log(n!) −log((k.n)!)
≈ (n.(k+1) +½).(log(n) +log(k+1)) −n.(k+1) +log(2.π)/2 −((n +½).log(n) −n +log(2.π)/2) −((n.k +½).(log(n) +log(k)) −n.k +log(2.π)/2)
= (n.k +n +½).log(n) −(n +½).log(n) −(n.k +½).log(n) +(n.k +n +½).log(k+1) −(n.k +½).log(k) −n.(k+1) +n +n.k +log(2.π)/2 −log(2.π)/2 −log(2.π)/2
= −log(n)/2 +n.log(k+1) +(n.k +½).log(k+1) −(n.k +½).log(k) −log(2.π)/2
= n.(log(k+1) +k.log(1 +1/k)) +log( (1 +1/k)/(2.n.π) )/2

which, for n = 1, gives us 2.n.log(2) −log(n.π)/2, so chose(2.n, n) ≈ 4n/√(n.π), although this can (see above) be marginally improved on. For large k, log(1 +1/k) ≈ 1/k so log(chose(n.(k+1), n)) ≈ n.(1 +log(k+1)) +1/k/2 −log(2.n.π)/2 and chose(n.(k+1), n) ≈ exp(n).power(n, k+1)/√(2.n.π). (TODO: complete derivation !)


Valid CSSValid HTML 4.01 Written by Eddy.