The Binomial Coefficients

Consider a sum of two terms, say x +y, and some positive (whole) power of it, power(n, x +y). Such expressions arise in many situations in mathematics, so they have been examined in some detail. Since power is just repeated multiplication, we can think of this as n copies of (x +y) all multiplied together;, which we can expand out into a sum of terms, each obtained by selecting one of x and y from each factor to multiply together, resulting in power(i, x).power(j, y) for some i, j with i +j = n; we selected x in i of the factors and y in the remaining j of them. Each distinct way of doing this leads to a term and, for each i, j we can then count up how many power(i, x).power(j, y) terms arise to express the sum of terms as sum(: F(i, j).power(i, x).power(j, y) ← [i, j] :{lists ({naturals}: f |2): sum(f) = n}, where F(i, j) is the number of terms of the given form for each i, j.

Formula for the coefficients

Given how I've just defined it, F(i, j) is the number of distinct ways to select i items (in our case factors) from a list of i +j items, without regard to the order in which we select them. If we did care about the order in which we selected them, we'd have i +j choices for the first, i +j −1 choices for the second and so on down to j +1 choices for the i-th entry selected; this gives product(: j +1 +k ←k |n) options. We don't actually care about the order of selection: for any given i entries, we can pick any of those i entries first, any of the remaining i −1 entries next and so on down to having only one choice for the last, so the preceding order-aware counting actually counts each selection product(: 1 +k ←k |i) times; if we divide our earlier count by this, we get the number of distinct ways to select i things from among i +j of them as F(i, j) = product(: (j +1 +k)/(1 +k) ←k |i).

Writing out those products can be a bit tedious, so let's exploit the factorial function, (: product(: 1 +k ←k |h) ←h :{naturals}). The output product(: 1 +k ←k |h) it produces for input h is usually denoted h!, which lets us simplify our formula for F. First observe that 0! = product(: 1 +k ←k |0) is an empty product (0 has no members); this is necessarily 1, since product is a homomorphism from concatenation of lists of numbers to multiplication of their products, requiring it to map the empty identity of concatenation to the multiplicative identity. For 1! we then get product(: 1 +k ←k |1) with 0 as 1's single member, hence the only k in the product, making the single term in the product 1 +0 = 1, so the product is again 1 = 1!. Since factorial is a mapping (: :{naturals}) we can write it as a series; as such it begins [1, 1, 2, 6, 24, 120, 720, 5040, 40320 …], each entry being bigger, by a factor of its index, than the previous, (1 +n)! = n!.(1 +n).

The product we divide by to get F(i, j) from the order-aware counting is plainly just i! = product(: 1 +k ←k |i). The numerator, product(: j +i +k ←k |i), can be rewritten – extending the product and then dividing by the extra terms – as product(: 1 +k ←k |i +j)/product(: 1 +k ←k |j) = (i +j)!/j!, as the k in j are all present among the k in i +j and the remaining k in i +j are just j, j +1, … j +i −1, which we can write as the j +e for e in i, delivering the product we were aming for This leaves us with:

In particular, since addition and multiplication of whole numbers are both commutative, this is symmetric under interchange of i and j.

Table of F(n, m) for small n, m
n↓ \ m→0123456
01111111
11234567
213610152128
3141020355684
415153570126210
5162156126252462
6172884210462924

Consider power(i +j +2, x +y) = (x +y).power(i +j +1, x +y); on the left-hand side, we have a term F(i +1, j +1).power(i +1, x).power(j +1, y). On the right-hand side, we can multiply the loose x from the first factor by the F(i, j+1).power(i, x).power(j +1, y) term in the power to get a contribution to this; or we can multiply the loose y by F(i +1, j).power(i +1, x).power(j, y). As no other terms on the right can contribute to the power(i +1, x).power(j +1, y) in the product, we can duly infer F(i +1, j +1) = F(i, j +1) +F(i +1, j).

From the case power(0, x +y) = 1.power(0, x).power(0, y) we can infer F(0, 0) = 1, matching (0 +0)!/0!/0! = 1 from the formula above. From power(1, x +y) = x +y = 1.power(1, x).power(0, y) +1.power(0, x).power(1, y) we infer F(1, 0) = 1 = F(0, 1), also matching (1 +0)!/1!/0! = 1. For more general power(n, x +y), the only way we can get a power(n, x).power(0, y) term is by picking x in every factor, so there is only one such product and F(n, 0) = 1 and, symmetrically, the only option for power(0, x).power(n, y) is to pick y every time, giving F(0, n) = 1; and, indeed, the formula above gives these as (n +0)!/n!/0! = n!/n! = 1. So, in fact, F(1, 0) = 1 = F(0, 1) and F(0, 0) = 1 are special cases of this.

We thus have a second way to get at our F(i, j), by looking at the recurrance rule F(i +1, j +1) = F(i, j +1) +F(i +1, j) with initial condition F(n, 0) = 1 = F(0, n) for every natural n. Since our formula is consistent with this for the initial condition, let's suppose inductively it holds for all i, j with i +j = n, for some n; clearly this is true in the cases n = 0, n = 1. Now consider some i, j with i +j = 1 +n. If either i or j is 0, F(i, j) is an instance of the initial condition, that we already know matches the formula. Otherwise, i = h +1 and j = k +1 for some naturals h, k so we can write F(i, j) = F(h, j) +F(i, k), with h +j = n = i +k, so we know the formula is correct for F(h, j) = n!/h!/j! and F(i, k) = n!/i!/k!. As noted above, (1 +k)! = k!.(1 +k) so we get j! = k!.j and can replace 1/k! with j/j! and, likewise, 1/k! with i/i!, giving us F(i, j) = (i +j).n!/i!/j! and i +j = n +1 so (i +j)! = n!.(n +1) = (i +j).n! and we infer F(i, j) = (i +j)!/i!/j!, consistent with our formula again. This being true for any i, j that sum to n +1 we infer, for n +1, what we assumed for n and, by induction from the given cases n = 0 and n = 1, we can duly infer it always applies.

On the diagonal, we have entries F(n, n) = (2.n)!/n!/n! and, just off the diagonal, we have F(n, n +1) = (2.n +1)!/(n +1)!/n!, which are the highest values in their respective rows of Pascal's triangle (diagonal in the F table). We can rewrite these by dividing each entry in the numerator, starting with the highest and working down, by the highest factor in the denominator that we haven't yet used. This divides each even factor of the numerator, 2.i, by half of it, i, to leave a factor of 2; and divides each odd factor 2.i +1 by i +1, leaving (2 −1/(i +1)), with the result that the highest value in row m of Pascal's triangle (so m is either 2.n or 2.n +1 in the preceding) is bounded above by power(m, 2) and grows roughly proportional to it for large m (since, aside from a few small values of i, 2 −1/(i +1) is close to 2). Indeed, F(m)/power(m, 2) = product(: 1 −1/2/(i +1) ←i; 2.i < m :{naturals}) and this tends to √((m +1/4).π) for large m.

Statistics

Now let's look at the sum of F(i, j) over those i, j with a given sum, n = i +j. If you go back to our definition of F(i, j) as the coefficient of power(i, x).power(j, y) in power(n, x +y), you might spot the marvelously easy way to determine this sum: if we just set x = 1 = y, each power(i, j).power(j, y) is 1 and the sum of coefficients is just power(n, 2).

Alternatively we can look at this as telling us to think of F(i, j)/power(i +j, 2) as coefficients with unit sum, for any fixed i +j = n; since they're all positive, this means they can even be the probabilities of outcomes of some haphazard process. Indeed, when we look at a sequence of n tosses of a fair coin and count how often one of its sides comes up the probability of that count being i is F(i, n −i)/power(n, 2), for reasons very similar to why F gives the coefficients in power(n, x +y). So now let's look at the average of the count and of its square, from which we can obtain its variance.

The average is simply sum(: i.F(i, n −i) ←i |1 +n)/power(n, 2). The i = 0 term, of course, goes away thanks to its factor of i, so we need only consider those i that are j +1 for some natural j; the condition 0 < i in 1 +n then becomes j in n. We can write i.F(i, n −i) as n!.i/i!/(n −i)! = n!/j!/(n −i) = n.(n −1)!/j!/(n −1 −j) = n.F(j, n −1 −j) and summing these over j in n gets us n.power(n −1, 2), so our average is n/2. This should not be in the least surprising, given the symmetry F(i, j) = F(j, i) and the range of values of [i, j] being [0, n], [1, n −1], … [i, n−i] … [n −1, 1], [n, 0].

Next let's look at sum(: i.i.F(i, n −i) ←i |1 +n)/power(n, 2). As before we can discard the i = 0 term. For the i = 1 term, i.i = i, so if I add and subtract the average, I can write the numerator as sum(: (j +1).j.F(j +1, n −1 −j) ←j |n) +sum(: (j +1).F(j +1, n −1 −i) ←j |n) in which the j = 0 term of the first sum is zero, so we can discard it and substitute j = k +1 to get sum(: (k +2).(k +1).F(k +2, n −2 −k) ←k |n −1) +n.power(n −1, 2), substituting the earlier result for the average's numerator. Now, (k +2),(k +1).F(k +2, n −2 −k) = n!/k!/(n −2 −k)! = n.(n −1).F(k, n −2 −k) and summing that over k in n −1 (i.e. for k from 0 through n −2) just gets us n.(n −1).power(n −2, 2); adding the average's numerator, 2.n.power(n −2, 2), back in we're left with the expected square's numerator being n.(n +1).power(n, 2)/4. Dividing through by power(n, 2) to get the expected square n.(n +1)/4. Subtracting the squared average from that we get a variance of n/4 and a standard deviation of (√n)/2.

So here's a diagram showing the various F(i, n −i) sequences, for diverse n, as lines, each rescaled to put its mean in the centre of the diagram and to make its standard deviation about a third of the width. Since I've scaled the width down by a factor proportional to √n, I've also scaled the heights up by that factor, to preserve the areas we'd see if I replaced each line with a histogram, each column of which is centred on one of the points of my line. Redder lines have smaller n, bluer ones higher n, with green in the middle; the lowest reddest horizontal line is n = 1 and all the ones with a horizontal stretch in the middle have odd n, while those with a peak in the middle have even n. The lone central dot at the top is the single point for n = 0.

and here are the histograms I mentioned before, with the higher n ones mostly hiding the lower n ones:

Those familiar with the Gaussian distribution may well recognise that the high n version of the binomial distribution resembles it: this is no accident, it is a consequence of the central limit theorem.


Valid CSSValid HTML 5 Written by Eddy.