The Gamma Distribution

The gamma distribution is so-named because its density function is a suitably-scaled version of the function whose integral is the gamma function. In particular, this means that it is a distribution on positive values: by specification, a gamma-distributed random variate is never negative (and, for α > 1, always positive). Formally, we need two parameters to specify a gamma distribution; one simply controls the scale of the variate, the other controls the shape of the distribution. I'll call these β and α, respectively; the distribution's density is

(: exp(−x/β).power(α−1, x/β).dx/β/Γ(α) ←x :)

where the definition of Γ is exactly what makes this be normalised. Generally a variate to be modelled by a gamma distribution is so specified as to ensure its values are not negative, with β > 0, although nothing in principle prevents one from modelling a never-positive variate with β < 0. The shape parameter, α, is always positive (albeit one could, in principle, use the definition with a negative non-integer value for it).

For 0 < α < 1, the density grows to infinity as x approaches 0; at α = 1, we get the negative exponential distribution, whose density tends to dx/β as x approaches 0; for α ≤ 1, the density decreases monotonically with x. For α > 1, the density is zero at x = 0, increases monotonically to a peak as x increases, then decreases monotonically as x increases further; for 1 < α < 2, the density rises abruptly (with infinite derivative) as x increases away from zero; at α = 2, the density rises linearly as x increases away from zero, before peaking and tailing off; for α > 2, the density has a local minimum at zero, with zero derivative as x increases away from zero. Thus gamma distributed variates can have diverse behaviours near zero, but are always unimodal (with zero as mode for α ≤ 1) and tail off monotonically for x greater than the mode.

One important class of situations in which the gamma distribution arises is in studying the aberration of multi-dimensional gaussian (normally-distributed) variates. When we have several statistics describing the members of a population, we can combine these data to obtain a vector-valued variate on the population; if each of the data is normally distributed, the result is apt to also be normally distributed (absent messy correlations) save that it is now distributed in a vector space, rather than along the number line. Dividing this distribution's density by its value at its mid-point eliminates its normalisation; taking its logarithm and multiplying by −2 recovers the squared offset from the origin that controls the gaussian's variation with position; refer to this squared offset as the aberration of the vector-valued variate. This aberration is a pure scalar variate: it is gamma distributed with β = 1 and α equal to half the dimension of the vector space. In more-than-two-dimensional cases, this makes zero an atypical aberration; in more-than-four-dimensional cases, it makes zero an extraordinary aberration – typical samples of the variate lie close to a spheroid about the mid-point; given a basis that diagonalises the variance of the gaussian, each basis member is aligned with a natural axis of the spheroid, whose radius along that axis is the dimension of the vector space times the square root of half the given basis member's diagonal entry in the variance.

Summing variates

Suppose we add two gamma-distributed variates with the same scale parameter, β; let one, x, have shape parameter α=X and the other, y, have α=Y; then z = x+y has distribution determined by, for arbitrary positive a, b:

P(a ≤ z ≤ b): = integral(: integral(: exp(−y/β).power(Y−1, y/β).dy/β/Γ(Y) ←y; a≤x+y≤b:{positives}).exp(−x/β).power(X−1, x/β).dx/β/Γ(X) ←x; x≤b :{positives}); = integral(: integral(: exp(−(z−x)/β).power(Y−1, (z−x)/β).exp(−x/β).power(X−1, x/β).dx/β.(dz−dx)/β/Γ(X)/Γ(Y) ←x; x≤b, x≤z :{positives}) ←z; a≤z≤b:{positives}); = integral(: integral(: exp(−z/β).power(Y−1, (z−x)/β).power(X−1, x/β).dx/β ←x; x≤b, x≤z :{positives}).dz/β/Γ(X)/Γ(Y) ←z; a≤z≤b:{positives}); whence the density for z is:
integral(: power(Y−1, (z−x)/β).power(X−1, x/β).dx/β ←x; x≤z :{positives}).exp(−z/β).dz/β/Γ(X)/Γ(Y);: = integral(: power(Y−1, 1−t).power(X−1, t).dt/Γ(X)/Γ(Y) ←t; t≤1 :{positives}).power(X+Y−1, z/β).exp(−z/β).dz/β

in which the initial integral(:…:) no longer involves z, so is simply a constant; it scales power(X+Y−1, z/β).exp(−z/β).dz/β which is just the gamma-distribution density with scale factor β and shape parameter α = X+Y, aside from a factor of Γ(X+Y); we may thus conclude that

Γ(X).Γ(Y) = Γ(X+Y).integral(: power(Y−1, 1−t).power(X−1, t).dt ←t; t≤1 :{positives})

for all positive X, Y. We thus conclude that adding two gamma-distributed variates with equal scale parameter yields a gamma-distributed variate with the same scale parameter, whose shape parameter is simply the sum of the shape parameters of the two variates added together.

Moments

To obtain the expected values of assorted powers of a gamma-distributed variate, x, with scale parameter β and shape parameter α, e.g. for power(n, x), we must evaluate:

integral(: power(n, x).exp(−x/β).power(α−1, x/β).dx/β/Γ(α) ←x :{positives}): = integral(: exp(−x/β).power(n+α−1, x/β).dx/β/Γ(α) ←x :{positives}).power(n, β); = power(n, β).Γ(n+α)/Γ(α); = product(: β.(i+α) ←i :n)

for natural n, using Γ(1+x) = x.Γ(x) repeatedly as needed. Thus the expected value (a.k.a. mean) of x is simply β.α and the expected value of its square is β.α.β.(α+1), so its variance is β.β.α, i.e. β times the mean, and its standard deviation is β.√α, i.e. the mean divided by √α.

Pareto parameter

Now let's compute the Pareto parameter. For this, we need to study

Q(S) = integral(: p(x).dx ←x :S).integral(: x.p(x).dx ←x :S)

with p as our density function, p(x) = exp(−x/β).power(α−1, x/β)/β/Γ(α). We solve for the k with Q({real x: x < k}) = Q({real x: x ≥ k}) and integral(: p :{real x: x ≥ k}) is our Pareto parameter. Now,

x.p(x): = x.exp(−x/β).power(α−1, x/β)/β/Γ(α); = exp(−x/β).power(α, x/β)/Γ(α)

is just the density for a gamma variate with the same β but one higher α, with a scaling of β.Γ(α +1)/Γ(α), which is just β.α. So Q(S)/β/α is just the product of measures, on S, of two gamma variates, our original and one with the same β but one higher α. This is just the probability measure for two independent gamma variates to both be in S, with the given parameters; when this is the same for {real x: x < k} as S and for {real x: x ≥ k} as S, our original variat's probability of exceeding k is its Pareto parameter.

Now, to save me some typing, let n = α and h = k/β, define R(S) = Q({real x: x/β in S})/α/β, with S being either {real t: t ≥ h} or its complement {real t: t < h} so that

R(S): = integral(: p(y).dy ←y/β :S).integral(: x.p(x).dx ←x/β :S)/α/β; = power(β, 2).integral(: p(βt).dt ←t :S).integral(: s.p(βs).ds ←s :S)/α; = integral(: exp(−t).power(n−1, t).dt ←t :S).integral(: exp(−s).power(n, s).ds ←s :S)/Γ(n)/Γ(n+1)

with no remaining involvement of β; and our Pareto parameter, once we've found the h for which R(S) is the same for the S on each side of h, is just integral(: p(y).dy ←y/β :S) = integral(: exp(−t).power(n−1, t).dt ←t :S) for the upper side's S, again with no involvement of β. Thus a gamma variate's Pareto parameter depends only on its order parameter, α (here n). This should be no surprise, as β is just an over-all scaling, equivalent to changing choice of unit for measuring the variate.

Given that I can't see any natural way to move this analysis further, I'll leave it there. Maybe some day I'll do some number-crunching and add a graph of how Pareto varies with α, but not today.

Written by Eddy.