Einstein's thermodynamic model for a solid

Back in the early days of quantum mechanics, Einstein cooked up a nice simple little model of the thermodynamics of solids. In this model, the solid is construed simply as a collection of modes in each of which the solid may carry some quanta of energy; the quanta are all the same and are able to move freely among the modes. Quantised simple harmonic oscillators of equal frequency are the simplest systems whose behaviour resembles such modes.

Each actual atom of the solid is presumed to typically be the site of a few modes; the archetypical mode being the atom's oscillation in some physical direction, generally giving each atom three modes. However, the modes are assumed to have lives independent of one another without regard to whether they are situated at the same atom; so it is the modes that we consider, not the atoms or molecules.

In particular, it is supposed that the quanta are perpetually being randomly redistributed among the modes. For the sake of definiteness, this random redistribution may be thought of as the modes playing a never-ending game, at each turn of which one quantum is selected at random, removed from the mode that presently has it and given to a randomly chosen mode. The physical import of this supposition is that the arrangement of quanta among modes is random.

Arrangements for a single body

Consider a body with some number, N, of modes and total energy consisting of some number n of quanta, each of some energy e. We can ask how many ways there are to arrange those n quanta among those N modes; call this number of ways W(N,n). When we have just 1 mode, there is only one way to arrange the quanta, no matter what n is; so we know that W(1) = (: 1 ←n :{naturals}). [Aside: W(0) maps 0 to 1, all other inputs to 0.] Suppose, then, that we know W(N−1) and wish to know W(N); chose some particular mode among the N; if this mode has i quanta, there are W(N−1,n−i) ways to arrange the remaining n−i quanta among the remaining N−1 modes; each value of i from 0 to n is feasible; so we can infer that W(N,n) = sum(: W(N−1,n−i) ←i :n+1). Reversing the order of summation, i.e. substituting j = n−i, this is identically

• W(N,n) = sum(: W(N−1, j) ←j :{0,…,n})

Now, it happens that there is a standard function of combinatorics, chose, for which chose(M,m) is the number of distinct subsets of size m that may be selected from a set of size M; its value is given by chose(M,m) = M!/m!/(M−m)! for m from 0 up to M, and zero for m outside that range. [The function factorial = (: n! ←n :) can be inductively defined by 0! = 1, (1+n)! = n!.(1+n) for each natural n. The famous Pascal's triangle is a representation of the function here called chose.] Furthermore we have, for any naturals m and M,

chose(M,m) + chose(M,m+1)
= M!/m!/(M−m)! + M!/(m+1)!/(M−m−1)!
= M!/(m+1)!/(M−m)! . ( m+1 + M−m )
= (M+1)!/(m+1)!/(M−m)!
= chose(M+1,m+1).

Now, chose(M, 0) = 1 for all M; hence when m is 0 we have sum(: chose(i+M,i) ←i :1+m) = chose(M,0) = chose(M+1+m,m). When this is true for some given M and m, we can add chose(M+1+m,m+1) to it to obtain sum(: chose(i+M,i) ←i :2+m) = chose(M+2+m,m+1) and infer that sum(: chose(i+M,i) ←i :1+n) = chose(n+1+M,n) for any natural n and M.

Now, W(1) = (: 1←n :{naturals}) and chose(n,n) = 1 for all n; so consider any N for which W(N) = (: chose(n+N−1,n) ←n :{naturals}) and consider, for any n,

W(N+1,n)
= sum(: W(N,i) ←i :n+1)
= sum(: chose(i+N−1,i) ←i :n+1)
= chose(n+N,n)

which is just the formula for W(N) with N replaced by N+1. We can thus infer that W(N,n) = chose(n+N−1,n) gives the number of ways n quanta may be distributed among N modes, for every natural n and N. As a way to eliminate the stray −1 from the formula, it will be convenient to discuss bodies with N+1 modes, for various natural N, rather than with N modes; in any case, this conveniently gets rid of the case of 0 modes, which only admits of 0 quanta (so isn't interesting). So, for natural n and N,

• W(N+1, n) = chose(n+N, n) = (n+N)!/n!/N! = chose(n+N, N)
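
As a sanity check on this closed form, here's a little brute-force Python sketch (mine, not part of the argument) comparing the recurrence for W against chose(n+N, n):

```python
from math import comb  # comb(M, m) is the text's chose(M, m)

def W(N, n):
    """Ways to arrange n quanta among N modes, via the recurrence in the text."""
    if N == 0:
        return 1 if n == 0 else 0
    if N == 1:
        return 1
    return sum(W(N - 1, n - i) for i in range(n + 1))

# W(N+1, n) = chose(n+N, n) for every natural n and N
for N in range(5):
    for n in range(6):
        assert W(N + 1, n) == comb(n + N, n)
```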

Thermodynamic comparison

So now suppose we bring two such bodies, with same-sized quanta, into contact in such a way that quanta are able to pass between them; suppose one body shares n quanta among N+1 modes and the other shares m quanta among M+1 modes. Transferring i quanta from the former to the latter leaves us with chose(n−i+N,n−i).chose(m+i+M,m+i) arrangements; in so far as this increases with i, we can expect quanta to flow in the indicated direction; in so far as it decreases with i we can expect quanta to flow the other way; and if its maximum is attained at i=0 we'll see no heat-flow – the two bodies were initially at the same temperature as one another. Now,

• chose(n−i+N,n−i).chose(m+i+M,m+i) = (n−i+N)!.(m+i+M)!/(n−i)!/(m+i)!/N!/M!

Call this F(i) and consider

F(1)/F(0)
= (m+1+M).n/(m+1)/(n+N)
= (M/(m+1) +1) / (N/n +1)
F(−1)/F(0)
= (n+1+N).m/(n+1)/(m+M)
= (N/(n+1) +1) / (M/m +1)

with heat flowing from the N-body to the M-body if F(1) > F(0), the other way if F(−1) > F(0) and neither way otherwise. Thus the N-body is hotter than the M-body iff F(1) > F(0) iff M/(m+1) > N/n iff n/(m+1) > N/M; and the M-body is hotter than the N-body iff F(−1) > F(0) iff m/M > (n+1)/N iff N/M > (n+1)/m. In between, when (n+1)/m ≥ N/M ≥ n/(m+1), neither is hotter. Note that, since (m+1)/M > m/M, when the N-body is hotter than the M-body, we do have n/N > m/M; but this is not (in itself) a sufficient condition to imply that the N-body is hotter than the M-body.
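
These comparisons are easy to confirm numerically; the following Python sketch (the function F follows the text, the sample numbers are arbitrarily chosen) checks the hotter-than criterion and the in-between case:

```python
from math import comb

def F(i, n, N, m, M):
    """Arrangements after moving i quanta from the (n, N+1)-body to the (m, M+1)-body."""
    return comb(n - i + N, n - i) * comb(m + i + M, m + i)

# A case where the N-body is hotter: n/(m+1) > N/M, so F(1) > F(0)
assert F(1, 10, 2, 1, 4) > F(0, 10, 2, 1, 4)

# Two identical bodies fall between the hotter-than conditions: F(0) is maximal
assert F(0, 3, 3, 3, 3) >= F(1, 3, 3, 3, 3)
assert F(0, 3, 3, 3, 3) >= F(-1, 3, 3, 3, 3)
```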

Distribution of quanta among modes

In each distribution of n quanta among N+1 modes, we can look at how many modes get how many quanta; this gives us a function ({naturals}: s :{naturals}) for which s(i) modes have i quanta. There are 1+N modes in all, so sum(s) = 1+N. No mode has more than n quanta or less than 0 of them, so i ranges from 0 to n, i.e. i is in 1+n = {0, …, n}; and we have n quanta in all, so sum(: i.s(i) ←i :1+n) = n. Many distributions can produce the same sharing function s, since s pays no attention to which modes got how many quanta, only how many modes got each number of quanta. For example, there are N+1 distributions that give all n quanta to a single mode, distinguished by which mode got all the quanta; all of these distributions have the same s, which maps n to 1, 0 to N and everything else to 0, because one mode got n quanta and all the others got none.

What if we share all the quanta out evenly ? If n is an exact multiple of N+1, say n = k.(N+1), then every mode gets exactly k quanta; there's only one distribution that does this; it has s(k) = 1+N and all other s(i) = 0. Otherwise, n = k.(N+1) +r for some r in N+1 and r modes get k+1 quanta while N+1−r get k quanta; so s(k) = N+1−r, s(k+1) = r and all other s(i) are zero; there are chose(N+1, r) distinct distributions like this. When r is 0, as we've seen, that's just 1 fair distribution; for r = 1, we get chose(N+1, 1) = N+1 almost-fair distributions; for larger r, we get chose(N+1, r) of these as-fair-as-possible distributions. This peaks around r = (N+1)/2, with chose(N+1,(N+1)/2) growing roughly as power(N+1, 2)/√(N.π/2). That's quite a lot; but n = k.(N+1) +r makes the total number of distributions chose(r+k +N.(k+1), N) which, even ignoring r+k, grows as exp(N).power(N, k+1)/√(2.N.π), so the proportion of almost fair distributions is 4.power(N, 2/(k+1))/exp(N), which shrinks rapidly for even quite modest k. Of course, there are plenty more distributions close to fair (or, indeed, to monopolistically unfair) but, as we'll soon see, the typical distribution is moderately unfair.
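
To see just how tiny that proportion gets, here's a rough numeric illustration in Python (the particular N, k and r are arbitrarily chosen for the example):

```python
from math import comb

N1 = 10                 # 1+N modes
k, r = 3, 4             # n = k.(N+1) + r quanta
n = k * N1 + r

total = comb(n + N1 - 1, N1 - 1)   # all distributions: chose(n+N, N)
fairest = comb(N1, r)              # as-fair-as-possible ones: chose(N+1, r)

assert fairest / total < 1e-3      # the near-fair arrangements are a tiny minority
```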

Typical distribution

So what sharing function should we expect a typical distribution to give us ? If we assume that each of our chose(n+N, N) distributions is equally likely to happen, then the number of modes with i quanta in a randomly chosen distribution shall be the average, over all these distributions, of the s(i) of each distribution. We can label our 1+N modes with the naturals from 0 to N and characterise each distribution by the mapping (:c|1+N) with sum(c) = n for which c(j) is the number of quanta mode j has; there is an exact one-to-one correspondence between such functions and distributions. We can obtain s from c by s(i) = count({i}∘c: j←j :), the number of j in 1+N for which c(j) is i. So the averaged s we want is:

S
= (: average(: count({i}∘c: j←j :1+N) ←c :{list ({naturals}: c |{1+N}): sum(c) = n}) ←i :1+n)
= average(: (: count({i}∘c: j←j :1+N) ←i :1+n) ←c :{list ({naturals}: c |{1+N}): sum(c) = n})

in which we use the pointwise arithmetic on functions ({numeric}: :1+n) to average s over the available c. Unfortunately, there's no easy way to evaluate this in either form, so let's try looking at it another way:

= sum(: count({list ({naturals}: c |{1+N}): (: count({i}∘c: j←j :1+N) ←i :1+n) = s}).s ←s :{list (: s :1+n): sum(: i.s(i) ←i :) = n, sum(s) = 1+N})/chose(n+N, N)

that is, we want to weight each s by the number of distributions (characterised by c) that have that s as their sharing function, then average that way. So how do we work out how many distributions realise each particular s ? Well, there are s(i) modes with i quanta, for each i in 1+n, so the answer is just the number of ways of chosing to partition our 1+N modes into the s(0) with no quanta, the s(1) with 1 quantum, the s(2) with 2 quanta and so on, up to the s(n) with n quanta. This gives us chose(1+N, s(0)) ways to chose the 0-quanta modes, chose(1+N−s(0), s(1)) ways to chose the 1-quantum modes and so on, resulting in

• product(: chose(1+N−sum(:s:i), s(i)) ←i |1+n)

ways to realise each particular s with sum(s) = 1+N. Let's define a mapping C from sharing functions to naturals that maps each s to this product. When we substitute chose(m, k) = m!/k!/(m−k)! into this, each (1+N−sum(:s:i))! appears in the numerator of the term for i but also, for i > 0, in the denominator for the term for i−1; the result is that the product above simplifies to

• C(s) = (1+N)!/product(: s(i)! ←i :1+n)

in which most of the s(i)! are 1. [Proof: if there is an i with 2.i = n, then either it has s(i) = 2 or it has s(i) in {0, 1}; the former forces all other j in 1+n to have s(j) = 0, hence s(j)! = 1; the latter gives s(i)! = 1 either way. Now, at most one of the i with 2.i > n can have s(i) non-zero (else sum(: i.s(i) ←i :) shall be too big) and, when one does, it can only have s(i) = 1; thus all of the i with 2.i > n have s(i)! = 1. Thus either all but one i have s(i)! = 1 (and the other has s(i)! = 2) or all i with 2.i ≥ n have s(i)! = 1, which is most of (i.e. at least half) the i in 1+n.] This means we can restrict our product (and, once we take log() below, our sum) to the minority of i in 1+n for which s(i) > 1.

We can motivate (1+N)!/product(factorial∘s) another way: if we think of filling up a box with s(0) modes, then a box with s(1) modes, then … and so on, there are (1+N)! orders in which we can pick the modes and each choice of which modes land in each box is realised product(factorial∘s) ways, since permuting the order of the s(i) modes selected to get i quanta makes no difference to the partition and there are s(i)! such permutations, for each i.
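
For a small body, C(s) can be checked by brute force; in this Python sketch (my own, with arbitrary small N and n), each distribution is a tuple c and its sharing function is tallied as a multiset:

```python
from math import factorial, prod
from itertools import product as cartesian
from collections import Counter

def C(s, N1):
    """(1+N)!/product(: s(i)! ←i :), with N1 = 1+N and s a dict of counts."""
    return factorial(N1) // prod(factorial(v) for v in s.values())

N1, n = 4, 3                                  # a small body: 4 modes, 3 quanta
tally = Counter()
for c in cartesian(range(n + 1), repeat=N1):
    if sum(c) == n:                           # c is a distribution of the n quanta
        key = frozenset(Counter(c).items())   # its sharing function s
        tally[key] += 1

for key, count in tally.items():
    assert count == C(dict(key), N1)          # the formula matches the brute count
```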

Now, we can use Stirling's formula to approximate our product of factorials; Stirling actually gives more precision (via terms in 1/k), but let's go with log(k!) = (k+1/2).log(k) −k +log(2.π)/2 so that

log(C(s))
= log((1+N)!) −sum(: log(s(i)!) ←i :1+n)
= (N+3/2).log(1+N) −(1+N) +log(2.π)/2 −sum(: (s(i)+1/2).log(s(i)) −s(i) +log(2.π)/2 ←i :1+n)

in which the sum is tacitly over all i with s(i) > 1. The middle term in the sum is sum(s) = 1+N, which cancels with the 1+N in the term for log((1+N)!), so we're left with

= (N+3/2).log(1+N) −sum(: (s(i) +1/2).log(s(i)) ←i :) +log(2.π)/2.(1 −count({i in 1+n: s(i) > 1}))

in which that last term is a bit inconvenient; and the fact that I've ignored terms in 1/s(i) is relevant for the (likely many) i for which s(i) is (though > 1) small.
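To get a feel for how good the truncated Stirling approximation is, here's a quick Python comparison (the sample sharing function is made up for illustration; log_C_stirling is my own name):

```python
from math import factorial, log, pi

def log_C_exact(s, N1):
    return log(factorial(N1)) - sum(log(factorial(v)) for v in s)

def log_C_stirling(s, N1):
    # (N+3/2).log(1+N) −(1+N) +log(2.π)/2, less the Stirling terms for s(i) > 1
    out = (N1 + 0.5) * log(N1) - N1 + log(2 * pi) / 2
    for v in s:
        if v > 1:
            out -= (v + 0.5) * log(v) - v + log(2 * pi) / 2
    return out

s = [40, 30, 15, 10, 4, 1]   # s(0)=40 idle modes, s(1)=30 with one quantum, ...
N1 = sum(s)                  # 1+N = 100 modes
assert abs(log_C_stirling(s, N1) - log_C_exact(s, N1)) < 0.1
```

The error comes mostly from the ignored terms in 1/s(i), as the text notes.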

Most usual distribution

Furthermore, even after approximating, we don't have a formula that we can in any tractable way use to weight the available candidates for s in taking a weighted average, so we haven't made any huge progress. However, we can at least try investigating which s are realised in the greatest number of ways; that is, the s which maximise C(s) or log(C(s)) subject to our two constraints, sum(s) = 1+N and sum(: i.s(i) ←i :) = n. These are both linear constraints on s, so we can use Lagrange's method of multipliers. For this we'll be subtracting a weighted sum of our constraints, sum(s) and sum(: i.s(i) ←i :), from what we want to maximise, then differentiating the result with respect to s. I'm not in a hurry to work with d(1/x!)/dx, but I do know what to do with log, so I'll chose to maximise log(C(s)). Since we'll be differentiating with respect to s, the log(1+N) term goes away; and I consider it reasonable to assume that the number of entries > 1 in s is constant near the maximum, which helpfully gets rid of those troublesome log(2.π)/2 terms. As the only remaining term is negative and we'll be subtracting from it to get a function to maximise, I'll instead define the positive function that we want to minimise. So let

• f(s) = sum(: (s(i) +1/2).log(s(i)) +B.i.s(i) +H.s(i) ←i :)

for some constants B and H – we'll vary them later, but first we find what s minimises f for given values of them. As there is no entanglement between the s(i) for distinct i, the derivative with respect to each s(i) is just the derivative of the i-term in the sum;

d(f(s))/d(s(i))
= log(s(i)) + (s(i) +1/2)/s(i) +B.i +H
= log(s(i)) +1/2/s(i) +B.i +H +1
which is zero when
−log(s(i))
= B.i +H +1 +1/2/s(i), i.e. when
1/s(i)
= exp(H +1 +B.i +1/2/s(i))

applicable only for those i with s(i) > 1, so 1/2/s(i) is less than 1/2 and we can hopefully ignore it, leaving

• s(i) = h.power(i, b)

for some constants h and b (derived from H and B, but we can now ignore these). This is the geometric distribution, nominally truncated at n, but actually truncated (in effect) somewhat earlier because at most one i with 2.i ≥ n has non-zero s(i); this means b must in fact be small enough that power(i, b) is negligible by the time we get to 2.i > n; in particular, we must have 0 < b < 1. Since the powers of b are negligible well before we get to the n-th, it should make no difference to skip the truncation, so our N +1 = sum(s) = h.sum(: power(i, b) ←i :) constraint becomes (N +1).(1 −b) = h. Our other constraint is then:

n
= sum(: i.s(i) ← i :)
= h.sum(: i.power(i, b) ←i :)
= h.sum(: b.d(power(i, b))/db ←i :)
= h.b.d(sum(: power(i, b) ←i :))/db
= h.b.d(1/(1 −b))/db
= h.b/(1 −b)/(1 −b)
= (N +1).b/(1 −b)

which gives us (N +1).b = n.(1 −b), so n = b.(n +N +1) leading to b = n/(n +N +1), whence 1 −b = (N +1)/(n +N +1) and h = (N +1).(N +1)/(n +N +1). Now, let q = n/(1 +N), the average number of quanta per mode, and substitute n = q.(1 +N) into these; we get b = q/(q +1) and h = (1 +N)/(q +1). So

• 1/b = 1 +1/q
• 1/q = 1/b −1 = (1 −b)/b
• q = b/(1 −b), q +1 = 1/(1 −b)
• 1 +N = h.(q +1) = h/(1 −b), N = (h +b −1)/(1 −b)
• n = q.(1 +N) = h.b/(1 −b)/(1 −b)

expresses our original n, 1 +N in terms of b = exp(−B) and h = exp(−H −1).
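
A quick numeric check (with an arbitrary sample n and N) that this geometric form does satisfy both constraints:

```python
N1, n = 1000, 3000           # 1+N modes, n quanta
q = n / N1                   # average quanta per mode
b, h = q / (q + 1), N1 / (q + 1)

terms = 200                  # b < 1, so the tail beyond this is negligible
modes = sum(h * b**i for i in range(terms))      # should be sum(s) = 1+N
quanta = sum(i * h * b**i for i in range(terms)) # should be sum(: i.s(i) ←i :) = n

assert abs(modes - N1) < 1e-6
assert abs(quanta - n) < 1e-6
```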

Statistics

The proportion of modes with i quanta is thus (1 −b).power(i, b) and the expected value of the j-th power of the number of quanta is

• moment(j) = (1 −b).sum(: power(j, i).power(i, b) ←i :)

which we can evaluate by exploiting the fact that i.power(i, b) is b.d(power(i, b))/db. Define f(0, b) = sum(: power(i, b) ←i :) = 1/(1 −b) and, for each natural j, f(1 +j, b) = b.d(f(j, b))/db = sum(: power(j, i).power(i, b) ←i :) so that moment(j) = (1 −b).f(j, b) for each j. I've already used f(0, b) = 1/(1 −b) and f(1, b) = b/power(2, 1 −b) in the derivation above; we can continue with

f(2, b)
= b/power(2, 1 −b) +2.b.b/power(3, 1 −b)
= b.(1 +b)/power(3, 1 −b)
f(3, b)
= b.(1 +2.b)/power(3, 1 −b) +3.b.b.(1 +b)/power(4, 1 −b)
= b.(1 +4.b +b.b)/power(4, 1 −b)
f(4, b)
= b.((1 −b).(1 +8.b +3.b.b) +4.b.(1 +4.b +b.b))/power(5, 1 −b)
= b.(1 −b +8.b −8.b.b +3.b.b −3.b.b.b +4.b +16.b.b +4.b.b.b)/power(5, 1 −b)
= b.(1 +11.b +11.b.b +b.b.b)/power(5, 1 −b)

and so on. These give us moment(0) = 1 and the mean moment(1) = b/(1 −b), as used above; the rest give

• moment(2) = b.(1 +b)/power(2, 1 −b)
• moment(3) = b.(1 +4.b +b.b)/power(3, 1 −b)
• moment(4) = b.(1 +11.b +11.b.b +b.b.b)/power(4, 1 −b)

and so on. From moment(2) −power(2, moment(1)) we get the variance b/power(2, 1 −b) so standard deviation (√b)/(1 −b). Dividing mean by variance shall give us 1 −b, providing a simple way to estimate b for any distribution of roughly this form.
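
These moment formulas are easy to confirm against the geometric distribution directly; a Python sketch with an arbitrary b:

```python
b = 0.6
terms = 400                            # b**400 is utterly negligible

def moment(j):
    """(1 −b).sum(: power(j, i).power(i, b) ←i :), truncated."""
    return (1 - b) * sum(i**j * b**i for i in range(terms))

assert abs(moment(1) - b / (1 - b)) < 1e-9
assert abs(moment(2) - b * (1 + b) / (1 - b)**2) < 1e-9
assert abs(moment(3) - b * (1 + 4*b + b*b) / (1 - b)**3) < 1e-9
assert abs(moment(4) - b * (1 + 11*b + 11*b*b + b**3) / (1 - b)**4) < 1e-8

# variance, and the mean/variance = 1 −b estimator
variance = moment(2) - moment(1)**2
assert abs(variance - b / (1 - b)**2) < 1e-9
assert abs(moment(1) / variance - (1 - b)) < 1e-9
```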

Heat capacity per mode

To increase the temperature of a real body, one must supply energy; the amount of energy grows, for bodies made of a given material, proportionally with the mass of the body. Dividing the amount of energy supplied by the mass of the body and by the temperature change produced we get a quantity called the specific heat capacity of the material of which the body is made. This is found to be the same for all samples of a given material at a given temperature (all other pertinent factors, such as pressure, being equal); for solids and liquids, it varies only slowly with temperature. It has been found, experimentally, that multiplying specific heat capacity by relative molecular mass gives roughly three times the gas constant (R = 8.31 J/K/mol); this is the result of dividing the energy supplied by the amount (number of moles) of substance (in place of the mass) and by the temperature. The gas constant is Boltzmann's constant, k = 13.8e-24 J/K, times Avogadro's constant, the number of molecules in a mole of substance, so this just says that the heat capacity of a body is 3.k times the number of molecules in the body. If we assume that each molecule of the solid can oscillate independently in each of the three spatial directions, then the number of modes in the model above is just three times the number of molecules, so we can read this as saying that the heat capacity of a solid is roughly k times the number of modes; so the heat capacity per mode is just k.

This approximation holds at high temperature and my school notes claim that, at lower temperature T in a material whose quanta have energy E, with x = E/k/T/2, the heat capacity is tolerably approximated by k times the square of x/sinh(x); the function sinh being defined by 2.sinh(x) = exp(x) −exp(−x). Note that high T means x near 0, where x/sinh(x) is 1, its maximum, tailing off to zero on either side.
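
For what it's worth, that claimed formula is easy to tabulate; a Python sketch (E is an arbitrary sample quantum energy, and heat_capacity_per_mode is my own name) checking its two limits:

```python
from math import sinh

k = 1.38e-23          # Boltzmann's constant, J/K

def heat_capacity_per_mode(E, T):
    """k.power(2, x/sinh(x)) with x = E/k/T/2, as the school notes claim."""
    x = E / (2 * k * T)
    return k * (x / sinh(x))**2

E = 1e-21             # an arbitrary sample quantum energy, J
assert abs(heat_capacity_per_mode(E, 1e4) - k) < 1e-3 * k   # high T: capacity per mode ≈ k
assert heat_capacity_per_mode(E, 1.0) < 1e-2 * k            # low T: capacity collapses
```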

So now let's look at how b, n, N and friends relate to temperature, T, given quanta of energy E. We naturally expect T to increase as n increases and decrease as N increases, so something like n/N is involved. The stored energy is just n.E and the stored energy per mode is U = n.E/(1 +N) = q.E. The heat capacity is the rate of change of stored energy with temperature, so the heat capacity per mode is dU/dT ≈ k.power(2, x/sinh(x)). As T = E/k/x/2 (with E, k and 2 constant) we have dT/dx = −T/x, so −dU/dx ≈ k.T.x/power(2, sinh(x)) and k.T.x is just E/2, so n/(1 +N) = q = U/E has −dq/dx = 1/2/power(2, sinh(x)). Low temperature is high x, so q tends to 0 as x tends to +∞, with

2.q(X)
= integral(: 1/sinh(x)/sinh(x) ←x :{positives > X})
= integral(: 4/power(2, exp(x) −exp(−x)) ←x :{positives > X})
= integral(: 4.exp(−2.x)/power(2, 1 −exp(−2.x)) ←x :{positives > X})
= −integral(: d(2/(1 −exp(−2.x)))/dx ←x :{positives > X})
= 2/(1 −exp(−2.X)) −2

since exp(−2.x) tends to 0 as x tends to infinity, so 2/(1 −exp(−2.x)) tends to 2 there,

= 2.exp(−2.X)/(1 −exp(−2.X))
= 2/(exp(2.X) −1)

whence

• q = n/(1 +N) = 1/(exp(2.x) −1) = 1/(exp(E/k/T) −1)
• 1/b = 1/q +1 = exp(E/k/T)
• b = exp(−E/k/T)
• T = −E/k/log(b)

and, indeed, our original Lagrange multiplier B is just −log(b) = E/k/T.
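
A numeric spot-check (at an arbitrary sample x) that q = 1/(exp(2.x) −1) does have −dq/dx = 1/2/power(2, sinh(x)), and that 1/b = 1/q +1:

```python
from math import sinh, exp

def q(x):
    """Mean quanta per mode at x = E/k/T/2."""
    return 1 / (exp(2 * x) - 1)

# −dq/dx should be 1/2/power(2, sinh(x)): compare a central difference
x, dx = 1.3, 1e-6
numeric = (q(x + dx) - q(x - dx)) / (2 * dx)
assert abs(numeric + 1 / (2 * sinh(x)**2)) < 1e-6

# and 1/b = 1/q +1 = exp(E/k/T) = exp(2.x)
assert abs(1 / q(2.0) + 1 - exp(4.0)) < 1e-9
```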

Pareto Parameter

Now let's compute the Pareto parameter of our distribution: this is the fraction p of the modes that hold a fraction 1 −p of the quanta. Our distribution gives i quanta to a fraction power(i, b).(1 −b) of the modes, so the modes with ≥ m quanta constitute power(m, b) of the total; summing the quanta these modes hold, aside from a constant factor (h, see above), we get:

sum(: i.power(i, b) ←i; i ≥ m :)
= b.d(sum(: power(i, b) ←i; i ≥ m :))/db
= b.d(power(m, b)/(1 −b))/db
= power(m, b).(m −m.b +b)/power(2, 1 −b)

The full total (with the same scaling) is just this with m = 0, b/power(2, 1 −b); dividing by this, the proportion of quanta held by modes with at least m is then power(m, b).(m/b −m +1). As b < 1, this is necessarily positive. So a fraction power(m, b) of the modes hold a fraction power(m, b).(m/b −m +1) of the quanta; for m = 0, both fractions are 1; for sufficiently large m, each tends to zero; so there must be some m at which the sum of these fractions makes the transition from > 1 to < 1. We can find this by solving for the m that would make the sum 1; it's probably not a whole number, but the closest whole numbers below and above answer our question. We thus want that m for which power(m, b).(2 +m/b −m) = 1, i.e. power(m, 1/b) = 2 +m.(1/b −1). Our Pareto parameter is then f = power(m, b), the proportion of modes with at least m quanta; this is necessarily ≤ 1/2, so 1/f = power(m, 1/b) is ≥ 2. We get log(f)/log(b) = m = (1/f −2)/(1/b −1), so

• log(f)/(1/f −2) = log(b)/(1/b −1) = −E/k/T/(exp(E/k/T) −1).

Thus the Pareto parameter for the distribution is directly linked to the ratio of quantum-size to k.T, the characteristic energy of our temperature. The relationship isn't exactly easy to compute (in either direction), but at least it's well-defined.
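
Solving for the Pareto parameter numerically is straightforward by bisection; the helper below (pareto_fraction is my own name, not from the text) finds the real m and returns f = power(m, b):

```python
from math import log

def pareto_fraction(b):
    """Solve power(m, 1/b) = 2 + m.(1/b −1) for real m > 0; return f = power(m, b)."""
    g = lambda m: (1 / b)**m - 2 - m * (1 / b - 1)
    lo, hi = 1e-9, 1.0
    while g(hi) < 0:          # g(0+) < 0 and g grows exponentially, so this brackets the root
        hi *= 2
    for _ in range(200):      # plain bisection
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    m = (lo + hi) / 2
    return b**m

f = pareto_fraction(0.9)
assert 0 < f <= 0.5
# the defining relation: log(f)/(1/f −2) = log(b)/(1/b −1)
assert abs(log(f) / (1 / f - 2) - log(0.9) / (1 / 0.9 - 1)) < 1e-6
```

For b = 1/2 the equation power(m, 2) = 2 +m has the exact root m = 2, giving f = 1/4: a quarter of the modes hold three quarters of the quanta.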

Written by Eddy.