The language used to describe music is full of jargon using numbers in ways that the uninitiated are naturally apt to find confusing. There is a quite simple underlying mathematical description, which I'll try to dig out and detail here.
A pure tone is a sound in which the amplitude (of the movements of
air or the variations in pressure) follows a simple harmonic variation at a
single definite frequency. When appearing in a composite with other sounds (so
no longer pure) it is simply referred to as a tone. Tones are compared to one
another in terms of their frequencies – a tone with a higher frequency is
described as higher, and conversely for lower.
A note is what results when one tone, known as the fundamental tone
of the note, is combined with various other tones – called harmonics
of the fundamental – at frequencies that are
(necessarily positive) whole-number multiples of the frequency of the
fundamental. How the note sounds then depends on the pattern of relative
intensities of the various frequencies it combines; this is known as
the timbre of the note and, for the purposes of the present
discussion, I'll treat it as being encoded by a mapping
({positives}::{naturals}) which maps each natural, n, to the real number
obtained by dividing the intensity of the note's n-th harmonic (i.e. its
contribution at frequency n times that of the fundamental) by the intensity of
the fundamental. (Formally this mapping doesn't accept 0 as an input; and
always, by definition, maps 1 to 1.) Every tone is a note whose timbre is
the simple mapping 1←1, accepting no other inputs; each tone, considered as
a note, is its own fundamental tone, with no harmonics. A note is usually
characterized by its fundamental tone but differences in timbre can make a
significant difference to listeners' perceptions of notes.
When two tones are heard together, the total amplitude of the air's oscillations is the sum of their two simple harmonic oscillations. Taking the tones to have frequencies w and z and amplitudes a and b and, for simplicity, to start in phase, this varies with time, t, as
$$a \sin(2\pi w t) + b \sin(2\pi z t) = (a+b) \cos(\pi (w-z) t) \sin(\pi (w+z) t) + (a-b) \sin(\pi (w-z) t) \cos(\pi (w+z) t)$$
This is a sum of two terms, each of which can be understood as a
coefficient, albeit varying with low frequency (w−z)/2, times a simple
sinusoid with frequency (w+z)/2, the average of w and z (so reasonably close to
both, as long as they're reasonably close to one another). When (w−z)/2
is small, the result may thus be perceived as a single tone, with frequency
(w+z)/2, whose amplitude coefficient varies with frequency
(w−z)/2. Since loudness ignores the sign of that coefficient, the intensity
(i.e. volume) swells and fades w−z times per unit time. This variation in
intensity is called beating; it
complicates our perception of sounds with frequencies close to one another.
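The decomposition described above (a slowly varying coefficient times a sinusoid at the average frequency) can be checked numerically. This is only a sketch: the amplitudes `a` and `b`, the function names and the sample frequencies are illustrative assumptions, and the two tones are assumed to start in phase.

```python
import math

def two_tones(t, w, z, a=1.0, b=1.0):
    """Sum of two in-phase pure tones with frequencies w and z (cycles per unit time)."""
    return a * math.sin(2 * math.pi * w * t) + b * math.sin(2 * math.pi * z * t)

def beat_form(t, w, z, a=1.0, b=1.0):
    """The same signal, written as slow coefficients times sinusoids at (w + z)/2."""
    slow = math.pi * (w - z) * t  # phase of an oscillation at frequency (w - z)/2
    fast = math.pi * (w + z) * t  # phase of an oscillation at frequency (w + z)/2
    return ((a + b) * math.cos(slow) * math.sin(fast)
            + (a - b) * math.sin(slow) * math.cos(fast))

def max_difference(w, z, a=1.0, b=1.0, samples=1000, duration=1.0):
    """Largest gap between the two forms over evenly spaced sample times."""
    times = (duration * i / samples for i in range(samples))
    return max(abs(two_tones(t, w, z, a, b) - beat_form(t, w, z, a, b))
               for t in times)
```

For example, `max_difference(444.0, 440.0, 1.0, 0.7)` should differ from zero only by floating-point rounding.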
However, within a note, since all the constituent tones' frequencies are multiples of the fundamental's frequency, f, all differences between constituents' frequencies are at least f, which is tacitly high enough to be audible, so no two constituents are close enough to cause us to perceive them as beating in this way; in any case, even if they did, the mixed sound would have a frequency that's a multiple of f/2 and its amplitude would likewise vary at a multiple of f/2, so (at worst) the beating would yield a sound that fits into a note whose fundamental tone has half the frequency of our original. Likewise, when two notes are heard together, if their fundamentals' frequencies are both small whole number multiples of some common frequency, mixing their constituents won't produce significant beating (all constituents being multiples of this common frequency) and, even in so far as they do, the result would fit within a note whose fundamental is either the common frequency or half it. Conversely, when two notes' fundamental tones' frequencies aren't both multiples, by small whole numbers, of some base frequency, it's more or less unavoidable that at least some harmonic of one is close to at least some harmonic of the other, so that these mix to produce a sound which doesn't belong in either – and causes beating.
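To make the contrast concrete, one can list the pairs of harmonics of two notes that land close together without coinciding. The sketch below is illustrative only: the function name, the number of harmonics considered and the 3% closeness threshold are all assumptions, not anything the argument above depends on.

```python
def close_harmonics(f, g, count=8, tolerance=0.03):
    """List (m, n, gap) where harmonic m of f and harmonic n of g nearly coincide.

    A pair is reported when m*f and n*g differ, but by no more than
    `tolerance` times the smaller of the two frequencies.
    """
    pairs = []
    for m in range(1, count + 1):
        for n in range(1, count + 1):
            x, y = m * f, n * g
            gap = abs(x - y)
            if 0 < gap <= tolerance * min(x, y):
                pairs.append((m, n, gap))
    return pairs
```

With fundamentals of 200 and 300 (both small multiples of 100) no such pair turns up among the first eight harmonics, whereas 200 and 295 give near misses at 600 against 590 and at 1200 against 1180.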
In a note with some given fundamental tone, all the constituent tones are harmonics of that fundamental; if you could select the constituents which are harmonics of one particular harmonic in the note, you'd get a new note, of which this particular harmonic is the fundamental tone. Such a selection note is known as a harmonic of the original note. On a bowed stringed instrument, it is possible to perform such a selection by holding a finger lightly touching a string at a point a carefully-chosen fraction of the way along the vibrating part of the string. For example, a finger one fifth of the way along the string will filter out all vibrations but the ones whose frequencies are multiples of five times that of the unfiltered note's fundamental. However, you have to place the finger quite accurately on the right spot; otherwise, the resulting sound is just shrill and unpleasant.
Most of the weird jargon relates to the relationships that may exist among
tones. Orthodox music theory describes a pair of notes as being separated by
an interval, which depends only on the ratio of the frequencies of their
fundamental tones – so an interval really describes a relationship between
tones. With three notes, with the same interval between the lowest and middle
as between the middle and highest, the interval between lowest and highest is
described as twice that interval; or, generally, the language used treats the
outer interval as the result of adding the two inner intervals; so
multiplication of ratios of frequencies is expressed as addition of intervals;
it is thus most convenient to consider an interval as being characterized by
the logarithm of its associated ratio of frequencies.
The octave (so-called because orthodoxy's labelling of notes makes
the eighth note one octave above the first) is the interval between two tones
when the frequency of one is twice that of the other. It is the standard unit
of intervals, so the natural base to use for our logarithms is two; so I'll
characterize each interval by the value of ln(s/d)/ln(2) = log2(s/d)
when it's the interval between some higher tone of frequency s and some lower
tone of frequency d. From here on, I'll simply write log2 as log, dropping the
subscript.
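As a small illustration of measuring intervals in octaves, here is a sketch in Python; the function name `interval` is an assumption, while s and d are the higher and lower frequencies as in the text.

```python
import math

def interval(s, d):
    """Interval, in octaves, from a tone of frequency d up to one of frequency s."""
    return math.log2(s / d)
```

Multiplying frequency ratios then corresponds to adding intervals: `interval(3, 2) + interval(4, 3)` agrees with `interval(2, 1)`, which is one octave, up to floating-point rounding.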
A chord is the result of combining a few notes, usually with some
carefully-chosen relationship among the fundamentals of those notes. The study
of chords is thus dominated by the study of the intervals between constituent
notes – although choosing a particular group of notes, separated by these
intervals, naturally makes a difference.
In a note whose fundamental tone has frequency f, every constituent tone has frequency n.f for some whole number n. If we were to remove the fundamental tone from this note we'd be left with a combination of tones all of frequencies n.f with natural n>1. Since there would then be no tone in the mix of whose frequency the frequencies of all others are multiples, this wouldn't be a note. If the note we started with consisted of only a few tones, however, the remainder would be a chord whose constituents' frequencies are all rationally commensurate – that is, the ratios among the frequencies are all of form n/m for whole numbers n, m. In particular, the differences among the frequencies are all multiples of f; as long as all the whole numbers involved (n and m, after any common factors have been eliminated) aren't large, this won't be small compared to the frequencies themselves, so we shouldn't notice any beating; and, even if we do, it'll all be at frequencies that are multiples of f/2 – at worst one octave lower than the fundamental tone we removed.
Conversely, if we combine a few notes whose fundamental tones' frequencies
are all rationally commensurate, there's necessarily some frequency of which all
of the given frequencies are multiples. If this frequency happens to be the
frequency of the fundamental tone of one of our notes, the resulting chord is
actually (at least as far as my definition, above, is concerned) a note:
otherwise, we could always add the tone with this frequency to transform our
chord formally into a note. However, if we don't add this tone and it isn't
already present, the chord won't be (formally) a note – although (as may
be observed by adding the tone and then removing it) it can be regarded as the
residue of a note from which we've removed the fundamental. So a chord, whose
constituent notes' fundamental tones' frequencies are rationally commensurate,
is like a note with its fundamental tone removed: it should not be too
surprising that, in so far as the constituent notes sound nice, such
chords are apt to sound nice, particularly when only moderately small whole
numbers are needed to express the ratios between the frequencies of fundamental
tones. Indeed, as long as the whole numbers – needed to describe the
ratios among the fundamentals of the notes making up the chord – are
reasonably small, the common factor frequency shall not be so small compared to
the others as to give rise to beating; and, as ever, any beating that does
result shall at least only involve frequencies rationally commensurate with
those present in our notes.
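The common frequency of a set of rationally commensurate frequencies can be computed exactly. A minimal sketch, assuming the frequencies are supplied as integers or `Fraction` values (floats would be treated as their exact binary values, which is rarely what is wanted); the name `common_base` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def common_base(frequencies):
    """Largest frequency of which every given frequency is a whole multiple."""
    fracs = [Fraction(f) for f in frequencies]
    # Least common multiple of the denominators, so every frequency becomes
    # a whole number of units of 1/denom.
    denom = reduce(lambda x, y: x * y // gcd(x, y),
                   (f.denominator for f in fracs), 1)
    # Greatest common divisor of those whole numbers.
    numer = reduce(gcd, (int(f * denom) for f in fracs), 0)
    return Fraction(numer, denom)
```

For fundamentals at 400, 500 and 600 this gives 100, which is not among the frequencies supplied, so that chord is the residue of a note at 100 rather than itself a note; for 220 and 330 it gives 110.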
A scale is a system of co-ordinates in logarithm-of-frequency space:
it specifies some tone as its origin, from which we may infer a lattice of tones
at intervals of whole numbers of octaves from it, and some scheme for
sub-dividing the octave. All the oddities of jargon come from the different
ways of sub-dividing the octave. All the complications come from a tension
between two competing priorities, when sub-dividing the octave: on the one hand,
it would be nice to have the sub-division simply extend the octave lattice to a
finer granularity, using some fraction of the octave as smaller unit; on the
other hand, representing nice-sounding chords requires that the intervals within
such chords match up with the sub-divisions of the octave. However, the
nice-chord intervals are logarithms (to base two) of ratios of moderately small
integers; and, except where the ratios are powers of two, these intervals are
irrational fractions of the octave – so they can't give us an exact
lattice to refine the octave-lattice.
Describe a ratio as simple in so far as its numerator and
denominator, once any common factors have been eliminated, are small; the
smaller they are, the simpler the ratio is. One approach to our problem is to
look for simple ratios whose logarithms, to base two, are reasonably close to
simple ratios; using these to define our scale seems likely to have a reasonable
chance of working reasonably well. Conveniently, I can truncate continued
fractions to obtain fairly accurate rational approximations to logarithms. The
following table lists coprime ratios, strictly between one and two, of naturals
up to twelve. The various other ratios of naturals up to twelve can be inferred
by cancelling any common factors, multiplying by factors of two (which just adds
1 to the log of the ratio) and inverting (which simply negates the
log).
numerator, n | denominator, d | log(n/d) | approximated by |
---|---|---|---|
3 | 2 | 0.585 | 3/5 = 0.6, 7/12 = 0.583, 31/53 = 0.585 |
4 | 3 | 0.415 | 2/5 = 0.4, 5/12 = 0.417, 22/53 = 0.415 |
5 | 3 | 0.737 | 3/4 = 0.75, 14/19 = 0.737 |
5 | 4 | 0.322 | 1/3 = 0.333, 9/28 = 0.321 |
6 | 5 | 0.263 | 1/4 = 0.25, 5/19 = 0.263 |
7 | 4 | 0.807 | 4/5 = 0.8, 21/26 = 0.808 |
7 | 5 | 0.485 | 1/2 = 0.5, 17/35 = 0.486 |
7 | 6 | 0.222 | 2/9 = 0.222 |
8 | 5 | 0.678 | 2/3 = 0.667, 19/28 = 0.679 |
8 | 7 | 0.193 | 1/5 = 0.2, 5/26 = 0.192 |
9 | 5 | 0.848 | 6/7 = 0.857, 11/13 = 0.846, 39/46 = 0.848 |
9 | 7 | 0.363 | 4/11 = 0.364, 29/80 = 0.362 |
9 | 8 | 0.170 | 1/6 = 0.167, 9/53 = 0.170 |
10 | 7 | 0.515 | 1/2 = 0.5, 18/35 = 0.514 |
10 | 9 | 0.152 | 1/7 = 0.143, 2/13 = 0.154, 7/46 = 0.152 |
11 | 6 | 0.874 | 7/8 = 0.875 |
11 | 7 | 0.652 | 2/3 = 0.667, 15/23 = 0.652 |
11 | 8 | 0.459 | 6/13 = 0.462, 17/37 = 0.459 |
11 | 9 | 0.290 | 2/7 = 0.286, 11/38 = 0.289 |
11 | 10 | 0.138 | 1/7 = 0.143, 4/29 = 0.138 |
12 | 7 | 0.778 | 7/9 = 0.778 |
12 | 11 | 0.126 | 1/8 = 0.125 |
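Truncated continued fractions of the kind used for the approximations can be generated mechanically. The following is a minimal sketch rather than the code actually used to prepare the table: the function name `convergents` and the floating-point cut-off are assumptions made for illustration.

```python
import math
from fractions import Fraction

def convergents(x, count):
    """First `count` continued-fraction convergents of a non-negative real x."""
    whole = math.floor(x)
    h_prev, h = 1, whole  # numerators of the previous and current convergent
    k_prev, k = 0, 1      # denominators of the previous and current convergent
    result = [Fraction(h, k)]
    frac = x - whole
    # Stop early if the fractional part vanishes (to within rounding).
    while len(result) < count and frac > 1e-12:
        x = 1 / frac
        a = math.floor(x)  # next continued-fraction term
        frac = x - a
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result
```

For log(3/2) the first seven convergents are 0, 1, 1/2, 3/5, 7/12, 24/41 and 31/53, which include the three approximations listed for 3/2. Floating-point rounding limits how many terms can be trusted, so this is only suitable for the first handful.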
The highlit entries are plainly visible in the following diagram, which shows selected fractions on the left, positioned vertically in accord with their logarithms to base two, with the range of values of logarithms likewise divided up into selected fractions on the right:
The diagram above includes a sub-division of the octave into equal twelfths: it shows that the 5/12 and 7/12 logarithms correspond very closely with the ratios 4/3 and 3/2, respectively. Of the numbers up to 12, only 1, 5, 7 and 11 are coprime to 12; all the others share a factor of 2 or 3 with it. Because they are coprime to 12, the multiples of each of these, reduced modulo 12, enumerate all the numbers up to 12. For 1 we get the obvious sequence, 1, 2, …, 11, 12; for 11, we get the same sequence simply reversed: 11×2 = 22 = 12+10 is 10 modulo 12; 11×3 = 33 = 24+9 is 9 modulo 12; and so on. From 5 we get 5, 10, 3 (i.e. 15 = 12+3), 8, 1, 6, 11, 4, 9, 2, 7, 12; and, since 7+5 = 12, 7 just gives us the same sequence reversed.
Because the 5/12 and 7/12 logarithms match ratios 4/3 and 3/2 reasonably accurately, the powers of these two ratios, modulo factors of two, should give us logarithms close to 5/12, 10/12, 3/12 and so on. Successive powers involve steadily higher powers of 3, but we can use the first few powers of 4/3, corresponding to 5/12, to get the first half of the sequence and the powers of 3/2, corresponding to 7/12, to get the second half in reverse. Thus, for 5, 10, 3, 8, 1 and 6 we get 4/3, 16/9, 32/27, 128/81, 256/243 and 1024/729; for 7, 2, 9, 4, 11 and 6 we get 3/2, 9/8, 27/16, 81/64, 243/128 and 729/512. The two values for 6 aren't quite equal; dividing the latter by the former, we get 729×729/512/1024 = 1.0136 rather than 1. Still, it's reasonably accurate.
Re-ordering the sequence, the ratios corresponding to successive twelfths (from 1/12 to 11/12) as logarithms are: 256/243, 9/8, 32/27, 81/64, 4/3, 729/512 or 1024/729, 3/2, 128/81, 27/16, 16/9 and 243/128. The ratios of smaller numbers, in this list, give more accurate approximations to their corresponding twelfths in the logarithm lattice.
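The construction just described is easy to reproduce with exact rational arithmetic. The sketch below is one way to do it; the function names `fold` and `twelfth_ratios` are illustrative, not part of any established library.

```python
from fractions import Fraction

def fold(ratio):
    """Multiply or divide by 2 until the ratio lies between 1 and 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def twelfth_ratios():
    """Map each twelfth 1..11 to the ratio(s) obtained from powers of 4/3 and 3/2."""
    table = {}
    for base, step in ((Fraction(4, 3), 5), (Fraction(3, 2), 7)):
        for power in range(1, 7):
            twelfth = (step * power) % 12
            table.setdefault(twelfth, []).append(fold(base ** power))
    return table
```

Here `twelfth_ratios()[6]` holds both candidates for the half-octave, 1024/729 and 729/512, and dividing the second by the first gives 531441/524288, roughly 1.0136.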
This gives us a scale – that is, a logarithmically (approximately) even sub-division of the octave – corresponding to fairly simple ratios; if nothing else, they involve no prime factors aside from 2 and 3, even if they do involve some fairly high powers of each (up to 3^6 and 2^10). Its approximations are best at 5/12 and 7/12; and reasonably good at 2/12 = 1/6, 3/12 = 1/4, 9/12 = 3/4 and 10/12 = 5/6, with the remaining lattice points less good. Thus, if we use a scale that sub-divides the octave evenly in twelve, the notes at 1/12, 4/12 = 1/3, 6/12 = 1/2, 8/12 = 2/3 and 11/12 shall be further off nice ratios than the others; or, if we use the ratios above accurately, these same notes (corresponding to the middle portion, …, 8, 1, 6, 11, 4, … of the 5 … 7 sequence) are the ones that deviate most from the plain lattice, so the intervals between them and other notes are more noticeably not quite the same as the nominally corresponding intervals elsewhere in the scale. Either way, we'll tend to avoid these intervals.
Now, let's label the intervals (modulo octaves) that we're not
avoiding. For reasons of historical accident, I'll use the letters A through G,
cycling back to A again after G, and let D stand for the base-note of our
intervals, the zero-point of our scale. So D corresponds to 0/12 and 12/12 on
our scale, the ratios 1 and 2, along with all other simple powers of 2. We
avoid 1/12, so give the name E to 2/12, corresponding to ratio 9/8. Next F is
3/12, ratio 32/27 and we skip 4/12. Thus G is 5/12, ratio 4/3; we skip 6/12; so
A is 7/12, ratio 3/2. We skip 8/12; B is 9/12, ratio 27/16; C is 10/12, ratio
16/9; finally, we skip 11/12 and we're back to D as 12/12, ratio 2. We can
refer to the notes we skipped as modified forms of the ones on either side of
them: between D and E we skipped 1/12, which we can refer to as D♯ (D
sharp) or as E♭ (E flat); between F and G we skipped 4/12, so
that's F♯ or G♭; likewise 6/12 is G♯ or A♭, 8/12 is
A♯ or B♭ and 11/12 is C♯ or D♭. If we stick to the notes
named by simple letters, their intervals with D are all whole numbers of
twelfths of an octave and correspond to simple ratios. Oddly enough, using this
set of notes is known as playing in the key of C (or C major,
indeed). Did I mention that the classical nomenclature involves a lot of
historical accidents?
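The labelling just described amounts to a small lookup table. As a sketch, with the names `NATURALS`, `LETTER_AT` and `names_for` being illustrative choices:

```python
# Twelfths of an octave above D for the plainly lettered notes, as assigned above.
NATURALS = {"D": 0, "E": 2, "F": 3, "G": 5, "A": 7, "B": 9, "C": 10}
LETTER_AT = {twelfth: letter for letter, twelfth in NATURALS.items()}

def names_for(twelfth):
    """Name(s) of the note `twelfth` twelfths of an octave above D, modulo octaves."""
    k = twelfth % 12
    if k in LETTER_AT:
        return [LETTER_AT[k]]
    # A skipped point lies between two lettered notes:
    # sharpen the one below it or flatten the one above it.
    return [LETTER_AT[(k - 1) % 12] + "♯", LETTER_AT[(k + 1) % 12] + "♭"]
```

So `names_for(5)` gives G alone, while `names_for(1)` gives both D♯ and E♭.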
If we count round the cycle from D to the notes corresponding to the two
simplest ratios in our scale, we get 4/3 at G, which is the fourth letter when
we start with D as first, and 3/2 at A, which follows G, so is fifth. Because
of this, the intervals with ratios 4/3 and 3/2 are known as fourth and fifth
in classical nomenclature.
If you let one of the notes labelled A have frequency 55 Hz (a.k.a. A1) and
use exact powers of 2^(1/12) for all intervals in the scheme above, you
get the orthodox equal-tempered (i.e. evenly spaced, logarithmically)
scale. Here, for example, are some data for the octave that contains middle C:
Note | frequency/Hz | r = frequency/(220 Hz) | 12.log(r) | approximate r |
---|---|---|---|---|
A | 440 | 2 | 12 | 2 |
G♯, A♭ | 415.305 | 1.888 | 11 | 15/8, 17/9 |
G | 391.995 | 1.782 | 10 | 16/9 |
F♯, G♭ | 369.994 | 1.682 | 9 | 5/3 |
F | 349.228 | 1.587 | 8 | 8/5 |
E | 329.628 | 1.498 | 7 | 3/2 |
D♯, E♭ | 311.127 | 1.414 | 6 | 7/5 |
D | 293.665 | 1.335 | 5 | 4/3 |
C♯, D♭ | 277.183 | 1.260 | 4 | 5/4 |
C | 261.626 | 1.189 | 3 | 6/5 |
B | 246.942 | 1.122 | 2 | 9/8, 10/9 |
A♯, B♭ | 233.082 | 1.059 | 1 | 18/17, 16/15 |
A | 220 | 1 | 0 | 1 |
Notes:
X|Y means X♯ = Y♭.
Standard key: [C, C|D, D, D|E, E, F, F|G, G, G|A, A, A|B, B, C], known as C major; the standard notation has each line and gap natural.
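The frequencies in the table follow from a one-line formula. A minimal sketch, taking the A at 220 Hz as the reference used for the table's r column; the function name `equal_tempered` is an assumption.

```python
def equal_tempered(steps, base=220.0):
    """Frequency, in Hz, of the note `steps` twelfths of an octave above `base`."""
    return base * 2 ** (steps / 12)

def octave_from(base=220.0):
    """Frequencies of the thirteen notes from `base` up to its octave, rounded to 3 places."""
    return [round(equal_tempered(steps, base), 3) for steps in range(13)]
```

For instance, `equal_tempered(3)` lands on middle C at about 261.626 Hz and `equal_tempered(12)` on the A at 440 Hz.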
In the course of preparing the table above, I noticed which denominators showed up among the logarithms, so tested to see how well each of them fared, as a subdivision of the octave. Each, by construction, works for some of the simple ratios above but most work poorly (at least when compared to how well 12 works) for most simple ratios. However, 53 turns out to work really rather well – appreciably better than 12 on most simple fractions, in fact. The diagram illustrates the closely matched ratios; from which we can infer ratios to fill in the remaining sub-divisions of 53. Notice that factors of 7 and 11 are passed over, while 5 is used only sparingly, in favour of factors of 13 and 19. Where more than one approximation is given, the first is the most accurate; note, however, that one of the simpler entries may be preferable in practical use.
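A comparison of this kind can be sketched as follows. The choice of ratios to test and the use of the worst-case distance as the figure of merit are assumptions made for illustration, not necessarily the test applied above; note also that a finer sub-division has closer lattice points in any case, so the result is only indicative.

```python
import math

def worst_error(divisions, ratios):
    """Largest distance, in octaves, from log2 of any ratio to the nearest lattice point."""
    worst = 0.0
    for n, d in ratios:
        position = math.log2(n / d) * divisions  # position measured in lattice steps
        worst = max(worst, abs(position - round(position)) / divisions)
    return worst

# A few of the simplest ratios from the table of logarithms.
SIMPLE_RATIOS = [(3, 2), (4, 3), (5, 3), (5, 4), (6, 5), (8, 5)]
```

Calling `worst_error(12, SIMPLE_RATIOS)` and `worst_error(53, SIMPLE_RATIOS)` gives roughly 0.013 and 0.0012 octaves respectively.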