On colour perception

In all matters of perception, our senses behave in subtler ways than a simplistic description of them might anticipate. In the case of colour perception, the received wisdom of the world (and, in particular, of display screen technology) over-simplifies even the simple model it starts with; on this page, I intend to describe some subtleties that are overlooked in the usual RGB and CMYK models.

Eyes

Give or take some subtleties of genetics, the human eye has four types of light sensors: one type (the rods) simply detects light and dark across a broad spectrum; the other three types (the cones) are sensitive to more selective spectra, roughly concentrated in the red, green and blue areas of the spectrum, respectively.

The rods are sensitive down to much lower light levels than the cones, which is why the world looks much more nearly black-and-white at night, when we have little light to see by. The part of the eye (called fovea centralis) which our vision is best adjusted to see in – thus the part we are looking at when we're looking at something in particular – has a high density of cones, which consequently crowd out the rods in that area; this is why, at night, you'll see a thing better if you look just a little to one side of it; you're only seeing with rods and there are more of them a little away from the point you look directly at. Away from this point of attention, the eye has more rods and fewer cones; but the brain keeps track of what colour things are from when you have looked at them, so you don't notice that your peripheral vision isn't in such full colour as your direct vision. However, I'll here be concerned mainly with how we see what we're looking at in good light.

The genetics of the pigment that makes the red cones sensitive to red light are a little subtle. The gene which codes for the pigment is on the X chromosome: women have a pair of these but men have only one (paired with a Y chromosome). There are two wide-spread variants (formally: alleles) of the gene, which are expressed as subtly different reds (there's a third, much rarer, variant which is actually a green). Each man thus has one variant (and, if it's the rare green variant, he's red-green colour-blind); each woman has two. Of course, a woman's two variants may be two copies of one variant, in which case she'll only have one of the two red variants; this arises for roughly half of all women (roughly a quarter with two copies of one variant, a quarter with two copies of the other); the other half have two distinct variants, hence both shades of red – we may, consequently, expect that they have a richer perception of colour than the rest of us. (Of course, a woman may also have the rare green: the proportion of women who have one rare green is roughly the same as the proportion of men with the rare green, but these rare women have a second copy of the gene, which is usually red, so they're not red-green colour-blind – except for a likewise tiny proportion of these rare women; so the relevant red-green colour-blindness is much rarer among women than among men.)
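The arithmetic behind "roughly half" can be sketched as follows. This is a minimal illustration, assuming that the two red variants are equally common, that a woman's two copies are drawn independently, and that the rare green variant can be ignored; the function name is hypothetical.

```python
def genotype_shares(share_of_first_variant):
    """Shares of women by which red variants they carry.

    Assumes just two variants and that the two copies are independent draws.
    """
    first = share_of_first_variant
    second = 1.0 - first
    return {
        "two copies of the first": first * first,
        "two copies of the second": second * second,
        "one of each": 2.0 * first * second,
    }

shares = genotype_shares(0.5)
for genotype, share in shares.items():
    print(f"{genotype}: {share:.2f}")
```

If the two variants are not equally common, the share of women with one of each falls below a half.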

The reality of the genetics of colour vision is actually significantly more complex than the above. Some people simply lack one, or some, pigments; and there may be several variants on each pigment, when considered across the population. To describe all of human colour perception, we need to take account of all the various pigments expressed in the cones of people's eyes; so we'll have even more than four pigments to take into account; and, in discussing each individual's perception, we'll select out only the information relating to the pigments that individual expresses.

Thus, officially, one should consider more than three pigments when describing human vision fully: however, the issue I'll be raising doesn't actually depend on how many pigments there are, only on the fact that there are only finitely many of them. So I'll mostly play along with the conceit that we only need to think about three.

The probability that a given cone shall absorb a given photon that hits it depends on which pigment that cone expresses. Interestingly, the signal the cone sends to the brain (or, strictly speaking, to the specialized image-processing network of nerves in the eye, which then report to the brain) doesn't depend on the frequency of the photons it absorbs, only on how many of them there were; so the eye's response (as reported to the brain) only depends on the colours of the light and of the pigment via the latter's probability of absorbing the former.

Spectra

Formally, light is any kind of electromagnetic radiation, including radio waves, microwaves, infra-red, ultra-violet, X-rays and gamma rays; however, colloquially, we use the term to refer to visible light: since I'm here discussing vision, I'll mostly stick with the vernacular. The difference between the types of electromagnetic radiation is simply their wavelength; or, equivalently, their frequency. Any wave phenomenon has wavelength (the distance between two nearest points where the wave is doing the same thing, measured along a line in the direction of its movement) and frequency (the inverse of the time between when, at a fixed place, it does a given thing and when it repeats that, later); if we multiply the two together, we get the speed of the wave. Since light (in a vacuum) has a fixed speed, independent of its wavelength or frequency, we only need to know one of these; we can infer the other from it by dividing the speed of light by the one we know. (For waves on the surface of water, it's more complex: their speed does vary with their physical dimensions.) Consequently, light with a lower frequency has a longer wavelength than light of a higher frequency. My list of types of electromagnetic radiation, at the start of this paragraph, gives them in order of increasing frequency, hence of decreasing wavelength. Visible light falls between infra-red and ultra-violet: the human eye can detect light whose wavelength is between about 0.39 microns (violet, at 762 tera Hz) and 0.73 microns (red, at 412 tera Hz). A micron is one thousandth of a millimetre; a tera Hz frequency describes something that repeats itself a million million times per second, so visible light repeats itself several hundred million million times per second.
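The conversion between wavelength and frequency can be sketched in a few lines. The function names are hypothetical and the wavelengths are taken to be those in a vacuum; the 0.55 micron example is simply a value from the middle of the visible range.

```python
# Speed of light in a vacuum, in metres per second (exact, by definition).
SPEED_OF_LIGHT = 299_792_458.0

def frequency_thz(wavelength_microns):
    """Frequency, in tera Hz, of light with the given wavelength in microns."""
    wavelength_metres = wavelength_microns * 1e-6
    return SPEED_OF_LIGHT / wavelength_metres / 1e12

def wavelength_microns(frequency_in_thz):
    """Wavelength, in microns, of light with the given frequency in tera Hz."""
    frequency_hz = frequency_in_thz * 1e12
    return SPEED_OF_LIGHT / frequency_hz * 1e6

green = frequency_thz(0.55)
print(f"0.55 microns is about {green:.0f} tera Hz")
print(f"and back again: {wavelength_microns(green):.2f} microns")
```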

In music, a pure tone is a sound comprising just one wavelength (or frequency): likewise, a pure colour is light of just one wavelength. Just as real sounds are made of mixtures of tones, so also is a real beam of light made up of a mixture of colours. Thus the proper description of the colour of a beam of light is given in terms of a spectrum: this expresses how much of the mixture is at which wavelength (or frequency). If we pass a beam of light through a prism, it's famously split up into its various colours: if we project the result onto a screen, we'll see the colours neatly spread out from violet to red, with some parts of the range brighter and others darker. The spectrum of our beam is expressed as a function from position on our screen (which encodes wavelength and frequency) to intensity (how bright is the light falling on that part of our screen).

This function formally encodes a density or measure: if we adjust the angle of our screen to the spread-out beam, some colours shall meet it more nearly straight on, others less so; the ones that are more spread out shall appear correspondingly dimmer, because the amount of light of that colour is spread out over a bigger area. All the same, between where one frequency (or wavelength) hits the screen and where another does, the total light falling on the screen is the same, regardless of how we angle (or even curve) the screen. A density is a formal way of describing the intensity so that it respects that. If position on our screen is described by a length, x, along the line from bluest to reddest; and intensity at each point is expressed as I(x); then the density is really I(x).dx. If we adjust the angle of the screen and describe position on the adjusted screen by a length, y, analogous to x, we'll get a new intensity function, J(y); when we set up a one-to-one mapping between x and y which maps each to the other where they get the same frequency (or wavelength), we'll find that J(y).dy and I(x).dx are equal. Frequency varies along the screen as a direct function of x, say f(x), expressing the density as F(f).df = F(f(x)).f'(x).dx, so F(f(x)) = I(x)/f'(x); likewise, with wavelength varying with position as λ(x) = c/f(x), we get Λ(λ).dλ = −Λ(c/f).c.df/f/f. The sign merely records that wavelength decreases as frequency increases; ignoring it, F(f) = Λ(λ).c/f/f = Λ(λ).λ/f, whence Λ(λ).λ = F(f).f = I(x).f(x)/f'(x) = I(x).λ(x)/λ'(x), the last again up to sign. The functions which encode the density, when described in terms of position on our screen (I), frequency (F) or wavelength (Λ) are thus closely related to one another but distinct; the density, however, is the same in all cases. All the same, the discussion here shan't depend on this subtlety: you're allowed to think of the spectrum as being described as a simple intensity function, without worrying about how it's parameterised, or how that adjusts its values.
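A numerical check may make the point about densities concrete: the total light between two frequencies comes out the same whether we integrate over frequency or over wavelength, provided the wavelength form of the function picks up the factor c/λ/λ that comes from dλ = −c.df/f/f. This is only a sketch; the bump-shaped spectrum and all the names are made up for illustration.

```python
import math

# Speed of light expressed in microns times tera Hz (1 micron * 1 THz = 1e6 m/s).
C = 299.792458

def per_frequency(f):
    """A made-up spectrum F: light per unit frequency, with f in tera Hz."""
    return math.exp(-((f - 550.0) / 80.0) ** 2)

def per_wavelength(lam):
    """The same light as a function of wavelength: F(c/lam) scaled by c/lam/lam."""
    return per_frequency(C / lam) * C / lam ** 2

def integrate(func, lo, hi, steps=10_000):
    """Midpoint-rule integral of func between lo and hi."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

f_lo, f_hi = 450.0, 600.0
by_frequency = integrate(per_frequency, f_lo, f_hi)
# The matching wavelength range runs from c/f_hi up to c/f_lo.
by_wavelength = integrate(per_wavelength, C / f_hi, C / f_lo)
print(by_frequency, by_wavelength)
```

The two printed totals should agree to within the accuracy of the numerical integration, even though the two functions take quite different values at corresponding points.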

I'll refer to this intensity function (or, strictly, to the density it encodes) as the spectrum of the light. When one spectrum is, at each frequency, just the result of multiplying another (at the same frequency) by some constant (not dependent on frequency), the two spectra (plural of spectrum) are generally considered equivalent; we're not really interested in the over-all scale of a spectrum so much as its shape – how it varies with frequency.

Transmission and absorption

Now, if you have a transparent material, such as glass or perspex, and you put a sheet of it into the beam of light before it reaches the prism, it just lets the light through: so the spectrum won't change. If you stain your transparent material with some dye, however, the dye shall absorb some of the light; if the dye isn't grey, it'll absorb some colours more than others; so the spectrum shall change. Such a dyed transparent sheet, or anything else that light passes through to possibly alter its spectrum, is described as a filter; the spectrum that emerges from it is described as a transmitted spectrum.

If we vary our initial beam's over-all spectrum, but keep the ratio of intensity between any two frequencies unchanged, the transmitted spectrum likewise just varies by an over-all scaling, at least for simple dyes. (This is why we don't usually care about over-all scale; react-to-light glasses use pigments that do change their absorption properties with intensity, but I won't be considering whacky things like that.) If the initial beam's spectrum is zero outside some given range of frequencies then so is the transmitted spectrum (except in the case of very unusual materials (including ruby), and even then except at a very few specific frequencies: I'll ignore these exceptions in the following analysis). If we pass our initial beam through a prism to split it, we can put a screen with a small slot in it into the split beam to select only a narrow range of frequencies to obtain such a limited beam; the transmitted spectrum it produces turns out to be exactly the same as the one you would get by using the original beam and putting your slotted screen in the corresponding place in the transmitted spectrum. Thus, in effect, transmission treats each frequency (or wavelength) of light independently of the others: and, at each frequency, simply scales the amount of light at that frequency down by some factor, which depends on the frequency. We can thus model our transmitter by a simple function of frequency (or wavelength), G, for which the transmitted spectrum, T, resulting from an input spectrum I is just obtained by multiplying their respective values at each frequency (or wavelength, or whatever other parameter you're using): T(f) = I(f).G(f). This function, G, is known as the transmission spectrum of our filter: its value at each frequency is between zero and one – and really is just a scalar (i.e. numeric-valued) function (rather than a density, as the beam's spectrum is).

If we put two filters into our beam of light, with transmission spectra G and H, we'll get an emerging spectrum S(f) = H(f).T(f) = H(f).G(f).I(f); if we consider the two filters as one compound filter, its transmission spectrum is thus H(f).G(f). Generally, one uses almost clear filters, so that H and G are both close to one; if we write them as H = 1−h, G = 1−g, with h and g small enough that h.g is so small we can ignore it, then H.G is almost exactly 1−(h+g). Instead of combining two filters, each made by staining a clear filter with a dye, you can stain one clear filter with the dyes you would have used in the two filters; naturally enough, the result has the same transmission spectrum.
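The rule T(f) = I(f).G(f), and the approximation for nearly clear filters, can be sketched with spectra sampled at a handful of frequencies. The sample values are invented for illustration and the helper name is hypothetical.

```python
# Spectra sampled at the same six frequencies; all values are made up.
beam = [1.0, 0.9, 0.8, 0.9, 1.0, 0.7]            # input spectrum I
filter_g = [0.98, 0.97, 0.95, 0.99, 0.96, 0.98]  # transmission spectrum G
filter_h = [0.99, 0.96, 0.97, 0.95, 0.98, 0.97]  # transmission spectrum H

def transmit(spectrum, transmission):
    """Scale each frequency's intensity by the filter's transmission there."""
    return [i * g for i, g in zip(spectrum, transmission)]

# Passing the beam through G and then H ...
emerging = transmit(transmit(beam, filter_g), filter_h)
# ... matches passing it through one compound filter with spectrum H.G.
compound = transmit(filter_g, filter_h)
via_compound = transmit(beam, compound)

# For almost clear filters, H.G is close to 1 - (h + g).
approximate = [1.0 - ((1.0 - g) + (1.0 - h)) for g, h in zip(filter_g, filter_h)]
worst_error = max(abs(a - b) for a, b in zip(compound, approximate))
print(worst_error)
```

The error of the approximation at each frequency is exactly h.g, which is why it only serves for filters that are close to clear.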

Thus far we've only been considering the light that our material lets back out. Conservation of energy implies that the rest of the light must have been absorbed by the material; the absorbed light plus the escaping light must be equal to the inbound light. If, at each frequency, we divide what's absorbed by what's inbound, we'll get one minus the transmission spectrum: this describes how much of each frequency was absorbed. The above discussion showed that, when we combine two (almost clear) filters, whose absorption-describing functions are h and g, the compound's matching function is h+g. Thus, when combining filters, we just add the extent to which they absorb: a compound filter absorbs everything that either of its parts absorbs.

Indeed, to describe a filter, we properly need to do so in terms of its absorption: the more dye we add, the more light is absorbed. A photon travelling through a medium has a certain probability, p, per unit distance of being absorbed; the proportion F of photons of given frequency that travel at least some distance x through the medium shall depend on x and satisfy dF(x)/dx = −p.F(x), implying F(x) = F(0).exp(−p.x); since it's a proportion, implicitly of the photons entering our medium, F(0) is necessarily 1; p is known as the optical density of the medium, at the given frequency. For the most part (barring unusual interactions between the medium and the dyes in it) we can express p as a sum of terms, one per dye in our otherwise clear medium; each term is proportional to the concentration of the given dye; and that term's form, aside from this over-all scaling due to concentration, depends only on the dye in question. When the medium contains only one dye, divide the optical density of the medium, at each frequency, by the concentration of the dye; this yields a function, the absorption spectrum of the dye, which characterizes its contribution to a filter's effects. (If we don't have an entirely clear medium, we can obtain the dye's contribution to the optical density by subtracting the optical density without the dye from the optical density with it.) When combining dyes in a filter, it suffices to scale the absorption spectrum of each by its concentration, then sum over dyes; scale the result by the thickness of the filter and take the negative exponential of the result to obtain the transmission spectrum.
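The recipe in the last sentence can be sketched directly: scale each dye's absorption spectrum by its concentration, sum over dyes, scale by thickness and take the negative exponential. The dye data and names below are invented for illustration, with spectra sampled at four frequencies.

```python
import math

def transmission_spectrum(dyes, thickness):
    """Transmission of a filter, given (concentration, absorption spectrum) pairs.

    All the absorption spectra must be sampled at the same frequencies.
    """
    samples = len(dyes[0][1])
    optical_density = [
        sum(concentration * absorption[k] for concentration, absorption in dyes)
        for k in range(samples)
    ]
    return [math.exp(-p * thickness) for p in optical_density]

# Made-up absorption spectra for two dyes.
dye_a = [0.1, 0.5, 2.0, 0.3]
dye_b = [1.5, 0.2, 0.1, 0.4]

both = transmission_spectrum([(0.2, dye_a), (0.1, dye_b)], thickness=3.0)
only_a = transmission_spectrum([(0.2, dye_a)], thickness=3.0)
only_b = transmission_spectrum([(0.1, dye_b)], thickness=3.0)
print(both)
```

Staining one filter with both dyes gives, at each frequency, the product of the transmissions of the two single-dye filters, in agreement with the earlier claim about compound filters.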

Pigments and light

If you put a sheet of paper in your beam of light, you can point a telescope at the point where the beam hits the paper and get out a beam of light that's the original beam, bounced off the paper. The light bounces off in lots of directions, so the emerging beam has only a fraction of the original's over-all intensity; so we need to scale the bounced light's spectrum up to compensate – or collect the light bouncing off in all directions, rather than just into our telescope. If the paper is white, the bounced beam shall show the same spectrum as the original; but, if the paper is coloured, the bounced beam's spectrum shall be different – some frequencies are scaled down relative to what you would have seen if the paper had been white.

Paper (or canvas) is made of lots of fine fibres which don't absorb much light; light scatters off the fibres and bounces around until it escapes from the paper; regardless of where the light came from, it emerges in essentially random directions, so it gets spread out over all directions. When we stain the paper with a dye, or apply paint to it, we embed lots of little pieces of coloured material in the generally translucent mess. As light bounces around in the paper (and any solvent in the paint), it has to pass through or bounce off this coloured stuff, which absorbs light just as a filter would. If we mix paints, the light ends up being filtered by coloured fragments from both paints, so the emerging light is coloured as if by combining filters: i.e., we have to add the absorption spectra (or multiply the transmission spectra) to work out the result. The things you can apply to your canvas (paint, cloth or whatever) to make it coloured are known collectively as pigments – e.g. paints, inks and dyes are all pigments (or, often, a translucent goo with some pigment in it).

In contrast to filtering, if we shine two beams of light on one piece of paper, we add the spectra of the two beams. If we take two beams of white light, pass each through a filter, then combine them like this, the result looks like what would result from shining both white beams through a filter whose transmission spectrum is the average of those of the two given filters. Thus, in effect, combining lights (that have come through different filters) is described by adding the spectra; whereas combining filters, paints or inks is described by multiplying the transmission spectra, which is more intuitively interpreted as adding the absorption spectra.
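The contrast between combining lights and combining filters can be sketched with sampled spectra. The filter values are invented and the helper names are hypothetical.

```python
# Two white beams and two filters, sampled at four frequencies (made-up values).
white = [1.0, 1.0, 1.0, 1.0]
filter_g = [0.9, 0.5, 0.2, 0.1]
filter_h = [0.1, 0.3, 0.6, 0.9]

def through(beam, transmission):
    """Pass a beam through a filter: multiply at each frequency."""
    return [i * t for i, t in zip(beam, transmission)]

def combine(first, second):
    """Shine two beams on the same spot: add at each frequency."""
    return [a + b for a, b in zip(first, second)]

# One beam through each filter, then combined on the paper.
mixed_lights = combine(through(white, filter_g), through(white, filter_h))

# Both beams through a single filter whose transmission is the average.
average = [(g + h) / 2.0 for g, h in zip(filter_g, filter_h)]
via_average = combine(through(white, average), through(white, average))

# Stacking the two filters in one beam instead multiplies the transmissions.
stacked = through(through(white, filter_g), filter_h)
print(mixed_lights, stacked)
```

Adding lights can only brighten each frequency, whereas stacking filters can only dim it.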

Colour visual displays – whether cathode-ray tubes, liquid crystal flat screens, light-emitting diode arrays or projectors illuminating a passive screen – emit light from a vast number of small spots (pixels) each of which comprises a group of light sources of various colours. Because the eye cannot resolve the sources within an individual pixel as separate, it perceives each pixel as a single light source which mixes the light from its various parts, so the effective spectrum of each pixel is the sum of the spectra of its parts.

Back in the eye, remember that we have cones, with various pigments in them; and what they report to the brain depends on how many photons they absorb from the light shining on them. We can now describe that in terms of the spectrum of the light and the absorption spectrum of the cone's pigment. If we multiply the two up and integrate (formally, in fact, integrate the absorption spectrum – a scalar function of frequency – using the light's spectrum – a density, or measure; which is exactly the mathematical tool that encodes integration) over frequency, we'll get the number of photons the cone shall be absorbing – possibly give or take a scaling; if we measured intensity in terms of energy, we'll have to divide by the energy per photon (which is proportional to frequency) to get intensity in terms of number of photons. To avoid that complication, I'll presume hereafter that we've described intensity in terms of number of photons, so the integral of a light beam's spectrum between two frequencies encodes the number of photons, arriving per unit time, with frequencies between the given two. So we can describe the signal the brain gets from the eye in terms of integrals of the light's spectrum multiplied by the absorption spectra of the various pigments of the cones.
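A rough sketch of that integral follows. The bump-shaped curves stand in for real measurements of the light and of the pigments, which I do not have to hand; their centres and widths, and all the names, are assumptions made purely for illustration.

```python
import math

PLANCK = 6.62607015e-34  # Planck's constant, in joule seconds

def bump(centre_thz, width_thz):
    """A made-up smooth curve peaking at centre_thz; a stand-in for real data."""
    return lambda f: math.exp(-((f - centre_thz) / width_thz) ** 2)

def integrate(func, lo, hi, steps=4000):
    """Midpoint-rule integral of func between lo and hi."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

def photons_absorbed(photon_spectrum, absorption, lo=400.0, hi=800.0):
    """Integrate spectrum times absorption over frequency (in tera Hz)."""
    return integrate(lambda f: photon_spectrum(f) * absorption(f), lo, hi)

def photons_from_energy(energy_spectrum):
    """Convert an energy-based spectrum (joules per tera Hz) to a photon count."""
    return lambda f: energy_spectrum(f) / (PLANCK * f * 1e12)

light = bump(550.0, 100.0)  # already expressed as a number of photons
pigments = {
    "red": bump(530.0, 60.0),
    "green": bump(560.0, 60.0),
    "blue": bump(700.0, 50.0),
}
signal = {name: photons_absorbed(light, p) for name, p in pigments.items()}
print(signal)
```

Each cone type contributes one number, however intricate the spectrum of the light; `photons_from_energy` is only needed when the spectrum was measured in terms of energy.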

The orthodox simplifications: RGB and CMYK

The standard descriptions of colour are, consequently, based on two dual models: an additive model which adds red, green and blue lights (the primary colours) to build up arbitrary colours; and a subtractive model which is formally described as removing cyan, magenta and yellow pigments (the secondary colours) from a nominally black initial mixture. The colours used in each model are described as the complements of the colours used in the other: red complements cyan, green complements magenta and blue complements yellow.

The subtractive model can also be viewed in terms of adding cyan, magenta and yellow to a white background to build up an absorption spectrum, which subtracts the relevant coloured light from the white. Mixing all three pigments in equal amounts notionally produces black: but, in practice, a mixture of cyan, magenta and yellow pigments yields an imperfect black; so it is usual to supplement these three pigments with some actual black (which is cheaper in any case); if your recipe for a given colour calls for some of each of cyan, magenta and yellow, you can subtract from each the smallest (so that one of them is zero) and add in that much black to achieve an officially equivalent result which is likely to actually look better (and be cheaper). The CMYK printing model (cyan, magenta, yellow and key, since black is being used in the rôle of a key plate in traditional printing) is thus equivalent, in theory, to a plain CMY model – the addition of black is just an implementation detail to make the actual results a better match to what the theory describes.
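The recipe for trading equal parts of cyan, magenta and yellow for black is short enough to sketch. This covers only the simple subtraction described above; the function name is hypothetical and amounts are taken to run from zero to one.

```python
def cmy_to_cmyk(cyan, magenta, yellow):
    """Replace the amount common to all three pigments with that much black."""
    key = min(cyan, magenta, yellow)
    return (cyan - key, magenta - key, yellow - key, key)

c, m, y, k = cmy_to_cmyk(0.7, 0.4, 0.5)
print(c, m, y, k)
```

At least one of the three coloured amounts always comes out as zero, so no point on the page needs more than two coloured inks plus black.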

So the two standard descriptions are RGB (red, green and blue) for light and CMY (cyan, magenta and yellow) for pigments. Mixing two lights produces the colour of the pigment complementary to the third; mixing two pigments produces the colour of the light complementary to the third. In practice, these work reasonably well; a great diversity of the colours distinguishable by human eyes can be produced by combining suitable lights or pigments. However, there are colours neither can produce; and each can produce colours the other cannot.
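In the idealised version of these models, each colour can be written as a triple saying how much red, green and blue light it emits or lets through, and the two mixing rules become one-liners. This is a sketch of the idealisation only, with hypothetical names; real lights and pigments are not this tidy.

```python
RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

def complement(colour):
    """The colour that absorbs exactly what the given one emits, and vice versa."""
    return tuple(1.0 - part for part in colour)

CYAN, MAGENTA, YELLOW = complement(RED), complement(GREEN), complement(BLUE)

def mix_lights(first, second):
    """Additive mixing: add the lights, capped at full brightness."""
    return tuple(min(1.0, a + b) for a, b in zip(first, second))

def mix_pigments(first, second):
    """Subtractive mixing: multiply what each pigment lets through."""
    return tuple(a * b for a, b in zip(first, second))

print(mix_lights(RED, GREEN) == YELLOW)      # complement of the third light
print(mix_pigments(CYAN, MAGENTA) == BLUE)   # complement of the third pigment
```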

Theory

The spectrum of an actual beam of light can combine any and all frequencies in any amounts; we can add spectra; and we can (though, as stated above, we don't normally attend to it much) scale a spectrum (i.e. turn up its over-all brightness, without changing the relative brightnesses of the individual frequencies in it): formally, the possible spectra form a vector space. Furthermore, since the available frequencies form a continuum (ranging from 412 to 762 tera Hz), there are infinitely many distinct frequencies at which we can independently choose the beam's brightness, so our vector space is infinite-dimensional.

The eye

However, we only perceive our beam of light via three pigments in our eyes. Scaling a light spectrum by an absorption spectrum and integrating constitutes a linear map from our vector space of light spectra to the one-dimensional vector space of scalars. The space of linear mappings from a vector space to scalars is known as the dual of the vector space, dual(V) = {linear map ({scalars}:|V)}; so we can represent each pigment in our eyes by a member of dual({spectra}). Let C be a list of these mappings, say C = [R, G, B]; if the light coming into the eye has spectrum s then the signal to the brain can be encoded as a list of three numbers, [R(s), G(s), B(s)]. Thus, although {visible spectra} has infinite dimension, the signal reported to the brain can be described in terms of a vector space of finite dimension – its dimension is the length of C, here taken to be three but, as noted above, it might be more or less. When we want to consider the general population's various perceptions of a colour, we may need several more components in C even though any given person gets only a subset of C's components; but C shall still remain finite.

Note that the eye's response isn't linear; however, we can express its response in terms of the linear algebra I'm using and a non-linear process down-stream, which I'll merrily ignore. The information available to the downstream processing has all come through the linear process I'm describing, though it gets mangled downstream; so the brain gets no more information than the process I'm describing provides for – and that suffices for my purposes. So I'll describe perception in terms of the signal that's my nominal input to the eye's and brain's nonlinear processing; this signal depends linearly on the spectrum of the incoming light and on the absorption spectra of the pigments of the cones.

Thus the eye reduces the input spectrum (in an infinite-dimensional linear space) to a signal in a finite-dimensional linear space.
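One consequence of this reduction can be sketched with a toy model in which spectra are sampled at only four frequencies. The pigment sensitivities below are invented to keep the arithmetic easy; they are not measurements, and the names are hypothetical.

```python
# Made-up sensitivities of three pigments at four sample frequencies.
PIGMENTS = {
    "R": [1.0, 1.0, 0.0, 0.0],
    "G": [0.0, 1.0, 1.0, 0.0],
    "B": [0.0, 0.0, 1.0, 1.0],
}

def signal(spectrum):
    """The list [R(s), G(s), B(s)] for a sampled spectrum s."""
    return [
        sum(weight * value for weight, value in zip(weights, spectrum))
        for weights in PIGMENTS.values()
    ]

flat = [2.0, 2.0, 2.0, 2.0]
spiky = [3.0, 1.0, 3.0, 1.0]
print(signal(flat), signal(spiky))
```

The two spectra are plainly different, yet in this toy model they produce the same three numbers, so nothing downstream of the cones could tell them apart.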

RGB light

When we make up a beam of light by combining light from several sources, we simply add the spectra from those sources to obtain the beam's spectrum. In the RGB colour model, we have a list of standard spectra, c = [r, g, b], each of which we can deliver at any intensity; so any signal we produce can be described by a list of numbers, h = [q, e, a], describing scalings by which we multiply the standard spectra, yielding a spectrum sum(h.c) = q.r + e.g + a.b, whence the signal received by the eye is encoded by the list

(: C(i, sum(h.c)) ← i :)
= [R(sum(h.c)), G(sum(h.c)), B(sum(h.c))]
= q.[R(r), G(r), B(r)] + e.[R(g), G(g), B(g)] + a.[R(b), G(b), B(b)]
= (: q.C(i, r) + e.C(i, g) + a.C(i, b) ← i :)
= (: sum(: h(j).C(i, c(j)) ← j :) ← i :)
= sum(: h(j).(: C(i, c(j)) ← i :) ← j :)

Thus the matrix (: (: C(i, c(j)) ← i :) ← j :) = [ [R(r), G(r), B(r)], [R(g), G(g), B(g)], [R(b), G(b), B(b)] ] fully describes how the signal the eye receives, (: C(i, sum(h.c)) ← i :), depends on the signal we send to our RGB light source, h. If we could arrange for C(i, c(j)) to be one when i = j and zero otherwise, this matrix would be the identity, so the signal received by the eye would simply be the signal we sent to our light source and we'd be able to deliver any colour at all. In practice the absorption spectra encoded by R, G and B are quite broad and overlap one another extensively, so we can only approximate this happy state of affairs, though we can do so quite well.
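To see what the overlap costs, here is a sketch with an invented matrix of values C(i, c(j)): it computes the eye's signal for a given h and, going the other way, solves for the h that would deliver a chosen signal. The numbers are made up and the function names are hypothetical.

```python
# Made-up values of C(i, c(j)): row j says how source j excites pigments R, G, B.
MATRIX = [
    [0.8, 0.3, 0.0],  # red source, r
    [0.4, 0.9, 0.1],  # green source, g
    [0.0, 0.2, 0.9],  # blue source, b
]

def eye_signal(h):
    """The signal as a list over pigments i: the sum over j of h(j).C(i, c(j))."""
    return [sum(h[j] * MATRIX[j][i] for j in range(3)) for i in range(3)]

def det3(m):
    """Determinant of a three-by-three matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def drive_for(target):
    """Solve eye_signal(h) == target for h, using Cramer's rule."""
    # h multiplies the transpose of MATRIX, so that is the system to solve.
    system = [[MATRIX[j][i] for j in range(3)] for i in range(3)]
    scale = det3(system)
    h = []
    for j in range(3):
        swapped = [
            [target[i] if column == j else system[i][column] for column in range(3)]
            for i in range(3)
        ]
        h.append(det3(swapped) / scale)
    return h

# Ask for a signal that excites only the first pigment.
h = drive_for([1.0, 0.0, 0.0])
print(h)
```

With these numbers, the green entry of h comes out negative: the sources would need to emit a negative amount of green light, so this particular signal is out of reach of the display.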

CMY pigments

When we mix up some pigments, apply them to a white back-ground and illuminate the result, we again generate a spectrum; which the eye converts to a signal as before. However, the spectrum we generate is a little fiddlier to describe. We start with the illumination, described by a spectrum s; this is then filtered through (possibly by bouncing off) our pigments.

Complications

Absorption spectra are the correct description, but need to be described in terms of specific optical density at each frequency; the rate of proportionate decrease in (i.e. the rate of decrease of the log of intensity at) that frequency is obtained by adding the specific optical densities, each scaled by how much of the stuff is present, of the various ingredients.

The absorption spectra of the eye's pigments overlap, and are positive, so any dual is necessarily negative at some frequencies, which is unphysical.

Things I need to read before I continue: comparing RGB with CIE; CIE vs Y'UV representations; and background details on colour vision.


Written by Eddy.