On the scale of data-rates

Wherever information is generated, stored, retrieved or consumed, data flows; one relevant metric is the amount of it that does so in a given amount of time: that's a data-rate. It's worth remarking that information theory defines an information content associated with a body of data, roughly corresponding to the smallest amount of data that could express the same information (this is consequently context-dependent); this is typically smaller (and often much smaller) than the actual amount of data involved. (For a half-way decent idea of how much smaller, save the data to file and compress it with your favourite compression program, whose name probably ends in zip.) For the purposes of this page, I'm mainly talking about the raw data; if I mention information, it'll be in this compressed sense.
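To see the gap between raw data and information content for yourself, a minimal sketch using Python's standard zlib module (the exact ratio depends entirely on how repetitive the data is; this example is deliberately repetitive):

```python
import zlib

# Highly repetitive data compresses well: its information content
# is far smaller than its raw size.
raw = b"the quick brown fox jumps over the lazy dog " * 1000
packed = zlib.compress(raw, 9)  # 9 = maximum compression effort

print(len(raw), "bytes raw")          # 44000 bytes
print(len(packed), "bytes compressed")
```

Real-world data (text, images, logs) typically lands somewhere between this extreme and incompressibility; already-compressed formats barely shrink at all.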

There are special binary modifiers for the scale of data – 1024 happens to be 2¹⁰ and data-wranglers love powers of two (with good reason), so they use 1024 in place of 1000 as the standard quantifier. Thus 1024 bytes used to be colloquially referred to as a kilobyte; it's now properly called a kibibyte; likewise, 1024 kibibytes is one mebibyte (previously megabyte) and 1024 mebibytes make a gibibyte (previously gigabyte); the IEC standard continues the pattern with tebi (1024 gibi) parallel to tera, pebi (1024 tebi) parallel to peta and so on. The old colloquial and new official nomenclatures co-exist and are mixed rather haphazardly – indeed, even before the new terms were devised, it was not uncommon to encounter floppy disks that held 1.4 megabytes, by which was meant (using a factor of 1000) 1400 kilobytes, but with each of those kilobytes actually being 1024 bytes. Confusion thus persists. Fortunately, I'm only interested in the general scale of values, so a few factors of 1.024 aren't going to matter too much for this page's purposes. I'll use standard SI nomenclature (i.e. factors of the third power of ten, rather than the tenth power of two), for consistency with the rest of my scale-of-value pages.
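The discrepancy between the two conventions can be tabulated directly; a small sketch (prefix names as in the IEC standard), showing that the gap grows from 2.4% at kilo/kibi to about 12.6% at peta/pebi:

```python
# Decimal (SI) multipliers versus binary (IEC) multipliers.
SI  = {"kilo": 10**3,  "mega": 10**6,  "giga": 10**9,
       "tera": 10**12, "peta": 10**15}
IEC = {"kibi": 2**10, "mebi": 2**20, "gibi": 2**30,
       "tebi": 2**40, "pebi": 2**50}

for (si, d), (iec, b) in zip(SI.items(), IEC.items()):
    print(f"1 {iec}byte = {b / d:.4f} {si}bytes")
# 1 kibibyte = 1.0240 kilobytes ... 1 pebibyte = 1.1259 petabytes
```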

One further complication is between the bit, the byte and the word. Formally, the byte is whatever bundle of bits is the smallest that the computer system under discussion knows how to handle as a whole; but, in practice, all computers have now standardized on the 8-bit byte, a.k.a. the octet, used by the internet as its standard unit of transfer. The word is the standard-sized chunk of data that a computer operates on in a single clock cycle of its processor; as technology progresses, systems that deal with bigger words become prevalent. The number of bits in a word is commonly used to characterize the type of a computer; when we speak of 16-bit computers (whose use was phased out during the 1990s), we're referring to the size of the word on such systems: they had two-byte words. Those were replaced by the 32-bit generation, with four-byte words, and these in turn are now (around 2010) being replaced by 64-bit systems, with eight-byte words.
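One can probe the word size of the machine at hand; a sketch in Python, using the size of a C pointer as the usual indicator (8 bytes on a 64-bit system, 4 on a 32-bit one):

```python
import struct
import sys

# The native size of a C pointer reflects the machine's word size.
word_bytes = struct.calcsize("P")
print(f"{word_bytes}-byte words, i.e. a {8 * word_bytes}-bit system")

# Cross-check: sys.maxsize is the largest index Python can use,
# 2**63 - 1 on a 64-bit build, 2**31 - 1 on a 32-bit one.
print(sys.maxsize == 2 ** (8 * word_bytes - 1) - 1)  # True
```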

In principle the bit (properly abbreviated as a lower-case b) is the more primitive datum; in practice, the octet (always called byte) is the usual unit in use. I shall thus aim for consistency by using the byte (properly abbreviated as an upper-case B, but lower-case b is sometimes used in the wild) in preference to the bit. Consequently, my standard unit of data-rate is the byte/second, B/s, equal to eight bits/second, 8 b/s.

One could argue that b/s is the same thing as Hz (the unit of frequency); but the two have distinct uses in practice. Indeed, when modern computers list their processor speeds (in GHz), I'm fairly sure the units of data moved around that many times per second are actually words – at each clock cycle, each CPU core processes one word (or, more likely, a few words), handling all the bits involved at the same time. Thus the 64-bit four-core 2.6 GHz processor of the computer on which I'm typing this (in October 2011) performs operations 2.6 milliard times per second; each core processes (conceptually) one word each time, so the whole system processes (some small multiple of) four words of eight bytes, for a total of 32 bytes = 256 bits. The data-rate is thus (were my computer fully loaded, so wasting none of its capacity) 32×2.6 GB/s ≈ 83 GB/s, or about 666 Gb/s, where the processor speed is (despite having four cores) still just 2.6 GHz. Since no physical thing is actually doing what it does at 666 GHz – rather, 256 things are doing what they do in parallel at 2.6 GHz – I thus decline to conflate b/s with Hz.
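The arithmetic above, spelled out as a sketch (the 2.6 GHz, four-core, 64-bit figures are those of the machine described; a real CPU would fall short of this idealized peak):

```python
cores = 4
word_bits = 64
clock_hz = 2.6e9  # clock cycles per second

# Each cycle, every core handles (conceptually) one word in parallel.
bits_per_cycle = cores * word_bits      # 256 bits
rate_bps = bits_per_cycle * clock_hz    # bits per second
rate_Bps = rate_bps / 8                 # bytes per second

print(f"{rate_bps / 1e9:.1f} Gb/s = {rate_Bps / 1e9:.1f} GB/s")
# 665.6 Gb/s = 83.2 GB/s
```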

µB/s = micro-byte/second

When a democracy picks a body of a few hundred representatives every four or five years, the government receives data from the electorate at a rate of a modest fraction of a micro-byte per second.

100 mB/s

Under representative democracy, a population of around a hundred million citizens, among whom a modest but significant proportion each make a few choices (in elections at various levels), in each case among a handful of candidates, every four years, supplies data at a rate of order 100 mB/s towards the process of choosing representatives.
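A back-of-envelope version of that estimate; the population, number of choices and candidate counts are the rough figures from the text, while the 50% turnout is an assumed reading of "a modest but significant proportion":

```python
import math

voters = 50e6       # assumed: half of ~100 million citizens take part
choices_each = 3    # a few choices, in elections at various levels
candidates = 5      # "a handful" per choice
period_s = 4 * 365.25 * 24 * 3600   # four years, in seconds

# A choice among 5 candidates carries log2(5), about 2.3 bits.
bits = voters * choices_each * math.log2(candidates)
rate_Bps = bits / 8 / period_s

print(f"{rate_Bps * 1000:.0f} mB/s")  # of order 100 milli-bytes/second
```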
B/s = byte/second

This is the standard unit of data-rate. I type several characters per second on my key-board, thus transmitting data to my computer at a rate of a few bytes per second. I likewise read only a few words of text per second, thereby consuming data at a dozen or two bytes per second.

PB/s = peta-byte/second

The Large Hadron Collider's ATLAS data-collection apparatus outputs about a million gigabytes of data per second; that's roughly 1 PB/s.

Written by Eddy.