A cipher is a transformation that can be applied to text to hide its
meaning – e.g. while being transmitted in contexts where it might be
intercepted by folk the sender doesn't want to see it – yet which can
later be undone to recover the text. When designing a cipher, one needs the
author to be able to apply the cipher – to encipher the real message, known
as the plain text, yielding the encoded message, known as the cipher-text –
and the intended recipient to be able to undo the cipher – to decipher the
message – but it should be hard for anyone else – known as an attacker – to
decipher the message. The hard part lies in making it easy for the intended
recipient to do something that's hard for the attacker: the intended recipient
needs to have something the attacker lacks, usually knowledge of a secret, that
makes the difference.
One of the oldest ciphers known (attributed to Julius Caesar) is to simply
replace each letter of the alphabet with the one a fixed number of steps
along from it; Caesar himself is said to have used a shift of three. A
popular modern variant replaces each letter with the one thirteen steps
forward or backwards from it; if you write the alphabet evenly-spaced round a
circle, this just corresponds to rotating the circle thirteen steps, so it's
known as rotate 13 or just rot13. More usefully, one can simply write
out the alphabet in two lines of thirteen letters; then each letter is
converted to the one in the same position of the other line. One can do more
complex substitutions – in general, any permutation of the letters is
open to you – but, no matter how cleverly you permute the letters, this
class of cipher – the substitution cipher – is extremely easy to
break.
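
The two schemes just described can be sketched in a few lines of Python. This is only an illustration, not code from this page: the function names `rot13` and `substitute` are made up for the example, and it assumes the plain 26-letter ASCII alphabet, leaving every other character untouched.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

def rot13(text: str) -> str:
    """Rotate each ASCII letter thirteen places; leave everything else alone."""
    table = str.maketrans(LOWER + UPPER,
                          LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13])
    return text.translate(table)

def substitute(text: str, permuted: str) -> str:
    """General substitution: 'a' becomes permuted[0], 'b' permuted[1], and so on."""
    key = permuted.lower()
    if sorted(key) != list(LOWER):
        raise ValueError("permuted must use each of the 26 letters exactly once")
    table = str.maketrans(LOWER + UPPER, key + key.upper())
    return text.translate(table)

secret = rot13("Hello, World!")
print(secret)         # Uryyb, Jbeyq!
print(rot13(secret))  # Hello, World!

# An arbitrary permutation (here, the rows of a QWERTY keyboard) as the key.
print(substitute("abc", "qwertyuiopasdfghjklzxcvbnm"))  # qwe
```

Because thirteen is half of twenty-six, applying `rot13` twice gets you back where you started, so the same function serves to encipher and to decipher; a general permutation needs its inverse to undo it.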
Natural languages don't use all letters equally frequently – for example, in
English, the letter 'e' occurs a little over 9% of the time, 't' 7%, 'o' a
little less, 'i' almost 6% and 'a' a little less, and so on down to 'q' and
'z', which occur less than once per thousand letters – so an attacker can
count the frequencies of the
substituted letters in the cipher-text; whichever letters show up most often
are good candidates to be some of the more common letters in whichever
language the underlying message uses. Likewise, in any given language,
certain words are more common than others. Furthermore, even when you've
substituted the letters, the patterns of letter re-use within a word are
preserved; even after you've encoded 'forever' as 'LGMHTHM' (say), the re-use
of 'r' and 'e' is still visible, as re-use of 'M' and 'H'; it's fairly easy
(particularly nowadays, with computers at the attacker's disposal) to search
a dictionary for the words with a given pattern
of letter re-use. This kind of approach gives the attacker a small number of
possibilities to try, with good likelihood that one of these shall work,
instead of having to try all of the
26×25×…×3×2×1 ≈ 4e26 possibilities
(this is as many as the number of atoms in about 670 grammes of
Hydrogen). Once an attacker has worked out a few words, the same
substitutions as worked for those shall apply to the rest of the text, making
it easier to make good guesses and thereby discover more clues.
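
As a rough sketch of the attacker's tools described above, the following Python counts letter frequencies and matches words by their pattern of letter re-use. The helper names (`letter_frequencies`, `reuse_pattern`, `candidates`) and the tiny word list are invented for the illustration; a real attack would use a proper dictionary and a good deal more cunning.

```python
import math
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, fraction of all letters) pairs, most frequent first."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    return [(c, n / total) for c, n in Counter(letters).most_common()]

def reuse_pattern(word: str) -> tuple[int, ...]:
    """Number each letter by order of first appearance within the word."""
    seen: dict[str, int] = {}
    return tuple(seen.setdefault(c, len(seen)) for c in word.lower())

def candidates(cipher_word: str, dictionary: list[str]) -> list[str]:
    """Dictionary words whose letter re-use matches that of cipher_word."""
    target = reuse_pattern(cipher_word)
    return [w for w in dictionary if reuse_pattern(w) == target]

print(reuse_pattern("forever"))  # (0, 1, 2, 3, 4, 3, 2)
print(reuse_pattern("LGMHTHM"))  # (0, 1, 2, 3, 4, 3, 2)

# A stand-in for a real dictionary, purely for illustration.
words = ["forever", "letters", "pattern", "however", "believe"]
print(candidates("LGMHTHM", words))  # ['forever']

# The brute-force alternative: every permutation of 26 letters.
print(f"{math.factorial(26):.2e}")  # 4.03e+26
```

The pattern is the same whatever substitution was used, which is why it survives enciphering: it depends only on which positions in the word share a letter, not on which letter it is.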
More sophisticated ciphers (dating at least as far back as the 1600s) change the substitution as they go along, sometimes even in ways that depend on the text being enciphered. Modern ciphers go beyond that, but that's a topic for another day, and some other page.
It remains that substitution ciphers have their uses: rot13 is widely used
in various internet contexts as a way to hide text from those who don't want
to see it. Some folk, when publishing a joke-riddle,
will supply the answer in rot13 so that readers can have the fun of trying to
puzzle out the riddle before reading the punch-line. If someone is reviewing
a book or movie and parts of their review would give away the plot (but are
pertinent to discussion with those who already know the plot), encoding that
part lets folk who haven't read or seen the work get as much out of
the review as possible, without spoiling the work. I've seen one children's
book use a variant on the theme to add a bit of a challenge for readers; and
I was prompted to write this page by an enjoyable comic having a protracted
interlude where one of its characters, as a result of a traumatic loss, lost
the ability to speak normally – the author subjected her speech to a
substitution cipher which changed from one strip to the next (with bonus
errors in enciphering to make it harder to decipher).
You can use the form below to perform substitution on a text: either to encode a text using a substitution cipher or as a helper in trying to decode one. Just type the text into the text area, in place of the example text, and fill in the substitutions you want to apply.
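
For those who would rather script it, here is a minimal Python sketch of the same idea. It is a hypothetical stand-in, not the code behind this page's form: the name `apply_partial` and the choice of `_` to mark letters not yet guessed are assumptions made for the example.

```python
def apply_partial(text: str, known: dict[str, str], unknown: str = "_") -> str:
    """Replace cipher letters with guessed plain letters; mask the rest.

    Cipher letters are matched without regard to case; guesses are shown in
    lower case. Spaces and punctuation pass through unchanged.
    """
    guesses = {c.upper(): p.lower() for c, p in known.items()}
    out = []
    for ch in text:
        if ch.isalpha():
            out.append(guesses.get(ch.upper(), unknown))
        else:
            out.append(ch)  # keep spaces and punctuation as they are
    return "".join(out)

# Guessing that H stands for e and M for r in the earlier example:
print(apply_partial("LGMHTHM", {"H": "e", "M": "r"}))  # __re_er
```

Supplying all twenty-six substitutions turns the same helper into a full encoder or decoder; supplying only a few shows how much of the text your guesses have uncovered so far.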