While reading "The Black Sphinx", I noticed the hieroglyphics in the margins (after all, the blurb on the back did ask "Can you crack the hieroglyphic code?"). I fairly rapidly spotted that the first page of each chapter used a specific sequence; after that, the left pages of the chapter had one sequence and the right pages another, changing from each chapter to the next. The end of the book informed me that each chapter contains a verse of the Black Sphinx's curse, so I studiously ignored the table given on the facing page (mapping hieroglyphs to letters) and set about attacking the puzzle.
With hindsight, it seems to me that an account of how I solved the puzzle may be of interest to someone trying to break a code (albeit the approach I took would really only work for a substitution cipher, and no-one serious about cryptography uses those any more, as they're easy to break). To that end, here's this page, faithfully reporting what I actually did (in my review of the book, I correct the errors I eventually discovered I'd made).
Before I'd reached the end of the book, I'd guessed that the hieroglyphic code was just a substitution cipher, using each hieroglyph for a distinct letter of our alphabet, and that the plain text, once revealed, would be in English. This guess counts as an application of prior knowledge: I found the cipher in a book for anglophone children, and it was clear the author intended such children to have at least some chance of cracking it. The guess was bolstered by the fact that the book ended with a look-up table from hieroglyphs to letters (so I knew I'd be able to check my results when finished).
My first job was to transcribe all those messages out of the chapters. Since I'm not a dab hand with hieroglyphics, I copied each distinct hieroglyph out just once and selected a letter (the first one I hadn't yet used) to stand for it; I wrote that up as a look-up table and transcribed the verses using the letters thus implied. Since the end of the book referred to each chapter's contribution as a "verse", I treated each chapter's left-page text as one line and its right-page text as the other in a two-line verse. I interpreted larger gaps between hieroglyphs as spaces breaking some lines up into words. Where a "hieroglyph" that looked a lot like a question mark appeared, I decided to treat it as a hieroglyph rather than an actual question mark (which, in fact, it was). I transcribed the six-hieroglyph word from the first page of each chapter first, so that it's ABCDEF, but presumed that it isn't itself a verse. This gave me:
GBHE ICH
JKIHADELHMEG
KNKDIAAHI OHPGEA
LMHHICH AQE
RQMEA GQIK PGGE
GL RSGGOJMKTH
AHUMHIAOHKO RGEHA
NKSVDEJUMHKIQMHA
AIKSVDEJIHMMGM
CKQEIALGMHAIA
KEO BHKVAGUHKEA
RGDS OMWICH AIKMA
OMGBEDJCI LKSSA
LGMHTHMNCGAH
ACKOGNXNCGAH
NCDABHMXKEQRDA
DA UKSSDEJCH AKDSHA
KEO BGDEIAK BKSH
LDEJHMNKSV
NDIC CDPGTHM ICH
KACHAJDTH CDP
WGQM AGQSQEIDS
WGQ MHKUCICH RSKUV
ABCDEF
and I spent a happy half hour or so (the rest of the flight I was on) staring at that and trying to find patterns. The two uses of K as a word on its own strongly suggest that it's either "i" or "a". I noticed the repeated words ICH ×5, KEO ×2, NCGAH ×2 and CDP ×2 (strongly hinting that ICH is "the") and the repeated sub-string DEJ ×4, particularly its two appearances as part of KSVDEJ. I counted the letter frequencies as follows (also giving a description of each hieroglyph):
description | eye | square | square spiral | duck | zig-zag | goose | snow-drop | flag | dome | oven | hawk | snail | lozenge | dove | mitten | owl | curved spiral | stork | hound | man | bowl | hole | two flags | question-mark |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
letter | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X |
count | 31 | 7 | 17 | 16 | 21 | 1 | 23 | 35 | 19 | 4 | 21 | 7 | 21 | 8 | 8 | 4 | 8 | 6 | 12 | 4 | 5 | 4 | 3 | 2 |
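None of this bookkeeping needs more than pencil and paper, but for anyone wanting to repeat it on a computer, here is a minimal Python sketch of the steps just described: handing out letters to glyphs in order of first appearance, counting letter frequencies, and spotting repeated words and sub-strings. Representing glyphs by their descriptions as plain strings, and the names `assign_letters`, `letter_counts`, `repeated_words` and `substring_count`, are my own illustrative assumptions, not anything from the book. Repeated-word counts are also only as good as the word gaps spotted while transcribing.

```python
from collections import Counter
from string import ascii_uppercase


def assign_letters(glyph_words):
    """Replace each distinct glyph with the next unused capital letter.

    glyph_words is a list of words, each a list of glyph descriptions
    (e.g. "eye", "duck"); letters are handed out in order of first appearance.
    Raises IndexError if there are more than 26 distinct glyphs.
    """
    table = {}  # glyph description -> capital letter
    words = []
    for word in glyph_words:
        letters = []
        for glyph in word:
            if glyph not in table:
                table[glyph] = ascii_uppercase[len(table)]
            letters.append(table[glyph])
        words.append("".join(letters))
    return table, words


def letter_counts(lines):
    """Count how often each cipher letter occurs, ignoring spaces."""
    return Counter(ch for line in lines for ch in line if ch.isalpha())


def repeated_words(lines):
    """Return the space-separated cipher words that occur more than once."""
    words = Counter(word for line in lines for word in line.split())
    return {word: n for word, n in words.items() if n > 1}


def substring_count(lines, fragment):
    """Count (non-overlapping) occurrences of fragment across all lines."""
    return sum(line.count(fragment) for line in lines)
```

For example, `letter_counts(lines).most_common(5)` lists the five commonest cipher letters, which is how H stands out as the likely "e".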
The commonest, H, is also the third letter of ICH, which encourages reading this as "the", since "e" is generally the commonest letter in English texts; I and C are also fairly common, in agreement with "t" and "h" being fairly common (albeit I is a poor match for "t", usually the second most common letter in English, since A rather than I comes second in my counts). That was about as far as I got, scribbling on the back of the paper bag in which I'd brought my sandwiches for the journey, before we reached Stansted and I gave the flight attendant's pen back to her. I stared at it all for a bit on my next train journey, then put it aside until I had a handy computer on which to go further: on a computer it's much easier to do letter substitutions correctly, I don't run out of space to write more, and it's much easier to correct mistakes. Besides, the paper bag was somewhat rumpled from having been used!
In fact, when I got home and transcribed my notes into my computer, I forgot all about ICH = "the", so started from scratch. Furthermore, in copying the text to the computer, I made some transcription errors.
Now, helpfully, my Debian GNU/Linux computer doesn't just have (effectively) endless space to play around in; it also comes with a file called /usr/share/dict/words, which is just a huge list of English words (96029 of them, in my installation), and a program called grep, which I can use to search for words matching the patterns described by what I'd transcribed. For example, LGMHTHM is a seven-letter word in which the third letter is repeated as the last and the fourth is repeated as the sixth, but no other letter is repeated: I can transform that description into things called "regular expressions" and get grep (whose name comes from the ed editor's g/re/p command, "global regular expression print") to find all words matching that pattern. Since you either know all about grep and can work out the regular expressions yourself, or would just see them as unintelligibly runic gobbledegook contributing nothing to the narrative, I shan't go into the details. So, when I say I "grabbed" (I live in Norway: in Norwegian, the word "grep" just happens to mean "grab") the words matching some fragment, it's shorthand for saying I used grep to find all the words in my big word-list that the fragment could have been.
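For readers who do want a concrete picture, here is a minimal sketch, in Python rather than grep, of one way to turn a cipher word into such a pattern: each new cipher letter becomes a capture group guarded by a negative look-ahead so that it differs from every earlier group, and each repeated cipher letter becomes a back-reference. This is my own illustration, not the expressions actually used; `pattern_regex`, `load_words` and `grab` are hypothetical names, and restricting matches to lower-case letters a to z (which quietly skips capitalised names and words with apostrophes in the word-list) is an assumption.

```python
import re


def pattern_regex(cipher_word):
    """Build a regex for plain-text words sharing cipher_word's letter pattern.

    A cipher letter seen for the first time becomes a new capture group,
    preceded by a negative look-ahead so it can't equal any earlier group;
    a cipher letter seen before becomes a back-reference to its group.
    """
    groups = {}  # cipher letter -> capture-group number
    parts = []
    for letter in cipher_word:
        if letter in groups:
            parts.append("\\" + str(groups[letter]))
        else:
            earlier = "|".join("\\" + str(n) for n in groups.values())
            guard = "(?!" + earlier + ")" if earlier else ""
            parts.append(guard + "([a-z])")
            groups[letter] = len(groups) + 1
    return "".join(parts)


def load_words(path="/usr/share/dict/words"):
    """Read one word per line from a word-list such as Debian's."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [line.strip() for line in f]


def grab(cipher_word, words):
    """Return the words whose letter pattern matches cipher_word exactly."""
    regex = re.compile(pattern_regex(cipher_word))
    return [w for w in words if regex.fullmatch(w)]
```

Calling `grab("LGMHTHM", load_words())` should give a shortlist like the ones in this account, though the exact results depend on which word-list is installed.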
So, first, I found the longest word in my text, UMHKIQMHA. Notice that the sequence MH appears twice in this, so any matching word must have second and third letters agreeing with last-but-two and last-but-one. When I grabbed all the words that match, I found 25:
backspace bounteous cardsharp cherished cindering courteous creatures curvature donations dormitory earthward furniture glandular hindering lengthens linchpins menhadens perfumery renascent tinkering toreadors treasured violation visualise wintering
Looking at the matches for MH, I found five of these matching it with "in", so tried substituting M→i and H→n. However, I fairly rapidly found that this turned DELHMEG into DELniEG, for which the only match was "peonies". Not only was that implausible (more prior knowledge: "The Curse of the Black Sphinx" is unlikely to be talking about peonies, I guessed) but the substitutions it implied turned LMHH into oinn (which doesn't look like a word to me) and gave IHMMGM = Iniisi and LDEJHM = opeJni, which got me no matches when I grabbed for them. Of course, DELniEG might be some word that my dictionary missed, such as a foreign word or a name, but even without matching it to "peonies" I was left with AHUMHIA = AnUinIA, IHMMGM = IniiGi and LGMHTHM = LGinTni, none of which got me any matches.
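The trial and error in that paragraph can also be sketched in code. Below, a partial key maps cipher letters to guessed plain letters; `partial` shows a cipher word with the known letters filled in (as in DELniEG), and `consistent` checks whether a dictionary word could be that cipher word under some extension of the key. The rule that a plain letter already claimed by one cipher letter can't be reused for another is my addition, though it follows directly from the one-letter-per-hieroglyph guess; all the function names are hypothetical.

```python
def partial(cipher_word, key):
    """Show cipher_word with known letters decoded (unknowns stay capitals)."""
    return "".join(key.get(c, c) for c in cipher_word)


def consistent(cipher_word, plain_word, key):
    """Could plain_word be cipher_word under a one-to-one extension of key?

    key maps cipher letters (upper case) to plain letters (lower case).
    Known cipher letters must decode as the key says; a new cipher letter
    may only take a plain letter that no other cipher letter already uses.
    """
    if len(cipher_word) != len(plain_word):
        return False
    mapping = dict(key)          # copy, so the caller's key is untouched
    used = set(mapping.values())
    for c, p in zip(cipher_word, plain_word):
        if c in mapping:
            if mapping[c] != p:
                return False
        elif p in used:
            return False
        else:
            mapping[c] = p
            used.add(p)
    return True


def candidates(cipher_word, words, key):
    """Filter a word-list down to words consistent with the partial key."""
    return [w for w in words if consistent(cipher_word, w, key)]
```

With the trial key `{"M": "i", "H": "n"}`, "peonies" passes for DELHMEG while "inferno" is rejected, because its first letter "i" is already taken by M.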
Still, checking up on that had brought my attention to these words: although they're shorter than UMHKIQMHA, they re-use letters a lot in a smaller space, which narrows down what a grab for them can find more effectively than mere length does. So I back-tracked from reading MH as "in" and looked for matches to these. AHUMHIA got 25 matches (as many as UMHKIQMHA), IHMMGM got 47 and LGMHTHM got a mere ten. Best of all, those ten were:
crybaby forever indexed linemen manikin oldened pacific tireder videoed widened
Now, once again, my prior knowledge revealed that one of these was way more plausible than the others: "forever" is just the sort of ominous word one expects to find in an ancient curse. So I tried substituting L→f, G→o, M→r, H→e and T→v. That turned IHMMGM into Ierror, which was obviously "terror" (indeed, grabbing for matches got me only this), telling me I→t. It also turned DELHMEG into DEferEo, which grabbed only "inferno", telling me D→i and E→n.
Notice how well the two words just matched fit our prior knowledge: "terror" and "inferno" sit nicely in a curse. At this point I noticed my old friend ICH had become tCe and remembered that it was obviously "the", so converted C→h.
By this stage, the text was starting to look almost intelligible; it also became clear that each verse should really be a single line, not split in two. LDEJHM has become finJer, which could be either "finder" or "finger"; a quick look at EDJCI shows it's become niJht, which isn't going to let J be "d", so we get J→g to make these "finger" and "night", respectively. Meanwhile, EDJCI LKSSA LGMHTHM has become night fKSSA forever, which fairly screams (particularly as we're decoding a curse) K→a, S→l (ell, L) and A→s. Thus GBHE ICH JKIHA becomes oBen the gates, so B→p. At the same time NCGAH ACKOGNX has become Nhose shaOoNX, giving us O→d and N→w; it also seems to want X to be "s", but we've already got that, as A. Then I remembered that the hieroglyph for which I used X looked like a question mark, and that works just fine here and in the other use of X: NCGAH NCDABHMX, which has now become "whose whisper?".
At this point it's all pretty clear: KEQRDA DA UKSSDEJ has become anQRis is Ualling, and I know there's an Egyptian deity called Anubis, so I get Q→u, R→b and U→c; ICH RSKUV ABCDEF has become the blacV sphinF, yielding V→k and F→x; GUHKEA RGDS OMW is oceans boil drW, so W→y; K PGGE GL RSGGO is a Poon of blood, so P→m.
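Once every hieroglyph has a letter, decoding is purely mechanical. Gathering the substitutions from the last few paragraphs into a single table gives the following Python sketch, using the standard `str.translate` and `str.maketrans`. `KEY` and `decipher` are illustrative names, X is mapped to a question mark as decided above, and the table is reconstructed from this account rather than copied from the look-up table at the back of the book.

```python
# Substitutions found above: cipher letter -> plain text.
# X is the "hieroglyph" that turned out to be a real question mark.
KEY = {
    "A": "s", "B": "p", "C": "h", "D": "i", "E": "n", "F": "x",
    "G": "o", "H": "e", "I": "t", "J": "g", "K": "a", "L": "f",
    "M": "r", "N": "w", "O": "d", "P": "m", "Q": "u", "R": "b",
    "S": "l", "T": "v", "U": "c", "V": "k", "W": "y", "X": "?",
}

# A substitution cipher is one-to-one: no two glyphs share a letter.
assert len(set(KEY.values())) == len(KEY)


def decipher(text, key=KEY):
    """Apply the substitution; spaces and unmapped characters pass through."""
    return text.translate(str.maketrans(key))
```

Applying `decipher` to each line of the paper transcript is a quick way to re-check the final reading.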
Thus, finally, we're left (now deploying the word that began every chapter as a
title) with:
Sphinx
open the gates
inferno awaits
set semons free
the sun burns out
a moon of blood
grave secrets
dead bones walking
creatures stalking
terror haunts
forests and peaks
oceans boil dry
the stars drop
night falls forever
whose shadow?
whose whisper?
anubis is calling
he sailes and points
a pale finger
walk with him
over the ashes
give him your soul
ontil you reach
the black sphinx
I can see what look like transcription errors in here: chapter 3's "semons", chapter 17's "sailes" (though this might be deliberate archaism by the author) and the penultimate chapter's "ontil". The first and last of these turn out to be simple transcription errors between paper and computer: if you apply the substitutions I've given in this section to the raw text I've given for my first attack (which faithfully records what I wrote on the paper, rather than what I initially transcribed to the computer), you'll get "demons" and "until" instead. For the second, re-transcribing from chapter 17, I find APDSHA, which is "smiles", and that makes much better sense than "sailes"!
One obvious lesson to be learned is that transcribing things manually is error-prone, so it's a huge advantage to get your information onto a computer as early in the process as is practical: computers copy (and transform) information much more reliably than people do!
The other is that judicious use of prior knowledge makes a huge difference. Notice how the key word "forever" was the breakthrough in the above solution: the other nine words LGMHTHM could have matched might have been the right thing to try in other cipher-texts, but what I knew about the text before me favoured "forever" as its meaning in this case.
It may seem frivolous of me to count, as prior knowledge, my guess that we were looking at a substitution cipher (because kids have a chance of cracking that), but this does actually match up with another reality in cryptography: what you know about the sender, the intended recipient and their relationship can indeed help you to work out what kind of cipher you're dealing with. If you know they are both parties to an elaborate scheme for exchanging secret knowledge, then you may have to cope with a more elaborate cipher; but, if they're mutual strangers, they can't have agreed on a secret key in advance, which gives you some clues as to what kind of cipher is in use.
Finally, I hope you can see that a plain substitution cipher is not even remotely secure: if you really need to communicate secretly, you need something more elaborate. Fortunately, much more robust ciphers are now readily available, including ones which make it possible for two parties with no prior knowledge of one another to set up a secure channel between themselves; and the fact that their enemies know about these ciphers doesn't make them insecure, much less make a substitution cipher more secure.
Written by Eddy.