The riddle of the black sphinx

While reading The Black Sphinx, I noticed (after all, the blurb on the back did ask "Can you crack the hieroglyphic code?") the hieroglyphics in the margins. I fairly rapidly saw that the first page of each chapter used a specific sequence; after that, the left pages of the chapter had one sequence and the right pages another, with both changing from one chapter to the next. The end of the book informed me that each chapter contains a verse of the Black Sphinx's curse, so I studiously ignored the table given on the facing page (mapping hieroglyphs to letters) and set about attacking the puzzle.

With hindsight, it seems to me that an account – of how I solved the puzzle – may be of interest to someone trying to break a code (albeit the approach I took would really only work for a substitution cipher, and no-one serious about cryptography uses those anymore, as they're easy to break). To that end, here's this page – faithfully reporting what I actually did (in my review of the book, I correct the errors I eventually discovered I'd made). Before I'd reached the end of the book, I'd guessed that the hieroglyphic code was just a substitution cipher – using each hieroglyph for a distinct letter of our alphabet – and that the plain text, once revealed, would be in English. This guess counts as an application of prior knowledge – I found the cipher in a book for anglophone children and it was clear the author intended for such children to have at least some chance of cracking it. This guess was bolstered by the fact that the book ended with a look-up table from hieroglyphs to letters (so I knew I'd be able to check my results when finished).

First Attack

My first job was to transcribe all those messages out of the chapters. Since I'm not a dab hand with hieroglyphics, I copied each distinct hieroglyph out just once and assigned it a letter (the first one I hadn't yet used); I wrote that up as a look-up table and transcribed the verses using the letters thus implied. Because the end of the book referred to each chapter's contribution as a verse, I treated each chapter's left-page text as one line and its right-page text as the other line of a two-line verse. I interpreted larger gaps between hieroglyphs as spaces breaking some lines up into words. Where a hieroglyph that looked a lot like a question mark appeared, I decided to treat it as a hieroglyph, rather than as an actual question mark (which, in fact, it was). I transcribed the six-hieroglyph word from the first page of each chapter first, so that it became ABCDEF, but presumed that it wasn't itself part of a verse. This gave me:

GBHE ICH
JKIHA

DELHMEG
KNKDIA

AHI OHPGEA
LMHH

ICH AQE
RQMEA GQI

K PGGE
GL RSGGO

JMKTH
AHUMHIA

OHKO RGEHA
NKSVDEJ

UMHKIQMHA
AIKSVDEJ

IHMMGM
CKQEIA

LGMHAIA
KEO BHKVA

GUHKEA
RGDS OMW

ICH AIKMA
OMGB

EDJCI LKSSA
LGMHTHM

NCGAH
ACKOGNX

NCGAH
NCDABHMX

KEQRDA
DA UKSSDEJ

CH AKDSHA
KEO BGDEIA

K BKSH
LDEJHM

NKSV
NDIC CDP

GTHM ICH
KACHA

JDTH CDP
WGQM AGQS

QEIDS
WGQ MHKUC

ICH RSKUV
ABCDEF

and I spent a happy half hour or so (the rest of the flight I was on) staring at that and trying to find patterns. The two uses of K as a word on its own strongly suggest that it's either i or a. I noticed the repeated words ICH ×5, KEO ×2, NCGAH ×2 and CDP ×2 – strongly hinting that ICH is the – and the repeated sub-string DEJ ×4, particularly the two instances of it in KSVDEJ. I counted the letter frequencies as (here also giving a description of each hieroglyph):

description	eye	square	square spiral	duck	zig-zag	goose	snow-drop	flag	dome	oven	hawk	snail	lozenge	dove	mitten	owl	curved spiral	stork	hound	man	bowl	hole	two flags	question-mark
letter	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X
count	31	7	17	16	21	1	23	35	19	4	21	7	21	8	8	4	8	6	12	4	5	4	3	2

The commonest, H, is also the third letter of ICH, which encourages reading this as the, since e is generally the commonest letter in English texts; I and C are also fairly common, in agreement with t and h being fairly common (albeit a poor match for t, usually the second most common letter in English). That was about as far as I got, scribbling on the back of the paper bag in which I'd brought my sandwiches for the journey, before we reached Stansted and I gave the flight attendant's pen back to her. I stared at it all for a bit on my next train journey, then put it aside until I had a handy computer on which to go further – on a computer it's much easier to do letter substitutions correctly, I don't run out of space to write more, and mistakes are much easier to correct. Besides, the paper bag was somewhat rumpled, from having been used !
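The counting, too, can be handed to a machine. Here's a minimal sketch using Python's collections.Counter, assuming the first-attack transcription is pasted in with each two-line verse joined onto one line (joining lines doesn't change any count). It tallies single cipher letters, whole words (to spot repeats such as ICH) and three-letter runs inside words (to spot DEJ):

```python
from collections import Counter

# First-attack transcription: one verse per line (left page, then right page).
CIPHERTEXT = """\
GBHE ICH JKIHA
DELHMEG KNKDIA
AHI OHPGEA LMHH
ICH AQE RQMEA GQI
K PGGE GL RSGGO
JMKTH AHUMHIA
OHKO RGEHA NKSVDEJ
UMHKIQMHA AIKSVDEJ
IHMMGM CKQEIA
LGMHAIA KEO BHKVA
GUHKEA RGDS OMW
ICH AIKMA OMGB
EDJCI LKSSA LGMHTHM
NCGAH ACKOGNX
NCGAH NCDABHMX
KEQRDA DA UKSSDEJ
CH AKDSHA KEO BGDEIA
K BKSH LDEJHM
NKSV NDIC CDP
GTHM ICH KACHA
JDTH CDP WGQM AGQS
QEIDS WGQ MHKUC
ICH RSKUV ABCDEF"""

def letter_counts(text):
    """How often each cipher letter occurs, ignoring spaces and line breaks."""
    return Counter(ch for ch in text if ch.isalpha())

def word_counts(text):
    """How often each whole cipher word occurs."""
    return Counter(text.split())

def substring_counts(text, length):
    """How often each run of `length` letters occurs inside a single word."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - length + 1):
            counts[word[i:i + length]] += 1
    return counts
```

For instance, letter_counts(CIPHERTEXT).most_common(5) lists the commonest cipher letters, and word_counts(CIPHERTEXT)["ICH"] gives 5. A machine tally also makes a handy cross-check on hand-made counts like those in the table.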

Finishing up

In fact, when I got home and transcribed my notes into my computer, I forgot all about ICH = the, so I started from scratch. Furthermore, in transcribing to the computer, I made some transcription errors.

Now, helpfully, my Debian GNU/Linux computer doesn't just have (effectively) endless space to play around in; it also comes with a file called /usr/share/dict/words – which is just a huge list of English words (96029 of them, in my installation) – and a program called grep, which I can use to search for words matching the patterns described by what I'd transcribed. For example, LGMHTHM is a seven-letter word in which the third letter is repeated as the last and the fourth is repeated as the sixth, but no other letter is repeated: I can transform that description into what are called regular expressions and get grep (whose name comes from the old ed editor command g/re/p: globally search for a regular expression and print the matching lines) to find all words matching that pattern. Since you either know all about grep and can work out the regular expressions yourself, or would just see them as unintelligibly runic gobbledegook contributing nothing to the narrative, I shan't go into the details; so – when I say I grabbed (I live in Norway: in Norwegian, the word grep just happens to mean grab) the words matching some fragment, it's shorthand for saying I used grep to find all the words in my big word-list that the fragment could have been.
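For anyone who does want the runes, here's one way to build such a pattern. It's a Python sketch, not a record of the grep commands actually used (those aren't given here): each new cipher letter becomes a capturing group, a negative lookahead forces it to differ from every earlier group, and each repeated letter becomes a backreference. Lookaheads aren't part of POSIX regular expressions, so feeding the same pattern to grep would need GNU grep's -P (Perl-compatible) option. Restricting matches to lowercase a to z, which drops capitalised names and possessives, is an assumption of this sketch, so its match counts needn't agree with the ones reported here; pattern_regex, grab and load_words are illustrative names.

```python
import re

def pattern_regex(cipher_word):
    """Regex for lowercase words with the same letter-repetition pattern.

    Equal cipher letters must be equal plain letters (a backreference); a cipher
    letter seen for the first time must differ from every earlier one (a
    negative lookahead against each earlier group).
    """
    group_of = {}  # cipher letter -> capturing-group number
    parts = ["^"]
    for ch in cipher_word:
        if ch in group_of:
            parts.append("\\%d" % group_of[ch])
        else:
            if group_of:
                earlier = "|".join("\\%d" % n for n in group_of.values())
                parts.append("(?!%s)" % earlier)
            group_of[ch] = len(group_of) + 1
            parts.append("([a-z])")
    parts.append("$")
    return "".join(parts)

def grab(cipher_word, words):
    """Words (in list order) that fit the cipher word's pattern."""
    rx = re.compile(pattern_regex(cipher_word))
    return [w for w in words if rx.match(w)]

def load_words(path="/usr/share/dict/words"):
    """Read a one-word-per-line list such as Debian's /usr/share/dict/words."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]
```

With the Debian word list installed, grab("LGMHTHM", load_words()) lists the candidates for LGMHTHM, and print(pattern_regex("LGMHTHM")) shows the regular expression itself.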

So, first, I found the longest word in my text, UMHKIQMHA. Notice that the sequence MH appears twice in this, so any matching word must have its second and third letters agreeing with its last-but-two and last-but-one. When I grabbed all the words that match, I found 25:

backspace bounteous cardsharp cherished cindering courteous creatures curvature donations dormitory earthward furniture glandular hindering lengthens linchpins menhadens perfumery renascent tinkering toreadors treasured violation visualise wintering

Looking at the matches for MH, I found five of these matching it with in, so I tried substituting M→i and H→n. However, I fairly rapidly found that this turned DELHMEG into DELniEG, for which the only match was peonies. Not only was that implausible (more prior knowledge: The Curse of the Black Sphinx is unlikely to be talking about peonies, I guessed) but the substitutions it implied turned LMHH into oinn (which doesn't look like a word to me) and gave IHMMGM = Iniisi and LDEJHM = opeJni, neither of which got me any matches when I grabbed for them. Of course, DELniEG might be some word that my dictionary missed, such as a foreign word or a name, but even without matching it to peonies I was left with AHUMHIA = AnUinIA, IHMMGM = IniiGi and LGMHTHM = LGinTni, none of which got me any matches.
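Checking a guess like M→i, H→n comes down to decoding what's known and grabbing for the rest. Here's a sketch of that, assuming the partial key is held as a Python dict from cipher letter to plain letter (a representation chosen for illustration): decoded letters are written in lowercase, unknown ones stay uppercase, and each unknown may be any plain letter not already claimed by the key, since a substitution cipher never gives two hieroglyphs the same letter. The names decode, partial_regex and candidates are hypothetical.

```python
import re

def decode(cipher_text, key):
    """Apply a partial key: known cipher letters become lowercase plain letters,
    unknown ones are left as uppercase cipher letters (DELHMEG -> DELniEG)."""
    return "".join(key.get(ch, ch) for ch in cipher_text)

def partial_regex(partial, used):
    """Regex for a partly decoded word.

    Lowercase letters must match exactly. Each distinct uppercase letter is an
    unknown plain letter: repeats become backreferences, and a first occurrence
    may equal neither an earlier unknown nor any plain letter in `used`.
    """
    group_of = {}  # unknown cipher letter -> capturing-group number
    parts = ["^"]
    for ch in partial:
        if ch.islower():
            parts.append(ch)
        elif ch in group_of:
            parts.append("\\%d" % group_of[ch])
        else:
            banned = ["\\%d" % n for n in group_of.values()]
            banned += sorted(re.escape(p) for p in used)
            if banned:
                parts.append("(?!%s)" % "|".join(banned))
            group_of[ch] = len(group_of) + 1
            parts.append("([a-z])")
    parts.append("$")
    return "".join(parts)

def candidates(cipher_word, key, words):
    """Words from `words` that cipher_word could still be under the partial key."""
    rx = re.compile(partial_regex(decode(cipher_word, key), set(key.values())))
    return [w for w in words if rx.match(w)]
```

With key = {"M": "i", "H": "n"}, decode("DELHMEG", key) gives DELniEG, and candidates("DELHMEG", key, ["peonies", "inferno", "pennies"]) keeps only peonies: inferno would need an unknown to be i, and pennies would need one to be n.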

Still, checking up on that had drawn my attention to AHUMHIA, IHMMGM and LGMHTHM: although they're shorter than UMHKIQMHA, they re-use letters a lot in a smaller space, which restricts what a grab for them can find more effectively than mere length does. So I back-tracked from reading MH as in and looked for matches to these. AHUMHIA got 25 matches (equalling UMHKIQMHA), IHMMGM got 47 and LGMHTHM got a mere ten. Best of all, those ten were:

crybaby forever indexed linemen manikin oldened pacific tireder videoed widened

Now, once again, my prior knowledge revealed that one of these was way more plausible than the others: forever is just the sort of ominous word one expects to find in an ancient curse. So I tried substituting L→f, G→o, M→r, H→e and T→v. That turned IHMMGM into Ierror, which was obviously terror (indeed, grabbing for matches got me this and nothing else), telling me I→t. It also turned DELHMEG into DEferEo, for which a grab found only inferno, telling me D→i and E→n. Notice how well the two words just matched fit our prior knowledge: terror and inferno are words that sit nicely in a curse. At this point I noticed my old friend ICH had become tCe and remembered that it was obviously the, so converted C→h.
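Each accepted reading of a word adds entries to the key, and a wrong reading usually shows up as a clash. Here's a sketch of that bookkeeping, assuming the key is a Python dict from cipher letter to plain letter; extend_key is a name made up for this illustration.

```python
def extend_key(key, cipher_word, plain_word):
    """Return a new key that also reads cipher_word as plain_word, or None on a clash.

    `key` maps cipher letters to plain letters and is not modified. A clash is a
    cipher letter getting two meanings, or two cipher letters sharing one,
    neither of which a simple substitution cipher allows.
    """
    if len(cipher_word) != len(plain_word):
        return None
    new_key = dict(key)
    owner = {p: c for c, p in new_key.items()}  # plain letter -> cipher letter
    for c, p in zip(cipher_word, plain_word):
        if c in new_key:
            if new_key[c] != p:
                return None  # this cipher letter already means something else
        elif p in owner:
            return None  # this plain letter already belongs to another cipher letter
        else:
            new_key[c] = p
            owner[p] = c
    return new_key
```

Reading LGMHTHM as forever, then IHMMGM as terror, DELHMEG as inferno and ICH as the, extends the key cleanly each time; reading DELHMEG as peonies on top of forever returns None instead, because E and H would both have to be e.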

By this stage, the text was starting to look almost intelligible; it also became clear that each verse should really be a single line, not split in two. LDEJHM has become finJer, which could be either finder or finger; a quick look at EDJCI shows it's become niJht, which isn't going to let J be d, so we get J→g to make these finger and night, respectively. Meanwhile, EDJCI LKSSA LGMHTHM has become night fKSSA forever, which fairly screams (particularly as we're decoding a curse) K→a, S→l (ell, L) and A→s. With those, GBHE ICH JKIHA has become oBen the gates, so B→p. At the same time NCGAH ACKOGNX has become Nhose shaOoNX, giving us O→d and N→w; it also seems to want X to be s, but we've already got a letter for that, namely A. Then I remembered that the hieroglyph for which I used X looked like a question mark, and that works just fine here and in the other use of X: NCGAH NCDABHMX, which has now become whose whisper?
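Pinning down J is an intersection: finJer allows d or g, niJht allows only g, so only g survives both. Here's a sketch of that idea, assuming the partial key is a Python dict from cipher letter to plain letter and a plain list of dictionary words; options_for is a hypothetical name, and it only consults cipher words in which the letter being solved is the sole unknown.

```python
from string import ascii_lowercase

def options_for(letter, cipher_words, key, words):
    """Plain letters that cipher `letter` could stand for.

    Each cipher word in which `letter` is the only unknown keeps just the
    options that turn it into a dictionary word, so the result is the
    intersection over all such words. Plain letters already in `key` are
    excluded from the start.
    """
    word_set = set(words)
    possible = set(ascii_lowercase) - set(key.values())
    for cw in cipher_words:
        if letter not in cw or any(ch != letter and ch not in key for ch in cw):
            continue  # skip words without `letter`, or with other unknowns too
        possible = {p for p in possible
                    if "".join(p if ch == letter else key[ch] for ch in cw) in word_set}
    return possible
```

With the key built from forever, terror, inferno and the, options_for("J", ["LDEJHM"], key, words) gives {"d", "g"}, and adding "EDJCI" to the cipher words narrows it to {"g"}, provided the word list contains finder, finger and night.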

At this point it's all pretty clear: KEQRDA DA UKSSDEJ has become anQRis is Ualling and I know there's an Egyptian deity called Anubis, so I get Q→u, R→b and U→c; ICH RSKUV ABCDEF has become the blacV sphinF, yielding V→k and F→x; GUHKEA RGDS OMW is oceans boil drW, so W→y; K PGGE GL RSGGO is a Poon of blood, so P→m. Thus, finally, we're left (now deploying the word that began every chapter as a title) with:

Sphinx
open the gates
inferno awaits
set semons free
the sun burns out
a moon of blood
grave secrets
dead bones walking
creatures stalking
terror haunts
forests and peaks
oceans boil dry
the stars drop
night falls forever
whose shadow?
whose whisper?
anubis is calling
he sailes and points
a pale finger
walk with him
over the ashes
give him your soul
ontil you reach
the black sphinx

I can see what look like transcription errors in here: chapter 3's semons, chapter 17's sailes ('though this might be deliberate archaism by the author) and the penultimate chapter's ontil. The first and last of these turn out to be simple copying errors between paper and computer – if you apply the substitutions I've given in this section to the raw text I've given for my first attack (which faithfully records what I wrote on the paper, rather than what I initially transcribed to the computer), you'll get demons and until instead. For the second, re-transcribing from chapter 17, I find APDSHA, which is smiles – which makes much better sense than sailes !
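Applying the finished key is then a one-liner with Python's str.translate. This sketch uses the key exactly as derived in this section, with X (the question-mark hieroglyph) decoding to "?"; KEY and decode are illustrative names. Run over the raw first-attack text, it gives demons and until, while the sailes line shows how a single mis-copied hieroglyph (K written for P) survives decoding intact.

```python
# The complete key as derived above: cipher letter -> plain letter.
# X stands for the question-mark hieroglyph, so it decodes to "?".
KEY = {
    "A": "s", "B": "p", "C": "h", "D": "i", "E": "n", "F": "x",
    "G": "o", "H": "e", "I": "t", "J": "g", "K": "a", "L": "f",
    "M": "r", "N": "w", "O": "d", "P": "m", "Q": "u", "R": "b",
    "S": "l", "T": "v", "U": "c", "V": "k", "W": "y", "X": "?",
}

def decode(cipher_text, key=KEY):
    """Decode a whole text; characters with no key entry (spaces, newlines) pass through."""
    return cipher_text.translate(str.maketrans(key))
```

For example, decode("AHI OHPGEA LMHH") gives set demons free, and decode("ABCDEF") gives sphinx.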

Lessons

One obvious lesson to be learned is that transcribing things manually is error-prone, so it's a huge advantage to get your information onto a computer as early in the process as is practical – computers copy (and transform) information much more reliably than people do !
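One way to act on that lesson is to record each hieroglyph by a name and let the computer hand out the letters, using the same rule as in the first attack: each hieroglyph not yet seen gets the next unused letter. Here's a minimal Python sketch; transcribe is a hypothetical helper (not something used in the account above), and the names passed to it are the descriptions from the frequency table, standing in for the hieroglyphs themselves.

```python
from string import ascii_uppercase

def transcribe(symbols, table=None):
    """Transcribe a sequence of hieroglyph names into letters.

    Each name not yet in `table` gets the next unused capital letter (A, B, C, ...),
    so letters are allocated in order of first appearance. A " " entry is kept as
    a word break. Pass the same `table` dict for every verse so that later verses
    reuse earlier assignments; more than 26 distinct names raises IndexError.
    """
    if table is None:
        table = {}
    letters = []
    for name in symbols:
        if name == " ":
            letters.append(" ")
            continue
        if name not in table:
            table[name] = ascii_uppercase[len(table)]
        letters.append(table[name])
    return "".join(letters), table
```

Transcribing the title word first, as in the first attack, gives ABCDEF; a later call such as transcribe(["snow-drop", "square", "flag", "zig-zag", " ", "dome", "square spiral", "flag"], table) then produces GBHE ICH, matching the start of the transcription.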

The other is that judicious use of prior knowledge makes a huge difference. Notice how the key word forever was the breakthrough in the above solution: the other nine words LGMHTHM could have matched might have been the right thing to try in other cipher-texts, but what I knew about the text before me favoured forever as its meaning in this case.

It may seem frivolous of me to count, as prior knowledge, my guess that we were looking at a substitution cipher (because kids have a chance of cracking that), but this does actually match up with another reality in cryptography: what you know about the sender, intended recipient and their relationship can indeed help you to know what kind of cipher you're dealing with. If you know they are both parties to an elaborate scheme of exchange of secret knowledge, then you may have to cope with a more elaborate cipher; but, if they're mutual strangers, they can't have agreed a secret key in advance, and that itself gives you some clues as to what kind of cipher is in use.

Finally, I hope you can see that a plain substitution cipher is not even remotely secure – if you really need to communicate secretly, you need something more elaborate. Fortunately much more robust ciphers are now readily available, including ones which make it possible for two parties with no prior knowledge of one another to set up a secure channel between themselves – and the fact that their enemies know about these ciphers doesn't make them insecure, much less make a substitution cipher more secure.

Written by Eddy.