Sometimes, in software, one must do something rather strange to solve a problem. Ingenious solutions to problems tend to get called hacks; and they tend to be heinous in one way or another – which is part of their charm. (One of the finest examples of the genre is Duff's device; the perpetrator's ambivalence about it is a standard feature of any truly heinous hack.) So here is an account of one such hack I perpetrated, to assist my friends in the Qt project's documentation team.
We have a tool called QDoc which scans source code to generate
documentation; in so doing, it needs to make enough sense of the software and
the comments embedded in it to work out what data types and routines make up the
public interface and to associate each with a comment that describes it, warning
us of any mismatch (public interface that isn't documented, or documentation for
a public interface that it can't find). In order to do that it needs to know
where to find standard header files that declare system-provided facilities used
by programs in the language (C++) of the code. We'd recently reworked QDoc to
have clang make sense of the code (saving us the maintenance of a parser for the
language), with some code of ours hooked into clang to do the job we want. This
all went
well until we found that clang needed (back in the old version we
were then using) to be told where to find some system header files; and we
couldn't (entirely) rely on a fixed location for that. A frustrated colleague
turned to me for suggestions and I gave it some thought.
C++ is based on an older language called C and shares, with it, a preprocessor
that lets one embed, in code, directives that get replaced
in predictable ways; this allows one to (among other things) save repeating the
same code in many places. One of these mechanisms is the #include
directive used (usually at the top of a file) to refer to header files that
declare facilities used by the code in the file. The other is
the #define directive, which specifies a word (technically, an
identifier; it is composed of letters, digits and underscores and need not
appear in any dictionary of any language any folk speak) and a text to be used
to replace that word (optionally using, in the replacement, some texts in
parentheses after the word) wherever it appears. When such replacement is
performed, if the word being replaced appears in the replacement text, it is
(crucially) not replaced again; uses of the word in the replacement
text remain in the text finally seen by the program that makes sense of the
source code – ordinarily, to turn it into the machine instructions that
make up the actual executable you'll run; but, in QDoc's case, to generate
documentation from it. The program that does this is known as a compiler
(which strictly produces object code; another program, the linker, combines a
bunch of that into the final executable file of machine instructions). In
particular, there are certain special words that the compiler
implicitly defines (as if with a #define directive), with
well-defined standard meanings. One of these is __FILE__, which
expands to (a string literal encoding) the path-name (i.e. the full statement of
where the file is on the system, along with its name) of the file in whose text
the compiler gets to replace this word.
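To make those two preprocessor behaviours concrete, here is a minimal sketch. The names limit, bumped_limit and this_file are made up for illustration and have nothing to do with QDoc's source.

```cpp
#include <cassert>
#include <string>

// Hypothetical variable, used only to show the expansion rule.
int limit = 10;

// The replacement text mentions 'limit' itself. The preprocessor replaces the
// word once and then leaves the inner mention alone, so there is no endless
// expansion.
#define limit (limit + 1)

int bumped_limit() {
    int result = limit;   // the compiler sees: int result = (limit + 1);
    assert(result == 11); // sanity check: one replacement, not two
    return result;
}

#undef limit

// __FILE__ is replaced by a string literal naming the file in which this
// line of text appears.
std::string this_file() { return __FILE__; }
```

Calling bumped_limit() gives 11, while limit itself is still 10 once the #undef has taken effect. What this_file() returns depends on how and where the file is compiled, so the sketch makes no claim about it.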
Now, QDoc itself is a program written in C++, so is compiled to produce an
executable program to be run, that in turn compiles other source code to
generate documentation (rather than an executable program). The program that
compiles QDoc does know where the system headers are; so I just needed to
arrange for QDoc to sneak that information off it when it got compiled.
My devious idea was to arrange for the source of QDoc to define a carefully
chosen word – the
name of a function provided by a system header file in the location we needed to
know – to expand to some text that included the word __FILE__
so that the compiler, when compiling QDoc, would see __FILE__ in that
header and expand it to the path-name of that header, from which I could then
extract (with some fairly simple code) the directory name we needed. The core
of the hack looks like this:
#define setlocale locale_file_name_for_clang_qdoc() { \
        static char data[] = __FILE__; \
        return data; \
    } \
    char *setlocale
#include <locale.h>
#undef setlocale
It's relying on the fact that the header file locale.h contains a declaration that looks at least somewhat like (this is the POSIX standard's form of the declaration, so actual implementations have to look enough like it to work the way it's specified to)
char *setlocale(int category, const char *locale);
which, thanks to my #define, will actually get read by the compiler as (give or take some spaces being arranged differently, in ways that the compiler ignores)
char *locale_file_name_for_clang_qdoc() {
    static char data[] = __FILE__;
    return data;
}
char *setlocale(int category, const char *locale);
So the original declaration of setlocale remains as it was
(because we don't re-replace the word when it appears in its own replacement)
and I've inserted a definition of an (inline) function into the header file;
since the compiler sees this as part of the text of the header file, it replaces
the word __FILE__ with the path-name of locale.h; my
little function then returns that path-name to its caller. The other part of my
code calls this function, locale_file_name_for_clang_qdoc(), and can
duly find the last directory separator (which is followed by locale.h, the
file's name within the directory) and know that the part before that is the
directory name we needed.
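The whole trick can be tried out without touching a real system header. The following is only a sketch under loud assumptions: demo_setlocale, demo_header_path, directory_of and demo_include_dir are invented names, the path is made up, and a #line directive stands in for the #include so that the compiler believes the declaration lives in a header file. It is not QDoc's actual code, and it only looks for '/' as the directory separator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Same shape as the macro in the article, aimed at a made-up function name.
#define demo_setlocale demo_header_path() { \
        static char data[] = __FILE__; \
        return data; \
    } \
    char *demo_setlocale

// Stand-in for '#include <locale.h>': #line makes the compiler treat the text
// that follows as if it came from the (hypothetical) file named here.
#line 1 "/hypothetical/include/locale.h"
char *demo_setlocale(int category, const char *locale);
#undef demo_setlocale

// Everything before the last '/' is the directory; empty if there is none.
std::string directory_of(const char *path) {
    assert(path != nullptr);
    const char *slash = std::strrchr(path, '/');
    return slash ? std::string(path, slash) : std::string();
}

// Ask the smuggled-in function where its "header" was, then trim the name.
std::string demo_include_dir() {
    return directory_of(demo_header_path());
}
```

Here demo_include_dir() yields "/hypothetical/include", because __FILE__ was expanded at the point of the declaration, after the #line directive had renamed the file. With a real header the result would be whatever directory the compiler found it in, and on systems that use a different directory separator the search in directory_of would need adjusting.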
We initially tried this with two other functions, declared in a different header file, but the trick relies on the function only ever being declared in its header and never used or even redeclared elsewhere in the header; unfortunately, the first two functions we tried got an extra mention, aside from the one we needed. So we needed a singly-declared (and never used) function; and it had, furthermore, to return a pointer to character data (the char * return type above; although const char * would have done, too). It took some trial-and-error before one of my colleagues found a function that satisfied these criteria on all the systems where we needed it to work.
We later had to suppress this on Microsoft's systems, because Microsoft's compiler, MSVC, had quite understandable reservations about defining a function in a context that's really meant to just declare them. Later still, we were able to retire it when a new version of clang turned out to know how to find those headers for itself (which it might have been able to do before, for that matter; we may just have failed to select the right options to specify for it). It remains a heinous hack of which I'm mildly proud, all the same.
Written by Eddy.