Many bugs in programs are buffer over-runs
: many security exploits are based on these, as they
enable an attacker to deliver code of their own choosing to be executed by the
program. There really is no excuse for writing code which makes the relevant
mistake - I intend to dissect the mistake and lay bare its workings here.
In general, a buffer
is a portion of memory reserved by the computer
program in which to hold some data, usually temporarily. The problem comes when
the data to be stored in that piece of memory is bigger than the piece of memory
- in which case, if the program doesn't notice this, the data fills up the
reserved piece of memory and over-writes whatever was written in the next
portions of memory along. This is called a buffer over-run. It may damage
other data being used by the program, which may lead to the program crashing;
but, in the worst case, it over-writes the program itself, so that the surplus
data gets executed in place of the programmer's instructions. This last case is
the usual one for security exploits, as the attacker supplies some data
which are, in fact, instructions for the machine, in machine code.
This problem is widely understood among programmers, as is its natural solution - allocate a big enough buffer once you know how big your data will be, and check that you aren't over-running your space when there's any doubt. If a buffer has enough space to hold 80 characters of text, don't try to write more than 80 characters into it; and, wherever possible, find out how much data you'll be processing and allocate a big enough buffer for it.
All the same, one can make mistakes: so it's a good idea to ensure that, in the event of an over-run, the memory you'll be trampling on isn't the program itself but something less critical - corrupting data and crashing is bad, but surrendering control of the machine to an attacker is worse. It happens, in fact, that the discipline needed to achieve this brings with it some further benefits. To illustrate, let me show you some buggy code written in C:
#include <ctype.h>  /* isdigit, islower, isupper */
#include <string.h> /* memcpy */

static int tohex(char ch)
{
    if (isdigit(ch)) return ch - '0';
    if (islower(ch)) return ch - 'a' + 10;
    if (isupper(ch)) return ch - 'A' + 10;
    return -1;
}

#define BUFSIZ 100
extern size_t urldecode(char *text)
{
    /* returns the length of text it managed to decode */
    static char buffer[BUFSIZ];
    size_t r = 0, w = 0; /* for Read (from text) and Write (to buffer) */
    while (text[r] && w < BUFSIZ) {
        char ch = text[r++];
        switch (ch) {
        case '%': /* over-simplified buggy %-decoding */
            ch = tohex(text[r++]);
            /* sequence point */
            ch = 16 * ch + tohex(text[r++]);
            break;
        case '+':
            ch = ' ';
            break;
        default:
            break;
        }
        buffer[w++] = ch;
    }
    memcpy(text, buffer, w);
    if (w < r) text[w] = '\0';
    return r;
}
#undef BUFSIZ
This is the classic buffer over-run scenario: a function which processes an array of data, needs some temporary workspace for it and assumes that the data will always be less than some fixed size (here, 100 bytes - the size of array buffer). If the green code to check against it were omitted (a not uncommon mistake), the code could over-run its buffer: as it is, it may lose some data. There are several incidental bugs in its caricature of %-decoding, so don't go using it! [Furthermore, replacing each use of buffer with text in the above, while removing the green code and the call to memcpy, would be perfectly sensible - w never exceeds r and each read from the latter happens before the write to the former - the buffer isn't really needed. I would say that makes the above a contrived example: but I've seen plenty of code as clumsy as the above, and the Code Red bug seems to depend on such a bug in a URL-decoding routine.] There are, however, situations where one genuinely needs a transitory buffer in a function, via which to perform some transformation on the memory containing the supplied data; when there's some strong enough reason to be confident we know how big that needs to be, a fixed-size array is more convenient (and efficient) than allocating and freeing buffers of the right size every time.
The author has assumed that the text, once urldecoded, will never be more
than 100 characters long. This may usually be
true (albeit, these
days, many URLs are so long that the assumption is unwise: an URL can be arbitrarily
long) but if an attacker can arrange for the decoded text to exceed 100
characters, the data would (without the green check)
over-run our buffer array. As it is, the defence against unexpected
size obliges the function to return the length of the decoded portion of the
original, while over-writing some of the original with its decoded form; which
will complicate life for its caller, but mainly because my chosen example is a
caricature, in which a mistake has been patched up clumsily.
Now, the two functions are declared to be static and
extern, respectively. This means that urldecode is accessible
to code in other source files of the program (it has external linkage
),
while tohex is only accessible to the code in urldecode's
source file (it has internal linkage
). Both functions will appear, when the
program is running, in the run-time loaded image of the program. When a
function is called, the program allocates some space, on a so-called
stack
, for the local variables of the function - text,
r, w and ch - which are known as automatic
variables. The stack is a separate portion of memory from the one in which the
program itself is stored: and, if two threads of execution are using these
functions in parallel with one another (in a single running process) each has
its own stack, hence its own local variables, though they share the memory which
contains the program itself.
However, declaring buffer to be static (which I've done in red to highlight my disapproval of it, which I'll come to) tells the compiler to arrange for this array to be in part of the same memory that holds the program itself - the executable instructions into which tohex and urldecode get compiled. This has several consequences. The reason many programmers use static arrays in this way is that there is a small performance advantage, especially if one wants a larger buffer than just 100 bytes. There are, however, some costs.
Forcing the array to be part of the code memory means that there will only be one array, so different threads of execution may trample on one another's data if they try to execute urldecode at the same time as one another. However, most programmers assume their code is never going to be run in a multi-threaded context - with the result that they write things which can't be re-used in such a context. Even then, using static means that the program takes up the 100 bytes of space reserved for the array even when it's not using the function that requires the array: 100 bytes may not seem like much, but the widespread use of static arrays in a program will bloat up the memory it needs to load. With an automatic array, in contrast, the array only takes up memory when the program is executing the code which uses the array.
A modern computer uses a cache
of memory into which it copies memory
that it accesses a lot, such as the memory holding the instructions that make up
our functions. Our array is part of that memory, and our code writes data into
the array. When that happens, the computer has to synchronise the cached memory
with the memory of which it's a copy. This wouldn't happen if the buffer were,
like the other local variables, automatic
- which is what it'd be if we
left out the red use of static. So, even
without threading, the code above forces gratuitous caching activity when it
writes to its static array: if the array were automatic, the code
memory would only ever be read, never written to; only memory on the stack would
get modified.
Furthermore, because the array is static, the memory adjacent to it, on which any over-run will trample, is the memory which holds the code which is being executed. If this gets over-written, the changed version may be executed - the excess data may get used as instructions to the computer, over-riding the program. This is the classic approach to buffer over-run attacks. For comparison, an automatic array is held in memory which holds the stack of local data for functions which have been invoked but are waiting for another to complete before they can resume doing their computation. Sufficient memory is set aside for the automatic array as the program calls urldecode, and relinquished when the function returns. The memory adjacent to this is local data of other functions: messing with it may well cause the program to crash. However, over-running the array won't change the memory which contains code - at least, not without exploiting parts of the stack (if any) which are used to control what code gets executed. Thus, bloating code with static arrays not only hampers threading and caching, it also raises the stakes if memory over-run happens.
Written by Eddy.