On the care and feeding of make

This started out as the slides of a talk I first gave to the Bergen Linux Users Group, on 2007/January/25th at 7pm. Since then, it's evolved somewhat; the latest version can be found at .

If you can persuade your user agent (a.k.a. browser) to display this as CSS media-type screen, you'll see the relatively verbose text; in CSS media-type projection (which Opera uses when in full-screen mode), you'll see its appearance as a presentation. Alternatively, this page has the presentation style-sheet as an alternate stylesheet (with title Slide-show); your browser may provide you with a way to switch to that, instead of the default one.

I'd like to thank Eira for giving me the excuse to visit Bergen again: I spent most of a year here from late 1994 to late 1995. If any of you remember the short-lived Jugglers' cafe at Sigurds Gate 5 that summer, I was the one serving the food most of the time. However, I normally earn my living by programming computers; I started in my first such job just a few weeks over 25 years ago, at the start of January 1982; I made the transition from ForTran to C in 1988 and from VAX to Unix in about 1990.

Over that quarter century I've had occasion to struggle with a varied assortment of problems, some of which are doubtless familiar to those here today. Over the last decade I've had more than my fair share of experience with build systems, so my chosen topic is the care and feeding of make – one of the various things that I've become intimately involved with during my nearly five years in the Linux^W Unix team at Opera Software.

Improving make

11½ years ago, I started in a new job. Due to illness, I missed most of my first week there, which constituted my overlap with a colleague who'd spent the previous several months coping with one of those ghastly projects where the objectives are coarsely stated, except for the part about "and we want it by yesterday". They'd been using recursive make and hadn't read Recursive Make Considered Harmful. Instead, they'd bodged and bashed, in all the industry-standard (but not best practice) ways, to make it only be wrong occasionally; and the result was taking unacceptably long to build their product. So Kevan was asked to write a tool which would replace make. Thankfully he did so in python, so the result was in fact maintainable by the new recruit who got to take it over when Kevan left (even if I did need to learn python first). I am fairly sure that their problems could have been solved better, and in less time, if they'd simply read Recursive Make Considered Harmful and followed its advice.

Plenty of teams have responded to problems with make by deciding to write something else to replace it. I would contend that nearly all of them would have been better off installing (an up-to-date version of) GNU make on all their build machines and taking the time to read its manual. Far too many developers have learned a small number of simple tricks with make and suppose that any problem they cannot solve with those tricks is a deficiency of make when, if they would but read the manual, they'd find the problem easy to solve. The attempts at improving make that I've seen have, generally, not been as good or as powerful as GNU make. Attempts at making it easier to configure make have tended to solve one or two particular problems – which could have been better solved by using make more competently – while obstructing my access to features of make that I need in order to solve problems they didn't think of.

To take one example, we had a make file at Opera which was generated for us by a (third-party) tool that was meant to make our lives easier. It only had to look after a small bunch of source files used for an example program, but wc reported 201 lines, 476 words and 4947 bytes. After cleaning up by using make properly, I reduced it to 55 lines, 170 words and 1333 bytes. Running bzip2 on the original only compressed it to 1429 bytes. I dread to think how much waste that helper tool would have subjected us to on make files for a large-scale compilation, rather than a noddy example program.

I don't claim to be an expert on the alternatives and helpers available for make (when I've met them I've usually given up in disgust – and solved the problem using make – before getting familiar enough with them to do them justice); nor do I deny that make (even GNU make) has its deficiencies; and the software development toolchain desperately needs something better so that we can make it redundant. I am glad to see that there is ample activity in this area, for example within the Software Carpentry project. However, make (particularly GNU make) is actually very good at its job if you take the time to learn how to use it properly.

One of the central purposes of make is tracking dependencies among resources so that it's possible to know when it's necessary (and possible) to make them. For direct and obvious dependencies, make makes this very easy. However, complex networks of dependencies arise in real software projects; tracking these requires rather more care and effort. What follows is a journey through some of the kinks and knots that can arise, particularly in very large projects. Ultimately, I might turn this into a patch to the part of GNU make's manual that I quote below; however, it seems worthwhile to share it with others before I've got it into the right form for that, if only to give many eyeballs a chance to spot any mistakes.

Dependency generation for make

One configures make by writing plain text files (always better than some impenetrable magic format intelligible only to one's IDE): these specify how to go about turning some source code (and other materials) into a final deliverable product. The make utility has become pretty complex over the three decades or so since it was invented – I shan't be surprised if GNU make is Turing complete – but the basic idea behind it hasn't changed much: each file you need to generate is created by running a command; that command uses some pre-existing files; when those files are newer than the file created by the command (or the latter doesn't exist) you need to run the command. The file to be created is called a target, the files it's made from are called prerequisites (which may be source files or the targets of other rules); and the part of a make file that says the former depends on the latter, and how to build it when needed, is called a rule.
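The core test described above (rebuild when the target is missing, or when any prerequisite is newer than it) fits in a few lines. This is a hedged Python model of just that timestamp comparison, with made-up helper names; real make also chains rules, treats phony targets specially, and looks for a rule to build a missing prerequisite rather than simply failing.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Model of make's basic test: rebuild when the target is missing or
    when any prerequisite was modified more recently than the target.
    (A missing prerequisite raises FileNotFoundError here; make would
    instead look for a rule to build it, or stop with an error.)"""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)


def touch(path, mtime_ns):
    """Create the file if needed and pin its modification time."""
    with open(path, "a"):
        pass
    os.utime(path, ns=(mtime_ns, mtime_ns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, obj = os.path.join(d, "this.c"), os.path.join(d, "this.o")
        touch(src, 1_000_000_000)
        print(needs_rebuild(obj, [src]))  # True: this.o doesn't exist yet
        touch(obj, 2_000_000_000)
        print(needs_rebuild(obj, [src]))  # False: this.o is newer than this.c
        touch(src, 3_000_000_000)
        print(needs_rebuild(obj, [src]))  # True: this.c was edited afterwards
```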

To save a lot of repetition, make supports some pattern rules, using a % to stand for an arbitrary text to appear both in the target's name and in the name of a prerequisite; and the command in a rule can refer to the names of target, $@, and prerequisites in various ways, so that the command for a pattern rule can specify a general recipe for building targets matching a given pattern from sources matching a related pattern, without the make file needing to be encumbered with information about what exact text the % matched in both cases. Thus, for example, a rule like

%.o: %.c
        $(COMPILE.c) -o $@ -c $< 

specifies (subject to some configuration causing $(COMPILE.c) to expand to something suitable) how to build an object file from a C source file. The whitespace before the command has to be an actual tab character (not some equivalent number of spaces) for make to recognise that line as a command.
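To make the stem mechanics concrete, here is a small Python model of how a one-% pattern rule is instantiated for a concrete target: find the text % matched, substitute it into the prerequisite pattern, then expand $@ and $<. It is a simplification (GNU make has extra handling for slash-free patterns matching targets in subdirectories, for choosing among several candidate rules, and more), and the function names are hypothetical.

```python
def match_pattern(pattern, name):
    """Return the stem that % matches when `name` fits `pattern`, else None.
    As in make, the pattern has one %, and the stem must be non-empty."""
    prefix, _, suffix = pattern.partition("%")
    if (len(name) > len(prefix) + len(suffix)
            and name.startswith(prefix) and name.endswith(suffix)):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def instantiate(target_pattern, prereq_pattern, recipe, target):
    """Apply a one-prerequisite pattern rule to a concrete target: derive the
    prerequisite from the stem, then expand $@ (target) and $< (first prereq)."""
    stem = match_pattern(target_pattern, target)
    if stem is None:
        return None
    prereq = prereq_pattern.replace("%", stem, 1)
    command = recipe.replace("$@", target).replace("$<", prereq)
    return prereq, command


print(instantiate("%.o", "%.c", "$(COMPILE.c) -o $@ -c $<", "this.o"))
# ('this.c', '$(COMPILE.c) -o this.o -c this.c')
```

Note that $(COMPILE.c) is left untouched: expanding ordinary variables is a separate step from expanding the automatic ones.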

However, to take the above rule as an example, the generated object file may actually depend on a great deal more than the C source file that's referenced in the command to build it. The C source file can, by way of #include directives, pull in code from diverse other files. If any of these changes, even when the C source file hasn't changed, you (potentially) need to regenerate the object file. A make rule is actually allowed to omit the command used for building the target and merely declare that the target depends on some other file: so one approach to this problem is to supplement the above pattern rule (which provides the command we'll need) with a set of dependency declarations, like

this.o: that.h ../other/thing.h

Then, if either that.h or ../other/thing.h changes, make shall know that it needs to run the command it gets from its %.o: %.c rule to regenerate this.o. However, on a large project, it gets quite laborious to keep track of all the things that each object file depends on – especially as this changes and varies extensively as the project's various .h files (which can themselves use #include directives) change which of one another they pull into each compilation. In practice, the only sensible way to handle this is to generate this dependency information automatically. Fortunately, with a little help from one's compiler, this is entirely possible.
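The compiler is the right tool for generating these lists, since it honours include paths, conditionals and macros. Still, the underlying idea, a transitive closure over #include directives, is simple enough to sketch. This hypothetical Python scanner follows only quoted #include "..." directives, resolved relative to the including file's directory. It illustrates the closure and is not a substitute for gcc -M.

```python
import os
import re

# Only the quoted form, #include "file.h"; <...> includes are ignored.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.MULTILINE)


def header_deps(source, seen=None):
    """Transitively collect quoted #include files reachable from `source`.
    No -I search path, no #if handling, missing files silently skipped."""
    if seen is None:
        seen = set()
    with open(source) as f:
        text = f.read()
    for name in INCLUDE_RE.findall(text):
        path = os.path.normpath(os.path.join(os.path.dirname(source), name))
        if path not in seen and os.path.exists(path):
            seen.add(path)
            header_deps(path, seen)  # headers can include headers
    return seen


def make_rule(obj, source):
    """Render a dependency line such as 'this.o: that.h ../other/thing.h'."""
    return f"{obj}: " + " ".join(sorted(header_deps(source)))
```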

The received wisdom

The GNU make manual (4.14: Generating Prerequisites Automatically) has this to say (inter alia):

   The practice we recommend for automatic prerequisite generation is
to have one makefile corresponding to each source file.  For each
source file `NAME.c' there is a makefile `NAME.d' which lists what
files the object file `NAME.o' depends on.  That way only the source
files that have changed need to be rescanned to produce the new
prerequisites.

   Here is the pattern rule to generate a file of prerequisites (i.e.,
a makefile) called `NAME.d' from a C source file called `NAME.c':

     %.d: %.c
             @set -e; rm -f $@; \
              $(CC) -M $(CPPFLAGS) $< > $@.$$$$; \
              sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
              rm -f $@.$$$$ 

*Note Pattern Rules::, for information on defining pattern rules.  The
`-e' flag to the shell causes it to exit immediately if the `$(CC)'
command (or any other command) fails (exits with a nonzero status).  

   With the GNU C compiler, you may wish to use the `-MM' flag instead
of `-M'.  This omits prerequisites on system header files.  *Note
Options Controlling the Preprocessor: (gcc.info)Preprocessor Options,
for details. 

   The purpose of the `sed' command is to translate (for example): 

     main.o : main.c defs.h 

into:

     main.o main.d : main.c defs.h 

This makes each `.d' file depend on all the source and header files
that the corresponding `.o' file depends on.  `make' then knows it must
regenerate the prerequisites whenever any of the source or header files
changes. 

   Once you've defined the rule to remake the `.d' files, you then use
the `include' directive to read them all in.  *Note Include::.  For
example:

     sources = foo.c bar.c

     include $(sources:.c=.d)
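The sed command in the quoted rule only rewrites the target part of the compiler's output. A rough Python equivalent, handy for experimenting with the regular expression, might look like this. The function name is hypothetical, and unlike the sed version it escapes the stem, so a . in a file name is matched literally.

```python
import re


def add_d_target(dep_text, stem, d_file):
    r"""Python rendering of sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' where
    $* is `stem` and $@ is `d_file`: list the .d file as a second target."""
    pattern = "(" + re.escape(stem) + r")\.o[ :]*"
    return re.sub(pattern, lambda m: m.group(1) + ".o " + d_file + " : ",
                  dep_text)


print(add_d_target("main.o: main.c defs.h", "main", "main.d"))
# main.o main.d : main.c defs.h
```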

Tidy that up a bit

That's all nice, but anyone who's read Recursive Make Considered Harmful and followed its advice wants to make files in a directory hierarchy under the control of a single make process (albeit configured by files spread throughout the hierarchy). The %.d rule we've got only does the job for a file in the current directory, since the dependency file gcc emits only names the .o file's basename, without any path. (This may have been changed since I first did this, or be a sign that I gave gcc the wrong flags; YMMV.) This is sensible enough: the compiler has no idea where you are planning to put your generated files – alongside your source files, or elsewhere for an out-of-source build. So we'll have to tweak that sed command a bit to take account of this.

The approach above calls for a .d file for every .o file you might need to compile – but, when the .o file doesn't exist, you don't need its .d file at all: the .o will be built anyway. So it's actually worth trimming the list of .d files we include to only those whose .o files exist. You'll still need to generate the skipped .d files for your next run of make, but this skips them on the first – and newer versions of gcc give us a way to generate the .d files as a side-effect of compiling their matching .o files, so they should exist (unless someone's manually deleted some) after the first build, without any need to make them separately.

When creating each .d file as above, we can dispense with the $@.$$$$ intermediate file by using a pipe downstream of the compilation step with -M. However, when generating dependency information as a side-effect, you don't have the option of sending it down a pipe; you just pass a filename to -MMD -MF and pick it up after the compilation has generated its .o file. So now you need a separate extension – I use .D – for the direct dependency file, which you subsequently process to produce the .d file. But this is good in any case, since it lets you put that ugly sed command (which I'm going to make a lot uglier later) in one rule, instead of having to repeat it for each type of source file – %.d: %.cpp for C++ code, as well as the rule we already have for plain C source, for example. We still need a rule to produce the .D file from sources in case it gets deleted somehow while the .o file exists, but at least we can isolate the hairy sed.

As an incidental bonus, generating .d as a side-effect of compiling its .o ensures that we'll regenerate the former whenever we regenerate the latter, so we don't need to declare the former to depend on everything the latter depends on – which would have forced wanton regeneration of the .d during make file parsing, when the .o exists. The .d is out of date, but it tells us we need to re-build the .o, which is all we needed to know.

So, now we need:

%.o: CPPFLAGS += -MMD -MF $(@:.o=.D)

%.d: %.D
       sed 's!.*$(@F:.d=.o) *: *!$(@:.d=.o): !g' $< > $@
$(GENROOT)/%.D: $(SRCROOT)/%.c
       $(CC) -M $(CPPFLAGS) $< >$@
$(GENROOT)/%.D: $(SRCROOT)/%.cpp
       $(CXX) -M $(CPPFLAGS) $< >$@

object := $(patsubst $(SRCROOT)/%.cpp, $(GENROOT)/%.o, \
        $(sources:$(SRCROOT)/%.c=$(GENROOT)/%.o))
gotobj := $(wildcard $(object))
include $(gotobj:.o=.d)
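The object-list and include logic above, plus the target-fixing sed, can be mirrored in Python to check what the make file will actually read. This is a hedged model rather than a replacement for the make rules: the function names are invented, paths are assumed to use / as the separator, and $(SRCROOT) and $(GENROOT) become plain string arguments.

```python
import os
import re


def object_for(source, srcroot, genroot):
    """$(SRCROOT)/x.c or $(SRCROOT)/x.cpp -> $(GENROOT)/x.o; other words are
    unchanged, as with the substitution reference and $(patsubst ...)."""
    prefix = srcroot + "/"
    for ext in (".c", ".cpp"):
        if source.startswith(prefix) and source.endswith(ext):
            return genroot + "/" + source[len(prefix):-len(ext)] + ".o"
    return source


def deps_to_include(sources, srcroot, genroot):
    """$(wildcard $(object)) then $(gotobj:.o=.d): only read the .d files
    whose .o already exists; missing objects get built regardless."""
    objects = [object_for(s, srcroot, genroot) for s in sources]
    return [o[:-len(".o")] + ".d" for o in objects if os.path.exists(o)]


def fix_target(d_path, raw_deps):
    """sed 's!.*$(@F:.d=.o) *: *!$(@:.d=.o): !g': whatever the compiler called
    the target (bare name or full path), replace it with the real .o path."""
    obj_path = d_path[:-len(".d")] + ".o"
    pattern = ".*" + re.escape(os.path.basename(obj_path)) + " *: *"
    return re.sub(pattern, lambda m: obj_path + ": ", raw_deps)
```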

When files go missing

OK, so you've done a build. When you make changes, only the files that need it get re-compiled. Everything is nice. So, time to update your source tree and find out what your colleagues have broken today: cvs up. But what if one of the header files gets removed? Presumably, whoever removed it also removed all the #include directives that referenced it, so we should have no problem. However, our .d files say that they and their .o files depend on the lost header file. We run make and it pulls in our .d files; then it checks to see if any of them need to be regenerated, due to changes in things they depend on. It wants to rebuild any that need it and restart loading its make files with up-to-date versions. But it finds that there has been a change in something some .d files depend on, and it can't regenerate them because they depend on something that's gone missing. So make barfs.

Of course, you can remove the offending .d files and regenerate them; they won't depend on the missing header after that, so it'll all be fine. However, you do need to remove them, since they're what's saying they depend on the missing files. This is manageable in a tiny project, but not in a large-scale one. So we need to hack our .d files a bit more. The file tells make about things that, if they change, need us to recreate our target. So we still need to respond to changes in any of these that do exist, but we need to be able to ignore any that have gone missing. Helpfully, make provides a function to do that: $(wildcard ...) expands to just the files that exist, among those listed as its parameters. So we just need to wrap the list of files our .d and .o depend on in that. The list of files may be spread over many lines, using \ on the end of each to continue onto the next; so we put $(wildcard after the : that follows the names of our two targets, and a final ) on the line that doesn't end in a \.
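A tiny Python model of that trick, with hypothetical names: first the filtering that $(wildcard ...) performs on plain file names, then the textual rewrite that wraps a dependency list in it. Once a header is deleted, the wrapped prerequisite list simply shrinks instead of naming a file make cannot build.

```python
import os


def wildcard(names):
    """Like make's $(wildcard ...) given plain names (no glob characters):
    keep only the files that currently exist."""
    return [n for n in names if os.path.exists(n)]


def wrap_in_wildcard(dep_text):
    """Rewrite a possibly backslash-continued 'x.o: a.h b.h' dependency
    list as 'x.o: $(wildcard a.h b.h)'. Continuation lines are joined
    first, as make itself does when reading a make file."""
    flat = dep_text.replace("\\\n", " ")
    target, _, prereqs = flat.partition(":")
    return f"{target.strip()}: $(wildcard {' '.join(prereqs.split())})"
```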

But hang on – what if some of the things we depend on can be regenerated? They may go missing when we make clean, and we need to exercise some rule to bring them back. If we try to compile our .o, or regenerate our .D, without them, the compiler's going to fail. So we still need to regenerate them – which means we need our targets to depend on them even if they don't exist. So we need to leave them out of our $(wildcard ...), which means we need to close the parenthesis before, and re-open the wild-card after, each file we know how to regenerate. So set up a variable, GENSRC, that lists them. Then we can hack that sed command to close the parenthesis before each of these (and before the actual source file), re-opening the wildcard after:

SPACE := $(EMPTY) # a single space character
OrGenSrc := $(subst $(SPACE),,$(GENSRC:%=\|%))

%.d: %.D
        sed \
    -e 's|^.*$(@F:.d=)\..*: *|$(@:.d=.o): $$(wildcard |' \
    -e 's!\([^ ]*$(*F)\.cp*$(OrGenSrc)\) *!) \1 $$(wildcard !g' \
    -e 's|\([^\\]\)$$|\1 )|' -e 's|\$$(wildcard *) *||g' $< > $@ 

Pretty it ain't, but it works. (But the second -e's parameter can get quite long – FreeBSD's sed silently truncates expressions over 1066 bytes long, it would appear, which forced me to restructure this expression, making it even uglier.) Note that the last bit is taking out any stray instances of empty $(wildcard ) that've resulted from all our hackery; they did no harm, but we may as well clean them away.
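Because the sed is hard to read, it can help to replay the same four substitutions in Python against a sample dependency file. The sketch below processes one line at a time, as sed does; the function name, the sample paths and the GENSRC contents are all assumptions for illustration. One deliberate difference: here the [^ ]* prefix (written \S*) applies to every alternative, not only to the source-file one, so a GENSRC entry also matches when the compiler writes it with a leading directory.

```python
import re


def wildcard_deps(raw, d_path, gensrc=()):
    """Replay the four sed expressions from the %.d: %.D rule, line by line."""
    obj = d_path[:-len(".d")] + ".o"                  # $(@:.d=.o)
    name = obj.rsplit("/", 1)[-1][:-len(".o")]        # $(@F:.d=), also $(*F)
    keep = [re.escape(name) + r"\.cp*"] + [re.escape(g) for g in gensrc]
    target_re = re.compile(r"^.*" + re.escape(name) + r"\..*: *")
    keep_re = re.compile(r"(\S*(?:" + "|".join(keep) + r")) *")
    empty_re = re.compile(r"\$\(wildcard *\) *")
    out = []
    for line in raw.splitlines():
        # 1: the target becomes the real .o path and opens a $(wildcard
        line = target_re.sub(lambda m: obj + ": $(wildcard ", line, count=1)
        # 2: close the wildcard around the source file and GENSRC entries
        line = keep_re.sub(lambda m: ") " + m.group(1) + " $(wildcard ", line)
        # 3: a line not continued with a backslash closes the last wildcard
        if line and not line.endswith("\\"):
            line += " )"
        # 4: drop any empty $(wildcard ) left behind
        line = empty_re.sub("", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, with GENSRC holding gen/config.h, the raw line foo.o: src/foo.c src/foo.h gen/config.h followed by a continuation line src/common.h comes out with src/foo.c and gen/config.h as plain prerequisites and the two headers inside $(wildcard ...).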

Autogenerating directories

While we're at it, it's nice to keep one's generated files separate from one's source tree; for example, you can then switch between debug builds (under one generated directory) and optimised ones (under another) without having to make clean and re-build everything in between. So the .o files and .d files should go somewhere other than where the source files are. You might remember I put paths on the source and dependency files earlier:

$(GENROOT)/%.D: $(SRCROOT)/%.c
       $(CC) -M $(CPPFLAGS) $< >$@
$(GENROOT)/%.D: $(SRCROOT)/%.cpp
       $(CXX) -M $(CPPFLAGS) $< >$@

Your rules for .o and other generated files need similar work, of course. That's all nice and simple, but it can't possibly work unless something is going to make all the directories it calls for – which means a directory tree under $(GENROOT) mirroring the one under $(SRCROOT). You could do that brutally by running

(cd $(SRCROOT) && find . -type d -print0) | \
(cd $(GENROOT) && xargs -0r mkdir -p)

but there may be revision-control subdirectories, documentation, test data or any manner of other cruft in your source tree – it'd be cleaner to only generate the directories we need. So naturally we want to use dependencies in make to drive that for us.

The obvious approach is for each output file to depend on the directory it needs to go in; we then have a mkdir rule for each directory, and we're done. However, this doesn't work the way you'd like: it forces you to remake everything all the time. Last time you ran make, you created the directory and added a bunch of files in it. But adding a file to a directory changes the modified time of the directory. So now the directory is more recent than all but at most one of the files in it: each of which depends on the directory, so now thinks it needs to be re-built – which shall make all the other files out of date relative to the directory again. That won't do.
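The directory-timestamp problem is easy to observe directly. This Python sketch (names hypothetical) assumes a POSIX-style filesystem, where creating an entry updates the containing directory's modification time. It backdates an existing object and its directory, adds a sibling file, and checks whether the first object now looks older than the directory it would depend on.

```python
import os
import tempfile


def stale_after_sibling_added(directory):
    """Build one.o, backdate it and its directory by an hour, then create
    two.o alongside it. Returns True if one.o is now older than the
    directory, i.e. a rule 'one.o: <directory>' would rebuild it again."""
    one = os.path.join(directory, "one.o")
    open(one, "w").close()
    hour_ago = os.stat(one).st_mtime - 3600
    os.utime(one, (hour_ago, hour_ago))
    os.utime(directory, (hour_ago, hour_ago))
    open(os.path.join(directory, "two.o"), "w").close()  # new entry bumps dir mtime
    return os.stat(directory).st_mtime > os.stat(one).st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(stale_after_sibling_added(d))
```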

Autogenerating needed directories

So, instead, make each output file depend on a .exists touch-file in its own directory; then the rule for the touch file makes the directory before touching its target. It just remains to make everything depend on suitable touch-files. While we're at it, if we make each touch-file rule depend on its parent directory's touch-file rule, we'll be able to skip the -p flag to mkdir. The remaining problem is simply to construct the rules we need to say that each file depends on its directory's touch file; which turns out to be a bit fiddly.

%/.exists:
        mkdir $(@D) && touch $@

define ObjDirTemplate
$(addprefix $1, .o .D): $(dir $1).exists
endef
$(foreach D, $(object:%.o=%), $(eval $(call ObjDirTemplate,$D)))

define DirDirTemplate
$1 $1/.exists: $$(if $$(wildcard $(dir $1)),,$(dir $1).exists)
endef
$(foreach D, $(patsubst %/,%,$(sort $(dir $(object)))), \
        $(eval $(call DirDirTemplate,$D))) 

(and you really don't want to see the evil bodge I needed to achieve equivalent results in versions of make too old to support the $(eval ...) construct relied on here). Note that the $(sort …) is used here to remove duplicates (which this function does, though this might not be obvious from its name); we don't care about the order of entries, but we've mapped each object file's name to the directory it's in, so some directories may be duplicated. Since I make each .d file from a .D file in the same directory (which thus already exists before we try to make the .d), I only need to make the .D file (and the .o) depend on the directory's existence.
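The touch-file scheme can be modelled in Python, both to see which rules the two templates generate and to check that creating parents before children lets a plain mkdir (no -p) succeed. Both functions are hypothetical sketches. dir_rules takes an is_dir callback standing in for $(wildcard ...). Unlike a bare mkdir, ensure_exists also tolerates a directory that already exists.

```python
import os
import posixpath


def dir_rules(objects, is_dir=os.path.isdir):
    """Text of the rules the ObjDirTemplate/DirDirTemplate pair would generate."""
    lines = []
    for obj in objects:
        base = obj[:-len(".o")]                         # $(object:%.o=%)
        obj_dir = posixpath.dirname(obj) or "."
        lines.append(f"{base}.o {base}.D: {obj_dir}/.exists")
    # $(sort ...) both orders and removes duplicate directories
    for d in sorted({posixpath.dirname(o) or "." for o in objects}):
        parent = posixpath.dirname(d) or "."
        # $(if $(wildcard parent/),,parent/.exists)
        prereq = "" if is_dir(parent) else f" {parent}/.exists"
        lines.append(f"{d} {d}/.exists:{prereq}")
    return "\n".join(lines)


def ensure_exists(directory, made=None):
    """Make DIR/.exists the way the rule chain would: parent first, then a
    single-level mkdir, then touch the stamp. Returns the stamps created."""
    if made is None:
        made = []
    stamp = os.path.join(directory, ".exists")
    if os.path.exists(stamp):
        return made
    parent = os.path.dirname(directory)
    if parent and not os.path.isdir(parent):
        ensure_exists(parent, made)                     # prerequisite: parent/.exists
    if not os.path.isdir(directory):
        os.mkdir(directory)                             # mkdir $(@D), no -p needed
    open(stamp, "w").close()                            # touch $@
    made.append(stamp)
    return made
```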

Packaging objects into archives

In a big project, one can have so many .o files (e.g. > 2300) that the command-line to ar ends up being too long for the shell (> 100 kB). This is particularly apt to happen when doing out-of-source builds because, even when using relative paths, your $(GENDIR) is apt to add quite a lot (49 bytes in my case, adding another >100 kB) to the length of each object file's name – and to the library file's name.

$(GENDIR)/libhuge.a: $(object)
        $(AR) $(ARFLAGS) $@ $^ 

ends up being over 240 kB of text. One solution is to use make's magic library file syntax:

$(GENDIR)/libhuge.a: $(object:%=$(GENDIR)/libhuge.a(%))
        $(RANLIB) $@
$(GENDIR)/libhuge.a(%): %
        $(AR) $(ARFLAGS) $@ $< 

(and actually the second rule here is one of make's built-ins, so we could skip it) which solves the command-line length problem – but it's disgustingly slow, even when leaving out the r flag from $(ARFLAGS) (which is why we need to run $(RANLIB) once we're done). If you've got some sensible way to split up your $(object) list into smaller chunks, you can add each chunk to the library as a single command to get a solution part-way between the two above; but it's pretty ghastly to implement and still fairly slow.
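The sizes quoted above are consistent: 2300 object names, each lengthened by a 49-byte $(GENDIR) prefix, gain about 2300 × 49 = 112,700 bytes between them, which is the "another >100 kB". For the part-way approach of adding objects in chunks, the splitting logic itself is simple. Here is a Python sketch of a greedy split under a byte limit; the function name and the limits in the example are assumptions, and a real limit depends on the platform's argument-size limit and environment size.

```python
def chunk_args(paths, limit, fixed_len=0):
    """Greedily split `paths` into groups whose space-separated length, plus
    `fixed_len` bytes for the fixed part of the command (say, 'ar rv
    lib.a '), stays within `limit`. A single over-long path still gets its
    own group rather than being dropped."""
    chunks, current, size = [], [], fixed_len
    for p in paths:
        extra = len(p) + (1 if current else 0)      # +1 for the separating space
        if current and size + extra > limit:
            chunks.append(current)
            current, size, extra = [], fixed_len, len(p)
        current.append(p)
        size += extra
    if current:
        chunks.append(current)
    return chunks


names = ["gen/obj%04d.o" % i for i in range(300)]
for group in chunk_args(names, 1000, fixed_len=20):
    print(len(group), "objects in this ar command")
```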

Packaging objects into archives quickly

My colleague Joakim Bengtsson deserves credit for the following inspired piece of hackery:

$(GENDIR)/libhuge.a: libhuge-objtmpdir $(object)
        $(foreach O,$(?:libhuge-objtmpdir=),$(shell ln -f $O $(@D)/objtmp/)) \
                cd $(@D)/objtmp; \
                $(AR) $(ARFLAGS) ../$(@F) $(notdir $(?:libhuge-objtmpdir=)) || failed=yes; \
                cd ..; rm -fr objtmp; [ -z "$$failed" ]

.PHONY: libhuge-objtmpdir
libhuge-objtmpdir: $(GENDIR)/.exists
        rm -fr $(<D)/objtmp; mkdir $(<D)/objtmp 

This populates a temporary directory with hard links to the object files so that we can run the command in that directory, with no path component on the file names (except ../ on the library itself); the failed variable is used to propagate any failure of $(AR) past the tidy-up that follows it. Note that the $(foreach ...) is evaluated by make in the course of preparing to run the command; this causes make to run one ln -f process per (changed) object file, collecting the resulting (empty) output and including it as part of the command it executes (where it gets ignored, because it was empty). The uses of $(?:libhuge-objtmpdir=), by the way, just list the prerequisites excluding the phony libhuge-objtmpdir; so they could just say $(object), but it'd get messier if our list of objects were more complex (which it was, in the real make files on which this is based). Even with this hack, our command-line to ar was over 41½ kB.
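For readers who find the shell-in-make hard to follow, here is the same idea as a Python sketch: hard-link the changed objects into an objtmp directory next to the library, run ar there with bare names and ../libname.a, and tidy up even when ar fails. The function names are hypothetical and nothing here is taken from the original make files. Two objects with the same base name would collide in objtmp (os.link raises FileExistsError), just as they would clash as archive members.

```python
import os
import shutil
import subprocess


def stage_for_ar(library, changed_objects, ar="ar", arflags="rv"):
    """Hard-link the changed objects into <libdir>/objtmp and return
    (workdir, argv) for an ar run that uses only bare file names."""
    libdir = os.path.dirname(library) or "."
    tmp = os.path.join(libdir, "objtmp")
    shutil.rmtree(tmp, ignore_errors=True)      # rm -fr objtmp
    os.mkdir(tmp)                               # mkdir objtmp
    for obj in changed_objects:
        # ln: raises FileExistsError if two objects share a base name
        os.link(obj, os.path.join(tmp, os.path.basename(obj)))
    argv = [ar, arflags, "../" + os.path.basename(library)]
    argv += [os.path.basename(obj) for obj in changed_objects]
    return tmp, argv


def update_archive(library, changed_objects):
    """Run ar inside objtmp; the finally clause removes objtmp whether or not
    ar succeeded, and check=True still reports a failure afterwards."""
    tmp, argv = stage_for_ar(library, changed_objects)
    try:
        subprocess.run(argv, cwd=tmp, check=True)
    finally:
        shutil.rmtree(tmp)
```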

One problem I still haven't solved for libraries is what happens when a .o file goes away. If a .c or .cpp file is removed from your $(sources) – e.g. by a version-control update, so you don't necessarily know this has happened – your existing libhuge.a still contains the corresponding .o even though it no longer should. Since some of the code from the removed source file has usually moved elsewhere (indeed, the source file may simply have been renamed), this can lead to duplicate symbol errors from your linker. The removed source file may have referenced symbols no longer supplied by other files that were updated when it was removed; this can lead to missing symbol errors from the linker. One would need to run ar t on the archive, identify entries not in $(notdir $(object)) and run ar d on each of these. The brave can do this manually but it's easier to just rm libhuge.a when these errors show up and let make regenerate it.
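The manual clean-up described above is mechanical enough to script. Here is a hedged Python sketch with made-up function names. It assumes a GNU-style ar whose t listing prints one member name per line and does not list the symbol index, and member names without spaces.

```python
import os
import subprocess


def stale_members(members, objects):
    """Members listed by 'ar t' that match no current object file,
    i.e. names not in $(notdir $(object))."""
    wanted = {os.path.basename(obj) for obj in objects}
    return [m for m in members if m not in wanted]


def prune_archive(library, objects, ar="ar"):
    """List the archive, then 'ar d' any members that no longer belong."""
    listing = subprocess.run([ar, "t", library], check=True,
                             capture_output=True, text=True).stdout
    stale = stale_members(listing.split(), objects)
    if stale:
        subprocess.run([ar, "d", library, *stale], check=True)
    return stale
```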

Conclusion

GNU make is mighty.

Unix is user friendly, it's just picky about who its friends are. — Tollef Fog Heen

Addenda

Assorted folk had favourable things to say about various other tools to do similar jobs. Dag subsequently sent me a link to a page about CMake (which is a helper to generate make-files, akin to automake). I might add further links to other related tools here, if I find them interesting.

Make deals poorly with commands that generate several files at once, and with commands that only write to their output files when the content has changed; the combination is particularly painful. The latter leaves files out-of-date relative to what they depend on, so the command is always run, even when it isn't needed. The former shall run the command once per file if one makes each generated file a target of a rule that runs the command; so one has to have a .PHONY rule on which they all depend, which runs the command. It would be nice to support only-write-on-change, but doing so would require an extra time-stamp on each file; make would need to keep track of both the last time the file was changed and the latest time at which the file was known to be up-to-date.
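The "only write on change" behaviour itself is easy to implement in the generator; what make lacks is the second timestamp. A Python sketch of the generator side (the function name is hypothetical) shows why the two pull against each other. Skipping the write keeps the old modification time, so dependents are spared a rebuild, but make still sees the file as older than its prerequisites.

```python
import os


def write_if_changed(path, content):
    """Only rewrite `path` when its content differs; returns True if written.
    When nothing is written the file keeps its old mtime, so files depending
    on it are not rebuilt, but the generator will look out of date again
    on the next run of make."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w") as f:
        f.write(content)
    return True
```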

See Also

Other folk have written about GNU make, too.


Written by Eddy.