This started out as the slides of a talk I first gave to the Bergen Linux Users Group, on 2007/January/25th at 7pm. Since then, it's evolved somewhat; the latest version can be found at .
If you can persuade your user agent (a.k.a. browser) to display this in CSS
media-type screen, you'll see the relatively verbose text; in CSS media-type
projection (which Opera uses when in full-screen mode) you'll see its
appearance as a presentation. Alternatively, this page has the presentation
style-sheet as an alternate stylesheet (with title Slide-show); your browser
may provide you with a way to switch to that, instead of the default one.
I'd like to thank Eira for giving me the excuse to visit Bergen again: I spent most of a year here from late 1994 to late 1995. If any of you remember the short-lived Jugglers' cafe at Sigurds Gate 5 that summer, I was the one serving the food most of the time. However, I normally earn my living by programming computers; I started in my first such job just a few weeks over 25 years ago, at the start of January 1982; I made the transition from ForTran to C in 1988 and from VAX to Unix in about 1990.
Over that quarter century I've had occasion to struggle with a varied assortment of problems, some of which are doubtless familiar to those here today. Over the last decade I've had more than my fair share of experience with build systems, so my chosen topic is the care and feeding of make – one of the various things that I've become intimately involved with during my nearly five years in the Linux^W Unix team at Opera Software.
Improving make
Recursive Make Considered Harmful also helps
11½ years ago, I started in a new job. Due
to illness, I missed most of my first week there, which constituted my overlap
with a colleague who'd spent the previous several months coping with one of
those ghastly projects where the objectives are coarsely stated, except for the
part about "and we want it by yesterday". They'd been using
recursive make and hadn't read Recursive Make
Considered Harmful. Instead, they'd bodged and bashed, in all the
industry-standard (but not best practice) ways to make it only be wrong
occasionally; and the result was taking unacceptably long to build their
product. So Kevan was asked to write a tool which would
replace make. Thankfully he did so
in python so the result was in
fact maintainable by the new recruit who got to take it over when Kevan left
(even if I did need to learn python first). I am fairly sure that
their problems could have been solved better and in less time if they'd simply
read Recursive Make Considered Harmful
and followed up on its advice.
Plenty of teams have responded to problems with make by deciding
to write something else to replace it. I would contend that nearly all of them
would have been better off installing (an up-to-date version of)
GNU make on all their build machines and taking the time to read its
manual. Far too many developers have learned a small number of simple tricks
with make and suppose that any problem they cannot solve with those
tricks is a deficiency of make when, if they would but read the
manual, they'd find the problem easy to solve. The attempts
at improving
make that I've seen have, generally, not been as
good or as powerful as GNU make. Attempts at making it easier
to configure make have tended to solve one or two particular problems
– which could have been better solved by using make more
competently – while obstructing my access to features of make
that I need in order to solve problems they didn't think of.
To take one example, we had a make file at Opera which was
generated for us by a (third-party) tool that was meant to make our lives
easier. It only had to look after a small bunch of source files used for an
example program, but wc reported 201 lines, 476 words and 4947
bytes. After cleaning up by using make properly, I reduced it to 55
lines, 170 words and 1333 bytes. Running bzip2 on the original only
compressed it to 1429 bytes. I dread to think how much waste that helper
tool would have subjected us to on make files for a large-scale compilation,
rather than a noddy example program.
I don't claim to be an expert on the alternatives and helpers available for make (when I've met them I've usually given up in disgust – and solved the problem using make – before getting familiar enough with them to do them justice); nor do I deny that make (even GNU make) has its deficiencies; and the software development toolchain desperately needs something better so that we can make it redundant. I am glad to see that there is ample activity in this area, for example within the Software Carpentry project. However, make (particularly GNU make) is actually very good at its job if you take the time to learn how to use it properly.
One of the central purposes of make is tracking dependencies among resources so that it's possible to know when it's necessary (and possible) to make them. For direct and obvious dependencies, make makes this very easy. However, complex networks of dependencies arise in real software projects; tracking these requires rather more care and effort. What follows is a journey through some of the kinks and knots that can arise, particularly in very large projects; ultimately, I might turn this into a patch to the part of GNU make's manual that I quote below; however, it seems worth-while to share with others before I've got it into the right form for that, if only to give many eyeballs a chance to spot any mistakes.
One configures make by writing plain
text files (always better than some impenetrable magic format intelligible only
to one's IDE): these specify how to go about turning some source code (and other
materials) into a final deliverable product. The make utility has
become pretty complex over the three decades or so since it was invented –
I shan't be surprised if GNU make is Turing complete – but
the basic idea behind it hasn't changed much: each file you need to generate is
created by running a command; that command uses some pre-existing files; when
those files are newer than the file created by the command (or the latter
doesn't exist) you need to run the command. The file to be created is called
a target, the files it's made from are called prerequisites (which
may be source files or the targets of other rules); and the part of a
make file that says the former depends on the latter, and how to build it when
needed, is called a rule.
To save a lot of repetition, make supports some pattern
rules, using a % to stand for arbitrary text appearing both in
the target's name and in the name of a prerequisite; and the command in a rule
can refer to the name of the target, $@, and the names of prerequisites in
various ways, so that the command for a pattern rule can specify a general
recipe for building targets matching a given pattern from sources matching a
related pattern, without the make file needing to be encumbered with
information about what exact text the % matched in each case. Thus, for
example, a rule like
%.o: %.c
$(COMPILE.c) -o $@ -c $<
specifies (subject to some configuration
causing $(COMPILE.c) to expand to something suitable) how to build
an object file from a C source file. The space before the command has to be an
actual tab character (not some equivalent number of spaces) for make
to recognise that line as a command; hopefully, it's styled such that you can
distinguish it from the ordinary spaces elsewhere in the example – this
page uses that style consistently for tab characters.
However, to take the above rule as example, the generated object file may
actually depend on a great deal more than the C source file that's referenced in
the command to build the object file. The C source file can, by way
of #include
directives, pull in code from diverse other files. If
any of these changes, even when the C source file hasn't changed, you
(potentially) need to regenerate the object file. A make rule is
actually allowed to omit the command used for building the target and merely
declare that the target depends on some other file: so one approach to this
problem is to supplement the above pattern rule (which provides the command
we'll need) with a set of dependency declarations,
like
this.o: that.h ../other/thing.h
Then, if either that.h
or ../other/thing.h changes, make shall know that it needs
to run the command it gets from its %.o: %.c rule to
regenerate this.o; however, on a large project, it gets quite
laborious to keep track of all the things that each object file depends on
– especially as this changes and varies extensively as the project's
various .h files (which can themselves use #include
directives) change which of one another they pull in to each compilation. In
practice, the only sensible way to handle this is to generate this dependency
information automatically. Fortunately, with a little help from one's compiler,
this is entirely possible.
The GNU make manual (4.14: Generating Prerequisites Automatically) has this to say (inter alia):
The practice we recommend for automatic prerequisite generation is to have one makefile corresponding to each source file. For each source file `NAME.c' there is a makefile `NAME.d' which lists what files the object file `NAME.o' depends on. That way only the source files that have changed need to be rescanned to produce the new prerequisites. Here is the pattern rule to generate a file of prerequisites (i.e., a makefile) called `NAME.d' from a C source file called `NAME.c':

%.d: %.c
	@set -e; rm -f $@; \
	 $(CC) -M $(CPPFLAGS) $< > $@.$$$$; \
	 sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
	 rm -f $@.$$$$

*Note Pattern Rules::, for information on defining pattern rules. The `-e' flag to the shell causes it to exit immediately if the `$(CC)' command (or any other command) fails (exits with a nonzero status). With the GNU C compiler, you may wish to use the `-MM' flag instead of `-M'. This omits prerequisites on system header files. *Note Options Controlling the Preprocessor: (gcc.info)Preprocessor Options, for details. The purpose of the `sed' command is to translate (for example):

main.o : main.c defs.h

into:

main.o main.d : main.c defs.h

This makes each `.d' file depend on all the source and header files that the corresponding `.o' file depends on. `make' then knows it must regenerate the prerequisites whenever any of the source or header files changes. Once you've defined the rule to remake the `.d' files, you then use the `include' directive to read them all in. *Note Include::. For example:

sources = foo.c bar.c
include $(sources:.c=.d)
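The sed step's effect is easy to check in isolation; in this sketch the automatic variables are substituted by hand ($* as main, $@ as main.d):

```shell
# The manual's sed command, with $* -> main and $@ -> main.d filled in:
echo 'main.o : main.c defs.h' |
sed 's,\(main\)\.o[ :]*,\1.o main.d : ,g'
# prints: main.o main.d : main.c defs.h
```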
That's all nice, but anyone who's read Recursive Make Considered Harmful and followed its advice wants to make files in a directory hierarchy under the control of a single make process (albeit configured by files spread throughout the hierarchy). The %.d rule we've got only does the job for a file in the current directory, since the dependency file gcc emits only names the .o file's basename, without any path. (This may have been changed since I first did this, or be a sign that I gave gcc the wrong flags; YMMV.) This is sensible enough: the compiler has no idea where you are planning to put your generated files – alongside your source files, or elsewhere for an out-of-source build. So we'll have to tweak that sed command a bit to take account of this.
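As a sketch of that tweak (the paths and names here are illustrative, not the exact ones used later), the sed just needs to prefix the chosen output directory onto the bare object name and add the .d file as a second target:

```shell
# gcc's -M output names only the object file's basename:
echo 'foo.o: src/dir/foo.c include/bar.h' > foo.D

# Prefix the out-of-source build directory onto the object, and make
# the .d file a target of the same rule:
sed 's!^foo\.o *: *!build/dir/foo.o build/dir/foo.d: !' foo.D
# prints: build/dir/foo.o build/dir/foo.d: src/dir/foo.c include/bar.h
```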
The approach above calls for the .d file for every .o file you might need to compile – but, when the .o file doesn't exist, you don't need its .d file at all. So it's actually worth trimming the list of sources to only the ones we need. You'll still need to generate the skipped .d files for your next run of make, but this skips them on the first – and newer versions of gcc give us a way to generate the .d files as a side-effect of compiling their matching .o files, so they should exist (unless someone's manually deleted some) after the first run, without needing to make them separately.
When creating each .d file as above, we can dispense with
the $@.$$$$
intermediate file by using a pipe downstream of the
compilation step with -M. However, when generating dependency
information as a side-effect, you don't have the option of sending it down a
pipe; you just pass a filename to -MMD -MF and pick it up after the
compilation has generated its .o file. So now you need a separate
extension – I use .D – for the direct dependency file,
which you subsequently process to produce the .d file. But this is
good in any case, since it lets you put that ugly sed command (which
I'm going to make a lot uglier later) in one rule, instead of having to repeat
it for each type of source file – %.d: %.cpp
for C++ code, as
well as the rule we already have for plain C source, for example. We still need
a rule to produce the .D file from sources in case it gets deleted
somehow while the .o file exists, but at least we can isolate the
hairy sed.
As an incidental bonus, generating .d as a side-effect of compiling its .o ensures that we'll regenerate the former whenever we regenerate the latter, so we don't need to declare the former to depend on everything the latter depends on – which would have forced wanton regeneration of the .d during make file parsing, when the .o exists. The .d is out of date, but it tells us we need to re-build the .o, which is all we needed to know.
So, now we need:
%.o: CPPFLAGS += -MMD -MF $(@:.o=.D)

%.d: %.D
	sed 's!.*$(@F:.d=.o) *: *!$(@:.d=.o): !g' $< > $@

$(GENROOT)/%.D: $(SRCROOT)/%.c
	$(CC) -M $(CPPFLAGS) $< >$@

$(GENROOT)/%.D: $(SRCROOT)/%.cpp
	$(CXX) -M $(CPPFLAGS) $< >$@

object := $(patsubst $(SRCROOT)/%.cpp, $(GENROOT)/%.o, \
	$(sources:$(SRCROOT)/%.c=$(GENROOT)/%.o))
gotobj := $(wildcard $(object))
include $(gotobj:.o=.d)
$(wildcard ...) and sed to the rescue: OK, so you've done a build. When you make changes,
only the files that need it get re-compiled. Everything is nice. So, time to
update your source tree and find out what your colleagues have broken
today; cvs up. But what if one of the header files gets removed ? Obviously,
whoever removed it has probably also removed all the #include
directives that referenced it, so we should have no problem. However,
our .d files say they and their .o files depend on the
lost header file. We run make and it pulls in our .d
files; then it checks to see if any of them need to be regenerated, due to
changes in things they depend on. It wants to rebuild any that need it and
re-start loading its make files with up-to-date versions. But it finds that
there has been a change in something some .d files depend on, and it
can't regenerate them because they depend on something that's gone
missing. So make barfs.
Of course, you can remove the offending .d files and regenerate
them; they won't depend on the missing header after that, so it'll all be
fine. However, you do need to remove them, since they're what's saying they
depend on the missing files. This is fine in a tiny project, but not in a
large-scale project. So we need to hack our .d files a bit
more. The file tells make about things that, if they change, need us
to recreate our target. So we still need to respond to changes in any of these
that does exist, but we need to be able to ignore any that have gone
missing. Helpfully, make provides a function to do
that: $(wildcard ...) expands to just the files that exist, among
those listed as its parameters. So we just need to wrap the list of files
our .d and .o depend on in that. The list of files may be
spread over many lines, using \ on the end of each to continue onto
the next; so we put $(wildcard after the : that
follows the names of our two targets, and a final ) on the line
that doesn't end in a \.
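A simplified sketch of that wrapping, on a one-line dependency file (the real sed, shown shortly, is hairier because it also has to cope with continuation lines and with regenerable files):

```shell
echo 'dir/foo.o dir/foo.d: src/foo.c inc/a.h inc/b.h' |
sed -e 's|: *|: $(wildcard |' -e 's|$| )|'
# prints: dir/foo.o dir/foo.d: $(wildcard src/foo.c inc/a.h inc/b.h )
```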
But hang on – what if some of the things we depend on can be
regenerated ? They may go missing when we make clean and we
need to exercise some rule to bring them back. If we try to compile
our .o, or regenerate our .D, without them the compiler's
going to fail. So we still need to regenerate them – which means we need
our targets to depend on them even if they don't exist. So we need to leave
them out of our $(wildcard ...), which means we need to close the
parenthesis before, and re-open the wild-card after, each file we know how to
regenerate. So set up a variable, GENSRC, that lists them. Then
we can hack that sed command to close the parenthesis before each of
these (and the actual source file), re-opening the wildcard after:
SPACE := $(EMPTY) # a single space character
OrGenSrc := $(subst $(SPACE),,$(GENSRC:%=\|%))
%.d: %.D
sed \
-e 's|^.*$(@F:.d=)\..*: *|$(@:.d=.o): $$(wildcard |' \
-e 's!\([^ ]*$(*F)\.cp*$(OrGenSrc)\) *!) \1 $$(wildcard !g' \
-e 's|\([^\\]\)$$|\1 )|' -e 's|\$$(wildcard *) *||g' $< > $@
Pretty it ain't, but it works. (But the
second -e's parameter can get quite long –
FreeBSD's sed silently truncates expressions over 1066 bytes long, it
would appear, which forced me to restructure this expression, making it even
uglier.) Note that the last bit is taking out any stray instances of
empty $(wildcard )
that've resulted from all our hackery; they did
no harm, but we may as well clean them away.
Set SRCROOT and GENROOT to the roots of the source and generated directory trees. While we're at it, it's nice to keep one's generated files separate from one's source tree; for example, you can then switch between debug builds (under one generated directory) and optimised ones (under another) without having to make clean and re-build everything in between. So the .o files and .d files should go somewhere other than where the source files are. You might remember I put paths on the source and dependency files earlier:
$(GENROOT)/%.D: $(SRCROOT)/%.c
	$(CC) -M $(CPPFLAGS) $< >$@

$(GENROOT)/%.D: $(SRCROOT)/%.cpp
	$(CXX) -M $(CPPFLAGS) $< >$@
Your rules for .o and other generated files
need similar work, of course. That's all nice and simple, but it can't possibly
work unless something is going to make all the directories it calls for –
which means a directory tree under $(GENROOT) mirroring the one
under $(SRCROOT). You could do that brutally by running
(cd $(SRCROOT); find . -type d -print0) | \
	(cd $(GENROOT); xargs -0r mkdir -p)
but there may be revision-control subdirectories, documentation, test data or any manner of other cruft in your source tree – it'd be cleaner to only generate the directories we need. So naturally we want to use dependencies in make to drive that for us.
The obvious approach is for each output file to depend on the directory it needs to go in; we then have a mkdir rule for each directory, and we're done. However, this doesn't work the way you'd like: it forces you to remake everything all the time. Last time you ran make, you created the directory and added a bunch of files in it. But adding a file to a directory changes the modified time of the directory. So now the directory is more recent than all but at most one of the files in it: each of which depends on the directory, so now thinks it needs to be re-built – which shall make all the other files out of date relative to the directory again. That won't do.
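The effect is easy to see in a shell (the sleeps merely guarantee distinct time-stamps on file-systems with one-second resolution; -nt is test's newer-than operator):

```shell
mkdir outdir && touch outdir/first.o   # the rule made the directory and one file
sleep 1
touch witness                          # stands in for another file depending on outdir
sleep 1
touch outdir/second.o                  # adding a file updates outdir's mtime...
[ outdir -nt witness ] && echo 'outdir is newer than witness again'
```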
So, instead, make each output file depend on a .exists touch-file in its own directory; then the rule for the touch file makes the directory before touching its target. It just remains to make everything depend on suitable touch-files. While we're at it, if we make each touch-file rule depend on its parent directory's touch-file rule, we'll be able to skip the -p flag to mkdir. The remaining problem is simply to construct the rules we need to say that each file depends on its directory's touch file; which turns out to be a bit fiddly.
%/.exists:
mkdir $(@D) && touch $@
define ObjDirTemplate
$(addprefix $1, .o .D): $(dir $1).exists
endef
$(foreach D, $(object:%.o=%), $(eval $(call ObjDirTemplate,$D)))
define DirDirTemplate
$1 $1/.exists: $$(if $$(wildcard $(dir $1)),,$(dir $1).exists)
endef
$(foreach D, $(patsubst %/,%,$(sort $(dir $(object)))), \
$(eval $(call DirDirTemplate,$D)))
(and you really don't want to see the evil bodge I needed, to
achieve equivalent results in versions of make too old to support
the $(eval ...)
construct relied on here). Note that
the $(sort …) is here used to remove duplicates (which this
function does, though this might not be obvious from its name); we don't care
about the order of entries, but we've mapped each object file's name to the
directory it's in, so some directories may be duplicated. Since I make
each .d file from a .D file in the same directory (which
thus already exists before we try to make the .d), I only need to
make the latter depend on the directory's existence.
In a big project, one can have so many .o
files (e.g. > 2300) that the command-line to ar ends up being too
long for the shell (> 100 kB). This is particularly apt to happen when doing
out-of-source builds because, even when using relative paths,
your $(GENDIR)
is apt to add quite a lot (49 bytes in my case,
adding another >100 kB) to the length of each object file's name – and
to the library file's name.
$(GENDIR)/libhuge.a: $(object)
$(AR) $(ARFLAGS) $@ $^
ends up being over 240 kB of text. One solution is to use make's magic library file syntax:
$(GENDIR)/libhuge.a: $(object:%=$(GENDIR)/libhuge.a(%))
	$(RANLIB) $@

$(GENDIR)/libhuge.a(%): %
	$(AR) $(ARFLAGS) $@ $<
(and actually the second rule here is one
of make's built-ins, so we could skip it) which solves the
command-line length problem – but it's disgustingly slow, even when
leaving out the r flag from $(ARFLAGS)
(which is why we
need to run $(RANLIB)
once we're done). If you've got some
sensible way to split up your $(object)
list into smaller chunks,
you can add each chunk to the library as a single command to get a solution
part-way between the two above; but it's pretty ghastly to implement and still
fairly slow.
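One way to get such chunking without the ghastliness is to let xargs do the splitting, since ar's r action can be run repeatedly, appending a batch each time. A sketch with stand-in files (-n 2 forces tiny batches purely for demonstration; left to itself, xargs picks the largest batch the system allows):

```shell
# ar doesn't care what's inside its members, so fake some objects:
printf x > a.o; printf x > b.o; printf x > c.o

rm -f libhuge.a
printf '%s\n' a.o b.o c.o | xargs -n 2 ar rc libhuge.a
ar t libhuge.a
# lists a.o, b.o and c.o, one per line
```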
My colleague Joakim Bengtsson deserves credit for the following inspired piece of hackery:
$(GENDIR)/libhuge.a: libhuge-objtmpdir $(object)
	$(foreach O,$(?:libhuge-objtmpdir=),$(shell ln -f $O $(@D)/objtmp/)) \
	cd $(@D)/objtmp; \
	$(AR) $(ARFLAGS) ../$(@F) $(notdir $(?:libhuge-objtmpdir=)) || failed=yes; \
	cd ..; rm -fr objtmp; [ -z "$$failed" ]

.PHONY: libhuge-objtmpdir
libhuge-objtmpdir: $(GENDIR)/.exists
	rm -fr $(<D)/objtmp; mkdir $(<D)/objtmp
This populates a temporary directory with hard links to
the object files so that we can run the command in that directory, with no path
component on the file names (except ../ on the library itself);
the failed variable is used to propagate any failure
of $(AR) past the tidy-up that follows it. Note that
the $(foreach ...)
is evaluated by make in the course of
preparing to run the command; this causes make to run one ln
-f process per (changed) object file, collecting the resulting (empty)
output and including it as part of the command it executes (where it gets
ignored, because it was empty). The uses of $(?:libhuge-objtmpdir=),
by the way, just list the prerequisites excluding the
phony libhuge-objtmpdir; so they could just say $(object),
but it'd get messier if our list of objects were more complex (which it was, in
the real make files on which this is based). Even with this hack, our
command-line to ar was over 41½ kB.
One problem I still haven't solved for libraries is what
happens when a .o file goes away. If a .c
or .cpp file is removed from your $(sources)
–
e.g. by a version-control update, so you don't necessarily know this has
happened – your existing libhuge.a still contains the
corresponding .o even though it no longer should. Since some of the
code from the removed source file has usually moved elsewhere (indeed, the
source file may simply have been renamed), this can lead to duplicate
symbol errors from your linker. The removed source file may have referenced
symbols no longer supplied by other files that were updated when it was removed;
this can lead to missing symbol errors from the linker. One would need
to run ar t on the archive, identify entries not in $(notdir
$(object)) and run ar d on each of these. The brave can do
this manually but it's easier to just rm libhuge.a when these errors
show up and let make regenerate it.
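For the brave, that manual pruning can be scripted along these lines (a sketch with stand-in members; in real use the keep list would come from $(notdir $(object))):

```shell
# A toy archive with one member whose source has gone away:
printf x > a.o; printf x > b.o; printf x > gone.o
rm -f libhuge.a
ar rc libhuge.a a.o b.o gone.o

keep='a.o b.o'                     # really: $(notdir $(object))
for entry in $(ar t libhuge.a); do
  case " $keep " in
    *" $entry "*) ;;               # still wanted: leave it alone
    *) ar d libhuge.a "$entry" ;;  # stale member: delete it
  esac
done
ar t libhuge.a
# lists a.o and b.o only
```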
GNU make is mighty.
Unix is user friendly, it's just picky about who its friends are. — Tollef Fog Heen
Assorted folk had favourable things to say about various other tools to do similar jobs. Dag subsequently sent me a link to a page about CMake (which is a helper to generate make-files, akin to automake). I might add further links to other related tools here, if I find them interesting.
Make deals poorly with commands that generate several files at once, and with
commands that only re-write their output files when the content would actually
change; the combination is particularly painful. The latter
leaves files out-of-date relative to what they depend on, so the command is
always run, even when it isn't needed. The former shall run the command once
per file if one makes each generated file a target of a rule that runs the
command; so one has to have a .PHONY rule on which they all depend,
which runs the command. It would be nice to support only write on
change, but doing so would require an extra time-stamp on each
file; make would need to keep track of both the last time the file
was changed and the latest time at which the file was known to be
up-to-date.
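The way commands usually fake only write on change is the move-if-change idiom: write to a temporary file, and only replace the real target when the content differs. A sketch (generate stands for whatever command really produces the file):

```shell
generate() { printf 'the same output\n'; }   # hypothetical generator

update() {
  generate > "$1.tmp"
  if cmp -s "$1.tmp" "$1"; then
    rm "$1.tmp"           # unchanged: keep the old time-stamp
  else
    mv "$1.tmp" "$1"      # changed (or new): replace the target
  fi
}

update out
sleep 1
touch marker              # laid down between the two updates
sleep 1
update out                # same content: out's time-stamp is not renewed
[ marker -nt out ] && echo 'out was left untouched'
```

Of course, this is precisely the behaviour that leaves the file looking out-of-date relative to its prerequisites, which is why make itself would need that second time-stamp to support it cleanly.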
Other folk have written about GNU make, too.