Building mozilla-central with tup

2013-08-07 by Mike Shal, tagged as mozilla, tup

Building mozilla-central with make is slow, and in many cases broken (requiring a clobber build). Recently, there was a blog post discussing how to build parts of mozilla-central with Ninja. Ninja is much faster than make, but in many cases it is still broken (requiring clobber builds). In this post, we'll look at building mozilla-central with tup, which is faster still and does not require clobber builds. This is done using the same build configuration that make uses, but without using make at all.

Why you should care

Tup has a few features that make it different from other build systems:

Each sub-process is examined and its dependencies are tracked automatically. These dependencies are cross-checked against the build description, and tup reports errors if any inconsistencies are found. As a result, build maintainers don't have to worry about getting the dependencies 100% correct. Get them wrong? Get an error message. No more clobbers because dependencies are missing.
Tup keeps track of what it built, so it knows when to remove stale files. Delete something from a jar.mn? It gets removed from dist/bin. Delete the whole moz.build file? All files that were built from that directory are removed, and anything that was dependent on any of it is re-built. No more clobbers because some junk was left around from a previous build. You can actually refactor the build configuration and rely on the build system to do the right thing.
Tup only parses the build description when it needs to. Change a moz.build file? Only that file is re-parsed, not the whole tree. No-op builds are nearly instantaneous (0.002s with tup's file monitor). If the size of m-c doubles tomorrow, it will still be 0.002s.
Tup only loads what it needs based on what files actually changed. Editing a single cpp file and building will start the compiler in a few milliseconds, no matter how big the project is.
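
To make the first point concrete, here is a minimal Python sketch of the kind of cross-check described: compare what a command was declared to read and write against what it was observed to read and write. This is a simplified model, not tup's actual implementation; the function name, the error strings, and the exact set of checks are assumptions for illustration.

```python
def cross_check(declared_inputs, declared_outputs, reads, writes, generated):
    """Compare one command's declared files against its observed file accesses.

    All arguments are sets of paths. `generated` holds files produced by
    other commands in the build. Hypothetical model, not tup's real code.
    """
    errors = []
    # Writing a file the build description does not mention is an error.
    for path in sorted(writes - declared_outputs):
        errors.append(f"undeclared output: {path}")
    # Promising an output and never writing it is also an error.
    for path in sorted(declared_outputs - writes):
        errors.append(f"declared output never written: {path}")
    # Reading a generated file without declaring it means the build order
    # is not guaranteed, so it is reported. Plain source files and system
    # headers that were read are simply recorded as dependencies.
    for path in sorted((reads & generated) - declared_inputs):
        errors.append(f"missing input dependency on generated file: {path}")
    return errors


# The command reads gen.h (built by another command) without declaring it.
print(cross_check({"a.c"}, {"a.o"}, {"a.c", "gen.h"}, {"a.o"}, {"gen.h"}))
```

The point of the design is that a wrong build description produces an error message right away, rather than a silently stale build later.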

These features are built into tup - there is no specific logic for anything in mozilla-central. For more details on that side of things, or as a cure for insomnia, feel free to read Build System Rules and Algorithms. The only mozilla-central-specific bits are the python scripts that convert the build configuration into something tup understands. In short, if a 'git/hg pull; tup upd' ever fails for an incremental build, but works with a clobber build, it is a bug in tup.

Some examples

No-op build time

This is where tup's algorithms make it look like cheating.

$ time make -f client.mk
...
real	0m28.784s
user	0m32.344s
sys	0m7.004s

$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.000s] No Tupfiles to parse.
[ tup ] [0.000s] No files to delete.
[ tup ] [0.000s] No commands to execute.
[ tup ] [0.000s] Updated.

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Single .cpp file

Now something useful. The turn-around time is dominated by the linker, which sucks. The good news is that tup starts compiling in 5ms (from a top-level build!), so we know if the compilation succeeded very quickly (depending on which file we choose, of course).

$ touch xpcom/base/AvailableMemoryTracker.cpp
$ time make -f client.mk
...
real	0m58.669s
user	0m57.968s
sys	0m10.328s

$ touch xpcom/base/AvailableMemoryTracker.cpp
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.000s] No Tupfiles to parse.
[ tup ] [0.000s] No files to delete.
[ tup ] [0.005s] Executing Commands...
 1) [0.235s] xpcom/base: C++ AvailableMemoryTracker.cpp
...
 5) [40.112s] toolkit/library: SHLIB ../../obj-x86_64-unknown-linux-gnu/toolkit/library/libxul.so
...
real	0m47.961s
user	0m35.192s
sys	0m17.540s

You can see we lose some time to dependency detection, and some to FUSE's read()/write() performance issues, which I haven't been able to get the FUSE developers to listen about.

Single .jsm file

Now we'll try touching a single .jsm file:

$ touch services/metrics/storage.jsm
objdir/services/metrics$ time make -f client.mk
...
(Preprocessor storage.jsm)
(Preprocessor Metrics.jsm)
...
real	0m0.128s
user	0m0.084s
sys	0m0.016s

$ touch services/metrics/storage.jsm
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.000s] No Tupfiles to parse.
[ tup ] [0.000s] No files to delete.
[ tup ] [0.001s] Executing Commands...
 1) [0.033s] services/metrics: Preprocessor storage.jsm -> ../../obj-x86_64-unknown-linux-gnu/dist/bin/modules/services/metrics/storage.jsm
 2) [0.038s] services/metrics: Preprocessor Metrics.jsm -> ../../obj-x86_64-unknown-linux-gnu/dist/bin/modules/Metrics.jsm
 3) [0.058s] services/healthreport: Preprocessor HealthReport.jsm -> ../../obj-x86_64-unknown-linux-gnu/dist/bin/modules/HealthReport.jsm
 [   ] 100%
[ tup ] [0.088s] Updated.

real	0m0.092s
user	0m0.088s
sys	0m0.036s

Here we used make in a subdirectory, and it's still slower than tup doing a top-level build. It's also wrong, because make didn't build HealthReport.jsm. In comparison, tup's automatic dependency detection lets it rebuild Metrics.jsm and HealthReport.jsm, without resorting to hacks like those found in services/metrics/Makefile.in. It's faster, more correct, and you don't have to think about which subdirectory to build in.
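
The idea behind that rebuild can be sketched in a few lines of Python: once the build system has recorded which files each command read, finding everything affected by a change is a walk over that graph. This is only an illustration, not tup's code. The edges in CONSUMERS are assumed; the output above shows that all three files were rebuilt, but not which command read which file.

```python
from collections import deque


def affected_outputs(changed_files, consumers):
    """Return every output that transitively depends on a changed file.

    `consumers` maps a file to the outputs of the commands that read it.
    """
    affected = set()
    queue = deque(changed_files)
    while queue:
        path = queue.popleft()
        for output in consumers.get(path, ()):
            if output not in affected:
                affected.add(output)
                queue.append(output)  # an output may itself be an input
    return affected


# Assumed dependency edges, for illustration only.
CONSUMERS = {
    "services/metrics/storage.jsm": [
        "dist/bin/modules/services/metrics/storage.jsm",
        "dist/bin/modules/Metrics.jsm",
    ],
    "dist/bin/modules/Metrics.jsm": ["dist/bin/modules/HealthReport.jsm"],
}

for output in sorted(affected_outputs(["services/metrics/storage.jsm"], CONSUMERS)):
    print(output)
```

Because the graph is recorded from what each command actually read, nobody has to remember to list HealthReport.jsm as a dependent by hand.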

Edit a jar.mn file

In this test, I've edited browser/locales/jar.mn to remove the aboutDialog.dtd from the manifest.

$ remove aboutDialog.dtd from jar.mn
$ time make -f client.mk
...
real	0m28.468s
user	0m32.388s
sys	0m7.012s

Fortunately the top-level case, though slow, works correctly and aboutDialog.dtd no longer shows up under dist/bin. This is accomplished by completely blowing away dist/bin and re-populating it. Let's try to speed it up with a sub-directory build:

$ remove aboutDialog.dtd from jar.mn
objdir/browser/locales$ time make
...
real	0m0.065s
user	0m0.040s
sys	0m0.012s

Much faster! But unfortunately aboutDialog.dtd still exists in dist/bin, and the About button still works. We might be inclined to check in our change and break everything. Let's try with tup:

$ remove aboutDialog.dtd from jar.mn
$ time tup upd
...
[ tup ] [0.000s] Parsing Tupfiles...
 1) [0.176s] browser/locales
...
 1) rm: obj-x86_64-unknown-linux-gnu/dist/bin/browser/chrome/en-US/locale/browser/aboutDialog.dtd
...
 1) [0.050s] browser/locales: JarMaker.py jar.mn
processing jar.mn
 2) [0.010s] Generate ./obj-x86_64-unknown-linux-gnu/dist/bin/browser/chrome/en-US.manifest
...

real	0m0.298s
user	0m0.200s
sys	0m0.036s

Here most of the time is spent in the python script (including pymake!) that parses the build description (176ms). The good news is that it's still a very quick turn-around time, the stale file was removed from the output, and now the About button is broken. Hooray, it took us less than a second to learn not to remove that line from jar.mn, rather than wasting all day debugging a broken build.

Edit a moz.build file

Now for something truly dangerous in make - editing a build file. Here we'll edit a moz.build file to remove/add an EXTRA_JS_MODULES. When we remove it from the build file, it should disappear from the build directory, and when we add it back in, it should reappear. Sounds simple, but make has no concept of a Makefile changing, so it doesn't support this case at all. Instead, it must be hacked around.

$ remove AlarmService.jsm from dom/alarm/moz.build
$ time make -f client.mk
...
real	1m2.373s
user	1m2.140s
sys	0m10.696s

This "works" in that AlarmService.jsm is now gone from dist/bin, but that's only because dist/bin gets blown away. Additionally, libxul is re-linked for some reason. While building in just obj/dom/alarm is tempting, it is also broken since it leaves AlarmService.jsm in dist/bin when it should be removed. Now compare to tup:

$ remove AlarmService.jsm from dom/alarm/moz.build
$ time tup upd
...
[ tup ] [0.000s] Parsing Tupfiles...
 1) [0.136s] dom/alarm
...
 1) rm: obj-x86_64-unknown-linux-gnu/dist/bin/modules/AlarmService.jsm
...

real	0m0.189s
user	0m0.120s
sys	0m0.024s

$ add AlarmService.jsm back into dom/alarm/moz.build
$ time tup upd
...
[ tup ] [0.000s] Parsing Tupfiles...
 1) [0.136s] dom/alarm
...
[ tup ] [0.139s] No files to delete.
...
 1) [0.002s] dom/alarm: CP AlarmService.jsm -> ../../obj-x86_64-unknown-linux-gnu/dist/bin/modules/AlarmService.jsm
...

real	0m0.196s
user	0m0.120s
sys	0m0.024s

With tup, we can add/remove files like this and see the result in ~200ms. Again most of the time is spent parsing the build description (more on this below).
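
The stale-file handling in the jar.mn and moz.build examples boils down to remembering what the previous build produced and comparing it with what the current build description says should exist. A minimal Python sketch of that comparison follows. It is a simplified model rather than tup's implementation, and Other.jsm is a made-up placeholder for whatever else the directory installs.

```python
def plan_cleanup(previous_outputs, current_outputs):
    """Return (stale, fresh): outputs to delete and outputs to create."""
    previous = set(previous_outputs)
    current = set(current_outputs)
    stale = sorted(previous - current)  # built before, no longer described
    fresh = sorted(current - previous)  # described now, not yet built
    return stale, fresh


# Outputs recorded by the last build (Other.jsm is hypothetical).
before = ["dist/bin/modules/AlarmService.jsm", "dist/bin/modules/Other.jsm"]
# Outputs described after removing AlarmService.jsm from moz.build.
after = ["dist/bin/modules/Other.jsm"]

print(plan_cleanup(before, after))  # removal: AlarmService.jsm is stale
print(plan_cleanup(after, before))  # adding it back: AlarmService.jsm is fresh
```

Make has no record of the previous set of outputs, which is why its only safe option is to blow away dist/bin and start over.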

Building mozilla-central with tup

Mozilla's build configuration, for just a stock build (check out the tree, run make), consists of a lot of information. Here's a rough summary:

Configuration data from 'configure', stored in config.status (a python script), and autoconf.mk (for make). This is currently unchanged by tup.
moz.build files that define variables (eg: CPP_SOURCES, EXPORTS, etc)
Makefile.in files that have some things yet to be converted to moz.build, as well as random custom rules and targets.
Makefiles for NSS in security/nss/, which have a different format and rules.
Makefiles for NSPR in nsprpub/, which have yet another format.
A few GYP files for third party projects that are built from gyp.

I'm sure I'm missing some, but you get the idea - there are a lot of data sources. Tup has a parser that can process plain Makefiles (those that just set variables or have $(FOO) references, but not rules), as well as a Lua parser for more complicated cases. Unfortunately, neither of those really helps with this eclectic set of files (technically the Lua parser could handle them, but it'd be a mess). Fortunately, it also has the ability to shell out to arbitrary commands to generate rules, which is handy here because moz.build and gyp are in python, and pymake is in the tree to parse the many random Makefiles. Parsing the data with tup looks like:

[Figure: Parsing with Tup]

The tup rules are what would normally be present in a Tupfile, but this complicated setup allows us to mostly share the existing configuration data with make. The primary exception is the actual rules, which include rules.mk and the custom rules present in the Makefile.ins. The custom rules are being removed as part of the moz.build conversion, so this picture will hopefully be simplified in the future.

Tup invokes tup.py whenever it needs to parse a directory and generates a new set of rules. The upside to this is that we already have python scripts available to parse moz.build files, Makefiles, and gyp files. The downside is that tup executes the script separately for each directory, which is slow when changing tup.py (since everything gets re-parsed). Ideally the data would be in a format that tup could read natively, but that isn't really a viable solution at the moment. Instead tup will probably need to support a better way to execute external scripts to generate rules, while still accurately checking dependencies and making incremental builds (changing a single moz.build file) not parse the whole tree.
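
For a feel of what an external rule generator does, here is a minimal Python sketch that turns a list of C++ sources into tup :-rules of the form `: inputs |> command |> outputs` and prints them. It is not the real tup.py. The function name, the hard-coded source list, and the compiler command are assumptions, and the sketch assumes the script is hooked up through something like tup's `run` directive, which reads rules from the script's standard output.

```python
import os


def rules_for_sources(cpp_sources, compiler="c++"):
    """Return one tup :-rule per C++ source file (hypothetical helper)."""
    rules = []
    for src in cpp_sources:
        stem = os.path.splitext(os.path.basename(src))[0]
        obj = stem + ".o"
        # Rule format: ": <inputs> |> <command> |> <outputs>"
        rules.append(f": {src} |> {compiler} -c {src} -o {obj} |> {obj}")
    return rules


# A real generator would read these names from moz.build (e.g. CPP_SOURCES);
# here the list is hard-coded for illustration.
for rule in rules_for_sources(["AvailableMemoryTracker.cpp"]):
    print(rule)
```

Because the script runs once per directory, editing one moz.build file only re-runs it for that directory, while editing the script itself invalidates every directory.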

Things that need to be fixed

There are a number of things that don't work as well as I'd like:

There is a Tupfile in every directory that gets built, which has the same two lines of code. I think this should be easy to fix, but right now it looks really lame.
The tup.py script is executed for every directory. This is great for incremental changes to a single moz.build file, but sucks for parsing all of them.
The tup.py script is a feature of tup that doesn't work in Windows yet. Obviously, that needs to be fixed.
IPDL generation is pretty bad in tup, since tup needs to parse the .ipdl files just to see what output files will be created. For everything else in the tree, we know what file is created based on the filename (eg: foo.cpp becomes foo.o), but for IPDL this is not true. We could have foo.ipdl become mozilla/dom/foo.h, while bar.ipdl will become mozilla/ipc/bar.h. The directory names are only known once you read the IPDL file itself, which is frustrating for tup's output verification process.
Tup has trouble on Mac OS X 10.8. Apple has trouble responding to mailing lists.
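
The IPDL item in the list above is easier to see side by side with the normal case. In this rough Python sketch, the object file name for a .cpp file comes from the filename alone, while the header path for an .ipdl file can only be computed after reading the file. The namespace-scanning regex is a deliberately naive stand-in for a real IPDL parser, and both function names are made up for illustration.

```python
import os
import re


def object_for(source):
    """Output name known from the filename alone: foo.cpp -> foo.o."""
    stem, _ext = os.path.splitext(source)
    return stem + ".o"


def ipdl_header_for(filename, text):
    """Output path that depends on the file's contents.

    Assumes the namespaces appear as nested 'namespace name {' declarations,
    which is a simplification of the real IPDL grammar.
    """
    namespaces = re.findall(r"\bnamespace\s+(\w+)\s*\{", text)
    stem = os.path.splitext(os.path.basename(filename))[0]
    return "/".join(namespaces + [stem + ".h"])


foo_text = "namespace mozilla {\nnamespace dom {\n}\n}\n"
bar_text = "namespace mozilla {\nnamespace ipc {\n}\n}\n"
print(object_for("foo.cpp"))
print(ipdl_header_for("foo.ipdl", foo_text))
print(ipdl_header_for("bar.ipdl", bar_text))
```

The second function is the awkward one for tup: the rule's outputs have to be declared up front, so the .ipdl file must be read at parse time just to write the rule.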

Things that still suck even with tup

Tup removes the build-system bottleneck, but that doesn't solve the whole problem. Here are some other things that will still get in the way of a good development cycle:

Linking a huge library like libxul is a major productivity killer.
The buildid generation is outside of tup (it's elided in these examples, but it causes a bunch of stuff to be built).
configure is completely outside of tup, so changes to mozilla-config.h still rebuild everything. It'd be nice to use tup.config for this, like in gittup.org - changing a configuration option causes only the relevant build files to be re-parsed, and the relevant C files to be recompiled.

As you can see from the list of issues, the tup build of m-c currently only has any hope of working on Linux, and the non-native python parsing has a number of issues that need to be addressed. Still, we are able to build a working firefox executable with the existing build configuration by only using configure, tup, and a little python.
