Building Firefox with Tup

Terminology

Build system: The underlying build engine, such as make, SCons, or tup.
Build description: The Mozilla-written files that describe what to build. This consists of a 'front-end' and a 'back-end'.
Front-end: The per-directory files that list what to build. For example, files like dom/base/Makefile.in are in this category.
Back-end: The make-specific files that allow the front-end build description to be processed by make. For example, config/rules.mk is in this category.

Background

Firefox is a large program with a recursive-make style build. This means that incremental builds, where a developer makes a few edits to source files and runs 'make', are slow. Additionally, changes to the build description (such as by adding compiler flags or moving/renaming source files) may generate incorrect output and force a 'make clean' to get a working build. Both of these aspects waste developer time. As a result, developers take shortcuts by running make in sub-directories to try to get the speed to a reasonable level. Effectively, this pushes some of the responsibility of the build system onto the developer. Since this practice is error prone, some attempts have been made to automate this with smartmake. The fact that the build system must be wrapped with a program to execute it correctly is further evidence that it is not up to the task.

Attempts have been made to port the mozilla-central repository to other build systems, such as cmake. Porting to a build-system generator such as cmake or gyp may produce some benefit in maintainability for the front-end build description, but this is a matter of taste. If a generator such as cmake is used to generate Makefiles and run 'make', this will not address the scalability and correctness issues described above, but merely change their appearance.

Additionally, work is underway to try to convert the recursive-make build system to a non-recursive make system by replacing the back-end rules files with a set of Makefiles that include everything in a single make invocation. This effort will produce some speed improvement, but not enough. It will fix some issues with incremental builds, but not all. Incremental builds will still be slower than is possible, and 'make clean' will still be necessary to solve errors in the build system.

These unequivocal statements about alternatives can be made because the problem with the build system is not with Mozilla's Makefiles, nor with the recursive usage of make. Rather, the problem is with make itself. These issues are discussed in the Build System Rules and Algorithms paper. As an executive summary:

Any build system based on make will be slow when scaled to projects the size of mozilla-central, due to make's underlying O(n) algorithms. An optimal build system will scale logarithmically.
Any build system based on make will fail to produce a correct output for an incremental build for certain changesets, because there is no automated dependency checking. An optimal build system performing an incremental build from version A to version B will always produce the same output as a full build of version B.
Developers will try to take short-cuts to avoid these issues, wasting more time and brain power as a result. The build system should be instantaneous and correct, so developers can focus on actual work.

These three statements correspond to the three "rules" in Build System Rules and Algorithms.

About tup

The tup build system is an alternative to make that is designed to adhere to the three rules of build systems. Specifically, incremental builds generally start building within milliseconds. File accesses by the sub-processes (gcc, python scripts, etc) are checked to ensure that they don't conflict and that dependencies are not underspecified. As a result, developers can just run 'tup upd' after making changes to the source files, and see the result quickly. It currently runs in Linux, OSX, and Windows, and is still under active development.

Firefox and Tup - A First Attempt

As a first attempt of getting tup to build firefox, I went with a slightly backward approach. I did a normal make build and saved the output, then tried to reproduce these commands using Tupfiles instead of Makefiles. In essence, this ignores the autoconf layer, cross-platform support, and configuration options for the time being. All it can do is reproduce the single build that a standard "../configure; make" does on my Linux machine. This is not a real port -- it is merely an attempt to answer these questions:

Is there anything in mozilla-central that will prevent tup from building it?
Will incremental builds with tup scale to a project the size of firefox?

I believe the answer to the first question is "no", since tup is able to build a firefox binary that runs. It does not build everything in mozilla-central yet, just enough to get the browser up and running. And I believe the answer to the second question is "yes":

make null build: 2m28s
tup null build (with file monitor): 0.005s
tup null build (scanning mode): 0.728s

Your times may vary - this is on a ~6-year old machine (Pentium-D processor era).

Here is an example of changing a cpp file in the firefox binary:

$ touch browser/app/nsBrowserApp.cpp
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.001s] No Tupfiles to parse.
[ tup ] [0.001s] No files to delete.
[ tup ] [0.001s] Executing Commands...
 1) [0.987s] obj-default/browser/app: C++ nsBrowserApp.cpp
 2) [0.011s] obj-default/browser/app: LD -Ur built-in.o
 3) [0.020s] obj-default/browser: LD -Ur built-in.o
 4) [0.124s] obj-default/dist/bin: Link firefox
 [    ] 100%
[ tup ] [1.174s] Updated.

real	0m1.197s
user	0m1.004s
sys	0m0.149s

Some notes:

The built-in.o generation does not happen in a normal make build - it was an attempt to see if it would help linking libxul.so faster by partially linking each directory individually (similar to how the Linux kernel build works).
The generated firefox (and support libraries / chrome) runs, but for some reason I have to run as 'LD_LIBRARY_PATH=. ./firefox' rather than just './firefox' - I suspect I missed some important ninker flag, though I tried various permutations of -rpath-link and -L/-l and have come up empty so far.

Tup is able to start compiling the correct .cpp file after only 1 millisecond. The majority of the time in the build is taken up by the compilation itself, rather than by the build system. In contrast, doing the same operation with make would take at least the 2m28s null-build time (or force the developer to execute the correct sequence of 'make's in the right sub-directories), in addition to the time it takes to do the actual work.

For a longer example, here is a file that goes into libxul.so:

$ touch dom/base/Navigator.cpp
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.001s] No Tupfiles to parse.
[ tup ] [0.001s] No files to delete.
[ tup ] [0.002s] Executing Commands...
 1) [4.559s] obj-default/dom/base: C++ Navigator.cpp
 2) [0.265s] obj-default/dom/base: LD -Ur built-in.o
 3) [0.967s] obj-default/dom: LD -Ur built-in.o
 4) [22.990s] obj-default/dist/lib: Link libxul.so
 5) [0.143s] obj-default/xpcom/stub: Link libxpcom.so
 6) [0.013s] obj-default/dist/bin: cp ../../xpcom/stub/libxpcom.so -> libxpcom.so
 7) [0.545s] obj-default/dist/bin: cp ../lib/libxul.so -> libxul.so
 [ETA=<1s] 100%
[ tup ] [30.438s] Updated.

real	0m30.491s
user	0m23.522s
sys	0m5.954s

Again tup is able to start building in a few milliseconds, and the majority of the time is taken up by the sub-processes (here, the compiler and linker) rather than the build system. Note that this is slightly slower than executing the linker outside of tup, since tup wraps the sub-processes to check for file accesses. Linking libxul.so directly in the command-line takes 19.444s, in comparison to the 22.990s that it takes in tup. So there is some overhead, but this is in comparison to a hypothetical oracle build system that instantly knows what to build and does not need to check dependencies. Such a program does not exist, but tup is quite close to this ideal. Although a total turn-around time of 30 seconds is excessive, this example should point out that efforts to improve this would be better directed at speeding up the linking (eg: using gold), rather than improving the build system.

An interesting exercise at this point is to see how big mozilla-central could grow, while still maintaining this level of near-instantaneous feedback with tup. As a test, I cloned the mozilla-central repository 10 times and built them all in the same tup hierarchy. This is effectively a single project that is 10x as large as firefox, with the caveat that not everything is linked together (so the sub-process times are the same). We can repeat the same nsBrowserApp.cpp example using one of the repositories:

$ ls
mozilla-central-tup-1   mozilla-central-tup-3  mozilla-central-tup-6  mozilla-central-tup-9
mozilla-central-tup-10  mozilla-central-tup-4  mozilla-central-tup-7  obj-default
mozilla-central-tup-2   mozilla-central-tup-5  mozilla-central-tup-8
$ touch mozilla-central-tup-1/browser/app/nsBrowserApp.cpp
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.001s] No Tupfiles to parse.
[ tup ] [0.001s] No files to delete.
[ tup ] [0.002s] Executing Commands...
 1) [1.001s] obj-default/mozilla-central-tup-1/browser/app: C++ nsBrowserApp.cpp
 2) [0.012s] obj-default/mozilla-central-tup-1/browser/app: LD -Ur built-in.o
 3) [0.022s] obj-default/mozilla-central-tup-1/browser: LD -Ur built-in.o
 4) [0.115s] obj-default/mozilla-central-tup-1/dist/bin: Link firefox
 [    ] 100%
[ tup ] [1.180s] Updated.

real	0m1.199s
user	0m1.003s
sys	0m0.162s

This 1.199s runtime is almost identical to the 1.197s runtime of a single firefox build. This shows that tup can scale to large projects - the time to build a small portion of the program (nsBrowserApp.cpp, in this example) is independent of the size of the whole project. In contrast, an O(n) build system would run approximately 10 times slower for this case. For fun, we can try to build the nsBrowserApp.cpp in two different firefox's at once. My processor has two cores, so tup defaults to running two jobs in parallel since it is always parallel safe:

$ touch mozilla-central-tup-5/browser/app/nsBrowserApp.cpp mozilla-central-tup-8/browser/app/nsBrowserApp.cpp
$ time tup upd
[ tup ] [0.000s] No filesystem scan - monitor is running.
[ tup ] [0.000s] Reading in new environment variables...
[ tup ] [0.001s] No Tupfiles to parse.
[ tup ] [0.001s] No files to delete.
[ tup ] [0.002s] Executing Commands...
 1) [1.053s] obj-default/mozilla-central-tup-8/browser/app: C++ nsBrowserApp.cpp
 2) [1.075s] obj-default/mozilla-central-tup-5/browser/app: C++ nsBrowserApp.cpp
 3) [0.009s] obj-default/mozilla-central-tup-5/browser/app: LD -Ur built-in.o
 4) [0.012s] obj-default/mozilla-central-tup-8/browser/app: LD -Ur built-in.o
 5) [0.021s] obj-default/mozilla-central-tup-8/browser: LD -Ur built-in.o
 6) [0.026s] obj-default/mozilla-central-tup-5/browser: LD -Ur built-in.o
 7) [0.121s] obj-default/mozilla-central-tup-5/dist/bin: Link firefox
 8) [0.127s] obj-default/mozilla-central-tup-8/dist/bin: Link firefox
 [ETA=<1s ] 100%
[ tup ] [1.302s] Updated.

real	0m1.325s
user	0m1.998s
sys	0m0.349s

Here, tup is able to cherry-pick the correct 8 commands to run (out of 117,380 total commands used to build firefox 10 times over) and execute them in parallel in under 2 seconds. This means even as Mozilla grows, tup is able to scale along with it to provide developers with quick feedback.

Test Repository

The tup build is available in this repository: https://bitbucket.org/marfey/releases-mozilla-central

Note that this only has any hope of building on Linux, and even then only if ./configure on your machine produces something comparable to my machine. Also note that many of the files (such as the Tuprules.tup files scattered about) were mostly generated from a script, so they don't really show how a real port to tup would be structured. The following steps will try to build it, and assume you already have tup installed:

$ hg clone https://bitbucket.org/marfey/releases-mozilla-central mozilla-central-tup
$ cd mozilla-central-tup
$ mkdir obj-default
$ touch obj-default/tup.config
$ tup init
$ tup upd

(This my first foray into hg, so please let me know if something is wrong with the repository).

The initial build will take some time, but afterward you should be able to modify files anywhere in the tree and build with just 'tup upd'. The only new files in here that may be of any help for a real port are parts of the top-level Tuprules.tup file, build.tup, and xpt.tup. The rest is mostly script generated or lazily cut'n paste from the make build. See the next section for how a real port should be done.

Firefox and Tup - A Real Port

In order to perform an actual port of mozilla-central to tup, I would recommend the following approach:

Keep the existing make-based front-end and back-end as-is. This will allow developers to continue to use what they are comfortable with, and also develop on platforms where tup itself has not yet been ported.
Implement a tup back-end under config/ to sit side-by-side with the existing make back-end. The front-end Makefile.in files look to be well-maintained and very concise, and should be re-usable by a tup back-end. Tup can parse the syntax of most data-driven Makefiles, and where this is not possible it is able to run external scripts (such as python) to supplement its parsing abilities.
Modify configure to generate Tupfiles instead of Makefiles if a certain option is given. Essentially, at configure time, a developer can choose to build with either tup or make.

Tup still requires a Tupfile in each directory where files are created by the build system. In many cases, I suspect this will be a stub file that just includes the default rules and runs a script. Two possibilities are:

include_rules
include Makefile.in
include $(MOZ_ROOT)/config/build.tup

Or if python is required to parse the Makefile.in (or Makefile), the per-directory Tupfiles may look like:

include_rules
run $(MOZ_ROOT)/config/buildtup.py

Where "build.tup" and/or "buildtup.py" would constitute the tup back-end. An example of this approach can be found in a tup port of the Linux kernel used in gittup.org, where a python script is used to parse most of the kernel's data-driven Makefiles and generate tup rules.

This approach should have the following characteristics:

The transition can happen slowly, so that developers can try out tup if they like, but can still use make in the meantime
There is no "flag day" where everyone must switch over at once.
Most of the build configuration is in the front-end, so for example adding a new EXPORT in dom/base/Makefile.in would work for both make and tup.
Changes to the back-end would require changes in both systems, at least until a decision is made to stick with one build system or the other.

The primary benefits to switching to tup are:

The incremental build speed is extremely fast, and generally limited by the tools rather than by tup.
Tup checks dependencies, and is parallel-safe
Changing a command-line (adding to CXXFLAGS, for example) causes the changed commands to be re-executed.
Renaming/moving/deleting files, Tupfiles, and whole directories are completely supported for incremental builds. Refactor away!

Some potential downsides:

Tup is currently only supported on Linux, Windows, and OSX.
Maintaining two back-ends may be annoying if there is work planned to restructure the build. (Restructuring the front-end shouldn't matter).