Clobber Builds Part 4 - Fixing Other Clobber Causes

2015-03-13 by Mike Shal, tagged as clobber, make, mozilla, tup

In Clobber Builds Part 3, we looked at how changing the build configuration can result in broken builds, even if dependencies are perfect. Here we'll look into addressing these issues in the build system.

Clobber Builds - Recap

Part 1 of the series examined how missing dependencies in the Makefile result in clobbers. In Part 2, we showed how to modify the build system to eliminate the possibility of missing dependencies entirely. Part 3 looked at how changing the build configuration, such as by changing command-lines or adding/removing build targets, could require clobbers. Now we'll try to fix those permanently.

Build System Phases

When you boil it down, a build system really has two main phases: constructing a DAG, and walking the DAG to update targets. In make, constructing the DAG consists of reading the current set of Makefiles to build a DAG in memory. While walking through the DAG, make essentially converts the graph into a list of commands to execute. Conceptually it looks something like this:
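As a rough illustration of those two phases, here is a minimal Python sketch. The `RULES` table is a hypothetical stand-in for parsed Makefiles, and the sketch leaves out the timestamp check that decides whether each command actually needs to run, so it is not how make itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical rule table standing in for parsed Makefiles:
# target -> (prerequisites, command)
RULES = {
    "foo.o": (["foo.c"], "gcc -c foo.c -o foo.o"),
    "prog": (["foo.o"], "gcc foo.o -o prog"),
}

def construct_dag(rules):
    """Phase 1: map each node to the set of nodes it depends on."""
    dag = {}
    for target, (prereqs, _command) in rules.items():
        dag.setdefault(target, set()).update(prereqs)
        for prereq in prereqs:
            dag.setdefault(prereq, set())  # source files have no prerequisites
    return dag

def walk_dag(dag, rules):
    """Phase 2: visit nodes in dependency order and collect their commands."""
    commands = []
    for node in TopologicalSorter(dag).static_order():
        if node in rules:  # only targets carry a command; sources do not
            commands.append(rules[node][1])
    return commands

dag = construct_dag(RULES)
commands = walk_dag(dag, RULES)
```

Note that `construct_dag` only ever sees the current rules, which is the limitation the rest of this post is about.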

Last time we looked at two different cases of how changing a Makefile can result in a broken build. The first was the common case of changing a command line, and finding that make doesn't update the files with the new command unless you jump through some hoops. Other build systems fixed this particular case, but still fail when we broaden the scope to include other build configuration changes. The example we used was changing the Makefile to remove a build target, and finding that the stale file still exists after an update.

Both of these issues actually stem from the same core problem - that make only constructs the DAG from the current set of Makefiles. It has no knowledge of what files a previous build created or how they were created. So when we change the command to create a file like "foo.o", all that make knows is that it has a rule to create foo.o, and that foo.o is still newer than foo.c, so it doesn't need to be recompiled. Similarly, when the rule to create foo.o is removed, make doesn't know that the file no longer belongs in the filesystem. For all it knows, foo.o was created by the developer, and is worth keeping around.
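To make the first failure concrete, here is a small Python sketch of a timestamp-only check in the spirit of what make does. The `needs_rebuild` helper and the `-O2` flag are assumptions for illustration, not make's actual code.

```python
import os
import tempfile

def needs_rebuild(target, prereqs):
    """Timestamp-only check: there is no record of past commands to consult."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prereqs)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo.c")
    obj = os.path.join(tmp, "foo.o")
    for path in (src, obj):
        open(path, "w").close()
    os.utime(src, (1000, 1000))  # foo.c last modified at t=1000
    os.utime(obj, (2000, 2000))  # foo.o built later, at t=2000

    # Suppose the rule's command changed from "gcc -c foo.c" to
    # "gcc -O2 -c foo.c". The check never sees the command, so it still
    # concludes that foo.o is up to date.
    stale_after_flag_change = needs_rebuild(obj, [src])
```

The function has no parameter through which the old command could even be passed in, which is the point: the information simply isn't there.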

Fixing the DAG Construction Phase

To solve the core problem, we need to change the DAG construction phase so that it can use more than just the current build rules as information — we also need to use the previous DAG as a comparison. The overall picture looks something like this:

The diff function can look at the two DAGs and see if any nodes or links were added or removed, and take appropriate action. For example, if a node is removed (like foo.o), we may want to remove that file from the filesystem. Or if a command-line is changed, we will want to re-execute that command.

To see how this plays out, let's take a look at the same two examples from Part 3. First up is the simple case of changing command lines. Here are the two DAGs:

[Figures: DAG for Makefile A, DAG for Makefile B, and the diff between them]

With both DAGs, it's pretty easy to see that the command-line changed between A and B. The build system can use this information to know that it needs to execute the command with the new flags, so the object file will be recompiled even if foo.c hasn't changed. (Note: I'm using tup's notation of including command nodes in the DAG, which helps with diffing and with commands that have multiple outputs.)
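A minimal Python sketch of such a diff, assuming each DAG is stored as a set of nodes plus a set of (source, destination) links, with commands as nodes of their own. The `diff_dags` and `make_dag` helpers and the `-O2` flag are made up for this example.

```python
def make_dag(links):
    """Build a DAG record from a list of (source, destination) links."""
    return {
        "nodes": {node for link in links for node in link},
        "links": set(links),
    }

def diff_dags(old, new):
    """Return the nodes and links added or removed between two DAGs."""
    return {
        "added_nodes": new["nodes"] - old["nodes"],
        "removed_nodes": old["nodes"] - new["nodes"],
        "added_links": new["links"] - old["links"],
        "removed_links": old["links"] - new["links"],
    }

OLD_CMD = "gcc -c foo.c -o foo.o"
NEW_CMD = "gcc -O2 -c foo.c -o foo.o"  # same rule, new flag

dag_a = make_dag([("foo.c", OLD_CMD), (OLD_CMD, "foo.o")])
dag_b = make_dag([("foo.c", NEW_CMD), (NEW_CMD, "foo.o")])

delta = diff_dags(dag_a, dag_b)
# A command node that appears in "added_nodes" has never been run,
# so it must be executed regardless of file timestamps.
```

Because the command string is part of the node's identity, a changed command-line shows up as one node removed and one added, while foo.c and foo.o appear in neither set.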

The second example involved removing a target. Here we'll just start with the Makefile that has two targets and move to the Makefile with one target:

[Figures: DAG for the two-target Makefile, DAG for the one-target Makefile, and the diff between them]

Diffing these two graphs would show that bar.o, bar.c, and the command node to compile bar.c are no longer present. The build system can use this information in order to know that the output file, bar.o, should be removed. We can also clean up any dependency information associated with the command node. Note that we definitely do not want to remove bar.c from the filesystem, since that was created by the developer. The build system can distinguish between the two since all files created by the build system will have incoming links, while all files created by a developer have no incoming links.
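The incoming-link rule can be sketched in a few lines of Python. Tagging nodes as `("file", name)` or `("cmd", command)` and the `plan_cleanup` helper are assumptions made for this sketch, not tup's actual data model.

```python
def file_node(name):
    return ("file", name)

def command_node(command):
    return ("cmd", command)

def nodes_of(links):
    return {node for link in links for node in link}

def plan_cleanup(old_links, new_links):
    """Sort the nodes that vanished from the DAG into three buckets."""
    removed = nodes_of(old_links) - nodes_of(new_links)
    had_incoming = {dst for _src, dst in old_links}
    delete, keep, forget = [], [], []
    for kind, name in sorted(removed):
        if kind == "cmd":
            forget.append(name)   # drop stored dependency info for this command
        elif (kind, name) in had_incoming:
            delete.append(name)   # produced by the build: stale output
        else:
            keep.append(name)     # no incoming links: the developer's file
    return delete, keep, forget

cc_foo = command_node("gcc -c foo.c -o foo.o")
cc_bar = command_node("gcc -c bar.c -o bar.o")
two_targets = {
    (file_node("foo.c"), cc_foo), (cc_foo, file_node("foo.o")),
    (file_node("bar.c"), cc_bar), (cc_bar, file_node("bar.o")),
}
one_target = {(file_node("foo.c"), cc_foo), (cc_foo, file_node("foo.o"))}

delete, keep, forget = plan_cleanup(two_targets, one_target)
```

Here `delete` holds bar.o, `keep` holds bar.c, and `forget` holds the command that used to compile bar.c.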

Why not just diff a list of commands and output files?

You might be wondering why we need to diff the DAGs instead of diffing just a list of commands or a list of output files. Such an approach would probably solve the above two examples, but is insufficient for a general purpose system. An example that wouldn't fit this model is if we just delete a link:

[Figures: DAG with the generated file dependency, DAG with the generated file dependency removed, and the diff between them]

In this case, we may want to re-execute the gcc command in order to see if it still builds correctly without the dependency on foo.h. It's possible that the file was unused, or will be picked up elsewhere in the include path. If the command does still use the foo.h generated by the Python script, then the build system's automatic dependency checker will detect this and flag it as an error. The developer should never have to worry about making structural changes to the DAG: the diffing engine must pick up every one of them and update or remove the affected files.
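The following Python sketch shows why the flat lists fall short in this case. The generator command `python gen.py > foo.h` is a hypothetical name, and links are again modeled as (source, destination) pairs with commands as nodes.

```python
GEN = "python gen.py > foo.h"   # hypothetical generator command
CC = "gcc -c foo.c -o foo.o"
COMMANDS = {GEN, CC}

before = {(GEN, "foo.h"), ("foo.h", CC), ("foo.c", CC), (CC, "foo.o")}
after = {(GEN, "foo.h"), ("foo.c", CC), (CC, "foo.o")}  # foo.h -> gcc link removed

def commands_in(links):
    return {node for link in links for node in link if node in COMMANDS}

def outputs_in(links):
    return {dst for src, dst in links if src in COMMANDS}

# Diffing the flat lists finds nothing to do...
same_commands = commands_in(before) == commands_in(after)
same_outputs = outputs_in(before) == outputs_in(after)

# ...while diffing the links shows that the inputs of the gcc command changed.
changed_links = before ^ after
commands_to_rerun = {dst for _src, dst in changed_links if dst in COMMANDS}
```

Both list comparisons come out equal, yet the link diff singles out the gcc command as the one to re-execute.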

Isn't it inefficient to diff the entire DAG?

Yes, but it isn't necessary to do so in order to get the benefits. Just as it's inefficient to build and load the entire DAG every time you run the build system, you wouldn't want to load the entire current or previous DAGs to compare them. Instead, the build system only needs to parse and generate a DAG fragment for the build configuration files that have changed, and diff those against the previous fragments. In this way, the build system can efficiently and correctly bring the filesystem up to date.
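One way to picture the fragment approach is the Python sketch below. It assumes changed configuration files are detected by content hash and uses a made-up one-line rule format (`output: input | command`); neither is taken from make or tup.

```python
import hashlib

def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()

def parse_fragment(text):
    """Toy parser: each 'output: input | command' line yields two links."""
    links = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        output, rest = line.split(":", 1)
        source, command = rest.split("|", 1)
        command = command.strip()
        links.add((source.strip(), command))
        links.add((command, output.strip()))
    return links

def update(previous, configs):
    """previous: path -> (digest, links). Re-parse and diff only changed files."""
    state, diffs = {}, {}
    for path, text in configs.items():
        new_hash = digest(text)
        old_hash, old_links = previous.get(path, (None, set()))
        if new_hash == old_hash:
            state[path] = (old_hash, old_links)  # unchanged: reuse stored fragment
            continue
        new_links = parse_fragment(text)
        state[path] = (new_hash, new_links)
        diffs[path] = {"added": new_links - old_links,
                       "removed": old_links - new_links}
    for path in previous.keys() - configs.keys():  # config file was deleted
        diffs[path] = {"added": set(), "removed": previous[path][1]}
    return state, diffs

configs_v1 = {"a/rules": "foo.o: foo.c | gcc -c foo.c",
              "b/rules": "bar.o: bar.c | gcc -c bar.c"}
state_v1, _ = update({}, configs_v1)

configs_v2 = dict(configs_v1, **{"a/rules": "foo.o: foo.c | gcc -O2 -c foo.c"})
state_v2, diffs = update(state_v1, configs_v2)
```

Only `a/rules` is re-parsed and diffed on the second run; the fragment for `b/rules` is carried over untouched.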

Conclusion

This concludes the main part of the clobber build series, where we looked into some of the underlying reasons why they are needed with make and similar build systems, and how they can be fixed for good. It is not meant to be an exhaustive look at all the causes of clobbers, but rather to provide a different way of thinking about them as understandable and solvable problems. With a properly functioning build system, there is no need to ever delete everything and start over from scratch. At some point I may want to look into how much clobber builds cost Mozilla in terms of developer and machine time, but I'll save that for another post!

Mike Shal's Blog