Waf vs GNU Make — Incremental Build
I don't usually post in English, but since I couldn't find this on Google, I've decided to share the results in a common
language. Since Waf's community is quite small, it seemed almost pointless to write in Polish.
Waf is a promising build tool. It uses checksums to determine if anything needs updating. I like this feature because when working with Git and switching branches back and forth, I often get different timestamps for files that didn't really change.
The problem with checksums is that it takes some time to compute them, so GNU Make, in theory, could be faster here... I decided to check that. The aim was to measure how long it takes to determine that nothing changed and nothing needs to be rebuilt.
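The post does not say how the times were taken. One way to measure a "nothing to do" build is a small harness like the sketch below. The function name `time_command` is hypothetical, and the command shown is only a placeholder; in practice it would be whatever starts the build (for example `make` or `waf build`) in an already-built tree.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` several times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output: only the elapsed time matters for a no-op build.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (an assumption); replace with the real build command.
    timings = time_command([sys.executable, "-c", "pass"])
    print("best of %d runs: %.3f s" % (len(timings), min(timings)))
```

Taking the best of several runs reduces the influence of a cold file system cache on the first run.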
I generated a sample C++ project. The generator was extremely stupid. I played a little with its parameters: the number of files, their size, and the number of #include statements in the generated code. After some tuning to get enough files for the build to take some time, but not forever, I got a project with:
- 6300 files (3150 cpp, 3150 h)
- 210 MB of code
- 21525 includes in .cpp files
- 10917 includes in .h files
Note: kids, use the pimpl idiom and all that stuff to reduce the number of #includes in your code. It takes forever (an hour on my PC) to compile a project that contains nothing but comments and #include statements. To get a somewhat realistic dependency graph, I decided that headers would only include from the first 50 headers, while implementation files could include any header.
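The original generator is not shown, so the following is only a minimal sketch of the include rule described above. The function name, the file naming scheme, the `#pragma once` guard, and all parameters are assumptions made for illustration.

```python
import os
import random


def generate_project(root, n_units, includes_per_cpp, includes_per_h,
                     shared_headers=50, seed=0):
    """Write n_units pairs fileN.h / fileN.cpp holding only #includes and a comment."""
    rng = random.Random(seed)
    os.makedirs(root, exist_ok=True)
    headers = ["file%d.h" % i for i in range(n_units)]
    shared = headers[:shared_headers]
    for i, header in enumerate(headers):
        # Headers include only from the first `shared_headers` headers.
        h_pool = [h for h in shared if h != header]
        h_includes = rng.sample(h_pool, min(includes_per_h, len(h_pool)))
        # Implementation files may include any header, plus their own.
        cpp_pool = [h for h in headers if h != header]
        cpp_includes = [header] + rng.sample(
            cpp_pool, min(includes_per_cpp, len(cpp_pool)))
        with open(os.path.join(root, header), "w") as f:
            # Guard added here because shared headers may include each other.
            f.write("#pragma once\n")
            f.writelines('#include "%s"\n' % h for h in h_includes)
            f.write("// filler comment\n")
        with open(os.path.join(root, "file%d.cpp" % i), "w") as f:
            f.writelines('#include "%s"\n' % h for h in cpp_includes)
            f.write("// filler comment\n")
```

Calling `generate_project("sample", 3150, 6, 3)` would produce a tree of the same file count as the project above, though not necessarily the same include counts or size.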
When the project was ready, I generated several sets of build files.
First, a standard makefile (marked (1) below) in the style I would write by hand, i.e. using variables, pattern rules such as %.o: %.cpp, and all that stuff. It was a single file with no recursion. Dependencies for .o and .so files were listed explicitly; header dependencies were generated using g++ -MMD and included. The makefile had 480 lines.
Another makefile (2) used no variables (except for $@ and $^ in the recipes); instead, it listed all dependencies and recipes explicitly. The header dependencies were generated as before. It had over 15900 lines.
The Waf scripts were exactly what you would expect: every task generator got its target's name and a list of sources. Nothing fancy. I only played with the function responsible for calculating checksums. In the first case (3) it was the standard implementation, which calculates the MD5 of the file's content. Then I tried to get rid of the MD5s by using just the timestamp, and I also tried the md5_tmstamp extension. This extension tries to get the best of both worlds by updating the checksum only if the timestamp changed. The results were very close, so I'm showing only the timestamp-based version (4).
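To illustrate the idea of updating the checksum only when the timestamp changed, here is a minimal standalone sketch. It is not Waf's code: the class name `TimestampGuardedHasher` and its in-memory cache are assumptions made for the example.

```python
import hashlib
import os


class TimestampGuardedHasher:
    """Cache MD5 digests per path; re-read a file only when its mtime changes."""

    def __init__(self):
        self._cache = {}       # path -> (mtime_ns, hex digest)
        self.rehash_count = 0  # number of times a file was actually read

    def digest(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            # Timestamp unchanged: trust the stored checksum.
            return cached[1]
        with open(path, "rb") as f:
            value = hashlib.md5(f.read()).hexdigest()
        self.rehash_count += 1
        self._cache[path] = (mtime, value)
        return value
```

With this scheme, a file whose timestamp changed but whose content did not (as after switching Git branches back and forth) is re-read once and still yields the same digest, so nothing that depends on it needs rebuilding.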
- GNU Make variables (1): 9.777 s
- GNU Make explicit (2): 8.859 s
- Waf standard (3): 12.419 s
- Waf timestamp (4): 11.615 s
A few things I should stress. I was using a non-recursive makefile written as one big file, so this is probably the best you can get with GNU Make. Most projects either use a set of recursive makefiles or at least split the makefile into several files and include them, often using magic macros. I was planning to measure the recursive makefile, but I lost interest ;-). Furthermore, I was not using any of the variables you would use in a real-life project, like CC, CFLAGS, etc.
Another thing to note about GNU Make is that parsing a huge explicit makefile (2) was actually faster than computing values with variables and standard string substitutions (1): 8.859 s versus 9.777 s.
Waf runs about 3 seconds slower here. I noticed that roughly 3 seconds pass between starting waf and the first message about entering the build directory. I don't know Waf's internals, but I'm guessing that is the time taken to load the state of the last build.
The calculation of checksums adds almost no penalty to the build time, while in some setups it greatly reduces the number of updated targets. Keep in mind, however, that my sources were empty, so the resulting binaries were minimal. In a real-life project these binaries would be quite big.
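For reference, the exact gaps between the four measured times can be worked out directly. The dictionary keys below are just labels chosen for this example; the numbers are the ones reported above.

```python
timings = {
    "make_variables": 9.777,   # (1)
    "make_explicit": 8.859,    # (2)
    "waf_standard": 12.419,    # (3)
    "waf_timestamp": 11.615,   # (4)
}


def gap(slower, faster):
    """Difference in seconds between two measured configurations."""
    return timings[slower] - timings[faster]


print("%.3f" % gap("waf_standard", "make_variables"))   # 2.642
print("%.3f" % gap("waf_standard", "make_explicit"))    # 3.560
print("%.3f" % gap("waf_timestamp", "make_variables"))  # 1.838
print("%.3f" % gap("waf_timestamp", "make_explicit"))   # 2.756
print("%.3f" % gap("waf_standard", "waf_timestamp"))    # 0.804
```

So depending on which pair is compared, Waf is between roughly 1.8 s and 3.6 s slower, and switching from MD5 to timestamps saves about 0.8 s.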
Update
I've measured a solution that uses recursive makefiles. Such a setup runs in 3.5 s. It seems the biggest challenge in my sample project was finding the order of tasks using the dependency graph. When that graph was split into several smaller parts, we got a nice speedup.
Now I'm wondering... normally, when using recursive make, you may get some incomplete dependencies resulting in incomplete builds. But my sample project didn't have such edges in its graph, so the difference is clearly triggered by the size of the graph. It seems like the algorithms used by Waf and Make could use some help from the outside.
m
I would bet the 3 s is the time to start the Python interpreter + check whether the files have changed + compilation.
Would you have time to try PyPy instead of regular Python: http://pypy.org/ ? :)