DXR semantic code browsing

25 February 2012, 13:40:24

Press review

25 February 2012, 13:08:30

It runs on water, man

05 February 2012, 01:51:13

Whoa, ambda lambda, hocus pocus! My brain hurts. Miran Lipovača presents Learn You a Haskell. If you don't have a whole Saturday to spend reading, like I did, I recommend the last chapter.

As the attentive reader has figured out, I'm in the middle of refreshing my Haskell. A few years' break took its toll, and all that was left in my head were monads and that indentation, even funnier than Python's. Yesterday I proved to myself that you can't program with indentation alone, so today there was an intensive course. Now the students are off to rest so that tomorrow, full of energy, they can write programs that will save the world from an attack of alien wizards. OK. Maybe I would have read it faster if it weren't for South Park, but I would simply have died. Yo!

Waf vs GNU Make — part two

22 January 2012, 16:39:20

Last time I determined that the biggest problem of Make and Waf was a big dependency graph. To verify that, I prepared an even bigger project to test. Results follow, but first I've got some news.

Tup is another build tool. But not yet another. Mike Shal, the author of Tup, took a completely different approach to the build problem and inverted the dependency graph. This way the algorithm starts with the changed nodes and builds the graph by adding the nodes that depend on them (in this case, the products). This results in a much smaller graph to solve. I read the results on the project page and they're stunning.
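The inverted-graph idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Tup's actual implementation: the `deps` mapping, the function names and the file names are all made up for the example. The point is that the walk starts from the changed files and only ever touches the nodes that consume them.

```python
from collections import defaultdict, deque


def invert(deps):
    """Turn {target: [inputs]} into {input: [targets that consume it]}."""
    consumers = defaultdict(list)
    for target, inputs in deps.items():
        for src in inputs:
            consumers[src].append(target)
    return consumers


def affected(deps, changed):
    """Return the targets to rebuild, inputs before the outputs that use them."""
    consumers = invert(deps)

    # Step 1: collect only the targets reachable from the changed files.
    reach = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for target in consumers.get(node, ()):
            if target not in reach:
                reach.add(target)
                queue.append(target)

    # Step 2: order that (small) subgraph with Kahn's algorithm.
    pending = {t: sum(1 for i in deps[t] if i in reach) for t in reach}
    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    order = []
    while ready:
        target = ready.popleft()
        order.append(target)
        for consumer in consumers.get(target, ()):
            if consumer in pending:
                pending[consumer] -= 1
                if pending[consumer] == 0:
                    ready.append(consumer)
    return order


# Hypothetical project: two objects linked into one program.
deps = {
    "a.o": ["a.cpp", "common.h"],
    "b.o": ["b.cpp", "common.h"],
    "foo": ["a.o", "b.o"],
}
print(affected(deps, ["a.cpp"]))  # only a.o and foo are visited
```

With a change in `a.cpp`, the node `b.o` is never looked at, which is why the graph to solve stays small when only a few files changed.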

There are a few things I like about Tup. The inverted graph is a good idea. Who cares how to build program foo? If a source file changed, and this source file is probably not there just for fun, then compile it. That's what an incremental build is all about. Another nice feature is the implicit handling of subdirectories (note, it's not recursion, just splitting the build files). If you want to add a subdirectory, just put a Tupfile there, no need to mention it in the parent.

Another interesting idea is the way that dependencies are handled. Tup monitors the applications run during the build, and if those apps access the filesystem, it sees that and adds that info to its database. No need to process #include files anymore. If gcc opens them, it means they're needed. And it works for all types of files.

What I don't like about Tup... first of all, yet another syntax to learn. I really agree with Waf, SCons and others that using an existing language is better. The second thing is that it needs FUSE. My current kernel has no FUSE support, so right now I'm not able to test Tup myself.

A bigger problem is that it is so damn small. I like small things, don't get me wrong. But from a build tool I would expect support for some common usage patterns like build variants, unit tests, configuration, etc. I like Waf's way of expressing the build process in terms of task generators, i.e. gimme a program built from these sources, instead of the node-centric view that Tup, Make and many others use. Compile the first source, compile another source, link. Crap. Every C++ program is built basically the same way. Don't repeat yourself.

Tup implements some functionality to monitor the filesystem, for example it automatically cleans unused build files (e.g. when you rename an output, the old file is now unused). I haven't used it yet, so I'm not sure if I like it or not.

OK, back to Waf and Make. It's now clear that they both suck, let's just see how. The project was bigger this time:

  • 9100 files
  • 300 MB of code
  • 30487 includes in .cpp files
  • 15727 includes in .h files

Yesterday I dug into Waf's code and also paid a little more attention to the first build, because it turned out to be a hint of what's going on. Make sucks at finding the proper build order, but it's damn fast at checking the files. Waf does exactly the opposite. Why? Because Waf uses domain knowledge to its aid when ordering tasks. In simpler (and not entirely true) words, it just knows that you first need to compile sources, then link.

The observed behavior is that, before actually doing anything, Make takes the same amount of time for the first build and for an incremental build. Every time it is run, it checks the dates on all files and determines the order. Waf starts almost immediately (just some lag caused by Python, parsing, etc.) on the first build, when its cache is empty. On an incremental build it reads its cache, which takes forever... and consumes 0.5 GB of memory.

So for the initial build, Waf starts just as fast no matter how big the project is. The incremental build, however, is a huge challenge on (very) big projects. Finally, the results:

  • GNU Make variables (1): 20.05 s user, 1.11 s system
  • Waf standard (3): 243.03 s user, 4.38 s system

Waf vs GNU Make — Incremental Build

21 January 2012, 21:00:54

I don't usually post in English, but since I couldn't google this, I've decided to share the results in a common language. Since Waf's community is quite small, it seemed almost pointless to write in Polish.

Waf is a promising build tool. It uses checksums to determine if anything needs updating. I like this feature because when working with Git and switching branches back and forth, I often get different timestamps for files that didn't really change.

The problem with checksums is that it takes some time to compute them, so GNU Make, in theory, could be faster here... I decided to check that. The aim was to measure how long it takes to determine that nothing changed and nothing needs to be rebuilt.

I generated a sample C++ project. The generator was extremely stupid. I played a little with its parameters: the number of files, their size and the number of #include statements in the generated code. After some tuning to get enough files for the build to take some time, but not forever, I got a project with:

  • 6300 files (3150 cpp, 3150 h)
  • 210 MB of code
  • 21525 includes in .cpp files
  • 10917 includes in .h files

Note: kids, use the pimpl idiom and all that stuff to reduce the number of #includes in your code. It takes forever (an hour on my PC) to compile a project that contains nothing but comments and #include statements. To get a somewhat realistic dependency graph I decided that all headers would only include from the set of the first 50 headers, while implementation files could include anything.

When the project was ready, I generated different sets of build files.
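A generator following the include rule described above could look roughly like this. It is a reconstruction under assumptions, not the original script: the function name, its parameters, the file naming scheme and the choice to let headers include only lower-numbered headers (which keeps the header graph acyclic) are all made up for illustration.

```python
import random


def generate_project(n_units, includes_per_file, shared_headers=50, seed=0):
    """Return {filename: source text} for n_units pairs of .h and .cpp files."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    headers = [f"file{i}.h" for i in range(n_units)]
    shared = headers[:shared_headers]
    files = {}
    for i, header in enumerate(headers):
        # Headers include only from the shared pool, and only lower-numbered
        # ones, so the header include graph has no cycles (an assumption).
        pool = shared[:min(i, shared_headers)]
        picks = rng.sample(pool, min(includes_per_file, len(pool)))
        body = "".join(f'#include "{h}"\n' for h in picks)
        files[header] = f"#pragma once\n{body}// filler comment\n"

        # Implementation files include their own header plus any other header.
        others = [h for h in headers if h != header]
        picks = rng.sample(others, min(includes_per_file, len(others)))
        body = "".join(f'#include "{h}"\n' for h in [header] + picks)
        files[f"file{i}.cpp"] = f"{body}// filler comment\n"
    return files


if __name__ == "__main__":
    project = generate_project(n_units=100, includes_per_file=3)
    print(len(project), "files generated in memory")
```

The sketch keeps everything in memory; writing the dictionary out to disk and padding the files with comments to reach a given size is left out.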

First a standard makefile (marked (1) below) in a style I would write manually, i.e. using variables, %.o: %.cpp and all that stuff. It was a single file, no recursion. Dependencies for .o and .so files were listed explicitly, dependencies between headers were generated using g++ -MMD and included. Makefile had 480 lines.

Another makefile (2) was not using variables (except for $@ and $^ in the recipes) but instead it was listing all dependencies and recipes explicitly. The dependencies on included files were generated as before. Over 15900 lines.

The Waf scripts were exactly what you would expect: every task generator got its target's name and a list of sources. Nothing fancy. I only played with the function responsible for calculating checksums. In the first case (3) it was the standard implementation, which calculates the md5 of the file's content. Then I tried to get rid of md5s by using just the timestamp, and I also tried the md5_tmstamp extension. This extension tries to get the best of both worlds by updating the checksum only if the timestamp changed. Results were very close, so I'm showing only the timestamp-based version (4).
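The "checksum only if the timestamp changed" trick can be sketched as below. This is not the code of Waf's md5_tmstamp extension, just an illustration of the idea; the class name and the `hash_count` counter are invented for the example.

```python
import hashlib
import os


class TimestampedHashCache:
    """Hypothetical cache mapping a path to (mtime in ns, md5 hex digest)."""

    def __init__(self):
        self._entries = {}
        self.hash_count = 0  # how many times file contents were actually read

    def signature(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            # Timestamp unchanged: trust the stored checksum, skip the read.
            return cached[1]
        # Timestamp differs (or first visit): hash the content again.
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        self.hash_count += 1
        self._entries[path] = (mtime, digest)
        return digest
```

If a file is merely touched, for example by switching Git branches back and forth, the timestamp changes and the file is rehashed, but the digest comes out the same, so nothing that depends on it needs rebuilding.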

  • GNU Make variables (1): 9.777 s
  • GNU Make explicit (2): 8.859 s
  • Waf standard (3): 12.419 s
  • Waf timestamp (4): 11.615 s

A few things I should stress. I was using a non-recursive makefile written as one big file, so this is probably the best you can get with GNU Make. Most projects either use a set of recursive makefiles or at least split the makefile into several files and include them, often using magic macros. I was planning to measure the recursive makefile but I lost interest ;-). Furthermore, I was not using any of the variables that you would use in a real-life project, like CC, CFLAGS, etc.

Another thing to note about GNU Make is that it is actually faster to parse a huge makefile, than to compute some values using variables and standard string substitutions.

Waf runs about 3 seconds slower here. I've noticed that it takes something around 3 seconds between starting waf and the first message about entering the build directory. I don't know waf internals but I'm guessing it's the time taken to load the state of the last build.

The calculation of checksums adds almost no penalty to the build time, while in some setups it greatly reduces the number of updated targets. Keep in mind, however, that my sources were empty so the resulting binaries were minimal. In a real life project these binaries would be quite big.

Update

I've measured a solution that uses recursive makefiles. Such a setup runs in 3.5 s. It seems the biggest challenge in my sample project was finding the order of tasks using the dependency graph. When that graph was split into several smaller parts, we got a nice speedup.

Now I'm wondering... normally, when using recursive make, you may get some incomplete dependencies resulting in incomplete builds. But my sample project didn't have such edges in its graph, so the difference is clearly triggered by the size of the graph. It seems like the algorithms used by Waf and Make could use some help from the outside.

git notes

05 January 2012, 19:55:21

What do you actually use the notes provided by git notes for in practice? Is it just a feature nobody needs, or a fantastic tool that streamlines work?

Personally, the only thing that came to my mind was something like the command below, but since notes are attached to a commit, they disappear after a rebase, so the usefulness of this approach is doubtful.

git notes add -m 'review passed'

I'd be glad to hear about other situations where notes turned out to be helpful.

git update-index

05 January 2012, 15:04:26

It's only January 5th and I've already made the discovery of the year. Changes in tracked files that we don't want to commit by accident, such as temporary configuration tweaks, are no longer a problem. Behold!

git update-index --assume-unchanged <path>

I'm going to set up a git dont-commit alias for it right away :)

Scripting the Vim editor

11 November 2011, 11:11:47

Press review: const, decltype and others

28 September 2011, 20:44:01

Gnome terminal with Vim

29 July 2011, 23:45:43

Lately I've been working with several terminal tabs logged in to a remote machine. In the first one I edit, in the others I compile/grep/etc. And since I hadn't seen nice code in a long time, I decided to look at some and, while I was at it, add a small feature to gnome-terminal.

If anyone wants it, here you go. By the way, I'm collecting ideas on how to cleanly detect which tab the editor is running in (note: ssh)