Monday 19 April 2010

Programming Tools: The Build Farm

In my first essay on programming tools I suggested that you take some code, press a button and it builds.

What I didn't talk about was how it builds.

Firstly, it doesn't build on your local machine (it could, if you install the right tools, but that's an unnecessary complication: if you're working locally, you just push your changes through your version control system back to the online system and let it build...). Instead of local builds we have a build farm - a set of one or more machines, each of which has all the tools needed to build for one or more particular environments. When you hit build, your code is shunted off to the first available machine that can build for the platform you require. The build is performed, and the results are returned.
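
A minimal sketch of that dispatch step might look like the following (the machine records, platform tags and dispatch function are all hypothetical, purely to illustrate "first available machine that can build for the platform you require"):

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class BuildMachine:
    name: str
    platforms: Set[str]      # e.g. {"win32-driver", "linux-x86_64"}
    busy: bool = False

@dataclass
class BuildRequest:
    revision: str            # the VCS revision to build
    platform: str            # the target platform tag

def dispatch(request: BuildRequest, farm: List[BuildMachine]) -> Optional[BuildMachine]:
    """Hand the request to the first idle machine that can build this platform."""
    for machine in farm:
        if not machine.busy and request.platform in machine.platforms:
            machine.busy = True
            return machine
    return None  # nothing free yet; the request stays queued
```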

Actually, it's even better than that, because this is always happening behind the scenes as you type. Whenever the farm gets a chance, a new build of your code is scheduled... so usually, when you press the build button, the results are already available.

And by results I mean build errors and warnings. I mean binaries. I mean all the outputs you would usually see from a build. Of course, you probably won't care about the binaries, because for most of your development time you'll be using a test farm to handle them. The errors are the important thing - and they'll be checked into the appropriate point in the version control system so that they are always instantly available.

In order to make such a system work, each machine in the build farm will need a working build environment installed (it's probably best to do this with a VM image, so that making clones is easy). Each will also need a daemon process to control the build. It's probably best to think of this daemon as being the build tool itself - the equivalent of make.

The build tool will advertise itself on the network (using zero-conf or Rendezvous or whatever), along with the systems it is capable of building for. Any machine with the tool installed will automatically become available to build the moment it joins the network.
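
One way to do that, sketched here with the python-zeroconf library (the `_buildd._tcp` service type and the `platforms` property are conventions I've invented for illustration, not anything standard):

```python
import socket
from zeroconf import Zeroconf, ServiceInfo

# Advertise this build daemon and the platforms it can build for.
info = ServiceInfo(
    "_buildd._tcp.local.",
    "driver-box-01._buildd._tcp.local.",
    addresses=[socket.inet_aton("192.168.1.42")],
    port=9123,
    properties={"platforms": "win32-driver,win64-driver"},
)

zc = Zeroconf()
zc.register_service(info)  # now visible to every client browsing for _buildd._tcp
try:
    input("Serving builds; press enter to stop...\n")
finally:
    zc.unregister_service(info)
    zc.close()
```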

I've spent a lot of time thinking about what the ideal replacement for make would look like, but the long and the short of it is that the characteristics of the build tool don't matter much, so long as it only rebuilds the files that have changed... it could lean heavily on the VCS here; after all, the VCS knows exactly which files differ from the last build.
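
For example, assuming git as the VCS, the "ask the VCS what changed" step could be as simple as the sketch below (the `last-good-build` tag is hypothetical, and mapping changed sources onto rebuild targets is left to the build tool):

```python
import subprocess
from typing import List

def changed_files(since: str, until: str = "HEAD") -> List[str]:
    """Ask the VCS which files differ from the revision we last built."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since, until],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

# Only these files (and whatever depends on them) need rebuilding.
stale = changed_files(since="last-good-build")
```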

I do think, however, that directory structure is probably not a desirable thing to have. Rather, it should be possible to put files into groups, either explicitly or by smart groups (just as you have smart playlists in iTunes), and a file can belong to many groups. When a build tool needs a particular structure, it should be possible to generate it on the fly by moving groups of files to the correct locations (see the sketch below). This gets around the issues with - say - Microsoft's driver build environment, which only allows two levels of directory structure, and it helps where the directory structures need to be different on different platforms.

It also helps with version control. At the moment, version control systems have to try to emulate every file system their files may live on, so that permissions and so forth can be replicated correctly, and moving files around is a pain. With this system, everything falls into build system metadata, and we can provide our own security features (such as restricting changes to different code areas to different programmer groups) if we need them. Of course, the build system metadata is itself stored in a file under version control - so if a file is merged or permissions need to change, those changes are tracked like everything else: transparently.
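
As a sketch of what generating a layout on the fly might look like, assuming the group metadata lives in a JSON file under version control (the file format, group names and layout rules here are all made up for illustration):

```python
import json
import shutil
from pathlib import Path

def materialise(metadata_file: str, layout: dict, staging: str) -> None:
    """Copy each group's files into whatever directory shape a build tool demands."""
    meta = json.loads(Path(metadata_file).read_text())
    # meta looks like: {"groups": {"driver-core": ["bus.c", "io.c"], "driver-headers": ["bus.h"]}}
    root = Path(staging)
    for group, subdir in layout.items():
        dest = root / subdir
        dest.mkdir(parents=True, exist_ok=True)
        for source in meta["groups"].get(group, []):
            shutil.copy2(source, dest / Path(source).name)

# e.g. a build environment that only tolerates two directory levels:
materialise("buildmeta.json",
            {"driver-core": "src", "driver-headers": "src/inc"},
            "staging")
```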

The end result of all this complexity is simplicity. Once a build has been set up, it will 'just work' for everybody. If you suddenly need to build for another platform, it's just a matter of picking the new platform and seeing whether it works. You can test every platform quickly and easily without hunting for appropriate machines and solving 101 build problems.

And because a build daemon is externally controllable, it is no effort at all to use it (and maybe a particular 'gold' machine) to schedule nightly builds, or rolling test builds of the code integration branch.
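
The daemon's control interface isn't specified anywhere above, so as a purely hypothetical sketch, suppose it exposed a plain HTTP endpoint; a cron job on the 'gold' machine could then kick off the nightly build:

```python
# nightly_build.py - run from cron on the 'gold' machine, e.g.
#   0 2 * * *  python nightly_build.py
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint and payload; whatever the daemon actually accepts.
request = Request(
    "http://gold-build-box:9123/build",
    data=json.dumps({"branch": "integration", "platforms": ["all"]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(request) as response:
    print(response.read().decode())
```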

Programmers' lives are simpler. Builds are easier. Testing is easier. And everyone's life can be made faster and easier still just by throwing more hardware at the farm. It's a win for everybody.
