The GNU Make Job Server

Quite some years ago I played around with the GNU Make source code as part of a C programming course I studied.  At the time I came across the GNU Make job server but did not really understand what it did.  However, having read a bit about it recently I thought I would share the details because it gives an interesting insight into solving some problems related to parallelism.  It is perhaps a practical example of a solution to the Dining philosophers problem.

To understand the job server you must first understand what 'jobs' are in the context of GNU Make.  When you run GNU make you can give it a 'jobs' parameter (-j).  This causes make to build using multiple processes/threads.  This option can either be given without an argument - meaning it will create as many processes as possible - or with an argument indicating the maximum number of jobs to run at once.  It is normally recommended that you provide the maximum number of jobs because otherwise make with keep spawning thread/processes and effectively fork bomb your computer.  If you want to know the optimal value to supply it really depends who you ask.  Most people will say either the number of cores in your computer plus one or just the number of cores.

Now getting back on topic.  The first thing to note about the GNU make job server is that it is not a server at all.  The job server is more of a technique used to avoid having to use a server.  It attempts to elegantly handle jobs when using recursive make files.  Recursive make files are makefiles that invoke the make command.  It is desirable for the jobs option to be shared with these other instances of make spawned from the original.  However, make needs to ensure only the maximum number of jobs supplied by the user are run at any one time.  This is the tricky bit:  How do you ensure multiple processes only use at most N jobs between them at any one time?

The solution the developers went for is quite elegant and simple.  Lets say we call the make command and permit N jobs.  A pipe is created by the first instance of make.  N-1 one-byte tokens are added to the pipe.  The reason N-1 tokens are written and not N is that one job is used by the first instance of make.  This pipe is then shared with all the child make instances.  Whenever make wants to start a new job it must first acquire a token from the queue.  When a job finishes it writes the token back to the pipe.  This way no more than N jobs will run at a time.

The full details of the implementation and some of the pitfalls can be found here:
http://mad-scientist.net/make/jobserver.html

Comments

Popular Posts