GNU Make: Why not use load average to limit the number of jobs started?
GNU Make has a '-l' option which allows you to limit the load GNU Make places on the system. It works by preventing new jobs from starting while the load average over the last minute exceeds the value provided. I decided to write an article about this because I have seen a lot of very vague statements about why using this option isn't necessarily a good idea. I don't entirely disagree with these people. I just thought it would be worth putting together an article explaining how they came to those conclusions.
What is load average?
First of all, when discussing this parameter it's important to understand exactly what load average means. Load is a very simple measure of how busy a CPU is. An idle CPU has a load of zero. If a job is added to that CPU, the load becomes one. If a further job needs to run on that CPU while the previous job is still running, it must be queued. In this case the load would be two, and so on. The maximum load a computer can handle before it might be considered overloaded is therefore equal to the number of available CPUs. The load average itself is simply the average system-wide load over some period of time.
Unix-like operating systems offer the ability to query the average load over the last one, five and fifteen minutes. A number of utilities can be used to view these values, including top and uptime.
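As a rough illustration, the same three values can be read from Python with os.getloadavg(), which is available on Unix-like systems. The helper name is_overloaded below is my own and simply applies the rule of thumb described above (load greater than the number of CPUs); it is a sketch, not a definitive test of whether a machine is struggling.

```python
import os


def is_overloaded(load, cpus):
    """Rule of thumb from the text: more runnable jobs than CPUs."""
    return load > cpus


# Number of CPUs; os.cpu_count() can return None, so fall back to 1.
cpus = os.cpu_count() or 1

# Load averages over the last one, five and fifteen minutes.
one_min, five_min, fifteen_min = os.getloadavg()

print(f"CPUs: {cpus}")
print(f"load average: {one_min:.2f} {five_min:.2f} {fifteen_min:.2f}")
print(f"overloaded over the last minute: {is_overloaded(one_min, cpus)}")
```

The one-minute figure is the one that matters for the rest of this article.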
So what's the problem?
As mentioned above, the load averages available cover the last one, five and fifteen minutes. GNU Make uses the one-minute average, and a minute is pretty much an eternity if you are a computer. The situation isn't quite as bad as it could be, because GNU Make attempts to guess the additional load that will be contributed by jobs started in the last second. However, if you use this feature without also limiting the maximum number of jobs, you may notice the average load exceeding the value provided. This is because it takes time for each job to be fully represented in the load average, so more jobs get started in the meantime. If you are using GNU Make for a big build you may also notice periods where fewer jobs are running than there are CPUs on your machine, even though there are jobs ready to run that are not waiting on any dependency. This is because it takes time for the load average to drop after a job finishes.
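The lag in both directions can be seen with a small simulation. This is only a sketch: it assumes the one-minute load average is an exponentially damped average sampled every five seconds, which is how Linux computes it; other systems may differ. The function name simulate_load_average and its parameters are made up for this example.

```python
import math


def simulate_load_average(active_jobs, period=60.0, interval=5.0, start=0.0):
    """Return the load average after each sample.

    active_jobs: number of runnable jobs observed at each sample.
    period:      averaging window in seconds (60 for the one-minute value).
    interval:    seconds between samples (assumed to be 5, as on Linux).
    start:       load average before the first sample.
    """
    decay = math.exp(-interval / period)
    average = start
    history = []
    for jobs in active_jobs:
        # Old average fades by `decay`; the current load fills the rest.
        average = average * decay + jobs * (1.0 - decay)
        history.append(average)
    return history


# Four jobs start on an idle machine and run for one minute (12 samples).
rising = simulate_load_average([4] * 12)

# All four jobs finish and the machine is idle for the next minute.
falling = simulate_load_average([0] * 12, start=rising[-1])

print(f"after 1 minute with 4 jobs running: {rising[-1]:.2f}")
print(f"after 1 further minute idle:        {falling[-1]:.2f}")
```

Under these assumptions, a full minute after four jobs start the average has only reached about 63% of the real load, so a check against '-l 4' would still allow more jobs to start. Likewise, a minute after everything finishes the average is still well above zero, which is why CPUs can sit idle while jobs are ready to run.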
Conclusions
Despite the shortcomings, I still believe this is a useful feature. There are some advantages to using '-l' over simply limiting the number of jobs. For one, this feature takes into account other processes that are running on your computer. I personally do not think there is a perfect solution when it comes to figuring out how many jobs to run. The conventional method of simply limiting the number of jobs is not perfect either, but that's for another article. As I've already hinted, I'd recommend using this feature in conjunction with the '-jN' parameter to ensure it does not start an excessive number of jobs.
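As a minimal sketch of that recommendation, the snippet below builds a make command line that passes both '-j' and '-l'. The helper names make_command and run_make are hypothetical, and defaulting both limits to the number of CPUs is just one reasonable starting point, not a rule.

```python
import os
import subprocess


def make_command(jobs=None, max_load=None, targets=()):
    """Build a GNU Make command line that caps both jobs and load."""
    cpus = os.cpu_count() or 1
    # Assumption: default both limits to the number of CPUs.
    jobs = cpus if jobs is None else jobs
    max_load = cpus if max_load is None else max_load
    return ["make", f"-j{jobs}", f"-l{max_load}", *targets]


def run_make(jobs=None, max_load=None, targets=()):
    """Run make with both limits and return its exit status."""
    return subprocess.run(make_command(jobs, max_load, targets)).returncode


# Show the command that would be run on this machine.
print(" ".join(make_command(targets=["all"])))
```

With '-j' in place, the load limit can only reduce the number of jobs below the cap. It can no longer let the count run away while the load average is catching up.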