 From:  Jim Gifford <jim at giffords dot net>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  Re: [m0n0wall] about the load averages: 2.11, 0.07, 0.05
 Date:  Mon, 5 Apr 2004 17:02:12 -0400
A simple Google search for 'what is load average' yields this link on the
first page:

http://www.teamquest.com/resources/gunther/ldavg1.shtml

I've read the definition of load average many times, and don't remember
it off the top of my head.  Why?  Because it isn't important.  Load
average is just one window into system performance.  I've seen systems
with load averages over 20 that were still responsive, and I've seen
systems crawl with a load average of 4.  As you manage a machine, you
tend to get a feel for where its "normal" range is, and load average can
be a good indicator of when things are abnormal.

It is nice to know the precise definition, but really, it isn't
essential.  Just like the Linux 'bogomips' figure, it isn't really
important except for bragging rights (and most people have outgrown
that by now).

If you record load average information and plot it over time, you can get
some good trend information about the usage of a system.
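
For example, a minimal logger along these lines (a sketch in Python;
os.getloadavg() is the stock Unix call, and the filename is made up for
illustration) gives you a CSV you can feed to any plotting tool:

    import csv, os, time

    # Append one timestamped load-average sample per minute to a CSV.
    # os.getloadavg() returns the 1-, 5- and 15-minute averages, and
    # raises OSError on platforms that cannot report them.
    with open("loadavg.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            one, five, fifteen = os.getloadavg()
            writer.writerow([int(time.time()), one, five, fifteen])
            f.flush()
            time.sleep(60)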

A much wiser sysadmin than I once explained to me why my concern over a
system hitting 100% CPU usage every so often was misplaced.  As he pointed
out, we had already paid many big bucks for that machine with that CPU,
and any time we weren't running it at 100% CPU usage, we weren't getting
our money's worth out of it.  *grin*

Granted, in a m0n0wall application, you probably don't want to stay
at 100% CPU very often.  In a compute server though, you just might be
getting your money's worth.

hope this helps,
jim

On Mon, Apr 05, 2004 at 10:22:27PM +0200, Adam Nellemann wrote:
> Hi,
> 
> I've been following this thread, in the hope of some kind of definitive 
> answer as to what these numbers precisely are (being mostly ignorant of 
> *NIX stuff myself). So far there still seems to be no authoritative 
> explanation forthcoming?
> 
> 
> >Try not to think of it as a percent of processor in use. Think of it like
> >this. Anything more than 0.0 means that processes are asking for time
> >faster than your system can finish their requested tasks.
> 
> While this sounds right to me overall, are you sure it shouldn't have 
> been "Anything more than [1.0] means[...]"? Otherwise, what number 
> would be shown in a case where the CPU is idle, or at 50% load? (i.e. 
> "[...]processes are asking for time [slower] than the system can[...]" 
> in your terminology.)
> 
> = = =
> 
> What I know for sure is this: The number will never go below 0.0, 
> which corresponds to the CPU being idle 100% of the time (duh! Who 
> would have guessed!) Also, I'm quite (but not absolutely) sure that it 
> (theoretically) has no upper limit (i.e. it would go towards infinity 
> if you were running XP and Word on, say, a ZX Spectrum or Commodore 64).
> 
> I do have some further, albeit unsure, information about how the 
> number works, but don't put too much trust in this:
> 
> I seem to remember being told that a load of 1.0 means that your CPU 
> is working full-time, but no active threads have to wait for their 
> timeslices. A lower number means free CPU time is left over, and a 
> higher number indicates that the active threads have to wait.
> 
> Assuming this is correct (although it might very well not be!) it 
> might be considered kind of like a percentage (i.e. 1.0 = 100%, 0.5 = 
> 50% and so on), but one where more than 100% is possible, such that a 
> load of 2.0 would indicate that the active threads are waiting about 
> the same amount of time as they get to execute (i.e. 2.0 = 200% = the 
> work could be done without waiting by two CPUs, or by one twice as fast 
> as the current one, and so forth.)
> 
> This, however, seems a bit contrary to what people have been writing 
> here, that the number has to do with the number of threads waiting (or 
> something along those lines). I must therefore reiterate that my 
> information may very well be completely wrong!
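> 
> If that percentage reading is right, though, converting a load figure 
> into "how much CPU would I need" is mechanical. A toy sketch in Python 
> (purely illustrative, and resting on the same possibly-wrong assumption):
> 
>     def describe_load(load, ncpus=1):
>         # Interpret a load average under the "1.0 per CPU" reading
>         # described above (an assumption, not gospel).
>         per_cpu = load / ncpus
>         if per_cpu <= 1.0:
>             return "about %d%% busy, no queueing" % (per_cpu * 100)
>         # Anything past 1.0 per CPU is work waiting in line.
>         return "saturated; ~%.1fx the CPU capacity would clear it" % per_cpu
> 
>     print(describe_load(0.5))   # about 50% busy, no queueing
>     print(describe_load(2.0))   # saturated; ~2.0x the CPU capacity ...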
> 
> It would make sense to use this method though, since the usual (for MS 
> OSes at least) solution of showing the "actual" CPU load (0-100%, 
> but never more than 100%) leaves out that crucial bit of information 
> about how much "overload" is present (i.e. how much more CPU capacity 
> would I need to buy to avoid the slowdown).
> 
> There is quite a difference between a situation where the threads are 
> waiting 10% of the time they spend executing, and one where they are 
> waiting 290% of their running time. In both cases an MS OS would show a 
> CPU load of 100%, whereas on a *NIX box these loads (assuming my info 
> is correct) would translate to 1.10 and 2.90 respectively, indicating 
> that the former situation is somewhat OK, whereas the latter tells you 
> to go shopping for either two new CPUs like the one you currently have, 
> or a new one that is three times as fast. In either case, all other 
> things being equal, you would then get a CPU load a little below 100%, 
> or in *NIX speak, it would show just below 1.0 (for each CPU, in the 
> case of the multi-CPU solution.)
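> 
> A quick sanity check of that arithmetic in Python (the same caveats 
> as above apply):
> 
>     import math
> 
>     for load in (1.10, 2.90):
>         cpus = math.ceil(load)   # whole CPUs of the current speed
>         print("load %.2f -> %d CPUs at ~%.2f each" % (load, cpus, load / cpus))
>     # load 1.10 -> 2 CPUs at ~0.55 each
>     # load 2.90 -> 3 CPUs at ~0.97 each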
> 
> = = =
> 
> Anyone care to comment on the above? Perhaps you know it to be 
> certainly wrong, or that it might be correct? I sure hope that 
> eventually someone will learn the actual truth of how these, so far 
> magic, numbers are calculated, and will post a note about it here!
> 
> Also, if anyone knows of a utility for Win32 OSes (specifically 
> Win2k) that will show the load in a similar (*NIX) manner, I'd very 
> much like some linkage!
> 
> 
> Adam.
> 