« July 2008 | Main | October 2008 »

September 14, 2008

Using Shark and custom DTrace probes to debug Nagios on Mac OS X

Nagios has for several years represented a favorite wrench in my networking toolbox. It does a fantastic job of monitoring various hosts and services, warning the sysadmin if things start to get wonky, usually well before any user notices a change in service. It compiles and runs cleanly and has always performed like a champ for me. I wouldn't want to run a server without it.

I recently purchased a Mac Mini to use as a small portable network monitor for those occasions when I require some short-term network monitoring. While I strongly considered installing Linux on it, as that's what I usually use for this sort of thing, I decided to build the tools I needed in OS X Leopard. I use Leopard Server at work (I love the newiCal server), and since Leopard is officially Unix compliant, I didn't expect too much trouble, especially with great sources of FOSS for the Mac like MacPorts.

I installed Nagios through Macports (sudo port install Nagios), configured a few hosts to check, and tested the warning system. Everything was happily up and running and I had moved on to downloading and installing ntop and a few other tools when I noticed the CPU fan spooling up. Running top indicated that nagios was spinning a whole processor core. Hmm.

Google tells that others have seen this problem, but I could find no solution online. Let's find out why this is happening.

Shark

Shark.app and its command line utility shark is a sampling performance tool that will find hotspots; you can find the graphical version in /Developer/Applications/Performance Tools/Shark.app. It is simple to run shark from the command line as well, but it helps to have a configuration file to use. You can export a configuration file as shown below.

SystemTraceExport.jpg

Find the process ID of nagios ( ps ax |grep nagios ), then run shark like so: sudo shark -m TimeProfile.cfg -1 -a $NAGIOS_PID, where $NAGIOS_PID is nagios' process ID, and then hit Ctrl+\ to start a trace. Note: If you're running on the local machine, it may not be Ctrl+\, I'm debugging over ssh.

Once the trace is complete, you can open the session_*.mshark output file in Shark.app.

TimeProfile.jpg

It looks like nagios causes a lot of kernel activity, though it's kind of hard to tell what is really happening. Let's try Shark's System Trace utility to see if it tells us anything more. Just change the profile type to "System Trace," export it as before and run a short sample (I wouldn't take too many samples, this takes a while for Shark.app to parse), and open it.

The initial UI shows us what you already know -- the machine is spending a lot of resources on nagios. If you drill down a little and only show the nagios process (select it from the drop-down menu in the lower left), you'll notice that nagios is running two threads and that one of those threads is hogging the processor.

By selecting the system call tab in the middle of the window, you can see that a function called poll is getting called a lot as is read_nocancel. It looks like this might be the hotspot. Let's dig a little more.

ShortSharkSTPollSyscall.jpg

Shark's timeline view shows what's happening on a granular level. When you zoom in, you'll see a lot of red telephones. Opening the advanced settings tab, you'll notice that these red telephones signify system calls.

ShortSharkSTTimeline2.jpg

ShortSharkSTTimeline1.jpg

Click on a few of those red telephones in the timeline. It looks like read_nocancel and poll are getting called one after the other in rapid succession. There's the hot spot. Now open the "Advanced Settings" drawer, enable thread coloring, color by reason and have Shark.app again show all processes. It looks like nagios' poll thread is spinning like crazy except when it gets preempted by the kernel, which it appears is also doing a lot of work.

ThreadColoring.jpg

What's poll?

The man page for poll indicates that "poll examines a set of file descriptors to see if some of them are ready for I/O or if certain events have occurred on them." Grepping the Nagios source for poll, you'll find that it's used in just one function ./base/utils.c:3831 -- command_file_worker_thread. Digging around in the source, you'll find that nagios initializes a worker thread which runs poll in a loop to check for any commands have been entered into a command file (actually, a named pipe). This file stores commands from outside processes (usually the Nagios web app) and nagios grabs these commands to (say) stop obsessing over a host for a while. This code looks AOK. The 500 millisecond timeout for poll should keep this function from spinning the processor unless that file is getting written constantly, which it probably isn't. What's going on?

DTrace

What's DTrace?

DTrace has been added to OS X as of the Leopard release. Originally developed by Sun microsystems for the Solaris operating system, it allows a developer to place debugging probes in their code that can be turned on and off as necessary at runtime. Probes have little effect on the system unless they are activated. This is great news in that it allows the developer to ship an application with debugging code without filling log files and without slowing the software under normal use. It also keeps us from having to recompile to debug or to write a bunch of #ifdef and #defines (you probably noticed the #ifdef DEBUG_CFWT in the source code) to turn debugging facilities on and off. Exciting stuff!

Apple has added probes to much of the system, a fair chunk of the objc runtime, and a whole slew of other useful places. To list all of the available probes on the system, run sudo dtrace -l. As of this writing, there are over 23000 probes available. Apple's new performance tool, Instruments.app, uses DTrace probes under the hood to grab much of the information it displays; you can even create your own instruments based on custom probes or those probes already defined on the system.

Custom probes

Let's add some custom DTrace probes to nagios to see what's going on. Here is the code of the function you're going to instrument. It starts at line 3804 in utils.c (as of the latest stable release).
/* worker thread - artificially increases buffer of named pipe */
void * command_file_worker_thread(void *arg){
	char input_buffer[MAX_EXTERNAL_COMMAND_LENGTH];
 	struct pollfd pfd;
 	int pollval;
	struct timeval tv;
	int buffer_items=0;
	int result=0;

	/* specify cleanup routine */
	pthread_cleanup_push(cleanup_command_file_worker_thread,NULL);

	/* set cancellation info */
	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED,NULL);

	while(1){

		/* should we shutdown? */
		pthread_testcancel();

		/* wait for data to arrive */
		/* select seems to not work, so we have to use poll instead */
		pfd.fd=command_file_fd;
		pfd.events=POLLIN;
		pollval=poll(&pfd,1,500);

 		/* loop if no data */
 		if(pollval==0)
 			continue;

 		/* check for errors */
 		if(pollval==-1){
 
 			switch(errno){
 			case EBADF:
 				write_to_log("command_file_worker_thread(): poll(): EBADF",logging_options,NULL);
 				break;
 			case ENOMEM:
 				write_to_log("command_file_worker_thread(): poll(): ENOMEM",logging_options,NULL);
 				break;
 			case EFAULT:
 				write_to_log("command_file_worker_thread(): poll(): EFAULT",logging_options,NULL);
 				break;
 			case EINTR:
 				/* this can happen when running under a debugger like gdb */
 				/*
 				write_to_log("command_file_worker_thread(): poll(): EINTR (impossible)",logging_options,NULL);
 				*/
 				break;
 			default:
 				write_to_log("command_file_worker_thread(): poll(): Unknown errno value.",logging_options,NULL);
 				break;
 			        }
 
 			continue;
 		        }
 
 		/* should we shutdown? */
 		pthread_testcancel();

		/* get number of items in the buffer */
		pthread_mutex_lock(&external_command_buffer.buffer_lock);
		buffer_items=external_command_buffer.items;
		pthread_mutex_unlock(&external_command_buffer.buffer_lock);

#ifdef DEBUG_CFWT
		printf("(CFWT) BUFFER ITEMS: %d/%d\n",buffer_items,external_command_buffer_slots);
#endif

		/* process all commands in the file (named pipe) if there's some space in the buffer */
		if(buffer_items


There are a few places this might go wrong. Poll might think that there is always new data in the command file, so you can make a couple of probes that get triggered if poll returns a value or if it returns zero; in the latter case, poll should have timed-out with no data. You can also drop a probe to report any errors (when poll returns -1). A probe that fires just before and one that fires just after each time poll gets called might be useful for timing purposes.

nagiosProbe.d

DTrace uses a programming language called D. You can create scripts to run probes and you can define your own in the D language. Create a new file in the Nagios source base directory and name it nagiosProbe.d. In this file, you'll define some basic probes. It's pretty simple. Just define a provider named nagiosProbe and list a few probes. Probes can take arguments, so you'll pass the error value any time nagios hits an error and you can also pass DTrace the return value of poll itself. Put this code in nagiosProbe.d.
provider nagiosProbe {
	probe pre__poll__fd();
	probe poll__fd();
	probe poll__was__zero();
	probe poll__Err__Val(int);
	probe pollval__was(int);
};

Generating the header file

Now you need to make the compiler aware of your probes. In some versions of DTrace, there is a method by which to create custom probes that will not work on OS X. Having spent an hour trying to figure out why Sun's documentation for creating custom probes didn't work for me, I highly recommend reading the man page for dtrace on OS X. It's also worth noting that there is a bug in OSX in that the macros for creating DTrace probes are still in the header file, this keeps the compiler from complaining if you, like me, follow Sun's tutorials and use macros like DTRACE_PROBE2 in your code. The version of DTrace on OS X can take your nagiosProbe.d file and generate a header file for you that. Run dtrace -h -s nagiosProbe.d. Open the resulting nagiosProbe.h and take a look! You can see that dtrace has generated a bunch of macros that look like this:
#define	NAGIOSPROBE_POLL_ERR_VAL_ENABLED() \
	__dtrace_isenabled$nagiosProbe$poll__Err__Val$v1()
#define	NAGIOSPROBE_POLL_FD() \
{ \
	__asm__ volatile(".reference " NAGIOSPROBE_TYPEDEFS); \
	__dtrace_probe$nagiosProbe$poll__fd$v1(); \
	__asm__ volatile(".reference " NAGIOSPROBE_STABILITY); \
} 
Now you can add probes to the command_file_worker_thread function.

Adding probes to utils.c

The *_ENABLED macros evaluate to true when dtrace has enabled the probes at runtime. To be a good citizen, you should check to ensure that a probe is enabled before you call it. This will make your probes innocuous during normal operations. It also means that you can leave the probes defined in the code (no recompile necessary!) when you're finished debugging. You can always use those probes to debug at a later date without recompiling; they're always available.

Here is the relevant chunk of command_file_worker_thread with added DTrace probes.
/* worker thread - artificially increases buffer of named pipe */
void * command_file_worker_thread(void *arg){
	char input_buffer[MAX_EXTERNAL_COMMAND_LENGTH];
 	struct pollfd pfd;
 	int pollval;
	struct timeval tv;
	int buffer_items=0;
	int result=0;

	/* specify cleanup routine */
	pthread_cleanup_push(cleanup_command_file_worker_thread,NULL);

	/* set cancellation info */
	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED,NULL);

	while(1){

		/* should we shutdown? */
		pthread_testcancel();

		/* wait for data to arrive */
		/* select seems to not work, so we have to use poll instead */
		pfd.fd=command_file_fd;
		pfd.events=POLLIN;

		/* fire a probe before poll */
		if (NAGIOSPROBE_PRE_POLL_FD_ENABLED())
			NAGIOSPROBE_PRE_POLL_FD();

		pollval=poll(&pfd,1,500);

		/* fire a probe after poll */
		if (NAGIOSPROBE_POLL_FD_ENABLED())
			NAGIOSPROBE_POLL_FD();
 		/* loop if no data */
 		if(pollval==0)
		{
			/* fire a probe when poll returns no data*/
			if (NAGIOSPROBE_POLL_WAS_ZERO_ENABLED())
				NAGIOSPROBE_POLL_WAS_ZERO()
			continue;
		}
		/* fire a probe to report poll's return value*/
		if (NAGIOSPROBE_POLLVAL_WAS_ENABLED())
			NAGIOSPROBE_POLLVAL_WAS(pollval);

 		/* check for errors */
 		if(pollval==-1){
			/* fire a probe to report poll's errno*/
 			if (NAGIOSPROBE_POLL_ERR_VAL_ENABLED())
 				NAGIOSPROBE_POLL_ERR_VAL(errno);
 			switch(errno){

.
.
.
You'll want to include the nagiosProbe.h file as well.

Now stop, recompile, reinstall and restart nagios.

Using the probes

The first thing you might want to do is to make sure that you're instrumenting the right part of nagios. It's vaguely possible that poll gets called indirectly by some other part of nagios that you don't know about and it's good to know that you're not about to start chasing your tail. You can have dtrace count the number of times your pre and post poll probes get hit and compare that number to the number of times that the poll system call gets hit. There is a system-defined DTrace probe already present in poll. Create a new file called testNagios.d and drop in the following D code.
#!/usr/bin/env dtrace -qs
BEGIN
{
   printf("Tracing... Hit Ctrl-C to end.\n");
}

nagiosProbe$1:::pre-poll-fd
{
   @count_prepollfd[probefunc, probename] = count() ;  
}
nagiosProbe$1:::poll-fd
{
   @count_postpollfd[probefunc, probename] = count() ;  
}

/*the /pid == $1/ predicate ensures that we're only counting 
  syscalls originating from our nagios' process ID*/
syscall:::entry
/pid == $1/
{
   @count_syscall[probefunc, probename] = count() ; 
}

END
{
   printf("OK.  All done.")
}
Make the script executable and run it like so: sudo testNagios.d $NAGIOS_PID (where $NAGIOS_PID is the process ID of nagios). Ctrl-C to stop sampling and take a look.

The output should look something like this:
Tracing... Hit Ctrl-C to end.^C
OK.  All done.
  command_file_worker_thread                          pre-poll-fd                                                   80407
  command_file_worker_thread                          poll-fd                                                       80407
  __semwait_signal                                    entry                                                            22
  poll                                                entry                                                         80407
  read_nocancel                                       entry                                                         80407

It looks like you're instrumenting the right spot. read_nocancel, poll, and both command_file_worker_thread probes are getting hit within 1 iteration of each other (you might have hit ctrl-C in the middle of the loop).

How did that work? Like a shell script, the first command-line arg (the process id in this case) is represented by $1. When dtrace enables your probes in nagios, it appends the provider name with the process ID for you. While you aren't likely to have nagios running more than once, it's good to get into the habit of telling dtrace which process you want to look at. Each time nagios hits the probes pre-poll-fd, poll-fd, and any syscall with a DTrace probe, you're simply incrementing a counter that will print out the function (probefunc) name, the name of the probe (probename) and how many times the counter was hit. The /pid == $1/ predicate tells dtrace to filter out those syscalls that aren't a result of nagios' process ID. BEGIN gets run when sampling starts and END runs when sampling ends. When you ran the script, you might have noticed a delay before BEGIN gets called. This is dtrace setting itself up and enabling your probes in nagios.

Now you know that poll gets called a whole lot (TM), and only by the function you're instrumenting, but you don't really know how much time a whole lot (TM) represents.

Timing

It would be nice to know how much time nagios spends in poll and how much time it spends in total. Dtrace has fine-grained timing facilities. You can call timestamp at any point during execution to get the current time (nanoseconds since an arbitrary fixed point in the past). DTrace is also able to sum and aggregate values, so you can just keep adding - up time values throughout execution. Place the text of the script below into testNagiosTime.d and run it as before.
#!/usr/bin/env dtrace -qs
BEGIN
{
   printf("Tracing... Hit Ctrl-C to end.\n");
   self->start = timestamp;
   self->pollTime = 0;
}

nagiosProbe$1:::pre-poll-fd
{
   self->pollTime = timestamp;
}

nagiosProbe$1:::poll-fd
/self->pollTime/
{
   self->pollElapsed = timestamp - self->pollTime;
   @time["pollTime"] = sum(self->pollElapsed);
   self->pollTime = 0;
}

nagiosProbe$1:::poll-Err-Val
{
   printf("POLLING ERROR %i!!!\n", args[0]);
}

END
{
   printf("OK.  All done. \n");
   self->overallElapsed = timestamp - self->start;
   @time["overallTime"] = sum(self->overallElapsed);
   printa(@time);
}

With this above script, you're measuring how long nagios spends in poll and how long it's sampling in total.
Tracing... Hit Ctrl-C to end.
^C
OK.  All done. 

  pollTime                                                 2264640087
  overallTime                                              5464995357

  command_file_worker_thread               poll-fd               111713

It looks like nagios is spending about half of the execution time in poll. You also know that nagios ran poll 111713 times and that it took 2264640087 nanoseconds (a nanosecond is one billionth of a second) in total. Converting 2264640087 nanoseconds to milliseconds (one thousand of a second), you find that running poll 111713 times takes 2264 milliseconds or about .02 milleseconds per iteration of poll.

Poll isn't timing out. Is it possible that poll thinks that it has data every time? You can test that pretty easily, because you have instrumented when poll returns zero and you've also instrumented when poll returns anything at all. Let's take a look. You'll do the DTrace equivalent of printf debugging and have it print poll's return value to stdout. Drop the following into pollCount.d.
#!/usr/bin/env dtrace -qs
BEGIN
{
   printf("Tracing... Hit Ctrl-C to end.\n");
}

nagiosProbe$1:::poll-fd
{
   @count_postpollfd[probefunc, probename] = count() ;
}

nagiosProbe$1:::poll-was-zero
{
   printf("poll was zero--------------------------------------------------------------------------------------");
   @count_pollNuthin[probefunc, probename] = count() ; 
}

nagiosProbe$1:::pollval-was
/args[0] > 0/
{
   printf("----------------poll was : %i\n", args[0]);
   @count_pollNotNuthin[probefunc, probename] = count() ; 
}

nagiosProbe$1:::poll-Err-Val
{
   printf("POLLING ERROR %i!!!\n", args[0]);
}

END
{
   printf("OK.  All done. \n");
}

The output
.
.
.
----------------poll was : 1
----------------poll was : 1
----------------poll was : 1
----------------poll was : 1
OK.  All done. 

  command_file_worker_thread                          poll-fd                                                      123549
  command_file_worker_thread                          pollval-was                                                  123549
That's strange. There likely isn't a lot data getting written to the pipe for nagios to pickup, so why is poll saying that there's data to be had there on every iteration? Let's instrument a little more of this function to see just how much data there is on that pipe. You'll notice that there are some printf statements already in the function you're looking at to debug the buffer code. You might want to know how many buffer items are found and how many slots are available just like the Nagios developers did, so you can replace this code:
#ifdef DEBUG_CFWT
		printf("(CFWT) BUFFER ITEMS: %d/%d\n",buffer_items,external_command_buffer_slots);
#endif
with these probes:
		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_ITEMS_ENABLED())
			NAGIOSPROBE_BUFFER_ITEMS(buffer_items);
		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_SLOTS_ENABLED())
		    NAGIOSPROBE_BUFFER_SLOTS(external_command_buffer_slots);

Now you'll define those probes in the nagiosprobe.d file. Here's the whole thing.
provider nagiosProbe {
/* poll probes */
   probe pre__poll__fd();
   probe poll__fd();
   probe poll__was__zero();
   probe poll__Err__Val(int);
   probe pollval__was(int);

/* buffer information probes  */
   probe buffer__items(int);
   probe buffer__slots(int);
};
Remember that nagiosprobe.h file? It needs regenerating. sudo dtrace -h -s hagiosprobe.d

Now you'll need a D script to use the new probes.
#!/usr/bin/env dtrace -qs
BEGIN
{
   printf("Tracing... Hit Ctrl-C to end.\n");
}

nagiosProbe$1:::poll-fd
{
   @count_postpollfd[probefunc, probename] = count() ;
}

nagiosProbe$1:::poll-was-zero
{
   printf("poll was zero--------------------------------------------------------------------------------------");
   @count_pollNuthin[probefunc, probename] = count() ; 
}

nagiosProbe$1:::pollval-was
/args[0] > 0/
{
   printf("----------------poll was : %i\n", args[0]);
   @count_pollNotNuthin[probefunc, probename] = count() ; 
}

nagiosProbe$1:::poll-Err-Val
{
   printf("POLLING ERROR %i!!!\n", args[0]);
}

nagiosProbe$1:::buffer-items
{
   printf("Buffer items = %i   ", args[0]);
}

nagiosProbe$1:::buffer-slots
{
   printf("Buffer slots = %i   ", args[0]);
}

END
{
   printf("OK.  All done. \n");
}

It prints how much data is in the buffers and how many external command slots nagios has available. Stop, recompile, reinstall, and restart nagios.

The result:
Buffer items = 0   Buffer slots = 4096   ----------------poll was : 1
Buffer items = 0   Buffer slots = 4096   ----------------poll was : 1
Buffer items = 0   Buffer slots = 4096   ----------------poll was : 1
Buffer items = 0   Buffer slots = 4096   OK.  All done. 

  command_file_worker_thread               poll-fd               32408
  command_file_worker_thread               pollval-was               32407
Weird. Now you know that poll always returns 1 and that there appears to be no data to be found on that pipe. Before going too much further, let's make sure that nagios is processing commands when you do put something in the command queue. This bug might have somehow broken that functionality.

checkCmdQueue.d is below.
#!/usr/bin/env dtrace -qs
BEGIN
{
   printf("Tracing... Hit Ctrl-C to end.\n");
}

nagiosProbe$1:::poll-fd
{
   @count_postpollfd[probefunc, probename] = count() ;
}

nagiosProbe$1:::poll-was-zero
{
   @count_pollNuthin[probefunc, probename] = count() ; 
}


/* The /args[0] > 0/ predicate filters out any time that 
   poll() returns zero */
nagiosProbe$1:::pollval-was
/args[0] > 0/
{
   @count_pollNotNuthin[probefunc, probename] = count() ; 
}

nagiosProbe$1:::poll-Err-Val
{
   printf("POLLING ERROR %i!!!\n", args[0]);
}

nagiosProbe$1:::buffer-items
/args[0] > 0/
{
   @count_bufferAintNuthin[probefunc, probename] = count() ; 
   printf("Buffer items = %i   \n", args[0]);
}

END
{
   printf("OK.  All done. \n");
}
Run it, open the Nagios web page on your server, and submit some commands.

.
.
.
Buffer items = 1   
Buffer items = 1   
Buffer items = 1   
Buffer items = 1   
^C
OK.  All done. 

  command_file_worker_thread               poll-fd               862074
  command_file_worker_thread               pollval-was               862074
  command_file_worker_thread               buffer-items               1942

You should see Buffer items = 1 scrolling down your screen every time you submit a command. You'll notice that the web UI acknowledges the commands as well. This is good news. At least poll is functioning as expected when there is really something for it to do.

What's up with poll?

It appears that poll was added to the Darwin kernel fairly recently and some people have complained of the processor spinning problem. There also appear to be some useful thoughts available. There is also an implementation of poll that bypasses that of the system available here. It emulates poll using select.

The Fix

What options available for fixing this? You could re-implement using kqueues. You could try the above poll emulator, which wraps poll in select. You could keep digging and possibly find out why the Darwin Kernel folks implemented poll this way and possibly find a fix in the process. You could just pause execution for a little while if the size of the buffer is zero and when poll returns something other than zero, essentially emulating the missing timeout.

There are some advantages to the last option, even though it seems like a bit of a hack. First, if you test for the buffer having nothing in it and just pause at that point, you aren't likely to break Nagios on other platforms because that code won't get hit; poll will have timed-out and returned zero under (say) Linux, so the manual pause would never happen there. Even if it did get hit unexpectedly, you're only pausing for an extra 500 milliseconds, which would only slightly slow the processing of external commands. In any case, you won't also have to hack the Nagios config script to test for a broken poll like some of the projects linked to above. You also won't have to introduce a third-party wrapper to poll, which would also require the config to test poll during configuration and which could introduce unexpected bugs of its own. Simpler is better, I think. Sometimes a small ugly change that doesn't significantly mess with your code base is the most pragmatic solution.

Try this.
		/* get number of items in the buffer */
		pthread_mutex_lock(&external_command_buffer.buffer_lock);
		buffer_items=external_command_buffer.items;
		pthread_mutex_unlock(&external_command_buffer.buffer_lock);

#ifdef DEBUG_CFWT
		printf("(CFWT) BUFFER ITEMS: %d/%d\n",buffer_items,external_command_buffer_slots);
#endif

		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_ITEMS_ENABLED())
			NAGIOSPROBE_BUFFER_ITEMS(buffer_items);
		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_SLOTS_ENABLED())
			NAGIOSPROBE_BUFFER_SLOTS(external_command_buffer_slots);

        /* On OS X, poll will return 1 even when the buffer is empty.  
           We'll pause manually if poll returns > 0 and the buffer is empty. */ 
        if (0 == buffer_items)
        {
            tv.tv_sec=0;
            tv.tv_usec=500;
            select(0,NULL,NULL,NULL,&tv);
        }

Testing the fix

Compiling nagios and reinstalling, you'll notice that nagios barely registers on top. That's good news. Let's run checkCmdQueue.d one time to make sure that nagios is still reading the command file.
.
.
.
Buffer items = 1   
Buffer items = 1   
Buffer items = 1   
^C
OK.  All done. 

  command_file_worker_thread                          poll-fd                                                       29327
  command_file_worker_thread                          pollval-was                                                   29327
  command_file_worker_thread                          buffer-items                                                    509
Yes. AOK!

Run shark again as before and see what you find.

NiceIdle.jpg

Look at all of that idle time! How about the system calls. What are they doing?

AfterSyscall1.jpg

AfterSyscall2.jpg

AfterSyscall3.jpg

Notice that instead of read_nocancel and poll getting called in rapid succession, there is an intervening select that idles our thread. I still see about 2.5% processor usage out of nagios, which you could probably further curtail. A longer timeout wouldn't kill functionality; nagios responding to commands, in my experience, is not necessarily something that needs to be instantaneous, but this seems to have squashed the processor spinning bug nicely.

Misc. Links

If you want to try fakePoll instead (perhaps I'll do that and post the results later), it's here.

Lftp-devel mailing list members noticed the problem.

Daemontools appears to have been (at least at one point) hit by this.

Curllib had the problem.

Glib2 as well.

It looks like macports has an emulator for poll. I'll send this to them and perhaps they'll modify the port file for nagios.

The Fink project (similar in spirit to macports) recommends those porting software consider not using poll on OS X at all.

I'll be sending a bug report to the Nagios developer list as well. I'm glad I can contribute something back to a project I use so much.

DTrace Links

Top Ten DTrace (D) Scripts

DTrace oneliners SAVE

Exploring Leopard

The Exciter: A quick stroll through DTrace

Everyday DTrace on OSX_ A Guide to Using DTrace for Your Full Application Stack Presentation.pdf

bbum's weblog-o-mat " Blog Archive " Objective-C: Printing Class Name from Dtrace

Instruments User Guide: Creating Custom Instruments with DTrace

Solaris Operating System - DTrace How To Guide

DTraceToolkit at OpenSolaris.org

sysinfo Provider - DTrace - wikis.sun.com

Statically Defined Tracing for User Applications

Adam Leventhal's Weblog

nasrat: OS X and dtrace

Solaris Dynamic Tracing Guide

dtrace.conf - DTrace - wikis.sun.com

A quick stroll through DTrace

The Punchline

Here is the modified version of the function with the DTrace probes and the fix.
#include "nagiosprobe.h"
/* worker thread - artificially increases buffer of named pipe */
void * command_file_worker_thread(void *arg){
	char input_buffer[MAX_EXTERNAL_COMMAND_LENGTH];
 	struct pollfd pfd;
 	int pollval;
	struct timeval tv;
	int buffer_items=0;
	int result=0;

	/* specify cleanup routine */
	pthread_cleanup_push(cleanup_command_file_worker_thread,NULL);

	/* set cancellation info */
	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED,NULL);

	while(1){

		/* should we shutdown? */
		pthread_testcancel();

		/* wait for data to arrive */
		/* select seems to not work, so we have to use poll instead */
		pfd.fd=command_file_fd;
		pfd.events=POLLIN;

		/* fire a probe before poll */
		if (NAGIOSPROBE_PRE_POLL_FD_ENABLED())
			NAGIOSPROBE_PRE_POLL_FD();

		pollval=poll(&pfd,1,500);

		/* fire a probe after poll */
		if (NAGIOSPROBE_POLL_FD_ENABLED())
			NAGIOSPROBE_POLL_FD();
 		/* loop if no data */
 		if(pollval==0)
		{
			/* fire a probe when poll returns no data*/
			if (NAGIOSPROBE_POLL_WAS_ZERO_ENABLED())
				NAGIOSPROBE_POLL_WAS_ZERO()
			continue;
		}
		/* fire a probe to report poll's return value*/
		if (NAGIOSPROBE_POLLVAL_WAS_ENABLED())
			NAGIOSPROBE_POLLVAL_WAS(pollval);

 		/* check for errors */
 		if(pollval==-1){
			/* fire a probe to report poll's errno*/
 			if (NAGIOSPROBE_POLL_ERR_VAL_ENABLED())
 				NAGIOSPROBE_POLL_ERR_VAL(errno);
 			switch(errno){

 			case EBADF:
 				write_to_log("command_file_worker_thread(): poll(): EBADF",logging_options,NULL);
 				break;
 			case ENOMEM:
 				write_to_log("command_file_worker_thread(): poll(): ENOMEM",logging_options,NULL);
 				break;
 			case EFAULT:
 				write_to_log("command_file_worker_thread(): poll(): EFAULT",logging_options,NULL);
 				break;
 			case EINTR:
 				/* this can happen when running under a debugger like gdb */
 				/*
 				write_to_log("command_file_worker_thread(): poll(): EINTR (impossible)",logging_options,NULL);
 				*/
 				break;
 			default:
 				write_to_log("command_file_worker_thread(): poll(): Unknown errno value.",logging_options,NULL);
 				break;
 			        }
 
 			continue;
 		        }
 
 		/* should we shutdown? */
 		pthread_testcancel();

		/* get number of items in the buffer */
		pthread_mutex_lock(&external_command_buffer.buffer_lock);
		buffer_items=external_command_buffer.items;
		pthread_mutex_unlock(&external_command_buffer.buffer_lock);

#ifdef DEBUG_CFWT
		printf("(CFWT) BUFFER ITEMS: %d/%d\n",buffer_items,external_command_buffer_slots);
#endif

		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_ITEMS_ENABLED())
			NAGIOSPROBE_BUFFER_ITEMS(buffer_items);
		/* fire a probe to report buffer items */
		if (NAGIOSPROBE_BUFFER_SLOTS_ENABLED())
			NAGIOSPROBE_BUFFER_SLOTS(external_command_buffer_slots);

        /* On OS X, poll will return 1 even when the buffer is empty.  
           We'll pause here manually if poll returns > 0 and the buffer is empty. */ 
        if (0 == buffer_items)
        {
            tv.tv_sec=0;
            tv.tv_usec=500;
            select(0,NULL,NULL,NULL,&tv);
        }

		/* process all commands in the file (named pipe) if there's some space in the buffer */
		if(buffer_items

Note: Post slightly modified the evening it was posted to make capitalization and italics more consistent. I try to use the capital Nagios and capital DTrace when talking about a project or a product and use italicized nagios and italicized dtrace when talking about a binary, Unix service, or executable.
My friend Vas reminded me of some net etiquette. I have moved a great portion of this into the extended entry. If you feed reader downloaded this novella before I was able to do that, sorry. It's been a while since I've blogged.
Thanks to Brian Hardy of the Big Nerd Ranch for finding an error in one of my code listings.