
Posted to dev@httpd.apache.org by Nate Kurz <na...@triage.tripod.com> on 1997/07/24 22:58:20 UTC

Re: I love solaris (SIGHUP's)

Dean Gaudet wrote:
> 
> I'll bet it's related to the signal masking problem that Nathan Kurz
> reported... er wait, maybe he only sent that one to me in private email, I
> can't find it in the bugdb.  Nathan?  It's not 792, that one is a race in
> the children not the parent. 
> 
> Anyhow the problem is that reclaim_child_processes, and some code around
> it, is not properly masked.  I think.
>
> On Thu, 24 Jul 1997, Marc Slemko wrote:
> 
> > I had a program doing a while(1){kill(pid,SIGHUP);} on Apache
> > (interesting that even though this sends a lot more signals to Apache than
> > a while true; do kill -HUP pid; done, it results in Apache restarting a
> > lot less frequently).  Adding a couple of directives worked; adding 30
> > didn't.

I can't find anything I wrote you about that.  Although I do recall
what you are talking about...  Maybe my telepathy is acting up again.

The problem I found was that if you flooded Apache with signals, all
the select() calls in reclaim_child_processes() would be cut short, and
spurious warnings would be written to the error log.  It's possible
this explains why things are slower with more signals, since the
child isn't reaped in a timely fashion.  Do you see anything obvious
in the error log about 'child process did not exit'?
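Here is a minimal sketch of that failure mode, assuming a POSIX system. It installs a handler with sigaction() and no SA_RESTART, because whether SA_RESTART restarts select() varies between systems. A one-shot SIGALRM from setitimer() stands in for the flood of HUPs. The select() meant as a 2-second sleep returns -1 with errno set to EINTR as soon as the signal arrives. The name interrupted_sleep_demo() is hypothetical and not part of Apache.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Empty handler: its only effect is to make a blocking call return early. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical demo (not Apache code): use select() as a sleep of sleep_ms
   while a one-shot SIGALRM fires after signal_after_ms.  Returns select()'s
   result and stores errno in *err when it fails. */
int interrupted_sleep_demo(int sleep_ms, int signal_after_ms, int *err)
{
    struct sigaction sa, old_sa;
    struct itimerval timer;
    struct timeval tv;
    int rc;

    assert(signal_after_ms > 0 && sleep_ms > signal_after_ms);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                        /* deliberately no SA_RESTART */
    sigaction(SIGALRM, &sa, &old_sa);

    memset(&timer, 0, sizeof timer);        /* zero it_interval: fire once */
    timer.it_value.tv_sec = signal_after_ms / 1000;
    timer.it_value.tv_usec = (signal_after_ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &timer, NULL);

    tv.tv_sec = sleep_ms / 1000;
    tv.tv_usec = (sleep_ms % 1000) * 1000;
    rc = select(0, NULL, NULL, NULL, &tv);  /* the intended "sleep" */
    *err = (rc < 0) ? errno : 0;

    memset(&timer, 0, sizeof timer);        /* disarm the timer if still armed */
    setitimer(ITIMER_REAL, &timer, NULL);
    sigaction(SIGALRM, &old_sa, NULL);      /* restore the previous handler */
    return rc;
}
```

Calling interrupted_sleep_demo(2000, 100, &err) returns after roughly 100 ms rather than 2 s. In the reclaim loop, every signal in a flood has the same effect on each sleep.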

My solution was to mask off SIGHUP, SIGTERM, and SIGUSR1 around the
select() that is used as a sleep.  We want this to be a hard sleep
that can't be interrupted by normal signals.

Here's a snippet from what I was doing.  I was doing this for a
different server, so some things (like error logging) may not match up
exactly.  But it will explain things better than I managed to.

 /* NOTE: we mask signals in case we are receiving a torrent of them. 
    If we didn't, all the sleeps might be interrupted with no pause. */

 signal_mask(SIGUSR1);
 signal_mask(SIGHUP);
 signal_mask(SIGTERM);
 manager_millisleep(time_to_wait);
 signal_unmask(SIGTERM);
 signal_unmask(SIGHUP);
 signal_unmask(SIGUSR1);

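#include <signal.h>      /* sigset_t, sigemptyset, sigaddset, sigprocmask */
#include <stddef.h>      /* NULL */
#include <sys/select.h>  /* select */
#include <sys/time.h>    /* struct timeval */

/* ASSERT, log_error and log_unix_error are the other server's helpers. */

void signal_mask(int signal_number)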
{
  sigset_t mask;

  ASSERT(signal_number);

  sigemptyset(&mask);

  sigaddset(&mask, signal_number);

  if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0) {
    log_error("couldn't block signal %d", signal_number);
    log_unix_error("sigprocmask");
    return;
  }

  return;
}

void signal_unmask(int signal_number)
{
  sigset_t mask;

  ASSERT(signal_number);

  sigemptyset(&mask);

  sigaddset(&mask, signal_number);

  if (sigprocmask(SIG_UNBLOCK, &mask, NULL) < 0) {
    log_error("couldn't unblock signal %d", signal_number);
    log_unix_error("sigprocmask");
    return;
  }
  return;
}

void manager_millisleep(int milliseconds)
{
  struct timeval timevalue;

  /* deal with the case that milliseconds is greater than one second */
  timevalue.tv_sec = milliseconds / 1000;

  /* set the microseconds from the milliseconds */
  timevalue.tv_usec = (milliseconds % 1000) * 1000;

  select(0, NULL, NULL, NULL, &timevalue);

  return;
}
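The three separate mask and unmask calls above can also be collapsed into one sigprocmask() call. The sketch below is a hypothetical hard_millisleep(); it is not code from Apache or from the snippet's server. It blocks SIGHUP, SIGTERM and SIGUSR1 together, sleeps with select(), and then restores the caller's saved mask instead of unconditionally unblocking. A signal that arrives during the sleep is not lost: it stays pending and is delivered once the mask is restored.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not Apache's API): a sleep that SIGHUP, SIGTERM
   and SIGUSR1 cannot cut short.  Returns select()'s result (0 on timeout)
   or -1 if the signal mask could not be changed. */
int hard_millisleep(int milliseconds)
{
    sigset_t block, saved;
    struct timeval tv;
    int rc;

    assert(milliseconds >= 0);

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGTERM);
    sigaddset(&block, SIGUSR1);

    /* Block all three at once and remember the caller's mask. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) < 0)
        return -1;

    tv.tv_sec = milliseconds / 1000;
    tv.tv_usec = (milliseconds % 1000) * 1000;
    rc = select(0, NULL, NULL, NULL, &tv);

    /* Restore, don't unblock: signals the caller had already blocked stay
       blocked, and pending ones not in the saved mask are delivered now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```

Restoring the saved mask matters if the caller already had one of these signals blocked. signal_unmask() as written would unblock it anyway.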

nate@tripod.com
http://www.tripod.com

Re: I love solaris (SIGHUP's)

Posted by Dean Gaudet <dg...@arctic.org>.

On Thu, 24 Jul 1997, Nate Kurz wrote:

> The problem I found was that if you flooded Apache with signals, all
> the select() calls in reclaim_child_processes() would be cut short, and
> spurious warnings would be written to the error log.  It's possible
> this explains why things are slower with more signals, since the
> child isn't reaped in a timely fashion.  Do you see anything obvious
> in the error log about 'child process did not exit'?
> 
> My solution was to mask off SIGHUP, SIGTERM, and SIGUSR1 around the
> select() that is used as a sleep.  We want this to be a hard sleep
> that can't be interrupted by normal signals.

FWIW the code that's been committed to 1.2 and 1.3 only masks HUP and
USR1.  I didn't mask TERM because a TERM causes the server to do a single
killpg and then exit ... For TERM we can't just miss the signal; we'd have
to trap it, set an "exit when you're done reclaiming children" flag,
continue the loop (dealing with the interrupted sleep), and check that flag
when the reclaim is done ... or am I missing something?
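Dean's TERM alternative can be sketched as follows, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). The TERM handler only sets a flag. The reclaim loop keeps reaping with waitpid() and WNOHANG, and it resumes an interrupted sleep for whatever time remains. The caller checks the flag once reclaiming is done. All names here (exit_when_reclaimed, install_term_flag_handler, resumable_millisleep, reclaim_children) are hypothetical, and this is not the code committed to 1.2 or 1.3.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical names throughout; this is not Apache's code.
   Set by the TERM handler, acted on only after reclaiming is finished. */
static volatile sig_atomic_t exit_when_reclaimed = 0;

static void on_term(int signo)
{
    (void)signo;
    exit_when_reclaimed = 1;    /* record the request, nothing more */
}

int install_term_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Sleep the full interval even if signals interrupt select(): on EINTR,
   recompute the time left from a monotonic clock and sleep again. */
void resumable_millisleep(int milliseconds)
{
    struct timespec start, now;
    long long left_ms = milliseconds;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (left_ms > 0) {
        struct timeval tv;
        long long elapsed_ns;

        tv.tv_sec = left_ms / 1000;
        tv.tv_usec = (left_ms % 1000) * 1000;
        if (select(0, NULL, NULL, NULL, &tv) >= 0 || errno != EINTR)
            return;             /* timed out, or failed for another reason */

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ns = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL
                   + (now.tv_nsec - start.tv_nsec);
        left_ms = milliseconds - elapsed_ns / 1000000LL;
    }
}

/* Reap exited children, sleeping between passes, for at most max_passes
   passes.  Returns 1 if a TERM arrived meanwhile and the caller should
   now exit, 0 otherwise. */
int reclaim_children(int max_passes, int sleep_ms)
{
    for (int pass = 0; pass < max_passes; pass++) {
        pid_t pid;

        while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
            ;                   /* reaped one child; look for another */
        if (pid < 0 && errno == ECHILD)
            break;              /* no children left to wait for */
        resumable_millisleep(sleep_ms);
    }
    return exit_when_reclaimed != 0;
}
```

Unlike masking, this lets TERM's handler run immediately, so the request is recorded at once. The extra bookkeeping is the flag check after the loop and the resumed sleep.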

Dean