Posted to dev@httpd.apache.org by ra...@bellglobal.com on 1998/01/28 21:04:37 UTC

Latest CVS httpd hangs

This is the second time in 2 days that I have seen Apache stop answering
requests.  I am running a CVS snapshot from 2 days ago with Dean's chunking patch.
(Solaris 2.5.1, gcc-2.8.0)

A truss on the parent process shows:

poll(0xEFFFDAA0, 0, 1000)                       = 0
time()                                          = 886017547
lseek(15, 0, SEEK_SET)                          = 0
read(15, "\00101\0\0\0\0\00101\0\0".., 1284)    = 1284
waitid(P_ALL, 0, 0xEFFFFA20, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFDAA0, 0, 1000)                       = 0
time()                                          = 886017548
lseek(15, 0, SEEK_SET)                          = 0
read(15, "\00101\0\0\0\0\00101\0\0".., 1284)    = 1284
waitid(P_ALL, 0, 0xEFFFFA20, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFDAA0, 0, 1000)                       = 0
time()                                          = 886017549
lseek(15, 0, SEEK_SET)                          = 0
read(15, "\00101\0\0\0\0\00101\0\0".., 1284)    = 1284
waitid(P_ALL, 0, 0xEFFFFA20, WEXITED|WTRAPPED|WNOHANG) = 0

etc., indefinitely.

The child processes are all sitting in:

fcntl(14, F_SETLKW, 0x000FA69C) (sleeping...)

Server was compiled with: -DNO_SLACK=1 -DUSE_FCNTL_SERIALIZED_ACCEPT=1

Any clues?  Any other debugging steps I should take?  I can't keep the
server in this state for much longer.  Need to reset it.

-Rasmus

Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.

On Sat, 31 Jan 1998, Dean Gaudet wrote:

> It's really not a good idea for modules to call exit().  We need to export
> clean_child_exit() and you should use that instead.  I'll commit that with
> an MMN bump.

There's already child_terminate(request_rec *r) which is meant to
terminate the child at the end of the request.  So you could longjmp out
of your signal handler to your top level send routine, and bail at that
point.  I'll leave clean_child_exit() alone for now. 

Dean
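Dean's suggestion (jump out of the SIGPROF handler back to a top-level routine, then let child_terminate() end the child once the request is over) can be sketched roughly as below. This is a hedged illustration assuming a POSIX system with sigsetjmp()/siglongjmp() and ITIMER_PROF; run_with_cpu_limit(), on_cpu_timeout(), spin_forever() and finish_quickly() are hypothetical names, not Apache or PHP APIs, and the comment only marks where a call like child_terminate(r) would go. Jumping out of a handler is safe only if the interrupted code holds no lock and has no half-updated state, which is the same hazard raised later in the thread about the accept critical region.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical names throughout; not Apache or PHP APIs. */
static sigjmp_buf timeout_jmp;
static volatile sig_atomic_t jmp_armed = 0;

static void on_cpu_timeout(int sig)
{
    (void)sig;
    if (jmp_armed) {
        jmp_armed = 0;
        siglongjmp(timeout_jmp, 1);   /* also restores the saved signal mask */
    }
}

static void set_cpu_timer(long usec)
{
    struct itimerval t;

    memset(&t, 0, sizeof t);          /* it_interval stays 0: one-shot timer */
    t.it_value.tv_sec = usec / 1000000;
    t.it_value.tv_usec = usec % 1000000;
    setitimer(ITIMER_PROF, &t, NULL); /* usec == 0 disarms the timer */
}

/* Runs fn(); returns 1 if the CPU-time limit expired first, 0 otherwise. */
static int run_with_cpu_limit(void (*fn)(void), long usec)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_cpu_timeout;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    if (sigsetjmp(timeout_jmp, 1) != 0) {
        /* Back at the top level with the stack unwound: this is where
         * something like child_terminate(r) would be called. */
        return 1;
    }
    jmp_armed = 1;
    set_cpu_timer(usec);
    fn();
    jmp_armed = 0;                    /* disarm the jump before the timer */
    set_cpu_timer(0);
    return 0;
}

/* Demo workloads. */
static void spin_forever(void)
{
    volatile unsigned long n = 0;
    for (;;)
        n++;                          /* burns CPU so ITIMER_PROF advances */
}

static void finish_quickly(void)
{
}
```

In a real module the jump target would be the module's own top-level handler, and anything allocated or locked inside the interrupted code would need a pool cleanup to release it.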



Re: Latest CVS httpd hangs

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.
> It's really not a good idea for modules to call exit().

Ack.  I am well aware of that.  I missed that.

-Rasmus


Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.

On Sat, 31 Jan 1998, Rasmus Lerdorf wrote:

> > And looking at child_main, the top of the main loop:
> > 
> >         /*
> >          * (Re)initialize this child to a pre-connection state.
> >          */
> > 
> >         kill_timeout(0);        /* Cancel any outstanding alarms. */
> >         timeout_req = NULL;     /* No request in progress */
> >         current_conn = NULL;
> > 
> >         clear_pool(ptrans);
> > 
> > If you have any registered cleanup which plays with timeouts (doing
> > block_alarms()/unblock_alarms() is OK) then it could cause trouble.
> 
> Well, I do have a timeout feature to guard against someone tossing an
> infinite loop into a PHP script and thus spinning the server forever.  I
> use an itimer though and thus a SIGPROF.  I didn't think that would
> interfere.  Here is the relevant code:

Ah.  Well, does it ever get triggered on your server? 

It's really not a good idea for modules to call exit().  We need to export
clean_child_exit() and you should use that instead.  I'll commit that with
an MMN bump.

When do you call php3_unset_timeout?

Could you try with this timer disabled, and revert my patch?

Dean


Re: Latest CVS httpd hangs

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.
> And looking at child_main, the top of the main loop:
> 
>         /*
>          * (Re)initialize this child to a pre-connection state.
>          */
> 
>         kill_timeout(0);        /* Cancel any outstanding alarms. */
>         timeout_req = NULL;     /* No request in progress */
>         current_conn = NULL;
> 
>         clear_pool(ptrans);
> 
> If you have any registered cleanup which plays with timeouts (doing
> block_alarms()/unblock_alarms() is OK) then it could cause trouble.

Well, I do have a timeout feature to guard against someone tossing an
infinite loop into a PHP script and thus spinning the server forever.  I
use an itimer though and thus a SIGPROF.  I didn't think that would
interfere.  Here is the relevant code:

static void php3_timeout(int dummy)
{
    TLS_VARS;

    if (!GLOBAL(shutdown_requested)) {
        php3_error(E_ERROR,"Maximum execution time of %d seconds exceeded",php3_ini.max_execution_time);
        /* Now, schedule another alarm.  If we're stuck in a code portion that will not go through
         * phplex() or if the parser is broken, end the process ungracefully
         */
        php3_set_timeout(3);  /* allow 3 seconds for shutdown... */
    } else { /* we're here for a second time.  exit ungracefully */
        exit(1);
    }
}

static void php3_set_timeout(long seconds)
{
    struct itimerval t_r;  /* timeout requested */
   
    t_r.it_value.tv_sec = seconds;
    t_r.it_value.tv_usec=t_r.it_interval.tv_sec=t_r.it_interval.tv_usec=0;

    setitimer(ITIMER_PROF, &t_r, NULL);
    signal(SIGPROF, php3_timeout);
}


static void php3_unset_timeout(void)
{
    struct itimerval no_timeout;

    no_timeout.it_value.tv_sec = no_timeout.it_value.tv_usec = 0;

    setitimer(ITIMER_PROF, &no_timeout, NULL);
}
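A more conservative variant of this timer, sketched under the assumption of a POSIX system: the handler only sets a volatile sig_atomic_t flag, and a (hypothetical) interpreter loop checks it between statements, where reporting the error and unwinding is safe. Unlike php3_timeout(), it calls nothing that is not async-signal-safe (such as php3_error() or exit()) from signal context, and it installs the handler with sigaction() before arming the timer; php3_set_timeout() calls setitimer() first, so in principle a timer expiring at once would find SIGPROF at its default action, which terminates the process. arm_cpu_timeout(), disarm_cpu_timeout() and run_script_until_timeout() are illustrative names only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Only this flag is touched from signal context. */
static volatile sig_atomic_t timeout_requested = 0;

static void on_prof_timeout(int sig)
{
    (void)sig;
    timeout_requested = 1;
}

/* Install the handler first, then arm a one-shot CPU-time timer. */
static int arm_cpu_timeout(long seconds, long usec)
{
    struct sigaction sa;
    struct itimerval t;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_prof_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    memset(&t, 0, sizeof t);          /* zero it_interval: fire once */
    t.it_value.tv_sec = seconds;
    t.it_value.tv_usec = usec;
    timeout_requested = 0;
    return setitimer(ITIMER_PROF, &t, NULL);
}

static void disarm_cpu_timeout(void)
{
    struct itimerval t;

    memset(&t, 0, sizeof t);          /* zero it_value and it_interval */
    setitimer(ITIMER_PROF, &t, NULL);
}

/* Hypothetical interpreter loop: the flag is polled between "statements",
 * where it is safe to report the error and unwind normally. */
static long run_script_until_timeout(void)
{
    volatile long steps = 0;

    while (!timeout_requested)
        steps++;                      /* stands in for executing one statement */
    disarm_cpu_timeout();
    return steps;
}
```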


Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.
On Sat, 31 Jan 1998, Dean Gaudet wrote:

> On Sat, 31 Jan 1998, Dean Gaudet wrote:
> 
> > mod_php3 doesn't do any fork/exec tricks does it? 
> 
> And doesn't have a signal handler which longjmps out of the accept
> critical region or anything like that, right? 

And looking at child_main, the top of the main loop:

        /*
         * (Re)initialize this child to a pre-connection state.
         */

        kill_timeout(0);        /* Cancel any outstanding alarms. */
        timeout_req = NULL;     /* No request in progress */
        current_conn = NULL;

        clear_pool(ptrans);

If you have any registered cleanup which plays with timeouts (doing
block_alarms()/unblock_alarms() is OK) then it could cause trouble.

Dean


Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.

On Sat, 31 Jan 1998, Dean Gaudet wrote:

> mod_php3 doesn't do any fork/exec tricks does it? 

And doesn't have a signal handler which longjmps out of the accept
critical region or anything like that, right? 

Dean
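The remedy Dean's later patch applies in accept_mutex_on()/accept_mutex_off() is to keep handlers from running inside the critical region at all: block signals before taking the lock and restore the previous mask after releasing it, so anything arriving in between stays pending until the region is left. A minimal sketch, assuming POSIX sigprocmask(); run_critical(), install_flag_handler() and raise_sigprof() are hypothetical helpers, and the signals left unblocked (SIGHUP, SIGTERM, SIGUSR1) mirror the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_ran = 0;

static void note_signal(int sig)
{
    (void)sig;
    handler_ran = 1;
}

static int install_flag_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Runs critical() with all signals blocked except SIGHUP, SIGTERM and
 * SIGUSR1. Returns handler_ran as observed just before the old mask is
 * restored, or -1 if the mask could not be changed. */
static int run_critical(void (*critical)(void))
{
    sigset_t block, previous;
    int seen_inside;

    sigfillset(&block);
    sigdelset(&block, SIGHUP);
    sigdelset(&block, SIGTERM);
    sigdelset(&block, SIGUSR1);

    if (sigprocmask(SIG_BLOCK, &block, &previous) != 0)
        return -1;
    critical();                 /* e.g. take the lock, accept(), drop the lock */
    seen_inside = handler_ran;
    /* A signal raised inside was held pending; it is delivered here. */
    sigprocmask(SIG_SETMASK, &previous, NULL);
    return seen_inside;
}

/* Demo body: a signal that would otherwise interrupt the critical region. */
static void raise_sigprof(void)
{
    raise(SIGPROF);
}
```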


Re: Latest CVS httpd hangs

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.
> Er hang on ... I thought you had fcntl locking though.  And my patch was
> even more wrong... which I'm sure you figured out 'cause it wouldn't
> compile.  But here it is fixed for completeness, and I even compiled it
> this time :)

Well, yeah, I fixed your patch.  

Yes, I do have fcntl locking.  But I am only forcing fcntl locking because
pthread locking doesn't work at all when I compile in my module.

> I find it really odd that you need this particular patch, period, because
> Solaris has used fcntl locking forever...

Seems odd to me too.

> mod_php3 doesn't do any fork/exec tricks does it? 

The closest thing is a popen() in a tag like: <? system("ls") ?>.

-Rasmus
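On the popen() question: POSIX specifies that fcntl() record locks are not inherited across fork(), so a child started through popen() shares the lock file descriptor but not the lock itself. A sketch to check that on a given system, assuming POSIX fcntl(), fork() and mkstemp(); the temporary path and helper names are made up for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a write lock covering the whole file (l_len = 0 means "to EOF"). */
static void whole_file_wrlock(struct flock *fl)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
}

/* Returns 1 if a forked child could NOT take a lock the parent holds
 * (it shares the descriptor, not the lock), 0 if it could, -1 on error. */
static int forked_child_lacks_parent_lock(void)
{
    char path[] = "/tmp/lock_fork_demo.XXXXXX";
    struct flock fl;
    int fd, status, result = -1;
    pid_t pid;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* keep only the open descriptor */

    whole_file_wrlock(&fl);
    if (fcntl(fd, F_SETLK, &fl) == 0) {   /* parent now holds the lock */
        pid = fork();
        if (pid == 0) {
            /* Child: non-blocking attempt on the inherited descriptor. */
            struct flock attempt;
            whole_file_wrlock(&attempt);
            _exit(fcntl(fd, F_SETLK, &attempt) < 0 ? 0 : 1);
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status) == 0;
    }
    close(fd);
    return result;
}
```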


Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.
On Sat, 31 Jan 1998, Rasmus Lerdorf wrote:

> > > +static void accept_mutex_child_cleanup(void *foo)
> > > +{
> > > +    if (accept_mutex != (void *)(caddr_t)-1
> > > +	&& have_accept_mutex) {
> > 
> > should be:
> >        if (lock_fd != -1 && have_accept_mutex) {
> > 
> > like I said, uncompiled. 
> 
> Ok, I have been running a server with this patch for a couple of days now.
> No problems so far.  Another server without the patch has hung twice in
> the same timeframe.
> 
> I'd still love to figure out why linking in mod_php3 completely messes up
> the pthread locking though.

Er hang on ... I thought you had fcntl locking though.  And my patch was
even more wrong... which I'm sure you figured out 'cause it wouldn't
compile.  But here it is fixed for completeness, and I even compiled it
this time :)

I find it really odd that you need this particular patch, period, because
Solaris has used fcntl locking forever...

mod_php3 doesn't do any fork/exec tricks does it? 

Dean

Index: main/http_main.c
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/main/http_main.c,v
retrieving revision 1.279
diff -u -r1.279 http_main.c
--- http_main.c	1998/01/31 14:54:20	1.279
+++ http_main.c	1998/01/31 21:46:20
@@ -608,8 +608,26 @@
 static struct flock unlock_it;
 
 static int lock_fd = -1;
+static int have_accept_mutex;
+static sigset_t accept_block_mask;
+static sigset_t accept_previous_mask;
+
+static void accept_mutex_child_cleanup(void *foo)
+{
+    int ret;
+
+    if (lock_fd != -1 && have_accept_mutex) {
+	while ((ret = fcntl(lock_fd, F_SETLKW, &unlock_it)) < 0 && errno == EINTR) {
+	    /* nop */
+	}
+    }
+}
+
+static void accept_mutex_child_init(pool *p)
+{
+    register_cleanup(p, NULL, accept_mutex_child_cleanup, null_cleanup);
+}
 
-#define accept_mutex_child_init(x)
 
 /*
  * Initialize mutex lock.
@@ -637,12 +655,21 @@
 	exit(1);
     }
     unlink(lock_fname);
+
+    sigfillset(&accept_block_mask);
+    sigdelset(&accept_block_mask, SIGHUP);
+    sigdelset(&accept_block_mask, SIGTERM);
+    sigdelset(&accept_block_mask, SIGUSR1);
 }
 
 static void accept_mutex_on(void)
 {
     int ret;
 
+    if (sigprocmask(SIG_BLOCK, &accept_block_mask, &accept_previous_mask)) {
+	perror("sigprocmask(SIG_BLOCK)");
+	exit (1);
+    }
     while ((ret = fcntl(lock_fd, F_SETLKW, &lock_it)) < 0 && errno == EINTR) {
 	/* nop */
     }
@@ -654,6 +681,7 @@
 		    "your lock file on a local disk!");
 	exit(1);
     }
+    have_accept_mutex = 1;
 }
 
 static void accept_mutex_off(void)
@@ -669,6 +697,11 @@
 		    "Perhaps you need to use the LockFile directive to place "
 		    "your lock file on a local disk!");
 	exit(1);
+    }
+    have_accept_mutex = 0;
+    if (sigprocmask(SIG_SETMASK, &accept_previous_mask, NULL)) {
+	perror("sigprocmask(SIG_SETMASK)");
+	exit (1);
     }
 }
 




Re: Latest CVS httpd hangs

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.
> > +static void accept_mutex_child_cleanup(void *foo)
> > +{
> > +    if (accept_mutex != (void *)(caddr_t)-1
> > +	&& have_accept_mutex) {
> 
> should be:
>        if (lock_fd != -1 && have_accept_mutex) {
> 
> like I said, uncompiled. 

Ok, I have been running a server with this patch for a couple of days now.
No problems so far.  Another server without the patch has hung twice in
the same timeframe.

I'd still love to figure out why linking in mod_php3 completely messes up
the pthread locking though.

-Rasmus


Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 28 Jan 1998, Dean Gaudet wrote:

> +static void accept_mutex_child_cleanup(void *foo)
> +{
> +    if (accept_mutex != (void *)(caddr_t)-1
> +	&& have_accept_mutex) {

should be:
       if (lock_fd != -1 && have_accept_mutex) {

like I said, uncompiled. 

Dean



Re: Latest CVS httpd hangs

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 28 Jan 1998 rasmus@bellglobal.com wrote:

> A truss on the parent process shows:
> 
> poll(0xEFFFDAA0, 0, 1000)                       = 0
> time()                                          = 886017547
> lseek(15, 0, SEEK_SET)                          = 0
> read(15, "\00101\0\0\0\0\00101\0\0".., 1284)    = 1284
> waitid(P_ALL, 0, 0xEFFFFA20, WEXITED|WTRAPPED|WNOHANG) = 0

What the heck is that read() all about?  Is it maybe using a scoreboard
file for some wacky reason?  Do you have lsof to find out what descriptor
15 is?  Or maybe just truss the boot process and see if you can find it
that way. 

> The child processes are all sitting in:
> 
> fcntl(14, F_SETLKW, 0x000FA69C) (sleeping...)

It looks to me like solaris somehow let a child die while holding the lock
and didn't release the lock... 

You could try this completely untested uncompiled patch...

Dean

Index: http_main.c
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/main/http_main.c,v
retrieving revision 1.277
diff -u -r1.277 http_main.c
--- http_main.c	1998/01/28 10:00:30	1.277
+++ http_main.c	1998/01/28 20:16:39
@@ -605,8 +605,23 @@
 static struct flock unlock_it;
 
 static int lock_fd = -1;
+static int have_accept_mutex;
+static sigset_t accept_block_mask;
+static sigset_t accept_previous_mask;
+
+static void accept_mutex_child_cleanup(void *foo)
+{
+    if (accept_mutex != (void *)(caddr_t)-1
+	&& have_accept_mutex) {
+	pthread_mutex_unlock(accept_mutex);
+    }
+}
+
+static void accept_mutex_child_init(pool *p)
+{
+    register_cleanup(p, NULL, accept_mutex_child_cleanup, null_cleanup);
+}
 
-#define accept_mutex_child_init(x)
 
 /*
  * Initialize mutex lock.
@@ -634,12 +649,21 @@
 	exit(1);
     }
     unlink(lock_fname);
+
+    sigfillset(&accept_block_mask);
+    sigdelset(&accept_block_mask, SIGHUP);
+    sigdelset(&accept_block_mask, SIGTERM);
+    sigdelset(&accept_block_mask, SIGUSR1);
 }
 
 static void accept_mutex_on(void)
 {
     int ret;
 
+    if (sigprocmask(SIG_BLOCK, &accept_block_mask, &accept_previous_mask)) {
+	perror("sigprocmask(SIG_BLOCK)");
+	exit (1);
+    }
     while ((ret = fcntl(lock_fd, F_SETLKW, &lock_it)) < 0 && errno == EINTR) {
 	/* nop */
     }
@@ -651,6 +675,7 @@
 		    "your lock file on a local disk!");
 	exit(1);
     }
+    have_accept_mutex = 1;
 }
 
 static void accept_mutex_off(void)
@@ -666,6 +691,11 @@
 		    "Perhaps you need to use the LockFile directive to place "
 		    "your lock file on a local disk!");
 	exit(1);
+    }
+    have_accept_mutex = 0;
+    if (sigprocmask(SIG_SETMASK, &accept_previous_mask, NULL)) {
+	perror("sigprocmask(SIG_SETMASK)");
+	exit (1);
     }
 }
 

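For reference, POSIX requires that all of a process's fcntl() locks on a file be removed when that process closes a descriptor for the file or terminates, so a child that dies while holding the accept lock should not leave the other children stuck in F_SETLKW; Dean's suspicion above is that Solaris failed to do this. A small sketch to check the expected behaviour on a given system, assuming POSIX fcntl(), fork() and mkstemp(); the names and the temporary path are illustrative only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void whole_file_wrlock(struct flock *fl)
{
    memset(fl, 0, sizeof *fl);    /* l_start = 0, l_len = 0: whole file */
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
}

/* Returns 1 if the parent can take the lock after a child exited while
 * holding it, 0 if the lock is still held, -1 on setup failure. */
static int lock_freed_when_holder_dies(void)
{
    char path[] = "/tmp/lock_exit_demo.XXXXXX";
    struct flock fl;
    int fd, status, result = -1;
    pid_t pid;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* as http_main.c does with lock_fname */

    pid = fork();
    if (pid == 0) {
        /* Child: take the lock, then exit without ever unlocking. */
        whole_file_wrlock(&fl);
        _exit(fcntl(fd, F_SETLKW, &fl) == 0 ? 0 : 2);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid
        && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        /* Parent: a non-blocking attempt succeeds only if the kernel
         * dropped the dead child's lock. */
        whole_file_wrlock(&fl);
        result = fcntl(fd, F_SETLK, &fl) == 0;
    }
    close(fd);
    return result;
}
```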

Re: Latest CVS httpd hangs

Posted by ra...@bellglobal.com.
Some more info on this one.  A call trace on the child processes looks
like this:

> /usr/proc/bin/pstack 4858
4858:   bin/httpd -f /u/www/conf/httpd.conf
 ef5b6fcc fcntl    (e, 7, fa69c)
 ef5b6fcc _libc_fcntl (e, 7, fa69c, ef5b8878, 0, 0) + 8
 ef74aba8 s_fcntl  (e, 7, fa69c, fa400, 40, effff81c) + 164
 00041958 accept_mutex_on (4, 1, 0, 5, effff968, 1a) + 1c
 00044b20 child_main (1a, 43740, 43400, 108, 0, 0) + 260
 00045270 make_child (129a58, 1a, 34cf646b, 0, 3e8, 0) + 168
 000456f4 perform_idle_server_maintenance (0, effffb84, 0, 129a58, be8b0, b9cb0) + 31c
 00045c58 standalone_main (3, effffcac, e9400, 0, 0, 0) + 4cc
 00046304 main     (3, effffcac, effffcbc, 1250d0, 1, 0) + 458
 00020598 _start   (0, 0, 0, 0, 0, 0) + 5c

And a trace on the parent process shows:

> /usr/proc/bin/pstack 23685
23685:  bin/httpd -f /u/www/conf/httpd.conf
 ef5b782c poll     (efffdaa0, 0, 3e8)
 ef5d3994 select   (0, ef612cc4, 0, 0, 3e8, efffdaa0) + 288
 00043630 wait_or_timeout (effffb84, effffb84, 0, 129a58, be8b0, b9cb0) + cc
 00045b04 standalone_main (3, effffcac, e9400, 0, 0, 0) + 378
 00046304 main     (3, effffcac, effffcbc, 1250d0, 1, 0) + 458
 00020598 _start   (0, 0, 0, 0, 0, 0) + 5c

Mutex problems?

-Rasmus