Posted to dev@httpd.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/08/08 09:41:59 UTC

Re: Currently known issues with 2.0.23

It looks like there might be a problem with _un_graceful restarts on
threaded, namely that the whole server just vaporizes.  It doesn't do a
clean shutdown because the pidfile is left behind, but it's gone
nevertheless.  I've yet to find evidence of a segfault happening, but that
remains a possibility.  I just checked and it works fine on prefork (as
expected).  Will investigate further tomorrow.

For now, it's 3:45am in my part of the world... I need to get to bed.  =-)

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: Currently known issues with 2.0.23

Posted by Greg Ames <gr...@remulak.net>.

Greg Ames wrote:

> Program received signal SIGSEGV, Segmentation fault.
> apr_pool_clear (a=0x0) at apr_pools.c:869
> 869         while (a->sub_pools) {
> (gdb) bt
> #0  apr_pool_clear (a=0x0) at apr_pools.c:869
> #1  0x08090320 in apr_pool_destroy (a=0x0) at apr_pools.c:920
> #2  0x4027d2ff in cgid_maint (reason=0, data=0x811a724, status=15)
>     at mod_cgid.c:238
> #3  0x0808dd69 in apr_proc_other_child_check () at otherchild.c:208
> #4  0x0806ccfc in ap_reclaim_child_processes (terminate=1)
>     at mpm_common.c:175
> #5  0x08064553 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c, s=0x80c4984)
>     at threaded.c:1329
> #6  0x08068e50 in main (argc=1, argv=0xbffff734) at main.c:427
> #7  0x4013f0de in __libc_start_main () from /lib/libc.so.6
> 
> I've taken mod_cgid out of my build for now (didn't realize I had it
> actually) and will see what happens.

OK, graceless restarts are working again in threaded with mod_cgid out
of the picture.  
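
The backtrace quoted above shows apr_pool_destroy() being reached with a
NULL pool from cgid_maint().  As a rough illustration only (this is not the
mod_cgid code and not the patch that was committed), here is a minimal
sketch of one defensive pattern: a destroy routine that tolerates NULL, and
a maintenance callback that clears its pointer so a second invocation is a
no-op.  Every demo_* name below is hypothetical and stands in for the real
APR types.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool; NOT the APR apr_pool_t. */
typedef struct demo_pool {
    struct demo_pool *sub_pools;  /* first child pool, or NULL */
    struct demo_pool *sibling;    /* next child of the same parent */
} demo_pool;

/* Counts destroyed pools so the behaviour can be observed. */
static int demo_pools_freed = 0;

/* Create a pool, optionally linking it under a parent. */
static demo_pool *demo_pool_create(demo_pool *parent)
{
    demo_pool *p = calloc(1, sizeof *p);
    if (p != NULL && parent != NULL) {
        p->sibling = parent->sub_pools;
        parent->sub_pools = p;
    }
    return p;
}

/* Destroy a pool and its children.  The NULL check is the point of the
 * sketch: without it, a->sub_pools dereferences a null pointer. */
static void demo_pool_destroy(demo_pool *a)
{
    if (a == NULL)
        return;
    while (a->sub_pools != NULL) {
        demo_pool *child = a->sub_pools;
        a->sub_pools = child->sibling;
        demo_pool_destroy(child);
    }
    free(a);
    demo_pools_freed++;
}

/* Hypothetical per-child data handed to a maintenance callback. */
typedef struct demo_child_data {
    demo_pool *pool;
} demo_child_data;

/* Hypothetical maintenance callback that may run more than once. */
static void demo_maint(demo_child_data *d)
{
    demo_pool_destroy(d->pool);
    d->pool = NULL;  /* a repeat call now hits the NULL check */
}
```

Whether the right fix belongs in the caller, the callback, or the pool code
is a design question this sketch does not try to settle.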

Greg

Re: Currently known issues with 2.0.23

Posted by Greg Ames <gr...@remulak.net>.

Greg Ames wrote:
> 
> Cliff Woolley wrote:
> >
> > It looks like there might be a problem with _un_graceful restarts on
> > threaded, namely that the whole server just vaporizes.  

> After adding many ap_log_errors, looks like things go normally until we
> hit the code in ap_mpm_run responsible for graceless restart.  Then it
> sure looks like the parent catches a SIGTERM that it intended to send
> the children in ap_start_shutdown (the normal SIGTERM handler).

confirmed w/gdb:

1310        wake_up_and_die();
(gdb) n
1312        if (is_graceful) {
(gdb) p is_graceful
$3 = 0
(gdb) n
1326            if (unixd_killpg(getpgrp(), SIGTERM) < 0) {
(gdb) s
0x4000d500 in _dl_runtime_resolve () at dl-runtime.c:203
203     dl-runtime.c: No such file or directory.
        in dl-runtime.c
(gdb) n

Program received signal SIGTERM, Terminated.
0x4014fd11 in kill () from /lib/libc.so.6

...but then this cleanup stuff for mod_cgid seg faults

(gdb) finish
Run till exit from #0  0x4014fb46 in killpg () from /lib/libc.so.6
0x08064518 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c, s=0x80c4984)
    at threaded.c:1326
1326            if (unixd_killpg(getpgrp(), SIGTERM) < 0) {
(gdb) n
1329            ap_reclaim_child_processes(1);          /* Start with SIGTERM
(gdb) p shutdown_pending
$4 = 1                  <===  gets set in ap_start_shutdown, wasn't on before
(gdb) n
 
Program received signal SIGSEGV, Segmentation fault.
apr_pool_clear (a=0x0) at apr_pools.c:869
869         while (a->sub_pools) {
(gdb) bt
#0  apr_pool_clear (a=0x0) at apr_pools.c:869
#1  0x08090320 in apr_pool_destroy (a=0x0) at apr_pools.c:920
#2  0x4027d2ff in cgid_maint (reason=0, data=0x811a724, status=15)
    at mod_cgid.c:238
#3  0x0808dd69 in apr_proc_other_child_check () at otherchild.c:208
#4  0x0806ccfc in ap_reclaim_child_processes (terminate=1)
    at mpm_common.c:175
#5  0x08064553 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c, s=0x80c4984)
    at threaded.c:1329
#6  0x08068e50 in main (argc=1, argv=0xbffff734) at main.c:427
#7  0x4013f0de in __libc_start_main () from /lib/libc.so.6

I've taken mod_cgid out of my build for now (didn't realize I had it
actually) and will see what happens.  Catching the SIGTERM in the parent
can't be good though.
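
The gdb session above shows the parent receiving the SIGTERM it sent to its
own process group.  For illustration, here is a small standalone sketch of
that effect in plain POSIX C, plus one possible way a sender could avoid
running its own handler: ignore the signal for the duration of the killpg()
call.  This is an assumption-laden demo, not what threaded.c does; the
function names are made up, and the work is done in a forked child with its
own process group so the demo signals nothing else.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for killpg() under strict compiler modes */
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t term_seen = 0;

/* Stand-in for a shutdown handler: just records that it ran. */
static void on_term(int sig)
{
    (void)sig;
    term_seen = 1;
}

/* Signal our whole process group while ignoring the signal ourselves,
 * then restore the previous disposition. */
static int killpg_ignoring_self(int sig)
{
    struct sigaction ign, old;
    int rc;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    if (sigaction(sig, &ign, &old) < 0)
        return -1;
    rc = killpg(getpgrp(), sig);
    if (sigaction(sig, &old, NULL) < 0)
        return -1;
    return rc;
}

/* Fork a child that becomes its own process group leader, installs
 * on_term for SIGTERM, and signals its group.
 * Returns 1 if the sender's own handler ran, 0 if not, -1 on error. */
static int sender_caught_own_sigterm(int ignore_self)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        int rc;

        if (setpgid(0, 0) < 0)
            _exit(2);
        sa.sa_handler = on_term;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGTERM, &sa, NULL) < 0)
            _exit(2);
        rc = ignore_self ? killpg_ignoring_self(SIGTERM)
                         : killpg(getpgrp(), SIGTERM);
        if (rc < 0)
            _exit(2);
        _exit(term_seen ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

Calling sender_caught_own_sigterm(0) exercises the plain killpg() path and
sender_caught_own_sigterm(1) the ignoring variant.  Ignoring the signal
briefly does open a window in which a genuine external SIGTERM would be
lost, so this is a trade-off rather than a clear win.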

Greg

Re: Currently known issues with 2.0.23

Posted by Greg Ames <gr...@remulak.net>.

Cliff Woolley wrote:
> 
> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes.  It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless.  I've yet to find evidence of a segfault happening, but that
> remains a possibility.  I just checked and it works fine on prefork (as
> expected).  Will investigate further tomorrow.

You're right - I get exactly the same symptom on "apachectl restart"
now.  It sure isn't hanging - it vanishes without a trace. hmmmm, I've
been the only one breaking^H^H^H^H^H^H^H^H putting good stuff into
threaded lately, so I must have done it somehow.

After adding many ap_log_errors, looks like things go normally until we
hit the code in ap_mpm_run responsible for graceless restart.  Then it
sure looks like the parent catches a SIGTERM that it intended to send
the children in ap_start_shutdown (the normal SIGTERM handler). 

I'm going to try strace on the parent (thanks Jeff!), and/or backing
out/further scrutinizing my last patch to threaded. 

Greg

Re: Currently known issues with 2.0.23

Posted by Jeff Trawick <tr...@attglobal.net>.

Greg Ames <gr...@remulak.net> writes:

> Jeff Trawick wrote:
> > 
> > Cliff Woolley <cl...@yahoo.com> writes:
> > 
> > > It looks like there might be a problem with _un_graceful restarts on
> > > threaded, namely that the whole server just vaporizes.  It doesn't do a
> > > clean shutdown because the pidfile is left behind, but it's gone
> > > nevertheless.  I've yet to find evidence of a segfault happening, but that
> > 
> > Unlike the child/server processes, the parent process can't rely on
> > the parent (itself) to write the log message for the segfault.
> > 
> > Maybe sig_coredump() needs to call ap_log_error() iff called in the
> > parent.
> 
> Something like that would clearly help.
> 
> But, just to be difficult, what if ap_log_error() seg faults?  We don't
> want recursive seg faults.  Maybe some kind of "I'm trying to log a
> parent seg fault" footprint in sig_coredump to stop the recursion?

AFAIK, it shouldn't be a problem.  Do the ap_log_error() call after we
remove our handler ("apr_signal(sig, SIG_DFL)"), and of course ?GOVRR
from mainline as well as from the handler while testing.

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: Currently known issues with 2.0.23

Posted by Greg Ames <gr...@remulak.net>.

Jeff Trawick wrote:
> 
> Cliff Woolley <cl...@yahoo.com> writes:
> 
> > It looks like there might be a problem with _un_graceful restarts on
> > threaded, namely that the whole server just vaporizes.  It doesn't do a
> > clean shutdown because the pidfile is left behind, but it's gone
> > nevertheless.  I've yet to find evidence of a segfault happening, but that
> 
> Unlike the child/server processes, the parent process can't rely on
> the parent (itself) to write the log message for the segfault.
> 
> Maybe sig_coredump() needs to call ap_log_error() iff called in the
> parent.

Something like that would clearly help.

But, just to be difficult, what if ap_log_error() seg faults?  We don't
want recursive seg faults.  Maybe some kind of "I'm trying to log a
parent seg fault" footprint in sig_coredump to stop the recursion?

hmmmm, I'll play with it

Greg

Re: Currently known issues with 2.0.23

Posted by Jeff Trawick <tr...@attglobal.net>.

Cliff Woolley <cl...@yahoo.com> writes:

> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes.  It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless.  I've yet to find evidence of a segfault happening, but that

Unlike the child/server processes, the parent process can't rely on
the parent (itself) to write the log message for the segfault.

Maybe sig_coredump() needs to call ap_log_error() iff called in the
parent.

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: Currently known issues with 2.0.23

Posted by Cliff Woolley <cl...@yahoo.com>.

On 9 Aug 2001, Jeff Trawick wrote:

> Greg Ames and I have been playing with this.  See a patch I just
> committed to mod_cgid.  [Apparently] since 8/6/2001 when dougm fixed a
> leak problem, ungraceful restart was segfaulting in the parent
> process.  If you retag this as 2.0.23 just wipe the CHANGES entry
> since the problem appeared and went away in 2.0.23.

I tested it and it works great for me, so I went ahead and pulled it in to
the 2.0.23 tag.  Thanks, guys.

PS: I'd originally pulled the CHANGES entry along with me, but then I saw
your note here so I went back and stripped out the CHANGES entry I'd
committed to the 2.0.23-branch and the one you committed to 2.0.24-dev.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA

Re: Currently known issues with 2.0.23

Posted by Jeff Trawick <tr...@attglobal.net>.

Cliff Woolley <cl...@yahoo.com> writes:

> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes.  It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless.  I've yet to find evidence of a segfault happening, but that
> remains a possibility.  I just checked and it works fine on prefork (as
> expected).  Will investigate further tomorrow.

Greg Ames and I have been playing with this.  See a patch I just
committed to mod_cgid.  [Apparently] since 8/6/2001 when dougm fixed a
leak problem, ungraceful restart was segfaulting in the parent
process.  If you retag this as 2.0.23 just wipe the CHANGES entry
since the problem appeared and went away in 2.0.23.

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...