Posted to dev@httpd.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/08/08 09:41:59 UTC
Re: Currently known issues with 2.0.23
It looks like there might be a problem with _un_graceful restarts on
threaded, namely that the whole server just vaporizes. It doesn't do a
clean shutdown because the pidfile is left behind, but it's gone
nevertheless. I've yet to find evidence of a segfault happening, but that
remains a possibility. I just checked and it works fine on prefork (as
expected). Will investigate further tomorrow.
For now, it's 3:45am in my part of the world... I need to get to bed. =-)
--Cliff
--------------------------------------------------------------
Cliff Woolley
cliffwoolley@yahoo.com
Charlottesville, VA
Re: Currently known issues with 2.0.23
Posted by Greg Ames <gr...@remulak.net>.
Greg Ames wrote:
> Program received signal SIGSEGV, Segmentation fault.
> apr_pool_clear (a=0x0) at apr_pools.c:869
> 869 while (a->sub_pools) {
> (gdb) bt
> #0 apr_pool_clear (a=0x0) at apr_pools.c:869
> #1 0x08090320 in apr_pool_destroy (a=0x0) at apr_pools.c:920
> #2 0x4027d2ff in cgid_maint (reason=0, data=0x811a724, status=15)
> at mod_cgid.c:238
> #3 0x0808dd69 in apr_proc_other_child_check () at otherchild.c:208
> #4 0x0806ccfc in ap_reclaim_child_processes (terminate=1) at
> mpm_common.c:175
> #5 0x08064553 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c,
> s=0x80c4984)
> at threaded.c:1329
> #6 0x08068e50 in main (argc=1, argv=0xbffff734) at main.c:427
> #7 0x4013f0de in __libc_start_main () from /lib/libc.so.6
>
> I've taken mod_cgid out of my build for now (I didn't realize I
> actually had it) and will see what happens.
OK, graceless restarts are working again in threaded with mod_cgid out
of the picture.
Greg
Re: Currently known issues with 2.0.23
Posted by Greg Ames <gr...@remulak.net>.
Greg Ames wrote:
>
> Cliff Woolley wrote:
> >
> > It looks like there might be a problem with _un_graceful restarts on
> > threaded, namely that the whole server just vaporizes.
> After adding many ap_log_error calls, it looks like things go normally
> until we hit the code in ap_mpm_run responsible for graceless restart.
> Then it sure looks like the parent catches the SIGTERM that it intended
> to send to the children, in ap_start_shutdown (the normal SIGTERM handler).
confirmed w/gdb:
1310 wake_up_and_die();
(gdb) n
1312 if (is_graceful) {
(gdb) p is_graceful
$3 = 0
(gdb) n
1326 if (unixd_killpg(getpgrp(), SIGTERM) < 0) {
(gdb) s
0x4000d500 in _dl_runtime_resolve () at dl-runtime.c:203
203 dl-runtime.c: No such file or directory.
in dl-runtime.c
(gdb) n
Program received signal SIGTERM, Terminated.
0x4014fd11 in kill () from /lib/libc.so.6
...but then this cleanup stuff for mod_cgid seg faults
(gdb) finish
Run till exit from #0 0x4014fb46 in killpg () from /lib/libc.so.6
0x08064518 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c, s=0x80c4984)
at threaded.c:1326
1326 if (unixd_killpg(getpgrp(), SIGTERM) < 0) {
(gdb) n
1329 ap_reclaim_child_processes(1); /* Start with SIGTERM
(gdb) p shutdown_pending
$4 = 1 <=== gets set in ap_start_shutdown, wasn't on before
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
apr_pool_clear (a=0x0) at apr_pools.c:869
869 while (a->sub_pools) {
(gdb) bt
#0 apr_pool_clear (a=0x0) at apr_pools.c:869
#1 0x08090320 in apr_pool_destroy (a=0x0) at apr_pools.c:920
#2 0x4027d2ff in cgid_maint (reason=0, data=0x811a724, status=15)
at mod_cgid.c:238
#3 0x0808dd69 in apr_proc_other_child_check () at otherchild.c:208
#4 0x0806ccfc in ap_reclaim_child_processes (terminate=1) at
mpm_common.c:175
#5 0x08064553 in ap_mpm_run (_pconf=0x80c443c, plog=0x80e453c,
s=0x80c4984)
at threaded.c:1329
#6 0x08068e50 in main (argc=1, argv=0xbffff734) at main.c:427
#7 0x4013f0de in __libc_start_main () from /lib/libc.so.6
I've taken mod_cgid out of my build for now (I didn't realize I
actually had it) and will see what happens. Catching the SIGTERM in the
parent can't be good, though.
Greg
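The backtrace above shows apr_pool_destroy() being handed a NULL pool from
cgid_maint(). Below is a minimal sketch of one defensive pattern for that
kind of cleanup callback: check the pointer before destroying, and clear it
afterwards. Every demo_* name is a hypothetical stand-in, not APR or
mod_cgid code, and this is not necessarily what the committed patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool; this is NOT the APR pool type. */
typedef struct demo_pool {
    struct demo_pool *sub_pools; /* first child pool, or NULL */
    struct demo_pool *sibling;   /* next child of the same parent, or NULL */
} demo_pool;

/* Allocate an empty pool and, if a parent is given, link it as a child. */
static demo_pool *demo_pool_create(demo_pool *parent)
{
    demo_pool *p = calloc(1, sizeof(*p));
    if (p != NULL && parent != NULL) {
        p->sibling = parent->sub_pools;
        parent->sub_pools = p;
    }
    return p;
}

/* Free a pool and all of its children. It dereferences p right away,
 * much like frame #0 of the backtrace, so passing NULL is a caller bug. */
static void demo_pool_destroy(demo_pool *p)
{
    assert(p != NULL);
    while (p->sub_pools) {
        demo_pool *child = p->sub_pools;
        p->sub_pools = child->sibling; /* unlink before freeing */
        demo_pool_destroy(child);
    }
    free(p);
}

/* Hypothetical maintenance callback: destroy the pool at most once.
 * Returns 1 if a pool was destroyed, 0 if there was nothing to do. */
static int demo_maint(demo_pool **poolp)
{
    if (poolp == NULL || *poolp == NULL) {
        return 0; /* nothing to clean up: the case that crashed above */
    }
    demo_pool_destroy(*poolp);
    *poolp = NULL; /* a repeated callback is now harmless */
    return 1;
}
```

Calling demo_maint() a second time on the same pointer returns 0 instead of
dereferencing a stale or NULL pool.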
Re: Currently known issues with 2.0.23
Posted by Greg Ames <gr...@remulak.net>.
Cliff Woolley wrote:
>
> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes. It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless. I've yet to find evidence of a segfault happening, but that
> remains a possibility. I just checked and it works fine on prefork (as
> expected). Will investigate further tomorrow.
You're right - I get exactly the same symptom on "apachectl restart"
now. It sure isn't hanging - it vanishes without a trace. hmmmm, I've
been the only one breaking^H^H^H^H^H^H^H^H putting good stuff into
threaded lately, so I must have done it somehow.
After adding many ap_log_error calls, it looks like things go normally
until we hit the code in ap_mpm_run responsible for graceless restart.
Then it sure looks like the parent catches the SIGTERM that it intended
to send to the children, in ap_start_shutdown (the normal SIGTERM handler).
I'm going to try strace on the parent (thanks Jeff!), and/or backing
out/further scrutinizing my last patch to threaded.
Greg
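The suspicion above is that the parent belongs to the process group it
signals, so its own SIGTERM handler runs as well. Below is a minimal
sketch of that effect, assuming a POSIX system. It uses plain killpg()
rather than the server's unixd_killpg wrapper, and runs inside a forked
child that first moves into its own process group, so the signal cannot
reach anything else. All demo_* names are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler so the sender can tell it was signalled itself. */
static volatile sig_atomic_t demo_got_sigterm = 0;

static void demo_on_sigterm(int sig)
{
    (void)sig;
    demo_got_sigterm = 1;
}

/* Returns 1 if the sender's own handler ran, 0 if it did not,
 * and -1 if the demo itself failed. */
static int demo_sender_catches_own_killpg(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        sa.sa_handler = demo_on_sigterm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;

        /* Become the only member of a fresh process group. */
        if (setpgid(0, 0) < 0 || sigaction(SIGTERM, &sa, NULL) < 0) {
            _exit(2);
        }
        /* Signal "the children": the group includes the sender too. */
        if (killpg(getpgrp(), SIGTERM) < 0) {
            _exit(2);
        }
        _exit(demo_got_sigterm ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    if (WEXITSTATUS(status) == 0) {
        return 1;
    }
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

A sender that does not want to react to its own group-wide signal has to
ignore or block that signal around the killpg() call, or tolerate the
handler running.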
Re: Currently known issues with 2.0.23
Posted by Jeff Trawick <tr...@attglobal.net>.
Greg Ames <gr...@remulak.net> writes:
> Jeff Trawick wrote:
> >
> > Cliff Woolley <cl...@yahoo.com> writes:
> >
> > > It looks like there might be a problem with _un_graceful restarts on
> > > threaded, namely that the whole server just vaporizes. It doesn't do a
> > > clean shutdown because the pidfile is left behind, but it's gone
> > > nevertheless. I've yet to find evidence of a segfault happening, but that
> >
> > Unlike the child/server processes, the parent process can't rely on
> > the parent (itself) to write the log message for the segfault.
> >
> > Maybe sig_coredump() needs to call ap_log_error() iff called in the
> > parent.
>
> Something like that would clearly help.
>
> But, just to be difficult, what if ap_log_error() seg faults? We don't
> want recursive seg faults. Maybe some kind of "I'm trying to log a
> parent seg fault" footprint in sig_coredump to stop the recursion?
AFAIK, it shouldn't be a problem. Do the ap_log_error() call after we
remove our handler ("apr_signal(sig, SIG_DFL)"), and of course ?GOVRR
from mainline as well as from the handler while testing.
--
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
http://www.geocities.com/SiliconValley/Park/9289/
Born in Roswell... married an alien...
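The ordering suggested above (restore the default action, then log) can be
sketched as follows, assuming a POSIX system. This is not the server's
sig_coredump(): the handler name and message are hypothetical, plain
signal() stands in for apr_signal(), and write() stands in for
ap_log_error(), which may not be safe to call from a signal handler.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler; the name and message are illustrative only. */
static void demo_fatal_handler(int sig)
{
    static const char msg[] = "parent process caught a fatal signal\n";

    /* Restore the default action first, so a fault in the logging step
     * below terminates the process instead of re-entering this handler. */
    signal(sig, SIG_DFL);

    /* write() stands in for the real logging call. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
        /* nothing sensible to do if logging fails */
    }

    /* Re-send the signal so the default action decides how we die. */
    kill(getpid(), sig);
}

/* Crash a forked child through the handler and report the signal that
 * terminated it, or -1 if it did not die from a signal. */
static int demo_crash_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        struct rlimit no_core = {0, 0}; /* keep the demo from dumping core */
        struct sigaction sa;

        setrlimit(RLIMIT_CORE, &no_core);
        sa.sa_handler = demo_fatal_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGSEGV, &sa, NULL) < 0) {
            _exit(2);
        }
        raise(SIGSEGV);
        _exit(0); /* reached only if the signal did not terminate us */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status)) {
        return -1;
    }
    return WTERMSIG(status);
}
```

Because the default action is back in place before anything is logged, the
process still ends with the original signal, so whoever waits on it sees
the real cause of death.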
Re: Currently known issues with 2.0.23
Posted by Greg Ames <gr...@remulak.net>.
Jeff Trawick wrote:
>
> Cliff Woolley <cl...@yahoo.com> writes:
>
> > It looks like there might be a problem with _un_graceful restarts on
> > threaded, namely that the whole server just vaporizes. It doesn't do a
> > clean shutdown because the pidfile is left behind, but it's gone
> > nevertheless. I've yet to find evidence of a segfault happening, but that
>
> Unlike the child/server processes, the parent process can't rely on
> the parent (itself) to write the log message for the segfault.
>
> Maybe sig_coredump() needs to call ap_log_error() iff called in the
> parent.
Something like that would clearly help.
But, just to be difficult, what if ap_log_error() seg faults? We don't
want recursive seg faults. Maybe some kind of "I'm trying to log a
parent seg fault" footprint in sig_coredump to stop the recursion?
hmmmm, I'll play with it
Greg
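The "footprint" idea above can be sketched as a flag that the handler sets
before it tries to log, so a second entry is recognized and cut short.
This is only an illustration on a POSIX system: all demo_* names are
hypothetical, the faulting logger is simulated with raise(), and
SA_NODEFER is set purely so that re-entry is possible at all in the demo.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { DEMO_GUARD_EXIT = 42 }; /* arbitrary exit code for the demo */

/* The "footprint": set while the handler is trying to log. */
static volatile sig_atomic_t demo_logging_fault = 0;

/* Stand-in for a logger that itself faults. */
static void demo_faulty_logger(int sig)
{
    raise(sig);
}

static void demo_guarded_handler(int sig)
{
    if (demo_logging_fault) {
        /* Second entry: the logging attempt faulted, so stop here. */
        _exit(DEMO_GUARD_EXIT);
    }
    demo_logging_fault = 1;
    demo_faulty_logger(sig);
    _exit(1); /* reached only if the simulated fault did not re-enter */
}

/* Run the scenario in a forked child and return its exit code,
 * or -1 if the child did not exit normally. */
static int demo_guard_exit_code(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        sa.sa_handler = demo_guarded_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_NODEFER; /* allow the handler to be re-entered */
        if (sigaction(SIGSEGV, &sa, NULL) < 0) {
            _exit(2);
        }
        raise(SIGSEGV);
        _exit(3); /* reached only if the handler returned */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

In this sketch the child exits with DEMO_GUARD_EXIT when the guard trips,
rather than looping through the handler again.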
Re: Currently known issues with 2.0.23
Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <cl...@yahoo.com> writes:
> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes. It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless. I've yet to find evidence of a segfault happening, but that
Unlike the child/server processes, the parent process can't rely on
the parent (itself) to write the log message for the segfault.
Maybe sig_coredump() needs to call ap_log_error() iff called in the
parent.
Re: Currently known issues with 2.0.23
Posted by Cliff Woolley <cl...@yahoo.com>.
On 9 Aug 2001, Jeff Trawick wrote:
> Greg Ames and I have been playing with this. See a patch I just
> committed to mod_cgid. [Apparently] since 8/6/2001 when dougm fixed a
> leak problem, ungraceful restart was segfaulting in the parent
> process. If you retag this as 2.0.23 just wipe the CHANGES entry
> since the problem appeared and went away in 2.0.23.
I tested it and it works great for me, so I went ahead and pulled it in to
the 2.0.23 tag. Thanks, guys.
PS: I'd originally pulled the CHANGES entry along with me, but then I saw
your note here so I went back and stripped out the CHANGES entry I'd
committed to the 2.0.23-branch and the one you committed to 2.0.24-dev.
--Cliff
Re: Currently known issues with 2.0.23
Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <cl...@yahoo.com> writes:
> It looks like there might be a problem with _un_graceful restarts on
> threaded, namely that the whole server just vaporizes. It doesn't do a
> clean shutdown because the pidfile is left behind, but it's gone
> nevertheless. I've yet to find evidence of a segfault happening, but that
> remains a possibility. I just checked and it works fine on prefork (as
> expected). Will investigate further tomorrow.
Greg Ames and I have been playing with this. See a patch I just
committed to mod_cgid. [Apparently] since 8/6/2001 when dougm fixed a
leak problem, ungraceful restart was segfaulting in the parent
process. If you retag this as 2.0.23 just wipe the CHANGES entry
since the problem appeared and went away in 2.0.23.