Posted to dev@httpd.apache.org by Ruediger Pluem <rp...@apache.org> on 2008/07/29 22:58:23 UTC

worker MPM on trunk does not shut down cleanly

I just noticed that on trunk the worker MPM on Linux no longer shuts down cleanly.
That means the parent process kills the child with SIGKILL which usually
should not happen. There is no such problem with 2.2.x. I was not able to obtain
a backtrace. The most usable seems to be an strace of the child process:

Process 32721 attached - interrupt to quit
1217364884.451615 read(7, "$", 1) = 1
1217364886.829260 tgkill(32721, 32748, SIGHUP) = 0
1217364886.829413 futex(0x818d880, 0x4 /* FUTEX_??? */, 1) = 25
1217364886.829562 tgkill(32721, 32748, SIG_0) = 0
1217364886.829696 select(0, NULL, NULL, NULL, {0, 500000}) = ? ERESTARTNOHAND (To be restarted)
1217364886.829902 --- SIGTERM (Terminated) @ 0 (0) ---
1217364886.830067 sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
1217364886.830248 tgkill(32721, 32748, SIGHUP) = 0
1217364886.830379 tgkill(32721, 32748, SIG_0) = 0
1217364886.830509 select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
1217364887.321620 munmap(0xb7094000, 8392704) = 0
1217364887.321788 munmap(0xaa07a000, 8392704) = 0
1217364887.321934 munmap(0xb6893000, 8392704) = 0
1217364887.322077 munmap(0xb6092000, 8392704) = 0
1217364887.322220 munmap(0xb5891000, 8392704) = 0
1217364887.322367 munmap(0xb5090000, 8392704) = 0
1217364887.322509 munmap(0xb488f000, 8392704) = 0
1217364887.322651 munmap(0xb408e000, 8392704) = 0
1217364887.322794 munmap(0xb388d000, 8392704) = 0
1217364887.322936 munmap(0xb308c000, 8392704) = 0
1217364887.323910 munmap(0xb288b000, 8392704) = 0
1217364887.323943 munmap(0xb208a000, 8392704) = 0
1217364887.323984 munmap(0xb1889000, 8392704) = 0
1217364887.324014 munmap(0xb1088000, 8392704) = 0
1217364887.324043 munmap(0xb0887000, 8392704) = 0
1217364887.324072 munmap(0xb0086000, 8392704) = 0
1217364887.324101 munmap(0xaf885000, 8392704) = 0
1217364887.324130 munmap(0xaf084000, 8392704) = 0
1217364887.324158 munmap(0xae883000, 8392704) = 0
1217364887.324187 munmap(0xae082000, 8392704) = 0
1217364887.324217 munmap(0xad881000, 8392704) = 0
1217364887.324245 munmap(0xad080000, 8392704) = 0
1217364887.324278 munmap(0xac87f000, 8392704) = 0
1217364887.324329 futex(0x819176c, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
1217364890.157855 --- SIGTERM (Terminated) @ 0 (0) ---
1217364890.158009 sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
1217364890.158189 futex(0x819176c, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
1217364892.145923 --- SIGTERM (Terminated) @ 0 (0) ---
1217364892.146079 sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
1217364892.146261 futex(0x819176c, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
1217364894.133997 --- SIGTERM (Terminated) @ 0 (0) ---
1217364894.134155 sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
1217364894.134333 futex(0x819176c, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
1217364896.126425 +++ killed by SIGKILL +++
Process 32721 detached

So it seems that the child waits on some futex indefinitely.
Can somebody reproduce this problem, or does anyone have an idea why this happens?

Regards

Rüdiger

Re: worker MPM on trunk does not shut down cleanly

Posted by Jim Jagielski <ji...@jaguNET.com>.

On Aug 3, 2008, at 4:50 PM, William A. Rowe, Jr. wrote:

> Jim Jagielski wrote:
>> So does this mean that trunk is now based on a "broken" or
>> incompatible version of apr? Do we need to now break off
>> trunk to 2.4 and baseline APR 1.3 to allow trunk to now work
>> with an incompatible APR rev?
>
> There's no such thing as an incompatible APR revision.

There *shouldn't* be. There is. That's the problem :)

Re: worker MPM on trunk does not shut down cleanly

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

Jim Jagielski wrote:
> 
> So does this mean that trunk is now based on a "broken" or
> incompatible version of apr? Do we need to now break off
> trunk to 2.4 and baseline APR 1.3 to allow trunk to now work
> with an incompatible APR rev?

There's no such thing as an incompatible APR revision.  Those would
be "bugs" that need to be reverted, and patches are welcome.  This
should be considered an "incompatible" version of APR w.r.t. those
features.

There will "have to" be a 2.4 (or 3.0) once we decide to adopt APR 2,
because there will be no binary or legacy compatibility thunks.  But
in the meantime, it's enough to recommend a particular version, e.g.
state that APR 1.3.3 is required for httpd 2.2.9 and later.  Older
modules will still load and should continue to behave as they had.

The reslist debate has bogged down dev@apr progress towards a 1.3.3
release, but until the debate is resolved and apr 1.2.x behavior is
restored, it's pointless to move forwards.  So the more eyes the
merrier (both Mladen and Bojan are asking for additional review!)

Re: worker MPM on trunk does not shut down cleanly

Posted by Jim Jagielski <ji...@jaguNET.com>.

On Aug 3, 2008, at 2:44 PM, Ruediger Pluem wrote:
>
> Nevertheless I think that the precleanup code in apr trunk and the  
> changes to the reslist
> in apr-util trunk are not backportable just because of the example  
> above: Code may break
> if you change an apr / apr-util 1.3 release under the hood. This  
> should not happen.
>

That's my point.

Re: worker MPM on trunk does not shut down cleanly

Posted by Mladen Turk <mt...@apache.org>.

Ruediger Pluem wrote:
> 
> 
> Nevertheless I think that the precleanup code in apr trunk and the 
> changes to the reslist
> in apr-util trunk are not backportable just because of the example 
> above: Code may break
> if you change an apr / apr-util 1.3 release under the hood. This should 
> not happen.
>

I agree with you. However, as you observed, this particular
apr_reslist usage simply presumed that the reslist had not already
been destroyed. It did so with nasty tricks: registering an additional
cleanup on top of the existing one so that we can mark the structure
(worker->cp->pool = NULL) before apr_reslist_destroy is called. That
way the destructor callback knows its child pools have already been
destroyed and does not destroy them twice, while connection pool
min...max maintenance still works, where the child pools have to be
explicitly destroyed so that memory doesn't leak.
All that is a total mess and hard to follow. With pre_cleanup,
the code is simpler and more straightforward: the constructor creates,
the destructor destroys, and that's it :)

Anyhow, porting that to 1.3 would probably break presumptions about
the 'under the hood' behavior (not the API itself).
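
To make the ordering argument concrete, here is a small toy model in C. It does not use APR; every toy_* name is hypothetical. It assumes (my reading of the discussion, not something verified against the APR sources here) that destroying a pool runs pre-cleanups first, then destroys the child pools, and only then runs the ordinary cleanups. Under that assumption, a teardown function registered as a pre-cleanup still sees its child pools alive, while the same function registered as an ordinary cleanup does not.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 4

typedef void (*toy_fn)(void *data);

/* A fixed-size list of callbacks that is run in reverse (LIFO) order. */
typedef struct {
    toy_fn fn[TOY_MAX];
    void *data[TOY_MAX];
    size_t n;
} toy_list;

/* Toy pool: two callback lists plus a flag standing in for all child pools. */
typedef struct {
    toy_list pre_cleanups;
    toy_list cleanups;
    int child_alive;
} toy_pool;

/* Toy resource list that records what it observed when it was torn down. */
typedef struct {
    toy_pool *pool;
    int child_alive_at_teardown; /* -1 until the teardown has run */
} toy_reslist;

static void toy_register(toy_list *l, toy_fn fn, void *data)
{
    assert(l->n < TOY_MAX);
    l->fn[l->n] = fn;
    l->data[l->n] = data;
    l->n++;
}

static void toy_run(toy_list *l)
{
    while (l->n > 0) {
        l->n--;
        l->fn[l->n](l->data[l->n]);
    }
}

/* Assumed three-phase destroy: pre-cleanups, child pools, cleanups. */
static void toy_pool_destroy(toy_pool *p)
{
    toy_run(&p->pre_cleanups);
    p->child_alive = 0;
    toy_run(&p->cleanups);
}

static void toy_reslist_teardown(void *data)
{
    toy_reslist *rl = (toy_reslist *)data;
    rl->child_alive_at_teardown = rl->pool->child_alive;
}

/* Returns 1 if the teardown ran while the child pools were still alive. */
static int toy_teardown_sees_child(int use_pre_cleanup)
{
    toy_pool pool = { .child_alive = 1 };
    toy_reslist rl = { &pool, -1 };

    toy_register(use_pre_cleanup ? &pool.pre_cleanups : &pool.cleanups,
                 toy_reslist_teardown, &rl);
    toy_pool_destroy(&pool);
    return rl.child_alive_at_teardown;
}
```

Calling toy_teardown_sees_child(1) models the pre_cleanup registration and toy_teardown_sees_child(0) models the old ordinary cleanup; the difference between the two results is what made the NULL-marker trick necessary before.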

Regards
-- 
^(TM)

Re: worker MPM on trunk does not shut down cleanly

Posted by Ruediger Pluem <rp...@apache.org>.

On 08/03/2008 06:28 PM, Jim Jagielski wrote:
> 
> On Aug 1, 2008, at 4:44 AM, Mladen Turk wrote:
> 
>> Ruediger Pluem wrote:
>>> Ok, this is caused by http://svn.apache.org/viewvc?rev=677505&view=rev
>>> This is the reslist pre_cleanup patch. I don't know why so far, but as
>>> I have a proxy configuration I suspect that it blocks on tearing down
>>> the proxy connection pools.
>>
>> Here is the fix for trunk.
>>
>> Index: proxy_util.c
>> ===================================================================
>> --- proxy_util.c        (revision 681621)
>> +++ proxy_util.c        (working copy)
>> @@ -1939,10 +1939,11 @@
>>                                 worker->hmax, worker->ttl,
>>                                 connection_constructor, connection_destructor,
>>                                 worker, worker->cp->pool);
>> -
>> +#if 0
>>         apr_pool_cleanup_register(worker->cp->pool, (void *)worker,
>>                                   conn_pool_cleanup,
>>                                   apr_pool_cleanup_null);
>> +#endif
>>
>>
>>
>> Note that because of using pre_cleanup in reslist we don't need
>> the extra registered cleanup (conn_pool_cleanup),
>> just to make sure the ordering is correct.
>> This was bogus anyhow, because we were destroying the reslist in
>> cleanup (that already has its own cleanup), so the ordering of
>> cleanup callbacks was essential.
>>
> 
> I wonder how many other such uses in other modules would be
> affected in just the same way?
> 
> So does this mean that trunk is now based on a "broken" or
> incompatible version of apr? Do we need to now break off
> trunk to 2.4 and baseline APR 1.3 to allow trunk to now work
> with an incompatible APR rev?

As far as this specific issue is concerned, IMHO no. The following
patch fixes the behaviour on trunk (with apr-util trunk) and does no
harm on 2.2.x:

Index: modules/proxy/proxy_util.c
===================================================================
--- modules/proxy/proxy_util.c  (Revision 681204)
+++ modules/proxy/proxy_util.c  (Arbeitskopie)
@@ -1380,7 +1380,6 @@
      proxy_worker *worker = (proxy_worker *)theworker;
      if (worker->cp->res) {
          worker->cp->pool = NULL;
-        apr_reslist_destroy(worker->cp->res);
      }
      return APR_SUCCESS;
  }

Why?
Trunk (with apr-util trunk):

At the point in time where we would execute apr_reslist_destroy, the reslist has already been
destroyed, because we are in a cleanup of the same pool in which the reslist registered itself
as a precleanup. This causes the process to block at shutdown.

2.2.x:
Calling apr_reslist_destroy is neither useful nor needed in this case, as we are in a cleanup
that was registered against the same pool that is used by the reslist. As it was registered
*after* the reslist was created, it simply runs *before* the reslist cleanup would run. This
is somewhat pointless here, and we could leave the job of destroying the reslist to the
reslist cleanup.
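
The "registered after, runs before" point can be illustrated with a toy model. This is only a sketch under the assumption that pool cleanups run in reverse order of registration (LIFO); it does not call APR, and all toy_* names are hypothetical. The two callbacks merely log their names so the resulting order can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CLEANUPS 8

typedef void (*toy_cleanup_fn)(void *data);

/* Toy pool that only remembers its registered cleanups. */
typedef struct {
    toy_cleanup_fn fn[TOY_MAX_CLEANUPS];
    void *data[TOY_MAX_CLEANUPS];
    size_t count;
} toy_pool;

/* Small text log used to record the order in which cleanups ran. */
typedef struct {
    char text[64];
} toy_log;

static void toy_cleanup_register(toy_pool *p, toy_cleanup_fn fn, void *data)
{
    assert(p->count < TOY_MAX_CLEANUPS);
    p->fn[p->count] = fn;
    p->data[p->count] = data;
    p->count++;
}

/* Run cleanups last-registered-first, as assumed for pool cleanups. */
static void toy_pool_destroy(toy_pool *p)
{
    while (p->count > 0) {
        p->count--;
        p->fn[p->count](p->data[p->count]);
    }
}

static void toy_log_append(toy_log *log, const char *s)
{
    strncat(log->text, s, sizeof(log->text) - strlen(log->text) - 1);
}

/* Stand-in for the cleanup the reslist registers for itself. */
static void toy_reslist_cleanup(void *data)
{
    toy_log_append((toy_log *)data, "reslist;");
}

/* Stand-in for the extra cleanup registered by the proxy code. */
static void toy_conn_pool_cleanup(void *data)
{
    toy_log_append((toy_log *)data, "conn_pool;");
}

/* Registers both cleanups in creation order and returns the run order. */
static const char *toy_lifo_demo(toy_log *log)
{
    toy_pool pool = { .count = 0 };

    log->text[0] = '\0';
    toy_cleanup_register(&pool, toy_reslist_cleanup, log);   /* reslist created first */
    toy_cleanup_register(&pool, toy_conn_pool_cleanup, log); /* extra cleanup added later */
    toy_pool_destroy(&pool);
    return log->text;
}
```

In this model the later-registered conn_pool cleanup runs first, so on 2.2.x the explicit destroy merely pre-empts what the reslist's own cleanup would do a moment later.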

Nevertheless I think that the precleanup code in apr trunk and the changes to the reslist
in apr-util trunk are not backportable just because of the example above: Code may break
if you change an apr / apr-util 1.3 release under the hood. This should not happen.

Regards

Rüdiger

Re: worker MPM on trunk does not shut down cleanly

Posted by Jim Jagielski <ji...@jaguNET.com>.

On Aug 1, 2008, at 4:44 AM, Mladen Turk wrote:

> Ruediger Pluem wrote:
>> Ok, this is caused by http://svn.apache.org/viewvc?rev=677505&view=rev
>> This is the reslist pre_cleanup patch. I don't know why so far, but as
>> I have a proxy configuration I suspect that it blocks on tearing down
>> the proxy connection pools.
>
> Here is the fix for trunk.
>
> Index: proxy_util.c
> ===================================================================
> --- proxy_util.c        (revision 681621)
> +++ proxy_util.c        (working copy)
> @@ -1939,10 +1939,11 @@
>                                 worker->hmax, worker->ttl,
>                                 connection_constructor, connection_destructor,
>                                 worker, worker->cp->pool);
> -
> +#if 0
>         apr_pool_cleanup_register(worker->cp->pool, (void *)worker,
>                                   conn_pool_cleanup,
>                                   apr_pool_cleanup_null);
> +#endif
>
>
>
> Note that because of using pre_cleanup in reslist we don't need
> the extra registered cleanup (conn_pool_cleanup),
> just to make sure the ordering is correct.
> This was bogus anyhow, because we were destroying the reslist in
> cleanup (that already has its own cleanup), so the ordering of
> cleanup callbacks was essential.
>

I wonder how many other such uses in other modules would be
affected in just the same way?

So does this mean that trunk is now based on a "broken" or
incompatible version of apr? Do we need to now break off
trunk to 2.4 and baseline APR 1.3 to allow trunk to now work
with an incompatible APR rev?

Re: worker MPM on trunk does not shut down cleanly

Posted by Mladen Turk <mt...@apache.org>.

Ruediger Pluem wrote:
> Ok, this is caused by http://svn.apache.org/viewvc?rev=677505&view=rev
> This is the reslist pre_cleanup patch. I don't know why so far, but as
> I have a proxy configuration I suspect that it blocks on tearing down
> the proxy connection pools.
> 

Here is the fix for trunk.

Index: proxy_util.c
===================================================================
--- proxy_util.c        (revision 681621)
+++ proxy_util.c        (working copy)
@@ -1939,10 +1939,11 @@
                                  worker->hmax, worker->ttl,
                                  connection_constructor, connection_destructor,
                                  worker, worker->cp->pool);
-
+#if 0
          apr_pool_cleanup_register(worker->cp->pool, (void *)worker,
                                    conn_pool_cleanup,
                                    apr_pool_cleanup_null);
+#endif



Note that because of using pre_cleanup in reslist we don't need
the extra registered cleanup (conn_pool_cleanup),
just to make sure the ordering is correct.
This was bogus anyhow, because we were destroying the reslist in
cleanup (that already has its own cleanup), so the ordering of
cleanup callbacks was essential.


Regards
-- 
^(TM)

Re: worker MPM on trunk does not shut down cleanly

Posted by Mladen Turk <mt...@apache.org>.

Ruediger Pluem wrote:
> Ok, this is caused by http://svn.apache.org/viewvc?rev=677505&view=rev
> This is the reslist pre_cleanup patch. I don't know why so far, but as
> I have a proxy configuration I suspect that it blocks on tearing down
> the proxy connection pools.
> 

I was afraid of that :(
The amount of code in the proxy connection pool that exists just to
make sure everything gets cleaned up without crashing
probably makes "normal" usage bogus.
We are setting some internal structure members to NULL
so we can figure out where
the cleanup comes from (parent pool cleanup or connection
pool maintenance).

IMHO the solution would be to get rid of all those tricks
in the proxy connection pool and simplify things.
BTW, that piece of code was one of my major reasons for
doing all the pre_cleanup stuff.
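
For readers unfamiliar with the trick, a toy model of the NULL-marker pattern might look like the following. It is a sketch with hypothetical toy_* names, not the actual proxy code: the destructor destroys a connection's sub-pool only when the marker is still set, and the marker is cleared when the teardown comes from the parent pool, which is assumed to have destroyed the sub-pool itself already.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often a connection's sub-pool was destroyed. */
typedef struct {
    int destroy_count;
} toy_subpool;

/* Toy connection pool; parent_pool == NULL marks "parent teardown running". */
typedef struct {
    void *parent_pool;
} toy_conn_pool;

typedef struct {
    toy_conn_pool *cp;
    toy_subpool *pool;
} toy_conn;

static void toy_subpool_destroy(toy_subpool *sp)
{
    sp->destroy_count++;
}

/* Destroy the sub-pool only if we were not called during parent teardown. */
static void toy_connection_destructor(toy_conn *conn)
{
    if (conn->cp->parent_pool != NULL) {
        toy_subpool_destroy(conn->pool);
    }
}

/* Returns how many times the sub-pool ended up being destroyed. */
static int toy_destroy_count(int parent_teardown)
{
    int parent_token = 0; /* any non-NULL address serves as the "pool" */
    toy_subpool sub = { 0 };
    toy_conn_pool cp = { &parent_token };
    toy_conn conn = { &cp, &sub };

    if (parent_teardown) {
        toy_subpool_destroy(&sub); /* parent destroys its child pools itself */
        cp.parent_pool = NULL;     /* marker set by the extra cleanup */
    }
    toy_connection_destructor(&conn);
    return sub.destroy_count;
}
```

Both toy_destroy_count(0) (maintenance path) and toy_destroy_count(1) (parent teardown path) end with exactly one destroy, but only because the caller and the destructor cooperate through the marker, which is the kind of coupling described above.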

Regards
-- 
^(TM)

Re: worker MPM on trunk does not shut down cleanly

Posted by Ruediger Pluem <rp...@apache.org>.

Ok, this is caused by http://svn.apache.org/viewvc?rev=677505&view=rev
This is the reslist pre_cleanup patch. I don't know why so far, but as
I have a proxy configuration I suspect that it blocks on tearing down
the proxy connection pools.

Regards

Rüdiger


Re: worker MPM on trunk does not shut down cleanly

Posted by Ruediger Pluem <rp...@apache.org>.

Below the same trace for 2.2.x:

Process 319 attached - interrupt to quit
1217365129.950428 read(7, "$", 1) = 1
1217365131.583546 tgkill(319, 346, SIGHUP) = 0
1217365131.583831 --- SIGTERM (Terminated) @ 0 (0) ---
1217365131.584067 sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
1217365131.584361 munmap(0xb709d000, 8392704) = 0
1217365131.584614 munmap(0xaa083000, 8392704) = 0
1217365131.584860 munmap(0xb689c000, 8392704) = 0
1217365131.585105 munmap(0xb609b000, 8392704) = 0
1217365131.585349 munmap(0xb589a000, 8392704) = 0
1217365131.585593 munmap(0xb5099000, 8392704) = 0
1217365131.585854 munmap(0xb4898000, 8392704) = 0
1217365131.586098 munmap(0xb4097000, 8392704) = 0
1217365131.586343 munmap(0xb3896000, 8392704) = 0
1217365131.586587 munmap(0xb3095000, 8392704) = 0
1217365131.586831 munmap(0xb2894000, 8392704) = 0
1217365131.587074 munmap(0xb2093000, 8392704) = 0
1217365131.587228 munmap(0xb1892000, 8392704) = 0
1217365131.587370 munmap(0xb1091000, 8392704) = 0
1217365131.587511 munmap(0xb0890000, 8392704) = 0
1217365131.587652 munmap(0xb008f000, 8392704) = 0
1217365131.587793 munmap(0xaf88e000, 8392704) = 0
1217365131.587933 munmap(0xaf08d000, 8392704) = 0
1217365131.588073 munmap(0xae88c000, 8392704) = 0
1217365131.588214 munmap(0xae08b000, 8392704) = 0
1217365131.588354 munmap(0xad88a000, 8392704) = 0
1217365131.588495 munmap(0xad089000, 8392704) = 0
1217365131.588636 munmap(0xac888000, 8392704) = 0
1217365131.589828 exit_group(0) = ?

An ltrace for 2.2.x:

1217365232.987293 [0xb7f7d410] SYS_read(7, "$", 1)                                     = 1
1217365236.584600 [0x80886df] apr_thread_join(0xbfb64364, 0x81717b0, 1, 0, 0)          = 0
1217365236.585364 [0x808874b] pthread_kill(0xaa840b90, 1, 0xbfb64368, 0x8088776, 0 <unfinished ...>
1217365236.585907 [0xb7f7d410] SYS_tgkill(386, 413, 1, 0, 2 <no return ...>
1217365236.587193 [0xffffffff] +++ killed by SIGTRAP +++

An ltrace for trunk:

1217365285.357994 [0xb7fae410] SYS_read(7, "$", 1)                                     = 1
1217365289.126204 [0x808b6bf] apr_thread_join(0xbf971974, 0x818d810, 1, 0, 0)          = 0
1217365289.126982 [0x808dcb5] apr_thread_mutex_lock(0x818d9b0, 2, 0xbf971958, 0x808b726, 0x818d998) = 0
1217365289.127654 [0x808dcd2] apr_thread_cond_broadcast(0x818d9e0, 2, 0xbf971958, 0x808b726, 0x818d998) = 0
1217365289.128331 [0x808b726] apr_thread_mutex_unlock(0x818d9b0, 0x818d810, 0xbf971978, 0x808b766, 0) = 0
1217365289.128997 [0x808b73d] pthread_kill(0xaa8e0b90, 1, 0xbf971978, 0x808b766, 0 <unfinished ...>
1217365289.129564 [0xb7fae410] SYS_tgkill(435, 462, 1, 0, 2 <no return ...>
1217365289.130846 [0xffffffff] +++ killed by SIGTRAP +++

