Posted to apache-bugdb@apache.org by dg...@apache.org on 1999/05/01 19:39:02 UTC

Re: os-linux/3312: Children die. Parent stops serving requests

[In order for any reply to be added to the PR database, ]
[you need to include <ap...@Apache.Org> in the Cc line ]
[and leave the subject line UNCHANGED.  This is not done]
[automatically because of the potential for mail loops. ]
[If you do not include this Cc, your reply may be ig-   ]
[nored unless you are responding to an explicit request ]
[from a developer.                                      ]
[Reply only with text; DO NOT SEND ATTACHMENTS!         ]


Synopsis: Children die. Parent stops serving requests

State-Changed-From-To: feedback-analyzed
State-Changed-By: dgaudet
State-Changed-When: Sat May  1 10:39:02 PDT 1999
State-Changed-Why:
I examined the straces a while ago, but forgot to comment.
Here's a portion of the parent's trace:

time(NULL)                              = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909702871
fork()                                  = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909703113

Somehow 242 seconds passed between the last two time() calls... the parent
does nothing CPU-intensive, so I doubt it's that.  It's possible the guy's
box is swapping to hell... but we've got about a dozen similar reports.  The
reports are against kernels 2.0.30, 2.0.32, and 2.0.33.
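
For anyone checking the arithmetic, here is a minimal sketch that pulls the
time() return values out of the excerpt above and prints the gaps between
consecutive calls. The names TRACE, TIME_CALL and time_gaps are made up for
this illustration; the trace text is copied from the excerpt as-is.

```python
import re

# The parent's strace excerpt, copied verbatim from the message above.
TRACE = """\
time(NULL)                              = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909702871
fork()                                  = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909703113
"""

# Matches only the time(NULL) lines and captures the returned epoch seconds.
TIME_CALL = re.compile(r"^time\(NULL\)\s*=\s*(\d+)$")


def time_gaps(trace):
    """Return the seconds elapsed between consecutive time() calls."""
    stamps = []
    for line in trace.splitlines():
        match = TIME_CALL.match(line.strip())
        if match:
            stamps.append(int(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


print(time_gaps(TRACE))  # [1, 242]
```

The first gap of 1 second matches the {1, 0} timeout passed to select(); the
second select() has the same timeout, yet 242 seconds go by before the next
time() call.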

Oh, and then there's the odd SIGCHLD followed by ECHILD... there are a few
other instances of that -- SIGCHLDs happening and wait4() not reporting
anything.
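
As a reminder of what ECHILD means, here is a small sketch (assuming a Unix
system with default SIGCHLD handling) that forks a child, reaps it, and then
waits on it again. It only illustrates the error code; it is not an
explanation of the behaviour in the reports. The function name
reap_then_wait_again is invented for this example, and it waits on the
specific pid rather than -1 so the result does not depend on any other
children the process may have.

```python
import os


def reap_then_wait_again():
    """Fork a child, reap it, then wait on it again with WNOHANG."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with status 0, skipping cleanup handlers.
        os._exit(0)

    # Parent: block until the child has exited and collect its status.
    reaped, status = os.waitpid(pid, 0)

    try:
        # The child is already reaped, so this pid is no longer our child.
        os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # Python raises ChildProcessError when the kernel returns ECHILD.
        return reaped == pid, os.WEXITSTATUS(status), "ECHILD"
    return reaped == pid, os.WEXITSTATUS(status), "no error"


print(reap_then_wait_again())
```

With WNOHANG, a return of 0 means children exist but none has changed state,
while ECHILD means there is no matching child left to wait for.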

The short answer:  kernel problem.  Alan Cox hasn't heard of
this problem before, so it's probably an unknown problem.

Dean


Re: os-linux/3312: Children die. Parent stops serving requests

Posted by Ole Tange <ta...@tange.dk>.
On 1 May 1999 dgaudet@apache.org wrote:

> Somehow 242 seconds passed between the two time() calls... the parent does
> nothing cpu intensive, so I doubt it's that.  It's possible the guy's box
> is swapping to hell... but we've got about a dozen similar reports.

Nope. In that case the load ought to rise, which it did not. The problem
was worked around by disabling keep-alives.

> The reports are against 2.0.30, 2.0.32, and 2.0.33.

After upgrading to kernel 2.0.36 and apache 1.3.4 I have been able to
re-enable keepalives with no problems so far.

> The short answer:  kernel problem.  Alan Cox hasn't heard of
> this problem before, so it's probably an unknown problem.

The short comment: Case appears solved by upgrading.


/Ole