You are viewing a plain text version of this content. The canonical link for it is here.
Posted to apache-bugdb@apache.org by Phillip Ezolt <ez...@perf.zko.dec.com> on 1999/02/17 22:30:58 UTC
os-linux/3911: Under high load, server hangs in "flock or fnctl".
>Number: 3911
>Category: os-linux
>Synopsis: Under high load, server hangs in "flock or fnctl".
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: apache
>State: open
>Class: sw-bug
>Submitter-Id: apache
>Arrival-Date: Wed Feb 17 13:40:01 PST 1999
>Last-Modified:
>Originator: ezolt@perf.zko.dec.com
>Organization:
apache
>Release: 1.3.4
>Environment:
Linux crappy.zko.dec.com 2.2.1 #5 Fri Feb 12 09:07:00 EST 1999 i686 unknown
gcc version egcs-2.91.60 19981201 (egcs-1.1.1 release)
OS version: Redhat 5.2
Intel PII/266 with FDDI interface card.
Kernel compile with the following things changed:
I have changed the following kernel values in /usr/src/linux/include/net/tcp.h
to:
#define TCP_HTABLE_SIZE 2048 (was 512)
#define TCP_LHTABLE_SIZE 128 (was 32)
#define TCP_BHTABLE_SIZE 2048 (was 512)
Everything is on a local filesystem. The lockfiles are NOT on NFS.
>Description:
I can repeatably get the server to stop responding after signifcantly stressing
the system. Initially, I had apache compiled with flock serialization. After
a while, a large number of the httpd processes were stuck in the following state:
#0 0x400d49c1 in flock ()
#1 0x805aaa9 in accept_mutex_on ()
#2 0x805d6a5 in child_main ()
#3 0x805dc68 in make_child ()
#4 0x805dfe1 in perform_idle_server_maintenance ()
#5 0x805e4e9 in standalone_main ()
#6 0x805ea7b in main ()
There were a few with the following (What they SHOULD be.. )
#0 0x400de5c2 in __libc_accept ()
#1 0x805d7bc in child_main ()
#2 0x805dc68 in make_child ()
#3 0x805dd17 in startup_children ()
#4 0x805e328 in standalone_main ()
#5 0x805ea7b in main ()
When I would try to connect to the server (lynx http://127.0.0.1), it would
just hang. Normally, the response would be instaneous.
I tried to recompile apache with FCNTL support, and the same thing occurs.
This time the stack trace is:
0 0x400d4974 in __libc_fcntl ()
#1 0x1 in ?? ()
#2 0x805d66d in child_main ()
#3 0x805dc30 in make_child ()
#4 0x805dcdf in startup_children ()
#5 0x805e2f0 in standalone_main ()
#6 0x805ea43 in main ()
There is some kind of race condition that occurs under a very heavy load.
I am not sure if it is a linux, apache, or even glibc bug, but I really want to
get a good result here.
>How-To-Repeat:
The load is SPECWeb96. When I try to push my system above 60 Ops/Sec, this
occurs. I don't have an easy way for an external site to repeat it, but for the
next week and a half, it is all I will be working on. So, I can easily try out
any patches that anyone may have.
>Fix:
None.
>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, ]
[you need to include <ap...@Apache.Org> in the Cc line ]
[and leave the subject line UNCHANGED. This is not done]
[automatically because of the potential for mail loops. ]
[If you do not include this Cc, your reply may be ig- ]
[nored unless you are responding to an explicit request ]
[from a developer. ]
[Reply only with text; DO NOT SEND ATTACHMENTS! ]