Posted to apache-bugdb@apache.org by "David J. MacKenzie" <dj...@web.us.uu.net> on 2000/11/21 05:20:04 UTC
Re: os-solaris/1190: server processes in keepalive state do not die after keepalive-timeout
The following reply was made to PR os-solaris/1190; it has been noted by GNATS.
From: "David J. MacKenzie" <dj...@web.us.uu.net>
To: apbugs@apache.org
Cc: djm@uu.net, rse@engelschall.com
Subject: Re: os-solaris/1190: server processes in keepalive state do not die after keepalive-timeout
Date: Mon, 20 Nov 2000 23:16:39 -0500 (EST)
We have just started experiencing what seems to be the same problem
as http://bugs.apache.org/index.cgi/full/1190
which was reported by a Solaris 2.5.1 user in 1998 and never resolved.
That person was also using mod_ssl and PHP, which seems to be relevant.
Also http://bugs.apache.org/index.cgi/full/6211 may be related,
though today I applied the patch in that PR to no apparent effect.
We are using the newest versions of (almost) everything, on BSDI
BSD/OS 4.0.1. I have some additional data which should be helpful.
In short, the finger *seems* to point at mod_ssl as the culprit,
though I haven't looked at the code to see how that might be plausible.
A week ago UUNET upgraded our server farm of about 800 servers, of which
a few dozen have SSL, from apache 1.3.12 (for most servers) or
Stronghold 2.4.2 (for those that have SSL). They are now running:
  apache 1.3.14, with two patches from bugs.apache.org: one to fix
    corruption of PDF files, and one for mod_rewrite maps (the
    Bugtraq patch)
  mod_ssl 2.7.1
  OpenSSL 0.9.5a
  PHP 4.0.3pl1
  mod_auth_kerb configured for Kerberos v5
All modules except http_core and mod_so are loaded as DSOs. All of
the servers are using the same apache binary and DSOs, compiled with
EAPI, but we only LoadModule mod_ssl for those servers that have SSL
keys and certs. We're not using Java or Perl modules, or anything
that multithreads. The BSD/OS pthreads are user-space anyway.
root@enniskillen 39 $ ldd /usr/local/libexec/apache
libkrb5.so => /usr/local/krb5/lib/libkrb5.so (0xc054000)
libk5crypto.so => /usr/local/krb5/lib/libk5crypto.so (0xc0b4000)
libmm.so.11 => /usr/local/lib/libmm.so.11 (0xc0ce000)
libdl.so => /shlib/libdl.so (0xc0d2000)
libgcc.so => /shlib/libgcc.so (0xc0d5000)
libc.so => /shlib/libc.so (0xc0d8000)
libcom_err.so => /usr/local/krb5/lib/libcom_err.so (0xc15b000)
Our new apache+mod_ssl installation is not always handling HTTP
Keepalive correctly. It's configured to keep connections alive for 5
seconds, but it's not letting some of them go. We see the same
behavior described in PR 1190, in which over the course of a few hours
gradually most of the process slots become filled with Keepalive
connections that are much older than is supposed to be allowed.
Eventually our monitoring systems start alerting that they can't
connect to the servers. Some of the old connections eventually go
away on their own, perhaps those from dialup lines; I'm not sure.
I sampled the mod-status pages of several of our customers, loading
the page, waiting 30 seconds or more, and loading it again in a second
window, and comparing the lists. I looked for which child processes
had connections in the Keepalive state, and checked whether the amount
of data transferred had changed.
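
The comparison described above could be automated roughly as follows.
This is only a sketch, not what I actually ran: it assumes the two
mod-status pages have already been parsed into rows, and the Slot
type and its field names are made up for illustration. A slot is
reported if the same child process is in the Keepalive state ("K")
for the same client in both samples and the amount of data
transferred has not changed.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    """One row of a status snapshot (hypothetical field names)."""
    pid: int         # child process ID
    mode: str        # scoreboard state letter; "K" means Keepalive
    client: str      # remote IP address
    conn_bytes: int  # data transferred on the current connection


def stuck_keepalives(before, after):
    """Return slots from `after` that were in Keepalive in both snapshots,
    for the same client, with no change in data transferred."""
    earlier = {slot.pid: slot for slot in before}
    stuck = []
    for slot in after:
        old = earlier.get(slot.pid)
        if (
            old is not None
            and old.mode == "K"
            and slot.mode == "K"
            and old.client == slot.client
            and old.conn_bytes == slot.conn_bytes
        ):
            stuck.append(slot)
    return stuck
```

If the two samples are taken 30 seconds apart and the Keepalive
timeout is 5 seconds, any slot this returns has outlived the timeout.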
The random sample of about a dozen non-SSL customers I checked all
looked normal. Some of the customers I checked who have SSL showed
the problem. For example, one server got a few http (not https)
requests at 7:29 this morning from IP address 212.250.100.120, and
none since. 12 hours later, the TCP connection is still open, and
taking up 3 apache process slots in the Keepalive state. The browser
is "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)".
Another server shows the same sort of problem, with a connection at 1:13
this afternoon from 192.44.136.113 whose activity lasted 3 seconds but
which is still open:
root@platform-33: netstat -an | grep 192.44.136.113
tcp 0 0 208.240.90.209.80 192.44.136.113.39653 ESTABLISHED
tcp 0 0 208.240.90.209.80 192.44.136.113.39650 ESTABLISHED
tcp 0 0 208.240.90.209.80 192.44.136.113.39598 ESTABLISHED
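
To check many client addresses at once, output like the above could
be tallied per remote IP. The sketch below is an illustration only;
it assumes the BSD-style "netstat -an" layout shown here, where the
port follows the last dot of each address, and the function name is
hypothetical.

```python
from collections import Counter


def established_by_remote_ip(netstat_lines, local_port="80"):
    """Count ESTABLISHED TCP connections to local_port per remote IP.

    Assumes six whitespace-separated fields per line:
    proto, recv-q, send-q, local address, remote address, state.
    """
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        # Split "a.b.c.d.port" at the last dot into address and port.
        _local_ip, _, lport = fields[3].rpartition(".")
        remote_ip, _, _rport = fields[4].rpartition(".")
        if lport == local_port:
            counts[remote_ip] += 1
    return counts
```

Comparing these counts against the Keepalive slots on the mod-status
page shows whether each open connection is holding a child process.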
Their mod-status page confirms that 3 child processes are still in the
Keepalive state for this IP address. The browser is
"Mozilla/4.5 [en] (Win98; I)". That address is pingable:
root@platform-31: ping 192.44.136.113
PING 192.44.136.113 (192.44.136.113): 56 data bytes
64 bytes from 192.44.136.113: icmp_seq=0 ttl=246 time=23.961 ms
So the problem doesn't seem to depend on the browser (Netscape or
MSIE). I've seen it with clients on Windows 95/98 (mainly) and MacOS,
and I think on NT.
Most or all of the requests involved have been for static content.
The affected servers aren't using PHP.
Some of our SSL servers aren't showing the problem, but they handle
little traffic. Late this afternoon I temporarily turned Keepalive off
for the two worst-affected servers, which keep failing our monitoring
because all of their child processes are in use. They went from 40-60
child processes in use simultaneously to 2-13, though this wasn't
during the busiest part of the day.
I also found this comment on Slashdot from a year ago,
at http://slashdot.org/apache/99/12/22/1711203.shtml:
    I've tried both, and while admittedly mod_ssl looks cleaner,
    is easier to set up, and is updated more frequently, we had
    several problems with Microsoft and AOL clients connecting
    via SSL. All of these problems went away once we moved
    over to Apache-SSL. We tried fiddling with the keepalive
    and "unclean shutdown" settings to no avail with mod_ssl
    but it didn't seem to do any good.
I haven't tried Apache-SSL yet.