Posted to apache-bugdb@apache.org by co...@hyperreal.org on 1997/11/13 18:44:03 UTC

Re: general/885: After a period of time (not found to coincide with server rehashes or any specific access), the server will read requests, but return no data (and close the connection). It will still respond to a server-status request though.

Synopsis: After a period of time (not found to coincide with server rehashes or any specific access), the server will read requests, but return no data (and close the connection).  It will still respond to a server-status request though.

State-Changed-From-To: analyzed-feedback
State-Changed-By: coar
State-Changed-When: Thu Nov 13 09:44:02 PST 1997
State-Changed-Why:
Has this issue been resolved yet?  Please let us know
(and cc <ap...@apache.org>) if we should keep the report
open; otherwise it'll be closed in a few days.


Posted by Illuminatus Primus <ve...@gate.net>.
Yes, the problem turned out to be that the logger process would die,
causing Apache to receive SIGPIPE whenever it tried to log something.
I noticed that reliable piped logs are among the changes listed for
Apache 1.3.. thanks :)
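
For readers unfamiliar with the failure mode, here is a minimal sketch of
what happens when a process writes to a pipe whose reader has exited. It is
only an illustration and does not show how Apache 1.3 implements reliable
piped logs. The function name write_to_dead_reader is made up for this
example. With SIGPIPE ignored, the write fails with EPIPE instead of killing
the writer:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Apache code): returns 1 if a write to a pipe
 * with no reader fails with EPIPE, 0 if it does not, -1 if the pipe could
 * not be created.
 */
int write_to_dead_reader(void)
{
    int p[2];
    ssize_t n;
    int saved_errno;

    if (pipe(p) == -1) {
        return -1;
    }

    /* Without this, the default SIGPIPE action would terminate us. */
    signal(SIGPIPE, SIG_IGN);

    /* Closing the read end stands in for the logger process dying. */
    close(p[0]);

    n = write(p[1], "log line\n", 9);
    saved_errno = errno;
    close(p[1]);

    return (n == -1 && saved_errno == EPIPE) ? 1 : 0;
}
```

A writer that sees EPIPE knows its logger is gone and can react (for
example by restarting it) rather than dying or silently dropping output.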

However, while trying to hunt the bug down, I also noticed that the
ap_slack function can have undesirable behavior on 2.0.29 kernels.. I
wrote a small program (testslack.c, attached), which basically uses the
ap_slack function that I copied out of Apache 1.2.1 to continuously
remap fds.

Here is the ap_slack that I found at the time:

#define LOW_SLACK_LINE 15
int ap_slack (int fd, int line)
{
    int new_fd;

    new_fd = fcntl (fd, F_DUPFD, LOW_SLACK_LINE);
    if (new_fd == -1) {
      return fd;
    }
    close (fd);
    return new_fd;
}

In the case where fcntl reports a failure to remap the fd, ap_slack
returns the old fd.. However, in testing, it appears that fcntl does not
return -1 when it runs out of fds in 2.0.x kernels... 

With kernel 2.1.53, open() returns -1 before slacking happens, so I
can't determine if fcntl works correctly.. but at least Apache will see
the error immediately when open returns -1:
myfd: 3 slacked up: 909
myfd: 3 slacked up: 910
myfd: 3 slacked up: 911
myfd: -1 slacked up: -1

On 2.0.29 (ISS patch #4), this is the behavior right before it
reaches its descriptor limit:
myfd: 3 slacked up: 253
myfd: 3 slacked up: 254
myfd: 3 slacked up: 255
myfd: 3 slacked up: 3
myfd: 4 slacked up: 4
myfd: 5 slacked up: 5
myfd: 6 slacked up: 6
myfd: 7 slacked up: 7
myfd: 8 slacked up: 8
myfd: 9 slacked up: 9
myfd: 10 slacked up: 10
myfd: 11 slacked up: 11
myfd: 12 slacked up: 12
myfd: 13 slacked up: 13
myfd: 14 slacked up: 14
myfd: -1 slacked up: -1

So, with 2.0.x kernels, ap_slack won't know that there was an error
remapping the fd; it will close the fd it was given and then return it to
be used for reading/writing/whatever, possibly resulting in SIGPIPEs.. I
wrongly thought at the time that this might have been causing my problems,
but of course I soon found the real reason when my problem happened again
and I started an strace of Apache :).  But I still think it's a Bad Thing
(tm) that a closed fd could possibly go floating around causing SIGPIPEs..
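
The testslack.c attachment is not reproduced in this archive. As a rough
sketch only, a loop like the following would print lines in the same
"myfd: N slacked up: M" format as the output above. The function
run_slack_test, its parameters, and the choice of file to open are
assumptions made for this example; ap_slack is the version quoted earlier
in this message, with the unused line argument silenced:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define LOW_SLACK_LINE 15

/* Same logic as the ap_slack quoted earlier in this message. */
static int ap_slack(int fd, int line)
{
    int new_fd;

    (void)line;                 /* unused in the quoted version */
    new_fd = fcntl(fd, F_DUPFD, LOW_SLACK_LINE);
    if (new_fd == -1) {
        return fd;
    }
    close(fd);
    return new_fd;
}

/*
 * Hypothetical test loop (not the original attachment): open a file,
 * slack it up, and never close the result, so descriptors accumulate.
 * Stops at the first failed open() or after max_iters rounds and returns
 * the number of successful rounds.
 */
int run_slack_test(const char *path, int max_iters)
{
    int rounds = 0;

    while (rounds < max_iters) {
        int myfd = open(path, O_RDONLY);
        int slacked = (myfd == -1) ? -1 : ap_slack(myfd, 0);

        printf("myfd: %d slacked up: %d\n", myfd, slacked);
        if (myfd == -1) {
            break;              /* out of descriptors (or open failed) */
        }
        rounds++;
    }
    return rounds;
}
```

Calling run_slack_test("/dev/null", 2000) with a per-process descriptor
limit below 2000 should run until open() fails, which is the point where
the two kernels above differ.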

Fortunately, this small bug would/will only appear when 255-LOW_SLACK_LINE
fds have been allocated already, and it's relatively easy to work around
(just check to see if the new fd is the same as the old one)..
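
A minimal sketch of that workaround, assuming the kernel behavior described
above. The name slack_fd_checked is made up for this example and this is not
the code that ships in any Apache release. It only closes the original fd
when fcntl has handed back a different descriptor at or above the slack line:

```c
#include <fcntl.h>
#include <unistd.h>

#define LOW_SLACK_LINE 15

/*
 * Hypothetical variant of ap_slack with the extra check suggested above.
 * Returns a descriptor that is guaranteed to still be open: either a new
 * one at or above LOW_SLACK_LINE, or the original fd if remapping failed.
 */
int slack_fd_checked(int fd)
{
    int new_fd;

    new_fd = fcntl(fd, F_DUPFD, LOW_SLACK_LINE);
    if (new_fd == -1 || new_fd == fd) {
        /* Remap failed, or we got the same fd back: keep the original
         * and, above all, do not close it. */
        return fd;
    }
    if (new_fd < LOW_SLACK_LINE) {
        /* Not what we asked for: drop the duplicate, keep the original. */
        close(new_fd);
        return fd;
    }
    close(fd);
    return new_fd;
}
```

On a kernel where fcntl reports the failure properly, the extra
comparisons never trigger, so the check costs nothing in the normal case.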

It's possible that this small bug has already been fixed in Apache 1.2.4,
or that Linux 2.0.31 behaves more reasonably (I haven't tested it yet).. I was
going to report my findings earlier, but had a big load of work dumped on
me, and haven't remembered to get back to you guys since then.. sorry.

Thanks for the great work on Apache (I really like the new features in
1.3!)..

-vermont@gate.net

On 13 Nov 1997 coar@hyperreal.org wrote:

> Synopsis: After a period of time (not found to coincide with server rehashes or any specific access), the server will read requests, but return no data (and close the connection).  It will still respond to a server-status request though.
> 
> State-Changed-From-To: analyzed-feedback
> State-Changed-By: coar
> State-Changed-When: Thu Nov 13 09:44:02 PST 1997
> State-Changed-Why:
> Has this issue been resolved yet?  Please let us know
> (and cc <ap...@apache.org>) if we should keep the report
> open; otherwise it'll be closed in a few days.
>