You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2001/11/15 07:45:22 UTC

Serious 1.3.22 Win32 sockets bug

Apache 1.3.22 introduced a rather troubling bug in the listeners.

Since we allow the child to start immediately, without waiting on
the prior child to finish, we have introduced a serious flaw.  Both
processes are listening on the same socket, already primed to accept
connections.  The client with an existing socket's data will be
randomly accepted by either process.  This appears to be a serious
flaw, especially in POST data.

This points out a larger simplification, perhaps not Win32 specific.
If we are shutting down a given child, we should be toggling keep_alive 
off on the current requests, no?  That definately drops the restart-
completion time.  As I said, perhaps that's just a win32 anomily.

Without introducing a huge burden of interlocking the individual
sockets (a patch not worthy of the 1.3.23-dev tree), we can actually
solve this since the Win32 accepts are all opened as non-exclusive.
Therefore, if we drop the setup_listeners from the parent, entirely,
and let the children open those listeners, the entire headache goes
away.  The old child continues to fulfill all remaining connections,
and the new child fulfills all newly originated requests.

Attached is a patch to do just that.  Please review (Mr. Stoddard ;)

Bill


Re: Serious 1.3.22 Win32 sockets bug

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Apache 1.3.22 introduced a rather troubling bug in the listeners.
>
> Since we allow the child to start immediately, without waiting on
> the prior child to finish, we have introduced a serious flaw.  Both
> processes are listening on the same socket, already primed to accept
> connections.  The client with an existing socket's data will be
> randomly accepted by either process.

I don't see how that can happen. The thread that does the accepting on Windows (a single
thread) does not release the start_mutex (which enables the other process to begin
accepting connections) until after it has stopped accepting connections. In other words,
unless there is a bug, only one process should ever be accepting connections.


Bill


Re: Serious 1.3.22 Win32 sockets bug

Posted by Bill Stoddard <bi...@wstoddard.com>.
My last note has not made it to the list so I am loosing some context..

Here is the code in httpd-2.0 that does the right thing (the ap_graceful_stop_signalled()
call). Unless I overlooked something, it would be nice to have something like this in 1.3
for Windows. Not a killer deficiency though. It just means that the old process will
service keep-alive sessions until it is done or until the main thread kills it (which
isn't too cool)

Bill


 while ((r = ap_read_request(c)) != NULL) {

        /* process the request if it was read without error */

        ap_update_child_status(AP_CHILD_THREAD_FROM_ID(c->id), SERVER_BUSY_WRITE, r);
        if (r->status == HTTP_OK)
            ap_process_request(r);

        if (ap_extended_status)
            ap_increment_counts(AP_CHILD_THREAD_FROM_ID(c->id), r);

        if (!c->keepalive || c->aborted)
            break;

        ap_update_child_status(AP_CHILD_THREAD_FROM_ID(c->id), SERVER_BUSY_KEEPALIVE, r);
        apr_pool_destroy(r->pool);

        if (ap_graceful_stop_signalled())
            break;
    }


Re: Serious 1.3.22 Win32 sockets bug

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Bill Stoddard" <bi...@wstoddard.com>
Sent: Thursday, November 15, 2001 11:50 AM
> >
> > You can demonstrate (interactively) by quering server-status against 1.3.22,
> > with keepalives on.  As you restart, keep refreshing.  The two servers will
> > (alternating randomly) answer the status request, as indicated by uptime.
> 
> Nope, not possible (at least for the reasons you suspect). I suspect your browser is
> opening multiple connections the server and is sending the server-status requests randomly
> on different connections. Once a process starts shutting down, it stops accepting new
> connections before allowing the new process to begin accepting connections (via the
> start_mutex). There is no way for stock Apache 1.3 to accept connections on the process
> shutting down and the new process simultaniously.

Uhmm... here's demonstration test number one, start this test, restart after inititing
the test;

#!perl
#the /cgi-bin/spiffy.pl script, to test that POST content is recieved complete
binmode STDIN;
$l = 0;
while (<STDIN>) {
    $l += length($_);
}
if ( $l == $ENV{"CONTENT_LENGTH"} ) {
    print "Status: 200 OK\n";
} else {
    print "Status: 500 BADLENGTH!\n";
}
print "content-type:text/plain\n\n";
print "Bytes read $l\n";
print "Bytes expected $ENV{'CONTENT_LENGTH'}\n";


ab -k -n 100 -p somefile -c 10 http://localhost:8080/cgi-bin/spiffy.pl


Under 1.3.22, I loose an average of 3 connections (somefile is a bit over 1MB).

I'm reasonably convinced my earlier observations were an artifact of multiple keep-alive
sockets, as you suggest.  We have a similar number of connection errors (about 2 or 3) 
with both pre and post my patch, but with my patch we have about 5 length errors, which 
didn't occur in 1.3.22, so I'll accept that this is the wrong patch.

> The old process may be still servicing keep-alive connections however. I'm positive there
> is code in 1.3.* to quit handling keep alive connections if the process is shutting down.
> I'll take a look and see if that code is broken.

Agreed, if we resolve that, most observations of this artifact will disappear, and the
time-to-restart-completion should drop.

Bill


Re: Serious 1.3.22 Win32 sockets bug

Posted by Bill Stoddard <bi...@wstoddard.com>.
> From: "Bill Stoddard" <bi...@wstoddard.com>
> Sent: Thursday, November 15, 2001 7:22 AM
>
>
> > > Apache 1.3.22 introduced a rather troubling bug in the listeners.
> > >
> > > Without introducing a huge burden of interlocking the individual
> > > sockets (a patch not worthy of the 1.3.23-dev tree), we can actually
> > > solve this since the Win32 accepts are all opened as non-exclusive.
> > > Therefore, if we drop the setup_listeners from the parent, entirely,
> > > and let the children open those listeners, the entire headache goes
> > > away.  The old child continues to fulfill all remaining connections,
> > > and the new child fulfills all newly originated requests.
> > >
> > > Attached is a patch to do just that.  Please review (Mr. Stoddard ;)
> > >
> > > Bill
> > >
> >
> > -1 at least until after we discuss this a bit more.
> >
> > With this patch, connections in the TCP listen queue will be dumped across a restart
(of
> > after hitting max requests per child). Opening the listeners in the parent and
allowing
> > the child processes to inherit the listeners keeps the listen queue intact across
> > restarts.  Having two processes open the same port would introduce undefined behaviour
I
> > suspect (perhaps each will have their own listen queue).
>
> It sounds like we would need to transform the start_mutex to _not_ kill the
> original child until the new child had picked up all the listeners.  This
> would not work, of course, if the original child segfaults :(
>
> > What exactly is the problem with POST? I'd like to investigate other solutions first.
>
> I need to look at the post options to ab or flood, to demonstrate.  I suspect
> packets will be randomly delivered between the two processes.
>
> You can demonstrate (interactively) by quering server-status against 1.3.22,
> with keepalives on.  As you restart, keep refreshing.  The two servers will
> (alternating randomly) answer the status request, as indicated by uptime.

Nope, not possible (at least for the reasons you suspect). I suspect your browser is
opening multiple connections the server and is sending the server-status requests randomly
on different connections. Once a process starts shutting down, it stops accepting new
connections before allowing the new process to begin accepting connections (via the
start_mutex). There is no way for stock Apache 1.3 to accept connections on the process
shutting down and the new process simultaniously.

The old process may be still servicing keep-alive connections however. I'm positive there
is code in 1.3.* to quit handling keep alive connections if the process is shutting down.
I'll take a look and see if that code is broken.

Bill


Re: Serious 1.3.22 Win32 sockets bug

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Bill Stoddard" <bi...@wstoddard.com>
Sent: Thursday, November 15, 2001 7:22 AM


> > Apache 1.3.22 introduced a rather troubling bug in the listeners.
> >
> > Without introducing a huge burden of interlocking the individual
> > sockets (a patch not worthy of the 1.3.23-dev tree), we can actually
> > solve this since the Win32 accepts are all opened as non-exclusive.
> > Therefore, if we drop the setup_listeners from the parent, entirely,
> > and let the children open those listeners, the entire headache goes
> > away.  The old child continues to fulfill all remaining connections,
> > and the new child fulfills all newly originated requests.
> >
> > Attached is a patch to do just that.  Please review (Mr. Stoddard ;)
> >
> > Bill
> >
> 
> -1 at least until after we discuss this a bit more.
> 
> With this patch, connections in the TCP listen queue will be dumped across a restart (of
> after hitting max requests per child). Opening the listeners in the parent and allowing
> the child processes to inherit the listeners keeps the listen queue intact across
> restarts.  Having two processes open the same port would introduce undefined behaviour I
> suspect (perhaps each will have their own listen queue).

It sounds like we would need to transform the start_mutex to _not_ kill the 
original child until the new child had picked up all the listeners.  This
would not work, of course, if the original child segfaults :(

> What exactly is the problem with POST? I'd like to investigate other solutions first.

I need to look at the post options to ab or flood, to demonstrate.  I suspect
packets will be randomly delivered between the two processes.

You can demonstrate (interactively) by quering server-status against 1.3.22,
with keepalives on.  As you restart, keep refreshing.  The two servers will
(alternating randomly) answer the status request, as indicated by uptime.
With a post body, this would be poison.

So yes, all new requests are rerouted properly by the 1.3.22 behavior, but
all in-process requests are potentially corrupted.

Bill



Re: Serious 1.3.22 Win32 sockets bug

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Apache 1.3.22 introduced a rather troubling bug in the listeners.
>
> Since we allow the child to start immediately, without waiting on
> the prior child to finish, we have introduced a serious flaw.  Both
> processes are listening on the same socket, already primed to accept
> connections.  The client with an existing socket's data will be
> randomly accepted by either process.  This appears to be a serious
> flaw, especially in POST data.
>
> This points out a larger simplification, perhaps not Win32 specific.
> If we are shutting down a given child, we should be toggling keep_alive
> off on the current requests, no?  That definately drops the restart-
> completion time.  As I said, perhaps that's just a win32 anomily.
>
> Without introducing a huge burden of interlocking the individual
> sockets (a patch not worthy of the 1.3.23-dev tree), we can actually
> solve this since the Win32 accepts are all opened as non-exclusive.
> Therefore, if we drop the setup_listeners from the parent, entirely,
> and let the children open those listeners, the entire headache goes
> away.  The old child continues to fulfill all remaining connections,
> and the new child fulfills all newly originated requests.
>
> Attached is a patch to do just that.  Please review (Mr. Stoddard ;)
>
> Bill
>

-1 at least until after we discuss this a bit more.

With this patch, connections in the TCP listen queue will be dumped across a restart (of
after hitting max requests per child). Opening the listeners in the parent and allowing
the child processes to inherit the listeners keeps the listen queue intact across
restarts.  Having two processes open the same port would introduce undefined behaviour I
suspect (perhaps each will have their own listen queue).

What exactly is the problem with POST? I'd like to investigate other solutions first.

Bill