Posted to dev@httpd.apache.org by Rainer Jung <ra...@kippdata.de> on 2010/04/25 20:07:34 UTC

Re: Unclean process shutdown in event MPM?

On 23.03.2010 15:30, Jeff Trawick wrote:
> On Tue, Mar 23, 2010 at 10:04 AM, Rainer Jung<ra...@kippdata.de>  wrote:
>> On 23.03.2010 13:34, Jeff Trawick wrote:
>>>
>>> On Tue, Mar 23, 2010 at 7:19 AM, Rainer Jung<ra...@kippdata.de>
>>>   wrote:
>>>>
>>>> I can currently reproduce the following problem with 2.2.15 event MPM
>>>> under
>>>> high load:
>>>>
>>>> When an httpd child process gets closed due to the max spare threads rule
>>>> and it holds established client connections for which it has fully
>>>> received
>>>> a keep alive request, but not yet sent any part of the response, it will
>>>> simply close that connection.
>>>>
>>>> Is that expected behaviour? It doesn't seem reproducible for the worker
>>>> MPM.
>>>> The behaviour has been observed using extreme spare rules in order to
>>>> make
>>>> processes shut down often, but it still seems not right.
>>>
>>> Is this the currently-unhandled situation discussed in this thread?
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>>>
>>> Perhaps Event's special handling for keepalive connections results in
>>> the window being encountered more often?
>>
>> I'd say yes. I know from the packet trace, that the previous response on the
>> same connection got "Connection: Keep-Alive". But from the time gap of about
>> 0.5 seconds between receiving the next request and sending the FIN, I guess,
>> that the child was not already in the process of shutting down, when the
>> previous "Connection: Keep-Alive" response was sent.
>>
>> So for me the question is: if the web server already acknowledged the next
>> request (in our case it's a GET request, and a TCP ACK), should it wait with
>> shutting down the child until the request has been processed and the
>> response has been sent (and in this case "Connection: close" was included)?
>
> Since the ACK is out of our control, that situation is potentially
> within the race condition.
>
>>
>> For the connections which do not have another request pending, I see no
>> problem in closing them - although there could be a race condition. When
>> there's a race (client sends next request while server sends FIN), the
>> client doesn't expect the server to handle the request (it can always happen
>> when a Keep Alive connection times out). In the situation observed it is
>> annoying that the server already accepted the next request and nevertheless
>> closes the connection without handling the request.
>
> All we can know is whether or not the socket is readable at the point
> where we want to gracefully exit the process.  In keepalive state we'd
> wait for {timeout, readability, shutdown-event}, and if readable at
> wakeup then try to process it unless
>> !c->base_server->keep_alive_while_exiting &&
> ap_graceful_stop_signalled().
>
>> I will do some testing around your patch
>>
>> http://people.apache.org/~trawick/keepalive.txt
>
> I don't think the patch will cover Event.  It modifies
> ap_process_http_connection(); ap_process_http_async_connection() is
> used with Event unless there are "clogging input filters."  I guess
> the analogous point of processing is inside Event itself.
>
> I guess if KeepAliveWhileExiting is enabled (whoops, that's
> vhost-specific) then Event would have substantially different shutdown
> logic.

I could now take a second look at it. Directly porting your patch to 
trunk and event is straightforward. There remains a hard problem though: 
the listener thread has a big loop of type

     while (!listener_may_exit) {
         apr_pollset_poll(...)
         while (HANDLE_EVENTS) {
             if (READABLE_SOCKET)
                 ...
             else if (ACCEPT)
                 ...
         }
         HANDLE_KEEPALIVE_TIMEOUTS
         HANDLE_WRITE_COMPLETION_TIMEOUTS
     }

Obviously, if we want to respect any previously returned "Connection: 
Keep-Alive" headers, we can't terminate the loop on listener_may_exit. 
As a first try, I switched to:

     while (1) {
         if (listener_may_exit)
             ap_close_listeners();
         apr_pollset_poll(...);
         REMOVE_LISTENERS_FROM_POLLSET
         while (HANDLE_EVENTS) {
             if (READABLE_SOCKET)
                 ...
             else if (ACCEPT)
                 ...
         }
         HANDLE_KEEPALIVE_TIMEOUTS
         HANDLE_WRITE_COMPLETION_TIMEOUTS
     }

Now the listeners get closed and in combination with your patch the 
connections will not be dropped, but instead will receive a "Connection: 
close" during the next request.

Now the while-loop lacks a correct break criterion. It would need to 
stop when the pollset is empty (listeners were removed, other 
connections were closed due to end of Keep-Alive or timeout). 
Unfortunately there is no API function for checking whether there are 
still sockets in the pollset, and it isn't straightforward how to do that.
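
One way around the missing API would be to maintain a private counter of 
descriptors in the pollset. A minimal sketch, assuming every 
apr_pollset_add()/apr_pollset_remove() call site can be routed through 
small wrappers (num_in_pollset, tracked_add and tracked_remove are 
made-up names, not existing event MPM code):

     #include "apr_poll.h"
     #include "apr_atomic.h"

     /* Hypothetical counter of descriptors currently in the pollset;
      * APR offers no call to ask how many are left. */
     static apr_uint32_t num_in_pollset = 0;

     static apr_status_t tracked_add(apr_pollset_t *ps,
                                     const apr_pollfd_t *pfd)
     {
         apr_status_t rv = apr_pollset_add(ps, pfd);
         if (rv == APR_SUCCESS)
             apr_atomic_inc32(&num_in_pollset);
         return rv;
     }

     static apr_status_t tracked_remove(apr_pollset_t *ps,
                                        const apr_pollfd_t *pfd)
     {
         apr_status_t rv = apr_pollset_remove(ps, pfd);
         if (rv == APR_SUCCESS)
             apr_atomic_dec32(&num_in_pollset);
         return rv;
     }

The listener loop could then break once listener_may_exit is set and 
apr_atomic_read32(&num_in_pollset) has dropped to zero.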

Another possibility would be to wait for the maximum of the vhost 
keepalive timeouts. But that seems to be a bit too much.
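
For illustration, computing that upper bound could look roughly like the 
sketch below: walk the vhost chain and take the largest configured 
KeepAliveTimeout (the drain logic that would actually use the value is 
not shown).

     #include "httpd.h"

     /* Sketch only: longest keepalive timeout configured on any vhost,
      * as an upper bound on how long the exiting process might wait. */
     static apr_interval_time_t max_keepalive_timeout(server_rec *main_server)
     {
         apr_interval_time_t max = main_server->keep_alive_timeout;
         server_rec *s;

         for (s = main_server->next; s != NULL; s = s->next) {
             if (s->keep_alive_timeout > max)
                 max = s->keep_alive_timeout;
         }
         return max;   /* apr_interval_time_t, i.e. microseconds */
     }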

Any ideas or comments?

Regards,

Rainer

Re: Unclean process shutdown in event MPM?

Posted by Greg Ames <am...@gmail.com>.
On Thu, Apr 29, 2010 at 12:06 PM, Greg Ames <am...@gmail.com> wrote:

>
> The last time I checked, trunk had a related bug:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=43359 .
>
> I will look at the patch again and forget mod_status bells and whistles
> for now.


OK, I reviewed this patch again.  It ought to take care of the 2.2.x issue
because, in the graceful process shutdown scenarios, it delays the complete
termination of the listener thread until all the connections are closed.

The logic that sets SERVER_DEAD before we know if the worker thread(s) will
be doing useful work again caught my attention.  It might be bad because we
won't have any clues in the mod_status display about what that worker thread
is doing.  On the other hand, this process won't be accepting any more new
connections, and seeing SERVER_DEAD for these threads would allow
perform_idle_server_maintenance() to fork replacement processes sooner.  I
think I prefer to see SERVER_READY until the worker thread really exits.  I
don't think we need to worry too much about forking as quickly as possible
during a graceful process shutdown.  Other opinions?
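
As a minimal sketch of that preference (not the committed code): keep the 
slot in SERVER_READY while the thread may still serve already-accepted 
connections, and flip it to SERVER_DEAD only when the thread really 
exits. process_slot and thread_slot stand for the slot indexes the MPM 
already tracks.

     #include "scoreboard.h"

     /* Sketch only: scoreboard handling during graceful process shutdown. */
     static void finish_worker_slot(int process_slot, int thread_slot)
     {
         /* possibly still doing useful work on existing connections */
         ap_update_child_status_from_indexes(process_slot, thread_slot,
                                             SERVER_READY, NULL);

         /* ... drain remaining keepalive / write-completion work ... */

         /* only now tell perform_idle_server_maintenance() the slot is free */
         ap_update_child_status_from_indexes(process_slot, thread_slot,
                                             SERVER_DEAD, NULL);
     }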

Greg

Re: Unclean process shutdown in event MPM?

Posted by Greg Ames <am...@gmail.com>.
In 2.2, it is expected behavior.  The RFC allows the server to close
keepalive connections when it wants.

The last time I checked, trunk had a related bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=43359 . Connections
waiting for network writes can also be handled as poll events.  But Event's
process management wasn't updated to take into account that connections
might be blocked on network I/O with no current worker thread.  So those
connections waiting for network writes can also be dropped when the parent
thinks there are too many processes around.

I did a quick scan of the attached patch a while back but didn't commit it
because I thought it should be changed to keep the number of Event-handled
connections (i.e., connections with no worker thread) and what kind of event
they are waiting on in the scoreboard to facilitate a mod_status display
enhancement.  But no Round TUITs for years.  I will look at the patch again
and forget mod_status bells and whistles for now.
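
For illustration only, that scoreboard/mod_status idea might amount to 
something like the following per-process counters; these names are made 
up, and the real scoreboard has no such fields today.

     #include "apr.h"

     /* Hypothetical per-process counters of connections parked in the
      * event loop with no worker thread, split by what they wait for. */
     typedef struct event_conn_counts {
         apr_uint32_t keepalive;         /* waiting for the next request  */
         apr_uint32_t write_completion;  /* waiting to flush a response   */
         apr_uint32_t lingering_close;   /* draining input before close   */
     } event_conn_counts;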

On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <ra...@kippdata.de> wrote:

> On 23.03.2010 15:30, Jeff Trawick wrote:
>
>> On Tue, Mar 23, 2010 at 10:04 AM, Rainer Jung<ra...@kippdata.de>
>>  wrote:
>>
>>> On 23.03.2010 13:34, Jeff Trawick wrote:
>>>
>>>>
>>>> On Tue, Mar 23, 2010 at 7:19 AM, Rainer Jung<ra...@kippdata.de>
>>>>  wrote:
>>>>
>>>>>
>>>>> I can currently reproduce the following problem with 2.2.15 event MPM
>>>>> under
>>>>> high load:
>>>>>
>>>>> When an httpd child process gets closed due to the max spare threads
>>>>> rule
>>>>> and it holds established client connections for which it has fully
>>>>> received
>>>>> a keep alive request, but not yet sent any part of the response, it
>>>>> will
>>>>> simply close that connection.
>>>>>
>>>>> Is that expected behaviour? It doesn't seem reproducible for the worker
>>>>> MPM.
>>>>> The behaviour has been observed using extreme spare rules in order to
>>>>> make
>>>>> processes shut down often, but it still seems not right.
>>>>>
>>>>
>>>> Is this the currently-unhandled situation discussed in this thread?
>>>>
>>>>
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>>>>
>>>> Perhaps Event's special handling for keepalive connections results in
>>>> the window being encountered more often?
>>>>
>>>
>>> I'd say yes. I know from the packet trace, that the previous response on
>>> the
>>> same connection got "Connection: Keep-Alive". But from the time gap of
>>> about
>>> 0.5 seconds between receiving the next request and sending the FIN, I
>>> guess,
>>> that the child was not already in the process of shutting down, when the
>>> previous "Connection: Keep-Alive" response was sent.
>>>
>>> So for me the question is: if the web server already acknowledged the
>>> next
>>> request (in our case it's a GET request, and a TCP ACK), should it wait
>>> with
>>> shutting down the child until the request has been processed and the
>>> response has been sent (and in this case "Connection: close" was
>>> included)?
>>>
>>
>> Since the ACK is out of our control, that situation is potentially
>> within the race condition.
>>
>>
>>> For the connections which do not have another request pending, I see no
>>> problem in closing them - although there could be a race condition. When
>>> there's a race (client sends next request while server sends FIN), the
>>> client doesn't expect the server to handle the request (it can always
>>> happen
>>> when a Keep Alive connection times out). In the situation observed it is
>>> annoying that the server already accepted the next request and
>>> nevertheless
>>> closes the connection without handling the request.
>>>
>>
>> All we can know is whether or not the socket is readable at the point
>> where we want to gracefully exit the process.  In keepalive state we'd
>> wait for {timeout, readability, shutdown-event}, and if readable at
>> wakeup then try to process it unless
>> !c->base_server->keep_alive_while_exiting &&
>> ap_graceful_stop_signalled().
>>
>>  I will do some testing around your patch
>>>
>>> http://people.apache.org/~trawick/keepalive.txt
>>>
>>
>> I don't think the patch will cover Event.  It modifies
>> ap_process_http_connection(); ap_process_http_async_connection() is
>> used with Event unless there are "clogging input filters."  I guess
>> the analogous point of processing is inside Event itself.
>>
>> I guess if KeepAliveWhileExiting is enabled (whoops, that's
>> vhost-specific) then Event would have substantially different shutdown
>> logic.
>>
>
> I could now take a second look at it. Directly porting your patch to trunk
> and event is straightforward. There remains a hard problem though: the
> listener thread has a big loop of type
>
>    while (!listener_may_exit) {
>        apr_pollset_poll(...)
>        while (HANDLE_EVENTS) {
>            if (READABLE_SOCKET)
>                ...
>            else if (ACCEPT)
>                ...
>        }
>        HANDLE_KEEPALIVE_TIMEOUTS
>        HANDLE_WRITE_COMPLETION_TIMEOUTS
>    }
>
> Obviously, if we want to respect any previously returned "Connection:
> Keep-Alive" headers, we can't terminate the loop on listener_may_exit. As a
> first try, I switched to:
>
>    while (1) {
>        if (listener_may_exit)
>            ap_close_listeners();
>        apr_pollset_poll(...);
>        REMOVE_LISTENERS_FROM_POLLSET
>        while (HANDLE_EVENTS) {
>            if (READABLE_SOCKET)
>                ...
>            else if (ACCEPT)
>                ...
>        }
>        HANDLE_KEEPALIVE_TIMEOUTS
>        HANDLE_WRITE_COMPLETION_TIMEOUTS
>    }
>
> Now the listeners get closed and in combination with your patch the
> connections will not be dropped, but instead will receive a "Connection:
> close" during the next request.
>
> Now the while-loop lacks a correct break criterion. It would need to stop
> when the pollset is empty (listeners were removed, other connections were
> closed due to end of Keep-Alive or timeout). Unfortunately there is no API
> function for checking whether there are still sockets in the pollset and it
> isn't straightforward how to do that.
>
> Another possibility would be to wait for the maximum of the vhost keepalive
> timeouts. But that seems to be a bit too much.
>
> Any ideas or comments?
>
> Regards,
>
> Rainer
>

Re: Unclean process shutdown in event MPM?

Posted by Rainer Jung <ra...@kippdata.de>.
On 29.04.2010 18:14, Greg Ames wrote:
> I re-read this thread and see that we have a request in progress, so
> this isn't RFC-approved behavior.  Sorry for the noise.

Hmmm, it depends. The situation observed and the discussion I raised is 
about an established connection having already returned a Connection: 
Keep-Alive and a response, waiting for the next request, and now a 
process shutdown arrived via e.g.

- graceful-stop

or

- MaxRequestsPerChild reached

or

- MaxSpare detected during maintenance

Yes, in the observed situation the next request for the connection has 
already been transmitted and ACKed, but not yet read by Apache. So from 
the point of view of the web server it hasn't yet accepted the request, 
but it could find out whether there is one waiting to be handled; from 
the point of view of the client, the next request has been successfully 
transmitted.
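
For illustration, a zero-timeout poll on the connection socket is one way 
the server could check whether such a request is already waiting before 
it decides to drop the connection. This is a sketch, not what the event 
MPM does today; csd and p are assumed names for the connection socket 
and a pool to use for the descriptor.

     #include "apr_poll.h"

     /* Sketch only: has the client already sent the next request on this
      * keepalive connection? */
     static int request_pending(apr_socket_t *csd, apr_pool_t *p)
     {
         apr_pollfd_t pfd = { 0 };
         apr_int32_t nsds = 0;

         pfd.p = p;
         pfd.desc_type = APR_POLL_SOCKET;
         pfd.desc.s = csd;
         pfd.reqevents = APR_POLLIN;

         /* timeout 0: ask "is there data right now?" without blocking */
         return apr_poll(&pfd, 1, &nsds, 0) == APR_SUCCESS && nsds > 0;
     }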

Regards,

Rainer

> On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <rainer.jung@kippdata.de> wrote:
>
>     On 23.03.2010 15:30, Jeff Trawick wrote:
>
>
>                     Is that expected behaviour? It doesn't seem
>                     reproducible for the worker
>                     MPM.
>                     The behaviour has been observed using extreme spare
>                     rules in order to
>                     make
>                     processes shut down often, but it still seems not right.
>
>
>                 Is this the currently-unhandled situation discussed in
>                 this thread?
>
>
>                 http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>
>                 Perhaps Event's special handling for keepalive
>                 connections results in
>                 the window being encountered more often?
>
>
>             I'd say yes. I know from the packet trace, that the previous
>             response on the
>             same connection got "Connection: Keep-Alive". But from the
>             time gap of about
>             0.5 seconds between receiving the next request and sending
>             the FIN, I guess,
>             that the child was not already in the process of shutting
>             down, when the
>             previous "Connection: Keep-Alive" response was sent.

Re: Unclean process shutdown in event MPM?

Posted by Greg Ames <am...@gmail.com>.
I re-read this thread and see that we have a request in progress, so this
isn't RFC-approved behavior.  Sorry for the noise.

Greg

On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <ra...@kippdata.de> wrote:

> On 23.03.2010 15:30, Jeff Trawick wrote:
>
>>
>>>>> Is that expected behaviour? It doesn't seem reproducible for the worker
>>>>> MPM.
>>>>> The behaviour has been observed using extreme spare rules in order to
>>>>> make
>>>>> processes shut down often, but it still seems not right.
>>>>>
>>>>
>>>> Is this the currently-unhandled situation discussed in this thread?
>>>>
>>>>
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3Ccc67648e0711130530h45c2a28ctcd743b2160e22914@mail.gmail.com%3E
>>>>
>>>> Perhaps Event's special handling for keepalive connections results in
>>>> the window being encountered more often?
>>>>
>>>
>>> I'd say yes. I know from the packet trace, that the previous response on
>>> the
>>> same connection got "Connection: Keep-Alive". But from the time gap of
>>> about
>>> 0.5 seconds between receiving the next request and sending the FIN, I
>>> guess,
>>> that the child was not already in the process of shutting down, when the
>>> previous "Connection: Keep-Alive" response was sent.
>>
>>