You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Yann Ylavic <yl...@gmail.com> on 2017/01/30 14:15:35 UTC

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Hi Luca,

continuing on dev@...

On Mon, Jan 30, 2017 at 11:42 AM,  <bu...@apache.org> wrote:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=56188
[]
>
> My question was if there was any corner case in which if, after a client has
> initiated a TCP connection close, mod-proxy-fcgi wrongly assume that the
> connection dropped (because of POLLIN and EOF) taking its own remediation
> (error logs, etc..).

I think so, think about pipelined requests (or TLS-only data).

>
> In this case I am wondering if my patch would introduce a performance
> regression for use cases like big requests coming in (like POSTs for example).

In the above case, your patch would possibly loop endlessly because
nothing empties the socket, or pulls pfd[1] out from the pollset.

> We don't really care about reading those bytes (in the code introduced by my
> patch) but only to check if the connection with the client is still up, so
> there might be a better way to approach the issue.

It could be achievable, but not easy, with care taken to pull the
client's socket out of the pollset if anything but an error or a
connection/TLS close is detected.
It probably also shouldn't start before "last_stdin" is true.

Client side poll() may return with either:
1. HTTP data (pipelined) => client still alive, non-empty/meta brigade
=> don't abort
2. TCP close or reset => bucket EOS or an APR_E* (though
speculative-non-blocking reads won't return EOS, and may turn EOF into
SUCCESS with empty brigade!) => abort
3. TLS close notify => EOS/EOF? => abort
4. TLS renegociation => rejected by httpd (since initiated by the
client) => an APR_E* => abort
n. ...

Probably not an exhaustive list...


BTW, if the client connection is closed under us, why don't we keep
detecting it on write (pass_brigade) while forwarding the response?
We'd want, for good reasons, to be active between the time the request
is sent to the backend and the response arrives?


Regards,
Yann.

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Feb 2, 2017 at 5:16 PM, Yann Ylavic <yl...@gmail.com> wrote:
> On Thu, Feb 2, 2017 at 4:51 PM, Luca Toscano <to...@gmail.com> wrote:
>>
>> 2017-01-30 21:58 GMT+01:00 Yann Ylavic <yl...@gmail.com>:
>>>
>>> Maybe what is missing is nonblocking reads on the backend side, and on
>>> EAGAIN flush on the client side to detect potential socket errors?
>>
>> Interesting.. So the idea would be to be non blocking while reading from the
>> FCGI backend, and in case of EAGAIN flush to the client in order to force
>> ap_pass_brigade to detect if the client aborted the connection.
>
> Yes, poll() => POLLIN => read() while data available => EAGAIN =>
> flush => poll().

The "read() while data available => EAGAIN" may be replaced by "read()
while read buffer is full" so to avoid a (likely) useless last
non-blocking read.

>
> The flush is necessary because output filters down the road might have
> buffered, so it makes so that everything so far reaches the socket
> (with potential error).
>
>> The flushing
>> part might be tricky to implement (at least for me, but I'll try to work on
>> it :).
>
> It should be as easy as pushing a FLUSH bucket at the end of the
> brigade passed to the client ;)
>
> The hard part may be more the non-blocking read (and handling of
> incomplete headers/data).
>
> Regards,
> Yann.

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Feb 2, 2017 at 4:51 PM, Luca Toscano <to...@gmail.com> wrote:
>
> 2017-01-30 21:58 GMT+01:00 Yann Ylavic <yl...@gmail.com>:
>>
>> Maybe what is missing is nonblocking reads on the backend side, and on
>> EAGAIN flush on the client side to detect potential socket errors?
>
> Interesting.. So the idea would be to be non blocking while reading from the
> FCGI backend, and in case of EAGAIN flush to the client in order to force
> ap_pass_brigade to detect if the client aborted the connection.

Yes, poll() => POLLIN => read() while data available => EAGAIN =>
flush => poll().

The flush is necessary because output filters down the road might have
buffered, so it makes so that everything so far reaches the socket
(with potential error).

> The flushing
> part might be tricky to implement (at least for me, but I'll try to work on
> it :).

It should be as easy as pushing a FLUSH bucket at the end of the
brigade passed to the client ;)

The hard part may be more the non-blocking read (and handling of
incomplete headers/data).

Regards,
Yann.

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Posted by Luca Toscano <to...@gmail.com>.
2017-01-30 21:58 GMT+01:00 Yann Ylavic <yl...@gmail.com>:

> On Mon, Jan 30, 2017 at 7:17 PM, Luca Toscano <to...@gmail.com>
> wrote:
> >
> > The use case that I had (the one that caused me to check the original
> > bugzilla task/patch and work on it) was a long running PHP script
> (running
> > on HHVM) that wasn't returning anything until the end of the job
> (minutes),
> > causing mod-proxy-fcgi to keep waiting even if the client disconnected.
> In
> > the beginning I also thought that there was a way to "signal" the FCGI
> > backend to stop whatever was doing (FCGI_ABORT), but it turned out to be
> not
> > widely implemented (at least from what I have checked, more info in
> > bugzilla). Timeout and ProxyTimeout could be a good compromise for this
> > particular issue, but it is a "one size fits all" solution that some
> users
> > didn't like (providing a patch with the pollset solution).
>
> I think that's what CGIDScriptTimeout or ProxyTimeout are for: avoid
> waiting too much for the script or backend (above some reasonably
> configured limit).
>
> >
> > If we remove the above use case (I fixed it in my Production environment
> > with a long Timeout value), would it be fine to rely on something like
> the
> > conn->aborted flag (afaics set by the output filter's chain while
> writing to
> > the client's conn)? It would simplify a lot the problem :)
>
> AFAICT, if forwarding data to the client fails (e.g. socket
> disconnected), ap_pass_brigade() returns an error and the loop exits.
>

Yes confirmed with a test (that flushes data to output periodically), the
connection->aborted flag is set and mod_proxy_fcgi behaves correctly.


> Maybe what is missing is nonblocking reads on the backend side, and on
> EAGAIN flush on the client side to detect potential socket errors?
>

Interesting.. So the idea would be to be non blocking while reading from
the FCGI backend, and in case of EAGAIN flush to the client in order to
force ap_pass_brigade to detect if the client aborted the connection. The
flushing part might be tricky to implement (at least for me, but I'll try
to work on it :).

In any case, it seems like we are in agreement to discard the client
connection polling idea (if anybody has more thoughts please speak up). In
any case, I believe that this use case would need to be documented properly
in the docs to warn users assuming that FCGI_ABORT is implemented.

Thanks for the review!

Luca

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Posted by Yann Ylavic <yl...@gmail.com>.
On Mon, Jan 30, 2017 at 7:17 PM, Luca Toscano <to...@gmail.com> wrote:
>
> The use case that I had (the one that caused me to check the original
> bugzilla task/patch and work on it) was a long running PHP script (running
> on HHVM) that wasn't returning anything until the end of the job (minutes),
> causing mod-proxy-fcgi to keep waiting even if the client disconnected. In
> the beginning I also thought that there was a way to "signal" the FCGI
> backend to stop whatever was doing (FCGI_ABORT), but it turned out to be not
> widely implemented (at least from what I have checked, more info in
> bugzilla). Timeout and ProxyTimeout could be a good compromise for this
> particular issue, but it is a "one size fits all" solution that some users
> didn't like (providing a patch with the pollset solution).

I think that's what CGIDScriptTimeout or ProxyTimeout are for: avoid
waiting too much for the script or backend (above some reasonably
configured limit).

>
> If we remove the above use case (I fixed it in my Production environment
> with a long Timeout value), would it be fine to rely on something like the
> conn->aborted flag (afaics set by the output filter's chain while writing to
> the client's conn)? It would simplify a lot the problem :)

AFAICT, if forwarding data to the client fails (e.g. socket
disconnected), ap_pass_brigade() returns an error and the loop exits.
Maybe what is missing is nonblocking reads on the backend side, and on
EAGAIN flush on the client side to detect potential socket errors?

Re: [Bug 56188] mod_proxy_fcgi does not send FCGI_ABORT_REQUEST on client disconnect

Posted by Luca Toscano <to...@gmail.com>.
2017-01-30 15:15 GMT+01:00 Yann Ylavic <yl...@gmail.com>:

> Hi Luca,
>
> continuing on dev@...
>
> On Mon, Jan 30, 2017 at 11:42 AM,  <bu...@apache.org> wrote:
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=56188
> []
> >
> > My question was if there was any corner case in which if, after a client
> has
> > initiated a TCP connection close, mod-proxy-fcgi wrongly assume that the
> > connection dropped (because of POLLIN and EOF) taking its own remediation
> > (error logs, etc..).
>
> I think so, think about pipelined requests (or TLS-only data).
>
> >
> > In this case I am wondering if my patch would introduce a performance
> > regression for use cases like big requests coming in (like POSTs for
> example).
>
> In the above case, your patch would possibly loop endlessly because
> nothing empties the socket, or pulls pfd[1] out from the pollset.
>
> > We don't really care about reading those bytes (in the code introduced
> by my
> > patch) but only to check if the connection with the client is still up,
> so
> > there might be a better way to approach the issue.
>
> It could be achievable, but not easy, with care taken to pull the
> client's socket out of the pollset if anything but an error or a
> connection/TLS close is detected.
> It probably also shouldn't start before "last_stdin" is true.
>
> Client side poll() may return with either:
> 1. HTTP data (pipelined) => client still alive, non-empty/meta brigade
> => don't abort
> 2. TCP close or reset => bucket EOS or an APR_E* (though
> speculative-non-blocking reads won't return EOS, and may turn EOF into
> SUCCESS with empty brigade!) => abort
> 3. TLS close notify => EOS/EOF? => abort
> 4. TLS renegociation => rejected by httpd (since initiated by the
> client) => an APR_E* => abort
> n. ...
>
> Probably not an exhaustive list...
>

Thanks a lot for the list! I am not familiar with all the use cases that
you brought up but it feels to me that the mod_proxy_fcgi patch could bring
more headaches than benefits..


>
>
> BTW, if the client connection is closed under us, why don't we keep
> detecting it on write (pass_brigade) while forwarding the response?
> We'd want, for good reasons, to be active between the time the request
> is sent to the backend and the response arrives?
>

The use case that I had (the one that caused me to check the original
bugzilla task/patch and work on it) was a long running PHP script (running
on HHVM) that wasn't returning anything until the end of the job (minutes),
causing mod-proxy-fcgi to keep waiting even if the client disconnected. In
the beginning I also thought that there was a way to "signal" the FCGI
backend to stop whatever was doing (FCGI_ABORT), but it turned out to be
not widely implemented (at least from what I have checked, more info in
bugzilla). Timeout and ProxyTimeout could be a good compromise for this
particular issue, but it is a "one size fits all" solution that some users
didn't like (providing a patch with the pollset solution).

If we remove the above use case (I fixed it in my Production environment
with a long Timeout value), would it be fine to rely on something like the
conn->aborted flag (afaics set by the output filter's chain while writing
to the client's conn)? It would simplify a lot the problem :)

Thanks!

Luca