Posted to dev@httpd.apache.org by Adam Hill <si...@gmail.com> on 2020/11/02 12:03:55 UTC

Re: Lingering close + unwritten data == failed connections

Hi Yann,

Yep, I can confirm that the patch fixes the issue. Interestingly (or maybe
not), I had a quick glance at apr_socket_close and it seems to set a
SO_LINGER timeout of 30 seconds, so I sort of expected the problem to still
happen, just at half the transfer rate... but that doesn't seem to be the
case. As I say, it was a very cursory look, so maybe it does more than that
(or maybe the linger timeout is just the time allowed for the close() call
to return, and no RST is sent).

Anyway, this does seem to be the fix, and you've got to hope that any type
of DoS attempting to take advantage of sockets in the various CLOSE_WAIT et
al states would be mitigated at kernel level.

Thanks for looking at this, Yann.

Adam

On Sat, 31 Oct 2020 at 00:57, Yann Ylavic <yl...@gmail.com> wrote:

> On Wed, Oct 28, 2020 at 6:40 PM Joe Orton <jo...@redhat.com> wrote:
> >
> > On Wed, Oct 21, 2020 at 05:17:01PM +0100, Adam Hill wrote:
> > > On Linux at least, you can see how much unsent data remains by
> > > querying the SIOCOUTQ ioctl, so the mitigation would be to check to
> > > see that ANY data was draining at all, and if so ( and there's some
> > > left ) extend the lingering close time and repeat. However, this
> > > wouldn't be a cross platform solution, but it would at least be the
> > > "correct" thing to do in terms of network function. Not sure if
> > > there's an equivalent on other systems.
> >
> > Nice writeup, thank you.
>
> +1
>
> > So I kind of wish that
> > something was missed here, but multiple people have come to exactly that
> > conclusion independently.
>
> It may be due to r1802875 where I added RST (SO_LINGER.l_linger = 0)
> after lingering close timeout.
> Thinking of it now, it's probably not the right thing to do. Simply
> calling apr_socket_close() in abort_socket_nonblocking() would allow
> the system's lingering close after httpd's.
>
> Adam, can you still observe the same behaviour with the attached
> mpm_event patch applied?
>
>
> Regards;
> Yann.
>

Re: Lingering close + unwritten data == failed connections

Posted by Yann Ylavic <yl...@gmail.com>.
> Thanks for testing!

And investigating the bug, nice report.

Re: Lingering close + unwritten data == failed connections

Posted by Yann Ylavic <yl...@gmail.com>.
Hi Adam,

On Mon, Nov 2, 2020 at 1:04 PM Adam Hill <si...@gmail.com> wrote:
>
> Yep, I can confirm that the patch fixes the issue.

Thanks for testing, committed to trunk in https://svn.apache.org/r1883097
I'll propose a backport to 2.4.x ASAP.

> Interestingly (or maybe not), I had a quick glance at apr_socket_close and it seems to set a SO_LINGER timeout of 30 seconds, so I sort of expected the problem to still happen, just at half the transfer rate... but that doesn't seem to be the case.

I don't see any (internal) use of (APR_)SO_LINGER in the APR library.
One can call apr_socket_opt_set() to set the option on the socket, but
neither httpd nor APR seems to actually use it.

> As I say, it was a very cursory look, so maybe it does more than that (or maybe the linger timeout is just the time allowed for the close() call to return, and no RST is sent).

That would be bad, actually: SO_LINGER with a positive timeout (as
opposed to the zero timeout that mpm_event used to reset the connection)
would/could cause close() to block, while abort_socket_nonblocking()
in mpm_event must not block (at least from some callers).

Unix systems don't block on close() unless SO_LINGER is used; removing
the reset actually depends on this.
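
For reference, a minimal sketch of the SO_LINGER cases contrasted above,
on a plain POSIX socket descriptor. The helper name set_linger is
hypothetical; this is not the mpm_event code and it does not go through
APR.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: set SO_LINGER on fd. Returns 0 on success, -1 on
 * error (as setsockopt() does).
 *
 *   onoff = 0              default: close() returns at once and the
 *                          kernel keeps trying to send queued data.
 *   onoff = 1, seconds = 0 close() discards unsent data and a TCP
 *                          connection is reset (RST) instead of being
 *                          shut down with FIN.
 *   onoff = 1, seconds > 0 close() may block for up to that many
 *                          seconds while queued data is sent.
 */
int set_linger(int fd, int onoff, int seconds)
{
    struct linger li;

    li.l_onoff = onoff;
    li.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &li, sizeof(li));
}
```

In these terms, the reset that mpm_event used to do after its lingering
close timeout corresponds to set_linger(fd, 1, 0) followed by close(),
and dropping it leaves the socket in the default onoff = 0 case.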

>
> Anyway, this does seem to be the fix, and you've got to hope that any type of DoS attempting to take advantage of sockets in the various CLOSE_WAIT et al states would be mitigated at kernel level.

It certainly will do better than httpd, which has no control over this anyway :)

>
> Thanks for looking at this Yann.

Thanks for testing!


Regards;
Yann.