Posted to dev@httpd.apache.org by Brian Pane <br...@apache.org> on 2005/08/26 09:42:19 UTC

PATCH: lazy initialization of TCP_NODELAY (workaround for 2.6 TCP_CORK problem)

The attached patch delays the setting of TCP_NODELAY on client connections
until the first time core_output_filter has to do a writev_it_all() or
emulate_sendfile().  My motivation for this is to work around the
TCP_NODELAY/TCP_CORK problem in Linux 2.6.  However, it also will save a
couple of syscalls for any request that is handled with sendfile(2).

Note that there was an APR bug that caused TCP_NODELAY to be set on the
listener socket at startup as a side-effect of TCP_DEFER_ACCEPT being set.
I just committed a fix for this.  Without that fix, Linux 2.6 systems using
this httpd patch will still exhibit the corking problem.

Can a couple of volunteers please test and/or review this patch?  I'd
appreciate a second opinion before I commit, given the subtlety of the
NODELAY and CORK semantics on various platforms.

Thanks,
Brian

Re: PATCH: lazy initialization of TCP_NODELAY (workaround for 2.6 TCP_CORK problem)

Posted by Brian Pane <br...@apache.org>.
On Aug 26, 2005, at 1:59 AM, Joe Orton wrote:


> On Fri, Aug 26, 2005 at 01:23:15AM -0700, Brian Pane wrote:
>
>> I didn't think it was actually possible for APR to allow TCP_CORK and
>> TCP_NODELAY at the same time.  From the tcp(7) manual page on my FC4
>> installation,
>
> That's out of date yes, see recent thread ;)
>
> All 2.6.x kernels do allow setting both TCP_CORK and TCP_NODELAY at the
> same time.
>
> All current 2.6.x kernels have the bug in TCP_CORK handling which means
> that if TCP_NODELAY was ever enabled on the socket, TCP_CORK will not
> take effect.
>
> The fix for that was in the 2.6.13-rc7 release, for the curious:
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=89ebd197eb2cd31d6187db344d5117064e19fdde

Ah, thanks.  With this kernel fix forthcoming, the current httpd
implementation (sans my patch) will do the right thing for both sendfile
and non-sendfile responses.

Brian


Re: PATCH: lazy initialization of TCP_NODELAY (workaround for 2.6 TCP_CORK problem)

Posted by Joe Orton <jo...@redhat.com>.
On Fri, Aug 26, 2005 at 01:23:15AM -0700, Brian Pane wrote:
> 
> On Aug 26, 2005, at 12:55 AM, Joe Orton wrote:
> 
> >On Fri, Aug 26, 2005 at 12:42:19AM -0700, Brian Pane wrote:
> >
> >>The attached patch delays the setting of TCP_NODELAY on client
> >>connections until the first time core_output_filter has to do a
> >>writev_it_all() or emulate_sendfile(). My motivation for this is to
> >>work around the TCP_NODELAY/TCP_CORK problem in Linux 2.6.  However,
> >>it also will save a couple of syscalls for any request that is handled
> >>with sendfile(2).
> >>
> >
> >This will actually end up being *more* expensive on 2.6 systems in the
> >long run, though, right?  I'm not convinced this is a good idea.  With
> >APR changed to allow setting TCP_CORK and TCP_NODELAY at the same time
> >with 2.6, it is cheaper to just set TCP_NODELAY once on the listening
> >socket and never have to touch it again.
> 
> I didn't think it was actually possible for APR to allow TCP_CORK and
> TCP_NODELAY at the same time.  From the tcp(7) manual page on my FC4
> installation,

That's out of date yes, see recent thread ;)

All 2.6.x kernels do allow setting both TCP_CORK and TCP_NODELAY at the 
same time.

All current 2.6.x kernels have the bug in TCP_CORK handling which means 
that if TCP_NODELAY was ever enabled on the socket, TCP_CORK will not 
take effect.

The fix for that was in the 2.6.13-rc7 release, for the curious: 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=89ebd197eb2cd31d6187db344d5117064e19fdde
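
Whether a given kernel accepts both options on one socket can be checked
with a small probe like the one below.  This is a sketch, not APR or httpd
code, and the function name both_cork_and_nodelay() is hypothetical.  It
only shows that both options can be set and read back; it does not detect
whether TCP_CORK actually takes effect, which is the bug described above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP_NODELAY and then TCP_CORK on the same TCP socket and read
 * both options back.  Linux-only: TCP_CORK is not portable.
 * Returns 1 if both options report as set, 0 if either does not, and
 * -1 if a setsockopt()/getsockopt() call fails.
 */
static int both_cork_and_nodelay(int fd)
{
    int on = 1, nodelay = 0, cork = 0;
    socklen_t len;

    assert(fd >= 0);
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) != 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) != 0) {
        return -1;
    }
    len = sizeof(nodelay);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, &len) != 0) {
        return -1;
    }
    len = sizeof(cork);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &cork, &len) != 0) {
        return -1;
    }
    return (nodelay != 0 && cork != 0) ? 1 : 0;
}
```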

...
> If it's possible to use both TCP_CORK and TCP_NODELAY on the same
> socket in 2.6.13 or later (assuming that's when the fix for the current
> NODELAY toggling problem becomes available), then yes, my lazy
> evaluation approach will be less efficient than just setting TCP_NODELAY
> on the listener socket--for requests that don't use sendfile.
> For requests that do use sendfile, I think the logic implemented by my 
> patch will be optimal for both 2.6.1-2.6.12 and 2.6.13+

To be clear this is only a partial workaround for the kernel bug: if at 
any time something is sent on the connection which requires enabling 
TCP_NODELAY, any subsequent TCP_CORKs will have no effect.

Given that fact I'm not convinced it's worth changing httpd: this is (1) 
a kernel bug and (2) an APR lack-of-feature; with both of those things 
fixed the current httpd code is perfectly correct.

joe

Re: PATCH: lazy initialization of TCP_NODELAY (workaround for 2.6 TCP_CORK problem)

Posted by Brian Pane <br...@apache.org>.
On Aug 26, 2005, at 12:55 AM, Joe Orton wrote:

> On Fri, Aug 26, 2005 at 12:42:19AM -0700, Brian Pane wrote:
>
>> The attached patch delays the setting of TCP_NODELAY on client
>> connections until the first time core_output_filter has to do a
>> writev_it_all() or emulate_sendfile(). My motivation for this is to
>> work around the TCP_NODELAY/TCP_CORK problem in Linux 2.6.  However,
>> it also will save a couple of syscalls for any request that is handled
>> with sendfile(2).
>>
>
> This will actually end up being *more* expensive on 2.6 systems in the
> long run, though, right?  I'm not convinced this is a good idea.  With
> APR changed to allow setting TCP_CORK and TCP_NODELAY at the same time
> with 2.6, it is cheaper to just set TCP_NODELAY once on the listening
> socket and never have to touch it again.

I didn't think it was actually possible for APR to allow TCP_CORK and
TCP_NODELAY at the same time.  From the tcp(7) manual page on my FC4
installation,

        TCP_CORK
               If set, don’t send out partial frames.  All queued partial
               frames are sent when the option is cleared again.  This is
               useful for prepending headers before calling sendfile(2),
               or for throughput optimization.  This option cannot be
               combined with TCP_NODELAY.  This option should not be used
               in code intended to be portable.

and

        TCP_NODELAY
               If set, disable the Nagle algorithm.  This means that
               segments are always sent as soon as possible, even if there
               is only a small amount of data.  When not set, data is
               buffered until there is a sufficient amount to send out,
               thereby avoiding the frequent sending of small packets,
               which results in poor utilization of the network.  This
               option cannot be used at the same time as the option
               TCP_CORK.

Is the manpage out of date?
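
For reference, the TCP_CORK usage that the manpage excerpt describes
(headers first, then sendfile(2), then uncork) looks roughly like the
sketch below.  It uses plain Linux system calls rather than the APR or
httpd code paths, and the function name send_corked() and its parameters
are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send a header block followed by `count` bytes of file_fd over a TCP
 * socket, corking the socket so that the headers and the start of the
 * file can share a segment.  Linux-only.  Returns 0 on success, -1 on
 * error.  For brevity, a short write of the headers is treated as an
 * error rather than retried.
 */
static int send_corked(int sock, const void *hdr, size_t hdr_len,
                       int file_fd, size_t count)
{
    int on = 1, uncork = 0, rc = 0;
    off_t offset = 0;
    size_t remaining = count;

    assert(hdr != NULL);
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) != 0) {
        return -1;
    }
    if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len) {
        rc = -1;
    }
    while (rc == 0 && remaining > 0) {
        ssize_t n = sendfile(sock, file_fd, &offset, remaining);
        if (n <= 0) {
            rc = -1; /* error or unexpected end of file */
            break;
        }
        remaining -= (size_t)n;
    }
    /* Clearing TCP_CORK flushes any partial frame still queued. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &uncork, sizeof(uncork)) != 0) {
        rc = -1;
    }
    return rc;
}
```

On the affected kernels discussed in this thread, the cork step in such a
sequence would have no effect if TCP_NODELAY had ever been enabled on the
socket.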

If it's possible to use both TCP_CORK and TCP_NODELAY on the same
socket in 2.6.13 or later (assuming that's when the fix for the current
NODELAY toggling problem becomes available), then yes, my lazy
evaluation approach will be less efficient than just setting TCP_NODELAY
on the listener socket--for requests that don't use sendfile.  For requests
that do use sendfile, I think the logic implemented by my patch will be
optimal for both 2.6.1-2.6.12 and 2.6.13+

Brian


Re: PATCH: lazy initialization of TCP_NODELAY (workaround for 2.6 TCP_CORK problem)

Posted by Joe Orton <jo...@redhat.com>.
On Fri, Aug 26, 2005 at 12:42:19AM -0700, Brian Pane wrote:
> The attached patch delays the setting of TCP_NODELAY on client 
> connections until the first time core_output_filter has to do a 
> writev_it_all() or emulate_sendfile(). My motivation for this is to 
> work around the TCP_NODELAY/TCP_CORK problem in Linux 2.6.  However, 
> it also will save a couple of syscalls for any request that is handled 
> with sendfile(2).

This will actually end up being *more* expensive on 2.6 systems in the 
long run, though, right?  I'm not convinced this is a good idea.  With 
APR changed to allow setting TCP_CORK and TCP_NODELAY at the same time 
with 2.6, it is cheaper to just set TCP_NODELAY once on the listening 
socket and never have to touch it again.

> Note that there was an APR bug that caused TCP_NODELAY to be set on 
> the listener socket at startup as a side-effect of TCP_DEFER_ACCEPT 
> being set.

good catch!

joe