You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Brian Pane <bp...@pacbell.net> on 2001/07/03 05:04:44 UTC

Extraneous socket read?

With keepalives on, the system call profile for reading a request
typically looks like this:

accept(5, {sin_family=AF_INET, sin_port=htons(44579), 
sin_addr=inet_addr("127.0.0.1")}}, [16]) = 9 <9.604103>
read(6, 0xbffff873, 1)                  = -1 EAGAIN (Resource 
temporarily unavailable) <0.000007>
getsockname(9, {sin_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0 <0.000006>
fcntl64(9, F_GETFL)                     = 0x2 (flags O_RDWR) <0.000005>
fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK)  = 0 <0.000004>
read(9, 0x812df10, 8192)                = -1 EAGAIN (Resource 
temporarily unavailable) <0.000007>
select(10, [9], NULL, NULL, {300, 0})   = 1 (in [9], left {300, 0}) 
<0.000058>
read(9, "GET /50k.shtml HTTP/1.0\r\nUser-Ag"..., 8192) = 87 <0.000008>

Note the pattern at the end: the first read fails, because the request
hasn't arrived yet.  It takes a bit more time for the first packet to 
arrive.

Given that I'm seeing this phenomenon over the loopback, I'm assuming
that it's common with external clients with larger round-trip times too:
the server progresses from the accept to the read before the client sends
the first packet of the request.  If this assumption is correct (does 
anybody
have data to confirm or refute it?), then it should be possible to eliminate
a system call by not doing the initial read before the select.

--Brian

Re: Limit directive and unimplemented methods

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.

> It doesn't logically follow that just because the server has a <limit>
> directive with that method name, it necessarily implements that method.

There is no following to the logic -- it is a definition.

      In Apache httpd, LIMITable == implemented.

The only time 501 is preferred is if the server as a whole, in its entirety,
has never heard of that method before and isn't willing to pass it to
the resource handler to find out if it is allowed.  501 is normally only
sent by proxy servers that have been configured to block unknown methods,
and by old/special-purpose servers that only support a fixed set of methods.

....Roy

Re: Limit directive and unimplemented methods

Posted by Cody Sherr <cs...@covalent.net>.

On Wed, 18 Jul 2001, Roy T. Fielding wrote:

> > I have a question regarding unsupported methods in <LIMIT> directives.
> >
> > Let's say that you could limit arbitrary methods that different modules
> > have implemented. For this example, a module has defined a new method FOO,
> > but no module handles method FOO2:
> >
> > In the cconfig file:
> >
> >
> > <LIMIT FOO FOO2>
> >  ...
> > </LIMIT>
> >
> > What would the correct behavior be if a user made a request with method
> > FOO2? It seems to me that according to the RFC, it should ignore this
> > Limit directive and return 501 (not implemented) instead of 401 (auth
> > required)  or 405 (method not allowed).
>
> If the server has a limit directive with the method name, then it knows
> the method and has therefore implemented it.  The module determines what
> is allowed per resource, not per server, so either 401 or 405 should be
> returned.
>
> ....Roy
>
>

It doesn't logically follow that just because the server has a <limit>
directive with that method name, it necessarily implements that method.

As stated in the example, the server has definitely not implemented method
FOO2, but it has been entered in a LIMIT config directive. To rephrase the
question, what would the correct behavior be for a server responding to
request methods that were LIMITED but not implemented?

It seems to me that 501 is more appropriate.

regards,

-- 
Cody Sherr

Engineer
Covalent Technologies

phone: (415)536-5292
email: csherr@covalent.net

Re: Limit directive and unimplemented methods

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.

> I have a question regarding unsupported methods in <LIMIT> directives.
> 
> Let's say that you could limit arbitrary methods that different modules
> have implemented. For this example, a module has defined a new method FOO,
> but no module handles method FOO2:
> 
> In the cconfig file:
> 
> 
> <LIMIT FOO FOO2>
>  ...
> </LIMIT>
> 
> What would the correct behavior be if a user made a request with method
> FOO2? It seems to me that according to the RFC, it should ignore this
> Limit directive and return 501 (not implemented) instead of 401 (auth
> required)  or 405 (method not allowed).

If the server has a limit directive with the method name, then it knows
the method and has therefore implemented it.  The module determines what
is allowed per resource, not per server, so either 401 or 405 should be
returned.

....Roy

Limit directive and unimplemented methods

Posted by Cody Sherr <cs...@covalent.net>.

I have a question regarding unsupported methods in <LIMIT> directives.

Let's say that you could limit arbitrary methods that different modules
have implemented. For this example, a module has defined a new method FOO,
but no module handles method FOO2:

In the cconfig file:


<LIMIT FOO FOO2>
 ...
</LIMIT>

What would the correct behavior be if a user made a request with method
FOO2? It seems to me that according to the RFC, it should ignore this
Limit directive and return 501 (not implemented) instead of 401 (auth
required)  or 405 (method not allowed).

The relevant part of RFC 2616:

>>> start quote
5.1.1 Method

The Method token indicates the method to be performed on the resource
identified by the Request-URI. The method is case-sensitive.

       Method         = "OPTIONS"                ; Section 9.2
                      | "GET"                    ; Section 9.3
                      | "HEAD"                   ; Section 9.4
                      | "POST"                   ; Section 9.5
                      | "PUT"                    ; Section 9.6
                      | "DELETE"                 ; Section 9.7
                      | "TRACE"                  ; Section 9.8
                      | "CONNECT"                ; Section 9.9
                      | extension-method
       extension-method = token

The list of methods allowed by a resource can be specified in an Allow
header field (section 14.7). The return code of the response always
notifies the client whether a method is currently allowed on a resource,
since the set of allowed methods can change dynamically. An origin server
SHOULD return the status code 405 (Method Not Allowed) if the method is
known by the origin server but not allowed for the requested resource, and
501 (Not Implemented) if the method is unrecognized or not implemented by
the origin server. The methods GET and HEAD MUST be supported by all
general-purpose servers. All other methods are OPTIONAL; however, if the
above methods are implemented, they MUST be implemented with the same
semantics as those specified in section 9.
<<< end quote


Thanks for your time,

-- 
Cody Sherr

Engineer
Covalent Technologies

phone: (415)536-5292
email: csherr@covalent.net

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Sun, 8 Jul 2001, dean gaudet wrote:
>
>>On Sat, 7 Jul 2001, Bill Stoddard wrote:
>>
>>>Humm... If you use TCP_DEFER_ACCEPT, how to you handle clients
>>>connecting but not sending any bytes?
>>>
>>i'm not sure what happens if they connect and close before the timeout.
>>
>
>the socket is passed to accept() and you read() an EOF immediately... just
>like we'd like to see.
>
>also, it appears the timeout is broken (or i'm using it wrong)... awaiting
>word from kernel gurus.
>
Any good news on TCP_DEFER_ACCEPT?  If timeout handling is broken
in the kernel, I propose that it's better to just put in a
select-before-read patch for now, and later retrofit support for
TCP_DEFER_ACCEPT when timeouts work.  (And for Solaris, the
select-before-read solution is probably the long term solution
anyway.)

--Brian

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>
>On Sun, 8 Jul 2001, Brian Pane wrote:
>
>>So I guess the ideal algorithm for retrieving the initial request
>>from a client is:
>>
>>  OS with working TCP_DEFER_ACCEPT or equivalent: select first, then read
>>  OS without working TCP_DEFER_ACCEPT or equivalent: read first
>>
>
>isn't it the other way around?
>
Yes; I transposed the rules for the two cases.  What I should have said was:
  OS with working TCP_DEFER_ACCEPT or equivalent: read first
  OS without working TCP_DEFER_ACCEPT or equivalent: select first

--Brian

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Sun, 8 Jul 2001, Brian Pane wrote:

> So I guess the ideal algorithm for retrieving the initial request
> from a client is:
>
>   OS with working TCP_DEFER_ACCEPT or equivalent: select first, then read
>   OS without working TCP_DEFER_ACCEPT or equivalent: read first

isn't it the other way around?

if TCP_DEFER_ACCEPT exists, then you don't get an accept() event until the
first data packet has arrived -- so you can (should) assume that a read()
will succeed.

if TCP_DEFER_ACCEPT doesn't exist, then you're probably (as long as modem
users still exist) going to get EAGAIN on the first read().  but i dunno.
maybe someone wants to hack up a production server with some debugging
code to measure if this happens?

maybe you were right to begin with and i should just set
APR_INCOMPLETE_READ by default so we always try read() first :)

> It's probably worth the complexity to support TCP_DEFER_ACCEPT on
> platforms where it's available.  My rationale for this is that if
> some idiot decides to open a thousand connections to your httpd and
> never send anything, it's more lightweight to buffer each one in
> a node in a kernel list while awaiting a timeout than to buffer it
> in a connection_rec in the httpd.

oh yeah it's definitely a win.  i'm just not sure what the APR API should
be given that freebsd and linux's implementations differ so much.  it
should also be a run time config option.

-dean

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Sun, 8 Jul 2001, dean gaudet wrote:
>
>>On Sat, 7 Jul 2001, Bill Stoddard wrote:
>>
>>>Humm... If you use TCP_DEFER_ACCEPT, how to you handle clients
>>>connecting but not sending any bytes?
>>>
>>i'm not sure what happens if they connect and close before the timeout.
>>
>
>the socket is passed to accept() and you read() an EOF immediately... just
>like we'd like to see.
>
>also, it appears the timeout is broken (or i'm using it wrong)... awaiting
>word from kernel gurus.
>
So I guess the ideal algorithm for retrieving the initial request
from a client is:

  OS with working TCP_DEFER_ACCEPT or equivalent: select first, then read
  OS without working TCP_DEFER_ACCEPT or equivalent: read first

It's probably worth the complexity to support TCP_DEFER_ACCEPT on
platforms where it's available.  My rationale for this is that if
some idiot decides to open a thousand connections to your httpd and
never send anything, it's more lightweight to buffer each one in
a node in a kernel list while awaiting a timeout than to buffer it
in a connection_rec in the httpd.

--Brian

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Sun, 8 Jul 2001, dean gaudet wrote:

> On Sat, 7 Jul 2001, Bill Stoddard wrote:
>
> > Humm... If you use TCP_DEFER_ACCEPT, how to you handle clients
> > connecting but not sending any bytes?
>
> i'm not sure what happens if they connect and close before the timeout.

the socket is passed to accept() and you read() an EOF immediately... just
like we'd like to see.

also, it appears the timeout is broken (or i'm using it wrong)... awaiting
word from kernel gurus.

-dean

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Sat, 7 Jul 2001, Bill Stoddard wrote:

>
> > On Sat, 7 Jul 2001, Brian Pane wrote:
> >
> > > If I'm reading the code right, there's one problem with this approach: the
> > > APR_INCOMPLETE_READ flag doesn't get set until after the first
> > > read on a socket, so the first read on a new connection (the one that
> > > usually
> > > returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
> > > is to set APR_INCOMPLETE_READ on any newly created socket, unless
> > > you can think of cases where that would break something else.
> >
> > oh right.
> >
> > plus there's one more thing to take into account:  FreeBSD's
> > SO_ACCEPTFILTER and linux's TCP_DEFER_ACCEPT ... both of which won't
> > present a socket for accept until there's some data to be read.
>
> Humm... If you use TCP_DEFER_ACCEPT, how to you handle clients
> connecting but not sending any bytes?

i'm not sure what happens if they connect and close before the timeout.

there is a timeout though, so if they connect and sit around you'll get
the socket eventually.  i don't think there's any indication that it timed
out -- we'd probably want to set a really low timeout like 2 seconds, then
we can just continue with our usual timeout on top of that and it'll be
close enough.

i'm not sure there's a timeout in the freebsd SO_ACCEPTFILTER ... anyone
with freebsd foo want to comment on these cases?

-dean

Re: Extraneous socket read?

Posted by Bill Stoddard <bi...@wstoddard.com>.

> On Sat, 7 Jul 2001, Brian Pane wrote:
>
> > If I'm reading the code right, there's one problem with this approach: the
> > APR_INCOMPLETE_READ flag doesn't get set until after the first
> > read on a socket, so the first read on a new connection (the one that
> > usually
> > returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
> > is to set APR_INCOMPLETE_READ on any newly created socket, unless
> > you can think of cases where that would break something else.
>
> oh right.
>
> plus there's one more thing to take into account:  FreeBSD's
> SO_ACCEPTFILTER and linux's TCP_DEFER_ACCEPT ... both of which won't
> present a socket for accept until there's some data to be read.

Humm... If you use TCP_DEFER_ACCEPT, how to you handle clients connecting but not sending
any bytes?

Bill

[PATCH] Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

Bill Stoddard wrote:

>Brian,
>Are you still considering working on an APR_INCOMPLETE_READ/APR_OPTIMISTIC_READ socket
>option?  I think something like this will be useful to the lingering close code
>(ap_lingering_close).  During lingering close, we do a non-blocking read first which
>returns EAGAIN then do a select.  In this case, it is clearly a win to just do the select
>(since the common case will be that we have no bytes to read on the inbound side of the
>connection) and avoid the initial read altogether.
>
>I think I prefer the existing operation (read then select) for the default but would like
>a way to tell APR to do a select then a read for special cases like this.
>
>Bill
>
Here's a patch that does the select before the read in three cases:
  * First read on a newly accepted connection
  * First read  during lingering close
  * Any read on socket where the last read returned fewer bytes than
    were requested.
The 3rd case is handled by default in the APR socket routines, while
the first two are controlled by the application (i.e., the httpd
overrides the default read-before-select behavior in these two
specific cases where it's useful to force select-first).  If anybody
later adds support for TCP_DEFER_ACCEPT, we can disable the select-first
on the initial read.

Half of the code in this patch is taken from the patch that Dean
posted; his heuristic for handling incomplete reads is more robust
than my original code.

--Brian

Index: srclib/apr/include/apr_network_io.h
===================================================================
RCS file: /home/cvspublic/apr/include/apr_network_io.h,v
retrieving revision 1.107
diff -u -r1.107 apr_network_io.h
--- srclib/apr/include/apr_network_io.h    2001/07/26 15:39:25    1.107
+++ srclib/apr/include/apr_network_io.h    2001/07/27 23:47:50
@@ -103,6 +103,17 @@
                                    * APR_TCP_NODELAY should be turned on
                                    * again when NOPUSH is turned off
                                    */
+#define APR_INCOMPLETE_READ 4096  /* Set on non-blocking sockets
+                   * (APR_SO_TIMEOUT != 0) on which the
+                   * previous read() did not fill a buffer
+                   * completely.  the next apr_recv() will
+                   * first call select()/poll() rather than
+                   * going straight into read().  (Can also
+                   * be set by an application to force a
+                   * select()/poll() call before the next
+                   * read, in cases where the app expects
+                   * that an immediate read would fail.)
+                   */
 
 #define APR_POLLIN    0x001
 #define APR_POLLPRI   0x002
Index: srclib/apr/network_io/unix/sockopt.c
===================================================================
RCS file: /home/cvspublic/apr/network_io/unix/sockopt.c,v
retrieving revision 1.47
diff -u -r1.47 sockopt.c
--- srclib/apr/network_io/unix/sockopt.c    2001/07/25 18:01:01    1.47
+++ srclib/apr/network_io/unix/sockopt.c    2001/07/27 23:47:50
@@ -203,6 +203,12 @@
                 }
             }
         }
+    /* must disable the incomplete read support if we change to a
+     * blocking socket.
+     */
+    if (on == 0) {
+        sock->netmask &= ~APR_INCOMPLETE_READ;
+    }
         sock->timeout = on;
         apr_set_option(&sock->netmask, APR_SO_TIMEOUT, on);
     }
@@ -265,6 +271,9 @@
 #else
         return APR_ENOTIMPL;
 #endif
+    }
+    if (opt & APR_INCOMPLETE_READ) {
+    apr_set_option(&sock->netmask, APR_INCOMPLETE_READ, on);
     }
 
     return APR_SUCCESS;
Index: srclib/apr/network_io/unix/sendrecv.c
===================================================================
RCS file: /home/cvspublic/apr/network_io/unix/sendrecv.c,v
retrieving revision 1.74
diff -u -r1.74 sendrecv.c
--- srclib/apr/network_io/unix/sendrecv.c    2001/07/26 17:11:04    1.74
+++ srclib/apr/network_io/unix/sendrecv.c    2001/07/27 23:47:50
@@ -125,14 +125,21 @@
 apr_status_t apr_recv(apr_socket_t *sock, char *buf, apr_size_t *len)
 {
     ssize_t rv;
-   
+    apr_status_t arv;
+
+    if (sock->netmask & APR_INCOMPLETE_READ) {
+    sock->netmask &= ~APR_INCOMPLETE_READ;
+        goto do_select;
+    }
+
     do {
         rv = read(sock->socketdes, buf, (*len));
     } while (rv == -1 && errno == EINTR);
 
     if (rv == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) &&
       sock->timeout != 0) {
-        apr_status_t arv = apr_wait_for_io_or_timeout(sock, 1);
+do_select:
+    arv = apr_wait_for_io_or_timeout(sock, 1);
         if (arv != APR_SUCCESS) {
             *len = 0;
             return arv;
@@ -146,6 +153,9 @@
     if (rv == -1) {
         (*len) = 0;
         return errno;
+    }
+    if (sock->timeout && rv < *len) {
+    sock->netmask |= APR_INCOMPLETE_READ;
     }
     (*len) = rv;
     if (rv == 0) {
Index: server/protocol.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/server/protocol.c,v
retrieving revision 1.33
diff -u -r1.33 protocol.c
--- server/protocol.c    2001/07/24 22:55:29    1.33
+++ server/protocol.c    2001/07/27 23:47:51
@@ -588,6 +588,12 @@
                      (int)(keptalive
                      ? r->server->keep_alive_timeout * APR_USEC_PER_SEC
                      : r->server->timeout * APR_USEC_PER_SEC));
+
+    /* Do a select before the first read, rather than trying the
+     * read first (this is based on empirical observations that
+     * calling read first typically results in EAGAIN/EWOULDBLOCK)
+     */
+    apr_setsocketopt(conn->client_socket, APR_INCOMPLETE_READ, 1);
                     
     /* Get the request... */
     if (!read_request_line(r)) {
Index: server/connection.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/server/connection.c,v
retrieving revision 1.83
diff -u -r1.83 connection.c
--- server/connection.c    2001/07/12 03:20:48    1.83
+++ server/connection.c    2001/07/27 23:47:51
@@ -195,6 +195,7 @@
      */
     timeout = SECONDS_TO_LINGER * APR_USEC_PER_SEC;
     apr_setsocketopt(c->client_socket, APR_SO_TIMEOUT, timeout);
+    apr_setsocketopt(c->client_socket, APR_INCOMPLETE_READ, 1);
     for (;;) {
         nbytes = sizeof(dummybuf);
         rc = apr_recv(c->client_socket, dummybuf, &nbytes);

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

Bill Stoddard wrote:

>Brian,
>Are you still considering working on an APR_INCOMPLETE_READ/APR_OPTIMISTIC_READ socket
>option?  I think something like this will be useful to the lingering close code
>(ap_lingering_close).  During lingering close, we do a non-blocking read first which
>returns EAGAIN then do a select.  In this case, it is clearly a win to just do the select
>(since the common case will be that we have no bytes to read on the inbound side of the
>connection) and avoid the initial read altogether.
>
>I think I prefer the existing operation (read then select) for the default but would like
>a way to tell APR to do a select then a read for special cases like this.
>
I think the most general solution is to adopt Dean's APR_INCOMPLETE_READ
logic and let apps set the flag themselves for special cases like the first
read and the lingering close.  I'll try to post a patch this afternoon that
adds the first-read and lingering-close optimizations on top of Dean's code.

--Brian

Re: Extraneous socket read?

Posted by Bill Stoddard <bi...@wstoddard.com>.

Brian,
Are you still considering working on an APR_INCOMPLETE_READ/APR_OPTIMISTIC_READ socket
option?  I think something like this will be useful to the lingering close code
(ap_lingering_close).  During lingering close, we do a non-blocking read first which
returns EAGAIN then do a select.  In this case, it is clearly a win to just do the select
(since the common case will be that we have no bytes to read on the inbound side of the
connection) and avoid the initial read altogether.

I think I prefer the existing operation (read then select) for the default but would like
a way to tell APR to do a select then a read for special cases like this.

Bill

> On Sat, 7 Jul 2001, Brian Pane wrote:
>
> > If I'm reading the code right, there's one problem with this approach: the
> > APR_INCOMPLETE_READ flag doesn't get set until after the first
> > read on a socket, so the first read on a new connection (the one that
> > usually
> > returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
> > is to set APR_INCOMPLETE_READ on any newly created socket, unless
> > you can think of cases where that would break something else.
>
> oh right.
>
> plus there's one more thing to take into account:  FreeBSD's
> SO_ACCEPTFILTER and linux's TCP_DEFER_ACCEPT ... both of which won't
> present a socket for accept until there's some data to be read.
>
> hmm.  we don't have TCP_DEFER_ACCEPT support yet.
>
> -dean
>
>
>
>

Re: Extraneous socket read?

Posted by dean gaudet <de...@arctic.org>.

On Sat, 7 Jul 2001, Brian Pane wrote:

> If I'm reading the code right, there's one problem with this approach: the
> APR_INCOMPLETE_READ flag doesn't get set until after the first
> read on a socket, so the first read on a new connection (the one that
> usually
> returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
> is to set APR_INCOMPLETE_READ on any newly created socket, unless
> you can think of cases where that would break something else.

oh right.

plus there's one more thing to take into account:  FreeBSD's
SO_ACCEPTFILTER and linux's TCP_DEFER_ACCEPT ... both of which won't
present a socket for accept until there's some data to be read.

hmm.  we don't have TCP_DEFER_ACCEPT support yet.

-dean

Re: Extraneous socket read?

Posted by dean gaudet <de...@arctic.org>.

On Sat, 7 Jul 2001, Brian Pane wrote:

> If I'm reading the code right, there's one problem with this approach: the
> APR_INCOMPLETE_READ flag doesn't get set until after the first
> read on a socket, so the first read on a new connection (the one that
> usually
> returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
> is to set APR_INCOMPLETE_READ on any newly created socket, unless
> you can think of cases where that would break something else.

oh right.

plus there's one more thing to take into account:  FreeBSD's
SO_ACCEPTFILTER and linux's TCP_DEFER_ACCEPT ... both of which won't
present a socket for accept until there's some data to be read.

hmm.  we don't have TCP_DEFER_ACCEPT support yet.

-dean

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Mon, 2 Jul 2001, Brian Pane wrote:
>
>>>On Mon, 2 Jul 2001, Brian Pane wrote:
>>>
>>>>then it should be possible to eliminate a system call by not doing the
>>>>initial read before the select.
>>>>
>>Here's a patch that disables the read before the select, but only for
>>the HTTP client socket.
>>
>
>hey brian, i think i prefer the approach below instead.  but i haven't
>been able to test this patch because my top-of-tree httpd-2.0 won't build
>with top-of-tree apr/apr-util.
>
>the approach i'm using doesn't require any changes to any applications --
>it just implements the heuristic i mentioned earlier.  whenever a read()
>returns an incomplete buffer assume we want to select() first before
>read()ing again (and all of this only if the socket is non-blocking).
>

If I'm reading the code right, there's one problem with this approach: the
APR_INCOMPLETE_READ flag doesn't get set until after the first
read on a socket, so the first read on a new connection (the one that 
usually
returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
is to set APR_INCOMPLETE_READ on any newly created socket, unless
you can think of cases where that would break something else.

--Brian

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Mon, 2 Jul 2001, Brian Pane wrote:
>
>>>On Mon, 2 Jul 2001, Brian Pane wrote:
>>>
>>>>then it should be possible to eliminate a system call by not doing the
>>>>initial read before the select.
>>>>
>>Here's a patch that disables the read before the select, but only for
>>the HTTP client socket.
>>
>
>hey brian, i think i prefer the approach below instead.  but i haven't
>been able to test this patch because my top-of-tree httpd-2.0 won't build
>with top-of-tree apr/apr-util.
>
>the approach i'm using doesn't require any changes to any applications --
>it just implements the heuristic i mentioned earlier.  whenever a read()
>returns an incomplete buffer assume we want to select() first before
>read()ing again (and all of this only if the socket is non-blocking).
>

If I'm reading the code right, there's one problem with this approach: the
APR_INCOMPLETE_READ flag doesn't get set until after the first
read on a socket, so the first read on a new connection (the one that 
usually
returns EAGAIN) doesn't get skipped like it should.  I _think_ the solution
is to set APR_INCOMPLETE_READ on any newly created socket, unless
you can think of cases where that would break something else.

--Brian

Re: Extraneous socket read?

Posted by dean gaudet <de...@arctic.org>.

On Mon, 2 Jul 2001, Brian Pane wrote:

> >On Mon, 2 Jul 2001, Brian Pane wrote:
> >
> >>then it should be possible to eliminate a system call by not doing the
> >>initial read before the select.
> >>
> Here's a patch that disables the read before the select, but only for
> the HTTP client socket.

hey brian, i think i prefer the approach below instead.  but i haven't
been able to test this patch because my top-of-tree httpd-2.0 won't build
with top-of-tree apr/apr-util.

the approach i'm using doesn't require any changes to any applications --
it just implements the heuristic i mentioned earlier.  whenever a read()
returns an incomplete buffer assume we want to select() first before
read()ing again (and all of this only if the socket is non-blocking).

-dean

Index: srclib/apr/include/apr_network_io.h
===================================================================
RCS file: /home/cvs/apr/include/apr_network_io.h,v
retrieving revision 1.102
diff -u -r1.102 apr_network_io.h
--- srclib/apr/include/apr_network_io.h	2001/05/02 02:32:44	1.102
+++ srclib/apr/include/apr_network_io.h	2001/07/07 18:18:14
@@ -102,6 +102,12 @@
                                    * APR_TCP_NODELAY should be turned on
                                    * again when NOPUSH is turned off
                                    */
+/* an internal flag.  this is set on non-blocking sockets
+ * (APR_SO_TIMEOUT != 0) on which the previous read() did not fill a
+ * buffer completely.  the next apr_recv() will first call select()/poll()
+ * rather than going straight into read().
+ */
+#define APR_INCOMPLETE_READ 4096

 #define APR_POLLIN    0x001
 #define APR_POLLPRI   0x002
Index: srclib/apr/network_io/unix/sendrecv.c
===================================================================
RCS file: /home/cvs/apr/network_io/unix/sendrecv.c,v
retrieving revision 1.65
diff -u -r1.65 sendrecv.c
--- srclib/apr/network_io/unix/sendrecv.c	2001/05/03 22:38:00	1.65
+++ srclib/apr/network_io/unix/sendrecv.c	2001/07/07 18:18:16
@@ -125,14 +125,21 @@
 apr_status_t apr_recv(apr_socket_t *sock, char *buf, apr_size_t *len)
 {
     ssize_t rv;
-
+    apr_status_t arv;
+
+    if (sock->netmask & APR_INCOMPLETE_READ) {
+	sock->netmask &= ~APR_INCOMPLETE_READ;
+    	goto do_select;
+    }
+
     do {
-        rv = read(sock->socketdes, buf, (*len));
+	    rv = read(sock->socketdes, buf, (*len));
     } while (rv == -1 && errno == EINTR);

     if (rv == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) &&
       sock->timeout != 0) {
-        apr_status_t arv = wait_for_io_or_timeout(sock, 1);
+do_select:
+        arv = wait_for_io_or_timeout(sock, 1);
         if (arv != APR_SUCCESS) {
             *len = 0;
             return arv;
@@ -146,6 +153,9 @@
     if (rv == -1) {
         (*len) = 0;
         return errno;
+    }
+    if (sock->timeout && rv < *len) {
+	sock->netmask |= APR_INCOMPLETE_READ;
     }
     (*len) = rv;
     if (rv == 0) {
Index: srclib/apr/network_io/unix/sockopt.c
===================================================================
RCS file: /home/cvs/apr/network_io/unix/sockopt.c,v
retrieving revision 1.45
diff -u -r1.45 sockopt.c
--- srclib/apr/network_io/unix/sockopt.c	2001/04/05 18:56:08	1.45
+++ srclib/apr/network_io/unix/sockopt.c	2001/07/07 18:18:17
@@ -216,6 +216,12 @@
                 }
             }
         }
+	/* must disable the incomplete read support if we change to a
+	 * blocking socket.
+	 */
+	if (on == 0) {
+	    sock->netmask &= ~APR_INCOMPLETE_READ;
+	}
         sock->timeout = on;
         apr_set_option(&sock->netmask, APR_SO_TIMEOUT, on);
     }

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Mon, 2 Jul 2001, Brian Pane wrote:
>
>>Here's a patch that disables the read before the select, but only for
>>the HTTP client socket.  Can somebody with a decent benchmark lab
>>test it out to see if the throughput change is significant?
>>
>
>you can make some reasonable performance measurements with a pair of
>machines and a crossover 100baseT segment.  assuming you're able to max
>out the 100baseT (shouldn't be hard) then you want to use "idle CPU" as
>your metric.  vmstat can help you there.
>
 From a CPU consumption perspective, I'm seeing about a 3% reduction in
system CPU utilization and a 0.5% increase in throughput (when sending a
10KB file to ab over the loopback).  No reduction in user CPU (which makes
sense in this case).  These numbers are in the range that I'd have expected,
given the elimination of the extra read call.  However, they're also small
enough to be within the normal variation of measurement error.  Thus,
while my initial results indicate that the patch will yield a speedup in the
real world, a 3rd party benchmark wouldn't hurt.

--Brian

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Mon, 2 Jul 2001, Brian Pane wrote:

> Here's a patch that disables the read before the select, but only for
> the HTTP client socket.  Can somebody with a decent benchmark lab
> test it out to see if the throughput change is significant?

you can make some reasonable performance measurements with a pair of
machines and a crossover 100baseT segment.  assuming you're able to max
out the 100baseT (shouldn't be hard) then you want to use "idle CPU" as
your metric.  vmstat can help you there.

alternately you can use times() to report how much u+s time is consumed by
the processes...

also, copper gigabit cards are getting down to reasonable prices --
there's some $180 cards from 3com now.  i think there are even cheaper
ones, maybe d-link?  the 3com cheap ones are good hardware (with hw
checksum and scatter/gather), some of the others i doubt could ever run at
full gigabit speed, nowhere near enough buffering and no interrupt
coalescing.

btw, copper gigabit needs true 4-pair cat5 cables, and uses all 4 pairs in
a bidirectional manner... and you don't need to use a crossover at all if
you want to connect a pair of NICs back-to-back.  it's really quite nice.
well except for the 15W heat-sinks on the big DSPs they need to do all the
signal processing :)

-dean

Re: Extraneous socket read?

Posted by dean gaudet <de...@arctic.org>.

On Mon, 2 Jul 2001, Brian Pane wrote:

> >On Mon, 2 Jul 2001, Brian Pane wrote:
> >
> >>then it should be possible to eliminate a system call by not doing the
> >>initial read before the select.
> >>
> Here's a patch that disables the read before the select, but only for
> the HTTP client socket.

hey brian, i think i prefer the approach below instead.  but i haven't
been able to test this patch because my top-of-tree httpd-2.0 won't build
with top-of-tree apr/apr-util.

the approach i'm using doesn't require any changes to any applications --
it just implements the heuristic i mentioned earlier.  whenever a read()
returns an incomplete buffer assume we want to select() first before
read()ing again (and all of this only if the socket is non-blocking).

-dean

Index: srclib/apr/include/apr_network_io.h
===================================================================
RCS file: /home/cvs/apr/include/apr_network_io.h,v
retrieving revision 1.102
diff -u -r1.102 apr_network_io.h
--- srclib/apr/include/apr_network_io.h	2001/05/02 02:32:44	1.102
+++ srclib/apr/include/apr_network_io.h	2001/07/07 18:18:14
@@ -102,6 +102,12 @@
                                    * APR_TCP_NODELAY should be turned on
                                    * again when NOPUSH is turned off
                                    */
+/* an internal flag.  this is set on non-blocking sockets
+ * (APR_SO_TIMEOUT != 0) on which the previous read() did not fill a
+ * buffer completely.  the next apr_recv() will first call select()/poll()
+ * rather than going straight into read().
+ */
+#define APR_INCOMPLETE_READ 4096

 #define APR_POLLIN    0x001
 #define APR_POLLPRI   0x002
Index: srclib/apr/network_io/unix/sendrecv.c
===================================================================
RCS file: /home/cvs/apr/network_io/unix/sendrecv.c,v
retrieving revision 1.65
diff -u -r1.65 sendrecv.c
--- srclib/apr/network_io/unix/sendrecv.c	2001/05/03 22:38:00	1.65
+++ srclib/apr/network_io/unix/sendrecv.c	2001/07/07 18:18:16
@@ -125,14 +125,21 @@
 apr_status_t apr_recv(apr_socket_t *sock, char *buf, apr_size_t *len)
 {
     ssize_t rv;
-
+    apr_status_t arv;
+
+    if (sock->netmask & APR_INCOMPLETE_READ) {
+	sock->netmask &= ~APR_INCOMPLETE_READ;
+    	goto do_select;
+    }
+
     do {
-        rv = read(sock->socketdes, buf, (*len));
+	    rv = read(sock->socketdes, buf, (*len));
     } while (rv == -1 && errno == EINTR);

     if (rv == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) &&
       sock->timeout != 0) {
-        apr_status_t arv = wait_for_io_or_timeout(sock, 1);
+do_select:
+        arv = wait_for_io_or_timeout(sock, 1);
         if (arv != APR_SUCCESS) {
             *len = 0;
             return arv;
@@ -146,6 +153,9 @@
     if (rv == -1) {
         (*len) = 0;
         return errno;
+    }
+    if (sock->timeout && rv < *len) {
+	sock->netmask |= APR_INCOMPLETE_READ;
     }
     (*len) = rv;
     if (rv == 0) {
Index: srclib/apr/network_io/unix/sockopt.c
===================================================================
RCS file: /home/cvs/apr/network_io/unix/sockopt.c,v
retrieving revision 1.45
diff -u -r1.45 sockopt.c
--- srclib/apr/network_io/unix/sockopt.c	2001/04/05 18:56:08	1.45
+++ srclib/apr/network_io/unix/sockopt.c	2001/07/07 18:18:17
@@ -216,6 +216,12 @@
                 }
             }
         }
+	/* must disable the incomplete read support if we change to a
+	 * blocking socket.
+	 */
+	if (on == 0) {
+	    sock->netmask &= ~APR_INCOMPLETE_READ;
+	}
         sock->timeout = on;
         apr_set_option(&sock->netmask, APR_SO_TIMEOUT, on);
     }

Re: Extraneous socket read?

Posted by Brian Pane <bp...@pacbell.net>.

dean gaudet wrote:

>On Mon, 2 Jul 2001, Brian Pane wrote:
>
>>then it should be possible to eliminate a system call by not doing the
>>initial read before the select.
>>
>
>that sounds correct and a desirable change.  i think the current behaviour
>goes all the way back to the first iol stuff i did, and was possibly just
>a thinko?
>
>oh perhaps it was because i didn't want to pay for a stack allocation on
>every entry to the low level send/recv routines?  (which is why there's a
>separate wait_for_io_or_timeout routine)
>
>that change is worth benchmarking.
>
>-dean
>
Here's a patch that disables the read before the select, but only for
the HTTP client socket.  Can somebody with a decent benchmark lab
test it out to see if the throughput change is significant?
Thanks,
--Brian

Index: server/protocol.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/server/protocol.c,v
retrieving revision 1.28
diff -u -r1.28 protocol.c
--- server/protocol.c    2001/06/27 20:18:09    1.28
+++ server/protocol.c    2001/07/03 06:15:36
@@ -588,6 +588,8 @@
                      (int)(keptalive
                      ? r->server->keep_alive_timeout * APR_USEC_PER_SEC
                      : r->server->timeout * APR_USEC_PER_SEC));
+
+    apr_setsocketopt(conn->client_socket, APR_OPTIMISTIC_READ, 0);
                     
     /* Get the request... */
     if (!read_request_line(r)) {
Index: srclib/apr/include/apr_network_io.h
===================================================================
RCS file: /home/cvspublic/apr/include/apr_network_io.h,v
retrieving revision 1.102
diff -u -r1.102 apr_network_io.h
--- srclib/apr/include/apr_network_io.h    2001/05/02 02:32:44    1.102
+++ srclib/apr/include/apr_network_io.h    2001/07/03 06:15:40
@@ -102,6 +102,7 @@
                                    * APR_TCP_NODELAY should be turned on
                                    * again when NOPUSH is turned off
                                    */
+#define APR_OPTIMISTIC_READ  4096
 
 #define APR_POLLIN    0x001
 #define APR_POLLPRI   0x002
Index: srclib/apr/include/arch/unix/locks.h
===================================================================
RCS file: /home/cvspublic/apr/include/arch/unix/locks.h,v
retrieving revision 1.35
diff -u -r1.35 locks.h
--- srclib/apr/include/arch/unix/locks.h    2001/07/01 05:49:44    1.35
+++ srclib/apr/include/arch/unix/locks.h    2001/07/03 06:15:51
@@ -164,6 +164,7 @@
 #if APR_USE_PTHREAD_SERIALIZE
     pthread_mutex_t *intraproc;
 #endif
+
 #ifdef HAVE_PTHREAD_RWLOCK_INIT
     pthread_rwlock_t rwlock;
 #endif
Index: srclib/apr/network_io/unix/sendrecv.c
===================================================================
RCS file: /home/cvspublic/apr/network_io/unix/sendrecv.c,v
retrieving revision 1.65
diff -u -r1.65 sendrecv.c
--- srclib/apr/network_io/unix/sendrecv.c    2001/05/03 22:38:00    1.65
+++ srclib/apr/network_io/unix/sendrecv.c    2001/07/03 06:15:57
@@ -125,13 +125,15 @@
 apr_status_t apr_recv(apr_socket_t *sock, char *buf, apr_size_t *len)
 {
     ssize_t rv;
-   
-    do {
-        rv = read(sock->socketdes, buf, (*len));
-    } while (rv == -1 && errno == EINTR);
 
-    if (rv == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) &&
-      sock->timeout != 0) {
+    if (sock->netmask & APR_OPTIMISTIC_READ) {
+    do {
+        rv = read(sock->socketdes, buf, (*len));
+    } while (rv == -1 && errno == EINTR);
+    }
+    if (!(sock->netmask & APR_OPTIMISTIC_READ) ||
+    (rv == -1 && (errno == EAGAIN || errno == EWOULDBLOCK) &&
+     sock->timeout != 0)) {
         apr_status_t arv = wait_for_io_or_timeout(sock, 1);
         if (arv != APR_SUCCESS) {
             *len = 0;
Index: srclib/apr/network_io/unix/sockets.c
===================================================================
RCS file: /home/cvspublic/apr/network_io/unix/sockets.c,v
retrieving revision 1.74
diff -u -r1.74 sockets.c
--- srclib/apr/network_io/unix/sockets.c    2001/05/02 02:32:47    1.74
+++ srclib/apr/network_io/unix/sockets.c    2001/07/03 06:15:57
@@ -111,6 +111,7 @@
      */
     sock->netmask |= APR_TCP_NODELAY;
 #endif
+    sock->netmask |= APR_OPTIMISTIC_READ;
 }                                                                                                  

 static void alloc_socket(apr_socket_t **new, apr_pool_t *p)
 {
Index: srclib/apr/network_io/unix/sockopt.c
===================================================================
RCS file: /home/cvspublic/apr/network_io/unix/sockopt.c,v
retrieving revision 1.45
diff -u -r1.45 sockopt.c
--- srclib/apr/network_io/unix/sockopt.c    2001/04/05 18:56:08    1.45
+++ srclib/apr/network_io/unix/sockopt.c    2001/07/03 06:15:57
@@ -279,6 +279,9 @@
         return APR_ENOTIMPL;
 #endif
     }
+    if (opt & APR_OPTIMISTIC_READ) {
+    apr_set_option(&sock->netmask, APR_OPTIMISTIC_READ, on);
+    }
 
     return APR_SUCCESS;
 }

Re: Extraneous socket read?

Posted by rb...@covalent.net.

On Mon, 2 Jul 2001, dean gaudet wrote:

> On Mon, 2 Jul 2001, Brian Pane wrote:
>
> > then it should be possible to eliminate a system call by not doing the
> > initial read before the select.
>
> that sounds correct and a desirable change.  i think the current behaviour
> goes all the way back to the first iol stuff i did, and was possibly just
> a thinko?
>
> oh perhaps it was because i didn't want to pay for a stack allocation on
> every entry to the low level send/recv routines?  (which is why there's a
> separate wait_for_io_or_timeout routine)
>
> that change is worth benchmarking.

Looking at the code, that read() is when we first enter
wait_for_io_or_timeout.  There was some discussion about this a while ago,
and nobody could really decide the best way to handle this.  There are two
options:

1)  We assume best case, that data is there immediately.  In this case, we
are best to read before selecting, because we will end up having one
system call.  However, if we are wrong, we end up with three.  One for the
first read, one for the select, and one for the read that succeeds
(assumes success).

2)  Assume worst case, that data is not going to be there.  In this case,
we do the select before the read always.  This means that we will always
have two system calls, regardless of what happens.

Currently, we assume #1.  Nobody ever did any real benchmarking on this
though, so we really don't know if this is a good thing or a bad thing.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

well having one of those two userland copies is OK i guess (there are many
references to it within apache)... it's just unfortunate that the
apr_socket_t is so heavy-weight -- the request_rec/conn_rec is already
heavy-weight for so many other reasons that it's no worry being there.

-dean

On Mon, 2 Jul 2001 rbb@covalent.net wrote:

> On Mon, 2 Jul 2001, dean gaudet wrote:
>
> > On Mon, 2 Jul 2001, dean gaudet wrote:
> >
> > > (aie! is get{sock,peer}name really so expensive / used so much that
> > > you need to keep userland copies??  that means you're paying for it
> > > twice!  once in the kernel, once in userland!)
> >
> > haha, *three* times!  the third is in conn_rec :)
>
> damn, that one in the conn_rec should have gone away a LONG time ago.  I
> won't comment on the other userland copy.
>
> Ryan
>
> _____________________________________________________________________________
> Ryan Bloom                        	rbb@apache.org
> Covalent Technologies			rbb@covalent.net
> -----------------------------------------------------------------------------
>
>

Re: Extraneous socket read?

Posted by rb...@covalent.net.

On Mon, 2 Jul 2001, dean gaudet wrote:

> On Mon, 2 Jul 2001, dean gaudet wrote:
>
> > (aie! is get{sock,peer}name really so expensive / used so much that
> > you need to keep userland copies??  that means you're paying for it
> > twice!  once in the kernel, once in userland!)
>
> haha, *three* times!  the third is in conn_rec :)

damn, that one in the conn_rec should have gone away a LONG time ago.  I
won't comment on the other userland copy.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Mon, 2 Jul 2001, dean gaudet wrote:

> (aie! is get{sock,peer}name really so expensive / used so much that
> you need to keep userland copies??  that means you're paying for it
> twice!  once in the kernel, once in userland!)

haha, *three* times!  the third is in conn_rec :)

-dean

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Mon, 2 Jul 2001, dean gaudet wrote:

> On Mon, 2 Jul 2001, Brian Pane wrote:
>
> > then it should be possible to eliminate a system call by not doing the
> > initial read before the select.
>
> that sounds correct and a desirable change.  i think the current behaviour
> goes all the way back to the first iol stuff i did, and was possibly just
> a thinko?
>
> oh perhaps it was because i didn't want to pay for a stack allocation on
> every entry to the low level send/recv routines?  (which is why there's a
> separate wait_for_io_or_timeout routine)
>
> that change is worth benchmarking.

aha, i remember a bit now.

certainly for an HTTP server what you describe is the best in almost
all cases (except receiving a big POST or PUT).  but for an HTTP proxy
receiving data, your select-first change would likely add almost a select
per read.  similarly for other non-HTTP protocols.

i bet the heuristic is more along the lines of:

	if (this is the first read, or the previous read filled
		the entire buffer) {
		try read() first
	}
	else {
		try select() first
	}

but it's such a shame to add a flag to the socket structure and more
conditionals :)

er i guess it doesn't matter, apr_socket_t is already bloated :)  (aie!
is get{sock,peer}name really so expensive / used so much that you need to
keep userland copies??  that means you're paying for it twice!  once in
the kernel, once in userland!)

-dean

Re: Extraneous socket read?

Posted by dean gaudet <dg...@arctic.org>.

On Mon, 2 Jul 2001, Brian Pane wrote:

> then it should be possible to eliminate a system call by not doing the
> initial read before the select.

that sounds correct and a desirable change.  i think the current behaviour
goes all the way back to the first iol stuff i did, and was possibly just
a thinko?

oh perhaps it was because i didn't want to pay for a stack allocation on
every entry to the low level send/recv routines?  (which is why there's a
separate wait_for_io_or_timeout routine)

that change is worth benchmarking.

-dean