Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1997/07/25 11:53:11 UTC

[PATCH] writev() combining for large bwrites

This is the patch I wanted to get in to anticipate mmap() development
later.  If a BUFF already has some partially buffered content (i.e. 
headers) and a large bwrite is performed (larger than the BUFF's buffer)
then no memory copy is done, and both buffers are combined and written
using writev().
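
To illustrate the idea outside of buff.c, here is a minimal standalone
sketch, assuming plain POSIX writev() on a blocking descriptor.  The helper
names write_vec_fully and send_combined are made up for this example and
are not part of the patch; the B_EOUT timeout check in the real code is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write every byte described by vec[0..nvec-1] to fd, retrying after
 * EINTR and after partial writes.  Modifies vec.
 * Returns 0 on success, -1 on error.  (Hypothetical helper.) */
static int write_vec_fully(int fd, struct iovec *vec, int nvec)
{
    int i = 0;

    assert(nvec >= 0);
    while (i < nvec) {
        ssize_t rv;

        do {
            rv = writev(fd, &vec[i], nvec - i);
        } while (rv == -1 && errno == EINTR);
        if (rv == -1)
            return -1;

        /* Skip the elements that were written completely ... */
        while (i < nvec && (size_t)rv >= vec[i].iov_len) {
            rv -= (ssize_t)vec[i].iov_len;
            ++i;
        }
        /* ... then trim the one that was only partially written. */
        if (i < nvec && rv > 0) {
            vec[i].iov_base = (char *)vec[i].iov_base + rv;
            vec[i].iov_len -= (size_t)rv;
        }
    }
    return 0;
}

/* Send already-buffered headers plus a large body with one writev(),
 * without copying the body into the header buffer.
 * (Hypothetical helper.) */
static int send_combined(int fd, const char *hdr, size_t hdrlen,
                         const void *body, size_t bodylen)
{
    struct iovec vec[2];

    vec[0].iov_base = (void *)hdr;
    vec[0].iov_len = hdrlen;
    vec[1].iov_base = (void *)body;
    vec[1].iov_len = bodylen;
    return write_vec_fully(fd, vec, 2);
}
```

In the common case the kernel accepts both pieces in a single call, so the
headers and the body cost one system call instead of two, and the body is
never copied into the buffer.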

I think I found a bug in the chunking code.  If we're chunking and a small
write fills the buffer, an end_chunk is performed and then write_it_all is
called.  If that fails, it's still possible for bwrite() to not set the
error flag on the BUFF.  I'm not terribly happy with the fix I included in
this (it's the extra start_chunk call), so I'll likely do another rev of
this.
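
For readers less familiar with chunking: every chunk on the wire is a hex
length line, the data, and a trailing CRLF, so any write path that skips
the header or footer corrupts the stream.  A minimal sketch of that
framing follows, assuming standard snprintf() rather than ap_snprintf();
frame_chunk is a made-up name and not a function in buff.c.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/uio.h>

/* Describe one chunk as three iovec elements: "<hex length>\r\n", the
 * data itself, and the closing "\r\n".  sizebuf holds the formatted
 * length line and must outlive vec.  Returns the total number of bytes
 * the three elements cover.  (Hypothetical helper.) */
static size_t frame_chunk(struct iovec vec[3], char *sizebuf,
                          size_t sizebuflen, const void *data, size_t nbyte)
{
    int n;

    /* A zero-length chunk would mark the end of the body, so refuse it. */
    assert(nbyte > 0);
    n = snprintf(sizebuf, sizebuflen, "%lx\r\n", (unsigned long)nbyte);
    assert(n > 0 && (size_t)n < sizebuflen);

    vec[0].iov_base = sizebuf;
    vec[0].iov_len = (size_t)n;
    vec[1].iov_base = (void *)data;
    vec[1].iov_len = nbyte;
    vec[2].iov_base = (void *)"\r\n";
    vec[2].iov_len = 2;
    return vec[0].iov_len + nbyte + 2;
}
```

For a 26-byte payload the length line is "1a" followed by CRLF, so the
three elements cover 4 + 26 + 2 = 32 bytes.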

At any rate, on your typical 6k file, this patch reduces the number of
system calls by 1.  Apache is still a long way from the "optimal" number,
which is probably in the 20-30 range ;)

Dean

Index: CHANGES
===================================================================
RCS file: /export/home/cvs/apache/src/CHANGES,v
retrieving revision 1.364
diff -u -r1.364 CHANGES
--- CHANGES	1997/07/24 04:38:07	1.364
+++ CHANGES	1997/07/25 09:45:53
@@ -1,5 +1,15 @@
 Changes with Apache 1.3a2
-  
+
+  *) When a large bwrite() occurs (larger than the internal buffer size),
+     while there is already something in the buffer, apache will combine
+     the large write and the buffer into a single writev().  (This is
+     in anticipation of using mmap() for reading files.)
+     [Dean Gaudet]
+
+  *) In obscure cases where a partial socket write occurred while chunking,
+     Apache would omit the chunk header/footer on the next block.
+     [Dean Gaudet]
+
   *) PORT: Various tweaks to eliminate pointer-int casting warnings on 64-bit
      cpus like the alpha.  Apache still stores ints in pointers, but that's
      the relatively safe direction.  [Dean Gaudet] PR#344
Index: buff.c
===================================================================
RCS file: /export/home/cvs/apache/src/buff.c,v
retrieving revision 1.38
diff -u -r1.38 buff.c
--- buff.c	1997/07/24 04:23:57	1.38
+++ buff.c	1997/07/25 09:45:55
@@ -840,6 +840,45 @@
 }
 
 
+#ifndef NO_WRITEV
+/* similar to previous, but uses writev.  Note that it modifies vec.
+ * return 0 if successful, -1 otherwise.
+ */
+static int writev_it_all (BUFF *fb, struct iovec *vec, int nvec)
+{
+    int i, rv;
+
+    /* while it's nice and easy to build the vector and crud, it's painful
+     * to deal with a partial writev()
+     */
+    for( i = 0; i < nvec; ) {
+	do rv = writev( fb->fd, &vec[i], nvec - i );
+	while (rv == -1 && errno == EINTR && !(fb->flags & B_EOUT));
+	if (rv == -1)
+	    return -1;
+	/* recalculate vec to deal with partial writes */
+	while (rv > 0) {
+	    if (rv < vec[i].iov_len) {
+		vec[i].iov_base = (char *)vec[i].iov_base + rv;
+		vec[i].iov_len -= rv;
+		rv = 0;
+		if (vec[i].iov_len == 0) {
+		    ++i;
+		}
+	    } else {
+		rv -= vec[i].iov_len;
+		++i;
+	    }
+	}
+	if (fb->flags & B_EOUT)
+	    return -1;
+    }
+    /* if we got here, we wrote it all */
+    return 0;
+}
+#endif
+
+
 /*
  * A hook to write() that deals with chunking. This is really a protocol-
  * level issue, but we deal with it here because it's simpler; this is
@@ -852,7 +891,6 @@
     char chunksize[16];	/* Big enough for practically anything */
 #ifndef NO_WRITEV
     struct iovec vec[3];
-    int i, rv;
 #endif
 
     if (fb->flags & (B_WRERR|B_EOUT))
@@ -874,9 +912,6 @@
 	return -1;
     return nbyte;
 #else
-
-#define NVEC	(sizeof(vec)/sizeof(vec[0]))
-
     vec[0].iov_base = chunksize;
     vec[0].iov_len = ap_snprintf(chunksize, sizeof(chunksize), "%x\015\012",
 	nbyte);
@@ -884,38 +919,51 @@
     vec[1].iov_len = nbyte;
     vec[2].iov_base = "\r\n";
     vec[2].iov_len = 2;
-    /* while it's nice an easy to build the vector and crud, it's painful
-     * to deal with a partial writev()
-     */
-    for( i = 0; i < NVEC; ) {
-	do rv = writev( fb->fd, &vec[i], NVEC - i );
-	while (rv == -1 && errno == EINTR && !(fb->flags & B_EOUT));
-	if (rv == -1)
-	    return -1;
-	/* recalculate vec to deal with partial writes */
-	while (rv > 0) {
-	    if( rv <= vec[i].iov_len ) {
-		vec[i].iov_base = (char *)vec[i].iov_base + rv;
-		vec[i].iov_len -= rv;
-		rv = 0;
-		if( vec[i].iov_len == 0 ) {
-		    ++i;
-		}
-	    } else {
-		rv -= vec[i].iov_len;
-		++i;
-	    }
-	}
-	if (fb->flags & B_EOUT)
-	    return -1;
-    }
-    /* if we got here, we wrote it all */
-    return nbyte;
-#undef NVEC
+
+    return writev_it_all (fb, vec, (sizeof(vec)/sizeof(vec[0]))) ? -1 : nbyte;
 #endif
 }
 
 
+#ifndef NO_WRITEV
+/*
+ * Used to combine the contents of the fb buffer, and a large buffer
+ * passed in.
+ */
+static int large_write (BUFF *fb, const void *buf, int nbyte)
+{
+    struct iovec vec[4];
+    int nvec;
+    char chunksize[16];
+
+    nvec = 0;
+    /* it's easiest to end the current chunk */
+    if (fb->flags & B_CHUNK) {
+	end_chunk(fb);
+    }
+    vec[0].iov_base = fb->outbase;
+    vec[0].iov_len = fb->outcnt;
+    if (fb->flags & B_CHUNK) {
+	vec[1].iov_base = chunksize;
+	vec[1].iov_len = ap_snprintf (chunksize, sizeof(chunksize),
+	    "%x\015\012", nbyte);
+	vec[2].iov_base = (void *)buf;
+	vec[2].iov_len = nbyte;
+	vec[3].iov_base = "\r\n";
+	vec[3].iov_len = 2;
+	nvec = 4;
+    } else {
+	vec[1].iov_base = (void *)buf;
+	vec[1].iov_len = nbyte;
+	nvec = 2;
+    }
+
+    fb->outcnt = 0;
+    return writev_it_all (fb, vec, nvec) ? -1 : nbyte;
+}
+#endif
+
+
 /*
  * Write nbyte bytes.
  * Only returns fewer than nbyte if an error ocurred.
@@ -951,6 +999,19 @@
 	else
 	    return i;
     }
+
+#ifndef NO_WRITEV
+/*
+ * Detect case where we're asked to write a large buffer, and combine our
+ * current buffer with it in a single writev()
+ */
+    if (fb->outcnt > 0 && nbyte >= fb->bufsiz) {
+	return large_write (fb, buf, nbyte);
+    }
+#endif
+
+    /* in case a chunk hasn't been started yet */
+    if( fb->flags & B_CHUNK ) start_chunk( fb );
 
 /*
  * Whilst there is data in the buffer, keep on adding to it and writing it

Re: [PATCH] writev() combining for large bwrites

Posted by Dean Gaudet <dg...@arctic.org>.

You use autoconf in mod_php to determine mmap() correctness, right?

At any rate I wasn't thinking about mod_include and mmap() in this case,
just the default handler.  It would be nice if we could set a #define to
know when it's safe to use ... and I suppose a few mmap routines in
alloc.c to do resource protection are in order.
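
As a rough sketch of what such a #define could guard, assuming POSIX
mmap(): USE_MMAP_FILES, write_fully and send_file are hypothetical names
made up for this example, not existing Apache symbols, and the
alloc.c-style resource protection mentioned above is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical switch, not an Apache define: a real build would set it
 * per platform and leave it unset where mmap() is unreliable. */
#define USE_MMAP_FILES 1

/* Write len bytes from buf to fd, retrying on EINTR and short writes. */
static int write_fully(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t rv = write(fd, buf, len);

        if (rv == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += rv;
        len -= (size_t)rv;
    }
    return 0;
}

/* Send the regular file at path to out_fd.  Returns 0 or -1. */
static int send_file(int out_fd, const char *path)
{
    struct stat st;
    int rc = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == 0) {
#ifdef USE_MMAP_FILES
        if (st.st_size == 0) {
            rc = 0;             /* mmap() rejects zero-length mappings */
        } else {
            void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                           MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                rc = write_fully(out_fd, p, (size_t)st.st_size);
                munmap(p, (size_t)st.st_size);
            }
        }
#else
        char buf[4096];
        ssize_t n;

        /* Fallback for platforms without a trustworthy mmap(). */
        rc = 0;
        while (rc == 0 && (n = read(fd, buf, sizeof(buf))) != 0) {
            if (n == -1) {
                if (errno == EINTR)
                    continue;
                rc = -1;
            } else {
                rc = write_fully(out_fd, buf, (size_t)n);
            }
        }
#endif
    }
    close(fd);
    return rc;
}
```

With the mapping in hand, the file contents can be passed to a large write
directly, which is the case the writev() combining is meant to prepare for.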

I've been wondering about the usefulness of the core opening/mmapping the
file early on... but the only case I know of so far where a file is opened
twice is when mod_mime_magic is in use.  Are there others? 

Dean

On Fri, 25 Jul 1997, Rasmus Lerdorf wrote:

> > This is the patch I wanted to get in to anticipate mmap() development
> > later.  
> 
> We have to make sure not to enable mmap() on Alphas running OSF.  mmap()
> is very broken on that OS.  Keep that in mind if/when the Configure stuff
> is done to determine if mmap() should be used.
> 
> Also, is it expected that content-parsing modules such as mod_include and
> mod_php would now be able to receive a caddr_t pointer to the mmap'ed
> file, or is the intention to only do the mmap() right in mod_include?
> 
> -Rasmus

Re: [PATCH] writev() combining for large bwrites

Posted by Dean Gaudet <dg...@arctic.org>.

HAVE_MMAP is defined if you want the arch to use mmap for the scoreboard. 
For example, linux has a working mmap but HAVE_MMAP isn't defined. 

Dean

On Fri, 25 Jul 1997, Alexei Kosut wrote:

> On Fri, 25 Jul 1997, Rasmus Lerdorf wrote:
> 
> > > This is the patch I wanted to get in to anticipate mmap() development
> > > later.  
> > 
> > We have to make sure not to enable mmap() on Alphas running OSF.  mmap()
> > is very broken on that OS.  Keep that in mind if/when the Configure stuff
> > is done to determine if mmap() should be used.
> 
> We already do Configure stuff to determine if mmap() should be used,
> since we use it (optionally) for the scoreboard. The HAVE_MMAP define in
> conf.h determines whether it is available for a given OS. I do see a
> #define HAVE_MMAP in the OSF1 section...
> 
> -- Alexei Kosut <ak...@organic.com>

Re: [PATCH] writev() combining for large bwrites

Posted by Alexei Kosut <ak...@organic.com>.

On Fri, 25 Jul 1997, Rasmus Lerdorf wrote:

> > This is the patch I wanted to get in to anticipate mmap() development
> > later.  
> 
> We have to make sure not to enable mmap() on Alphas running OSF.  mmap()
> is very broken on that OS.  Keep that in mind if/when the Configure stuff
> is done to determine if mmap() should be used.

We already do Configure stuff to determine if mmap() should be used,
since we use it (optionally) for the scoreboard. The HAVE_MMAP define in
conf.h determines whether it is available for a given OS. I do see a
#define HAVE_MMAP in the OSF1 section...

-- Alexei Kosut <ak...@organic.com>

Re: [PATCH] writev() combining for large bwrites

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.

> This is the patch I wanted to get in to anticipate mmap() development
> later.  

We have to make sure not to enable mmap() on Alphas running OSF.  mmap()
is very broken on that OS.  Keep that in mind if/when the Configure stuff
is done to determine if mmap() should be used.

Also, is it expected that content-parsing modules such as mod_include and
mod_php would now be able to receive a caddr_t pointer to the mmap'ed
file, or is the intention to only do the mmap() right in mod_include?

-Rasmus