You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/09/24 03:58:19 UTC

[PATCH] Take 2 of the http filter rewrite

Take two.

As Greg suggested, this merges the dechunk and core filters back
to one filter.  (This makes the diff in http_protocol.c a bit
gnarly.)

Other significant changes is that the HTTP_IN (ap_http_filter) is
now a request-level filter rather than a connection-level filter.
This makes a lot more sense to me.

This passes all of the tests in httpd-test (not to say that this
is perfect or anything).  There is a patch at the bottom for 
mod_input_body_filter.c in httpd-test that is required.  (It was 
expecting > readbytes to be read when asking for readbytes).

Comments and feedback?  -- justin

Index: buckets/apr_brigade.c
===================================================================
RCS file: /home/cvs/apr-util/buckets/apr_brigade.c,v
retrieving revision 1.25
diff -u -r1.25 apr_brigade.c
--- buckets/apr_brigade.c	2001/09/03 22:00:36	1.25
+++ buckets/apr_brigade.c	2001/09/24 01:48:04
@@ -221,6 +221,28 @@
     return APR_SUCCESS;
 }
 
+APU_DECLARE(apr_status_t) apr_brigade_to_buffer(apr_bucket_brigade *b, char *c,
+                                                apr_read_type_e mode)
+{
+    apr_bucket *e;
+    char *cur;
+    apr_size_t curlen;
+    apr_status_t status;
+
+    APR_BRIGADE_FOREACH(e, b) {
+
+        status = apr_bucket_read(e, (const char **)&cur, &curlen, mode);
+
+        if (status != APR_SUCCESS)
+            return status;
+        
+        memcpy(c, cur, curlen);
+        c += curlen;
+    }
+
+    return APR_SUCCESS;
+}
+
 APU_DECLARE(apr_status_t) apr_brigade_to_iovec(apr_bucket_brigade *b, 
                                                struct iovec *vec, int *nvec)
 {
Index: buckets/apr_buckets_pipe.c
===================================================================
RCS file: /home/cvs/apr-util/buckets/apr_buckets_pipe.c,v
retrieving revision 1.41
diff -u -r1.41 apr_buckets_pipe.c
--- buckets/apr_buckets_pipe.c	2001/09/22 22:36:07	1.41
+++ buckets/apr_buckets_pipe.c	2001/09/24 01:48:04
@@ -110,10 +110,6 @@
         *str = a->data;
         if (rv == APR_EOF) {
             apr_file_close(p);
-            if (block == APR_NONBLOCK_READ) {
-                /* XXX: this is bogus, should return APR_SUCCESS */
-                return APR_EOF;
-            }
         }
     }
     return APR_SUCCESS;
Index: buckets/apr_buckets_socket.c
===================================================================
RCS file: /home/cvs/apr-util/buckets/apr_buckets_socket.c,v
retrieving revision 1.30
diff -u -r1.30 apr_buckets_socket.c
--- buckets/apr_buckets_socket.c	2001/09/22 22:36:07	1.30
+++ buckets/apr_buckets_socket.c	2001/09/24 01:48:04
@@ -111,10 +111,6 @@
         free(buf);
         a = apr_bucket_immortal_make(a, "", 0);
         *str = a->data;
-        if (rv == APR_EOF && block == APR_NONBLOCK_READ) {
-            /* XXX: this is bogus... should return APR_SUCCESS */
-            return APR_EOF;
-        }
     }
     return APR_SUCCESS;
 }
Index: include/apr_buckets.h
===================================================================
RCS file: /home/cvs/apr-util/include/apr_buckets.h,v
retrieving revision 1.119
diff -u -r1.119 apr_buckets.h
--- include/apr_buckets.h	2001/09/24 01:47:10	1.119
+++ include/apr_buckets.h	2001/09/24 01:48:04
@@ -689,6 +689,17 @@
                                              apr_off_t *length);
 
 /**
+ * create a flat buffer of the elements in a bucket_brigade.
+ * @param b The bucket brigade to create the buffer from
+ * @param c The pointer to the returned character string
+ * @param mode Should we block when reading the buckets
+ * Note: You *must* have enough space allocated to store all of the data.
+ * See apr_brigade_length().
+ */
+APU_DECLARE(apr_status_t) apr_brigade_to_buffer(apr_bucket_brigade *b, char *c,
+                                                apr_read_type_e mode);
+
+/**
  * create an iovec of the elements in a bucket_brigade... return number 
  * of elements used.  This is useful for writing to a file or to the
  * network efficiently.

Index: server/core.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/core.c,v
retrieving revision 1.61
diff -u -r1.61 core.c
--- server/core.c	2001/09/19 05:52:42	1.61
+++ server/core.c	2001/09/24 01:45:18
@@ -2757,19 +2757,9 @@
 {
     apr_bucket *e;
 
-    if (!f->ctx) {    /* If we haven't passed up the socket yet... */
-        f->ctx = (void *)1;
-        e = apr_bucket_socket_create(f->c->client_socket);
-        APR_BRIGADE_INSERT_TAIL(b, e);
-        return APR_SUCCESS;
-    }
-    else {            
-        /* Either some code lost track of the socket
-         * bucket or we already found out that the
-         * client closed.
-         */
-        return APR_EOF;
-    }
+    e = apr_bucket_socket_create(f->c->client_socket);
+    APR_BRIGADE_INSERT_TAIL(b, e);
+    return APR_SUCCESS;
 }
 
 /* Default filter.  This filter should almost always be used.  Its only job
Index: server/protocol.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/protocol.c,v
retrieving revision 1.45
diff -u -r1.45 protocol.c
--- server/protocol.c	2001/09/18 22:13:57	1.45
+++ server/protocol.c	2001/09/24 01:45:18
@@ -219,7 +219,7 @@
     while (1) {
         if (APR_BRIGADE_EMPTY(b)) {
             apr_off_t zero = 0;
-            if ((retval = ap_get_brigade(c->input_filters, b,
+            if ((retval = ap_get_brigade(r->input_filters, b,
                                          AP_MODE_BLOCKING,
                                          &zero /* readline */)) != APR_SUCCESS ||
                 APR_BRIGADE_EMPTY(b)) {
@@ -562,6 +562,9 @@
     r->notes           = apr_table_make(r->pool, 5);
 
     r->request_config  = ap_create_request_config(r->pool);
+    /* Must be set before we run create request hook */
+    r->output_filters  = conn->output_filters;
+    r->input_filters   = conn->input_filters;
     ap_run_create_request(r);
     r->per_dir_config  = r->server->lookup_defaults;
 
@@ -572,8 +575,6 @@
 
     r->status          = HTTP_REQUEST_TIME_OUT;  /* Until we get a request */
     r->the_request     = NULL;
-    r->output_filters  = conn->output_filters;
-    r->input_filters   = conn->input_filters;
 
     apr_setsocketopt(conn->client_socket, APR_SO_TIMEOUT, 
                      (int)(keptalive
Index: modules/http/http_core.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/http/http_core.c,v
retrieving revision 1.282
diff -u -r1.282 http_core.c
--- modules/http/http_core.c	2001/08/25 23:43:18	1.282
+++ modules/http/http_core.c	2001/09/24 01:45:18
@@ -262,7 +262,6 @@
 
 static int ap_pre_http_connection(conn_rec *c)
 {
-    ap_add_input_filter("HTTP_IN", NULL, NULL, c);
     ap_add_input_filter("CORE_IN", NULL, NULL, c);
     ap_add_output_filter("CORE", NULL, NULL, c);
     return OK;
@@ -302,6 +301,15 @@
     return OK;
 }
 
+static int ap_http_create_req(request_rec *r)
+{
+    if (!r->main)
+    {
+        ap_add_input_filter("HTTP_IN", NULL, r, r->connection);
+    }
+    return OK;
+}
+
 static void ap_http_insert_filter(request_rec *r)
 {
     if (!r->main) {
@@ -320,10 +328,10 @@
     ap_hook_map_to_storage(ap_send_http_trace,NULL,NULL,APR_HOOK_MIDDLE);
     ap_hook_http_method(http_method,NULL,NULL,APR_HOOK_REALLY_LAST);
     ap_hook_default_port(http_port,NULL,NULL,APR_HOOK_REALLY_LAST);
+    ap_hook_create_request(ap_http_create_req, NULL, NULL, APR_HOOK_MIDDLE);
 
     ap_hook_insert_filter(ap_http_insert_filter, NULL, NULL, APR_HOOK_REALLY_LAST);
     ap_register_input_filter("HTTP_IN", ap_http_filter, AP_FTYPE_CONNECTION);
-    ap_register_input_filter("DECHUNK", ap_dechunk_filter, AP_FTYPE_TRANSCODE);
     ap_register_output_filter("HTTP_HEADER", ap_http_header_filter, 
                               AP_FTYPE_HTTP_HEADER);
     ap_register_output_filter("CHUNK", chunk_filter, AP_FTYPE_TRANSCODE);
Index: modules/http/http_protocol.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/http/http_protocol.c,v
retrieving revision 1.362
diff -u -r1.362 http_protocol.c
--- modules/http/http_protocol.c	2001/09/17 13:12:37	1.362
+++ modules/http/http_protocol.c	2001/09/24 01:45:18
@@ -481,125 +481,115 @@
     return AP_HTTP_METHODS[methnum];
 }
 
-struct dechunk_ctx {
-    apr_size_t chunk_size;
-    apr_size_t bytes_delivered;
+static long get_chunk_size(char *);
+
+typedef struct http_filter_ctx {
+    apr_bucket_brigade *b;
+    int status;
+    apr_size_t remaining;
     enum {
-        WANT_HDR /* must have value zero */,
-        WANT_BODY,
-        WANT_TRL
+        BODY_NONE   /* must have value of zero */,
+        BODY_LENGTH,
+        BODY_CHUNK
     } state;
-};
-
-static long get_chunk_size(char *);
+} http_ctx_t;
 
-apr_status_t ap_dechunk_filter(ap_filter_t *f, apr_bucket_brigade *bb,
-                               ap_input_mode_t mode, apr_off_t *readbytes)
+/* Hi, I'm the main input filter for HTTP requests. 
+ * I handle chunked and content-length bodies. */
+apr_status_t ap_http_filter(ap_filter_t *f, apr_bucket_brigade *b, ap_input_mode_t mode, apr_off_t *readbytes)
 {
-    apr_status_t rv;
-    struct dechunk_ctx *ctx = f->ctx;
-    apr_bucket *b;
-    const char *buf;
+    apr_bucket *e;
+    char *buff;
     apr_size_t len;
+    char *pos;
+    http_ctx_t *ctx = f->ctx;
+    apr_status_t rv;
 
     if (!ctx) {
-        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(struct dechunk_ctx));
+        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
+        ctx->b = apr_brigade_create(f->r->pool);
+        ctx->status = f->r->status;
+    
+        /* Due to the way the core is designed, this should happen each time
+         * we get initialized. */
+        if ((rv = ap_get_brigade(f->next, ctx->b, mode, 
+                                 readbytes)) != APR_SUCCESS) {
+            return rv;
+        }
     }
 
-    do {
-        if (ctx->chunk_size == ctx->bytes_delivered) {
-            /* Time to read another chunk header or trailer...  ap_http_filter() is 
-             * the next filter in line and it knows how to return a brigade with 
-             * one line.
-             */
-            char line[30];
+    /* Basically, we have to stay out of the way until server/protocol.c
+     * says it is okay - which it does by setting r->status to OK. */
+    if (f->r->status != ctx->status)
+    {
+        int old_status;
+        /* Allow us to be reentrant! */
+        old_status = ctx->status;
+        ctx->status = f->r->status;
+
+        if (old_status == HTTP_REQUEST_TIME_OUT && f->r->status == HTTP_OK)
+        {
+            const char *tenc, *lenp;
+            tenc = apr_table_get(f->r->headers_in, "Transfer-Encoding");
+            lenp = apr_table_get(f->r->headers_in, "Content-Length");
+
+            if (tenc) {
+                if (!strcasecmp(tenc, "chunked")) {
+                    char line[30];
             
-            if ((rv = ap_getline(line, sizeof(line), f->r,
-                                 0 /* readline */)) < 0) {
-                return rv;
-            }
-            switch(ctx->state) {
-            case WANT_HDR:
-                ctx->chunk_size = get_chunk_size(line);
-                ctx->bytes_delivered = 0;
-                if (ctx->chunk_size == 0) {
-                    ctx->state = WANT_TRL;
+                    if ((rv = ap_getline(line, sizeof(line), f->r, 0)) < 0) {
+                        return rv;
+                    }
+                    ctx->state = BODY_CHUNK;
+                    ctx->remaining = get_chunk_size(line);
                 }
-                else {
-                    ctx->state = WANT_BODY;
-                }
-                break;
-            case WANT_TRL:
-                /* XXX sanity check end chunk here */
-                if (strlen(line)) {
-                    /* bad trailer */
-                }
-                if (ctx->chunk_size == 0) { /* we just finished the last chunk? */
-                    /* ### woah... ap_http_filter() is doing this, too */
-                    /* append eos bucket and get out */
-                    b = apr_bucket_eos_create();
-                    APR_BRIGADE_INSERT_TAIL(bb, b);
-                    return APR_SUCCESS;
-                }
-                ctx->state = WANT_HDR;
-                break;
-            default:
-                ap_assert(ctx->state == WANT_HDR || ctx->state == WANT_TRL);
             }
-        }
-    } while (ctx->state != WANT_BODY);
+            else if (lenp) {
+                const char *pos = lenp;
 
-    if (ctx->state == WANT_BODY) {
-        /* Tell ap_http_filter() how many bytes to deliver. */
-        apr_off_t chunk_bytes = ctx->chunk_size - ctx->bytes_delivered;
-
-        if ((rv = ap_get_brigade(f->next, bb, mode,
-                                 &chunk_bytes)) != APR_SUCCESS) {
-            return rv;
-        }
-
-        /* Walk through the body, accounting for bytes, and removing an eos
-         * bucket if ap_http_filter() delivered the entire chunk.
-         *
-         * ### this shouldn't be necessary. 1) ap_http_filter shouldn't be
-         * ### adding EOS buckets. 2) it shouldn't return more bytes than
-         * ### we requested, therefore the total len can be found with a
-         * ### simple call to apr_brigade_length(). no further munging
-         * ### would be needed.
-         */
-        b = APR_BRIGADE_FIRST(bb);
-        while (b != APR_BRIGADE_SENTINEL(bb) && !APR_BUCKET_IS_EOS(b)) {
-            apr_bucket_read(b, &buf, &len, mode);
-            AP_DEBUG_ASSERT(len <= ctx->chunk_size - ctx->bytes_delivered);
-            ctx->bytes_delivered += len;
-            b = APR_BUCKET_NEXT(b);
-        }
-        if (ctx->bytes_delivered == ctx->chunk_size) {
-            AP_DEBUG_ASSERT(APR_BUCKET_IS_EOS(b));
-            apr_bucket_delete(b);
-            ctx->state = WANT_TRL;
+                while (apr_isdigit(*pos) || apr_isspace(*pos))
+                    ++pos;
+                if (*pos == '\0') {
+                    ctx->state = BODY_LENGTH;
+                    ctx->remaining = atol(lenp);
+                }
+            }
         }
     }
 
-    return APR_SUCCESS;
-}
+    if (!ctx->remaining && ctx->state)
+    {
+        if (ctx->state == BODY_LENGTH)
+        {
+            e = apr_bucket_eos_create();
+            APR_BRIGADE_INSERT_TAIL(b, e);
+            return APR_SUCCESS;
+        }
+        else if (ctx->state == BODY_CHUNK)
+        {
+            char line[30];
+        
+            ctx->state = BODY_NONE;
 
-typedef struct http_filter_ctx {
-    apr_bucket_brigade *b;
-} http_ctx_t;
+            /* We need to read the CRLF after the chunk.  */
+            if ((rv = ap_getline(line, sizeof(line), f->r, 0)) < 0) {
+                return rv;
+            }
 
-apr_status_t ap_http_filter(ap_filter_t *f, apr_bucket_brigade *b, ap_input_mode_t mode, apr_off_t *readbytes)
-{
-    apr_bucket *e;
-    char *buff;
-    apr_size_t len;
-    char *pos;
-    http_ctx_t *ctx = f->ctx;
-    apr_status_t rv;
+            /* Read the real chunk line. */
+            if ((rv = ap_getline(line, sizeof(line), f->r, 0)) < 0) {
+                return rv;
+            }
+            ctx->state = BODY_CHUNK;
+            ctx->remaining = get_chunk_size(line);
 
-    if (!ctx) {
-        f->ctx = ctx = apr_pcalloc(f->c->pool, sizeof(*ctx));
-        ctx->b = apr_brigade_create(f->c->pool);
+            if (!ctx->remaining)
+            {
+                e = apr_bucket_eos_create();
+                APR_BRIGADE_INSERT_TAIL(b, e);
+                return APR_SUCCESS;
+            }
+        }
     }
 
     if (mode == AP_MODE_PEEK) {
@@ -640,12 +630,6 @@
         }
     }
 
-    if (APR_BRIGADE_EMPTY(ctx->b)) {
-        if ((rv = ap_get_brigade(f->next, ctx->b, mode, readbytes)) != APR_SUCCESS) {
-            return rv;
-        }
-    }
-
     /* If readbytes is -1, we want to just read everything until the end
      * of the brigade, which in this case means the end of the socket.  To
      * do this, we loop through the entire brigade, until the socket is
@@ -655,14 +639,21 @@
     if (*readbytes == -1) {
         apr_bucket *e;
         apr_off_t total;
+        const char *str;
+        apr_size_t len;
         APR_BRIGADE_FOREACH(e, ctx->b) {
-            const char *str;
-            apr_size_t len;
+            /* We don't care about these values.  We just want to force the
+             * lower level to just read it. */
             apr_bucket_read(e, &str, &len, APR_BLOCK_READ);
         }
         APR_BRIGADE_CONCAT(b, ctx->b);
+
+        /* Force a recompute of the length */
         apr_brigade_length(b, 1, &total);
         *readbytes = total;
+        /* This had better be equal to zero now! */
+        if (ctx->status)
+            ctx->remaining -= total;
         e = apr_bucket_eos_create();
         APR_BRIGADE_INSERT_TAIL(b, e);
         return APR_SUCCESS;
@@ -670,56 +661,28 @@
     /* readbytes == 0 is "read a single line". otherwise, read a block. */
     if (*readbytes) {
         apr_off_t total;
+        apr_bucket *e;
+        apr_bucket_brigade *newbb;
 
-        /* ### the code below, which moves bytes from one brigade to the
-           ### other is probably bogus. presuming the next filter down was
-           ### working properly, it should not have returned more than
-           ### READBYTES bytes, and we wouldn't have to do any work.
-        */
-
-        APR_BRIGADE_NORMALIZE(ctx->b);
-        if (APR_BRIGADE_EMPTY(ctx->b)) {
-            if ((rv = ap_get_brigade(f->next, ctx->b, mode, readbytes)) != APR_SUCCESS) {
-                return rv;
-            }
-        }
-            
+        newbb = NULL;
+
         apr_brigade_partition(ctx->b, *readbytes, &e);
-        APR_BRIGADE_CONCAT(b, ctx->b);
+        /* Must do split before CONCAT */
         if (e != APR_BRIGADE_SENTINEL(ctx->b)) {
-            apr_bucket_brigade *temp;
+            newbb = apr_brigade_split(ctx->b, e);
+        }
+        APR_BRIGADE_CONCAT(b, ctx->b);
 
-            temp = apr_brigade_split(b, e);
+        /* FIXME: Is this really needed?  Due to pointer use in sentinels,
+         * I think so. */
+        if (newbb)
+            APR_BRIGADE_CONCAT(ctx->b, newbb);
 
-            /* ### darn. gotta ensure the split brigade is in the proper pool.
-               ### this is a band-aid solution; we shouldn't even be doing
-               ### all of this brigade munging (per the comment above).
-               ### until then, this will get the right lifetimes. */
-            APR_BRIGADE_CONCAT(ctx->b, temp);
-        }
-        else {
-            if (!APR_BRIGADE_EMPTY(ctx->b)) {
-                ctx->b = NULL; /*XXX*/
-            }
-        }
         apr_brigade_length(b, 1, &total);
-        *readbytes -= total;
+        *readbytes = total;
+        if (ctx->status)
+            ctx->remaining -= total;
 
-        /* ### this is a hack. it is saying, "if we have read everything
-           ### that was requested, then we are at the end of the request."
-           ### it presumes that the next filter up will *only* call us
-           ### with readbytes set to the Content-Length of the request.
-           ### that may not always be true, and this code is *definitely*
-           ### too presumptive of the caller's intent. the point is: this
-           ### filter is not the guy that is parsing the headers or the
-           ### chunks to determine where the end of the request is, so we
-           ### shouldn't be monkeying with EOS buckets.
-        */
-        if (*readbytes == 0) {
-            apr_bucket *eos = apr_bucket_eos_create();
-                
-            APR_BRIGADE_INSERT_TAIL(b, eos);
-        }
         return APR_SUCCESS;
     }
 
@@ -731,14 +694,19 @@
         }
 
         pos = memchr(buff, APR_ASCII_LF, len);
+        /* We found a match. */
         if (pos != NULL) {
             apr_bucket_split(e, pos - buff + 1);
+            if (ctx->status)
+                ctx->remaining -= pos - buff + 1;
             APR_BUCKET_REMOVE(e);
             APR_BRIGADE_INSERT_TAIL(b, e);
             return APR_SUCCESS;
         }
         APR_BUCKET_REMOVE(e);
         APR_BRIGADE_INSERT_TAIL(b, e);
+        if (ctx->status)
+            ctx->remaining -= len;
     }
     return APR_SUCCESS;
 }
@@ -1406,7 +1374,6 @@
         }
 
         r->read_chunked = 1;
-        ap_add_input_filter("DECHUNK", NULL, r, r->connection);
     }
     else if (lenp) {
         const char *pos = lenp;
@@ -1517,14 +1484,12 @@
  */
 AP_DECLARE(long) ap_get_client_block(request_rec *r, char *buffer, apr_size_t bufsiz)
 {
-    apr_size_t len_read, total;
-    apr_status_t rv;
-    apr_bucket *b, *old;
-    const char *tempbuf;
+    apr_off_t len;
+    apr_bucket *b;
     core_request_config *req_cfg =
 	(core_request_config *)ap_get_module_config(r->request_config,
                                                     &core_module);
-    apr_bucket_brigade *bb = req_cfg->bb;
+    apr_bucket_brigade *bb = req_cfg->bb, *newbb;
 
     do {
         if (APR_BRIGADE_EMPTY(bb)) {
@@ -1546,41 +1511,34 @@
         return 0;
     }
 
-    /* ### it would be nice to replace the code below with "consume N bytes
-       ### from this brigade, placing them into that buffer." there are
-       ### other places where we do the same...
-       ###
-       ### alternatively, we could partition the brigade, then call a
-       ### function which serializes a given brigade into a buffer. that
-       ### semantic is used elsewhere, too...
-    */
-
-    total = 0;
-    while (total < bufsiz
-           && b != APR_BRIGADE_SENTINEL(bb)
-           && !APR_BUCKET_IS_EOS(b)) {
-
-        if ((rv = apr_bucket_read(b, &tempbuf, &len_read, APR_BLOCK_READ)) != APR_SUCCESS) {
-            return -1;
-        }
-        if (total + len_read > bufsiz) {
-            apr_bucket_split(b, bufsiz - total);
-            len_read = bufsiz - total;
-        }
-        memcpy(buffer, tempbuf, len_read);
-        buffer += len_read;
-        total += len_read;
-        /* XXX the next two fields shouldn't be mucked with here, as they are in terms
-         * of bytes in the unfiltered body; gotta see if anybody else actually uses 
-         * these
-         */
-        r->read_length += len_read;      /* XXX yank me? */
-        old = b;
-        b = APR_BUCKET_NEXT(b);
-        apr_bucket_delete(old);
-    }
+    /* 1) Determine length to see if we may overrun. 
+     * 2) Partition the brigade at the appropriate point.
+     * 3) Split the brigade at the new partition.
+     * 4) Read the old brigade into the buffer.
+     * 5) Destroy the old brigade.
+     * 6) Point the context at the new brigade.
+     */
+    apr_brigade_length(bb, 1, &len);
+
+    if (bufsiz < len)
+        len = bufsiz;
+
+    r->read_length += len;
+
+    if (apr_brigade_partition(bb, len, &b) != APR_SUCCESS)
+        return -1;
+
+    newbb = apr_brigade_split(bb, b);
+
+    if (apr_brigade_to_buffer(bb, buffer, APR_BLOCK_READ) != APR_SUCCESS)
+        return -1;
+
+    if (apr_brigade_destroy(bb) != APR_SUCCESS)
+        return -1;
+
+    req_cfg->bb = newbb;
 
-    return total;
+    return len;
 }
 
 /* In HTTP/1.1, any method can have a body.  However, most GET handlers

Index: perl-framework/c-modules/input_body_filter/mod_input_body_filter.c
===================================================================
RCS file: /home/cvs/httpd-test/perl-framework/c-modules/input_body_filter/mod_input_body_filter.c,v
retrieving revision 1.4
diff -u -r1.4 mod_input_body_filter.c
--- perl-framework/c-modules/input_body_filter/mod_input_body_filter.c	2001/09/08 22:07:14	1.4
+++ perl-framework/c-modules/input_body_filter/mod_input_body_filter.c	2001/09/24 01:53:15
@@ -76,25 +76,43 @@
     }
 }
 
+typedef struct input_body_ctx_t {
+    apr_bucket_brigade *b;
+} input_body_ctx_t;
+
 static int input_body_filter_handler(ap_filter_t *f, apr_bucket_brigade *bb, 
                                      ap_input_mode_t mode, apr_off_t *readbytes)
 {
     apr_status_t rv;
     apr_pool_t *p = f->r->pool;
+    input_body_ctx_t *ctx = f->ctx;
 
-    if (APR_BRIGADE_EMPTY(bb)) {
-        rv = ap_get_brigade(f->next, bb, mode, readbytes);
-        if (rv != APR_SUCCESS) {
+    if (!ctx) {
+        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
+        ctx->b = apr_brigade_create(f->r->pool);
+    }
+
+    if (APR_BRIGADE_EMPTY(ctx->b))
+    {
+        if ((rv = ap_get_brigade(f->next, ctx->b, mode,
+                                 readbytes)) != APR_SUCCESS) {
             return rv;
         }
     }
 
-    while (!APR_BRIGADE_EMPTY(bb)) {
+    while (!APR_BRIGADE_EMPTY(ctx->b)) {
         const char *data;
         apr_size_t len;
         apr_bucket *bucket;
 
-        bucket = APR_BRIGADE_FIRST(bb);
+        bucket = APR_BRIGADE_FIRST(ctx->b);
+
+        if (APR_BUCKET_IS_EOS(bucket)) {
+            APR_BUCKET_REMOVE(bucket);
+            APR_BRIGADE_INSERT_TAIL(bb, bucket);
+            break;
+        }
+
         rv = apr_bucket_read(bucket, &data, &len, mode);
 
         if (rv != APR_SUCCESS) {
@@ -110,10 +128,6 @@
         }
 
         APR_BRIGADE_INSERT_TAIL(bb, bucket);
-
-        if (APR_BUCKET_IS_EOS(bucket)) {
-            break;
-        }
     }
 
     return OK;


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Sep 23, 2001 at 06:58:19PM -0700, Justin Erenkrantz wrote:
>...
> --- server/protocol.c	2001/09/18 22:13:57	1.45
> +++ server/protocol.c	2001/09/24 01:45:18
>...
> @@ -562,6 +562,9 @@
>      r->notes           = apr_table_make(r->pool, 5);
>  
>      r->request_config  = ap_create_request_config(r->pool);
> +    /* Must be set before we run create request hook */
> +    r->output_filters  = conn->output_filters;
> +    r->input_filters   = conn->input_filters;
>      ap_run_create_request(r);
>      r->per_dir_config  = r->server->lookup_defaults;
>  
> @@ -572,8 +575,6 @@
>  
>      r->status          = HTTP_REQUEST_TIME_OUT;  /* Until we get a request */
>      r->the_request     = NULL;
> -    r->output_filters  = conn->output_filters;
> -    r->input_filters   = conn->input_filters;

Let's go ahead and keep this change, even though we aren't turning HTTP_IN
into a request filter.

>...
> --- modules/http/http_protocol.c	2001/09/17 13:12:37	1.362
> +++ modules/http/http_protocol.c	2001/09/24 01:45:18
> @@ -481,125 +481,115 @@
>      return AP_HTTP_METHODS[methnum];
>  }
>  
> -struct dechunk_ctx {
> -    apr_size_t chunk_size;
> -    apr_size_t bytes_delivered;
> +static long get_chunk_size(char *);
> +
> +typedef struct http_filter_ctx {
> +    apr_bucket_brigade *b;
> +    int status;

The whole mess with this "status" field is a hack. There should be comments
inserted to that effect. Not your fault, and we have to deal with it for a
bit, but it should at least be noted.

Ideally, when this filter becomes a request filter (later), we can simply
delay adding it until the higher level has processed the headers. Once the
headers are out of the way, then this filter can be added to process the
message body.
(i.e. move its insertion to the point you mention: line 626 of protocol.c )

> +    apr_size_t remaining;
>      enum {
> -        WANT_HDR /* must have value zero */,
> -        WANT_BODY,
> -        WANT_TRL
> +        BODY_NONE   /* must have value of zero */,

It doesn't have zero if you simply test for this state. i.e. rather than
saying "if (ctx->state)" .. just say "if (ctx->state != BODY_NONE)". It
could also improve clarity.

> +        BODY_LENGTH,
> +        BODY_CHUNK
>      } state;
> -};
> -
> -static long get_chunk_size(char *);
> +} http_ctx_t;
>  
> -apr_status_t ap_dechunk_filter(ap_filter_t *f, apr_bucket_brigade *bb,
> -                               ap_input_mode_t mode, apr_off_t *readbytes)
> +/* Hi, I'm the main input filter for HTTP requests. 
> + * I handle chunked and content-length bodies. */
> +apr_status_t ap_http_filter(ap_filter_t *f, apr_bucket_brigade *b, ap_input_mode_t mode, apr_off_t *readbytes)
>  {
> -    apr_status_t rv;
> -    struct dechunk_ctx *ctx = f->ctx;
> -    apr_bucket *b;
> -    const char *buf;
> +    apr_bucket *e;
> +    char *buff;
>      apr_size_t len;
> +    char *pos;
> +    http_ctx_t *ctx = f->ctx;
> +    apr_status_t rv;
>  
>      if (!ctx) {
> -        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(struct dechunk_ctx));
> +        f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
> +        ctx->b = apr_brigade_create(f->r->pool);
> +        ctx->status = f->r->status;
> +    
> +        /* Due to the way the core is designed, this should happen each time
> +         * we get initialized. */
> +        if ((rv = ap_get_brigade(f->next, ctx->b, mode, 
> +                                 readbytes)) != APR_SUCCESS) {
> +            return rv;
> +        }
>      }
>  
> -    do {
> -        if (ctx->chunk_size == ctx->bytes_delivered) {
> -            /* Time to read another chunk header or trailer...  ap_http_filter() is 
> -             * the next filter in line and it knows how to return a brigade with 
> -             * one line.
> -             */
> -            char line[30];
> +    /* Basically, we have to stay out of the way until server/protocol.c
> +     * says it is okay - which it does by setting r->status to OK. */
> +    if (f->r->status != ctx->status)
> +    {
> +        int old_status;
> +        /* Allow us to be reentrant! */
> +        old_status = ctx->status;
> +        ctx->status = f->r->status;
> +
> +        if (old_status == HTTP_REQUEST_TIME_OUT && f->r->status == HTTP_OK)

There should be a comment that HTTP_REQUEST_TIME_OUT is the initial value
for r->status. Therefore, this code section is effectively an init()
sequence for processing the message body.

> +        {
> +            const char *tenc, *lenp;
> +            tenc = apr_table_get(f->r->headers_in, "Transfer-Encoding");
> +            lenp = apr_table_get(f->r->headers_in, "Content-Length");
> +
> +            if (tenc) {
> +                if (!strcasecmp(tenc, "chunked")) {
> +                    char line[30];
>              
>...
> +                    if ((rv = ap_getline(line, sizeof(line), f->r, 0)) < 0) {
> +                        return rv;
> +                    }
> +                    ctx->state = BODY_CHUNK;
> +                    ctx->remaining = get_chunk_size(line);

The above section of code, and some of the stuff following is duplicated
from ap_setup_client_block(). There should be a comment about this
duplicated code. Especially because this copy doesn't contain all of the
error checking and responses of the ap_setup_client_block version. A next
step would be to figure out how to remove that duplication.

Specifically, I'd think that we figure out some of this information when we
insert the [request form of this] filter. The ap_read_request() function
would set up the filter context and set it when the filter is added; the
context would indicate the particular parameters for body reading.

[ maybe at some point in the future, the filter can read the request line,
  headers, body, etc, and do all of the computation and checking itself ]

>...
> +        else if (ctx->state == BODY_CHUNK)
> +        {
> +            char line[30];
> +        
> +            ctx->state = BODY_NONE;
>...
> +            /* We need to read the CRLF after the chunk.  */
> +            if ((rv = ap_getline(line, sizeof(line), f->r, 0)) < 0) {
> +                return rv;
> +            }

Is the BODY_NONE thing so that the ap_getline() recursion thingy works
properly, and returns a line?

>...
> +        /* This had better be equal to zero now! */
> +        if (ctx->status)
> +            ctx->remaining -= total;

What better be zero? status or remaining?

I'm not sure why ctx->status would ever be zero. Could you explain? Is that
at some point where r->status becomes OK (rather than HTTP_OK). A lot of
stuff seems to key off ctx->status != 0. I don't see why...

>...
>  AP_DECLARE(long) ap_get_client_block(request_rec *r, char *buffer, apr_size_t bufsiz)

Probably just leave this be, since we don't have the apr_brigade_to_buffer
function for now.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Sep 24, 2001 at 12:45:19PM -0700, Ryan Bloom wrote:
> The original reason that the core worked this way, was to allow for SSL
> libraries that didn't support in-memory decryption.  It may not be necessary
> anymore however.

A quick look at mod_ssl looks like it should be okay.

But, if a library don't support in-memory decryption, they could 
simply replace the CORE_IN filter.  

(I am currently working on moving most of the logic from HTTP_IN
to CORE_IN - this should resolve the pipelining problem and make
the ap_http_filter much less complicated...)  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Ryan Bloom <rb...@covalent.net>.
On Monday 24 September 2001 12:27 pm, Justin Erenkrantz wrote:
> On Mon, Sep 24, 2001 at 03:14:22PM -0400, Cliff Woolley wrote:
> > On Mon, 24 Sep 2001, Justin Erenkrantz wrote:
> > > The brigades should not be returning more data than is asked for
> > > ever.
> >
> > Right.  But the only increment the API allows to ask for are one whole
> > bucket's worth at a time.
>
> util_filter's ap_get_brigade takes in the length to read.  The
> translation from httpd->apr_util in the core filter loses this
> distinction.  The core filter should be splitting the buckets and
> only returning the requested amount to HTTP_IN (and it'd have to
> handle the 0 case when we want to read a line).  Since the
> core is inherently connection-based, I see no reason why the
> buffering can't be done there.  Currently, it passes the
> socket bucket up the chain.
>

The original reason that the core worked this way, was to allow for SSL
libraries that didn't support in-memory decryption.  It may not be necessary
anymore however.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Sep 24, 2001 at 12:27:21PM -0700, Justin Erenkrantz wrote:
> On Mon, Sep 24, 2001 at 03:14:22PM -0400, Cliff Woolley wrote:
> > On Mon, 24 Sep 2001, Justin Erenkrantz wrote:
> > 
> > > The brigades should not be returning more data than is asked for
> > > ever.
> > 
> > Right.  But the only increment the API allows to ask for are one whole
> > bucket's worth at a time.

[ cliff referring to buckets and brigades; the real problem with the filters
  not the brigade API ]

> util_filter's ap_get_brigade takes in the length to read.  The

Unfortunately, it takes it as a pointer-to-length. That is wrong -- it
should just be an apr_off_t length value.

I changed that at one point in April or so. However, that broke something
else and Ryan reverted most of the patch. I forget the full reason there,
but it was definitely related to some of the other issues we've been
discussing around your patch over the past couple days.

> translation from httpd->apr_util in the core filter loses this 
> distinction.  The core filter should be splitting the buckets and 
> only returning the requested amount to HTTP_IN (and it'd have to 
> handle the 0 case when we want to read a line).

Absolutely. And to support the 0 case, I suggested a set of brigade
functions to help out:

  apr_brigade_getline(bb, line, sizeof(line))
  apr_brigade_pgetline(bb, &line, pool)
  apr_brigade_splitline(bb, &newbb)

The first two would use the last function to implement the getline process.
The reason for the first two forms is for memory management issues. You may
not want to copy the line into a pool, but on the other hand, you may want
the whole line even if it doesn't fit your buffer.

Note on the getline() case: you'd need to distinguish between "the line
filled the buffer" and "the line is longer than the provided buffer". Maybe
a returned length could indicate that somehow.

This getline stuff is in the httpd STATUS file under the item for revamping
the input filter system.

By making brigade functions, you could do a getline anywhere in the code.
Consider the case where you have a decompression filter and a higher filter
asks for a line. The filter would decompress into a buffer or a brigade,
then see the "0" and need to return a line. Having the function available
makes it easy for him. If you isolate the line-from-brigade functionality to
CORE_IN, then it will limit what others can do.

> Since the
> core is inherently connection-based, I see no reason why the
> buffering can't be done there.  Currently, it passes the
> socket bucket up the chain.

It certainly can. Zero complaints here. Hell... loud applause if you are
going to spend the time to fix it!

>...
> The problem is that HTTP_IN has the socket bucket to begin with (which 
> has -1 length - i.e. indeterminate).  In order to properly handle a 
> request-centric HTTP filter, that filter must not get the -1 bucket.
> It must be hidden by the core filter.

Yup.

Note that (strictly speaking) you may not even need the socket bucket. You
could read straight from the socket, create a bucket around that, and insert
the result into the passed-in brigade.

If you use a socket bucket, then you're going to have to have an internal
brigade. Then you're going to read, split, and shift some buckets from your
internal brigade over to the passed-in brigade.

YMMV, but consider the case of avoiding a socket bucket.

> (This is the way I see it...)  -- justin

Cool. Me too :-)

Like I said, I was just hoping that your patch would not require all of this
extra work. That it could be applied as an incremental improvement. But,
alas... that doesn't seem to be the case. So the first step would be
adjusting the return amounts, then applying your patch.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 24 Sep 2001, Justin Erenkrantz wrote:

> The problem is that HTTP_IN has the socket bucket to begin with (which
> has -1 length - i.e. indeterminate).  In order to properly handle a
> request-centric HTTP filter, that filter must not get the -1 bucket.
> It must be hidden by the core filter.
>
> (This is the way I see it...)  -- justin

That sounds correct to me.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Sep 24, 2001 at 03:14:22PM -0400, Cliff Woolley wrote:
> On Mon, 24 Sep 2001, Justin Erenkrantz wrote:
> 
> > The brigades should not be returning more data than is asked for
> > ever.
> 
> Right.  But the only increment the API allows to ask for are one whole
> bucket's worth at a time.

util_filter's ap_get_brigade takes in the length to read.  The
translation from httpd->apr_util in the core filter loses this 
distinction.  The core filter should be splitting the buckets and 
only returning the requested amount to HTTP_IN (and it'd have to 
handle the 0 case when we want to read a line).  Since the
core is inherently connection-based, I see no reason why the
buffering can't be done there.  Currently, it passes the
socket bucket up the chain.

> > What happens then is that we now enter a condition where the
> > caller may not be able to handle the data (i.e. I only wanted 10
> > bytes you gave me 8192, oops).
> 
> The filters should be able to handle this.  Like I said before, if you
> want 10 bytes, fine... just do this:
> 
> if (apr_bucket_split(b, 10) == APR_ENOTIMPL) {
>     /* bucket cannot be split natively */
>     apr_bucket_read(b,str,&len,APR_BLOCK_READ); /* check for errors */
>     apr_bucket_split(b, 10);                    /* check for errors */
> }
> apr_bucket_read(b,str,&len,block);              /* check for errors */
> 
> I guarantee you len will come back as 10 every time, because b is now 10
> bytes and b->next is all the rest.

The problem is that HTTP_IN has the socket bucket to begin with (which 
has -1 length - i.e. indeterminate).  In order to properly handle a 
request-centric HTTP filter, that filter must not get the -1 bucket.
It must be hidden by the core filter.  

(This is the way I see it...)  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Sep 24, 2001 at 01:31:52PM -0700, Greg Stein wrote:
> > The brigades should not be returning more data than is asked for 
> > ever.
> 
> Fix the wording: not the brigades but the *filters*.
>
> And yes: I totally agree. I've said that a few times recently :-) I was just
> hoping that your patch could be done independently of the limiting thing.
> And that we could go fix the limited-return in a future step.

Yeah, you are right.  The brigades are fine - the core filter isn't
doing enough IMHO.

> > What happens then is that we now enter a condition where the 
> > caller may not be able to handle the data (i.e. I only wanted 10 
> > bytes you gave me 8192, oops).  This is fundamentally incorrect - 
> > changing scopes doesn't seem to be the answer.  -- justin
> 
> All right. If you're willing to tackle the limited return, then that would
> seem to be a good first step. The second step would be the patch you just
> posted.
> 
> IOW, could we get "limited return" first, then the reorg patch second? Or
> are the two things too comingled right now?

In theory, I think you could, but the limited return means shifting
stuff from http to core.  Then, the http filter needs to be rewritten
to handle the core doing the "right" thing now.  Since that is the
case, it seems logical to also remove dechunk by merging the two.

I've got my tree building with the new core filter.  I'll test it 
with httpd-test and flood.  If it looks good (I think it will!), 
I'll post and we can begin reviewing it.

If you think the posted patch is too much, I can see what I can 
do about splitting the two.  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Sep 24, 2001 at 11:58:40AM -0700, Justin Erenkrantz wrote:
> On Mon, Sep 24, 2001 at 01:32:02AM -0700, Greg Stein wrote:
> > Eww. No... just move the filter back to the connection and you're fine. We
> > can make it a request filter later on.
> 
> Actually, you can't make it a connection-based filter because there
> is no way to retrieve any request information (such as determining
> what the body type is) from a connection filter.  So, we can't merge 
> dechunk and http_in unless we have access to the current request.  
> This means that HTTP_IN must be a request filter in order to get it
> to understand HTTP.

Hmm. All right...

> What is the problem with using the pool's userdata?  I don't see
> the "Eww" that you guys do.  Or, what is the problem with getting 
> the socket to behave appropriately (i.e. I want 10 bytes, give me
> 10 not 8192)?

It is a hack. I see it as using the userdata as a global variable. You drop
off some data and go pick it back up later on. The code and data flow is
obtuse and non-obvious. When somebody sees the read or write to the
userdata, they're going to go, "what is going on?"

> The brigades should not be returning more data than is asked for 
> ever.

Fix the wording: not the brigades but the *filters*.

And yes: I totally agree. I've said that a few times recently :-) I was just
hoping that your patch could be done independently of the limiting thing.
And that we could go fix the limited-return in a future step.

> What happens then is that we now enter a condition where the 
> caller may not be able to handle the data (i.e. I only wanted 10 
> bytes you gave me 8192, oops).  This is fundamentally incorrect - 
> changing scopes doesn't seem to be the answer.  -- justin

All right. If you're willing to tackle the limited return, then that would
seem to be a good first step. The second step would be the patch you just
posted.

IOW, could we get "limited return" first, then the reorg patch second? Or
are the two things too comingled right now?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 24 Sep 2001, Justin Erenkrantz wrote:

> The brigades should not be returning more data than is asked for
> ever.

Right.  But the only increment the API allows to ask for are one whole
bucket's worth at a time.

> What happens then is that we now enter a condition where the
> caller may not be able to handle the data (i.e. I only wanted 10
> bytes you gave me 8192, oops).

The filters should be able to handle this.  Like I said before, if you
want 10 bytes, fine... just do this:

if (apr_bucket_split(b, 10) == APR_ENOTIMPL) {
    /* bucket cannot be split natively */
    apr_bucket_read(b,str,&len,APR_BLOCK_READ); /* check for errors */
    apr_bucket_split(b, 10);                    /* check for errors */
}
apr_bucket_read(b,str,&len,block);              /* check for errors */

I guarantee you len will come back as 10 every time, because b is now 10
bytes and b->next is all the rest.

[I still wish we allowed pipe and socket buckets to be split natively.  I
 know Ryan has strongly opposed that idea in the past, but I still think
 it should be possible.]

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Sep 24, 2001 at 01:32:02AM -0700, Greg Stein wrote:
> Eww. No... just move the filter back to the connection and you're fine. We
> can make it a request filter later on.

Actually, you can't make it a connection-based filter because there
is no way to retrieve any request information (such as determining
what the body type is) from a connection filter.  So, we can't merge 
dechunk and http_in unless we have access to the current request.  
This means that HTTP_IN must be a request filter in order to get it
to understand HTTP.

What is the problem with using the pool's userdata?  I don't see
the "Eww" that you guys do.  Or, what is the problem with getting 
the socket to behave appropriately (i.e. I want 10 bytes, give me
10 not 8192)?  

The brigades should not be returning more data than is asked for 
ever.  What happens then is that we now enter a condition where the 
caller may not be able to handle the data (i.e. I only wanted 10 
bytes you gave me 8192, oops).  This is fundamentally incorrect - 
changing scopes doesn't seem to be the answer.  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Sep 23, 2001 at 09:35:31PM -0700, Ryan Bloom wrote:
> On Sunday 23 September 2001 09:22 pm, Justin Erenkrantz wrote:
> > On Sun, Sep 23, 2001 at 08:34:00PM -0700, Ryan Bloom wrote:
>...
> I don't see this at all.  It is not possible to specify how much is read from the
> socket itself.  I just read through the bucket_socket code, and we always
> try to read APR_BUCKET_BUFF_SIZE.  This seems to mean that we are
> going to read more from the socket than we should, and the second request
> will be lost.

Ryan's quite right.

>...
> > register a cleanup with the request pool that setasides everything
> > that was read but not processed (i.e. all buckets except for the
> > socket) to now live as long as the connection pool.  The question
> > now becomes how do we retrieve this data back.  My first guess would
> > be to store it via a pool userdata option, but maybe we could shove
> > it back down into the core (which definitely breaks some abstractions).
> > So, I think I'm leaning towards the userdata - use "HTTP-IN-<tid>"
> > to specify the uniqueness should work.

Eww. No... just move the filter back to the connection and you're fine. We
can make it a request filter later on.

> No.  I fought against this VERY strenuously last time, and I will fight it again.

Me, too :-)

> This is why it needs to be a connection filter.  To keep the abstraction valid,
> the data being read from the socket is stored in the filter's context, the same
> way it is for the output filters.

For now, yes: a connection filter is appropriate. Later, when we can
guarantee that the request-level filter will never read past the end of the
request (i.e. it never requests more from the lower-level input filter),
then we can move it to a request filter (which, as I mentioned before has
interesting properties).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Sep 23, 2001 at 09:35:31PM -0700, Ryan Bloom wrote:
> > The only real question is what happens when readbytes is -1 - a
> > higher-level filter explicitly asks for everything.  We'd then do a
> > blocking read for the entire socket until it is exhausted.  I think
> > the current code would have problems with it as well anyway.  I'm not
> > really sure what we can do here.  Thoughts?  -- justin
> 
> The current code doesn't have problems with this, because we rely on
> the fact that the HTTP_IN filter understands HTTP.  It knows when the request
> is done. Anything else is invalid.

Please take a look at the code again.  ap_http_filter when -1 is
passed in as readbytes will read until the socket is exhausted
and returns the entire brigade to the higher-level filter with EOS.
This is potentially eating up a pipelined request.

If this is an incorrect interpretation, I will be delighted to learn
how this works like you say it does.  =)  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 23 Sep 2001, Ryan Bloom wrote:

> I don't see this at all.  It is not possible to specify how much is
> read from the socket itself.  I just read through the bucket_socket
> code, and we always try to read APR_BUCKET_BUFF_SIZE.  This seems to
> mean that we are going to read more from the socket than we should,
> and the second request will be lost.

Yeah... it was decided way back when that apr_bucket_read() should never
return partial content.  You get all of the data in the bucket when you
read from it, or as much as will fit in the bucket if the original data is
too big.  If you want less than that, split the bucket before reading it.

--Cliff


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Ryan Bloom <rb...@covalent.net>.
On Sunday 23 September 2001 09:22 pm, Justin Erenkrantz wrote:
> On Sun, Sep 23, 2001 at 08:34:00PM -0700, Ryan Bloom wrote:
> > I don't see how this works at all for pipelined requests.  You moved the
> > HTTP_IN filter from a connection filter to a request filter. But, the
> > only connection filter we have is the CORE_iN filter.  All that filter
> > does, is to pass the socket up to the HTTP_IN filter. The problem is that
> > HTTP_IN will be destroyed in between requests, so the second request will
> > be lost.  I am continuing to read the code, so I may see what I am
> > missing, but in the mean time, Justin, if you could explain that to me,
> > it would make my review go a lot faster.
>
> Yeah, that's the one tricky part about this patch.  Remember that we
> no longer read more than is explicitly asked for from the socket.
>
> If the request has a body, it must have Content-Length or
> Transfer-Encoding and we now know when to send EOS bucket in these cases.
> (In theory, I think you can do a POST with a Connection: Close -
> half-close the socket, but that's ineligible for pipelining anyway).  So
> in these cases, the extra information is left unread by ap_http_filter
> or in the core input filter (which really does nothing - the new request
> is just left unread at the socket).

I don't see this at all.  It is not possible to specify how much is read from the
socket itself.  I just read through the bucket_socket code, and we always
try to read APR_BUCKET_BUFF_SIZE.  This seems to mean that we are
going to read more from the socket than we should, and the second request
will be lost.

> You do seem to be right - when we read a line at a time
> (readbytes == 0), we could read too much.  However, I think we could
> register a cleanup with the request pool that setasides everything
> that was read but not processed (i.e. all buckets except for the
> socket) to now live as long as the connection pool.  The question
> now becomes how do we retrieve this data back.  My first guess would
> be to store it via a pool userdata option, but maybe we could shove
> it back down into the core (which definitely breaks some abstractions).
> So, I think I'm leaning towards the userdata - use "HTTP-IN-<tid>"
> to specify the uniqueness should work.

No.  I fought against this VERY strenuously last time, and I will fight it again.
This is why it needs to be a connection filter.  To keep the abstraction valid,
the data being read from the socket is stored in the filter's context, the same
way it is for the output filters.

> The only real question is what happens when readbytes is -1 - a
> higher-level filter explicitly asks for everything.  We'd then do a
> blocking read for the entire socket until it is exhausted.  I think
> the current code would have problems with it as well anyway.  I'm not
> really sure what we can do here.  Thoughts?  -- justin

The current code doesn't have problems with this, because we rely on
the fact that the HTTP_IN filter understands HTTP.  It knows when the request
is done. Anything else is invalid.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Sep 23, 2001 at 08:34:00PM -0700, Ryan Bloom wrote:
> I don't see how this works at all for pipelined requests.  You moved the HTTP_IN
> filter from a connection filter to a request filter. But, the only connection filter we
> have is the CORE_iN filter.  All that filter does, is to pass the socket up to the
> HTTP_IN filter. The problem is that HTTP_IN will be destroyed in between requests,
> so the second request will be lost.  I am continuing to read the code, so I may see
> what I am missing, but in the mean time, Justin, if you could explain that to me,
> it would make my review go a lot faster.

Yeah, that's the one tricky part about this patch.  Remember that we
no longer read more than is explicitly asked for from the socket.

If the request has a body, it must have Content-Length or 
Transfer-Encoding and we now know when to send EOS bucket in these cases.
(In theory, I think you can do a POST with a Connection: Close - 
half-close the socket, but that's ineligible for pipelining anyway).  So 
in these cases, the extra information is left unread by ap_http_filter 
or in the core input filter (which really does nothing - the new request
is just left unread at the socket).

You do seem to be right - when we read a line at a time 
(readbytes == 0), we could read too much.  However, I think we could 
register a cleanup with the request pool that setasides everything 
that was read but not processed (i.e. all buckets except for the
socket) to now live as long as the connection pool.  The question
now becomes how do we retrieve this data back.  My first guess would
be to store it via a pool userdata option, but maybe we could shove
it back down into the core (which definitely breaks some abstractions).
So, I think I'm leaning towards the userdata - use "HTTP-IN-<tid>" 
to specify the uniqueness should work.

When the request is completed, we re-enter ap_read_request which sets
up a new HTTP_IN filter.  On the first read, HTTP_IN asks the core
filter for the socket (notice the change here - core will now *always*
return the socket when asked) and ideally, the next request should
have been left unread in the socket (or saved via userdata - see
above).  And, HTTP_IN will now process the new request.

The only real question is what happens when readbytes is -1 - a 
higher-level filter explicitly asks for everything.  We'd then do a 
blocking read for the entire socket until it is exhausted.  I think 
the current code would have problems with it as well anyway.  I'm not 
really sure what we can do here.  Thoughts?  -- justin


Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Sep 23, 2001 at 09:27:58PM -0700, Ryan Bloom wrote:
> On Sunday 23 September 2001 09:21 pm, Greg Stein wrote:
>...
> > There is a create_request hook which inserts the HTTP_IN filter when the
> > request is created.
> >
> > This seems like a good idea: each instantiation of the HTTP_IN filter is
> > for *one* request and one request only. We can be much more sure that state
> > is not spanning requests.
> 
> No, this is a bad idea, and I still don't see how it works for pipelined requests.
> The problem is that it is impossible to make sure that you only read a specific
> amount from the socket.

Whoops. Hell ya... you're so right.

All right. Until CORE_IN can return only a specified amount, the new,
combined filter should remain a connection filter. The associated
per-request logic in this patch should go away. The code will put the socket
bucket (and other unread/unused data) into the filter's context, which will
survive until the next request.

Justin... want to try a third pass? :-)

btw, does the test suite actually try out persistent connections? There was
a problem with persistent connections for a good long while, until I fixed
it last week.

(while I like the flatten function, maybe we can leave that out of this
 patch for now; shoot for that one at another time, and replace similar uses
 across the board (rather than one spot))

In some future pass, we can reinstate the per-request behavior, but that is
a discussion for another time...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Ryan Bloom <rb...@covalent.net>.
On Sunday 23 September 2001 09:21 pm, Greg Stein wrote:

> > I don't see how this works at all for pipelined requests.  You moved the
> > HTTP_IN filter from a connection filter to a request filter. But, the
> > only connection filter we have is the CORE_iN filter.  All that filter
> > does, is to pass the socket up to the HTTP_IN filter. The problem is that
> > HTTP_IN will be destroyed in between requests, so the second request will
> > be lost.  I am continuing to read the code, so I may see what I am
> > missing, but in the mean time, Justin, if you could explain that to me,
> > it would make my review go a lot faster.
>
> There is a create_request hook which inserts the HTTP_IN filter when the
> request is created.
>
> This seems like a good idea: each instantiation of the HTTP_IN filter is
> for *one* request and one request only. We can be much more sure that state
> is not spanning requests.

No, this is a bad idea, and I still don't see how it works for pipelined requests.
The problem is that it is impossible to make sure that you only read a specific
amount from the socket.  So, the first request is likely to read too much from the
socket, and then it is lost, and the second request will fail to work.  This is why
the HTTP_IN filter needs to be a connection filter.  It is the place where we figure
out when we have hit the end of one request, and are ready for the next.

> Even better, this enables things like the Upgrade: header (RFC 2616,
> S14.42). One request specifies an upgrade, and the next request should be
> handled by a different protocol.

That is possible with the current code.

> * CORE_IN is returning a new socket bucket each time. That socket bucket
>   goes into a brigade associated with the request, so it gets destroyed
> with the request. Not really a problem since the actual socket is cleared
> by a pool, not by the bucket. When the next request asks for more data, it
> will get a new socket bucket. I don't think this is The Right Thing, but
> until we pass an "amount requested" parameter through the input filter
> chain, this will work quite fine.

I don't mind this too much, because of the free list that Cliff is implementing.

> I need to review the code some more. As Justin pointed out, the massive
> change to http_protocol.c makes that hard to review. Other notes may be
> coming soon...

I'm still reviewing too.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Sep 23, 2001 at 08:34:00PM -0700, Ryan Bloom wrote:
> On Sunday 23 September 2001 06:58 pm, Justin Erenkrantz wrote:
>...
> > Other significant changes is that the HTTP_IN (ap_http_filter) is
> > now a request-level filter rather than a connection-level filter.
> > This makes a lot more sense to me.

+1

>...
> I don't see how this works at all for pipelined requests.  You moved the HTTP_IN
> filter from a connection filter to a request filter. But, the only connection filter we
> have is the CORE_iN filter.  All that filter does, is to pass the socket up to the
> HTTP_IN filter. The problem is that HTTP_IN will be destroyed in between requests,
> so the second request will be lost.  I am continuing to read the code, so I may see
> what I am missing, but in the mean time, Justin, if you could explain that to me,
> it would make my review go a lot faster.

There is a create_request hook which inserts the HTTP_IN filter when the
request is created.

This seems like a good idea: each instantiation of the HTTP_IN filter is for
*one* request and one request only. We can be much more sure that state is
not spanning requests.

Even better, this enables things like the Upgrade: header (RFC 2616,
S14.42). One request specifies an upgrade, and the next request should be
handled by a different protocol.

[ of course, that still won't work since we'll continue inserting the
  HTTP_IN filter for each request, but making HTTP_IN a request filter takes
  us a step in the right direction. ]

Along the mantra of "an input filter should not return more than is asked
for", we can also be sure that the HTTP_IN filter is going to receive
exactly the amount of data that it asked for, and no more. Thus, the next
request will still be sitting in the next filter down.


Okay. Now all that said. Some random notes about the current code:

* CORE_IN is returning a new socket bucket each time. That socket bucket
  goes into a brigade associated with the request, so it gets destroyed with
  the request. Not really a problem since the actual socket is cleared by a
  pool, not by the bucket. When the next request asks for more data, it will
  get a new socket bucket. I don't think this is The Right Thing, but until
  we pass an "amount requested" parameter through the input filter chain,
  this will work quite fine.

* The uses of ap_getline are destined to eventual failure, which I've noted
  before. They always read from the top of the filter stack -- even if you
  have a mid-level filter (HTTP_IN) trying to read a line(!). This is solved
  again by passing "amount requested" down thru the input chain, where one
  of the "amounts" is a qualified constant saying "one line". We support
  that read-request mode through a brigade function to read a line into a
  pool-allocated buffer, into a provided buffer, or maybe into a new brigade
  (e.g. split a brigade at the newline point, and return the initial
  brigade).


I need to review the code some more. As Justin pointed out, the massive
change to http_protocol.c makes that hard to review. Other notes may be
coming soon...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Ryan Bloom <rb...@covalent.net>.
On Sunday 23 September 2001 06:58 pm, Justin Erenkrantz wrote:
> Take two.
>
> As Greg suggested, this merges the dechunk and core filters back
> to one filter.  (This makes the diff in http_protocol.c a bit
> gnarly.)
>
> Other significant changes is that the HTTP_IN (ap_http_filter) is
> now a request-level filter rather than a connection-level filter.
> This makes a lot more sense to me.
>
> This passes all of the tests in httpd-test (not to say that this
> is perfect or anything).  There is a patch at the bottom for
> mod_input_body_filter.c in httpd-test that is required.  (It was
> expecting > readbytes to be read when asking for readbytes).
>
> Comments and feedback?  -- justin

I don't see how this works at all for pipelined requests.  You moved the HTTP_IN
filter from a connection filter to a request filter. But, the only connection filter we
have is the CORE_iN filter.  All that filter does, is to pass the socket up to the
HTTP_IN filter. The problem is that HTTP_IN will be destroyed in between requests,
so the second request will be lost.  I am continuing to read the code, so I may see
what I am missing, but in the mean time, Justin, if you could explain that to me,
it would make my review go a lot faster.

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: [PATCH] Take 2 of the http filter rewrite

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 23 Sep 2001, Justin Erenkrantz wrote:

> Index: buckets/apr_buckets_pipe.c
> ===================================================================
> RCS file: /home/cvs/apr-util/buckets/apr_buckets_pipe.c,v
> retrieving revision 1.41
> diff -u -r1.41 apr_buckets_pipe.c
> --- buckets/apr_buckets_pipe.c	2001/09/22 22:36:07	1.41
> +++ buckets/apr_buckets_pipe.c	2001/09/24 01:48:04
> @@ -110,10 +110,6 @@
>          *str = a->data;
>          if (rv == APR_EOF) {
>              apr_file_close(p);
> -            if (block == APR_NONBLOCK_READ) {
> -                /* XXX: this is bogus, should return APR_SUCCESS */
> -                return APR_EOF;
> -            }
>          }
>      }
>      return APR_SUCCESS;

PS: ++1 to getting rid of these damned things.  Good riddance!  =-)

We should grep through the code after this is committed to look for places
that actually accounted for apr_bucket_read() returning APR_EOF, since
they shouldn't have to do that.  I know there are at least a few of them
hanging around out there (maybe the chunk output filter?).

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA