You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by br...@apache.org on 2006/01/01 00:45:14 UTC

svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Author: brianp
Date: Sat Dec 31 15:45:11 2005
New Revision: 360461

URL: http://svn.apache.org/viewcvs?rev=360461&view=rev
Log:
Refactoring of ap_read_request() to store partial request state
in the request rec.  The point of this is to allow asynchronous
MPMs do do nonblocking reads of requests.  (Backported from the
async-read-dev branch)

Modified:
    httpd/httpd/trunk/CHANGES
    httpd/httpd/trunk/include/ap_mmn.h
    httpd/httpd/trunk/include/httpd.h
    httpd/httpd/trunk/server/protocol.c

Modified: httpd/httpd/trunk/CHANGES
URL: http://svn.apache.org/viewcvs/httpd/httpd/trunk/CHANGES?rev=360461&r1=360460&r2=360461&view=diff
==============================================================================
--- httpd/httpd/trunk/CHANGES [utf-8] (original)
+++ httpd/httpd/trunk/CHANGES [utf-8] Sat Dec 31 15:45:11 2005
@@ -2,6 +2,9 @@
 Changes with Apache 2.3.0
   [Remove entries to the current 2.0 and 2.2 section below, when backported]
 
+  *) Refactored ap_read_request() to provide a foundation for
+     nonblocking reads of requests.  [Brian Pane]
+
   *) If a connection is aborted while waiting for a chunked line, flag the
      connection as errored out.  [Justin Erenkrantz]
 

Modified: httpd/httpd/trunk/include/ap_mmn.h
URL: http://svn.apache.org/viewcvs/httpd/httpd/trunk/include/ap_mmn.h?rev=360461&r1=360460&r2=360461&view=diff
==============================================================================
--- httpd/httpd/trunk/include/ap_mmn.h (original)
+++ httpd/httpd/trunk/include/ap_mmn.h Sat Dec 31 15:45:11 2005
@@ -109,12 +109,14 @@
  * 20051005.0 (2.1.8-dev) NET_TIME filter eliminated
  * 20051005.0 (2.3.0-dev) Bump MODULE_MAGIC_COOKIE to "AP24"!
  * 20051115.0 (2.3.0-dev) Added use_canonical_phys_port to core_dir_config
+ * 20051231.0 (2.3.0-dev) Added num_blank_lines, pending_header_line, and
+ *                        pending_header_size to request_rec
  */
 
 #define MODULE_MAGIC_COOKIE 0x41503234UL /* "AP24" */
 
 #ifndef MODULE_MAGIC_NUMBER_MAJOR
-#define MODULE_MAGIC_NUMBER_MAJOR 20051115
+#define MODULE_MAGIC_NUMBER_MAJOR 20051231
 #endif
 #define MODULE_MAGIC_NUMBER_MINOR 0                     /* 0...n */
 

Modified: httpd/httpd/trunk/include/httpd.h
URL: http://svn.apache.org/viewcvs/httpd/httpd/trunk/include/httpd.h?rev=360461&r1=360460&r2=360461&view=diff
==============================================================================
--- httpd/httpd/trunk/include/httpd.h (original)
+++ httpd/httpd/trunk/include/httpd.h Sat Dec 31 15:45:11 2005
@@ -967,6 +967,16 @@
     /** A flag to determine if the eos bucket has been sent yet */
     int eos_sent;
 
+    /** Number of leading blank lines encountered before request header */
+    int num_blank_lines;
+
+    /** A buffered header line, used to support header folding while
+     *  reading the request */
+    char *pending_header_line;
+
+    /** If nonzero, the number of bytes allocated to hold pending_header_line */
+    apr_size_t pending_header_size;
+
 /* Things placed at the end of the record to avoid breaking binary
  * compatibility.  It would be nice to remember to reorder the entire
  * record to improve 64bit alignment the next time we need to break

Modified: httpd/httpd/trunk/server/protocol.c
URL: http://svn.apache.org/viewcvs/httpd/httpd/trunk/server/protocol.c?rev=360461&r1=360460&r2=360461&view=diff
==============================================================================
--- httpd/httpd/trunk/server/protocol.c (original)
+++ httpd/httpd/trunk/server/protocol.c Sat Dec 31 15:45:11 2005
@@ -543,80 +543,20 @@
     }
 }
 
-static int read_request_line(request_rec *r, apr_bucket_brigade *bb)
+static void set_the_request(request_rec *r, char *line)
 {
     const char *ll;
     const char *uri;
     const char *pro;
-
-#if 0
-    conn_rec *conn = r->connection;
-#endif
     int major = 1, minor = 0;   /* Assume HTTP/1.0 if non-"HTTP" protocol */
     char http[5];
     apr_size_t len;
-    int num_blank_lines = 0;
-    int max_blank_lines = r->server->limit_req_fields;
-
-    if (max_blank_lines <= 0) {
-        max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
-    }
-
-    /* Read past empty lines until we get a real request line,
-     * a read error, the connection closes (EOF), or we timeout.
-     *
-     * We skip empty lines because browsers have to tack a CRLF on to the end
-     * of POSTs to support old CERN webservers.  But note that we may not
-     * have flushed any previous response completely to the client yet.
-     * We delay the flush as long as possible so that we can improve
-     * performance for clients that are pipelining requests.  If a request
-     * is pipelined then we won't block during the (implicit) read() below.
-     * If the requests aren't pipelined, then the client is still waiting
-     * for the final buffer flush from us, and we will block in the implicit
-     * read().  B_SAFEREAD ensures that the BUFF layer flushes if it will
-     * have to block during a read.
-     */
-
-    do {
-        apr_status_t rv;
-
-        /* insure ap_rgetline allocates memory each time thru the loop
-         * if there are empty lines
-         */
-        r->the_request = NULL;
-        rv = ap_rgetline(&(r->the_request), (apr_size_t)(r->server->limit_req_line + 2),
-                         &len, r, 0, bb);
-
-        if (rv != APR_SUCCESS) {
-            r->request_time = apr_time_now();
-
-            /* ap_rgetline returns APR_ENOSPC if it fills up the
-             * buffer before finding the end-of-line.  This is only going to
-             * happen if it exceeds the configured limit for a request-line.
-             */
-            if (rv == APR_ENOSPC) {
-                r->status    = HTTP_REQUEST_URI_TOO_LARGE;
-                r->proto_num = HTTP_VERSION(1,0);
-                r->protocol  = apr_pstrdup(r->pool, "HTTP/1.0");
-            }
-            return 0;
-        }
-    } while ((len <= 0) && (++num_blank_lines < max_blank_lines));
-
-    /* we've probably got something to do, ignore graceful restart requests */
+    apr_socket_t *csd;
 
     r->request_time = apr_time_now();
-    ll = r->the_request;
+    ll = r->the_request = line;
     r->method = ap_getword_white(r->pool, &ll);
 
-#if 0
-/* XXX If we want to keep track of the Method, the protocol module should do
- * it.  That support isn't in the scoreboard yet.  Hopefully next week
- * sometime.   rbb */
-    ap_update_connection_status(AP_CHILD_THREAD_FROM_ID(conn->id), "Method",
-                                r->method);
-#endif
-
     uri = ap_getword_white(r->pool, &ll);
 
     /* Provide quick information about the request method as soon as known */
@@ -639,8 +579,6 @@
     }
     r->protocol = apr_pstrmemdup(r->pool, pro, len);
 
-    /* XXX ap_update_connection_status(conn->id, "Protocol", r->protocol); */
-
     /* Avoid sscanf in the common case */
     if (len == 8
         && pro[0] == 'H' && pro[1] == 'T' && pro[2] == 'T' && pro[3] == 'P'
@@ -650,24 +588,172 @@
     }
     else if (3 == sscanf(r->protocol, "%4s/%u.%u", http, &major, &minor)
              && (strcasecmp("http", http) == 0)
-             && (minor < HTTP_VERSION(1, 0)) ) /* don't allow HTTP/0.1000 */
+             && (minor < HTTP_VERSION(1, 0)) ) { /* don't allow HTTP/0.1000 */
         r->proto_num = HTTP_VERSION(major, minor);
-    else
+    }
+    else {
         r->proto_num = HTTP_VERSION(1, 0);
+    }
 
-    return 1;
+    /* We may have been in keep_alive_timeout mode, so toggle back
+     * to the normal timeout mode as we fetch the header lines,
+     * as necessary.
+     */
+    csd = ap_get_module_config(r->connection->conn_config, &core_module);
+    apr_socket_timeout_set(csd, r->connection->base_server->timeout);
+
+    return;
+}
+
+static apr_status_t set_mime_header(request_rec *r, char *line)
+{
+    char *value, *tmp_field;
+
+    if (r->server->limit_req_fields) {
+        const apr_array_header_t *mime_headers = apr_table_elts(r->headers_in);
+        if (mime_headers->nelts >= r->server->limit_req_fields) {
+            apr_table_setn(r->notes, "error-notes",
+                           "The number of request header fields "
+                           "exceeds this server's limit.");
+            r->status = HTTP_BAD_REQUEST;
+            return APR_ENOSPC;
+        }
+    }
+    
+    if (!(value = strchr(line, ':'))) { /* Find ':' or    */
+        r->status = HTTP_BAD_REQUEST;      /* abort bad request */
+        apr_table_setn(r->notes, "error-notes",
+                       apr_pstrcat(r->pool,
+                                   "Request header field is "
+                                   "missing ':' separator.<br />\n"
+                                   "<pre>\n",
+                                   ap_escape_html(r->pool, line),
+                                   "</pre>\n", NULL));
+        return APR_EGENERAL;
+    }
+
+    tmp_field = value - 1; /* last character of field-name */
+
+    *value++ = '\0'; /* NUL-terminate at colon */
+
+    while (*value == ' ' || *value == '\t') {
+        ++value;            /* Skip to start of value   */
+    }
+
+    /* Strip LWS after field-name: */
+    while (tmp_field > line
+           && (*tmp_field == ' ' || *tmp_field == '\t')) {
+        *tmp_field-- = '\0';
+    }
+
+    /* Strip LWS after field-value: */
+    tmp_field = strchr(value, '\0') - 1;
+    while (tmp_field > value
+           && (*tmp_field == ' ' || *tmp_field == '\t')) {
+        *tmp_field-- = '\0';
+    }
+
+    apr_table_addn(r->headers_in, line, value);
+    return APR_SUCCESS;
+}
+
+static apr_status_t process_request_line(request_rec *r, char *line,
+                                         int skip_first)
+{
+    if (!skip_first && (r->the_request == NULL)) {
+        /* This is the first line of the request */
+        if ((line == NULL) || (*line == '\0')) {
+            /* We skip empty lines because browsers have to tack a CRLF on to the end
+             * of POSTs to support old CERN webservers.
+             */
+            int max_blank_lines = r->server->limit_req_fields;
+            if (max_blank_lines <= 0) {
+                max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
+            }
+            r->num_blank_lines++;
+            if (r->num_blank_lines < max_blank_lines) {
+                return APR_SUCCESS;
+            }
+        }
+        set_the_request(r, line);
+    }
+    else {
+        /* We've already read the first line of the request.  This is either
+         * a header field or the blank line terminating the header
+         */
+        if ((line == NULL) || (*line == '\0')) {
+            if (r->pending_header_line != NULL) {
+                apr_status_t rv = set_mime_header(r, r->pending_header_line);
+                if (rv != APR_SUCCESS) {
+                    return rv;
+                }
+                r->pending_header_line = NULL;
+            }
+            if (r->status == HTTP_REQUEST_TIME_OUT) {
+                apr_table_compress(r->headers_in, APR_OVERLAP_TABLES_MERGE);
+                r->status = HTTP_OK;
+            }
+        }
+        else {
+            if ((*line == ' ') || (*line == '\t')) {
+                /* This line is a continuation of the previous one */
+                if (r->pending_header_line == NULL) {
+                    r->pending_header_line = line;
+                    r->pending_header_size = 0;
+                }
+                else {
+                    apr_size_t pending_len = strlen(r->pending_header_line);
+                    apr_size_t fold_len = strlen(line);
+                    if (pending_len + fold_len >
+                        r->server->limit_req_fieldsize) {
+                        /* CVE-2004-0942 */
+                        r->status = HTTP_BAD_REQUEST;
+                        return APR_ENOSPC;
+                    }
+                    if (pending_len + fold_len + 1 > r->pending_header_size) {
+                        /* Allocate a new buffer big enough to hold the
+                         * concatenated lines
+                         */
+                        apr_size_t new_size = r->pending_header_size;
+                        char *new_buf;
+                        if (new_size == 0) {
+                            new_size = pending_len + fold_len + 1;
+                        }
+                        else {
+                            do {
+                                new_size *= 2;
+                            } while (new_size < pending_len + fold_len + 1);
+                        }
+                        new_buf = (char *)apr_palloc(r->pool, new_size);
+                        memcpy(new_buf, r->pending_header_line, pending_len);
+                        r->pending_header_line = new_buf;
+                        r->pending_header_size = new_size;
+                    }
+                    memcpy(r->pending_header_line + pending_len, line,
+                           fold_len + 1);
+                }
+            }
+            else {
+                /* Set aside this line in case the next line is a continuation
+                 */
+                if (r->pending_header_line != NULL) {
+                    apr_status_t rv = set_mime_header(r, r->pending_header_line);
+                    if (rv != APR_SUCCESS) {
+                        return rv;
+                    }
+                }
+                r->pending_header_line = line;
+                r->pending_header_size = 0;
+            }
+        }
+    }
+    return APR_SUCCESS;
 }
 
 AP_DECLARE(void) ap_get_mime_headers_core(request_rec *r, apr_bucket_brigade *bb)
 {
-    char *last_field = NULL;
-    apr_size_t last_len = 0;
-    apr_size_t alloc_len = 0;
     char *field;
-    char *value;
     apr_size_t len;
-    int fields_read = 0;
-    char *tmp_field;
 
     /*
      * Read header lines until we get the empty separator line, a read error,
@@ -675,7 +761,6 @@
      */
     while(1) {
         apr_status_t rv;
-        int folded = 0;
 
         field = NULL;
         rv = ap_rgetline(&field, r->server->limit_req_fieldsize + 2,
@@ -701,120 +786,14 @@
             }
             return;
         }
-
-        if (last_field != NULL) {
-            if ((len > 0) && ((*field == '\t') || *field == ' ')) {
-                /* This line is a continuation of the preceding line(s),
-                 * so append it to the line that we've set aside.
-                 * Note: this uses a power-of-two allocator to avoid
-                 * doing O(n) allocs and using O(n^2) space for
-                 * continuations that span many many lines.
-                 */
-                apr_size_t fold_len = last_len + len + 1; /* trailing null */
-
-                if (fold_len >= (apr_size_t)(r->server->limit_req_fieldsize)) {
-                    r->status = HTTP_BAD_REQUEST;
-                    /* report what we have accumulated so far before the
-                     * overflow (last_field) as the field with the problem
-                     */
-                    apr_table_setn(r->notes, "error-notes",
-                                   apr_pstrcat(r->pool,
-                                               "Size of a request header field "
-                                               "after folding "
-                                               "exceeds server limit.<br />\n"
-                                               "<pre>\n",
-                                               ap_escape_html(r->pool, last_field),
-                                               "</pre>\n", NULL));
-                    return;
-                }
-
-                if (fold_len > alloc_len) {
-                    char *fold_buf;
-                    alloc_len += alloc_len;
-                    if (fold_len > alloc_len) {
-                        alloc_len = fold_len;
-                    }
-                    fold_buf = (char *)apr_palloc(r->pool, alloc_len);
-                    memcpy(fold_buf, last_field, last_len);
-                    last_field = fold_buf;
-                }
-                memcpy(last_field + last_len, field, len +1); /* +1 for nul */
-                last_len += len;
-                folded = 1;
-            }
-            else /* not a continuation line */ {
-
-                if (r->server->limit_req_fields
-                    && (++fields_read > r->server->limit_req_fields)) {
-                    r->status = HTTP_BAD_REQUEST;
-                    apr_table_setn(r->notes, "error-notes",
-                                   "The number of request header fields "
-                                   "exceeds this server's limit.");
-                    return;
-                }
-
-                if (!(value = strchr(last_field, ':'))) { /* Find ':' or    */
-                    r->status = HTTP_BAD_REQUEST;      /* abort bad request */
-                    apr_table_setn(r->notes, "error-notes",
-                                   apr_pstrcat(r->pool,
-                                               "Request header field is "
-                                               "missing ':' separator.<br />\n"
-                                               "<pre>\n",
-                                               ap_escape_html(r->pool,
-                                                              last_field),
-                                               "</pre>\n", NULL));
-                    return;
-                }
-
-                tmp_field = value - 1; /* last character of field-name */
-
-                *value++ = '\0'; /* NUL-terminate at colon */
-
-                while (*value == ' ' || *value == '\t') {
-                    ++value;            /* Skip to start of value   */
-                }
-
-                /* Strip LWS after field-name: */
-                while (tmp_field > last_field
-                       && (*tmp_field == ' ' || *tmp_field == '\t')) {
-                    *tmp_field-- = '\0';
-                }
-
-                /* Strip LWS after field-value: */
-                tmp_field = last_field + last_len - 1;
-                while (tmp_field > value
-                       && (*tmp_field == ' ' || *tmp_field == '\t')) {
-                    *tmp_field-- = '\0';
-                }
-
-                apr_table_addn(r->headers_in, last_field, value);
-
-                /* reset the alloc_len so that we'll allocate a new
-                 * buffer if we have to do any more folding: we can't
-                 * use the previous buffer because its contents are
-                 * now part of r->headers_in
-                 */
-                alloc_len = 0;
-
-            } /* end if current line is not a continuation starting with tab */
-        }
-
-        /* Found a blank line, stop. */
-        if (len == 0) {
-            break;
+        rv = process_request_line(r, field, 1);
+        if (rv != APR_SUCCESS) {
+            return;
         }
-
-        /* Keep track of this line so that we can parse it on
-         * the next loop iteration.  (In the folded case, last_field
-         * has been updated already.)
-         */
-        if (!folded) {
-            last_field = field;
-            last_len = len;
+        if ((field == NULL) || (*field == '\0')) {
+            return;
         }
     }
-
-    apr_table_compress(r->headers_in, APR_OVERLAP_TABLES_MERGE);
 }
 
 AP_DECLARE(void) ap_get_mime_headers(request_rec *r)
@@ -825,16 +804,11 @@
     apr_brigade_destroy(tmp_bb);
 }
 
-request_rec *ap_read_request(conn_rec *conn)
+static request_rec *init_request(conn_rec *conn)
 {
     request_rec *r;
     apr_pool_t *p;
-    const char *expect;
-    int access_status;
-    apr_bucket_brigade *tmp_bb;
-    apr_socket_t *csd;
-    apr_interval_time_t cur_timeout;
-
+    
     apr_pool_create(&p, conn->pool);
     apr_pool_tag(p, "request");
     r = apr_pcalloc(p, sizeof(request_rec));
@@ -876,77 +850,14 @@
      */
     r->used_path_info = AP_REQ_DEFAULT_PATH_INFO;
 
-    tmp_bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
-
-    /* Get the request... */
-    if (!read_request_line(r, tmp_bb)) {
-        if (r->status == HTTP_REQUEST_URI_TOO_LARGE) {
-            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
-                          "request failed: URI too long (longer than %d)", r->server->limit_req_line);
-            ap_send_error_response(r, 0);
-            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
-            ap_run_log_transaction(r);
-            apr_brigade_destroy(tmp_bb);
-            return r;
-        }
-
-        apr_brigade_destroy(tmp_bb);
-        return NULL;
-    }
-
-    /* We may have been in keep_alive_timeout mode, so toggle back
-     * to the normal timeout mode as we fetch the header lines,
-     * as necessary.
-     */
-    csd = ap_get_module_config(conn->conn_config, &core_module);
-    apr_socket_timeout_get(csd, &cur_timeout);
-    if (cur_timeout != conn->base_server->timeout) {
-        apr_socket_timeout_set(csd, conn->base_server->timeout);
-        cur_timeout = conn->base_server->timeout;
-    }
-
-    if (!r->assbackwards) {
-        ap_get_mime_headers_core(r, tmp_bb);
-        if (r->status != HTTP_REQUEST_TIME_OUT) {
-            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
-                          "request failed: error reading the headers");
-            ap_send_error_response(r, 0);
-            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
-            ap_run_log_transaction(r);
-            apr_brigade_destroy(tmp_bb);
-            return r;
-        }
-
-        if (apr_table_get(r->headers_in, "Transfer-Encoding")
-            && apr_table_get(r->headers_in, "Content-Length")) {
-            /* 2616 section 4.4, point 3: "if both Transfer-Encoding
-             * and Content-Length are received, the latter MUST be
-             * ignored"; so unset it here to prevent any confusion
-             * later. */
-            apr_table_unset(r->headers_in, "Content-Length");
-        }
-    }
-    else {
-        if (r->header_only) {
-            /*
-             * Client asked for headers only with HTTP/0.9, which doesn't send
-             * headers! Have to dink things just to make sure the error message
-             * comes through...
-             */
-            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
-                          "client sent invalid HTTP/0.9 request: HEAD %s",
-                          r->uri);
-            r->header_only = 0;
-            r->status = HTTP_BAD_REQUEST;
-            ap_send_error_response(r, 0);
-            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
-            ap_run_log_transaction(r);
-            apr_brigade_destroy(tmp_bb);
-            return r;
-        }
-    }
+    return r;
+}
 
-    apr_brigade_destroy(tmp_bb);
+static request_rec *request_post_read(request_rec *r, conn_rec *conn)
+{
+    apr_socket_t *csd;
+    const char *expect;
+    int access_status;
 
     r->status = HTTP_OK;                         /* Until further notice. */
 
@@ -958,9 +869,17 @@
     /* Toggle to the Host:-based vhost's timeout mode to fetch the
      * request body and send the response body, if needed.
      */
-    if (cur_timeout != r->server->timeout) {
-        apr_socket_timeout_set(csd, r->server->timeout);
-        cur_timeout = r->server->timeout;
+    csd = ap_get_module_config(conn->conn_config, &core_module);
+    apr_socket_timeout_set(csd, r->server->timeout);
+
+    if (apr_table_get(r->headers_in, "Transfer-Encoding") &&
+        apr_table_get(r->headers_in, "Content-Length")) {
+        /* 2616 section 4.4, point 3: "if both Transfer-Encoding
+         * and Content-Length are received, the latter MUST be
+         * ignored"; so unset it here to prevent any confusion
+         * later.
+         */
+        apr_table_unset(r->headers_in, "Content-Length");
     }
 
     /* we may have switched to another server */
@@ -1023,8 +942,102 @@
 
     ap_add_input_filter_handle(ap_http_input_filter_handle,
                                NULL, r, r->connection);
-
     return r;
+}
+
+static apr_status_t read_partial_request(request_rec *r) {
+    apr_bucket_brigade *tmp_bb;
+    apr_status_t rv = APR_SUCCESS;
+
+    /* Read and process lines of the request until we
+     * encounter a complete request header, an error, or EAGAIN
+     */
+    tmp_bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
+    while (r->status == HTTP_REQUEST_TIME_OUT) {
+        char *line = NULL;
+        apr_size_t line_length;
+        apr_size_t length_limit;
+        int first_line = (r->the_request == NULL);
+        if (first_line) {
+            length_limit = r->server->limit_req_line;
+        }
+        else {
+            if (r->assbackwards) {
+                r->status = HTTP_OK;
+                break;
+            }
+            length_limit = r->server->limit_req_fieldsize;
+        }
+        /* TODO: use a nonblocking call in place of ap_rgetline */
+        rv = ap_rgetline(&line, length_limit + 2,
+                         &line_length, r, 0, tmp_bb);
+        if (rv == APR_SUCCESS) {
+            rv = process_request_line(r, line, 0);
+        }
+        if (rv != APR_SUCCESS) {
+            r->request_time = apr_time_now();
+            /* ap_rgetline returns APR_ENOSPC if it fills up the
+             * buffer before finding the end-of-line.  This is only going to
+             * happen if it exceeds the configured limit for a request-line.
+             */
+            if (rv == APR_ENOSPC) {
+                if (first_line) {
+                    ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                                  "request failed: URI too long (longer than %d)",
+                                  r->server->limit_req_line);
+                    r->status    = HTTP_REQUEST_URI_TOO_LARGE;
+                    r->proto_num = HTTP_VERSION(1,0);
+                    r->protocol  = apr_pstrdup(r->pool, "HTTP/1.0");
+                }
+                else {
+                    ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                                  "request failed: header line too long "
+                                  "(longer than %d)",
+                                  r->server->limit_req_fieldsize);
+                    r->status = HTTP_BAD_REQUEST;
+                }
+            }
+            break;
+        }
+    }
+    apr_brigade_destroy(tmp_bb);
+    return rv;
+}
+
+request_rec *ap_read_request(conn_rec *conn)
+{
+    request_rec *r;
+    apr_status_t rv;
+
+    r = init_request(conn);
+
+    rv = read_partial_request(r);
+    /* TODO poll if EAGAIN */
+    if (r->status != HTTP_OK) {
+        ap_send_error_response(r, 0);
+        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
+        ap_run_log_transaction(r);
+        return NULL;
+    }
+
+    if (r->assbackwards && r->header_only) {
+        /*
+         * Client asked for headers only with HTTP/0.9, which doesn't send
+         * headers! Have to dink things just to make sure the error message
+         * comes through...
+         */
+        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+                      "client sent invalid HTTP/0.9 request: HEAD %s",
+                      r->uri);
+        r->header_only = 0;
+        r->status = HTTP_BAD_REQUEST;
+        ap_send_error_response(r, 0);
+        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
+        ap_run_log_transaction(r);
+        return r;
+    }
+
+    return request_post_read(r, conn);
 }
 
 /* if a request with a body creates a subrequest, clone the original request's



async reads - request for comments Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
I'll probably have time to work on the changes to ap_core_input_filter 
and/or
the async read code this weekend (starting on the async-read-dev branch).
If anyone has a strong argument for or against putting the buffering of
partial lines inside the ap_core_input_filter context, please speak up.

Thanks,
Brian

On Jan 7, 2006, at 11:57 PM, Brian Pane wrote:

> On Jan 3, 2006, at 8:07 AM, Justin Erenkrantz wrote:
>
>> AFAICT, ap_read_async_request() on the branch can't handle a partial
>> line correctly.
>>
>> Noting of course that ap_core_input_filter is 'cute' and masks EAGAIN.
>>  So, you'll never see EAGAIN from this code path!  As I said earlier,
>> the EAGAIN logic in httpd is completely suspect.
>>
>> Furthermore, as I read it, ap_read_async_request is assuming that it
>> gets a complete line from getline_nonblocking - which almost certainly
>> won't be the case.  -- justin
>
>
> I'm currently working on changing ap_core_input_filter so that it
> doesn't mask the EAGAIN in AP_MODE_GETLINE mode.  There's
> a catch, though:
>
>     if (mode == AP_MODE_GETLINE) {
>         /* we are reading a single LF line, e.g. the HTTP headers */
>         rv = apr_brigade_split_line(b, ctx->b, block, HUGE_STRING_LEN);
>
> if apr_brigade_split_line returns APR_EAGAIN, it will have
> consumed the partial line and put it in b.  So if core_input_filter
> returns rv at this point, the caller will receive the partial line and
> EAGAIN.  We'll need to do one of two things:
>
> - Buffer the partial line in ap_core_input_filter, by removing the
>   buckets from b and putting them back at the start of ctx->b.
>
> - Or buffer the partial line in getline_nonblocking (or  
> read_partial_request).
>
> Anybody have a preference among these options?
>
> Brian
>


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
On Jan 3, 2006, at 8:07 AM, Justin Erenkrantz wrote:

> AFAICT, ap_read_async_request() on the branch can't handle a partial
> line correctly.
>
> Noting of course that ap_core_input_filter is 'cute' and masks EAGAIN.
>  So, you'll never see EAGAIN from this code path!  As I said earlier,
> the EAGAIN logic in httpd is completely suspect.
>
> Furthermore, as I read it, ap_read_async_request is assuming that it
> gets a complete line from getline_nonblocking - which almost certainly
> won't be the case.  -- justin

I'm currently working on changing ap_core_input_filter so that it
doesn't mask the EAGAIN in AP_MODE_GETLINE mode.  There's
a catch, though:

     if (mode == AP_MODE_GETLINE) {
         /* we are reading a single LF line, e.g. the HTTP headers */
         rv = apr_brigade_split_line(b, ctx->b, block, HUGE_STRING_LEN);

if apr_brigade_split_line returns APR_EAGAIN, it will have
consumed the partial line and put it in b.  So if core_input_filter
returns rv at this point, the caller will receive the partial line and
EAGAIN.  We'll need to do one of two things:

- Buffer the partial line in ap_core_input_filter, by removing the
   buckets from b and putting them back at the start of ctx->b.

- Or buffer the partial line in getline_nonblocking (or  
read_partial_request).

Anybody have a preference among these options?

Brian


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 1/3/06, Brian Pane <br...@apache.org> wrote:

> Yeah, setting r->status to HTTP_OK is done here solely to make it work
> with the existing logic about HTTP_REQUEST_TIME_OUT meaning "still
> reading the request header."
>
> +1 for of removing the HTTP_REQUEST_TIME_OUT hack.  I was trying
> to be conservative by preserving that part of the original logic, but
> now
> that you mention it, we might as well replace it with something less
> subtle
> in 2.3.  This will break any 3rd party modules that depend upon the
> HTTP_REQUEST_TIME_OUT convention, but for a major release
> like 2.4 or 3.0 that's a defensible choice.

+1.

> I'm relieved to hear that it's not async, since you're looking at the
> blocking
> version.  :-)  See ap_read_async_request() (still on the async-read-
> dev branch).

AFAICT, ap_read_async_request() on the branch can't handle a partial
line correctly.

Noting of course that ap_core_input_filter is 'cute' and masks EAGAIN.
 So, you'll never see EAGAIN from this code path!  As I said earlier,
the EAGAIN logic in httpd is completely suspect.

Furthermore, as I read it, ap_read_async_request is assuming that it
gets a complete line from getline_nonblocking - which almost certainly
won't be the case.  -- justin

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
On Jan 2, 2006, at 3:41 PM, Justin Erenkrantz wrote:

>> +static apr_status_t process_request_line(request_rec *r, char *line,
>> +                                         int skip_first)
>> +{
>> +    if (!skip_first && (r->the_request == NULL)) {
>> +        /* This is the first line of the request */
>> +        if ((line == NULL) || (*line == '\0')) {
>> +            /* We skip empty lines because browsers have to tack  
>> a CRLF
>> on to the end +             * of POSTs to support old CERN  
>> webservers.
>> +             */
>> +            int max_blank_lines = r->server->limit_req_fields;
>> +            if (max_blank_lines <= 0) {
>> +                max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
>> +            }
>> +            r->num_blank_lines++;
>> +            if (r->num_blank_lines < max_blank_lines) {
>> +                return APR_SUCCESS;
>> +            }
>> +        }
>> +        set_the_request(r, line);
>> +    }
>
> This will cause a segfault if line is null and we are at or above  
> max_blank_lines.  Perhaps you meant for an else clause here?

Yes, thanks for catching that.

>
>> +    else {
>> +        /* We've already read the first line of the request.   
>> This is
>> either +         * a header field or the blank line terminating  
>> the header
>> +         */
>> +        if ((line == NULL) || (*line == '\0')) {
>> +            if (r->pending_header_line != NULL) {
>> +                apr_status_t rv = set_mime_header(r,
>> r->pending_header_line); +                if (rv != APR_SUCCESS) {
>> +                    return rv;
>> +                }
>> +                r->pending_header_line = NULL;
>> +            }
>> +            if (r->status == HTTP_REQUEST_TIME_OUT) {
>> +                apr_table_compress(r->headers_in,
>> APR_OVERLAP_TABLES_MERGE);
>> +                r->status = HTTP_OK;
>
> Say what?  ...looks at rest of file...
>
> Is this because r->status is unset and we're saying that's it's  
> 'okay' to proceed with the request.  If so, this *really* needs a  
> comment to that effect.  It makes no sense whatsoever otherwise.   
> (We should probably remove the hack to set it to  
> HTTP_REQUEST_TIME_OUT initially as part of these changes.)

Yeah, setting r->status to HTTP_OK is done here solely to make it work
with the existing logic about HTTP_REQUEST_TIME_OUT meaning "still
reading the request header."

+1 for of removing the HTTP_REQUEST_TIME_OUT hack.  I was trying
to be conservative by preserving that part of the original logic, but  
now
that you mention it, we might as well replace it with something less  
subtle
in 2.3.  This will break any 3rd party modules that depend upon the
HTTP_REQUEST_TIME_OUT convention, but for a major release
like 2.4 or 3.0 that's a defensible choice.

>> +            }
>> +        }
>> +        else {
>> +            if ((*line == ' ') || (*line == '\t')) {
>> +                /* This line is a continuation of the previous  
>> one */
>> +                if (r->pending_header_line == NULL) {
>> +                    r->pending_header_line = line;
>> +                    r->pending_header_size = 0;
>> +                }
>> +                else {
>> +                    apr_size_t pending_len =
>> strlen(r->pending_header_line);
>> +                    apr_size_t fold_len = strlen(line);
>
> This seems really expensive.  You shouldn't need to recount  
> pending_len each time.

Good point; I'll add something to keep track of the length.

>> +            }
>> +            break;
>
> If I understand your direction, it'd bail out here if it ever got  
> EAGAIN?

Yes indeed.  That's what the version on the async-read-dev branch does.

>> +request_rec *ap_read_request(conn_rec *conn)
>> +{
>> +    request_rec *r;
>> +    apr_status_t rv;
>> +
>> +    r = init_request(conn);
>> +
>> +    rv = read_partial_request(r);
>> +    /* TODO poll if EAGAIN */
>> +    if (r->status != HTTP_OK) {
>> +        ap_send_error_response(r, 0);
>> +        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
>> +        ap_run_log_transaction(r);
>> +        return NULL;
>> +    }
>
> Obviously, this can't be the case if you want to do real polling.   
> This would be the wrong place to poll.  You have to exit out of  
> ap_read_request and go back up to an 'event thread' that waits  
> until there's data to read on any incoming sockets.  Then, you'd  
> have to call ap_read_request again to 'restart' the parser whenever  
> there is more data to read.
>
> In my opinion, this code isn't even close to being async.

I'm relieved to hear that it's not async, since you're looking at the  
blocking
version.  :-)  See ap_read_async_request() (still on the async-read- 
dev branch).

>   So, I wonder why it was even merged to trunk right now.  You'd  
> have to deal with partial lines and the state the header parser is  
> in when it gets the next chunk of data - which this code doesn't  
> even try to do.  The current code is just going to bail when it  
> doesn't have a 'complete' line.
>
> My hunch is that you plan to build up pending_header_line and delay  
> parsing until you have the line terminators; but that's not going  
> to work really well with non-blocking sockets as you have to store  
> the data you just read non-blocking'ly if you actually read  
> multiple MIME header lines or, icky, even part of the request body  
> or the next request.  This just lends more credence to the argument  
> that request_rec is the wrong place for the buffer.  It's clearly  
> connection or connection-filter oriented...

The async impl addesses this problem by only grabbing a line at a  
time from
the input filter chain.  The filters are responsible for buffering  
anything that's been
read from the socket but not consumed by the current request.  If you  
have a
chance to review the async code (the version on the branch), I'm  
eager to get
some feedback on that part of the design.

> (IIRC, another point that'll kill httpd is that httpd mis-uses  
> EAGAIN; serf has a definition that EAGAIN means some data is read  
> *and* you need to ask for more.  httpd's filters don't have that.   
> One more issue that httpd got wrong...)
>
> To me, it sounds like we need a way for the MPM (or whatever) to  
> redefine the accept logic instead of relying upon the core.  Oh,  
> yah, we do that already (heh) by allowing them to call something  
> other than ap_process_connection: such as with  
> ap_process_http_async_connection(). So, why are we even touching  
> ap_read_request() at all?  This is going to necessitate an  
> ap_async_read_request() anyway...

Agreed on the need for async_read_request(); I purposely created that  
as a separate function,
following the pattern set by ap_process_http_async_connection(), so  
that things depending on
the semantics of the original ap_read_request() wouldn't break.  The  
bulk of the refactoring
that I've recently checked into the trunk is designed to allow the  
sync and async versions of
ap_read_request() to share code.  Given that the mime-folding code  
has been a source of
security vulnerabilities in the past, I'm trying to stick with a  
single impl of the header parsing
code that the sync and async code can both use.

Brian


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
Comments inline.

--On December 31, 2005 11:45:14 PM +0000 brianp@apache.org wrote:

> ===== --- httpd/httpd/trunk/include/httpd.h (original)
> +++ httpd/httpd/trunk/include/httpd.h Sat Dec 31 15:45:11 2005
> @@ -967,6 +967,16 @@
>      /** A flag to determine if the eos bucket has been sent yet */
>      int eos_sent;
>
> +    /** Number of leading blank lines encountered before request header
> */ +    int num_blank_lines;
> +
> +    /** A buffered header line, used to support header folding while
> +     *  reading the request */
> +    char *pending_header_line;
> +
> +    /** If nonzero, the number of bytes allocated to hold
> pending_header_line */ +    apr_size_t pending_header_size;
> +

As Roy pointed out, these fields should not be in request_rec.  A filter or 
connection context, perhaps.  But, not here.  (More as to why below.)

...snip...

> +static apr_status_t process_request_line(request_rec *r, char *line,
> +                                         int skip_first)
> +{
> +    if (!skip_first && (r->the_request == NULL)) {
> +        /* This is the first line of the request */
> +        if ((line == NULL) || (*line == '\0')) {
> +            /* We skip empty lines because browsers have to tack a CRLF
> on to the end +             * of POSTs to support old CERN webservers.
> +             */
> +            int max_blank_lines = r->server->limit_req_fields;
> +            if (max_blank_lines <= 0) {
> +                max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
> +            }
> +            r->num_blank_lines++;
> +            if (r->num_blank_lines < max_blank_lines) {
> +                return APR_SUCCESS;
> +            }
> +        }
> +        set_the_request(r, line);
> +    }

This will cause a segfault if line is null and we are at or above 
max_blank_lines.  Perhaps you meant for an else clause here?

> +    else {
> +        /* We've already read the first line of the request.  This is
> either +         * a header field or the blank line terminating the header
> +         */
> +        if ((line == NULL) || (*line == '\0')) {
> +            if (r->pending_header_line != NULL) {
> +                apr_status_t rv = set_mime_header(r,
> r->pending_header_line); +                if (rv != APR_SUCCESS) {
> +                    return rv;
> +                }
> +                r->pending_header_line = NULL;
> +            }
> +            if (r->status == HTTP_REQUEST_TIME_OUT) {
> +                apr_table_compress(r->headers_in,
> APR_OVERLAP_TABLES_MERGE);
> +                r->status = HTTP_OK;

Say what?  ...looks at rest of file...

Is this because r->status is unset and we're saying that's it's 'okay' to 
proceed with the request.  If so, this *really* needs a comment to that 
effect.  It makes no sense whatsoever otherwise.  (We should probably 
remove the hack to set it to HTTP_REQUEST_TIME_OUT initially as part of 
these changes.)

> +            }
> +        }
> +        else {
> +            if ((*line == ' ') || (*line == '\t')) {
> +                /* This line is a continuation of the previous one */
> +                if (r->pending_header_line == NULL) {
> +                    r->pending_header_line = line;
> +                    r->pending_header_size = 0;
> +                }
> +                else {
> +                    apr_size_t pending_len =
> strlen(r->pending_header_line);
> +                    apr_size_t fold_len = strlen(line);

This seems really expensive.  You shouldn't need to recount pending_len 
each time.

> +                    if (pending_len + fold_len >
> +                        r->server->limit_req_fieldsize) {
> +                        /* CVE-2004-0942 */
> +                        r->status = HTTP_BAD_REQUEST;
> +                        return APR_ENOSPC;
> +                    }
> +                    if (pending_len + fold_len + 1 >
> r->pending_header_size) { +                        /* Allocate a new
> buffer big enough to hold the +                         * concatenated
> lines
> +                         */
> +                        apr_size_t new_size = r->pending_header_size;
> +                        char *new_buf;
> +                        if (new_size == 0) {
> +                            new_size = pending_len + fold_len + 1;
> +                        }
> +                        else {
> +                            do {
> +                                new_size *= 2;
> +                            } while (new_size < pending_len + fold_len +
> 1); +                        }
> +                        new_buf = (char *)apr_palloc(r->pool, new_size);
> +                        memcpy(new_buf, r->pending_header_line,
> pending_len); +                        r->pending_header_line = new_buf;
> +                        r->pending_header_size = new_size;
> +                    }
> +                    memcpy(r->pending_header_line + pending_len, line,
> +                           fold_len + 1);

See, we know how much we've written each time...

> +    /* Read and process lines of the request until we
> +     * encounter a complete request header, an error, or EAGAIN
> +     */
> +    tmp_bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
> +    while (r->status == HTTP_REQUEST_TIME_OUT) {
> +        char *line = NULL;
> +        apr_size_t line_length;
> +        apr_size_t length_limit;
> +        int first_line = (r->the_request == NULL);
> +        if (first_line) {
> +            length_limit = r->server->limit_req_line;
> +        }
> +        else {
> +            if (r->assbackwards) {
> +                r->status = HTTP_OK;
> +                break;
> +            }
> +            length_limit = r->server->limit_req_fieldsize;
> +        }
> +        /* TODO: use a nonblocking call in place of ap_rgetline */
> +        rv = ap_rgetline(&line, length_limit + 2,
> +                         &line_length, r, 0, tmp_bb);
> +        if (rv == APR_SUCCESS) {
> +            rv = process_request_line(r, line, 0);
> +        }
> +        if (rv != APR_SUCCESS) {
> +            r->request_time = apr_time_now();
> +            /* ap_rgetline returns APR_ENOSPC if it fills up the
> +             * buffer before finding the end-of-line.  This is only
> going to +             * happen if it exceeds the configured limit for a
> request-line. +             */
> +            if (rv == APR_ENOSPC) {
> +                if (first_line) {
> +                    ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
> +                                  "request failed: URI too long (longer
> than %d)", +                                  r->server->limit_req_line);
> +                    r->status    = HTTP_REQUEST_URI_TOO_LARGE;
> +                    r->proto_num = HTTP_VERSION(1,0);
> +                    r->protocol  = apr_pstrdup(r->pool, "HTTP/1.0");
> +                }
> +                else {
> +                    ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
> +                                  "request failed: header line too long "
> +                                  "(longer than %d)",
> +                                  r->server->limit_req_fieldsize);
> +                    r->status = HTTP_BAD_REQUEST;
> +                }
> +            }
> +            break;

If I understand your direction, it'd bail out here if it ever got EAGAIN?

> +        }
> +    }
> +    apr_brigade_destroy(tmp_bb);
> +    return rv;
> +}
> +
> +request_rec *ap_read_request(conn_rec *conn)
> +{
> +    request_rec *r;
> +    apr_status_t rv;
> +
> +    r = init_request(conn);
> +
> +    rv = read_partial_request(r);
> +    /* TODO poll if EAGAIN */
> +    if (r->status != HTTP_OK) {
> +        ap_send_error_response(r, 0);
> +        ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
> +        ap_run_log_transaction(r);
> +        return NULL;
> +    }

Obviously, this can't be the case if you want to do real polling.  This 
would be the wrong place to poll.  You have to exit out of ap_read_request 
and go back up to an 'event thread' that waits until there's data to read 
on any incoming sockets.  Then, you'd have to call ap_read_request again to 
'restart' the parser whenever there is more data to read.

In my opinion, this code isn't even close to being async.  So, I wonder why 
it was even merged to trunk right now.  You'd have to deal with partial 
lines and the state the header parser is in when it gets the next chunk of 
data - which this code doesn't even try to do.  The current code is just 
going to bail when it doesn't have a 'complete' line.

My hunch is that you plan to build up pending_header_line and delay parsing 
until you have the line terminators; but that's not going to work really 
well with non-blocking sockets as you have to store the data you just read 
non-blocking'ly if you actually read multiple MIME header lines or, icky, 
even part of the request body or the next request.  This just lends more 
credence to the argument that request_rec is the wrong place for the 
buffer.  It's clearly connection or connection-filter oriented...

FWIW, serf does all of its header parsing network asynchronously; but the 
HTTP header parser and buckets have a concept of state:

<http://svn.webdav.org/repos/projects/serf/trunk/buckets/response_buckets.c>

(IIRC, another point that'll kill httpd is that httpd mis-uses EAGAIN; serf 
has a definition that EAGAIN means some data is read *and* you need to ask 
for more.  httpd's filters don't have that.  One more issue that httpd got 
wrong...)

To me, it sounds like we need a way for the MPM (or whatever) to redefine 
the accept logic instead of relying upon the core.  Oh, yah, we do that 
already (heh) by allowing them to call something other than 
ap_process_connection: such as with ap_process_http_async_connection(). 
So, why are we even touching ap_read_request() at all?  This is going to 
necessitate an ap_async_read_request() anyway...  -- justin

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
On Jan 2, 2006, at 2:14 PM, Roy T. Fielding wrote:

> On Jan 2, 2006, at 1:37 PM, Brian Pane wrote:
>
>> "Significantly faster than prefork" has never been a litmus test for
>> assessing new features, and I'm -1 for adding it now.  A reasonable
>> technical metric for validating the async changes would  
>> "significantly
>> more scalable than the 2.2 Event MPM" or "memory footprint
>> competitive with IIS/Zeus/phttpd/one's-competitive-benchmark-of- 
>> choice."
>
> Those aren't features.  They are properties of the resulting system
> assuming all goes well.
>
>> The bit about degrading the rest of the server is a wonderful sound
>> bite, but we need to engineer the httpd based on data, not FUD.
>
> I said leave it on the async branch until you have data.  You moved
> it to trunk before you've even implemented the async part, which I
> think is wrong because the way you implemented it damages the
> performance of prefork and needlessly creates an incompatible MMN

You have yet to present any data backing up the assertion that
it "damages the performance of prefork."  (Repeating the claim
doesn't count as a proof.)  Having looked at the compiler's
inlining of the code that's been factored into separate functions,
I'm rather skeptical of the claim.

The "incompatible MMN" point is puzzling, to say the least.  As the
4th MMN change in the past quarter, this was by no means an
unusual event.  And on an unreleased development codeline,
the impact to production sites and 3rd party module developers
is near zero.

Brian


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Roy T. Fielding wrote:
>
> Alternatively, rewrite the server to remove all MPMs other than
> event and then show that the new server is better than our existing
> server, and we can adopt that for 3.0.

Well that's a bit silly, leave the others for those who must have an entirely
non-threaded server or other options; but prove that event is more performant
and quite stable for all platforms, if they would only take the time to use
threaded-compatible 3rd party modules, then make event the default 3.0 mpm...

Now that's a metric I'll buy.

Bill

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Jan 3, 2006, at 12:02 AM, William A. Rowe, Jr. wrote:

> Roy T. Fielding wrote:
>> On Jan 2, 2006, at 2:14 PM, Roy T. Fielding wrote:
>>> Now, if you want to tell me that those changes produced a net
>>> performance benefit on prefork (and thus are applicable to other   
>>> MPMs),
>>> then I am all ears.  I am easily convinced by comparative  
>>> performance
>>> figures when the comparison is meaningful.
>
> lol, of course you choose the non-threaded MPM as a reference,  
> which therefore
> should receive no meaningful performance change.  The difference  
> between an
> async wakeup and a poll result should be null for one socket pool,  
> one process,
> one thread (of course select is a differently ugly beast, and if  
> there were
> a platform that supported async with no poll, I'd laugh.)

You seem to misunderstand me -- if I compare two prefork servers, one
with the change and one without the change, and the one with the change
performs better (by whatever various measures of performance you can  
test),
then that is an argument for making the change regardless of the other
concerns.

If, instead, you say that the change improves the event MPM by 10% and
degrades performance on prefork by 1%, then I am -1 on that change.
Prefork is our workhorse MPM.  The task then is to isolate MPM-specific
changes so that they have no significant impact on the critical path
of our primary MPM, even if that means using #ifdefs.

Alternatively, rewrite the server to remove all MPMs other than
event and then show that the new server is better than our existing
server, and we can adopt that for 3.0.

>> BTW, part of the reason I say that is because I have considered
>> replacing the same code with something that doesn't parse the
>> header fields until the request header/body separator line is
>> seen, since that would allow the entire request header to be parsed
>> in-place for the common case.
>
> Well ... you are using protocol knowledge that will render us http- 
> bound
> when it comes time to efficently bind waka (no crlf delims in a  
> binary format
> protocol, no?) or ftp (pushes a 'greeting' before going back to  
> sleep.)

Well, I am assuming that the MIME header parsing code is in the
protocol-specific part of the server, yes.

....Roy


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Roy T. Fielding wrote:
> On Jan 2, 2006, at 2:14 PM, Roy T. Fielding wrote:
> 
>> Now, if you want to tell me that those changes produced a net
>> performance benefit on prefork (and thus are applicable to other  MPMs),
>> then I am all ears.  I am easily convinced by comparative performance
>> figures when the comparison is meaningful.

lol, of course you choose the non-threaded MPM as a reference, which therefore
should receive no meaningful performance change.  The difference between an
async wakeup and a poll result should be null for one socket pool, one process,
one thread (of course select is a differently ugly beast, and if there were
a platform that supported async with no poll, I'd laugh.)

> BTW, part of the reason I say that is because I have considered
> replacing the same code with something that doesn't parse the
> header fields until the request header/body separator line is
> seen, since that would allow the entire request header to be parsed
> in-place for the common case.

Well ... you are using protocol knowledge that will render us http-bound
when it comes time to efficently bind waka (no crlf delims in a binary format
protocol, no?) or ftp (pushes a 'greeting' before going back to sleep.)

But I'd agree these changes are radical enough that maintaining the async
branch a while longer is probably a good thing.

Bill

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Jan 2, 2006, at 2:14 PM, Roy T. Fielding wrote:

> Now, if you want to tell me that those changes produced a net
> performance benefit on prefork (and thus are applicable to other  
> MPMs),
> then I am all ears.  I am easily convinced by comparative performance
> figures when the comparison is meaningful.

BTW, part of the reason I say that is because I have considered
replacing the same code with something that doesn't parse the
header fields until the request header/body separator line is
seen, since that would allow the entire request header to be parsed
in-place for the common case.

....Roy

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Jan 2, 2006, at 1:37 PM, Brian Pane wrote:

> "Significantly faster than prefork" has never been a litmus test for
> assessing new features, and I'm -1 for adding it now.  A reasonable
> technical metric for validating the async changes would "significantly
> more scalable than the 2.2 Event MPM" or "memory footprint
> competitive with IIS/Zeus/phttpd/one's-competitive-benchmark-of- 
> choice."

Those aren't features.  They are properties of the resulting system
assuming all goes well.

> The bit about degrading the rest of the server is a wonderful sound
> bite, but we need to engineer the httpd based on data, not FUD.

I said leave it on the async branch until you have data.  You moved
it to trunk before you've even implemented the async part, which I
think is wrong because the way you implemented it damages the
performance of prefork and needlessly creates an incompatible MMN.
Maybe it would be easier for me to understand why the event loop is
being controlled at such a high level if I could see it work first.

Now, if you want to tell me that those changes produced a net
performance benefit on prefork (and thus are applicable to other MPMs),
then I am all ears.  I am easily convinced by comparative performance
figures when the comparison is meaningful.

....Roy

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
On Jan 2, 2006, at 11:52 AM, Roy T. Fielding wrote:

>> It would be feasible to build up the pending request in a structure
>> other than the request_rec, so that ap_read_async_request() can
>> operate on, say, an ap_partial_request_t instead of a request_rec.
>> My preference so far, though, has been to leave the responsibility
>> for knowing how to parse request headers encapsulated within
>> the request_rec and its associated "methods."
>
> Maybe you should just keep those changes on the async branch for now.
> The rest of the server cannot be allowed to degrade just because you
> want to introduce a new MPM.  After the async branch is proven to be
> significantly faster than prefork, then we can evaluate whether or
> not the additional complications are worth it.

"Significantly faster than prefork" has never been a litmus test for
assessing new features, and I'm -1 for adding it now.  A reasonable
technical metric for validating the async changes would "significantly
more scalable than the 2.2 Event MPM" or "memory footprint
competitive with IIS/Zeus/phttpd/one's-competitive-benchmark-of-choice."

The bit about degrading the rest of the server is a wonderful sound
bite, but we need to engineer the httpd based on data, not FUD.

Brian


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 31, 2005, at 9:55 PM, Brian Pane wrote:
> On Dec 31, 2005, at 6:50 PM, Roy T. Fielding wrote:
>
>> The nonblocking yield should happen inside ap_rgetline (or its
>> replacement), not in the calling routine.  The thread has nothing
>> to do until that call is finished or it times out.
>
> On the contrary, the thread has some very important work to do before
> that call finishes or times out: it has other connections to  
> process.  If
> the thread waits until the ap_rgetline completes, a server  
> configuration
> sized for multiple threads per connection will be vulnerable to a  
> trivially
> implementable DoS.

When I say "thread", I mean a single stream of control with execution
stack, not OS process/thread.  An event MPM is going to have a single
stream of control per event, right?  What I am saying is that the
control should block in rgetline and yield (return to the event pool)
inside that function.  That way, the complications due to yielding are
limited to the I/O routines that might block a thread rather than
being spread all over the server code.

Am I missing something?  This is not a new topic -- Dean Gaudet had
quite a few rants on the subject in the archives.

>>  In any case,
>> this code should be independent of the MPM and no MPM is going
>> to do something useful with a partial HTTP request.
>>
>> I say -1 to adding the input buffer variables to the request_rec.
>> Those variables can be local to the input loop.
>
> Are you proposing that the buffers literally become local variables?
> That generally won't work; the input loop isn't necessarily contained
> within a single function call, and the reading of a single request's
> input can cross threads.

I am saying it doesn't belong in the request_rec.  It belongs on the
execution stack that the yield routine has to save in order to return
to this execution path on the next event.  The request does not care
about partial lines.

> It would be feasible to build up the pending request in a structure
> other than the request_rec, so that ap_read_async_request() can
> operate on, say, an ap_partial_request_t instead of a request_rec.
> My preference so far, though, has been to leave the responsibility
> for knowing how to parse request headers encapsulated within
> the request_rec and its associated "methods."

Maybe you should just keep those changes on the async branch for now.
The rest of the server cannot be allowed to degrade just because you
want to introduce a new MPM.  After the async branch is proven to be
significantly faster than prefork, then we can evaluate whether or
not the additional complications are worth it.

....Roy

Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by Brian Pane <br...@apache.org>.
On Dec 31, 2005, at 6:50 PM, Roy T. Fielding wrote:

> The nonblocking yield should happen inside ap_rgetline (or its
> replacement), not in the calling routine.  The thread has nothing
> to do until that call is finished or it times out.

On the contrary, the thread has some very important work to do before
that call finishes or times out: it has other connections to  
process.  If
the thread waits until the ap_rgetline completes, a server configuration
sized for multiple threads per connection will be vulnerable to a  
trivially
implementable DoS.

>  In any case,
> this code should be independent of the MPM and no MPM is going
> to do something useful with a partial HTTP request.
>
> I say -1 to adding the input buffer variables to the request_rec.
> Those variables can be local to the input loop.

Are you proposing that the buffers literally become local variables?
That generally won't work; the input loop isn't necessarily contained
within a single function call, and the reading of a single request's
input can cross threads.

It would be feasible to build up the pending request in a structure
other than the request_rec, so that ap_read_async_request() can
operate on, say, an ap_partial_request_t instead of a request_rec.
My preference so far, though, has been to leave the responsibility
for knowing how to parse request headers encapsulated within
the request_rec and its associated "methods."

Brian


Re: svn commit: r360461 - in /httpd/httpd/trunk: CHANGES include/ap_mmn.h include/httpd.h server/protocol.c

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 31, 2005, at 3:45 PM, brianp@apache.org wrote:

> Author: brianp
> Date: Sat Dec 31 15:45:11 2005
> New Revision: 360461
>
> URL: http://svn.apache.org/viewcvs?rev=360461&view=rev
> Log:
> Refactoring of ap_read_request() to store partial request state
> in the request rec.  The point of this is to allow asynchronous
> MPMs do do nonblocking reads of requests.  (Backported from the
> async-read-dev branch)

Umm, this needs more eyes.

It doesn't seem to me to be doing anything useful.  It just adds
a set of unused input buffer fields to the wrong memory structure,
resulting in what should be a minor (not major) MMN bump, and then
rearranges a critical-path function into several subroutines.

The nonblocking yield should happen inside ap_rgetline (or its
replacement), not in the calling routine.  The thread has nothing
to do until that call is finished or it times out.  In any case,
this code should be independent of the MPM and no MPM is going
to do something useful with a partial HTTP request.

I say -1 to adding the input buffer variables to the request_rec.
Those variables can be local to the input loop.  I don't see any
point in placing this on trunk until it can do something useful.

....Roy