Posted to dev@httpd.apache.org by Stuart Roll <sr...@symantec.com> on 2007/11/21 23:05:46 UTC

Weird error across connections - keepalive, timeouts, HTTP 413

Greetings -

<originally posted to modules-dev but redirected here>
 
I have encountered a very weird error and I wanted to run it by this
list before I try filing a bug report.

I am running a custom proxy module that receives chunked HTTP requests
using REQUEST_CHUNKED_DECHUNK and then forwards the unchunked payload to
an internal server.

Occasionally a client would hang for about 5 minutes and then return an
HTTP 413 error (HTTP_REQUEST_ENTITY_TOO_LARGE).  Upon investigation it
appears that there is a problem with the persistent connection keepalive
timeouts in Apache.

If I insert a delay in the (Java) client in between opening the response
input stream and reading the last of the response data, and that delay
exceeds the Apache KeepAliveTimeout value then the server closes the
read side of the connection with the client.  The client is, as
expected, able to read the rest of the data correctly.
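For reference, two separate timers are involved here, and the ~5 minute hang matches the 2.0 default Timeout of 300 seconds. An illustrative configuration (values assumed, not taken from my actual config):

```
# httpd.conf excerpt (illustrative values)
KeepAlive On
KeepAliveTimeout 15   # idle keepalive window the client delay must exceed
Timeout 300           # blocking-read timeout; matches the ~5 minute hang
```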

The next request goes out over the same persistent TCP connection and
immediately gets a TCP RESET (totally expected).  The Sun HTTP client
library automatically opens a new connection and re-issues the request.

Here's the weird part... When the new request comes in on the new
connection, my module gets control normally but when it invokes
ap_get_client_block() it hangs for the configured Timeout value before
returning -1.  Debugging the Apache code a bit, I found that
http_protocol.c:ap_http_filter() receives APR_TIMEUP (70007) from
ap_get_brigade().  The timeout status falls through the generic error
path and is erroneously reported as a 413 by ap_http_filter().


Here's the relevant 2.0 code in http_filters.c:

        /* We can't read the chunk until after sending 100 if required. */
        if (ctx->state == BODY_CHUNK) {
            char line[30];
            apr_bucket_brigade *bb;
            apr_size_t len = 30;

            bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);

            rv = ap_get_brigade(f->next, bb, AP_MODE_GETLINE,
                                APR_BLOCK_READ, 0);

/* At this point rv == APR_TIMEUP (70007) */

            if (rv == APR_SUCCESS) {
                rv = apr_brigade_flatten(bb, line, &len);
                if (rv == APR_SUCCESS) {
                    ctx->remaining = get_chunk_size(line);
                }
            }
            apr_brigade_cleanup(bb);

            /* Detect chunksize error (such as overflow) */
            if (rv != APR_SUCCESS || ctx->remaining < 0) {
                ctx->remaining = 0; /* Reset it in case we have to
                                     * come back here later */

/* Here's where the APR_TIMEUP error is erroneously reported as a 413. */

                e = ap_bucket_error_create(HTTP_REQUEST_ENTITY_TOO_LARGE,
                                           NULL, f->r->pool,
                                           f->c->bucket_alloc);
                APR_BRIGADE_INSERT_TAIL(bb, e);
                e = apr_bucket_eos_create(f->c->bucket_alloc);
                APR_BRIGADE_INSERT_TAIL(bb, e);
                ctx->eos_sent = 1;
                return ap_pass_brigade(f->r->output_filters, bb);
            }


If I insert the client delay after the client reads the end of the
response, no such error occurs on the next request.

It smells like some bad state carries over from one connection to
another but I don't know how that could happen.

Any ideas from you gurus out there?  I'm starting to get out of my level
of expertise here! ;)

Stuart