You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Joe Orton <jo...@redhat.com> on 2004/10/12 15:08:11 UTC

[RFC] a lazy byterange filter

The patch below changes the byterange filter to only do byteranging if
passed a complete EOS-terminated brigade containing only buckets with
non-negative length.  This fixes the cases where the filter will
currently eat all your RAM (PR 29962).

It would be possible to also extend the code (adding significant
complexity) to be able to "streamily" byterange arbitrary responses for
certain byterange requests.  I don't think it is necessary or even
desirable to do this in the stock byterange filter:

1) it's not particularly useful to be able to retrieve byteranges of the
output of some SSI, CGI or PHP script output, if the output may change
for each request anyway.

2) it's only good to byterange if it's clearly "cheap" to do so.  For a
CGI script which does some expensive computation then generates a
response, byteranging the output is not desirable.  Similarly for the
case of a reverse proxy to a dumb backend given in PR 29962: if the
backend doesn't support byteranges, handling a "last five bytes please"
byterange request *on the proxy* is dumb.

OK, let the flames begin...

 http_protocol.c |   80 +++++++++++++++++++++-----------------------------------
 1 files changed, 30 insertions(+), 50 deletions(-)
Index: modules/http/http_protocol.c
===================================================================
RCS file: /home/cvs/httpd-2.0/modules/http/http_protocol.c,v
retrieving revision 1.484
diff -u -r1.484 http_protocol.c
--- modules/http/http_protocol.c	29 Sep 2004 14:38:42 -0000	1.484
+++ modules/http/http_protocol.c	12 Oct 2004 13:07:18 -0000
@@ -2839,13 +2839,6 @@
 
 static int ap_set_byterange(request_rec *r);
 
-typedef struct byterange_ctx {
-    apr_bucket_brigade *bb;
-    int num_ranges;
-    char *boundary;
-    char *bound_head;
-} byterange_ctx;
-
 /*
  * Here we try to be compatible with clients that want multipart/x-byteranges
  * instead of multipart/byteranges (also see above), as per HTTP/1.1. We
@@ -2871,19 +2864,35 @@
 #define MIN_LENGTH(len1, len2) ((len1 > len2) ? len2 : len1)
     request_rec *r = f->r;
     conn_rec *c = r->connection;
-    byterange_ctx *ctx = f->ctx;
     apr_bucket *e;
     apr_bucket_brigade *bsend;
     apr_off_t range_start;
     apr_off_t range_end;
     char *current;
-    apr_off_t bb_length;
     apr_off_t clength = 0;
     apr_status_t rv;
     int found = 0;
+    int num_ranges;
+    char *boundary = NULL; /* init to avoid compiler warning */
+    char *bound_head = NULL;
+
+    for (e = APR_BRIGADE_FIRST(bb);
+         e != APR_BRIGADE_SENTINEL(bb) && !APR_BUCKET_IS_EOS(e)
+             && e->length != (apr_size_t)-1;
+         e = APR_BUCKET_NEXT(e)) {
+        clength += e->length;
+    }
+
+    /* Don't attempt to do byte range work if the brigade doesn't
+     * contain an EOS, or or if any of the buckets has length == -1
+     * (which implies the bucket will morph when ->read(). */
+    
+    if (!APR_BUCKET_IS_EOS(e) || clength <= 0) {
+        ap_remove_output_filter(f);
+        return ap_pass_brigade(f->next, bb);
+    }
 
-    if (!ctx) {
-        int num_ranges = ap_set_byterange(r);
+        num_ranges = ap_set_byterange(r);
 
         /* We have nothing to do, get out of the way. */
         if (num_ranges == 0) {
@@ -2891,54 +2900,25 @@
             return ap_pass_brigade(f->next, bb);
         }
 
-        ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
-        ctx->num_ranges = num_ranges;
-        /* create a brigade in case we never call ap_save_brigade() */
-        ctx->bb = apr_brigade_create(r->pool, c->bucket_alloc);
-
-        if (ctx->num_ranges > 1) {
+        if (num_ranges > 1) {
             /* Is ap_make_content_type required here? */
             const char *orig_ct = ap_make_content_type(r, r->content_type);
-            ctx->boundary = apr_psprintf(r->pool, "%" APR_UINT64_T_HEX_FMT "%lx",
+            boundary = apr_psprintf(r->pool, "%" APR_UINT64_T_HEX_FMT "%lx",
                                          (apr_uint64_t)r->request_time, (long) getpid());
 
             ap_set_content_type(r, apr_pstrcat(r->pool, "multipart",
                                                use_range_x(r) ? "/x-" : "/",
                                                "byteranges; boundary=",
-                                               ctx->boundary, NULL));
+                                               boundary, NULL));
 
-            ctx->bound_head = apr_pstrcat(r->pool,
-                                    CRLF "--", ctx->boundary,
+            bound_head = apr_pstrcat(r->pool,
+                                    CRLF "--", boundary,
                                     CRLF "Content-type: ",
                                     orig_ct,
                                     CRLF "Content-range: bytes ",
                                     NULL);
-            ap_xlate_proto_to_ascii(ctx->bound_head, strlen(ctx->bound_head));
+            ap_xlate_proto_to_ascii(bound_head, strlen(bound_head));
         }
-    }
-
-    /* We can't actually deal with byte-ranges until we have the whole brigade
-     * because the byte-ranges can be in any order, and according to the RFC,
-     * we SHOULD return the data in the same order it was requested.
-     *
-     * XXX: We really need to dump all bytes prior to the start of the earliest
-     * range, and only slurp up to the end of the latest range.  By this we
-     * mean that we should peek-ahead at the lowest first byte of any range,
-     * and the highest last byte of any range.
-     */
-    if (!APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb))) {
-        ap_save_brigade(f, &ctx->bb, &bb, r->pool);
-        return APR_SUCCESS;
-    }
-
-    /* Prepend any earlier saved brigades. */
-    APR_BRIGADE_PREPEND(bb, ctx->bb);
-
-    /* It is possible that we won't have a content length yet, so we have to
-     * compute the length before we can actually do the byterange work.
-     */
-    apr_brigade_length(bb, 1, &bb_length);
-    clength = (apr_off_t)bb_length;
 
     /* this brigade holds what we will be sending */
     bsend = apr_brigade_create(r->pool, c->bucket_alloc);
@@ -2972,7 +2952,7 @@
         /* For single range requests, we must produce Content-Range header.
          * Otherwise, we need to produce the multipart boundaries.
          */
-        if (ctx->num_ranges == 1) {
+        if (num_ranges == 1) {
             apr_table_setn(r->headers_out, "Content-Range",
                            apr_psprintf(r->pool, "bytes " BYTERANGE_FMT,
                                         range_start, range_end, clength));
@@ -2980,7 +2960,7 @@
         else {
             char *ts;
 
-            e = apr_bucket_pool_create(ctx->bound_head, strlen(ctx->bound_head),
+            e = apr_bucket_pool_create(bound_head, strlen(bound_head),
                                        r->pool, c->bucket_alloc);
             APR_BRIGADE_INSERT_TAIL(bsend, e);
 
@@ -3025,11 +3005,11 @@
         return ap_pass_brigade(f->next, bsend);
     }
 
-    if (ctx->num_ranges > 1) {
+    if (num_ranges > 1) {
         char *end;
 
         /* add the final boundary */
-        end = apr_pstrcat(r->pool, CRLF "--", ctx->boundary, "--" CRLF, NULL);
+        end = apr_pstrcat(r->pool, CRLF "--", boundary, "--" CRLF, NULL);
         ap_xlate_proto_to_ascii(end, strlen(end));
         e = apr_bucket_pool_create(end, strlen(end), r->pool, c->bucket_alloc);
         APR_BRIGADE_INSERT_TAIL(bsend, e);


Re: [RFC] a lazy byterange filter

Posted by Rüdiger Plüm <r....@t-online.de>.

Joe Orton wrote:
> On Tue, Oct 12, 2004 at 04:39:00PM +0200, André Malo wrote:
> 
>>* Joe Orton <jo...@redhat.com> wrote:
>>
>>
>>>1) it's not particularly useful to be able to retrieve byteranges of the
>>>output of some SSI, CGI or PHP script output, if the output may change
>>>for each request anyway.
>>
>>Not agreed. It is, especially for download scripts very useful (which e.g.
>>count the number of downloads or whatever).
> 
> 
> OK sure, but the point you have to argue is why it's useful to support
> this feature *in httpd*.  N million 1.3 users live without it.  I'm
> arguing that you can and should implement byteranging in your CGI script
> if you want that, since that is the place where it is efficient to do
> so.  If you serve DVD images with your counting-CGI do you really want
> to pass 2Gb through the filter chain then discard it, for those "last 5
> bytes" byterange requests?
> 

I agree with André on that. Your DVD example is an extreme example. Most
times much less data will be transfered. Other applications that require the
current behaviour are CGI scripts / Tomcat servlets and jsps that protect downloads
with a custom authorization. So I fear the change will break partial downloads in these
situations and I do not think in general that every kind of application should implement
its own byteranging instead of doing this at a central position like the webserver.
I think the application should implement it by itself if the operation it does is

a) costly
b) produces very much data (like your DVD example)

On the other hand I understand the current problem. As far as I can see it is ONLY neccessary
to save the whole content in memory if the request contains multiple byteranges. If the request only
contains one byterange I think it would be sufficient to simply drop the uneeded data and only
forward the requested one. So my idea would be to have the behaviour of your patch if we have
multiple byteranges and the old behaviour if we have one byterange. But in this case the
code needs to be adjusted in a way that it drops all unneeded content immediately to avoid
the memory problem. Thoughts?


Regards

Rüdiger

Re: [RFC] a lazy byterange filter

Posted by Joe Orton <jo...@redhat.com>.
On Tue, Oct 12, 2004 at 11:05:14AM -0700, Justin Erenkrantz wrote:
> --On Tuesday, October 12, 2004 4:17 PM +0100 Joe Orton <jo...@redhat.com> 
> wrote:
> 
> >OK sure, but the point you have to argue is why it's useful to support
> >this feature *in httpd*.  N million 1.3 users live without it.  I'm
> 
> Agreed.  +1 for making it simpler and avoiding the memory hogging.  My only 
> comment on your patch is that I'd personally keep the filter context around 
> just to make the diff that much easier to grok.  To be correct, you'd also 
> have to shift everything over one tab level.  And, if someone can come up 
> with a way to add a smarter byterange filter later, they have a context 
> there already.  =)

OK, yup, thanks.

joe

Re: [RFC] a lazy byterange filter

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Tuesday, October 12, 2004 4:17 PM +0100 Joe Orton <jo...@redhat.com> 
wrote:

> OK sure, but the point you have to argue is why it's useful to support
> this feature *in httpd*.  N million 1.3 users live without it.  I'm

Agreed.  +1 for making it simpler and avoiding the memory hogging.  My only 
comment on your patch is that I'd personally keep the filter context around 
just to make the diff that much easier to grok.  To be correct, you'd also 
have to shift everything over one tab level.  And, if someone can come up with 
a way to add a smarter byterange filter later, they have a context there 
already.  =)

> (I probably should have made the subject "a lazy developer's byterange
> filter" to make my intentions clear about doing any real work to fix PR
> 29962 :)

I don't think there's a straightforward way to honor these contrived byterange 
requests that produce degenerate output streams.  And, my interpretation of 
the RFC leads me to believe that the people who wrote it understood that and 
gave us a clear opt-out.  -- justin

Re: [RFC] a lazy byterange filter

Posted by Joe Orton <jo...@redhat.com>.
On Tue, Oct 12, 2004 at 04:39:00PM +0200, André Malo wrote:
> * Joe Orton <jo...@redhat.com> wrote:
> 
> > 1) it's not particularly useful to be able to retrieve byteranges of the
> > output of some SSI, CGI or PHP script output, if the output may change
> > for each request anyway.
> 
> Not agreed. It is, especially for download scripts very useful (which e.g.
> count the number of downloads or whatever).

OK sure, but the point you have to argue is why it's useful to support
this feature *in httpd*.  N million 1.3 users live without it.  I'm
arguing that you can and should implement byteranging in your CGI script
if you want that, since that is the place where it is efficient to do
so.  If you serve DVD images with your counting-CGI do you really want
to pass 2Gb through the filter chain then discard it, for those "last 5
bytes" byterange requests?

(I probably should have made the subject "a lazy developer's byterange
filter" to make my intentions clear about doing any real work to fix PR
29962 :)

joe

Re: [RFC] a lazy byterange filter

Posted by André Malo <nd...@perlig.de>.
* Joe Orton <jo...@redhat.com> wrote:

> 1) it's not particularly useful to be able to retrieve byteranges of the
> output of some SSI, CGI or PHP script output, if the output may change
> for each request anyway.

Not agreed. It is, especially for download scripts very useful (which e.g.
count the number of downloads or whatever).

nd