Posted to dev@httpd.apache.org by Brian Pane <bp...@pacbell.net> on 2001/08/28 01:05:31 UTC

chunking of content in mod_include?

In mod_include's find_start_sequence function, there's some code that
splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD."

Can somebody explain the rationale for this?  It seems redundant to be
splitting the data into smaller chunks in a content filter; I'd expect
mod_include to defer network block sizing to downstream filters.  In the
profile data I'm looking at currently, this check accounts for 35% of the
total run time of find_start_sequence, so there's some performance to
be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check can
be eliminated.

--Brian




Re: [PATCH] Re: chunking of content in mod_include?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "Ryan Bloom" <rb...@covalent.net>
Sent: Tuesday, August 28, 2001 8:57 AM


> On Monday 27 August 2001 23:27, Brian Pane wrote:
> 
> > How does this patch look?  It does the check only on bucket
> > boundaries, and it pushes out the buffered content if either:
> >   * the amount of buffered content is too large, or
> >   * the apr_bucket_read would block.
> >
> > +        if (rv == APR_EAGAIN) {
> 
> This needs to be if (APR_STATUS_IS_EAGAIN(rv)) {
> 
> EAGAIN is one of those annoying error conditions that is completely non-portable,
> so we have to use a macro to check for it.

For those of you coming onto apr or httpd development recently (I count at least
five active patch submitters who haven't been here all that long) ... let me tell
a quick story.

Most non-unixes don't map directly to errno.  We started with a lookup table and
some other bogus stuff to try mapping errors.  It's horribly slow and a poor solution.
Even some Unix errors have this problem with EAGAIN/EWOULDBLOCK.  But PLEASE don't
simply code correctly in this one case; code consistently!

Anyone writing (rv == APR_ESOMETHING) is asking for problems.  That's why we implement
and maintain that APR_STATUS_IS_ESOMETHING() set of macros.  They are fast (no time
difference on Unix at all) and legible in mainline code.

Unless you are looking at an APR-defined status, please use APR_STATUS_IS_EFOO() to keep
some difficult-to-debug bugs from creeping into the code.

Bill




Re: [PATCH] Re: chunking of content in mod_include?

Posted by Ryan Bloom <rb...@covalent.net>.
Good patch; I have just a few fixes that need to be made to it.
Most of these are incredibly non-obvious or unintuitive.

On Monday 27 August 2001 23:27, Brian Pane wrote:

> How does this patch look?  It does the check only on bucket
> boundaries, and it pushes out the buffered content if either:
>   * the amount of buffered content is too large, or
>   * the apr_bucket_read would block.
>
> If this technique looks sound, I can create another patch that
> does the same thing for find_end_sequence.
>
> --Brian
>
>
> -        c = buf;
> -        while (c < buf + len) {
> -            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
> -                apr_bucket *start_bucket;
>
> -                if (ctx->head_start_index > 0) {
> -                    start_index  = ctx->head_start_index;
> -                    start_bucket = ctx->head_start_bucket;
> +        rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
> +        if (rv == APR_EAGAIN) {

This needs to be if (APR_STATUS_IS_EAGAIN(rv)) {

EAGAIN is one of those annoying error conditions that is completely non-portable,
so we have to use a macro to check for it.

> +                rv = ap_pass_brigade(f->next, *bb);
> +                *bb = remainder;
> +                if (rv != APR_SUCCESS) {
> +                    return NULL;

This looks like a problem.  Basically, a future filter has just informed us that there
was a problem, but we are going to silently ignore it.  If we get a status other than
APR_SUCCESS from a future filter, then we _must_ return that error code to the
filter that called us.  If we don't, then we will continue to send data, even though
lower filters think that everything is done, and everything gets confused.  I have
closed this bug in mod_include multiple times, so I am VERY aware of it.  :-)

> +            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
> +        }
> +        if (rv != APR_SUCCESS) {
> +            return NULL;
> +        }

Same as above.  You can't lose that error code.
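
One way to keep it, and roughly what the updated patch elsewhere in this thread
does, is to stash the status in the context and have send_parsed_content return
it (sketch only; it assumes a status field is added to include_ctx_t and
initialized to APR_SUCCESS when the context is created):

        rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
        if (rv != APR_SUCCESS) {
            ctx->status = rv;       /* remember why we gave up */
            return NULL;            /* NULL still means "no tag found" */
        }

and then in send_parsed_content, right after the find_*_sequence call:

        tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
        if (ctx->status != APR_SUCCESS) {
            return ctx->status;     /* propagate the real error to our caller */
        }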

Fix those three things, and I'll give it a +1.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: chunking of content in mod_include?

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Aug 27, 2001 at 08:24:50PM -0700, Ryan Bloom wrote:
>...
> > This code was put in because we were seeing the mod_include code buffer up
> > the entire collection of buckets until an SSI tag was found. If you have a
> > 200 MB file with an SSI tag footer at the end of the brigade, the whole
> > thing was being buffered. How do you propose that this be done differently?
> 
> I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
> something with the data.  If we have a 200 Meg file that has no SSI tags in
> it, but we can get all 200 Meg at one time, then we shouldn't have any problem
> just scanning through the entire 200 Megs very quickly.  Worst case, we do what
> Brian suggested, and just check the bucket length once we have finished
> processing all of the data in that bucket.  The buffering only becomes a
> real problem when we sit waiting for data from a CGI or some other slow
> content generator.

It becomes a problem if you load too much of that file into the heap. Thus,
we have the threshold.

The threshold continues to make sense. Using a bucket boundary is a very
good optimization. If a FILE bucket manages to map the whole sucker into
memory, and returns one massive 200M memory buffer to scan ... fine. But
after that scan, we probably ought to deliver it down the stack :-)

But if that FILE continues spitting out 8k HEAP buffers, then we also need
to flush them out of memory.

Brian's recently posted patch looks good for this.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

[PATCH] Re: chunking of content in mod_include?

Posted by Brian Pane <bp...@pacbell.net>.
Thanks to everybody who reviewed the first patch.  This new
version adds error-checking fixes and moves the brigade-splitting
back into send_parsed_content.  I also added a nonblocking read
to find_end_sequence.

--Brian

Index: mod_include.h
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.h,v
retrieving revision 1.21
diff -u -r1.21 mod_include.h
--- mod_include.h    2001/08/27 20:25:42    1.21
+++ mod_include.h    2001/08/29 04:48:12
@@ -138,6 +138,8 @@
     int          if_nesting_level;
     apr_size_t   parse_pos;
     int          bytes_parsed;
+    apr_status_t status;
+    int          output_now;
    
     apr_bucket   *head_start_bucket;
     apr_size_t   head_start_index;
Index: mod_include.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.c,v
retrieving revision 1.136
diff -u -r1.136 mod_include.c
--- mod_include.c    2001/08/27 20:25:42    1.136
+++ mod_include.c    2001/08/29 04:48:13
@@ -144,7 +144,7 @@
  * returns NULL if no match found.
  */
 static apr_bucket *find_start_sequence(apr_bucket *dptr, include_ctx_t *ctx,
-                                      apr_bucket_brigade *bb, int *do_cleanup)
+                                       apr_bucket_brigade *bb, int *do_cleanup)
 {
     apr_size_t len;
     const char *c;
@@ -156,39 +156,56 @@
     *do_cleanup = 0;
 
     do {
+        apr_status_t rv;
+        int read_done = 0;
+
         if (APR_BUCKET_IS_EOS(dptr)) {
             break;
         }
-        apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
-        /* XXX handle retcodes */
-        if (len == 0) { /* end of pipe? */
-            break;
+
+        if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
+            ctx->output_now = 1;
         }
-        c = buf;
-        while (c < buf + len) {
-            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
-                apr_bucket *start_bucket;
+        else if (ctx->bytes_parsed > 0) {
+            rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
+            read_done = 1;
+            if (APR_STATUS_IS_EAGAIN(rv)) {
+                ctx->output_now = 1;
+            }
+        }
 
-                if (ctx->head_start_index > 0) {
-                    start_index  = ctx->head_start_index;
-                    start_bucket = ctx->head_start_bucket;
-                }
-                else {
-                    start_index  = (c - buf);
-                    start_bucket = dptr;
-                }
+        if (ctx->output_now) {
+            apr_size_t start_index;
+            apr_bucket *start_bucket;
+            apr_bucket_brigade *remainder;
+            if (ctx->head_start_index > 0) {
+                start_bucket = ctx->head_start_bucket;
                 apr_bucket_split(start_bucket, start_index);
-                tmp_bkt = APR_BUCKET_NEXT(start_bucket);
-                if (ctx->head_start_index > 0) {
-                    ctx->head_start_index  = 0;
-                    ctx->head_start_bucket = tmp_bkt;
-                    ctx->parse_pos = 0;
-                    ctx->state = PRE_HEAD;
-                }
-
-                return tmp_bkt;
+                start_bucket = APR_BUCKET_NEXT(start_bucket);
+                ctx->head_start_index = 0;
+                ctx->head_start_bucket = start_bucket;
+                ctx->parse_pos = 0;
+                ctx->state = PRE_HEAD;
             }
+            else {
+                start_bucket = dptr;
+            }
+            return start_bucket;
+        }
 
+        if (!read_done) {
+            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
+        }
+        if (!APR_STATUS_IS_SUCCESS(rv)) {
+            ctx->status = rv;
+            return NULL;
+        }
+
+        if (len == 0) { /* end of pipe? */
+            break;
+        }
+        c = buf;
+        while (c < buf + len) {
             if (*c == str[ctx->parse_pos]) {
                 if (ctx->state == PRE_HEAD) {
                     ctx->state             = PARSE_HEAD;
@@ -256,11 +273,40 @@
     const char *str = ENDING_SEQUENCE;
 
     do {
+        apr_status_t rv;
+        int read_done = 0;
+
         if (APR_BUCKET_IS_EOS(dptr)) {
             break;
+        }
+        if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
+            ctx->output_now = 1;
         }
-        apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
-        /* XXX handle retcodes */
+        else if (ctx->bytes_parsed > 0) {
+            rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
+            read_done = 1;
+            if (APR_STATUS_IS_EAGAIN(rv)) {
+                ctx->output_now = 1;
+            }
+        }
+
+        if (ctx->output_now) {
+            if (ctx->state == PARSE_DIRECTIVE) {
+                /* gonna start over parsing the directive next time through */
+                ctx->directive_length = 0;
+                ctx->tag_length       = 0;
+            }
+            return dptr;
+        }
+
+        if (!read_done) {
+            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
+        }
+        if (!APR_STATUS_IS_SUCCESS(rv)) {
+            ctx->status = rv;
+            return NULL;
+        }
+
         if (len == 0) { /* end of pipe? */
             break;
         }
@@ -271,15 +317,6 @@
             c = buf;
         }
         while (c < buf + len) {
-            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
-                if (ctx->state == PARSE_DIRECTIVE) {
-                    /* gonna start over parsing the directive next time through */
-                    ctx->directive_length = 0;
-                    ctx->tag_length       = 0;
-                }
-                return dptr;
-            }
-
             if (*c == str[ctx->parse_pos]) {
                 if (ctx->state != PARSE_TAIL) {
                     ctx->state             = PARSE_TAIL;
@@ -2364,6 +2401,9 @@
             apr_size_t cleanup_bytes = ctx->parse_pos;
 
             tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
+            if (!APR_STATUS_IS_SUCCESS(ctx->status)) {
+                return ctx->status;
+            }
 
             /* The few bytes stored in the ssi_tag_brigade turned out not to
              * be a tag after all. This can only happen if the starting
@@ -2401,7 +2441,9 @@
                     dptr = APR_BRIGADE_SENTINEL(*bb);
                 }
             }
-            else if ((tmp_dptr != NULL) && (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD)) {
+            else if ((tmp_dptr != NULL) &&
+                     (ctx->output_now ||
+                      (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD))) {
                               /* Send the large chunk of pre-tag bytes...  */
                 tag_and_after = apr_brigade_split(*bb, tmp_dptr);
                 rv = ap_pass_brigade(f->next, *bb);
@@ -2411,6 +2453,7 @@
                 *bb  = tag_and_after;
                 dptr = tmp_dptr;
                 ctx->bytes_parsed = 0;
+                ctx->output_now = 0;
             }
             else if (tmp_dptr == NULL) { /* There was no possible SSI tag in the */
                 dptr = APR_BRIGADE_SENTINEL(*bb);  /* remainder of this brigade...    */
@@ -2423,6 +2466,9 @@
              (ctx->state == PARSE_TAIL))       &&
             (dptr != APR_BRIGADE_SENTINEL(*bb))) {
             tmp_dptr = find_end_sequence(dptr, ctx, *bb);
+            if (!APR_STATUS_IS_SUCCESS(ctx->status)) {
+                return ctx->status;
+            }
 
             if (tmp_dptr != NULL) {
                 dptr = tmp_dptr;  /* Adjust bucket pos... */
@@ -2438,11 +2484,13 @@
                     APR_BRIGADE_CONCAT(ctx->ssi_tag_brigade, *bb);
                     *bb = tag_and_after;
                 }
-                else if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
+                else if (ctx->output_now ||
+                         (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD)) {
                     SPLIT_AND_PASS_PRETAG_BUCKETS(*bb, ctx, f->next, rv);
                     if (rv != APR_SUCCESS) {
                         return rv;
                     }
+                    ctx->output_now = 0;
                 }
             }
             else {
@@ -2715,6 +2763,7 @@
                 ctx->flags |= FLAG_NO_EXEC;
             }
             ctx->ssi_tag_brigade = apr_brigade_create(f->c->pool);
+            ctx->status = APR_SUCCESS;
 
             apr_cpystrn(ctx->error_str, conf->default_error_msg, sizeof(ctx->error_str));
             apr_cpystrn(ctx->time_str, conf->default_time_fmt, sizeof(ctx->time_str));






Re: [PATCH] Re: chunking of content in mod_include?

Posted by "Paul J. Reder" <re...@remulak.net>.
In addition to Ryan's comments I have the following comments...

Brian Pane wrote:
> How does this patch look?  It does the check only on bucket
> boundaries, and it pushes out the buffered content if either:
>   * the amount of buffered content is too large, or
>   * the apr_bucket_read would block.
> 
> If this technique looks sound, I can create another patch that
> does the same thing for find_end_sequence.
> 
> --Brian
> 
> Index: mod_include.c
> ===================================================================
> RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.c,v
> retrieving revision 1.136
> diff -u -r1.136 mod_include.c
> --- mod_include.c    2001/08/27 20:25:42    1.136
> +++ mod_include.c    2001/08/28 06:16:38
> @@ -143,9 +143,10 @@
>   * first byte of the BEGINNING_SEQUENCE (after finding a complete match) or it
>   * returns NULL if no match found.
>   */
> -static apr_bucket *find_start_sequence(apr_bucket *dptr, include_ctx_t *ctx,
> -                                      apr_bucket_brigade *bb, int *do_cleanup)
> +static apr_bucket *find_start_sequence(apr_bucket *dptr, ap_filter_t *f,
> +                                      apr_bucket_brigade **bb, int *do_cleanup)
>  {
> +    include_ctx_t *ctx = f->ctx;
>      apr_size_t len;
>      const char *c;
>      const char *buf;
> @@ -156,39 +157,43 @@
>      *do_cleanup = 0;
> 
>      do {
> +        apr_status_t rv;
> +
>          if (APR_BUCKET_IS_EOS(dptr)) {
>              break;
>          }
> -        apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
> -        /* XXX handle retcodes */
> -        if (len == 0) { /* end of pipe? */
> -            break;
> +
> +        if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
> +            !APR_BRIGADE_EMPTY(*bb)) {
> +            apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);

This should check ctx->head_start_bucket. If it is non-NULL, use it instead
of dptr in the brigade split or you may be passing part of the tag along.

> +            rv = ap_pass_brigade(f->next, *bb);
> +            *bb = remainder;

You need to reset ctx->bytes_parsed to 0 here or this will trip again.

> +            if (rv != APR_SUCCESS) {
> +                return NULL;
> +            }

You may want to emulate the way the old code worked for threshold processing.
It split the bucket, but you don't *have* to do that. The important part is
what it did with the returned pointer and the context. It allowed the
code in send_parsed_content to do the actual pass_brigade. Thus the code
to split and pass the brigade existed in one common place for the
find_start_sequence and find_end_sequence code. So all you would have to 
do here is reset the context and return a pointer to the head_start_bucket.
You would do the same thing in find_end_sequence. This would limit the
duplicated code to one set of code instead of four.
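
In other words, something roughly like this inside find_start_sequence (just a
sketch using the existing context fields; the caller then splits at the
returned bucket and does the pass_brigade, resetting bytes_parsed as it does
today):

        if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
            apr_bucket *flush_up_to = (ctx->head_start_bucket != NULL)
                                          ? ctx->head_start_bucket : dptr;
            /* forget any partial match; those bytes stay in the retained
             * brigade and get rescanned on the next call */
            ctx->head_start_bucket = NULL;
            ctx->head_start_index  = 0;
            ctx->parse_pos = 0;
            ctx->state = PRE_HEAD;
            return flush_up_to;   /* send_parsed_content splits here and
                                   * passes everything before this bucket */
        }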

>          }
> -        c = buf;
> -        while (c < buf + len) {
> -            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
> -                apr_bucket *start_bucket;
> 
> -                if (ctx->head_start_index > 0) {
> -                    start_index  = ctx->head_start_index;
> -                    start_bucket = ctx->head_start_bucket;
> +        rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
> +        if (rv == APR_EAGAIN) {
> +            if (!APR_BRIGADE_EMPTY(*bb)) {
> +                apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
> +                rv = ap_pass_brigade(f->next, *bb);
> +                *bb = remainder;

Same comments here...

> +                if (rv != APR_SUCCESS) {
> +                    return NULL;
>                  }
> -                else {
> -                    start_index  = (c - buf);
> -                    start_bucket = dptr;
> -                }
> -                apr_bucket_split(start_bucket, start_index);
> -                tmp_bkt = APR_BUCKET_NEXT(start_bucket);
> -                if (ctx->head_start_index > 0) {
> -                    ctx->head_start_index  = 0;
> -                    ctx->head_start_bucket = tmp_bkt;
> -                    ctx->parse_pos = 0;
> -                    ctx->state = PRE_HEAD;
> -                }
> -
> -                return tmp_bkt;
>              }
> +            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
> +        }
> +        if (rv != APR_SUCCESS) {
> +            return NULL;
> +        }
> 
> +        if (len == 0) { /* end of pipe? */
> +            break;
> +        }
> +        c = buf;
> +        while (c < buf + len) {
>              if (*c == str[ctx->parse_pos]) {
>                  if (ctx->state == PRE_HEAD) {
>                      ctx->state             = PARSE_HEAD;
> @@ -244,7 +249,7 @@
>              ctx->bytes_parsed++;
>          }
>          dptr = APR_BUCKET_NEXT(dptr);
> -    } while (dptr != APR_BRIGADE_SENTINEL(bb));
> +    } while (dptr != APR_BRIGADE_SENTINEL(*bb));
>      return NULL;
>  }
> 
> @@ -2363,7 +2368,7 @@
>              int do_cleanup = 0;
>              apr_size_t cleanup_bytes = ctx->parse_pos;
> 
> -            tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
> +            tmp_dptr = find_start_sequence(dptr, f, bb, &do_cleanup);
> 
>              /* The few bytes stored in the ssi_tag_brigade turned out not to
>               * be a tag after all. This can only happen if the starting

Re: [PATCH] Re: chunking of content in mod_include?

Posted by "Paul J. Reder" <re...@remulak.net>.
"Paul J. Reder" wrote:
> 
> Brian Pane wrote:
> > How does this patch look?  It does the check only on bucket
> > boundaries, and it pushes out the buffered content if either:
> >   * the amount of buffered content is too large, or
> >   * the apr_bucket_read would block.
> >
> > If this technique looks sound, I can create another patch that
> > does the same thing for find_end_sequence.
> >
> 
> I have some concerns about the way this is implemented. It doesn't
> look like it correctly keeps the context information when splitting
> the brigade. But the basic idea looks good.
> 
> I am looking at the code more carefully and will post another note
> or an updated patch soon.

Upon further checking I have determined that this does have the
potential problem that it may incorrectly pass along part of the tag. 

+        if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
+            !APR_BRIGADE_EMPTY(*bb)) {
+            apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
+            rv = ap_pass_brigade(f->next, *bb);
+            *bb = remainder;
+            if (rv != APR_SUCCESS) {
+                return NULL;
+            }
         }

The brigade split should be performed against ctx->head_start_bucket if
it is non-NULL. Part of the tag might be in a previous bucket, but the
current bucket is the one that throws it over the limit.
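
In code, the split would look something like this (a sketch, reusing the names
from the patch):

        if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
            !APR_BRIGADE_EMPTY(*bb)) {
            /* keep any partially matched tag on our side of the split */
            apr_bucket *split_at = (ctx->head_start_bucket != NULL)
                                       ? ctx->head_start_bucket : dptr;
            apr_bucket_brigade *remainder = apr_brigade_split(*bb, split_at);
            rv = ap_pass_brigade(f->next, *bb);
            *bb = remainder;
            ctx->bytes_parsed = 0;    /* don't trip the threshold again */
            if (rv != APR_SUCCESS) {
                return NULL;          /* (per Ryan's note, better still to
                                       * hand this status back to the caller) */
            }
        }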

I need to go to a meeting now, but I will continue looking at this when
I get back. 

I will present all of my comments then. It still looks basically sound,
just needs a little tweaking.

Paul J. Reder

Re: [PATCH] Re: chunking of content in mod_include?

Posted by "Paul J. Reder" <re...@remulak.net>.
Brian Pane wrote:
> How does this patch look?  It does the check only on bucket
> boundaries, and it pushes out the buffered content if either:
>   * the amount of buffered content is too large, or
>   * the apr_bucket_read would block.
> 
> If this technique looks sound, I can create another patch that
> does the same thing for find_end_sequence.
> 

I have some concerns about the way this is implemented. It doesn't
look like it correctly keeps the context information when splitting
the brigade. But the basic idea looks good.

I am looking at the code more carefully and will post another note
or an updated patch soon.

Paul J. Reder

[PATCH] Re: chunking of content in mod_include?

Posted by Brian Pane <bp...@pacbell.net>.
Ryan Bloom wrote:

>On Monday 27 August 2001 19:25, Paul J. Reder wrote:
>
>>Ryan Bloom wrote:
>>
>>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>>
>>>>In mod_include's find_start_sequence function, there's some code that
>>>>splits the current bucket if "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD."
>>>>
>>>>Can somebody explain the rationale for this?  It seems redundant to be
>>>>splitting the data into smaller chunks in a content filter; I'd expect
>>>>mod_include to defer network block sizing to downstream filters.  In
>>>>the profile data I'm looking at currently, this check accounts for 35%
>>>>of the total run time of find_start_sequence, so there's some
>>>>performance to be gained if the "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD" check can be eliminated.
>>>>
>>>It is used to ensure that we don't buffer all the data in mod_include. 
>>>It isn't really done correctly though, because what we should be doing,
>>>is just continuing to read as much data as possible, but as soon as we
>>>can't read something, send what we have down the filter stack.
>>>
>>>This variable, basically ensures we don't keep reading all the data until
>>>we process the whole file, or reach the first tag.
>>>
>>In what manner do you mean "as soon as we can't read something"? It is my
>>understanding that the bucket code hides reading delays from the
>>mod_include code. If that is true how would the mod_include code know when
>>
>
>The bucket does not hide reading delays at all.  Basically, you call apr_bucket_read
>with APR_NONBLOCK_READ, and when you get a return code of APR_EAGAIN,
>you send what you have, and then call apr_bucket_read with APR_BLOCK_READ.
>
>>to send a chunk along? Are you saying the bucket code should do some magic
>>like send all buckets in the brigade up to the current one? This would
>>wreak havoc on code like mod_include that may be setting aside or tagging
>>buckets for replacement when the end of the tag is found.
>>
>
>Huh?  The bucket code doesn't ever send data down the filter stack unless you
>tell it to.  Take a look at the content-length filter to see what I mean.
>
>>This code was put in because we were seeing the mod_include code buffer up
>>the entire collection of buckets until an SSI tag was found. If you have a
>>200 MB file with an SSI tag footer at the end of the brigade, the whole
>>thing was being buffered. How do you propose that this be done differently?
>>
>
>I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
>something with the data.  If we have a 200 Meg file that has no SSI tags in
>it, but we can get all 200 Meg at one time, then we shouldn't have any problem
>just scanning through the entire 200 Megs very quickly.  Worst case, we do what
>Brian suggested, and just check the bucket length once we have finished
>processing all of the data in that bucket.  The buffering only becomes a
>real problem when we sit waiting for data from a CGI or some other slow
>content generator.
>

How does this patch look?  It does the check only on bucket
boundaries, and it pushes out the buffered content if either:
  * the amount of buffered content is too large, or
  * the apr_bucket_read would block.

If this technique looks sound, I can create another patch that
does the same thing for find_end_sequence.

--Brian


Index: mod_include.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.c,v
retrieving revision 1.136
diff -u -r1.136 mod_include.c
--- mod_include.c    2001/08/27 20:25:42    1.136
+++ mod_include.c    2001/08/28 06:16:38
@@ -143,9 +143,10 @@
  * first byte of the BEGINNING_SEQUENCE (after finding a complete match) or it
  * returns NULL if no match found.
  */
-static apr_bucket *find_start_sequence(apr_bucket *dptr, include_ctx_t *ctx,
-                                      apr_bucket_brigade *bb, int *do_cleanup)
+static apr_bucket *find_start_sequence(apr_bucket *dptr, ap_filter_t *f,
+                                      apr_bucket_brigade **bb, int *do_cleanup)
 {
+    include_ctx_t *ctx = f->ctx;
     apr_size_t len;
     const char *c;
     const char *buf;
@@ -156,39 +157,43 @@
     *do_cleanup = 0;
 
     do {
+        apr_status_t rv;
+
         if (APR_BUCKET_IS_EOS(dptr)) {
             break;
         }
-        apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
-        /* XXX handle retcodes */
-        if (len == 0) { /* end of pipe? */
-            break;
+
+        if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
+            !APR_BRIGADE_EMPTY(*bb)) {
+            apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
+            rv = ap_pass_brigade(f->next, *bb);
+            *bb = remainder;
+            if (rv != APR_SUCCESS) {
+                return NULL;
+            }
         }
-        c = buf;
-        while (c < buf + len) {
-            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
-                apr_bucket *start_bucket;
 
-                if (ctx->head_start_index > 0) {
-                    start_index  = ctx->head_start_index;
-                    start_bucket = ctx->head_start_bucket;
+        rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
+        if (rv == APR_EAGAIN) {
+            if (!APR_BRIGADE_EMPTY(*bb)) {
+                apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
+                rv = ap_pass_brigade(f->next, *bb);
+                *bb = remainder;
+                if (rv != APR_SUCCESS) {
+                    return NULL;
                 }
-                else {
-                    start_index  = (c - buf);
-                    start_bucket = dptr;
-                }
-                apr_bucket_split(start_bucket, start_index);
-                tmp_bkt = APR_BUCKET_NEXT(start_bucket);
-                if (ctx->head_start_index > 0) {
-                    ctx->head_start_index  = 0;
-                    ctx->head_start_bucket = tmp_bkt;
-                    ctx->parse_pos = 0;
-                    ctx->state = PRE_HEAD;
-                }
-
-                return tmp_bkt;
             }
+            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
+        }
+        if (rv != APR_SUCCESS) {
+            return NULL;
+        }
 
+        if (len == 0) { /* end of pipe? */
+            break;
+        }
+        c = buf;
+        while (c < buf + len) {
             if (*c == str[ctx->parse_pos]) {
                 if (ctx->state == PRE_HEAD) {
                     ctx->state             = PARSE_HEAD;
@@ -244,7 +249,7 @@
             ctx->bytes_parsed++;
         }
         dptr = APR_BUCKET_NEXT(dptr);
-    } while (dptr != APR_BRIGADE_SENTINEL(bb));
+    } while (dptr != APR_BRIGADE_SENTINEL(*bb));
     return NULL;
 }
 
@@ -2363,7 +2368,7 @@
             int do_cleanup = 0;
             apr_size_t cleanup_bytes = ctx->parse_pos;
 
-            tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
+            tmp_dptr = find_start_sequence(dptr, f, bb, &do_cleanup);
 
             /* The few bytes stored in the ssi_tag_brigade turned out not to
              * be a tag after all. This can only happen if the starting



Re: chunking of content in mod_include?

Posted by Ryan Bloom <rb...@covalent.net>.
On Monday 10 September 2001 18:02, dean gaudet wrote:
> On Mon, 27 Aug 2001, Ryan Bloom wrote:
> > I don't care if mod_include buffers 200 Megs, as long as it is constantly
> > doing something with the data.  If we have a 200 Meg file that has no SSI
> > tags in it, but we can get all 200 Meg at one time, then we shouldn't
> > have any problem just scanning through the entire 200 Megs very quickly. 
> > Worst case, we do what Brian suggested, and just check the bucket length
> > once we have finished processing all of the data in that bucket.  The
> > buffering only becomes a real problem when we sit waiting for data from a
> > CGI or some other slow content generator.
>
> when you say "buffers 200 Megs" up there do you mean it's mmap()d?

yeah.

Ryan

>
> (i think that some of the page freeing decisions in linux are based on
> whether a page is currently mapped or not, and a 200MB file could cause a
> bit of memory shortage...)
>
> -dean

-- 

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: chunking of content in mod_include?

Posted by Cliff Woolley <cl...@yahoo.com>.
On Tue, 11 Sep 2001, Aaron Bannert wrote:

> I'm sorry if I'm completely uninformed here, but I've been thinking
> of this for the last day or so and I don't quite understand why we wouldn't
> incrementally mmap() and then munmap() different segments of the file
> as we walk through it (at least for mod_include). Wouldn't that save
> at least one copy of the data, not to mention the free-list management
> overhead we incur from malloc()ing/free()ing all those 8K buckets?

It's probably worth profiling...

You just have to be sure that you don't keep the whole file MMAPed at one
time, which is pretty much an impossible guarantee to make from within
file_read().  Callers can make that guarantee, but I don't know how they
communicate it to the buckets code in a clean way.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA




Re: chunking of content in mod_include?

Posted by Aaron Bannert <aa...@clove.org>.
On Mon, Sep 10, 2001 at 09:10:05PM -0400, Cliff Woolley wrote:
> On Mon, 10 Sep 2001, dean gaudet wrote:
> 
> > > I don't care if mod_include buffers 200 Megs, as long as it is
> > > constantly doing something with the data.  If we have a 200 Meg file
> > > that has no SSI tags in it, but we can get all 200 Meg at one time,
> > > then we shouldn't have any problem just scanning through the entire
> > > 200 Megs very quickly.  Worst case, we do what Brian suggested, and
> > > just check the bucket length once we have finished processing all of
> > > the data in that bucket.  The buffering only becomes a real problem
> > > when we sit waiting for data from a CGI or some other slow content
> > > generator.
> >
> > when you say "buffers 200 Megs" up there do you mean it's mmap()d?
> >
> > (i think that some of the page freeing decisions in linux are based on
> > whether a page is currently mapped or not, and a 200MB file could cause a
> > bit of memory shortage...)
> 
> It is possible for us to be talking about 200MB MMAP bucket here, yes.
> 
> But that won't ever happen in the typical case.  Some module (eg
> mod_file_cache) would have to have explicitly set up that 200MB MMAP
> bucket.
> 
> When you read a file bucket and it automatically MMAPs the file, it places
> an upper bound on file size for mmaping at 16MB.  Files bigger than that
> will be read onto the heap 8KB at a time (one 8KB hunk per heap bucket).
> In that case, mod_include will look at those 8KB heap buckets until it
> reaches one that puts it over its buffering threshold, then it will flush
> the data it's already examined down the filter chain.

I'm sorry if I'm completely uninformed here, but I've been thinking
of this for the last day or so and I don't quite understand why we wouldn't
incrementally mmap() and then munmap() different segments of the file
as we walk through it (at least for mod_include). Wouldn't that save
at least one copy of the data, not to mention the free-list management
overhead we incur from malloc()ing/free()ing all those 8K buckets?
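
If I follow, the loop would look roughly like this (just a sketch: it ignores
page alignment of the offset, error checking, and how each window would be
turned into buckets; WINDOW is an arbitrary size, and file/finfo/pool are
whatever the caller already has in hand):

    apr_mmap_t *mm;
    apr_off_t off = 0;

    while (off < finfo.size) {
        apr_size_t window = (apr_size_t)((finfo.size - off > WINDOW)
                                             ? WINDOW : (finfo.size - off));
        apr_mmap_create(&mm, file, off, window, APR_MMAP_READ, pool);
        /* scan mm->mm (window bytes) for SSI tags, send it downstream ... */
        apr_mmap_delete(mm);   /* unmap this window before mapping the next */
        off += window;
    }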

-aaron


Re: chunking of content in mod_include?

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 10 Sep 2001, dean gaudet wrote:

> > I don't care if mod_include buffers 200 Megs, as long as it is
> > constantly doing something with the data.  If we have a 200 Meg file
> > that has no SSI tags in it, but we can get all 200 Meg at one time,
> > then we shouldn't have any problem just scanning through the entire
> > 200 Megs very quickly.  Worst case, we do what Brian suggested, and
> > just check the bucket length once we have finished processing all of
> > the data in that bucket.  The buffering only becomes a real problem
> > when we sit waiting for data from a CGI or some other slow content
> > generator.
>
> when you say "buffers 200 Megs" up there do you mean it's mmap()d?
>
> (i think that some of the page freeing decisions in linux are based on
> whether a page is currently mapped or not, and a 200MB file could cause a
> bit of memory shortage...)

It is possible for us to be talking about 200MB MMAP bucket here, yes.

But that won't ever happen in the typical case.  Some module (eg
mod_file_cache) would have to have explicitly set up that 200MB MMAP
bucket.

When you read a file bucket and it automatically MMAPs the file, it places
an upper bound on file size for mmaping at 16MB.  Files bigger than that
will be read onto the heap 8KB at a time (one 8KB hunk per heap bucket).
In that case, mod_include will look at those 8KB heap buckets until it
reaches one that puts it over its buffering threshold, then it will flush
the data it's already examined down the filter chain.

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: chunking of content in mod_include?

Posted by dean gaudet <de...@arctic.org>.
On Mon, 27 Aug 2001, Ryan Bloom wrote:

> I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
> something with the data.  If we have a 200 Meg file that has no SSI tags in
> it, but we can get all 200 Meg at one time, then we shouldn't have any problem
> just scanning through the entire 200 Megs very quickly.  Worst case, we do what
> Brian suggested, and just check the bucket length once we have finished
> processing all of the data in that bucket.  The buffering only becomes a
> real problem when we sit waiting for data from a CGI or some other slow
> content generator.

when you say "buffers 200 Megs" up there do you mean it's mmap()d?

(i think that some of the page freeing decisions in linux are based on
whether a page is currently mapped or not, and a 200MB file could cause a
bit of memory shortage...)

-dean


Re: chunking of content in mod_include?

Posted by Ryan Bloom <rb...@covalent.net>.
On Monday 27 August 2001 19:25, Paul J. Reder wrote:
> Ryan Bloom wrote:
> > On Monday 27 August 2001 16:05, Brian Pane wrote:
> > > In mod_include's find_start_sequence function, there's some code that
> > > splits the current bucket if "ctx->bytes_parsed >=
> > > BYTE_COUNT_THRESHOLD."
> > >
> > > Can somebody explain the rationale for this?  It seems redundant to be
> > > splitting the data into smaller chunks in a content filter; I'd expect
> > > mod_include to defer network block sizing to downstream filters.  In
> > > the profile data I'm looking at currently, this check accounts for 35%
> > > of the total run time of find_start_sequence, so there's some
> > > performance to be gained if the "ctx->bytes_parsed >=
> > > BYTE_COUNT_THRESHOLD" check can be eliminated.
> >
> > It is used to ensure that we don't buffer all the data in mod_include. 
> > It isn't really done correctly though, because what we should be doing,
> > is just continuing to read as much data as possible, but as soon as we
> > can't read something, send what we have down the filter stack.
> >
> > This variable, basically ensures we don't keep reading all the data until
> > we process the whole file, or reach the first tag.
>
> In what manner do you mean "as soon as we can't read something"? It is my
> understanding that the bucket code hides reading delays from the
> mod_include code. If that is true how would the mod_include code know when

The bucket does not hide reading delays at all.  Basically, you call apr_bucket_read
with APR_NONBLOCK_READ, and when you get a return code of APR_EAGAIN,
you send what you have, and then call apr_bucket_read with APR_BLOCK_READ.
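
In skeleton form (not the actual mod_include code, just the shape of the loop):

    rv = apr_bucket_read(bucket, &buf, &len, APR_NONBLOCK_READ);
    if (APR_STATUS_IS_EAGAIN(rv)) {
        /* the generator would block: flush everything scanned so far... */
        apr_bucket_brigade *rest = apr_brigade_split(bb, bucket);
        rv = ap_pass_brigade(f->next, bb);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        bb = rest;
        /* ...and only then wait for more data */
        rv = apr_bucket_read(bucket, &buf, &len, APR_BLOCK_READ);
    }
    if (rv != APR_SUCCESS) {
        return rv;
    }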

> to send a chunk along? Are you saying the bucket code should do some magic
> like send all buckets in the brigade up to the current one? This would
> wreak havoc on code like mod_include that may be setting aside or tagging
> buckets for replacement when the end of the tag is found.

Huh?  The bucket code doesn't ever send data down the filter stack unless you
tell it to.  Take a look at the content-length filter to see what I mean.

> This code was put in because we were seeing the mod_include code buffer up
> the entire collection of buckets until an SSI tag was found. If you have a
> 200 MB file with an SSI tag footer at the end of the brigade, the whole
> thing was being buffered. How do you propose that this be done differently?

I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
something with the data.  If we have a 200 Meg file that has no SSI tags in
it, but we can get all 200 Meg at one time, then we shouldn't have any problem
just scanning through the entire 200 Megs very quickly.  Worst case, we do what
Brian suggested, and just check the bucket length once we have finished
processing all of the data in that bucket.  The buffering only becomes a
real problem when we sit waiting for data from a CGI or some other slow
content generator.

Ryan
______________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: chunking of content in mod_include?

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 27 Aug 2001, Brian Pane wrote:

> >The only thing I can think of is to add to and check the byte tally
> >at bucket boundaries. We might go over the BYTE_COUNT_THRESHOLD, but
> >the check wouldn't happen on every byte and there wouldn't need to be
> >a bucket split to send along the first part. Is this what you mean?
>
> I think checking on bucket boundaries would be better.  And to guard
> against the case where a single bucket might contain 200 MB of data,
> wouldn't it work to just check the bucket size right after the
> apr_bucket_read in find_start_sequence and split the bucket there if
> its size exceeds some reasonable threshold?

Let me second that idea... checking per-bucket would be WAY better IMO.

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: chunking of content in mod_include?

Posted by Brian Pane <bp...@pacbell.net>.
Paul J. Reder wrote:

>Ryan Bloom wrote:
>
>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>
>>>In mod_include's find_start_sequence function, there's some code that
>>>splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD."
>>>
>>>Can somebody explain the rationale for this?  It seems redundant to be
>>>splitting the data into smaller chunks in a content filter; I'd expect
>>>mod_include to defer network block sizing to downstream filters.  In the
>>>profile data I'm looking at currently, this check accounts for 35% of the
>>>total run time of find_start_sequence, so there's some performance to
>>>be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check can
>>>be eliminated.
>>>
>>It is used to ensure that we don't buffer all the data in mod_include.  It isn't
>>really done correctly though, because what we should be doing, is just continuing
>>to read as much data as possible, but as soon as we can't read something, send
>>what we have down the filter stack.
>>
>>This variable, basically ensures we don't keep reading all the data until we
>>process the whole file, or reach the first tag.
>>
>
>In what manner do you mean "as soon as we can't read something"? It is my understanding
>that the bucket code hides reading delays from the mod_include code. If that is true
>how would the mod_include code know when to send a chunk along? Are you saying the
>bucket code should do some magic like send all buckets in the brigade up to the
>current one? This would wreak havoc on code like mod_include that may be setting
>aside or tagging buckets for replacement when the end of the tag is found.
>
>This code was put in because we were seeing the mod_include code buffer up the entire
>collection of buckets until an SSI tag was found. If you have a 200 MB file with an
>SSI tag footer at the end of the brigade, the whole thing was being buffered. How do
>you propose that this be done differently?
>
>The only thing I can think of is to add to and check the byte tally at bucket
>boundaries. We might go over the BYTE_COUNT_THRESHOLD, but the check wouldn't
>happen on every byte and there wouldn't need to be a bucket split to send along
>the first part. Is this what you mean?
>
I think checking on bucket boundaries would be better.  And to guard
against the case where a single bucket might contain 200 MB of data,
wouldn't it work to just check the bucket size right after the
apr_bucket_read in find_start_sequence and split the bucket there if its
size exceeds some reasonable threshold?
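
Something like this, right after the read (a sketch; MAX_SCAN_SIZE is a
made-up cap, not an existing constant):

        rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
        if ((rv == APR_SUCCESS) && (len > MAX_SCAN_SIZE)) {
            /* only scan the first chunk now; the split-off tail becomes
             * the next bucket and is handled on a later iteration */
            apr_bucket_split(dptr, MAX_SCAN_SIZE);
            len = MAX_SCAN_SIZE;
        }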

--Brian





Re: chunking of content in mod_include?

Posted by "Paul J. Reder" <re...@remulak.net>.
Ryan Bloom wrote:
> 
> On Monday 27 August 2001 16:05, Brian Pane wrote:
> > In mod_include's find_start_sequence function, there's some code that
> > splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD."
> >
> > Can somebody explain the rationale for this?  It seems redundant to be
> > splitting the data into smaller chunks in a content filter; I'd expect
> > mod_include to defer network block sizing to downstream filters.  In the
> > profile data I'm looking at currently, this check accounts for 35% of the
> > total run time of find_start_sequence, so there's some performance to
> > be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check can
> > be eliminated.
> 
> It is used to ensure that we don't buffer all the data in mod_include.  It isn't
> really done correctly though, because what we should be doing, is just continuing
> to read as much data as possible, but as soon as we can't read something, send
> what we have down the filter stack.
> 
> This variable, basically ensures we don't keep reading all the data until we
> process the whole file, or reach the first tag.

In what manner do you mean "as soon as we can't read something"? It is my understanding
that the bucket code hides reading delays from the mod_include code. If that is true
how would the mod_include code know when to send a chunk along? Are you saying the
bucket code should do some magic like send all buckets in the brigade up to the
current one? This would wreak havoc on code like mod_include that may be setting
aside or tagging buckets for replacement when the end of the tag is found.

This code was put in because we were seeing the mod_include code buffer up the entire
collection of buckets until an SSI tag was found. If you have a 200 MB file with an
SSI tag footer at the end of the brigade, the whole thing was being buffered. How do
you propose that this be done differently?

The only thing I can think of is to add to and check the byte tally at bucket
boundaries. We might go over the BYTE_COUNT_THRESHOLD, but the check wouldn't
happen on every byte and there wouldn't need to be a bucket split to send along
the first part. Is this what you mean?
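
Roughly, the scan loop would become something like this (sketch only):

    do {
        rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
        /* ... scan all of buf[0..len) for the start sequence ... */

        ctx->bytes_parsed += len;            /* tally once per bucket */
        dptr = APR_BUCKET_NEXT(dptr);
        if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
            return dptr;                     /* caller flushes everything
                                              * before this bucket */
        }
    } while (dptr != APR_BRIGADE_SENTINEL(bb));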

Paul J. Reder

Re: chunking of content in mod_include?

Posted by Ryan Bloom <rb...@covalent.net>.
On Monday 27 August 2001 16:05, Brian Pane wrote:
> In mod_include's find_start_sequence function, there's some code that
> splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD."
>
> Can somebody explain the rationale for this?  It seems redundant to be
> splitting the data into smaller chunks in a content filter; I'd expect
> mod_include to defer network block sizing to downstream filters.  In the
> profile data I'm looking at currently, this check accounts for 35% of the
> total run time of find_start_sequence, so there's some performance to
> be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check can
> be eliminated.

It is used to ensure that we don't buffer all the data in mod_include.  It isn't
really done correctly though, because what we should be doing, is just continuing
to read as much data as possible, but as soon as we can't read something, send
what we have down the filter stack.

This variable, basically ensures we don't keep reading all the data until we
process the whole file, or reach the first tag.

Ryan

______________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------