You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Rici Lake <ri...@ricilake.net> on 2005/04/22 23:41:50 UTC
Module code review (long). Was Re: RFC: Who owns a passed brigade?

On 22-Apr-05, at 6:19 AM, Nick Kew wrote:
>
>> However, I think it is a fallacy that a cleaned-up brigade is not too
>> big to wait for pool cleanup. The brigade itself is about four  
>> pointers;
>> and the corresponding pool cleanup is another four pointers, so  
>> that's a
>> total of around 32 bytes, not a lot. But suppose that a brigade is
>> created for every 8k of filtered data,
>
> Are there applications like that in the wild?  I'd have thought that
> sounds like the kind of case where you definitely want to re-use
> one brigade, with a brigade_cleanup after each 8Kb pass_brigade.

Indeed. But the filter does not know, in general, how much data it will  
be called upon to filter, nor how many times it will be invoked. Being  
invoked once every 8 kb should be pretty common, though, since that  
seems to be the recommendation on how often a filter should pass a  
brigade.

Are there applications like that in the wild? I don't know, but it  
seems likely that there are. None of the following examples are  
definitive, but they seem illustrative.

I cite these examples only because they are available; it was not an  
attempt to criticize anyone's code in particular, but rather to show  
the result of the lack of clear documentation on how to write an output  
filter.

In summary, of five modules examined, three of them create new brigades  
on every invocation without destroying the previous one. One of them  
fails to empty the passed brigade. One of them destroys the passed  
brigade. Two of them do one-time initialization on every invocation of  
the filter.

The only output filter I looked at which seems to conform to  
"brigades-are-retained, buckets-are-passed" is mod_deflate. This is the  
loop from mod_deflate.c, which seems to be a reasonable model:

   1   static apr_status_t deflate_out_filter(ap_filter_t *f,
   2                                          apr_bucket_brigade *bb)
   3   {
   // ... declarations
   4       if (APR_BRIGADE_EMPTY(bb)) {
   5           return APR_SUCCESS;
   6       }
   7       if (!ctx) {
               // ... determine whether we should filter the content
   8           /* We're cool with filtering this. */
   9           ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
  10           ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
               // ... more initialization
  11       }
  12       while (!APR_BRIGADE_EMPTY(bb))
  13       {
  14           e = APR_BRIGADE_FIRST(bb);
               // ... check if e is metadata, process data in e
  15           apr_bucket_delete(e);
  16       }
  17       apr_brigade_cleanup(bb);
  18       return APR_SUCCESS;
  19   }

Notes:
lines 4-6
    these only serve to avoid initialization in the event that the filter
    only receives empty brigades. I don't know if this is necessary.
line 7
    all initialization is performed only once, if the ctx structure has
    not yet been created
line 12, 15
    all buckets are removed from the brigade and deleted
line 17
    the call to cleanup is presumably redundant since there is no break
    in the while block. But there could be.


mod_case_filter is supposedly an example of how to write filters, so it  
is likely that its code will be copied by naive filter writers. Here is  
its loop, in essence:

   1   static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
   2                                           apr_bucket_brigade *pbbIn)
   3       {
           // ... declarations
   4       pbbOut=apr_brigade_create(r->pool, c->bucket_alloc);
   5       for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
   6            pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
   7            pbktIn = APR_BUCKET_NEXT(pbktIn))
   8           {
               // ... check for EOS
   8           apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ);
               // ... do the transform
  10           pbktOut = apr_bucket_heap_create(buf, len,  
apr_bucket_free,
  11                                            c->bucket_alloc);
  12           APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);
  13           }
  14       return ap_pass_brigade(f->next,pbbOut);
  15       }

Notes:
line 4:
   a new brigade is created on every invocation.
lines 5-7:
   the input brigade is iterated but the buckets are not removed
line 14:
   the input brigade is not cleaned, and the output brigade is not  
destroyed.


OK, here's mod_charset_lite, which is flagged as a learning experience  
(for the author), but I believe can actually be found in the wild. The  
filter function itself was a bit much to include here (it does seem to  
empty its input brigade), but the key is in the following function  
which is called everytime the filter decides it has output available:

   1   static apr_status_t send_downstream(ap_filter_t *f, const char  
*tmp,
   2                                       apr_size_t len)
   3   {
           // ... declarations
   4       bb = apr_brigade_create(r->pool, c->bucket_alloc);
   5       b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
   6       APR_BRIGADE_INSERT_TAIL(bb, b);
   7       rv = ap_pass_brigade(f->next, bb);
   8       if (rv != APR_SUCCESS) {
   9           ctx->ees = EES_DOWNSTREAM;
  10       }
  11       return rv;
  12   }

Notes:
line 4:
   a new brigade is created on every output (8 kb)
line 11:
   the brigade is not destroyed


Now, mod_include. This is a complex module, but stripped to its essence  
with respect to bucket_brigade handling, we have:

   1   static apr_status_t send_parsed_content(ap_filter_t *f,  
apr_bucket_brigade *bb)
   2   {
           // ... lots of stuff snipped
   3       apr_bucket *b = APR_BRIGADE_FIRST(bb);
   4       apr_bucket_brigade *pass_bb;
   5       pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);

   6       /* loop over the current bucket brigade */
   7       while (b != APR_BRIGADE_SENTINEL(bb)) {
               // ... remove buckets from bb and process them,  
periodically
               // ... calling ap_pass_brigade
   8       /* if something's left over, pass it along */
   9       if (!APR_BRIGADE_EMPTY(pass_bb)) {
  10           rv = ap_pass_brigade(f->next, pass_bb);
  11       }
  12       else {
  13           rv = APR_SUCCESS;
  14           apr_brigade_destroy(pass_bb);
  15       }
  16       return rv;
  17   }

Notes
line 4:
   a new brigade is created on every invocation
line 14:
   the brigade is destroyed after being passed

While I was looking at this code, I realized that send_parsed_content  
is not the actual filter function. The actual filter function, again  
heavily edited:

   1   static apr_status_t includes_filter(ap_filter_t *f,  
apr_bucket_brigade *b)
   2   {
   3      if (!(ap_allow_options(r) & OPT_INCLUDES)) {
              // ... refuse to handle request
   4       }

   5       if (!f->ctx) {
               // ... initialize context
   6       }

   7       if ((parent = ap_get_module_config(r->request_config,  
&include_module))) {
   8           r->subprocess_env = r->main->subprocess_env;
   9           apr_pool_join(r->main->pool, r->pool);
  10           r->finfo.mtime = r->main->finfo.mtime;
  11       }
  12       else {
  13           /* we're not a nested include, so we create an initial
  14            * environment */
  15           ap_add_common_vars(r);
  16           ap_add_cgi_vars(r);
  17           add_include_vars(r, conf->default_time_fmt);
  18       }
           // ... more stuff having to do with Last-Modified and  
Content-Length
           // ... and creating QUERY_STRING from r->args
  19       return send_parsed_content(f, b);
  20   }

Notes
lines 7-18:
   these run on every invocation, including the calls to
   ap_add_common_vars and friends at line 15-17. I spent some time
   fruitlessly searching for the place where it avoids that on the
   second invocation, and finally tested it by applying it to a  
self-proxy:

<Location /inc>
     Allow from all
     ProxyPass http://freeb:8089
     ProxyPassReverse http://freeb:8089
     SetOutputFilter INCLUDES
</Location>

using the file:

<html><head><title>test</title></head>
<body>
<pre>
<!--#printenv-->
</pre>
<p>
........................................................................ 
.......
------ last line repeated 2048 times -----
<!--#set var="SCRIPT_NAME" value="/wouldn't/you/like/to/know"-->
<pre>
<!--#printenv-->
</pre>
</body></html>

As might be expected, the first printenv shows the actual SCRIPT_NAME,  
whereas the second one shows /wouldn't/you/like/to/know. Moving the  
<!--#set...> to just after the <body> reverses this: the first printenv  
shows the occulted SCRIPT_NAME, but since ap_add_cgi_vars has run again  
by the time the second printenv is encountered, the original value is  
shown by the second printenv.


Finally, mod_layout2, a third-party module actually in use:

   1   static apr_status_t layout_filter(ap_filter_t *f,  
apr_bucket_brigade *b) {
   2       if (r->main) {
   3           return ap_pass_brigade(f->next, b);
   4       }
   5       ap_table_setn(r->headers_out, "X-Powered-By",  
"ModLayout/"VERSION);
   6       cfg = ap_get_module_config(r->per_dir_config, &layout_module);
   7       if (cfg->uris_ignore) {
   8           if (table_find(cfg->uris_ignore, r->uri)) {
   9               return ap_pass_brigade(f->next, b);
  10           }
  11       }
  12       info = create_layout_request(r, cfg);
  13       if (ctx == NULL) {
  14           f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
  15           ctx->b = apr_brigade_create(f->r->pool,  
r->connection->bucket_alloc);
  16           ctx->output = NULL;
  17       }

  18       apr_table_unset(f->r->headers_out, "Content-Length");
  19       apr_table_unset(f->r->headers_out, "ETag");

  20       APR_BRIGADE_FOREACH(e, b) {
  21           if (APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH  (e)) {
  22               info->f = f->next;
  23               info->b = ctx->b;
                   // ... snip: actually process accumulated input
  24               APR_BUCKET_REMOVE(e);
  25               APR_BRIGADE_INSERT_TAIL(ctx->b, e);
  26               return ap_pass_brigade(f->next, ctx->b);
  27           }
  28           if (APR_BUCKET_IS_FLUSH(e)) {
  29               continue;
  30           }
  31           apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

  32           if (ctx->output) {
  33               ctx->output = apr_psprintf(r->pool,"%s%.*s",  
ctx->output, len, str);
  34           } else {
  35               ctx->output = apr_pstrndup(r->pool, str, len);
  36           }
  37       }

  38       apr_brigade_destroy(b);
  39       return APR_SUCCESS;
  40   }

Notes
lines 5-12:
   like mod_include, this module performs initialization on every  
invocation.
line 15:
   however, a single brigade is created for passing to the next filter
line 26:
   although the module destroys its received brigade (see below), it
   seems to assume that the next filter will not destroy the brigade
   it is passing
lines 32-36:
   incoming data is simply buffered in memory until an EOS or FLUSH.
   The latter seems to mean that a tag will not be recognized if it
   is interrupted by a flush bucket, but I haven't tested that.
line 38:
   the input buckets have not yet been deleted, but apr_brigade_destroy
   will do that. It will also destroy the brigade, though.