You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Rici Lake <ri...@ricilake.net> on 2005/04/22 23:41:50 UTC
Module code review (long). Was Re: RFC: Who owns a passed brigade?
On 22-Apr-05, at 6:19 AM, Nick Kew wrote:
>
>> However, I think it is a fallacy that a cleaned-up brigade is not too
>> big to wait for pool cleanup. The brigade itself is about four
>> pointers;
>> and the corresponding pool cleanup is another four pointers, so
>> that's a
>> total of around 32 bytes, not a lot. But suppose that a brigade is
>> created for every 8k of filtered data,
>
> Are there applications like that in the wild? I'd have thought that
> sounds like the kind of case where you definitely want to re-use
> one brigade, with a brigade_cleanup after each 8Kb pass_brigade.
Indeed. But the filter does not know, in general, how much data it will
be called upon to filter, nor how many times it will be invoked. Being
invoked once every 8 kb should be pretty common, though, since that
seems to be the recommendation on how often a filter should pass a
brigade.
Are there applications like that in the wild? I don't know, but it
seems likely that there are. None of the following examples are
definitive, but they seem illustrative.
I cite these examples only because they are available; it was not an
attempt to criticize anyone's code in particular, but rather to show
the result of the lack of clear documentation on how to write an output
filter.
In summary, of five modules examined, three of them create new brigades
on every invocation without destroying the previous one. One of them
fails to empty the passed brigade. One of them destroys the passed
brigade. Two of them do one-time initialization on every invocation of
the filter.
The only output filter I looked at which seems to conform to
"brigades-are-retained, buckets-are-passed" is mod_deflate. This is the
loop from mod_deflate.c, which seems to be a reasonable model:
1 static apr_status_t deflate_out_filter(ap_filter_t *f,
2 apr_bucket_brigade *bb)
3 {
// ... declarations
4 if (APR_BRIGADE_EMPTY(bb)) {
5 return APR_SUCCESS;
6 }
7 if (!ctx) {
// ... determine whether we should filter the content
8 /* We're cool with filtering this. */
9 ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
10 ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
// ... more initialization
11 }
12 while (!APR_BRIGADE_EMPTY(bb))
13 {
14 e = APR_BRIGADE_FIRST(bb);
// ... check if e is metadata, process data in e
15 apr_bucket_delete(e);
16 }
17 apr_brigade_cleanup(bb);
18 return APR_SUCCESS;
19 }
Notes:
lines 4-6
these only serve to avoid initialization in the event that the filter
only receives empty brigades. I don't know if this is necessary.
line 7
all initialization is performed only once, if the ctx structure has
not yet been created
line 12, 15
all buckets are removed from the brigade and deleted
line 17
the call to cleanup is presumably redundant since there is no break
in the while block. But there could be.
mod_case_filter is supposedly an example of how to write filters, so it
is likely that its code will be copied by naive filter writers. Here is
its loop, in essence:
1 static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
2 apr_bucket_brigade *pbbIn)
3 {
// ... declarations
4 pbbOut=apr_brigade_create(r->pool, c->bucket_alloc);
5 for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
6 pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
7 pbktIn = APR_BUCKET_NEXT(pbktIn))
8 {
// ... check for EOS
8 apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ);
// ... do the transform
10 pbktOut = apr_bucket_heap_create(buf, len,
apr_bucket_free,
11 c->bucket_alloc);
12 APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);
13 }
14 return ap_pass_brigade(f->next,pbbOut);
15 }
Notes:
line 4:
a new brigade is created on every invocation.
lines 5-7:
the input brigade is iterated but the buckets are not removed
line 14:
the input brigade is not cleaned, and the output brigade is not
destroyed.
OK, here's mod_charset_lite, which is flagged as a learning experience
(for the author), but I believe can actually be found in the wild. The
filter function itself was a bit much to include here (it does seem to
empty its input brigade), but the key is in the following function
which is called everytime the filter decides it has output available:
1 static apr_status_t send_downstream(ap_filter_t *f, const char
*tmp,
2 apr_size_t len)
3 {
// ... declarations
4 bb = apr_brigade_create(r->pool, c->bucket_alloc);
5 b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
6 APR_BRIGADE_INSERT_TAIL(bb, b);
7 rv = ap_pass_brigade(f->next, bb);
8 if (rv != APR_SUCCESS) {
9 ctx->ees = EES_DOWNSTREAM;
10 }
11 return rv;
12 }
Notes:
line 4:
a new brigade is created on every output (8 kb)
line 11:
the brigade is not destroyed
Now, mod_include. This is a complex module, but stripped to its essence
with respect to bucket_brigade handling, we have:
1 static apr_status_t send_parsed_content(ap_filter_t *f,
apr_bucket_brigade *bb)
2 {
// ... lots of stuff snipped
3 apr_bucket *b = APR_BRIGADE_FIRST(bb);
4 apr_bucket_brigade *pass_bb;
5 pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);
6 /* loop over the current bucket brigade */
7 while (b != APR_BRIGADE_SENTINEL(bb)) {
// ... remove buckets from bb and process them,
periodically
// ... calling ap_pass_brigade
8 /* if something's left over, pass it along */
9 if (!APR_BRIGADE_EMPTY(pass_bb)) {
10 rv = ap_pass_brigade(f->next, pass_bb);
11 }
12 else {
13 rv = APR_SUCCESS;
14 apr_brigade_destroy(pass_bb);
15 }
16 return rv;
17 }
Notes
line 4:
a new brigade is created on every invocation
line 14:
the brigade is destroyed after being passed
While I was looking at this code, I realized that send_parsed_content
is not the actual filter function. The actual filter function, again
heavily edited:
1 static apr_status_t includes_filter(ap_filter_t *f,
apr_bucket_brigade *b)
2 {
3 if (!(ap_allow_options(r) & OPT_INCLUDES)) {
// ... refuse to handle request
4 }
5 if (!f->ctx) {
// ... initialize context
6 }
7 if ((parent = ap_get_module_config(r->request_config,
&include_module))) {
8 r->subprocess_env = r->main->subprocess_env;
9 apr_pool_join(r->main->pool, r->pool);
10 r->finfo.mtime = r->main->finfo.mtime;
11 }
12 else {
13 /* we're not a nested include, so we create an initial
14 * environment */
15 ap_add_common_vars(r);
16 ap_add_cgi_vars(r);
17 add_include_vars(r, conf->default_time_fmt);
18 }
// ... more stuff having to do with Last-Modified and
Content-Length
// ... and creating QUERY_STRING from r->args
19 return send_parsed_content(f, b);
20 }
Notes
lines 7-18:
these run on every invocation, including the calls to
ap_add_common_vars and friends at line 15-17. I spent some time
fruitlessly searching for the place where it avoids that on the
second invocation, and finally tested it by applying it to a
self-proxy:
<Location /inc>
Allow from all
ProxyPass http://freeb:8089
ProxyPassReverse http://freeb:8089
SetOutputFilter INCLUDES
</Location>
using the file:
<html><head><title>test</title></head>
<body>
<pre>
<!--#printenv-->
</pre>
<p>
........................................................................
.......
------ last line repeated 2048 times -----
<!--#set var="SCRIPT_NAME" value="/wouldn't/you/like/to/know"-->
<pre>
<!--#printenv-->
</pre>
</body></html>
As might be expected, the first printenv shows the actual SCRIPT_NAME,
whereas the second one shows /wouldn't/you/like/to/know. Moving the
<!--#set...> to just after the <body> reverses this: the first printenv
shows the occulted SCRIPT_NAME, but since ap_add_cgi_vars has run again
by the time the second printenv is encountered, the original value is
shown by the second printenv.
Finally, mod_layout2, a third-party module actually in use:
1 static apr_status_t layout_filter(ap_filter_t *f,
apr_bucket_brigade *b) {
2 if (r->main) {
3 return ap_pass_brigade(f->next, b);
4 }
5 ap_table_setn(r->headers_out, "X-Powered-By",
"ModLayout/"VERSION);
6 cfg = ap_get_module_config(r->per_dir_config, &layout_module);
7 if (cfg->uris_ignore) {
8 if (table_find(cfg->uris_ignore, r->uri)) {
9 return ap_pass_brigade(f->next, b);
10 }
11 }
12 info = create_layout_request(r, cfg);
13 if (ctx == NULL) {
14 f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
15 ctx->b = apr_brigade_create(f->r->pool,
r->connection->bucket_alloc);
16 ctx->output = NULL;
17 }
18 apr_table_unset(f->r->headers_out, "Content-Length");
19 apr_table_unset(f->r->headers_out, "ETag");
20 APR_BRIGADE_FOREACH(e, b) {
21 if (APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH (e)) {
22 info->f = f->next;
23 info->b = ctx->b;
// ... snip: actually process accumulated input
24 APR_BUCKET_REMOVE(e);
25 APR_BRIGADE_INSERT_TAIL(ctx->b, e);
26 return ap_pass_brigade(f->next, ctx->b);
27 }
28 if (APR_BUCKET_IS_FLUSH(e)) {
29 continue;
30 }
31 apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);
32 if (ctx->output) {
33 ctx->output = apr_psprintf(r->pool,"%s%.*s",
ctx->output, len, str);
34 } else {
35 ctx->output = apr_pstrndup(r->pool, str, len);
36 }
37 }
38 apr_brigade_destroy(b);
39 return APR_SUCCESS;
40 }
Notes
lines 5-12:
like mod_include, this module performs initialization on every
invocation.
line 15:
however, a single brigade is created for passing to the next filter
line 26:
although the module destroys its received brigade (see below), it
seems to assume that the next filter will not destroy the brigade
it is passing
lines 32-36:
incoming data is simply buffered in memory until an EOS or FLUSH.
The latter seems to mean that a tag will not be recognized if it
is interrupted by a flush bucket, but I haven't tested that.
line 38:
the input buckets have not yet been deleted, but apr_brigade_destroy
will do that. It will also destroy the brigade, though.