You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Rici Lake <ri...@ricilake.net> on 2005/04/21 19:34:27 UTC

RFC: Who owns a passed brigade?

After a filter calls ap_pass_brigade(f->next, bb), who owns bb?

The name of the function might lead one to believe that ownership of 
the brigade was transferred, but inspection of distributed filters 
shows that it has not been; both mod_deflate and mod_include expect the 
brigade to be usable (and apparently empty) after return from 
ap_pass_brigade.

This would imply that the rule is that ownership of the brigade is 
retained by the caller, but that ownership of the buckets in the 
brigade is transferred to the callee.

If this is the case, I believe that it should be clearly stated in the 
API documentation. Otherwise, it is inevitable that people writing 
filters will get it wrong. (And it would also suggest that the function 
should be renamed ap_pass_buckets())

It also leads to the question of what the expected behaviour is if the 
callee does not empty the passed brigade. Is it reasonable for a called 
filter to expect to be able to leave a bucket in the passed brigade so 
as to receive it on the next call? That would seem eccentric, although 
I believe it is what would happen in the case of mod_deflate and 
mod_include, at least.

If the brigade is required to have been emptied when the receiving 
filter returns, then would  it not make more sense for ap_pass_brigade 
(or ap_pass_buckets) to call apr_brigade_cleanup just before returning? 
This would avoid the situation where a called filter religiously cleans 
up the brigade, in order to conform to the API expectations, but the 
calling filter also cleans up the brigade, in order to avoid worrying 
about misbehaving downstream filters.

If, on the other hand, buckets can be left in the brigade for retention 
between filter invocations, it must be clear that the passing filter 
must always use the same brigade for every call to ap_pass_brigade(), 
or otherwise arrange for buckets left in the brigade to be repassed.

I note that apr_brigade_split() seems to be oriented towards a model 
where ownership of the brigade is passed, since it creates a new 
brigade. For the model where brigade ownership is retained, it would be 
better to have a function which shifted buckets from one brigade to 
another one.

There are a number of possible bugs resulting from a lack of clarity on 
brigade ownership. For example:

-- if a filter believes that brigade ownership has been passed to it, 
it may call apr_brigade_destroy on the brigade. This will not cause any 
obvious misbehaviour, but the pool cleanup on the brigade will have 
been removed, and if the request terminates on an error, it is 
conceivable that heap buckets later added to the brigade will leak.

-- if a filter believes that it is passing ownership, it will not call 
apr_brigade_destroy on the brigade. This is only serious in the case 
where the filter creates a new brigade on every invocation (as, for 
example, mod_include does). In that case, the pool's cleanup list will 
grow dramatically for large requests. (There are a couple of issues in 
bugzilla which seem to be showing this symptom.)

-- as mentioned above, if a filter does not understand that it is being 
passed ownership of the buckets, it may leave them in the brigade and 
then receive them again on a subsequent invocation. However, the 
upstream filter may choose to send a different brigade on the 
subsequent invocation, so the filter cannot rely on this behaviour.

My vote would be for a clear statement that ownership of brigades is 
retained and ownership of buckets is transferred, along with a 
documented call to ap_brigade_cleanup() in ap_pass_brigade to precisely 
define the expected state of the brigade on return.

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Paul Querna wrote:

> I agree that not having a clear rule has led to some possible leaks in
> many filters.  If some people think there has always been a 'rule', I
> contend that it has never been documented.

That may be.  I *believe* the rule was supposed to be that once you handed
it down, you could assume that it would come back empty.  This was
discussed at length on either httpd-dev or apr-dev back in the day.  I'll
have to dig through the archives to find it, though.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@perl.apache.org
For additional commands, e-mail: dev-help@perl.apache.org

Module code review (long). Was Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 22-Apr-05, at 6:19 AM, Nick Kew wrote:
>
>> However, I think it is a fallacy that a cleaned-up brigade is not too
>> big to wait for pool cleanup. The brigade itself is about four  
>> pointers;
>> and the corresponding pool cleanup is another four pointers, so  
>> that's a
>> total of around 32 bytes, not a lot. But suppose that a brigade is
>> created for every 8k of filtered data,
>
> Are there applications like that in the wild?  I'd have thought that
> sounds like the kind of case where you definitely want to re-use
> one brigade, with a brigade_cleanup after each 8Kb pass_brigade.

Indeed. But the filter does not know, in general, how much data it will  
be called upon to filter, nor how many times it will be invoked. Being  
invoked once every 8 kb should be pretty common, though, since that  
seems to be the recommendation on how often a filter should pass a  
brigade.

Are there applications like that in the wild? I don't know, but it  
seems likely that there are. None of the following examples are  
definitive, but they seem illustrative.

I cite these examples only because they are available; it was not an  
attempt to criticize anyone's code in particular, but rather to show  
the result of the lack of clear documentation on how to write an output  
filter.

In summary, of five modules examined, three of them create new brigades  
on every invocation without destroying the previous one. One of them  
fails to empty the passed brigade. One of them destroys the passed  
brigade. Two of them do one-time initialization on every invocation of  
the filter.

The only output filter I looked at which seems to conform to  
"brigades-are-retained, buckets-are-passed" is mod_deflate. This is the  
loop from mod_deflate.c, which seems to be a reasonable model:

   1   static apr_status_t deflate_out_filter(ap_filter_t *f,
   2                                          apr_bucket_brigade *bb)
   3   {
   // ... declarations
   4       if (APR_BRIGADE_EMPTY(bb)) {
   5           return APR_SUCCESS;
   6       }
   7       if (!ctx) {
               // ... determine whether we should filter the content
   8           /* We're cool with filtering this. */
   9           ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
  10           ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
               // ... more initialization
  11       }
  12       while (!APR_BRIGADE_EMPTY(bb))
  13       {
  14           e = APR_BRIGADE_FIRST(bb);
               // ... check if e is metadata, process data in e
  15           apr_bucket_delete(e);
  16       }
  17       apr_brigade_cleanup(bb);
  18       return APR_SUCCESS;
  19   }

Notes:
lines 4-6
    these only serve to avoid initialization in the event that the filter
    only receives empty brigades. I don't know if this is necessary.
line 7
    all initialization is performed only once, if the ctx structure has
    not yet been created
line 12, 15
    all buckets are removed from the brigade and deleted
line 17
    the call to cleanup is presumably redundant since there is no break
    in the while block. But there could be.

mod_case_filter is supposedly an example of how to write filters, so it  
is likely that its code will be copied by naive filter writers. Here is  
its loop, in essence:

   1   static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
   2                                           apr_bucket_brigade *pbbIn)
   3       {
           // ... declarations
   4       pbbOut=apr_brigade_create(r->pool, c->bucket_alloc);
   5       for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
   6            pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
   7            pbktIn = APR_BUCKET_NEXT(pbktIn))
   8           {
               // ... check for EOS
   8           apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ);
               // ... do the transform
  10           pbktOut = apr_bucket_heap_create(buf, len,  
apr_bucket_free,
  11                                            c->bucket_alloc);
  12           APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);
  13           }
  14       return ap_pass_brigade(f->next,pbbOut);
  15       }

Notes:
line 4:
   a new brigade is created on every invocation.
lines 5-7:
   the input brigade is iterated but the buckets are not removed
line 14:
   the input brigade is not cleaned, and the output brigade is not  
destroyed.

OK, here's mod_charset_lite, which is flagged as a learning experience  
(for the author), but I believe can actually be found in the wild. The  
filter function itself was a bit much to include here (it does seem to  
empty its input brigade), but the key is in the following function  
which is called everytime the filter decides it has output available:

   1   static apr_status_t send_downstream(ap_filter_t *f, const char  
*tmp,
   2                                       apr_size_t len)
   3   {
           // ... declarations
   4       bb = apr_brigade_create(r->pool, c->bucket_alloc);
   5       b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
   6       APR_BRIGADE_INSERT_TAIL(bb, b);
   7       rv = ap_pass_brigade(f->next, bb);
   8       if (rv != APR_SUCCESS) {
   9           ctx->ees = EES_DOWNSTREAM;
  10       }
  11       return rv;
  12   }

Notes:
line 4:
   a new brigade is created on every output (8 kb)
line 11:
   the brigade is not destroyed

Now, mod_include. This is a complex module, but stripped to its essence  
with respect to bucket_brigade handling, we have:

   1   static apr_status_t send_parsed_content(ap_filter_t *f,  
apr_bucket_brigade *bb)
   2   {
           // ... lots of stuff snipped
   3       apr_bucket *b = APR_BRIGADE_FIRST(bb);
   4       apr_bucket_brigade *pass_bb;
   5       pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);

   6       /* loop over the current bucket brigade */
   7       while (b != APR_BRIGADE_SENTINEL(bb)) {
               // ... remove buckets from bb and process them,  
periodically
               // ... calling ap_pass_brigade
   8       /* if something's left over, pass it along */
   9       if (!APR_BRIGADE_EMPTY(pass_bb)) {
  10           rv = ap_pass_brigade(f->next, pass_bb);
  11       }
  12       else {
  13           rv = APR_SUCCESS;
  14           apr_brigade_destroy(pass_bb);
  15       }
  16       return rv;
  17   }

Notes
line 4:
   a new brigade is created on every invocation
line 14:
   the brigade is destroyed after being passed

While I was looking at this code, I realized that send_parsed_content  
is not the actual filter function. The actual filter function, again  
heavily edited:

   1   static apr_status_t includes_filter(ap_filter_t *f,  
apr_bucket_brigade *b)
   2   {
   3      if (!(ap_allow_options(r) & OPT_INCLUDES)) {
              // ... refuse to handle request
   4       }

   5       if (!f->ctx) {
               // ... initialize context
   6       }

   7       if ((parent = ap_get_module_config(r->request_config,  
&include_module))) {
   8           r->subprocess_env = r->main->subprocess_env;
   9           apr_pool_join(r->main->pool, r->pool);
  10           r->finfo.mtime = r->main->finfo.mtime;
  11       }
  12       else {
  13           /* we're not a nested include, so we create an initial
  14            * environment */
  15           ap_add_common_vars(r);
  16           ap_add_cgi_vars(r);
  17           add_include_vars(r, conf->default_time_fmt);
  18       }
           // ... more stuff having to do with Last-Modified and  
Content-Length
           // ... and creating QUERY_STRING from r->args
  19       return send_parsed_content(f, b);
  20   }

Notes
lines 7-18:
   these run on every invocation, including the calls to
   ap_add_common_vars and friends at line 15-17. I spent some time
   fruitlessly searching for the place where it avoids that on the
   second invocation, and finally tested it by applying it to a  
self-proxy:

<Location /inc>
     Allow from all
     ProxyPass http://freeb:8089
     ProxyPassReverse http://freeb:8089
     SetOutputFilter INCLUDES
</Location>

using the file:

<html><head><title>test</title></head>
<body>
<pre>
<!--#printenv-->
</pre>
<p>
........................................................................ 
.......
------ last line repeated 2048 times -----
<!--#set var="SCRIPT_NAME" value="/wouldn't/you/like/to/know"-->
<pre>
<!--#printenv-->
</pre>
</body></html>

As might be expected, the first printenv shows the actual SCRIPT_NAME,  
whereas the second one shows /wouldn't/you/like/to/know. Moving the  
<!--#set...> to just after the <body> reverses this: the first printenv  
shows the occulted SCRIPT_NAME, but since ap_add_cgi_vars has run again  
by the time the second printenv is encountered, the original value is  
shown by the second printenv.

Finally, mod_layout2, a third-party module actually in use:

   1   static apr_status_t layout_filter(ap_filter_t *f,  
apr_bucket_brigade *b) {
   2       if (r->main) {
   3           return ap_pass_brigade(f->next, b);
   4       }
   5       ap_table_setn(r->headers_out, "X-Powered-By",  
"ModLayout/"VERSION);
   6       cfg = ap_get_module_config(r->per_dir_config, &layout_module);
   7       if (cfg->uris_ignore) {
   8           if (table_find(cfg->uris_ignore, r->uri)) {
   9               return ap_pass_brigade(f->next, b);
  10           }
  11       }
  12       info = create_layout_request(r, cfg);
  13       if (ctx == NULL) {
  14           f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
  15           ctx->b = apr_brigade_create(f->r->pool,  
r->connection->bucket_alloc);
  16           ctx->output = NULL;
  17       }

  18       apr_table_unset(f->r->headers_out, "Content-Length");
  19       apr_table_unset(f->r->headers_out, "ETag");

  20       APR_BRIGADE_FOREACH(e, b) {
  21           if (APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH  (e)) {
  22               info->f = f->next;
  23               info->b = ctx->b;
                   // ... snip: actually process accumulated input
  24               APR_BUCKET_REMOVE(e);
  25               APR_BRIGADE_INSERT_TAIL(ctx->b, e);
  26               return ap_pass_brigade(f->next, ctx->b);
  27           }
  28           if (APR_BUCKET_IS_FLUSH(e)) {
  29               continue;
  30           }
  31           apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

  32           if (ctx->output) {
  33               ctx->output = apr_psprintf(r->pool,"%s%.*s",  
ctx->output, len, str);
  34           } else {
  35               ctx->output = apr_pstrndup(r->pool, str, len);
  36           }
  37       }

  38       apr_brigade_destroy(b);
  39       return APR_SUCCESS;
  40   }

Notes
lines 5-12:
   like mod_include, this module performs initialization on every  
invocation.
line 15:
   however, a single brigade is created for passing to the next filter
line 26:
   although the module destroys its received brigade (see below), it
   seems to assume that the next filter will not destroy the brigade
   it is passing
lines 32-36:
   incoming data is simply buffered in memory until an EOS or FLUSH.
   The latter seems to mean that a tag will not be recognized if it
   is interrupted by a flush bucket, but I haven't tested that.
line 38:
   the input buckets have not yet been deleted, but apr_brigade_destroy
   will do that. It will also destroy the brigade, though.

Re: RFC: Who owns a passed brigade?

Posted by Nick Kew <ni...@webthing.com>.

Rici Lake wrote:

> I favour having ap_pass_brigade do it.

 ... as discussed in IRC.  Yes, I like that too.

> However, I think it is a fallacy that a cleaned-up brigade is not too
> big to wait for pool cleanup. The brigade itself is about four pointers;
> and the corresponding pool cleanup is another four pointers, so that's a
> total of around 32 bytes, not a lot. But suppose that a brigade is
> created for every 8k of filtered data,

Hmmm.

Are there applications like that in the wild?  I'd have thought that
sounds like the kind of case where you definitely want to re-use
one brigade, with a brigade_cleanup after each 8Kb pass_brigade.
To create and destroy half a million brigades, including of course
registering and removing their cleanups, seems well worth avoiding.
So long as the creator is responsible (and the brigade guaranteed
to be reusable), that is trivial for the programmer.

-- 
Nick Kew

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 25-Apr-05, at 12:05 PM, Joe Orton wrote:

> Fragile, why?  That's exactly the right approach as far as I can see.

Because the pool might have a lifetime *longer than* the bucket 
allocator. OK, that's not likely in the current architecture, so it 
would be a concern only for non-httpd applications. That was a 
different heated debate, I think :)

Re: RFC: Who owns a passed brigade?

Posted by Joe Orton <jo...@redhat.com>.

On Mon, Apr 25, 2005 at 10:44:39AM -0500, Rici Lake wrote:
> Allocating the brigade out of the bucket allocator but continuing to 
> register the cleanup with a pool (say, the request pool if that were 
> appropriate) might work but it would be fragile.

Fragile, why?  That's exactly the right approach as far as I can see. 
Leaving the pool cleanup exactly as-is ensures that the brigade
structures have *at most* a lifetime equivalent to today, but using the
bucket allocator means that you *can* shorten the lifetime by destroying
the brigade earlier, if you want.

> The heated discussion at 
> http://marc.theaimsgroup.com/?t=104039785500003&r=2&w=2 is probably 
> relevant.

Yup, and the retention of the pool cleanup is exactly why I don't
understand the argument that allocating brigade structures out of the
bucket allocator could cause memory leaks.  On the contrary, it allows
us to *fix* the unbounded memory usage which some filters suffer from
today.

joe

Re: RFC: Who owns a passed brigade?

Posted by Joe Orton <jo...@redhat.com>.

On Mon, Apr 25, 2005 at 11:15:02AM -0400, Cliff Woolley wrote:
> On Mon, 25 Apr 2005, Joe Orton wrote:
> 
> > 1. update the util_filter.h documentation
> > 2. add some APR_BUCKET_DEBUG code to abort() when brigades are used
> > after being "destroyed"
> > 3. adjust all filters to work with this model
> > 4. switch to allocate brigades out of the bucket allocator
> 
> Didn't we try #4 before and "something bad" happened?  I don't remember
> what that would have been, I just feel certain #4 has been previously
> discussed at least.

What I tried last year was doing #4 without doing #3 first, and that
indeed did not not work.

joe

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 25-Apr-05, at 8:29 AM, Joe Orton wrote:

> I think it's fine to leave the *contents* of the brigade as undefined
> after a call to ap_pass_brigade.  (that's essentially the case now 
> since
> it's not explicitly defined to be "empty": I don't know of any
> particular bugs which have been caused by this?)

Undefined contents make me nervous. It is very easy to define the
results: just do a cleanup in ap_pass_brigade. (This will also
protect against some resource leaks in failed error handling, for
example.)

I don't know of any particular bugs which have been caused by this
either, but it would be very easy to create one. Take a filter which
expects the brigade to be emptied and feed it into a filter which
does not empty the brigade. Then watch the output quadratically
expand.

I don't believe there are any such bugs in the distribution,
because the only output filter in the distribution that I can
find which does not empty its brigade is mod_case_filter.c
(see below). Furthermore, most of the interesting filters
in the distribution actually create a new brigade for every
invocation, which I think we can all agree is a bad idea.

Some filters, such as mod_deflate.c, do expect the brigade
to be cleaned. In this case, it is unlikely that a bug will
be noticed, because few third-party filters will get inserted
after mod_deflate.c.

If filters were modified to always use the same bucket brigade
for communication with downstream filters, you might start to
see this error with third-party filters which don't clean
their input brigade, unless every filter religiously calls
ap_brigade_cleanup after ap_pass_brigade. But if you were
going to do that, you might as well put the call to
ap_brigade_cleanup in ap_pass_brigade. The cost of an
unnecessary call is minimal, and it simplifies the api.

Note: I honestly believe mod_case_filter.c should be removed
from the distribution, unless someone is willing to rewrite
it as a proper model of how to write an output filter. It is not
useful as a module, and it's only use in the distribution is to
serve as an example for module authors, which it definitely is not.

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 25-Apr-05, at 10:15 AM, Cliff Woolley wrote:

>> 4. switch to allocate brigades out of the bucket allocator
>
> Didn't we try #4 before and "something bad" happened?  I don't remember
> what that would have been, I just feel certain #4 has been previously
> discussed at least.

I think the "something bad" would be the buildup of brigades in the 
connection pool, if some filter continues to allocate a new brigade for 
every invocation.

Also, if a brigade was not emptied and the bucket resources needed to 
be freed (eg. heap buckets), those would build up as well.

Allocating the brigade out of the bucket allocator but continuing to 
register the cleanup with a pool (say, the request pool if that were 
appropriate) might work but it would be fragile.

The heated discussion at 
http://marc.theaimsgroup.com/?t=104039785500003&r=2&w=2 is probably 
relevant.

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Mon, 25 Apr 2005, Joe Orton wrote:

> 1. update the util_filter.h documentation
> 2. add some APR_BUCKET_DEBUG code to abort() when brigades are used
> after being "destroyed"
> 3. adjust all filters to work with this model
> 4. switch to allocate brigades out of the bucket allocator

Didn't we try #4 before and "something bad" happened?  I don't remember
what that would have been, I just feel certain #4 has been previously
discussed at least.

> 5. profit (aka. closing all the memory leak bugs caused by this stuff
> in bugzilla)

Otherwise, +1.

Re: RFC: Who owns a passed brigade?

Posted by Joe Orton <jo...@redhat.com>.

On Fri, Apr 22, 2005 at 04:42:04PM -0500, Rici Lake wrote:
> 
> On 22-Apr-05, at 9:32 AM, Joe Orton wrote:
> 
> >The issue here is really "which party can *destroy* a brigade", right?
> 
> Or perhaps "which party *must* destroy a brigade." This is much less of 
> an issue if neither party creates a new brigade on every filter 
> invocation. But some do.

Yup, the "must" is the key, actually making this an interface
guarantee...

> >In the current code where brigades are never really destroyed the fact
> >that apr_brigade_cleanup() is called everywhere is not really harmful,
> >is it?
> 
> No, what would be harmful would be apr_brigade_cleanup() not being 
> called on a non-empty brigade.

I think it's fine to leave the *contents* of the brigade as undefined
after a call to ap_pass_brigade.  (that's essentially the case now since
it's not explicitly defined to be "empty": I don't know of any
particular bugs which have been caused by this?)

Any filter which passes a brigade can know that it *must* clear the
brigade before reusing it.  If the brigade is never re-used, it never
need be cleared before it's destroyed; no harm done.

So is there any disagreement at all about moving to the "creator must
destroy the brigade" model?  If not, I'd suggest moving forward like
this:

1. update the util_filter.h documentation
2. add some APR_BUCKET_DEBUG code to abort() when brigades are used 
after being "destroyed"
3. adjust all filters to work with this model
4. switch to allocate brigades out of the bucket allocator
5. profit (aka. closing all the memory leak bugs caused by this stuff
in bugzilla)

joe

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 22-Apr-05, at 9:32 AM, Joe Orton wrote:

> The issue here is really "which party can *destroy* a brigade", right?

Or perhaps "which party *must* destroy a brigade." This is much less of 
an issue if neither party creates a new brigade on every filter 
invocation. But some do.

> In the current code where brigades are never really destroyed the fact
> that apr_brigade_cleanup() is called everywhere is not really harmful,
> is it?

No, what would be harmful would be apr_brigade_cleanup() not being 
called on a non-empty brigade.

> It looked to me like it was going to much simpler to adapt filters
> shipped in httpd to match the creator-must-destroy model, than to match
> a consumer-must-destroy model, despite the ap_pass_brigade docstring.

That seems to be the case.

> It's certainly easier to audit for.

Sure, but it applies equally to buckets. It seems that the consensus is 
"creator-must-destroy-brigades; consumer-must-remove-buckets". That's 
not easy to audit for, which is why I think ap_pass_brigade should 
guarantee the latter condition.

> In creator-must-destroy it is still fine for a consumer to call
> apr_brigade_cleanup() on a brigade it is passed, of course; that's just
> equivalent to consuming all the buckets, after all.

Sure. Consequently, it's ok for ap_pass_brigade to do the call. Then a 
consumer is free to either consume the buckets or not.

Re: RFC: Who owns a passed brigade?

Posted by Joe Orton <jo...@redhat.com>.

On Thu, Apr 21, 2005 at 07:05:34PM -0500, Rici Lake wrote:
> 
> On 21-Apr-05, at 5:51 PM, Nick Kew wrote:
> 
> >Rici Lake wrote:
> >
> >>FWIW, I think the (apparent) practice, where the caller relinquishes
> >>ownership of the buckets but not the brigade itself, is more efficient
> >>since it avoids a lot of brigade construction and destruction.
> >
> >Agreed.  And it works for any situation, as either party can do a
> >cleanup to keep resource-use down, and a cleaned up brigade is not
> >too big to wait for pool cleanup.
> 
> "Either party can do a cleanup" does not give either party the 
> responsibility. If neither party did the cleanup, that would be a 
> problem. Lack of clear responsibility is always a problem, and not just 
> with computer programs.

The issue here is really "which party can *destroy* a brigade", right? 
In the current code where brigades are never really destroyed the fact
that apr_brigade_cleanup() is called everywhere is not really harmful,
is it?

It looked to me like it was going to much simpler to adapt filters
shipped in httpd to match the creator-must-destroy model, than to match
a consumer-must-destroy model, despite the ap_pass_brigade docstring. 
It's certainly easier to audit for.

In creator-must-destroy it is still fine for a consumer to call
apr_brigade_cleanup() on a brigade it is passed, of course; that's just
equivalent to consuming all the buckets, after all.

joe

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 21-Apr-05, at 5:51 PM, Nick Kew wrote:

> Rici Lake wrote:
>
>> FWIW, I think the (apparent) practice, where the caller relinquishes
>> ownership of the buckets but not the brigade itself, is more efficient
>> since it avoids a lot of brigade construction and destruction.
>
> Agreed.  And it works for any situation, as either party can do a
> cleanup to keep resource-use down, and a cleaned up brigade is not
> too big to wait for pool cleanup.

"Either party can do a cleanup" does not give either party the 
responsibility. If neither party did the cleanup, that would be a 
problem. Lack of clear responsibility is always a problem, and not just 
with computer programs.

I favour having ap_pass_brigade do it. If the brigade must be cleaned 
before ap_pass_brigade returns, then it might as well be done by 
ap_pass_brigade.

However, I think it is a fallacy that a cleaned-up brigade is not too 
big to wait for pool cleanup. The brigade itself is about four 
pointers; and the corresponding pool cleanup is another four pointers, 
so that's a total of around 32 bytes, not a lot. But suppose that a 
brigade is created for every 8k of filtered data, and you run 4 GB of 
data through that without destroying any brigades. That's half a 
million brigades and half a million registered cleanups (linearly 
searched on kill, so you better hope the cleanups are destroyed in the 
right order).

See <http://issues.apache.org/bugzilla/show_bug.cgi?id=23567> and 
<http://issues.apache.org/bugzilla/show_bug.cgi?id=33382>; Paul 
Querna's patch to fix the latter bug clearly shows how you can drown in 
empty brigades (or rather, how to avoid drowning in them).

> Logically it makes a lot of sense
> to run brigade_cleanup once per pass_brigade, and it's harmless if
> the brigade was already cleaned.
>
>> But it's
>> a slightly odd convention.
>>
>> I don't think it makes that much difference, provided that there is a
>> clear statement and that everyone follows it.
>
> Creator-is-responsible is simple logic.  Other options are more complex
> and bug-prone, as we then have issues like whether to destroy on
> every call or only on EOS.  Or maybe on FLUSH ....
> Also, creator-is-responsible is a logic that can apply symmetrically
> to both input and output filters without mindbending.

I didn't mean to say that "creator-is-responsible" is odd. What I meant 
was that it is odd that ownership of the buckets gets transfered but 
not of the brigades. Creator-is-responsible is fine. Transferred 
ownership is fine. Part of one and part the other is odd. Not 
unworkable -- just odd. And thus in need of clear documentation.

> Dropping destroy from the API altogether and leaving that to the pool
> would also work, AFAICS.

-1, too fragile. See above bugzillas.

Re: RFC: Who owns a passed brigade?

Posted by Nick Kew <ni...@webthing.com>.

Rici Lake wrote:

> FWIW, I think the (apparent) practice, where the caller relinquishes
> ownership of the buckets but not the brigade itself, is more efficient
> since it avoids a lot of brigade construction and destruction.

Agreed.  And it works for any situation, as either party can do a
cleanup to keep resource-use down, and a cleaned up brigade is not
too big to wait for pool cleanup.  Logically it makes a lot of sense
to run brigade_cleanup once per pass_brigade, and it's harmless if
the brigade was already cleaned.

>	 But it's
> a slightly odd convention.
> 
> I don't think it makes that much difference, provided that there is a
> clear statement and that everyone follows it.

Creator-is-responsible is simple logic.  Other options are more complex
and bug-prone, as we then have issues like whether to destroy on
every call or only on EOS.  Or maybe on FLUSH ....
Also, creator-is-responsible is a logic that can apply symmetrically
to both input and output filters without mindbending.

Dropping destroy from the API altogether and leaving that to the pool
would also work, AFAICS.

-- 
Nick Kew

Re: RFC: Who owns a passed brigade?

Posted by Rici Lake <ri...@ricilake.net>.

On 21-Apr-05, at 3:58 PM, Cliff Woolley wrote:

> FWIW, the documentation says:
>
> "The caller relinquishes ownership of the brigade."
>
> This obviously differs from what some of our own filters are doing -- 
> and
> from my memory of past history.  It makes sense that it should be this
> way, though I think at some point in the past we just guaranteed that 
> the
> core_output_filter always emptied the brigades it was passed so it
> wouldn't have mattered until other filters that didn't behave that way
> started getting inserted in front of c_o_f.
>
> I'll keep digging.

I had seen that note in the documentation; in fact, I wrote several 
filters on the basis of that statement but they misbehaved in odd ways 
which is why I started to dig into the question. It is clearly at 
variance from practice.

FWIW, I think the (apparent) practice, where the caller relinquishes 
ownership of the buckets but not the brigade itself, is more efficient 
since it avoids a lot of brigade construction and destruction. But it's 
a slightly odd convention.

I don't think it makes that much difference, provided that there is a 
clear statement and that everyone follows it.

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Cliff Woolley wrote:

> "The caller relinquishes ownership of the brigade."


So that documentation came about because of this thread:

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106952637722748&w=2

I'm not sure that the thread reflected reality, though.  Perhaps it
*should*, but it doesn't.  Jeff?

--Cliff

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@perl.apache.org
For additional commands, e-mail: dev-help@perl.apache.org

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Cliff Woolley wrote:

> "The caller relinquishes ownership of the brigade."


So that documentation came about because of this thread:

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106952637722748&w=2

I'm not sure that the thread reflected reality, though.  Perhaps it
*should*, but it doesn't.  Jeff?

--Cliff

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Paul Querna wrote:

> I agree that not having a clear rule has led to some possible leaks in
> many filters.  If some people think there has always been a 'rule', I
> contend that it has never been documented.

FWIW, the documentation says:

--------------------------------------------------------------------
/**
 * Pass the current bucket brigade down to the next filter on the filter
 * stack.  The filter returns an apr_status_t value.  If the bottom-most
 * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
 * The caller relinquishes ownership of the brigade.
 * @param filter The next filter in the chain
 * @param bucket The current bucket brigade
 */
AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *filter,
                                         apr_bucket_brigade *bucket);
--------------------------------------------------------------------

"The caller relinquishes ownership of the brigade."

This obviously differs from what some of our own filters are doing -- and
from my memory of past history.  It makes sense that it should be this
way, though I think at some point in the past we just guaranteed that the
core_output_filter always emptied the brigades it was passed so it
wouldn't have mattered until other filters that didn't behave that way
started getting inserted in front of c_o_f.

I'll keep digging.

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Paul Querna wrote:

> I agree that not having a clear rule has led to some possible leaks in
> many filters.  If some people think there has always been a 'rule', I
> contend that it has never been documented.

FWIW, the documentation says:

--------------------------------------------------------------------
/**
 * Pass the current bucket brigade down to the next filter on the filter
 * stack.  The filter returns an apr_status_t value.  If the bottom-most
 * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
 * The caller relinquishes ownership of the brigade.
 * @param filter The next filter in the chain
 * @param bucket The current bucket brigade
 */
AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *filter,
                                         apr_bucket_brigade *bucket);
--------------------------------------------------------------------

"The caller relinquishes ownership of the brigade."

This obviously differs from what some of our own filters are doing -- and
from my memory of past history.  It makes sense that it should be this
way, though I think at some point in the past we just guaranteed that the
core_output_filter always emptied the brigades it was passed so it
wouldn't have mattered until other filters that didn't behave that way
started getting inserted in front of c_o_f.

I'll keep digging.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@perl.apache.org
For additional commands, e-mail: dev-help@perl.apache.org

Re: RFC: Who owns a passed brigade?

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 21 Apr 2005, Paul Querna wrote:

> I agree that not having a clear rule has led to some possible leaks in
> many filters.  If some people think there has always been a 'rule', I
> contend that it has never been documented.

That may be.  I *believe* the rule was supposed to be that once you handed
it down, you could assume that it would come back empty.  This was
discussed at length on either httpd-dev or apr-dev back in the day.  I'll
have to dig through the archives to find it, though.

Re: RFC: Who owns a passed brigade?

Posted by Paul Querna <ch...@force-elite.com>.

Rici Lake wrote:
> After a filter calls ap_pass_brigade(f->next, bb), who owns bb?
..snip..
> I note that apr_brigade_split() seems to be oriented towards a model
> where ownership of the brigade is passed, since it creates a new
> brigade. For the model where brigade ownership is retained, it would be
> better to have a function which shifted buckets from one brigade to
> another one.

Yes, it has been on my list to get a brigade 'move' or 'shift' or some
other name into APR-Util.  I believe code that does this might already
be duplicated in several places. (server/core_filters.c:115 for example).

..snip..
> There are a number of possible bugs resulting from a lack of clarity on
> brigade ownership. For example:

I agree that not having a clear rule has led to some possible leaks in
many filters.  If some people think there has always been a 'rule', I
contend that it has never been documented.

..snip..
> My vote would be for a clear statement that ownership of brigades is
> retained and ownership of buckets is transferred, along with a
> documented call to ap_brigade_cleanup() in ap_pass_brigade to precisely
> define the expected state of the brigade on return.

+1.

-Paul