Posted to dev@apr.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/02/27 05:57:35 UTC

Bucket API cleanup issues

Okay, I've got a series of pretty major bucket API cleanups I'd like to do
(most of which are mostly transparent outside apr-util, BTW).  Before I
go off and commit, though, this was major enough to warrant seeking group
consent.  Here are the issues I'm addressing, in roughly descending order
of importance:

1) The apr_bucket_shared and apr_bucket_simple structs are completely
useless, and a needless time/resource drain.  If we just add an apr_off_t
start into the apr_bucket struct, apr_bucket_shared and apr_bucket_simple
can go away completely, saving massive amounts of work in all of the
bucket code and making the code vastly simpler and easier to
read/understand.  That simplicity can easily be seen in that
apr_bucket_shared_split is reduced to simple_split plus a refcount++ (same
for shared_copy).  I've renamed simple_split and simple_copy giving them
the apr_bucket_ prefix and making them public functions called by their
shared counterparts.  I have this patched, tested, and ready to go.  If
anyone wants to see the patch, tell me and I'll post it.
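
Just to illustrate the shape of it, the shared split ends up being little
more than this (a sketch, not the patch verbatim; it assumes the refcount
struct sits at the front of the bucket's private data, as it does today):

    apr_status_t apr_bucket_shared_split(apr_bucket *a, apr_size_t point)
    {
        apr_bucket_refcount *r = a->data;  /* refcount is the first member */
        apr_status_t rv;

        if ((rv = apr_bucket_simple_split(a, point)) != APR_SUCCESS) {
            return rv;
        }
        r->refcount++;    /* the new sibling shares the same resource */
        return APR_SUCCESS;
    }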

2) The parameter 'w' (returned length) to apr_bucket_heap_make() and
apr_bucket_heap_create() is useless.  The returned length is always equal
to the passed-in length, assuming malloc didn't fail, and malloc failure is
easy to detect from the function's return value.  There's no way to get
*part* of the data made into a heap bucket... malloc just doesn't do
"partial" allocations.  You've either got enough memory or you don't.
The callers that currently use the parameter generally ignore its value
anyway, seeming to think they have to use it just because it's there.
This patch is done for apr-util but not for Apache.
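
For reference, the prototypes just lose the trailing out-parameter;
something like this (sketch; the exact declarations in apr_buckets.h may
differ slightly):

    /* before: 'w' always comes back equal to length anyway */
    apr_bucket *apr_bucket_heap_create(const char *buf, apr_size_t length,
                                       int copy, apr_size_t *w);

    /* after: the redundant out-parameter is gone */
    apr_bucket *apr_bucket_heap_create(const char *buf, apr_size_t length,
                                       int copy);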

3) pool_bucket_cleanup() is completely bogus AFAICT.  I've added this
comment to the code, which describes the problems pretty well:
    /*
     * XXX: this is fubar... only the *first* apr_bucket to reference
     * the bucket is changed to a heap bucket... ALL references must
     * be changed.  So instead of h->b, we need an array of b's in h.
     * pool_destroy() should remove that bucket from the array.  But
     * pool_destroy doesn't know where the apr_bucket itself is.
     * ARRGGGH.  Only solution: apr_bucket_destroy() should let the
     * e->type->destroy() function destroy the apr_bucket itself, too.
     * pool_destroy() can remove destroyed apr_bucket referrers from
     * the array.  Any referrers that remain when this function is
     * called get converted to heap buckets by adding a loop over the
     * array here (only the first call to apr_bucket_heap_make() gets
     * the copy flag set, of course).  Another gotcha, though this one
     * is easily worked around: apr_bucket_heap_make() calls
     * apr_bucket_shared_make() which invalidates the reference count
     * on the refcounted bucket. --jcw
     */

4) The same problem applies to file buckets that have been split/copied
when APR_HAS_MMAP: when one of them gets read it gets made into an MMAP.
ALL of the file buckets should be converted at the same time to reference
the same MMAP.

5) apr_bucket_destroy() should allow the type-specific destructor to free
the bucket itself, too (needed to solve problem #3).

6) Should mmap_destroy() call apr_mmap_delete(m->mmap) when the last
reference to the mmap bucket is destroyed?

7) socket_read() and pipe_read() should return APR_SUCCESS even in an EOF
condition, as discussed on new-httpd a week or so ago.  I haven't made
this change yet, though I have put an XXX comment in the code about it,
because it requires fixing the input filters in Apache to figure out EOS on
their own.  Looking for suggestions on how to do that.

8) Can I make apr_bucket_shared_destroy() just return a boolean value
indicating whether or not this bucket is the last reference to the shared
resource?  It's silly to return a pointer to that resource, since the
caller must already know where the resource is anyway.
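
In other words, something like this (sketch, assuming the refcount struct
stays the first member of the shared private data):

    /* returns non-zero iff this was the last reference, i.e. the caller
     * should now clean up the shared resource it already points to */
    int apr_bucket_shared_destroy(void *data)
    {
        apr_bucket_refcount *r = data;
        return (--r->refcount == 0);
    }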

9) file_read in the non-mmap case was allocating HUGE_STRING_LEN even if
the file was smaller than that. Fixed and thrown in with patch #1.

10) socket_read and pipe_read should modify the heap bucket they create to
have alloc_len == HUGE_STRING_LEN to reflect the actual size of buf. Fixed
and thrown in with patch #1.
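
That fix is basically a one-liner right after the heap bucket is created
(sketch; 'h' is the heap bucket's private struct):

    h = b->data;
    h->alloc_len = HUGE_STRING_LEN;   /* buf really is this big, regardless
                                       * of how many bytes were read */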


PHEW!  I think that's it for now.  Comments, please?

--Cliff


Re: Bucket API cleanup issues

Posted by Cliff Woolley <cl...@yahoo.com>.
On Wed, 28 Feb 2001, Greg Stein wrote:

> 1) it is hard to tell that "p" and "h" refer to the same bucket.
>    "p->heap.foo" is more obvious than "h->foo".

Fine by me.

> 2) Nit: I'm getting tired of these single letters for the bucket stuff. "a"
>    as a variable for a bucket? "e"? Blarg. Using "h" and "p" falls right into
>    that camp; especially p, given its use as a pool or char pointer in so
>    many other contexts. I might suggest "bh" and "bp".

We've more or less standardized on one letter meaning a bucket, though...
fixing this would be a massive change.  For now, I think we need to stick
with one letter for consistency.  I'd be just as happy if that letter were
'b' (I don't know where a and e came from, but they're definitely there),
or at least a letter that's mnemonic for the type of bucket it is (as I've
done here).

> 3) You have two "base" members. That will be confusing as hell over the long
>    haul. I'd recommend using just heap.base. Guess how to detect when a pool
>    bucket is no longer in a pool? Right in one! Just set pool to NULL. And
>    *please* put some comments to this effect in the apr_bucket_pool
>    structure declaration(!)

Done.

> 4) And no... I don't think we need to check malloc() results. If NULL is
>    returned, then we'll segfault right away. That's fine in my book since
>    we'd otherwise just call abort(). (if you really want to check, then call
>    abort() on error or a pool's abort function)

Yeah... I just put that comment in because there are comments just like it
throughout the buckets code where there's a possibility of failure that
we're not checking for, in case we decide at some point down the road to
go back and put in checks for them.  <shrug>

> That's it for now. I'll re-review when you check it in.

Thanks.

--Cliff


Re: Bucket API cleanup issues

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Feb 28, 2001 at 03:03:08PM -0500, Cliff Woolley wrote:
>...
> Basically, instead of having the pool bucket contain a pointer to a heap
> bucket and having to fool with manipulating reference counts and keeping
> track of when to destroy the old apr_bucket_pool struct, the
> apr_bucket_pool *contains* the apr_bucket_heap struct that it will become
> as its first element.

Good call!

Some comments on the patch:

1) it is hard to tell that "p" and "h" refer to the same bucket.
   "p->heap.foo" is more obvious than "h->foo".

2) Nit: I'm getting tired of these single letters for the bucket stuff. "a"
   as a variable for a bucket? "e"? Blarg. Using "h" and "p" falls right into
   that camp; especially p, given its use as a pool or char pointer in so
   many other contexts. I might suggest "bh" and "bp".

3) You have two "base" members. That will be confusing as hell over the long
   haul. I'd recommend using just heap.base. Guess how to detect when a pool
   bucket is no longer in a pool? Right in one! Just set pool to NULL. And
   *please* put some comments to this effect in the apr_bucket_pool
   structure declaration(!)

4) And no... I don't think we need to check malloc() results. If NULL is
   returned, then we'll segfault right away. That's fine in my book since
   we'd otherwise just call abort(). (if you really want to check, then call
   abort() on error or a pool's abort function)
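
   (i.e., if we ever did add a check, it would be at most something like

       char *buf = malloc(length);   /* 'length' being whatever we need */
       if (buf == NULL) {
           abort();   /* or the pool's abort function, if one is set */
       }

   and nothing that tries to propagate an error out of the bucket code.)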

That's it for now. I'll re-review when you check it in.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Bucket API cleanup issues

Posted by Cliff Woolley <cl...@yahoo.com>.
On Tue, 27 Feb 2001, Greg Stein wrote:

> Euh... I don't think we want another ring.
>
> A simpler idea is to have the apr_bucket_pool structure contain a pointer to
> an apr_bucket_heap structure. At cleanup time, you do the following:

Ahh, (he says as the little light over his head comes on).  This makes
sense.  I like.

> > > > 6) Should mmap_destroy() call apr_mmap_delete(m->mmap) when the last
> > > > reference to the mmap bucket is destroyed?
>
> A better approach is to have the cache insert the MMAP bucket with a
> refcount starting at 2 (one for the cache, one for the bucket). When the
> last bucket goes away, you'll still have a refcount of 1 and the MMAP will
> remain alive.

Good idea.

> Reading from the input filter **stack** is different. Let's say that case
> (1) occurs and you've "read" the zero bytes out of the brigade. The brigade
> is now empty, so you do an ap_get_brigade() for more. This drops all the way
> down to the core_input_filter. It says, "sorry, I gave you everything I had"
> and returns APR_EOF. *NOW* is where the EOS is detected.

I confused the issue by even mentioning the filters.  All I care about
right now is what should be returned by the bucket read functions, which
we've determined is *never* APR_EOF.  If the filters want to return
APR_EOF, then they're welcome to, but the buckets themselves don't.

> That should do the trick. Save this MsgID somewhere :-)

;-]  I figured it was more expedient to do a little research and quote
what was said than to do the "well, I had this impression," "no, I had
another impression" back-and-forth routine.  hehe


Thanks,
Cliff


Re: Bucket API cleanup issues

Posted by Cliff Woolley <cl...@yahoo.com>.
On Tue, 27 Feb 2001, Greg Stein wrote:

> Euh... I don't think we want another ring.
>
> A simpler idea is to have the apr_bucket_pool structure contain a pointer to
> an apr_bucket_heap structure. At cleanup time, you do the following:
...
> So what you get here is a single allocation of a heap bucket. Then, you
> lazily repoint all other buckets to the new heap bucket.

I came up with a really cool idea that's an even greater simplification of
this.  It's coded up and ready to go, but I figured I'd give people a
chance to comment before I commit.

Basically, instead of having the pool bucket contain a pointer to a heap
bucket and having to fool with manipulating reference counts and keeping
track of when to destroy the old apr_bucket_pool struct, the
apr_bucket_pool *contains* the apr_bucket_heap struct that it will become
as its first element.

Since the apr_bucket_heap has an apr_bucket_refcount as ITS first element,
the pool and heap buckets share an apr_bucket_refcount (i.e., no more
twiddling with reference counts).
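
In struct terms the layout is roughly this (member names are illustrative;
see the attached patch for the real thing):

    struct apr_bucket_heap {
        apr_bucket_refcount refcount;   /* must stay the first member     */
        char *base;                     /* malloc()ed copy of the data    */
        apr_size_t alloc_len;           /* how much was malloc()ed        */
    };

    struct apr_bucket_pool {
        apr_bucket_heap heap;    /* first member: the heap bucket this
                                  * morphs into at pool-cleanup time      */
        const char *base;        /* the data, while the pool is alive     */
        apr_pool_t *pool;        /* the pool the data lives in            */
    };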

All you have to do in the cleanup routine is malloc/memcpy out of the pool
and onto the heap and set p->data to NULL, which is the flag to later
pool_read() calls that they should morph their buckets.  All that entails
is changing b->type to &apr_bucket_type_heap... even the data pointer is
already correct because the apr_bucket_heap is the first element of
apr_bucket_pool.

After the morph, heap_read() and heap_destroy() take right over where
pool_read() and pool_destroy() left off.  When the last reference is
destroyed, heap_destroy() frees b->data.  As far as it knows,
b->data is just an apr_bucket_heap, but it unwittingly cleans up the
entire apr_bucket_pool struct for us since the b->data pointer never
changed.
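
So the cleanup and the morph boil down to something like this (hand-waved
sketch with illustrative field names, not the patch itself; error handling
omitted):

    static apr_status_t pool_bucket_cleanup(void *data)
    {
        apr_bucket_pool *p = data;

        /* copy the data out of the dying pool and onto the heap */
        p->heap.base = malloc(p->heap.alloc_len);
        memcpy(p->heap.base, p->base, p->heap.alloc_len);
        p->base = NULL;    /* flag: pool data is gone, morph on next read */
        return APR_SUCCESS;
    }

    /* and in pool_read(), before touching the data: */
    if (p->base == NULL) {
        b->type = &apr_bucket_type_heap;   /* b->data is already correct */
        return apr_bucket_read(b, str, len, block);
    }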

=-)

Patch attached for comments.

--Cliff

Re: Bucket API cleanup issues

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Feb 27, 2001 at 09:54:25AM -0500, Cliff Woolley wrote:
> On Mon, 26 Feb 2001 rbb@covalent.net wrote:
> > > 3) pool_bucket_cleanup() is completely bogus AFAICT.  I've added this
> > > comment to the code, which describes the problems pretty well:
> > > 4) The same problem applies to file buckets that have been split/copied
> > > when APR_HAS_MMAP: when one of them gets read it gets made into an MMAP.
> > > ALL of the file buckets should be converted at the same time to reference
> > > the same MMAP.
> >
> > I disagree about how to fix this.  The simple solution is to just abstract
> > one level, such as:
> >
> >   bucket   ->     bucket   ->   bucket
> >     |               |             |
> >   shared          shared        shared
> >     |               |             |
> >      -----------------------------
> >                     |
> >                 pool_bucket
> 
> So is what you're saying that struct apr_bucket should be a member of TWO
> rings... one brigade as usual plus one ring ("brigade") of sibling
> buckets that point to the same resource?  I'd thought of that, but didn't
> think anyone would buy it.  I'm fine with the idea.  But I *must* be
> missing something... how does that keep us from traversing a list of
> buckets when we convert types?  (PS: Remember that the "shared" level is
> going away.)  If I've missed your point, please elaborate.

Euh... I don't think we want another ring.

A simpler idea is to have the apr_bucket_pool structure contain a pointer to
an apr_bucket_heap structure. At cleanup time, you do the following:

*) if h->heap_ptr is non-NULL, then use it for your new s->data field.
   do appropriate incref/decref for the two shared structures. toss the
   private pool stuff when refcount==0

*) if h->heap_ptr is NULL, then create a new apr_bucket_heap from the pool
   bucket. leave a ptr to this in the pool bucket. change self to point to
   the new heap and do the right incref/decref. zero-out the ptrs in the
   pool structure (they'll be invalid momentarily; this is for safety in
   case somebody tries to access them)

*) in the read() function (and others), if h->heap_ptr if non-NULL, then
   rebuild the bucket around the (shared) apr_bucket_heap structure. adjust
   incref/decref as appropriate.

So what you get here is a single allocation of a heap bucket. Then, you
lazily repoint all other buckets to the new heap bucket.

This does imply that the pool bucket structures must be malloc'd rather than
come from the pool (because the structs must live until the last reference,
even though the pool may disappear sooner).

Strictly speaking, you'll never get a read where h->heap_ptr is non-NULL.
When the pool is cleaned up, the first bucket hit will construct the heap
bucket. All the rest participating in the cleanup will get repointed. By the
end of the cleanup, nobody will be referring to the old pool substructure,
and it will go away.

But logically, it is better to make allowances for h->heap_ptr in case some
fool wants to read() during a cleanup.

>...
> > > 6) Should mmap_destroy() call apr_mmap_delete(m->mmap) when the last
> > > reference to the mmap bucket is destroyed?
> >
> > No.  If you do that, then caching MMAP's becomes impossible.  The
> > apr_mmap_delete should be left up to the pool cleanup.
> 
> Okay, no problem.  I'd just noticed it when I was going through the code
> and thought I'd bring it up, but I wasn't married to the idea.  I figured
> it was left out for a reason but didn't know what it was.  Scratch #6.

A better approach is to have the cache insert the MMAP bucket with a
refcount starting at 2 (one for the cache, one for the bucket). When the
last bucket goes away, you'll still have a refcount of 1 and the MMAP will
remain alive.

When we auto-convert a FILE to an MMAP, it will start with refcount==1. This
allows the MMAP to be tossed when the bucket goes away. (pool cleanup could
be a *long* ways away)

This gives us immediate cleanup when possible, and deferred cleanup when
needed [by a cache].
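
In code, the cache side of that is little more than an extra increment when
the bucket is handed to the cache (sketch; assumes the current mmap create
signature and that the refcount sits at the front of the mmap bucket's
private data):

    apr_bucket *b = apr_bucket_mmap_create(mm, 0, mm->size);
    apr_bucket_refcount *r = b->data;
    r->refcount++;   /* the cache's own reference, on top of the bucket's */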

> > > 7) socket_read() and pipe_read() should return APR_SUCCESS even in an EOF
> > > condition, as discussed on new-httpd a week or so ago.  I haven't made
> > > this change yet, though I have put an XXX comment in the code about it,
> > > because it requires fixing the input filters in Apache to figure out EOS on
> > > their own.  Looking for suggestions on how to do that.

If you do a blocking read on a **SOCKET bucket** for one byte, then you'll
end up with one of three situations:

1) EOF was hit (internally) and the socket is turned into a zero-length HEAP
   bucket.
   
   Apache doesn't do anything special to recognize this situation.

2) Timeout was reached. APR_TIMEUP is returned by the read operation. This
   is an actual error situation, so Apache handles it as appropriate.

3) At least one byte is returned in a HEAP bucket.

Apache will determine when to insert the EOS based on when the protocol says
the request has ended. This is based on chunking or Content-Length or
end-of-headers. The client cannot close the connection to denote the end
marker.

Note that situation (1) should only occur in one of two cases:

a) the client aborted the connection
b) attempting to read a second (or Nth) request, and the client closed it


Okay. The above is for reading an individual bucket. That is, the read()
function.

Reading from the input filter **stack** is different. Let's say that case
(1) occurs and you've "read" the zero bytes out of the brigade. The brigade
is now empty, so you do an ap_get_brigade() for more. This drops all the way
down to the core_input_filter. It says, "sorry, I gave you everything I had"
and returns APR_EOF. *NOW* is where the EOS is detected.

core_input_filter can do one of two things:

 i) insert an EOS right after the SOCKET bucket at setup time. continue to
    return APR_EOF if the SOCKET bucket (and EOS bucket) is consumed and
    core_input_filter is called for more.

ii) don't return APR_EOF, but return EOS instead

I prefer the former. The upper layers cannot distinguish "end of this
request" from "there is nothing more" in the second case. They'll go for
more and keep getting EOS. But that will be a 400 (Bad Request) because it
looks empty.
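
Option (i) is basically a two-line change wherever the core sets up the
connection's input brigade (sketch; 'bb' and 'csd' stand in for whatever the
real setup code calls them):

    apr_bucket *sock = apr_bucket_socket_create(csd);   /* client socket */
    apr_bucket *eos  = apr_bucket_eos_create();

    APR_BRIGADE_INSERT_TAIL(bb, sock);
    APR_BRIGADE_INSERT_TAIL(bb, eos);    /* EOS queued right behind it */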

In case (2) above, Apache tears down the connection, so EOS is never an
issue.

In case (3) above, you have data, so EOS is not an issue.

And when you're successfully fetching data, the EOS is inserted by the upper
level filters (HTTP_IN? DECHUNK? dunno which it is) when it determines the
logical request has ended (despite what the lower level filters may say
about more data being present).


The big gimmick is to not confuse return values from read() and those of
ap_get_brigade(). They have different semantics.


That should do the trick. Save this MsgID somewhere :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Bucket API cleanup issues

Posted by rb...@covalent.net.
On Tue, 27 Feb 2001, Cliff Woolley wrote:

> On Mon, 26 Feb 2001 rbb@covalent.net wrote:
>
> > > 3) pool_bucket_cleanup() is completely bogus AFAICT.  I've added this
> > > comment to the code, which describes the problems pretty well:
> > > 4) The same problem applies to file buckets that have been split/copied
> > > when APR_HAS_MMAP: when one of them gets read it gets made into an MMAP.
> > > ALL of the file buckets should be converted at the same time to reference
> > > the same MMAP.
> >
> > I disagree about how to fix this.  The simple solution is to just abstract
> > one level, such as:
> >
> >   bucket   ->     bucket   ->   bucket
> >     |               |             |
> >   shared          shared        shared
> >     |               |             |
> >      -----------------------------
> >                     |
> >                 pool_bucket
>
> So is what you're saying that struct apr_bucket should be a member of TWO
> rings... one brigade as usual plus one ring ("brigade") of sibling
> buckets that point to the same resource?  I'd thought of that, but didn't
> think anyone would buy it.  I'm fine with the idea.  But I *must* be
> missing something... how does that keep us from traversing a list of
> buckets when we convert types?  (PS: Remember that the "shared" level is
> going away.)  If I've missed your point, please elaborate.

No, we don't want two rings.  It is just one level of abstraction.
Basically, when creating a pool bucket, you have one instance of the data,
and three small layers that sit on top of it.  Now, when you want to
change from a pool bucket to a heap bucket, you just have to change the
bottom layer.  To change the type pointer, make the type in the top-most
bucket point to a pointer in the bottom-most bucket.  That should solve
the problem rather cleanly.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Bucket API cleanup issues

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 26 Feb 2001 rbb@covalent.net wrote:

> > 3) pool_bucket_cleanup() is completely bogus AFAICT.  I've added this
> > comment to the code, which describes the problems pretty well:
> > 4) The same problem applies to file buckets that have been split/copied
> > when APR_HAS_MMAP: when one of them gets read it gets made into an MMAP.
> > ALL of the file buckets should be converted at the same time to reference
> > the same MMAP.
>
> I disagree about how to fix this.  The simple solution is to just abstract
> one level, such as:
>
>   bucket   ->     bucket   ->   bucket
>     |               |             |
>   shared          shared        shared
>     |               |             |
>      -----------------------------
>                     |
>                 pool_bucket

So is what you're saying that struct apr_bucket should be a member of TWO
rings... one brigade as usual plus one ring ("brigade") of sibling
buckets that point to the same resource?  I'd thought of that, but didn't
think anyone would buy it.  I'm fine with the idea.  But I *must* be
missing something... how does that keep us from traversing a list of
buckets when we convert types?  (PS: Remember that the "shared" level is
going away.)  If I've missed your point, please elaborate.

> > 5) apr_bucket_destroy() should allow the type-specific destructor to free
> > the bucket itself, too (needed to solve problem #3).
>
> No.  This was done on purpose so that we can cache the buckets themselves.
> If the type-specific destructor frees the bucket itself, then we can't
> cache the freed buckets.  It also makes it very difficult to convert
> buckets from one type to the next.

DOH!!  Yep, you're exactly right... that won't work.  Scratch #5.

> > 6) Should mmap_destroy() call apr_mmap_delete(m->mmap) when the last
> > reference to the mmap bucket is destroyed?
>
> No.  If you do that, then caching MMAP's becomes impossible.  The
> apr_mmap_delete should be left up to the pool cleanup.

Okay, no problem.  I'd just noticed it when I was going through the code
and thought I'd bring it up, but I wasn't married to the idea.  I figured
it was left out for a reason but didn't know what it was.  Scratch #6.

> > 7) socket_read() and pipe_read() should return APR_SUCCESS even in an EOF
> > condition, as discussed on new-httpd a week or so ago.  I haven't made
> > this change yet, though I have put an XXX comment in the code about it,
> > because it requires fixing the input filters in Apache to figure out EOS on
> > their own.  Looking for suggestions on how to do that.
>
> Didn't we decide that buckets should return EOF?  I could be wrong, but I
> was under the impression that buckets needed to return EOF in order for
> things to work.

No, that's not what we decided.  See message
<3A...@algroup.co.uk> on new-httpd from Ben, which says:
----------------------------------------------------------------------
  bl>>> And I'm, once more, completely confused about what should be
  bl>>> happening when the socket hits EOF in blocking and non-blocking
  bl>>> cases!

  rbb>> IMHO it is really simple.  When a socket hits EOF, it should
  rbb>> simply remove the socket from the brigade.  Period.  That's it.
  rbb>> The socket_read function should never return EOF.  There is no
  rbb>> such thing as EOF to a filter.  Filters only know about EOS.

  bl> Cool.  Again, this makes perfect sense to me.  It does,
  bl> incidentally, have to transmute to an empty bucket, so as not to
  bl> invalidate the pointer held at a higher level...
----------------------------------------------------------------------
Other messages in the thread from both Greg and myself support
this conclusion, such as this one from Greg, <20...@lyra.org>:
----------------------------------------------------------------------
  bl>> ... So, the fault is that APR_EOF should _never_ be returned. ...

  gs> Exactly what I said. Never return EOF. Return zero-length buckets.
  gs> Don't requeue if (internally) you hit an EOF.
----------------------------------------------------------------------
Those were the last words on the matter before the subject of the thread
diverged.
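
For the record, that behavior amounts to something like this in
socket_read()'s EOF branch (sketch only; 'buf' and 'bytes_read' are the
usual locals, and the heap_make call assumes the 'w'-less form from item 2):

    if (bytes_read == 0) {
        /* internal EOF: transmute into a zero-length heap bucket so the
         * caller's bucket pointer stays valid, and return APR_SUCCESS,
         * never APR_EOF */
        apr_bucket_heap_make(b, buf, 0, 0);
        *str = buf;
        *len = 0;
        return APR_SUCCESS;
    }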


> Each of these should be a separate patch, and preferably with a few hours
> in between each patch.

They are.  =-)

--Cliff



Re: Bucket API cleanup issues

Posted by rb...@covalent.net.
> 1) The apr_bucket_shared and apr_bucket_simple structs are completely
> useless, and a needless time/resource drain.  If we just add an apr_off_t
> start into the apr_bucket struct, apr_bucket_shared and apr_bucket_simple
> can go away completely, saving massive amounts of work in all of the
> bucket code and making the code vastly simpler and easier to
> read/understand.  That simplicity can easily be seen in that
> apr_bucket_shared_split is reduced to simple_split plus a refcount++ (same
> for shared_copy).  I've renamed simple_split and simple_copy giving them
> the apr_bucket_ prefix and making them public functions called by their
> shared counterparts.  I have this patched, tested, and ready to go.  If
> anyone wants to see the patch, tell me and I'll post it.

+1

> 2) The parameter 'w' (returned length) to apr_bucket_heap_make() and
> apr_bucket_heap_create() is useless.  The returned length is always equal
> to the passed-in length, assuming malloc didn't fail, and malloc failure is
> easy to detect from the function's return value.  There's no way to get
> *part* of the data made into a heap bucket... malloc just doesn't do
> "partial" allocations.  You've either got enough memory or you don't.
> The callers that currently use the parameter generally ignore its value
> anyway, seeming to think they have to use it just because it's there.
> This patch is done for apr-util but not for Apache.

+1

> 3) pool_bucket_cleanup() is completely bogus AFAICT.  I've added this
> comment to the code, which describes the problems pretty well:
>     /*
>      * XXX: this is fubar... only the *first* apr_bucket to reference
>      * the bucket is changed to a heap bucket... ALL references must
>      * be changed.  So instead of h->b, we need an array of b's in h.
>      * pool_destroy() should remove that bucket from the array.  But
>      * pool_destroy doesn't know where the apr_bucket itself is.
>      * ARRGGGH.  Only solution: apr_bucket_destroy() should let the
>      * e->type->destroy() function destroy the apr_bucket itself, too.
>      * pool_destroy() can remove destroyed apr_bucket referrers from
>      * the array.  Any referrers that remain when this function is
>      * called get converted to heap buckets by adding a loop over the
>      * array here (only the first call to apr_bucket_heap_make() gets
>      * the copy flag set, of course).  Another gotcha, though this one
>      * is easily worked around: apr_bucket_heap_make() calls
>      * apr_bucket_shared_make() which invalidates the reference count
>      * on the refcounted bucket. --jcw
>      */
>
> 4) The same problem applies to file buckets that have been split/copied
> when APR_HAS_MMAP: when one of them gets read it gets made into an MMAP.
> ALL of the file buckets should be converted at the same time to reference
> the same MMAP.

I disagree about how to fix this.  The simple solution is to just abstract
one level, such as:

  bucket   ->     bucket   ->   bucket
    |               |             |
  shared          shared        shared
    |               |             |
     -----------------------------
                    |
                pool_bucket

Same works for file_buckets, and it is why simple_buckets were invented
IIRC.  Of course if this is done, then heap buckets need to have the same
general setup.  This keeps us from having to traverse a list of buckets
whenever we convert from one type to another.

> 5) apr_bucket_destroy() should allow the type-specific destructor to free
> the bucket itself, too (needed to solve problem #3).

No.  This was done on purpose so that we can cache the buckets themselves.
If the type-specific destructor frees the bucket itself, then we can't
cache the freed buckets.  It also makes it very difficult to convert
buckets from one type to the next.

> 6) Should mmap_destroy() call apr_mmap_delete(m->mmap) when the last
> reference to the mmap bucket is destroyed?

No.  If you do that, then caching MMAP's becomes impossible.  The
apr_mmap_delete should be left up to the pool cleanup.

> 7) socket_read() and pipe_read() should return APR_SUCCESS even in an EOF
> condition, as discussed on new-httpd a week or so ago.  I haven't made
> this change yet, though I have put an XXX comment in the code about it,
> because it requires fixing the input filters in Apache to figure out EOS on
> their own.  Looking for suggestions on how to do that.

Didn't we decide that buckets should return EOF?  I could be wrong, but I
was under the impression that buckets needed to return EOF in order for
things to work.

> 8) Can I make apr_bucket_shared_destroy() just return a boolean value
> indicating whether or not this bucket is the last reference to the shared
> resource?  It's silly to return a pointer to that resource, since the
> caller must already know where the resource is anyway.

+1

> 9) file_read in the non-mmap case was allocating HUGE_STRING_LEN even if
> the file was smaller than that. Fixed and thrown in with patch #1.

+1

> 10) socket_read and pipe_read should modify the heap bucket they create to
> have alloc_len == HUGE_STRING_LEN to reflect the actual size of buf. Fixed
> and thrown in with patch #1.

+1

> PHEW!  I think that's it for now.  Comments, please?

Each of these should be a separate patch, and preferably with a few hours
in between each patch.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------