You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/08/26 03:13:42 UTC

Bucket memory allocation (was Re: SMS parm to bucket_foo_create())

On Sat, 25 Aug 2001, Ryan Bloom wrote:

> Woah!  Those patches were large, and I don't think I agree with them
> at all.

The fact that it's a big patch is why I've put it off this long.  I had
that first phase written months ago and never committed it because it was
large.  Note that much of that first patch was stuff that would have come
right back out again in phase 2... some temporary hacks were necessary as
while I made the basic changes and before changing the API so I could have
something I could test for breakage at several points along the way.

This is the approach we've been talking about doing for months and nobody
objected before as far as I can remember.  I posted this three-phase-plan
thingy on the list on July 8 and nobody voiced any concerns.  It just
hasn't got done because no one ever got around to it.  Phase 1 seemed
harmless enough (big though it was)... but now that I'm working on Phase
2, the more I see how greatly it affects the API, the less I like it, so
I'm willing to change.

> Plus, now we sometimes have both pools and sms's in the
> same structure. How are we synch'ing those two?

We're not.  It's annoying, I know, but we're just counting on the code not
to clean up the data before while we're still using the buckets.  But we
already do that, so that's not a change in the requirements.  Anyhow, the
idea is that the SMS used will live as long as the thread (which is
certainly at least as long as the data), and the caller just keeps giving
you back a pointer to that SMS so you don't have to look up your thread ID
all the time.

> Why can't we just create a free list for buckets?  That way, we take
> the hit for allocating buckets only when necessary, and at some point,
> we will reach a steady state.  The free list, could be per thread, and
> I believe I could have it coded tomorrow night.

That's what we were going to do originally when you and Greg and I and
the others started talking about this, what, 7 or 8 months ago?

But then when the SMS's came around they seemed ideal because we could
just write ourselves up a blocks SMS that gave us our free list tailored
the way we wanted it and pop it in.  The SMS is intended to be per-thread
and last as long as the thread though extra allocated memory might be
cleaned up now and then to avoid leaks, but because we get to choose when
that cleanup occurs, we can still reach our steady state.  As a fringe
benefit, it has been indicated on-list that some custom bucket writers
might want to use custom memory allocation systems, and this takes care of
that without us having to do anything special at all.

(Assume for the purposes of this paragraph that we ARE going to use SMS.)
It was debated whether the SMS to be used should be looked up by the
buckets code itself or whether it should be passed in by the caller.
Passing it in by the caller is what eventually came out seeming the best
because otherwise the buckets code (which has no way to know what its
thread ID is) has to go ask the OS for its thread ID and then look up the
SMS in a hash table or something based on that ID.  If the caller gives it
to us, the caller (which DOES know the thread ID at some point) can just
stash a pointer to that thread's SMS somewhere and pass it in to the
bucket functions and it's constant time instead of pseudo-constant or
logarithmic or linear or whatever.  On the other hand, now that I'm
actually implementing that as phase 2, I sort of dislike how cluttered it
makes the API seem.  So we could take a fork in the road from here and
leave it up to the buckets to find their thread ID.  Less performant, but
not terribly bad.  Probably still better than malloc, though the win is
less clear.  Or we could keep going.

(Assume for the purposes of this paragraph that we're NOT going to use
SMS, either because the group consensus is "SMS were a nice try, but they
couldn't beat pools, so it's time to get rid of them altogether" or
because we'd just rather do something custom here.)  The same thread ID
problem exists for a buckets-specific-non-SMS free list, so you'd end up
doing the call to find out what your thread ID is and then do your hash
lookup because the caller can't help you anymore.

So that was the logic behind it.

Thoughts?

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Bucket memory allocation (was Re: SMS parm to bucket_foo_create())

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 26 Aug 2001, Greg Stein wrote:

> Thread-specific globals are just that: globals. Passing as a parameter gives
> you clarity/maintainability/flexibility/etc. And it can also mean some real
> performance gains in certain places.

That is definitely in-sync with the conclusion that was reached in
previous debates on this matter.  Ignoring the obvious flexibility and so
on and just talking about pure performance, it boiled down to this
example: If you force the buckets code itself to do a little extra work
each time you create a bucket to (look up the SMS|find its free
list|insert your own semantic), then what's going to happen to filters
that create a buttload of buckets?  They're going to have to do tons and
tons of redundant work.  It'd be nice if you only had to do that work once
and propagate the result to the later buckets.  But if you're giving the
caller the interface to do that, the caller might as well figure out what
SMS it wants wayyyyy ahead of time and stash it somewhere so that you
REALLY avoid doing redundant work.  All you have to do per-bucket is take
a pointer from somewhere (eg the conn_rec) and pass it in.  That's clearly
a performance gain over any system that involves per-bucket-creation work.

> Also note that you may want different SMSs based on *applicability* rather
> than based on thread. One SMS for FILE buckets, one SMS for pool buckets,
> etc. And that determination will be made at create time, rather than through
> a "what thread am I on?" question.

That is another benefit, sure.  Though while we're definitely not limited
to a single SMS for all bucket types, what I think we're going to see is
that the best SMS for buckets in general is one that can hand out little
blobs of about 20-60 bytes at a time really really really fast (that's the
approximate range of sizes for various bucket structures) and recycle them
very efficiently.  That's why David and Sander wrote the blocks SMS in the
first place--they had this specifically in mind.  The SMS would not used
to allocate any of the APR structures like apr_file_t, of course, nor
would it be used to allocate space for _data_.  (Unless somebody makes up
an SMS bucket type as a complement to heap and pool buckets, which I'm all
for.  It would probably look a lot like the pool buckets except that it
could fall back on an sms_std instead of having to morph itself to a heap
bucket if the SMS got cleaned up.)


> On a separate note, I really didn't like that last checkin for the buckets
> at all. If you're going to introduce a temporary global variable, and some
> temporary logic in every create function, then I'd suggest two things when
> you do so:
>
> 1) comments on the globals to that effect
> 2) the per-create logic could be encapsulated in a single macro to minimize
>    the line-count changes in those functions, and to focus users on a macro
>    which is commented as being transitory.

That's a good point.  There was a tiny comment in there on the variable
declaration in the header, but I was counting on the commit message to
make it clear that the rest was temporary as well.  More comments and a
macro would have been a good idea.  I'll keep that in mind if I ever have
a need for something like this again.

Thanks,
Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Bucket memory allocation (was Re: SMS parm to bucket_foo_create())

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Aug 26, 2001 at 11:35:54AM -0700, Brian Pane wrote:
> Cliff Woolley wrote:
>...
> >makes the API seem.  So we could take a fork in the road from here and
> >leave it up to the buckets to find their thread ID.  Less performant, but
> >not terribly bad.  Probably still better than malloc, though the win is
> >less clear.  Or we could keep going.
> >
> IMHO, it's better to pass the SMS through the API than to force the
> bucket code to look it up via the thread ID, because looking up the
> thread ID isn't necessarily a cheap operation (it may require a system
> call on some platforms).

I would agree with Brian.

Thread-specific globals are just that: globals. Passing as a parameter gives
you clarity/maintainability/flexibility/etc. And it can also mean some real
performance gains in certain places.

Also note that you may want different SMSs based on *applicability* rather
than based on thread. One SMS for FILE buckets, one SMS for pool buckets,
etc. And that determination will be made at create time, rather than through
a "what thread am I on?" question.


On a separate note, I really didn't like that last checkin for the buckets
at all. If you're going to introduce a temporary global variable, and some
temporary logic in every create function, then I'd suggest two things when
you do so:

1) comments on the globals to that effect
2) the per-create logic could be encapsulated in a single macro to minimize
   the line-count changes in those functions, and to focus users on a macro
   which is commented as being transitory.

A few months ago when I changed the calling convention on ap_get_brigade(),
I had intended it to be temporary, with a couple more necessary steps to
make things right. I did not properly communicate that, and since the
temporary step left some things broken, Ryan backed it all out instead. Had
I commented about the transitory nature, we may have made forward progress
rather than backing up. [and yes, I hope to make time to *do* that forward
motion again]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Bucket memory allocation (was Re: SMS parm to bucket_foo_create())

Posted by Brian Pane <bp...@pacbell.net>.
Cliff Woolley wrote:

>On Sat, 25 Aug 2001, Ryan Bloom wrote:
>
[...]

>(Assume for the purposes of this paragraph that we ARE going to use SMS.)
>It was debated whether the SMS to be used should be looked up by the
>buckets code itself or whether it should be passed in by the caller.
>Passing it in by the caller is what eventually came out seeming the best
>because otherwise the buckets code (which has no way to know what its
>thread ID is) has to go ask the OS for its thread ID and then look up the
>SMS in a hash table or something based on that ID.  If the caller gives it
>to us, the caller (which DOES know the thread ID at some point) can just
>stash a pointer to that thread's SMS somewhere and pass it in to the
>bucket functions and it's constant time instead of pseudo-constant or
>logarithmic or linear or whatever.  On the other hand, now that I'm
>actually implementing that as phase 2, I sort of dislike how cluttered it
>makes the API seem.  So we could take a fork in the road from here and
>leave it up to the buckets to find their thread ID.  Less performant, but
>not terribly bad.  Probably still better than malloc, though the win is
>less clear.  Or we could keep going.
>
IMHO, it's better to pass the SMS through the API than to force the
bucket code to look it up via the thread ID, because looking up the
thread ID isn't necessarily a cheap operation (it may require a system
call on some platforms).

--Brian