Posted to dev@apr.apache.org by Brian Pane <bp...@pacbell.net> on 2001/07/08 09:47:32 UTC

Observations on fragmentation in SMS pools

I added some instrumentation to apr_sms_trivial_malloc
to study in more detail where its bottlenecks were.

As a result, I found a few interesting phenomena:

1. The good news is that allocations of small amounts
   of memory are very efficient.  They almost always
   take the fastest path through the code, in which
   some available space is reserved from the
   "sms->used_sentinel.prev" block with a handful of
   pointer arithmetic operations.

2. The bad news is that allocations for larger blocks
   (in the >=8KB range) typically require a call to the
   parent SMS to get data.  On my test machine, I'm seeing
   elapsed times in the 30 microsecond range when this
   happens, compared to less than 1 microsecond for small
   allocations that don't require more memory from the
   parent SMS.  And when an allocation falls through to
   the parent, it often seems to fall all the way through
   to the root SMS (I suspect that 30us includes a malloc).
   The problem seems to be particularly bad for things that
   create subrequests, like mod_include.

3. The worse news is that there seems to be a lot of
   fragmentation.  For example, I saw this pattern
   during a server-parsed request:
     - the application code requests 12296 bytes
       from a pool
     - not enough memory is available in the SMS, so it
       requests 16400 bytes from its parent SMS.
     - the parent SMS doesn't have enough free space
       either, so it requests 20504 bytes from the
       grandparent SMS.
     - the grandparent SMS doesn't have enough space
        either, but it has to iterate through 15 blocks in
        its free list to figure that out.  Each of these
       blocks has between 8176 and 12272 bytes available.
     - the grandparent calls through to the great-grandparent
       to get 24608 bytes.  The great-grandparent doesn't
       have a block with that much free space, but it
       iterates through 9 blocks in its free list in
        search of one; all of these blocks have 16376 bytes
        free.
     - the great-grandparent thus requests 28712 bytes from
       the great-great grandparent.  The great-great-grandparent
       doesn't have any blocks in its free list, so it calls
       through to its parent, which at last is an SMS that
       does a real malloc.
   This type of pattern may explain the reported higher memory
   use of the SMS-based httpd compared with the original pools;
   there's a lot of memory in those free lists that can't be
   used in this example.

For an SMS that's going to be a parent of other SMSs, we'll
need something with more sophisticated policies for reassigning
freed space than the current trivial-SMS.

--Brian




Re: Observations on fragmentation in SMS pools

Posted by Ben Laurie <be...@algroup.co.uk>.
dean gaudet wrote:
> 
> On Sun, 8 Jul 2001, Ben Laurie wrote:
> 
> > Justin Erenkrantz wrote:
> > > Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never
> > > took algorithms classes) has many memory allocation algorithms.  I'd
> > > bet we could find one that would work better.  -- justin
> >
> > At USENIX there was a talk about extending the slab allocator to
> > multiple CPUs. One of the things in the talk was a thing called vmem,
> > which is a general purpose resource allocator with low fragmentation and
> > constant-time allocations (optionally - for more time you can get a
> > better fit). Sounds pretty good, eh?
> >
> > http://www.usenix.org/publications/library/proceedings/usenix01/bonwick.html
> 
> linux 2.4 uses a slab allocator for many critical data structures -- in
> particular the networking code uses it.  this is one of the tricks they
> used between 2.2 and 2.4 to increase the scalability to large numbers of
> CPUs.  their allocator does have per-cpu caching, but i don't remember all
> the details.

I should be clear: this gadget goes underneath the slab allocator. That
is, the slab allocator uses memory allocated by vmem. Apparently, this
is because slab allocators always defer the question of how to get big
lumps of memory, and vmem is what it is proposed they should defer to.

BTW, the obvious thing to do to cure the problem described in the first
place is to not add extra memory after the first step up the tree.

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Ben Laurie wrote:

> Justin Erenkrantz wrote:
> > Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never
> > took algorithms classes) has many memory allocation algorithms.  I'd
> > bet we could find one that would work better.  -- justin
>
> At USENIX there was a talk about extending the slab allocator to
> multiple CPUs. One of the things in the talk was a thing called vmem,
> which is a general purpose resource allocator with low fragmentation and
> constant-time allocations (optionally - for more time you can get a
> better fit). Sounds pretty good, eh?
>
> http://www.usenix.org/publications/library/proceedings/usenix01/bonwick.html

linux 2.4 uses a slab allocator for many critical data structures -- in
particular the networking code uses it.  this is one of the tricks they
used between 2.2 and 2.4 to increase the scalability to large numbers of
CPUs.  their allocator does have per-cpu caching, but i don't remember all
the details.

-dean


Re: Observations on fragmentation in SMS pools

Posted by Ben Laurie <be...@algroup.co.uk>.
Justin Erenkrantz wrote:
> Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never
> took algorithms classes) has many memory allocation algorithms.  I'd
> bet we could find one that would work better.  -- justin

At USENIX there was a talk about extending the slab allocator to
multiple CPUs. One of the things in the talk was a thing called vmem,
which is a general purpose resource allocator with low fragmentation and
constant-time allocations (optionally - for more time you can get a
better fit). Sounds pretty good, eh?

http://www.usenix.org/publications/library/proceedings/usenix01/bonwick.html

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
regarding fragmentation:

is it reasonable to assume that pre-allocating larger chunks of
memory and sub-dividing them yourself will prevent memory
fragmentation, with the side-effect of having More Code?

use hash tables to optimise the list-walking?

write a better sms instance, one that isn't 'trivial'?

luke

Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Jul 09, 2001 at 02:31:35AM +0100, David Reid wrote:
> To be honest changing sms_trivial to use malloc instead of apr_sms_malloc is
> an easy move (just a few lines to change) so it's probably worth trying and
> then seeing what we get...
> 
> BTW, I'm impressed by the amount of traffic this has generated.  Exactly
> what I hoped it would do!
> 
> I guess also I'm wondering if we should apply my patch.  For all the reasons
> given it's not ideal, but it's a starting point, and if in 2 weeks it looks
> totally different, well, no bother, but at least it lets everyone work on it
> with a minimum of fuss...  Opinions?  3 +1's and I'll commit while I'm in
> Phoenix (got a nice connection)

+1 after 2.0.20 is rolled.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 9 Jul 2001, David Reid wrote:

> I guess also I'm wondering if we should apply my patch.  For all the reasons
> given it's not ideal, but it's a starting point, and if in 2 weeks it looks
> totally different, well, no bother, but at least it lets everyone work on it
> with a minimum of fuss...  Opinions?  3 +1's and I'll commit while I'm in
> Phoenix (got a nice connection)

Please wait until we get a beta2 on Apache (hopefully 2.0.20 will happen
tonight and it'll live up to scrutiny and make it to beta status).  After
that, I'm +1 on concept (haven't looked at the details of the patch yet,
but I will).

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Sander Striker wrote:

>>To be honest changing sms_trivial to use malloc instead of 
>>apr_sms_malloc is
>>an easy move (just a few lines to change) so it's probably worth 
>>trying and then seeing what we get...
>>
>
>Massive leakage :)
>
>Remember how the apr_sms_reset code works; it does not have to
>traverse and destroy all children since they all use the memory
>from that sms.
>
I propose that it's better to not optimize for reset.

With all the fragmentation I'm seeing in the httpd, the
number of blocks managed by an SMS in the middle of a stack
can be larger than the number of blocks that its descendants
would have to manage (and free upon destruction) if they
got memory from a block source themselves.  Thus we'd likely
have _less_ work to do upon reset if the SMS being reclaimed
had to recurse through its descendants and ask them to return
their blocks by themselves (because the total number of blocks
being returned to SMSs further down the stack would be smaller).

This problem would go away, of course, if the memory management
strategy employed by the parent SMS were one that didn't cause
so much fragmentation.  But I don't see a way to accomplish
that without slowing down allocation.

--Brian


RE: Observations on fragmentation in SMS pools

Posted by Sander Striker <st...@apache.org>.
> To be honest changing sms_trivial to use malloc instead of 
> apr_sms_malloc is
> an easy move (just a few lines to change) so it's probably worth 
> trying and then seeing what we get...

Massive leakage :)

Remember how the apr_sms_reset code works; it does not have to
traverse and destroy all children since they all use the memory
from that sms.

> BTW, I'm impressed by the amount of traffic this has generated.  Exactly
> what I hoped it would do!
> 
> I guess also I'm wondering if we should apply my patch.  For all 
> the reasons
> given it's not ideal, but it's a starting point, and if in 2 
> weeks it looks
> totally different, well, no bother, but at least it lets everyone 
> work on it
> with a minimum of fuss...  Opinions?  3 +1's and I'll commit while I'm in
> Phoenix (got a nice connection)

Don't know if it counts, but +1 :)

> david

Sander


Re: Observations on fragmentation in SMS pools

Posted by David Reid <dr...@jetnet.co.uk>.
To be honest changing sms_trivial to use malloc instead of apr_sms_malloc is
an easy move (just a few lines to change) so it's probably worth trying and
then seeing what we get...

BTW, I'm impressed by the amount of traffic this has generated.  Exactly
what I hoped it would do!

I guess also I'm wondering if we should apply my patch.  For all the reasons
given it's not ideal, but it's a starting point, and if in 2 weeks it looks
totally different, well, no bother, but at least it lets everyone work on it
with a minimum of fuss...  Opinions?  3 +1's and I'll commit while I'm in
Phoenix (got a nice connection)

david

----- Original Message -----
From: "Brian Pane" <bp...@pacbell.net>
To: "Sander Striker" <st...@apache.org>
Cc: "APR Development List" <de...@apr.apache.org>
Sent: Monday, July 09, 2001 1:46 AM
Subject: Re: Observations on fragmentation in SMS pools


> Sander Striker wrote:
>
> >I'll think out loud now:
> >
> Me too :-)
>
> >
> >The solution might be adding specific allocation functions for SMS
> >implementations.  These functions could look something like this:
> >
> >APR_DECLARE(void *) apr_sms_child_malloc(apr_sms_t *sms,
> >                                         apr_sms_t *child,
> >                                         apr_size_t size);
> >
> >APR_DECLARE(apr_status_t) apr_sms_child_free(apr_sms_t *sms,
> >                                             apr_sms_t *child,
> >                                             void *mem);
> >
> >Internally the framework will use the child_malloc_fn if present.  If
> >not, it will fall back to malloc_fn.
> >
> >This is something that overthrows the KIS policy though...
> >
>
> I think this approach (or a variant thereof) has a lot of promise.
>
> One way to look at the root cause of the sms-pool fragmentation
> and performance issues is that the notion of "parent pool" implies
> two things:
>   1. If P is the parent of S, S must cease to exist when
>      P is destroyed.
> >   2. If S needs more memory, it obtains it from P
>      by calling the "malloc" method of P.
>
> It often doesn't make sense for these two roles to be
> combined in the same object.  Corollary: It doesn't make
> sense for apr_sms_malloc(parent) to be the mechanism by
> which a child allocates additional space.
>
> For example, a pool created for a subrequest needs to go out
> of scope when its parent does, but it doesn't derive any real
> benefit from routing requests for additional blocks through its
> parent.  We could get better performance if the role of the
> parent SMS were to supply a "block source" to the child.  This
> block source would be a pointer to an SMS that the child should
> call whenever it needed more memory.  The block source could be
> the parent itself, or it could be any ancestor of the parent.
> In the case of a succession of sms_trivials stacked on top of
> an sms_std, each sms_trivial would supply the pointer to the sms_std
> as the block source for its child(ren).  (The resulting setup
> would then look like the original pools implementation, in the
> sense that children would bypass their parents to get additional
> blocks.  The difference is that the parent gets to decide what
> type of block source its children use--so if a stack of SMSs are
> supposed to use memory from a shared memory segment, each SMS
> in the stack can ensure that its descendants are using the shared
> mem block source rather than some malloc(3)-based block source.)
>
> Thoughts?
>
> Thanks,
> --Brian
>
>
>
>


Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Sander Striker wrote:

>I'll think out loud now:
>
Me too :-)

>
>The solution might be adding specific allocation functions for SMS
>implementations.  These functions could look something like this:
>
>APR_DECLARE(void *) apr_sms_child_malloc(apr_sms_t *sms,
>                                         apr_sms_t *child,
>                                         apr_size_t size);
>
>APR_DECLARE(apr_status_t) apr_sms_child_free(apr_sms_t *sms,
>                                             apr_sms_t *child,
>                                             void *mem);
>
>Internally the framework will use the child_malloc_fn if present.  If
>not, it will fall back to malloc_fn.
>
>This is something that overthrows the KIS policy though...
>

I think this approach (or a variant thereof) has a lot of promise.

One way to look at the root cause of the sms-pool fragmentation
and performance issues is that the notion of "parent pool" implies
two things:
  1. If P is the parent of S, S must cease to exist when
     P is destroyed.
  2. If S needs more memory, it obtains it from P
     by calling the "malloc" method of P.

It often doesn't make sense for these two roles to be
combined in the same object.  Corollary: It doesn't make
sense for apr_sms_malloc(parent) to be the mechanism by
which a child allocates additional space.

For example, a pool created for a subrequest needs to go out
of scope when its parent does, but it doesn't derive any real
benefit from routing requests for additional blocks through its
parent.  We could get better performance if the role of the
parent SMS were to supply a "block source" to the child.  This
block source would be a pointer to an SMS that the child should
call whenever it needed more memory.  The block source could be
the parent itself, or it could be any ancestor of the parent.
In the case of a succession of sms_trivials stacked on top of
an sms_std, each sms_trivial would supply the pointer to the sms_std
as the block source for its child(ren).  (The resulting setup
would then look like the original pools implementation, in the
sense that children would bypass their parents to get additional
blocks.  The difference is that the parent gets to decide what
type of block source its children use--so if a stack of SMSs are
supposed to use memory from a shared memory segment, each SMS
in the stack can ensure that its descendants are using the shared
mem block source rather than some malloc(3)-based block source.)

Thoughts?

Thanks,
--Brian




Re: Observations on fragmentation in SMS pools

Posted by Bill Stoddard <bi...@wstoddard.com>.
> On Sun, 8 Jul 2001, Ian Holsman wrote:
>
> > Bill Stoddard wrote:
> >
> > >>I've been working on a power of 2 allocator.  I still haven't got that up
> > >>to speed yet, but it is worth a look.  I'll play with it this week and post
> > >>it to the list
> > >>
> > >
> > >I hacked a simple power of two allocator together as a proof of concept that replacing
> > >malloc/free with apr_malloc/apr_free could give us a performance boost (it does).  Find it
> > >here (it is for Windows).
> > >
> >
> > Hi Bill,
> > I was wondering why the brigade code doesn't make use of pools/sms  to
> > handle it's memory.
> > It has a pool passed to it from what I can see...
>
> Pools aren't used for brigades because when a pool is cleaned up, the
> memory goes away.  This means that whenever we are doing a pipelined
> request, and we allocate the brigade out of the request_rec, we would have
> to copy all of the data into the conn_rec before we could destroy the
> request.  If we allocate everything out of the conn_rec, then we have a
> huge resource leak until the end of the connection.
>
> For a complete description of why we don't use pools for brigades, please
> take a look at the mailing list archives from the time period around when
> the brigades were introduced.  This question was brought up MANY times,
> and it has been answered in great detail.
>
> Ryan

Yep, I cannot add anything to Ryan's answer.

Bill


Re: Observations on fragmentation in SMS pools

Posted by rb...@covalent.net.
On Sun, 8 Jul 2001, Ian Holsman wrote:

> Bill Stoddard wrote:
>
> >>I've been working on a power of 2 allocator.  I still haven't got that up
> >>to speed yet, but it is worth a look.  I'll play with it this week and post
> >>it to the list
> >>
> >
> >I hacked a simple power of two allocator together as a proof of concept that replacing
> >malloc/free with apr_malloc/apr_free could give us a performance boost (it does). Find it
> >here (it is for Windows).
> >
>
> Hi Bill,
> I was wondering why the brigade code doesn't make use of pools/sms to
> handle its memory.
> It has a pool passed to it from what I can see...

Pools aren't used for brigades because when a pool is cleaned up, the
memory goes away.  This means that whenever we are doing a pipelined
request, and we allocate the brigade out of the request_rec, we would have
to copy all of the data into the conn_rec before we could destroy the
request.  If we allocate everything out of the conn_rec, then we have a
huge resource leak until the end of the connection.

For a complete description of why we don't use pools for brigades, please
take a look at the mailing list archives from the time period around when
the brigades were introduced.  This question was brought up MANY times,
and it has been answered in great detail.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------


Re: Observations on fragmentation in SMS pools

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 8 Jul 2001, Ian Holsman wrote:

> I was wondering why the brigade code doesn't make use of pools/sms to
> handle its memory.
> It has a pool passed to it from what I can see...

The brigade is allocated out of a pool, and that pool is used to register
a cleanup on the brigade to make sure all of the buckets in that brigade
get destroyed.

The buckets themselves use malloc(). The buckets do not and cannot use
pools because the lifetimes could get to be all wrong pretty easily, and
it could be a _major_ resource leak on big requests or pipelined
connections. I'm in the process of switching them to use apr_sms_malloc().
I've got a 3-phase plan for making that switchover and have written phase
1, which I just need to test and commit.

The three phases go something like this (I might have mentioned this on
the list before... I can't remember whether I did or not, so here it is
again):

Phase 1: s/malloc/apr_sms_malloc/g; s/free/apr_sms_free/g; Add in a global
         apr_sms_std instance in the buckets code that all buckets will
         use to get their memory from.  Add an apr_sms_t* to the
         apr_bucket struct so that the apr_bucket_foo_make() functions
         know which SMS to use when allocating memory for private
         structures.  Set that pointer to reference the global
         apr_sms_std instance in all of the apr_bucket_foo_create()
         functions for now.

Phase 2: Add an apr_sms_t parameter to all apr_bucket_foo_create()
         functions that lets the caller determine which SMS to use.
         For now, they can all just pass in a pointer to the global
         apr_sms_std instance.

Phase 3: Create a per-thread blocks SMS in the Apache MPM's and
         stash a pointer to it in the conn_rec for each request.
         Use that instead of the global apr_sms_std when creating
         all buckets and get rid of the global apr_sms_std.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA




Re: Observations on fragmentation in SMS pools

Posted by Ian Holsman <ia...@cnet.com>.
Bill Stoddard wrote:

>>I've been working on a power of 2 allocator.  I still haven't got that up
>>to speed yet, but it is worth a look.  I'll play with it this week and post
>>it to the list
>>
>
>I hacked a simple power of two allocator together as a proof of concept that replacing
>malloc/free with apr_malloc/apr_free could give us a performance boost (it does). Find it
>here (it is for Windows).
>

Hi Bill,
I was wondering why the brigade code doesn't make use of pools/sms to
handle its memory.
It has a pool passed to it from what I can see...

..Ian

>
>http://wstoddard.com/patches/apr_mem_alloc.tar
>
>There are several simple changes to this code that could make it much closer to something
>that could be committed.
>
>Bill
>




Re: Observations on fragmentation in SMS pools

Posted by Bill Stoddard <bi...@wstoddard.com>.
>
> I've been working on a power of 2 allocator.  I still haven't got that up
> to speed yet, but it is worth a look.  I'll play with it this week and post
> it to the list

I hacked a simple power of two allocator together as a proof of concept that replacing
malloc/free with apr_malloc/apr_free could give us a performance boost (it does). Find it
here (it is for Windows).

http://wstoddard.com/patches/apr_mem_alloc.tar

There are several simple changes to this code that could make it much closer to something
that could be committed.

Bill


RE: Observations on fragmentation in SMS pools

Posted by Sander Striker <st...@apache.org>.
>>> 1. The good news is that allocations of small amounts
>>>    of memory are very efficient.  They almost always
>>>    take the fastest path through the code, in which
>>>    some available space is reserved from the
>>>    "sms->used_sentinel.prev" block with a handful of
>>>    pointer arithmetic operations.
>
> Cool.

This is what the pools code did.  I only used a different way
to store the required accounting information.

>>> 2. The bad news is that allocations for larger blocks
>>>    (in the >=8KB range) typically require a call to the
>>>    parent SMS to get data.  On my test machine, I'm seeing
>>>    elapsed times in the 30 microsecond range when this
>>>    happens, compared to less than 1 microsecond for small
>>>    allocations that don't require more memory from the
>>>    parent SMS.  And when an allocation falls through to
>>>    the parent, it often seems to fall all the way through
>>>    to the root SMS (I suspect that 30us includes a malloc).
>>>    The problem seems to be particularly bad for things that
>>>    create subrequests, like mod_include.
>
> OK, not so cool.

No. A first optimization might be adding some simple defines to
the sms_private.h file to eliminate the overhead of the
framework code when apr_sms_malloc is called from inside an
sms module.  The defines would look something like this:

#define apr_sms_malloc(sms, size)  ((sms)->malloc_fn((sms), (size)))
#define apr_sms_calloc(sms, size)  ((sms)->calloc_fn((sms), (size)))

I'm afraid we can't do the free_fn because it might be NULL.

This would get rid of a bit of overhead, but it isn't going to
make an enormous amount of difference.

>>> 3. The worse news is that there seems to be a lot of
>>>    fragmentation.  For example, I saw this pattern
>>>    during a server-parsed request:
> <snip.>

>>> For an SMS that's going to be a parent of other SMSs, we'll
>>> need something with more sophisticated policies for reassigning
>>> freed space than the current trivial-SMS.
>
> Definitely.
>
>> Yup.  I've brought this up to Sander and David before, but this is how
>
> Really - first i've heard of it :(

Yes, Justin brought something like this up with me when he reviewed the
"trivial" code.  We didn't discuss this usage pattern though.
Of course it is very logical that it happens...

I'll think out loud now:
The solution might be adding specific allocation functions for SMS
implementations.  These functions could look something like this:

APR_DECLARE(void *) apr_sms_child_malloc(apr_sms_t *sms,
                                         apr_sms_t *child,
                                         apr_size_t size);

APR_DECLARE(apr_status_t) apr_sms_child_free(apr_sms_t *sms,
                                             apr_sms_t *child,
                                             void *mem);

Internally the framework will use the child_malloc_fn if present.  If
not, it will fall back to malloc_fn.

This is something that overthrows the KIS policy though...

>> pools work EXCEPT for the fact that we get a little more pathological in
>> SMS's addition of memory because each parent tacks on memory (as you
>> described).  The optimization here is for the <4KB allocation.  I'm not
>> sure that is what we need to be optimizing for.  More analysis is
>> probably needed to see what our typical usage pattern is.  This is where
>> Ian's pool replay code is *very* helpful (although I don't like adding
>> more #ifdef to the pool code - yuck - I'd like to see a SMS that just
>> prints that info and passes it up to the parent to do what it needs
>> to).

I want to state here that POOLS_ARE_SMS effectively makes
each pool a "trivial" sms. To take full advantage of the SMS
code the stack of SMSs should be optimized. Right now we
have this (in httpd with POOLS_ARE_SMS):

std <> trivial <> trivial [<> trivial ...]
^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^
     APR              httpd

Justin already hinted in a private mail that he was thinking
about something like this:

  ... <> trivial <> tracking [<> tracking ...]

In the buckets code "blocks" could be a very good choice.

So, as a first pass, allowing httpd to run with pools
as sms is not optimal, but it gets things working.

> Yep, I had a very simple debug sms that simply printed out what it was
> doing, then did it.  Maybe it's time to revive it, but it
> generates a LOT of output.

Another option is just using the debug code you put in and tagging the
relevant SMSs. This way you can see the usage patterns in each part
of the application.

>> Before turning SMS on in the mainline code by default, I think we need
>> to come up with a more robust allocation algorithm.  And, the whole
>> point of SMS is that we can play with memory allocation algorithms
>> until we are blue in the face and nothing else changes.  You can't do
>> that with the current pools code.  We need more people playing with it
>> and suggesting new allocation algorithms.  =)
>
> Absolutely.  This was my motivation in doing the work in the first place,
> and it seems to be working.

Yes :)

I've been working on a power of 2 allocator.  I still haven't got that up
to speed yet, but it is worth a look.  I'll play with it this week and post
it to the list.

>> The trivial SMS tries to keep the memory allocation strategy as close as
>> possible to what we currently have.
>
> Oh, we're nowhere near ready to turn it on by default.  I never thought we
> were, but you need a test case to prove/disprove what's going on, and that
> what this whole pools as sms code gives us, a complex example that we can
> use to improve our code :)
>
>> Personally, I'd set MIN_FREE to 0 and bite the small allocations (now
>> they'd be slow, but really, how many allocations are under 4KB - data
>> structures, perhaps?).  I'd also like to see it smarter about reclaiming
>> free space, but that gets really tricky.
>
> Oh, if you look at it we do a LOT of allocations below 4K from pools.  Turn
> on the checking of every allocation and look at the sizes being requested.

:)

About the free space reclaiming. It is very hard to implement without
significant performance loss.  There is something else that is worth a try
and effectively does away with the entire traversal of the free list in
search of a big enough block.

We would need to implement both the used list and the free list as a
Fibonacci heap.  This way our biggest block
is always on top.  So, when we check the free list we only need to look
at our top block.  The reset function is doing a merge of the free and
used list.

>> It's a tradeoff (and is purposeful for lots of small allocations), but
>> until someone can write a better memory allocation algorithm, this is
>> what we got.  Trivial SMS has its Achilles heel here - this is just an
>> aggravation of how the original pool code works (the pool code would add
>> 4KB, while each level of the SMS adds 4KB).
>
> Well, we have what we have.  My main concern about trivial is
> that the code gets complex, and I'm worried that we may be
> overcomplicating things for ourselves.  KISS is a guiding principle
> that it never does any harm to follow
> :)

Although I agree with you that KISS is a good principle to follow, it is
nearly impossible to do so in "trivial".  Well, actually it is rather
simple, but not very obvious.  This is mostly due to the fact that I wanted
to keep all codepaths as short as possible (there might even still be room
to make them shorter).  Memory management code tends to get complex, mainly
because it is a performance bottleneck in a lot of cases.  So, it is a
trade-off: complexity <-> speed.  Of course it can never hurt to look for
the lowest-complexity, highest-speed pair ;)

>> Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never
>> took algorithms classes) has many memory allocation algorithms.  I'd
>> bet we could find one that would work better.  -- justin
>
> The sms code was added on May 9th, which looks like less than 2
> months ago, so to be honest we're in pretty damn good shape.  Let's take
> some time, do the testing and define the problems then start looking at
> fixes.

I'm very pleased with the speed this code has been moving at and the support
that it has had.  Thanks everyone!

> Rushing in with a lot of quick solutions won't get us anywhere but a world
> of hurt.

Agreed.  The final step in the whole process is when sms is turned on by
default.  Then we can optimize the sms stacks and really take advantage of
the code.

> david

Sander


Re: Observations on fragmentation in SMS pools

Posted by David Reid <dr...@jetnet.co.uk>.
> > 1. The good news is that allocations of small amounts
> >    of memory are very efficient.  They almost always
> >    take the fastest path through the code, in which
> >    some available space is reserved from the
> >    "sms->used_sentinel.prev" block with a handful of
> >    pointer arithmetic operations.

Cool.

> >
> > 2. The bad news is that allocations for larger blocks
> >    (in the >=8KB range) typically require a call to the
> >    parent SMS to get data.  On my test machine, I'm seeing
> >    elapsed times in the 30 microsecond range when this
> >    happens, compared to less than 1 microsecond for small
> >    allocations that don't require more memory from the
> >    parent SMS.  And when an allocation falls through to
> >    the parent, it often seems to fall all the way through
> >    to the root SMS (I suspect that 30us includes a malloc).
> >    The problem seems to be particularly bad for things that
> >    create subrequests, like mod_include.

OK, not so cool.

> >
> > 3. The worse news is that there seems to be lot of
> >    fragmentation.  For example, I saw this pattern
> >    during a server-parsed request:
<snip>

> >
> > For an SMS that's going to be a parent of other SMSs, we'll
> > need something with more sophisticated policies for reassigning
> > freed space than the current trivial-SMS.

Definitely.

>
> Yup.  I've brought this up to Sander and David before, but this is how

Really - first I've heard of it :(

> pools work EXCEPT for the fact that we get a little more pathological in
> SMS's addition of memory because each parent tacks on memory (as you
> described).  The optimization here is for the <4KB allocation.  I'm not
> sure that is what we need to be optimizing for.  More analysis is
> probably needed to see what our typical usage pattern is.  This is where
> Ian's pool replay code is *very* helpful (although I don't like adding
> more #ifdef to the pool code - yuck - I'd like to see a SMS that just
> prints that info and passes it up to the parent to do what it needs
> to).

Yep, I had a very simple debug sms that simply printed out what it was
doing, then did it.  Maybe it's time to revive it, but it generates a LOT of
output.

>
> Before turning SMS on in the mainline code by default, I think we need
> to come up with a more robust allocation algorithm.  And, the whole
> point of SMS is that we can play with memory allocation algorithms
> until we are blue in the face and nothing else changes.  You can't do
> that with the current pools code.  We need more people playing with it
> and suggesting new allocation algorithms.  =)

Absolutely.  This was my motivation in doing the work in the first place,
and it seems to be working.

> The trivial SMS tries to
> keep the memory allocation strategy as close as possible to what we
> currently have.

Oh, we're nowhere near ready to turn it on by default.  I never thought we
were, but you need a test case to prove/disprove what's going on, and that's
what this whole pools-as-sms code gives us: a complex example that we can
use to improve our code :)

>
> Personally, I'd set MIN_FREE to 0 and bite the small allocations (now
> they'd be slow, but really, how many allocations are under 4KB - data
> structures, perhaps?).  I'd also like to see it smarter about reclaiming
> free space, but that gets really tricky.

Oh, if you look at it we do a LOT of allocations below 4K from pools.  Turn
on the checking of every allocation and look at the sizes being requested.

>
> It's a tradeoff (and is purposeful for lots of small allocations), but
> until someone can write a better memory allocation algorithm, this is
> what we got.  Trivial SMS has its Achilles heel here - this is just an
> aggravation of how the original pool code works (the pool code would add
> 4KB, while each level of the SMS adds 4KB).

Well, we have what we have.  My main concern about trivial is that the code
gets complex, and I'm worried that we may be over-complicating things for
ourselves.  KISS is a guiding principle that it never does any harm to
follow :)

>
> Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never
> took algorithms classes) has many memory allocation algorithms.  I'd
> bet we could find one that would work better.  -- justin

The sms code was added on May 9th, which looks like less than 2 months ago,
so to be honest we're in pretty damn good shape.  Let's take some time, do
the testing and define the problems then start looking at fixes.  Rushing in
with a lot of quick solutions won't get us anywhere but a world of hurt.

david


Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 10:18:12AM -0700, dean gaudet wrote:
> 
> 
> On Sun, 8 Jul 2001, Justin Erenkrantz wrote:
> 
> > Yup.  I've brought this up to Sander and David before, but this is how
> > pools
> 
> woah.  no way really?
> 
> that's not at all how it was in 1.3 or in early 2.0 ...
> 
> in 2.0 as of uh a year ago say, there was one free list per process,
> and locks were used to access it.
> 
> no matter where in the tree of pools you tried an allocation, if it
> didn't fit into the current block, the allocator would lock and go to
> the global free block list and pick up the first fit block.  none of this
> going up through a chain of pools or anything, that's insane.

No, nothing changed.  We just added the chain of pools.  =)  It's a
work in progress.  You may call it a "feature" right now.  We're not 
saying it is ready to be activated - just hammered/examined by people
on this list.  That definitely needs to be cleaned up.  We know that.

If we switch that global free block list to per-thread (where we are
heading), is this allocation algorithm the best one for httpd-style 
memory usage?  If so, then I guess I won't look at other allocation
algorithms.  I'll defer to people who've studied this a lot longer 
than I have.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:

> Erm, you could always mlock() the memory that gets passed out of the
> pool code.  More than likely, by making some 'secure' pool, you will
> be allocating a lot more than just security data from it (as pools
> tend to be abused), 

one of the original justifications, jon, for sms, was exactly this:
to provide a means to mlock a pool, for security purposes, and not
have to change massive amounts of code.

luke

Re: Observations on fragmentation in SMS pools

Posted by Ian Holsman <ia...@cnet.com>.
Jon Travis wrote:

>On Sun, Jul 08, 2001 at 11:27:08AM -0700, Ian Holsman wrote:
>
>>Justin Erenkrantz wrote:
>>
>>>On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
>>>
[snip]

>
>I'm still curious about the use case.  I'm sure there is one, if you
>could just enlighten me.  I already know it's 'cool'.
>
for every page request which comes in, we need to figure out what app is 
serving it, and which category it belongs to
(eg... it is a price comparison page for a mobile phone).  This 
information is used by other modules to determine different things
(eg... we store info on the category tree so we know that a mobile phone 
category is part of electronics).

so we plug parts of the URI into a hash table and store info about it 
for each different category.
now.. the reason we are going with shared memory on this is that there 
are ~10,000 (soon to be 60,000) categories on our site,
and we also have other info we store, so it blows out to ~8M->64M of 
memory being used just to store this stuff.

we 'could' have used a gdbm/db but I think that direct shared memory 
would be faster than a file based DB shoved into memory cache.

The other unique application of shared memory that we plan to do soon is 
having a state table stored in shared mem to implement a faster setenvIf 
module.

..Ian

(ps.. if you can suggest other methods of doing the above, I would be 
grateful for the ideas)

>
>-- Jon
>




Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Sun, Jul 08, 2001 at 11:27:08AM -0700, Ian Holsman wrote:
> Justin Erenkrantz wrote:
> 
> >On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
> >
> >>As for the ability to use shared memory for this ... yeah that is
> >>an interesting idea.  What are your actual use cases for that?
> >>
> >
> >Ian posted a patch to do something like this - I think he wanted a hash
> >table that was shared across all processes.  The problem gets to be 
> >growing the memory space, but I think his use case was with fixed 
> >memory.  He could probably tell more about how he wanted to do it.  
> >
> 
> yep... It did have a fixed amount of memory which you specified at 
> the start, and had fixed element/key sizes.
> It was implemented using arrays/offsets, so it didn't have to deal with 
> pointers.
> 
> I was thinking mmap'ed files could be used to provide expandability.. ..
> 
> hmm... sms - mmap might be interesting..
> 
> The code I posted didn't work as well as the current stuff does BTW, I 
> can put it up on a website if you would like.

I'm still curious about the use case.  I'm sure there is one, if you
could just enlighten me.  I already know it's 'cool'.

-- Jon

Re: Observations on fragmentation in SMS pools

Posted by Ian Holsman <ia...@cnet.com>.
Justin Erenkrantz wrote:

>On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
>
>>As for the ability to use shared memory for this ... yeah that is
>>an interesting idea.  What are your actual use cases for that?
>>
>
>Ian posted a patch to do something like this - I think he wanted a hash
>table that was shared across all processes.  The problem gets to be 
>growing the memory space, but I think his use case was with fixed 
>memory.  He could probably tell more about how he wanted to do it.  
>

yep... It did have a fixed amount of memory which you specified at 
the start, and had fixed element/key sizes.
It was implemented using arrays/offsets, so it didn't have to deal with 
pointers.

I was thinking mmap'ed files could be used to provide expandability.. ..

hmm... sms - mmap might be interesting..

The code I posted didn't work as well as the current stuff does BTW, I 
can put it up on a website if you would like.

>
>
>Both rbb and I suggested that this is what SMS was designed for.
>Have a SMS that is backed by shared memory and then the hashtable
>immediately becomes shared when you create it with this shared memory
>SMS.  No change needed to the hashtable code.  
>
>And, potentially, the scoreboard could (??) get switched to something
>using a shared memory SMS.  But, I don't know enough about the 
>scoreboard to say more about that.  -- justin
>




Re: Hash tables and request headers Re: observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Tue, 10 Jul 2001, Brian Pane wrote:

>   1. Add a 2nd create function for apr_hash_t that lets the caller
>      supply three callback functions:
>      - a hash function
>      - an equality-check function
>      - a concatenation function (if non-NULL, setting x=y followed by
>        x=y results in concat_fn(y,z) being stored in the table as the
>        value for x)

if you do concatenation this way you'll end up back in the land of O(n^2)
DOS attacks which we fixed years ago... this is why the merging is done
with those odd looking sort functions.  (it's in the archives if you want
to dig it all up, there were a lot of O(n^2) attacks :)

see get_mime_headers and apr_table_overlap.

i'm not sure it's appropriate to use generic interfaces with function
pointer indirections when you're trying to make something go faster.  if
you want it go fast, then don't make it generic.  make it specific.

(aside on modern cpus:  indirect function calls with multiple targets
hurt... they hurt because the CPU has difficulty predicting where the
branch/call target will be, so it has difficulty pre-fetching and decoding
across the branch/call.  this typically results in stalls in the pipeline
-- the number of stalls depends on how many stages from the front of the
pipeline the branch execute phase is.  some CPUs do just fine if your
indirect function call always has the same target -- but as soon as you
add a second target, this predictor will break.)

there's a lot of compiler theory which is appropriate here -- the HTTP
header names themselves are very similar to keywords in a language.
typically a tool such as gperf is used to map keywords to an enumerated
type when writing (fast) parsers for languages.  this would be a great
thing for a webserver other than apache, apache can't use it directly
because we're so general we allow people to modify the set of
recognized/generated HTTP headers at run-time.

i think ultimately what you want to be able to do is the following:

- for all common HTTP headers we've got a compile-time perfect hash
  mapping keyword <-> header_enum

- get_mime_headers uses that hash and stores the headers in a
  structure mapping header_enum -> header value

- r->headers_{in,out,err} manipulations use the integer values rather
  than the string names  (this allows table lookups simply by comparing
  integers rather than strings)

- an additional mechanism exists to add to the enum at run-time

the latter part is a pain in the ass.  (but it's pretty typical of apache
perf problems :)

initially at least, you might as well do the dynamic part using the
existing tables... so you've got a hybrid structure which supports quick
lookups for the core headers, and the same old same old for the rest.

do this just to find out how much of a perf benefit you're getting from
the perfect hash.  (hell just ignore the dynamic header crud for perf
testing.)

but you'll probably get resistance to putting this into the server for
good... 'cause if a module uses a "non-standard" header "BlahBlah" and
then rfc3942 HTTP/1.2 makes BlahBlah standard, and it's moved into the
perfect hash, then the module would be broken.

(i've got other ideas, but it's senseless to think about them until some
sort of perf improvement is shown.)

-dean




Hash tables and request headers Re: observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Roy T. Fielding wrote:

>>Another reason why apr_hash_t isn't a good match for HTTP headers
>>is that the same field name may appear multiple times in the request
>>or response headers (section 4.2 of RFC 2616), but the hash table
>>implementation is designed around unique keys.
>>
>
>HTTP headers were designed to be processed as a hash table, hence the
>rules regarding combining multiple fields with the same name.  The only
>exception is Set-Cookie, because the folks at Netscape didn't follow
>the rules.
>
Ah, I was wondering why RFC 2109 seemed at odds with 2616 on the
subject of field-combining syntax.  :-(

If Set-Cookie is the only exception, it might not be a bad idea to
use apr_hash_t for the request and response headers.  I'm even willing
to contribute a patch for this, but it would impact both APR (slightly)
and Apache (a lot).  Here's my proposed approach:

  1. Add a 2nd create function for apr_hash_t that lets the caller
     supply three callback functions:
     - a hash function
     - an equality-check function
     - a concatenation function (if non-NULL, setting x=y followed by
       x=y results in concat_fn(y,z) being stored in the table as the
       value for x)
  2. Change the headers_in and headers_out fields of the request_req
     from tables to hash tables.  Supply case-insensitive hash and
     comparison functions for these hash tables.  For headers_out, also
     supply a concatenation function that builds comma-separated lists
     (and semicolon-separated lists when the key is set-cookie)
  3. Change all of the references to headers_in and headers_out in
     the core and standard modules to use the hash API instead of
     the table API.
  4. Document the change so that maintainers of 3rd party modules
     know how to make their code compile once again.

Thoughts?  Is a change like this worth doing?

--Brian




Re: Observations on fragmentation in SMS pools

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.
> Another reason why apr_hash_t isn't a good match for HTTP headers
> is that the same field name may appear multiple times in the request
> or response headers (section 4.2 of RFC 2616), but the hash table
> implementation is designed around unique keys.

HTTP headers were designed to be processed as a hash table, hence the
rules regarding combining multiple fields with the same name.  The only
exception is Set-Cookie, because the folks at Netscape didn't follow
the rules.

....Roy


Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Justin Erenkrantz wrote:

>On Mon, Jul 09, 2001 at 08:14:26PM -0700, Jon Travis wrote:
>
>>Tables should definitely be moved to APR-util if they are to remain.  As
>>for Apache, there are better structures that dictate general order than
>>the table.  IMNSHO, the only reason tables are still in Apache is inertia.
>>Nobody wants to go through and change everything to a more sane data
>>structure.  Case insensitive searches -- linear search time over the
>>table ... ugh.  
>>
>
>One problem with hash tables is that if you want to retrieve a HTTP
>header, you now need to have it be the same case as it came in (unless
>you are normalizing every call).  RFC 2616 states that (Section 4.2):
>
>"Field names are case-insensitive."
>
>Therefore, if the client sends:
>
>CoNnEcTiOn: Keep-Alive
>
>it should respect that.  And, this data structure is primarily used for 
>storing HTTP header info (I'm not sure about mod_mime).  I'm not aware 
>of any browsers that do this, but I'd bet someone will because the RFC 
>says you can.  (It's possible I'm misinterpreting the RFC, wouldn't be 
>the first time...)
>
>AFAICT, this doesn't call for a true hash table or a true array.  It's 
>something slightly different.  Any data structure which replaces tables 
>needs to be able to handle the case-insensitivity gracefully.
>
>Just want to make sure that any solution keeps us correct to the RFC.  
>-- justin
>

Another reason why apr_hash_t isn't a good match for HTTP headers
is that the same field name may appear multiple times in the request
or response headers (section 4.2 of RFC 2616), but the hash table
implementation is designed around unique keys.

--Brian



Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Jul 09, 2001 at 08:14:26PM -0700, Jon Travis wrote:
> Tables should definitely be moved to APR-util if they are to remain.  As
> for Apache, there are better structures that dictate general order than
> the table.  IMNSHO, the only reason tables are still in Apache is inertia.
> Nobody wants to go through and change everything to a more sane data
> structure.  Case insensitive searches -- linear search time over the
> table ... ugh.  

One problem with hash tables is that if you want to retrieve a HTTP
header, you now need to have it be the same case as it came in (unless
you are normalizing every call).  RFC 2616 states that (Section 4.2):

"Field names are case-insensitive."

Therefore, if the client sends:

CoNnEcTiOn: Keep-Alive

it should respect that.  And, this data structure is primarily used for 
storing HTTP header info (I'm not sure about mod_mime).  I'm not aware 
of any browsers that do this, but I'd bet someone will because the RFC 
says you can.  (It's possible I'm misinterpreting the RFC, wouldn't be 
the first time...)

AFAICT, this doesn't call for a true hash table or a true array.  It's 
something slightly different.  Any data structure which replaces tables 
needs to be able to handle the case-insensitivity gracefully.

Just want to make sure that any solution keeps us correct to the RFC.  
-- justin


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
> > > and does anyone want a challenge of porting tdb to apr?
> > > *grin*
> > 
> > Challenge, did somebody say challenge?  I'm always up for a challenge.
> > :-)

:)  i just checked the codebase.  it uses:

lseek
fcntl
read
write
open
close
mmap (if supported)
munmap (if supported)
malloc (3 times)
free (16 times)
realloc (once, in tdb_expand(), in TDB_INTERNAL mode.  you don't _have_
         to use a file with tdb, you can do in-memory only).

so, not exactly a _big_ challenge, which is the kind of challenges
i like: small and gratifying and part of something larger.

the larger:

cool code that could provide fast, SHARED in-memory, multi-reader
multi-writer simultaneous access, and oh yeah, it also has the
option to do files as well as in-memory.

the keys, however, are a unique (void*, size_t) tuple,
and so is the data.


the bad news:

does anyone want to contact the three remaining authors,
rusty russell, andrew tridgell (tridge@valinux.com) and
jeremey allison (jallison@valinux.com) to see if they'd
like to release a version under the APR license?

also, anton blanchard (sparc32 kernel hacker Extreme :)
who wrote the spinlock code (no, it's not on the
list of copyrights, i used to work at same company as
him when he was writing it).

whoever decides to contact them, please DO NOT mention
my name, it will have the OPPOSITE intended effect, i
guarantee that, 100%.

luke

Re: table inefficiencies Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Cliff Woolley wrote (on apr-dev):

>[Is it just me or is it nearly impossible to have a conversation about
>Apache or APR that doesn't in some way belong on BOTH lists? <sigh>]
>
>
>On Mon, 9 Jul 2001, Brian Pane wrote:
>
>>It's worth noting that half of the apr_table_get calls in
>>Apache are from mod_mime.  I posted a patch to new-httpd
>>a couple of weeks ago that replaces mod_mime's tables with
>>hash tables.
>>
>
>Can you repost that to new-httpd?  I'd forgotten about it but am
>definitely interested
>
Here's the patch:

Index: mod_mime.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/http/mod_mime.c,v
retrieving revision 1.42
diff -u -r1.42 mod_mime.c
--- mod_mime.c    2001/06/18 05:36:31    1.42
+++ mod_mime.c    2001/07/10 06:26:35
@@ -66,6 +66,7 @@
 #include "apr.h"
 #include "apr_strings.h"
 #include "apr_lib.h"
+#include "apr_hash.h"
 
 #define APR_WANT_STRFUNC
 #include "apr_want.h"
@@ -96,12 +97,19 @@
     char *name;
 } attrib_info;
 
+/* Information to which an extension can be mapped
+ */
+typedef struct extension_info {
+    char *forced_type;                /* Additional AddTyped stuff */
+    char *encoding_type;              /* Added with AddEncoding... */
+    char *language_type;              /* Added with AddLanguage... */
+    char *handler;                    /* Added with AddHandler... */
+    char *charset_type;               /* Added with AddCharset... */
+} extension_info;
+
 typedef struct {
-    apr_table_t *forced_types;        /* Additional AddTyped stuff */
-    apr_table_t *encoding_types;      /* Added with AddEncoding... */
-    apr_table_t *language_types;      /* Added with AddLanguage... */
-    apr_table_t *handlers;            /* Added with AddHandler...  */
-    apr_table_t *charset_types;       /* Added with AddCharset... */      
+    apr_hash_t  *extension_mappings;  /* Map from extension name to
+                       * extension_info structure */
     apr_array_header_t *handlers_remove;  /* List of handlers to remove */
     apr_array_header_t *types_remove;     /* List of MIME types to 
remove */
     apr_array_header_t *encodings_remove; /* List of encodings to remove */
@@ -138,14 +146,10 @@
     mime_dir_config *new =
     (mime_dir_config *) apr_palloc(p, sizeof(mime_dir_config));
 
-    new->forced_types = apr_table_make(p, 4);
-    new->encoding_types = apr_table_make(p, 4);
-    new->charset_types = apr_table_make(p, 4);
-    new->language_types = apr_table_make(p, 4);
-    new->handlers = apr_table_make(p, 4);
-    new->handlers_remove = apr_array_make(p, 4, sizeof(attrib_info));
-    new->types_remove = apr_array_make(p, 4, sizeof(attrib_info));
-    new->encodings_remove = apr_array_make(p, 4, sizeof(attrib_info));
+    new->extension_mappings = apr_hash_make(p);
+    new->handlers_remove = NULL;
+    new->types_remove = NULL;
+    new->encodings_remove = NULL;
 
     new->type = NULL;
     new->handler = NULL;
@@ -154,41 +158,107 @@
     return new;
 }
 
+/*
+ * Overlay one hash table of extension_mappings onto another
+ */
+static void overlay_extension_mappings(apr_pool_t *p,
+                       apr_hash_t *overlay, apr_hash_t *base)
+{
+    apr_hash_index_t *index;
+    for (index = apr_hash_first(overlay); index;
+     index = apr_hash_next(index)) {
+    char *key;
+    apr_ssize_t klen;
+    extension_info *overlay_info, *base_info;
+
+    apr_hash_this(index, (const void**)&key, &klen, (void**)&overlay_info);
+    base_info = (extension_info*)apr_hash_get(base, key, klen);
+    if (base_info) {
+        if (overlay_info->forced_type) {
+        base_info->forced_type = overlay_info->forced_type;
+        }
+        if (overlay_info->encoding_type) {
+        base_info->encoding_type = overlay_info->encoding_type;
+        }
+        if (overlay_info->language_type) {
+        base_info->language_type = overlay_info->language_type;
+        }
+        if (overlay_info->handler) {
+        base_info->handler = overlay_info->handler;
+        }
+        if (overlay_info->charset_type) {
+        base_info->charset_type = overlay_info->charset_type;
+        }
+    }
+    else {
+        base_info = (extension_info*)apr_palloc(p, sizeof(extension_info));
+        memcpy(base_info, overlay_info, sizeof(extension_info));
+        apr_hash_set(base, key, klen, base_info);
+    }
+    }
+}
+
 static void *merge_mime_dir_configs(apr_pool_t *p, void *basev, void *addv)
 {
     mime_dir_config *base = (mime_dir_config *) basev;
     mime_dir_config *add = (mime_dir_config *) addv;
-    mime_dir_config *new =
-        (mime_dir_config *) apr_palloc(p, sizeof(mime_dir_config));
+    mime_dir_config *new = apr_pcalloc(p, sizeof(mime_dir_config));
+
     int i;
     attrib_info *suffix;
 
-    new->forced_types = apr_table_overlay(p, add->forced_types,
-                     base->forced_types);
-    new->encoding_types = apr_table_overlay(p, add->encoding_types,
-                                         base->encoding_types);
-    new->charset_types = apr_table_overlay(p, add->charset_types,
-                       base->charset_types);
-    new->language_types = apr_table_overlay(p, add->language_types,
-                                         base->language_types);
-    new->handlers = apr_table_overlay(p, add->handlers,
-                                   base->handlers);
-
-    suffix = (attrib_info *) add->handlers_remove->elts;
-    for (i = 0; i < add->handlers_remove->nelts; i++) {
-        apr_table_unset(new->handlers, suffix[i].name);
-    }
-
-    suffix = (attrib_info *) add->types_remove->elts;
-    for (i = 0; i < add->types_remove->nelts; i++) {
-        apr_table_unset(new->forced_types, suffix[i].name);
-    }
-
-    suffix = (attrib_info *) add->encodings_remove->elts;
-    for (i = 0; i < add->encodings_remove->nelts; i++) {
-        apr_table_unset(new->encoding_types, suffix[i].name);
+    if (base->extension_mappings == NULL) {
+    new->extension_mappings = add->extension_mappings;
     }
+    else if (add->extension_mappings == NULL) {
+    new->extension_mappings = base->extension_mappings;
+    }
+    else {
+    new->extension_mappings = apr_hash_make(p);
+    overlay_extension_mappings(p, base->extension_mappings,
+                   new->extension_mappings);
+    overlay_extension_mappings(p, add->extension_mappings,
+                   new->extension_mappings);
+    }
+
+    if (add->handlers_remove) {
+    suffix = (attrib_info *) add->handlers_remove->elts;
+    for (i = 0; i < add->handlers_remove->nelts; i++) {
+        extension_info *exinfo =
+        (extension_info*)apr_hash_get(new->extension_mappings,
+                          suffix[i].name,
+                          APR_HASH_KEY_STRING);
+        if (exinfo) {
+        exinfo->handler = NULL;
+        }
+    }
+    }
 
+    if (add->types_remove) {
+    suffix = (attrib_info *) add->types_remove->elts;
+    for (i = 0; i < add->types_remove->nelts; i++) {
+        extension_info *exinfo =
+        (extension_info*)apr_hash_get(new->extension_mappings,
+                          suffix[i].name,
+                          APR_HASH_KEY_STRING);
+        if (exinfo) {
+        exinfo->forced_type = NULL;
+        }
+    }
+    }
+
+    if (add->encodings_remove) {
+    suffix = (attrib_info *) add->encodings_remove->elts;
+    for (i = 0; i < add->encodings_remove->nelts; i++) {
+        extension_info *exinfo =
+        (extension_info*)apr_hash_get(new->extension_mappings,
+                          suffix[i].name,
+                          APR_HASH_KEY_STRING);
+        if (exinfo) {
+        exinfo->encoding_type = NULL;
+        }
+    }
+    }
 
     new->type = add->type ? add->type : base->type;
     new->handler = add->handler ? add->handler : base->handler;
@@ -198,17 +268,33 @@
     return new;
 }
 
+static extension_info *get_extension_info(apr_pool_t *p, mime_dir_config *m,
+                      const char* key)
+{
+    extension_info *exinfo;
+
+    exinfo = (extension_info*)apr_hash_get(m->extension_mappings, key,
+                       APR_HASH_KEY_STRING);
+    if (!exinfo) {
+    exinfo = apr_pcalloc(p, sizeof(extension_info));
+    apr_hash_set(m->extension_mappings, apr_pstrdup(p, key),
+             APR_HASH_KEY_STRING, exinfo);
+    }
+    return exinfo;
+}
+
 static const char *add_type(cmd_parms *cmd, void *m_, const char *ct_,
                             const char *ext)
 {
     mime_dir_config *m=m_;
     char *ct=apr_pstrdup(cmd->pool,ct_);
+    extension_info* exinfo;
 
     if (*ext == '.')
     ++ext;
-   
+    exinfo = get_extension_info(cmd->pool, m, ext);
     ap_str_tolower(ct);
-    apr_table_setn(m->forced_types, ext, ct);
+    exinfo->forced_type = ct;
     return NULL;
 }
 
@@ -217,11 +303,13 @@
 {
     mime_dir_config *m=m_;
     char *enc=apr_pstrdup(cmd->pool,enc_);
+    extension_info* exinfo;
 
     if (*ext == '.')
         ++ext;
+    exinfo = get_extension_info(cmd->pool, m, ext);
     ap_str_tolower(enc);
-    apr_table_setn(m->encoding_types, ext, enc);
+    exinfo->encoding_type = enc;
     return NULL;
 }
 
@@ -230,12 +318,14 @@
 {
     mime_dir_config *m=m_;
     char *charset=apr_pstrdup(cmd->pool,charset_);
+    extension_info* exinfo;
 
     if (*ext == '.') {
     ++ext;
     }
+    exinfo = get_extension_info(cmd->pool, m, ext);
     ap_str_tolower(charset);
-    apr_table_setn(m->charset_types, ext, charset);
+    exinfo->charset_type = charset;
     return NULL;
 }
 
@@ -244,12 +334,14 @@
 {
     mime_dir_config *m=m_;
     char *lang=apr_pstrdup(cmd->pool,lang_);
+    extension_info* exinfo;
 
     if (*ext == '.') {
     ++ext;
     }
+    exinfo = get_extension_info(cmd->pool, m, ext);
     ap_str_tolower(lang);
-    apr_table_setn(m->language_types, ext, lang);
+    exinfo->language_type = lang;
     return NULL;
 }
 
@@ -258,11 +350,14 @@
 {
     mime_dir_config *m=m_;
     char *hdlr=apr_pstrdup(cmd->pool,hdlr_);
+    extension_info* exinfo;
 
-    if (*ext == '.')
+    if (*ext == '.') {
         ++ext;
+    }
+    exinfo = get_extension_info(cmd->pool, m, ext);
     ap_str_tolower(hdlr);
-    apr_table_setn(m->handlers, ext, hdlr);
+    exinfo->handler = hdlr;
     return NULL;
 }
 
@@ -279,6 +374,10 @@
     if (*ext == '.') {
         ++ext;
     }
+    if (mcfg->handlers_remove == NULL) {
+    mcfg->handlers_remove =
+        apr_array_make(cmd->pool, 4, sizeof(attrib_info));
+    }
     suffix = (attrib_info *) apr_array_push(mcfg->handlers_remove);
     suffix->name = apr_pstrdup(cmd->pool, ext);
     return NULL;
@@ -296,6 +395,10 @@
     if (*ext == '.') {
         ++ext;
     }
+    if (mcfg->encodings_remove == NULL) {
+    mcfg->encodings_remove =
+        apr_array_make(cmd->pool, 4, sizeof(attrib_info));
+    }
     suffix = (attrib_info *) apr_array_push(mcfg->encodings_remove);
     suffix->name = apr_pstrdup(cmd->pool, ext);
     return NULL;
@@ -313,6 +416,9 @@
     if (*ext == '.') {
         ++ext;
     }
+    if (mcfg->types_remove == NULL) {
+    mcfg->types_remove = apr_array_make(cmd->pool, 4, sizeof(attrib_info));
+    }
     suffix = (attrib_info *) apr_array_push(mcfg->types_remove);
     suffix->name = apr_pstrdup(cmd->pool, ext);
     return NULL;
@@ -361,16 +467,9 @@
     "language to use for documents with no other language file extension"),
     {NULL}
 };
-
-/* Hash apr_table_t  --- only one of these per daemon; virtual hosts can
- * get private versions through AddType...
- */
 
-#define MIME_HASHSIZE (32)
-#define hash(i) (apr_tolower(i) % MIME_HASHSIZE)
+static apr_hash_t *mime_type_extensions;
 
-static apr_table_t *hash_buckets[MIME_HASHSIZE];
-
 static void mime_post_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *s)
 {
     ap_configfile_t *f;
@@ -390,8 +489,7 @@
         exit(1);
     }
 
-    for (x = 0; x < MIME_HASHSIZE; x++)
-        hash_buckets[x] = apr_table_make(p, 10);
+    mime_type_extensions = apr_hash_make(p);
 
     while (!(ap_cfg_getline(l, MAX_STRING_LEN, f))) {
         const char *ll = l, *ct;
@@ -403,7 +501,7 @@
         while (ll[0]) {
             char *ext = ap_getword_conf(p, &ll);
             ap_str_tolower(ext);   /* ??? */
-            apr_table_setn(hash_buckets[hash(ext[0])], ext, ct);
+        apr_hash_set(mime_type_extensions, ext, APR_HASH_KEY_STRING, ct);
         }
     }
     ap_cfg_closefile(f);
@@ -682,6 +780,7 @@
     /* Parse filename extensions, which can be in any order */
     while ((ext = ap_getword(r->pool, &fn, '.')) && *ext) {
         int found = 0;
+    extension_info *exinfo;
 
 #ifdef CASE_BLIND_FILESYSTEM
         /* We have a basic problem that folks on case-crippled systems
@@ -690,21 +789,25 @@
         ap_str_tolower(ext);
 #endif
 
+    exinfo = (extension_info*) apr_hash_get(conf->extension_mappings,
+                        ext, APR_HASH_KEY_STRING);
+
         /* Check for Content-Type */
-        if ((type = apr_table_get(conf->forced_types, ext))
-            || (type = apr_table_get(hash_buckets[hash(*ext)], ext))) {
+        if ((exinfo && ((type = exinfo->forced_type)))
+            || (type = apr_hash_get(mime_type_extensions, ext,
+                    APR_HASH_KEY_STRING))) {
             r->content_type = type;
             found = 1;
         }
 
     /* Add charset to Content-Type */
-    if ((type = apr_table_get(conf->charset_types, ext))) {
+    if (exinfo && ((type = exinfo->charset_type))) {
         charset = type;
         found = 1;
     }
 
         /* Check for Content-Language */
-        if ((type = apr_table_get(conf->language_types, ext))) {
+        if (exinfo && ((type = exinfo->language_type))) {
             const char **new;
 
             r->content_language = type;         /* back compat. only */
@@ -716,7 +819,7 @@
         }
 
         /* Check for Content-Encoding */
-        if ((type = apr_table_get(conf->encoding_types, ext))) {
+        if (exinfo && ((type = exinfo->encoding_type))) {
             if (!r->content_encoding)
                 r->content_encoding = type;
             else
@@ -726,9 +829,8 @@
         }
 
         /* Check for a special handler, but not for proxy request */
-        if ((type = apr_table_get(conf->handlers, ext))
-        && (PROXYREQ_NONE == r->proxyreq)
-        ) {
+        if ((exinfo && ((type = exinfo->handler)))
+        && (PROXYREQ_NONE == r->proxyreq)) {
             r->handler = type;
             found = 1;
         }




Re: table inefficiencies Re: Observations on fragmentation in SMS pools

Posted by Cliff Woolley <cl...@yahoo.com>.
[Is it just me or is it nearly impossible to have a conversation about
Apache or APR that doesn't in some way belong on BOTH lists? <sigh>]


On Mon, 9 Jul 2001, Brian Pane wrote:

> It's worth noting that half of the apr_table_get calls in
> Apache are from mod_mime.  I posted a patch to new-httpd
> a couple of weeks ago that replaces mod_mime's tables with
> hash tables.

Can you repost that to new-httpd?  I'd forgotten about it but am
definitely interested.

> Other frequent callers of linear-time table lookups:
> ap_set_byterange, ap_setup_client_block.

Patches for those would be fine by me, as well.

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



table inefficiencies Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
Jon Travis wrote:
[...]

>Tables should definitely be moved to APR-util if they are to remain.  As
>for Apache, there are better structures that dictate general order than
>the table.  IMNSHO, the only reason tables are still in Apache is inertia.
>Nobody wants to go through and change everything to a more sane data
>structure.  Case insensitive searches -- linear search time over the
>table ... ugh.  
>
It's worth noting that half of the apr_table_get calls in
Apache are from mod_mime.  I posted a patch to new-httpd
a couple of weeks ago that replaces mod_mime's tables with
hash tables.

Other frequent callers of linear-time table lookups:
ap_set_byterange, ap_setup_client_block.

--Brian




Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Mon, Jul 09, 2001 at 04:50:31PM -0700, rbb@covalent.net wrote:
> 
> > cool, huh?  [and it's only 1024 LOC yes i know it's not
> > portable like APR i was v.impressed that someone actually
> > looked at it when i first mentioned it here, btw ]
> >
> > so *grin*.
> >
> > can you guarantee thread-safety on apr_hash_t using
> > apr_hash_first()/next()/this()?
> >
> > can you guarantee complete traversal with multiple
> > simultaneous adders, deleters, readers and writers?
> >
> > and does anyone want a challenge of porting tdb to apr?
> > *grin*
> 
> Challenge, did somebody say challenge?  I'm always up for a challenge.
> :-)
> 
> > > BTW: Why are tables in APR at all?  The only thing I see used
> > > is headers in apr-util hooks code, and in the xml, but that of
> > > course can be fixed.  Step 1, remove tables from APR, Step 2,
> > > remove tables from Apache.
> >
> > agh!  tables are kinda... entrenched into xvl.  okay, maybe
> > not: only 10 instances of apr_table_make.  10 of apr_table_do
> > [which is why i was advocating it: i really like it :)]
> 
> Tables are in APR, because they were originally moved from Apache to APR before
> APR-util existed.  They should really move to apr-util.  They should never
> be removed from Apache.  Tables are useful because they guarantee a
> general order to the data, namely, the order you insert information into
> the table.  Hashes have a different use case.

Tables should definitely be moved to APR-util if they are to remain.  As
for Apache, there are better structures that dictate general order than
the table.  IMNSHO, the only reason tables are still in Apache is inertia.
Nobody wants to go through and change everything to a more sane data
structure.  Case insensitive searches -- linear search time over the
table ... ugh.  

-- Jon


tables -> hash (Re: Observations on fragmentation in SMS pools)

Posted by dean gaudet <de...@arctic.org>.
On Mon, 9 Jul 2001, Roy T. Fielding wrote:

> > Tables are in APR, because they were originally moved from Apache to APR before
> > APR-util existed.  They should really move to apr-util.  They should never
> > be removed from Apache.  Tables are useful because they guarantee a
> > general order to the data, namely, the order you insert information into
> > the table.  Hashes have a different use case.
>
> Ummm, no, tables do not guarantee that -- arrays do.  Tables were specifically
> created to be an abstract hash table, but the implementation remained
> simple because we never used them for large tables.

the r->headers_{in,out,err} tables are referenced so frequently that
making them hash tables is a win... there was a russian fellow, dimitri? i
forget his last name -- works on freebsd as well i think.  he tried this
and said it was a win.

i did a hash implementation for tables, but it was a loss because the hash
overhead on the zillion other small tables kills you.

but if you just fix mod_mime (Brian's patch is a start :), and the
headers_{in,out,err} then you get the biggest bang.  (btw,
headers_{in,out,err} should be turned into a perfect hash using gperf...
and all the compile-time string constants such as "Connection",
"Accept-Types", ... should become integers with #defines.)

-dean


Re: Observations on fragmentation in SMS pools

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.
> Tables are in APR, because they were originally moved from Apache to APR before
> APR-util existed.  They should really move to apr-util.  They should never
> be removed from Apache.  Tables are useful because they guarantee a
> general order to the data, namely, the order you insert information into
> the table.  Hashes have a different use case.

Ummm, no, tables do not guarantee that -- arrays do.  Tables were specifically
created to be an abstract hash table, but the implementation remained
simple because we never used them for large tables.

....Roy


Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Tue, Jul 10, 2001 at 04:52:25PM +0200, Luke Kenneth Casson Leighton wrote:
> > > ... errr.. should i be using apr_hash_t or something? :)
> > 
> > Yes.
> 
> okay.  will take a look at it, to see where i could use it.
> 
> i have a suspicion that a lot of instances i really _need_
> the case-insensitive stuff, and also need the dual
> set and add capability.
> 
> not least because HTTP POST args can have this:
> key1=value1&key1=value2 and, correct me if i'm wrong,
> here, that results in a table:
> (key1, value1)
> (key1, value2)

I'll reply to this one, but it also goes towards the other
guys pointing to the spec.  There isn't anything wrong with
storing the case-sensitive string and then keying off of a toupper
of the string, and chaining it off the end if the entry
already exists in the table (for multiple headers of the
same name).  One could even take the existing table code,
add a field for the toupper'd string, add a small adler32 (or
whatever cksum is to your liking) field, and get a lot 
better performance than we see now.  Of course I still
am no big fan of the API.

-- Jon

Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
> > ... errr.. should i be using apr_hash_t or something? :)
> 
> Yes.

okay.  will take a look at it, to see where i could use it.

i have a suspicion that a lot of instances i really _need_
the case-insensitive stuff, and also need the dual
set and add capability.

not least because HTTP POST args can have this:
key1=value1&key1=value2 and, correct me if i'm wrong,
here, that results in a table:
(key1, value1)
(key1, value2)

luke

Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Tue, Jul 10, 2001 at 01:29:24AM +0200, Luke Kenneth Casson Leighton wrote:
> > > if you examine tdb.c's design, you will notice that apr_table_do()
> > > becomes identical to [but more powerful than] tdb_traverse().
> > > 
> > > 
> > > apr_array_header_t?  again, i haven't thought about it, but
> > > you could likely get away with the same thing.
> > > 
> > 
> > The apr_table_t is just an apr_array_header_t with a specific
> > key/val entry type.
> > 
> > > in other words, i think that supporting apr_pool_t on top of
> > > shared memory (via sms) is going to be tricky: in my opinion,
> > > the apr_pool_xxx() API _and_ the apr_sms_xxx() API can't Deal With
> > > shared memory at a convenient user-acceptable level.
> > > 
> > > HOWEVER!  supporting the data types that apr_pool_xxx() USES
> > > is a different matter.
> > > 
> > > by supporting apr_array_header_t (if possible) and apr_table_t
> > > the complexity of messing about will be hidden from users of
> > > these data types, but they will work quite happily as expected,
> > > even if it's shared memory behind them.
> > > 
> > > what people think?
> > 
> > I think you are in for a ride on this one.  The API for dealing
> > with the tables is currently rather poor, since typically it
> > is casted to an array and then iterated over.  
> 
> implementation?  or usage?
> 
> implementation, i don't know.
> 
> usage?  i think it's _great_!  

Actually both.  The tables have their uses, but really, casting
it to an array and iterating over it?  Ugh!  Tables would be
better implemented with a stricter API for setting, fetching, and
iterating.  Of course the case insensitivity also bothers me,
somewhat.

> > The strcasecmp's
> > of the table code are a whole separate issue.  
> 
> mmm, setn, set, add and addn _are_ a little weird.
> plus, you can't typecast from the const char* to a
> struct* which is what i use it for.
> 
> ... errr.. should i be using apr_hash_t or something? :)

Yes.

> > I would think that
> > the (nearly analogous) apr_hash_t would better suit what you
> > are talking about here.  
>  
> ... mmmm sounds like i should be using apr_hash_t _anyway_.
> will investigate it, get back to you.
> 
> does apr_hash_t have an apr_hash_do()?
> 
> oh YUCK.  it has one of those horrible non-thread-safeable
> dbm-like interfaces.  the _only_ reason that tdb has
> a getfirst/getnext is to make it 'look' like gdbm.
> 
> however, one of the critical things that makes tdb good is
> that it has tdb_traverse(), which - get this: allows you
> to guarantee traversal of *all* nodes, even when you are
> deleting some out from under your nose *simultaneously*.
> 
> cool, huh?  [and it's only 1024 LOC yes i know it's not
> portable like APR i was v.impressed that someone actually
> looked at it when i first mentioned it here, btw ]
> 
> so *grin*.
> 
> can you guarantee thread-safety on apr_hash_t using
> apr_hash_first()/next()/this()?

Nope.

> can you guarantee complete traversal with multiple
> simultaneous adders, deleters, readers and writers?

Nope.  The documentation only dictates that multiple
iterations can go at the same time, and only deleting the
current entry (within an iteration) is well defined.

> and does anyone want a challenge of porting tdb to apr?
> *grin*
> 
> > BTW: Why are tables in APR at all?  The only thing I see used
> > is headers in apr-util hooks code, and in the xml, but that of
> > course can be fixed.  Step 1, remove tables from APR, Step 2, 
> > remove tables from Apache.
> 
> agh!  tables are kinda... entrenched into xvl.  okay, maybe
> not: only 10 instances of apr_table_make.  10 of apr_table_do
> [which is why i was advocating it: i really like it :)]
> 
> i use it to store the GET/POST args, i use it as a result-table
> for XML-db-like queries, as a table to add nodes to an XML
> doc or to add properties to an XML node, for all _sorts_ of
> horrible things, for which it is ideal.

I would personally advocate switching to the apr_hash_t.  The
table has a linear search (plus strcasecmp), and poor allocation
IMO..

-- Jon


Re: Observations on fragmentation in SMS pools

Posted by rb...@covalent.net.
> cool, huh?  [and it's only 1024 LOC yes i know it's not
> portable like APR i was v.impressed that someone actually
> looked at it when i first mentioned it here, btw ]
>
> so *grin*.
>
> can you guarantee thread-safety on apr_hash_t using
> apr_hash_first()/next()/this()?
>
> can you guarantee complete traversal with multiple
> simultaneous adders, deleters, readers and writers?
>
> and does anyone want a challenge of porting tdb to apr?
> *grin*

Challenge, did somebody say challenge?  I'm always up for a challenge.
:-)

> > BTW: Why are tables in APR at all?  The only thing I see used
> > is headers in apr-util hooks code, and in the xml, but that of
> > course can be fixed.  Step 1, remove tables from APR, Step 2,
> > remove tables from Apache.
>
> agh!  tables are kinda... entrenched into xvl.  okay, maybe
> not: only 10 instances of apr_table_make.  10 of apr_table_do
> [which is why i was advocating it: i really like it :)]

Tables are in APR, because they were originally moved from Apache to APR before
APR-util existed.  They should really move to apr-util.  They should never
be removed from Apache.  Tables are useful because they guarantee a
general order to the data, namely, the order you insert information into
the table.  Hashes have a different use case.

Ryan

_____________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
Covalent Technologies			rbb@covalent.net
-----------------------------------------------------------------------------


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
> > if you examine tdb.c's design, you will notice that apr_table_do()
> > becomes identical to [but more powerful than] tdb_traverse().
> > 
> > 
> > apr_array_header_t?  again, i haven't thought about it, but
> > you could likely get away with the same thing.
> > 
> 
> The apr_table_t is just an apr_array_header_t with a specific
> key/val entry type.
> 
> > in other words, i think that supporting apr_pool_t on top of
> > shared memory (via sms) is going to be tricky: in my opinion,
> > the apr_pool_xxx() API _and_ the apr_sms_xxx() API can't Deal With
> > shared memory at a convenient user-acceptable level.
> > 
> > HOWEVER!  supporting the data types that apr_pool_xxx() USES
> > is a different matter.
> > 
> > by supporting apr_array_header_t (if possible) and apr_table_t
> > the complexity of messing about will be hidden from users of
> > these data types, but they will work quite happily as expected,
> > even if it's shared memory behind them.
> > 
> > what people think?
> 
> I think you are in for a ride on this one.  The API for dealing
> with the tables is currently rather poor, since typically it
> is casted to an array and then iterated over.  

implementation?  or usage?

implementation, i don't know.

usage?  i think it's _great_!  

> The strcasecmp's
> of the table code are a whole separate issue.  

mmm, setn, set, add and addn _are_ a little weird.
plus, you can't typecast from the const char* to a
struct* which is what i use it for.

... errr.. should i be using apr_hash_t or something? :)

> I would think that
> the (nearly analogous) apr_hash_t would better suit what you
> are talking about here.  
 
... mmmm sounds like i should be using apr_hash_t _anyway_.
will investigate it, get back to you.

does apr_hash_t have an apr_hash_do()?

oh YUCK.  it has one of those horrible non-thread-safeable
dbm-like interfaces.  the _only_ reason that tdb has
a getfirst/getnext is to make it 'look' like gdbm.

however, one of the critical things that makes tdb good is
that it has tdb_traverse(), which - get this: allows you
to guarantee traversal of *all* nodes, even when you are
deleting some out from under your nose *simultaneously*.

cool, huh?  [and it's only 1024 LOC yes i know it's not
portable like APR i was v.impressed that someone actually
looked at it when i first mentioned it here, btw ]

so *grin*.

can you guarantee thread-safety on apr_hash_t using
apr_hash_first()/next()/this()?

can you guarantee complete traversal with multiple
simultaneous adders, deleters, readers and writers?

and does anyone want a challenge of porting tdb to apr?
*grin*

> BTW: Why are tables in APR at all?  The only thing I see used
> is headers in apr-util hooks code, and in the xml, but that of
> course can be fixed.  Step 1, remove tables from APR, Step 2, 
> remove tables from Apache.

agh!  tables are kinda... entrenched into xvl.  okay, maybe
not: only 10 instances of apr_table_make.  10 of apr_table_do
[which is why i was advocating it: i really like it :)]

i use it to store the GET/POST args, i use it as a result-table
for XML-db-like queries, as a table to add nodes to an XML
doc or to add properties to an XML node, for all _sorts_ of
horrible things, for which it is ideal.


Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Mon, Jul 09, 2001 at 05:49:24PM +0200, Luke Kenneth Casson Leighton wrote:
> On Sun, Jul 08, 2001 at 11:06:33AM -0700, Justin Erenkrantz wrote:
> > On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
> > > As for the ability to use shared memory for this ... yeah that is
> > > an interesting idea.  What are your actual use cases for that?
> > 
> > Ian posted a patch to do something like this - I think he wanted a hash
> > table that was shared across all processes.  The problem gets to be 
> > growing the memory space, but I think his use case was with fixed 
> > memory.  He could probably tell more about how he wanted to do it.  
> > 
> > Both rbb and I suggested that this is what SMS was designed for.
> > Have a SMS that is backed by shared memory and then the hashtable
> > immediately becomes shared when you create it with this shared memory
> > SMS.  No change needed to the hashtable code.  
> 
> okay, had an idea here that i thought i'd share with you about
> shmem/pools/sms.
> 
> lessons from tdb's design.
> 
> accessing shmem, you really have to treat it as a
> very-fast-file-in-memory.
> 
> therefore, best thing to do to be able to easily access
> sections of it: treat the shmem as an in-memory database.
> 
> the simplest way to fit that into the existing APR code is
> to have direct support for apr_table_t in a pool-based
> or sms-based shared-memory-segment.
> 
> if you examine tdb.c's design, you will notice that apr_table_do()
> becomes identical to [but more powerful than] tdb_traverse().
> 
> 
> apr_array_header_t?  again, i haven't thought about it, but
> you could likely get away with the same thing.
> 

The apr_table_t is just an apr_array_header_t with a specific
key/val entry type.

> in other words, i think that supporting apr_pool_t on top of
> shared memory (via sms) is going to be tricky: in my opinion,
> the apr_pool_xxx() API _and_ the apr_sms_xxx() API can't Deal With
> shared memory at a convenient user-acceptable level.
> 
> HOWEVER!  supporting the data types that apr_pool_xxx() USES
> is a different matter.
> 
> by supporting apr_array_header_t (if possible) and apr_table_t
> the complexity of messing about will be hidden from users of
> these data types, but they will work quite happily as expected,
> even if it's shared memory behind them.
> 
> what people think?

I think you are in for a ride on this one.  The API for dealing
with the tables is currently rather poor, since typically it
is casted to an array and then iterated over.  The strcasecmp's
of the table code are a whole separate issue.  I would think that
the (nearly analogous) apr_hash_t would better suit what you
are talking about here.  

BTW: Why are tables in APR at all?  The only thing I see used
is headers in apr-util hooks code, and in the xml, but that of
course can be fixed.  Step 1, remove tables from APR, Step 2, 
remove tables from Apache.

-- Jon


data types on type of shm (Re: Observations on fragmentation in SMS pools)

Posted by dean gaudet <de...@arctic.org>.
On Mon, 9 Jul 2001, Luke Kenneth Casson Leighton wrote:

> HOWEVER!  supporting the data types that apr_pool_xxx() USES
> is a different matter.

shm can be at different memory locations in all processes.  pointers don't
work.  you'd need to radically change the basic data types, which would
affect performance and make the code ugly.  so i think this is a bad idea.

btw, if you want a shm database/hash table done properly, take a look
at sleepycat db.

-dean


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
On Sun, Jul 08, 2001 at 11:06:33AM -0700, Justin Erenkrantz wrote:
> On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
> > As for the ability to use shared memory for this ... yeah that is
> > an interesting idea.  What are your actual use cases for that?
> 
> Ian posted a patch to do something like this - I think he wanted a hash
> table that was shared across all processes.  The problem gets to be 
> growing the memory space, but I think his use case was with fixed 
> memory.  He could probably tell more about how he wanted to do it.  
> 
> Both rbb and I suggested that this is what SMS was designed for.
> Have a SMS that is backed by shared memory and then the hashtable
> immediately becomes shared when you create it with this shared memory
> SMS.  No change needed to the hashtable code.  

okay, had an idea here that i thought i'd share with you about
shmem/pools/sms.

lessons from tdb's design.

accessing shmem, you really have to treat it as a
very-fast-file-in-memory.

therefore, best thing to do to be able to easily access
sections of it: treat the shmem as an in-memory database.

the simplest way to fit that into the existing APR code is
to have direct support for apr_table_t in a pool-based
or sms-based shared-memory-segment.

if you examine tdb.c's design, you will notice that apr_table_do()
becomes identical to [but more powerful than] tdb_traverse().


apr_array_header_t?  again, i haven't thought about it, but
you could likely get away with the same thing.


in other words, i think that supporting apr_pool_t on top of
shared memory (via sms) is going to be tricky: in my opinion,
the apr_pool_xxx() API _and_ the apr_sms_xxx() API can't Deal With
shared memory at a convenient user-acceptable level.

HOWEVER!  supporting the data types that apr_pool_xxx() USES
is a different matter.

by supporting apr_array_header_t (if possible) and apr_table_t
the complexity of messing about will be hidden from users of
these data types, but they will work quite happily as expected,
even if it's shared memory behind them.
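a tiny sketch of the shape being described (illustrative only -- this is not the real apr_table_t or tdb API): both apr_table_do() and tdb_traverse() visit every record through a user callback, so callers never hold raw pointers into the table's storage, and a shared-memory backing stays feasible behind the same interface.

```c
#include <stddef.h>

/* Illustrative only -- not the real apr_table_t or tdb API.  The shared
 * shape: iteration happens through a user callback, so callers never
 * hold raw pointers into the table's own storage. */
typedef int (*kv_visitor)(void *ctx, const char *key, const char *val);

struct kv_pair { const char *key; const char *val; };

/* Visit every pair until the callback returns 0, in the spirit of
 * apr_table_do() (and of tdb_traverse() visiting every record). */
static int table_do(kv_visitor fn, void *ctx,
                    const struct kv_pair *pairs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (!fn(ctx, pairs[i].key, pairs[i].val))
            return 0;
    return 1;
}
```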

what do people think?

luke


Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote:
> As for the ability to use shared memory for this ... yeah that is
> an interesting idea.  What are your actual use cases for that?

Ian posted a patch to do something like this - I think he wanted a hash
table that was shared across all processes.  The problem gets to be 
growing the memory space, but I think his use case was with fixed 
memory.  He could probably tell more about how he wanted to do it.  

Both rbb and I suggested that this is what SMS was designed for.
Have a SMS that is backed by shared memory and then the hashtable
immediately becomes shared when you create it with this shared memory
SMS.  No change needed to the hashtable code.  

And, potentially, the scoreboard could (??) get switched to something
using a shared memory SMS.  But, I don't know enough about the 
scoreboard to say more about that.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Sun, Jul 08, 2001 at 07:53:56PM +0200, Sander Striker wrote:
> > Stuff that the malloc in glibc does.  I cringe at the thought of how
> > much overhead due to abstraction this whole project is causing.
> 
> The other thing this abstraction buys us is the ability to pass in
> different memory to standard APR operations. Hopefully we can get
> shared memory to work in this fashion. I know for sure that we can
> do mlock()'d memory (which might be useful for security type calculation
> stuff you don't want to go to swap space).

Erm, you could always mlock() the memory that gets passed out of the
pool code.  More than likely, by making some 'secure' pool, you will
be allocating a lot more than just security data from it (as pools
tend to be abused), causing a nice slowdown.  

As for the ability to use shared memory for this ... yeah that is
an interesting idea.  What are your actual use cases for that?

-- Jon


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Brian Pane wrote:

> But Hoard is LGPL. :-(

hey check this page out
<http://soldc.sun.com/articles/multiproc/multiproc.html>

it tells me two things:

- on linux with ptmalloc in glibc, the stock performance should be almost
as good as with hoard

- on solaris there is a stock "mtmalloc" library which is almost as good
as ptmalloc and hoard, no licensing issue

this eliminates the worry about a hoard recommendation, doesn't it?

ok ok there's other beasts such as AIX and IRIX which run on big boxes.
but, um, oh my biases are showing again aren't they?

*studies netcraft operating system numbers*
*justifies biases*

-dean


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Brian Pane wrote:

> dean gaudet wrote:
>
> >On Sun, 8 Jul 2001, Ian Holsman wrote:
> >
> >>dean gaudet wrote:
> >>
> >>>it removes block_freelist and uses malloc/free for blocks.  this is much
> >>>less expensive than malloc/free on each allocation (amortization).
> >>>
> >>If you can wait till monday, and you are interested, I'll put it through
> >>the wringer on an 8-CPU Sun4500
> >>
> >
> >oh i'm in no rush, i'm being a lazy ass and not even testing these things
> >i'm posting, so thanks for helping :)  also, you would definitely want to
> >try the www.hoard.org allocator on a (solaris) machine with that many
> >cpus.
> >
> If I remember correctly, he's tried the Hoard allocator on
> a server that size with ALLOC_USE_MALLOC, and it performed
> much better than the global free list.  But Hoard is LGPL. :-(

i would totally believe that the ALLOC_USE_MALLOC code performs better
with hoard on a server that size... because the ALLOC_USE_MALLOC code
doesn't need to use a mutex ever.  the global freelist stuff is just a bad
idea on a box with 8 cpus, the mutex kills ya.

but with the elimination of the mutex we should scale better even just
using regular malloc.

i wasn't advocating bundling hoard, it's firmly my opinion that each libc
vendor should be doing their damnedest to provide the best malloc possible.
(especially because each libc vendor tends to also be a libpthread vendor
and they know the dirty details of making the two work well together).
i've no qualms with putting "try hoard" into the "how do i perf tune
apache?" page.  it wouldn't be a requirement...

-dean



Re: Observations on fragmentation in SMS pools

Posted by Brian Pane <bp...@pacbell.net>.
dean gaudet wrote:

>On Sun, 8 Jul 2001, Ian Holsman wrote:
>
>>dean gaudet wrote:
>>
>>>it removes block_freelist and uses malloc/free for blocks.  this is much
>>>less expensive than malloc/free on each allocation (amortization).
>>>
>>If you can wait till monday, and you are interested, I'll put it through
>>the wringer on an 8-CPU Sun4500
>>
>
>oh i'm in no rush, i'm being a lazy ass and not even testing these things
>i'm posting, so thanks for helping :)  also, you would definitely want to
>try the www.hoard.org allocator on a (solaris) machine with that many
>cpus.
>
If I remember correctly, he's tried the Hoard allocator on
a server that size with ALLOC_USE_MALLOC, and it performed
much better than the global free list.  But Hoard is LGPL. :-(

--Brian



Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Ian Holsman wrote:

> dean gaudet wrote:
>
> >it removes block_freelist and uses malloc/free for blocks.  this is much
> >less expensive than malloc/free on each allocation (amortization).
>
> If you can wait till monday, and you are interested, I'll put it through
> the wringer on an 8-CPU Sun4500

oh i'm in no rush, i'm being a lazy ass and not even testing these things
i'm posting, so thanks for helping :)  also, you would definitely want to
try the www.hoard.org allocator on a (solaris) machine with that many
cpus.

-dean




Re: Observations on fragmentation in SMS pools

Posted by Ian Holsman <ia...@cnet.com>.
dean gaudet wrote:

>
>On Sun, 8 Jul 2001, Justin Erenkrantz wrote:
>
>>Also, I did try having the pools use malloc/free directly
>>(ALLOC_USE_MALLOC) and the performance was dreadful.  At least on
>>Solaris.  -- justin
>>
>
>yes, ALLOC_USE_MALLOC is dreadful -- that's not what i meant.
>
>what i mean is something like the below patch, which i haven't even tried
>to compile or test ;)  (the patch needs work to rescue some of the
>debugging code in new_block).
>
>it removes block_freelist and uses malloc/free for blocks.  this is much
>less expensive than malloc/free on each allocation (amortization).
>
>-dean
>

If you can wait till monday, and you are interested, I'll put it through 
the wringer on an 8-CPU Sun4500
..Ian

>
>
>Index: apr_pools.c
>===================================================================
>RCS file: /home/cvs/apr/memory/unix/apr_pools.c,v
>retrieving revision 1.100
>diff -u -r1.100 apr_pools.c
>--- apr_pools.c	2001/07/07 22:23:54	1.100
>+++ apr_pools.c	2001/07/08 18:17:36
>@@ -248,7 +248,6 @@
> /*
>  * Static cells for managing our internal synchronisation.
>  */
>-static union block_hdr *block_freelist = NULL;
>
> #if APR_HAS_THREADS
> static apr_lock_t *alloc_mutex;
>@@ -417,120 +416,20 @@
>
> static void free_blocks(union block_hdr *blok)
> {
>-#ifdef ALLOC_USE_MALLOC
>     union block_hdr *next;
>
>     for ( ; blok; blok = next) {
> 	next = blok->h.next;
> 	DO_FREE(blok);
>     }
>-#else /* ALLOC_USE_MALLOC */
>-
>-#ifdef ALLOC_STATS
>-    unsigned num_blocks;
>-#endif /* ALLOC_STATS */
>-
>-    /*
>-     * First, put new blocks at the head of the free list ---
>-     * we'll eventually bash the 'next' pointer of the last block
>-     * in the chain to point to the free blocks we already had.
>-     */
>-
>-    union block_hdr *old_free_list;
>-
>-    if (blok == NULL) {
>-	return;			/* Sanity check --- freeing empty pool? */
>-    }
>-
>-#if APR_HAS_THREADS
>-    if (alloc_mutex) {
>-        apr_lock_acquire(alloc_mutex);
>-    }
>-#endif
>-    old_free_list = block_freelist;
>-    block_freelist = blok;
>-
>-    /*
>-     * Next, adjust first_avail pointers of each block --- have to do it
>-     * sooner or later, and it simplifies the search in new_block to do it
>-     * now.
>-     */
>-
>-#ifdef ALLOC_STATS
>-    num_blocks = 1;
>-#endif /* ALLOC_STATS */
>-
>-    while (blok->h.next != NULL) {
>-
>-#ifdef ALLOC_STATS
>-	++num_blocks;
>-#endif /* ALLOC_STATS */
>-
>-	chk_on_blk_list(blok, old_free_list);
>-	blok->h.first_avail = (char *) (blok + 1);
>-	debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
>-#ifdef APR_POOL_DEBUG
>-	blok->h.owning_pool = FREE_POOL;
>-#endif /* APR_POOL_DEBUG */
>-	blok = blok->h.next;
>-    }
>-
>-    chk_on_blk_list(blok, old_free_list);
>-    blok->h.first_avail = (char *) (blok + 1);
>-    debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
>-#ifdef APR_POOL_DEBUG
>-    blok->h.owning_pool = FREE_POOL;
>-#endif /* APR_POOL_DEBUG */
>-
>-    /* Finally, reset next pointer to get the old free blocks back */
>-
>-    blok->h.next = old_free_list;
>-
>-#ifdef ALLOC_STATS
>-    if (num_blocks > max_blocks_in_one_free) {
>-	max_blocks_in_one_free = num_blocks;
>-    }
>-    ++num_free_blocks_calls;
>-    num_blocks_freed += num_blocks;
>-#endif /* ALLOC_STATS */
>-
>-#if APR_HAS_THREADS
>-    if (alloc_mutex) {
>-        apr_lock_release(alloc_mutex);
>-    }
>-#endif /* APR_HAS_THREADS */
>-#endif /* ALLOC_USE_MALLOC */
> }
>
> /*
>- * Get a new block, from our own free list if possible, from the system
>- * if necessary.  Must be called with alarms blocked.
>+ * get a new block from the system
>  */
> static union block_hdr *new_block(apr_size_t min_size, apr_abortfunc_t abortfunc)
> {
>-    union block_hdr **lastptr = &block_freelist;
>-    union block_hdr *blok = block_freelist;
>-
>-    /* First, see if we have anything of the required size
>-     * on the free list...
>-     */
>-
>-    while (blok != NULL) {
>-	if ((apr_ssize_t)min_size + BLOCK_MINFREE <= blok->h.endp - blok->h.first_avail) {
>-	    *lastptr = blok->h.next;
>-	    blok->h.next = NULL;
>-	    debug_verify_filled(blok->h.first_avail, blok->h.endp,
>-				"[new_block] Ouch!  Someone trounced a block "
>-				"on the free list!\n");
>-	    return blok;
>-	}
>-	else {
>-	    lastptr = &blok->h.next;
>-	    blok = blok->h.next;
>-	}
>-    }
>-
>-    /* Nope. */
>+    union block_hdr *blok;
>
>     min_size += BLOCK_MINFREE;
>     blok = malloc_block((min_size > BLOCK_MINALLOC)
>@@ -586,7 +485,6 @@
>     union block_hdr *blok;
>     apr_pool_t *new_pool;
>
>-
> #if APR_HAS_THREADS
>     if (alloc_mutex) {
>         apr_lock_acquire(alloc_mutex);
>@@ -960,7 +858,7 @@
>
> APR_DECLARE(apr_size_t) apr_pool_free_blocks_num_bytes(void)
> {
>-    return bytes_in_block_list(block_freelist);
>+    return 0;
> }
>
> /* the unix linker defines this symbol as the last byte + 1 of
>@@ -1255,13 +1153,7 @@
>     cur_len = strp - blok->h.first_avail;
>
>     /* must try another blok */
>-#if APR_HAS_THREADS
>-    apr_lock_acquire(alloc_mutex);
>-#endif
>     nblok = new_block(2 * cur_len, NULL);
>-#if APR_HAS_THREADS
>-    apr_lock_release(alloc_mutex);
>-#endif
>     memcpy(nblok->h.first_avail, blok->h.first_avail, cur_len);
>     ps->vbuff.curpos = nblok->h.first_avail + cur_len;
>     /* save a byte for the NUL terminator */
>@@ -1270,14 +1162,7 @@
>     /* did we allocate the current blok? if so free it up */
>     if (ps->got_a_new_block) {
> 	debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
>-#if APR_HAS_THREADS
>-        apr_lock_acquire(alloc_mutex);
>-#endif
>-	blok->h.next = block_freelist;
>-	block_freelist = blok;
>-#if APR_HAS_THREADS
>-        apr_lock_release(alloc_mutex);
>-#endif
>+	DO_FREE(blok);
>     }
>     ps->blok = nblok;
>     ps->got_a_new_block = 1;
>




Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.

On Sun, 8 Jul 2001, Justin Erenkrantz wrote:

> Also, I did try having the pools use malloc/free directly
> (ALLOC_USE_MALLOC) and the performance was dreadful.  At least on
> Solaris.  -- justin

yes, ALLOC_USE_MALLOC is dreadful -- that's not what i meant.

what i mean is something like the below patch, which i haven't even tried
to compile or test ;)  (the patch needs work to rescue some of the
debugging code in new_block).

it removes block_freelist and uses malloc/free for blocks.  this is much
less expensive than malloc/free on each allocation (amortization).

-dean

Index: apr_pools.c
===================================================================
RCS file: /home/cvs/apr/memory/unix/apr_pools.c,v
retrieving revision 1.100
diff -u -r1.100 apr_pools.c
--- apr_pools.c	2001/07/07 22:23:54	1.100
+++ apr_pools.c	2001/07/08 18:17:36
@@ -248,7 +248,6 @@
 /*
  * Static cells for managing our internal synchronisation.
  */
-static union block_hdr *block_freelist = NULL;

 #if APR_HAS_THREADS
 static apr_lock_t *alloc_mutex;
@@ -417,120 +416,20 @@

 static void free_blocks(union block_hdr *blok)
 {
-#ifdef ALLOC_USE_MALLOC
     union block_hdr *next;

     for ( ; blok; blok = next) {
 	next = blok->h.next;
 	DO_FREE(blok);
     }
-#else /* ALLOC_USE_MALLOC */
-
-#ifdef ALLOC_STATS
-    unsigned num_blocks;
-#endif /* ALLOC_STATS */
-
-    /*
-     * First, put new blocks at the head of the free list ---
-     * we'll eventually bash the 'next' pointer of the last block
-     * in the chain to point to the free blocks we already had.
-     */
-
-    union block_hdr *old_free_list;
-
-    if (blok == NULL) {
-	return;			/* Sanity check --- freeing empty pool? */
-    }
-
-#if APR_HAS_THREADS
-    if (alloc_mutex) {
-        apr_lock_acquire(alloc_mutex);
-    }
-#endif
-    old_free_list = block_freelist;
-    block_freelist = blok;
-
-    /*
-     * Next, adjust first_avail pointers of each block --- have to do it
-     * sooner or later, and it simplifies the search in new_block to do it
-     * now.
-     */
-
-#ifdef ALLOC_STATS
-    num_blocks = 1;
-#endif /* ALLOC_STATS */
-
-    while (blok->h.next != NULL) {
-
-#ifdef ALLOC_STATS
-	++num_blocks;
-#endif /* ALLOC_STATS */
-
-	chk_on_blk_list(blok, old_free_list);
-	blok->h.first_avail = (char *) (blok + 1);
-	debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
-#ifdef APR_POOL_DEBUG
-	blok->h.owning_pool = FREE_POOL;
-#endif /* APR_POOL_DEBUG */
-	blok = blok->h.next;
-    }
-
-    chk_on_blk_list(blok, old_free_list);
-    blok->h.first_avail = (char *) (blok + 1);
-    debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
-#ifdef APR_POOL_DEBUG
-    blok->h.owning_pool = FREE_POOL;
-#endif /* APR_POOL_DEBUG */
-
-    /* Finally, reset next pointer to get the old free blocks back */
-
-    blok->h.next = old_free_list;
-
-#ifdef ALLOC_STATS
-    if (num_blocks > max_blocks_in_one_free) {
-	max_blocks_in_one_free = num_blocks;
-    }
-    ++num_free_blocks_calls;
-    num_blocks_freed += num_blocks;
-#endif /* ALLOC_STATS */
-
-#if APR_HAS_THREADS
-    if (alloc_mutex) {
-        apr_lock_release(alloc_mutex);
-    }
-#endif /* APR_HAS_THREADS */
-#endif /* ALLOC_USE_MALLOC */
 }

 /*
- * Get a new block, from our own free list if possible, from the system
- * if necessary.  Must be called with alarms blocked.
+ * get a new block from the system
  */
 static union block_hdr *new_block(apr_size_t min_size, apr_abortfunc_t abortfunc)
 {
-    union block_hdr **lastptr = &block_freelist;
-    union block_hdr *blok = block_freelist;
-
-    /* First, see if we have anything of the required size
-     * on the free list...
-     */
-
-    while (blok != NULL) {
-	if ((apr_ssize_t)min_size + BLOCK_MINFREE <= blok->h.endp - blok->h.first_avail) {
-	    *lastptr = blok->h.next;
-	    blok->h.next = NULL;
-	    debug_verify_filled(blok->h.first_avail, blok->h.endp,
-				"[new_block] Ouch!  Someone trounced a block "
-				"on the free list!\n");
-	    return blok;
-	}
-	else {
-	    lastptr = &blok->h.next;
-	    blok = blok->h.next;
-	}
-    }
-
-    /* Nope. */
+    union block_hdr *blok;

     min_size += BLOCK_MINFREE;
     blok = malloc_block((min_size > BLOCK_MINALLOC)
@@ -586,7 +485,6 @@
     union block_hdr *blok;
     apr_pool_t *new_pool;

-
 #if APR_HAS_THREADS
     if (alloc_mutex) {
         apr_lock_acquire(alloc_mutex);
@@ -960,7 +858,7 @@

 APR_DECLARE(apr_size_t) apr_pool_free_blocks_num_bytes(void)
 {
-    return bytes_in_block_list(block_freelist);
+    return 0;
 }

 /* the unix linker defines this symbol as the last byte + 1 of
@@ -1255,13 +1153,7 @@
     cur_len = strp - blok->h.first_avail;

     /* must try another blok */
-#if APR_HAS_THREADS
-    apr_lock_acquire(alloc_mutex);
-#endif
     nblok = new_block(2 * cur_len, NULL);
-#if APR_HAS_THREADS
-    apr_lock_release(alloc_mutex);
-#endif
     memcpy(nblok->h.first_avail, blok->h.first_avail, cur_len);
     ps->vbuff.curpos = nblok->h.first_avail + cur_len;
     /* save a byte for the NUL terminator */
@@ -1270,14 +1162,7 @@
     /* did we allocate the current blok? if so free it up */
     if (ps->got_a_new_block) {
 	debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
-#if APR_HAS_THREADS
-        apr_lock_acquire(alloc_mutex);
-#endif
-	blok->h.next = block_freelist;
-	block_freelist = blok;
-#if APR_HAS_THREADS
-        apr_lock_release(alloc_mutex);
-#endif
+	DO_FREE(blok);
     }
     ps->blok = nblok;
     ps->got_a_new_block = 1;


Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 10:41:12AM -0700, dean gaudet wrote:
> also -- it would be worthwhile for someone to try a version of pools which
> keeps no freelist, and uses malloc()/free() on each block.  i'm guessing
> this is faster in a multithreaded server than us doing a per-process
> global list because malloc() itself already has to deal with
> multithreading, and has all sorts of other optimisations (i.e. it's not
> just a first-fit allocator).  and as an added bonus it'd be worth trying
> www.hoard.org's malloc replacement.

Switch the apr_sms_pools.c to call apr_sms_tracking_create instead of
apr_sms_trivial_create.  (You need the tracking because you will have to
free the memory, so you have to at least keep track of what you have
allocated.)  

Also, I did try having the pools use malloc/free directly 
(ALLOC_USE_MALLOC) and the performance was dreadful.  At least on 
Solaris.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
On Sun, Jul 08, 2001 at 10:56:59AM -0700, Jon Travis wrote:

> On Sun, Jul 08, 2001 at 10:41:12AM -0700, dean gaudet wrote:

> > On Sun, 8 Jul 2001, Jon Travis wrote:
> > 
> > > There is still all this tomfoolery with locking, though, which I think
> > > would be nice to fix with different sized buckets in the freelist.
> > > Stuff that the malloc in glibc does.  I cringe at the thought of how
> > > much overhead due to abstraction this whole project is causing.
> > 
> > haha, the abstraction problems started AGES ago :)  i think the first real
> > sign of the doom ahead was the argument of changing "ap_pool_t" to
> > "ap_context_t" because there was this arbitrary hash table attached to it.
> > 
> > i'd suggest that before any decisions are made regarding global free list
> > or per thread free list someone should study what the non-SMS, pools-only
> > server behaves like.  the problems you're looking at now in the SMS server
> > are a result of SMS's "feature" of allocating all the way up through the
> > ancestry adding bytes each step of the way.

this _can_ be mitigated by doing your own memory-management on top
of a more normal one.

which is what sander's 'smart memory allocator' does.

sma is targeted at workloads that do massive numbers of reallocs and
mallocs and then free large numbers of them in one blast, repeatedly.

luke

Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Jon Travis wrote:

> I talked to rbb about this not too long ago, and he told me that you
> did a lot of performance work, and that is why pools use their own
> allocation scheme (8k blocks, freelists, etc.) instead of using a
> total malloc() oriented scheme.

yes and no.  i can't say that i ever compared an implementation of apache
using malloc/free to one just using palloc ... it would have been insane
to try to go about adding free() calls everywhere they would have been
required to do that.  (and you can sit back and do some math and figure
out that it would be slower by virtue of having to track many more
pointers, and double the function call overhead of each allocation.)

so given the restriction that i was implementing palloc -- an allocator
with only a "group free" operation of all allocated elements, then what i
did was to tune the apache-1.3 (maybe it was 1.2 at the time) code.  the
palloc stuff goes back to before apache 1.0, and i'm pretty sure it's
rst's code (but you should ask the real old timers, not me, i came on
board around 1.2).  there were definitely a few bugs in both the alloc.c
code, and in the way apache used it.  i don't remember the details (but
i'm responsible for the mess o' ifdefs of debugging code).
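a minimal sketch of the "group free" contract described above (the block
chaining here is invented for illustration and is far simpler than the
real alloc.c): allocations are never freed one at a time; destroying the
region releases everything at once.

```c
#include <stdlib.h>

/* Invented illustration of a group-free allocator, not the real
 * alloc.c: allocations are only ever freed all together. */
struct region_blk { struct region_blk *next; };

struct region { struct region_blk *head; };

static void *region_alloc(struct region *r, size_t size)
{
    struct region_blk *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = r->head;          /* chain so destroy can find it */
    r->head = b;
    return b + 1;               /* user memory follows the header */
}

/* The only "free" operation: release every allocation at once.
 * Returns the number of blocks freed. */
static size_t region_destroy(struct region *r)
{
    size_t n = 0;
    while (r->head) {
        struct region_blk *next = r->head->next;
        free(r->head);
        r->head = next;
        n++;
    }
    return n;
}
```

with this contract there is no per-allocation free() bookkeeping, which is the function-call and pointer-tracking overhead dean's math refers to.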

another thing you need to consider is that this was a
process-per-connection server -- and the global free list made a lot of
sense.  but i don't remember actually timing it against using malloc/free
directly for the blocks.  i think that would have sucked on many of the
unixes at the time -- some of which it was standard practice to use
gnumalloc instead of libc malloc to get a perf improvement.  it's
important to remember that malloc/free haven't stood still, and the
technology in there has improved.

> You are right that libc malloc()
> implementations are very fast, can deal with threading issues
> more elegantly than we could in APR, and have other niceties
> (different sized buckets, etc.).  There is currently a ton of locking
> in the pool implementation to deal with the threads (and we are
> actually missing a lock in one needed location, I believe).

you know, i'd be tempted to say get rid of the free list even in the
process-per-connection servers.  it simplifies the code, which is way
better for maintenance.  and it's probably a toss up today as to which is
faster.  (it might be slower on ancient unixes, but, um, i fail to see the
problem ;)

-dean


Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Sun, Jul 08, 2001 at 10:41:12AM -0700, dean gaudet wrote:
> On Sun, 8 Jul 2001, Jon Travis wrote:
> 
> > There is still all this tomfoolery with locking, though, which I think
> > would be nice to fix with different sized buckets in the freelist.
> > Stuff that the malloc in glibc does.  I cringe at the thought of how
> > much overhead due to abstraction this whole project is causing.
> 
> haha, the abstraction problems started AGES ago :)  i think the first real
> sign of the doom ahead was the argument of changing "ap_pool_t" to
> "ap_context_t" because there was this arbitrary hash table attached to it.
> 
> i'd suggest that before any decisions are made regarding global free list
> or per thread free list someone should study what the non-SMS, pools-only
> server behaves like.  the problems you're looking at now in the SMS server
> are a result of SMS's "feature" of allocating all the way up through the
> ancestry adding bytes each step of the way.
> 
> also -- it would be worthwhile for someone to try a version of pools which
> keeps no freelist, and uses malloc()/free() on each block.  i'm guessing
> this is faster in a multithreaded server than us doing a per-process
> global list because malloc() itself already has to deal with
> multithreading, and has all sorts of other optimisations (i.e. it's not
> just a first-fit allocator).  and as an added bonus it'd be worth trying
> www.hoard.org's malloc replacement.
> 
> this would make our code simpler on a multithreaded server, and push the
> details into libc which presumably the vendor has optimised to go with
> their threading library.

I talked to rbb about this not too long ago, and he told me that you
did a lot of performance work, and that is why pools use their own
allocation scheme (8k blocks, freelists, etc.) instead of using a
total malloc() oriented scheme.  You are right that libc malloc()
implementations are very fast, can deal with threading issues
more elegantly than we could in APR, and have other niceties 
(different sized buckets, etc.).  There is currently a ton of locking
in the pool implementation to deal with the threads (and we are
actually missing a lock in one needed location, I believe).  

I would love to see some benchmarks on a tuned SMS (for httpd) vs. a 
good, simple, libc malloc().  Especially since other programs using 
APR will not likely be writing their own SMS backend to tweak their
app.

-- Jon


Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 10:34:22AM -0700, Jon Travis wrote:
> There is still all this tomfoolery with locking, though, which I think
> would be nice to fix with different sized buckets in the freelist.  
> Stuff that the malloc in glibc does.  I cringe at the thought of how
> much overhead due to abstraction this whole project is causing.

Well, most of the tomfoolery goes away when the free-list is per thread.
No need for locking then.

I think that this gets us better performance in a threaded environment
than the current apr_pools.c - which has a per-process global list.
The key is the allocation algorithm needs to take that into account
(at this point, the trivial SMS doesn't).  In order to truly support a threaded 
environment, the free-lists need to become per-thread.  With this 
abstraction, that *is* much easier.  We're not there yet, but we're
laying the groundwork to do so.  -- justin
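a sketch of the per-thread free list being described (illustrative only;
the block size and structure are invented here): each thread keeps its
own list of recycled blocks, so the common path needs no mutex at all.

```c
#include <stdlib.h>

/* Invented sketch of a per-thread block free list; not APR code. */
#define BLOCK_SIZE 8192

struct blk { struct blk *next; };

/* One free list per thread (C11 _Thread_local), hence no locking. */
static _Thread_local struct blk *tls_freelist;

static void *blk_get(void)
{
    struct blk *b = tls_freelist;
    if (b) {                        /* lock-free fast path: reuse */
        tls_freelist = b->next;
        return b;
    }
    return malloc(BLOCK_SIZE);      /* slow path: fall back to libc */
}

static void blk_put(void *p)
{
    struct blk *b = p;
    b->next = tls_freelist;         /* return block to this thread only */
    tls_freelist = b;
}
```

the tradeoff is that memory freed by one thread is only reusable by that same thread, which is why the allocation algorithm has to be designed around it.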


RE: Observations on fragmentation in SMS pools

Posted by Sander Striker <st...@apache.org>.
> Stuff that the malloc in glibc does.  I cringe at the thought of how
> much overhead due to abstraction this whole project is causing.

The other thing this abstraction buys us is the ability to pass in
different memory to standard APR operations. Hopefully we can get
shared memory to work in this fashion. I know for sure that we can
do mlock()'d memory (which might be useful for security type calculation
stuff you don't want to go to swap space).
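a minimal sketch of that mlock() idea (error handling reduced to
returning NULL; a real SMS backend would propagate apr_status_t, and the
helper names here are invented): pin the buffer so it can't be written
to swap, and wipe it before the pages become reusable.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Invented helpers sketching a "secure" allocation, not an APR API. */
static void *secure_alloc(size_t len)
{
    void *p = malloc(len);
    if (p && mlock(p, len) != 0) {  /* pin pages: no swap-out */
        free(p);
        return NULL;                /* may fail under RLIMIT_MEMLOCK */
    }
    return p;
}

static void secure_free(void *p, size_t len)
{
    if (p == NULL)
        return;
    memset(p, 0, len);              /* wipe before pages are reusable */
    munlock(p, len);
    free(p);
}
```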

Sander


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, Jon Travis wrote:

> There is still all this tomfoolery with locking, though, which I think
> would be nice to fix with different sized buckets in the freelist.
> Stuff that the malloc in glibc does.  I cringe at the thought of how
> much overhead due to abstraction this whole project is causing.

haha, the abstraction problems started AGES ago :)  i think the first real
sign of the doom ahead was the argument of changing "ap_pool_t" to
"ap_context_t" because there was this arbitrary hash table attached to it.

i'd suggest that before any decisions are made regarding global free list
or per thread free list someone should study what the non-SMS, pools-only
server behaves like.  the problems you're looking at now in the SMS server
are a result of SMS's "feature" of allocating all the way up through the
ancestry adding bytes each step of the way.

also -- it would be worthwhile for someone to try a version of pools which
keeps no freelist, and uses malloc()/free() on each block.  i'm guessing
this is faster in a multithreaded server than us doing a per-process
global list because malloc() itself already has to deal with
multithreading, and has all sorts of other optimisations (i.e. it's not
just a first-fit allocator).  and as an added bonus it'd be worth trying
www.hoard.org's malloc replacement.

this would make our code simpler on a multithreaded server, and push the
details into libc which presumably the vendor has optimised to go with
their threading library.

-dean


Re: Observations on fragmentation in SMS pools

Posted by Jon Travis <jt...@covalent.net>.
On Sun, Jul 08, 2001 at 10:25:17AM -0700, dean gaudet wrote:
> On Sun, 8 Jul 2001, dean gaudet wrote:
> >
> > On Sun, 8 Jul 2001, Justin Erenkrantz wrote:
> >
> > > Yup.  I've brought this up to Sander and David before, but this is how
> > > pools
> >
> > woah.  no way really?
> >
> > that's not at all how it was in 1.3 or in early 2.0 ...
> >
> > in 2.0 as of uh a year ago say, there was one free list per process,
> > and locks were used to access it.
> 
> i checked -- top of tree pools still behaves almost like 1.3.  so i'm not
> sure why you're claiming the pools would go up through the ancestors for
> an allocation.
> 
> apr_palloc first tries the simple pointer arithmetic fast path, and if
> that fails it calls new_block() which accesses the process-global
> block_freelist (from inside an alloc_mutex critical section).

There is still all this tomfoolery with locking, though, which I think
would be nice to fix with different sized buckets in the freelist.  
Stuff that the malloc in glibc does.  I cringe at the thought of how
much overhead due to abstraction this whole project is causing.

-- Jon


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
On Sun, 8 Jul 2001, dean gaudet wrote:

>
>
> On Sun, 8 Jul 2001, Justin Erenkrantz wrote:
>
> > Yup.  I've brought this up to Sander and David before, but this is how
> > pools
>
> woah.  no way really?
>
> that's not at all how it was in 1.3 or in early 2.0 ...
>
> in 2.0 as of uh a year ago say, there was one free list per process,
> and locks were used to access it.

i checked -- top of tree pools still behaves almost like 1.3.  so i'm not
sure why you're claiming the pools would go up through the ancestors for
an allocation.

apr_palloc first tries the simple pointer arithmetic fast path, and if
that fails it calls new_block() which accesses the process-global
block_freelist (from inside an alloc_mutex critical section).
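the fast path dean refers to has roughly this shape (a sketch, not the
actual APR source; field names loosely follow the 1.3-style allocator
and the alignment quantum is assumed): round the request up for
alignment, and if it fits in the current block just bump first_avail.

```c
#include <stddef.h>

#define CLICK_SZ 8   /* assumed alignment quantum for this sketch */

struct block {
    char *first_avail;   /* next free byte in the block */
    char *endp;          /* one past the end of the block */
};

/* Pointer-arithmetic fast path; NULL means the caller must fall back
 * to new_block() under the alloc_mutex. */
static void *palloc_fast(struct block *b, size_t size)
{
    size = (size + CLICK_SZ - 1) & ~(size_t)(CLICK_SZ - 1);
    if ((size_t)(b->endp - b->first_avail) >= size) {
        void *p = b->first_avail;
        b->first_avail += size;     /* just bump the pointer */
        return p;
    }
    return NULL;
}
```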

-dean


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.

On Sun, 8 Jul 2001, Justin Erenkrantz wrote:

> Yup.  I've brought this up to Sander and David before, but this is how
> pools

woah.  no way really?

that's not at all how it was in 1.3 or in early 2.0 ...

in 2.0 as of uh a year ago say, there was one free list per process,
and locks were used to access it.

no matter where in the tree of pools you tried an allocation, if it
didn't fit into the current block, the allocator would lock and go to
the global free block list and pick up the first fit block.  none of this
going up through a chain of pools or anything, that's insane.

> It's a tradeoff (and is purposeful for lots of small allocations), but
> until someone can write a better memory allocation algorithm, this is
> what we got.

1.3's memory allocation algorithm kicks ass, to put it mildly.  i've not read
the apr-dev archives as to the Why and the Goals of SMS ...

-dean


Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 12:47:32AM -0700, Brian Pane wrote:
> I added some instrumentation to apr_sms_trivial_malloc
> to study in more detail where its bottlenecks were.
> 
> As a result, I found a few interesting phenomena:
> 
> 1. The good news is that allocations of small amounts
>    of memory are very efficient.  They almost always
>    take the fastest path through the code, in which
>    some available space is reserved from the
>    "sms->used_sentinel.prev" block with a handful of
>    pointer arithmetic operations.
> 
> 2. The bad news is that allocations for larger blocks
>    (in the >=8KB range) typically require a call to the
>    parent SMS to get data.  On my test machine, I'm seeing
>    elapsed times in the 30 microsecond range when this
>    happens, compared to less than 1 microsecond for small
>    allocations that don't require more memory from the
>    parent SMS.  And when an allocation falls through to
>    the parent, it often seems to fall all the way through
>    to the root SMS (I suspect that 30us includes a malloc).
>    The problem seems to be particularly bad for things that
>    create subrequests, like mod_include.
> 
> 3. The worse news is that there seems to be a lot of
>    fragmentation.  For example, I saw this pattern
>    during a server-parsed request:
>      - the application code requests 12296 bytes
>        from a pool
>      - not enough memory is available in the SMS, so it
>        requests 16400 bytes from its parent SMS.
>      - the parent SMS doesn't have enough free space
>        either, so it requests 20504 bytes from the
>        grandparent SMS.
>      - the grandparent SMS doesn't have enough space
>        either, but it has to iterate through 15 blocks
>        in its free list to figure that out.  Each of these
>        blocks has between 8176 and 12272 bytes available.
>      - the grandparent calls through to the great-grandparent
>        to get 24608 bytes.  The great-grandparent doesn't
>        have a block with that much free space, but it
>        iterates through 9 blocks in its free list in
>        search of one; all of these blocks had 16376 bytes
>        free.
>      - the great-grandparent thus requests 28712 bytes from
>        the great-great grandparent.  The great-great-grandparent
>        doesn't have any blocks in its free list, so it calls
>        through to its parent, which at last is an SMS that
>        does a real malloc.
>    This type of pattern may explain the reported higher memory
>    use of the SMS-based httpd compared with the original pools;
>    there's a lot of memory in those free lists that can't be
>    used in this example.
> 
> For an SMS that's going to be a parent of other SMSs, we'll
> need something with more sophisticated policies for reassigning
> freed space than the current trivial-SMS.

Yup.  I've brought this up to Sander and David before, but this is how 
pools work EXCEPT that SMS gets a little more pathological when adding 
memory, because each parent tacks on its own overhead (as you 
described).  The optimization here is for the <4KB allocation.  I'm not 
sure that is what we need to be optimizing for.  More analysis is 
probably needed to see what our typical usage pattern is.  This is where 
Ian's pool replay code is *very* helpful (although I don't like adding 
more #ifdefs to the pool code - yuck - I'd like to see an SMS that just 
prints that info and passes the allocation up to the parent to do what 
it needs to).

Before turning SMS on in the mainline code by default, I think we need 
to come up with a more robust allocation algorithm.  And, the whole 
point of SMS is that we can play with memory allocation algorithms 
until we are blue in the face and nothing else changes.  You can't do 
that with the current pools code.  We need more people playing with it
and suggesting new allocation algorithms.  =)  The trivial SMS tries to
keep the memory allocation strategy as close as possible to what we 
currently have.

Personally, I'd set MIN_FREE to 0 and bite the small allocations (now 
they'd be slow, but really, how many allocations are under 4KB - data 
structures, perhaps?).  I'd also like to see it smarter about reclaiming
free space, but that gets really tricky.

It's a tradeoff (and is purposeful for lots of small allocations), but 
until someone can write a better memory allocation algorithm, this is 
what we got.  Trivial SMS has its Achilles heel here - this is just an
aggravation of how the original pool code works (the pool code would add 
one 4KB block, while each level of the SMS adds its own 4KB).
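
For what it's worth, the sizes in Brian's trace grow by a constant 4104
bytes per level (12296, 16400, 20504, 24608, 28712), so the forwarded
request is linear in the depth of the SMS chain.  A quick sketch, with
the 4104 constant inferred from the trace deltas rather than read out
of the SMS code:

```c
#include <stddef.h>

/* Each SMS layer in Brian's trace forwards a request 4104 bytes larger
 * than the one it received (12296 -> 16400 -> 20504 -> 24608 -> 28712).
 * The LAYER_OVERHEAD constant is inferred from those deltas, not taken
 * from the SMS source. */
#define LAYER_OVERHEAD 4104

/* Size of the request seen `depth` levels above the application. */
unsigned long forwarded_size(unsigned long app_request, int depth)
{
    return app_request + (unsigned long)depth * LAYER_OVERHEAD;
}
```

So a five-deep chain turns a 12296-byte ask into a 28712-byte one, and
every intermediate level keeps its oversized leftover on its own free
list - which is exactly the memory that "can't be used" in Brian's
example.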

Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never 
took algorithms classes) has many memory allocation algorithms.  I'd 
bet we could find one that would work better.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by Ian Holsman <ia...@cnet.com>.
On 10 Jul 2001 16:57:25 +0200, Luke Kenneth Casson Leighton wrote:
> On Sun, Jul 08, 2001 at 10:19:33AM -0700, Justin Erenkrantz wrote:
> > On Sun, Jul 08, 2001 at 10:14:16AM -0700, dean gaudet wrote:
> > > an ideal situation for free-lists (blocks of freed, but not free()d
> > > memory) is one per cpu.
> > > 
> > > a less ideal situation is one per thread.
> > > 
> > > an even less ideal situation is one per process (which requires locking).
> > > 
> > > it's insane to have one per pool :)
> > 
> > I think we're shooting for #2 (less ideal).  Unless someone can come up
> > with a way to dynamically tell how many CPUs we are running on and bind
> > a free-list to a specific CPU.
> > 
> > We're currently doing #3 (even less ideal) in apr_pools.c.  
> > 
> > And, yeah, the current trivial SMS is doing #4.  =)  But, don't expect it 
> > to stay like that.  And, if we implement the apr_sms_child_malloc, it
> > gets to somewhere between #2 and #3.  -- justin
> 
> #5 the approach taken by tdb.  have a hash-chain.
> 
This is the approach the shared-mem hash I wrote is using...
If you can't get the tdb folks to see the BSD license light, you
If you can't get the tdb folks to see the BSD licence light, you
can always use that as a starting point.

> then you get the best of all worlds.  simultaneous access
> only requires that you lock *one* hash chain, which allows
> a theoretical limit of number-of-hash-bucket simultaneous
> accesses.
> 
> luke
> 
> p.s. i'm hoping that people will not be put off by tdb's GPL
> license because i think that there are some really good techniques
> used in there that could be applied to APR code, too.
--
Ian Holsman          IanH@cnet.com
Performance Measurement & Analysis
CNET Networks   -   (415) 364-8608


Re: Observations on fragmentation in SMS pools

Posted by Luke Kenneth Casson Leighton <lk...@samba-tng.org>.
On Sun, Jul 08, 2001 at 10:19:33AM -0700, Justin Erenkrantz wrote:
> On Sun, Jul 08, 2001 at 10:14:16AM -0700, dean gaudet wrote:
> > an ideal situation for free-lists (blocks of freed, but not free()d
> > memory) is one per cpu.
> > 
> > a less ideal situation is one per thread.
> > 
> > an even less ideal situation is one per process (which requires locking).
> > 
> > it's insane to have one per pool :)
> 
> I think we're shooting for #2 (less ideal).  Unless someone can come up
> with a way to dynamically tell how many CPUs we are running on and bind
> a free-list to a specific CPU.
> 
> We're currently doing #3 (even less ideal) in apr_pools.c.  
> 
> And, yeah, the current trivial SMS is doing #4.  =)  But, don't expect it 
> to stay like that.  And, if we implement the apr_sms_child_malloc, it
> gets to somewhere between #2 and #3.  -- justin

#5 the approach taken by tdb.  have a hash-chain.

then you get the best of all worlds.  simultaneous access
only requires that you lock *one* hash chain, which allows
a theoretical limit of number-of-hash-bucket simultaneous
accesses.
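
the hash-chain idea, sketched as a toy in C (bucketing free blocks by a
crude size class; none of these names come from tdb or APR):

```c
#include <pthread.h>
#include <stddef.h>

/* Sketch of the tdb-style idea above: bucket the free blocks (here by
 * a crude size class) and give each chain its own mutex, so two
 * threads touching different chains never contend.  All names are
 * illustrative; this is not tdb's or APR's code. */

#define NCHAINS 16

typedef struct free_block {
    struct free_block *next;
    size_t size;
} free_block;

static struct chain {
    pthread_mutex_t lock;
    free_block *head;
} chains[NCHAINS];

void chains_init(void)
{
    for (int i = 0; i < NCHAINS; i++)
        pthread_mutex_init(&chains[i].lock, NULL);
}

static unsigned chain_for(size_t size)
{
    unsigned c = 0;
    while (size > 4096 && c < NCHAINS - 1) {  /* crude size-class hash */
        size >>= 1;
        c++;
    }
    return c;
}

void chain_put(free_block *blk)
{
    struct chain *ch = &chains[chain_for(blk->size)];
    pthread_mutex_lock(&ch->lock);            /* only this chain locks */
    blk->next = ch->head;
    ch->head = blk;
    pthread_mutex_unlock(&ch->lock);
}

free_block *chain_get(size_t size)
{
    struct chain *ch = &chains[chain_for(size)];
    free_block *blk, **prev = &ch->head;

    pthread_mutex_lock(&ch->lock);
    for (blk = ch->head; blk; prev = &blk->next, blk = blk->next) {
        if (blk->size >= size) {              /* first fit in the chain */
            *prev = blk->next;
            break;
        }
    }
    pthread_mutex_unlock(&ch->lock);
    return blk;                               /* NULL if no fit here */
}
```

threads only contend when they hash to the same chain, so with NCHAINS
chains you get up to NCHAINS simultaneous accesses - luke's point above.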

luke

p.s. i'm hoping that people will not be put off by tdb's GPL
license because i think that there are some really good techniques
used in there that could be applied to APR code, too.

Re: Observations on fragmentation in SMS pools

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sun, Jul 08, 2001 at 10:14:16AM -0700, dean gaudet wrote:
> an ideal situation for free-lists (blocks of freed, but not free()d
> memory) is one per cpu.
> 
> a less ideal situation is one per thread.
> 
> an even less ideal situation is one per process (which requires locking).
> 
> it's insane to have one per pool :)

I think we're shooting for #2 (less ideal).  Unless someone can come up
with a way to dynamically tell how many CPUs we are running on and bind
a free-list to a specific CPU.

We're currently doing #3 (even less ideal) in apr_pools.c.  

And, yeah, the current trivial SMS is doing #4.  =)  But, don't expect it 
to stay like that.  And, if we implement the apr_sms_child_malloc, it
gets to somewhere between #2 and #3.  -- justin


Re: Observations on fragmentation in SMS pools

Posted by dean gaudet <de...@arctic.org>.
an ideal situation for free-lists (blocks of freed, but not free()d
memory) is one per cpu.

a less ideal situation is one per thread.

an even less ideal situation is one per process (which requires locking).

it's insane to have one per pool :)
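
as a sketch of option #2, a per-thread free list can hang off
thread-specific data so the hot path never locks (illustrative names,
not APR's; a real version would also want a key destructor that hands
a dying thread's blocks back somewhere):

```c
#include <pthread.h>
#include <stdlib.h>

/* One free list per thread, kept in thread-specific data: get/put on
 * the hot path touch only this thread's list, so no mutex is needed. */

typedef struct free_block {
    struct free_block *next;
} free_block;

static pthread_key_t freelist_key;
static pthread_once_t freelist_once = PTHREAD_ONCE_INIT;

static void freelist_make_key(void)
{
    /* NULL destructor: blocks simply leak on thread exit in this toy */
    pthread_key_create(&freelist_key, NULL);
}

#define BLOCK_SIZE 8192

void *thread_block_get(void)
{
    pthread_once(&freelist_once, freelist_make_key);
    free_block *head = pthread_getspecific(freelist_key);

    if (head != NULL) {                 /* hot path: no lock, no malloc */
        pthread_setspecific(freelist_key, head->next);
        return head;
    }
    return malloc(BLOCK_SIZE);          /* cold path: really allocate */
}

void thread_block_put(void *mem)
{
    pthread_once(&freelist_once, freelist_make_key);
    free_block *blk = mem;

    blk->next = pthread_getspecific(freelist_key);
    pthread_setspecific(freelist_key, blk);
}
```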

-dean