Posted to dev@httpd.apache.org by Cliff Woolley <cl...@yahoo.com> on 2001/04/21 19:39:48 UTC

memory allocation (was Re: mod_include performance numbers)

On Sat, 21 Apr 2001, Greg Ames wrote:

> are you thinking about an atomic push/pop block allocator?  I'll be
> happy to help out if so, especially with the machine instruction level
> stuff.

Yes, I am, and I definitely *will* need help from various people getting
the appropriate machine language magic for their platforms working.  I've
already done the generic fallback locking implementation of the stack;
that was simple.  And I have what you and Jeff gave me for S390.  But...

> yeah, but that stuff can go away.  compare-and-exchange (or compare &
> swap, or load & reserve, or...) is our friend in multithreaded systems,
> especially on multiprocessors.  The piece I haven't figured out is how
> to set up CPU architecture dependent directories, or macros, or
> whatever, in APR.  <sigh>

THAT's the problem.  No other piece of APR is *architecture* dependent.
There's an "arch" include directory, but it's really a misnamed "os"
directory.  I've got a scheme implemented that gives us an APR_ARCH_IS_foo
macro, but it's a hack.  It disobeys the typical APR rule of "all macros
are always defined and have a value of 0 or 1" and is just defined or not.
I've also been afraid that some of the possible ways to implement these
machine-language tricks will also be *compiler* dependent, not just
architecture dependent.  If so, then that makes this that much harder.
For example, I've taken your jstack.h and pulled out the __cds() "calls"
and wrapped a macro around them so that different architectures can insert
their equivalent instruction.  But are "__cds()" and "cds_t" available on
all S390 platforms?  I'm guessing no.  Ugh.  If anybody has a clean way to
do this, I'm all ears.
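
A minimal sketch of the two ideas above: a feature macro that is always defined with a value of 0 or 1, and one wrapper that hides the platform's compare-and-swap. Every name here (DEMO_HAS_ATOMIC_CAS, demo_cas_ptr) is hypothetical and not part of APR. The atomic branch assumes the GCC/Clang `__atomic_compare_exchange_n` builtin, which postdates this thread; the other branch is a generic locking fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro: always defined, value 0 or 1. */
#if defined(__GNUC__) && defined(__ATOMIC_SEQ_CST)
#define DEMO_HAS_ATOMIC_CAS 1
#else
#define DEMO_HAS_ATOMIC_CAS 0
#endif

#if DEMO_HAS_ATOMIC_CAS

/* Returns 1 if *loc held `expected` and was replaced by `desired`. */
static int demo_cas_ptr(void **loc, void *expected, void *desired)
{
    assert(loc != NULL);
    return __atomic_compare_exchange_n(loc, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

#else

#include <pthread.h>

/* Generic fallback: one global lock makes the compare and the store atomic. */
static pthread_mutex_t demo_cas_lock = PTHREAD_MUTEX_INITIALIZER;

static int demo_cas_ptr(void **loc, void *expected, void *desired)
{
    int swapped = 0;
    assert(loc != NULL);
    pthread_mutex_lock(&demo_cas_lock);
    if (*loc == expected) {
        *loc = desired;
        swapped = 1;
    }
    pthread_mutex_unlock(&demo_cas_lock);
    return swapped;
}

#endif
```

Callers test the macro with `#if DEMO_HAS_ATOMIC_CAS` rather than `#ifdef`, so a misspelled or missing macro is more likely to be noticed than silently treated as "off".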

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: memory allocation (was Re: mod_include performance numbers)

Posted by Greg Ames <gr...@remulak.net>.
Cliff Woolley wrote:
> 
> On Sat, 21 Apr 2001, Greg Ames wrote:
> 
> > are you thinking about an atomic push/pop block allocator?  I'll be
> > happy to help out if so, especially with the machine instruction level
> > stuff.
> 
> Yes, I am, and I definitely *will* need help from various people getting
> the appropriate machine language magic for their platforms working.  I've
> already done the generic fallback locking implementation of the stack;
> that was simple.  

Cool!  That's got to be the first piece.

> 
> > yeah, but that stuff can go away.  compare-and-exchange (or compare &
> > swap, or load & reserve, or...) is our friend in multithreaded systems,
> > especially on multiprocessors.  The piece I haven't figured out is how
> > to set up CPU architecture dependent directories, or macros, or
> > whatever, in APR.  <sigh>
> 
> THAT's the problem.  No other piece of APR is *architecture* dependent.
> There's an "arch" include directory, but it's really a misnamed "os"
> directory.  I've got a scheme implemented that gives us an APR_ARCH_IS_foo
> macro, but it's a hack.  It disobeys the typical APR rule of "all macros
> are always defined and have a value of 0 or 1" and is just defined or not.
> I've also been afraid that some of the possible ways to implement these
> machine-language tricks will also be *compiler* dependent, not just
> architecture dependent.  If so, then that makes this that much harder.

Yessir...on platforms that have multiple compilers, there could be
multiple ways of coding the same machine instruction, or no support at
all.  This sounds like something autoconf tests could figure out.

On the other hand, if gcc is running on i486 or above, it shouldn't
matter if it's Linux or FreeBSD or whatever.  So if we figure out one of
those platforms, we get a lot of bang for the buck.
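
As a rough illustration of that point, here is a sketch of a compare-and-swap selected purely by compiler and CPU, with no operating system test anywhere. The helper name demo_cas_int is hypothetical. The first branch assumes GCC-style extended inline assembly on x86 and uses the cmpxchg instruction, which was introduced with the i486; the second branch assumes the GCC/Clang `__atomic` builtins, which did not exist when this thread was written.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if *loc held `expected` and now holds
 * `desired`, 0 otherwise.  Chosen by compiler and CPU, not by OS. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

static int demo_cas_int(volatile int *loc, int expected, int desired)
{
    int prev;
    assert(loc != 0);
    /* cmpxchg compares EAX with *loc.  On a match it stores `desired`
     * into *loc; otherwise it loads the current *loc into EAX. */
    __asm__ __volatile__("lock cmpxchgl %2, %1"
                         : "=a"(prev), "+m"(*loc)
                         : "r"(desired), "0"(expected)
                         : "memory", "cc");
    return prev == expected;
}

#else

static int demo_cas_int(volatile int *loc, int expected, int desired)
{
    assert(loc != 0);
    return __atomic_compare_exchange_n(loc, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

#endif
```

An autoconf check along these lines would try to compile and run a small program that calls such a helper, and record the result in a 0-or-1 macro.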

> For example, I've taken your jstack.h 

jstack == Jeff's stack.  I can only take credit for teaching him some
things about atomic updates on multiprocessors.

>                                         and pulled out the __cds() "calls"
> and wrapped a macro around them so that different architectures can insert
> their equivalent instruction.  But are "__cds()" and "cds_t" available on
> all S390 platforms?  I'm guessing no.  Ugh.  

I'm guessing you're right; I doubt gcc supports it (Linux390).  But
autoconf is our friend.  (sheesh...did I really say that?)

> If anybody has a clean way to do this, I'm all ears.

Me too.  More ideas are greatly appreciated.

Greg

Re: memory allocation (was Re: mod_include performance numbers)

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sat, 21 Apr 2001, dean gaudet wrote:

> can you give a short description of this allocator?

FirstBill wrote the beginnings of it.  It's basically a drop-in
replacement for malloc/calloc/free (really a wrapper around them) that,
when initialized, pre-allocates blocks of various sizes (in FirstBill's,
IIRC, it does as many blocks of a given power-of-two size as will fit in
8KB).
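
The sizing rule described above can be sketched as follows. This is only an illustration of the arithmetic, not FirstBill's code: the helper names and the 16-byte minimum block size are assumptions, and the 8KB figure is the one recalled in the message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CHUNK_BYTES 8192u  /* the 8KB figure from the message */
#define DEMO_MIN_BLOCK   16u    /* assumed smallest class; not from the thread */

/* Round a request up to the next power-of-two block size. */
static size_t demo_class_size(size_t request)
{
    size_t size = DEMO_MIN_BLOCK;
    assert(request <= DEMO_CHUNK_BYTES);
    while (size < request)
        size <<= 1;
    return size;
}

/* Number of blocks of one size class that fit in a single chunk. */
static size_t demo_blocks_per_chunk(size_t class_size)
{
    /* The class size must be a non-zero power of two. */
    assert(class_size != 0 && (class_size & (class_size - 1)) == 0);
    return DEMO_CHUNK_BYTES / class_size;
}
```

Under these assumptions a 100-byte request lands in the 128-byte class, and 64 such blocks would be pre-allocated per 8KB.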

It uses a simple stack to keep its free lists.  The stack, while simple in
concept, is the tricky-in-implementation part.  The idea is that the stack
API just has three operations: init/push/pop.  That's it.  On many
platforms, a stack like this can be implemented without locks, using
architecture-specific instructions like Compare-Double-and-Swap.

So it's really just a wrapper around malloc that keeps stacks of blocks
that can be very efficiently re-allocated.

That's it.
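
For concreteness, a minimal sketch of the three-operation stack in its generic locking form, assuming POSIX threads. The demo_* names are hypothetical and this is not the actual APR or jstack code. Free blocks are linked through their own first word, so the stack needs no extra memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A free block is reused as a list node while it sits on the stack. */
typedef struct demo_node {
    struct demo_node *next;
} demo_node;

typedef struct {
    demo_node *top;
    pthread_mutex_t lock;
} demo_stack;

static void demo_stack_init(demo_stack *s)
{
    assert(s != NULL);
    s->top = NULL;
    pthread_mutex_init(&s->lock, NULL);
}

/* Push a free block; it must be at least sizeof(demo_node) bytes. */
static void demo_stack_push(demo_stack *s, void *block)
{
    demo_node *n = block;
    assert(s != NULL && block != NULL);
    pthread_mutex_lock(&s->lock);
    n->next = s->top;
    s->top = n;
    pthread_mutex_unlock(&s->lock);
}

/* Pop the most recently pushed block, or NULL if the stack is empty. */
static void *demo_stack_pop(demo_stack *s)
{
    demo_node *n;
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    n = s->top;
    if (n != NULL)
        s->top = n->next;
    pthread_mutex_unlock(&s->lock);
    return n;
}
```

A lock-free variant would replace the mutex with a compare-and-swap retry loop. A pop built on a plain single-word compare-and-swap is exposed to the ABA problem, which is one reason double-width instructions such as Compare-Double-and-Swap (swapping a pointer together with a counter) are attractive for this job.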

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: memory allocation (was Re: mod_include performance numbers)

Posted by dean gaudet <dg...@arctic.org>.
can you give a short description of this allocator?

-dean
