Posted to dev@apr.apache.org by Andi Gutmans <an...@zend.com> on 2002/06/24 22:07:43 UTC

Memory manager

Hi,

PHP uses memory allocation extensively. During the life cycle of a PHP
script there is a huge number of malloc() and free() calls. We found that
under multi-threaded web servers this leads to decreased performance due to
memory fragmentation and locking within the memory manager.
The solution is to use per-thread memory pools which don't lock and are
completely freed at the end of each request.
Win32 supports this kind of per-thread memory pool with the
HeapCreate(HEAP_NO_SERIALIZE, ...) family of functions. Using this kind of
function gave us a huge performance gain.
Now with Apache 2 coming out I wanted to solve this problem in a
cross-platform way as I don't have Bill's API available on UNIX :) The APR
memory pools aren't good enough for us because they don't allow for any
freeing, which just doesn't work for PHP.
What we did was write a memory manager (similar to Doug Lea's malloc.c but
much more lightweight) which allows you to have many instances (pools) and
supports allocation, freeing, and reallocation. At the end of each request
it quickly frees all of the large memory chunks it used. I started using it
with the new PHP scripting engine and am allocating memory in 64KB blocks
(run-time definable) and it seems to work pretty well. To allocate the
memory blocks themselves it uses malloc(), which makes it extremely
portable. (I actually got that idea from APR.)
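
The design described above (many pool instances, blocks obtained from
malloc(), individual alloc/free/realloc, everything released at the end of a
request) might look roughly like the following. This is only a minimal
sketch, not the actual zend_mm code: the names mm_pool, mm_alloc, mm_free,
mm_realloc and mm_destroy are hypothetical, the free list is a naive
first-fit list without splitting or coalescing, and requests larger than one
block are simply refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (64 * 1024)  /* 64KB blocks, as in the message */
#define ALIGN 16
#define ROUND_UP(n) (((n) + ALIGN - 1) & ~(size_t)(ALIGN - 1))

typedef struct chunk_hdr {        /* precedes every allocation */
    size_t size;                  /* usable payload bytes */
    struct chunk_hdr *next_free;  /* meaningful only while on the free list */
} chunk_hdr;

typedef struct block {            /* one malloc()ed 64KB block */
    struct block *next;
    size_t used;                  /* bytes consumed in the data area */
} block;

#define HDR_SIZE  ROUND_UP(sizeof(chunk_hdr))
#define BLOCK_HDR ROUND_UP(sizeof(block))

typedef struct mm_pool {          /* hypothetical per-thread pool */
    block *blocks;
    chunk_hdr *free_list;
    size_t block_count;
} mm_pool;

void *mm_alloc(mm_pool *p, size_t n) {
    n = ROUND_UP(n ? n : 1);
    /* First fit: reuse a previously freed chunk if one is big enough. */
    for (chunk_hdr **link = &p->free_list; *link; link = &(*link)->next_free) {
        if ((*link)->size >= n) {
            chunk_hdr *c = *link;
            *link = c->next_free;
            return (char *)c + HDR_SIZE;
        }
    }
    size_t need = HDR_SIZE + n;
    if (need > BLOCK_SIZE - BLOCK_HDR) return NULL;  /* sketch: no big allocs */
    block *b = p->blocks;
    if (!b || b->used + need > BLOCK_SIZE - BLOCK_HDR) {
        b = malloc(BLOCK_SIZE);   /* blocks come from plain malloc() */
        if (!b) return NULL;
        b->next = p->blocks;
        b->used = 0;
        p->blocks = b;
        p->block_count++;
    }
    chunk_hdr *c = (chunk_hdr *)((char *)b + BLOCK_HDR + b->used);
    c->size = n;
    c->next_free = NULL;
    b->used += need;
    return (char *)c + HDR_SIZE;
}

void mm_free(mm_pool *p, void *ptr) {
    if (!ptr) return;
    chunk_hdr *c = (chunk_hdr *)((char *)ptr - HDR_SIZE);
    c->next_free = p->free_list;  /* no locking: the pool is per-thread */
    p->free_list = c;
}

void *mm_realloc(mm_pool *p, void *ptr, size_t n) {
    if (!ptr) return mm_alloc(p, n);
    chunk_hdr *c = (chunk_hdr *)((char *)ptr - HDR_SIZE);
    if (c->size >= n) return ptr;     /* already large enough */
    void *q = mm_alloc(p, n);
    if (!q) return NULL;
    memcpy(q, ptr, c->size);
    mm_free(p, ptr);
    return q;
}

/* End of request: release every block in one pass, freed or not. */
void mm_destroy(mm_pool *p) {
    block *b = p->blocks;
    while (b) {
        block *next = b->next;
        free(b);
        b = next;
    }
    p->blocks = NULL;
    p->free_list = NULL;
    p->block_count = 0;
}
```

A zero-initialized mm_pool (for example `mm_pool p = {0};`) is a valid empty
pool in this sketch. Because each thread would own its pool, none of the
functions take a lock, which is the property the HEAP_NO_SERIALIZE approach
provides on Win32.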

Do you guys have any interest in adding this kind of "smarter" memory pool 
into APR? I think it's extremely useful.

If you reply please cc: me because I'm not on the APR dev list.

Andi


Re: Memory manager

Posted by Emery Berger <em...@cs.utexas.edu>.
Andi Gutmans wrote:
> I read the reaps article. I didn't quite understand how they actually 
> code their heaps and therefore it's hard to understand how fast it 
> really is.
> My approach is actually similar to theirs (pools with free) and they 
> even mention this kind of approach. I created pools which are internally
> managed in a style similar to Doug Lea's malloc.
> Do these guys make their code available someplace?

We (I) will. The camera-ready copy has to go to OOPSLA in about two 
weeks, and I'll make the code available. I've also revised the text, and 
I think that the explanation of how reaps work is better.

You might want to look at our "Composing High-Performance Memory 
Allocators" paper, in PLDI 2001 and available from my web page, for more 
details on how the Heap Layers infrastructure works (I built reaps using 
this infrastructure).

Regards,
-- Emery

--
Emery Berger
Assistant Professor (starting Fall 2002)
Dept. of Computer Science
University of Massachusetts, Amherst
www.cs.utexas.edu/users/emery




Re: Memory manager

Posted by Andi Gutmans <an...@zend.com>.
At 11:34 AM 6/25/2002 -0700, Greg Stein wrote:
>On Tue, Jun 25, 2002 at 09:07:22PM +0300, Andi Gutmans wrote:
> > I won't cc: the apr dev group because it'll just clutter their list :)
>
>Heh. Well, I trimmed out some, and am taking one point to the list.
>
> > At 03:09 AM 6/25/2002 -0700, Greg Stein wrote:
> >...
> > >Sander mentioned something about "reaps". I remember that coming up a while
> > >back, but am not super clear on it. IIRC, it was a synthesis of pools and
> > >being able to individually free items. Probably something right along the
> > >lines of what you're looking for.
> >
> > If you guys feel this works better for you then great!
>
>My point is that (IIRC) "reaps" are essentially what you are building. There
>is some actual academic research on their use. You might find that useful
>for your own work.
>
>Further, that we would want to look at your work and compare that with the
>"reap" information, and bring that into APR.
>
>The post about reaps is here:
>
>    http://www.apachelabs.org/apr-mbox/200203.mbox/%3C000501c1d6c6$84935000$ef801942@ristretto%3E

I read the reaps article. I didn't quite understand how they actually code 
their heaps and therefore it's hard to understand how fast it really is.
My approach is actually similar to theirs (pools with free) and they even 
mention this kind of approach. I created pools which are internally managed
in a style similar to Doug Lea's malloc.
Do these guys make their code available someplace?

Andi


Re: Memory manager

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Jun 25, 2002 at 09:07:22PM +0300, Andi Gutmans wrote:
> I won't cc: the apr dev group because it'll just clutter their list :)

Heh. Well, I trimmed out some, and am taking one point to the list.

> At 03:09 AM 6/25/2002 -0700, Greg Stein wrote:
>...
> >Sander mentioned something about "reaps". I remember that coming up a while
> >back, but am not super clear on it. IIRC, it was a synthesis of pools and
> >being able to individually free items. Probably something right along the
> >lines of what you're looking for.
> 
> If you guys feel this works better for you then great!

My point is that (IIRC) "reaps" are essentially what you are building. There
is some actual academic research on their use. You might find that useful
for your own work.

Further, that we would want to look at your work and compare that with the
"reap" information, and bring that into APR.

The post about reaps is here:

    http://www.apachelabs.org/apr-mbox/200203.mbox/%3C000501c1d6c6$84935000$ef801942@ristretto%3E

> As I mentioned I thought this memory manager could be helpful for certain 
> projects using APR. I thought it should be an additional optional pool type 
> as different apps and different developers have different needs.

You bet.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Memory manager

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Jun 25, 2002 at 07:20:34AM +0300, Andi Gutmans wrote:
> At 03:58 PM 6/24/2002 -0700, Greg Stein wrote:
>...
> >Um. We use pools in Subversion and free the memory all the time. The key is
> >the use of subpools. I added some notes about our experiences at the end of
> >this document:
> >
> >     http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3
> >
> >Note that pools can also be configured to not have a per-thread lock.
> 
> Ouch, you really worked hard there.

Hard? Not really. The pattern is not difficult to implement, and only
certain types of loops require strict subpool usage: those whose input is
pretty well unbounded (based on some user input or file or whatever).

On the other side of the coin, however, is that none of our code is ever
concerned about freeing stuff. We don't have to litter efree() throughout
our code, yet we also know that somebody will get rid of everything that we
happen to allocate [when it is appropriate].

Basically, there is a huge burden lifted by not needing to track every
allocation in the code itself.

When we *do* free (by destroying a pool), we're also getting rid of a bunch
of other, associated stuff. We never need to zero in on a particular item
and say, "get rid of *that*." All allocations come in associated groups, so
we take advantage of that and place them all into a (sub)pool.

> That is exactly what we can't do in 
> PHP. Our code base is so big that the easiest solution for us has always 
> been to just give our users the memory allocation API they are used to (in 
> our case emalloc(), efree(), erealloc() and so on) and just make sure that 
> all of this memory gets freed at the end of each request (we also have some 
> leak detection code but that is coded on top of the actual memory manager).

Understood. Of course, the problem is that if somebody gets into the habit
of, "well, it will just be tossed at the end of the request" and *stops*
using the efree() function, then you could end up with a *huge* working set.
We saw plenty of that in Subversion :-)

Tossing (groups of) memory during unbounded iteration is always necessary,
whether using pools or an alloc/free strategy. Failing that, each item that
might ever be allocated within the loop must be individually tracked by the
code which does the alloc, which must then ensure that it gets freed.
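
The per-iteration pattern can be illustrated with a small self-contained
sketch. It does not use APR itself: the arena type and the arena_* and
process_items names are hypothetical stand-ins, with arena_clear playing the
role that apr_pool_clear plays on an iteration subpool. The function returns
the peak number of scratch bytes held at once, so the effect of clearing (or
not clearing) inside the loop is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool: one buffer, bump allocation. */
typedef struct arena {
    char *base;
    size_t cap;
    size_t used;
    size_t peak;   /* high-water mark, for demonstration only */
} arena;

int arena_init(arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    a->peak = 0;
    return a->base != NULL;
}

void *arena_alloc(arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (n > a->cap - a->used) return NULL; /* arena exhausted */
    void *p = a->base + a->used;
    a->used += n;
    if (a->used > a->peak) a->peak = a->used;
    return p;
}

/* Toss everything allocated so far; the buffer itself is kept for reuse. */
void arena_clear(arena *a) { a->used = 0; }

void arena_destroy(arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}

/* Process `count` items, each needing some scratch memory.
 * Returns the peak number of scratch bytes held at any one time. */
size_t process_items(size_t count, size_t scratch_per_item,
                     int clear_each_iteration) {
    arena iter;
    if (!arena_init(&iter, (size_t)1 << 20)) return 0;
    for (size_t i = 0; i < count; i++) {
        char *scratch = arena_alloc(&iter, scratch_per_item);
        if (!scratch) break;
        scratch[0] = (char)i;              /* pretend to do some work */
        if (clear_each_iteration)
            arena_clear(&iter);            /* nothing survives the iteration */
    }
    size_t peak = iter.peak;
    arena_destroy(&iter);
    return peak;
}
```

With clearing enabled the peak stays at the cost of a single iteration no
matter how many items are processed; without it the working set grows with
the input, which is the "huge working set" problem mentioned above.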

> Also as PHP is a scripting language it can run for quite a bit and do lots 
> of allocations and frees. You can't really do any planning like you guys
> did in Subversion on exactly when stuff can be freed and when not. Grouping 
> memory allocations is virtually impossible. Anyway, it does seem that you 
> guys had to work a bit too hard.

I don't think so. While it takes some discipline, I'm not sure that I equate
that with a lot of difficulty. And your comment about "when stuff can be
freed and when not" simply tells me that your code is a bit too, um,
"unstructured" :-)

Subversion has very nice lines about when stuff is valid, and when it goes
away. Every object has a defined lifetime, and that is defined by the pool
it was placed into. We don't have destructors -- the object's death is
determined by the pool that the caller placed the object into.

Even an interpreter like PHP can be structured to have a well-defined
hierarchy of lifetimes. At the top is PHP itself. Then you have children for
each interpreter engine, maybe each thread, each time you run the compiler,
each script, etc. I'm sure there is hierarchy within that, but I'm not
familiar enough with the internals.

The "mess" only arrives once you start running the code :-)  But note that
the pools have already tossed all the memory associated with parsing and
compiling your script. Now you just have to worry about what gets allocated
as part of the interpretation process, and where that data might end up
getting stashed. Wherever the data goes... that determines the appropriate
lifetime. If somebody loads a new module into the interpreter, well that
probably sticks around, so it lives in the interp pool. Objects that are
instantiated are probably per-thread, while some data might be passed across
threads, so it lives in a data subpool of the interpreter.

etc.  The point is that object lifetimes *can* be well-designed, and the
pools simply mirror that structure. And also note a subtle benefit:
*because* of the pools, you think harder about lifetimes, and you organize
your code appropriately.
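
The idea that a pool hierarchy mirrors a hierarchy of lifetimes can be
sketched as follows. Again this is not APR code: pool, pool_create,
pool_alloc, pool_destroy and the live_allocs counter are hypothetical names
chosen for illustration, and each allocation is a separate malloc() to keep
the sketch short. The point it shows is that destroying a pool also destroys
all of its subpools and everything allocated in them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header in front of each allocation; the union keeps the payload aligned. */
typedef union alloc_node {
    union alloc_node *next;
    max_align_t align;
} alloc_node;

typedef struct pool {
    struct pool *parent;
    struct pool *child;     /* first subpool */
    struct pool *sibling;   /* next subpool of the same parent */
    alloc_node *allocs;     /* everything allocated in this pool */
} pool;

int live_allocs = 0;        /* demonstration counter only */

pool *pool_create(pool *parent) {
    pool *p = calloc(1, sizeof *p);
    if (!p) return NULL;
    p->parent = parent;
    if (parent) {           /* link into the parent's list of subpools */
        p->sibling = parent->child;
        parent->child = p;
    }
    return p;
}

void *pool_alloc(pool *p, size_t n) {
    alloc_node *a = malloc(sizeof *a + n);
    if (!a) return NULL;
    a->next = p->allocs;
    p->allocs = a;
    live_allocs++;
    return a + 1;           /* payload starts right after the header */
}

/* Destroying a pool ends the lifetime of every object in it and below it. */
void pool_destroy(pool *p) {
    while (p->child)
        pool_destroy(p->child);          /* each call unlinks that child */
    while (p->allocs) {
        alloc_node *next = p->allocs->next;
        free(p->allocs);
        p->allocs = next;
        live_allocs--;
    }
    if (p->parent) {                     /* unlink from the parent */
        pool **link = &p->parent->child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}
```

In terms of the example above, an interpreter pool could be the root, with a
subpool per script and a further subpool per compile run: destroying the
compile pool tosses the parser's memory while the script's data lives on, and
destroying the interpreter pool takes everything with it. No object needs an
individual free call.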

> > >...
> > > Do you guys have any interest in adding this kind of "smarter" memory pool
> > > into APR? I think it's extremely useful.
> >
> >Sure. Although I'm a bit unclear on how it differs from using, say,
> >apr_pool_destroy on a subpool to toss intermediate memory.
> 
> If I understood correctly the difference is that you don't need to group 
> the memory but can allocate and toss memory whenever you need to. This
> kind of "knowing in advance" can't be done in PHP.

I think it is really about granularity. Pools are about grouping together
related allocations. If PHP, or its subcomponents, are trying to alloc and
free individual pieces, then yes: you need to follow that pattern and
provide alloc/free mechanisms. During Apache and Subversion development, I
just haven't found anything that ever requires that kind of granularity,
however.

That said: *legacy* code can certainly impact the kinds of facilities that
you need to provide [to subcomponents].


Sander mentioned something about "reaps". I remember that coming up a while
back, but am not super clear on it. IIRC, it was a synthesis of pools and
being able to individually free items. Probably something right along the
lines of what you're looking for.

> P.S. - I'm enthusiastically waiting for subversion. CVS just doesn't cut it 
> anymore.

hehe :-) We're fast approaching Alpha (two weeks). Our opinion is that it
will be stable enough to use, and have all the necessary features for 90% of
your work. We'll wrap up those little bits and kick out some edge cases and
bugs between Alpha and Beta. Point is: you don't really have to wait :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Memory manager

Posted by Andi Gutmans <an...@zend.com>.
At 03:58 PM 6/24/2002 -0700, Greg Stein wrote:
>On Mon, Jun 24, 2002 at 11:07:43PM +0300, Andi Gutmans wrote:
> >...
> > The APR memory pools aren't good enough for us because they don't allow
> > for any freeing which just doesn't work for PHP.
>
>Um. We use pools in Subversion and free the memory all the time. The key is
>the use of subpools. I added some notes about our experiences at the end of
>this document:
>
>     http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3
>
>Note that pools can also be configured to not have a per-thread lock.

Ouch, you really worked hard there. That is exactly what we can't do in 
PHP. Our code base is so big that the easiest solution for us has always 
been to just give our users the memory allocation API they are used to (in 
our case emalloc(), efree(), erealloc() and so on) and just make sure that 
all of this memory gets freed at the end of each request (we also have some 
leak detection code but that is coded on top of the actual memory manager).
Also as PHP is a scripting language it can run for quite a bit and do lots 
of allocations and frees. You can't really do any planning like you guys
did in Subversion on exactly when stuff can be freed and when not. Grouping 
memory allocations is virtually impossible. Anyway, it does seem that you 
guys had to work a bit too hard.


> >...
> > Do you guys have any interest in adding this kind of "smarter" memory pool
> > into APR? I think it's extremely useful.
>
>Sure. Although I'm a bit unclear on how it differs from using, say,
>apr_pool_destroy on a subpool to toss intermediate memory.

If I understood correctly the difference is that you don't need to group 
the memory but can allocate and toss memory whenever you need to. This
kind of "knowing in advance" can't be done in PHP.

Andi

P.S. - I'm enthusiastically waiting for subversion. CVS just doesn't cut it 
anymore.


RE: Memory manager

Posted by Andi Gutmans <an...@zend.com>.
At 11:28 PM 6/24/2002 +0200, Sander Striker wrote:
>I think some of us have an interest in implementing reaps.  However,
>I'm not going to touch the pools code to get another mechanism in place
>anytime soon.  I know apr_free can be added to the current code with
>little trouble, keeping the costs in the free.

I didn't mean to touch the existing pool. I thought it might be interesting 
to have an additional kind of pool so that the user has some more choice.

>In any case I think it would be nice to see your code ;)

http://www.php.net/~andi/zend_mm/

Andi


Re: Memory manager

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jun 24, 2002 at 11:07:43PM +0300, Andi Gutmans wrote:
>...
> The APR memory pools aren't good enough for us because they don't allow
> for any freeing which just doesn't work for PHP.

Um. We use pools in Subversion and free the memory all the time. The key is
the use of subpools. I added some notes about our experiences at the end of
this document:

    http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3

Note that pools can also be configured to not have a per-thread lock.

>...
> Do you guys have any interest in adding this kind of "smarter" memory pool 
> into APR? I think it's extremely useful.

Sure. Although I'm a bit unclear on how it differs from using, say,
apr_pool_destroy on a subpool to toss intermediate memory.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

RE: Memory manager

Posted by Sander Striker <st...@apache.org>.
Hi Andi,

> Hi,
> 
> PHP uses memory allocation extensively. During the life cycle of a PHP
> script there is a huge number of malloc() and free() calls. We found that
> under multi-threaded web servers this leads to decreased performance due to
> memory fragmentation and locking within the memory manager.
> The solution is to use per-thread memory pools which don't lock and are
> completely freed at the end of each request.
> Win32 supports this kind of per-thread memory pool with the
> HeapCreate(HEAP_NO_SERIALIZE, ...) family of functions. Using this kind of
> function gave us a huge performance gain.
> Now with Apache 2 coming out I wanted to solve this problem in a
> cross-platform way as I don't have Bill's API available on UNIX :) The APR
> memory pools aren't good enough for us because they don't allow for any
> freeing, which just doesn't work for PHP.
> What we did was write a memory manager (similar to Doug Lea's malloc.c but
> much more lightweight) which allows you to have many instances (pools) and
> supports allocation, freeing, and reallocation. At the end of each request
> it quickly frees all of the large memory chunks it used. I started using it
> with the new PHP scripting engine and am allocating memory in 64KB blocks
> (run-time definable) and it seems to work pretty well. To allocate the
> memory blocks themselves it uses malloc(), which makes it extremely
> portable. (I actually got that idea from APR.)
> 
> Do you guys have any interest in adding this kind of "smarter" memory pool 
> into APR? I think it's extremely useful.

I think some of us have an interest in implementing reaps.  However,
I'm not going to touch the pools code to get another mechanism in place
anytime soon.  I know apr_free can be added to the current code with
little trouble, keeping the costs in the free.

In any case I think it would be nice to see your code ;)


> If you reply please cc: me because I'm not on the APR dev list.
> 
> Andi

Sander