Posted to dev@apr.apache.org by "Jay Freeman (saurik)" <sa...@saurik.com> on 2002/03/28 22:31:20 UTC

any documentation on the point of having pools?

Is there any documentation anywhere that describes "why you would want to
use pools"?  I've been using APR for over a year now in virtually all of my
projects, and I _still_ don't get what the advantage of this pool management
that's strewn all over my programs is.  I finally got fed up, wrote a C++
class named "pool" (with an autocast operator for getting an apr_pool_t *
and a destructor that destroys the pool), and have an instance of it in
_every APR related object_ so I have something I can pass to the APR
functions when they scream out for their precious pools :-P.  I pray at
nights that I'm not using an insane amount of working set by doing this,
hehe.

If I want to use sockets, I have to deal with pools (even for entirely
transient memory allocations).  If I want to spawn a thread, I need pools
(and I'm really getting worried that I might not be understanding this
"sub-pool" concept and might be causing weird thread contention on my
memory... I try to build a new pool every time I allocate even a byte of
memory, so hopefully this isn't a problem).  I've got code that loads DLLs
and shared objects, and apparently... apparently I need a pool for that too
(and I'm terrified that if I clear that pool my DLL might get unloaded from
memory).

My confusion is likely coming from the fact that I use C++, so please bear
with me.  Is this a speed optimization?  Do pools allocate memory _faster_
(such as in some sequential memory region somewhere) than the OS's virtual
memory manager can?  Are they there to try to make memory management easier?
If the latter, is there somewhere I can go to find out how and when it
finally gets easier?

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com

Re: any documentation on the point of having pools?

Posted by "Jay Freeman (saurik)" <sa...@saurik.com>.

Karl:

Well, thank you very much for taking the time to write these comments...
this is definitely a C-centric way of thinking :).  However, this is
definitely not encouraging... I'm probably going to end up looking more into
the Netscape Portable Runtime as a replacement to APR.  As a C++ developer,
I manage object lifetime using scoping and try to avoid heap allocations
wherever possible.  When I allocate objects, they usually get allocated
onto the stack.  If I really need to heap allocate an object for some
reason, I'll have it wrapped into a small handle of some sort that can
manage its lifetime for me.  This ends up being one of the driving tenets
of C++ design:  wrap resources into stack allocated objects in order to take
advantage of destructors.

The reasons for this are various, but there are two main ones.  First of
all, it keeps you from running into the problem of needing an arbitrary
number of free()s at the ends of functions to get rid of memory that was
allocated for the scope of that operation; hence leading to simpler code
(one of the reasons you mention for pools).  Secondly, it makes it easy to
support exceptions within the language.  If I were to be heap allocating
memory, and an exception occurred, I would need to make sure I go to whatever
lengths necessary to catch the various exceptions and destroy that
allocation.  Using C++ destructors, and the guarantee that they will be
called as the stack unwinds, I don't need to go through any extra work to
know that my function won't leak memory in the case of an exception.
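
A minimal sketch of the exception-safety point above, using only standard
C++.  The names tracked_buffer, risky_operation and live_buffers are made up
for this illustration; the counter exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical counter, used only to observe construction and destruction.
static int live_buffers = 0;

// Owns one heap allocation; the destructor releases it.
class tracked_buffer {
    char *data_;
  public:
    explicit tracked_buffer(std::size_t size) : data_(new char[size]) { ++live_buffers; }
    ~tracked_buffer() { delete[] data_; --live_buffers; }
    tracked_buffer(const tracked_buffer &) = delete;
    tracked_buffer &operator=(const tracked_buffer &) = delete;
    char *get() { return data_; }
};

// Throws after acquiring the buffer; note there is no explicit cleanup path.
void risky_operation() {
    tracked_buffer scratch(256);
    scratch.get()[0] = 'x';
    throw std::runtime_error("something went wrong");
}

// Returns the number of buffers still alive once the exception is handled.
int buffers_alive_after_failure() {
    try {
        risky_operation();
    } catch (const std::runtime_error &) {
        // The stack has already unwound, so scratch's destructor has run.
    }
    return live_buffers;
}
```

The function that throws contains no try/catch and no delete, yet nothing is
left allocated after the failure.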

Garbage collection (which you mention as a good thing for programmers) tends
to be frowned upon in this programming model, as it leads to
"non-deterministic finalization".  As much as I think garbage collectors
have promise to solve the problem of _memory_ management, they don't help at
all when you try to manage non-memory resources.  Really, they end up
hindering the process, as you have to explicitly call release() and free()
functions on the resources to get rid of them.  Look at the APR mutex.  I
still need to release the mutex when I'm done using it... if I were actually
putting control of that mutex into a garbage collected object I'd have to
either explicitly release() it or wait until the object got collected
(which might not be for a _long_ time after I stop using the mutex object).

That's something that seems to be common to a lot of these memory management
schemes... they try to make managing memory easier, but they end up A) making
you manage the meta-level of how to allocate things and B) making it
virtually impossible to abstract away non-memory resource deallocation from
the user of the resource.

To demonstrate how C++ tries to handle these problems, let's work our way
towards using a mutex in C++ (to pick an APR friendly example that doesn't
use many lines of code... if the analogy is stretched too far pretend it is a
file handle or a database connection or something).  This is an example of a
non-memory related resource that needs to be tracked and handled
appropriately.

To start with, here is how (on Win32), this is accomplished at the C level
(please note that all formatting is going to be with the "short e-mail"
convention that I'm making up for the scope of this e-mail, hehe):

void myfunction() {
    HANDLE lock = CreateMutex(NULL, FALSE, NULL);
    /* some code goes here that uses the mutex */
    CloseHandle(lock);
}

All right.  This works, but (in addition to having the "what if you forget"
issue) it doesn't hold up well when you are working in C++ and in many
situations is simply incorrect code.  The problem is that "some code" might
call some function that throws an exception, and we wouldn't get that mutex
handle closed (and thereby waste OS resources until such time as our process
exited).  To solve this, the mutex implementation might be rewritten to be
wrapped into a class:

class mutex {
  protected:
    HANDLE mutex_;
  public:
    mutex() {   mutex_ = CreateMutex(NULL, FALSE, NULL);   }
    ~mutex() {   CloseHandle(mutex_);   }
    void lock() {   WaitForSingleObject(mutex_, INFINITE);   }
    void unlock() {   ReleaseMutex(mutex_);   }
};

void myfunction() {
    mutex lock;
    /* some code goes here that uses the mutex */
}

Given a reasonably good compiler (in cases where exceptions were _known_ to
not be an issue for whatever reason) this would even generate the exact same
code as the previous version, so it isn't as if we added any un-needed
abstraction penalty.  Now, if an exception occurs in "some code", the mutex
handle will still get closed, and there won't be a resource leak.  Also, there
is no longer much of a risk of accidentally freeing the mutex twice, or of
forgetting to free it at all.  Obviously I'd want some error checking in
there, but I'm trying to keep this simple :).
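
The same idea extends from the lifetime of the mutex to the lock itself.
What follows is only a sketch: portable_mutex is a stand-in built on
std::mutex rather than the Win32 handle (so that it is portable), and
scoped_lock_guard, update_counter and the held() flag are hypothetical names
added for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Stand-in for the mutex class above, backed by std::mutex for portability.
// The held_ flag exists only so the example can observe the lock state.
class portable_mutex {
    std::mutex impl_;
    bool held_ = false;
  public:
    void lock() { impl_.lock(); held_ = true; }
    void unlock() { held_ = false; impl_.unlock(); }
    bool held() const { return held_; }
};

// Hypothetical guard: locks on construction, unlocks on destruction.
class scoped_lock_guard {
    portable_mutex &mutex_;
  public:
    explicit scoped_lock_guard(portable_mutex &m) : mutex_(m) { mutex_.lock(); }
    ~scoped_lock_guard() { mutex_.unlock(); }
    scoped_lock_guard(const scoped_lock_guard &) = delete;
    scoped_lock_guard &operator=(const scoped_lock_guard &) = delete;
};

// May throw while holding the lock; the guard releases it either way.
void update_counter(portable_mutex &m, int &counter, bool fail) {
    scoped_lock_guard guard(m);
    ++counter;
    if (fail) throw std::runtime_error("update failed");
}

// Returns true if the mutex is no longer held after a failed update.
bool lock_released_after_failure() {
    portable_mutex m;
    int counter = 0;
    try {
        update_counter(m, counter, true);
    } catch (const std::runtime_error &) {
        // Nothing to release here; the guard's destructor already did it.
    }
    return !m.held();
}
```

The caller never writes unlock(), so there is no path, exceptional or not, on
which the lock stays held past the end of update_counter().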

So, over the last week or so, I've been occasionally applying this C++
mentality to writing a small little library to provide higher-level features
(such as networking) to my C++ programs using APR as the underlying OS
abstraction.  This involves wrapping core APR concepts into classes,
providing appropriate destructors, and mapping some things up to more
standard C++isms (my initial usage was to write a C++ iostream/streambuf for
tcp client connections).

If I start writing an APR implementation of mutexes, then I end up with code
like this:

class mutex {
  protected:
    apr_thread_mutex_t *mutex_;
  public:
    mutex() {
        apr_thread_mutex_create(&mutex_, APR_THREAD_MUTEX_DEFAULT, /*POOL*/);
    }
    ~mutex() {   apr_thread_mutex_destroy(mutex_);   }
    void lock() {   apr_thread_mutex_lock(mutex_);   }
    void unlock() {   apr_thread_mutex_unlock(mutex_);   }
};

Note that there is this extra little thing there having to do with a "pool".
Well, to deal with this, I wrote a wrapper for APR's pools called "pool".
The constructor creates a pool, the destructor destroys it, and it has a
clear() method.  It also has an autocast operator so you can use it as if it
were an apr_pool_t *.  OK, so let's use it:

    mutex(pool &context) {
        apr_thread_mutex_create(&mutex_, APR_THREAD_MUTEX_DEFAULT, context);
    }

Now I can write my function as follows:

void myfunction() {
    pool context;
    mutex lock(context);
    /* some code that uses the mutex */
}

If an exception gets thrown in this case, both the pool will be destroyed,
and the mutex will be released.  That isn't that bad for the mutex example
(as if someone else were to have freed that pool it likely isn't going to
cause an issue, and the pool is only used once in that single constructor),
but it requires every single thing that wants to use a mutex to have a pool
around to allocate it into.  It actually starts to cause problems when you
are trying to provide more complex features.  Let's look at the networking
example... here is my netaddr class:

class netaddr {
  protected:
    pool pool_;
    apr_sockaddr_t *addr_;

  public:
    netaddr(const std::string &addr) {
        char *host;   apr_port_t port;
        apr_parse_addr_port(&host, NULL, &port, addr.c_str(), pool_);
        apr_sockaddr_info_get(&addr_, host, APR_INET, port, 0, pool_);
    }

    explicit netaddr(socket &sock) {
        apr_socket_addr_get(&addr_, APR_LOCAL, sock);
    }
    netaddr(const std::string &host, apr_port_t port) {
        apr_sockaddr_info_get(&addr_, host.c_str(), APR_INET, port, 0, pool_);
    }

    void set_port(apr_port_t port) {   apr_sockaddr_port_set(addr_, port);   }
    operator apr_sockaddr_t *() const {   return addr_;   }
};

What's nice about this class is, if you have a function that takes a "const
netaddr &", and you pass it a string (such as "saurik.com:80"), the compiler
will take care of calling the constructor and making it work for you.  When
the class goes away, so does the pool that allocated its single snippet of
memory.  Everything is self contained.  What's annoying about this class is
that it has its own private pool for no reason other than to make APR happy
:-P.  More annoyingly, this strategy doesn't even continue to work for other
types of objects.  My TCP Server class's accept() method is (according to a
recent thread on this mailing list) leaking memory, as the pool that you use
to accept the connection with gets stuff allocated into it, and therefore
needs to be cleared more often than the pool that you have the server socket
working with.  That pool should apparently be something bound to the new
connection, not something bound to the thing listening on the old
connection...

This means that I should really be allowing these different functions that
need pools to actually take pools as arguments (thereby exposing the memory
management to the user of the object, as well as removing the ability to
have implicit constructors and overloaded operators).  In the case of my
network address class, if I allowed it to take a pool and allocate out of
that pool (as in the mutex example), then the pool might get destroyed
before this object does, and then this entire object would be invalidated.
That's a bad thing.  There's no reason I should have to run into that
situation.  I've actually got to the point this morning (soon before I sent
my original post) where I was seriously considering adding a reference
counting abstraction over the pool memory system, and then having any object
that has any memory allocated into the pool holding a reference to said pool
to make sure the memory didn't get taken out from underneath it.  It was
soon after I realized there was no way in hell I could still feel good about
proposing this solution to the people I work with as a replacement for our
existing, not really OS-independent networking and threading library that I
started researching the Netscape Portable Runtime :(.
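
For concreteness, here is one shape the reference-counting idea could take,
sketched with std::shared_ptr over a stand-in pool type.  toy_pool is not
APR; it only records when it is destroyed.  In a real wrapper the destructor
would presumably call apr_pool_destroy, but that part is an assumption and is
not shown.  pooled_addr is likewise a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a pool; the flag it writes to lets us see when it dies.
struct toy_pool {
    bool *destroyed_flag;
    explicit toy_pool(bool *flag) : destroyed_flag(flag) { *flag = false; }
    ~toy_pool() { *destroyed_flag = true; }
};

// Hypothetical address object that keeps its pool alive by holding a reference.
class pooled_addr {
    std::shared_ptr<toy_pool> pool_;
    std::string host_;
  public:
    pooled_addr(std::shared_ptr<toy_pool> pool, const std::string &host)
        : pool_(std::move(pool)), host_(host) {}
    const std::string &host() const { return host_; }
};

// Returns true if the pool outlived the caller's own reference to it
// and was destroyed only when its last user went away.
bool pool_survives_until_last_user() {
    bool destroyed = false;
    std::shared_ptr<toy_pool> pool = std::make_shared<toy_pool>(&destroyed);
    {
        pooled_addr addr(pool, "saurik.com");
        pool.reset();                 // the caller drops its reference
        if (destroyed) return false;  // addr still holds the pool alive
    }
    return destroyed;                 // last reference gone, pool destroyed
}
```

This keeps the object valid, but every pooled object now carries a
reference-counted handle, which is the extra machinery the paragraph above is
complaining about.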

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com


RE: any documentation on the point of having pools?

Posted by Emery Berger <em...@cs.utexas.edu>.

Hi all,

I (together with my co-authors, Ben Zorn & Kathryn McKinley) have just
submitted a paper to OOPSLA which discusses lots of custom allocation
strategies, including pools (more commonly known as regions). It
explains the benefits of pools, along with some of their drawbacks & a
proposed solution. Here's the abstract:

Programmers hoping to achieve performance improvements often use
custom memory allocators. This in-depth study examines eight
applications that use custom allocators. Surprisingly, for six of
these applications, a state-of-the-art general-purpose allocator
performs as well as or better than the custom allocators. The two
exceptions use regions, which deliver higher performance (improvements
of up to 44%). Regions also reduce programmer burden and eliminate a
source of memory leaks. However, we show that the inability of
programmers to free individual objects within regions can lead to a
substantial increase in memory consumption. Worse, this limitation
precludes the use of regions in common programming idioms, reducing
their usefulness.

We present a generalization of general-purpose and region-based
allocators that we call "reaps". Reaps are a combination of
regions and heaps, providing a full range of region semantics with the
addition of individual object deletion. We show that our
implementation of reaps provides high performance, outperforming other
allocators with region-like semantics. Our results indicate that most
programmers needing faster memory allocators should use a better
general-purpose allocator rather than writing a custom allocator, and
that programmers needing regions should instead use reaps.

A pre-print of the paper is available at:

http://www.cs.utexas.edu/users/emery/download.cgi?location=custom.pdf

--
Emery Berger
Dept. of Computer Science
The University of Texas at Austin
www.cs.utexas.edu/users/emery



Re: any documentation on the point of having pools?

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Jay,

I can partially answer your question.

Let's say there are three kinds of memory allocation in the world:

   1. raw -- you know, like C malloc() and free()
   2. pools
   3. fully garbage-collected

For the programmer, full GC is ideal.  Unfortunately, it takes time
for the GC code to figure out what's garbage and what's not, and to
free it.  Or if that phase is to be instantaneous, then there must be
lots of little bits of overhead scattered all around, since all
allocations will be required to do some GC bookkeeping.  Usually GC is
implemented with a mixture of these two strategies, but they total up
to the same penalty, speaking *very* broadly of course.

I don't want to get into one of programming's longest-running debates,
but let's just say that despite occasional claims that full GC can, in
principle, be just as efficient as "raw" allocation methods, in
practice it never has been, and looks unlikely to be so in the near
future.  There are also some issues when it comes to interacting with
non-GC'd languages, as you might expect.  Not knocking GC -- my
far-and-away favorite language is Lisp -- but GC comes with a penalty.

Anyway, APR is written in C, and that's actually an important part of
its design as a portability layer.  So full GC would be technically,
uh, difficult under the circumstances, even without considering the
performance hit. :-)

So let's look at the remaining two options: raw vs pools.

Some programmers find pools easier to work with, some prefer raw
allocation.  We'll probably never get agreement on that.

However, there is one nice thing about pools: they can fulfill the
promise that GC never did -- the promise of being more efficient than
malloc() and free().  The reason is that in raw-style allocation,
every malloc() call must have a matching free() call.  But a pool can
clean up multiple mallocs() with one free().  I'm not talking about
literal "malloc" calls, of course, but just the act of allocating
something in that pool; and by "free" I mean apr_pool_clear or
apr_pool_destroy, but you get the idea.  Pool bookkeeping is done in
such a way that you can mark the whole pool as reclaimable in one
essentially constant-time operation, independent of how many objects
(of whatever lengths) you may have allocated in that pool.
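
To make the "many allocations, one release" point concrete, here is a toy
region in standard C++.  It is a sketch only: toy_region is a made-up,
fixed-capacity stand-in, and APR's real pool implementation is more involved
than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fixed-capacity region: allocation bumps an offset, clear() resets it.
class toy_region {
    std::vector<unsigned char> storage_;
    std::size_t used_;
  public:
    explicit toy_region(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns nullptr when the region is full; keeps no per-object bookkeeping.
    void *allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > storage_.size() || size > storage_.size() - start) return nullptr;
        used_ = start + size;
        return storage_.data() + start;
    }

    // Reclaims every allocation at once, however many were made.
    void clear() { used_ = 0; }

    std::size_t used() const { return used_; }
};

// Allocates many small objects, then releases all of them with a single call.
std::size_t bytes_in_use_after_clear() {
    toy_region region(4096);
    for (int i = 0; i < 100; ++i) {
        void *p = region.allocate(16);
        assert(p != nullptr);
        (void)p;  // the objects are never freed individually
    }
    region.clear();
    return region.used();
}
```

The cost of clear() here does not depend on the number of allocations, which
is the property described above.  The flip side is that no single object can
be handed back early.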

Aside from the efficiency aspect (which I suspect is not so great as
to be a major motivation, perhaps Sander or someone can comment?),
people who like pools like them because they give a convenient idiom
for expressing the lifetimes of objects.  If you have a run of code
that's going to cons up [er, excuse me, allocate] some objects, all of
which need to remain valid for the duration of a certain set of
operations, it's handy to put them all in the same pool, and just
destroy the pool at the end.  When the same code is written using raw
allocation, it usually flaunts a dozen calls to free() at the end, and
when you add a new object to that run of code, it's easy to forget to
add yet another call to free().  Note that in the pool style, it's
usually easy to see which pool you're supposed to allocate the thing
in, or at least the presence of multiple pools there will force you to
ask yourself about the object's lifetime, which malloc won't.
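
The lifetime idiom can be modelled without APR.  In this sketch,
lifetime_scope is a hypothetical stand-in for a pool: it collects cleanup
callbacks and runs them, newest first, when it is destroyed.  None of the
names below are APR's API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pool: everything registered shares one lifetime.
class lifetime_scope {
    std::vector<std::function<void()>> cleanups_;
  public:
    lifetime_scope() = default;
    lifetime_scope(const lifetime_scope &) = delete;
    lifetime_scope &operator=(const lifetime_scope &) = delete;

    // Registers work to run when the scope ends.
    void on_destroy(std::function<void()> cleanup) {
        cleanups_.push_back(std::move(cleanup));
    }

    // Runs the cleanups newest first.
    ~lifetime_scope() {
        for (auto it = cleanups_.rbegin(); it != cleanups_.rend(); ++it) (*it)();
    }
};

// Simulates one operation that acquires three resources with the same lifetime
// and returns the order in which they were released.
std::string handle_one_request() {
    std::string log;
    {
        lifetime_scope request;
        request.on_destroy([&log] { log += "buffer "; });
        request.on_destroy([&log] { log += "socket "; });
        request.on_destroy([&log] { log += "lock "; });
        // A fourth resource would need no extra line at the end of the block.
    }
    return log;
}
```

Choosing which scope to register with is the moment where the question about
the object's lifetime has to be answered.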

Wow, I can't believe I stopped coding to write this :-).  I hope it's
at least technically accurate (fixes welcome!), if not persuasive.
For the record, I like pools when I don't hate them.

-Karl

"Jay Freeman (saurik)" <sa...@saurik.com> writes:
> Is there any documentation anywhere that describes "why you would want to
> use pools"?  I've been using APR for over a year now in virtually all of my
> projects, and I _still_ don't get what the advantage of this pool management
> that's strewn all over my programs is.  I finally got fed up, wrote a C++
> class named "pool" (with an autocast operator for getting an apr_pool_t *
> and a destructor that destroys the pool), and have an instance of it in
> _every APR related object_ so I have something I can pass to the APR
> functions when they scream out for their precious pools :-P.  I pray at
> nights that I'm not using an insane amount of working set by doing this,
> hehe.
>
> [...]