You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2008/10/06 07:02:50 UTC

result_pool and scratch_pool

Hey all,

On IRC, danielsh asked where I got the convention for
result_pool/scratch_pool that I've been using in wc_db.h and when I've
been adding more interfaces elsewhere. I wasn't on IRC at the time to
answer, and figured the mailing list would be a good place to do that
so others can read, too.

Way, WAY back when, I recall an FS function taking two pool arguments.
One for results, and one for temporary stuff. I don't remember why or
which function, but I think that sparked the idea. Over time, there
have been situations where it has become obvious that two conceptual
pools should be presented to a function: one for the results because
they will need to stick around beyond the lifetime of the function
call, and a second for any scratch/temporary work that the function
needs to perform. This is *especially* noticeable if the function does
a LOT of temporary allocations (i.e. calling into subsystems).

You can find the latter in our codebase: a function will allocate a
subpool, do its work, place results in the provided pool, then destroy
the subpool on exit. This is bad.

Why is it bad? The caller knows *much* more about what is going on
that the function itself does. The caller may be in an iterative loop
and already has a subpool for scratch allocations, so creating a new
pool is just stupid overhead. Or the caller knows that it is "done",
so any allocation can just go into a specific pool because there is no
need for destroying a pool to "make room" for something else --
everything will be destroyed in just a moment. The caller may be
provided a scratch pool and pass that into a sub-call because sharing
the same scratch pool keeps the same "high water" mark on memory
allocation(*).

In short, whenever you see a pool creation for something *other* than
an iterative pool, then it is bad. Some function signature(s)
somewhere need to be adjusted.

Along different lines, if an "object" stores a pool into itself for
later use (we're talking an object, not a baton which is just a
manifestation of control flow), then it is most probably also bad. Any
allocations should be under the control of callers. In general, the
caller should provide a result_pool with the same/longer lifetime as
the object, rather than allow the object to hold a pool for future
allocations.

Note that these patterns of pool use were essentially found through
the development of Subversion. The Apache httpd server never developed
a rigorous pattern of pool use because it pretty much uses just a few
pools during the processing of any given request. In a few key areas,
an iteration pool is used, but nobody recognized that as a formal
pattern of pool use. Within Subversion, we "discovered" that proper
use of pools across a complex set of libraries and objects was a
requirement, and developed these patterns to meet that need. When I
started the wc_db.h interface, I used the result_pool/scratch_pool
convention with an intent to add that to our standard pattern
throughout our system. It is a convention that improves on our use of
pools. If nobody has any object to the pattern, then i'd like to
document it in hacking.html under pool use and also mention the
convention for function signatures.

Thoughts?

Cheers,
-g

(*) there is one case where these self-allocated pools keep the
high-water mark low: if you call two functions in series, each using a
self-pool, then the high water mark is the max of each function,
rather than the sum. avoiding the sum would be important if a function
knows it uses a crazy amount of memory. I would maintain the need for
a self-pool is rare, and we just state the convention is to avoid, but
document/comment on a case-by-case basis when it is needed/used.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: result_pool and scratch_pool

Posted by Karl Fogel <kf...@red-bean.com>.
Greg Hudson <gh...@MIT.EDU> writes:
> Are there any practical problems which are solved by a scratch pool and
> a result pool?
>
> Our old discipline was:
>
>   * Pass a pool which is used for scratch and results.
>   * If you're going to loop, use a subpool.
>   * Be streamy (don't read whole files into memory).
>
> So every scratch allocation should be constant.  The memory usage from
> this discipline is O(stack size), which is generally not all that large.

This implies that if a caller wants to keep a result longer than the
loop_pool allows, it needs to copy that result into a different pool
(one that was not passed to the callee).  That works until the result is
some opaque object for which there is no dup() function.

You may have been asking for concrete examples, and I don't have any of
those.  I'll try to keep an eye out as I'm coding, though.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: result_pool and scratch_pool

Posted by Greg Hudson <gh...@MIT.EDU>.
Are there any practical problems which are solved by a scratch pool and
a result pool?

Our old discipline was:

  * Pass a pool which is used for scratch and results.
  * If you're going to loop, use a subpool.
  * Be streamy (don't read whole files into memory).

So every scratch allocation should be constant.  The memory usage from
this discipline is O(stack size), which is generally not all that large.

Passing around two pools is a lot of code noise.  I wouldn't suggest
doing it unless it solves a real problem.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: result_pool and scratch_pool

Posted by Julian Foad <ju...@btopenworld.com>.
On Mon, 2008-10-06 at 15:50 -0700, David Glasser wrote:
> +1, this is a good memory management strategy.

Greg, thanks for the write-up. I would summarize the overall case as:

"If an algorithm can pass only a single pool to subroutines, it is
forced to choose either a space penalty (accumulating scratch data along
with results) or a time penalty (by using a temporary sub-pool and
having to copy the results out of it). If it can pass two pools, it can
avoid both penalties."

When I first saw examples of this and when I first read your message, I
couldn't help thinking that passing two pools around everywhere must be
too ugly to be right, and maybe there would be a neat way to encapsulate
the two pool arguments in a single argument or macro. (I expect many
functions will only pass their pool arguments along in a trivial way.)

But, the argument for doing this makes sense overall [1], and sticking
two pool arguments on each call doesn't seem any less neat than some of
the other things we do in the name of structured programming.

Of course, I realise now that only functions that return non-trivial
results need to take a result pool, so we won't be passing two pools to
every function.

Note that the distinction between "result pool" and "scratch pool" is
not quite as hard-cut as it appears. Where a function returns more than
one result, which is not uncommon, the caller may want to keep one
result persistently and use another only temporarily. However, extending
the paradigm to be "one result pool per result" would surely be
over-kill.

Though it feels like this is adding complexity, I suppose it is actually
only making the existing complexity more visible, which should be a good
thing.

So, cautiously, +1 from me too.

- Julian


[1] I could take issue with some details of the arguments, but don't
need to as the overall point is valid.


> (The "don't keep a pool in a baton" thing is already in HACKING;

FWIW, neither Greg nor HACKING says that. Greg says something about
"objects, as opposed to batons" and HACKING says something about a
particular per-object memory management technique.

>  "use
> two pools for new code or code you can adjust without compatibility
> issues" could definitely go there too.)
> 
> --dave
> 
> On Mon, Oct 6, 2008 at 12:02 AM, Greg Stein <gs...@gmail.com> wrote:
> > Hey all,
> >
> > On IRC, danielsh asked where I got the convention for
> > result_pool/scratch_pool that I've been using in wc_db.h and when I've
> > been adding more interfaces elsewhere. I wasn't on IRC at the time to
> > answer, and figured the mailing list would be a good place to do that
> > so others can read, too.
> >
> > Way, WAY back when, I recall an FS function taking two pool arguments.
> > One for results, and one for temporary stuff. I don't remember why or
> > which function, but I think that sparked the idea. Over time, there
> > have been situations where it has become obvious that two conceptual
> > pools should be presented to a function: one for the results because
> > they will need to stick around beyond the lifetime of the function
> > call, and a second for any scratch/temporary work that the function
> > needs to perform. This is *especially* noticeable if the function does
> > a LOT of temporary allocations (i.e. calling into subsystems).
> >
> > You can find the latter in our codebase: a function will allocate a
> > subpool, do its work, place results in the provided pool, then destroy
> > the subpool on exit. This is bad.
> >
> > Why is it bad? The caller knows *much* more about what is going on
> > that the function itself does. The caller may be in an iterative loop
> > and already has a subpool for scratch allocations, so creating a new
> > pool is just stupid overhead. Or the caller knows that it is "done",
> > so any allocation can just go into a specific pool because there is no
> > need for destroying a pool to "make room" for something else --
> > everything will be destroyed in just a moment. The caller may be
> > provided a scratch pool and pass that into a sub-call because sharing
> > the same scratch pool keeps the same "high water" mark on memory
> > allocation(*).
> >
> > In short, whenever you see a pool creation for something *other* than
> > an iterative pool, then it is bad. Some function signature(s)
> > somewhere need to be adjusted.
> >
> > Along different lines, if an "object" stores a pool into itself for
> > later use (we're talking an object, not a baton which is just a
> > manifestation of control flow), then it is most probably also bad. Any
> > allocations should be under the control of callers. In general, the
> > caller should provide a result_pool with the same/longer lifetime as
> > the object, rather than allow the object to hold a pool for future
> > allocations.
> >
> > Note that these patterns of pool use were essentially found through
> > the development of Subversion. The Apache httpd server never developed
> > a rigorous pattern of pool use because it pretty much uses just a few
> > pools during the processing of any given request. In a few key areas,
> > an iteration pool is used, but nobody recognized that as a formal
> > pattern of pool use. Within Subversion, we "discovered" that proper
> > use of pools across a complex set of libraries and objects was a
> > requirement, and developed these patterns to meet that need. When I
> > started the wc_db.h interface, I used the result_pool/scratch_pool
> > convention with an intent to add that to our standard pattern
> > throughout our system. It is a convention that improves on our use of
> > pools. If nobody has any object to the pattern, then i'd like to
> > document it in hacking.html under pool use and also mention the
> > convention for function signatures.
> >
> > Thoughts?
> >
> > Cheers,
> > -g
> >
> > (*) there is one case where these self-allocated pools keep the
> > high-water mark low: if you call two functions in series, each using a
> > self-pool, then the high water mark is the max of each function,
> > rather than the sum. avoiding the sum would be important if a function
> > knows it uses a crazy amount of memory. I would maintain the need for
> > a self-pool is rare, and we just state the convention is to avoid, but
> > document/comment on a case-by-case basis when it is needed/used.
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: result_pool and scratch_pool

Posted by David Glasser <gl...@davidglasser.net>.
+1, this is a good memory management strategy.

(The "don't keep a pool in a baton" thing is already in HACKING; "use
two pools for new code or code you can adjust without compatibility
issues" could definitely go there too.)

--dave

On Mon, Oct 6, 2008 at 12:02 AM, Greg Stein <gs...@gmail.com> wrote:
> Hey all,
>
> On IRC, danielsh asked where I got the convention for
> result_pool/scratch_pool that I've been using in wc_db.h and when I've
> been adding more interfaces elsewhere. I wasn't on IRC at the time to
> answer, and figured the mailing list would be a good place to do that
> so others can read, too.
>
> Way, WAY back when, I recall an FS function taking two pool arguments.
> One for results, and one for temporary stuff. I don't remember why or
> which function, but I think that sparked the idea. Over time, there
> have been situations where it has become obvious that two conceptual
> pools should be presented to a function: one for the results because
> they will need to stick around beyond the lifetime of the function
> call, and a second for any scratch/temporary work that the function
> needs to perform. This is *especially* noticeable if the function does
> a LOT of temporary allocations (i.e. calling into subsystems).
>
> You can find the latter in our codebase: a function will allocate a
> subpool, do its work, place results in the provided pool, then destroy
> the subpool on exit. This is bad.
>
> Why is it bad? The caller knows *much* more about what is going on
> that the function itself does. The caller may be in an iterative loop
> and already has a subpool for scratch allocations, so creating a new
> pool is just stupid overhead. Or the caller knows that it is "done",
> so any allocation can just go into a specific pool because there is no
> need for destroying a pool to "make room" for something else --
> everything will be destroyed in just a moment. The caller may be
> provided a scratch pool and pass that into a sub-call because sharing
> the same scratch pool keeps the same "high water" mark on memory
> allocation(*).
>
> In short, whenever you see a pool creation for something *other* than
> an iterative pool, then it is bad. Some function signature(s)
> somewhere need to be adjusted.
>
> Along different lines, if an "object" stores a pool into itself for
> later use (we're talking an object, not a baton which is just a
> manifestation of control flow), then it is most probably also bad. Any
> allocations should be under the control of callers. In general, the
> caller should provide a result_pool with the same/longer lifetime as
> the object, rather than allow the object to hold a pool for future
> allocations.
>
> Note that these patterns of pool use were essentially found through
> the development of Subversion. The Apache httpd server never developed
> a rigorous pattern of pool use because it pretty much uses just a few
> pools during the processing of any given request. In a few key areas,
> an iteration pool is used, but nobody recognized that as a formal
> pattern of pool use. Within Subversion, we "discovered" that proper
> use of pools across a complex set of libraries and objects was a
> requirement, and developed these patterns to meet that need. When I
> started the wc_db.h interface, I used the result_pool/scratch_pool
> convention with an intent to add that to our standard pattern
> throughout our system. It is a convention that improves on our use of
> pools. If nobody has any object to the pattern, then i'd like to
> document it in hacking.html under pool use and also mention the
> convention for function signatures.
>
> Thoughts?
>
> Cheers,
> -g
>
> (*) there is one case where these self-allocated pools keep the
> high-water mark low: if you call two functions in series, each using a
> self-pool, then the high water mark is the max of each function,
> rather than the sum. avoiding the sum would be important if a function
> knows it uses a crazy amount of memory. I would maintain the need for
> a self-pool is rare, and we just state the convention is to avoid, but
> document/comment on a case-by-case basis when it is needed/used.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>



-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org