Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@newton.ch.collab.net> on 2001/03/28 05:15:22 UTC

uhoh.

On *both* my home and work freebsd boxen, I'm getting "out of swap
space" errors when importing large trees into a repos.  (such as the
aversion or subversion directories.)

I think it's time to start using subpools on commits.  I *know* that
the commit driver (crawl_local_mods) creates/frees a subpool for each
level of directory descent... but that memory is only used for
creating hashes of entries.  The real problem is that the fs commit
editor needs to subpool when receiving a file's data, I think.  It's
currently just using one single pool stashed in the edit_baton.  Yee.

Re: uhoh.

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.
Greg Stein <gs...@lyra.org> writes:
> I believe the main thrust is to add pools to FS function arguments, rather
> than using internal pools.

Well, all the FS functions take pools already.  There's no case where
we use an object's pool for per-operation allocation; that would be an
obvious botch.

The only thing that needs to happen to the public interface, I think,
is to delete the `close' functions:

    svn_error_t *svn_fs_close_fs (svn_fs_t *fs);
    svn_error_t *svn_fs_close_txn (svn_fs_txn_t *txn);
    void svn_fs_close_root (svn_fs_root_t *root);

If those functions don't exist, then filesystems, transactions, and
roots don't need per-object subpools.  The only drawback is that,
since we're letting the pool cleanup functions do the finalization, we
can't really return full Subversion error codes.

But to recap the argument made before --- here's why we shouldn't make
per-object subpools:

The sole advantage per-object subpools provide (ignoring the error-
returning issue mentioned above) is the ability to free the object
before you free its parent pool.  (You might think that a `close'
function is necessary for cleanup actions for the object --- closing
files, releasing DB resources, etc. --- but you could get the same
effect by registering a cleanup function with the parent pool.)  If
you're going to leave cleanup to the pool, instead of calling the
`close' functions explicitly --- a fine approach --- then the
per-object subpools are useless to you.

Note, then, that the caller could get the same benefit simply by
creating a subpool itself, and allocating the object in that subpool.
There's no additional flexibility gained by having libsvn_fs create
subpools for you.

Finally, note that the only time you really need to free objects early
is when you're in a loop or recursion, where the number of steps to be
performed depends on the data (walking a tree; processing a file's
contents; anything, basically).  In these cases, you should create a
pool at the start of each iteration, do the body of the loop, and then
free the pool at the end of the iteration.  That way, each iteration
doesn't leave dead objects behind.  In these cases, per-object
subpools are a waste --- they're subpools of the per-iteration pools,
and will die exactly when their parents do.  Useless.

So, in summary, per-object pools are either useless, or useless.

Per-iteration pools are where it's at.

Re: uhoh.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Mar 28, 2001 at 05:36:55PM -0600, Karl Fogel wrote:
> Greg Stein <gs...@lyra.org> writes:
> > > Hmm.  Now might be a good time to rework the filesystem not to create
> > > per-object subpools, with the corresponding interface changes.
> > 
> > +1
> > 
> > I'd be happy to see M2 slip a day to get our working set down to a
> > reasonable size.
> 
> (I think I must be missing the wavelength here...)
> 
> What is the "working set"

The "working set" is the memory that a process needs to complete its
actions. Effectively, the peak usage.

It is kind of an age-old term to describe "how much [memory] do you need to
accomplish your work?"

If you have 1G of mem/swap in your machine, and your working set is 100M,
then you can only run 10 operations simultaneously. If your working set is
1M, then you can run 1000 simultaneously. Thus, the desire for a small
working set.

Second, a server never really wants to swap. If that happens, then you're
really screwed. You want to actually design/scale things so that your
maximum load fits into available RAM. So if your available RAM is 256MB,
then you can deal with 256 1MB processes.

> and how does Jim's suggested change get it
> down to a reasonable size?

At the moment, the FS deals with a bunch of subpools internally. As
described in my "pool usage" thread a while back, this can sometimes be
counter-productive to what the caller is trying to do.

> It seems to me that what Jim is suggesting
> makes individual working sets bigger (if I understand the term
> correctly), not smaller; and that he's suggesting it because calling
> styles make it unnecessary for the fs to be so fussy now.
> 
> ... But I'm not sure.
> 
> Somebody please clarify? :-)

It puts more control in the caller's hands about how the FS will use memory.
The caller knows more about whether looping/repetition will occur and can
manage the memory better.

Effectively, it moves control of the memory usage from inside the FS to
outside, where more information about calling patterns is known.

I believe the main thrust is to add pools to FS function arguments, rather
than using internal pools.

However, if the FS is going to keep information across invocations (e.g.
some kind of cache), then it will need an internal pool for that (presumably
a subpool of the top-level FS object).

So... no, Jim's suggestion does not increase working set, unless the caller
keeps passing in the same pool (without clearing it occasionally).

> > Note: the (server) working set when using DAV will probably be quite a bit
> > better. It will be operating against the FS a bit at a time, then cleaning
> > up. We won't have these big editor-based pools that glom everything
> > together. Dunno what'll happen on the client, tho.
> 
> I'm confused.
> 
> In the past, Greg, you've suggested that we not worry too much about
> getting fine-grained editor pool granularity.

Yup. Creating a pool per object, and then using that pool for further
allocations, can lead to lifetime mismatches. For example, let's say that
the FS creates some directory entries, stored in an svn_fs_root subpool. We
fetch the entries and store them away somewhere. Then, we toss the subpool
that held the root and the entries. We've now got some dangling pointers.

Instead, if we always pass in a pool, then the caller can ensure the passed
pool has the same lifetime as the results of the operation.

[ and yes, I believe dir_entries does use a passed pool; the above is
  just an illustration :-) ]

Creating fine-grained pools can also create overheads relative to what the
caller is attempting to do with memory management. If the caller is only
ever going to create a single FS object, then why does the FS need to shove
everything into a subpool? It isn't going to need to later clear that pool
to make room for a second FS object in the working set.

Yes, it doesn't hurt much speed-wise to create the pool, but this goes back
to the lifetime-confusion problem above.

> Meantime, Ben just checked in a change that makes the filesystem
> editor use per-object subpools.

The editor uses subpools for two reasons:

1) pools are not passed for the editor functions
2) its semantics imply that it will be called multiple times

In effect, it creates the subpools to deal with the memory management
issues caused by looping, because the API doesn't provide pools for that.

But that is an independent choice from our other APIs. And yes, I could
probably make an argument for including pools in the editor API :-)

> Meantime, the working copy update/checkout editor has been using
> per-object subpools for a long time now. :-)

Again, this is caused by the editor, not by a pool usage design choice.

> I'm not arguing with anything you're saying above; rather, I'm simply
> failing to understand the proposed changes, or even what's motivating
> change here.

Hopefully, I've clarified. Probably not fully answered everything, but
that's what add'l email is for, right? :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: uhoh.

Posted by Karl Fogel <kf...@collab.net>.
Greg Stein <gs...@lyra.org> writes:
> > Hmm.  Now might be a good time to rework the filesystem not to create
> > per-object subpools, with the corresponding interface changes.
> 
> +1
> 
> I'd be happy to see M2 slip a day to get our working set down to a
> reasonable size.

(I think I must be missing the wavelength here...)

What is the "working set" and how does Jim's suggested change get it
down to a reasonable size?  It seems to me that what Jim is suggesting
makes individual working sets bigger (if I understand the term
correctly), not smaller; and that he's suggesting it because calling
styles make it unnecessary for the fs to be so fussy now.

... But I'm not sure.

Somebody please clarify? :-)

> Note: the (server) working set when using DAV will probably be quite a bit
> better. It will be operating against the FS a bit at a time, then cleaning
> up. We won't have these big editor-based pools that glom everything
> together. Dunno what'll happen on the client, tho.

I'm confused.

In the past, Greg, you've suggested that we not worry too much about
getting fine-grained editor pool granularity.

Meantime, Ben just checked in a change that makes the filesystem
editor use per-object subpools.

Meantime, the working copy update/checkout editor has been using
per-object subpools for a long time now. :-)

I'm not arguing with anything you're saying above; rather, I'm simply
failing to understand the proposed changes, or even what's motivating
change here.

?,
-K

Re: uhoh.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Mar 28, 2001 at 09:46:07AM -0500, Jim Blandy wrote:
> Ben Collins-Sussman <su...@newton.ch.collab.net> writes:
> > On *both* my home and work freebsd boxen, I'm getting "out of swap
> > space" errors when importing large trees into a repos.  (such as the
> > aversion or subversion directories.)
> > 
> > I think it's time to start using subpools on commits.  I *know* that
> > the commit driver (crawl_local_mods) creates/frees a subpool for each
> > level of directory descent... but that memory is only used for
> > creating hashes of entries.  The real problem is that the fs commit
> > editor needs to subpool when receiving a file's data, I think.  It's
> > currently just using one single pool stashed in the edit_baton.  Yee.
> 
> Hmm.  Now might be a good time to rework the filesystem not to create
> per-object subpools, with the corresponding interface changes.

+1

I'd be happy to see M2 slip a day to get our working set down to a
reasonable size.

Note: the (server) working set when using DAV will probably be quite a bit
better. It will be operating against the FS a bit at a time, then cleaning
up. We won't have these big editor-based pools that glom everything
together. Dunno what'll happen on the client, tho.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: uhoh.

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.
Ben Collins-Sussman <su...@newton.ch.collab.net> writes:
> On *both* my home and work freebsd boxen, I'm getting "out of swap
> space" errors when importing large trees into a repos.  (such as the
> aversion or subversion directories.)
> 
> I think it's time to start using subpools on commits.  I *know* that
> the commit driver (crawl_local_mods) creates/frees a subpool for each
> level of directory descent... but that memory is only used for
> creating hashes of entries.  The real problem is that the fs commit
> editor needs to subpool when receiving a file's data, I think.  It's
> currently just using one single pool stashed in the edit_baton.  Yee.

Hmm.  Now might be a good time to rework the filesystem not to create
per-object subpools, with the corresponding interface changes.