Posted to dev@subversion.apache.org by cm...@collab.net on 2002/05/14 16:23:35 UTC

Re: Help.

Just bringing this little dialogue into the public eye.

"Sander Striker" <st...@apache.org> writes:

> > From: cmpilato@collab.net [mailto:cmpilato@collab.net]
> > Sent: 13 May 2002 02:08
> 
> > I'm trying to piece something together here regarding Issue #622.  I
> > did a checkout of a copy of the subversion repository's /trunk (at
> > revision 1600-and-something) over ra_local, with pool debugging turned
> > on, and watching the process in `top'.  The `top' output showed the
> > svn process crawling steadily upwards in terms of memory usage,
> > finishing up at around 30M by the time my checkout completed.
> > However, the pool debugging output showed that we maxed out our pool
> > usage at 2.29M.  The pool debugging output *looks* accurate to me,
> > since the whole checkout process is a bunch of recursion and looping,
> > all of which is very "subpool-informed", and I've gone over this
> > process pretty thoroughly.
> > 
> > What makes the actual footprint of the program so different in terms
> > of memory used?  Are we leaking non-pool memory somewhere?  Is the
> > pool code simply not re-using allocations?
> 
> The latter is indeed the case.  The production pools code does very
> little to reuse mem.  It is a space-time tradeoff.  There have been
> several patches to improve on mem reuse, but since there hasn't been
> a single project using pools that could benefit from these patches
> they've been lost in the archives.  Maybe now is a good time to reevaluate
> patches that ensure better reuse.
> 
> The reason Apache can get away with this is because apache has either
> short-lived pools or relatively small allocations.  And of course when pools
> were invented they were tuned for Apache...
>  
> > I'd love to understand what's going on here.  As is, the way the
> > ra_local checkout system is written, we *should* (I believe, unless
> > I'm missing something big) be seeing memory usage that's proportional
> > only to the greatest directory depth of the checkout.  But I'm seeing
> > usage that is proportional to the number of items in the checkout
> > instead.
> 
> Sander

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Help.

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Sander Striker" <st...@apache.org> writes:
> Also, consecutive blocks aren't joined together to form a bigger
> block.  In other words, you might have one big chunk of mem that could
> satisfy your allocation, but the allocator won't see it and gets new
> (unfragmented) mem.
> 
> For these issues patches were posted quite some time back.  Time to
> get them out from under the dust and test them.

Oh, +1 all over that :-)

-K



RE: Help.

Posted by Sander Striker <st...@apache.org>.
> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 14 May 2002 18:31

> striker@apache.org and cmpilato@collab.net write:
> 
> > > >  Is the pool code simply not re-using allocations?
> > > 
> > > The latter is indeed the case.  The production pools code does very
> > > little to reuse mem.  It is a space-time tradeoff. 
> 
> Well, this clearly explains why we're unable to do an ra_local
> checkout of /trunk, or even a single 'svnadmin dump' of a full
> revision.  Both of those tasks are being *very* anal about clearing
> and re-using pools; there's no reason they should run out of memory.

I find this a bit odd.  Why wouldn't it work over ra_local?  And what
kind of results do you get when you use the debug version of pools?
 
> We absolutely need to fix the pool code and make it aggressive about
> re-using memory.  SVN can't function otherwise.
> 
> If the httpd project requires time optimization over space, that's
> fine; just have httpd avoid calls to apr_pool_clear()!  Wouldn't that
> be enough?

No.  The problem is in how memory is reused currently.  The pool
maintains an active block.  If the next allocation doesn't fit in the
remaining part of that active block, the block is marked as full.
Then the pool gets a new active block.  Memory waste can grow
dramatically if you are unlucky enough to do your allocations in
a certain order.  I believe Greg Stein and I worked out that usage can
reach up to 400% of what you would expect.

Also, consecutive blocks aren't joined together to form a bigger
block.  In other words, you might have one big chunk of mem that could
satisfy your allocation, but the allocator won't see it and gets new
(unfragmented) mem.

For these issues patches were posted quite some time back.  Time to
get them out from under the dust and test them.

Sander



Re: Help.

Posted by Ben Collins-Sussman <su...@collab.net>.
striker@apache.org and cmpilato@collab.net write:

> > >  Is the pool code simply not re-using allocations?
> > 
> > The latter is indeed the case.  The production pools code does very
> > little to reuse mem.  It is a space-time tradeoff. 

Well, this clearly explains why we're unable to do an ra_local
checkout of /trunk, or even a single 'svnadmin dump' of a full
revision.  Both of those tasks are being *very* anal about clearing
and re-using pools; there's no reason they should run out of memory.

We absolutely need to fix the pool code and make it aggressive about
re-using memory.  SVN can't function otherwise.

If the httpd project requires time optimization over space, that's
fine; just have httpd avoid calls to apr_pool_clear()!  Wouldn't that
be enough?





Thoughts on fixing the APR pool problems Re: Help.

Posted by Brian Pane <br...@cnet.com>.
cmpilato@collab.net wrote:

>Just bringing this little dialogue into the public eye.
>
>"Sander Striker" <st...@apache.org> writes:
>
>>>From: cmpilato@collab.net [mailto:cmpilato@collab.net]
>>>Sent: 13 May 2002 02:08
>>>
>>>I'm trying to piece something together here regarding Issue #622.  I
>>>did a checkout of a copy of the subversion repository's /trunk (at
>>>revision 1600-and-something) over ra_local, with pool debugging turned
>>>on, and watching the process in `top'.  The `top' output showed the
>>>svn process crawling steadily upwards in terms of memory usage,
>>>finishing up at around 30M by the time my checkout completed.
>>>However, the pool debugging output showed that we maxed out our pool
>>>usage at 2.29M.  The pool debugging output *looks* accurate to me,
>>>since the whole checkout process is a bunch of recursion and looping,
>>>all of which is very "subpool-informed", and I've gone over this
>>>process pretty thoroughly.
>>>
>>>What makes the actual footprint of the program so different in terms
>>>of memory used?  Are we leaking non-pool memory somewhere?  Is the
>>>pool code simply not re-using allocations?
>>>
>>The latter is indeed the case.  The production pools code does very
>>little to reuse mem.  It is a space-time tradeoff.  There have been
>>several patches to improve on mem reuse, but since there hasn't been
>>a single project using pools that could benefit from these patches
>>they've been lost in the archives.  Maybe now is a good time to reevaluate
>>patches that ensure better reuse.
>>
>>The reason Apache can get away with this is because apache has either
>>short-lived pools or relatively small allocations.  And of course when pools
>>were invented they were tuned for Apache...
>>

We really need to provide a more general-purpose memory management
solution for situations where pool semantics aren't a good match for
an application.

I can think of two solutions:
  * Extend the apr_pool implementation to support freeing of blocks.
    I.e., add an upper bound on the size of the allocator free lists,
    and add an apr_pfree() function to free apr_palloc()ed blocks
    within a long-lived pool.  (What I'm thinking of here is something
    like the "reaps" design proposed by Emery Berger et al,
    ftp://ftp.cs.utexas.edu/pub/emery/papers/reconsidering-custom.pdf)

  * Or turn apr_pool_t into an abstract class, with concrete
    implementations of different allocator models (with the traditional
    Apache-style pool as the first of these).  In order to do this
    without impacting performance, we'd probably have to do a macro-based
    implementation:

      - Each pool object has a struct at the top that contains pointers
        to the pool's alloc function, free function (possibly null), cleanup
        function, etc

      - apr_palloc(p, size) becomes a macro:
            #define apr_palloc(p, size)  (*(p->alloc_fn))(p, size)
        And similarly for apr_pfree()

--Brian