You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "D.J. Heap" <dj...@dhiprovo.com> on 2003/05/15 01:16:24 UTC

[PATCH] Reduce memory usage of svn log -v

*Disclaimer*
I'm not an expert on APR and pools, but I think I have a fair
understanding and the following patch significantly reduces the memory
usage of doing a svn log -v on a revision where about 30,880 files were
imported.  Before the patch, it would crash after chewing through about
1.5GB of memory and running out.  After the patch, it completes and uses
around 3 to 5MB or so.

If the patch can be simplified or should be done another way please feel
free to do it or comment and I'll try to do it.

Tests completed successfully for me.

Log:

Reduce the memory usage of 'verbose' log commands.

* subversion/libsvn_repos/log.c
  (detect_changed): Use a scratch pool for the svn_fs_copied_from
  call and clear it each iteration.


**********************************************************************
This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to
whom they are addressed. If you have received this email
in error please notify the system manager.

This footnote also confirms that this email message has been
swept by MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**********************************************************************


Re: [PATCH] Reduce memory usage of svn log -v

Posted by cm...@collab.net.
Greg Stein <gs...@lyra.org> writes:

> p.p.s. the concept of passing in a pre-created hash table, and having the
>   function fill it in, is also a bit non-standard. it would be best for the
>   function to allocate the 'changed' hash table itself and return it.

If this statement is aimed at svn_fs_revisions_changed(), then not
only do I agree, but I've made that very interface change on the
fs-schema-changes branch.  Oh, well it sits as local mod in my branch
working copy, but the point is ...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] Reduce memory usage of svn log -v

Posted by "D.J. Heap" <dj...@dhiprovo.com>.
Here is the revised patch following Greg Stein's comments:

Log:

Reduce the memory usage of 'verbose' log commands.

* subversion/libsvn_repos/log.c

  (detect_changed): Allocate and return the CHANGED hash set
  rather than take a pre-created one and fill it. Moved the
  declaration of ITEM into the loop.  Use POOL for the
  CHANGES hash set and the hash iterator.  Clear SUBPOOL
  at the start of each iteration.

  (svn_repos_get_logs): Removed allocation of CHANGED_PATHS --
  detected_changed will allocate it now.


-----Original Message-----
From: Greg Stein [mailto:gstein@lyra.org] 
Sent: Thursday, May 15, 2003 1:16 AM
To: D.J. Heap
Cc: dev@subversion.tigris.org
Subject: Re: [PATCH] Reduce memory usage of svn log -v


On Wed, May 14, 2003 at 08:37:33PM -0600, D.J. Heap wrote:
> Scott Collins wrote:
>...
> >In your patch, would it be as effective to not add |scratchpool|, but
in 
> >the place where you added |svn_pool_clear (scratchpool);|, substitute

> >|svn_pool_clear (subpool);|?  But if turns out you must add 
> >|scratchpool|, is it better if it comes out of |subpool| rather than 
> >|pool|?
>...
> I believe subpool must not be cleared (actually I tried it and boom) 
> because the hash iterator is in it?

Right.

Looking at that function, it would be fixed if we follow standard design
policy here:

* things allocated outside a loop are placed into the pool passed to the
  function. the caller knows how many times the function will be called
and
  will deal with handling the passed-in pool properly (clearing and
  destroying at the right times).
  
  placing these items into a subpool doesn't help since the peak memory
  usage is the same whether you place them directly into the pool or
into a
  subpool. thus, the policy to use the passed-in pool.

* anything within a loop should use a subpool (sometimes called an
"iterator
  pool"). the subpool is cleared on each entry to the loop.


Thus: place the svn_fs_paths_changed() can apr_has_first() results into
'pool'. The item and the paths placed into the item also go into the
pool
since they need to live past the function-return. The
svn_fs_copied_from()
stays in the subpool. The subpool gets cleared on loop-entry.

No scratchpool is needed -- the subpool should have filled that role
from
the start.

Cheers,
-g

p.s. separate change here: the 'item' variable declaration would
normally
  occur inside the block rather than the outermost block. we scope
things as
  tightly as possible.
p.p.s. the concept of passing in a pre-created hash table, and having
the
  function fill it in, is also a bit non-standard. it would be best for
the
  function to allocate the 'changed' hash table itself and return it.

-- 
Greg Stein, http://www.lyra.org/


**********************************************************************
This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to
whom they are addressed. If you have received this email
in error please notify the system manager.

This footnote also confirms that this email message has been
swept by MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**********************************************************************


Re: [PATCH] Reduce memory usage of svn log -v

Posted by Greg Stein <gs...@lyra.org>.
On Wed, May 14, 2003 at 08:37:33PM -0600, D.J. Heap wrote:
> Scott Collins wrote:
>...
> >In your patch, would it be as effective to not add |scratchpool|, but in 
> >the place where you added |svn_pool_clear (scratchpool);|, substitute  
> >|svn_pool_clear (subpool);|?  But if turns out you must add 
> >|scratchpool|, is it better if it comes out of |subpool| rather than 
> >|pool|?
>...
> I believe subpool must not be cleared (actually I tried it and boom) 
> because the hash iterator is in it?

Right.

Looking at that function, it would be fixed if we follow standard design
policy here:

* things allocated outside a loop are placed into the pool passed to the
  function. the caller knows how many times the function will be called and
  will deal with handling the passed-in pool properly (clearing and
  destroying at the right times).
  
  placing these items into a subpool doesn't help since the peak memory
  usage is the same whether you place them directly into the pool or into a
  subpool. thus, the policy to use the passed-in pool.

* anything within a loop should use a subpool (sometimes called an "iterator
  pool"). the subpool is cleared on each entry to the loop.


Thus: place the svn_fs_paths_changed() can apr_has_first() results into
'pool'. The item and the paths placed into the item also go into the pool
since they need to live past the function-return. The svn_fs_copied_from()
stays in the subpool. The subpool gets cleared on loop-entry.

No scratchpool is needed -- the subpool should have filled that role from
the start.

Cheers,
-g

p.s. separate change here: the 'item' variable declaration would normally
  occur inside the block rather than the outermost block. we scope things as
  tightly as possible.
p.p.s. the concept of passing in a pre-created hash table, and having the
  function fill it in, is also a bit non-standard. it would be best for the
  function to allocate the 'changed' hash table itself and return it.

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Reduce memory usage of svn log -v

Posted by "D.J. Heap" <dj...@shadyvale.net>.
Scott Collins wrote:
> I'm not sure I understand pools well enough either, however, I'm willing 
> to put my money where my mouth is and test it.  So I guess I'm not 
> really evaluating your patch, so much as asking further questions of the 
> person who eventually does.
> 
> In your patch, would it be as effective to not add |scratchpool|, but in 
> the place where you added |svn_pool_clear (scratchpool);|, substitute  
> |svn_pool_clear (subpool);|?  But if turns out you must add 
> |scratchpool|, is it better if it comes out of |subpool| rather than 
> |pool|?
> 
> I'll go try in my working copy now.
> __________
> Scott Collins <http://ScottCollins.net/>
> 

I believe subpool must not be cleared (actually I tried it and boom) 
because the hash iterator is in it?

As far as making scratchpool come from subpool, I have no idea if that 
would be better or not...I would be interested in a comment on that as well.

DJ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Reduce memory usage of svn log -v

Posted by Scott Collins <sc...@ScottCollins.net>.
I'm not sure I understand pools well enough either, however, I'm 
willing to put my money where my mouth is and test it.  So I guess I'm 
not really evaluating your patch, so much as asking further questions 
of the person who eventually does.

In your patch, would it be as effective to not add |scratchpool|, but 
in the place where you added |svn_pool_clear (scratchpool);|, 
substitute  |svn_pool_clear (subpool);|?  But if turns out you must add 
|scratchpool|, is it better if it comes out of |subpool| rather than 
|pool|?

I'll go try in my working copy now.
__________
Scott Collins <http://ScottCollins.net/>