You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Paul Burba <pa...@softlanding.com> on 2006/11/08 15:19:29 UTC

Re: [PATCH] Was: Enhancement needed in svn status -u

Paul Burba <pa...@softlanding.com> wrote on 10/27/2006 02:57:26 PM:

> Daniel Rall <dl...@collab.net> wrote on 10/25/2006 02:55:09 PM:
> 
> > I've created a temporary branch, "ood-status-info", so we have a place
> > to work on refining these fixes.
> > 
> > On Fri, 13 Oct 2006, Paul Burba wrote:
> <snip>
> > > Our remaining issues(?):
> > > 
> > > A) The performance of svn_repos_deleted_rev()
> > 
> > This is currently too slow to integrate into trunk.  Let's write a
> > faster implementation -- hopefully we can speed things up by moving
> > the logic down inside the FS layer.
> 
> I think I solved this performance problem, not in FS, but with a 
different 
> implmentation of the new function svn_repos_deleted_rev().
> 
> A quick recap the existing performance problem (in r22118 of 
> svn/branches/ood-status-info):
> 
>  * Start with a standard greek-tree test (fsfs) repos.
>  * Delete /A/D/H/omega in r2.
>  * Make a whole slew of (50,000+) of text mods to the other files.
>  * Check out r1 via file://
>  * Run "time svn st -u -v" in Cygwin
> 
> r22118 of svn_repos_deleted_rev() implementation takes almost 5 minutes 
on 
> my machine:
> 
> $ time svn st -v -u SVN/stat-tests-1/
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\E\beta
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\E\alpha
>                 1        1 jrandom      SVN\stat-tests-1\A\B\E
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\lambda
>                 1        1 jrandom      SVN\stat-tests-1\A\B\F
>                 1        1 jrandom      SVN\stat-tests-1\A\B
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\pi
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\rho
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\tau
>                 1        1 jrandom      SVN\stat-tests-1\A\D\G
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\H\omega
>                 1        1 jrandom      SVN\stat-tests-1\A\D\H\psi
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\H\chi
>                 1        1 jrandom      SVN\stat-tests-1\A\D\H
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\gamma
>                 1        1 jrandom      SVN\stat-tests-1\A\D
>        *        1        1 jrandom      SVN\stat-tests-1\A\mu
>                 1        1 jrandom      SVN\stat-tests-1\A\C
>                 1        1 jrandom      SVN\stat-tests-1\A
>        *        1        1 jrandom      SVN\stat-tests-1\iota
>                 1        1 jrandom      SVN\stat-tests-1
> Status against revision:  53600
> 
> real    4m58.004s
> user    0m0.015s
> sys     0m0.031s
> 
> The slowness is due to svn_repos_deleted_rev's linear search that starts 

> at the last rev in the search range and works backwards, looking for the 

> last time a path existed.  In the above scenario this takes a long time 
> because a 50k rev files need to be opened.  At the time I'd thought the 
> linear was the only way to go because a path might be deleted, re-added 
> and deleted many times, so a binary search wouldn't work.  But as Mark 
> pointed out to me, we don't really care if a path of the same name was 
> later added then deleted, that is for the path's parent to care about in 

> it's ood info.  Even if a path is moved within the same directory this 
> still holds.
> 
> Looking at the problem in this new light, I used svn_fs_node_id() and 
> svn_fs_compare_ids() to reimplement svn_repos_deleted_rev() as a 
> non-recursive binary search and committed it to the ood branch in 
r22139.
> 
> The change from linear to logarithmic running time has the expected 
> results, the 5 minute operation now takes less than a second:
> 
> $ time svn st -u -v SVN/stat-tests-1
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\E\beta
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\E\alpha
>                 1        1 jrandom      SVN\stat-tests-1\A\B\E
>        *        1        1 jrandom      SVN\stat-tests-1\A\B\lambda
>                 1        1 jrandom      SVN\stat-tests-1\A\B\F
>                 1        1 jrandom      SVN\stat-tests-1\A\B
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\pi
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\rho
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\G\tau
>                 1        1 jrandom      SVN\stat-tests-1\A\D\G
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\H\omega
>                 1        1 jrandom      SVN\stat-tests-1\A\D\H\psi
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\H\chi
>                 1        1 jrandom      SVN\stat-tests-1\A\D\H
>        *        1        1 jrandom      SVN\stat-tests-1\A\D\gamma
>                 1        1 jrandom      SVN\stat-tests-1\A\D
>        *        1        1 jrandom      SVN\stat-tests-1\A\mu
>                 1        1 jrandom      SVN\stat-tests-1\A\C
>                 1        1 jrandom      SVN\stat-tests-1\A
>        *        1        1 jrandom      SVN\stat-tests-1\iota
>                 1        1 jrandom      SVN\stat-tests-1
> Status against revision:  53600
> 
> real    0m0.340s
> user    0m0.015s
> sys     0m0.015s
> 
> Anyone see any problems with this approach?
> 
> Paul B.

For anyone who missed it, we discussed this patch at length on IRC on 
10/27:



Essentially the conclusion was that a binary search would work to find the 
*first* time a path was deleted.  Problem was, my binary search algorithm 
in r22139 made incorrect assumptions about how svn_fs_compare_ids() 
behaved when comparing copied/modified nodes.  I addressed these problems 
in r22234 of svn/branches/ood-status-info.  I also made an effort to 
clearly document how the binary search works in the comments for 
svn_repos_deleted_rev()...it might be worth look at those comments first 
if you have any questions on this.

Paul B.