Posted to dev@subversion.apache.org by Stefan Sperling <st...@apache.org> on 2014/09/17 21:58:39 UTC

Re: svn commit: r1625674 - /subversion/trunk/tools/server-side/svnpredumpfilter.py

On Wed, Sep 17, 2014 at 05:02:33PM -0000, stsp@apache.org wrote:
> Author: stsp
> Date: Wed Sep 17 17:02:33 2014
> New Revision: 1625674
> 
> URL: http://svn.apache.org/r1625674
> Log:
> Fix a big scalability problem in the implementation of svnpredumpfilter.py.
> 
> The script kept re-computing the set of additional include paths while
> mining the log history for copied paths. Each re-computation involved
> a full iteration of the set of copies accumulated so far, which made
> the run time explode on large repositories.
> Instead, we can gather all copies first, and then iterate them at once.
> 
> In my testing this change reduces the runtime of svnpredumpfilter.py on
> a 64GB large dump file of the FreeBSD repository (up to r271458) from
> several days(!) to 1.5 minutes.
> 
> * tools/server-side/svnpredumpfilter.py
>   (svn_log_stream_get_dependencies): Run dt.handle_changes() once the log
>    history has been fully scanned, not for each revision.

This change may have introduced a slight regression.
With it, the script detects only the direct copy sources of the
to-be-included set of paths, but not copy sources of copy sources.

I'm working on a fix for this problem that doesn't involve reverting
this change and still lets the script complete its task within a
reasonable amount of time.
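
To make the "copy sources of copy sources" case concrete, here is a
minimal Python sketch. It is not the code in svnpredumpfilter.py and
it is not the fix mentioned above. The function names, the
(destination, source) pair representation, and the example paths are
all hypothetical, and path matching is simplified to "destination is
at or below a wanted path".

```python
def is_within(path, root):
    """True if path equals root or lies below it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def direct_sources(wanted, copies):
    """Sources of copies whose destination lies within a wanted path.

    'copies' is a hypothetical list of (destination, source) pairs
    gathered from 'svn log -v' output.
    """
    return {src for dst, src in copies
            if any(is_within(dst, w) for w in wanted)}


def transitive_sources(included, copies):
    """Repeat the direct lookup until no new sources turn up."""
    included = set(included)
    found = set()
    frontier = set(included)
    while frontier:
        # Only keep sources not seen before, so copy cycles terminate.
        new = direct_sources(frontier, copies) - found - included
        found |= new
        frontier = new
    return found


# Hypothetical history: /project/trunk was copied from /old/trunk,
# which in turn was copied from /prototype.
copies = [
    ("/project/trunk", "/old/trunk"),
    ("/old/trunk", "/prototype"),
    ("/other/b", "/other/a"),
]
included = {"/project/trunk"}

print(sorted(direct_sources(included, copies)))      # ['/old/trunk']
print(sorted(transitive_sources(included, copies)))  # ['/old/trunk', '/prototype']
```

A single pass finds /old/trunk but misses /prototype, which is the
case described above. The loop here rescans the whole copy list once
per level of copy depth, so it is only meant to show the idea, not to
be efficient on a large repository.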

> 
> Modified:
>     subversion/trunk/tools/server-side/svnpredumpfilter.py
> 
> Modified: subversion/trunk/tools/server-side/svnpredumpfilter.py
> URL: http://svn.apache.org/viewvc/subversion/trunk/tools/server-side/svnpredumpfilter.py?rev=1625674&r1=1625673&r2=1625674&view=diff
> ==============================================================================
> --- subversion/trunk/tools/server-side/svnpredumpfilter.py (original)
> +++ subversion/trunk/tools/server-side/svnpredumpfilter.py Wed Sep 17 17:02:33 2014
> @@ -204,7 +204,6 @@ def svn_log_stream_get_dependencies(stre
>                sanitize_path(match.group(2))
>          else:
>            break
> -      dt.handle_changes(path_copies)
>  
>      # Finally, skip any log message lines.  (If there are none,
>      # remember the last line we read, because it probably has
> @@ -221,6 +220,7 @@ def svn_log_stream_get_dependencies(stre
>                           "'svn log' with the --verbose (-v) option when "
>                           "generating the input to this script?")
>  
> +  dt.handle_changes(path_copies)
>    return dt
>  
>  def analyze_logs(included_paths):
>