Posted to commits@subversion.apache.org by st...@apache.org on 2014/09/17 19:02:33 UTC

svn commit: r1625674 - /subversion/trunk/tools/server-side/svnpredumpfilter.py

Author: stsp
Date: Wed Sep 17 17:02:33 2014
New Revision: 1625674

URL: http://svn.apache.org/r1625674
Log:
Fix a big scalability problem in the implementation of svnpredumpfilter.py.

The script kept re-computing the set of additional include paths while
mining the log history for copied paths. Each re-computation involved
a full iteration over the set of copies accumulated so far, which made
the runtime explode on large repositories.
Instead, we can gather all copies first, and then iterate over them once.
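A minimal sketch of the cost difference, in Python like the script itself. The `Tracker` class, its `work` counter and the two driver functions are hypothetical stand-ins, not the real dependency tracker in svnpredumpfilter.py. The toy only counts how many copy records get scanned; it does not model which paths end up included.

```python
class Tracker:
    """Hypothetical stand-in for the script's dependency tracker."""

    def __init__(self):
        self.dependencies = set()
        self.work = 0  # number of copy records examined, to compare costs

    def handle_changes(self, path_copies):
        # Walk every copy accumulated so far (the expensive step).
        for copyto, copyfrom in path_copies:
            self.work += 1
            self.dependencies.add(copyfrom)


def per_revision(revisions):
    """Old shape: call handle_changes() after every revision."""
    dt, path_copies = Tracker(), []
    for copies in revisions:
        path_copies.extend(copies)
        dt.handle_changes(path_copies)  # rescans all copies seen so far
    return dt


def once_at_end(revisions):
    """New shape: gather all copies, then call handle_changes() once."""
    dt, path_copies = Tracker(), []
    for copies in revisions:
        path_copies.extend(copies)
    dt.handle_changes(path_copies)
    return dt


if __name__ == "__main__":
    # 1000 revisions, each recording one copy.
    revs = [[("/branches/b%d" % i, "/trunk")] for i in range(1000)]
    print(per_revision(revs).work, once_at_end(revs).work)
    # prints 500500 1000
```

With one copy per revision, the per-revision shape scans 1 + 2 + ... + n records, while the single call at the end scans n.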

In my testing, this change reduces the runtime of svnpredumpfilter.py on
a 64GB dump file of the FreeBSD repository (up to r271458) from
several days(!) to 1.5 minutes.

* tools/server-side/svnpredumpfilter.py
  (svn_log_stream_get_dependencies): Run dt.handle_changes() once the log
   history has been fully scanned, not for each revision.

Modified:
    subversion/trunk/tools/server-side/svnpredumpfilter.py

Modified: subversion/trunk/tools/server-side/svnpredumpfilter.py
URL: http://svn.apache.org/viewvc/subversion/trunk/tools/server-side/svnpredumpfilter.py?rev=1625674&r1=1625673&r2=1625674&view=diff
==============================================================================
--- subversion/trunk/tools/server-side/svnpredumpfilter.py (original)
+++ subversion/trunk/tools/server-side/svnpredumpfilter.py Wed Sep 17 17:02:33 2014
@@ -204,7 +204,6 @@ def svn_log_stream_get_dependencies(stre
               sanitize_path(match.group(2))
         else:
           break
-      dt.handle_changes(path_copies)
 
     # Finally, skip any log message lines.  (If there are none,
     # remember the last line we read, because it probably has
@@ -221,6 +220,7 @@ def svn_log_stream_get_dependencies(stre
                          "'svn log' with the --verbose (-v) option when "
                          "generating the input to this script?")
 
+  dt.handle_changes(path_copies)
   return dt
 
 def analyze_logs(included_paths):
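The function touched above, svn_log_stream_get_dependencies(), reads the output of 'svn log --verbose', where copies show up in the "Changed paths:" block as lines such as "   A /branches/feature (from /trunk:2)". A hypothetical sketch of collecting (copy target, copy source) pairs from such output, assuming that line format; the regular expressions, function name and sample log below are illustrative, not the script's own code.

```python
import re

# Hypothetical patterns for the "Changed paths:" block of `svn log -v`.
# A copy is reported as e.g. "   A /branches/feature (from /trunk:2)".
COPY_RE = re.compile(r"^   [ADMR] (.+?) \(from (.+):(\d+)\)$")
CHANGE_RE = re.compile(r"^   [ADMR] (.+)$")


def collect_copies(lines):
    """Return (copy target, copy source) pairs found in `svn log -v` output."""
    copies = []
    in_changed_paths = False
    for line in lines:
        line = line.rstrip("\n")
        if line == "Changed paths:":
            in_changed_paths = True
            continue
        if not in_changed_paths:
            continue  # header, separator, or log message text
        match = COPY_RE.match(line)
        if match:
            copies.append((match.group(1), match.group(2)))
        elif not CHANGE_RE.match(line):
            # The blank line after the last changed path ends the block.
            in_changed_paths = False
    return copies


# Illustrative input only; not taken from a real repository.
SAMPLE_LOG = """\
------------------------------------------------------------------------
r3 | alice | 2014-09-17 17:00:00 +0000 (Wed, 17 Sep 2014) | 1 line
Changed paths:
   A /branches/feature (from /trunk:2)
   M /trunk/README

Create a feature branch.
------------------------------------------------------------------------
"""

if __name__ == "__main__":
    print(collect_copies(SAMPLE_LOG.splitlines()))
    # prints [('/branches/feature', '/trunk')]
```

Because the pairs are only appended while reading, handing the full list to the tracker once at the end, as the patch does, costs a single pass over it.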



Re: svn commit: r1625674 - /subversion/trunk/tools/server-side/svnpredumpfilter.py

Posted by Stefan Sperling <st...@apache.org>.
On Wed, Sep 17, 2014 at 05:02:33PM -0000, stsp@apache.org wrote:
> Author: stsp
> Date: Wed Sep 17 17:02:33 2014
> New Revision: 1625674
> 
> URL: http://svn.apache.org/r1625674
> Log:
> Fix a big scalability problem in the implementation of svnpredumpfilter.py.
> 
> The script kept re-computing the set of additional include paths while
> mining the log history for copied paths. Each re-computation involved
> a full iteration over the set of copies accumulated so far, which made
> the runtime explode on large repositories.
> Instead, we can gather all copies first, and then iterate over them once.
> 
> In my testing, this change reduces the runtime of svnpredumpfilter.py on
> a 64GB dump file of the FreeBSD repository (up to r271458) from
> several days(!) to 1.5 minutes.
> 
> * tools/server-side/svnpredumpfilter.py
>   (svn_log_stream_get_dependencies): Run dt.handle_changes() once the log
>    history has been fully scanned, not for each revision.

This change may introduce a slight regression. Currently the script
only detects direct copy sources of the to-be-included set of paths,
not copy sources of copy sources.

I'm working on a fix for this problem that doesn't involve reverting
this change and still lets the script complete its task within a
reasonable amount of time.
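Following copy sources of copy sources amounts to repeating the cross-check until no new dependency appears. A hypothetical sketch of such a fixed-point loop, not the actual fix: `path_under` and `copy_dependencies` are illustrative names, the rule assumes a copy's source is needed whenever the copy's target lies at or below an already included path, revision numbers are ignored, and the case of an included path lying below a copy target is not handled.

```python
def path_under(path, parent):
    """True if `path` equals `parent` or lies somewhere below it."""
    return path == parent or path.startswith(parent.rstrip("/") + "/")


def copy_dependencies(include_paths, path_copies):
    """Return the copy sources needed by `include_paths`, following chains.

    Hypothetical rule: a copy's source is needed when the copy's target
    lies at or below a path that is already included, either directly or
    as an earlier dependency.  Revision numbers are ignored.
    """
    included = set(include_paths)
    dependencies = set()
    pending = list(path_copies)
    while True:
        unresolved = []
        for copyto, copyfrom in pending:
            if any(path_under(copyto, p) for p in included):
                if not any(path_under(copyfrom, p) for p in included):
                    dependencies.add(copyfrom)
                    included.add(copyfrom)
            else:
                unresolved.append((copyto, copyfrom))
        if len(unresolved) == len(pending):
            # Fixed point: this round resolved no copy, so the included
            # set did not change and a further round would find nothing.
            return dependencies
        pending = unresolved


if __name__ == "__main__":
    # /branches/a was copied from /trunk, then /branches/b from /branches/a.
    copies = [("/branches/a", "/trunk"), ("/branches/b", "/branches/a")]
    print(sorted(copy_dependencies(["/branches/b"], copies)))
    # prints ['/branches/a', '/trunk']
```

A single pass over `copies` in that order would find only /branches/a; the loop picks up /trunk in its second round. Each round drops the copies it has resolved, so the loop terminates after at most one round per copy.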
