Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2013/05/27 14:14:19 UTC

What makes merges slow in /trunk

Hi all,

Since merge will be an important topic for 1.9, I ran a quick
test to see how we are doing for small projects like SVN.
Please note that my local mirror of the ASF repo is about 2
months behind (r1457326) - in case you want to verify my data.

Summary:

Merges can be very slow and might take hours or even days
to complete for large projects. The client-side merge strategy
inflates the load on both sides by at least a factor of 10 in the
chosen scenario. Without addressing this issue, it will be hard
to significantly improve merge performance in said case but
addressing it requires a different strategy.

Test Scenario:

A catch-up merge after various cherry picks.
$svn co svn://server/apache/subversion/branches/1.7.x .
$svn merge svn://server/apache/subversion/trunk . --accept working

Findings:

Server vs. client performance
* With all caches being cold, both operations are limited by
  server-side I/O (duh!). FSFS-f7 is 3 .. 4 times faster and
  should be able to match the client speed within the next
  two months or so.
* With OS file caches hot, merge is client-bound with the
  client using about 2x as much CPU as the server.
* With server caches hot, the ratio becomes 10:1.
* The client is not I/O bound (working copy on RAM disk).

How slow is it?
Here is the fastest run for merge with hot server caches. Please
note that SVN is a mere 2k files project. Add two zeros for
large projects:
real    1m16.334s
user    0m45.212s
sys    0m17.956s

Difference between "real" and "user"
* Split roughly evenly between client and server / network
* The individual client-side functions are relatively efficient
  or else the time spent in the client code would dwarf the
  OS and server / network contributions.

Why is it slow?
* We obviously request, transfer and process too much data,
  800MB for a 45MB user data working copy:
  RX packets:588246 (av 1415 B) TX packets:71592 (av 148 B)
  RX bytes:832201156 (793.6 MB) TX bytes:10560937 (10.1 MB)
* A profile shows that most CPU time is either directly spent
  to process those 800MB (e.g. MD5) or well distributed over
  many "reasonable" functions like running status etc with
  each contributing 10% or less.
* Root cause: we run merge 169 times, i.e. merge that many
  revision ranges and request ~7800 files from the server. That
  is not necessary for most files most of the time.
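
To make that concrete, here is a toy model (plain Python, not the actual
Subversion code; the change predicate and all numbers are invented) of
how driving one whole-tree merge per revision range multiplies the
per-file requests compared to touching each file only once:

def touched(path, lo, hi):
    # Stand-in for "did any revision in [lo, hi] change this path?"
    return any((hash(path) + r) % 97 == 0 for r in range(lo, hi + 1))

paths = ["file%04d" % i for i in range(2000)]        # "a mere 2k files"
ranges = [(10 * i, 10 * i + 9) for i in range(169)]  # 169 toy rev ranges

# Current behaviour: every range that touches a file costs one request.
per_range = sum(touched(p, lo, hi) for lo, hi in ranges for p in paths)
# Lower bound: fetch each file at most once for its combined range.
per_node = sum(any(touched(p, lo, hi) for lo, hi in ranges) for p in paths)

print(per_range, "requests when driven per revision range")
print(per_node, "requests when each file is merged once")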

Necessary strategic change:
* We need to do the actual merging "space major" instead of
  "revision major".
* Determine tree conflicts in advance. Depending on the
  conflict resolution scheme, set the upper revision for the whole
  merge to the conflicting revision
* Determine per-node revision ranges to merge.
* Apply ranges ordered by their upper number, lowest one first.
* In case of unresolvable conflicts, merge all other nodes up to
  the revision that caused the conflict.
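
A rough sketch of that schedule (Python pseudocode made runnable; the
helpers, data and numbers are hypothetical, this is not the actual
libsvn_client API):

def space_major_merge(per_node_ranges, merge_one_range):
    """per_node_ranges: node -> list of (lo, hi) ranges still to merge.
    merge_one_range(node, lo, hi) returns a conflicting revision or None."""
    # Flatten and apply ordered by upper revision, lowest one first.
    work = sorted((hi, lo, node)
                  for node, node_ranges in per_node_ranges.items()
                  for lo, hi in node_ranges)
    cap = None                    # set at the first unresolvable conflict
    # (Tree conflicts detected up front would simply pre-set `cap`.)
    for hi, lo, node in work:
        if cap is not None:
            if lo > cap:
                continue          # nothing below the conflict rev is left
            hi = min(hi, cap)     # merge the other nodes only up to it
        conflict_rev = merge_one_range(node, lo, hi)
        if conflict_rev is not None and cap is None:
            cap = conflict_rev
    return cap                    # None means the catch-up merge completed

# Toy usage with a fake per-node merge that "conflicts" at r1500:
todo = {"A/f": [(1000, 1700)], "A/g": [(1000, 1200), (1400, 1700)]}
print(space_major_merge(todo, lambda node, lo, hi: 1500 if hi >= 1500 else None))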

If there have been no previous cherry pick merges, the above
strategy should be roughly equivalent to what we do right now.
In my test scenario, it should reduce the number of files requested
from the server by a factor of 5 .. 10. Operations like log and
status would need to be run only once, maybe twice. So, it
seems reasonable to expect a 10-fold speedup on the client
side and also a much lower server load.
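
(For scale, taking the numbers above at face value: ~7800 per-file
requests divided by 5 .. 10 is roughly 800 .. 1500 requests, i.e. on
the order of a single pass over the ~2k-file tree.)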

Fixing this scenario will drastically change the relative time
spent for virtually all operations involved. So, before fixing it,
it is not clear where local tuning should start.

-- Stefan^2.


Re: What makes merges slow in /trunk

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Fuhrmann writes:

> Hi all,
> 
> Since merge will be an important topic for 1.9, I ran a quick
> test to see how we are doing for small projects like SVN.
> Please note that my local mirror of the ASF repo is about 2
> months behind (r1457326) - in case you want to verify my data.
> 
> 
> Summary:
> 
> Merges can be very slow and might take hours or even days
> to complete for large projects. The client-side merge strategy
> inflates the load on both sides by at least a factor of 10 in the
> chosen scenario. Without addressing this issue, it will be hard
> to significantly improve merge performance in said case but
> addressing it requires a different strategy.

Thanks for doing a real experiment and measuring it.


> Test Scenario:
> 
> A catch-up merge after various cherry picks.
> $svn co svn://server/apache/subversion/branches/1.7.x .
> $svn merge svn://server/apache/subversion/trunk . --accept working
> 
> 
> Findings:
> 
> Server vs. client performance
> 
> * With all caches being cold, both operations are limited by
>   server-side I/O (duh!). FSFS-f7 is 3 .. 4 times faster and
>   should be able to match the client speed within the next
>   two months or so.
> 
> * With OS file caches hot, merge is client-bound with the
>   client using about 2x as much CPU as the server.
> 
> * With server caches hot, the ratio becomes 10:1.
> 
> * The client is not I/O bound (working copy on RAM disk).
> 
> 
> How slow is it?
> Here is the fastest run for merge with hot server caches. Please
> note that SVN is a mere 2k files project. Add two zeros for
> large projects:
> real    1m16.334s
> user    0m45.212s
> sys    0m17.956s
> 
> 
> Difference between "real" and "user"
> 
> * Split roughly evenly between client and server / network
> 
> * The individual client-side functions are relatively efficient
>   or else the time spent in the client code would dwarf the
>   OS and server / network contributions.
> 
> 
> Why is it slow?
> 
> * We obviously request, transfer and process too much data,
>   800MB for a 45MB user data working copy:
[...]

That's a really embarrassing statistic!

> * A profile shows that most CPU time is either directly spent
>   to process those 800MB (e.g. MD5) or well distributed over
>   many "reasonable" functions like running status etc with
>   each contributing 10% or less.
>
> * Root cause: we run merge 169 times, i.e. merge that many
>   revision ranges and request ~7800 files from the server. That
>   is not necessary for most files most of the time.
> 
> Necessary strategic change:
> 
> * We need to do the actual merging "space major" instead of
>   "revision major".
> 
> * Determine tree conflicts in advance. Depending on the 
>   conflict resolution scheme, set the upper revision for the whole
>   merge to the conflicting revision
> 
> * Determine per-node revision ranges to merge.


I agree, and I want to change our merge strategy to something roughly like this.  It would give us not only faster raw performance, but also fewer conflicts and so 
less manual intervention, because this would not break the merge of any 
given file into more revision ranges than needed for that file [1].

> * Apply ranges ordered by their upper number, lowest one first.
> 
> * In case of unresolvable conflicts, merge all other nodes up to
>   the revision that caused the conflict.


I am not sure about this part.  I think here you're trying to approximate the current "failure" mode when conflicts are encountered: that is, leave the WC in a state where all needed changes up to some uniform revision number have been merged across the whole tree.  (At least the current mode is *something* like that.)  While that is helpful for examining and understanding the state of project files, in order to resolve the conflict(s), it does break the nice idea of having each node merged in as few revision ranges as possible, and so it can introduce more conflicts.  That seems to me to be a sufficiently strong indication that it's the wrong thing to do.  We should try to find a better way here.



> If there have been no previous cherry pick merges, the above
> strategy should be roughly equivalent to what we do right now.
> 
> In my test scenario, it should reduce the number of files requested
> from the server by a factor of 5 .. 10. Operations like log and
> status would need to be run only once, maybe twice. So, it
> seems reasonable to expect a 10-fold speedup on the client
> side and also a much lower server load.
>
> Fixing this scenario will drastically change the relative time
> spent for virtually all operations involved. So, before fixing it,
> it is not clear where local tuning should start.

Yup.


[1] I hope we're all familiar with the idea that, if you need to merge two successive changes (C1, C2) into a given file, then merging C1 and then C2 separately creates more likelihood of a conflict than if you merge just the single combined change (C1+C2) in one go.  (I don't have a proof handy.)
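
One concrete case (an example, not a proof): say the target branch has
locally changed a line from "x = 1" to "x = 2", while on the source C1
changed that same line from "x = 1" to "x = 3" and C2 later changed it
back to "x = 1". Merging C1 on its own conflicts with the local edit,
whereas the combined change C1+C2 leaves the line untouched and merges
cleanly.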


- Julian