Posted to dev@subversion.apache.org by Daniel Becroft <Da...@supercorp.com.au> on 2008/09/07 23:05:23 UTC

Unexpected Merge Behaviour

Hi all,
 
Apologies for posting on the developer list. I have posted this on the
users@ originally, but did not get any explanation. I thought I would go
to the 'source' (no pun intended), to try and get an explanation.
 
=============================================
 
I have only recently started playing around with SVN's merge
capabilities, and have found something strange (SVN 1.5.2 on Windows
XP).
 
I have checked-out a fresh, unmodified copy of a branch in the
repository.
 
The structure is similar to this:
 
    \alpha
        \A
            \B
                \C.txt
    \beta
        ...
    \gamma
        ...
 
I am attempting to cherry-pick a single revision from trunk and apply it
to said working copy. The revision (r1234) only modifies a single file
(C.txt above), but when I run the following command:
 
    svn merge -c 1234 svn://localhost/repo/trunk .
 
the svn process appears to be reading EVERY file in the working copy,
including those located under the \beta and \gamma directories. As a
result, the merge takes approximately 3 minutes to complete. (I used
sysinternal's FileMonitor utility to determine this). When the working
copy contains upwards of 12,000 files, this can be extremely slow.
 
This performance does not sound so bad, but when using the 'multiple
revisions' form of the -c option, like so:
 
    svn merge -c 1234,1236,1239,1255 svn://localhost/repo/trunk .
 
it takes approximately 3 minutes PER REVISION - it seems to do a loop
through the revisions and do the same thing for each.
 
However, I have found that if I structure the merge command such that
the merge is being done on the subdirectory, then the problem is
minimized (all files under alpha/A/B are read, but not /beta or /gamma).
 
    svn merge -c 1234 svn://localhost/repo/trunk/alpha/A/B ./alpha/A/B
 
Using either version of the merge command, the only files/directories
that are modified are the parent directory, and C.txt.
 
The only explanation I got from users@ was that this might be the
'eliding' functionality.
 
Is this expected behavior for svn merge? Why is it reading completely
unrelated files when attempting a merge, rather than looking at the
changes being applied first? Is this the 'eliding' feature, and if so,
why is it doing this?
 
Cheers,
Daniel B.

Re: Unexpected Merge Behaviour

Posted by Neels Hofmeyr <ne...@elego.de>.
"
Hey Daniel,

it is best to check out subtrees of a large repository. For example
  svn checkout http://my.server/my_repos/some/sub/dir
will check out only `dir', speeding up your subsequent svn calls.

As you have observed, running svn in a subdir of a large checkout has the
same effect. That's because svn operates only on subdirectories, not on
parents (usually). But I humbly prefer separate working copies.

To tell you more, I'd also need the actual svn commands to reproduce the
problem and the complete output seen (try to copy-and-paste).
"

...This is the answer users@ should have given. About whether svn could be
made faster in this case, it seems that dev@ doesn't know what to say. Let
me try to reply from a developer's point of view.

Version control is extremely complex and so is subversion. Mostly, if you
want to know reasons for something you have to dig them up from the logs. It
seems that no-one has your answer ready at the moment.

Furthermore, subversion's behaviour does, eventually, give the right
results. So there's no pressing reason to abandon the current developments
and dive into this one...

You are welcome to investigate this issue yourself or have someone do it for
you. dev@ will be happy to assist!

Have you searched the archives for this issue? Otherwise this might qualify
for a new feature request in the issue tracker (which must have been voted
on before submitting). http://subversion.tigris.org/issue-tracker.html

Thanks for your interest,

~Neels



-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194


RE: Unexpected Merge Behaviour

Posted by Daniel Becroft <Da...@supercorp.com.au>.
<Moved to users@subversion.tigris.org>

> -----Original Message-----
> From: Mark Phippard [mailto:markphip@gmail.com] 
> Sent: Tuesday, 9 September 2008 11:36 AM
> To: Daniel Becroft
> Cc: dev@subversion.tigris.org
> Subject: Re: Unexpected Merge Behaviour
> 
> On Sun, Sep 7, 2008 at 7:05 PM, Daniel Becroft 
> <Da...@supercorp.com.au> wrote:
> 
> > Apologies for posting on the developer list. I have posted this on the
> > users@ originally, but did not get any explanation. I thought I would
> > go to the 'source' (no pun intended), to try and get an explanation.
> 
> You should really just re-raise the issue on users@ then.

Apologies, Mark. I have redirected this reply to the users@ list.

> > =============================================
> >
> > I have only recently started playing around with SVN's merge
> > capabilities, and have found something strange (SVN 1.5.2 on Windows XP).
> >
> > I have checked-out a fresh, unmodified copy of a branch in the repository.
> >
> > The structure is similar to this:
> >
> >     \alpha
> >         \A
> >             \B
> >                 \C.txt
> >     \beta
> >         ...
> >     \gamma
> >         ...
> >
> > I am attempting to cherry-pick a single revision from trunk and apply
> > it to said working copy. The revision (r1234) only modifies a single
> > file (C.txt above), but when I run the following command:
> >
> >     svn merge -c 1234 svn://localhost/repo/trunk .
> >
> > the svn process appears to be reading EVERY file in the working copy,
> > including those located under the \beta and \gamma directories. As a
> > result, the merge takes approximately 3 minutes to complete. (I used
> > sysinternal's FileMonitor utility to determine this). When the working
> > copy contains upwards of 12,000 files, this can be extremely slow.
> 
> This is the normal behavior.  It has to scan the mergeinfo of 
> the WC before the merge to communicate to the server what you 
> have so that the server knows what deltas to send.
> 
> 
> > This performance does not sound so bad, but when using the 'multiple
> > revisions' form of the -c option, like so:
> >
> >     svn merge -c 1234,1236,1239,1255 svn://localhost/repo/trunk .
> >
> > it takes approximately 3 minutes PER REVISION - it seems to do a loop
> > through the revisions and do the same thing for each.
> 
> Yes, that is exactly how it works.  The syntax is a
> convenience; the behavior is no different from executing a
> separate command for each revision.

Okay, thanks. I guess that explains why the --dry-run option is
unreliable when using the revision list.

> > However, I have found that if I structure the merge command such that
> > the merge is being done on the subdirectory, then the problem is
> > minimized (all files under alpha/A/B are read, but not /beta or /gamma).
> >
> >     svn merge -c 1234 svn://localhost/repo/trunk/alpha/A/B ./alpha/A/B
> >
> > Using either version of the merge command, the only files/directories
> > that are modified are the parent directory, and C.txt.
> 
> As you say, the behavior is the same, but by focusing the 
> tree you are just narrowing down the part it needs to look 
> at.  It is worth pointing out though that this will create 
> the mergeinfo at that part of the tree, which is exactly the 
> sort of situation the original scan has to look for.  I am 
> not saying that you should not use this as your technique.  
> But the point is that if you did it this way one time, and 
> another time you used the root URL you would still expect the 
> merge to work.  So it has to determine what is in your 
> working copy and what revisions have been merged to the 
> various subtrees.

Thanks, Mark, this makes perfect sense. I guess I just expected the
process to be more like this:

A) Obtain a list of the files changed in r1234 (similar to svn diff -c
1234 --summarize)
	- Returns alpha\A\B\C.txt

B) Scan the working copy to determine if these files require merging.
	- Would scan the C.txt file and parent directories only.

C) Communicate to the server to obtain the delta.
	- Get delta required for C.txt.

It seems that the merge process (at the moment) jumps straight to Step
B, and in my case, scans far more than is necessary.
Unfortunately, my version has an extra round trip to consider. My
situation is probably rare, and in general adding the extra server call
would actually hurt performance rather than improve it.
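
Steps A and B of the proposed flow can be sketched in a few lines of Python. This is only an illustration of the idea described above, not how svn merge works: the function name paths_to_inspect is hypothetical, and the changed-path list is assumed to have already been fetched from the server in Step A.

```python
from pathlib import PurePosixPath


def paths_to_inspect(changed_paths):
    """Step B of the proposal: each changed file plus all of its parent directories."""
    wanted = set()
    for changed in changed_paths:
        p = PurePosixPath(changed)
        wanted.add(str(p))
        # .parents yields alpha/A/B, alpha/A, alpha and finally "." (the WC root)
        wanted.update(str(parent) for parent in p.parents)
    return sorted(wanted)


# Step A is assumed to have returned this single path for r1234.
changed = ["alpha/A/B/C.txt"]
print(paths_to_inspect(changed))
# ['.', 'alpha', 'alpha/A', 'alpha/A/B', 'alpha/A/B/C.txt']
```

Under this model only five working copy nodes are examined for r1234, however many files live under \beta and \gamma; the price is the extra round trip mentioned above.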
 
> The good news is that this is an area that could see huge
> benefits from a working copy rewrite.  In principle there is
> no need to scan the whole tree just to get the mergeinfo for it.
> If this could all be retrieved from a single central location
> the overhead of doing this would be fairly minimal.  It just
> so happens that in the current working copy design it is very
> non-performant.

Sounds good.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org


Re: Unexpected Merge Behaviour

Posted by Mark Phippard <ma...@gmail.com>.
On Sun, Sep 7, 2008 at 7:05 PM, Daniel Becroft
<Da...@supercorp.com.au> wrote:

> Apologies for posting on the developer list. I have posted this on the
> users@ originally, but did not get any explanation. I thought I would go to
> the 'source' (no pun intended), to try and get an explanation.

You should really just re-raise the issue on users@ then.


> =============================================
>
> I have only recently started playing around with SVN's merge capabilities,
> and have found something strange (SVN 1.5.2 on Windows XP).
>
> I have checked-out a fresh, unmodified copy of a branch in the repository.
>
> The structure is similar to this:
>
>     \alpha
>         \A
>             \B
>                 \C.txt
>     \beta
>         ...
>     \gamma
>         ...
>
> I am attempting to cherry-pick a single revision from trunk and apply it to
> said working copy. The revision (r1234) only modifies a single file (C.txt
> above), but when I run the following command:
>
>     svn merge -c 1234 svn://localhost/repo/trunk .
>
> the svn process appears to be reading EVERY file in the working copy,
> including those located under the \beta and \gamma directories. As a result,
> the merge takes approximately 3 minutes to complete. (I used sysinternal's
> FileMonitor utility to determine this). When the working copy contains
> upwards of 12,000 files, this can be extremely slow.

This is the normal behavior.  It has to scan the mergeinfo of the WC
before the merge to communicate to the server what you have so that
the server knows what deltas to send.
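
As a rough illustration of why the cost tracks the size of the working copy rather than the size of the change, here is a minimal Python sketch. It is a toy model, not Subversion's code: the nested-dict layout, the "mergeinfo" and "dirs" keys, and the function name collect_mergeinfo are all assumptions made for illustration, and it counts directories only.

```python
def collect_mergeinfo(node, path=""):
    """Walk every directory of a toy working copy and gather explicit mergeinfo.

    Returns (found, visited): a dict of path -> mergeinfo string, and the
    number of directories that had to be opened to build it.
    """
    found = {}
    visited = 1  # this directory's metadata had to be read
    if node.get("mergeinfo"):
        found[path or "."] = node["mergeinfo"]
    for name, child in node.get("dirs", {}).items():
        child_path = f"{path}/{name}".lstrip("/")
        sub_found, sub_visited = collect_mergeinfo(child, child_path)
        found.update(sub_found)
        visited += sub_visited
    return found, visited


# Hypothetical working copy shaped like the one in this thread.
working_copy = {
    "mergeinfo": "/trunk:1200-1230",
    "dirs": {
        "alpha": {"dirs": {"A": {"dirs": {"B": {"dirs": {}}}}}},
        "beta": {"dirs": {"x": {"dirs": {}}, "y": {"dirs": {}}}},
        "gamma": {"dirs": {"z": {"dirs": {}}}},
    },
}

found, visited = collect_mergeinfo(working_copy)
print(found)    # only the root carries explicit mergeinfo
print(visited)  # yet every directory was opened to find that out
```

In this model the walk cannot know in advance that beta and gamma carry no mergeinfo, so it has to open them anyway.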


> This performance does not sound so bad, but when using the 'multiple
> revisions' form of the -c option, like so:
>
>     svn merge -c 1234,1236,1239,1255 svn://localhost/repo/trunk .
>
> it takes approximately 3 minutes PER REVISION - it seems to do a loop through
> the revisions and do the same thing for each.

Yes, that is exactly how it works.  The syntax is a convenience; the
behavior is no different from executing a separate command for each
revision.
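
A small Python sketch of that equivalence, assuming the roughly 3 minute scan reported earlier in the thread is paid once per revision. The function name merge_revisions and the cost parameter are hypothetical; the function only builds the equivalent command lines and does not run svn.

```python
def merge_revisions(revisions, scan_cost_minutes):
    """Model `-c r1,r2,...` as one independent merge per revision."""
    total_minutes = 0
    commands = []
    for rev in revisions:
        # Each pass repeats the full working-copy scan before merging.
        total_minutes += scan_cost_minutes
        commands.append(f"svn merge -c {rev} svn://localhost/repo/trunk .")
    return total_minutes, commands


total, commands = merge_revisions([1234, 1236, 1239, 1255], scan_cost_minutes=3)
print(total)  # 12
for line in commands:
    print(line)
```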


> However, I have found that if I structure the merge command such that the
> merge is being done on the subdirectory, then the problem is minimized (all
> files under alpha/A/B are read, but not /beta or /gamma).
>
>     svn merge -c 1234 svn://localhost/repo/trunk/alpha/A/B ./alpha/A/B
>
> Using either version of the merge command, the only files/directories that
> are modified are the parent directory, and C.txt.

As you say, the behavior is the same, but by focusing the tree you are
just narrowing down the part it needs to look at.  It is worth
pointing out though that this will create the mergeinfo at that part
of the tree, which is exactly the sort of situation the original scan
has to look for.  I am not saying that you should not use this as your
technique.  But the point is that if you did it this way one time, and
another time you used the root URL you would still expect the merge to
work.  So it has to determine what is in your working copy and what
revisions have been merged to the various subtrees.
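
The effect of subtree mergeinfo can be sketched as a nearest-ancestor lookup. This is a simplified Python model, assuming mergeinfo is reduced to a plain set of revision numbers per path (real mergeinfo records source paths and revision ranges); the function name effective_mergeinfo and the sample revisions other than r1234 are hypothetical.

```python
def effective_mergeinfo(explicit, path):
    """Return the revisions recorded for `path`: its own, else the nearest ancestor's."""
    parts = path.split("/") if path else []
    while True:
        key = "/".join(parts)
        if key in explicit:
            return explicit[key]
        if not parts:
            return set()  # nothing recorded anywhere up to the root
        parts.pop()


explicit = {
    "": {1200, 1210},                 # root of the working copy
    "alpha/A/B": {1200, 1210, 1234},  # subtree that received the cherry-pick
}

print(1234 in effective_mergeinfo(explicit, "alpha/A/B/C.txt"))  # True
print(1234 in effective_mergeinfo(explicit, "beta"))             # False
```

A later merge of r1234 at the root has to discover that alpha/A/B already has it while beta does not, which is why the subtrees have to be examined.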

The good news is that this is an area that could see huge benefits
from a working copy rewrite.  In principle there is no need to scan
the whole tree just to get the mergeinfo for it.  If this could all be
retrieved from a single central location the overhead of doing this
would be fairly minimal.  It just so happens that in the current
working copy design it is very non-performant.
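
To make the "single central location" idea concrete, here is a hypothetical Python sketch in which explicit mergeinfo is kept in one flat index keyed by path. The index layout and the function name mergeinfo_under are assumptions for illustration only, not a description of any actual Subversion design.

```python
def mergeinfo_under(index, root="."):
    """Return every explicit mergeinfo entry at or below `root` from a flat index."""
    if root in ("", "."):
        return dict(index)
    prefix = root.rstrip("/") + "/"
    return {
        path: info
        for path, info in index.items()
        if path == root or path.startswith(prefix)
    }


# One entry per path that carries explicit mergeinfo; nothing else is stored.
index = {
    ".": "/trunk:1200-1230",
    "alpha/A/B": "/trunk/alpha/A/B:1234",
}

print(mergeinfo_under(index, "alpha"))  # {'alpha/A/B': '/trunk/alpha/A/B:1234'}
print(mergeinfo_under(index, "beta"))   # {}
```

The lookup cost here depends on how many paths carry mergeinfo, not on how many files the working copy contains.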

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/
