You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Torsten Rueger <to...@hiit.fi> on 2004/04/28 11:19:36 UTC

3 way merge algorithm

Moi,

at the university I work, we have found a more efficient algorithm for 
3 way merging. It's works better than diff3 especially in that it 
handles moves. So it can merge cases where one person moves a piece of 
text, while the other edits a subset of it. It also handles XML merges 
surprisingly well.

Unfortunately I can not release code (which is ruby anyway), but I can 
describe the algorithm, which is quite simple, and help anyone who 
wants to implement it.
I would suggest it could be used for cases where diff3 fails, so as not 
to rock the boat too much initially.

I'd really like to hear if anyone is interested in this, even quite 
separate from wanting to implement it.

Below is a minimal description of the algorithm using a small xml 
example,

Torsten


Original:  <html><body>Stuff</body><head>Merge Example</head></html>
              1     2     3     4      5    6     7      8       9

One:       <html><head>Merge Example</head><body>Stuff</body></html>
              1  <--5     6      7      8  <--2     3    4   <--9

Two:       <html><body>Algorithm</body><head>Example</head></html>
              1     2   <--10    <--4     5 <-- 7      8      9

Merge:     <html><head>Example</head><body>Algorithm</body></html>
              1     5     7       8      2     10       4     9

Matching phase: Find the string in One and Two to map to the original. 
Add added strings
          One:  1   5-8   2-4  9
          Two   1   5     7,8    2  add:10  4 9
Merging phase: go backwards through original following the change 
pattern:
       Start with 9 in either
       Go to 4, because of change in One
       Go to 10 because of change in Two
       Go to 2 because of change in Two
       Go to 8 because of change in One
       Go to 7 because no change in original order
       Go to 5 because of change in Two
       Go to 1 because of change in One

Output in reverse order and get Merge!

While going through the original matches of the matching phase, one has 
to recognise the "changes" inside the matches. It's either that or 
splitting all matches into the non overlapping pieces that are the 
numbers. This second option has proven to be more complicated.

Matching is done al la xdelta, by splitting files into chunks, 
calculating hashes for each chunk. Then looking for equal hashes and 
expanding the match as far as possible. At the end one can add the 
strings in the "gaps" that have not been matched.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by kf...@collab.net.
Branko Čibej <br...@xbc.nu> writes:
> >We use an internal diff/patch library now, in Subversion.
>
> To be pedantic, we don't actually use an internal patch
> implementation; that is, you can't apply a patch with "svn patch", or
> anything.

Sure.  What I meant was: since svn can update to receive repos changes
into a locally modified file, we clearly have some sort of internal
patch implementation.  Whether we call it "patch" or not, or make it
available as a subcommand, doesn't change that.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: 3 way merge algorithm

Posted by Branko Čibej <br...@xbc.nu>.
kfogel@collab.net wrote:

>Torsten Rueger <to...@hiit.fi> writes:
>  
>
>>and patch process should result in less conflicts. Even if you do it
>>on a line basis. But as I understand you use external patch, so that's
>>out of the question.
>>    
>>
>
>We use an internal diff/patch library now, in Subversion.
>  
>
To be pedantic, we don't actually use an internal patch implementation; 
that is, you can't apply a patch with "svn patch", or anything.

-- Brane




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> and patch process should result in less conflicts. Even if you do it
> on a line basis. But as I understand you use external patch, so that's
> out of the question.

We use an internal diff/patch library now, in Subversion.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Torsten Rueger <to...@hiit.fi>.
On 3 May 2004, at 16:38, kfogel@collab.net wrote:
> Sure.  I wasn't talking about persuading your boss to pay you for your
> time spent working on someone else's problem, but rather persuading
> him that it's okay (i.e, doesn't cost him anything) for other projects
> to use algorithms and code already written by his team.
>

Ah, ok. You're right I misunderstood that. Unfortunately my code is of 
no use to subversion, because
-- its ruby
-- it's in memory (not streaming)
-- it's not production code and never will be 

I'll have to stop on the topic soon-ish and move on to calendaring, but 
I'll try and get our sponsors interested,

Thanks for the interest and the info with your patch adjusting. I've 
though about that some more and now think that the problem that solves, 
is one we don't have.
It really is specific to version control to merge (partial) changes 
across branches. Even in distributed synchronisation always the whole 
changes will be merged and a common ancestor created.
Also it is clear that the algorithm I showed, does not help for that 
kind of use cases that are discussed in the link you sent.

But still, the approach of using copy and add instruction is the diff 
and patch process should result in less conflicts. Even if you do it on 
a line basis. But as I understand you use external patch, so that's out 
of the question.

Torsten


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> > Would it help if some of us sent letters saying how it
> > could make a real difference to our project if the restrictions were
> > lifted?  I'd certainly be happy to do so, if you think it would.
> 
> Thank's for the offer. But I don't think it would help. Our project is
> about data synchronisation, and while I have long seen the basic
> equivalence of source control and (multi-peer) synchronisation, my
> boss has not. So improving a source control system is at this point
> not on the agenda. I have been, and will continue to work on the issue
> though.

Sure.  I wasn't talking about persuading your boss to pay you for your
time spent working on someone else's problem, but rather persuading
him that it's okay (i.e, doesn't cost him anything) for other projects
to use algorithms and code already written by his team.

Of course, we'd love to have as much of your time as possible, too... :-)

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 20:59, kfogel@collab.net wrote:
> It's too bad that a university (of all things!)
We can release code, just a year after writing. That off course is 
little use in ongoing development. Exceptions can be made, if interest 
is there, but it isn't.

> Do you know why they have this policy, if there's any chance of
> changing it?
It's department policy and the fact that we can publish at all is seen 
as a great benefit. In other departments it's different.

> Would it help if some of us sent letters saying how it
> could make a real difference to our project if the restrictions were
> lifted?  I'd certainly be happy to do so, if you think it would.

Thank's for the offer. But I don't think it would help. Our project is 
about data synchronisation, and while I have long seen the basic 
equivalence of source control and (multi-peer) synchronisation, my boss 
has not. So improving a source control system is at this point not on 
the agenda. I have been, and will continue to work on the issue though.

> You probably have a lot to read already in your job, so sorry for
> piling a new URL on your plate :-).

No way, thanks for  read. It's actually a different problem, one we 
haven't even addressed or thought about yet.
It sounds like a thorough and workable plan you have there.

The algorithm I had there avoids the topic of externalised patch 
formats completely. So it can't apply any patches to different versions 
and the whole topic you solved there, is never addressed. Good to see 
it though.

Thanks
   Torsten


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> at the university I work, we have found a more efficient algorithm for
> 3 way merging. It's works better than diff3 especially in that it
> handles moves. So it can merge cases where one person moves a piece of
> text, while the other edits a subset of it. It also handles XML merges
> surprisingly well.
> 
> Unfortunately I can not release code (which is ruby anyway), but I can
> describe the algorithm, which is quite simple, and help anyone who
> wants to implement it.
> I would suggest it could be used for cases where diff3 fails, so as
> not to rock the boat too much initially.

It's too bad that a university (of all things!) restricts you from
releasing code, and from contributing to other projects (as you
mentioned in a later mail) :-(.  Sigh.

Do you know why they have this policy, if there's any chance of
changing it?  Would it help if some of us sent letters saying how it
could make a real difference to our project if the restrictions were
lifted?  I'd certainly be happy to do so, if you think it would.

By the way, the algorithm sounds, at first glance, like the one
described in

   http://subversion.tigris.org/variance-adjusted-patching.html

You probably have a lot to read already in your job, so sorry for
piling a new URL on your plate :-).

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 06:50, Greg Hudson wrote:

> the future.  So, if you could write something up in more detail, I'd
> appreciate it.
I'll probably write a paper on it in in summer. I can post that here.

>  Also, if you can't release code, it can be tough to be
> sure that one is implementing the algorithm correctly.
True. But then it's so fresh that I don't even know that 100%. I only 
have 40 ish test cases.

Still, I was offering to help with that. That is i can read code and 
comment. Just our license make write over 10 lines difficult.
And I can spend some time with this, say say 1-3 hours a day for the 
next 2 months, so quite substantial.

> Based on your description (which I could not totally understand),
I had hoped it was easy, it is quite simple.
But please do ask, off course I left out details.

> I'm concerned that the algorithm may have O(n) memory requirements for
> n-byte inputs.
This is true. Currently it has all 3 files in memory, plus hashes. But 
it can be cut down to 2 at a time.

> We need our diff algorithm to be able to operate
> streamily, reading only sequential chunks of the inputs into memory at
> one time.

That can be done if necessary, though I don't see why. I was not 
proposing this as a diff/patch alternative (network or storage 
encoding), just for a 3 way merging. And initially only when diff3 
fails.

Still it can be done.

Thank you for being interested. I've been using subversion for a year 
now and it's great. I'm talking everyone I can into it.
For me subversion has gotten rid of these unnecessary restrictions, 
especially file moves. I'd be glad if this algorithm could help to get 
rid of quite unnecessary merge conflicts in subversion.

Torsten


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 14:15, Torsten Rueger wrote:
> The first part of the algorithm is basically like xdelta. That is 
> described in chapter 3.1 of
> http://www.dexa.org/dexa2004/index.php?include=main.php

Oops, copy paste error. I meant:

http://prdownloads.sourceforge.net/xdelta/xdfs.pdf?download

Torsten


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 06:50, Greg Hudson wrote:

> So, if you could write something up in more detail, I'd
> appreciate it.

The first part of the algorithm is basically like xdelta. That is 
described in chapter 3.1 of
http://www.dexa.org/dexa2004/index.php?include=main.php

The second part really is as simple as it's said in my description, but 
it maybe just needs some time to sink in. I didn't get it immediately 
anyway.


Torsten


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 3 way merge algorithm

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-04-28 at 07:19, Torsten Rueger wrote:
> I'd really like to hear if anyone is interested in this, even quite 
> separate from wanting to implement it.

I'm interested.  I can't guarantee that I'll put time into it, but I'd
at least like to be able to stash an algorithm description somewhere for
the future.  So, if you could write something up in more detail, I'd
appreciate it.  Also, if you can't release code, it can be tough to be
sure that one is implementing the algorithm correctly.

Based on your description (which I could not totally understand), I'm
concerned that the algorithm may have O(n) memory requirements for
n-byte inputs.  We need our diff algorithm to be able to operate
streamily, reading only sequential chunks of the inputs into memory at
one time.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org