You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Torsten Rueger <to...@hiit.fi> on 2004/04/28 11:19:36 UTC
3 way merge algorithm
Moi,
at the university I work, we have found a more efficient algorithm for
3 way merging. It's works better than diff3 especially in that it
handles moves. So it can merge cases where one person moves a piece of
text, while the other edits a subset of it. It also handles XML merges
surprisingly well.
Unfortunately I can not release code (which is ruby anyway), but I can
describe the algorithm, which is quite simple, and help anyone who
wants to implement it.
I would suggest it could be used for cases where diff3 fails, so as not
to rock the boat too much initially.
I'd really like to hear if anyone is interested in this, even quite
separate from wanting to implement it.
Below is a minimal description of the algorithm using a small xml
example,
Torsten
Original: <html><body>Stuff</body><head>Merge Example</head></html>
1 2 3 4 5 6 7 8 9
One: <html><head>Merge Example</head><body>Stuff</body></html>
1 <--5 6 7 8 <--2 3 4 <--9
Two: <html><body>Algorithm</body><head>Example</head></html>
1 2 <--10 <--4 5 <-- 7 8 9
Merge: <html><head>Example</head><body>Algorithm</body></html>
1 5 7 8 2 10 4 9
Matching phase: Find the string in One and Two to map to the original.
Add added strings
One: 1 5-8 2-4 9
Two 1 5 7,8 2 add:10 4 9
Merging phase: go backwards through original following the change
pattern:
Start with 9 in either
Go to 4, because of change in One
Go to 10 because of change in Two
Go to 2 because of change in Two
Go to 8 because of change in One
Go to 7 because no change in original order
Go to 5 because of change in Two
Go to 1 because of change in One
Output in reverse order and get Merge!
While going through the original matches of the matching phase, one has
to recognise the "changes" inside the matches. It's either that or
splitting all matches into the non overlapping pieces that are the
numbers. This second option has proven to be more complicated.
Matching is done al la xdelta, by splitting files into chunks,
calculating hashes for each chunk. Then looking for equal hashes and
expanding the match as far as possible. At the end one can add the
strings in the "gaps" that have not been matched.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by kf...@collab.net.
Branko Čibej <br...@xbc.nu> writes:
> >We use an internal diff/patch library now, in Subversion.
>
> To be pedantic, we don't actually use an internal patch
> implementation; that is, you can't apply a patch with "svn patch", or
> anything.
Sure. What I meant was: since svn can update to receive repos changes
into a locally modified file, we clearly have some sort of internal
patch implementation. Whether we call it "patch" or not, or make it
available as a subcommand, doesn't change that.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Branko Čibej <br...@xbc.nu>.
kfogel@collab.net wrote:
>Torsten Rueger <to...@hiit.fi> writes:
>
>
>>and patch process should result in less conflicts. Even if you do it
>>on a line basis. But as I understand you use external patch, so that's
>>out of the question.
>>
>>
>
>We use an internal diff/patch library now, in Subversion.
>
>
To be pedantic, we don't actually use an internal patch implementation;
that is, you can't apply a patch with "svn patch", or anything.
-- Brane
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> and patch process should result in less conflicts. Even if you do it
> on a line basis. But as I understand you use external patch, so that's
> out of the question.
We use an internal diff/patch library now, in Subversion.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Torsten Rueger <to...@hiit.fi>.
On 3 May 2004, at 16:38, kfogel@collab.net wrote:
> Sure. I wasn't talking about persuading your boss to pay you for your
> time spent working on someone else's problem, but rather persuading
> him that it's okay (i.e, doesn't cost him anything) for other projects
> to use algorithms and code already written by his team.
>
Ah, ok. You're right I misunderstood that. Unfortunately my code is of
no use to subversion, because
-- its ruby
-- it's in memory (not streaming)
-- it's not production code and never will be
I'll have to stop on the topic soon-ish and move on to calendaring, but
I'll try and get our sponsors interested,
Thanks for the interest and the info with your patch adjusting. I've
though about that some more and now think that the problem that solves,
is one we don't have.
It really is specific to version control to merge (partial) changes
across branches. Even in distributed synchronisation always the whole
changes will be merged and a common ancestor created.
Also it is clear that the algorithm I showed, does not help for that
kind of use cases that are discussed in the link you sent.
But still, the approach of using copy and add instruction is the diff
and patch process should result in less conflicts. Even if you do it on
a line basis. But as I understand you use external patch, so that's out
of the question.
Torsten
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> > Would it help if some of us sent letters saying how it
> > could make a real difference to our project if the restrictions were
> > lifted? I'd certainly be happy to do so, if you think it would.
>
> Thank's for the offer. But I don't think it would help. Our project is
> about data synchronisation, and while I have long seen the basic
> equivalence of source control and (multi-peer) synchronisation, my
> boss has not. So improving a source control system is at this point
> not on the agenda. I have been, and will continue to work on the issue
> though.
Sure. I wasn't talking about persuading your boss to pay you for your
time spent working on someone else's problem, but rather persuading
him that it's okay (i.e, doesn't cost him anything) for other projects
to use algorithms and code already written by his team.
Of course, we'd love to have as much of your time as possible, too... :-)
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 20:59, kfogel@collab.net wrote:
> It's too bad that a university (of all things!)
We can release code, just a year after writing. That off course is
little use in ongoing development. Exceptions can be made, if interest
is there, but it isn't.
> Do you know why they have this policy, if there's any chance of
> changing it?
It's department policy and the fact that we can publish at all is seen
as a great benefit. In other departments it's different.
> Would it help if some of us sent letters saying how it
> could make a real difference to our project if the restrictions were
> lifted? I'd certainly be happy to do so, if you think it would.
Thank's for the offer. But I don't think it would help. Our project is
about data synchronisation, and while I have long seen the basic
equivalence of source control and (multi-peer) synchronisation, my boss
has not. So improving a source control system is at this point not on
the agenda. I have been, and will continue to work on the issue though.
> You probably have a lot to read already in your job, so sorry for
> piling a new URL on your plate :-).
No way, thanks for read. It's actually a different problem, one we
haven't even addressed or thought about yet.
It sounds like a thorough and workable plan you have there.
The algorithm I had there avoids the topic of externalised patch
formats completely. So it can't apply any patches to different versions
and the whole topic you solved there, is never addressed. Good to see
it though.
Thanks
Torsten
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by kf...@collab.net.
Torsten Rueger <to...@hiit.fi> writes:
> at the university I work, we have found a more efficient algorithm for
> 3 way merging. It's works better than diff3 especially in that it
> handles moves. So it can merge cases where one person moves a piece of
> text, while the other edits a subset of it. It also handles XML merges
> surprisingly well.
>
> Unfortunately I can not release code (which is ruby anyway), but I can
> describe the algorithm, which is quite simple, and help anyone who
> wants to implement it.
> I would suggest it could be used for cases where diff3 fails, so as
> not to rock the boat too much initially.
It's too bad that a university (of all things!) restricts you from
releasing code, and from contributing to other projects (as you
mentioned in a later mail) :-(. Sigh.
Do you know why they have this policy, if there's any chance of
changing it? Would it help if some of us sent letters saying how it
could make a real difference to our project if the restrictions were
lifted? I'd certainly be happy to do so, if you think it would.
By the way, the algorithm sounds, at first glance, like the one
described in
http://subversion.tigris.org/variance-adjusted-patching.html
You probably have a lot to read already in your job, so sorry for
piling a new URL on your plate :-).
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 06:50, Greg Hudson wrote:
> the future. So, if you could write something up in more detail, I'd
> appreciate it.
I'll probably write a paper on it in in summer. I can post that here.
> Also, if you can't release code, it can be tough to be
> sure that one is implementing the algorithm correctly.
True. But then it's so fresh that I don't even know that 100%. I only
have 40 ish test cases.
Still, I was offering to help with that. That is i can read code and
comment. Just our license make write over 10 lines difficult.
And I can spend some time with this, say say 1-3 hours a day for the
next 2 months, so quite substantial.
> Based on your description (which I could not totally understand),
I had hoped it was easy, it is quite simple.
But please do ask, off course I left out details.
> I'm concerned that the algorithm may have O(n) memory requirements for
> n-byte inputs.
This is true. Currently it has all 3 files in memory, plus hashes. But
it can be cut down to 2 at a time.
> We need our diff algorithm to be able to operate
> streamily, reading only sequential chunks of the inputs into memory at
> one time.
That can be done if necessary, though I don't see why. I was not
proposing this as a diff/patch alternative (network or storage
encoding), just for a 3 way merging. And initially only when diff3
fails.
Still it can be done.
Thank you for being interested. I've been using subversion for a year
now and it's great. I'm talking everyone I can into it.
For me subversion has gotten rid of these unnecessary restrictions,
especially file moves. I'd be glad if this algorithm could help to get
rid of quite unnecessary merge conflicts in subversion.
Torsten
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 14:15, Torsten Rueger wrote:
> The first part of the algorithm is basically like xdelta. That is
> described in chapter 3.1 of
> http://www.dexa.org/dexa2004/index.php?include=main.php
Oops, copy paste error. I meant:
http://prdownloads.sourceforge.net/xdelta/xdfs.pdf?download
Torsten
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Torsten Rueger <to...@hiit.fi>.
On 29 Apr 2004, at 06:50, Greg Hudson wrote:
> So, if you could write something up in more detail, I'd
> appreciate it.
The first part of the algorithm is basically like xdelta. That is
described in chapter 3.1 of
http://www.dexa.org/dexa2004/index.php?include=main.php
The second part really is as simple as it's said in my description, but
it maybe just needs some time to sink in. I didn't get it immediately
anyway.
Torsten
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: 3 way merge algorithm
Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-04-28 at 07:19, Torsten Rueger wrote:
> I'd really like to hear if anyone is interested in this, even quite
> separate from wanting to implement it.
I'm interested. I can't guarantee that I'll put time into it, but I'd
at least like to be able to stash an algorithm description somewhere for
the future. So, if you could write something up in more detail, I'd
appreciate it. Also, if you can't release code, it can be tough to be
sure that one is implementing the algorithm correctly.
Based on your description (which I could not totally understand), I'm
concerned that the algorithm may have O(n) memory requirements for
n-byte inputs. We need our diff algorithm to be able to operate
streamily, reading only sequential chunks of the inputs into memory at
one time.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org