
Posted to dev@subversion.apache.org by William Uther <wi...@cs.cmu.edu> on 2002/08/09 16:06:40 UTC

Saving merge history (was: Re: merge should copy-with-history)

On Friday, August 9, 2002, at 05:54  AM, Nuutti Kotivuori wrote:

[snip]
> Or, you can tell me that "In the future,
> we can show the log messages from the branch at that point as well,
> when we record merge sources." That's fine, too - I'll just say "Give
> me an option for which added and modified files behave the same when
> it's implemented, and I'll be happy."

I think this is the answer.  However, one thing still worries me.  I 
don't believe svn is storing the merge data yet.  This means that any 
merges you do with this code will be tricky to fix even once svn deals 
with merge history - they will not have merge history stored.

If I were to make a patch that had svn record merge history, but not use 
it, would it have any chance of being applied?  It seems like a small 
change with little chance of breaking anything.  (Someone previously 
mentioned that they would be -1 on a patch for the whole repeated merge 
thing before 1.0 because there wouldn't be time to discuss it properly.  
I think the data you need to store is actually pretty simple and 
wouldn't need much discussion - it is the easy half of the problem.  It 
could always be converted later anyway.)

Anyway here is a proposed format:

When you merge data into a file, the merge occurs normally except for 
the svn:merge property.  The svn:merge property is not merged, but 
instead a single line is appended to it:

rev-A URL-A rev-B URL-B

rev-A is the start rev of the merge (or * if this is an 'add with 
history' merge - the file didn't exist in revA)
URL-A is the start URL of the file that was merged (this is a * if rev-A 
is a *)
rev-B is the end rev of the merge
URL-B is the end URL of the file that was merged (use relative URLs so 
that, if the repos moves, nothing breaks)
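
As a rough illustration of the line format above, here is a small Python sketch that builds and parses one svn:merge entry. The type and function names (MergeEntry, format_merge_line, parse_merge_line, append_merge_line) are hypothetical, not part of any Subversion API, and the sketch assumes URLs contain no whitespace.

```python
from typing import NamedTuple, Optional


class MergeEntry(NamedTuple):
    """One line of the proposed svn:merge property (hypothetical type)."""
    rev_a: Optional[int]   # None stands for '*', an add-with-history merge
    url_a: Optional[str]   # None stands for '*'
    rev_b: int
    url_b: str


def format_merge_line(entry: MergeEntry) -> str:
    """Render an entry as 'rev-A URL-A rev-B URL-B'."""
    if (entry.rev_a is None) != (entry.url_a is None):
        raise ValueError("rev-A and URL-A must both be '*' or both be set")
    rev_a = "*" if entry.rev_a is None else str(entry.rev_a)
    url_a = "*" if entry.url_a is None else entry.url_a
    return f"{rev_a} {url_a} {entry.rev_b} {entry.url_b}"


def parse_merge_line(line: str) -> MergeEntry:
    """Parse one line; assumes URLs contain no whitespace."""
    rev_a, url_a, rev_b, url_b = line.split()
    if (rev_a == "*") != (url_a == "*"):
        raise ValueError("rev-A and URL-A must both be '*' or both be set")
    return MergeEntry(
        None if rev_a == "*" else int(rev_a),
        None if url_a == "*" else url_a,
        int(rev_b),
        url_b,
    )


def append_merge_line(prop_value: str, entry: MergeEntry) -> str:
    """Append a single line to the property value; nothing is rewritten."""
    return prop_value + format_merge_line(entry) + "\n"
```

Because the property is only ever appended to, recording a merge never needs to read or interpret the earlier lines.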

There is one detail that I think is important:  If merge copies a 
directory that didn't exist in revA (ie the dir was added with history) 
then you want to add the merge property to each file individually.  This 
does not affect the time complexity as all those files are being copied 
into your working copy anyway.

This is storing all the information passed on the command line, so I 
don't think it leaves out any information.

later,

\x/ill           :-}

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Branko Čibej <br...@xbc.nu> writes:
> If we implemented this proposal now, we'd just have to rip it out
> later. Right now, our repeated-merge problem is no worse than CVS's,
> and is a bit better in places -- the possible exception is our use of
> diff3 instead of an (the?) integrated diff library. I think that is
> sufficient for 1.0.

Agree in general, but we should also keep an open mind about an
incremental solution here.  For example, an initial implementation of
merge history preservation might record the stated
revision-range/paths of the merge (i.e., the ones given by the user),
and not know anything about the diff-hunk level.  This gives us a very
close approximation of the true merge history, modulo any local mods
made to the merge before the commit.

I guess what I'm saying is: I don't think we're going to solve this
whole problem at once.  But if we're careful, we can make sure that
each step we take contributes to the steps we take later.

-Karl


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Nuutti Kotivuori <na...@iki.fi>.

brane@xbc.nu wrote:
> Hm, that is a good point. In fact, I'm inclined to say this point is
> more important than getting 'svn log' "right" (for some definition
> of "right" -- I think it's more right to show too many logs than to
> show too few, which is so not right that it's wrong :-).

Yeah, it is more important than getting 'svn log' "right". Just that
it is said somewhere that 'svn log' output currently is not "right".

And to yet again restate my problem with all this -
inconsistency. Show too many or show too few, but do it similarly for
every file.

> Let's talk about this a bit more, then.

Yeah, let's :-)

-- Naked


Making merge histories reproducible for v1

Posted by Bill Tutt <ra...@lyra.org>.


Ok folks. I was going to just reply to the current thread when I
realized that I should just start a new thread and explain the idea I
have for v1 in this space.

Goal: To easily store merge history in v1.0, and ensure that
dump/copy/reload can reproduce this post-v1.0 so that whatever schema we
end up having then can do whatever it wants with the data.

High level overview:

The problem essentially breaks down into two interesting sub-problems:

1) Recording merge history related to file/directory modifications.
2) Handling new files/directories in the source of the tree merge into
the destination.

Proposal for #1:

An ancestor-set-like representation for file merges is identical to what
Branko and Will have been talking about. That is, append a line to the
"svn:merge" property every time a merge happens.

e.g.: 

Proposal for #2:

While conferring with Branko about the necessary changes required to
make these kinds of merge histories reproducible, we've come to the
conclusion that whatever way you slice it, we need a slight change in
Subversion's data model.

I'll present a couple of complete alternatives for handling this problem
so the folks who might end up doing the work for 1.0 can pick between
various implementation ideas, or use the below to come up with their own
idea. 

However, first I'll present the semantics I'd like to see happen for
this particular subcase of a merge, hereafter known as the merge-copy.

Definitions:
Merge: 
v. merged, merg-ing, merg-es 
v. tr.
1. To cause to be absorbed, especially in gradual stages. 
2. To combine or unite: merging two sets of data.

v. intr.
1. To blend together, especially in gradual stages. 
2. To become combined or united. See Synonyms at mix.

Source: www.dictionary.com: The American Heritage Dictionary of the
English Language, Fourth Edition.

Ideal Proposed Semantics:

As you can guess from the above definition, merges are about swallowing
up changes from source branches and uniting them with the target branch.


New files/directories fall into four distinct subclasses:
1) Complex copy destinations, (this sadly includes moved
files/directories)
You could call these locations branch points if you like the branch
metaphor. 
2) Files.
3) Directories.
4) A simply renamed file. 

Here's an example simple file rename: "svn mv ./fileA ./fileB"

The distinguishing characteristic about a simple file rename is that the
destination of the move operation is the same directory. 

i.e.:
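bool IsSimpleRename(NodeRevision nr)
{
  /* Look up the node revision this one was copied (moved) to. */
  NodeRevision CopyDest = lookupCopyDestination(nr.CopyID);

  /* A simple rename keeps the same node and stays in the same
     directory, so compare the parent directories of the two paths. */
  return (nr.NodeID == CopyDest.NodeID
          && strcmp(dirname(currentPathToNR),
                    dirname(committedPathToCopyDest)) == 0);
}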

Sadly, we can't easily distinguish between copies and non-simple moves
because we've already lost that data.

This is probably the most annoying part of Subversion's current data
model. :(

Now that we have our 4 classes of merge-able entities, let me explain
how I'd ideally like to handle them.

1) Complex Copy Destinations
NewNodeRevisionPK = OldNodeID.NewCopyID.CurrentTxnID

The reasoning for this is simple. We're merging in a branch so therefore
the merged entity should also be a branch.

2) Files
3) Directories
4) Simply Renamed File
All 3 of these cases thankfully have identical semantics:

NewNodeRevisionPK = OldNodeID.ParentDestCopyID.CurrentTxnID

This is a little trickier to understand, so I'll run through a couple of
examples.

e.g.:
svn add trunk/file1; svn commit
svn cp trunk branch; svn commit
svn add branch/file2; svn commit
svn merge branch trunk; svn commit

The 'ParentDestCopyID' in the above rule means that 'trunk/file2' after
the merge will have trunk's CopyID. 

e.g.: A more complicated example
svn add trunk/file1; svn commit
svn cp trunk branch; svn commit
svn add branch/dir2; svn commit
svn add branch/dir2/file2; svn commit
svn add branch/dir2/file3; svn commit
svn edit branch/file1; svn commit
svn merge branch trunk; svn commit

The merge operation will commit one merge-copy operation, and one normal
merge modification change operation.

trunk/file1 will have a "svn:merge" property with an appropriate value
for branch/file1.

trunk/dir2 will have OldDir2NodeID.TrunkCopyID.CurrentTxnID as its
primary key.

The reason we're doing things this way, is because we're bringing items
into our existing branch. (aka merge) 
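
The two key rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NodeRevKey and merge_copy_key are hypothetical names, and the IDs are treated as opaque strings rather than whatever the real schema uses.

```python
from typing import NamedTuple


class NodeRevKey(NamedTuple):
    """NodeID.CopyID.TxnID, written here as three opaque strings."""
    node_id: str
    copy_id: str
    txn_id: str

    def __str__(self) -> str:
        return f"{self.node_id}.{self.copy_id}.{self.txn_id}"


def merge_copy_key(old: NodeRevKey, is_complex_copy_dest: bool,
                   parent_dest_copy_id: str, new_copy_id: str,
                   current_txn_id: str) -> NodeRevKey:
    """Key for a node brought into the merge target by a merge-copy.

    Complex copy destinations (case 1) get a brand-new CopyID. Plain
    files, directories, and simple renames (cases 2-4) take the CopyID
    of the destination's parent, so they join the branch being merged
    into. The NodeID is always carried over from the source node.
    """
    copy_id = new_copy_id if is_complex_copy_dest else parent_dest_copy_id
    return NodeRevKey(old.node_id, copy_id, current_txn_id)
```

For the dir2 example, passing trunk's CopyID as parent_dest_copy_id yields OldDir2NodeID.TrunkCopyID.CurrentTxnID.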

See, Copy is synonymous with Branch, how about that.... ;)

Preserving Reproducibility:
	"Keeping Merge Information through a Dump/Load Cycle"

We need to record these merge-copies as being merge-copies someplace.
Otherwise, we won't be able to tell later how these newly inserted rows
got there.

So, let's add the data to the table where all data like this belongs.

i.e.:

   CHANGE ::= ("change" PATH ID CHANGE-KIND TEXT-MOD PROP-MOD) ;
   CHANGE-KIND ::= "add" | "delete" | "replace" | "modify"
                 | "merge-copy" ;
   TEXT-MOD ::= atom ;
   PROP-MOD ::= atom ;

When CHANGE-KIND is "merge-copy" PATH, and ID are the destination PATH,
and the new NodeRevisionPK value.
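
A minimal Python sketch of a change record with the extra kind, assuming the fields map one-to-one onto the grammar above. The Change class and the merge_copies helper are hypothetical, and the on-disk skel encoding is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# The four existing kinds plus the proposed "merge-copy".
CHANGE_KINDS = ("add", "delete", "replace", "modify", "merge-copy")


@dataclass(frozen=True)
class Change:
    path: str          # for "merge-copy": the destination path
    node_rev_id: str   # for "merge-copy": the new NodeRevisionPK
    kind: str
    text_mod: bool
    prop_mod: bool

    def __post_init__(self) -> None:
        if self.kind not in CHANGE_KINDS:
            raise ValueError(f"unknown CHANGE-KIND: {self.kind!r}")


def merge_copies(changes: Iterable[Change]) -> Dict[str, str]:
    """Map destination path to new node revision ID for merge-copies."""
    return {c.path: c.node_rev_id for c in changes if c.kind == "merge-copy"}
```

A dump/load pass could use something like merge_copies to tell merge-created nodes apart from ordinary adds.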

If they want to know where the data was merged from they can look at the
listed ID's predecessor data, and lookup the path via the Changes table.
:)

If the merge-copy was a branch point, then the new CopyID gets inserted
into the Transaction table's list of copies.

What about 1.0, I'm not doing all of this before 1.0?:
	"Make less work for Mike, Brane, etc.."

Skipping handling simple file renames is an easy cop out of course, but
let's take a look at Will's idea.

Will's Idea Restated:
Instead of using the ParentDestCopyID, introduce a new row in the Copy
table, but call it a merge instead of a copy. This is equivalent to my
change to the Changes table. 

However, he didn't specify any semantics after that. In order for my
ideal semantics to hold, the "choose_copy_id" logic would have to be
subtly changed to ignore merge point CopyIDs. Otherwise, non-merge
related files underneath this new directory would inherit this new
CopyID which doesn't make any sense. Merging doesn't create more
branches, merges should slide into the branch they're uniting with. 
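
To make the "ignore merge point CopyIDs" idea concrete, here is a hypothetical Python sketch. It is not the real choose_copy_id logic: it assumes a new node simply inherits the CopyID of its nearest ancestor, that each CopyID is tagged as either a copy or a merge in the Copy table, and that "0" is the fallback ID.

```python
from typing import Iterable, Tuple

COPY = "copy"
MERGE = "merge"


def choose_inherited_copy_id(ancestors: Iterable[Tuple[str, str]],
                             default: str = "0") -> str:
    """Pick the CopyID a new child node should inherit.

    `ancestors` lists (copy_id, kind) pairs from the nearest parent up
    to the root. Merge-point CopyIDs are skipped, so the child lands in
    the branch that the merge was united with rather than in a branch
    of its own. `default` is an assumed fallback, not a real constant.
    """
    for copy_id, kind in ancestors:
        if kind != MERGE:
            return copy_id
    return default
```

With the unmodified logic, the first entry would win even when it is a merge point, which is the inheritance problem described above.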

Now even if we want to leave the choose_copyid logic as is, it's true
that with the new distinguishing characteristic of merge vs. copy in the
Copy table that we do have enough information to go back and fixup the
data post-1.0.

Anyone with free time want to pick up this work?

I hope this is easier to follow than my previous multi-page
explanations.

Crossing his fingers,
Bill




Re: Saving merge history

Posted by Branko Čibej <br...@xbc.nu>.

William Uther wrote:

> I'm all for storing more information.  There are currently three 
> different "log formats":  The original, the current, and the one 
> described in Issue #820.  I think the current format is better than 
> the one before the copy-with-history patch.  I think the Issue #820 
> one is better still (It doesn't lose any information, and adds 
> consistency).
>
> To sum up the changes I think we need to make to store merge info (in 
> O(1) space):
>
> i) Files that are modified during a merge get a line added to their 
> svn:merged property.  This stores enough to re-create the command line.

Do you want to just recreate the sequence of merge commands, or just 
record enough information to generate _equivalent_ merge commands? Your 
solution is sufficient for the first case, but it wastes space in 
general. For example:

    $ svn merge -r1:4 branch trunk
    $ svn merge -r3:7 branch trunk

With your solution, svn:merged on trunk/* would look like this:

    branch 1:4
    branch 3:7

A more compact representation, that would give the same result (as far 
as we know without finer granulation!) would merge those ranges into

    branch 1:7

That's what I'd propose: svn:merge contains one line per merge source 
(filesystem path, not URL), with a compressed list of relevant ranges. 
After a more complex set of merges:

    $ svn merge -r1:4 foo trunk
    $ svn merge -r3:4 bar trunk
    $ svn merge -r3:4 qux trunk
    $ svn merge -r7:12 foo trunk
    $ svn merge -r4:8 qux trunk

you'd get:

    bar 3:4
    foo 1:4 7:12
    qux 3:8
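
The compression step can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, ranges are assumed to be forward (start below end), and two ranges are coalesced when they overlap or touch, which is what the examples above imply.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int]


def compress_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Coalesce revision ranges that overlap or touch (end == next start)."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def merge_property(merges: Iterable[Tuple[str, int, int]]) -> str:
    """Build the property text: one line per source, sorted by name."""
    by_source: Dict[str, List[Range]] = defaultdict(list)
    for source, start, end in merges:
        by_source[source].append((start, end))
    lines = []
    for source in sorted(by_source):
        spans = " ".join(
            f"{a}:{b}" for a, b in compress_ranges(by_source[source]))
        lines.append(f"{source} {spans}")
    return "\n".join(lines)
```

Fed the five merges above as (source, start, end) tuples, merge_property reproduces the three-line result; 1:4 and 7:12 on foo stay separate because revisions 5 through 7 were never merged.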

> ii) Files/Dirs that are copied during a merge get a "merge" line in 
> the copies table, not a "copy" line.  Pre-1.0 these two could be 
> treated identically, although that would change post-1.0.  (see 
> http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=19798)
>
> To make part ii) work, the difference between a merge and a copy needs 
> to be added to the wire protocol somewhere to get the information back 
> to the server, and the dump/load format needs to change to record the 
> difference between the two types of copies.

Are you certain that would be enough to "fix" 'svn log' for you? I'm 
asking because I don't know the details of how 'svn log' follows copy 
history. I certainly don't see a problem with this, but I'd like to see 
comments from our Schema King first.

> At the moment I'm waiting to see how issue #838 resolves, because that 
> affects how change ii) would be made.

Fixing #838 will involve changing the use of svn_client_copy (back) to 
svn_wc_add (with history), for now. Bill Tutt proposed a change in the 
way CopyIDs are produced for merges and lazy copies, but that would 
entail a pretty big libsvn_fs change, and I'm not sure it's worth it.

Anyway, I fail to see how #838 affects (ii), because the semantics 
(e.g., merge does a copy behind the scenes) must remain, IMNSHO.

> Do you still think this is not enough information?

I suppose this could be a workable first step, yes. Care to write the 
spec, and implement it? :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Monday, August 12, 2002, at 05:25  PM, Branko Čibej wrote:

> William Uther wrote:
>>
>> On Monday, August 12, 2002, at 07:50  AM, brane@xbc.nu wrote:
>>
>>> You have to be able to track individual diff hunks, not just whole
>>> files. For example, after a merge each diff hunk in the merge
>>> candidates can be in one of the following states:
>>
>> [snip - states]
>>
>> Okie - I see this as internal to the delta combiner.  I don't see why 
>> we need to store merge information on this level.
>
> You misunderstand. The delta combiner doesn't need this information. We 
> need it for variance-adjusted (a.k.a genetic) merging.

Heh, but variance-adjusted merging can be built on top of the 
delta-combiner... I wrote an email to the dev list about this back in 
February :).
(http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgId=162878)

(Although I don't know much about svndiff, so that email may be a little 
naive.  I think it is pretty close though.)

later,

\x/ill        :-}


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Branko Čibej <br...@xbc.nu>.

William Uther wrote:

>
> On Monday, August 12, 2002, at 07:50  AM, brane@xbc.nu wrote:
>
>> You have to be able to track individual diff hunks, not just whole
>> files. For example, after a merge each diff hunk in the merge
>> candidates can be in one of the following states:
>
> [snip - states]
>
> Okie - I see this as internal to the delta combiner.  I don't see why 
> we need to store merge information on this level.


You misunderstand. The delta combiner doesn't need this information. We 
need it for variance-adjusted (a.k.a genetic) merging.

>> What you propose only gives you "merged" and "ignored" info for whole 
>> file contents.
>
>
> Er, it gives you enough to recreate the original svn command line, and 
> hence enough to do anything else.  Unless you are planning to save 
> information about exactly how the user resolved conflicts (more than 
> just looking at the result of that resolution, which you have), then 
> this has all the information.


This is exactly why we'd use this information.

>>> I don't see that tracking individual diff
>>> hunks is necessary though - just a modified form of the delta combiner
>>
>>                               ^^^^
>>
>> Oh, right. You're absolutely right. A changeset engine _is_ a
>> modified form of the delta combiner. The trouble is that the "just"
>> happens to hide a slightly huge amount of work. :-)
>
>
> I agree that USING this information is a lot of work.  And there is 
> quite possibly a format that would help that.  That is post-1.0 though.


No, producing it in a useful format isn't trivial, either. My example 
with the hunk states only applies to merging. There are all sorts of 
things you have to know in order to produce a minimal, non-conflicting 
set of changesets -- very much like what BK can do.

But let's leave that for post-1.0.

(I'll reply to the rest of your post in a separate message.)


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Monday, August 12, 2002, at 07:50  AM, brane@xbc.nu wrote:

> Quoting William Uther <wi...@cs.cmu.edu>:
>> The proposed change saves enough information to re-create the original
>> merge command to svn.  What else does 'real changesets' require?
>
> You have to be able to track individual diff hunks, not just whole
> files. For example, after a merge each diff hunk in the merge
> candidates can be in one of the following states:
[snip - states]

Okie - I see this as internal to the delta combiner.  I don't see why we 
need to store merge information on this level.

> What you propose only gives you "merged" and "ignored" info for whole 
> file contents.

Er, it gives you enough to recreate the original svn command line, and 
hence enough to do anything else.  Unless you are planning to save 
information about exactly how the user resolved conflicts (more than 
just looking at the result of that resolution, which you have), then 
this has all the information.

>> I don't see that tracking individual diff
>> hunks is necessary though - just a modified form of the delta combiner
>                               ^^^^
>
> Oh, right. You're absolutely right. A changeset engine _is_ a
> modified form of the delta combiner. The trouble is that the "just"
> happens to hide a slightly huge amount of work. :-)

I agree that USING this information is a lot of work.  And there is 
quite possibly a format that would help that.  That is post-1.0 though.

>> If you don't save merge history now, even given that you don't use it
>> in 1.0, then you are losing information.  The logs on those copied
>> dirs are never going to be correct if you don't save the merge
>> information now.  If there is an easy way to save that information,
>> and even if we end up re-formatting it later, saving it would seem to
>> be a good thing.
>
> Hm, that is a good point. In fact, I'm inclined to say this point is
> more important than getting 'svn log' "right" (for some definition of
> "right" -- I think it's more right to show too many logs than to show
> too few, which is so not right that it's wrong :-).

I'm all for storing more information.  There are currently three 
different "log formats":  The original, the current, and the one 
described in Issue #820.  I think the current format is better than the 
one before the copy-with-history patch.  I think the Issue #820 one is 
better still (It doesn't lose any information, and adds consistency).

> Let's talk about this a bit more, then.

To sum up the changes I think we need to make to store merge info (in 
O(1) space):

i) Files that are modified during a merge get a line added to their 
svn:merged property.  This stores enough to re-create the command line.
ii) Files/Dirs that are copied during a merge get a "merge" line in the 
copies table, not a "copy" line.  Pre-1.0 these two could be treated 
identically, although that would change post-1.0.  (see 
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=19798)

To make part ii) work, the difference between a merge and a copy needs 
to be added to the wire protocol somewhere to get the information back 
to the server, and the dump/load format needs to change to record the 
difference between the two types of copies.

At the moment I'm waiting to see how issue #838 resolves, because that 
affects how change ii) would be made.

Do you still think this is not enough information?

later,

\x/ill        :-}


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by br...@xbc.nu.

Quoting William Uther <wi...@cs.cmu.edu>:

> 
> On Sunday, August 11, 2002, at 09:03  PM, Branko Čibej wrote:
> 
> > Apart from merges and becoming O(N)? Well, I think it's a nice idea,
> > but rather naïve and deserves to be scuttled on its maiden voyage, like
> > the Vasa. :-)
> >
> > What we should ask ourselves is this: What do we want merge history
> > for? The simple answer is, "To avoid the repeated-merge problem." Now,
> > while this solution tells us which versions of which files contributed
> > to a file's contents, it doesn't give us enough information to
> > implement real changesets.
> 
> Another reason to save merge history: getting the logs right.  See Issue
> #820.

Sure. I'm just not convinced they're wrong. :-)

> The proposed change saves enough information to re-create the original
> merge command to svn.  What else does 'real changesets' require?

You have to be able to track individual diff hunks, not just whole files. For
example, after a merge each diff hunk in the merge candidates can be in one of
the following states:

   - merged:   the hunk didn't cause a conflict, and the
               whole hunk is now in the merge target.
   - ignored:  the hunk didn't cause a conflict, but was
               not included in the merge target.
   - modified: the hunk didn't cause a conflict, but was
               modified in the merge target.
   - accepted: the hunk caused a conflict, but the whole
               hunk was chosen as the merge result.
   - rejected: the hunk caused a conflict, but the hunk
               from the other merge target was chosen.
   - resolved: the hunk caused a conflict, which was resolved
               by modifying (manually merging) both merge hunks.

The (merged, ignored) and (accepted, rejected) pairs are complementary, and the
set of states is symmetric WRT the conflicted/not conflicted boundary, yielding
the following table:

                | use all  | use none | use part
    ------------+----------+----------+----------
    no conflict | merged   | ignored  | modified
    ------------+----------+----------+----------
    conflict    | accepted | rejected | resolved
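
The table is a plain two-key lookup, which a short Python sketch makes explicit. The enum and function names here are hypothetical; only the six state names and the row/column labels come from the table.

```python
from enum import Enum


class Use(Enum):
    """How much of the incoming hunk ended up in the merge target."""
    ALL = "use all"
    NONE = "use none"
    PART = "use part"


# (hunk caused a conflict?, how much was used) -> hunk state
HUNK_STATES = {
    (False, Use.ALL): "merged",
    (False, Use.NONE): "ignored",
    (False, Use.PART): "modified",
    (True, Use.ALL): "accepted",
    (True, Use.NONE): "rejected",
    (True, Use.PART): "resolved",
}


def hunk_state(conflict: bool, use: Use) -> str:
    """Return the state name for one diff hunk after a merge."""
    return HUNK_STATES[(conflict, use)]
```

Recording only a revision range per file amounts to knowing the "use all" or "use none" column for the file as a whole, with no per-hunk row.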


What you propose only gives you "merged" and "ignored" info for whole file contents.


> > That simply isn't good enough. We know (at least I know :-) that we'll
> > have to at least simulate true changeset knowledge in order to get
> > variance-adjusted merging. To do that, we'll need a way to track
> > individual diff hunks, their relatedness and collisions. I have a strong
> > feeling that this cannot be done efficiently in properties. The word
> > that comes to my mind is "micro-branches", but that's all it is for
> > now -- just a word, without much substance behind it. I suspect the
> > schema modification that gives us efficient changeset management will
> > also give us efficient distributed repositories, and a _lot_ of thought
> > has to go into that (Bill, don't get hit by a car yet! :-).
> 
> I'll have to take your word for this...  I've also been thinking about
> distributed repositories, and I can see that you may be right in that it
> is related to merge history.  I don't see that tracking individual diff
> hunks is necessary though - just a modified form of the delta combiner
                              ^^^^

Oh, right. You're absolutely right. A changeset engine _is_ a modified form of
the delta combiner. The trouble is that the "just" happens to hide a slightly
huge amount of work. :-)

> (though I suspect you know a LITTLE bit more about this than me :).

Heh, I wish I did... I'm learning along the way, too.


> > If we implemented this proposal now, we'd just have to rip it out
> > later. Right now, our repeated-merge problem is no worse than CVS's,
> > and is a bit better in places -- the possible exception is our use of
> > diff3 instead of an (the?) integrated diff library. I think that is
> > sufficient for 1.0.
> 
> If you don't save merge history now, even given that you don't use it in
> 1.0, then you are losing information.  The logs on those copied dirs are
> never going to be correct if you don't save the merge information now. 
> If there is an easy way to save that information, and even if we end up
> re-formatting it later, saving it would seem to be a good thing.

Hm, that is a good point. In fact, I'm inclined to say this point is more
important than getting 'svn log' "right" (for some definition of "right" -- I
think it's more right to show too many logs than to show too few, which is so
not right that it's wrong :-).

Let's talk about this a bit more, then.


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Sunday, August 11, 2002, at 09:03  PM, Branko Čibej wrote:

> Apart from merges and becoming O(N)? Well, I think it's a nice idea, 
> but rather naïve and deserves to be scuttled on its maiden voyage, like 
> the Vasa. :-)
>
> What we should ask ourselves is this: What do we want merge history 
> for? The simple answer is, "To avoid the repeated-merge problem." Now, 
> while this solution tells us which versions of which files contributed 
> to a file's contents, it doesn't give us enough information to 
> implement real changesets.

Another reason to save merge history: getting the logs right.  See Issue 
#820.

The proposed change saves enough information to re-create the original 
merge command to svn.  What else does 'real changesets' require?

> That simply isn't good enough. We know (at least I know :-) that we'll 
> have to at least simulate true changeset knowledge in order to get 
> variance-adjusted merging. To do that, we'll need a way to track 
> individual diff hunks, their relatedness and collisions. I have a strong 
> feeling that this cannot be done efficiently in properties. The word 
> that comes to my mind is "micro-branches", but that's all it is for 
> now -- just a word, without much substance behind it. I suspect the 
> schema modification that gives us efficient changeset management will 
> also give us efficient distributed repositories, and a _lot_ of thought 
> has to go into that (Bill, don't get hit by a car yet! :-).

I'll have to take your word for this...  I've also been thinking about 
distributed repositories, and I can see that you may be right in that it 
is related to merge history.  I don't see that tracking individual diff 
hunks is necessary though - just a modified form of the delta combiner 
(though I suspect you know a LITTLE bit more about this than me :).

> If we implemented this proposal now, we'd just have to rip it out 
> later. Right now, our repeated-merge problem is no worse than CVS's, 
> and is a bit better in places -- the possible exception is our use of 
> diff3 instead of an (the?) integrated diff library. I think that is 
> sufficient for 1.0.

If you don't save merge history now, even given that you don't use it in 
1.0, then you are losing information.  The logs on those copied dirs are 
never going to be correct if you don't save the merge information now.  
If there is an easy way to save that information, and even if we end up 
re-formatting it later, saving it would seem to be a good thing.

later,

\x/ill        :-}


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Branko Čibej <br...@xbc.nu>.

Karl Fogel wrote:

>Golly.
>
>That's pretty simple.  But I can't see anything wrong with it -- at
>the very worst, it preserves too much information (or rather fails to
>compress information), but we can always deal with that later.
>
>Thoughts, folks?  Brane, GregH? :-)
>  
>

Apart from merges and becoming O(N)? Well, I think it's a nice idea, but 
rather naïve and deserves to be scuttled on its maiden voyage, like the 
Vasa. :-)

What we should ask ourselves is this: What do we want merge history for? 
The simple answer is, "To avoid the repeated-merge problem." Now, while 
this solution tells us which versions of which files contributed to a 
file's contents, it doesn't give us enough information to implement real 
changesets.

That simply isn't good enough. We know (at least I know :-) that we'll 
have to at least simulate true changeset knowledge in order to get 
variance-adjusted merging. To do that, we'll need a way to track 
individual diff hunks, their relatedness and collisions. I have a strong 
feeling that this cannot be done efficiently in properties. The word 
that comes to my mind is "micro-branches", but that's all it is for now 
-- just a word, without much substance behind it. I suspect the schema 
modification that gives us efficient changeset management will also give 
us efficient distributed repositories, and a _lot_ of thought has to go 
into that (Bill, don't get hit by a car yet! :-).

If we implemented this proposal now, we'd just have to rip it out later. 
Right now, our repeated-merge problem is no worse than CVS's, and is a 
bit better in places -- the possible exception is our use of diff3 
instead of an (the?) integrated diff library. I think that is 
sufficient for 1.0.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Friday, August 9, 2002, at 12:46  PM, Karl Fogel wrote:

> I have one concern: Sure, we can use relative URLs so that the repos
> doesn't break, but then we're not preserving information that might be
> useful to support cross-repository merge history later on.  (Of
> course, "host.domain/path" is a pretty poor global unique identifier
> for repositories anyway, so I'm not saying this feature was ready for
> prime time or anything).  This isn't a showstopper: for one thing, if
> we ever _do_ support cross-repository stuff, we can teach clients to
> patch up old format svn:merge properties...

I had thought about this.  My assumption was that you could use full 
URLs for outside repositories.  You can patch it up during the copy.

> BUT, maybe this tells us a format version number at the beginning of
> the property would be good.  So the first line would be
>
>    "merge history format version: 1\n"
>
> or something, then the rest of the value would be as you describe.

Fair enough.  I'd simplify that to "version: 1\n\n" though.

later,

\x/ill        :-}



Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

William Uther <wi...@cs.cmu.edu> writes:
> If I were to make a patch that had svn record merge history, but not
> use it, would it have any chance of being applied?  It seems like a
> small change with little chance of breaking anything.  (Someone
> previously mentioned that they would be -1 on a patch for the whole
> repeated merge thing before 1.0 because there wouldn't be time to
> discuss it properly.  I think the data you need to store is actually
> pretty simple and wouldn't need much discussion - it is the easy half
> of the problem.  It could always be converted later anyway.)

I hadn't thought it was all that simple; if it turns out it is, we
should kick ourselves that we haven't been storing it all along :-).

> Anyway here is a proposed format:
> 
> When you merge data into a file, the merge occurs normally except for
> the svn:merge property.  The svn:merge property is not merged, but
> instead a single line is appended to it:
> 
> rev-A URL-A rev-B URL-B
> 
> rev-A is the start rev of the merge (or * if this is an 'add with
> history' merge - the file didn't exist in revA)
> URL-A is the start URL of the file that was merged (this is a * if
> rev-A is a *)
> rev-B is the end rev of the merge
> URL-B is the end URL of the file that was merged (use relative URLs so
> that, if the repos moves, nothing breaks)
> 
> There is one detail that I think is important:  If merge copies a
> directory that didn't exist in revA (ie the dir was added with
> history) then you want to add the merge property to each file
> individually.  This does not affect the time complexity as all those
> files are being copied into your working copy anyway.
> 
> This is storing all the information passed on the command line, so I
> don't think it leaves out any information.

Golly.

That's pretty simple.  But I can't see anything wrong with it -- at
the very worst, it preserves too much information (or rather fails to
compress information), but we can always deal with that later.

Thoughts, folks?  Brane, GregH? :-)

And it's nice that the value can be edited by hand.  If Subversion
doesn't understand the full implications of some local edits, the user
can patch up the property if necessary.

I have one concern: Sure, we can use relative URLs so that the repos
doesn't break, but then we're not preserving information that might be
useful to support cross-repository merge history later on.  (Of
course, "host.domain/path" is a pretty poor global unique identifier
for repositories anyway, so I'm not saying this feature was ready for
prime time or anything).  This isn't a showstopper: for one thing, if
we ever _do_ support cross-repository stuff, we can teach clients to
patch up old format svn:merge properties...

BUT, maybe this tells us a format version number at the beginning of
the property would be good.  So the first line would be 

   "merge history format version: 1\n"

or something, then the rest of the value would be as you describe.
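
To make the proposed format concrete, here is a minimal sketch of reading
and appending to such a property value. Everything in it is an assumption
for illustration: the names (MergeRecord, append_merge, parse_property) are
hypothetical, the header text is simply the one suggested above, and URLs
are assumed to contain no whitespace. It is not Subversion code.

```python
from typing import List, NamedTuple, Optional

# Header line as suggested in this message; the exact wording is not settled.
VERSION_HEADER = "merge history format version: 1"


class MergeRecord(NamedTuple):
    rev_a: Optional[int]  # None is written as "*" (file did not exist in rev-A)
    url_a: Optional[str]  # None is written as "*" (always paired with rev_a)
    rev_b: int
    url_b: str


def format_record(rec: MergeRecord) -> str:
    """Render one record as 'rev-A URL-A rev-B URL-B'."""
    rev_a = "*" if rec.rev_a is None else str(rec.rev_a)
    url_a = "*" if rec.url_a is None else rec.url_a
    return f"{rev_a} {url_a} {rec.rev_b} {rec.url_b}"


def parse_record(line: str) -> MergeRecord:
    """Parse one record line; assumes URLs contain no whitespace."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    rev_a, url_a, rev_b, url_b = fields
    if (rev_a == "*") != (url_a == "*"):
        raise ValueError("rev-A and URL-A must both be '*' or neither")
    return MergeRecord(
        None if rev_a == "*" else int(rev_a),
        None if url_a == "*" else url_a,
        int(rev_b),
        url_b,
    )


def append_merge(prop_value: str, rec: MergeRecord) -> str:
    """Append one record; existing lines are never rewritten or merged."""
    if not prop_value:
        prop_value = VERSION_HEADER + "\n"
    elif not prop_value.endswith("\n"):
        prop_value += "\n"
    return prop_value + format_record(rec) + "\n"


def parse_property(prop_value: str) -> List[MergeRecord]:
    """Parse a whole property value, checking the version header first."""
    lines = prop_value.splitlines()
    if not lines or lines[0] != VERSION_HEADER:
        raise ValueError("missing or unsupported format version")
    return [parse_record(line) for line in lines[1:] if line.strip()]
```

Because a reader checks the header before anything else, a later format
change can be detected and old values patched up rather than misread.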

-K


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

William Uther <wi...@cs.cmu.edu> writes:
> - If you set it on each file then you have to make an individual copy
> of each file on the server (you are already making these copies in the
> wc) - that copy will be stored as a diff and will have a compact
> representation.

It doesn't matter if we store the new nodes compactly or otherwise.
The point is, we're creating new nodes where we formerly didn't.  The
O() of the operation has changed: it's now O(N), proportional to the
number of items in the copied tree, instead of O(1).

That's kind of a showstopper.

> I think my original proposal is reasonable.  The (slightly)
> inefficient part is if you create a large subdir on a branch and then
> merge it into another line of dev.  In this case things are still not
> that inefficient.  The content diffs will all be zero length - In
> effect this is just storing a node with the svn:merge property for
> each file.  This is exactly the information you need to store for each
> file.  And this use case doesn't often happen with large trees anyway.

That "this use case doesn't often happen" argument never works :-).
If it happens often enough to make the benefits beneficial, then it
also happens often enough to make the harms harmful.

If the piece of information is "Tree PATH1:REV1, and everything
underneath it, was copied recursively to destination PATH2:REV2",
that's a very small bit of information.  To store it distributed
redundantly across multiple nodes may be formally correct, but I'm
hoping we can do better.
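
One way to picture the "small bit of information" alternative: keep a
single record per copied tree and derive the source of any path beneath it
on demand. The sketch below is only an illustration under assumed names
(TreeCopy, source_of); it does not describe how the filesystem stores
copies.

```python
from typing import List, NamedTuple, Optional, Tuple


class TreeCopy(NamedTuple):
    src_path: str  # PATH1
    src_rev: int   # REV1
    dst_path: str  # PATH2
    dst_rev: int   # REV2


def source_of(path: str, copies: List[TreeCopy]) -> Optional[Tuple[str, int]]:
    """Return (source path, source rev) for `path`, using the deepest
    recorded tree copy whose destination contains it, or None."""
    best = None
    for c in copies:
        root = c.dst_path.rstrip("/")
        if path == root or path.startswith(root + "/"):
            if best is None or len(root) > len(best.dst_path.rstrip("/")):
                best = c
    if best is None:
        return None
    # Part of `path` below the copy destination: "" or "/sub/file".
    rel = path[len(best.dst_path.rstrip("/")):]
    return (best.src_path.rstrip("/") + rel, best.src_rev)
```

The record count stays proportional to the number of copy operations, not
to the number of items copied; the price is a lookup on every query.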

-K


RE: Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Bill Tutt <ra...@lyra.org>.


Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
 

> -----Original Message-----
> From: William Uther [mailto:will+@cs.cmu.edu]
> Sent: Friday, August 09, 2002 12:14 PM
> To: kfogel@collab.net
> Cc: Philip Martin; Nuutti Kotivuori; dev@subversion.tigris.org; Bill Tutt;
> Branko Čibej
> Subject: Re: Saving merge history (was: Re: merge should copy-with-history)
> 
> 
> On Friday, August 9, 2002, at 01:09  PM, Karl Fogel wrote:
> 
> > Philip Martin <ph...@codematters.co.uk> writes:
> >> This is what Will wrote:
> >>
> >>> There is one detail that I think is important:  If merge copies a
> >>> directory that didn't exist in revA (ie the dir was added with
> >>> history) then you want to add the merge property to each file
> >>> individually.  This does not affect the time complexity as all those
> >>> files are being copied into your working copy anyway.
> >>
> >> The files get copied into the working copy, but they will show up as
> >> status copied-with-history and unmodified, so only the parent needs to
> >> be committed.  If we set svn:merge on the files then the files will
> >> need to be committed as well.
> >
> > Oh dear, yes.  We should only set the property on the directory that
> > was copied... But of course, that loses some of the usefulness of
> > having the property at all, for example if we're in a subtree of that
> > directory later and need to retrieve merge history...
> >
> > Which, now that I think about it, is what led me to the long-ago
> > conclusion that this problem is not simple.  William, thoughts?
> 
> I see the issue.
> 
> - If you just set it on the top level dir, then you lose the info on
> sub-trees.  Bad.

Not necessarily... (more on that in a later email I'm composing)

> - Some form of inheritable properties might be possible, but would be
> tricky.
> - If you set it on each file then you have to make an individual copy of
> each file on the server (you are already making these copies in the
> wc) - that copy will be stored as a diff and will have a compact
> representation.
> 
> I think my original proposal is reasonable.  The (slightly) inefficient
> part is if you create a large subdir on a branch and then merge it into
> another line of dev.  In this case things are still not that
> inefficient.  The content diffs will all be zero length - In effect this
> is just storing a node with the svn:merge property for each file.  This
> is exactly the information you need to store for each file.  And this
> use case doesn't often happen with large trees anyway.
> 

In the short term, Will has it right, since the email I'm composing will
explain the consequences of wanting to do this the right way. ;)

This is the easiest way to do this change now. i.e. make merge non-O(1).

That does royally suck though.

More details in my subsequent email.

Bill



Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Friday, August 9, 2002, at 01:09  PM, Karl Fogel wrote:

> Philip Martin <ph...@codematters.co.uk> writes:
>> This is what Will wrote:
>>
>>> There is one detail that I think is important:  If merge copies a
>>> directory that didn't exist in revA (ie the dir was added with
>>> history) then you want to add the merge property to each file
>>> individually.  This does not affect the time complexity as all those
>>> files are being copied into your working copy anyway.
>>
>> The files get copied into the working copy, but they will show up as
>> status copied-with-history and unmodified, so only the parent needs to
>> be committed.  If we set svn:merge on the files then the files will
>> need to be committed as well.
>
> Oh dear, yes.  We should only set the property on the directory that
> was copied... But of course, that loses some of the usefulness of
> having the property at all, for example if we're in a subtree of that
> directory later and need to retrieve merge history...
>
> Which, now that I think about it, is what led me to the long-ago
> conclusion that this problem is not simple.  William, thoughts?

I see the issue.

- If you just set it on the top level dir, then you lose the info on 
sub-trees.  Bad.
- Some form of inheritable properties might be possible, but would be 
tricky.
- If you set it on each file then you have to make an individual copy of 
each file on the server (you are already making these copies in the 
wc) - that copy will be stored as a diff and will have a compact 
representation.

I think my original proposal is reasonable.  The (slightly) inefficient 
part is if you create a large subdir on a branch and then merge it into 
another line of dev.  In this case things are still not that 
inefficient.  The content diffs will all be zero length - In effect this 
is just storing a node with the svn:merge property for each file.  This 
is exactly the information you need to store for each file.  And this 
use case doesn't often happen with large trees anyway.

later,

\x/ill            :-}



Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

William Uther <wi...@cs.cmu.edu> writes:
> Remember that merges are already not O(1) time - the copy is already
> made in your wc.

I'm talking about the repository side, where it counts :-).


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

[snip - discussion of storing properties for altered files.]
[snip - discussion noting that for newly created files you don't want to 
edit the prop-list of all of them, you want something like a copy]

On Friday, August 9, 2002, at 06:21  PM, Bill Tutt wrote:

>>> 2) We fold this wrinkle into our lazy mode of operation.
>>> Specifically: Initial edits of copies that originate as the result of a
>>> merge need the merge-specific ancestor set information added to their
>>> ancestor set.

Okie - I'm trying to figure this out.

I'm looking in: subversion/libsvn_fs/structure.  It looks like you 
should just be able to modify the copies table...

 From subversion/libsvn_fs/structure:
> The Berkeley DB `copies' table records all filesystem copies.  Every
> key in this table is copy ID, and every value is a skel of the form:
>
>    ("copy" SRC-PATH SRC-TXN DST-NODE-ID)

Replace "copy" with "merge" for copies that occur as the result of a 
merge.  This change should be backwards compatible.  Would this even 
require a dump/load?  This modification seems to store all the data you 
need.*

Having said that, making this change depends upon how both issues 838 
and 842 get resolved.

Issue 838 relevance: To get "merge" in the copies table instead of 
"copy", you need something heading over the wire specifying the 
difference.  As the way the copy information itself gets to the fs is 
still being worked on, it is hard to make changes :).

Issue 842 relevance: What happens when you edit a "merge" copied file?  
It may get a new copy table entry.  In that case, that entry needs to be 
a "merge" too, I think.  I don't yet have enough info on how 842 is 
being solved to figure this case out.

Anyway, I'm in a holding pattern looking at how those issues resolve.

later,

\x/ill          :-}

* While this stores all the data you need, it is not necessarily in an 
easily accessible form.  I'm now going to talk about that access... all 
of this should probably be post-1.0.

The result of making this post-1.0 is that the svn:merged properties 
will require some patching to be complete (they can be patched up when 
the stuff below is implemented).  The svn:merged properties will only 
contain the information about when already existent files were merged - 
they will not have any information about merge-copies.

I am assuming that from the point of view of everything above the fs 
layer, everything is stored in svn:merged properties (or an equivalent 
list of merges).  In that case, when a file is checked out, you'd need 
to check if any of the parent dirs were merge-copies.  If they were, 
then you append the correct merge line to the end of the svn:merged 
property as you check out - make a pseudo-property.  No theoretical 
problem there.

Things get interesting if you merge-copy a directory, and then re-copy 
that copy.  All the files beneath the final copy still need the 
pseudo-property added.  In that case you have to detect that you have a 
different copy-id from your parent, follow its copies table entry, and 
note that you STILL have a different copy-id.  So, you look up the 
copies table entry again and see that the entry was a merge, and add the 
pseudo-property.
The access code must keep chaining back through the copies table until 
it gets to the last time the file changed.  In that case you assume that 
when the file was modified the pseudo-properties were turned into real 
properties.
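
The chaining described above can be sketched roughly as follows. This is a
simplified model, not the real copies table: the CopyEntry fields, the
"merge" kind, the last_modified map and the function names are all
assumptions made for illustration, and paths stand in for node and copy
ids.

```python
from typing import Dict, List, NamedTuple, Optional


class CopyEntry(NamedTuple):
    kind: str      # "copy" or "merge" ("merge" is the tag proposed above)
    src_path: str
    src_rev: int
    dst_path: str
    dst_rev: int


def enclosing_copy(path: str, rev: int,
                   copies: List[CopyEntry]) -> Optional[CopyEntry]:
    """Most recent copy made at or before `rev` whose destination holds `path`."""
    best = None
    for c in copies:
        inside = path == c.dst_path or path.startswith(c.dst_path + "/")
        if inside and c.dst_rev <= rev:
            if best is None or c.dst_rev > best.dst_rev:
                best = c
    return best


def pseudo_merge_lines(path: str, rev: int, copies: List[CopyEntry],
                       last_modified: Dict[str, int]) -> List[str]:
    """Collect pseudo-property lines by chaining back through copies until
    the file was last modified (where real properties are assumed to exist)."""
    lines = []
    while True:
        c = enclosing_copy(path, rev, copies)
        if c is None:
            break
        if last_modified.get(path, -1) >= c.dst_rev:
            break  # edited since this copy: real properties take over here
        if c.src_rev >= c.dst_rev:
            raise ValueError("copy source must be older than its destination")
        src = c.src_path + path[len(c.dst_path):]
        if c.kind == "merge":
            # 'add with history' form: rev-A and URL-A are both "*"
            lines.append(f"* * {c.src_rev} {src}")
        path, rev = src, c.src_rev
    return lines
```

Note that the walk runs for plain copies as well as merges, since the kind
is only known once the entry has been looked up. That is the cost the
following paragraph is worried about.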

This seems potentially expensive.  In particular, every time you access 
a node beneath a copied tree, you need to follow the copies tables back 
until that node was last modified to collect any pseudo-properties.  
This needs to happen even for non-merged files as you cannot tell them 
from the merged ones.  It is a time/space trade-off: either you keep the 
properties up to date (my original proposal), or you need to calculate 
them.  (Actually, I guess this proposal is already somewhere between the 
extremes: I'm suggesting denormalizing the merge info when a file is 
modified.  You don't have to do that.)  If svn:merged properties were 
accessed separately from the other properties then at least you would 
only need to pay this cost when looking for merge information.

It may also be possible to cache some of this info in the copies table, 
but that's definitely a post-1.0 issue.  It would require some details 
to be worked out and a dump/load.  There is no need to think about this 
now.


RE: Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Bill Tutt <ra...@lyra.org>.

> From: William Uther [mailto:will+@cs.cmu.edu]
> 
> On Friday, August 9, 2002, at 03:48  PM, Bill Tutt wrote:
> >
> > There are two possible approaches.
> >
> > 1) We decide merges just aren't O(1).
> > i.e. we mark all of the children as copied-with-history
> > That's not so nice because it makes life complicated and creates a bunch
> > of copies with just property changes.
> 
> Remember that merges are already not O(1) time - the copy is already
> made in your wc.
> 

What Karl said. We're talking about repository side, not WC side.

> > 2) We fold this wrinkle into our lazy mode of operation.
> > Specifically: Initial edits of copies that originate as the result of a
> > merge need the merge-specific ancestor set information added to their
> > ancestor set.
> >
> > I kind of like #2 myself.
> 
> 2) is more complex, although I can see how this might be implemented.
> It is much less complex than 'inheritable properties' in general.
> 
> You'd also need to be careful in this case that the right information is
> updated when a file is changed.  In particular, I was assuming that the
> URLs I mentioned were the URL of the file that was merged, not the URL
> used in the command line (which might be an enclosing directory).
> 

The WC representation of the data is going to be different than the
repository representation of the data if we pick #2.

> > Once we have #2 decided on
> 
> hehe :).  #2 is more space efficient, but significantly more complex.
> If it were me, I'd probably choose #1 under the 80-20 rule.
> 
> We could do #1 now and change to #2 later.  The change would require a
> dump/load, but would be backwards compatible...
> 
> Hmm - If we went with #2 we'd have to alter the dump/load format.
> 

Indeed.

> > then we can get on with figuring out how we
> > want to store star merge information, and how we want to store ancestor
> > sets.
> 
> Um, how are these different from a series of pairwise merges?
> 
> I kinda want to keep this change small so it makes it into 1.0.
> 

Ancestor sets are the repeated merge problem. Having a place to stick
merge predecessors, possibly multiple merge predecessors, is what I mean
here.

Star merges aren't any different from a series of pairwise merges
effectwise. They just record all of those merges in one step.

>  > Alternatively, we could just for the short term have an alternative #3:
>  >
>  > 3) Don't store copy-with-history on completely new directories in a
>  > merge.
> 
> Oh, you mean revert the recent "merge should copy-with-history" patch?
> *evil grin*
> 

No, just don't copy-with-history on a new directory. ;) i.e. keep the
good part with new files, but punt on the annoying part, new
directories.

> Essentially option #2 is making a new type of "copy-with-history", that
> is more "merge-with-history".  It is correcting the semantic difference
> I was pointing out previously.
> 

It's still not going to change the svn log output for you though. :)

FYI,
Bill



Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by William Uther <wi...@cs.cmu.edu>.

On Friday, August 9, 2002, at 03:48  PM, Bill Tutt wrote:
>> Philip Martin <ph...@codematters.co.uk> writes:
>>> This is what Will wrote:
>>>
>>>> There is one detail that I think is important:  If merge copies a
>>>> directory that didn't exist in revA (ie the dir was added with
>>>> history) then you want to add the merge property to each file
>>>> individually.  This does not affect the time complexity as all those
>>>> files are being copied into your working copy anyway.
>>>
>>> The files get copied into the working copy, but they will show up as
>>> status copied-with-history and unmodified, so only the parent needs to
>>> be committed.  If we set svn:merge on the files then the files will
>>> need to be committed as well.
>
> Heh. Well, clearly this is just one more wrinkle in our lazy-ness ==
> complication syndrome.
>
> There are two possible approaches.
>
> 1) We decide merges just aren't O(1).
> i.e. we mark all of the children as copied-with-history
> That's not so nice because it makes life complicated and creates a bunch
> of copies with just property changes.

Remember that merges are already not O(1) time - the copy is already 
made in your wc.

> 2) We fold this wrinkle into our lazy mode of operation.
> Specifically: Initial edits of copies that originate as the result of a
> merge need the merge-specific ancestor set information added to their
> ancestor set.
>
> I kind of like #2 myself.

2) is more complex, although I can see how this might be implemented.  
It is much less complex than 'inheritable properties' in general.

You'd also need to be careful in this case that the right information is 
updated when a file is changed.  In particular, I was assuming that the 
URLs I mentioned were the URL of the file that was merged, not the URL 
used in the command line (which might be an enclosing directory).

> Once we have #2 decided on

hehe :).  #2 is more space efficient, but significantly more complex.  
If it were me, I'd probably choose #1 under the 80-20 rule.

We could do #1 now and change to #2 later.  The change would require a 
dump/load, but would be backwards compatible...

Hmm - If we went with #2 we'd have to alter the dump/load format.

> then we can get on with figuring out how we
> want to store star merge information, and how we want to store ancestor
> sets.

Um, how are these different from a series of pairwise merges?

I kinda want to keep this change small so it makes it into 1.0.

 > Alternatively, we could just for the short term have an alternative #3:
 >
 > 3) Don't store copy-with-history on completely new directories in a
 > merge.

Oh, you mean revert the recent "merge should copy-with-history" patch?  
*evil grin*

Essentially option #2 is making a new type of "copy-with-history", that 
is more "merge-with-history".  It is correcting the semantic difference 
I was pointing out previously.

later,

\x/ill          :-}


RE: RE: Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Bill Tutt <ra...@lyra.org>.

Alternatively, we could just for the short term have an alternative #3:

3) Don't store copy-with-history on completely new directories in a
merge.

FYI,
Bill



RE: Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Bill Tutt <ra...@lyra.org>.

> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> 
> Philip Martin <ph...@codematters.co.uk> writes:
> > This is what Will wrote:
> >
> > > There is one detail that I think is important:  If merge copies a
> > > directory that didn't exist in revA (ie the dir was added with
> > > history) then you want to add the merge property to each file
> > > individually.  This does not affect the time complexity as all those
> > > files are being copied into your working copy anyway.
> >
> > The files get copied into the working copy, but they will show up as
> > status copied-with-history and unmodified, so only the parent needs to
> > be committed.  If we set svn:merge on the files then the files will
> > need to be committed as well.
> 
> Oh dear, yes.  We should only set the property on the directory that
> was copied... But of course, that loses some of the usefulness of
> having the property at all, for example if we're in a subtree of that
> directory later and need to retrieve merge history...
> 
> Which, now that I think about it, is what led me to the long-ago
> conclusion that this problem is not simple.  William, thoughts?
> 

Heh. Well, clearly this is just one more wrinkle in our lazy-ness ==
complication syndrome.

There are two possible approaches. 

1) We decide merges just aren't O(1).
i.e. we mark all of the children as copied-with-history
That's not so nice because it makes life complicated and creates a bunch
of copies with just property changes.

2) We fold this wrinkle into our lazy mode of operation.
Specifically: Initial edits of copies that originate as the result of a
merge need the merge-specific ancestor set information added to their
ancestor set.

I kind of like #2 myself. 

This implies:
* That we need someplace in the copy table to keep track of this
factoid.
I'd suggest keeping this fairly compact. We may/may not run into more of
these kinds of issues.

* Having an efficient storage location for this merge data is going to
be a good thing.

Once we have #2 decided on then we can get on with figuring out how we
want to store star merge information, and how we want to store ancestor
sets.

Bill


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Philip Martin <ph...@codematters.co.uk> writes:
> This is what Will wrote:
> 
> > There is one detail that I think is important:  If merge copies a
> > directory that didn't exist in revA (ie the dir was added with
> > history) then you want to add the merge property to each file
> > individually.  This does not affect the time complexity as all those
> > files are being copied into your working copy anyway.
> 
> The files get copied into the working copy, but they will show up as
> status copied-with-history and unmodified, so only the parent needs to
> be committed.  If we set svn:merge on the files then the files will
> need to be committed as well.

Oh dear, yes.  We should only set the property on the directory that
was copied... But of course, that loses some of the usefulness of
having the property at all, for example if we're in a subtree of that
directory later and need to retrieve merge history...

Which, now that I think about it, is what led me to the long-ago
conclusion that this problem is not simple.  William, thoughts?

-K


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Philip Martin <ph...@codematters.co.uk>.

Karl Fogel <kf...@newton.ch.collab.net> writes:

> Philip Martin <ph...@codematters.co.uk> writes:
> > Doesn't setting svn:merge on every file affect the complexity at
> > commit time?
> 
> I think he's only proposing to set it "when merging data into a file",
> that is, on files that would change anyway.

This is what Will wrote:

> There is one detail that I think is important:  If merge copies a
> directory that didn't exist in revA (ie the dir was added with
> history) then you want to add the merge property to each file
> individually.  This does not affect the time complexity as all those
> files are being copied into your working copy anyway.

The files get copied into the working copy, but they will show up as
status copied-with-history and unmodified, so only the parent needs to
be committed.  If we set svn:merge on the files then the files will
need to be committed as well.

-- 
Philip Martin


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Philip Martin <ph...@codematters.co.uk> writes:
> Doesn't setting svn:merge on every file affect the complexity at
> commit time?

I think he's only proposing to set it "when merging data into a file",
that is, on files that would change anyway.


Re: Saving merge history (was: Re: merge should copy-with-history)

Posted by Philip Martin <ph...@codematters.co.uk>.

William Uther <wi...@cs.cmu.edu> writes:

> When you merge data into a file, the merge occurs normally except for
> the svn:merge property.  The svn:merge property is not merged, but
> instead a single line is appended to it:
> 
> rev-A URL-A rev-B URL-B
> 
> rev-A is the start rev of the merge (or * if this is an 'add with
> history' merge - the file didn't exist in revA)
> URL-A is the start URL of the file that was merged (this is a * if
> rev-A is a *)
> rev-B is the end rev of the merge
> URL-B is the end URL of the file that was merged (use relative URLs so
> that, if the repos moves, nothing breaks)
> 
> There is one detail that I think is important:  If merge copies a
> directory that didn't exist in revA (ie the dir was added with
> history) then you want to add the merge property to each file
> individually.  This does not affect the time complexity as all those
> files are being copied into your working copy anyway.

Doesn't setting svn:merge on every file affect the complexity at
commit time?

My understanding is that at present such files are identical to the
files they were copied from, they are the same node in the subversion
file system.  When you commit such a merge no new nodes have to be
created for the files.  If you change a property on the copied files
then you will need to create new nodes in the subversion filesystem.

Imagine tagging the trunk, at present a single new directory is
created.  With your scheme we will need to create a new version of
every item in order to change the svn:merge property!
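
A toy count makes the difference visible. The sketch below models a tree
as nested dicts (a dict is a directory, anything else is a file) and
compares how many new nodes each approach would need. The function names
are hypothetical, and it assumes every directory in the tree holds at
least one file somewhere beneath it, so that every directory has to be
rewritten when its files change.

```python
def count_nodes(tree: dict) -> int:
    """Count every file and directory below `tree` (excluding `tree` itself)."""
    total = 0
    for child in tree.values():
        total += 1
        if isinstance(child, dict):
            total += count_nodes(child)
    return total


def new_nodes_lazy_copy(tree: dict) -> int:
    """A cheap copy adds one directory node that points at the existing tree."""
    return 1


def new_nodes_per_file_property(tree: dict) -> int:
    """Setting a property on every file needs a new node per file, a new node
    for each directory containing a changed file (assumed: every directory),
    and the copied root itself."""
    return 1 + count_nodes(tree)
```

For a tag of a trunk with N items the first number stays at 1 while the
second grows with N.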

-- 
Philip Martin
