
Posted to dev@subversion.apache.org by Tom Lord <lo...@emf.net> on 2003/04/16 00:28:18 UTC

short question about merge [PROPOSAL] vs. tree-deltas


v.a.p. looks at a revision tree like this:


			ANCESTOR
			  /   \
		      ...        ...
			/	  \
		     ORIG	   TARGET
		    ...
		      /
		    MOD


Let's suppose that the trees for ANCESTOR and ORIG have:


	dir1/file1 == node_N.copy_C.rev_R
	dir2/file2 == node_N.copy_C.rev_R

(Note that the two file-rev nodes are identical.)

The tree for MOD has:

	dir1/file1 == node_N.copy_C3.rev_R3
	dir2/file2 == node_N.copy_C4.rev_R4

(the files have not been renamed since ORIG, but they have been
modified, and the copy_ids lazily modified.)

And the tree for TARGET has:

	dir5/file5 == node_N.copy_C5.rev_R5
	dir6/file6 == node_N.copy_C6.rev_R6

(the files have been both renamed (multiple times, let's assume) and
modified.)

Now I want to merge ORIG...MOD into TARGET.

This is the first merge between these branches.

What happens?

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Philip Martin <ph...@codematters.co.uk>.

Branko Čibej <br...@xbc.nu> writes:

> >At present two files can only share the same node revision id if they
> >are the result of copying directories, in which case the files will
> >have the same basename.  So
> >
> >	dir1/file1 == node_N.copy_C.rev_R
> > 	dir2/file1 == node_N.copy_C.rev_R
> >
> >can occur.
> >
> 
> Yes, that can occur... (sorry Sander, I didn't think of lazy copying).
> However, in that case, it's the same file, there's nothing to merge

I was simply responding to Sander's point, but from a brief look at
Tom's scenario it appears that the file names don't really matter.  So
whether these two files have the same basename doesn't affect his
question.

> >It's possible that when (or should that be if?) atomic move gets
> >implemented then the original scenario will occur.
> >  
> >
> No, Bill Tutt thought up a way to avoid this, and IIRC it's already
> implemented in his branch.

Hmmm.  I thought atomic move was intended to lead to exactly that sort
of duplication, but I only ever used Bill's original 1003 branch, I
haven't looked at Mike's newer branch.

-- 
Philip Martin


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.

    me:

    >>>> Let's suppose that the trees for ANCESTOR and ORIG have:

    >>>>	dir1/file1 == node_N.copy_C.rev_R
    >>>>	dir2/file2 == node_N.copy_C.rev_R

    >>>> (Note that the two file-rev nodes are identical.)


    Sander:

    >>> This can't happen AFAIK.


    Philip:

    >> At present two files can only share the same node revision id if they
    >> are the result of copying directories, in which case the files will
    >> have the same basename.  [....]

    >> can occur.

That's what I thought.  I haven't seen it in svn, but I can imagine an
interface that _might_ be able to hide that duplication of noderev ids
with some virtual nodes.  If there is such an interface -- please
point me to the code.


    > From: Branko Čibej <br...@xbc.nu>

    > Yes, that can occur... (sorry Sander, I didn't think of lazy
    > copying).  However, in that case, it's the same file, there's
    > nothing to merge


As for tree deltas and ambiguity about "corresponding files" between two
trees: _maybe_ I'll have more to say about that soon.  Honestly: I'm
still puzzling it out and the outcome might be "Ain't gonna work" (my
pretty strong _bet_) or "Oh crap -- works perfectly well" (which I'll
cop to if that's the conclusion I reach.)


     Philip:

     > I was simply responding to Sander's point, but from a brief
     > look at Tom's scenario it appears that the file names don't
     > really matter.  So whether these two files have the same
     > basename doesn't affect his question.

Just as an intermediate core-dump of my puzzling things out:

You're going to wind up with, _at_best_, the ability to fairly cheaply
compute a copy/rename history of any given noderev.   That's a pretty
non-trivial problem actually -- it's going to impact various commands
and it's hard to do without screwing up the cost of copy.   I still
don't see any way to do it without really screwing the _worst_case_
space consequences of copy --- but let's assume that probabilistically,
you can make that rename/copy history computable without too many
penalties.

You absolutely need such history for v.a.p. tree-deltas.

Then the question is: does that history give you results that are
unambiguous, useful, easy to control, unsurprising, etc....   And my
_bet_, based on analysis so far, is "no".   But again -- in problem
spaces like this, there's always surprises.  And I may wind up doing
nothing more than "double-checking Sander's math" and endorsing his
approach.   But if I had to bet my last $50 ...... :-)


-t


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Branko Čibej <br...@xbc.nu>.

Philip Martin wrote:

>"Sander Striker" <st...@apache.org> writes:
>
>  
>
>>>From: Tom Lord [mailto:lord@emf.net]
>>>Sent: Wednesday, April 16, 2003 2:28 AM
>>>
>>>Let's suppose that the trees for ANCESTOR and ORIG have:
>>>
>>>
>>>	dir1/file1 == node_N.copy_C.rev_R
>>>	dir2/file2 == node_N.copy_C.rev_R
>>>
>>>(Note that the two file-rev nodes are identical.)
>>>      
>>>
>>This can't happen AFAIK.
>>    
>>
>
>At present two files can only share the same node revision id if they
>are the result of copying directories, in which case the files will
>have the same basename.  So
>
>	dir1/file1 == node_N.copy_C.rev_R
> 	dir2/file1 == node_N.copy_C.rev_R
>
>can occur.
>

Yes, that can occur... (sorry Sander, I didn't think of lazy copying).
However, in that case, it's the same file, there's nothing to merge

>It's possible that when (or should that be if?) atomic move gets
>implemented then the original scenario will occur.
>  
>
No, Bill Tutt thought up a way to avoid this, and IIRC it's already
implemented in his branch.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Philip Martin <ph...@codematters.co.uk>.

"Sander Striker" <st...@apache.org> writes:

> > From: Tom Lord [mailto:lord@emf.net]
> > Sent: Wednesday, April 16, 2003 2:28 AM
> > 
> > Let's suppose that the trees for ANCESTOR and ORIG have:
> > 
> > 
> > 	dir1/file1 == node_N.copy_C.rev_R
> > 	dir2/file2 == node_N.copy_C.rev_R
> > 
> > (Note that the two file-rev nodes are identical.)
> 
> This can't happen AFAIK.

At present two files can only share the same node revision id if they
are the result of copying directories, in which case the files will
have the same basename.  So

	dir1/file1 == node_N.copy_C.rev_R
 	dir2/file1 == node_N.copy_C.rev_R

can occur.

It's possible that when (or should that be if?) atomic move gets
implemented then the original scenario will occur.

-- 
Philip Martin


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by cm...@collab.net.

Tom Lord <lo...@emf.net> writes:

> 	cmpilato:
> 
> 	> [refinements to my still-developing grokking of the
>         > data structures and their future evolution.]
> 
> 
> Thank you.

No sweat.

> 	> And ... judging by the size of the rest of your mail, and
> 	> that a quick skim doesn't seem to reveal more questions
> 	> about today's FS, I regret to inform you that I've timed
> 	> out. :-\
> 
> Oh well.  The upshot is that you can assume as much
> rename/copy/add/delete history as you like, with any optimizations to
> access to that history you like, and it doesn't matter.  Semantically,
> that data + noderev history data doesn't add up to enough to do
> reasonable tree deltas.   Tree deltas reflect the logical roles of
> files -- and those ancestry and tree-rearrangement histories don't
> reflect the logical roles of files.   So you need something new in the
> data model -- such as "logical ID cookies".

I did actually gather as much from the last two paragraphs of your
mail.  Unfortunately, I lacked the context to pass judgment on the
accuracy of the claim.  I'll leave that for other brains to
ponder. :-)



Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.


	cmpilato:

	> [refinements to my still-developing grokking of the
        > data structures and their future evolution.]


Thank you.


	> And ... judging by the size of the rest of your mail, and
	> that a quick skim doesn't seem to reveal more questions
	> about today's FS, I regret to inform you that I've timed
	> out. :-\

Oh well.  The upshot is that you can assume as much
rename/copy/add/delete history as you like, with any optimizations to
access to that history you like, and it doesn't matter.  Semantically,
that data + noderev history data doesn't add up to enough to do
reasonable tree deltas.   Tree deltas reflect the logical roles of
files -- and those ancestry and tree-rearrangement histories don't
reflect the logical roles of files.   So you need something new in the
data model -- such as "logical ID cookies".


-t


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by cm...@collab.net.

Tom Lord <lo...@emf.net> writes:

>     > From: Greg Stein <gs...@lyra.org>
> 
>     > That huge email is based on a premise that you don't
>     > explain. You throw out "this won't work", but without anything
>     > to back it up, and then go into a long list of "which means this
>     > and that, and can't do this, and so the code should do that."
> 
> Mostly I left out explaining why "this won't work" to keep the message
> from being huger than huge.
> 
> But here you go.
> 
> We can already assume that for every path@rev we can quickly and
> easily compute a node_id.copy_id.rev.   

Today, this calculation is quite easy.  You simply
traverse the DAG for PATH's parent directory, and return (roughly) the
NODE-REV-ID for basename(path)'s entry in its parent.  As such, the
cost of this calculation is directly proportional to the number of
path components in PATH.  Also, I say "roughly" because NODE-REV-ID is
node_id.copy_id.txn_id -- to get specifically what you asked for is an
extra table lookup mapping txn_id -> rev.
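The lookup cmpilato describes can be sketched as follows. This is a toy in-memory model, not Subversion's FS code: the `NodeRevId` and `ToyFS` names, the dict-of-dicts directory representation, and the txn-to-revision table are assumptions made for illustration. It shows the two costs he mentions: one directory step per path component, plus one extra lookup to turn a txn_id into a revision.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class NodeRevId:
    """Toy NODE-REV-ID of the form node_id.copy_id.txn_id."""
    node_id: str
    copy_id: str
    txn_id: str

    def __str__(self) -> str:
        return f"{self.node_id}.{self.copy_id}.{self.txn_id}"


@dataclass
class ToyFS:
    # directory noderev -> {entry name -> child noderev}
    dirs: Dict[NodeRevId, Dict[str, NodeRevId]] = field(default_factory=dict)
    # txn_id -> revision number: the "extra table lookup"
    txn_to_rev: Dict[str, int] = field(default_factory=dict)

    def lookup(self, root: NodeRevId, path: str) -> NodeRevId:
        """Walk the DAG from ROOT one entry per path component, so the
        cost grows with the number of components in PATH."""
        node = root
        for component in [c for c in path.split("/") if c]:
            node = self.dirs[node][component]
        return node

    def node_copy_rev(self, root: NodeRevId, path: str) -> str:
        """Report node_id.copy_id.rev instead of node_id.copy_id.txn_id."""
        nid = self.lookup(root, path)
        return f"{nid.node_id}.{nid.copy_id}.{self.txn_to_rev[nid.txn_id]}"


# Usage: dir1/file1 in a toy revision 3.
root = NodeRevId("0", "0", "t3")
dir1 = NodeRevId("1", "0", "t1")
file1 = NodeRevId("2", "0", "t3")
fs = ToyFS(dirs={root: {"dir1": dir1}, dir1: {"file1": file1}},
           txn_to_rev={"t1": 1, "t3": 3})
print(fs.node_copy_rev(root, "dir1/file1"))  # 2.0.3
```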

> We can assume that for two such noderev ids, the predicate:
> 
> 	X is_ancestor_of Y
> 
> is easy to compute.  It's cheap to know where a given node_id.copy_id
> branched from.  The immediate ancestor of a noderev is easy to find.
> I'll assume that immediate successor is easy to compute, though I
> don't know that for a fact.

Immediate successors are stored in the NODE-REVISION structure, so
that's easy.  Branch point calculation is a bit tricky.  Don't ask for
details on that just yet, please.  My brain has dismissed the details
(I'll be working to remedy that in upcoming days).

X is_ancestor_of Y is nearly impossible to calculate as stated.  But X
is_ancestor_of Y ::= Y is_successor_of X, which is doable, but which
has costs on the order of the number of successors between X and Y
(it's a linked-list backwards in time, essentially).  Of course, you
get a quick rule-out if X is_related_to Y fails (which is
a constant-time calculation).
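A minimal sketch of the two predicates, assuming a toy table that maps each noderev id to its predecessor (acyclic, as history is) and assuming, for illustration, that "related" means "same node_id". The function names are hypothetical. The related check is a constant-time comparison; the ancestor check walks Y's chain backwards, so its cost grows with the number of noderevs between X and Y.

```python
from typing import Dict, Optional

# Toy predecessor table: noderev id "node.copy.txn" -> its predecessor id
# (None for the first noderev of a node).
Predecessors = Dict[str, Optional[str]]


def node_id(noderev: str) -> str:
    return noderev.split(".", 1)[0]


def is_related(x: str, y: str) -> bool:
    """Constant-time check: assumed related iff the node_ids match."""
    return node_id(x) == node_id(y)


def is_ancestor(x: str, y: str, preds: Predecessors) -> bool:
    """Is X a strict ancestor of Y?  Walks Y's predecessor chain backwards,
    so the cost is proportional to the number of noderevs between them."""
    if not is_related(x, y):
        return False  # quick rule-out, no walking needed
    cur = preds.get(y)
    while cur is not None:
        if cur == x:
            return True
        cur = preds.get(cur)
    return False


preds = {"5.0.t1": None, "5.0.t2": "5.0.t1", "5.1.t4": "5.0.t2", "7.0.t3": None}
print(is_ancestor("5.0.t1", "5.1.t4", preds))  # True
print(is_ancestor("5.1.t4", "5.0.t1", preds))  # False
print(is_ancestor("7.0.t3", "5.1.t4", preds))  # False (unrelated)
```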

---

And ... judging by the size of the rest of your mail, and that a quick
skim doesn't seem to reveal more questions about today's FS, I regret
to inform you that I've timed out. :-\


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Michael Price <mi...@acm.org>.

Philip Martin wrote:
> Tom Lord <lo...@emf.net> writes:
>>Mostly I left out explaining why "this won't work" to keep the message
>>from being huger than huge.
> 
> You could try simply writing shorter emails.
> I don't really want to ignore you, but until you start presenting
> concise arguments the investment I need to put into reading your
> emails is too high.  You lose out because you get ignored.  I lose out
> if you have important ideas.
> 
> [Snip Tom's argument, it's obscured by far too many unnecessary words.]

+1 :)




Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.

	me: 

	>> This message (not from me), points to another use-case that
	>> could be easily solved with id cookies:

	>> [about students using `cp' and `add' and manual deletion
        >>  of wc meta-data to move files]
	

	Will:

	> Yeah - it could be solved with id-cookies.  I'm not sure
	> about the 'easily' part.  All the id cookies allow is the
	> munging of copy history.  In the specific case I was
	> referring to the students had deleted the meta-data, and so
	> it would have had to be manually re-created.

No, it wouldn't _necessarily_ have to be done by hand, and its impact
on copy history is more interesting than you give it credit for.  

Using id cookies for tree merges means that you then have two distinct
ways to look at a noderev within a tree: there's the physical history
of the noderev -- essentially an audit trail of what subversion
commands left the file there; then there's the id cookie --
essentially a separate assertion that this physical noderev counts as
the file having a particular logical role for the sake of tree
comparisons.  If the physical history of a noderev winds up confusing
the merge algorithm relative to programmers' intentions, they can
change the id cookie directly.
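To make the two views concrete, here is a toy sketch in which each tree carries a path-to-cookie map alongside whatever physical history it has. The `corresponding_path` helper and the cookie strings are hypothetical; the point is only that correspondence becomes a lookup on the logical cookie, and that a user can repair it by editing the cookie rather than replaying copies and renames.

```python
from typing import Dict, Optional

# A toy tree: path -> logical id cookie (independent of noderev history).
Tree = Dict[str, str]


def index_by_cookie(tree: Tree) -> Dict[str, str]:
    """Invert path -> cookie.  Each cookie may name only one path per tree."""
    index: Dict[str, str] = {}
    for path, cookie in tree.items():
        if cookie in index:
            raise ValueError(f"cookie {cookie!r} used by {index[cookie]} and {path}")
        index[cookie] = path
    return index


def corresponding_path(orig: Tree, target: Tree, orig_path: str) -> Optional[str]:
    """The TARGET path playing the same logical role as ORIG_PATH, if any."""
    return index_by_cookie(target).get(orig[orig_path])


orig = {"src/doc": "doc-dir", "src/main.c": "main-c"}
target = {"src/documentation": "doc-dir",
          "src/documentation-jp": "doc-jp-dir",
          "src/main.c": "main-c"}
print(corresponding_path(orig, target, "src/doc"))  # src/documentation

# If the physical history misleads, the user just reassigns the cookies:
target["src/documentation"], target["src/documentation-jp"] = "doc-jp-dir", "doc-dir"
print(corresponding_path(orig, target, "src/doc"))  # src/documentation-jp
```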

You wouldn't _necessarily_ have to restore id cookies manually.  For
_some_ source and text files, I've found it convenient to embed id
cookies in comments from which the rev ctl system extracts them
automatically.  This approach doesn't apply to everything, obviously,
but imo it's awfully convenient when it does.  Embedded id cookies and
other ways of including id cookies in the source tree mean that when I
export a tree and include all the id cookies, someone else can import
it, and now we can exchange changesets based on id cookies.
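A sketch of extracting embedded id cookies, assuming a hypothetical `id-cookie:` marker placed in a comment near the top of each file (the message does not specify a syntax). Building the path-to-cookie map from file contents is what lets someone who imports an exported tree recover its cookies without any svn meta-data.

```python
import re
from typing import Dict, Optional

# Hypothetical marker syntax; any fixed, greppable tag would do.
COOKIE_RE = re.compile(r"id-cookie:\s*(\S+)")


def extract_id_cookie(text: str, max_lines: int = 20) -> Optional[str]:
    """Return the first id cookie embedded in the file's leading lines, or None.
    Only the head of the file is scanned, to avoid matching the marker in
    ordinary code further down."""
    for line in text.splitlines()[:max_lines]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1)
    return None


def cookies_for_tree(files: Dict[str, str]) -> Dict[str, str]:
    """Build path -> cookie for every file that carries an embedded cookie."""
    result = {}
    for path, text in files.items():
        cookie = extract_id_cookie(text)
        if cookie is not None:
            result[path] = cookie
    return result


files = {
    "src/main.c": "/* main.c\n * id-cookie: 7f3a-main-c\n */\nint main(void) { return 0; }\n",
    "README": "No cookie here.\n",
}
print(cookies_for_tree(files))  # {'src/main.c': '7f3a-main-c'}
```

Files without a natural comment syntax (images, binary data) would still need cookies stored some other way, which matches the caveat that this approach "doesn't apply to everything".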

The particular way in which your students are "messing things up" is
extreme -- but I think it illustrates a general rule of thumb that
users can be counted on to use svn commands the first way it occurs to
them to get trees that "look right", or even move files around while
thinking about the impact on some merges but forgetting about other
merges.  ID cookies relax the overloading of physical noderev history
as the strict determinant of how tree merging behaves.

-t


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by William Uther <wi...@cse.unsw.edu.au>.

Tom Lord wrote:

> This message (not from me), points to another use-case that could be
> easily solved with id cookies:
> 
> 	http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35541

Heh,
Yeah - it could be solved with id-cookies.  I'm not sure about the
'easily' part.  All the id cookies allow is the munging of copy history.
In the specific case I was referring to, the students had deleted the
meta-data, and so it would have had to be manually re-created.

Later,

Will       :-}


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.


    > From: Philip Martin <ph...@codematters.co.uk>

    > I have only skimmed what you wrote but it appears that you are
    > describing a series of copy, move, modify and merge events.  Why
    > don't you just write down those events

The events I described lead to an ambiguity for a merge algorithm
based on copy/rename/... history.  After some branching, copying, and
deleting -- merge can't tell which, if any, of the directories in
TARGET corresponds to "src/doc" in ORIG.

What should merge do?

There are two "stories" that fit those copy/rename/... events.  (One
about a doc directory being forked, one about two programs.)  The two
stories lead to very different conclusions about "what merge should
do".

"If it hurts when you do that, then don't do that!"  But the stories
point out that doing it in that way is the only way to get merge to
pick an appropriate ANCESTOR for certain subsequent subtree merges
from remote branches.

"Ok, don't do subtree branching and merging!".  And that was a large
part of my point.

In:

   http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35501

I proposed use-restrictions on branching and merging to keep merge
happy, and pointed out the implication that the restrictions suggest
using id cookies instead of rename history as the basis of tree
merging.  That's far simpler to implement, and it gives users simple,
direct, and flexible control over the relevant issues.

This message (not from me), points to another use-case that could be
easily solved with id cookies:

	http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35541

-t


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Philip Martin <ph...@codematters.co.uk>.

Tom Lord <lo...@emf.net> writes:

> Mostly I left out explaining why "this won't work" to keep the message
> from being huger than huge.

You could try simply writing shorter emails.

I have only skimmed what you wrote but it appears that you are
describing a series of copy, move, modify and merge events.  Why don't
you just write down those events

    - start with a file foo/bar at rev A
    - copy foo/bar at rev A to foo2/bar at rev B
    - modify foo2/bar to produce rev C
    - etc.

Make it easy for other people to read and understand.  It takes you a
dozen lines to describe a single directory copy operation, and even
then the copy operation is buried in the prose.  Why do you do that?
I know you like to think we are misguided, but do you really think
that we don't understand why someone would make a branch?  Sentences
like

 "He wanted to work on these without messing up the current src/doc
  directory."

are superfluous.  They serve only to hide the important information.
They take up your time writing them and our time reading them.  Leave
them out.  We don't need, or want, cute stories about translators,
writers and programmers.

I don't really want to ignore you, but until you start presenting
concise arguments the investment I need to put into reading your
emails is too high.  You lose out because you get ignored.  I lose out
if you have important ideas.

[Snip Tom's argument, it's obscured by far too many unnecessary words.]

-- 
Philip Martin


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.


    me:

    >> The nested contents of MOD may be arbitrarily rearranged from ORIG.

    >> The merge algorithm has to look at the two trees and figure out what
    >> in ORIG corresponds to which in MOD.   It has to know what has been
    >> renamed, and what to compare to what.

    > From: Greg Stein <gs...@lyra.org>

    > That huge email is based on a premise that you don't
    > explain. You throw out "this won't work", but without anything
    > to back it up, and then go into a long list of "which means this
    > and that, and can't do this, and so the code should do that."

Mostly I left out explaining why "this won't work" to keep the message
from being huger than huge.

But here you go.

We can already assume that for every path@rev we can quickly and
easily compute a node_id.copy_id.rev.   

We can assume that for two such noderev ids, the predicate:

	X is_ancestor_of Y

is easy to compute.  It's cheap to know where a given node_id.copy_id
branched from.  The immediate ancestor of a noderev is easy to find.
I'll assume that immediate successor is easy to compute, though I
don't know that for a fact.

Let's also assume that a true rename is added.

Finally, let's assume that we can tweak various commands enough so
that for any given path@rev we can efficiently compute a complete
"import/copy/rename" history.  So that the history for path-a@r_k
might read:

		renamed to path-a@r_q
                renamed to path-b@r_p
                copied to path-c@r_n
                copied to path-d@r_m
                imported to path-e@r_k
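One way to represent and walk such a history, sketched under assumptions: each structural event is recorded as a hypothetical `Event` row (kind, source path and revision, destination path, revision), deletions are not modeled, and every copy or rename names a source revision older than the revision that committed it. The walk then follows a path back, newest first, until it reaches an import.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str               # "import", "copy" or "rename"
    src: Optional[str]      # source path (None for an import)
    src_rev: Optional[int]  # source revision (None for an import)
    dst: str                # path created by the event
    rev: int                # revision that committed the event


def path_history(events: List[Event], path: str, rev: int) -> List[Tuple[str, str, int]]:
    """Trace PATH@REV back through copy/rename events to its import.
    Returns (kind, path, rev) entries, newest first.  Assumes every
    copy/rename has src_rev < rev, so the walk always terminates."""
    newest_first = sorted(events, key=lambda e: e.rev, reverse=True)
    history: List[Tuple[str, str, int]] = []
    while path is not None:
        ev = next((e for e in newest_first if e.dst == path and e.rev <= rev), None)
        if ev is None:
            break
        history.append((ev.kind, ev.dst, ev.rev))
        path, rev = ev.src, ev.src_rev  # an import has no source: loop ends
    return history


events = [
    Event("import", None, None, "path-e", 1),
    Event("copy", "path-e", 1, "path-d", 2),
    Event("copy", "path-d", 2, "path-c", 3),
    Event("rename", "path-c", 3, "path-b", 4),
    Event("rename", "path-b", 4, "path-a", 5),
]
for entry in path_history(events, "path-a", 6):
    print(entry)
# ('rename', 'path-a', 5)
# ('rename', 'path-b', 4)
# ('copy', 'path-c', 3)
# ('copy', 'path-d', 2)
# ('import', 'path-e', 1)
```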

Alternatively or additionally, we could keep import/copy/rename
histories on directories.   So that given a directory noderev, we
might be able to compute a history like:

		copied [path@u | noderev_id] to <name> @ rev Q
		copied  <name> from [path@v | noderev2_id] @ rev P
		deleted <name> from [path@w | noderev3_id] @ rev N
		renamed [path@x | noderev4_id] to <name> @ rev M
		...

Now I give you three directories: ORIG, MOD, TARGET -- with their
common ancestor, ANCESTOR.  Under the constraints of v.a.p., we can
assume that

		ORIG is_ancestor_of MOD
		ANCESTOR is_ancestor_of ORIG
                ANCESTOR is_ancestor_of TARGET


So what follows will be the construction of an example that
demonstrates how none of the meta-data listed above helps you merge
reasonably.

Between ORIG and MOD, a change has been made to tree structure.
Let's say:  "orig/src/doc" has been renamed "orig/doc".

First, just to ensure we're on the same page here, let's look at a
case that works.  Let's assume that in TARGET, maybe some files in
src/doc have been modified, and src/doc has been renamed
src/documentation -- but that's about it.  Using the meta-data above,
we'll have no trouble variance adjusting the rename, regardless of how
we characterize it.  Maybe the changeset says, for example:

		--- ORIG
                *** MOD 
		% rename src/doc@x to doc

and after we adjust it back to ANCESTOR it will say:

		--- ORIG
                *** MOD
                vvv ORIG,ANCESTOR
		% rename src/doc@v to doc

and after we adjust it forward to TARGET it will say:

		--- ORIG
                *** MOD
                vvv ORIG,ANCESTOR
                vvv ANCESTOR,TARGET
		% rename src/documentation@i to doc

and then we'll apply that, commit, and be happy.   

Pretty much the idea you have in mind?
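The adjustment steps above can be sketched as re-expressing the rename's source path against a different tree. In this toy version (the `Rename` type and `adjust` helper are hypothetical), the caller supplies the map from paths in one tree to the corresponding paths in the other; producing that map from ancestry and copy/rename history is precisely the part under dispute in this thread. The `@x`, `@v` and `@i` revision tags are dropped for brevity.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Rename:
    src: str    # source path, meaningful only relative to `basis`
    dst: str    # destination path
    basis: str  # name of the tree in which `src` is interpreted


def adjust(op: Rename, path_map: Dict[str, str], new_basis: str) -> Rename:
    """Re-express OP against NEW_BASIS.  PATH_MAP says which path in
    new_basis corresponds to each path in op.basis; computing that map
    is the hard part and is simply taken as given here."""
    if op.src not in path_map:
        raise KeyError(f"{op.src!r} has no counterpart in {new_basis}")
    return replace(op, src=path_map[op.src], basis=new_basis)


op = Rename(src="src/doc", dst="doc", basis="ORIG")
to_ancestor = adjust(op, {"src/doc": "src/doc"}, "ANCESTOR")
to_target = adjust(to_ancestor, {"src/doc": "src/documentation"}, "TARGET")
print(to_target)  # Rename(src='src/documentation', dst='doc', basis='TARGET')
```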

Now, what else might reasonably have occurred between, say, TARGET and
ANCESTOR?  and between ORIG and MOD?  Since we're talking about
structureless, unrestricted use of copy and rename, how about this
scenario:

[I'll warn you in advance that your first reaction might be "What a
 contrived story, I don't care what merge does in that case!" -- but 
 after the story I'll point out why that first reaction isn't such a
 good idea.]

The programmer working on MOD didn't just rename the doc directory.
He also made some changes to files in that directory.

The programmer working on the TARGET branch had some thoughts about 
improving the documentation.   He wanted to work on these without
messing up the current src/doc directory.   So he copied src/doc to
src/doc2, planning to shrink back to just one doc directory before the
next scheduled merge from his branch to the mainline.   He made a
bunch of changes in doc2.

The technical writer group who has primary responsibility for the
manuals got wind of his doc ideas and looked them over.  They, in
fact, have a branch of src/doc that branched somewhere prior to
ANCESTOR.   They like his ideas about doc improvements so they make a
new branch of their own doc directory and merge in his changes.  They
then proceed to "go to town" on the docs, fleshing out his idea for
improvements.

A few days later, someone in the company is going to start working on
a Japanese translation of the docs.   In the translation, some of the
technical sections containing formulas and code snippets require no
translation, but some of the text does.   And anyway, this work is
going to happen concurrently with changes to the text.   Basically,
the translator wants to branch from the English docs and just start
adding new files that contain the translations -- so that he can keep
the English files current, and compare them to the ongoing
translation.   The doc group tells the translator to branch from the
src/doc2 directory in the TARGET branch -- since it's more stable than
their own branch, yet is expected to eventually merge their changes
back in.

Eventually, our TARGET programmer gets the word, merges in the doc
group's changes to his src/doc2, renames that to src/documentation,
deletes the now defunct src/doc[*], and adds a copy of the translation as
src/documentation-jp.  (Re "[*]":  you might be tempted to say "That's
a usage error!" but keep reading.)   Actually -- screw that.  That's
not quite what he does.   Actually he deletes both "src/doc" and
"src/doc2" and simply _copies_ from the doc group's tree to make the
new "src/documentation".   Why does he do that?  Well, because
he anticipates further merges from the doc group's branch, and this way he
gets a good choice of ANCESTOR for those merges.  And because it's
easier and is the first thing that occurs to him.

Of course, there's not a chance in heck here that v.a.p. is going to
figure out that it should rename src/documentation to doc, or where to apply
the ORIG..MOD changes within src/doc --- and if you think there is
such a chance, then why doesn't it rename and modify
src/documentation-jp instead?  Or should it modify both and delete
one?  Or modify both and signal a rename conflict?   

Now maybe you'll think, as I did initially -- "No, actually v.a.p. can
probably be made to do something really useful here!  It can say `you
have both src/documentation and src/documentation-jp -- based on
meta-data, it looks like either of these might be the directory I'm
supposed to rename to ./doc and modify the contents of, and I can't
tell which.  You have to figure out how to resolve this conflict.'"
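A toy sketch of why ancestry alone leaves the question open: record, for each path, the path it was copied or renamed from, and ask which TARGET directories descend from ANCESTOR's src/doc. The branch-prefixed path names and the `lineage`/`candidates` helpers are hypothetical, and `trunk:src/doc` stands in for src/doc as it was in ANCESTOR. The doc group's merge of the programmer's changes is a merge, not a copy, so it does not appear in the origin table.

```python
from typing import Dict, List, Optional

# Toy copy/rename origins: path -> the path it was copied or renamed from.
# Branch prefixes keep paths in different branches distinct.
Origin = Dict[str, Optional[str]]


def lineage(path: str, origin: Origin) -> List[str]:
    """PATH followed by every path it was copied/renamed from, newest first."""
    chain, seen = [path], {path}
    nxt = origin.get(path)
    while nxt is not None and nxt not in seen:
        chain.append(nxt)
        seen.add(nxt)
        nxt = origin.get(nxt)
    return chain


def candidates(ancestor_path: str, target_paths: List[str], origin: Origin) -> List[str]:
    """TARGET paths whose copy/rename history passes through ANCESTOR_PATH."""
    return [p for p in target_paths if ancestor_path in lineage(p, origin)]


origin: Origin = {
    "docs:doc": "trunk:src/doc",                 # doc group's branch, made before ANCESTOR
    "docs:doc-new": "docs:doc",                  # their new branch for the improvements
    "target:src/doc2": "trunk:src/doc",          # TARGET's copy of src/doc
    "jp:doc": "target:src/doc2",                 # translator branches from src/doc2
    "target:src/documentation": "docs:doc-new",  # copied from the doc group's tree
    "target:src/documentation-jp": "jp:doc",     # copy of the translation
}
target_paths = ["target:src/documentation", "target:src/documentation-jp", "target:src/main.c"]
print(candidates("trunk:src/doc", target_paths, origin))
# ['target:src/documentation', 'target:src/documentation-jp']
```

Both directories qualify, so a history-based merge can only guess or report a conflict, and the identical question comes back on every later merge.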

But that's part of the point of making up the little narrative tale.
I can make up a completely different narrative tale that boils down
to the same svn operations, but that has very different implications.

Instead of `src/doc', we can talk about `src/prog-a'.   Instead of
`src/doc2', we can talk about `src/prog-a-revised'.   Instead of a
Japanese translation, we can consider a new program being added to the
suite, `src/prog-b' -- but initially derived from and sharing history
with `prog-a'.  Instead of ORIG..MOD renaming `src/doc' to `doc',
perhaps it renames `src/prog-a' to `src/a-prog'.   Now in this
scenario, sure -- the initial merge might throw up its hands and say
"I can't tell what counts as `a-prog' here!" -- and programmers
grumble and fix the conflicts by hand.   But what happens on the next
merge?  And the next?   The ambiguity is still there.  The programmers
know perfectly well how to resolve it, because they have to resolve it
the same way _every_ time.   But there's _nearly_ no way to tell svn how to resolve
the ambiguity automatically, from now on.

Ok, there's _one_ way: but it creates new problems.  If you haven't
already typed it out, I'd guess you're just itching to flame:  "Hey,
the mistake here was that the guy just deleted src/doc.   Instead of
deleting it, he should have just merged doc2 into it!".

Nope.  The TARGET guy's "src/doc" and "src/doc2" have very different
ANCESTOR revisions with respect to the doc group's branch.   They'll
get very different merges one way or the other -- with the better
merges coming from deleting "src/doc" and "src/doc2" and copying from
the doc group's branch.

There's no reasonable resolution here.  There's no good expression of
"logical file identity" just in terms of rename and copy operations.
The divergence between "how to get a tree that looks like what I want"
and "how to get a tree that merge handles as I would expect" is deep.
We can make up lots of stories that demonstrate this.

"It's such a contrived example, I can't take it seriously!," I hear
you cry.   And to be sure, whether or not these scenarios or ones
resembling them are realistic is sufficiently hard to prove that if
your _goal_ is just to make arguments, I'm sure any disagreement
along these lines can ultimately be reduced to "It's a matter of
speculation and opinion."   But let me assume that your goal isn't
just to argue for argument's sake and, in that light, make a
non-conclusive pitch for my view (but one that I find convincing):

Users are perverse.

Users aren't going to form accurate models of the abstract data type
that describes svn's notion of history.  They aren't going to learn
subtle patterns of using svn commands that avoid problems such as
mentioned above.  They're going to use svn commands however it first
occurs to them until they get their trees to "look right", and then
they're going to commit.  For whatever reason, not just my little
story, they'll delete, and copy, and rename on a random walk until
their wc looks like what they want.  And then they'll commit.  Given
just noderev ids and even copy/rename history --- merge has no chance
at all of sorting that out.

ID cookies, on the other hand -- those aren't so damn hard for even
users with imperfect grokking of svn to grasp.   And if the ID cookies
get messed up somehow, that's ok:  you can change them directly rather
than going through a lot of copy/rename/merge indirections to achieve
the same logical effect.

ID cookies are a heck of a lot easier to implement, AFAICT, than
efficiently computable copy/rename histories.  (By all means, keep
making it _possible_ to compute copy (and later, rename) histories --
but don't worry so much about making it as fast as merge would need.)
Save the effort -- ID cookies solve the same problem in a simpler,
more general, and more easily user-controlled way.

And, of course, ID cookies simplify alternative merging operations,
distributed branches, and yadda yadda yadda.  You guys work too hard.


-t



Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Greg Stein <gs...@lyra.org>.

On Thu, Apr 17, 2003 at 12:08:34PM -0700, Tom Lord wrote:
> 
> Let's say that I say:  `svn merge ORIG MOD TGT'.
> 
> The nested contents of MOD may be arbitrarily rearranged from ORIG.
> 
> The merge algorithm has to look at the two trees and figure out what
> in ORIG corresponds to what in MOD.   It has to know what has been
> renamed, and what to compare to what.
> 
> In the general case, with absolutely no usage restrictions, this is
> basically impossible.  Noderev ancestry, a hypothetical "perfect
> copy/rename history for each noderev" -- both give results that are
> either ambiguous, or that rely on disambiguation rules that are hard
> to control and mess up merging generally.[1]
>...
> [1] I can explain that in detail but wanted to keep this post short.

That huge email is based on a premise that you don't explain. You throw out
"this won't work", but without anything to back it up, and then go into a
long list of "which means this and that, and can't do this, and so the code
should do that."

I'm not sure that I agree with your premise, so it is impossible to evaluate
anything in the rest of your email.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


RE: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Sander Striker <st...@apache.org>.

> From: Tom Lord [mailto:lord@emf.net]
> Sent: Thursday, April 17, 2003 9:09 PM

Tom,

I just don't have time right now.  I may be able to respond over
the weekend, but currently I am swamped in 'real' work.

Others might be able to respond at this point, but since you
CC'd me explicitly I thought I'd let you know.


Sander


Re: Munging copy history (was: Re: short question about...)

Posted by Stefan Monnier <mo...@rum.cs.yale.edu>.

> I often find that they copy using normal cp, remove the old '.svn'
> directories and then add things to the repository (rather than using 'svn
> cp').

MetaCVS has some support for such things.
It guesses operations by looking at inode numbers and by comparing
the textual content (using things like word-frequencies).
It of course only works in simple enough cases, but it's surprisingly
effective when doing a `mcvs import' of some vendor code and the vendor
has moved some files around in the new version.


        Stefan



Munging copy history (was: Re: short question about...)

Posted by William Uther <wi...@cse.unsw.edu.au>.

On Friday, April 18, 2003, at 09:38  AM, Tom Lord wrote:

> Briefly: there isn't a reasonable (easy
> to explain/understand, user controllable, suitably flexible)
> correlation between svn's directory-changing operations and the
> programmer's idea of logical file correspondences.

[snip]

> [What you want is] directory-changing commands [that] can "do the right
> thing" most of the time, while still giving users an easy out when the
> right thing is something different.

I've been using subversion with students.  It is their first look at a 
version control system.

I often find that they copy using normal cp, remove the old '.svn' 
directories and then add things to the repository (rather than using 
'svn cp').

At the moment I have two options when this happens:

i) lose the copy history and assume that we'll never have to merge 
between the copied (or moved) data.

ii) Add the copy history in later as follows:

export their newly added stuff (call this tree A)
svn rm the newly added stuff (tree A)
svn cp the old stuff (from tree B to tree C)
check out tree C
copy the exported tree A over the checked out tree C
check in tree C.

Now we have what was in tree A back in the same location in the 
repository, but with copy history.
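
The manual recipe above can be written down as a dry run. This Python sketch only builds and returns the command lines; it runs nothing. It assumes, as the sentence above implies, that tree C is created at tree A's old URL; the repository URL, tree paths, scratch directory names (`A-export`, `C-wc`) and log message are placeholders. The `svn add --force` step is not in the original list: without it, files that exist only in tree A would stay unversioned after the overlay.

```python
import shlex


def restore_copy_history_cmds(repo: str, tree_a: str, tree_b: str,
                              msg: str = "restore copy history") -> list:
    """Return (but do not run) the shell commands for the manual recipe.

    Assumes tree C is created at tree A's old location; the scratch
    directory names are hypothetical.
    """
    url_a = f"{repo}/{tree_a}"   # the history-less copy (tree A)
    url_b = f"{repo}/{tree_b}"   # the original it was copied from (tree B)
    export_dir, wc_dir = "A-export", "C-wc"
    steps = [
        ["svn", "export", url_a, export_dir],            # save tree A's contents
        ["svn", "rm", url_a, "-m", msg],                 # remove tree A
        ["svn", "cp", url_b, url_a, "-m", msg],          # tree C: a real copy of B
        ["svn", "checkout", url_a, wc_dir],              # check out tree C
        ["cp", "-R", f"{export_dir}/.", f"{wc_dir}/"],   # overlay A onto C
        # Added step (assumption): schedule files that exist only in A.
        # Files that exist only in B would likewise still need `svn rm`.
        ["svn", "add", "--force", wc_dir],
        ["svn", "commit", wc_dir, "-m", msg],            # check in tree C
    ]
    return [shlex.join(step) for step in steps]
```

Calling `restore_copy_history_cmds("file:///tmp/repo", "trunk/new", "trunk/old")` returns seven strings, starting with `svn export file:///tmp/repo/trunk/new A-export`, which can be reviewed before running them by hand.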

I was hoping that some time in the future there would be a way to fix 
this case automatically.  Some ideas:

i) My original thought was that you could use the merge history 
properties somehow, but I'm not sure the merge algorithms being 
proposed will work in this case.  Does merge history allow you to add 
new copy information, or just figure out what has been merged between 
copies that previously existed?

ii) You might be able to do something wacky with svndumpfilter that 
goes in and adds the appropriate copy history.  This is not a
lightweight solution though...

Thoughts,

Will            :-}

--
Dr William Uther                            National ICT Australia
Phone: +61 2 9385 6926             School of Computer Science and Engineering
Email: willu@cse.unsw.edu.au             University of New South Wales
Jabber: willu@jabber.cse.unsw.edu.au          Sydney, Australia


Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.



    > From: "Jack Repenning" <jr...@collab.net>

    > > In the general case, with absolutely no usage restrictions, 
    > > this is basically impossible.  

    > How so?  If you merge the directories before you merge their contents,
    > then you discover correspondences.

See my long reply to gstein.  Briefly: there isn't a reasonable (easy
to explain/understand, user controllable, suitably flexible)
correlation between svn's directory-changing operations and the
programmer's idea of logical file correspondences.

With "ID cookies" -- the directory-changing commands can "do the right
thing" most of the time, while still giving users an easy out when the
right thing is something different.

-t



Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Branko Čibej <br...@xbc.nu>.

Jack Repenning wrote:

>>In the general case, with absolutely no usage restrictions,
>>this is basically impossible.
>
>How so?  If you merge the directories before you merge their contents,
>then you discover correspondences.

Don't you wish it were that easy... Even a simple "svn mv foo/bar bar"
can cause havoc.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Jack Repenning <jr...@collab.net>.

> In the general case, with absolutely no usage restrictions, 
> this is basically impossible.  

How so?  If you merge the directories before you merge their contents,
then you discover correspondences.



-===-
Jack Repenning
CollabNet, Inc.
o: 650.228.2562
c: 408.835.8090



Re: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Tom Lord <lo...@emf.net>.

Let's say that I say:  `svn merge ORIG MOD TGT'.

The nested contents of MOD may be arbitrarily rearranged from ORIG.

The merge algorithm has to look at the two trees and figure out what
in ORIG corresponds to what in MOD.   It has to know what has been
renamed, and what to compare to what.

In the general case, with absolutely no usage restrictions, this is
basically impossible.  Noderev ancestry, a hypothetical "perfect
copy/rename history for each noderev" -- both give results that are
either ambiguous, or that rely on disambiguation rules that are hard
to control and mess up merging generally.[1]

The only way to make this a tractable problem is with restrictions
(either enforced or advisory) on how people use copy and rename.
The best restrictions I've been able to think of are:

	1) "Copy" is normally used only to create tags and branches,
           and only on the top-level directories of project trees.
           It's almost never used to copy anything _into_ a project
           tree, unless that copy is copying in files and dirs that
           don't already exist in the target tree.  "Copy" is never
           used _within_ a project tree.  [I'll slightly relax that
           last constraint later on.]

	2) "Rename" is used only to rename tags and branches, or to
           rename files and directories _within_ a project tree, but
           it is almost never used between project trees, except when
           the renamed-into tree doesn't already have _any_ of the
           files being added by the rename.  In other words, "rename"
           isn't permitted to create "two copies of the same logical
           file" within a project tree.

That's _fairly_ simple advice to give users.   Better still: it'd
be really easy to implement some higher-level commands, let's call
them "scm-copy" and "scm-rename" which enforce those restrictions.
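
To make the two restrictions concrete, here is a minimal Python sketch of what hypothetical `scm-copy` and `scm-rename` commands might check. It models each project tree as a mapping from path to logical file id and covers only file-level operations (renaming whole tags or branches is omitted); `scm_copy`, `scm_rename` and `RestrictionError` are illustrative names, not Subversion APIs.

```python
from typing import Dict, List, Optional

Tree = Dict[str, str]  # path inside one project tree -> logical id


class RestrictionError(Exception):
    """Raised when an operation would break the copy/rename restrictions."""


def scm_copy(projects: Dict[str, Tree], src: str, dst: str,
             paths: Optional[List[str]] = None) -> None:
    """Restriction 1: copy a whole project (to make a tag or branch), or copy
    selected files into a *different* project that lacks their logical ids."""
    if src == dst:
        raise RestrictionError("copy within a project tree is not allowed")
    if paths is None:                      # top-level copy: new tag/branch
        if dst in projects:
            raise RestrictionError(f"{dst} already exists")
        projects[dst] = dict(projects[src])
        return
    target = projects[dst]
    for p in paths:                        # check everything before changing anything
        if p in target or projects[src][p] in target.values():
            raise RestrictionError(f"{dst} already has {p} or its logical file")
    for p in paths:
        target[p] = projects[src][p]


def scm_rename(projects: Dict[str, Tree], src: str, old: str,
               dst: str, new: str) -> None:
    """Restriction 2: rename freely within a project; across projects only if
    the destination does not already hold the same logical file."""
    file_id = projects[src][old]
    target = projects[dst]
    if new in target:
        raise RestrictionError(f"{dst}:{new} already exists")
    if dst != src and file_id in target.values():
        raise RestrictionError(f"{dst} already holds logical file {file_id}")
    del projects[src][old]
    target[new] = file_id
```

With these checks, copying `trunk` to `branch` succeeds, copying files within `trunk` is refused, and renaming a file from `trunk` into `branch` is refused when `branch` already holds that logical file.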

One problem, though, is that sometimes I certainly want "nested
projects", and even source trees that contain multiple (but variant)
instances of a given "nested project".  Nested projects can easily
violate the invariants that the copy/rename restrictions attempt to
preserve.

But that might be OK, provided I can convince the merge commands to stop
at sub-project boundaries.  That is, when I merge into some project tree,
if the merge simply ignores any "nested projects", then there's no
problem.  I can merge those subtrees in a separate step -- or even
write a little script that merges them recursively, given a succinct
description of what to merge with what.
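
A merge that stops at sub-project boundaries first needs to know which files belong to which project. This Python sketch assumes the nested project roots are already known as a set of relative directory paths (how they would be marked, e.g. by some property, is left open); `owning_project` and `partition_by_project` are hypothetical helpers. A top-level merge would touch only the `""` group, and a small driver could merge each other group as its own step.

```python
import posixpath
from typing import Dict, Iterable, List, Set


def owning_project(path: str, project_roots: Set[str]) -> str:
    """Return the innermost nested-project root containing `path`,
    or "" if the file belongs directly to the top-level project."""
    d = posixpath.dirname(path)
    while d not in ("", "/"):   # paths are relative; the "/" guard avoids looping
        if d in project_roots:
            return d
        d = posixpath.dirname(d)
    return ""


def partition_by_project(paths: Iterable[str],
                         project_roots: Set[str]) -> Dict[str, List[str]]:
    """Group file paths by owning project, so each group can be merged as a
    separate step (the top-level merge only ever sees group "")."""
    groups: Dict[str, List[str]] = {}
    for p in sorted(paths):
        groups.setdefault(owning_project(p, project_roots), []).append(p)
    return groups
```

Two variant instances of the same library, say `libfoo` and `vendor/libfoo`, land in separate groups, so their duplicated logical files never meet during the parent tree's merge.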

Under those restrictions, when I "add" a file to a project tree, I
could think of that file as acquiring a "logical identity" in that
tree.  "Rename" preserves that identity.  If I wind up adding a copy
of the file to a different branch of the same project: it has the same
"logical identity" in that other branch.  The invariant here is that
each project tree contains at most one file with a given logical
identity.

So, why muck around with rename histories at all?  I simply don't need
them and they're a bear to implement efficiently and accurately.
Instead, whenever I "add" a file to a project tree, I can give it a
cookie -- a property that contains a unique ID representing the
"logical identity" of the file.  (I guess this would be a property on
the noderev, automatically inherited by all descendant noderevs
unless explicitly overridden.)  If a merge operation copies that file
to another branch, it'll still have the same cookie.  When a merge
operator needs to compare two trees, it doesn't have to muck with
rename histories at all -- it can just compare those id cookies.
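
A minimal Python sketch of the cookie scheme, assuming a tree is just a mapping from path to cookie and that a random UUID is unique enough: `add` mints a cookie, rename and merge-copy carry it along, and the ORIG-to-MOD correspondence falls out of a cookie lookup with no rename history consulted. All function names here are hypothetical.

```python
import uuid
from typing import Dict

Tree = Dict[str, str]  # path -> logical id cookie


def add_file(tree: Tree, path: str) -> str:
    """`add` gives a new file a fresh, globally unique cookie."""
    cookie = uuid.uuid4().hex
    tree[path] = cookie
    return cookie


def rename_file(tree: Tree, old: str, new: str) -> None:
    """Rename moves the path; the cookie (logical identity) is preserved."""
    tree[new] = tree.pop(old)


def merge_copy(src: Tree, dst: Tree, path: str) -> None:
    """A merge that copies a file into another branch keeps its cookie."""
    dst[path] = src[path]


def correspond(orig: Tree, mod: Tree) -> Dict[str, str]:
    """Map ORIG paths to MOD paths by comparing cookies only.

    Files whose cookie does not appear in MOD are omitted."""
    path_by_cookie = {cookie: path for path, cookie in mod.items()}
    return {path: path_by_cookie[c] for path, c in orig.items()
            if c in path_by_cookie}
```

Applied to renames like `dir1/file1` to `dir5/file5` and `dir2/file2` to `dir6/file6`, `correspond` pairs each old path with its new one even though no path survives unchanged.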

Earlier, I said "Copy is never used _within_ a project tree."   We can
relax that constraint at the cost of making intra-tree copies more
expensive.   Copy _within_ a tree has to change the logical ids of the
copied objects.   I guess that could either be done eagerly, or with a
lazy mechanism comparable to the way new copy_ids are assigned, though
even if done lazily, a client must never observe two path@x noderevs
within the same project tree to have the same id.  (The other
restrictions about copy and rename can be relaxed similarly, at
similar expense.)
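
The eager variant can be sketched in a few lines of Python, again modelling a project tree as a mapping from path to logical id: unlike a rename, each copied file gets a brand-new id, so no id appears twice in the tree. The lazy variant is not shown; `copy_within` is a hypothetical name, and UUIDs stand in for whatever id scheme would really be used.

```python
import uuid
from typing import Dict

Tree = Dict[str, str]  # path inside one project tree -> logical id


def copy_within(tree: Tree, src_dir: str, dst_dir: str) -> None:
    """Eagerly copy every file under src_dir to dst_dir in the same project
    tree, minting a new logical id for each copy."""
    prefix = src_dir.rstrip("/") + "/"
    new_prefix = dst_dir.rstrip("/") + "/"
    # Plan first: avoids mutating the dict while iterating over it, and lets
    # a path collision be rejected before anything changes.
    new_paths = [new_prefix + p[len(prefix):] for p in tree if p.startswith(prefix)]
    clashes = [p for p in new_paths if p in tree]
    if clashes:
        raise ValueError(f"copy would overwrite {clashes}")
    for p in new_paths:
        tree[p] = uuid.uuid4().hex   # fresh identity for every copy
```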

The big thing is that every noderev has this "id" property; normally
that property is the same for all revisions of a given node_id.copy_id
and all other node_id.copy_ids descended from those;  "project trees"
have at most one noderev with a given id;  copying within a project
tree assigns new ids to the copied objects.   If you need two subtrees
of a project tree which have duplicate ids -- then those must be
nested subprojects and merge will never see them while working on the
parent tree.

While v.a.p. only ever compares two trees which have a direct ancestry
relationship, the id cookies make it possible to compare any two
trees, regardless of ancestry.  There are plenty of _other_ merge
techniques that can make good use of that, in case you're not so sure
yet that v.a.p. is the One True Merge Technique.
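
Because correspondence comes from cookies rather than ancestry, a tree delta can be computed between any two trees. This Python sketch assumes each tree maps path to a (cookie, contents) pair and that each tree obeys the at-most-one-file-per-id invariant; `tree_delta` and its four categories are illustrative, not an existing Subversion interface.

```python
from typing import Dict, Tuple

Entry = Tuple[str, str]   # (logical id cookie, file contents)
Tree = Dict[str, Entry]   # path -> entry


def tree_delta(old: Tree, new: Tree) -> Dict[str, list]:
    """Classify changes from `old` to `new` using only id cookies, so the two
    trees need not share any ancestry.  Relies on each cookie appearing at
    most once per tree (the project-tree invariant)."""
    old_by_id = {cookie: (path, text) for path, (cookie, text) in old.items()}
    new_by_id = {cookie: (path, text) for path, (cookie, text) in new.items()}
    delta: Dict[str, list] = {"added": [], "deleted": [], "renamed": [], "modified": []}
    for cookie, (path, text) in new_by_id.items():
        if cookie not in old_by_id:
            delta["added"].append(path)
            continue
        old_path, old_text = old_by_id[cookie]
        if old_path != path:
            delta["renamed"].append((old_path, path))
        if old_text != text:
            delta["modified"].append(path)
    for cookie, (path, _text) in old_by_id.items():
        if cookie not in new_by_id:
            delta["deleted"].append(path)
    return delta
```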

I had a moment of darkness where I thought: you know, it'll never fly.
Svn hackers will just balk at the idea of introducing project tree
boundaries and restrictions based on them --- even if you talk about
implementing them without disturbing the lower level project-less
generality of the fs.  But I also had the hope that, maybe if they
bang their heads long enough against the brick wall of trying to solve
the tree-delta mechanism absent these restrictions, the project-tree
trick will start to look like a pretty clean and simple alternative
that doesn't take away much if any flexibility.

-t




[1] I can explain that in detail but wanted to keep this post short.



RE: short question about merge [PROPOSAL] vs. tree-deltas

Posted by Sander Striker <st...@apache.org>.

> From: Tom Lord [mailto:lord@emf.net]
> Sent: Wednesday, April 16, 2003 2:28 AM

> v.a.p. looks at a revision tree like this:
> 
> 
> 			ANCESTOR
> 			  /   \
> 		      ...        ...
> 			/	  \
> 		     ORIG	   TARGET
> 		    ...
> 		      /
> 		    MOD
> 
> 
> Let's suppose that the trees for ANCESTOR and ORIG have:
> 
> 
> 	dir1/file1 == node_N.copy_C.rev_R
> 	dir2/file2 == node_N.copy_C.rev_R
> 
> (Note that the two file-rev nodes are identical.)

This can't happen AFAIK.

Sander

