Posted to dev@subversion.apache.org by Sander Striker <st...@apache.org> on 2003/04/10 19:09:24 UTC

[PROPOSAL] Merging Improved

Hi,

Attached is a pdf file with examples.  I'll refer to them
in this email.  Doing the same diagrams in ascii art can be
done, but it isn't pretty ;).

For the sake of this discussion I'm ignoring cherry picking
for now.  I will mention something in the bookkeeping section,
but am not going into details on how to use that information
as of yet.

The goal of this proposal is to come up with a scheme that will
make the:

  svn merge URL .

syntax do something useful.  That is, merge the changes between
URL and WC URL, since you last ran this command, into your working
copy.

For the sake of simplicity I'm leaving the handling of tree deltas out
of this proposal.  Tree-delta merging resembles textual merging to
a degree.

-------
Merging will rely heavily on variance adjusting.  Prior discussions
have questioned its safety, but that is beyond the scope of this
proposal.  Let's assume we allow the user enough rope to macrame
a castle in which they can then hang themselves.

When merging from one branch to another, we need 4 files:

  1) the working copy, from now on referred to as Modified (M).
  2) the latest version of a branch, from now on referred to as Latest (L).
  3) the most recently merged revision (MRMR) of said branch.
  4) the most recent common ancestor (MRCA) of L and M.

After a merge we end up with an altered working copy(WC).

When MRCA is absent we currently need to punt, since we still need
to figure out how to handle that case.

When MRCA == MRMR (or MRMR is absent), we can use our current
internal merge.  This will be (mostly) equivalent to
diff3 -mE M MRCA L.  I'll refer to this function as diff3 in the
examples.

When MRCA != MRMR, we use variance adjusted merging.  This is based
on an algorithm described at:

  http://subversion.tigris.org/variance-adjusted-patching.html

Basically you obtain a diff between MRMR and L, adjust that using
the diffs between MRCA and MRMR, and MRCA and M, then apply it
to M.  This has been implemented in the internal diff library
as diff4 (untested).  I'll refer to this function as diff4 in
the examples.
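
The choice between the two routines can be sketched as follows.  This
is only an illustration of the rules above, not the actual library
code: the function name choose_merge and the string return values are
made up for this example, and revisions are represented as plain
strings such as "Trunk@17".

```python
from typing import Optional


def choose_merge(mrca: Optional[str], mrmr: Optional[str]) -> str:
    """Return the name of the merge routine the rules above select.

    mrca and mrmr are hypothetical "PATH@REV" strings; None means absent.
    """
    if mrca is None:
        # No common ancestor: the proposal punts on this case for now.
        return "punt"
    if mrmr is None or mrmr == mrca:
        # Plain three-way merge: diff3(M, MRCA, L).
        return "diff3"
    # Variance adjusted merge: diff4(M, MRCA, L, MRMR).
    return "diff4"
```

With MRCA = "Trunk@17" and no MRMR this picks diff3; once an MRMR
such as "Trunk@20" has been recorded it picks diff4.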

On each merge we need to do some bookkeeping (which we will rely
on to do merges as well).  All bookkeeping operations are on the
[TBD: svn:merged-from ?] property of each merged file in the WC.

Each line in this property will have the following syntax:

  [REPOSITORY-UUID::]PATH@REVISION[:REVISION]

The REPOSITORY-UUID is to be left out when it is the same as the
one M belongs to.  The REVISION:REVISION syntax is there to accommodate
cherry picking.

There will be no two lines in svn:merged of the
[REPOSITORY-UUID::]PATH@REVISION syntax where the [REPOSITORY-UUID::]PATH
parts are the same.  This line will be called the MRMR line.

There will be no lines in svn:merged where the [REPOSITORY-UUID::]PATH
is the same and the REVISION is lower than the REVISION in the MRMR line.

There will also be no lines in svn:merged where PATH is equal to
the path of M in the repository.

Given these rules it is always possible to determine the MRMR.  The
property retains full merge history (since properties are versioned), and
the size of its contents will not grow unbounded.
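
To make the line syntax and the three rules concrete, here is a rough
sketch of a parser and a rule checker.  Everything in it is an
assumption made for illustration: the names MergeLine, parse_line and
check_rules are invented, a line with a single REVISION is treated as
the MRMR line, and for REVISION:REVISION lines the first revision is
the one compared against the MRMR.

```python
import re
from typing import List, NamedTuple, Optional

# [REPOSITORY-UUID::]PATH@REVISION[:REVISION]
LINE_RE = re.compile(
    r"^(?:(?P<uuid>[^:@]+)::)?(?P<path>[^@]+)@(?P<rev>\d+)(?::(?P<end>\d+))?$"
)


class MergeLine(NamedTuple):
    uuid: Optional[str]     # None when the source is in M's own repository
    path: str
    rev: int
    end_rev: Optional[int]  # set only for REVISION:REVISION lines


def parse_line(line: str) -> MergeLine:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"malformed line: {line!r}")
    end = m.group("end")
    return MergeLine(m.group("uuid"), m.group("path"), int(m.group("rev")),
                     int(end) if end is not None else None)


def check_rules(lines: List[str], own_path: str) -> List[str]:
    """Return a description of every rule violation found."""
    parsed = [parse_line(l) for l in lines]
    problems = []
    mrmr = {}  # (uuid, path) -> revision of the MRMR line
    for p in parsed:
        if p.end_rev is None:
            key = (p.uuid, p.path)
            if key in mrmr:
                problems.append(f"duplicate MRMR line for {p.path}")
            mrmr[key] = p.rev
    for p in parsed:
        if p.uuid is None and p.path == own_path:
            problems.append(f"line refers to own path {p.path}")
        key = (p.uuid, p.path)
        if p.end_rev is not None and key in mrmr and p.rev < mrmr[key]:
            problems.append(f"{p.path}@{p.rev} is older than the MRMR line")
    return problems
```

For a working copy on Branch1, check_rules(["Trunk@23"], "Branch1")
reports nothing, while two single-revision Trunk lines are flagged as
a duplicate MRMR line.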

Note that the MRCA was already recoverable since this information is
recorded on copy.

Now what should 'svn merge BRANCH' do?  In my opinion it should try
to apply the changes between BRANCH@MRMR (or MRCA when MRMR is absent)
and BRANCH@HEAD to my working copy.
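
As a small sketch of that rule, the revision range to apply could be
picked like this.  The helper name merge_range is hypothetical, and
revisions are plain integers on the BRANCH path.

```python
from typing import Optional, Tuple


def merge_range(mrmr_rev: Optional[int], mrca_rev: Optional[int],
                head_rev: int) -> Tuple[int, int]:
    """Return (start, end) of the changes on BRANCH to apply to the WC."""
    # Start from the MRMR when one is recorded, else fall back to the MRCA.
    start = mrmr_rev if mrmr_rev is not None else mrca_rev
    if start is None:
        raise ValueError("neither MRMR nor MRCA is known")
    return (start, head_rev)
```

A first merge with MRCA at revision 17 and HEAD at 20 gives (17, 20);
a later merge with MRMR 20 and HEAD 23 gives (20, 23).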

------
The examples are easy to read.  Each rounded rectangle represents a
revision.  The vertical line it sits on represents its path.  Arrows
represent transitions to the next revision.  The grey blocks are the
merged-from properties.  We assume that each working copy is
unmodified prior to a merge, so that it is always identical to the
revision it was checked out at.  Furthermore we assume
that the working copy is committed after the merge.

Example 1.

Note that Branch1 is copied from Trunk at revision 17, therefore:
MRCA = Trunk@17 == Branch1@18.  Since there are no more copies in this
example this will remain the same for the entire example.

M1.  MRCA = Trunk@17 = Branch1@18
     MRMR is absent
     M = Branch1@19
     L = Trunk@20
     WC = Branch1@21

This results in WC = diff3(M, MRCA, L).  A line is added to svn:merged-from 
of WC:

  Trunk@20


M2.  MRCA = Trunk@17 = Branch1@18
     MRMR = Trunk@20
     M = Branch1@21
     L = Trunk@23
     WC = Branch1@24

This results in WC = diff4(M, MRCA, L, MRMR).  The line in svn:merged-from
of WC is updated to:

  Trunk@23


M3.  MRCA = Trunk@17 = Branch1@18
     MRMR is absent
     M = Trunk@23
     L = Branch1@25
     WC = Trunk@26

This results in WC = diff3(M, MRCA, L).  Note that this shouldn't be a
problem since diff3 is perfectly capable of identifying common changes
(the ones merged from Trunk to Branch1 earlier), and these should therefore
not produce any conflicts.  The svn:merged-from property from L is merged
into the one on WC, after which all 'Trunk' lines are removed
(rendering it empty).  Then the following line is added:

  Branch1@25
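
The bookkeeping in M1 through M3 can be replayed with a short sketch.
It is deliberately simplified and rests on assumptions: the helper
names are invented, property values are lists of "PATH@REV" strings,
and cherry-pick ranges and repository UUID prefixes are ignored.

```python
from typing import List


def record_merge(prop_lines: List[str], source_path: str, rev: int) -> List[str]:
    """Add the MRMR line for source_path, replacing any older one."""
    kept = [l for l in prop_lines if l.rsplit("@", 1)[0] != source_path]
    return kept + [f"{source_path}@{rev}"]


def record_merge_back(target_lines: List[str], source_lines: List[str],
                      target_path: str, source_path: str, rev: int) -> List[str]:
    """M3 case: fold the source's property into the target's, drop lines
    that name the target's own path, then record the source itself."""
    merged = list(target_lines)
    for l in source_lines:
        if l not in merged:
            merged.append(l)
    merged = [l for l in merged if l.rsplit("@", 1)[0] != target_path]
    return record_merge(merged, source_path, rev)


branch1 = record_merge([], "Trunk", 20)        # M1: ['Trunk@20']
branch1 = record_merge(branch1, "Trunk", 23)   # M2: ['Trunk@23']
trunk = record_merge_back([], branch1, "Trunk", "Branch1", 25)  # M3
```

After these three steps branch1 holds the single line Trunk@23 and
trunk holds the single line Branch1@25, matching the example.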


I'll leave the second and third example for tomorrow, since I'm out of
time for today.

Hopefully this was an interesting read.  Not truly a well written
proposal, but there you go.

Sander

Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Sander Striker [mailto:striker@apache.org]
>>Sent: Friday, April 11, 2003 12:12 AM
>>    
>>
>
>  
>
>>>From: Branko Cibej [mailto:brane@xbc.nu]
>>>Sent: Friday, April 11, 2003 12:06 AM
>>>      
>>>
>>>Right. But I'm confused, because in your proposal (my first quote from
>>>you, above) you say that merges from the same path would *not* be
>>>recorded. I'd like to understand if such records are really unnecessary.
>>>      
>>>
>>In the same path as you are merging _to_.  Ah, communication problem.  I
>>was talking about all the branchpoints of L, not M.
>>    
>>
>
>Don't mind me.  I shouldn't be replying when too tired.  I'll get
>back to you on this one tomorrow, okay?
>  
>

It's already tomorrow. :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Sander Striker [mailto:striker@apache.org]
> Sent: Friday, April 11, 2003 12:12 AM

>> From: Branko Cibej [mailto:brane@xbc.nu]
>> Sent: Friday, April 11, 2003 12:06 AM
> 
>> Right. But I'm confused, because in your proposal (my first quote from
>> you, above) you say that merges from the same path would *not* be
>> recorded. I'd like to understand if such records are really unnecessary.
> 
> In the same path as you are merging _to_.  Ah, communication problem.  I
> was talking about all the branchpoints of L, not M.

Don't mind me.  I shouldn't be replying when too tired.  I'll get
back to you on this one tomorrow, okay?

Sander



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 12:06 AM

> Right. But I'm confused, because in your proposal (my first quote from
> you, above) you say that merges from the same path would *not* be
> recorded. I'd like to understand if such records are really unnecessary.

In the same path as you are merging _to_.  Ah, communication problem.  I
was talking about all the branchpoints of L, not M.

Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Thursday, April 10, 2003 11:07 PM
>>    
>>
>
>  
>
>>Sander Striker wrote:
>>
>>    
>>
>>>There will also be no lines in svn:merged where PATH is equal to
>>>the path of M in the repository.
>>> 
>>>
>>>      
>>>
>>Then what happens if you use merge to undo a change on the same branch?
>>For example: say you're at revision 5 of /trunk, and you want to undo
>>the changes that went in at revision 3. Presumably, this is what you'd do:
>>
>>    svn merge -r3:2 url://of/trunk .
>>
>>Shouldn't this record "trunk@3:2", or something? (Yes, I know this is
>>nitpicking... i mean, cherry picking)
>>    
>>
>
>Yes.  See example 3, CP1 and CP2 ;)  (guess what CP stands for).
>  
>

Right. But I'm confused, because in your proposal (my first quote from
you, above) you say that merges from the same path would *not* be
recorded. I'd like to understand if such records are really unnecessary.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Thursday, April 10, 2003 11:07 PM

> Sander Striker wrote:
> 
> >There will also be no lines in svn:merged where PATH is equal to
> >the path of M in the repository.
> >  
> >
> Then what happens if you use merge to undo a change on the same branch?
> For example: say you're at revision 5 of /trunk, and you want to undo
> the changes that went in at revision 3. Presumably, this is what you'd do:
> 
>     svn merge -r3:2 url://of/trunk .
> 
> Shouldn't this record "trunk@3:2", or something? (Yes, I know this is
> nitpicking... i mean, cherry picking)

Yes.  See example 3, CP1 and CP2 ;)  (guess what CP stands for).


Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>There will also be no lines in svn:merged where PATH is equal to
>the path of M in the repository.
>  
>
Then what happens if you use merge to undo a change on the same branch?
For example: say you're at revision 5 of /trunk, and you want to undo
the changes that went in at revision 3. Presumably, this is what you'd do:

    svn merge -r3:2 url://of/trunk .

Shouldn't this record "trunk@3:2", or something? (Yes, I know this is
nitpicking... i mean, cherry picking)


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Friday, April 11, 2003 12:13 AM
>>    
>>
>
>[...]
>  
>
>>>No.  Consider attached example.  Of course you always bring forward
>>>interesting cases ;).  In this case I clearly forgot about something.
>>>
>>>We need to find _all_ the branchpoints of M until we hit the path of the
>>>MRCA, and either record them all or decide to infer them at the time
>>>we need them.  The latter could lead to some brute force searching
>>>at a later point in time though (for every source merged that doesn't
>>>have an entry in svn:merged-from yet).
>>>      
>>>
>>Aha! Well then, I'd suggest to record the branch point at the time of
>>the branch.
>>    
>>
>
>The reason I didn't want to do this is that copies aren't copies anymore
>since the copied-to tree will have svn:merged-from properties attached
>to it, which the copied-from tree clearly won't have.
>
Oof, you're right.

>  For tagging this
>information is completely irrelevant, so it will bite users that do a
>lot of tagging but no branching.  This will also make the copy operation
>more expensive than O(1), which I think is a no-go.
>

And right again. Lazy propchange during lazy copy is not something I'd
want to contemplate. Brrrr.

Luckily, all that information is available in the node history.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 12:13 AM

[...]
>> No.  Consider attached example.  Of course you always bring forward
>> interesting cases ;).  In this case I clearly forgot about something.
>>
>> We need to find _all_ the branchpoints of M until we hit the path of the
>> MRCA, and either record them all or decide to infer them at the time
>> we need them.  The latter could lead to some brute force searching
>> at a later point in time though (for every source merged that doesn't
>> have an entry in svn:merged-from yet).
> 
> Aha! Well then, I'd suggest to record the branch point at the time of
> the branch.

The reason I didn't want to do this is that copies aren't copies anymore
since the copied-to tree will have svn:merged-from properties attached
to it, which the copied-from tree clearly won't have.  For tagging this
information is completely irrelevant, so it will bite users that do a
lot of tagging but no branching.  This will also make the copy operation
more expensive than O(1), which I think is a no-go.

> Even though that information is trivially available from the
> node ancestry, you don't have to traverse the node history at merge
> time. This would also let you compute the MRCA just by comparing the
> svn:merged props.

Yes, that would be a plus.  It crossed my mind obviously, just failed
to mention it ;)  Sorry.


Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Thursday, April 10, 2003 11:03 PM
>>    
>>
>
>  
>
>>>Sander Striker wrote:
>>>I also found out that I forgot about one rule in the
>>>bookkeeping process:
>>>
>>>If the MRCA of M and L is not the same path as M, then
>>>a line needs to be added to svn:merge that contains the
>>>MRCA of M and the HEAD of the path of the MRCA of M and L
>>>(you might need to re-read that twice).
>>>
>>>Example 3. M1 is a good example of this.
>>>      
>>>
>>Let me see if I understand... that's where you got the trunk@34 line in
>>M1 in example 3, right?
>>    
>>
>
>Yes.
>
>  
>
>>Hm, (scratch scratch). Isn't that always the same as the branchpoint
>>of M's path?
>>    
>>
>
>No.  Consider attached example.  Of course you always bring forward
>interesting cases ;).  In this case I clearly forgot about something.
>
>We need to find _all_ the branchpoints of M until we hit the path of the
>MRCA, and either record them all or decide to infer them at the time
>we need them.  The latter could lead to some brute force searching
>at a later point in time though (for every source merged that doesn't
>have an entry in svn:merged-from yet).
>  
>

Aha! Well then, I'd suggest to record the branch point at the time of
the branch. Even though that information is trivially available from the
node ancestry, you don't have to traverse the node history at merge
time. This would also let you compute the MRCA just by comparing the
svn:merged props.

That assumes no tree changes, of course.  To deal with those, you'd more
likely have to record node-id@revision rather than path@revision, and
use the node change's committed-path info to get the path.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Thursday, April 10, 2003 11:03 PM

>> Sander Striker wrote:
>> I also found out that I forgot about one rule in the
>> bookkeeping process:
>>
>> If the MRCA of M and L is not the same path as M, then
>> a line needs to be added to svn:merge that contains the
>> MRCA of M and the HEAD of the path of the MRCA of M and L
>> (you might need to re-read that twice).
>>
>> Example 3. M1 is a good example of this.
> 
> Let me see if I understand... that's where you got the trunk@34 line in
> M1 in example 3, right?

Yes.

> Hm, (scratch scratch). Isn't that always the same as the branchpoint
> of M's path?

No.  Consider attached example.  Of course you always bring forward
interesting cases ;).  In this case I clearly forgot about something.

We need to find _all_ the branchpoints of M until we hit the path of the
MRCA, and either record them all or decide to infer them at the time
we need them.  The latter could lead to some brute force searching
at a later point in time though (for every source merged that doesn't
have an entry in svn:merged-from yet).

Sander

Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Sander Striker [mailto:striker@apache.org]
>>Sent: Thursday, April 10, 2003 9:17 PM
>>    
>>
>
>  
>
>>Note that example 2 and 3 contain some flaws.  Please
>>disregard those until I repost.
>>    
>>
>
>Ok, fixed.
>
>I also found out that I forgot about one rule in the
>bookkeeping process:
>
>If the MRCA of M and L is not the same path as M, then
>a line needs to be added to svn:merge that contains the
>MRCA of M and the HEAD of the path of the MRCA of M and L
>(you might need to re-read that twice).
>
>Example 3. M1 is a good example of this.
>

Let me see if I understand... that's where you got the trunk@34 line in
M1 in example 3, right? Hm, (scratch scratch). Isn't that always the
same as the branchpoint of M's path?


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

Hey,

cmpilato@collab.net wrote:

>Branko Čibej <br...@xbc.nu> writes:
>
>  
>
>>Ah! You'd have to ask the remote repository about that. We record the
>>committed-path of every node revision.
>>    
>>
>
>You mean today?  Not entirely true (yet).  That code exists only in
>Tutt's fs-related branch, and in my fs-schema-changes branch (where
>I'm gradually re-constructing Tutt's work in a way that I can actually
>understand).
>
I'm a little confused.  I thought fs-schema-changes *was* Bill's branch
and that you just took it over.  If this isn't true, what is Bill's branch?
I haven't looked at the fs-schema-changes branch in a while.  I can't
seem to get to the repos right now.  Are you making big changes in there
lately?  Also, you wouldn't happen to be removing the skelzzzzz from
dag.c?  I'd like to see another "bdb" function or two that handle hashes
for getting/putting "reps" of props and directories.  In order to take
advantage of "relational" capabilities, long term I will be storing
directories and props using a different "rep" type.  There are several
short-term options of course :-)  Just a heads up.

Thanks,
gat

PS Sorry I haven't posted my pluggable-db design doc yet.  I'm working 
on it.  I'm still up to my ears in illness.  Un-freaking believable year 
for me.



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

cmpilato@collab.net wrote:

>Branko Čibej <br...@xbc.nu> writes:
>
>  
>
>>Ah! You'd have to ask the remote repository about that. We record the
>>committed-path of every node revision.
>>    
>>
>
>You mean today?  Not entirely true (yet).  That code exists only in
>Tutt's fs-related branch, and in my fs-schema-changes branch (where
>I'm gradually re-constructing Tutt's work in a way that I can actually
>understand).
>  
>

Oh! I forgot that work wasn't on trunk yet. But never mind, it /will/ be
on the trunk, along with the atomic rename fixes, and I'd say we want to
take that into account in the merge design.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by cm...@collab.net.

Branko Čibej <br...@xbc.nu> writes:

> Ah! You'd have to ask the remote repository about that. We record the
> committed-path of every node revision.

You mean today?  Not entirely true (yet).  That code exists only in
Tutt's fs-related branch, and in my fs-schema-changes branch (where
I'm gradually re-constructing Tutt's work in a way that I can actually
understand).


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 5:41 PM

>> Sander Striker wrote:
>>>>> Oh, yes: one other thing to worry about here are dump/load cycles. If
>>>>> you use node id's instead of paths, then the dumper has to convert them
>>>>> to paths and the loader has to convert back, because node IDs aren't
>>>>> preserved across a reload.
>>
>> But that won't help for remote repositories.  You don't even know when a
>> remote repository will be dumped/loaded and you might be stuck with IDs
>> that are invalid afterwards.
> 
> Or we could simply declare that the UUID defines the repository; if the
> UUID changes, it's arguably no longer the same repository (and such a
> change would cause problems for more than just merges, anyway). That's
> why "svnadmin load" has an option to restore the UUID from the dump file.

With IDs I was referring to NODECHANGE ids.

Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>>>Oh, yes: one other thing to worry about here are dump/load cycles. If
>>>>you use node id's instead of paths, then the dumper has to convert them
>>>>to paths and the loader has to convert back, because node IDs aren't
>>>>preserved across a reload.
>>>>        
>>>>
>
>But that won't help for remote repositories.  You don't even know when a
>remote repository will be dumped/loaded and you might be stuck with IDs
>that are invalid afterwards.
>

Or we could simply declare that the UUID defines the repository; if the
UUID changes, it's arguably no longer the same repository (and such a
change would cause problems for more than just merges, anyway). That's
why "svnadmin load" has an option to restore the UUID from the dump file.

>>>I wonder, if we use PATH@REVISION, we should be able to get a tree delta
>>>from the server which does take renames into account, no?
>>>      
>>>
>>Yes. But you'd have to talk to the server.
>>    
>>
>
>But we'd have to do that anyway to figure out the committed-path, right?
>So why not do it the other way around?  Given PATH@REVISION, return
>NODE-CHANGE.
>
>Or is asking the server only needed for remote repositories?
>

I don't think so, at least not in general.

>>>Are NODE-IDs guaranteed to be the same after a dump/load?
>>>      
>>>
>>They're not, that's exactly what I said above. :-)
>>    
>>
>
>Then isn't PATH@REV 'safer' than NODE-CHANGE?
>
>Or, we should write out the ids when dumping and on load use that
>data, so we are sure that the ids are stable across dump/load.  Hmmm.
>

What I proposed was for the dumper to convert NODE-CHANGE-PK to
PATH@REV, and the loader to convert back. That would mean dump and load
have to know about the semantics of svn:merged.
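
A rough sketch of that round trip, under heavy assumptions: the two
lookup tables stand in for whatever the filesystem would provide to
map a node-change key to its committed path and revision (and back),
and the function names and key values are invented for this example.

```python
from typing import Dict, List, Tuple

PathRev = Tuple[str, int]


def dump_value(keys: List[str], key_to_pathrev: Dict[str, PathRev]) -> List[str]:
    """Dumper side: rewrite node-change keys as portable PATH@REV lines."""
    return [f"{path}@{rev}" for path, rev in (key_to_pathrev[k] for k in keys)]


def load_value(lines: List[str], pathrev_to_key: Dict[PathRev, str]) -> List[str]:
    """Loader side: map PATH@REV lines to the new repository's keys."""
    out = []
    for line in lines:
        path, rev = line.rsplit("@", 1)
        out.append(pathrev_to_key[(path, int(rev))])
    return out


# Keys differ between the old and the reloaded repository, but the
# PATH@REV form written to the dump file is the same for both.
old_keys = {"k1": ("/trunk", 20)}
new_keys = {("/trunk", 20): "n7"}
dumped = dump_value(["k1"], old_keys)
loaded = load_value(dumped, new_keys)
```

The point of the sketch is only that the dump file carries PATH@REV,
so the property survives even though "k1" became "n7" after the load.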


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 9:04 AM

>> I meant, given something like UUID::NODE-ID@REVISION in my svn:merged property,
>> will I know what path that relates to in the remote repository?
> 
> Ah! You'd have to ask the remote repository about that. We record the
> committed-path of every node revision.

Ok.
 
> Come to think of it, though, the NODE-CHANGE primary key contains both
> the node ID and the branch (copy) id, so it might be better to record
> node-changes. Getting a path from a node change is even simpler than
> from a node-id+revision.
> 
>>> Oh, yes: one other thing to worry about here are dump/load cycles. If
>>> you use node id's instead of paths, then the dumper has to convert them
>>> to paths and the loader has to convert back, because node IDs aren't
>>> preserved across a reload.

But that won't help for remote repositories.  You don't even know when a
remote repository will be dumped/loaded and you might be stuck with IDs
that are invalid afterwards.

>> I wonder, if we use PATH@REVISION, we should be able to get a tree delta
>> from the server which does take renames into account, no?
> 
> Yes. But you'd have to talk to the server.

But we'd have to do that anyway to figure out the committed-path, right?
So why not do it the other way around?  Given PATH@REVISION, return
NODE-CHANGE.

Or is asking the server only needed for remote repositories?

 
>> Are NODE-IDs guaranteed to be the same after a dump/load?
> 
> They're not, that's exactly what I said above. :-)

Then isn't PATH@REV 'safer' than NODE-CHANGE?

Or, we should write out the ids when dumping and on load use that
data, so we are sure that the ids are stable across dump/load.  Hmmm.


Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Friday, April 11, 2003 12:51 AM
>>    
>>
>
>  
>
>>>>From: Branko Cibej [mailto:brane@xbc.nu]
>>>>Sent: Friday, April 11, 2003 12:25 AM
>>>>   
>>>>
>>>>I /think/ that recording NODE-ID@REVISION[:REVISION] would be sufficient
>>>>(assuming we have atomic renames that preserve node id's).  Perhaps even
>>>>NODE-CHANGE-PK[:NODE-CHANGE-PK], which is semantically the same but a) avoids
>>>>one index lookup when pulling the MRCA or MRMR from the repository, but
>>>>b) makes history comparison harder (without making assumptions about the
>>>>structure of the NODE-CHANGE primary key).
>>>>   
>>>>
>>>>        
>>>>
>>>Going from PATH to NODE-ID is indeed a minimal change to presented
>>>format in the proposal.  +1.  But, will that also work for out of repository
>>>merges though?
>>>      
>>>
>>You've already got a disambiguation mechanism for that:
>>
>>    [REPOS-UUID::]NODE-ID@REVISION[:REVISION]
>>    
>>
>
>I meant, given something like UUID::NODE-ID@REVISION in my svn:merged property,
>will I know what path that relates to in the remote repository?
>

Ah! You'd have to ask the remote repository about that. We record the
committed-path of every node revision.

Come to think of it, though, the NODE-CHANGE primary key contains both
the node ID and the branch (copy) id, so it might be better to record
node-changes. Getting a path from a node change is even simpler than
from a node-id+revision.

>>Oh, yes: one other thing to worry about here are dump/load cycles. If
>>you use node id's instead of paths, then the dumper has to convert them
>>to paths and the loader has to convert back, because node IDs aren't
>>preserved across a reload.
>>    
>>
>
>I wonder, if we use PATH@REVISION, we should be able to get a tree delta
>from the server which does take renames into account, no?
>

Yes. But you'd have to talk to the server.

>Are NODE-IDs guaranteed to be the same after a dump/load?
>

They're not, that's exactly what I said above. :-)

>>What happens to the repos UUIDs of out-of-repos merge sources when
>>those sources get reloaded is something I don't want to contemplate.
>>    
>>
>
>We would need a rewrite tool that rewrites OLDUUID to NEWUUID.  Or
>we must do one indirection and record the non-local repository UUIDs
>somewhere and use keys to that in the merge history.
>  
>

Oh, right -- we already have a "uuids" table, for this very purpose. So
yes, you'd not put the actual UUID into the record in svn:merge, you'd
just use its index in the uuids table. Index 0 is always this
repository; other indexes aren't used yet, but we can declare that
they're invariant to dump/load.
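
A toy model of that indirection, to show why it helps.  This is not
the real "uuids" table: the class, its methods, and the
INDEX::PATH@REV record form are assumptions made for the example.

```python
class UuidTable:
    """Maps small stable indexes to repository UUIDs."""

    def __init__(self, own_uuid: str) -> None:
        # Index 0 is always this repository.
        self._uuids = [own_uuid]

    def index_of(self, uuid: str) -> int:
        """Return the index for uuid, registering it if it is new."""
        if uuid not in self._uuids:
            self._uuids.append(uuid)
        return self._uuids.index(uuid)

    def uuid_at(self, index: int) -> str:
        return self._uuids[index]

    def rewrite(self, old: str, new: str) -> None:
        """A remote source was reloaded: only the table entry changes."""
        self._uuids[self._uuids.index(old)] = new


table = UuidTable("local-uuid")
idx = table.index_of("remote-old")   # first foreign repository gets index 1
record = f"{idx}::/trunk@20"         # merge record stores the index only
table.rewrite("remote-old", "remote-new")
```

After the rewrite the stored record is untouched and still resolves,
through index 1, to the new UUID.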

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 12:51 AM

> >>From: Branko Cibej [mailto:brane@xbc.nu]
> >>Sent: Friday, April 11, 2003 12:25 AM
> >>    
> >>
> >>I /think/ that recording NODE-ID@REVISION[:REVISION] would be sufficient
> >>(assuming we have atomic renames that preserve node id's).  Perhaps even
> >>NODE-CHANGE-PK[:NODE-CHANGE-PK], which is semantically the same but a) avoids
> >>one index lookup when pulling the MRCA or MRMR from the repository, but
> >>b) makes history comparison harder (without making assumptions about the
> >>structure of the NODE-CHANGE primary key).
> >>    
> >>
> >
> >Going from PATH to NODE-ID is indeed a minimal change to presented
> >format in the proposal.  +1.  But, will that also work for out of repository
> >merges though?
> 
> You've already got a disambiguation mechanism for that:
> 
>     [REPOS-UUID::]NODE-ID@REVISION[:REVISION]

I meant, given something like UUID::NODE-ID@REVISION in my svn:merged property,
will I know what path that relates to in the remote repository?
 
> Oh, yes: one other thing to worry about here are dump/load cycles. If
> you use node id's instead of paths, then the dumper has to convert them
> to paths and the loader has to convert back, because node IDs aren't
> preserved across a reload.

I wonder, if we use PATH@REVISION, we should be able to get a tree delta
from the server which does take renames into account, no?

Are NODE-IDs guaranteed to be the same after a dump/load?

> What happens to the repos UUIDs of out-of-repos merge sources when
> those sources get reloaded is something I don't want to contemplate.

We would need a rewrite tool that rewrites OLDUUID to NEWUUID.  Or
we must do one indirection and record the non-local repository UUIDs
somewhere and use keys to that in the merge history.
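
To make the first option concrete, here is a rough sketch of what such
a rewrite tool would have to do.  It assumes merge records are plain
strings of the form UUID::NODE-ID@REVISION[:REVISION]; the function
name and the sample data are hypothetical.

```python
# Hypothetical sketch of the "rewrite tool" option: every stored merge
# record that names OLDUUID has to be visited and rewritten.

def rewrite_uuid(records, old_uuid, new_uuid):
    """Return records with a leading 'old_uuid::' replaced by 'new_uuid::'."""
    prefix = old_uuid + "::"
    rewritten = []
    for record in records:
        if record.startswith(prefix):
            record = new_uuid + "::" + record[len(prefix):]
        rewritten.append(record)
    return rewritten


history = ["OLDUUID::17@100:120", "23@90", "OTHER::5@7"]
fixed = rewrite_uuid(history, "OLDUUID", "NEWUUID")
```

The cost grows with the number of records that carry the property,
which is what the indirection through keys would avoid.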

Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Friday, April 11, 2003 12:25 AM
>>    
>>
>>I /think/ that recording NODE-ID@REVISION[:REVISION] would be sufficient
>>(assuming we have atomic renames that preserve node id's): Perhaps even
>>NODE-CHANGE-PK[:NODE-CHANGE-PK], which is semantically the same but a) avoids
>>one index lookup when pulling the MRCA or MRMR from the repository, but
>>b) makes history comparison harder (without making assumptions about the
>>structure of the NODE-CHANGE primary key).
>>    
>>
>
>Going from PATH to NODE-ID is indeed a minimal change to presented
>format in the proposal.  +1.  But, will that also work for out of repository
>merges though?
>  
>

You've already got a disambiguation mechanism for that:

    [REPOS-UUID::]NODE-ID@REVISION[:REVISION]
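
As a sketch only, a record in that format could be parsed like this.
The regular expression assumes that neither the UUID nor the node ID
contains ':' or '@', and that revisions are decimal numbers; the
MergeRecord type and parse_record name are invented for the example.

```python
import re
from typing import NamedTuple, Optional

# [REPOS-UUID::]NODE-ID@REVISION[:REVISION]
_RECORD = re.compile(
    r"^(?:(?P<uuid>[^:@]+)::)?"   # optional repository UUID
    r"(?P<node>[^:@]+)"           # node id
    r"@(?P<start>\d+)"            # first revision
    r"(?::(?P<end>\d+))?$"        # optional second revision
)


class MergeRecord(NamedTuple):
    repos_uuid: Optional[str]     # None means "this repository"
    node_id: str
    start_rev: int
    end_rev: Optional[int]


def parse_record(text):
    match = _RECORD.match(text)
    if match is None:
        raise ValueError(f"not a merge record: {text!r}")
    end = match.group("end")
    return MergeRecord(
        match.group("uuid"),
        match.group("node"),
        int(match.group("start")),
        int(end) if end is not None else None,
    )
```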

Oh, yes: one other thing to worry about here are dump/load cycles. If
you use node id's instead of paths, then the dumper has to convert them
to paths and the loader has to convert back, because node IDs aren't
preserved across a reload.

What happens to the repos UUIDs of out-of-repos merge sources when
those sources get reloaded is something I don't want to contemplate.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 12:25 AM

> Sander Striker wrote:
> 
>> Not doing tree deltas in the proposal is simply because the same
>> rules apply for them as for normal text files.  The merge history
>> can be recorded in the svn:merged-from property on the directories
>> the same way as we do for files.  Consider a directory to be a
>> textfile containing filenames ;).  [in practice we have real code
>> to deal with tree deltas, but the idea is pretty much the same]
> 
> With one caveat: taking tree changes into account will affect the way
> merge history is recorded (see my other post). If you record
> PATH@REVISION[:REVISION], you do get unique keys, but when you're
> comparing merge histories of M and L, you can't tell if they're related
> just by looking at the PATH part.

Ah, yes, of course.  Duh.

> I /think/ that recording NODE-ID@REVISION[:REVISION] would be sufficient
> (assuming we have atomic renames that preserve node id's): Perhaps even
> NODE-CHANGE-PK[:NODE-CHANGE-PK], which is semantically the same but a) avoids
> one index lookup when pulling the MRCA or MRMR from the repository, but
> b) makes history comparison harder (without making assumptions about the
> structure of the NODE-CHANGE primary key).

Going from PATH to NODE-ID is indeed a minimal change to presented
format in the proposal.  +1.  But, will that also work for out of repository
merges though?
 
> I wish Bill Tutt was here... :-(

/me too

Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>Not doing tree deltas in the proposal is simply because the same
>rules apply for them as for normal text files.  The merge history
>can be recorded in the svn:merged-from property on the directories
>the same way as we do for files.  Consider a directory to be a
>textfile containing filenames ;).  [in practice we have real code
>to deal with tree deltas, but the idea is pretty much the same]
>  
>

With one caveat: taking tree changes into account will affect the way
merge history is recorded (see my other post). If you record
PATH@REVISION[:REVISION], you do get unique keys, but when you're
comparing merge histories of M and L, you can't tell if they're related
just by looking at the PATH part.
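
A small illustration of the caveat, with made-up data: the same
versioned object has been renamed on a branch, so a comparison keyed on
PATH@REVISION sees two unrelated entries while one keyed on a node ID
does not.  This assumes renames preserve node IDs; the tuples and
helper names are hypothetical.

```python
# Hypothetical data: one versioned object, renamed on a branch.
# Each entry is (path, node_id, revision); the node ids are made up.
m_history = [("/trunk/foo.c", "n17", 10)]
l_history = [("/branches/b1/bar.c", "n17", 10)]


def related_by_path(a, b):
    """True if the histories share any PATH@REVISION key."""
    return bool({(p, r) for p, _, r in a} & {(p, r) for p, _, r in b})


def related_by_node(a, b):
    """True if the histories share any NODE-ID@REVISION key."""
    return bool({(n, r) for _, n, r in a} & {(n, r) for _, n, r in b})


by_path = related_by_path(m_history, l_history)
by_node = related_by_node(m_history, l_history)
```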

I /think/ that recording NODE-ID@REVISION[:REVISION] would be sufficient
(assuming we have atomic renames that preserve node id's): Perhaps even
NODE-CHANGE-PK[:NODE-CHANGE-PK], which is semantically the same but a) avoids
one index lookup when pulling the MRCA or MRMR from the repository, but
b) makes history comparison harder (without making assumptions about the
structure of the NODE-CHANGE primary key).

I wish Bill Tutt was here... :-(

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.

    > From: Branko Čibej <br...@xbc.nu>

    > Tom, maybe I'm dense or blind or something, but could you point
    > me at a design doc for what you're proposing?


Pretty much.  It depends on how much detail you are willing to dig
into.  I can give you design-revealing documentation and what would be
regarded from the svn perspective as a functioning prototype.

If you're serious, and interested in details, let me start by
suggesting you grab a copy of the arch source code from 

           http://regexps.srparish.net/src/arch/

because I'm going to point you to some docs in the source tree.
(That's also the "functioning prototype".)

Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Tom Lord wrote:

>That's part of my point: there's tons more to do just to specify your
>proposal or even to just finish working it out, while there's over a
>year's experience running an implementation of mine (albeit on a
>different storage manager -- and it's worth considering how much
>difference that storage manager difference makes (not much, imo)).
>

Tom, maybe I'm dense or blind or something, but could you point me at a
design doc for what you're proposing?


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


	Sander:

	> I just love quoting out of context...

Not my intent, obviously.

	> Seriously, I meant to keep the proposal simple.  

That's part of my point: there's tons more to do just to specify your
proposal or even to just finish working it out, while there's over a
year's experience running an implementation of mine (albeit on a
different storage manager -- and it's worth considering how much
difference that storage manager difference makes (not much, imo)).

To name just one example, your proposal relies heavily on variance
adjusted patching and, as much as you say it will be trivial to refine
the algorithm to eliminate any purported dangers, the fact remains
that it's all just untried theory at this point, with lots of design
and coding to do before it can even be experimented with in practice.


	> For cherry picking there is clearly a solution.  The means to
	> record the data for it is already in the proposal.  The same
	> goes for out of tree, or unrelated tree merging (although I
	> need to work that out I do have something on my desk for
	> that).

Yes, cherry picking can be solved and your meta-data looks (from quick
examination) rich enough to do the job.  That's not the point:
cherry-picking has already been implemented on the mechanisms I'm
advocating.   Useful areas of functionality have already been worked
out, implemented, deployed, and used.

But I don't mean to turn this into a fight about which of these two
mechanisms should be part of svn and which shouldn't.  As I said: they
do different (but related) things -- they are essentially
complementary.   

I only mean to point out there exists a potentially practical route to
get very useful history-sensitive merging into svn quickly, and that
taking that route has some interesting side effects such as doing a
lot of the grunt work needed to support distributed branching and
changeset-oriented auditing -- and that none of that requires deep
changes to svn, just (mostly) a new layer.

-t



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Tom Lord [mailto:lord@emf.net]
> Sent: Friday, April 11, 2003 12:06 AM

> On the subject of "faster, cheaper, better":
> 
> 	Sander:
> 
> 	> For the sake of this discussion I'm ignoring cherry picking
>         > for now.  
> 
> 	> [...]
> 
> 	> The goal of this proposal is to come up with a scheme that will
> 	> make the:
> 
> 	> svn merge URL .
> 
> 	> [...]
> 
> 	> For sake of simplicity I'm leaving the handling of tree
> 	> deltas out of this proposal.  
> 
> 	> [...] 
> 
> 	> Merging will rely heavily on variance adjusting.  

I just love quoting out of context...

Seriously, I meant to keep the proposal simple.  For cherry picking
there is clearly a solution.  The means to record the data for it
is already in the proposal.  The same goes for out of tree, or unrelated
tree merging (although I need to work that out I do have something
on my desk for that).

Not doing tree deltas in the proposal is simply because the same
rules apply for them as for normal text files.  The merge history
can be recorded in the svn:merged-from property on the directories
the same way as we do for files.  Consider a directory to be a
textfile containing filenames ;).  [in practice we have real code
to deal with tree deltas, but the idea is pretty much the same]


Sander


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.



On the subject of "faster, cheaper, better":

	Sander:

	> For the sake of this discussion I'm ignoring cherry picking
        > for now.  

	> [...]

	> The goal of this proposal is to come up with a scheme that will
	> make the:

	> svn merge URL .

	> [...]

	> For sake of simplicity I'm leaving the handling of tree
	> deltas out of this proposal.  

	> [...] 

	> Merging will rely heavily on variance adjusting.  


The patch-log merge history mechanism I'm advocating already supports
cherry picking nicely.  (In case I haven't been explicit enough:  I'm
advocating lifting the design for that mechanism straight out of arch,
treating the current arch version as a strong prototype, and doing the
necessary design-work/reimplementation to both unify it with svn and
implement in a form more portable to windows platforms.)

Rather than a single `merge' command that goes with that patch-log
mechanism, there are several different mechanisms, each useful in
different circumstances.  In practice, I've had the experience of
trying a merge two or three different ways and picking the one that
gives the results that need the least editing.  (In arch terminology,
`update', `replay', and `star-merge' are the big three -- with others
to follow later.)

Whole-tree merging in arch has nice solutions already worked out for
the handling of tree deltas.

There is no current implementation of variance adjusting built on top
of patch-logs, but all of the necessary history information is there.
It would be easy to add support for diff4-style merging to this
framework given either a suitable implementation of diff4, or an
implementation of some "patchutils" that perform variance adjustment
on patch sets.

I'm not trying to be smarmy or obsequious when I say that I can see
uses both for what Sander is proposing and what I'm proposing.   They
are potentially complementary features.  

However, my read on it is that what I'm proposing is much less work to
implement, with much less impact on the core of svn, and while it
loses out on "subtree merging" (or "partial merging"), it wins a lot
by giving quick access to a toolbox of merge strategies rather than a
single one, and opens the door quite a bit to distributed branching
and changeset auditing from the project mgt/business rule
perspective.

I realize my suggestion here (a) is problematic given my status (i.e.,
why I can't just go off and implement it at the moment); (b) risks the
_appearance_ of architectural garishness since it is in a rather
different "style" than existing svn functionality.   All I wish to say
is that both of those are solvable problems and should not be a reason
to dismiss the proposal out-of-hand.

-t


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Sander Striker [mailto:striker@apache.org]
> Sent: Thursday, April 10, 2003 9:17 PM

> Note that examples 2 and 3 contain some flaws.  Please
> disregard those until I repost.

Ok, fixed.

I also found out that I forgot about one rule in the
bookkeeping process:

If the MRCA of M and L is not the same path as M, then
a line needs to be added to svn:merge that contains the
MRCA of M and the HEAD of the path of the MRCA of M and L
(you might need to re-read that twice).

Example 3, M1 is a good example of this.


Note that in the proposal's examples I wrote that
MRCA = Trunk@X = Branch1@Y.  Forget about the Branch1@Y
part.  Although the two are textually equivalent, Branch1@Y
cannot be viewed as a common ancestor.  Sorry for any
confusion.


Sander

RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

Note that examples 2 and 3 contain some flaws.  Please
disregard those until I repost.

Sander



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Jack Repenning wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu] 
>>
>>If the file didn't change, you don't have to record any merge 
>>history. 
>>    
>>
>
>I think I disagree.  It might be that you performed the merge of the
>indicated revisions (of the whole repo), but rejected the changes to a
>particular file.  The fact that you rejected those changes is
>significant.  It needs to be remembered just as surely as the cases
>where you accept the changes, or the cases where you accept them with
>tweaks.  To fail to remember any of these decisions produces the same
>result: the next time you merge, the file differences are taken to be
>uncoordinated changes instead of conscious decisions.  At best, the tool
>bothers the human about the choice once more; possibly it undoes the
>choice.
>

As Sander and I said elsewhere, there is certainly a use case for
recording no-op merges. But doing so is not necessary to the merge
algorithm, and therefore shouldn't be required by default.

>At least, I think so.  But this is the sort of question I'm having
>trouble walking through the proposal, for lack of clarity on some of the
>basics.  Which is why your second response catches my eye:
>
>  
>
>>You can [record the merge on an unchanged file], of course, 
>>and it doesn't hurt, but it's not necessary to the proposed algorithm.
>>    
>>
>
>So, you're saying that it *is* possible to record a new value of a
>property against an unchanged version of a file?
>  
>

Of course it's possible; I never said otherwise. A propchange is a
change, after all, and can be committed even if the contents didn't change.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Jack Repenning <jr...@collab.net>.

> From: Branko Cibej [mailto:brane@xbc.nu] 
> 
> If the file didn't change, you don't have to record any merge 
> history. 

I think I disagree.  It might be that you performed the merge of the
indicated revisions (of the whole repo), but rejected the changes to a
particular file.  The fact that you rejected those changes is
significant.  It needs to be remembered just as surely as the cases
where you accept the changes, or the cases where you accept them with
tweaks.  To fail to remember any of these decisions produces the same
result: the next time you merge, the file differences are taken to be
uncoordinated changes instead of conscious decisions.  At best, the tool
bothers the human about the choice once more; possibly it undoes the
choice.

At least, I think so.  But this is the sort of question I'm having
trouble walking through the proposal, for lack of clarity on some of the
basics.  Which is why your second response catches my eye:

> You can [record the merge on an unchanged file], of course, 
> and it doesn't hurt, but it's not necessary to the proposed algorithm.

So, you're saying that it *is* possible to record a new value of a
property against an unchanged version of a file?



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Sander Striker wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Sent: Friday, April 11, 2003 8:47 AM
>>    
>>
>
>  
>
>>Jack Repenning wrote:
>>
>>    
>>
>>>This all depends on having svn:merged props to record the merge history.
>>>But is there always somewhere to park that property when you need it?
>>>Something here I'm not quite following, probably because I haven't fully
>>>grokked a more basic question ... So I'll ask that one:
>>>
>>>If you merge a URL into your WC, some files will change and others will
>>>not.  Can you attach a new svn:merged to all the files in the WC, even
>>>the unchanged ones, without losing any existing svn:merged?
>>>      
>>>
>>If the file didn't change, you don't have to record any merge history.
>>You can, of course, and it doesn't hurt, but it's not necessary to the
>>proposed algorithm.
>>    
>>
>
>Right.  For consistency I was thinking to always record the merge in
>svn:merged-from, even on unchanged files.  Then again, we do not
>need it and it is unnecessary overhead...
>
>There is a use case for recording svn:merged-from for files that haven't
>changed though.  We might want to tell merge we want to sync up to
>branch@Y  (MRMR = branch@X), but to ignore all changes.  This way you
>can avoid certain changes on 'branch', since next merge will be from
>branch@Y to branch@HEAD, effectively bypassing the changes in
>branch@X:Y.  And yes, we do need an extra switch to pass to svn merge
>for that.  Of course you can argue that this use case does have changed
>files, namely on branch (MRMR:L), instead of the working copy (M:WC).
>  
>

Yup. That's what ClearCase does with its option to "just draw a merge
arrow, don't perform a merge".
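
A sketch of that bookkeeping, under the assumption that merge history
is just a map from branch to most recently merged revision.  The
record_only flag stands in for the extra switch mentioned above; it
and the merge function are hypothetical, not an existing svn option.

```python
def merge(history, branch, upto, record_only=False):
    """Advance the MRMR for branch.

    Returns the (start, end) range whose changes were applied, or None
    when the merge was only recorded.
    """
    start = history.get(branch, 0)
    history[branch] = upto          # the bookkeeping happens either way
    return None if record_only else (start, upto)


history = {"branch": 100}                                   # MRMR = branch@X, X = 100
skipped = merge(history, "branch", 120, record_only=True)   # sync to Y = 120, apply nothing
applied = merge(history, "branch", 150)                     # next merge, HEAD = 150
```

The second call only considers branch@120:150, so the changes in
branch@100:120 are bypassed.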

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Brian Denny <br...@briandenny.net> writes:
> this feature was originally planned for post-1.0, right?

Right.

> if we make it optional/experimental now, does that mean that we are
> willing to release a 1.0 that has known bugs in this feature?

That would depend on the bugs, I guess.


Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Brian Denny wrote:

>>That's all.  Frankly, I'd be +1 on this going in right now, as long as
>>it starts out experimental (i.e., a flag to merge, that later can
>>become the default).
>>
>>    
>>
>
>this feature was originally planned for post-1.0, right?
>
>if we make it optional/experimental now, does that mean that we are
>willing to release a 1.0 that has known bugs in this feature?
>  
>

I'd say yes to that.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by Brian Denny <br...@briandenny.net>.

> 
> That's all.  Frankly, I'd be +1 on this going in right now, as long as
> it starts out experimental (i.e., a flag to merge, that later can
> become the default).
> 

this feature was originally planned for post-1.0, right?

if we make it optional/experimental now, does that mean that we are
willing to release a 1.0 that has known bugs in this feature?


-brian


Re: [PROPOSAL] Merging Improved

Posted by mark benedetto king <mb...@boredom.org>.

On Fri, Apr 11, 2003 at 11:28:14PM -0400, Greg Hudson wrote:
> 
> As far as I can tell, the only argument that holds up is space savings. 
> Either make a case that the amount of space saved is significant or
> don't do it.
> 

Well, space savings of this sort usually also translate to speed savings
because the row size decreases mean that fewer disk blocks are involved
in table operations and a greater percentage of the rows can be cached
in any bounded amount of RAM.

I don't know what the average record size is right now, but I suspect
that adding 36 bytes of UUID to it won't help matters.
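
For a feel of the numbers, a back-of-the-envelope sketch.  The record
count and the one-byte index width are assumptions picked for the
example; it says nothing about the actual row sizes in the repository.

```python
import uuid

# Canonical text form of a UUID, hyphens included.
UUID_TEXT_BYTES = len(str(uuid.uuid4()))


def extra_bytes(num_records, index_bytes=1):
    """Extra storage from inlining the UUID text instead of a short index."""
    return num_records * (UUID_TEXT_BYTES - index_bytes)


overhead = extra_bytes(1_000_000)   # one million merge records (made-up figure)
```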

--ben


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Greg Hudson [mailto:ghudson@MIT.EDU]
> Sent: Saturday, April 12, 2003 5:28 AM

> On Fri, 2003-04-11 at 19:11, Greg Stein wrote:
>> On Sat, Apr 12, 2003 at 12:10:49AM +0200, Sander Striker wrote:
>>> Ah, but it isn't to save space.  It is to prevent a rewrite of all
>>> svn:merged-from properties when a remote repository UUID changes.
>>> Using the indexed method the rewrite to the new UUID would be one row
>>> change in a table...
> 
> What is this concept of "remote repository UUID changing?"  That's not
> supposed to happen.

Sure it isn't supposed to happen.  But since the remote repository
UUID _could_ change (by admin intervention), it would be nice to be
able to fixup our merge history to point to the right data again.
Agreed that this will (hopefully) be an uncommon use case.

Sander



Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2003-04-11 at 19:11, Greg Stein wrote:
> On Sat, Apr 12, 2003 at 12:10:49AM +0200, Sander Striker wrote:
> > Ah, but it isn't to save space.  It is to prevent a rewrite of all
> > svn:merged-from properties when a remote repository UUID changes.
> > Using the indexed method the rewrite to the new UUID would be one row
> > change in a table...

What is this concept of "remote repository UUID changing?"  That's not
supposed to happen.

> Space savings is good. Rewrite is good. But also note that the table could
> include more than just the UUID. We could also include other metadata about
> the repository. "Canonical URL" or "Owner" or ...

None of which are necessary to handle merge semantics.

As far as I can tell, the only argument that holds up is space savings. 
Either make a case that the amount of space saved is significant or
don't do it.



Re: [PROPOSAL] Merging Improved

Posted by Greg Stein <gs...@lyra.org>.

On Sat, Apr 12, 2003 at 12:10:49AM +0200, Sander Striker wrote:
>...
> > Absolutely.  If there's some compelling reason to index them in the
> > property, then that's fine.  All I really meant to say is, let's not
> > optimize this just for the sake of space savings.
> 
> Ah, but it isn't to save space.  It is to prevent a rewrite of all
> svn:merged-from properties when a remote repository UUID changes.
> Using the indexed method the rewrite to the new UUID would be one row
> change in a table...

You bet.

Space savings is good. Rewrite is good. But also note that the table could
include more than just the UUID. We could also include other metadata about
the repository. "Canonical URL" or "Owner" or ...

This is what relational DBs are all about. Enable the relation, and then you
can grow into all kinds of cool stuff.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


       Sander: 

       > We have to merge the svn:merged-from properties.  This
       > specific property is going to be specially handled by the svn
       > merge code.  The generic diff/merge code won't be touching
       > it.

I understand that but it's still a (logical) diff/merge.  You
shouldn't be just combining the properties from the merged-from point
-- you have to compare the merged-from properties on the two sides of
the changes, look at their differences, then use those differences to
modify merged-from on the target tree.  Diff the properties on both
sides of the change -- merge that diff into the properties on the
target.
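
Roughly, in code: a sketch that models each merged-from property as a
set of entry strings and only handles additions (a real diff could
also carry removals).  The scenario is B merging C:c-d somewhere
inside B:a-b, which is then merged into A; the entry spellings and
function names are invented for the example.

```python
def merge_history_delta(before, after):
    """Entries added to the source's merged-from property across the change."""
    return after - before


def apply_delta(target, delta):
    """Merge the delta into the target's merged-from property."""
    return target | delta


# B's property on both sides of the range a-b; inside it, B merged C:c-d.
b_at_a = {"trunk@5"}
b_at_b = {"trunk@5", "C@c:d"}
a_props = {"trunk@9"}

delta = merge_history_delta(b_at_a, b_at_b)
a_after = apply_delta(a_props, delta | {"B@a:b"})
```

Simply combining A's property with B's would also have copied trunk@5
into A; applying only the difference does not.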

That's a nice convenient fall-out of merge-history-in-trees
vs. merge-history-in-properties.  In merge-history-in-trees, the
generic tree-delta algorithm does the right thing with merge histories
automatically.   So in merge-history-in-trees, it's a literal
diff/merge -- no additional code required.

-t



RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Tom Lord [mailto:lord@emf.net]
> Sent: Monday, April 14, 2003 2:26 AM
>     > From: "Sander Striker" <s....@striker.nl>
> 
>     > [me:]
>     >> [Where's transitive merge handling?!?!?]
> 
>     > Ah, you mean something like Example 2, M1?  Or M4 (which resembles
>     > your text above even more).
>     > 
>     > Sander
>     > 
>     > PS.  I'm referring to the examples in the pdf attached to this message:
>     >      http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35071
> 
> 
> Thanks.  Yup.
> 
> I didn't see it in the text so I was wondering if you were thinking
> about the need to diff and merge svn:merged* properties.....

We have to merge the svn:merged-from properties.  This specific property
is going to be specially handled by the svn merge code.  The generic
diff/merge code won't be touching it.

Sander



Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: "Sander Striker" <s....@striker.nl>

    > [me:]
    >> [Where's transitive merge handling?!?!?]

    > Ah, you mean something like Example 2, M1?  Or M4 (which resembles
    > your text above even more).
    > 
    > Sander
    > 
    > PS.  I'm referring to the examples in the pdf attached to this message:
    >      http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35071


Thanks.  Yup.

I didn't see it in the text so I was wondering if you were thinking
about the need to diff and merge svn:merged* properties.....

-t


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <s....@striker.nl>.

> From: Tom Lord [mailto:lord@emf.net]
> Sent: Sunday, April 13, 2003 1:30 AM

> 	me: 
> 
> 	>> 2) I don't think I've seen any solution here to the "transitive merge"
> 	>>    problem, though perhaps I missed it.
> 	>>
> 
> 	brane:
> 
> 	> If I understand correctly what you mean by transitive merge, then
> 	> recording the branch points on the way to the MRCA is exactly what we
> 	> need to solve the problem, isn't it?
> 
> It remains entirely possible, but I don't think so.  If you want to
> point me to a specific message to re-re-review, please do.  Sorry if
> I'm raising a false spectre.
> 
> I'm talking about the problem illustrated here:
> 
> Suppose that I have three branches, A, B, and C.   Lowercase letters
> here will be revision numbers.
> 
> I'm going to merge the changes B:a-b into A.   Somewhere in B:a-b, I
> merged C:c-d into B.
> 
> At the end of my merge into A, shouldn't A have merge history from C
> (specifically changes C:c-d)?

Ah, you mean something like Example 2, M1?  Or M4 (which resembles
your text above even more).

Sander

PS.  I'm referring to the examples in the pdf attached to this message:
     http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=35071


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


	me: 

	>> 2) I don't think I've seen any solution here to the "transitive merge"
	>>    problem, though perhaps I missed it.
	>>

	brane:

	> If I understand correctly what you mean by transitive merge, then
	> recording the branch points on the way to the MRCA is exactly what we
	> need to solve the problem, isn't it?

It remains entirely possible, but I don't think so.  If you want to
point me to a specific message to re-re-review, please do.  Sorry if
I'm raising a false spectre.

I'm talking about the problem illustrated here:

Suppose that I have three branches, A, B, and C.   Lowercase letters
here will be revision numbers.

I'm going to merge the changes B:a-b into A.   Somewhere in B:a-b, I
merged C:c-d into B.

At the end of my merge into A, shouldn't A have merge history from C
(specifically changes C:c-d)?
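
To make the question concrete, here is a minimal sketch in Python. It is
not Subversion code. It assumes a hypothetical model in which each branch
keeps a set of merge records, and in which merging a range from B into A
also copies the records that landed on B inside that range. The names
Branch, MergeRecord and merge are made up for illustration, and the
revision letters a-d are replaced by arbitrary integers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MergeRecord:
    source: str  # branch the changes came from
    start: int   # first revision of the merged range
    end: int     # last revision of the merged range
    at: int      # revision on the target branch where the merge landed


@dataclass
class Branch:
    name: str
    history: set = field(default_factory=set)

    def merge(self, source, start, end, at):
        """Record a merge of source:start-end, landing at revision `at`."""
        # Direct record for the range being merged.
        self.history.add(MergeRecord(source.name, start, end, at))
        # Transitive step (the assumption under discussion): inherit the
        # records that landed on the source inside the merged range.
        for rec in source.history:
            if start <= rec.at <= end:
                self.history.add(MergeRecord(rec.source, rec.start, rec.end, at))


a, b, c = Branch("A"), Branch("B"), Branch("C")
b.merge(c, start=3, end=4, at=12)    # C:c-d merged into B at B's revision 12
a.merge(b, start=10, end=15, at=20)  # B:a-b merged into A

sources = sorted((r.source, r.start, r.end) for r in a.history)
print(sources)
```

Under that assumption A ends up with a record for C:c-d as well as for
B:a-b. Whether the proposal implies this transitive step is exactly the
open question.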

Again, I apologize for wasting your time if the answer is just _right
there_ in some earlier message and I didn't recognize it as such.  If
nothing else, hopefully going through it in detail will be of value to
people playing along at home (that's been my experience on the arch
lists).



       >> but I'm not convinced.  I've mentioned some issues that come
       >> up in that regard earlier, but here's another: Let's suppose
       >> that the changes on the merged-from branch rename a
       >> file.  What does this correspond to in the merged-to
       >> revision?  In other words, how does one identify which
       >> corresponding file is to be renamed?

       > That's what the node ID is for. It uniquely identifies a
       > versioned object in the repository (or at least, it will once
       > the atomic rename fix is in).

       >> Note that, in the merged-to revision, the renamed file may
       >> have a different name and may have either a longer or
       >> shorter node id -- or even an unrelated node id.

       > Nope, node IDs are just unique integers, and they're supposed
       > to be unique for an object on all branches. Maybe you're
       > thinking of the old, CVS-like way we used to generate node
       > IDs; thankfully, we got rid of that along the way.

I was using the terminology in ./subversion/libsvn_fs/structure -- but
alas, a horribly (several entire months) out of date copy of that
file.  Yes, I was speaking in terms of the CVS-like ids that are no
longer there.

So let me rephrase in terms of a more recent copy of "structure".

It seems to me that I can easily wind up with a "logical project tree"
(a subtree of the repository, corresponding to a "single source tree")
in which I have multiple files:

	having the same node_id
	having different copy_id
	having various paths (of course)

None of those three datums (node_id, copy_id, or path) reliably
corresponds to the programmer's notion of "logical identity for the
purpose of whole-tree merging".  The node_id is ambiguous for that
purpose and sometimes irrelevant.  The copy_id varies between branches
with modified copies.  The paths (and you state elsewhere that you
agree) are quite orthogonal.

So I'm suggesting that there is a separate "logical file identity",
not currently reflected in the svn schema, which is essential to
tree-delta support.  A file identity that users should be able to
explicitly manipulate, independently of node histories and
relationships.

Put differently, I think that logical file identity is more often
something like:

	project_id.inventory_tag

than:

	node_id

except that project_id and inventory_tag are missing from svn.
(Not that I'm saying those ids should wind up in your db schema -- I
don't think they should, actually -- just trying to "translate" the
concepts into more familiar notation.)

(Incidentally, when I copy a tree for a branch or tag, essentially
only the directory-node at the root of the copy is copied at that
time, right?   That's why branch/tag is an O(1) operation.  Now if I
check out from the new branch, modify a contained file, and then
commit -- am I correct in assuming that the modified file is "lazily"
copied at the time of that commit?   I think the answer is "Duh, yes,
of course" but I'm asking just as a kind of checksum on my
understanding here.)

 
	> The path is indeed not enough to identify either node or
	> branch; but the node id identifies the node, and the copy id
	> identifies the branch; conversions between path+revision and
	> node-id+copy-id are trivial, although they do involve
	> queries to the server.

But there is nothing to prevent my creating a "branch" within a single 
source tree -- and this would seem to interact with the proposed merge
algorithm badly.   Calls to `svn merge' whose scope is smaller than
the source tree would also seem to raise problems (as when files are
renamed to or from outside that scope).

Such an intra-tree branch could be quite natural and valuable -- as a
way to keep track of the history and relationships between distinct
yet related files in a source tree.   But at the same time, they both
(a) muck up `svn merge' as proposed;  (b) are unprevented and
unpreventable by svn's "ontology", as far as I can see.


          >> 4) No consideration seems to have been given to auditing merges at the
          >>   project level.
          >>

          > Do you mean reviewing the results of a merge before they're
          > committed, or something else?

That kind of review, yes, but much more besides.

Let's suppose, for example, that I have a business rule in my software
processes like "add feature X" or "fix issue Y".   Those rules refer
to project trees -- complete source trees -- not individual files.

I'd like to be able to say "feature X is added by (project tree)
revision A" and "issue Y is fixed by (project tree) revision B".  I'd
like to be able to ask: "which branches have feature X or fix Y?"  I'd
like to be able to say "Oh, X or Y are fixed on such and such branch
-- has that been merged, yet, into such and such other branch?"

The per-node merge-history mechanism seems to make computing answers
to such questions rather expensive.

-t


Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Tom Lord wrote:

>Rereading this thread, I note three design issues, and one big
>mistake:
>
>1) (The big mistake) I said "node" where I should have said "node
>   revision".  Oops.
>

That happens to the best of us. :-)

>2) I don't think I've seen any solution here to the "transitive merge"
>   problem, though perhaps I missed it.
>

If I understand correctly what you mean by transitive merge, then
recording the branch points on the way to the MRCA is exactly what we
need to solve the problem, isn't it?

>3) The question of tree deltas was dismissed with:
>
>	Sander: 
>	> For sake of simplicity I'm leaving the handling of tree deltas
>	> out of this proposal.  This being resemblant to textual
>	> merging to a degree.
>
>   but I'm not convinced.  I've mentioned some issues that come up in
>   that regard earlier, but here's another: Let's suppose that
>   the changes on the merged-from branch rename a file.  What does
>   this correspond to in the merged-to revision?  In other words, how
>   does one identify which corresponding file is to be renamed?
>

That's what the node ID is for. It uniquely identifies a versioned
object in the repository (or at least, it will once the atomic rename
fix is in).

>  Note
>   that, in the merged-to revision, the renamed file may have a
>   different name and may have either a longer or shorter node id --
>   or even an unrelated node id.
>

Nope, node IDs are just unique integers, and they're supposed to be
unique for an object on all branches. Maybe you're thinking of the old,
CVS-like way we used to generate node IDs; thankfully, we got rid of
that along the way.

>  Worse, moreover, there may be
>   logically _different_ files in the merged-to revision that have the
>   same path, or a node id that is a "better match" than that of the
>   correct file for the node id in the changes.  In other words,
>   neither node id nor path is a good indicator for file identity for
>   the purpose of renames and other elements of tree deltas.
>

The path is indeed not enough to identify either node or branch; but the
node id identifies the node, and the copy id identifies the branch;
conversions between path+revision and node-id+copy-id are trivial,
although they do involve queries to the server.

>4) No consideration seems to have been given to auditing merges at the
>   project level.
>

Do you mean reviewing the results of a merge before they're committed,
or something else?

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


Rereading this thread, I note three design issues, and one big
mistake:

1) (The big mistake) I said "node" where I should have said "node
   revision".  Oops.

2) I don't think I've seen any solution here to the "transitive merge"
   problem, though perhaps I missed it.

3) The question of tree deltas was dismissed with:

	Sander: 
	> For sake of simplicity I'm leaving the handling of tree deltas
	> out of this proposal.  This being resemblant to textual
	> merging to a degree.

   but I'm not convinced.  I've mentioned some issues that come up in
   that regard earlier, but here's another: Let's suppose that
   the changes on the merged-from branch rename a file.  What does
   this correspond to in the merged-to revision?  In other words, how
   does one identify which corresponding file is to be renamed?  Note
   that, in the merged-to revision, the renamed file may have a
   different name and may have either a longer or shorter node id --
   or even an unrelated node id.  Worse, moreover, there may be
   logically _different_ files in the merged-to revision that have the
   same path, or a node id that is a "better match" than that of the
   correct file for the node id in the changes.  In other words,
   neither node id nor path is a good indicator for file identity for
   the purpose of renames and other elements of tree deltas.

4) No consideration seems to have been given to auditing merges at the
   project level.


-t


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> Sent: Friday, April 11, 2003 10:49 PM

[...]
>> No, but clarity is. If we're talking human-readable, imagine trying to
>> compare 40 chars of UUID vs. 1 or 2 chars of index.
> 
> Humans are good at ignoring anything that's fixed width.  Some humans
> will learn to recognize certain repository signatures, if we don't get
> in their way...
> 
>>>      The more self-contained and repository-independent this
>>>      information is, the less trouble we'll have down the road.  The
>>>      points about dump/load cycles are just one of many problems that
>>>      could result if we don't store the information in its most
>>>      basic, canonical form, imho.
>> 
>> Perhaps. And "premature optimization is the root of all evil," etc. But
>> I'd like to understand the issues better before deciding one way or another.
> 
> Absolutely.  If there's some compelling reason to index them in the
> property, then that's fine.  All I really meant to say is, let's not
> optimize this just for the sake of space savings.

Ah, but it isn't to save space.  It is to prevent a rewrite of all
svn:merged-from properties when a remote repository UUID changes.
Using the indexed method the rewrite to the new UUID would be one row
change in a table...
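
A rough sketch of that trade-off, in Python. This is not Subversion's
actual schema: the uuids table, the tuple layout of the property values,
the paths and the UUID strings are all invented for illustration.

```python
# Hypothetical per-repository table mapping a small index to a repository UUID.
uuids = {1: "aaaaaaaa-0000-0000-0000-000000000001"}

# Hypothetical svn:merged-from values that store the index, not the UUID.
indexed_props = {
    "/trunk/foo.c": [(1, "10-15", "/branches/b/foo.c")],
    "/trunk/bar.c": [(1, "10-15", "/branches/b/bar.c")],
}

# The same records with the UUID stored inline in every property.
inline_props = {
    path: [(uuids[idx], revs, src) for idx, revs, src in records]
    for path, records in indexed_props.items()
}


def rename_uuid_indexed(table, old, new):
    """Change a UUID in the index table; return the number of rows touched."""
    touched = 0
    for idx, value in table.items():
        if value == old:
            table[idx] = new
            touched += 1
    return touched


def rename_uuid_inline(props, old, new):
    """Rewrite every property that mentions the UUID; return properties touched."""
    touched = 0
    for path, records in props.items():
        if any(uuid == old for uuid, _, _ in records):
            props[path] = [(new if u == old else u, r, s) for u, r, s in records]
            touched += 1
    return touched


old = uuids[1]
new = "bbbbbbbb-0000-0000-0000-000000000002"
rows = rename_uuid_indexed(uuids, old, new)
rewritten = rename_uuid_inline(inline_props, old, new)
print(rows, rewritten)
```

With the index, the UUID change touches one table row and leaves the
properties alone; with inline UUIDs, every property naming that repository
has to be rewritten.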

Sander



Re: [PROPOSAL] Merging Improved

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Branko Čibej <br...@xbc.nu> writes:
> I'd agree with this, although note that revision numbers aren't fixed
> width, so your argument about columns doesn't hold. But getting rid of
> the "@" is a good thing.

Oh, I didn't mean that they were fixed width, merely that the
alignment would be better this way (on average) than with paths in the
middle.

> I don't entirely agree with this. First of all, you can't determine
> branch relatedness from looking at the path (because of moves and
> renames); you /can/ -- or will be able to once the fs schema changes are
> in -- from the node change key. This might be important for performance
> in large merges.
> 
> I'd really have to read through Bill Tutt's latest description of
> copy/branch semantics and copy ID generation again. Brrrr.

Include both, then:  [UUID]:REV[-REV]:[NODECHANGE-KEY]:PATH

But let's not lose that path; people can use it.
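
As an illustration of why the path-last layout is easy to parse, here is a
small Python sketch. It is only a sketch under stated assumptions: the
MergeSource type and parse_merge_source function are hypothetical names,
and it assumes that neither the UUID nor the node change key contains a
colon.

```python
from typing import NamedTuple, Optional


class MergeSource(NamedTuple):
    uuid: Optional[str]
    first_rev: int
    last_rev: int
    node_change_key: Optional[str]
    path: str


def parse_merge_source(text: str) -> MergeSource:
    """Parse '[UUID]:REV[-REV]:[NODECHANGE-KEY]:PATH'."""
    # Split on the first three colons only, so the path may contain ':' or '@'.
    parts = text.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected four colon-separated fields: {text!r}")
    uuid, revs, key, path = parts
    # A single revision is treated as a range of length one.
    first, sep, last = revs.partition("-")
    first_rev = int(first)
    last_rev = int(last) if sep else first_rev
    # Empty optional fields become None.
    return MergeSource(uuid or None, first_rev, last_rev, key or None, path)


local = parse_merge_source(":10-15::/branches/b/foo@bar.c")
remote = parse_merge_source("some-uuid:7:key1:/trunk/a:b.c")
print(local)
print(remote)
```

Because the path is the last field, a fixed number of splits from the left
is enough, whatever characters the path contains.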

> >      (And if this was the only purpose of that table, that in itself
> >      is no argument for using indices in the property.)
> >
> 
> No, but clarity is. If we're talking human-readable, imagine trying to
> compare 40 chars of UUID vs. 1 or 2 chars of index.

Humans are good at ignoring anything that's fixed width.  Some humans
will learn to recognize certain repository signatures, if we don't get
in their way...

> >      The more self-contained and repository-independent this
> >      information is, the less trouble we'll have down the road.  The
> >      points about dump/load cycles are just one of many problems that
> >      could result if we don't store the information in its most
> >      basic, canonical form, imho.
> 
> Perhaps. And "premature optimization is the root of all evil," etc. But
> I'd like to understand the issues better before deciding one way or another.

Absolutely.  If there's some compelling reason to index them in the
property, then that's fine.  All I really meant to say is, let's not
optimize this just for the sake of space savings.

> As long as bugs in this feature don't hold up 1.0, as Brian said.

+1

-K


Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Karl Fogel wrote:

>Sander,
>
>I like the proposal!  A few comments -- I've read the whole thread up
>to this point, so some of this is really in response to subsequent
>discussion:
>
>   1. Instead of [UUID::]PATH@REV[:REV], how about [UUID]:REV[-REV]:PATH ?
>      That way, we keep all the fixed-format stuff up front, and the
>      most variant component (the path, which might even contain "@")
>      always comes after the second colon.  Easier to parse, and also
>      probably easier to read when lined up in a column.  You may
>      laugh, but such things make a difference :-).
>

I'd agree with this, although note that revision numbers aren't fixed
width, so your argument about columns doesn't hold. But getting rid of
the "@" is a good thing.

>      Using a hyphen to indicate revision ranges seems more readable,
>      and (in this format) would be easier to parse anyway.
>

Oh hah. I'll start putting together a thesaurus of the various colours,
shades and patterns we've painted our bikesheds in on this project. :-)

>   2. Please let's not add levels of indirection :-).  No matter how
>      smart we try to be, sometimes humans will have to read, and
>      perhaps even edit, these properties.  So let's use paths, not
>      node-ids; and use real UUIDS -- even though they're longer --
>      rather than indices into some particular repository's `uuids'
>      table.
>

I don't entirely agree with this. First of all, you can't determine
branch relatedness from looking at the path (because of moves and
renames); you /can/ -- or will be able to once the fs schema changes are
in -- from the node change key. This might be important for performance
in large merges.

I'd really have to read through Bill Tutt's latest description of
copy/branch semantics and copy ID generation again. Brrrr.

>      (And if this was the only purpose of that table, that in itself
>      is no argument for using indices in the property.)
>

No, but clarity is. If we're talking human-readable, imagine trying to
compare 40 chars of UUID vs. 1 or 2 chars of index.

>      The more self-contained and repository-independent this
>      information is, the less trouble we'll have down the road.  The
>      points about dump/load cycles are just one of many problems that
>      could result if we don't store the information in its most
>      basic, canonical form, imho.
>

Perhaps. And "premature optimization is the root of all evil," etc. But
I'd like to understand the issues better before deciding one way or another.

>   3. Regarding the question of whether to store the property on
>      files/dirs unaffected by the merge, my instinct is not to.
>

I agree, but...

>      To get a feel for this, I tried reasoning about an extreme case:
>      you do a merge and absolutely no files are affected.  In that
>      case, it would be equally correct to store no properties, or to
>      store a property on every file indicating that the merge was
>      done.  But in the latter case, we'd really be recording the mere
>      fact that someone had typed 'svn merge'.  The merge would have
>      had no real-world consequence at all, and yet all these
>      properties would change.  This seems inconsistent with other
>      Subversion operations, where if they do nothing, then no change
>      is recorded in the repository.  Also, it would use up space for
>      no reason :-).
>

... recording a no-op merge still has valid use cases. I'm not saying we
should do that by default, but we should allow it.

>That's all.  Frankly, I'd be +1 on this going in right now, as long as
>it starts out experimental (i.e., a flag to merge, that later can
>become the default).
>

As long as bugs in this feature don't hold up 1.0, as Brian said.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sat, 2003-04-12 at 22:08, Branko Čibej wrote:
> Imagine the following: I have branches A and B, where B was created from
> A. Now I change some files on A, and some on B. Every change on B causes
> a lazy copy of the changed files and their parent directories. Then I
> merge A to B, changing some files, but leaving most unchanged. Now, if I
> record the merge source on the unchanged files to satisfy b), then I
> have to lazily copy *all* the files to the branch;

Shouldn't we only have to do this for files which changed on A since the
branch was made?  That doesn't seem too expensive.



Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@hermes.si>.

Tom Lord wrote:

>    > From: Branko Čibej <br...@xbc.nu>
>
>    > ... whole-tree merge history would be recorded in exactly the
>    > same way as file merge history -- in svn:merged properties on
>    > directories.
>
>Yet, under Sander's proposal (making reasonable assumptions about how
>he fleshes out tree-deltas), there is _no_ directory in a project tree
>that reliably records the merge history of that project tree.  To
>compute the merge history for the tree, it's necessary to either: (a)
>ensure, by usage pattern, the existence of a node that is reliably
>modified in every revision, on every branch, and included in every
>merge -- then rely on the history of that file to be representative of
>the tree;
>

Any directory can serve as such a node. Even when they're not explicitly
modified by a commit, every commit creates a new version of the
directories that contain the changed files, all the way to the root,
because of the "bubble-up" rule. These directories will be included in
every merge.
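
A minimal Python sketch of the bubble-up rule as described here, assuming
absolute repository paths. The function name bubble_up and the example
paths are made up; the real filesystem layer works on node revisions, not
on path strings.

```python
import posixpath


def bubble_up(changed_paths):
    """Return every directory that gets a new version when the given paths change."""
    touched = set()
    for path in changed_paths:
        parent = posixpath.dirname(path)
        # Walk from the changed node's parent directory up to the root.
        while True:
            touched.add(parent)
            if parent in ("/", ""):
                break
            parent = posixpath.dirname(parent)
    return touched


dirs = bubble_up(["/project/trunk/src/main.c", "/project/trunk/README"])
print(sorted(dirs))
```

Every ancestor directory of a changed file, including the root of the
project tree, shows up in the result, which is why any of them can serve
as the node that is modified in every commit.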

>    > No, it's the other way around: Subversion's design is being treated as
>    > an implementation constraint. Sander's proposal may not have mentioned
>    > this explicitly, but getting reasonable behaviour on whole-tree merges
>    > is *exactly* what it's aiming at. The assumption is that the merge
>    > algorithm, as described for files, can be logically expanded to apply to
>    > tree merges (rearrangements). Up till now, I haven't seen any evidence
>    > to the contrary.
>
>I see at least three problems.
>
>One is the practical "project tree history" problem mentioned above.
>

If a project tree is nothing but a collection of branches (directories),
then there is no problem.

>Another is that, while you elsewhere assert that the node_id is the
>logical identity of a file within a project tree, (a) there is nothing
>to prevent my winding up with multiple files in a project tree that
>share a common node_id (and, indeed, plausible scenarios where this
>would occur quite by accident);
>

Sure, two files in a tree can have the same node id. So what? That just
means they're related; in fact, that they are different branches of the
same file (they'll always have different branch IDs). That's something
you have to know; after all, a tree merge is composed of node merges.

>(b) you've now added the constraint
>that users must explicitly manage node_id's with logical file identity
>in mind -- which means that the easiest way to make some change just
>editing the tree is not going to resemble the correct way to make that
>change using svn.
>

I don't see it that way. There's no reason at all to expose the node IDs
to users in any way. Managing node IDs is the FS layer's job. Users
only have to think in terms of files and directories, nothing else.

>Finally: elsewhere you talk about the functional core of the svn data
>model being unlikely to change anytime soon.
>

IIRC, I said that about the usage model. The two are connected, but the
usage model drives that data model, IMHO.

> Well, the more that you
>make dependent on that model, the fewer degrees of freedom there are
>to _ever_ change it.  It's the classic design question of "monolith"
>vs. "software tools".  There's no good reason to make merge history a
>part of the monolith when it can, in a practical manner, be made
>orthogonal.
>

I've said before that, as far as I can see, the representations are
functionally identical. I've also said that storing the history in the
tree is better from the point of view of Subversion's usage model. You
haven't even tried to show me why your way is better (except with such
arguments as the above, and the "layer on top of svn" approach). So, I
don't buy it.

>    > All I can say here is, again, that Sander's proposed mechanism
>    > does exactly what you want on the tree level, it just doesn't
>    > have shape and colour that you'd like.
>
>Except that in saying that, you would be inaccurate.
>

You keep saying I'm wrong, yet you don't say how or why. You keep
throwing around phrases like "project tree history", yet you don't say
what a project tree is (I've been making assumptions about that; i.e.,
that a "project tree" is any subtree that is by convention treated as a
whole).

Another thing strikes me: On the one hand, you complain that the
proposed data representation is too monolithic; on the other hand, you
want to impose a constraint (project trees) on the data model that would
make it less flexible.

Indeed, Subversion's data model relies heavily on conventions:
Organizing the tree into projects, branches, etc. -- and consequently,
merging rules -- are all a matter of convention. That's not a bad thing
at all -- in fact, I consider that Subversion's strongest feature.

>    >> Alas, in saying that, I suppose that I'm in some sense speaking
>    >> more through you to collabnet than to you directly.
>
>    > Tom, you're backsliding again. :-) Let's leave CollabNet's
>    > commercial interests out of this.
>
>Excuse me?
>
>First, I don't find that funny and so I don't understand your ":-)".
>
>Second, are you saying that "CollabNet's commercial interests" do not
>have a significant impact on those core developers who are employed by
>svn or that they do not, in turn, have significant impact on the plans
>for and design of svn?
>

This is a discussion about the design and data representation for a
better merge algorithm. So let's talk about design, and leave politics
out of it. Please.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: [PROPOSAL] Merging Improved

Posted by Daniel Berlin <db...@dberlin.org>.


On Sun, 13 Apr 2003, Tom Lord wrote:

>
>
>     > From: Greg Hudson <gh...@MIT.EDU>
>
>     > On Sun, 2003-04-13 at 20:23, Tom Lord wrote:
>
>     >> Yet, under Sander's proposal (making reasonable assumptions about how
>     >> he fleshes out tree-deltas), there is _no_ directory in a project tree
>     >> that reliably records the merge history of that project tree.
>
>     > Wouldn't the root of the project tree qualify, since changes to
>     > files bubble up through parent directories to the root?  (This
>     > of course depends on how tree deltas are fleshed out.)
>
> Well, help me out here.  If merge records "bubble up" that way, then
> the merge records for the shallow-depth subdirectories of / in the fs
> namespace will grow to be huge.  And if they don't bubble up that way,
> then indeed, there is _no_ directory in a project tree that reliably
> records the merge history of that project tree.
>
>
>
>     >> Second, are you saying that "CollabNet's commercial interests"
>     >> do not have a significant impact on those core developers who
>     >> are employed by svn or that they do not, in turn, have
>     >> significant impact on the plans for and design of svn?
>
>     > Given how many times CollabNet has made it clear that it does
>     > not want to determine the design of Subversion or even determine
>     > who has commit access to the project, I don't think you'll
>     > have much success going over the developers' heads to them.
>
>
> Wow.  What paranoid, f'd up language, "going over ... heads".
>
> From what I've heard:
>
> (a) core developers employed by svn have a lot of clout in the svn
>     community;  they have a lot of influence over design decisions
>     and project planning.
>
>
> (b) the degrees of freedom of those core developers is strongly
>     influenced by the business plans of their employer regarding
>     svn.   After all, a substantial % of their time on svn is
>     in service of their employer.
>
> In such circumstances, in all walks of life, civil society makes the
> judgement that "There is a conflict of interest there."  We _never_
> burden people embraced in such a conflict with the responsibility to
> separate out concerns (a) and (b).   We _always_ assume that even the
> most reasonable, well intentioned persons can not separate out such
> interests when they collide in a single mind.

Not quite.
In law, at least, we allow it in quite a few cases without anything
further required, and in almost all cases if the clients consent
after consultation.

I'm not sure whether this puts lawyers in the category of unreasonable,
non-well-intentioned, or not having a mind, or some combination of the
three.
Or of course, you could just be plain wrong.

Have you actually looked at the codes of ethical conduct adhered to by
other licensed practitioner fields (e.g. psychiatry, etc.)?
Or are you just pulling it out of your ass?
Just curious, not implying either way.
--Dan



Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: Greg Hudson <gh...@MIT.EDU>

    > On Sun, 2003-04-13 at 20:23, Tom Lord wrote:

    >> Yet, under Sander's proposal (making reasonable assumptions about how
    >> he fleshes out tree-deltas), there is _no_ directory in a project tree
    >> that reliably records the merge history of that project tree.

    > Wouldn't the root of the project tree qualify, since changes to
    > files bubble up through parent directories to the root?  (This
    > of course depends on how tree deltas are fleshed out.)

Well, help me out here.  If merge records "bubble up" that way, then
the merge records for the shallow-depth subdirectories of / in the fs
namespace will grow to be huge.  And if they don't bubble up that way,
then indeed, there is _no_ directory in a project tree that reliably
records the merge history of that project tree.



    >> Second, are you saying that "CollabNet's commercial interests"
    >> do not have a significant impact on those core developers who
    >> are employed by svn or that they do not, in turn, have
    >> significant impact on the plans for and design of svn?

    > Given how many times CollabNet has made it clear that it does
    > not want to determine the design of Subversion or even determine
    > who has commit access to the project, I don't think you'll
    > have much success going over the developers' heads to them.


Wow.  What paranoid, f'd up language, "going over ... heads".

Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2003-04-13 at 20:23, Tom Lord wrote:
> Yet, under Sander's proposal (making reasonable assumptions about how
> he fleshes out tree-deltas), there is _no_ directory in a project tree
> that reliably records the merge history of that project tree.

Wouldn't the root of the project tree qualify, since changes to files
bubble up through parent directories to the root?  (This of course
depends on how tree deltas are fleshed out.)

> Second, are you saying that "CollabNet's commercial interests" do not
> have a significant impact on those core developers who are employed by
> svn or that they do not, in turn, have significant impact on the plans
> for and design of svn?

Given how many times CollabNet has made it clear that it does not want
to determine the design of Subversion or even determine who has commit
access to the project, I don't think you'll have much success going over
the developers' heads to them.


Re: [PROPOSAL] Merging Improved

Posted by Greg Stein <gs...@lyra.org>.

On Sun, Apr 13, 2003 at 05:23:51PM -0700, Tom Lord wrote:
>     > From: Branko Čibej <br...@xbc.nu>
>...
>     >> Alas, in saying that, I suppose that I'm in some sense speaking
>     >> more through you to collabnet than to you directly.
> 
>     > Tom, you're backsliding again. :-) Let's leave CollabNet's
>     > commercial interests out of this.
> 
> Excuse me?
> 
> First, I don't find that funny and so I don't understand your ":-)".

I think Brane means to work through the design issues and leave out the
commercial interests. There isn't any reason to bring commercial issues to
the table when you're doing design work.

> Second, are you saying that "CollabNet's commercial interests" do not
> have a significant impact on those core developers who are employed by
> svn or that they do not, in turn, have significant impact on the plans
> for and design of svn?

Yes and yes. But the impact on the plans/design is granted by the community
rather than enforced/required by CollabNet. There is a large difference
there, and one that I'm happy about.

> On the contrary:  CollabNet fosters a public svn community presumably
> because they believe there to be a mutually beneficial alignment of
> interests between the goals of serving CollabNet's markets, and
> serving the interests of the public.

Yup, among many other things.

>...
> Consequently, ruling CollabNet's business interests to be a
> taboo topic is extraordinarily inappropriate.]

It isn't taboo. I think he's just trying to say that it doesn't have a place
in technical design discussions.

Talk about users' needs, sure. That is great, and is essential. But there
isn't much need to worry about CollabNet's users specifically. (although I
do appreciate the consideration :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: Branko Čibej <br...@xbc.nu>

    > Again, true in essence, but what we're discussing here is "the merge
    > algorithm" that Sander proposed. :-)

Well, two related threads have bled together here.  The question posed
to me was "How do we represent merge history" and in my reply, I
anticipated Sander's (then) upcoming proposal, but also pointed out an
alternative.

    > ... whole-tree merge history would be recorded in exactly the
    > same way as file merge history -- in svn:merged properties on
    > directories.

Yet, under Sander's proposal (making reasonable assumptions about how
he fleshes out tree-deltas), there is _no_ directory in a project tree
that reliably records the merge history of that project tree.  To
compute the merge history for the tree, it's necessary to either: (a)
ensure, by usage pattern, the existence of a node that is reliably
modified in every revision, on every branch, and included in every
merge -- then rely on the history of that file to be representative of
the tree; or (b) crawl over all the nodes of the tree and all the
changes that have been merged into those nodes and perform a fairly
complicated computation on the data yielded by that crawl.
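
Option (b) can be pictured with a minimal Python sketch. Everything in
it is an assumption made for illustration: the Node class, and a
per-node "merged" set standing in for an svn:merged property whose
format this thread does not specify. A real computation would be more
involved than a plain union; the point is only that the root node by
itself need not carry the answer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical versioned node (file or directory)."""
    name: str
    # Stand-in for a per-node svn:merged property (format assumed).
    merged: frozenset = frozenset()
    children: list = field(default_factory=list)


def crawl(node):
    """Yield the node and every node below it, depth first."""
    yield node
    for child in node.children:
        yield from crawl(child)


def tree_merge_history(root):
    """Union of the merge sources recorded anywhere in the tree."""
    history = set()
    for node in crawl(root):
        history |= node.merged
    return history


# Two files were touched by different merges; the root recorded nothing.
tree = Node("trunk", children=[
    Node("a.c", merged=frozenset({"branch@10"})),
    Node("b.c", merged=frozenset({"branch@12"})),
])
history = tree_merge_history(tree)
```

Here tree.merged is empty even though two merges reached the tree, so
only the crawl recovers both sources.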

    > No, it's the other way around: Subversion's design is being treated as
    > an implementation constraint. Sander's proposal may not have mentioned
    > this explicitly, but getting reasonable behaviour on whole-tree merges
    > is *exactly* what it's aiming at. The assumption is that the merge
    > algorithm, as described for files, can be logically expanded to apply to
    > tree merges (rearrangements). Up till now, I haven't seen any evidence
    > to the contrary.

I see at least three problems.

One is the practical "project tree history" problem mentioned above.

Another is that, while you elsewhere assert that the node_id is the
logical identity of a file within a project tree, (a) there is nothing
to prevent my winding up with multiple files in a project tree that
share a common node_id (and, indeed, plausible scenarios where this
would occur quite by accident); (b) you've now added the constraint
that users must explicitly manage node_id's with logical file identity
in mind -- which means that the easiest way to make some change, just
editing the tree, is not going to resemble the correct way to make that
change using svn.

Finally: elsewhere you talk about the functional core of the svn data
model being unlikely to change anytime soon.  Well, the more that you
make dependent on that model, the fewer degrees of freedom there are
to _ever_ change it.  It's the classic design question of "monolith"
vs. "software tools".  There's no good reason to make merge history a
part of the monolith when it can, in a practical manner, be made
orthogonal.


    > All I can say here is, again, that Sander's proposed mechanism
    > does exactly what you want on the tree level, it just doesn't
    > have shape and colour that you'd like.

Except that in saying that, you would be inaccurate.


    >> Alas, in saying that, I suppose that I'm in some sense speaking
    >> more through you to collabnet than to you directly.

    > Tom, you're backsliding again. :-) Let's leave CollabNet's
    > commercial interests out of this.

Excuse me?

First, I don't find that funny and so I don't understand your ":-)".

Second, are you saying that "CollabNet's commercial interests" do not
have a significant impact on those core developers who are employed by
svn or that they do not, in turn, have significant impact on the plans
for and design of svn?

On the contrary:  CollabNet fosters a public svn community presumably
because they believe there to be a mutually beneficial alignment of
interests between the goals of serving CollabNet's markets, and
serving the interests of the public.   On the basis of that presumed
shared interest, CollabNet is of generous intent towards the community
-- and I think it makes sense to return the favor.  [The alternative,
cynical interpretation is one I have explicitly avoided here -- mostly
because I don't _think_ it reflects anyone's _intentions_, though I'd
caution that vigilance against effects that are most consistent with
the cynical interpretation is the constant obligation of everyone
involved.  Consequently, ruling CollabNet's business interests to be a
taboo topic is extraordinarily inappropriate.]

I am hardly "well connected" in the world of consumers for revision
control systems, but I have had some interesting communications in
that market, and some interesting mentorship in the area of commercial
source mgt. generally.   I feel that I have at least non-0
qualification to speak up, for the benefit of all parties, not least
CollabNet, when I see a design process that is teetering on the brink
of gah-gah land.

-t

Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Tom Lord wrote:

>    > From: Branko Čibej <br...@xbc.nu>
>
>    > There are two issues here: a) what the merge algorithm needs to work
>    > correctly, and b) what the *user* wants to know about merge history. And
>    > probably c) what's optimal.
>
>    > If a file did not change during a merge, then we don't have to record
>    > the merge source to satisfy a). We *might* want to record the merge
>    > source to satisfy b). And c) is a tricky issue, given how Subversion's
>    > storage model works.
>
>Ok, that's not a bad way to look at it.   Let me try to build on that.
>
>First, let's write that a-b-c list a little differently, putting users
>first:
>
>x) What users want from merging.
>
>y) What can be implemented, practically, in the context of svn.
>
>z) What's the best way to satisfy the constraints (x) and (y).
>

O.K., I can live with that. Your points broaden this discussion slightly
-- after all, we started off with Sander's proposal for fixing the
repeated-merge problem on a file-by-file basis -- but that's not a bad
thing, really. :-)

>    > Imagine the following: [....]
>    > [conclusion: recording merge histories on noderev by noderev
>    >  basis is too expensive unless only directly affected files
>    >  record that history.]
>
>Right.  And the interesting things about looking at the relationship
>between "whole project tree" merges and their relationship to business
>rules and project management is that:
>
>1) Whole project tree merging is a good fit for reasonable business
>   rules/project mgt.   People don't talk about "the GCC patch to
>   files so and so" -- they talk about "the [implicitly: whole tree]
>   patch to GCC that adds feature X or fixes bug Y".
>

I have to agree with Daniel Berlin here: People do talk about both
features and individual files.

>2) Whole project tree merging doesn't need noderev by noderev merge
>   history.  It keeps one history record for all the files in a 
>   project tree.  It doesn't even require storing that history in
>   properties or other repository meta-data -- it has been
>   demonstrated that it can reasonably be stored as plain old source
>   files -- an add-on to the project source tree.
>

That's true in essence, but so what? There is no semantic difference
between storing the merge history in per-object, per-revision properties
and storing it separately. However, which mechanism we choose depends
very much on Subversion's working model (central repository, local
[unversioned] sandbox). However nice it is to discuss this in the
abstract, we have to take into account the fact that this model isn't
likely to change anytime soon.

>I don't think there reasonably is a "the merge algorithm".
>

Again, true in essence, but what we're discussing here is "the merge
algorithm" that Sander proposed. :-)

>  I think
>there are many merge algorithms, and that the best design strategy is
>to support a big toolbox of many of them.  Broadly, the algorithms
>can be classified into "node by node" and "whole tree"
>

Here, I cannot agree. Nodes are not just files, they're also directories
and thus (implicitly) whole trees. Sander ignored that on purpose in his
original description of his merge algorithm, but he didn't forget about
it. So, ...

> -- and I think
>that from the business rule/project mgt perspective, "whole tree" is
>the priority.  How convenient, then, that whole tree merging and merge
>histories can be implemented without any changes to the db schema at
>all and most likely, with less work and less destabilization.
>

... whole-tree merge history would be recorded in exactly the same way
as file merge history -- in svn:merged properties on directories.

>It looks to me like the svn core developers are "stuck" with the false
>assumption that smart, useful merging is necessarily node by node
>

It is, in Subversion -- because a node is a directory is a branch is a tree.

>and necessarily records history in noderev properties.
>

It does, in Subversion -- because of the usage model, as I explained above.

>Those implementation ideas are being treated as design constraints.
>

No, it's the other way around: Subversion's design is being treated as
an implementation constraint. Sander's proposal may not have mentioned
this explicitly, but getting reasonable behaviour on whole-tree merges
is *exactly* what it's aiming at. The assumption is that the merge
algorithm, as described for files, can be logically expanded to apply to
tree merges (rearrangements). Up till now, I haven't seen any evidence
to the contrary.

>They seem to stem from the underlying "project-less" structure of a svn
>filesystem -- a structure that is already directly contradicted by the
>recommended usage patterns.
>

Ah, this reminds me of that question somebody posted a few days ago
about why Subversion doesn't implement "real" branches. :-)

The answer is the same: The Subversion filesystem is free-form, but that
does *not* mean it's "project-less". Its structure does not directly
contradict the recommended usage patterns, because it does not *have* an
inherent structure. You can impose any structure you like on it.

(Apropos "recommended usage patterns" -- recommended by whom, and for
what purpose? I get the distinct feeling that, on this particular point,
/you're/ the one who isn't looking at the issue from a broad enough
perspective. I can't quite believe that's the case, but there you have it.)

>I'm saying: forget about those false constraints.  Consider
>introducing a layer _over_ svn to define (a) project trees as first
>class objects;
>

Not relevant to this discussion.

>(b) logical file identities within project trees (not
>strictly related to node ids);
>

A node ID is *the* logical object (file or directory!!) identity. It
uniquely identifies the versionable object within the repository,
throughout its history and on all branches. I cannot think of a more
general mechanism for identifying nodes.

>(c) in-tree patch logs as a way to record merge history.
>

This is *exactly* what we've been discussing on this thread all along!
The representation is different from what you have in mind, but the
semantics are exactly the same, as I pointed out before.

>On top of those concepts, which require 0
>changes to the db schema, and 0 use of properties, you can inherit
>"for free" the toolbox of merge operators from arch -- plus a bunch of
>other functionality besides.
>

O.K., this is, again, not relevant to this discussion (it's a
consequence of your point (a), above). We cannot make Subversion
dependent on some abstract "layer on top". On the other hand, the
feature Sander is proposing does not hinder your plans to use Subversion
as an arch back-end in any way.

(Heh, and note -- that plan involves asserting your own -- more
specific, less flexible -- structure on the Subversion filesystem, which
corroborates my statement that SVN's filesystem design does not impose
any specific structure or usage pattern. :-)

>To be sure, Sander's mechanisms look useful to me as a programmer
>convenience -- as a way to manipulate individual files _within_
>project trees.   I pretty much wouldn't care if they ever applied to
>tree-deltas -- I think their usefulness mostly pertains to individual
>text files.
>
>But the whole tree mechanisms look to me easier to implement, less
>destabilizing, valuable for more than just merging, and more closely
>aligned to the business rules/project mgt patterns in which fancy
>merging is of the greatest interest.
>

All I can say here is, again, that Sander's proposed mechanism does
exactly what you want on the tree level, it just doesn't have shape and
colour that you'd like.

>Alas, in saying that, I suppose
>that I'm in some sense speaking more through you to collabnet than to
>you directly.
>  
>

Tom, you're backsliding again. :-) Let's leave CollabNet's commercial
interests out of this.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2003-04-14 at 01:36, Tom Lord wrote:
> I'm thinking of an FS namespace containing N project trees, and then N
> "lists of changesets" (aka write transaction journals) -- one per
> project tree.   Each list is totally ordered -- but without additional
> structure, the set of changesets is not totally ordered.

Ah, right, of course.  It's simple either without projects or without
global revision numbers, but complicated with both.

> Indeed.  The idea of making a journal of commit records the definitive
> history of the repository and treating everything else as lazy
> caching/memoizing/indexing can speed up/make more space-efficient not
> just native-fs, but other storage managers as well.

You seem to have conceded a different point than I stated.  The
app-level changeset journal idea isn't just orthogonal to the choice of
storage manager; it's orthogonal to the idea of project subdirectories.

>     >> But on what should we key those caches, indexes, and memos?  The
>     >> project-tree boundaries, because of the tractable size of the trees
>     >> they contain and their relationship to the atomicity of commits, are
>     >> ideal.
> 
>     > Why?
> 
> Tractable data set sizes;  isolation of (nearly?) all txns.

So, the idea of a well-constructed index is that you can search it or
update it quickly even if the underlying data set grows large.

> For svn-like performance, I was thinking more of roughly a single
> full-text, but with indexed changesets and cached skip-delta
> changesets.

It seems like by the time you're done with this cache, it would look
very much like our current filesystem structure.  Certainly, I don't
think it would be any simpler for the introduction of project
directories.

>     > Eliminating the revision number by making it effectively part of
>     > the path gains elegance in some areas, but loses it in others.
> 
> I don't see any losses in what you've described so far.

Right now the filesystem namespace is uniform and owned completely by
the user.  (The trunk/branches/tags convention is just a suggestion to
the user; there are no plans to make any Subversion software assume or
enforce the convention.)  I consider that elegant.  What you're
suggesting makes the top one or two levels of the namespace owned by the
implementation, and introduces the notion of an implicit symlink for the
head revision of a project directory.

On consideration, I don't think you gain any elegance in the UI by
eliminating revision numbers, because you still need to be able to, say,
check out by date, or diff against the previous revision of a file.  So
it wouldn't be a simple matter of eliminating the -r option to every
command which currently takes it.

> This distribution thing you keep mentioning:

(I mentioned it twice, once in response to your mentioning it and once
in summary.)

> 1) Gee, you know, actively planning _against_ that seems
>    short-sighted.

Not really.  I don't see Subversion evolving in that direction.  I see
better cross-repository support as a much better direction to look
towards than introducing the concept of partitioning within
repositories.  And even if I'm wrong, a central point which does nothing
but assign revision numbers can scale to a really really high level of
throughput; you'd have to have millions of servers before that central
point would become a bottleneck.

> 2) It's a red herring.  The same properties that help multi-server 
>    implementations help get good performance out of single server
>    implementations that are simpler and have fewer dependencies on
>    3rd party packages.

I have yet to be convinced that it helps; that's what we've been arguing
about above.

Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


Disordered replies to:

    > From: Greg Hudson <gh...@MIT.EDU>


    > (Why would you need a separate list to assign the repository rev
    > numbers?  Presumably the list of changesets has an order, and
    > that could correspond to the repository revisions.)


I'm thinking of an FS namespace containing N project trees, and then N
"lists of changesets" (aka write transaction journals) -- one per
project tree.   Each list is totally ordered -- but without additional
structure, the set of changesets is not totally ordered.

You need that additional structure only to support the semantic of a
repository revision number --- which is part of why I think the idea
of a repository revision number is a gaffe.
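
A minimal Python sketch of that structure, under stated assumptions:
the Repository class and its method names are hypothetical and come
from neither svn nor arch. Each project tree gets its own append-only
journal, which is totally ordered. The extra global_log list exists
only to hand out repository-wide revision numbers.

```python
from collections import defaultdict


class Repository:
    """Hypothetical model: one changeset journal per project tree."""

    def __init__(self):
        # project tree -> ordered list of changesets (total order per tree)
        self.journals = defaultdict(list)
        # The "additional structure": needed only for repository revs.
        self.global_log = []

    def commit(self, project, changeset):
        """Append a changeset; return (patch_level, repository_rev)."""
        journal = self.journals[project]
        journal.append(changeset)
        patch_level = len(journal)
        self.global_log.append((project, patch_level))
        return patch_level, len(self.global_log)


repo = Repository()
first = repo.commit("project-a", "fix typo")      # (1, 1)
second = repo.commit("project-b", "add feature")  # (1, 2)
third = repo.commit("project-a", "refactor")      # (2, 3)
```

Without global_log, nothing says whether project-b's first changeset
came before or after project-a's second one; the two journals are only
partially ordered with respect to each other.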

    > [in several places: that helps throughput with multiple servers,
    >  but not otherwise.   Multiple servers is a relatively
    >  unimportant case for us.]

A project tree partitioning provides a simple-to-implement, O(1) way
to separate concurrent commits to, for example, different branches or
different projects within the same repo.  As such, it gives an easy
mechanism to achieve reasonable throughput in even a single-server
native-fs implementation.  The thing you lose in a native-fs
implementation compared to BDB or RDBMS is the deadlock detection
built-in to those db mechanisms -- project-tree transaction
granularity means that you don't need that mechanism so much.
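
As a rough illustration of project-tree transaction granularity, the
sketch below gives each project tree its own lock, found by a single
dictionary lookup. The ProjectCommitter class is hypothetical and
in-memory only; a native-fs implementation would need durable storage
and crash recovery, which this sketch does not attempt.

```python
import threading


class ProjectCommitter:
    """Hypothetical: serialize commits per project tree, not globally."""

    def __init__(self):
        self._table_lock = threading.Lock()  # protects the table itself
        self._projects = {}                  # project root -> (lock, journal)

    def _slot(self, project):
        # O(1) lookup of the lock and journal that belong to one project.
        with self._table_lock:
            if project not in self._projects:
                self._projects[project] = (threading.Lock(), [])
            return self._projects[project]

    def commit(self, project, changeset):
        lock, journal = self._slot(project)
        with lock:  # commits to other project trees are not blocked here
            journal.append(changeset)
            return len(journal)

    def journal(self, project):
        lock, journal = self._slot(project)
        with lock:
            return list(journal)
```

Because a commit takes only one project's lock, commits to different
trees cannot deadlock against each other in this model. A commit that
spans project trees would need more care, for example taking locks in
a fixed order.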

Now, to be sure, project trees and, gasp, planning to do away with
repository revision numbers do indeed make the model more easily
scalable.   It'd be a foresighted move to make such plans.   But those
aren't the focus of what I'm talking about.



    >> The second observation is that a commit consists of generating
    >> a changeset client side, sending it to the server, checking for
    >> up-to-dateness, and assigning a repository revision number.  An
    >> application-level log of such txns, suitable to ensure ACID
    >> properties, is essentially just a per-project-tree list of
    >> those changesets -- a data structure that's fairly easy to
    >> implement on a native-fs -- plus another list to assign the
    >> repository rev numbers.

    > That's a useful observation which could help us implement more
    > efficient journaling than BDB gives us, as you've discussed in
    > the past, but there's no reason we couldn't do that without
    > project directories.

Indeed.  The idea of making a journal of commit records the definitive
history of the repository and treating everything else as lazy
caching/memoizing/indexing can speed up/make more space-efficient not
just native-fs, but other storage managers as well.



    >> The third observation is that the various performance characteristics
    >> we want can be built on-top of that basic lists-of-changesets
    >> structure by caching and memoization of data about various revs.

    > I don't think you can get the theoretical performance curve of
    > skip-deltas simply by wrapping a cache around a changeset journal.

Um.... why not exactly?   A cache of skip-deltas....



    >> But on what should we key those caches, indexes, and memos?  The
    >> project-tree boundaries, because of the tractable size of the trees
    >> they contain and their relationship to the atomicity of commits, are
    >> ideal.

    > Why?

Tractable data set sizes;  isolation of (nearly?) all txns.


    > > Is that too brief?

    > If you're suggesting keeping a fulltext cache of every N revisions of
    > the repository, with hard-links between identical revs of files, 

Not specifically, no.  That's one technique to throw into
consideration.

For svn-like performance, I was thinking more of roughly a single
full-text, but with indexed changesets and cached skip-delta
changesets.  (Arch revision libraries contain not just full-text, but
also the changesets and some rudimentary indexes -- the plan here (for
arch) is to beef up that indexing a bit and rely less on full-texts --
to treat the evolved rev library as more like a cache and less like a
memo).


    > then
    > you're probably not aiming for the same performance characteristics as I
    > am.  With the repository structure we have now, you can have millions of
    > revs of a file and can get to any of them by combining a double-digit
    > number of deltas and applying the result to the one plaintext stored in
    > the repository.

No -- that's very much what I'm aiming for.


    > I also disagree that units of atomicity are always of tractable size. 
    > gcc and Linux and Mozilla all require units of atomicity which are
    > pretty damn big, assuming you can split them up at all.  And if you do
    > split them up, you'll probably want atomic commits across the units with
    > some frequency.

My opinion strongly differs here -- and this is an area where having
someone spend 2-6 weeks studying the issue objectively is the right
thing.


    >> It would have been much wiser, a few years back, to
    >> implement commits in terms of tree-copies, not fs revision numbers.

    > I guess the basic idea here is that the repository would only
    > serve the head revision, 

The head _repository revision_, yes -- the head _project tree
revision_, no.  Of course you'd have access to all project tree
revisions, since they'd all be present in the head repository revision
(though they wouldn't all have equal access performance, of course).


    > you'd commit by copying the trunk (or
    > project) directory and modifying it, and an update is like a
    > switch.  

Pretty much, yeah.

    > But that's not a complete vision: how does "svn update"
    > know what to switch to?  What URL would correspond to "the head
    > of the trunk of the Subversion project" when the path of the
    > head changes with each commit?  

A path that didn't include a patch-level (aka project revision number)
would refer to the same path plus the highest numbered patch-level for
that path at the time of the start of the txn.


    > What restrictions does the
    > repository enforce to prevent history from disappearing from the
    > space of what clients can access?

First class project trees.

Maybe this will help:  the portions of the fs namespace that include
project tree revision numbers would have write-once semantics -- the
portions that don't would be sort of like (changing) symbolic links to
that write-once portion.
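
A small Python sketch of those semantics, with hypothetical names
throughout (WriteOnceNamespace, commit, resolve): revisions of a
project tree are only ever appended, and a path given without a
patch-level resolves to the highest patch-level that exists at lookup
time.

```python
class WriteOnceNamespace:
    """Hypothetical model of write-once project tree revisions."""

    def __init__(self):
        # path -> list of trees; index i holds patch-level i + 1
        self._revisions = {}

    def commit(self, path, tree):
        """Store a new revision; existing ones are never overwritten."""
        levels = self._revisions.setdefault(path, [])
        levels.append(tree)
        return len(levels)

    def resolve(self, path, patch_level=None):
        """Return (patch_level, tree); no patch_level means the head."""
        levels = self._revisions[path]
        if patch_level is None:
            patch_level = len(levels)  # acts like a changing symlink
        if not 1 <= patch_level <= len(levels):
            raise KeyError((path, patch_level))
        return patch_level, levels[patch_level - 1]


ns = WriteOnceNamespace()
ns.commit("svn/trunk", {"README": "v1"})
ns.commit("svn/trunk", {"README": "v2"})
head = ns.resolve("svn/trunk")     # follows the head: patch-level 2
old = ns.resolve("svn/trunk", 1)   # history stays reachable
```

A transaction would call resolve once at its start and keep the
returned patch-level, so that later commits do not move its base.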



    > Eliminating the revision number by making it effectively part of
    > the path gains elegance in some areas, but loses it in others.

I don't see any losses in what you've described so far.

    > And the only objective gain I've seen you describe has to do
    > with the theoretical maximum commit throughput of a repository
    > distributed across many servers with different servers taking
    > synchronization responsibility for different parts of the
    > namespace.  That's just not a compelling argument.

This distribution thing you keep mentioning:

1) Gee, you know, actively planning _against_ that seems
   short-sighted.

2) It's a red herring.  The same properties that help multi-server 
   implementations help get good performance out of single server
   implementations that are simpler and have fewer dependencies on
   3rd party packages.

-t

Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2003-04-13 at 22:09, Tom Lord wrote:
> Well, for write transactions, the first observation is that distinct
> project subtrees within the namespace can be handled by
> non-communicating servers or server threads/processes

They could, but that only affects maximum throughput; it doesn't help to
make the design simpler than what we have now, since we don't support
hosting different parts of a repository's namespace on different
servers.

> The second observation is that a commit consists of generating a
> changeset client side, sending it to the server, checking for
> up-to-dateness, and assigning a repository revision number.   An
> application-level log of such txns, suitable to ensure ACID
> properties, is essentially just a per-project-tree list of those
> changesets -- a data structure that's fairly easy to implement on a
> native-fs -- plus another list to assign the repository rev numbers.

(Why would you need a separate list to assign the repository rev
numbers?  Presumably the list of changesets has an order, and that could
correspond to the repository revisions.)

That's a useful observation which could help us implement more efficient
journaling than BDB gives us, as you've discussed in the past, but
there's no reason we couldn't do that without project directories.

> The third observation is that the various performance characteristics
> we want can be built on-top of that basic lists-of-changesets
> structure by caching and memoization of data about various revs.

I don't think you can get the theoretical performance curve of
skip-deltas simply by wrapping a cache around a changeset journal.

> But on what should we key those caches, indexes, and memos?  The
> project-tree boundaries, because of the tractable size of the trees
> they contain and their relationship to the atomicity of commits, are
> ideal.

Why?

> Is that too brief?

If you're suggesting keeping a fulltext cache of every N revisions of
the repository, with hard-links between identical revs of files, then
you're probably not aiming for the same performance characteristics as I
am.  With the repository structure we have now, you can have millions of
revs of a file and can get to any of them by combining a double-digit
number of deltas and applying the result to the one plaintext stored in
the repository.
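
The arithmetic behind "a double-digit number of deltas" can be checked
with a short Python sketch. The base-selection rule used here, clearing
the lowest set bit of a file's revision count, is an assumption chosen
for illustration; this message does not spell out the exact rule. With
that rule, the number of deltas to combine is the number of 1 bits in
the count, so it grows only logarithmically.

```python
def skip_delta_base(rev):
    """Assumed rule: the delta base is rev with its lowest set bit cleared."""
    return rev & (rev - 1)


def delta_chain(rev):
    """Revisions whose deltas must be combined to rebuild rev from rev 0."""
    chain = []
    while rev > 0:
        chain.append(rev)
        rev = skip_delta_base(rev)
    return chain


small = delta_chain(6)                   # [6, 4]
million = len(delta_chain(1_000_000))    # 7 deltas
worst = len(delta_chain(2**20 - 1))      # 20 deltas
```

Under this assumed rule, no revision below 2**20 (about a million)
needs more than 20 deltas, which matches the order of magnitude
described above.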

I also disagree that units of atomicity are always of tractable size. 
gcc and Linux and Mozilla all require units of atomicity which are
pretty damn big, assuming you can split them up at all.  And if you do
split them up, you'll probably want atomic commits across the units with
some frequency.

> It would have been much wiser, a few years back, to
> implement commits in terms of tree-copies, not fs revision numbers.

I guess the basic idea here is that the repository would only serve the
head revision, you'd commit by copying the trunk (or project) directory
and modifying it, and an update is like a switch.  But that's not a
complete vision: how does "svn update" know what to switch to?  What URL
would correspond to "the head of the trunk of the Subversion project"
when the path of the head changes with each commit?  What restrictions
does the repository enforce to prevent history from disappearing from
the space of what clients can access?

Eliminating the revision number by making it effectively part of the
path gains elegance in some areas, but loses it in others.  And the only
objective gain I've seen you describe has to do with the theoretical
maximum commit throughput of a repository distributed across many
servers with different servers taking synchronization responsibility for
different parts of the namespace.  That's just not a compelling
argument.

Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.

    > From: Greg Hudson <gh...@MIT.EDU>

    > On Sun, 2003-04-13 at 12:29, Tom Lord wrote:
    >> And, hmm.... just as an aside.  I woke up this morning realizing that
    >> if you had first class project trees in svn, suddenly a fast,
    >> space-efficient, native-fs storage manager would be a lot more
    >> practical (i.e., no need for BDB or an RDBMS).  Not trivial -- but
    >> tractable.

    > Explain?

Heh -- I like that question.

Well, briefly:

User-defined project boundaries give you an excellent hint for the
scope of atomic commits (so good, in fact, that if you tweaked the
semantics to say atomic commits could not have larger scope it
wouldn't be a big loss -- but that's not absolutely necessary).

That commit scope both (a) does a pretty good job of trivially
partitioning concurrent commits into non-interfering subsets; (b)
partitions the potentially huge fs namespace into tractably-sized
subtrees.

So now how does that translate into a simple native-fs storage manager
that satisfies the performance characteristics we ideally expect from a
svn server?   

Well, for write transactions, the first observation is that distinct
project subtrees within the namespace can be handled by
non-communicating servers or server threads/processes -- communication
is only necessary for those rare commits that span project-tree
boundaries (if, in fact, you want to support those).

The second observation is that a commit consists of generating a
changeset client side, sending it to the server, checking for
up-to-dateness, and assigning a repository revision number.   An
application-level log of such txns, suitable to ensure ACID
properties, is essentially just a per-project-tree list of those
changesets -- a data structure that's fairly easy to implement on a
native-fs -- plus another list to assign the repository rev numbers.
(Here, you'd be better off without repository rev numbers -- letting
project trees rev asynchronously -- but either way, you're talking about
ACID management of write-only lists that just grow: pretty easy on a
native fs).   (The arch native repository format is pretty close to
this as it stands -- though it lacks the global repository rev.)

But a "list of changesets" doesn't give you the access pattern
performance characteristics you want, so:

The third observation is that the various performance characteristics
we want can be built on-top of that basic lists-of-changesets
structure by caching and memoization of data about various revs.  For
example, we'll want some supplementary fs data structures that
(approximately) cache head revisions or that build various indexes.
But on what should we key those caches, indexes, and memos?  The
project-tree boundaries, because of the tractable size of the trees
they contain and their relationship to the atomicity of commits, are
ideal.  (This is a little different from the BDB server which keys
whole-text head revisions on node-ids, rather than paths -- but I
think path-based access is what dominates the performance
expectation.)  (Here I'd again point to arch -- specifically to the
"revision library" mechanism -- not because it's exactly what you'd
want in a svn server, but because it's close enough that I suspect you
can do the in-betweening yourself.)
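A minimal sketch of the third observation, assuming each project tree's
history is an in-memory list of changesets mapping path to new content
(None meaning "deleted").  `HeadCache` and its fields are hypothetical
names, and this is not how arch's revision library is actually laid out:

```python
class HeadCache:
    """Memoize whole-tree head revisions, keyed on the project tree."""

    def __init__(self, changesets_by_tree):
        self.changesets = changesets_by_tree  # tree -> list of {path: content}
        self.cache = {}                       # tree -> (seq, {path: content})
        self.replayed = 0                     # changesets applied so far

    def head(self, tree):
        log = self.changesets[tree]
        seq, files = self.cache.get(tree, (0, {}))
        files = dict(files)
        # Replay only the changesets committed since the cached revision.
        for changeset in log[seq:]:
            for path, content in changeset.items():
                if content is None:
                    files.pop(path, None)
                else:
                    files[path] = content
            self.replayed += 1
        self.cache[tree] = (len(log), files)
        return dict(files)


history = {"gcc": [{"a.c": "1"}, {"a.c": "2", "b.c": "x"}, {"b.c": None}]}
cache = HeadCache(history)
first = cache.head("gcc")
second = cache.head("gcc")      # served from the cache, nothing replayed
history["gcc"].append({"c.c": "y"})
third = cache.head("gcc")       # replays only the one new changeset
```

Because the key is the project tree, a commit to one tree never
invalidates the cached head of another.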

Is that too brief?

Now let me add that, absent explicit project trees -- you could fake
them.  You could make up a heuristic like "all third-level directories
count as project trees" -- but I think you'll eventually run into
problems with heuristics like that and that explicit project trees are,
by far, the cleaner solution.
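A minimal sketch of that kind of heuristic, assuming absolute
repository paths; the function names and the example layout are
hypothetical.  It also shows how a commit that spans project-tree
boundaries would be detected:

```python
def project_tree_of(path, depth=3):
    """Return the depth-level ancestor directory treated as the project tree."""
    parts = [p for p in path.split("/") if p]
    # A path at or above the boundary is not inside any project tree.
    if len(parts) <= depth:
        return None
    return "/" + "/".join(parts[:depth])


def trees_touched(paths, depth=3):
    """Set of project trees a commit touches; more than one means it spans trees."""
    return {project_tree_of(p, depth) for p in paths}


single = trees_touched(["/gnu/gcc/trunk/gcc/foo.c",
                        "/gnu/gcc/trunk/gcc/bar.c"])
spanning = trees_touched(["/gnu/gcc/trunk/gcc/foo.c",
                          "/gnu/binutils/trunk/ld/ld.c"])
```

The weak spot is visible right away: anything that does not follow the
assumed layout ends up in the wrong tree or in no tree at all.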

And I'll also add that repository revision numbers are lame and y'all
should be thinking in terms of getting rid of them (at least at the UI
level).  You have a transactional fs with cheap cloning and that's all
you need -- you don't need to serialize changes to unrelated sections
of the fs.  It would have been much wiser, a few years back, to
implement commits in terms of tree-copies, not fs revision numbers.
And it's never too late to plan a migration towards that....  The pivot
point is the recommended usage patterns decorated with labels that say
"this part is going to change."

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PROPOSAL] Merging Improved

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2003-04-13 at 12:29, Tom Lord wrote:
> And, hmm.... just as an aside.  I woke up this morning realizing that
> if you had first class project trees in svn, suddenly a fast,
> space-efficient, native-fs storage manager would be a lot more
> practical (i.e., no need for BDB or an RDBMS).  Not trivial -- but
> tractable.

Explain?



Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: Daniel Berlin <db...@dberlin.org>

    > > Right.  And the interesting things about looking at the relationship
    > > between "whole project tree" merges and their relationship to business
    > > rules and project management is that:
    > >
    > > 1) Whole project tree merging is a good fit for reasonable business
    > >    rules/project mgt.   People don't talk about "the GCC patch to
    > >    files so and so" -- they talk about "the [implicitly: whole tree]
    > >    patch to GCC that adds feature X or fixes bug Y".

    > Say what?
    > This is a bad example.
    > People talk about patches to files so and so all the time in GCC.   
    > Unless you are changing something about the actual intermediate 
    > representation in GCC (i.e., reducing the size of the tree structures by 
    > moving members around, etc), or adding macros to *all* targets (or all 
    > configs of a target), generally, you don't touch more than maybe a few 
    > files.
    > That would be madness in terms of trying to find bugs if most patches 
    > didn't touch < 5 files.
    > Pick a different example.

I think you just misunderstood me.

When I say "whole tree patch" I don't mean "a patch that touches all
or most files" -- I mean the same kind of patch you're talking about:
a logically unitary change, large or small enough to suit the
particular process for which it's used (e.g., small for _most_ patches
that will be reviewed or applied to mainline or a development branch;
large for patches that sum up all the changes between two releases).

By saying "whole tree", I mean: (1) That there is a subtree of the
overall project which is reasonably regarded as a "project tree" --
the whole project being a union of a few project trees.  (2) Those
project trees are the granularity of development lines.  For example,
if I'm hacking GCC, 99/100 times I will check out entire project trees
rather than just the 2-3 files I'm going to change because I'm going
to build and test the project tree.  If I form a development branch or
release branch, I'll branch with project-tree granularity.  (3)
Project trees are the primary granularity of processes like review
policies and patch flow and release management -- you can understand
patch flow into the mainline as patch flow organized with project tree
granularity.  (On that last point, yes there may be special-case
rules for certain files within some project trees -- but those are
just a special case of special rules based on the nature of a patch:
merge history and patch flow is still organized at the project tree
level.)

Looking at GCC from the maintainer or project manager perspective, or
the perspective of a 3rd party distributor of GCC, or the perspective
of a GCC consumer: I don't ask "Have those instruction scheduler
changes been merged into gcc/foo.c?"  I ask, "Have those changes been
merged into branch so and so or release such and such;  Has feature
XYZZY appeared on the mainline yet?"

Yes, patches are generally small and programmers pay attention,
especially when reviewing, to which files are touched -- but the
administrative handling of patches, _including_the_merge_history_, is
at the "whole project tree" level.

(It's a separate question whether GCC is most reasonably regarded as
one huge or a few smaller project trees.)

And, hmm.... just as an aside.  I woke up this morning realizing that
if you had first class project trees in svn, suddenly a fast,
space-efficient, native-fs storage manager would be a lot more
practical (i.e., no need for BDB or an RDBMS).  Not trivial -- but
tractable.

-t


Re: [PROPOSAL] Merging Improved

Posted by Daniel Berlin <db...@dberlin.org>.

> 
> Right.  And the interesting things about looking at the relationship
> between "whole project tree" merges and their relationship to business
> rules and project management is that:
>
> 1) Whole project tree merging is a good fit for reasonable business
>    rules/project mgt.   People don't talk about "the GCC patch to
>    files so and so" -- they talk about "the [implicitly: whole tree]
>    patch to GCC that adds feature X or fixes bug Y".

Say what?
This is a bad example.
People talk about patches to files so and so all the time in GCC.   
Unless you are changing something about the actual intermediate 
representation in GCC (i.e., reducing the size of the tree structures by 
moving members around, etc), or adding macros to *all* targets (or all 
configs of a target), generally, you don't touch more than maybe a few 
files.
That would be madness in terms of trying to find bugs if most patches 
didn't touch < 5 files.
Pick a different example.



Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: Branko Čibej <br...@xbc.nu>

    > There are two issues here: a) what the merge algorithm needs to work
    > correctly, and b) what the *user* wants to know about merge history. And
    > probably c) what's optimal.

    > If a file did not change during a merge, then we don't have to record
    > the merge source to satisfy a). We *might* want to record the merge
    > source to satisfy b). And c) is a tricky issue, given how Subversion's
    > storage model works.

Ok, that's not a bad way to look at it.   Let me try to build on that.

First, let's write that a-b-c list a little differently, putting users
first:

x) What users want from merging.

y) What can be implemented, practically, in the context of svn.

z) What's the best way to satisfy the constraints (x) and (y).


    > Imagine the following: [....]
    > [conclusion: recording merge histories on noderev by noderev
    >  basis is too expensive unless only directly affected files
    >  record that history.]

Right.  And the interesting things about looking at the relationship
between "whole project tree" merges and their relationship to business
rules and project management is that:

1) Whole project tree merging is a good fit for reasonable business
   rules/project mgt.   People don't talk about "the GCC patch to
   files so and so" -- they talk about "the [implicitly: whole tree]
   patch to GCC that adds feature X or fixes bug Y".

2) Whole project tree merging doesn't need noderev by noderev merge
   history.  It keeps one history record for all the files in a 
   project tree.  It doesn't even require storing that history in
   properties or other repository meta-data -- it has been
   demonstrated that it can reasonably be stored as plain old source
   files -- an add-on to the project source tree.

I don't think there reasonably is a "the merge algorithm".  I think
there are many merge algorithms, and that the best design strategy is
to support a big toolbox of many of them.  Broadly, the algorithms
can be classified into "node by node" and "whole tree" -- and I think
that from the business rule/project mgt perspective, "whole tree" is
the priority.  How convenient, then, that whole tree merging and merge
histories can be implemented without any changes to the db schema at
all and most likely, with less work and less destabilization.

It looks to me like the svn core developers are "stuck" with the false
assumption that smart, useful merging is necessarily node by node and
necessarily records history in noderev properties.   Those
implementation ideas are being treated as design constraints.  They
seem to stem from the underlying "project-less" structure of a svn
filesystem -- a structure that is already directly contradicted by the
recommended usage patterns.

I'm saying: forget about those false constraints.  Consider
introducing a layer _over_ svn to define (a) project trees as first
class objects; (b) logical file identities within project trees (not
strictly related to node ids); (c) in-tree patch logs as a way to
record merge history.  On top of those concepts, which require 0
changes to the db schema, and 0 use of properties, you can inherit
"for free" the toolbox of merge operators from arch -- plus a bunch of
other functionality besides.
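A minimal sketch of point (c), assuming merge history is kept as one
small file per applied patch under a `patch-log` directory inside the
project tree.  The directory name, the function names and the file
contents are hypothetical; this is only loosely modeled on the idea and
is not arch's actual on-disk format:

```python
import os
import tempfile

LOG_DIR = "patch-log"  # hypothetical directory name inside the project tree


def record_patch(tree_root, branch, patch_id, summary):
    """Record that patch_id from branch has been applied to this tree."""
    directory = os.path.join(tree_root, LOG_DIR, branch)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, patch_id), "w") as f:
        f.write(summary + "\n")


def merged_patches(tree_root):
    """Return the set of (branch, patch_id) pairs recorded in this tree."""
    result = set()
    base = os.path.join(tree_root, LOG_DIR)
    if not os.path.isdir(base):
        return result
    for branch in os.listdir(base):
        for patch_id in os.listdir(os.path.join(base, branch)):
            result.add((branch, patch_id))
    return result


def missing_from(target_root, source_root):
    """Patches the source tree has that the target tree has not merged yet."""
    return sorted(merged_patches(source_root) - merged_patches(target_root))


mainline = tempfile.mkdtemp()
devel = tempfile.mkdtemp()
record_patch(devel, "feature-x", "patch-1", "add feature X")
record_patch(devel, "feature-x", "patch-2", "fix bug Y")
record_patch(mainline, "feature-x", "patch-1", "add feature X")
todo = missing_from(mainline, devel)
```

The log files are ordinary versioned files, so the history travels with
the tree through copies, merges and dump/load without touching the db
schema or any properties.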

To be sure, Sander's mechanisms look useful to me as a programmer
convenience -- as a way to manipulate individual files _within_
project trees.   I pretty much wouldn't care if they ever applied to
tree-deltas -- I think their usefulness mostly pertains to individual
text files.

But the whole tree mechanisms look to me easier to implement, less
destabilizing, valuable for more than just merging, and more closely
aligned to the business rules/project mgt patterns in which fancy
merging is of the greatest interest.   Alas, in saying that, I suppose
that I'm in some sense speaking more through you to collabnet than to
you directly.   


-t


Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Tom Lord wrote:

>    > From: Branko Čibej <br...@xbc.nu>
>
>    > Ah, I think I see the difficulty. The merge source *would* be
>    > recorded in the second case, because accepting or rejecting
>    > changes isn't part of the automated merge algorithm, it's part
>    > of merge auditing/conflict resolution, something which cannot be
>    > automated.
>
>
>So, I'm looking at some file, and I want to know "Has B:x-y been
>merged into this file?"
>
>And, the properties on this file won't tell me?
>
>To really get the answer, I need some command that crawls over
>revisions x-y, finds out if the file changed in B, _then_ looks at the
>properties of my copy?
>
>Yuck.
>

:-)

There are two issues here: a) what the merge algorithm needs to work
correctly, and b) what the *user* wants to know about merge history. And
probably c) what's optimal.

If a file did not change during a merge, then we don't have to record
the merge source to satisfy a). We *might* want to record the merge
source to satisfy b). And c) is a tricky issue, given how Subversion's
storage model works.

Imagine the following: I have branches A and B, where B was created from
A. Now I change some files on A, and some on B. Every change on B causes
a lazy copy of the changed files and their parent directories. Then I
merge A to B, changing some files, but leaving most unchanged. Now, if I
record the merge source on the unchanged files to satisfy b), then I
have to lazily copy *all* the files to the branch; this makes the cost
of the merge (or rather, the cost of the commit after the merge)
proportional to the size of the tree, not the size of the changes. And
that's not good at all.

In fact, that's true even without taking the specifics of Subversion's
storage model into account. On the other hand, *not* recording the merge
data doesn't make later merges much more expensive; all the data
required to perform the merge can still be gathered in a single tree
delta report. From the performance viewpoint, it would only make sense
to record the merge point for unchanged files (by default) if the file
changed in the same way on both branches.
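A minimal sketch of that cost argument, assuming absolute paths and
counting one lazy copy per touched file plus one per parent directory
up to the root.  The function name and the example tree are
hypothetical:

```python
import posixpath


def nodes_to_copy(paths):
    """Files that get a new noderev, plus every parent directory up to the root."""
    nodes = set()
    for path in paths:
        node = path
        while node not in ("/", ""):
            nodes.add(node)
            node = posixpath.dirname(node)
        nodes.add("/")
    return nodes


tree = ["/src/a.c", "/src/b.c", "/src/lib/c.c", "/doc/readme"]
changed_by_merge = ["/src/a.c"]

# Recording the merge source only on files the merge changed.
cost_changed_only = len(nodes_to_copy(changed_by_merge))
# Recording the merge source on every file in the branch.
cost_all_files = len(nodes_to_copy(tree))
```

With one changed file the first cost stays at three nodes no matter how
large the tree grows, while the second grows with every file and
directory in the branch.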

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: [PROPOSAL] Merging Improved

Posted by Tom Lord <lo...@emf.net>.


    > From: Branko Čibej <br...@xbc.nu>

    > Ah, I think I see the difficulty. The merge source *would* be
    > recorded in the second case, because accepting or rejecting
    > changes isn't part of the automated merge algorithm, it's part
    > of merge auditing/conflict resolution, something which cannot be
    > automated.


So, I'm looking at some file, and I want to know "Has B:x-y been
merged into this file?"

And, the properties on this file won't tell me?

To really get the answer, I need some command that crawls over
revisions x-y, finds out if the file changed in B, _then_ looks at the
properties of my copy?

Yuck.

-t




Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Jack Repenning wrote:

>>From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
>
>>      To get a feel for this, I tried reasoning about an extreme case:
>>      you do a merge and absolutely no files are affected.  In that
>>      case, it would be equally correct to store no properties, or to
>>      store a property on every file indicating that the merge was
>>      done.
>
>Again, I point out the distinction between two meanings for "no files
>were affected":
>
>"there were no differences to merge"
>"all the differences were rejected"
>
>The first case is certainly a no-op, and (surely?) everyone would agree
>that there's no value in recording anything about it.
>

Ah, I think I see the difficulty. The merge source *would* be recorded
in the second case, because accepting or rejecting changes isn't part of
the automated merge algorithm, it's part of merge auditing/conflict
resolution, something which cannot be automated.

>The second is a conscious decision (startling, in your extreme case, but
>hey, that's what extremities are for!).  You can only argue that no
>record should be made of this (startling) event if you can also argue
>that no subsequent use would be made of the record.  And I think that's
>not true, here: the next merge including this range of revisions
>will/should be affected by this record.

Right. It now looks like we're talking about the exact same use case,
under a different name; a no-op merge isn't a merge that produces no
differences, it's a merge that *ignores* any differences.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Jack Repenning <jr...@collab.net>.

> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net] 

>       To get a feel for this, I tried reasoning about an extreme case:
>       you do a merge and absolutely no files are affected.  In that
>       case, it would be equally correct to store no properties, or to
>       store a property on every file indicating that the merge was
>       done.

Again, I point out the distinction between two meanings for "no files
were affected":

"there were no differences to merge"
"all the differences were rejected"

The first case is certainly a no-op, and (surely?) everyone would agree
that there's no value in recording anything about it.

The second is a conscious decision (startling, in your extreme case, but
hey, that's what extremities are for!).  You can only argue that no
record should be made of this (startling) event if you can also argue
that no subsequent use would be made of the record.  And I think that's
not true, here: the next merge including this range of revisions
will/should be affected by this record.



Re: [PROPOSAL] Merging Improved

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Sander,

I like the proposal!  A few comments -- I've read the whole thread up
to this point, so some of this is really in response to subsequent
discussion:

   1. Instead of [UUID::]PATH@REV[:REV], how about [UUID]:REV[-REV]:PATH ?
      That way, we keep all the fixed-format stuff up front, and the
      most variant component (the path, which might even contain "@")
      always comes after the second colon.  Easier to parse, and also
      probably easier to read when lined up in a column.  You may
      laugh, but such things make a difference :-).

      Using a hyphen to indicate revision ranges seems more readable,
      and (in this format) would be easier to parse anyway.

   2. Please let's not add levels of indirection :-).  No matter how
      smart we try to be, sometimes humans will have to read, and
      perhaps even edit, these properties.  So let's use paths, not
      node-ids; and use real UUIDs -- even though they're longer --
      rather than indices into some particular repository's `uuids'
      table.

      (And if this was the only purpose of that table, that in itself
      is no argument for using indices in the property.)

      The more self-contained and repository-independent this
      information is, the less trouble we'll have down the road.  The
      points about dump/load cycles are just one of many problems that
      could result if we don't store the information in its most
      basic, canonical form, imho.

   3. Regarding the question of whether to store the property on
      files/dirs unaffected by the merge, my instinct is not to.

      To get a feel for this, I tried reasoning about an extreme case:
      you do a merge and absolutely no files are affected.  In that
      case, it would be equally correct to store no properties, or to
      store a property on every file indicating that the merge was
      done.  But in the latter case, we'd really be recording the mere
      fact that someone had typed 'svn merge'.  The merge would have
      had no real-world consequence at all, and yet all these
      properties would change.  This seems inconsistent with other
      Subversion operations, where if they do nothing, then no change
      is recorded in the repository.  Also, it would use up space for
      no reason :-).
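A minimal parsing sketch for the [UUID]:REV[-REV]:PATH form suggested
in point 1, assuming the UUID may be empty and a single revision stands
for a one-revision range.  `MergeSource` and `parse_merge_source` are
hypothetical names, not anything in Subversion:

```python
from typing import NamedTuple, Optional


class MergeSource(NamedTuple):
    uuid: Optional[str]
    start: int
    end: int
    path: str


def parse_merge_source(entry):
    """Parse '[UUID]:REV[-REV]:PATH'; the path is everything after the second colon."""
    # Splitting at most twice keeps any ':' or '@' in the path intact.
    uuid, revs, path = entry.split(":", 2)
    start, dash, end = revs.partition("-")
    first = int(start)
    last = int(end) if dash else first
    return MergeSource(uuid or None, first, last, path)


full = parse_merge_source(
    "65390229-12b7-0310-b90b-f21a5aa7ec8e:100-120:/branches/foo@bar")
local = parse_merge_source(":42:/trunk/odd:name")
```

Because the fixed-format fields come first, two splits are all the
parsing needs, whatever characters the path contains.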

That's all.  Frankly, I'd be +1 on this going in right now, as long as
it starts out experimental (i.e., a flag to merge, that later can
become the default).

-Karl 

"Sander Striker" <st...@apache.org> writes:

> > From: Branko Cibej [mailto:brane@xbc.nu]
> > Sent: Friday, April 11, 2003 8:47 AM
> 
> > Jack Repenning wrote:
> > 
> >> This all depends on having svn:merged props to record the merge history.
> >> But is there always somewhere to park that property when you need it?
> >> Something here I'm not quite following, probably because I haven't fully
> >> grokked a more basic question ... So I'll ask that one:
> >>
> >> If you merge a URL into your WC, some files will change and others will
> >> not.  Can you attach a new svn:merged to all the files in the WC, even
> >> the unchanged ones, without losing any existing svn:merged?
> > 
> > If the file didn't change, you don't have to record any merge history.
> > You can, of course, and it doesn't hurt, but it's not necessary to the
> > proposed algorithm.
> 
> Right.  For consistency I was thinking to always record the merge in
> svn:merged-from, even on unchanged files.  Then again, we do not
> need it and it is unnecessary overhead...
> 
> There is a use case for recording svn:merged-from for files that haven't
> changed though.  We might want to tell merge we want to sync up to
> branch@Y  (MRMR = branch@X), but to ignore all changes.  This way you
> can avoid certain changes on 'branch', since next merge will be from
> branch@Y to branch@HEAD, effectively bypassing the changes in
> branch@X:Y.  And yes, we do need an extra switch to pass to svn merge
> for that.  Of course you can argue that this use case does have changed
> files, namely on branch (MRMR:L), instead of the working copy (M:WC).
> 
> 
> Sander


RE: [PROPOSAL] Merging Improved

Posted by Sander Striker <st...@apache.org>.

> From: Branko Cibej [mailto:brane@xbc.nu]
> Sent: Friday, April 11, 2003 8:47 AM

> Jack Repenning wrote:
> 
>> This all depends on having svn:merged props to record the merge history.
>> But is there always somewhere to park that property when you need it?
>> Something here I'm not quite following, probably because I haven't fully
>> grokked a more basic question ... So I'll ask that one:
>>
>> If you merge a URL into your WC, some files will change and others will
>> not.  Can you attach a new svn:merged to all the files in the WC, even
>> the unchanged ones, without losing any existing svn:merged?
> 
> If the file didn't change, you don't have to record any merge history.
> You can, of course, and it doesn't hurt, but it's not necessary to the
> proposed algorithm.

Right.  For consistency I was thinking to always record the merge in
svn:merged-from, even on unchanged files.  Then again, we do not
need it and it is unnecessary overhead...

There is a use case for recording svn:merged-from for files that haven't
changed though.  We might want to tell merge we want to sync up to
branch@Y  (MRMR = branch@X), but to ignore all changes.  This way you
can avoid certain changes on 'branch', since next merge will be from
branch@Y to branch@HEAD, effectively bypassing the changes in
branch@X:Y.  And yes, we do need an extra switch to pass to svn merge
for that.  Of course you can argue that this use case does have changed
files, namely on branch (MRMR:L), instead of the working copy (M:WC).
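A minimal sketch of that use case, assuming svn:merged-from is modeled
as a plain mapping from branch to its most recently merged revision
(MRMR).  The function name and the `record_only` switch are
hypothetical stand-ins for the extra switch mentioned above:

```python
def plan_merge(merged_from, branch, target_rev, record_only=False):
    """Return (range_to_apply, updated_merged_from) for a merge up to target_rev."""
    start = merged_from[branch]          # MRMR recorded by the previous merge
    updated = dict(merged_from)
    updated[branch] = target_rev         # the next merge starts from here
    # A record-only merge moves MRMR forward without applying any changes.
    changes = None if record_only else (start, target_rev)
    return changes, updated


props = {"branch": 10}                                   # MRMR = branch@X, X = 10
skipped, props = plan_merge(props, "branch", 20, record_only=True)   # sync to Y = 20
applied, props = plan_merge(props, "branch", 30)         # merge up to HEAD = 30
```

The second call only picks up branch@20:30, so the changes in
branch@10:20 are bypassed even though no file in the working copy
changed during the first call.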

Sander


Re: [PROPOSAL] Merging Improved

Posted by Branko Čibej <br...@xbc.nu>.

Jack Repenning wrote:

>This all depends on having svn:merged props to record the merge history.
>But is there always somewhere to park that property when you need it?
>Something here I'm not quite following, probably because I haven't fully
>grokked a more basic question ... So I'll ask that one:
>
>If you merge a URL into your WC, some files will change and others will
>not.  Can you attach a new svn:merged to all the files in the WC, even
>the unchanged ones, without losing any existing svn:merged?

If the file didn't change, you don't have to record any merge history.
You can, of course, and it doesn't hurt, but it's not necessary to the
proposed algorithm.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: [PROPOSAL] Merging Improved

Posted by Jack Repenning <jr...@collab.net>.

This all depends on having svn:merged props to record the merge history.
But is there always somewhere to park that property when you need it?
Something here I'm not quite following, probably because I haven't fully
grokked a more basic question ... So I'll ask that one:

If you merge a URL into your WC, some files will change and others will
not.  Can you attach a new svn:merged to all the files in the WC, even
the unchanged ones, without losing any existing svn:merged?

