You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2013/08/30 13:03:35 UTC

Move Tracking in the Update Editor

I'm working on how the update editor can handle moves.  It's more complex
than the commit editor, because there can be multiple instances of the
"same" repository node in the WC, so moves are not necessarily unique.

This is a copy of my notes in progress.  I could use some suggestions or
thoughts if you have any?

[ASCII version first; scroll down for HTML version.]

=============================================

= Summary =
A WC can contain multiple instances of the same repository node, by
mixed-revision and/or switched paths. During an update, multiple instances
can move to the same target path, and one instance can move to multiple
target paths. We can't reasonably avoid WCs getting into such a state, nor
forbid updating such a WC.

An editor that can perform one move per node (that is, per node-copy-id) is
suitable for editing a repository state, and can thus be used as the commit
editor. The existing update code drives an edit over WC paths, not over
repository nodes or URLs. An editor that can perform only one move per node
cannot be used as the update editor.

Therefore, the update editor must somehow handle multiple source and
destination paths for the same move.

= Details =
== WC has Multiple Instances per Node ==
A WC can contain multiple instances of the same repository node.

* A switched path points to the same or a different revision of any node
** WC: (A@10, X=^/A@10)
** WC: (A@10, X=^/A@11); repo: (^/A@10 and ^/A@11 are the same node-rev)
* A non-switched path points to a different revision of a moved node
** WC: (A@10, B@20); repo: (mv ^/A@10 ^/B@20)

The WC does not (currently) know about node-copy-ids, and so does not know
that it has multiple instances of the same node, except in the simple case
where the URL@REV is the same.

== Multi-Source and Multi-Target Moves ==
During an update, multiple instances of one repository node can move to the
same target path, and one instance can move to multiple target paths.

=== Multiple Sources ===
* WC: (A@10, B@20); repo: (mv A@10 → B@20 → C@30)
* Update to r30; A moves to C, and also B moves to C.

 |                   |
 +-- A  mv--\        |
 |           \       |
 +-- B  mv--\ \      |
             \-\-->  +-- C

If multiple sources have the same tree shape (switches, depth, etc.) and no
local modifications, then it makes sense for the WC to simply accept the
single destination. If there are local modifications to multiple source
instances, then the client might want to merge them or raise a conflict.

=== One Move, One Non-Move ===
This case is similar to Multiple Sources, with one important difference. If
we assume an editor in which each move is labeled by a move-id, the
consumer cannot recognize such a conflict just by examining the move-ids.

* WC: (A@10, B@20); repo: (mv A@10 → B@20)
* Update to r30; A moves to B, and also the existing B is updated.

 |                   |
 +-- A  mv---\       |
 |            \      |
 +-- B  mod--> \-->  +-- B

=== Multiple Targets ===
* WC: (A@10, X@10, Y=^/X@10); repo: (mv A@10 → X/A@20)
* Update to r20; A moves to X/A and also to Y/A.

 |                   |
 +-- A  mv--\        |
 |         \ \       |
 +-- X      \ \      +-- X
 |           \ \-->  |   +-- A
 |            \      |
 +-- Y         \     +-- Y
                \-->     +-- A

With multiple targets, there is no need to prevent multiple instances of
the destination node from being created. However, if there are local
modifications, it could be undesirable to end up with the same
modifications in multiple places, so the client might want to warn the user
or allow the user to choose what happens to the modifications.

=== Many-to-Many Move ===
Many-to-many move, combining multiple sources with multiple target paths:

* WC: r10 (A, B=^/A, X, Y=^/X); repo: (mv A@10 → X/B@20)
* Update to r20; A and B both move to X/C and to Y/C.

With a many-to-many move, there is the possibility that the sources and
destinations can be logically paired according to their pathwise nearness.
Example, starting from WC (trunk1→^/trunk@10, trunk2→^/trunk@20) and repo
(mv trunk/A@10 → trunk/B@20):

 |                     |
 +-- trunk1            +-- trunk1
 |   |                 |   |
 |   +-- A   mv--\     |   |
 |                \->  |   +-- B
 |                     |
 +-- trunk2            +-- trunk2
     |                     |
     +-- A   mv--\         |
                  \->      +-- B

This pairing could be implemented by the edit driver, in which case it
should describe each such move with its own id, or by the consumer on
receiving a set of many-to-many moves.

### What are the rules for this nearness pairing?

== Avoidance ==
We can't easily avoid WCs getting into such a state. To avoid it, the WC
would probably need to know node-ids and have substantial changes in the
allowed patterns of usage.

When a WC has multiple instances of the same repository node, we can't
reasonably forbid updating it.

== Editor ==
Either the update editor must cope with multiple source and destination
paths for the same move, or the client must request several simple edits,
each with no multiple instances. Options include:

* Traversal over WC paths using non-unique mv-away and mv-here
* Traversal over URLs or repository nodes
** multiple edit operations per node, one for each WC path
* Client requests multiple edits, with no multiple instances in a single
edit

=== Non-Unique Moves ===
Traversal over the WC paths, using non-unique mv-away and mv-here.

* 1 op. per path (excluding replacements)
* mv-away and mv-here not uniquely paired by their id

Problems:

* The consumer (client/WC) may want to know whether a given move is unique
before executing it, so that it can choose to warn or raise a conflict (for
example).
* Each move-away is (logically) accompanied by its own edit. For example,
with WC (A@10, B@20), repo (A@10 → B@20 → C@30), one edit applies to A
(r10:30) and another edit applies to B (r20:30).

 |                        |
 +-- A  mv-----\          |
 |      +r10:30 \         |
 |               \        |
 +-- B  mv-----\  \       |
        +r20:30 \  \      |
                 \--\-->  +-- C

But the WC doesn't necessarily need to receive instructions to edit every
move-away instance. If two instances have the same tree shape (switches,
depth, etc.) then it only needs to move and edit one of them and can
discard the others (after preserving any local mods). Thus:

 |                        |
 +-- A  mv[1*]            |  # Redundant duplicate of move-away id 1;
 |                        |  # can be deleted; indicated by id "1*".
 |                        |
 +-- B  mv[1]--\          |  # Any edits described for C are relative
        +r20:30 \         |  # to this instance.
                 \----->  +-- C

The edit operation for path A cannot be simply “delete”. It needs to
indicate that A is part of the same move that also affects B, because the
client may want to notify the user appropriately and preserve any local
mods in A (perhaps merging them into C).

So, is something like this the way forward?


=== Traversal over URLs or repository nodes ===
Would it help if the edit traversed URLs or nodes instead of WC paths? If
such an editor would visitit each URL once, it would need to be able to
send multiple edit operations for the same URL, one operation per instance
of that node. Or something similar for nodes instead of URLs.

This seems to have no advantages over the approach of describing non-unique
moves.

=== Multiple, Simple Edits ===
Another option was briefly considered. Can the client request multiple
edits, with no multiple instances in a single edit?

It would need to know node ids. Either the client would know the node ids
in the WC and make multiple reports, requesting one edit each, or the
reporter (report handler) would know them and have the ability to issue
multiple edits to the client. Either way, how would it decide how to
partition the changes? And still the client would want to be able to detect
conflicts, and this would seem to be more difficult to achieve if the
conflicting changes are in separate edits.

This seems too complex, and too much of a departure from the existing
system.

= Conclusion =
(No conclusion yet.)

= Appendix: Notes on Semantics =
Desirable semantics include:

* Move is a local operation. For example, we can make pairings in this
many-to-many move scenario:
** WC: (trunk1=^/trunk, trunk2=^/trunk)
** Repo: (mv trunk/foo to trunk/bar)
** Update: mv trunk1/foo → trunk1/bar; mv trunk2/foo → trunk2/bar.

Current WC semantics include:

* A node's name and existence is regarded as a property of that node,
rather than of its parent directory. An update of the path 'A' can add or
delete the node at 'A' in the base tree without having to update its parent.

=============================================
 Summary

A WC can contain multiple instances of the same repository node, by
mixed-revision and/or switched paths. During an update, multiple instances
can move to the same target path, and one instance can move to multiple
target paths. We can't reasonably avoid WCs getting into such a state, nor
forbid updating such a WC.

An editor that can perform one move per node (that is, per node-copy-id) is
suitable for editing a repository state, and can thus be used as the commit
editor. The existing update code drives an edit over WC paths, not over
repository nodes or URLs. An editor that can perform only one move per node
cannot be used as the update editor.

Therefore, the update editor must somehow handle multiple source and
destination paths for the same move.
Details WC has Multiple Instances per Node

A WC can contain multiple instances of the same repository node.

   -

   A switched path points to the same or a different revision of any node
   -

      WC: (A@10, X=^/A@10)
      -

      WC: (A@10, X=^/A@11); repo: (^/A@10 and ^/A@11 are the same node-rev)
       -

   A non-switched path points to a different revision of a moved node
   -

      WC: (A@10, B@20); repo: (mv ^/A@10 ^/B@20)

The WC does not (currently) know about node-copy-ids, and so does not know
that it has multiple instances of the same node, except in the simple case
where the URL@REV is the same.
Multi-Source and Multi-Target Moves

During an update, multiple instances of one repository node can move to the
same target path, and one instance can move to multiple target paths.
Multiple Sources

   -

   WC: (A@10, B@20); repo: (mv A@10 → B@20 → C@30)
   -

   Update to r30; A moves to C, and also B moves to C.

|                   |+-- A  mv--\        ||           \       |+-- B
mv--\ \      |
            \-\-->  +-- C

If multiple sources have the same tree shape (switches, depth, etc.) and no
local modifications, then it makes sense for the WC to simply accept the
single destination. If there are local modifications to multiple source
instances, then the client might want to merge them or raise a conflict.
One Move, One Non-Move

This case is similar to Multiple Sources, with one important difference. If
we assume an editor in which each move is labeled by a move-id, the
consumer cannot recognize such a conflict just by examining the move-ids.

   -

   WC: (A@10, B@20); repo: (mv A@10 → B@20)
   -

   Update to r30; A moves to B, and also the existing B is updated.

|                   |+-- A  mv---\       ||            \      |+-- B
mod--> \-->  +-- B

Multiple Targets

   -

   WC: (A@10, X@10, Y=^/X@10); repo: (mv A@10 → X/A@20)
   -

   Update to r20; A moves to X/A and also to Y/A.

|                   |+-- A  mv--\        ||         \ \       |+-- X
   \ \      +-- X|           \ \-->  |   +-- A|            \      |+--
Y         \     +-- Y
               \-->     +-- A

With multiple targets, there is no need to prevent multiple instances of
the destination node from being created. However, if there are local
modifications, it could be undesirable to end up with the same
modifications in multiple places, so the client might want to warn the user
or allow the user to choose what happens to the modifications.
Many-to-Many Move

Many-to-many move, combining multiple sources with multiple target paths:

   -

   WC: r10 (A, B=^/A, X, Y=^/X); repo: (mv A@10 → X/B@20)
   -

   Update to r20; A and B both move to X/C and to Y/C.

With a many-to-many move, there is the possibility that the sources and
destinations can be logically paired according to their pathwise nearness.
Example, starting from WC (trunk1→^/trunk@10, trunk2→^/trunk@20) and repo
(mv trunk/A@10 → trunk/B@20):

|                     |   +-- trunk1            +-- trunk1|   |
         |   ||   +-- A   mv--\     |   ||                \->  |   +--
B|                     |+-- trunk2            +-- trunk2
    |                     |
    +-- A   mv--\         |
                 \->      +-- B

This pairing could be implemented by the edit driver, in which case it
should describe each such move with its own id, or by the consumer on
receiving a set of many-to-many moves.

### What are the rules for this nearness pairing?
Avoidance

We can't easily avoid WCs getting into such a state. To avoid it, the WC
would probably need to know node-ids and have substantial changes in the
allowed patterns of usage.

When a WC has multiple instances of the same repository node, we can't
reasonably forbid updating it.
Editor

Either the update editor must cope with multiple source and destination
paths for the same move, or the client must request several simple edits,
each with no multiple instances. Options include:

   -

   Traversal over WC paths using non-unique mv-away and mv-here
   -

   Traversal over URLs or repository nodes
   -

      multiple edit operations per node, one for each WC path
       -

   Client requests multiple edits, with no multiple instances in a single
   edit

Non-Unique Moves

Traversal over the WC paths, using non-unique mv-away and mv-here.

   -

   1 op. per path (excluding replacements)
   -

   mv-away and mv-here not uniquely paired by their id

Problems:

   -

   The consumer (client/WC) may want to know whether a given move is unique
   before executing it, so that it can choose to warn or raise a conflict (for
   example).
   -

   Each move-away is (logically) accompanied by its own edit. For example,
   with WC (A@10, B@20), repo (A@10 → B@20 → C@30), one edit applies to A
   (r10:30) and another edit applies to B (r20:30).

|                        |+-- A  mv-----\          ||      +r10:30 \
      ||               \        |+-- B  mv-----\  \       |
       +r20:30 \  \      |
                \--\-->  +-- C

But the WC doesn't necessarily need to receive instructions to edit every
move-away instance. If two instances have the same tree shape (switches,
depth, etc.) then it only needs to move and edit one of them and can
discard the others (after preserving any local mods). Thus:

|                        |+-- A  mv[1*]            |  # Redundant
duplicate of move-away id 1;|                        |  # can be
deleted; indicated by id "1*".|                        |+-- B
mv[1]--\          |  # Any edits described for C are relative
       +r20:30 \         |  # to this instance.
                \----->  +-- C

The edit operation for path A cannot be simply “delete”. It needs to
indicate that A is part of the same move that also affects B, because the
client may want to notify the user appropriately and preserve any local
mods in A (perhaps merging them into C).
Traversal over URLs or repository nodes

Would it help if the edit traversed URLs or nodes instead of WC paths? If
such an editor would visitit each URL once, it would need to be able to
send multiple edit operations for the same URL, one operation per instance
of that node. Or something similar for nodes instead of URLs.

This seems to have no advantages over the approach of describing non-unique
moves.
Multiple, Simple Edits

Another option was briefly considered. Can the client request multiple
edits, with no multiple instances in a single edit?

It would need to know node ids. Either the client would know the node ids
in the WC and make multiple reports, requesting one edit each, or the
reporter (report handler) would know them and have the ability to issue
multiple edits to the client. Either way, how would it decide how to
partition the changes? And still the client would want to be able to detect
conflicts, and this would seem to be more difficult to achieve if the
conflicting changes are in separate edits.

This seems too complex, and too much of a departure from the existing
system.
Conclusion

(No conclusion yet.)
Appendix: Notes on Semantics

Desirable semantics include:

   -

   Move is a local operation. For example, we can make pairings in this
   many-to-many move scenario:
   -

      WC: (trunk1=^/trunk, trunk2=^/trunk)
      -

      Repo: (mv trunk/foo to trunk/bar)
      -

      Update: mv trunk1/foo → trunk1/bar; mv trunk2/foo → trunk2/bar.

Current WC semantics include:

   -

   A node's name and existence is regarded as a property of that node,
   rather than of its parent directory. An update of the path 'A' can add or
   delete the node at 'A' in the base tree without having to update its parent.

Re: Move Tracking in the Update Editor

Posted by Branko Čibej <br...@wandisco.com>.

On 30.08.2013 19:12, Julian Foad wrote:
> Branko Čibej wrote:
>
>> One possible solution is to replay the moves in the order they happened.
>> Then you'd get two cases:
>>
>>   * A is an ancestor of B: the operation is:
> ('Ancestor' meaning a time-line predecessor of the same node-copy-id.)

Yes.

[...]
>>   * A and B are the same node: In this case, they'll can only be visible
>>     at the same time if you have a switched subtree or an external
>>     subtree, and the multiple-source move doesn't happen (because the
>>     move needs to be replicated in both subtrees)
> If A points to ^/A@10 and B points to ^/A@10, then B is switched but the move can't (and shouldn't) be replicated in subtree B because B is the move-root node.

Sorry, I should've said that the subtrees are independent and therefore
don't run afoul of any of the scenarios you described.

>
>> I'm not sure how the editor driver would represent the move sequence and
>> whether the resulting (intermediate) tree conflict should be resolved to
>> a text conflict automatically or not; but I'm pretty sure it can be done
>> without introducing multiple-source or multiple-target moves.
> Well, I'd love it if you can say how.

Thinking about that. Converting intuition to algebra is not always
trivial or obvious. :)

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:

> On 30.08.2013 17:18, Julian Foad wrote:
>>  Branko Čibej wrote:
>>>  It strikes me that if for update-like edits, which are driven by the
>>>  server, the once rule is interpreted in terms of WC paths, not node URLs
>>>  or node IDs; then the server already has all the information it needs to
>>>  transform the working copy.
>>> 
>>>  Take your first multiple source example:
>>> 
>>>  |                   |
>>>  +-- A  mv--\        |
>>>  |           \       |
>>>  +-- B  mv--\ \      |
>>>              \-\-->  +-- C
>>> 
>>>  The working copy really doesn't have to know that both A and B became C.
>>>  It only has to represent the final state.
[...]
>>  The client *does* need to know that B is in fact being moved to C, so that
>> it can offer to transfer my local modifications from B to C.
> 
> I'm not at all sure that you can avoid a conflict even in this case.
> It's either a tree conflict, or a text conflict (if both A and B were
> modified, and they refer to the same node).

This isn't about trying to avoid a conflict, it's about how the server describes the incoming changes to the client.  The client may raise a conflict, or not, but that must be under the client's control and not an artefact of how the server expresses the edit.

> One possible solution is to replay the moves in the order they happened.
> Then you'd get two cases:
> 
>   * A is an ancestor of B: the operation is:

('Ancestor' meaning a time-line predecessor of the same node-copy-id.)

>       o move(A, B) -> tree conflict, can be resolved as a text merge
>       o move(B, C)

This is what I meant by "Traversal over URLs or repository nodes" in my document.  In other words, rather than visiting each WC path once, the edit visits each repository node (node-copy-id) once, and sends a series of edits for that node before visiting the next node.

It is a possible approach that could be explored, but it has drawbacks.  (1) If we tried to apply this approach to moves only and not to all edit operations I think that would create a big mess in terms of the editor semantics.  (2) If the WC node at 'A' has local mods and we edit it twice in succession (first when moving to B, again when moving to C) then that increases the likelihood of conflicts compared with if we applied just one big change.  (Because some parts of the first half of the change might be cancelled or rendered non-conflicting by the further changes in the second half.)

>   * A and B are the same node: In this case, they'll can only be visible
>     at the same time if you have a switched subtree or an external
>     subtree, and the multiple-source move doesn't happen (because the
>     move needs to be replicated in both subtrees)

If A points to ^/A@10 and B points to ^/A@10, then B is switched but the move can't (and shouldn't) be replicated in subtree B because B is the move-root node.

> I'm not sure how the editor driver would represent the move sequence and
> whether the resulting (intermediate) tree conflict should be resolved to
> a text conflict automatically or not; but I'm pretty sure it can be done
> without introducing multiple-source or multiple-target moves.

Well, I'd love it if you can say how.

- Julian

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Julian Foad wrote:

> Philip Martin wrote:
>> [...] I can imagine an enhanced Ev1 editor drive that does
>> 
>>     move away A
>>     move to C (original A)
>>     del A
>>     move away B
>>     move to C (original B)
>>     del B
>>     add C
>> 
>>  The deletes lead to tree-conflicts on A and B due to local mods.  The
>>  add creates a pristine C with no local mods.  The user resolves the
>>  tree-conflicts post-update and gets to choose which local mods are
>>  merged to C, possibly both but one before the other, which may in turn
>>  raise text/prop/tree-conflicts on C.
> 
> This is good as far as local mods are concerned.  The decision of how to combine 
> A and B, or raise a conflict, is left to the WC client code as it should be, and 
> sufficient information is provided about the nature of the changes.
> 
> Functionally, this would work as you've shown.  Regarding updating the base 
> tree, however, note that the plain "add" in Ev1 is a node-by-node add 
> of the whole subtree, which could be arbitrarily large.  For performance, we 
> must not use a plain add, as a move should be designed to execute in O(1) 
> time. [...]

Philip pointed out to me that this would still be an improvement over what we have today.  If we're going to get anything into a release cycle it must be broken down into manageable stages, so perhaps this is a good place to draw the line for a first stage.

I'm fairly well persuaded by this view.

- Julian


> It may be advantageous to come up with an implementation plan whereby 
> move-awareness is introduced first to the repository side, with just enough 
> support in the WC to make typical move scenarios work much better than they do 
> today, and then as a second phase extend the WC to know about node-id 
> relationships and complete the WC-side awareness (so that scenarios like the 
> above commit-time failures are instead detected within the WC, and whatever else 
> goes along with that).

Re: Move Tracking in the Update Editor

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@btopenworld.com> writes:

> Philip Martin wrote:
>
>> A current Ev1 drive would be:
>> 
>>     delete A
>>     open B
>>     change_prop B
>>     close B
>
> Right.  Note that here the changes in B are expressed relative to the
> pre-existing 'B' base state.
>
>> For the Ev1+ drive we add move and drop the "delete A" and "open 
>> B":
>> 
>>     move away A id=1
>>     move here id=1 B
>>     change prop B
>>     close B
>
> Well, now, are you thinking that the "move here" operation just
> conveys a semantic intent but does not actually do anything to B, and
> then editor expresses the necessary changes in B relative to the
> existing base state at path B (^/B@20); or that the "move here"
> executes a move (or copy) of the base state in the WC and expresses
> the changes relative to that state (^/A@10)?  It matters -- they're
> different sets of changes.

For the update editor it would be best if the B changes were based on
B@20 rather than A@10.  I don't think the commit editor has to handle
this case since it always starts with a single revision and the move
source always has to be identical to HEAD.

We start with

  base A@10, working file/dir A' with mods
  base B@20, working file/dir B' with mods

We update to r30 and receive a move

  move away A, id=1
  move here id=1, B
  change B, ref B@20
  close B

The move away raises a tree-conflict on A due to local mods, and copies
A@10 into a working node and removes the base A@10.  We can still
identify the local mods by comparing the working A@10 and A'.

The move here actions depend on whether B is a directory or a file.

For a directory the move here causes B to become B@30/incomplete.  At
this stage we can still identify the local mods because the incomplete
base hasn't received any changes.  As changes arrive they have to be
merged with the local mods B' possibly causing a conflict. After
receiving all the changes the close causes the base to be marked B@30
complete.  There may be a conflict.

For a file the move here doesn't change wc.db, the text/property changes
accumulate until close.  At close we create a workqueue item to merge
the changes, possibly raising a conflict, and update the base to B@30.

The user can then choose to resolve the tree-conflict on A by merging
the A@20-A' diff into B'.

>> However if B did not exist in the initial working copy the Ev1 drive
>> would be different:
>> 
>>     delete A
>>     add B
>>     change_prop B
>>     close B
>
> Agreed.
>
>> but with Ev1+ we would get the same sequence.  So it appears that "move
>> here" is somehow replacing both open and add.  The driver can
>> distinguish these cases and I think it should be telling the receiver.

In this case the B changes have to be based on A@10.

  move away A, id=1
  move here id=1, B
  change B, ref A@10
  close B

The driver can distinguish these two cases because it has been told
whether or not B@20 is present.  I don't know whether we need to have
the driver explicitly communicate it to the receiver, as with the
current add/open.

Another case is A-to-B-to-C:

We start with

  base A@10, working A' file/dir with mods
  base B@20, working B' file/dir with mods

We update to r30 and receive two moves:

  move away A, id=1
  move here id=1, C
  change C
  close C
  move away B, id=2
  move here id=2, C
  close C

Here the driver chooses to move one of A or B first, I've picked A.
After the first move here the driver sends changes based on A@10 since
there is no previous C.  For the second move there is already a C@30 and
the driver has no changes to send.  The driver knows all this.

For the user there will be tree-conflicts on A and B, there will be
working nodes A@10 and B@20, and the local mods can be determined by
comparing A@10-A' and B@20-B'.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:

> Julian Foad <ju...@btopenworld.com> writes:
>>  Philip Martin wrote:
>>>  There is a related issue with mixed-revision working copies.  Suppose A
>>>  gets moved to B in some commit.  A mixed-revision working copy could
>>>  contain both A and B so what does the update that includes that commit
>>>  do?  I suppose the server still sends some sort of "move A B" operation
>> 
>>  Yes.  That's this scenario:
>> 
>> <https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor#One_Move.2C_One_Non-Move>.
>> 
>>    WC: (A@10, B@20); repo: (r10: mkdir A; r20: mv A B)
>>    Update to r20; A moves to B, and also the existing B is updated.

Oops, that should be update to r30, and assume that there are changes in each commit as well as the move, because I want to demonstrate that there are two different sets of incoming changes when we update.

>>  and the steps would be:
>> 
>>    move away A; move to B (original A)
>>    delete B  # move-away
>>    add B (copy-from ... A)  # move-here: it conflicts
> 
> Not sure I follow that, we don't want to delete B as it may have local
> mods and the final B is the same node so those mods should be preserved.

Notation.  I was using the notation of sending the move info in a separate prefix and then using the existing "delete" and "add-with-copy" instructions to actually execute the "move-away" and "move-here" operations respectively.

Nevertheless, you bring up a good problem.  I agree "we" (the client) quite likely do not want to delete B in such a case; the most likely conflict resolution we'd choose would be to merge the local mods from A into those in B.  But I was trying to envision an edit driver in the server that does not make that decision but rather sends commands for each of the two original paths separately and lets the receiver (client) decide how to handle the conflict.

However, I see the problem with that.  There is no sane way for the driver to describe both the move from A to B and the update to the original B in the same edit drive.

Switching to Ev1.5 notation with "move-away" and "move-here" as first class operations, as described in <http://wiki.apache.org/subversion/MoveDev/Ev15MovesDesign#Specification>, the move from A to B would be:

  rename(A,B)
  [edits in B, relative to original *A*]
  close-directory(B)

and the update to the original B would be:

  open-directory(B)
  [edits in B, relative to original *B*]
  close-directory(B)

I suppose the edit driver *could* just send both sets of instructions, one after the other, but that's pretty severely overloading the editor definition.  It likely would not play nicely with lazy/late closing of the directories, which I think we do these days.

The alternative is for the merging (and conflict resolution?) to happen earlier.  Is that feasible?  In the server where it receives a report of the WC base state and drives the editor, perhaps?

> A current Ev1 drive would be:
> 
>     delete A
>     open B
>     change_prop B
>     close B

Right.  Note that here the changes in B are expressed relative to the pre-existing 'B' base state.

> For the Ev1+ drive we add move and drop the "delete A" and "open 
> B":
> 
>     move away A id=1
>     move here id=1 B
>     change prop B
>     close B

Well, now,  are you thinking that the "move here" operation just conveys a semantic intent but does not actually do anything to B, and then editor expresses the necessary changes in B relative to the existing base state at path B (^/B@20); or that the "move here" executes a move (or copy) of the base state in the WC and expresses the changes relative to that state (^/A@10)?  It matters -- they're different sets of changes.

> However if B did not exist in the initial working copy the Ev1 drive
> would be different:
> 
>     delete A
>     add B
>     change_prop B
>     close B

Agreed.

> but with Ev1+ we would get the same sequence.  So it appears that "move
> here" is somehow replacing both open and add.  The driver can
> distinguish these cases and I think it should be telling the receiver.

My definition in the Wiki has "move-here" meaning to actually move the referenced base state to the target path, and any following edits will be based on that new state.

(I can't usefully respond to the remainder below until we get more precise in our notation.)

- Julian

>>>  but we may want the server to tell the client that B is not being
>>>  replaced.
>> 
>>  That happens naturally.  If B is being replaced by the moved A:
>> 
>>    WC: (A@10, B@10); repo: (r10: mkdir A B; r20: delete B; mv A B)
>>    Update to r20; A moves to B, replacing the previous B.
>> 
>>  then the server would include an additional "delete A" step:
>> 
>>    move away A; move to B (original A)
>>    delete B  # move-away
>>    delete A
>>    add B (copy-from ... A)  # move-here: it conflicts
> 
> There are 3 cases:
> 
>     1)  A moved to B where B does not exist
>     2)  A moved to B where B exists and is the same node
>     3)  A moved to B where B exists but is a different node
> 
> 1 and 2 are the cases above.  For 3 the current Ev1 drive would be:
> 
>     delete A
>     delete B
>     add B
>     change prop B
>     close B
> 
> I suppose the Ev1+ drive might be something like:
> 
>     move away A id=1
>     delete B
>     move here id=1 B
>     change prop B
>     close B
> 
> In this case local mods to B would cause a tree-conflict due to the
> delete/replace.
> 
> It might be interesting to see how Ev2 would handle these 3 cases.
> 
>     1)  A moved to B where B does not exist
> 
>     alter dir ., children=A,B
>     move A B, replaces_rev=-1
>     alter B
> 
>     2)  A moved to B where B exists and is the same node
> 
>     alter dir ., children=A,B
>     move A B, replaces_rev=-1
>     alter B
> 
>     3)  A moved to B where B exists and is a different node
> 
>     alter dir ., children=A,B
>     move A B, replaces_rev=N
>     alter B
> 
> Ev2 distinguishes between add and replace via the replaces_rev flag but
> as with Ev1+ the driver knows about the difference between 1 and 2 but
> doesn't communicate it to the receiver.

Re: Move Tracking in the Update Editor

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@btopenworld.com> writes:

> Philip Martin wrote:
>> There is a related issue with mixed-revision working copies.  Suppose A
>> gets moved to B in some commit.  A mixed-revision working copy could
>> contain both A and B so what does the update that includes that commit
>> do?  I suppose the server still sends some sort of "move A B" operation
>
> Yes.  That's this scenario:
> <https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor#One_Move.2C_One_Non-Move>.
>
>   WC: (A@10, B@20); repo: (r10: mkdir A; r20: mv A B)
>   Update to r20; A moves to B, and also the existing B is      updated.
>
> and the steps would be:
>
>   move away A; move to B (original A)
>   delete B  # move-away
>   add B (copy-from ... A)  # move-here: it conflicts

Not sure I follow that, we don't want to delete B as it may have local
mods and the final B is the same node so those mods should be preserved.

A current Ev1 drive would be:

    delete A
    open B
    change_prop B
    close B

For the Ev1+ drive we add move and drop the "delete A" and "open B":

    move away A id=1
    move here id=1 B
    change prop B
    close B

However if B did not exist in the initial working copy the Ev1 drive
would be different:

    delete A
    add B
    change_prop B
    close B

but with Ev1+ we would get the same sequence.  So it appears that "move
here" is somehow replacing both open and add.  The driver can
distinguish these cases and I think it should be telling the receiver.

>> but we may want the server to tell the client that B is not being
>> replaced.
>
> That happens naturally.  If B is being replaced by the moved A:
>
>   WC: (A@10, B@10); repo: (r10: mkdir A B; r20: delete B; mv A B)
>   Update to r20; A moves to B, replacing the previous B.
>
> then the server would include an additional "delete A" step:
>
>   move away A; move to B (original A)
>   delete B  # move-away
>   delete A
>   add B (copy-from ... A)  # move-here: it conflicts

There are 3 cases:

    1)  A moved to B where B does not exist
    2)  A moved to B where B exists and is the same node
    3)  A moved to B where B exists but is a different node

1 and 2 are the cases above.  For 3 the current Ev1 drive would be:

    delete A
    delete B
    add B
    change prop B
    close B

I suppose the Ev1+ drive might be something like:

    move away A id=1
    delete B
    move here id=1 B
    change prop B
    close B

In this case local mods to B would cause a tree-conflict due to the
delete/replace.

It might be interesting to see how Ev2 would handle these 3 cases.

    1)  A moved to B where B does not exist

    alter dir ., children=A,B
    move A B, replaces_rev=-1
    alter B

    2)  A moved to B where B exists and is the same node

    alter dir ., children=A,B
    move A B, replaces_rev=-1
    alter B

    3)  A moved to B where B exists and is a different node

    alter dir ., children=A,B
    move A B, replaces_rev=N
    alter B

Ev2 distinguishes between add and replace via the replaces_rev flag but
as with Ev1+ the driver knows about the difference between 1 and 2 but
doesn't communicate it to the receiver.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>> Julian Foad wrote:
>>> Philip Martin wrote:
>>>> [...] I can imagine an enhanced Ev1 editor drive that does
>>>> 
>>>>    move away A
>>>>    move to C (original A)
>>>>    del A
>>>>    move away B
>>>>    move to C (original B)
>>>>    del B
>>>>    add C
>>>> 
>>>> The deletes lead to tree-conflicts on A and B due to local mods. The
>>>> add creates a pristine C with no local mods.  The user resolves the
>>>> tree-conflicts post-update and gets to choose which local mods are
>>>> merged to C, possibly both but one before the other, which may in 
>>>> turn raise text/prop/tree-conflicts on C.
>>> 
>>> This is good as far as local mods are concerned.  The decision of how
>>> to combine A and B, or raise a conflict, is left to the WC client code
>>> as it should be, and sufficient information is provided about the
>>> nature of the changes.
[...]
> There is a related issue with mixed-revision working copies.  Suppose A
> gets moved to B in some commit.  A mixed-revision working copy could
> contain both A and B so what does the update that includes that commit
> do?  I suppose the server still sends some sort of "move A B" operation

Yes.  That's this scenario: <https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor#One_Move.2C_One_Non-Move>.

  WC: (A@10, B@20); repo: (r10: mkdir A; r20: mv A B)
  Update to r20; A moves to B, and also the existing B is      updated.

and the steps would be:

  move away A; move to B (original A)
  delete B  # move-away
  add B (copy-from ... A)  # move-here: it conflicts

> but we may want the server to tell the client that B is not being
> replaced.

That happens naturally.  If B is being replaced by the moved A:

  WC: (A@10, B@10); repo: (r10: mkdir A B; r20: delete B; mv A B)
  Update to r20; A moves to B, replacing the previous B.

then the server would include an additional "delete A" step:

  move away A; move to B (original A)
  delete B  # move-away
  delete A
  add B (copy-from ... A)  # move-here: it conflicts

- Julian

Re: Move Tracking in the Update Editor

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@btopenworld.com> writes:

> Julian Foad wrote:
>
>> Philip Martin wrote:
>>> [...] I can imagine an enhanced Ev1 editor drive that does
>>> 
>>>     move away A
>>>     move to C (original A)
>>>     del A
>>>     move away B
>>>     move to C (original B)
>>>     del B
>>>     add C
>>> 
>>>  The deletes lead to tree-conflicts on A and B due to local mods.  The
>>>  add creates a pristine C with no local mods.  The user resolves the
>>>  tree-conflicts post-update and gets to choose which local mods are
>>>  merged to C, possibly both but one before the other, which may in turn
>>>  raise text/prop/tree-conflicts on C.
>> 
>> This is good as far as local mods are concerned.  The decision of how to combine 
>> A and B, or raise a conflict, is left to the WC client code as it should be, and 
>> sufficient information is provided about the nature of the changes.
>> 
>> Functionally, this would work as you've shown.  Regarding updating the base 
>> tree, however, note that the plain "add" in Ev1 is a node-by-node add 
>> of the whole subtree, which could be arbitrarily large.  For performance, we 
>> must not use a plain add, as a move should be designed to execute in O(1) 
>> time. [...]
>
> Philip pointed out to me that this would still be an improvement over
> what we have today.  If we're going to get anything into a release
> cycle it must be broken down into manageable stages, so perhaps this
> is a good place to draw the line for a first stage.
>
> I'm fairly well persuaded by this view.

There is a related issue with mixed-revision working copies.  Suppose A
gets moved to B in some commit.  A mixed-revision working copy could
contain both A and B so what does the update that includes that commit
do?  I suppose the server still sends some sort of "move A B" operation
but we may want the server to tell the client that B is not being
replaced.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Julian Foad wrote:

> Philip Martin wrote:
>> [...] I can imagine an enhanced Ev1 editor drive that does
>> 
>>     move away A
>>     move to C (original A)
>>     del A
>>     move away B
>>     move to C (original B)
>>     del B
>>     add C
>> 
>>  The deletes lead to tree-conflicts on A and B due to local mods.  The
>>  add creates a pristine C with no local mods.  The user resolves the
>>  tree-conflicts post-update and gets to choose which local mods are
>>  merged to C, possibly both but one before the other, which may in turn
>>  raise text/prop/tree-conflicts on C.
> 
> This is good as far as local mods are concerned.  The decision of how to combine 
> A and B, or raise a conflict, is left to the WC client code as it should be, and 
> sufficient information is provided about the nature of the changes.
> 
> Functionally, this would work as you've shown.  Regarding updating the base 
> tree, however, note that the plain "add" in Ev1 is a node-by-node add 
> of the whole subtree, which could be arbitrarily large.  For performance, we 
> must not use a plain add, as a move should be designed to execute in O(1) 
> time. [...]

Philip pointed out to me that this would still be an improvement over what we have today.  If we're going to get anything into a release cycle it must be broken down into manageable stages, so perhaps this is a good place to draw the line for a first stage.

I'm fairly well persuaded by this view.

- Julian


> It may be advantageous to come up with an implementation plan whereby 
> move-awareness is introduced first to the repository side, with just enough 
> support in the WC to make typical move scenarios work much better than they do 
> today, and then as a second phase extend the WC to know about node-id 
> relationships and complete the WC-side awareness (so that scenarios like the 
> above commit-time failures are instead detected within the WC, and whatever else 
> goes along with that).

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:
>> On 30.08.2013 17:18, Julian Foad wrote:
>>> The client *does* need to know that B is in fact being moved to C, so 
>>> that it can offer to transfer my local modifications from B to C.
> 
> Allowing multiple moves to the same destination doesn't really fit with
> Ev2 but I can imagine an enhanced Ev1 editor drive that does
> 
>    move away A
>    move to C (original A)
>    del A
>    move away B
>    move to C (original B)
>    del B
>    add C
> 
> The deletes lead to tree-conflicts on A and B due to local mods.  The
> add creates a pristine C with no local mods.  The user resolves the
> tree-conflicts post-update and gets to choose which local mods are
> merged to C, possibly both but one before the other, which may in turn
> raise text/prop/tree-conflicts on C.

This is good as far as local mods are concerned.  The decision of how to combine A and B, or raise a conflict, is left to the WC client code as it should be, and sufficient information is provided about the nature of the changes.

Functionally, this would work as you've shown.  Regarding updating the base tree, however, note that the plain "add" in Ev1 is a node-by-node add of the whole subtree, which could be arbitrarily large.  For performance, we must not use a plain add, as a move should be designed to execute in O(1) time.[1]

So, one of the base trees (at A or B) should be deleted, while the other should be moved over to 'C' and updated.  The edit producer must therefore make a decision between A and B, and send either

  [move-info...]
  del A   // but WC shall not delete the base tree
  del B
  add C (copy-from A@10)  // WC shall move the base tree from A
  modify... C/...  // diff A@10:C@30

or

  [move-info...]
  del A
  del B   // but WC shall not delete the base tree
  add C (copy-from B@20)  // WC shall move the base tree from B
  modify... C/...  // diff B@20:C@30

(The ideal choice of which base-tree to move would be the one with the smaller delta to C@30, but that is hard to determine efficiently.  The one with the closer revision number, in this case B, would often be a good choice.  However, each of A and B could be a mixed-rev subtree.  A good choosing algorithm is far from obvious.  That this problem exists at all might be a clue that something deeper is wrong.)

But notice how we're really trying to send a combination of two different messages:

  * The incoming logical change, that must be merged with any local mods, is:

    - mv(A,C) && modify-tree(diff(A10:C30))
    - mv(B,C) && modify-tree(diff(B20:C30))

  * To produce the new base tree, one reasonably efficient way is:

    - rm(A)
    - mv(B,C)
    - modify-tree(C, diff(B20:C30))

What we've been striving for above is a kind of hybrid of those two messages.

But let's try to take a step back and look at how we got into this surprisingly ugly case in the first place.  The real problem stems from an "impedance mismatch" between the path-centric semantics of the existing WC and the line-of-history semantics that "moves" introduces.

Why do we start off with the expectation that this update (A@10 and B@20 -> C@30) should resolve smoothly into C@30?  Because, in our old-fashioned thinking, this was a "normal" mixed-revision WC, and any "normal" WC should update smoothly.  But what if we instead started off with this WC (and the same repository):

  old_B->A@10
  B@20

Then we would say that 'old_B' is 'switched' and we would not expect an update to bring this WC to contain just a child 'C'.  We might expect the child named 'old_B' to be deleted (if we're still clinging to the old copy-and-delete ideas about how a move should be handled), or in a move-aware world we might expect 'old_B' to remain present but end up pointing at '^/C@30'.

So, back to the original example, I'm thinking that the child 'A@10' should be treated as a switched child.  In a move-aware world, that child should not be found in the WC at all at the same time as 'B@20' unless somebody did a deliberate kind of move-defeating update.

But the WC doesn't know 'A@10' and 'B@20' are the same repository node.  The WC doesn't know about node-ids or node-history relationships.  This is the problem.  Maybe the WC needs to know.

If the WC doesn't know about node-id relationships, then it cannot prevent the following silly situations from arising:

(1) Start from:

  A@10
  B@20

  Then move A to B/A, and try to commit.  The commit will fail because 
you cannot move the same node into a child of itself.  The detection 
occurs only at commit time, where the system knows about node ids, 
rather than at the time the invalid operation was performed.
(2) Start from:

  A@10
  B@20

  Then modify A, and modify B, and try to commit.  The commit will fail 
because the two changes to the same repository node conflict.  The WC 
contains logic to detect this kind of problem before commit in the 
specific case where a switched WC node is pointing to the same URL as 
another node, but it does not know how to detect when two WC paths 
pointing to *different* URLs are in fact pointing to the same repository node.

I think the answer is that in order to make a properly move-aware system, the WC needs to know about node-id relationships [2].

It may be advantageous to come up with an implementation plan whereby move-awareness is introduced first to the repository side, with just enough support in the WC to make typical move scenarios work much better than they do today, and then as a second phase extend the WC to know about node-id relationships and complete the WC-side awareness (so that scenarios like the above commit-time failures are instead detected within the WC, and whatever else goes along with that).

- Julian

[1] The current WC DB schema requires O(nodes in subtree) time for a 
move internally, but that's an implementation detail and is often not 
the dominant factor.

[2] It could either know the actual node-copy-ids, or just know which pairs of nodes have the same node-copy-id; either would suffice.

Re: Move Tracking in the Update Editor

Posted by Philip Martin <ph...@wandisco.com>.

Branko Čibej <br...@wandisco.com> writes:

> On 30.08.2013 17:18, Julian Foad wrote:
>> Branko Čibej wrote:
>>
>>> It strikes me that if for update-like edits, which are driven by the
>>> server, the once rule is interpreted in terms of WC paths, not node URLs
>>> or node IDs; then the server already has all the information it needs to
>>> transform the working copy.
>>>
>>> Take your first multiple source example:
>>>
>>> |                   |
>>> +-- A  mv--\        |
>>> |           \       |
>>> +-- B  mv--\ \      |
>>>             \-\-->  +-- C
>>>
>>> The working copy really doesn't have to know that both A and B became C.
>>> It only has to represent the final state. this can be described as:
>>>
>>>     move A, C; delete B
>>>
>>> or
>>>
>>>     move B, C; delete A 
>>>
>>> and any text modifications are applied after the move.
>> That would work for the WC base tree, but is not good enough for handling the local modifications.  Consider if the user has made local modifications inside B, and the server sends "move A, C; delete B".  If the client told the user:
>>
>>   I've updated A OK (moved it to C).
>>   Tree conflict: incoming delete of B, local modifications to B.
>>
>> ... that would be a horrible experience.  And it would be arbitrary (from the user's POV) whether she got this conflict or not, depending on which one the server decided to move and which one the server decided to delete.
>>
>> The client *does* need to know that B is in fact being moved to C, so that it can offer to transfer my local modifications from B to C.
>
> I'm not at all sure that you can avoid a conflict even in this case.
> It's either a tree conflict, or a text conflict (if both A and B were
> modified, and they refer to the same node).
>
> One possible solution is to replay the moves in the order they happened.
> Then you'd get two cases:
>
>   * A is an ancestor of B: the operation is:
>       o move(A, B) -> tree conflict, can be resolved as a text merge
>       o move(B, C)
>   * A and B are the same node: In this case, they'll can only be visible
>     at the same time if you have a switched subtree or an external
>     subtree, and the multiple-source move doesn't happen (because the
>     move needs to be replicated in both subtrees)
>
> I'm not sure how the editor driver would represent the move sequence and
> whether the resulting (intermediate) tree conflict should be resolved to
> a text conflict automatically or not; but I'm pretty sure it can be done
> without introducing multiple-source or multiple-target moves.

Allowing multiple moves to the same destination doesn't really fit with
Ev2 but I can imagine an enhanced Ev1 editor drive that does

   move away A
   move to C (original A)
   del A
   move away B
   move to C (original B)
   del B
   add C

The deletes lead to tree-conflicts on A and B due to local mods.  The
add creates a pristine C with no local mods.  The user resolves the
tree-conflicts post-update and gets to choose which local mods are
merged to C, possibly both but one before the other, which may in turn
raise text/prop/tree-conflicts on C.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move Tracking in the Update Editor

Posted by Branko Čibej <br...@wandisco.com>.

On 30.08.2013 17:18, Julian Foad wrote:
> Branko Čibej wrote:
>
>> It strikes me that if for update-like edits, which are driven by the
>> server, the once rule is interpreted in terms of WC paths, not node URLs
>> or node IDs; then the server already has all the information it needs to
>> transform the working copy.
>>
>> Take your first multiple source example:
>>
>> |                   |
>> +-- A  mv--\        |
>> |           \       |
>> +-- B  mv--\ \      |
>>             \-\-->  +-- C
>>
>> The working copy really doesn't have to know that both A and B became C.
>> It only has to represent the final state. this can be described as:
>>
>>     move A, C; delete B
>>
>> or
>>
>>     move B, C; delete A 
>>
>> and any text modifications are applied after the move.
> That would work for the WC base tree, but is not good enough for handling the local modifications.  Consider if the user has made local modifications inside B, and the server sends "move A, C; delete B".  If the client told the user:
>
>   I've updated A OK (moved it to C).
>   Tree conflict: incoming delete of B, local modifications to B.
>
> ... that would be a horrible experience.  And it would be arbitrary (from the user's POV) whether she got this conflict or not, depending on which one the server decided to move and which one the server decided to delete.
>
> The client *does* need to know that B is in fact being moved to C, so that it can offer to transfer my local modifications from B to C.

I'm not at all sure that you can avoid a conflict even in this case.
It's either a tree conflict, or a text conflict (if both A and B were
modified, and they refer to the same node).

One possible solution is to replay the moves in the order they happened.
Then you'd get two cases:

  * A is an ancestor of B: the operation is:
      o move(A, B) -> tree conflict, can be resolved as a text merge
      o move(B, C)
  * A and B are the same node: In this case, they'll can only be visible
    at the same time if you have a switched subtree or an external
    subtree, and the multiple-source move doesn't happen (because the
    move needs to be replicated in both subtrees)

I'm not sure how the editor driver would represent the move sequence and
whether the resulting (intermediate) tree conflict should be resolved to
a text conflict automatically or not; but I'm pretty sure it can be done
without introducing multiple-source or multiple-target moves.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:

> It strikes me that if for update-like edits, which are driven by the
> server, the once rule is interpreted in terms of WC paths, not node URLs
> or node IDs; then the server already has all the information it needs to
> transform the working copy.
> 
> Take your first multiple source example:
> 
> |                   |
> +-- A  mv--\        |
> |           \       |
> +-- B  mv--\ \      |
>             \-\-->  +-- C
> 
> The working copy really doesn't have to know that both A and B became C.
> It only has to represent the final state. this can be described as:
> 
>     move A, C; delete B
> 
> or
> 
>     move B, C; delete A 
> 
> and any text modifications are applied after the move.

That would work for the WC base tree, but is not good enough for handling the local modifications.  Consider if the user has made local modifications inside B, and the server sends "move A, C; delete B".  If the client told the user:

  I've updated A OK (moved it to C).
  Tree conflict: incoming delete of B, local modifications to B.

... that would be a horrible experience.  And it would be arbitrary (from the user's POV) whether she got this conflict or not, depending on which one the server decided to move and which one the server decided to delete.

The client *does* need to know that B is in fact being moved to C, so that it can offer to transfer my local modifications from B to C.

- Julian

Re: Move Tracking in the Update Editor

Posted by Branko Čibej <br...@wandisco.com>.

On 30.08.2013 13:03, Julian Foad wrote:
> I'm working on how the update editor can handle moves.  It's more
> complex than the commit editor, because there can be multiple
> instances of the "same" repository node in the WC, so moves are not
> necessarily unique.
>
> This is a copy of my notes in progress.  I could use some suggestions
> or thoughts if you have any?

It strikes me that if for update-like edits, which are driven by the
server, the once rule is interpreted in terms of WC paths, not node URLs
or node IDs; then the server already has all the information it needs to
transform the working copy.

Take your first multiple source example:

|                   |
+-- A  mv--\        |
|           \       |
+-- B  mv--\ \      |
            \-\-->  +-- C

The working copy really doesn't have to know that both A and B became C.
It only has to represent the final state. this can be described as:

    move A, C; delete B

or

    move B, C; delete A 

and any text modifications are applied after the move. The only real
difference between the two descriptions is that one of them /may/ be
more efficient, in the sense that the delta that represents the text
modifications on C may be smaller. But that's a minor optimization; if
the server can do it, fine; if not, we're still better off than in the
copy+delete case, especially for deep directory trees.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Stefan Fuhrmann wrote:
> On Fri, Aug 30, 2013 at 2:02 PM, Julian Foad wrote:
>> <https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor>.
> 
> I read the wiki page and agree that all those issues are real.
> 
> So, here is my take on how to address them. Maybe you
> find something useful in there ;)

Thanks, Stefan.

> Multiple Move Targets
> ---------------------
>
> This can only happen with switched paths in the working copy
> as you need to have a given path@rev more than in one in your
> working copy.
>
> Switched sub-trees are already "isolated" from changes in
> other parts of the working copy in the sense that an update,
> commit or - most importantly - modification in one part does
> not modify any switched sub-tree that happens to mirror these
> paths.
>
> If we modify a node in the working copy and that node gets moved
> as part of the update, it is perfectly reasonable to say that
> the local change will not be propagated to any switched sub-tree
> but requires a commit and update cycle to show up there.  Even
> if the move was 1:1, it still should not cross switch boundaries
> but appear as a deletion + potential tree conflict during update
> and as an addition in the switched location.
>
> Finally, having switched sub-trees mirror other parts or the
> working copy seems to be a non-standard usage of the 'switch'
> feature originally designed to make it easy to switch between
> various branches.  We have to deal with it but we don't need
> to get fancy here.

I agree with you up to here.

> My suggestion for how the update editor might work (see below)
> allows for multiple copy targets.  But for simplicity, the
> client may decide to eliminate moves that cross switched
> sub-tree boundaries.

I'm afraid my gut feeling about your 3-stage suggestion is it's a horrible hack :-)  But I don't quite understand it in detail.  What I like about it is that the client gets the info about all moves that affect nodes in the WC, even those where it is only going to want to perform a delete or add (half a move) in the end.  That's good because it allows the client to make decisions and/or inform the user about exactly how it wants to manage the WC.  But I don't like it being two separate client-server communications which partly overlap in content (they both contain *some* info about moves, if I understand you).

- Julian


> Multiple Move Sources
> ---------------------
>
> This might happen relatively frequently if people move nodes
> over "long distances" in the tree and users update only parts
> of their working copies.
>
> Again, the update editor should support multiple sources for
> a given move but the client is free to narrow it down to one
> (or even zero).
>
> For the first implementation, I suggest that the most recent
> (revision-wise) of the move sources is being used and a simple
> deletion is done to all other move sources.  Chances are that
> this simply works without being noticed by the user.  If there
> were local changes to the older instances of the moved node,
> those will cause tree conflicts as they do today.
>
>
> Update Editor / Reporter Drive
> ------------------------------
>
> I suggest a 3-stage process:
>
> (1) Client communicates the working copy path + URL@rev list
>     to the server and gets a list of all moves relevant to the
>     working copy from the server (being tagged with local path
>     info in a "baton-esque" manor, i.e. the server does not
>     need to understand them).  This might be done in a process
>     similar to the actual update / reporter drive but will be
>     much faster as it only requires wc.db queries.
>
> (2) Client filters list to eliminate multiple sources as well
>     as cross-switch moves.  It also knows the list of "move
>     from" paths now and can handle them transparently for any
>     incoming move or other modification.
>
> (3) Client passes the filtered list to the normal update editor /
>     reporter drive as today.  If an ADD happens to match a move
>     target, the editor will send a MOVE instead and continue to
>     recurse accordingly.

Re: Move Tracking in the Update Editor

Posted by Stefan Fuhrmann <st...@wandisco.com>.

On Fri, Aug 30, 2013 at 2:02 PM, Julian Foad <ju...@btopenworld.com>wrote:

> Branko Čibej wrote:
>
> > On 30.08.2013 13:03, Julian Foad wrote:
> >>  I'm working on how the update editor can handle moves.  It's more
> >>  complex than the commit editor, because there can be multiple
> >>  instances of the "same" repository node in the WC, so moves are not
> >>  necessarily unique.
> >>
> >>  This is a copy of my notes in progress.  I could use some suggestions
> >>  or thoughts if you have any?
> >
> > Here's a suggestion, please put this in the Wiki, not HTML (shudder)
> > mail. :)
>
> Good suggestion.  Done.
>
> <https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor>.
>
>
Hi Julian,

I read the wiki page and agree that all those issues are real.
So, here is my take on how to address them. Maybe you
find something useful in there ;)

-- Stefan^2.

Multiple Move Targets
---------------------

This can only happen with switched paths in the working copy
as you need to have a given path@rev more than in one in your
working copy.

Switched sub-trees are already "isolated" from changes in
other parts of the working copy in the sense that an update,
commit or - most importantly - modification in one part does
not modify any switched sub-tree that happens to mirror these
paths.

If we modify a node in the working copy and that node gets moved
as part of the update, it is perfectly reasonable to say that
the local change will not be propagated to any switched sub-tree
but requires a commit and update cycle to show up there.  Even
if the move was 1:1, it still should not cross switch boundaries
but appear as a deletion + potential tree conflict during update
and as an addition in the switched location.

Finally, having switched sub-trees mirror other parts or the
working copy seems to be a non-standard usage of the 'switch'
feature originally designed to make it easy to switch between
various branches.  We have to deal with it but we don't need
to get fancy here.

My suggestion for how the update editor might work (see below)
allows for multiple copy targets.  But for simplicity, the
client may decide to eliminate moves that cross switched
sub-tree boundaries.

Multiple Move Sources
---------------------

This might happen relatively frequently if people move nodes
over "long distances" in the tree and users update only parts
of their working copies.

Again, the update editor should support multiple sources for
a given move but the client is free to narrow it down to one
(or even zero).

For the first implementation, I suggest that the most recent
(revision-wise) of the move sources is being used and a simple
deletion is done to all other move sources.  Chances are that
this simply works without being noticed by the user.  If there
were local changes to the older instances of the moved node,
those will cause tree conflicts as they do today.

Update Editor / Reporter Drive
------------------------------

I suggest a 3-stage process:

(1) Client communicates the working copy path + URL@rev list
    to the server and gets a list of all moves relevant to the
    working copy from the server (being tagged with local path
    info in a "baton-esque" manor, i.e. the server does not
    need to understand them).  This might be done in a process
    similar to the actual update / reporter drive but will be
    much faster as it only requires wc.db queries.

(2) Client filters list to eliminate multiple sources as well
    as cross-switch moves.  It also knows the list of "move
    from" paths now and can handle them transparently for any
    incoming move or other modification.

(3) Client passes the filtered list to the normal update editor /
    reporter drive as today.  If an ADD happens to match a move
    target, the editor will send a MOVE instead and continue to
    recurse accordingly.

Re: Move Tracking in the Update Editor

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:

> On 30.08.2013 13:03, Julian Foad wrote:
>>  I'm working on how the update editor can handle moves.  It's more
>>  complex than the commit editor, because there can be multiple
>>  instances of the "same" repository node in the WC, so moves are not
>>  necessarily unique.
>> 
>>  This is a copy of my notes in progress.  I could use some suggestions
>>  or thoughts if you have any?
> 
> Here's a suggestion, please put this in the Wiki, not HTML (shudder)
> mail. :)

Good suggestion.  Done.

<https://wiki.apache.org/subversion/MoveDev/MoveTrackingUpdateEditor>.

- Julian

Re: Move Tracking in the Update Editor

Posted by Branko Čibej <br...@wandisco.com>.

On 30.08.2013 13:03, Julian Foad wrote:
> I'm working on how the update editor can handle moves.  It's more
> complex than the commit editor, because there can be multiple
> instances of the "same" repository node in the WC, so moves are not
> necessarily unique.
>
> This is a copy of my notes in progress.  I could use some suggestions
> or thoughts if you have any?

Here's a suggestion, please put this in the Wiki, not HTML (shudder)
mail. :)

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com