You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2013/08/12 17:42:40 UTC

Ev2 using move-away and move-here

 TLDR: Ev2 move support doesn't work with the "move" and "rotate"
instructions as currently defined, and I propose that it could work instead
with separate "move-away" and "move-here" instructions.


== SUMMARY ==

Ev2 was designed with an intention to support “move” semantics, but the
current design of its move() method does not satisfy the requirements. Here
I explain why, and present a solution.

The following principles constrain or guide Ev2 design:

  * An edit must be able to express any change that can be constructed in a
WC and committed.

  * An edit must be able to express any change that can be constructed in a
repo across multiple revisions.

  * Ev2 uses sequential description: a succession of incremental edits
applied to an initial state gives a final state.

  * Ev2 should not require the driver to use temporary paths to express a
change.

Given the constraints,

  * Not all moves can be expressed by a “move source to destination”
operation, with or without a “rotate” operation.

  * Moves can be expressed, in general, using separate move-away and
move-here operations.

While other designs are also possible, a move-away and move-here design
might be the best fit for the shape of the editor. Such a design is
detailed below, but not yet finished.


== WHY IS THIS NECESSARY? ==

=== Express Any Change ===

Since the purpose of the editor is to communicate a change originating in a
WC when it is committed, and a change originating in a repository when the
WC is updated, then it must be able to express any such change. This
includes updates across multiple revisions, and from a mixed-revision state
to a single revision.

Through a series of simple steps of the form “move X to Y”, some quite
funky overall changes can be created. For example, starting from this state:

  |
  +-- A
      |
      +-- B

the following sequence:

  move /A/B /X
  move /A /X/B
  move /X /A

results in swapping A with B. If those steps are performed in a WC and the
result committed all at once, the editor needs to be able to handle it. If
those steps are committed separately, and then a working copy is updated,
the editor needs to be able to handle it.

More examples are given later, some of which involve other operations such
as “make directory” as well as moves. All of this emergent complexity
results from the introduction of a simple “move” primitive, and there does
not seem to be any acceptable way to further constrain the basic move that
would simplify the derived complexity.Temporary Paths

Paths in the Ev2 editor operations refer to the current state of the tree
being edited, at that moment in the sequence of edit operations. (The sole
exception is the copy-from path, which is relative to a given committed
revision.)

A natural and compact way of expressing moves would be as a mapping from
original-state paths to final-state paths. However, that paradigm does not
fit well with the desire to express the edit as a sequence of incremental
steps. If we are going to include move functionality as steps in a sequence
of edit operations, it makes sense to use paths that are relative to the
current state.

Ev2 should not require the driver to express a change as a sequence of
operations that can include moving a node to a temporary path and then
later moving it again to the final path. The end result of the edit is the
important part, and there are unlimited potential sequences that lead to
that result, and it does not make sense to require an edit driver to
construct such a sequence arbitrarily, if there is an alternative method
that specifies the result uniquely. The receiver may in fact need to employ
temporary paths in its implementation, but then it knows this and it is in
a better position to construct such paths when needed, and it will know
that they are temporary which may be important.

There are advantages to placing a node in its final position exactly once.
It was claimed that Google's BigTable implementation of Svn's back-end
would have benefited from knowing that once a node has been edited then it
is in its final state. Ev2 aims to provide that guarantee, where Ev1 could
not.

=== Sequential Description ===

Ev2 (svn_editor_t) intends to express a new state in terms of an old state
and a description of the parts of the new state that differ from the old
state, or, in other words, a description of the changes against the old
state. It uses a sequential, incremental description: for example, “add
directory X, then copy file X/Y from somewhere, then edit the contents and
properties of file X/Y”.

Only certain parts of the description are incremental. Ev2 aims to allow
nodes to be visited in arbitrary order, subject to a small number of
restrictions. The parts where ordering matters are:
  * tree changes before content changes

  * alter/add/copy a directory (if required) before touching its
(immediate) children

=== A Move Event is Not Adequate ===

Moves, in general, cannot be expressed as “move from path X to path Y”
events in such a sequential description without introducing temporary
paths. This is because some cases require the source path of the move to be
moved away, then some other steps, and then the destination path of the
move can be populated. Some classic examples are:

Example 1: Insert a directory level

  |                     |
  +--A  mv--\     (add) +--A
             \             |
              \-->         +--B

The add cannot happen before the move-away, but must happen before the
move-here.

(The mirror of Example 1, which would be removing a directory level, is not
actually problematic:

  |                     |
  +--A (del)     /-->   +--A
     |          /
     +--B  mv--/

The move-away must (in principle) happen before the delete, while the
move-here cannot (in principle) happen before the delete. However, Ev2
defines that a deletion which is to be replaced shall occur implicitly when
the replacement is put in place, and so the move-here can happen
simultaneously with the delete.)

Example 2: Swap two siblings

  |                     |
  +--A     mv--\ /-->   +--A
  |             X       |
  +--B     mv--/ \-->   +--B

Neither of the moves can be completed before doing the move-away part of
the other one.

Example 3: Swap two directory levels

  |                     |
  +--A     mv--\ /-->   +--A
     |          X          |
     +--B  mv--/ \-->      +--B
Neither of the moves can be completed before doing the move-away part of
the other one.

=== A Rotate Event is Not Adequate ===

At one time there was an idea that the addition of a “rotate PATH1 PATH2 …
PATHn” event would complete the semantics and allow arbitrary moves to be
supported.

While this does enable Example 2 (swap two siblings) and Example 3 (swap
two directory levels) and many other cases, it does not help with inserting
a directory level (Example 1), and it has been shown [1] to be incapable of
resolving other more involved cases involving swapping or rotation. One
specific example is swapping A with A/B/C [2]:

  |                            |
  +-- A          mv--\   /-->  +-- A
      |               \ /          |
      +-- B      mv--- X --->      +-- B
          |           / \              |
          +-- C  mv--/   \-->          +-- C

[1] Email thread on dev@, “Ev2 as a move-aware editor”, started on
2013-06-24, e.g. <http://svn.haxx.se/dev/archive-2013-06/0560.shtml> or <
http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C1372087321.76009.YahooMailNeo%40web186101.mail.ir2.yahoo.com%3E
>.

[2] An example problem in thread [1], of swapping A with A/B/C: <
http://svn.haxx.se/dev/archive-2013-06/0684.shtml> or <
http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C87bo6rewwp.fsf%40ntlworld.com%3E
>.


== MOVE-AWAY AND MOVE-HERE ==

One solution is to describe the two halves of each move separately:

  move-away SOURCE-PATH
  ...
  move-here DESTINATION-PATH

We can then solve Example 1 in the following way: issue the “move-away A”,
then create a new directory at path A which replaces that source of the
move, and then finally issue the “move-here A/B” which relies on that
replacement directory A having been created.

The consumer must be able to put the node aside for an indeterminate amount
of time until the “move-here” is received.

Of course there needs to be a way to link each move-away with the
corresponding move-here. Remembering that each edit step refers to the
current state in a sequence of states, we cannot simply specify the path
corresponding to the other end of the move like this:

  move-away SOURCE-PATH to DESTINATION-PATH
  ...
  move-here DESTINATION-PATH from SOURCE-PATH

because the problem cases are when the destination path does not yet exist
at the time of a move-away, or the source path no longer exists at the time
of a move-here. What we can do is use some other unique reference that is
unique within the edit, like this:

  move-away SOURCE-PATH as identifier ID1
  ...
  move-here DESTINATION-PATH from identifier ID1

The reference could perhaps be the destination path as it will finally
exist at the end of the edit, or just and arbitrary number or string. We
will just specify the identifier as an “id” and not specify how it is
generated.

=== Ordering Restrictions ===

The ordering rules regarding move-away and move-here should include:
  * mv-away must come before the matching mv-here

    - The edit should provide a sequential procedure that the consumer must
be able to follow without having to buffer an arbitrary amount of state.

  * mv-here & cp & add must be in nesting order: create (or put in place)
the parent before its children

  * mv-away must come before deleting a parent

    - Receiver needs to know that it must preserve this path when we delete
its parent.

  * mv-away must come before mv-away of a parent

    - If we allowed “mv-away A; …; mv-away A/B” then the child path “A/B”
would have to be specified not relative to the current state of the edit,
as all other operative paths are, but in some other way, because the parent
has gone into temporary namespace, and has perhaps been replaced so that
“A/B” now refers to some other node.

    - There is a general rule that all edits within a moved directory “A”
must come after A is moved its destination, but a mv-away of a subtree of A
is not considered an edit for this purpose.

=== Examples Solved ===

Example 1:

  |                     |
  +--A  mv--\    add--> +--A
             \             |
              \-->         +--B

  1. alter-dir / (children={A})
  2. mv-away A (id=“original A”)
  3. add-directory A (children={B})
  4. mv-here A/B (id=“original A”)

Example 2: Swapping two siblings

  |                     |
  +--A      --\ /-->    +--A
  |            X        |
  +--B      --/ \-->    +--B

  1. alter-dir / (children={A,B})
  2. mv-away A (id=“original A”)
  3. mv-away B (id=“original B”)
  4. mv-here A (id=“original B”)
  5. mv-here B (id=“original A”)

Example 3 can also be solved in this way, except for some ordering
restriction issues that are discussed below.The Need for Alter-directory
before Tree Changes

Why do we have this ordering rule?

  If any path is added (with add_*) or deleted/moved/rotated, then
  an svn_editor_alter_directory() call must be made for its parent
  directory with the target/eventual set of children.

It can't be quite right. Inside an add-directory, all the children have to
be added, and that would imply we need an alter-directory as well as the
add-directory, violating the Once Rule. It omits “copied” – just an
oversight? It does not specify when the call must occur – presumably it
must be before any such modifications to the children.

To remedy those three initial problems, I suppose something like this is
intended:

  Either alter_directory() or add_directory() must be called on a
  directory, declaring its final set of children, before calling
  delete(), move_away(), move_here(), copy(), or add_*() on any child.

  (For delete() or move_away(), it must be alter_directory(), as
  the children of add_directory() must not be deleted or moved away.)

But there is still a problem.  If we require alter_directory() before a
move_away(), it leads to a violation of the Once Rule as shown in the
following example.

Example 3: Swapping two directory levels

  |                     |
  +--A      --\ /-->    +--A
     |         X           |
     +--B   --/ \-->       +--B

  1. alter-dir A (children={}) ### Needed?

  2. mv-away A/B (id=”original A/B”)

  3. alter-dir / (children={A})

  4. mv-away A (id=”original A”)

  5. mv-here A (id=”original A/B”)

  6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1)
was needed.

  7. mv-here A/B (id=”original A”)

There are two potential problems here:

  * We make an edit within subtree A (the “move-away A/B”) before moving A

  * The “alter-dir A” is performed twice

The first point violates the assumed constraint that we should disallow
edits within a subtree before moving the subtree.

The second point at face value violates the Once Rule. We can probably
resolve is by making the Once Rule apply per node rather than per path –
see below.

=== What If We Allow Edit Before Move? ===

We were assuming that we should disallow edits within a subtree before
moving the subtree. [Why?]

One solution might be to drop that requirement and let the subtree be moved
(move-away) part way through editing it, allowing all editing operations.
To accommodate such a change, some of the other rules that currently refer
to “a path” probably need to be reformulated to refer to “a path relative
to such a subtree” instead.

If we allow edits before and after moving, should we also allow edits after
the move-away and before the move-here? Not sure. It seems like that may be
more problematic for certain consumer architectures and so probably should
not be allowed. But is there a better way to decide?

=== What if the Once Rule is Per Node? ===

The path-based Once Rule was written something like this:

  A path should never be referenced more than once by the add_*, alter_*,
and delete
  operations (the "Once Rule"). The source path of a copy (and its
children, if a directory)
  may be copied many times, and are otherwise subject to the Once Rule. The
destination path
  of a copy [or move_here] may have alter_* operations applied, but not
add_* or delete.  If
  the destination path of a copy or move is a directory, then its children
are subject to the
  Once Rule. The source path of a move_away() (and its child paths) may be
replaced using
  add_*() or copy() or move_here() (where these new or copied nodes are
subject to the Once Rule).

Perhaps the Once Rule should not apply per path as it was stated, but
rather per node. If a directory is altered and then moved away, we should
be able to create a replacement directory, being a different node at the
same path, and then alter that.

  * The Once Rule applies per node, rather than per path. The definition of
a “node” is such that “move” moves an existing node (or nodes) to a new
path and does not create a new node, while “add” creates a new node and
“copy” creates a set of new nodes, each new node being different from all
other nodes that are or were in the tree even if it replaces an existing
node at the same path.

  * One of the following actions can be applied, just Once, directly to
each node:

      - create (only if it does not exist in the initial tree)

      - remove (only if it exists in the initial tree or is brought in as a
child of a copy)

      - modify (only if it exists in the initial tree or is brought in as a
child of a copy)

  * A node may be created by one of:

      - add_*()

      - copy(), optionally followed by alter_*()

    No other operations may be applied to such a node during the entire
edit. Any children may be edited after (not before) the node is created.
When a node is created by copy(), its children (recursively) are brought
into the tree, and are then subject to the Once Rule as existing nodes.

  * A node may be removed, along with all its children recursively, by one
of:

      - delete()

      - add_*() replacing this node

      - copy() replacing this node

      - move_here() replacing this node

    No other operations may be applied to such a node, nor to any children
(recursively), during the entire edit, except for nodes that have been
moved away.

  * A node may be modified by any combination from the following:

      - move_away() followed by move_here(), at most once

      - alter_*(), at most once

      - (if a directory) edit a child

    These can come in any order except that alter_*() is required before
editing any children, and neither of those can come between move-away and
move-here except for move-away of a deeper child [and/or editing of such
prior to move-away?].

    [### This seems more complex than it could be. It would be much easier
if alter_directory() were not required before adding/moving/editing
children.]

  * The source of a copy operation may be a node in the tree being edited.
Such a node (and its children, if a directory) may be copied many times, in
addition to being subject to the Once Rule as existing nodes.

    [### I'm not sure about the Copy operation: the doc string implies it's
copying from something in the current state, but it should be able to copy
from any path-rev.]


== CONCLUSION ==

With a bit more work on the ordering restrictions, especially the
requirement about calling alter-dir, it looks like this design may fit the
requirements.

Thoughts?

- Julian

Re: Ev2 using move-away and move-here

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

>> As you pointed out when a copy replaces a mv-away it is still necessary
>> to alter-dir two different nodes at the same path:
>> 
>>     1. alter-dir A (children={...})
>>     2. mv-away A (id=...)
>>     3. copy A (src=...)
>>     4. alter-dir A (children={...})
>
> Well, that depends.  There may be no need to issue the alter-dir on
> the original 'A' at this stage, we could wait until it arrives at its
> final path (move-here B (id="original A"), then alter-dir B).

I was initially thinking that we could require alter-dir to happen
before mv-away, then the alter-dir path would always be a path in the
original tree.  However that doesn't work for copies where alter-dir has
to happen after the copy, or for moves from copies where, even if
alter-dir happens before the move, the path is still not a path in the
original tree.

So we have to allow

        copy A B
        alter-dir B (children={...})

which means there is little reason to require

        alter-dir X (children={...})
        mv-away X (id="original X")
        mv-here Y (id="original X")

and prohibit

        mv-away X (id="original X")
        mv-here Y (id="original X")
        alter-dir Y (children={...})

Another issue, there can be mv-aways inside copies and that allows
multiple mv-away at the same path:

        svn cp X A
        svn mv A/B/C C1
        svn rm A/B
        svn cp Y A/B
        svn mv A/B/C C2

Here we have two mv-aways for A/B/C, one for the C copied from X/B/C and
one for the C copied from Y/C.

       alter-dir / (children={..., C1, C2})
       copy X A
       mv-away A/B/C (id="something")
       mv-here C1 (id="something")
       copy Y A/B (replaces=...)
       mv-away A/B/C (id="something else")
       mv-here C2 (id="something else")

We can't use the source of the copy to make things unique either as that
could be copied more than once.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Ev2 using move-away and move-here

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:

> Julian Foad <ju...@btopenworld.com> writes:
>>  Example 3: Swapping two directory levels
>> 
>>    |                     |
>>    +--A      --\ /-->    +--A
>>       |         X           |
>>       +--B   --/ \-->       +--B
>> 
>>    1. alter-dir A (children={}) ### Needed?
>> 
>>    2. mv-away A/B (id=”original A/B”)
>> 
>>    3. alter-dir / (children={A})
>> 
>>    4. mv-away A (id=”original A”)
>> 
>>    5. mv-here A (id=”original A/B”)
>> 
>>    6. alter-dir A (children={B}) ### Second alter-dir on same path, if 
>>  (1) was needed.
>> 
>>    7. mv-here A/B (id=”original A”)
> 
> That's inconsistent: for A alter-dir is before mv-away while for A/B
> alter-dir is after mv-away.

In what I wrote here, alter-dir(/) is before mv-away(A), and there are two calls to alter-dir(A), in steps 1 and 6, before and after mv-away(A/B).  But anyway, yes, it's messed up, and your suggestion below is neater.

> A more consistent set of steps would be:
> 
>     1. alter-dir A/B (children={A})
> 
>     2. alter-dir A (children={})
> 
>     3. mv-away A/B (id="original A/B")
> 
>     4. mv-away A (id="original A")
> 
>     5. mv-here A (id="original A/B")
> 
>     6. mv-here A/B (id="original A")

Yes, that's nicer.  Yes, you can probably drop the alter-dir(/) like you have done, on the basis that the list of child names in (/) does not change even though their identities do change.  That's an under-specified part of the rules.

> Not sure whether 1 and 2 have any particular order.  In the working copy
> receiver the alter-dir for A/B causes A to be added as incomplete in
> A/B; it becomes complete at step 6.  I'm not sure whether the alter-dir
> for A that deletes a child causes anything to be marked.
> 
>>  There are two potential problems here:
>> 
>>    * We make an edit within subtree A (the “move-away A/B”) before moving A
>> 
>>    * The “alter-dir A” is performed twice
> 
> In this case there is no need to perform alter-dir on A twice.

Right, good.

> As you pointed out when a copy replaces a mv-away it is still necessary
> to alter-dir two different nodes at the same path:
> 
>     1. alter-dir A (children={...})
>     2. mv-away A (id=...)
>     3. copy A (src=...)
>     4. alter-dir A (children={...})

Well, that depends.  There may be no need to issue the alter-dir on the original 'A' at this stage, we could wait until it arrives at its final path (move-here B (id="original A"), then alter-dir B).

- Julian

Re: Ev2 using move-away and move-here

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> Example 3: Swapping two directory levels
>
>   |                     |
>   +--A      --\ /-->    +--A
>      |         X           |
>      +--B   --/ \-->       +--B
>
>   1. alter-dir A (children={}) ### Needed?
>
>   2. mv-away A/B (id=”original A/B”)
>
>   3. alter-dir / (children={A})
>
>   4. mv-away A (id=”original A”)
>
>   5. mv-here A (id=”original A/B”)
>
>   6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1)
> was needed.
>
>   7. mv-here A/B (id=”original A”)

That's inconsistent: for A alter-dir is before mv-away while for A/B
alter-dir is after mv-away.  A more consistent set of steps would be:

    1. alter-dir A/B (children={A})

    2. alter-dir A (children={})

    3. mv-away A/B (id="original A/B")

    4. mv-away A (id="original A")

    5. mv-here A (id="original A/B")

    6. mv-here A/B (id="original A")

Not sure whether 1 and 2 have any particular order.  In the working copy
receiver the alter-dir for A/B causes A to be added as incomplete in
A/B; it becomes complete at step 6.  I'm not sure whether the alter-dir
for A that deletes a child causes anything to be marked.

> There are two potential problems here:
>
>   * We make an edit within subtree A (the “move-away A/B”) before moving A
>
>   * The “alter-dir A” is performed twice

In this case there is no need to perform alter-dir on A twice.

As you pointed out when a copy replaces a mv-away it is still necessary
to alter-dir two different nodes at the same path:

    1. alter-dir A (children={...})
    2. mv-away A (id=...)
    3. copy A (src=...)
    4. alter-dir A (children={...})

[...]

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Ev2 using move-away and move-here

Posted by Julian Foad <ju...@btopenworld.com>.
My intention is to present possible designs for Ev2, and allow us to consider which we prefer.  In particular, we need to decide how important it is to adhere to each of the principles including:

  - the Once Rule
  - avoidance of temporary paths (closely related to Once Rule)
  - alter-dir ("final set of children = {...}") before altering dir's children
  - sequential modification

since different designs may trade off one of these against the others.

Greg Stein wrote:

> In the wiki page you started, I thought we already solved the move
> issue by using the original state for the source. [...]

No, we didn't come up with an actual solution, we came up with an idea that *might* lead to a solution -- it felt like it should.  My gut feeling is it doesn't play well with the Ev2 principles, but I am still going to explore that idea further as a possible alternative and present either how it could work or why it couldn't.

- Julian



> On Aug 12, 2013 10:43 AM, "Julian Foad" <ju...@btopenworld.com> wrote:
>> TLDR: Ev2 move support doesn't work with the "move" and "rotate"
>> instructions as currently defined, and I propose that it could work
>> instead with separate "move-away" and "move-here" instructions.

[...]


Re: Ev2 using move-away and move-here

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:

[...]
>>>  === Examples Solved ===
>>> 
>>>  Example 1:
>>> 
>>>    |                     |
>>>    +--A  mv--\    add--> +--A
>>>               \             |
>>>                \-->         +--B
>>> 
>>>    1. alter-dir / (children={A})
> 
> Do we need to call alter-dir when a child is replaced or only when a
> child is added/removed?  If we look at the working copy update receiver
> it wants to record incomplete for adds but probably does nothing for
> replaces, it may not even do anything for deletes preferring to leave it
> to the delete call.

The exact definition of the requirement for calling alter_directory() is part of the discussion in the thread "Ev2 alter_directory deltas", to which I have just replied.

- Julian

>>>    2. mv-away A (id=“original A”)
>>>    3. add-directory A (children={B})
>>>    4. mv-here A/B (id=“original A”)

Re: Ev2 using move-away and move-here

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
On 14 Aug 2013 16:23, "Julian Foad" <ju...@btopenworld.com> wrote:
>> > and also makes validating the drive easier.
>>
>> I'm not sure what you're thinking about validating the editor drive being easier.
> 
> Move away without a matching moved here (or the converse) is clearly invalid.
> It must be trivial for the receiver to detect that. Making the temporary
> locations explicit makes that so much easier.

It does?  I don't see how.  You're saying that keeping track of whether there are mismatches in a sequence such as this (which swaps siblings A and B):

  - move(A,tmp1)
  - move(B,A)
  - move(tmp1,B)

is so much easier than with a corresponding non-explicit sequence which could be


  - move(A,tmp1)
  - move(B,tmp2)
  - move(tmp2,A)

  - move(tmp1,B)

or

  - move(A,tmp1)
  - move(B,tmp2)  - move(tmp1,B)
  - move(tmp2,A)

?  Assuming an implementation whereby the receiver keep a list of nodes that are currently in temporary storage, I can see that the list will be shorter with the former protocol, and have fewer insertions and deletions, but that's all.


> Regarding direct move without intermediate state, IMO the driver should be
> required to to use that whenever it can.

That would necessitate us defining "whenever it can" algorithmically.  For example, calculating the minimum possible number of direct moves required to perform a given set of moves.  That seems difficult.  Granted the Ev2 design philosophy includes being minimal (no duplications) in general... but I just don't see this particular rule being easy to codify.  Perhaps we can do it.

> Driver always has enough info to know that receiver can process such a move. If
> it cannot, that indicates a bug in the driver.

Yup, agreed there.

- Julian


Re: Ev2 using move-away and move-here

Posted by Branko Čibej <br...@wandisco.com>.
On 14 Aug 2013 16:23, "Julian Foad" <ju...@btopenworld.com> wrote:
>
> Branko Čibej wrote:
>
> > and also makes validating the drive easier.
>
> I'm not sure what you're thinking about validating the editor drive being
easier.
> - Julian

Move away without a matching moved here (or the converse) is clearly
invalid. It must be trivial for the receiver to detect that. Making the
temporary locations explicit makes that so much easier.

Regarding direct move without intermediate state, IMO the driver should be
required to to use that whenever it can. Driver always has enough info to
know that receiver can process such a move. If it cannot, that indicates a
bug in the driver.

Re: Ev2 using move-away and move-here

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:

> Move away/from with Id is essentially introducing a namespace for temporary
> paths unrelated to the tree structure. Therefore you can change notation and
>  keep only one move operation with three variants:
> 

> move(a,b): direct move of node to new path
> move(a,tmp(1)): move to temporary slot
> move(tmp(1),b): move from temporary slot
> 

> N.b. this does not imply a single conflated move API on the code level, it
> merely describes the protocol.> 
> Making the edit driver assign temporary slots explicitly is better than
> forcing the receiver to second-guess the driver's intent,
I think what you're saying is that, in cases where the driver can issue a move(a,b) instead of a (functionally equivalent) pair of move-away immediately followed by move-here, then the receiver is likely to be able to process that single move more efficiently.  So having the three methods available is better than just having the latter two.

Do I understand you correctly?

We should probably just leave it as a "quality of implementation" issue for the editor driver to prefer single move(a,b) instructions over pairs of move-away, move-here.  We could try to come up with a requirement that it must do so in certain cases (starting with collapsing adjacent pairs of move-away, move-here), but I think this may be rather difficult to define fully and hard to justify.

> and also makes validating the drive easier.

I'm not sure what you're thinking about validating the editor drive being easier.
- Julian

Re: Ev2 using move-away and move-here

Posted by Branko Čibej <br...@wandisco.com>.
On phone so a bit constrained ...
Move away/from with Id is essentially introducing a namespace for temporary
paths unrelated to the tree structure. Therefore you can change notation
and keep only one move operation with three variants:

move(a,b): direct move of node to new path
move(a,tmp(1)): move to temporary slot
move(tmp(1),b): move from temporary slot

N.b. this does not imply a single conflated move API on the code level, it
merely describes the protocol.

Making the edit driver assign temporary slots explicitly is better than
forcing the receiver to second-guess the driver's intent, and also makes
validating the drive easier.

Re: Ev2 using move-away and move-here

Posted by Philip Martin <ph...@wandisco.com>.
Greg Stein <gs...@gmail.com> writes:

> In the wiki page you started, I thought we already solved the move issue by
> using the original state for the source. (and remove rotate)
>
> I didn't read this treatise, cuz I think it has an incorrect starting
> assumption. You state early about sequential edits, when we relaxed that a
> bit due to needing to retain the original state. Does it respond to the
> proposal from your wiki page?

I'm not sure to which wiki page you are referring.  My recollection of
the last dev discussion (see Julian's mail) is that there is still a
problem.

>> === Examples Solved ===
>>
>> Example 1:
>>
>>   |                     |
>>   +--A  mv--\    add--> +--A
>>              \             |
>>               \-->         +--B
>>
>>   1. alter-dir / (children={A})

Do we need to call alter-dir when a child is replaced or only when a
child is added/removed?  If we look at the working copy update receiver
it wants to record incomplete for adds but probably does nothing for
replaces, it may not even do anything for deletes preferring to leave it
to the delete call.

>>   2. mv-away A (id=“original A”)
>>   3. add-directory A (children={B})
>>   4. mv-here A/B (id=“original A”)
>>

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Ev2 using move-away and move-here

Posted by Greg Stein <gs...@gmail.com>.
In the wiki page you started, I thought we already solved the move issue by
using the original state for the source. (and remove rotate)

I didn't read this treatise, cuz I think it has an incorrect starting
assumption. You state early about sequential edits, when we relaxed that a
bit due to needing to retain the original state. Does it respond to the
proposal from your wiki page?

Thx,
-g
On Aug 12, 2013 10:43 AM, "Julian Foad" <ju...@btopenworld.com> wrote:

> TLDR: Ev2 move support doesn't work with the "move" and "rotate"
> instructions as currently defined, and I propose that it could work instead
> with separate "move-away" and "move-here" instructions.
>
>
> == SUMMARY ==
>
> Ev2 was designed with an intention to support “move” semantics, but the
> current design of its move() method does not satisfy the requirements. Here
> I explain why, and present a solution.
>
> The following principles constrain or guide Ev2 design:
>
>   * An edit must be able to express any change that can be constructed in
> a WC and committed.
>
>   * An edit must be able to express any change that can be constructed in
> a repo across multiple revisions.
>
>   * Ev2 uses sequential description: a succession of incremental edits
> applied to an initial state gives a final state.
>
>   * Ev2 should not require the driver to use temporary paths to express a
> change.
>
> Given the constraints,
>
>   * Not all moves can be expressed by a “move source to destination”
> operation, with or without a “rotate” operation.
>
>   * Moves can be expressed, in general, using separate move-away and
> move-here operations.
>
> While other designs are also possible, a move-away and move-here design
> might be the best fit for the shape of the editor. Such a design is
> detailed below, but not yet finished.
>
>
> == WHY IS THIS NECESSARY? ==
>
> === Express Any Change ===
>
> Since the purpose of the editor is to communicate a change originating in
> a WC when it is committed, and a change originating in a repository when
> the WC is updated, then it must be able to express any such change. This
> includes updates across multiple revisions, and from a mixed-revision state
> to a single revision.
>
> Through a series of simple steps of the form “move X to Y”, some quite
> funky overall changes can be created. For example, starting from this state:
>
>   |
>   +-- A
>       |
>       +-- B
>
> the following sequence:
>
>   move /A/B /X
>   move /A /X/B
>   move /X /A
>
> results in swapping A with B. If those steps are performed in a WC and the
> result committed all at once, the editor needs to be able to handle it. If
> those steps are committed separately, and then a working copy is updated,
> the editor needs to be able to handle it.
>
> More examples are given later, some of which involve other operations such
> as “make directory” as well as moves. All of this emergent complexity
> results from the introduction of a simple “move” primitive, and there does
> not seem to be any acceptable way to further constrain the basic move that
> would simplify the derived complexity.Temporary Paths
>
> Paths in the Ev2 editor operations refer to the current state of the tree
> being edited, at that moment in the sequence of edit operations. (The sole
> exception is the copy-from path, which is relative to a given committed
> revision.)
>
> A natural and compact way of expressing moves would be as a mapping from
> original-state paths to final-state paths. However, that paradigm does not
> fit well with the desire to express the edit as a sequence of incremental
> steps. If we are going to include move functionality as steps in a sequence
> of edit operations, it makes sense to use paths that are relative to the
> current state.
>
> Ev2 should not require the driver to express a change as a sequence of
> operations that can include moving a node to a temporary path and then
> later moving it again to the final path. The end result of the edit is the
> important part, and there are unlimited potential sequences that lead to
> that result, and it does not make sense to require an edit driver to
> construct such a sequence arbitrarily, if there is an alternative method
> that specifies the result uniquely. The receiver may in fact need to employ
> temporary paths in its implementation, but then it knows this and it is in
> a better position to construct such paths when needed, and it will know
> that they are temporary which may be important.
>
> There are advantages to placing a node in its final position exactly once.
> It was claimed that Google's BigTable implementation of Svn's back-end
> would have benefited from knowing that once a node has been edited then it
> is in its final state. Ev2 aims to provide that guarantee, where Ev1 could
> not.
>
> === Sequential Description ===
>
> Ev2 (svn_editor_t) intends to express a new state in terms of an old state
> and a description of the parts of the new state that differ from the old
> state, or, in other words, a description of the changes against the old
> state. It uses a sequential, incremental description: for example, “add
> directory X, then copy file X/Y from somewhere, then edit the contents and
> properties of file X/Y”.
>
> Only certain parts of the description are incremental. Ev2 aims to allow
> nodes to be visited in arbitrary order, subject to a small number of
> restrictions. The parts where ordering matters are:
>   * tree changes before content changes
>
>   * alter/add/copy a directory (if required) before touching its
> (immediate) children
>
> === A Move Event is Not Adequate ===
>
> Moves, in general, cannot be expressed as “move from path X to path Y”
> events in such a sequential description without introducing temporary
> paths. This is because some cases require the source path of the move to be
> moved away, then some other steps, and then the destination path of the
> move can be populated. Some classic examples are:
>
> Example 1: Insert a directory level
>
>   |                     |
>   +--A  mv--\     (add) +--A
>              \             |
>               \-->         +--B
>
> The add cannot happen before the move-away, but must happen before the
> move-here.
>
> (The mirror of Example 1, which would be removing a directory level, is
> not actually problematic:
>
>   |                     |
>   +--A (del)     /-->   +--A
>      |          /
>      +--B  mv--/
>
> The move-away must (in principle) happen before the delete, while the
> move-here cannot (in principle) happen before the delete. However, Ev2
> defines that a deletion which is to be replaced shall occur implicitly when
> the replacement is put in place, and so the move-here can happen
> simultaneously with the delete.)
>
> Example 2: Swap two siblings
>
>   |                     |
>   +--A     mv--\ /-->   +--A
>   |             X       |
>   +--B     mv--/ \-->   +--B
>
> Neither of the moves can be completed before doing the move-away part of
> the other one.
>
> Example 3: Swap two directory levels
>
>   |                     |
>   +--A     mv--\ /-->   +--A
>      |          X          |
>      +--B  mv--/ \-->      +--B
> Neither of the moves can be completed before doing the move-away part of
> the other one.
>
> === A Rotate Event is Not Adequate ===
>
> At one time there was an idea that the addition of a “rotate PATH1 PATH2 …
> PATHn” event would complete the semantics and allow arbitrary moves to be
> supported.
>
> While this does enable Example 2 (swap two siblings) and Example 3 (swap
> two directory levels) and many other cases, it does not help with inserting
> a directory level (Example 1), and it has been shown [1] to be incapable of
> resolving other more involved cases involving swapping or rotation. One
> specific example is swapping A with A/B/C [2]:
>
>   |                            |
>   +-- A          mv--\   /-->  +-- A
>       |               \ /          |
>       +-- B      mv--- X --->      +-- B
>           |           / \              |
>           +-- C  mv--/   \-->          +-- C
>
> [1] Email thread on dev@, “Ev2 as a move-aware editor”, started on
> 2013-06-24, e.g. <http://svn.haxx.se/dev/archive-2013-06/0560.shtml> or <
> http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C1372087321.76009.YahooMailNeo%40web186101.mail.ir2.yahoo.com%3E
> >.
>
> [2] An example problem in thread [1], of swapping A with A/B/C: <
> http://svn.haxx.se/dev/archive-2013-06/0684.shtml> or <
> http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C87bo6rewwp.fsf%40ntlworld.com%3E
> >.
>
>
> == MOVE-AWAY AND MOVE-HERE ==
>
> One solution is to describe the two halves of each move separately:
>
>   move-away SOURCE-PATH
>   ...
>   move-here DESTINATION-PATH
>
> We can then solve Example 1 in the following way: issue the “move-away A”,
> then create a new directory at path A which replaces that source of the
> move, and then finally issue the “move-here A/B” which relies on that
> replacement directory A having been created.
>
> The consumer must be able to put the node aside for an indeterminate
> amount of time until the “move-here” is received.
>
> Of course there needs to be a way to link each move-away with the
> corresponding move-here. Remembering that each edit step refers to the
> current state in a sequence of states, we cannot simply specify the path
> corresponding to the other end of the move like this:
>
>   move-away SOURCE-PATH to DESTINATION-PATH
>   ...
>   move-here DESTINATION-PATH from SOURCE-PATH
>
> because the problem cases are when the destination path does not yet exist
> at the time of a move-away, or the source path no longer exists at the time
> of a move-here. What we can do is use some other unique reference that is
> unique within the edit, like this:
>
>   move-away SOURCE-PATH as identifier ID1
>   ...
>   move-here DESTINATION-PATH from identifier ID1
>
> The reference could perhaps be the destination path as it will finally
> exist at the end of the edit, or just and arbitrary number or string. We
> will just specify the identifier as an “id” and not specify how it is
> generated.
>
> === Ordering Restrictions ===
>
> The ordering rules regarding move-away and move-here should include:
>   * mv-away must come before the matching mv-here
>
>     - The edit should provide a sequential procedure that the consumer
> must be able to follow without having to buffer an arbitrary amount of
> state.
>
>   * mv-here & cp & add must be in nesting order: create (or put in place)
> the parent before its children
>
>   * mv-away must come before deleting a parent
>
>     - Receiver needs to know that it must preserve this path when we
> delete its parent.
>
>   * mv-away must come before mv-away of a parent
>
>     - If we allowed “mv-away A; …; mv-away A/B” then the child path “A/B”
> would have to be specified not relative to the current state of the edit,
> as all other operative paths are, but in some other way, because the parent
> has gone into temporary namespace, and has perhaps been replaced so that
> “A/B” now refers to some other node.
>
>     - There is a general rule that all edits within a moved directory “A”
> must come after A is moved its destination, but a mv-away of a subtree of A
> is not considered an edit for this purpose.
>
> === Examples Solved ===
>
> Example 1:
>
>   |                     |
>   +--A  mv--\    add--> +--A
>              \             |
>               \-->         +--B
>
>   1. alter-dir / (children={A})
>   2. mv-away A (id=“original A”)
>   3. add-directory A (children={B})
>   4. mv-here A/B (id=“original A”)
>
> Example 2: Swapping two siblings
>
>   |                     |
>   +--A      --\ /-->    +--A
>   |            X        |
>   +--B      --/ \-->    +--B
>
>   1. alter-dir / (children={A,B})
>   2. mv-away A (id=“original A”)
>   3. mv-away B (id=“original B”)
>   4. mv-here A (id=“original B”)
>   5. mv-here B (id=“original A”)
>
> Example 3 can also be solved in this way, except for some ordering
> restriction issues that are discussed below.The Need for Alter-directory
> before Tree Changes
>
> Why do we have this ordering rule?
>
>   If any path is added (with add_*) or deleted/moved/rotated, then
>   an svn_editor_alter_directory() call must be made for its parent
>   directory with the target/eventual set of children.
>
> It can't be quite right. Inside an add-directory, all the children have to
> be added, and that would imply we need an alter-directory as well as the
> add-directory, violating the Once Rule. It omits “copied” – just an
> oversight? It does not specify when the call must occur – presumably it
> must be before any such modifications to the children.
>
> To remedy those three initial problems, I suppose something like this is
> intended:
>
>   Either alter_directory() or add_directory() must be called on a
>   directory, declaring its final set of children, before calling
>   delete(), move_away(), move_here(), copy(), or add_*() on any child.
>
>   (For delete() or move_away(), it must be alter_directory(), as
>   the children of add_directory() must not be deleted or moved away.)
>
> But there is still a problem.  If we require alter_directory() before a
> move_away(), it leads to a violation of the Once Rule as shown in the
> following example.
>
> Example 3: Swapping two directory levels
>
>   |                     |
>   +--A      --\ /-->    +--A
>      |         X           |
>      +--B   --/ \-->       +--B
>
>   1. alter-dir A (children={}) ### Needed?
>
>   2. mv-away A/B (id=”original A/B”)
>
>   3. alter-dir / (children={A})
>
>   4. mv-away A (id=”original A”)
>
>   5. mv-here A (id=”original A/B”)
>
>   6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1)
> was needed.
>
>   7. mv-here A/B (id=”original A”)
>
> There are two potential problems here:
>
>   * We make an edit within subtree A (the “move-away A/B”) before moving A
>
>   * The “alter-dir A” is performed twice
>
> The first point violates the assumed constraint that we should disallow
> edits within a subtree before moving the subtree.
>
> The second point at face value violates the Once Rule. We can probably
> resolve is by making the Once Rule apply per node rather than per path –
> see below.
>
> === What If We Allow Edit Before Move? ===
>
> We were assuming that we should disallow edits within a subtree before
> moving the subtree. [Why?]
>
> One solution might be to drop that requirement and let the subtree be
> moved (move-away) part way through editing it, allowing all editing
> operations. To accommodate such a change, some of the other rules that
> currently refer to “a path” probably need to be reformulated to refer to “a
> path relative to such a subtree” instead.
>
> If we allow edits before and after moving, should we also allow edits
> after the move-away and before the move-here? Not sure. It seems like that
> may be more problematic for certain consumer architectures and so probably
> should not be allowed. But is there a better way to decide?
>
> === What if the Once Rule is Per Node? ===
>
> The path-based Once Rule was written something like this:
>
>   A path should never be referenced more than once by the add_*, alter_*,
> and delete
>   operations (the "Once Rule"). The source path of a copy (and its
> children, if a directory)
>   may be copied many times, and are otherwise subject to the Once Rule.
> The destination path
>   of a copy [or move_here] may have alter_* operations applied, but not
> add_* or delete.  If
>   the destination path of a copy or move is a directory, then its children
> are subject to the
>   Once Rule. The source path of a move_away() (and its child paths) may be
> replaced using
>   add_*() or copy() or move_here() (where these new or copied nodes are
> subject to the Once Rule).
>
> Perhaps the Once Rule should not apply per path as it was stated, but
> rather per node. If a directory is altered and then moved away, we should
> be able to create a replacement directory, being a different node at the
> same path, and then alter that.
>
>   * The Once Rule applies per node, rather than per path. The definition
> of a “node” is such that “move” moves an existing node (or nodes) to a new
> path and does not create a new node, while “add” creates a new node and
> “copy” creates a set of new nodes, each new node being different from all
> other nodes that are or were in the tree even if it replaces an existing
> node at the same path.
>
>   * One of the following actions can be applied, just Once, directly to
> each node:
>
>       - create (only if it does not exist in the initial tree)
>
>       - remove (only if it exists in the initial tree or is brought in as
> a child of a copy)
>
>       - modify (only if it exists in the initial tree or is brought in as
> a child of a copy)
>
>   * A node may be created by one of:
>
>       - add_*()
>
>       - copy(), optionally followed by alter_*()
>
>     No other operations may be applied to such a node during the entire
> edit. Any children may be edited after (not before) the node is created.
> When a node is created by copy(), its children (recursively) are brought
> into the tree, and are then subject to the Once Rule as existing nodes.
>
>   * A node may be removed, along with all its children recursively, by one
> of:
>
>       - delete()
>
>       - add_*() replacing this node
>
>       - copy() replacing this node
>
>       - move_here() replacing this node
>
>     No other operations may be applied to such a node, nor to any children
> (recursively), during the entire edit, except for nodes that have been
> moved away.
>
>   * A node may be modified by any combination from the following:
>
>       - move_away() followed by move_here(), at most once
>
>       - alter_*(), at most once
>
>       - (if a directory) edit a child
>
>     These can come in any order except that alter_*() is required before
> editing any children, and neither of those can come between move-away and
> move-here except for move-away of a deeper child [and/or editing of such
> prior to move-away?].
>
>     [### This seems more complex than it could be. It would be much easier
> if alter_directory() were not required before adding/moving/editing
> children.]
>
>   * The source of a copy operation may be a node in the tree being edited.
> Such a node (and its children, if a directory) may be copied many times, in
> addition to being subject to the Once Rule as existing nodes.
>
>     [### I'm not sure about the Copy operation: the doc string implies
> it's copying from something in the current state, but it should be able to
> copy from any path-rev.]
>
>
> == CONCLUSION ==
>
> With a bit more work on the ordering restrictions, especially the
> requirement about calling alter-dir, it looks like this design may fit the
> requirements.
>
> Thoughts?
>
> - Julian
>
>