You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2013/08/12 17:42:40 UTC

Ev2 using move-away and move-here

 TLDR: Ev2 move support doesn't work with the "move" and "rotate"
instructions as currently defined, and I propose that it could work instead
with separate "move-away" and "move-here" instructions.


== SUMMARY ==

Ev2 was designed with an intention to support “move” semantics, but the
current design of its move() method does not satisfy the requirements. Here
I explain why, and present a solution.

The following principles constrain or guide Ev2 design:

  * An edit must be able to express any change that can be constructed in a
WC and committed.

  * An edit must be able to express any change that can be constructed in a
repo across multiple revisions.

  * Ev2 uses sequential description: a succession of incremental edits
applied to an initial state gives a final state.

  * Ev2 should not require the driver to use temporary paths to express a
change.

Given the constraints,

  * Not all moves can be expressed by a “move source to destination”
operation, with or without a “rotate” operation.

  * Moves can be expressed, in general, using separate move-away and
move-here operations.

While other designs are also possible, a move-away and move-here design
might be the best fit for the shape of the editor. Such a design is
detailed below, but not yet finished.


== WHY IS THIS NECESSARY? ==

=== Express Any Change ===

Since the purpose of the editor is to communicate a change originating in a
WC when it is committed, and a change originating in a repository when the
WC is updated, then it must be able to express any such change. This
includes updates across multiple revisions, and from a mixed-revision state
to a single revision.

Through a series of simple steps of the form “move X to Y”, some quite
funky overall changes can be created. For example, starting from this state:

  |
  +-- A
      |
      +-- B

the following sequence:

  move /A/B /X
  move /A /X/B
  move /X /A

results in swapping A with B. If those steps are performed in a WC and the
result committed all at once, the editor needs to be able to handle it. If
those steps are committed separately, and then a working copy is updated,
the editor needs to be able to handle it.

More examples are given later, some of which involve other operations such
as “make directory” as well as moves. All of this emergent complexity
results from the introduction of a simple “move” primitive, and there does
not seem to be any acceptable way to further constrain the basic move that
would simplify the derived complexity.Temporary Paths

Paths in the Ev2 editor operations refer to the current state of the tree
being edited, at that moment in the sequence of edit operations. (The sole
exception is the copy-from path, which is relative to a given committed
revision.)

A natural and compact way of expressing moves would be as a mapping from
original-state paths to final-state paths. However, that paradigm does not
fit well with the desire to express the edit as a sequence of incremental
steps. If we are going to include move functionality as steps in a sequence
of edit operations, it makes sense to use paths that are relative to the
current state.

Ev2 should not require the driver to express a change as a sequence of
operations that can include moving a node to a temporary path and then
later moving it again to the final path. The end result of the edit is the
important part, and there are unlimited potential sequences that lead to
that result, and it does not make sense to require an edit driver to
construct such a sequence arbitrarily, if there is an alternative method
that specifies the result uniquely. The receiver may in fact need to employ
temporary paths in its implementation, but then it knows this and it is in
a better position to construct such paths when needed, and it will know
that they are temporary which may be important.

There are advantages to placing a node in its final position exactly once.
It was claimed that Google's BigTable implementation of Svn's back-end
would have benefited from knowing that once a node has been edited then it
is in its final state. Ev2 aims to provide that guarantee, where Ev1 could
not.

=== Sequential Description ===

Ev2 (svn_editor_t) intends to express a new state in terms of an old state
and a description of the parts of the new state that differ from the old
state, or, in other words, a description of the changes against the old
state. It uses a sequential, incremental description: for example, “add
directory X, then copy file X/Y from somewhere, then edit the contents and
properties of file X/Y”.

Only certain parts of the description are incremental. Ev2 aims to allow
nodes to be visited in arbitrary order, subject to a small number of
restrictions. The parts where ordering matters are:
  * tree changes before content changes

  * alter/add/copy a directory (if required) before touching its
(immediate) children

=== A Move Event is Not Adequate ===

Moves, in general, cannot be expressed as “move from path X to path Y”
events in such a sequential description without introducing temporary
paths. This is because some cases require the source path of the move to be
moved away, then some other steps, and then the destination path of the
move can be populated. Some classic examples are:

Example 1: Insert a directory level

  |                     |
  +--A  mv--\     (add) +--A
             \             |
              \-->         +--B

The add cannot happen before the move-away, but must happen before the
move-here.

(The mirror of Example 1, which would be removing a directory level, is not
actually problematic:

  |                     |
  +--A (del)     /-->   +--A
     |          /
     +--B  mv--/

The move-away must (in principle) happen before the delete, while the
move-here cannot (in principle) happen before the delete. However, Ev2
defines that a deletion which is to be replaced shall occur implicitly when
the replacement is put in place, and so the move-here can happen
simultaneously with the delete.)

Example 2: Swap two siblings

  |                     |
  +--A     mv--\ /-->   +--A
  |             X       |
  +--B     mv--/ \-->   +--B

Neither of the moves can be completed before doing the move-away part of
the other one.

Example 3: Swap two directory levels

  |                     |
  +--A     mv--\ /-->   +--A
     |          X          |
     +--B  mv--/ \-->      +--B
Neither of the moves can be completed before doing the move-away part of
the other one.

=== A Rotate Event is Not Adequate ===

At one time there was an idea that the addition of a “rotate PATH1 PATH2 …
PATHn” event would complete the semantics and allow arbitrary moves to be
supported.

While this does enable Example 2 (swap two siblings) and Example 3 (swap
two directory levels) and many other cases, it does not help with inserting
a directory level (Example 1), and it has been shown [1] to be incapable of
resolving other more involved cases involving swapping or rotation. One
specific example is swapping A with A/B/C [2]:

  |                            |
  +-- A          mv--\   /-->  +-- A
      |               \ /          |
      +-- B      mv--- X --->      +-- B
          |           / \              |
          +-- C  mv--/   \-->          +-- C

[1] Email thread on dev@, “Ev2 as a move-aware editor”, started on
2013-06-24, e.g. <http://svn.haxx.se/dev/archive-2013-06/0560.shtml> or <
http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C1372087321.76009.YahooMailNeo%40web186101.mail.ir2.yahoo.com%3E
>.

[2] An example problem in thread [1], of swapping A with A/B/C: <
http://svn.haxx.se/dev/archive-2013-06/0684.shtml> or <
http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C87bo6rewwp.fsf%40ntlworld.com%3E
>.


== MOVE-AWAY AND MOVE-HERE ==

One solution is to describe the two halves of each move separately:

  move-away SOURCE-PATH
  ...
  move-here DESTINATION-PATH

We can then solve Example 1 in the following way: issue the “move-away A”,
then create a new directory at path A which replaces that source of the
move, and then finally issue the “move-here A/B” which relies on that
replacement directory A having been created.

The consumer must be able to put the node aside for an indeterminate amount
of time until the “move-here” is received.

Of course there needs to be a way to link each move-away with the
corresponding move-here. Remembering that each edit step refers to the
current state in a sequence of states, we cannot simply specify the path
corresponding to the other end of the move like this:

  move-away SOURCE-PATH to DESTINATION-PATH
  ...
  move-here DESTINATION-PATH from SOURCE-PATH

because the problem cases are when the destination path does not yet exist
at the time of a move-away, or the source path no longer exists at the time
of a move-here. What we can do is use some other unique reference that is
unique within the edit, like this:

  move-away SOURCE-PATH as identifier ID1
  ...
  move-here DESTINATION-PATH from identifier ID1

The reference could perhaps be the destination path as it will finally
exist at the end of the edit, or just and arbitrary number or string. We
will just specify the identifier as an “id” and not specify how it is
generated.

=== Ordering Restrictions ===

The ordering rules regarding move-away and move-here should include:
  * mv-away must come before the matching mv-here

    - The edit should provide a sequential procedure that the consumer must
be able to follow without having to buffer an arbitrary amount of state.

  * mv-here & cp & add must be in nesting order: create (or put in place)
the parent before its children

  * mv-away must come before deleting a parent

    - Receiver needs to know that it must preserve this path when we delete
its parent.

  * mv-away must come before mv-away of a parent

    - If we allowed “mv-away A; …; mv-away A/B” then the child path “A/B”
would have to be specified not relative to the current state of the edit,
as all other operative paths are, but in some other way, because the parent
has gone into temporary namespace, and has perhaps been replaced so that
“A/B” now refers to some other node.

    - There is a general rule that all edits within a moved directory “A”
must come after A is moved its destination, but a mv-away of a subtree of A
is not considered an edit for this purpose.

=== Examples Solved ===

Example 1:

  |                     |
  +--A  mv--\    add--> +--A
             \             |
              \-->         +--B

  1. alter-dir / (children={A})
  2. mv-away A (id=“original A”)
  3. add-directory A (children={B})
  4. mv-here A/B (id=“original A”)

Example 2: Swapping two siblings

  |                     |
  +--A      --\ /-->    +--A
  |            X        |
  +--B      --/ \-->    +--B

  1. alter-dir / (children={A,B})
  2. mv-away A (id=“original A”)
  3. mv-away B (id=“original B”)
  4. mv-here A (id=“original B”)
  5. mv-here B (id=“original A”)

Example 3 can also be solved in this way, except for some ordering
restriction issues that are discussed below.The Need for Alter-directory
before Tree Changes

Why do we have this ordering rule?

  If any path is added (with add_*) or deleted/moved/rotated, then
  an svn_editor_alter_directory() call must be made for its parent
  directory with the target/eventual set of children.

It can't be quite right. Inside an add-directory, all the children have to
be added, and that would imply we need an alter-directory as well as the
add-directory, violating the Once Rule. It omits “copied” – just an
oversight? It does not specify when the call must occur – presumably it
must be before any such modifications to the children.

To remedy those three initial problems, I suppose something like this is
intended:

  Either alter_directory() or add_directory() must be called on a
  directory, declaring its final set of children, before calling
  delete(), move_away(), move_here(), copy(), or add_*() on any child.

  (For delete() or move_away(), it must be alter_directory(), as
  the children of add_directory() must not be deleted or moved away.)

But there is still a problem.  If we require alter_directory() before a
move_away(), it leads to a violation of the Once Rule as shown in the
following example.

Example 3: Swapping two directory levels

  |                     |
  +--A      --\ /-->    +--A
     |         X           |
     +--B   --/ \-->       +--B

  1. alter-dir A (children={}) ### Needed?

  2. mv-away A/B (id=”original A/B”)

  3. alter-dir / (children={A})

  4. mv-away A (id=”original A”)

  5. mv-here A (id=”original A/B”)

  6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1)
was needed.

  7. mv-here A/B (id=”original A”)

There are two potential problems here:

  * We make an edit within subtree A (the “move-away A/B”) before moving A

  * The “alter-dir A” is performed twice

The first point violates the assumed constraint that we should disallow
edits within a subtree before moving the subtree.

The second point at face value violates the Once Rule. We can probably
resolve is by making the Once Rule apply per node rather than per path –
see below.

=== What If We Allow Edit Before Move? ===

We were assuming that we should disallow edits within a subtree before
moving the subtree. [Why?]

One solution might be to drop that requirement and let the subtree be moved
(move-away) part way through editing it, allowing all editing operations.
To accommodate such a change, some of the other rules that currently refer
to “a path” probably need to be reformulated to refer to “a path relative
to such a subtree” instead.

If we allow edits before and after moving, should we also allow edits after
the move-away and before the move-here? Not sure. It seems like that may be
more problematic for certain consumer architectures and so probably should
not be allowed. But is there a better way to decide?

=== What if the Once Rule is Per Node? ===

The path-based Once Rule was written something like this:

  A path should never be referenced more than once by the add_*, alter_*,
and delete
  operations (the "Once Rule"). The source path of a copy (and its
children, if a directory)
  may be copied many times, and are otherwise subject to the Once Rule. The
destination path
  of a copy [or move_here] may have alter_* operations applied, but not
add_* or delete.  If
  the destination path of a copy or move is a directory, then its children
are subject to the
  Once Rule. The source path of a move_away() (and its child paths) may be
replaced using
  add_*() or copy() or move_here() (where these new or copied nodes are
subject to the Once Rule).

Perhaps the Once Rule should not apply per path as it was stated, but
rather per node. If a directory is altered and then moved away, we should
be able to create a replacement directory, being a different node at the
same path, and then alter that.

  * The Once Rule applies per node, rather than per path. The definition of
a “node” is such that “move” moves an existing node (or nodes) to a new
path and does not create a new node, while “add” creates a new node and
“copy” creates a set of new nodes, each new node being different from all
other nodes that are or were in the tree even if it replaces an existing
node at the same path.

  * One of the following actions can be applied, just Once, directly to
each node:

      - create (only if it does not exist in the initial tree)

      - remove (only if it exists in the initial tree or is brought in as a
child of a copy)

      - modify (only if it exists in the initial tree or is brought in as a
child of a copy)

  * A node may be created by one of:

      - add_*()

      - copy(), optionally followed by alter_*()

    No other operations may be applied to such a node during the entire
edit. Any children may be edited after (not before) the node is created.
When a node is created by copy(), its children (recursively) are brought
into the tree, and are then subject to the Once Rule as existing nodes.

  * A node may be removed, along with all its children recursively, by one
of:

      - delete()

      - add_*() replacing this node

      - copy() replacing this node

      - move_here() replacing this node

    No other operations may be applied to such a node, nor to any children
(recursively), during the entire edit, except for nodes that have been
moved away.

  * A node may be modified by any combination from the following:

      - move_away() followed by move_here(), at most once

      - alter_*(), at most once

      - (if a directory) edit a child

    These can come in any order except that alter_*() is required before
editing any children, and neither of those can come between move-away and
move-here except for move-away of a deeper child [and/or editing of such
prior to move-away?].

    [### This seems more complex than it could be. It would be much easier
if alter_directory() were not required before adding/moving/editing
children.]

  * The source of a copy operation may be a node in the tree being edited.
Such a node (and its children, if a directory) may be copied many times, in
addition to being subject to the Once Rule as existing nodes.

    [### I'm not sure about the Copy operation: the doc string implies it's
copying from something in the current state, but it should be able to copy
from any path-rev.]


== CONCLUSION ==

With a bit more work on the ordering restrictions, especially the
requirement about calling alter-dir, it looks like this design may fit the
requirements.

Thoughts?

- Julian