Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2013/09/05 09:13:26 UTC

Move using initial state (was: Update of "MoveDev/Ev2MovesDesign" ...)

On Wed, Sep 4, 2013 at 10:43 AM, Apache subversion Wiki
<co...@subversion.apache.org> wrote:
>...
> Given these constraints, not all combinations of moves can be expressed using a “move source to destination” operation, with or without a “rotate” operation, without using temporary paths.

I'm not buying that you need two operations. The "move uses initial
state" tweak seems fully adequate and gets us the single-op/atomicity
principle that Ev2 was designed under.

-g

Re: Move using initial state

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:

> Greg Stein <gs...@gmail.com> writes:
>> [from the Wiki]
>>>  Given these constraints, not all combinations of moves can be expressed
>>> using a "move source to destination" operation, with or without a "rotate"
>>> operation, without using temporary paths.
>> 
>>  I'm not buying that you need two operations. The "move uses initial
>>  state" tweak seems fully adequate and gets us the single-op/atomicity
>>  principle that Ev2 was designed under.

Hi Greg.  I've given these issues a *lot* of thought over the last couple of months and have come up with some schemes which are close to being well defined, though the impedance mismatch between the existing purely path-based semantics and the "line of history" concept that's needed to define the whole idea of "moves" is a big obstacle.

I have not been able to work the "move uses initial state" idea into a real scheme, and I think that's because it's not such a silver bullet as it sounds.  I am not saying it's not possible to come up with some scheme that incorporates the idea, but referring to it as a "tweak" that "seems fully adequate" does nothing to help its cause.  Without a hard spec, I'm sorry but it's just wishful thinking.


Philip wrote:
> At some point we have to create temporary locations and I suppose it
> might be possible for the receiver to generate them as necessary.
> However I'm still struggling to understand the ordering of moves and
> alter_dirs so I can't determine whether that is practical or sensible.
> 
> Given this example:
> 
>    svn mv A     X
>    svn mv X/B/C A
>    svn mv X/B   A/B
>    svn mv X     A/B/C
>    svn ci
> 
> or the equivalent:
> 
>    svn mv A/B/C X
>    svn mv A/B   X/B
>    svn mv A     X/B/C
>    svn mv X     A
>    svn ci
> 
> we have this set of Ev2 moves in some order:
> 
>    move A, A/B/C
>    move A/B, A/B
>    move A/B/C, A
> 
> What is the correct order for these operations?  I guess there may be
> more than one valid order since I showed two possible temporaries,
> perhaps both
> 
>    move A, A/B/C
>    move A/B, A/B
>    move A/B/C, A
> 
> and
> 
>    move A/B/C, A
>    move A/B, A/B
>    move A, A/B/C
> 
> are valid. Or perhaps the alter_dir ordering rules exclude one?
> 
> What about alter_dir?  I think the rule is that alter_dir on a directory
> should occur before add or delete affects the children of the directory.
> There is also a rule:
> 
> * - The ancestor of an added, copied-here, moved-here, or
> *   modified node may not be deleted. The ancestor may not be moved
> *   (instead: perform the move, *then* the edits).
> 
> It's not clear where alter_dir should occur w.r.t the moves in my
> example.  Does alter_dir count as an edit that should occur after move?
> Do we pass initial state paths:
> 
>    alter_dir .,     children='A'
>    alter_dir A,     children=''
>    alter_dir A/B,   children='C'
>    alter_dir A/B/C, children='B'
> 
> or final_state paths:
> 
>    alter_dir .,     children='A'
>    alter_dir A,     children='B'
>    alter_dir A/B,   children='C'
>    alter_dir A/B/C, children=''

Quite.  I have come to the conclusion that all of these kinds of questions stem from the fact that the existing Ev2 rules were written with a path-based perspective.  In order to talk rationally about moves, it is necessary to introduce a lines-of-history concept at least somewhere in the thinking behind the model.  I have been through numerous possible adaptations of the "alter before editing" and "once rule" ideas to a move-aware world, and the best I have been able to achieve is currently documented in the Wiki here:

  http://wiki.apache.org/subversion/MoveDev/Ev2MovesDesign

If you are willing and able to review that and suggest the necessary fixes that would be extremely useful.

In parallel with that I've also been working on how we would extend Ev1:

  http://wiki.apache.org/subversion/MoveDev/Ev15MovesDesign
  
which has the same path-based/line-of-history impedance mismatch problems but is easier to work with because the starting point is known and understood.

- Julian
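
Philip's two "svn mv" sequences quoted above can be checked mechanically. The following is a minimal Python sketch, not Subversion code: the {path: node_id} model, the mv() and run() helpers, and the node labels are all assumptions made for illustration. It replays both sequences and compares the resulting trees.

```python
def mv(tree, src, dst):
    """Move the subtree rooted at src to dst in a {path: node_id} mapping."""
    if dst in tree:
        raise ValueError(f"destination exists: {dst}")
    moved = {}
    for path in list(tree):
        if path == src or path.startswith(src + "/"):
            moved[dst + path[len(src):]] = tree.pop(path)
    if not moved:
        raise ValueError(f"no such source: {src}")
    tree.update(moved)


def run(steps):
    """Apply a list of (src, dst) moves to the three-node example tree."""
    # Hypothetical node ids: each node is labelled by its initial path.
    tree = {"A": "A", "A/B": "A/B", "A/B/C": "A/B/C"}
    for src, dst in steps:
        mv(tree, src, dst)
    return tree


# The two working-copy sequences from Philip's mail.
first = run([("A", "X"), ("X/B/C", "A"), ("X/B", "A/B"), ("X", "A/B/C")])
second = run([("A/B/C", "X"), ("A/B", "X/B"), ("A", "X/B/C"), ("X", "A")])

print(first == second)
print(sorted(first.items()))
```

Under this model both sequences end in the same tree: the node that started at A/B/C sits at A, A/B stays at A/B, and the original A sits at A/B/C, which matches the three Ev2 moves listed in the quoted mail. The temporary path X does not survive in either result.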

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Sep 6, 2013 at 11:36 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 06.09.2013 17:50, Philip Martin wrote:
>> Philip Martin <ph...@wandisco.com> writes:
>...
>> I've been thinking about alter_dir and I see no reason, in the update
>> editor at least, for a rule that requires alter_dir before adding or
>> removing children.  The Ev2 "once" rule is designed to ensure that Ev2
>> actions can be applied to the nodes in the working copy as the actions
>> are received and that the working copy nodes will always reflect
>> repository nodes.  This doesn't require alter_dir on the parent before
>> add/delete of children.
>
> Actually, the Once Rule is way more important for server implementations
> than client implementations. It effectively defines when the server can
> commit changes to a node and assume no further changes will happen
> later.

Yes.

In conversations with Jon Trowbridge a few years back, he described
the trouble Google had with a rewrite of their svn/BigTable backend
because they couldn't determine when a node was "done".

A secondary driver for the Once Rule is an atomic change for a
directory so that it doesn't remain in the "incomplete" state during
an update (and this was carried into wc_db.h). Instead, the *children*
are marked incomplete, pending future node actions. The directory
itself is modified and "done" in a single operation.

The notion of single, atomic operations is a huge design point of Ev2.
This is why I'm *extremely* leery of the dual entry points (and
unknown duration!) for performing a move. I have yet to see a
description of where/how source-initial-state and a single move
operation breaks down.

>...

To Philip's point: it does seem quite reasonable to allow child
changes before an alter_dir(). I could have a directory at r10, and a
(new) child at r11. That is allowed and logical. Thus, an editor drive
could move a child to r11 before it moves the parent to r11 (which
specifies that child).

Cheers,
-g
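
The Once Rule as described in this message can be illustrated with a toy receiver. This is a minimal Python sketch under stated assumptions, not the Ev2 API: the OnceRuleReceiver class, its method names, and the "pending_children" parameter are hypothetical. It shows a directory being finished in one step while its outstanding children are marked incomplete, and it accepts a child change that arrives before the parent's alter_dir.

```python
class OnceRuleReceiver:
    """Toy receiver that lets each node be altered at most once per edit."""

    def __init__(self):
        self.done = set()        # nodes whose single alteration has arrived
        self.incomplete = set()  # children announced but not yet acted on

    def _finish(self, path):
        if path in self.done:
            raise RuntimeError(f"Once Rule violated: {path} altered twice")
        self.done.add(path)
        self.incomplete.discard(path)

    def alter_file(self, path):
        self._finish(path)

    def alter_dir(self, path, pending_children):
        # The directory itself is finished in one step; only the children
        # that still expect an action are marked incomplete.
        self._finish(path)
        for name in pending_children:
            child = f"{path}/{name}" if path else name
            if child not in self.done:
                self.incomplete.add(child)


r = OnceRuleReceiver()
r.alter_file("A/new")                # the child arrives before its parent
r.alter_dir("A", ["new", "other"])   # parent is done; A/other is pending
print(sorted(r.done), sorted(r.incomplete))
```

Because a node in "done" can never change again during the edit, a server-side receiver in this model could flush it immediately, which is the property Branko and Greg describe.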

Re: Move using initial state

Posted by Branko Čibej <br...@wandisco.com>.

On 06.09.2013 17:50, Philip Martin wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> What about alter_dir?  I think the rule is that alter_dir on a directory
>> should occur before add or delete affects the children of the directory.
>> There is also a rule:
>>
>>  * - The ancestor of an added, copied-here, moved-here, or
>>  *   modified node may not be deleted. The ancestor may not be moved
>>  *   (instead: perform the move, *then* the edits).
> I've been thinking about alter_dir and I see no reason, in the update
> editor at least, for a rule that requires alter_dir before adding or
> removing children.  The Ev2 "once" rule is designed to ensure that Ev2
> actions can be applied to the nodes in the working copy as the actions
> are received and that the working copy nodes will always reflect
> repository nodes.  This doesn't require alter_dir on the parent before
> add/delete of children.

Actually, the Once Rule is way more important for server implementations
than client implementations. It effectively defines when the server can
commit changes to a node and assume no further changes will happen
later. That's quite a nice property to have when you're designing
caching strategies; especially if writes are orders of magnitude more
expensive than reads, which is the norm for distributed databases -- and
even for plain vanilla filesystems.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Mon, Sep 9, 2013 at 11:35 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Branko Čibej wrote:
>> Julian Foad wrote:
>>> The design of Ev2 is based on the concept of incremental edits to a "current" tree state.  I

Not really. The core/original design was "random access editing". Then
danielsh had a question about ordering, so I had to make a call.

I see no problem with saying the *source* of a copy/move is from the
original state, rather than the transient state.

>>> feel that the idea that you could start editing the tree, deleting subtrees, and then come to an operation that says "Now please recover one of the subtrees that I earlier told you to delete" doesn't fit with that philosophy.

You say "recover", but that is an implementation issue. As has been
pointed out, that is *trivial* for the FS.

The working copy *may* receive a move() or copy() in the future, and
we can cross that bridge when we get there. I think it is a long ways
away. And when we *do* get there, I don't see a problem with retaining
original state. (hopefully, we'll have stash/checkpoint by then, and
so tossing around whole-states will be a cakewalk)

>>> The model of operation of the "split-move" scheme is no more split than the model implied by the "single-move" scheme; it's just more explicit.  It doesn't in any way change or add to the overall semantic content of
>>> the edit, all it changes is the timing of the information, fore-warning the consumer that a

Please don't use the term "consumer". You have a Driver, and a
Receiver. Those are the names since the beginning, and I haven't seen
a reason to try and rename those terms.

When I see "consumer", I think "consumer of the API" and have no idea
which side you're talking about. So I have to stop and look for
context.

>>> forthcoming deletion is not to be regarded as final and
>>> absolute.  That fore-warning makes a sequential consumer implementation feasible.
>>
>> I think you're assuming that an implementation that doesn't keep track of the
>> initial state is simpler, or rather, "easier" to write. I don't agree with that
>> assumption. The repository already has all history available, and the WC can
>> "trivially" be taught to remember the initial state.
>
> Hmm, my comment about "makes it feasible" may have been unfounded: I agree that we could relatively easily implement a consumer that works efficiently with that scheme.  In one possible implementation, purely for the purpose of illustrating whether I've understood correctly, the "delete" operation would not delete the subtree permanently until the end of the edit, and until then the subtree would merely be moved aside or hidden from the current view, but still able to be used as a move source, traced from a reference to its "initial state" path.

Yup.

But for the near future: the *only* editor receiving a move() or
copy() will be the FS editor, via a series of commit-time editors.

> I still can't shake the feeling that it doesn't match the "sequential edit" philosophy.

Some sequencing is needed for parent/child purposes.

>  It seems to me that fundamentally "move away"

Never heard of it :-)

In regards to your note elsethread, you really want to call this
remember(). You would also need it for delete() operations, since a
child of a delete could be a move/copy source.

I maintain that it isn't needed. We have the original state --
trivially, in fact, in the only editor of near-term concern.

>...

Cheers,
-g

Re: Move using initial state

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:

> Julian Foad wrote:
>> The design of Ev2 is based on the concept of incremental edits to a "current" tree state.  I feel that the idea that you could start editing the tree, deleting subtrees, and then come to an operation that says "Now please recover one of the subtrees that I earlier told you to delete" doesn't fit with that philosophy.
>>
>> The model of operation of the "split-move" scheme is no more split than the model implied by the "single-move" scheme; it's just more explicit.  It doesn't in any way change or add to the overall semantic content of
>> the edit, all it changes is the timing of the information, fore-warning the consumer that a forthcoming deletion is not to be regarded as final and
>> absolute.  That fore-warning makes a sequential consumer implementation feasible.
> 
> I think you're assuming that an implementation that doesn't keep track of the
> initial state is simpler, or rather, "easier" to write. I don't agree with that
> assumption. The repository already has all history available, and the WC can
> "trivially" be taught to remember the initial state.

Hmm, my comment about "makes it feasible" may have been unfounded: I agree that we could relatively easily implement a consumer that works efficiently with that scheme.  In one possible implementation, purely for the purpose of illustrating whether I've understood correctly, the "delete" operation would not delete the subtree permanently until the end of the edit, and until then the subtree would merely be moved aside or hidden from the current view, but still able to be used as a move source, traced from a reference to its "initial state" path.

I still can't shake the feeling that it doesn't match the "sequential edit" philosophy.  It seems to me that fundamentally "move away" is very similar to "delete" and yet we're proposing to accord it a special privilege.  If we allow the edit driver to do that, then it feels like it should also be allowed to do things like create a new directory 'A/B' before creating the parent directory 'A' rather than the present requirement for doing everything in "build it up" order.

> I also don't agree that Ev2 design makes any assumptions about initial state.
> It's more likely that we're making too many assumptions about WC semantics.

Yes, I suspect you're right.  /me tries to get over that.

- Julian
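
The "one possible implementation" Julian describes, where a delete only hides a subtree until the end of the edit, might be sketched as follows. This is a minimal Python illustration, not Subversion code: the InitialStateReceiver class, the under() helper, the "replace" flag, and the use of initial paths as node ids are all assumptions.

```python
def under(path, root):
    """True if path is root itself or lies inside the subtree at root."""
    return path == root or path.startswith(root + "/")


class InitialStateReceiver:
    """Toy receiver: a move names its source by its path in the initial tree."""

    def __init__(self, paths):
        # Each node is labelled by its initial path (a hypothetical node id).
        self.initial = {p: p for p in paths}  # snapshot kept until the edit ends
        self.current = dict(self.initial)     # the view that the edit modifies

    def delete(self, path):
        # Only hides the subtree from the current view; the snapshot keeps it.
        for p in [p for p in self.current if under(p, path)]:
            del self.current[p]

    def move(self, src_initial, dst, replace=False):
        if replace:
            self.delete(dst)
        elif dst in self.current:
            raise ValueError(f"destination exists: {dst}")
        where = next((p for p, n in self.current.items() if n == src_initial), None)
        if where is not None:
            # Still visible, possibly at a new path: move it from there.
            base = where
            subtree = {p: n for p, n in self.current.items() if under(p, where)}
            for p in subtree:
                del self.current[p]
        else:
            # Hidden earlier: recover it from the snapshot, omitting any node
            # that is still visible elsewhere (the source of an earlier move).
            base = src_initial
            live = set(self.current.values())
            subtree = {p: n for p, n in self.initial.items()
                       if under(p, src_initial) and n not in live}
        for p, n in subtree.items():
            self.current[dst + p[len(base):]] = n


r = InitialStateReceiver(["A", "A/B", "A/B/C"])
r.delete("A/B")          # hides A/B and A/B/C from the current view
r.move("A/B/C", "A/C")   # the deleted child is still usable as a move source
print(sorted(r.current.items()))
```

The cost of this approach is visible in the constructor: the receiver keeps the whole initial snapshot for the duration of the edit, whether or not any later move refers to it.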

Re: Move using initial state

Posted by Branko Čibej <br...@wandisco.com>.

On 9 Sep 2013 16:22, "Julian Foad" <ju...@btopenworld.com> wrote:


> The design of Ev2 is based on the concept of incremental edits to a
> "current" tree state.  I feel that the idea that you could start editing
> the tree, deleting subtrees, and then come to an operation that says "Now
> please recover one of the subtrees that I earlier told you to delete"
> doesn't fit with that philosophy.
>
> The model of operation of the "split-move" scheme is no more split than
> the model implied by the "single-move" scheme; it's just more explicit.  It
> doesn't in any way change or add to the overall semantic content of
> the edit, all it changes is the timing of the information, fore-warning
> the consumer that a forthcoming deletion is not to be regarded as final and
> absolute.  That fore-warning makes a sequential consumer implementation
> feasible.

I think you're assuming that an implementation that doesn't keep track of
the initial state is simpler, or rather, "easier" to write. I don't agree
with that assumption. The repository already has all history available, and
the WC can "trivially" be taught to remember the initial state.

I also don't agree that Ev2 design makes any assumptions about initial
state. It's more likely that we're making too many assumptions about WC
semantics.

-- Brane

Re: Move using initial state

Posted by Julian Foad <ju...@btopenworld.com>.

Greg Stein wrote:

> On Fri, Sep 6, 2013 at 1:47 PM, Philip Martin wrote:
>> [...] I have shown how Ev2 with a split move could
>>  handle the case
>> 
>>     A/B/C to A
>>     A/B to A/B
>>     A to A/B/C
>> 
>>  What is your alternative?
> 
> move(A/B/C@original, A, replace=R)
> move(A/B@original, A/B)
> move(A@original, A/B/C)

Let me try to lay this to rest from another angle.

Look at step 1 of that suggested sequence of three operations.  The implicit definition of the single move operation suggested there is something like:

   "Find the subtree that was at path A/B/C at the start of this edit.
   It may still be in the current tree state, possibly at a different path,
   or it may have been deleted earlier in this edit in which case you'd
   better have access to a copy of it from before you deleted it.  If it
   still exists in the current tree state, then move it from its current
   path to 'A'; otherwise, recover a copy of it as it would have
   looked before it was deleted, taking care to omit any child that was
   the source of an earlier move, and write that to path 'A'."

The one significant difference between the "split" (move-away, move-here) scheme and the single-move (referring to initial state) scheme can be stated like this:

  Before sending a delete operation, do we declare whether that subtree
  will be used as the source of a move later in the edit?

In the example above, the first move deletes the original 'A' by replacing it.  In the second step, move(A/B@original, A/B) requires the edit consumer to find the original A/B, which was deleted by that earlier replacement.  In the single-move scheme, the consumer discovers this after having deleted it.

Now consider what it would mean if we amended the solution by inserting, right at the beginning of the edit, an operation saying just:

  move-away(A)

Overall, the edit would convey exactly the same information; no new information has been added, since that move-away was already implied by steps 2 and 3.  The only thing new is the timing of this information.  The consumer would know, before replacing 'A' in step 1, that the 'A' being replaced will be needed later.  The "split" scheme does exactly that, using the term 'move-here' instead of just 'move' for clarity.

In the "split" scheme, the consumer is informed of each move source before deleting it.  There is no other significant difference between the schemes.  (Whether the source and target are identified by a path relative to the current state or to the initial state, or given an arbitrary "id", is merely a detail of how the API semantics are codified.)

Both schemes are functionally correct.  Both schemes *could* be implemented.  But not with the same sequential temporal characteristics.

The design of Ev2 is based on the concept of incremental edits to a "current" tree state.  I feel that the idea that you could start editing the tree, deleting subtrees, and then come to an operation that says "Now please recover one of the subtrees that I earlier told you to delete" doesn't fit with that philosophy.

The model of operation of the "split-move" scheme is no more split than the model implied by the "single-move" scheme; it's just more explicit.  It doesn't in any way change or add to the overall semantic content of the edit, all it changes is the timing of the information, fore-warning the consumer that a forthcoming deletion is not to be regarded as final and absolute.  That fore-warning makes a sequential consumer implementation feasible.

Does that angle make sense, Greg?

- Julian
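
The "split" scheme compared above can be sketched with a receiver that retains only what it has been told to retain. This is a minimal Python illustration under stated assumptions: the SplitMoveReceiver class, the under() helper, and the choice to key the stash by source path are hypothetical, and move_away() here takes only a source, as in the move-away(A) example. It replays the three-node swap with every move source announced up front.

```python
def under(path, root):
    """True if path is root itself or lies inside the subtree at root."""
    return path == root or path.startswith(root + "/")


class SplitMoveReceiver:
    """Toy receiver that keeps only the subtrees announced by move_away."""

    def __init__(self, paths):
        self.current = {p: p for p in paths}  # path -> hypothetical node id
        self.stash = {}                       # source path -> saved subtree

    def delete(self, path):
        # Final: nothing was announced, so nothing needs to be kept.
        for p in [p for p in self.current if under(p, path)]:
            del self.current[p]

    def move_away(self, src):
        if src not in self.current:
            raise KeyError(src)
        subtree = {p: n for p, n in self.current.items() if under(p, src)}
        for p in subtree:
            del self.current[p]
        self.stash[src] = subtree

    def move_here(self, src, dst):
        if dst in self.current:
            raise ValueError(f"destination exists: {dst}")
        subtree = self.stash.pop(src)
        for p, n in subtree.items():
            self.current[dst + p[len(src):]] = n


r = SplitMoveReceiver(["A", "A/B", "A/B/C"])
for src in ("A/B/C", "A/B", "A"):   # announce every move source first
    r.move_away(src)
r.move_here("A/B/C", "A")
r.move_here("A/B", "A/B")
r.move_here("A", "A/B/C")
print(sorted(r.current.items()), r.stash)
```

In this model delete() can discard data immediately, because anything needed later has already been stashed. That is the timing difference between the two schemes; the final tree is the same either way.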

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Sat, Sep 7, 2013 at 4:25 AM, Philip Martin
<ph...@wandisco.com> wrote:
> Greg Stein <gs...@gmail.com> writes:
>
>> On Fri, Sep 6, 2013 at 1:47 PM, Philip Martin
>>> Two people at least.  I have shown how Ev2 with a split move could
>>> handle the case
>>>
>>>    A/B/C to A
>>>    A/B to A/B
>>>    A to A/B/C
>>>
>>> What is your alternative?
>
> How does your suggestion work? Start with
>
> NODES   local_relpath revision status   repos_path

You asked me how the API would work.

"The NODES table doesn't support it" is not a response.

Further: why would moves even be reported to the client? Moves are
operations for the *repository* to remember. The client could care
less, and Branko also pointed out the difficulty of trying to do moves
within a mixed-revision working copy.

"But. But. Are you saying that Ev2 can't be applied uniformly
everywhere? That we can't move() a working copy?" .. Yeah. So what.

We need to describe certain transformations. That process can and
should be *similar* so that experience with one, can be carried to the
next. It doesn't mean absolutism. Every driver knows every receiver.
Or if you don't like that coupling, then just say every driver knows
the tree state of every receiver and knows that trying to drive moves
in a mix-rev receiver is gonna monkey stuff up fast.

>...
>> move(A/B/C@original, A, replace=R)
>
> What does the receiver do?  I suppose it could implement the replace and
> move the replaced nodes to some temporary table:

We discussed this already. The receiver needs to retain the original
state. We even suggested a variant parameter to say "hey. you likely
need to retain the original state, so go do some extra work."

So yes, there is some work in wc.db.

>...
>> Not sure of the intent with children (ie. what is retained under A/B/C).
>
> What children?  Every node gets moved.

Fine. Was just asking, if there were any further thoughts re: children.

>...

Cheers,
-g

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Greg Stein <gs...@gmail.com> writes:

> On Mon, Sep 9, 2013 at 11:29 AM, Julian Foad <ju...@btopenworld.com> wrote:
>> Philip Martin wrote:
>>> The current Ev2 has atomic add for files and directories, it doesn't
>>> attempt to reuse the alter operations by adding "empty" nodes and then
>>> altering those empty nodes.  The current Ev2 also has move and copy
>>> operations that do attempt to reuse alter.  I'm not clear why they are
>>> different.  Why is add different from move/copy?
>
> Combinatorics.
>
> It is striking a balance between atomicity, and combinatoric growth.
> The API doesn't have to be "pure"... it needs to work.
>
> (iow, we can skip the Second System Syndrome and get stuff done,
> rather than perfect)
>
> I don't think it poses a serious problem. But yeah... if it does, then
> we'd want to expand move/copy with the variants.

A node that has been moved and is going to be altered is in the "wrong"
state between the move and the alter.  The old node at the new location
doesn't necessarily have the correct contents for that node.  In working
copy terms the moved nodes need to be marked status=incomplete between move
and alter.  This applies to individual files and whole directory trees.
The incomplete status gets cleared as the nodes are altered or when the
drive is complete.  Ev1 doesn't have incomplete files so having
incomplete files is a new complication in Ev2.

>>>     move_here_file(src_path, dst_path, properties, content, replaces_rev)
>>>     move_here_dir(src_path, dst_path, properties, children, replaces_rev)
>>>     copy_file(src_path, src_rev, dst_path, properties, content, replaces_rev)
>>>     copy_dir(src_path, src_rev, dst_path, properties, children, replaces_rev)
>
> Don't forget the symlink variants!

Symlinks and files are easy, it's directories that are the problem: the
whole tree is incomplete between move and alter.  It's not sensible to
attempt to transfer all the tree alterations as part of the directory
move, that would be an unbounded amount of data.  We are just going to
have to handle this incomplete tree state.  That means there is little
point introducing an atomic file/symlink move since we will still have
to handle non-atomic file moves as children of a moved directory.

It might be useful for Ev2 to have a subtree-complete operation.  Nodes
in the destination tree that do not get explicitly altered remain
incomplete until the drive is over.  A subtree-complete operation would
allow them to be complete earlier.

I'm still unsure of the ordering rules.  The 3 node depth swap

   A/B/C@N moved to A
   A/B@N moved to A/B
   A@N moved to A/B/C

appears to lead to:

   alter_dir ., children=A               # is this needed?
   move_away A/B/C, A
   move_away A/B, A/B
   move_away A, A/B/C
   move_here A/B/C, A, replaces_rev=-1
   alter_dir A, children=B
   move_here A/B, A/B, replaces_rev=-1
   alter_dir A/B, children=C             # is this needed?
   move_here A, A/B/C, replaces_rev=-1
   alter_dir A/B/C, children=

or since you don't appear to want to split move into away/here perhaps

   alter_dir ., children=A               # is this needed?
   move_here A/B/C, A,
   alter_dir A, children=B
   move_here A/B, A/B,
   alter_dir A/B, children=C             # is this needed?
   move_here A, A/B/C,
   alter_dir A/B/C, children=

I don't know if Ev2 requires alter_dir on a directory that is having a
child replaced.  Does a child replace constitute an edit of a parent
directory?  The alter_dir calls for A and A/B/C are necessary because
the list of children is changing.

If I split move into away/here then this rule:

 * Example: mv A@N to B; mv C@M to A. The second move cannot be marked as
 * a "replacing" move since it is not replacing A. The node at A was moved
 * away. The second operation is simply moving C to the now-empty path
 * known as A.

would mean that replaces_rev is not set.  It's not so clear how we apply
that rule if we don't split move.  I suppose it should be the same as
the split case so none of the moves set replaces_rev?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*
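
The incomplete-status bookkeeping Philip describes, including the proposed subtree-complete operation, can be sketched as below. This is a minimal Python illustration, not the wc.db schema or the Ev2 API: the StatusTracker class, its method names, and the two status strings are assumptions chosen for the example.

```python
def under(path, root):
    """True if path is root itself or lies inside the subtree at root."""
    return path == root or path.startswith(root + "/")


class StatusTracker:
    """Toy model of node status between a move and the alters that follow."""

    def __init__(self, paths):
        self.status = {p: "normal" for p in paths}

    def move(self, src, dst):
        # The receiver cannot yet know which moved nodes will be altered,
        # so the whole moved tree is incomplete until told otherwise.
        for p in [p for p in self.status if under(p, src)]:
            del self.status[p]
            self.status[dst + p[len(src):]] = "incomplete"

    def alter(self, path):
        # An alter delivers the node's final content, so it is complete.
        self.status[path] = "normal"

    def subtree_complete(self, path):
        # Hypothetical operation: no further alters will arrive under path.
        for p in self.status:
            if under(p, path):
                self.status[p] = "normal"


t = StatusTracker(["D", "D/f", "D/g"])
t.move("D", "E")
t.alter("E/f")
before = dict(t.status)     # E and E/g are still incomplete here
t.subtree_complete("E")
print(before, t.status)
```

Without subtree_complete(), E and E/g in this model would stay incomplete until the end of the drive, even though nothing further is going to change them.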

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Mon, Sep 9, 2013 at 11:29 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Philip Martin wrote:
>> The current Ev2 has atomic add for files and directories, it doesn't
>> attempt to reuse the alter operations by adding "empty" nodes and then
>> altering those empty nodes.  The current Ev2 also has move and copy
>> operations that do attempt to reuse alter.  I'm not clear why they are
>> different.  Why is add different from move/copy?

Combinatorics.

It is striking a balance between atomicity, and combinatoric growth.
The API doesn't have to be "pure"... it needs to work.

(iow, we can skip the Second System Syndrome and get stuff done,
rather than perfect)

I don't think it poses a serious problem. But yeah... if it does, then
we'd want to expand move/copy with the variants.

>...
>>     move_here_file(src_path, dst_path, properties, content, replaces_rev)
>>     move_here_dir(src_path, dst_path, properties, children, replaces_rev)
>>     copy_file(src_path, src_rev, dst_path, properties, content, replaces_rev)
>>     copy_dir(src_path, src_rev, dst_path, properties, children, replaces_rev)

Don't forget the symlink variants!

> Yup.  Alternatively, we could make 'alter' the *only* way to declare a node's content:

And now you lose much of the atomicity.

Cheers,
-g

Re: Move using initial state

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:
> The current Ev2 has atomic add for files and directories, it doesn't
> attempt to reuse the alter operations by adding "empty" nodes and then
> altering those empty nodes.  The current Ev2 also has move and copy
> operations that do attempt to reuse alter.  I'm not clear why they are
> different.  Why is add different from move/copy?

I agree there's incomplete separation of concerns in these aspects of the API.  I don't know whether that's a problem.

> We could split move in two and make the parts more like delete and
> add. We could also make copy more like add:
> 
>     add_file(path, properties, content, replaces_rev)
>     add_dir(path, properties, children, replaces_rev)
>     alter_file(path, properties, content, replaces_rev)
>     alter_dir(path, properties, children, replaces_rev)
>     delete(path, revision)
>     move_away(src_path, dst_path, revision)
>     move_here_file(src_path, dst_path, properties, content, replaces_rev)
>     move_here_dir(src_path, dst_path, properties, children, replaces_rev)
>     copy_file(src_path, src_rev, dst_path, properties, content, replaces_rev)
>     copy_dir(src_path, src_rev, dst_path, properties, children, replaces_rev)

Yup.  Alternatively, we could make 'alter' the *only* way to declare a node's content:

  add_file(path)  # new empty file
  add_dir(path)   # new empty dir
  copy(src_path, dst_path)
  move(...)
  alter_file(path, properties, content, replaces_rev)
  alter_dir(path, properties, children, replaces_rev)

Or with just a single add_ function:

  add_node(path)  # declare new node; its kind to be set by alter_*.
  copy(src_path, dst_path)
  move(...)
  alter_file(path, properties, content, replaces_rev)
  alter_dir(path, properties, children, replaces_rev)

- Julian

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@btopenworld.com> writes:

> Branko Čibej wrote:
>
>> Can we please stop arguing about API semantics in terms of wc.db
>> implementation details? They bring nothing useful into the discussion.
>> Design the API first, /then/ worry about how to implement it.
>
> While you and I are among those who prefer to discuss the design in
> terms of semantic models, I defend Philip's right to present arguments
> in the form of quasi-concrete implementation questions.  These should
> then be analyzed to determine what semantic model issues they bring to
> light.

The current Ev2 has atomic add for files and directories, it doesn't
attempt to reuse the alter operations by adding "empty" nodes and then
altering those empty nodes.  The current Ev2 also has move and copy
operations that do attempt to reuse alter.  I'm not clear why they are
different.  Why is add different from move/copy?

We could split move in two and make the parts more like delete and
add. We could also make copy more like add:

    add_file(path, properties, content, replaces_rev)
    add_dir(path, properties, children, replaces_rev)
    alter_file(path, properties, content, replaces_rev)
    alter_dir(path, properties, children, replaces_rev)
    delete(path, revision)
    move_away(src_path, dst_path, revision)
    move_here_file(src_path, dst_path, properties, content, replaces_rev)
    move_here_dir(src_path, dst_path, properties, children, replaces_rev)
    copy_file(src_path, src_rev, dst_path, properties, content, replaces_rev)
    copy_dir(src_path, src_rev, dst_path, properties, children, replaces_rev)
    
-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Julian Foad <ju...@btopenworld.com>.

Branko Čibej wrote:

> Can we please stop arguing about API semantics in terms of wc.db
> implementation details? They bring nothing useful into the discussion.
> Design the API first, /then/ worry about how to implement it.

While you and I are among those who prefer to discuss the design in terms of semantic models, I defend Philip's right to present arguments in the form of quasi-concrete implementation questions.  These should then be analyzed to determine what semantic model issues they bring to light.

- Julian

Re: Move using initial state

Posted by Branko Čibej <br...@wandisco.com>.

On 07.09.2013 11:25, Philip Martin wrote:
> Greg Stein <gs...@gmail.com> writes:
>
>> On Fri, Sep 6, 2013 at 1:47 PM, Philip Martin
>>> Two people at least.  I have shown how Ev2 with a split move could
>>> handle the case
>>>
>>>    A/B/C to A
>>>    A/B to A/B
>>>    A to A/B/C
>>>
>>> What is your alternative?
> How does your suggestion work? Start with
>
> NODES   local_relpath revision status   repos_path
>           A              6     normal       A
>           A/B            6     normal       A/B
>           A/B/C          6     normal       A/B/C

Can we please stop arguing about API semantics in terms of wc.db
implementation details? They bring nothing useful into the discussion.
Design the API first, /then/ worry about how to implement it.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Sat, Sep 7, 2013 at 5:08 AM, Philip Martin
<ph...@wandisco.com> wrote:
>...
> Note that we have to save the original nodes in the temporary table even
> though we don't yet know that they will be needed.  In this case a later
> move is going to refer to original A but we don't know that is going to
> happen.  For this to work all drives have to pre-emptively store all
> replaced nodes until the end of the edit.

That was asked and answered last month. Maybe two months ago.

-g

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Philip Martin <ph...@wandisco.com> writes:

> Greg Stein <gs...@gmail.com> writes:
>
>> move(A/B/C@original, A, replace=R)
>
> What does the receiver do?  I suppose it could implement the replace and
> move the replaced nodes to some temporary table:
>
> NODES   local_relpath revision status   repos_path
>           A              6     normal       A/B/C
>
> and
>
> TEMP      A              6     normal       A
>           A/B            6     normal       A/B
>           A/B/C          6     not-present  A/B/C

Note that we have to save the original nodes in the temporary table even
though we don't yet know that they will be needed.  In this case a later
move is going to refer to original A but we don't know that is going to
happen.  For this to work all drives have to pre-emptively store all
replaced nodes until the end of the edit.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Greg Stein <gs...@gmail.com> writes:

> On Fri, Sep 6, 2013 at 1:47 PM, Philip Martin
>> Two people at least.  I have shown how Ev2 with a split move could
>> handle the case
>>
>>    A/B/C to A
>>    A/B to A/B
>>    A to A/B/C
>>
>> What is your alternative?

How does your suggestion work? Start with

NODES   local_relpath revision status   repos_path
          A              6     normal       A
          A/B            6     normal       A/B
          A/B/C          6     normal       A/B/C


> move(A/B/C@original, A, replace=R)

What does the receiver do?  I suppose it could implement the replace and
move the replaced nodes to some temporary table:

NODES   local_relpath revision status   repos_path
          A              6     normal       A/B/C

and

TEMP      A              6     normal       A
          A/B            6     normal       A/B
          A/B/C          6     not-present  A/B/C

but note the repos_path for the new A in NODES.  We can't simply change
it to:

NODES   local_relpath revision status   repos_path
          A              6     normal       A

as that row would be invalid: wrong properties, no A@6 in the
repository.  Let's leave it as A/B/C.

> move(A/B@original, A/B)

Then it moves the relevant rows out of the temporary table:

NODES   local_relpath revision status   repos_path
          A              6     normal       A/B/C
          A/B            6     normal       A/B
          A/B/C          6     not-present  A/B/C

and

TEMP      A              6     normal       A

So now A/B in NODES is switched relative to A. It's not even our
standard switch because A in the working copy is A/B/C in the repository
and the repository node has no child B.

> move(A@original, A/B/C)

Move the final row out of the temporary table

NODES   local_relpath revision status   repos_path
          A              6     normal       A/B/C
          A/B            6     normal       A/B
          A/B/C          6     normal       A

So now we have two of these strange switches.

> Not sure of the intent with children (ie. what is retained under A/B/C).

What children?  Every node gets moved.

Now we need the alter calls, these can fix up the switches:

alter_dir(A, children=B)

NODES   local_relpath revision status   repos_path
          A              8     normal       A
          A/B            6     normal       A/B
          A/B/C          6     normal       A

alter_dir(A/B, children=C)

NODES   local_relpath revision status   repos_path
          A              8     normal       A
          A/B            8     normal       A/B
          A/B/C          6     normal       A

alter_dir(A/B/C, children=)

NODES   local_relpath revision status   repos_path
          A              8     normal       A
          A/B            8     normal       A/B
          A/B/C          8     normal       A/B/C

Is that the plan?  NODES goes through those intermediate states with
switched nodes?
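
One way to make "switched" concrete for the tables above is to check each row against its parent. This is a minimal Python sketch, not wc.db code. It assumes each table is reduced to local_relpath -> repos_path (revision and status are left out), and that a top-level row is expected to have a repos_path equal to its local_relpath. The name `mismatched` is hypothetical.

```python
# Each table maps local_relpath -> repos_path, copied from the NODES
# tables in this message (revision and status are omitted in this sketch).
STATES = [
    ("initial", {"A": "A", "A/B": "A/B", "A/B/C": "A/B/C"}),
    ("move(A/B/C@original, A, replace=R)", {"A": "A/B/C"}),
    ("move(A/B@original, A/B)",
     {"A": "A/B/C", "A/B": "A/B", "A/B/C": "A/B/C"}),
    ("move(A@original, A/B/C)",
     {"A": "A/B/C", "A/B": "A/B", "A/B/C": "A"}),
    ("alter_dir(A, children=B)",
     {"A": "A", "A/B": "A/B", "A/B/C": "A"}),
    ("alter_dir(A/B, children=C)",
     {"A": "A", "A/B": "A/B", "A/B/C": "A"}),
    ("alter_dir(A/B/C, children=)",
     {"A": "A", "A/B": "A/B", "A/B/C": "A/B/C"}),
]


def mismatched(nodes):
    """Return the rows whose repos_path is not what the parent row implies."""
    out = []
    for path, repos_path in sorted(nodes.items()):
        parent, _, name = path.rpartition("/")
        if parent in nodes:
            # A child is expected to live directly under its parent's repos_path.
            expected = nodes[parent] + "/" + name
        else:
            # Assumption: a top-level row is anchored at its own local path.
            expected = path
        if repos_path != expected:
            out.append(path)
    return out


def report():
    """Pair each step label with its list of mismatched rows."""
    return [(label, mismatched(nodes)) for label, nodes in STATES]


if __name__ == "__main__":
    for label, rows in report():
        print(f"{label}: {rows}")
```

In this model only the first and last tables are free of mismatched rows. Every intermediate table has at least one, and the check also flags the top-level A while its repos_path is A/B/C.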

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Sep 6, 2013 at 1:47 PM, Philip Martin
<ph...@wandisco.com> wrote:
> Greg Stein <gs...@gmail.com> writes:
>
>> On Fri, Sep 6, 2013 at 10:50 AM, Philip Martin
>> <ph...@wandisco.com> wrote:
>>>...
>>> again either invalid or switched. This implies that if we want to
>>> combine
>>
>> They are already combined. One person is trying to *decombine* them
>> into separate non-atomic unknown-duration actions.
>
> Two people at least.  I have shown how Ev2 with a split move could
> handle the case
>
>    A/B/C to A
>    A/B to A/B
>    A to A/B/C
>
> What is your alternative?

move(A/B/C@original, A, replace=R)
move(A/B@original, A/B)
move(A@original, A/B/C)

Not sure of the intent with children (ie. what is retained under A/B/C).

Cheers,
-g

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Greg Stein <gs...@gmail.com> writes:

> On Fri, Sep 6, 2013 at 10:50 AM, Philip Martin
> <ph...@wandisco.com> wrote:
>>...
>> again either invalid or switched. This implies that if we want to
>> combine
>
> They are already combined. One person is trying to *decombine* them
> into separate non-atomic unknown-duration actions.

Two people at least.  I have shown how Ev2 with a split move could
handle the case

   A/B/C to A
   A/B to A/B
   A to A/B/C

What is your alternative?

>
>>
>>      move_away A, id=1
>>      move_here id=1, B
>>
>> into a single
>>
>>      move A, B
>>
>> then move and alter need to be combined:
>>
>>      move_dir  A, B, children=, props=
>>      move_file A, B, checksum=, props=
>
> Well, that is one possibility. But then you also need move_symlink(A,
> B, target=, props=). And if we ever add a fourth node type... another
> entrypoint.

Yes. That's what we do.

> Same issue for copy().
>
> It was a difficult decision in the Ev2 design.

I suspect copy needs to change as well.   If I start with A@6 and copy A
to B and modify B then the Ev2 sequence

   copy A B
   alter_dir B properties=n:v
   alter_dir . children=A,B

doesn't work in the update editor. (I could have put "alter_dir ."
earlier, there is no order restriction, but it makes no difference.)

I start with NODES

    path rev  status  repo
     A    6   normal   A

the copy either gives

    path rev  status  repo
     A    6   normal   A
     B    6   normal   A

which has B switched, or it gives

    path rev  status  repo
     A    6   normal   A
     B    6   normal   B

which has an invalid node B@6.  Neither of those will update to the
desired final state

    path rev  status  repo
     A    8   normal   A
     B    8   normal   B
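
The two failure modes above can be told apart mechanically. This is a minimal Python sketch, assuming the repository is modelled as a map from revision to the set of paths that exist in it. `REPOS` holds only the r6 and r8 contents implied by this example, and `classify` is a hypothetical helper, not a Subversion function.

```python
# Assumption: paths that exist in the repository at each revision,
# taken from the example (A at r6; A and B at r8).
REPOS = {6: {"A"}, 8: {"A", "B"}}


def classify(local_relpath, rev, repos_path):
    """Classify a NODES row as 'valid', 'switched' or 'invalid'."""
    if repos_path not in REPOS.get(rev, set()):
        # The row claims a repository node that does not exist at that revision.
        return "invalid"
    if repos_path != local_relpath:
        # The node exists, but not at the path the working copy implies.
        return "switched"
    return "valid"


if __name__ == "__main__":
    for row in [("A", 6, "A"), ("B", 6, "A"), ("B", 6, "B"), ("B", 8, "B")]:
        print(row, classify(*row))
```

Only the row B@8 with repos path B is classified as valid here, and a plain copy cannot produce it because it does not change the revision.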

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Sep 6, 2013 at 10:50 AM, Philip Martin
<ph...@wandisco.com> wrote:
>...
> again either invalid or switched. This implies that if we want to
> combine

They are already combined. One person is trying to *decombine* them
into separate non-atomic unknown-duration actions.

>
>      move_away A, id=1
>      move_here id=1, B
>
> into a single
>
>      move A, B
>
> then move and alter need to be combined:
>
>      move_dir  A, B, children=, props=
>      move_file A, B, checksum=, props=

Well, that is one possibility. But then you also need move_symlink(A,
B, target=, props=). And if we ever add a fourth node type... another
entrypoint.

Same issue for copy().

It was a difficult decision in the Ev2 design.

Cheers,
-g

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Philip Martin <ph...@wandisco.com> writes:

> This implies that if we want to
> combine
>
>      move_away A, id=1
>      move_here id=1, B
>
> into a single
>
>      move A, B
>
> then move and alter need to be combined:
>
>      move_dir  A, B, children=, props=
>      move_file A, B, checksum=, props=

I didn't mean to imply that children, props, checksum were NULL. It
would have been better if I had written:

       move_dir  A, B, children=..., props=...
       move_file A, B, checksum=..., props=...

I suppose we could allow

       move A, B

in cases where B is not altered, i.e. a short form for move_dir that
doesn't change children or properties, and a short form for a move_file
that doesn't change checksum or properties.  That's probably not a
useful form.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

"Bert Huijben" <be...@qqmail.nl> writes:

> Shouldn't alter_dir get the complete list of directories when children are
> added/removed?

Yes, I didn't mean to imply anything else.  If alter_dir is called it
must get a complete list of children. However alter_dir doesn't have to
occur before children are added/deleted, it can happen between the adds
and deletes or after them.  If children are only replaced it doesn't
have to occur at all.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

RE: Move using initial state

Posted by Bert Huijben <be...@qqmail.nl>.


> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: vrijdag 6 september 2013 17:50
> To: Greg Stein
> Cc: dev@subversion.apache.org
> Subject: Re: Move using initial state
> 
> Philip Martin <ph...@wandisco.com> writes:
> 
> > What about alter_dir?  I think the rule is that alter_dir on a directory
> > should occur before add or delete affects the children of the directory.
> > There is also a rule:
> >
> >  * - The ancestor of an added, copied-here, moved-here, or
> >  *   modified node may not be deleted. The ancestor may not be moved
> >  *   (instead: perform the move, *then* the edits).
> 
> I've been thinking about alter_dir and I see no reason, in the update
> editor at least, for a rule that requires alter_dir before adding or
> removing children.  The Ev2 "once" rule is designed to ensure that Ev2
> actions can be applied to the nodes in the working copy as the actions
> are received and that the working copy nodes will always reflect
> repository nodes.  This doesn't require alter_dir on the parent before
> add/delete of children.

Shouldn't alter_dir get the complete list of directories when children are
added/removed?

With editor v1 the list of children *must be* updated before we 'close' the
directory, by removing the incomplete flag. If we don't do this then an
interrupted update can't be restarted.

A similar scheme should also apply for editor v2. If children are added or
removed without bringing the parent in a 'being modified state', it is
neither in the original revision, nor in the final revision.

The alter_dir method would allow bringing it to completion directly by
giving it the complete final revision state, with the knowledge about which
children need a separate update to complete the entire tree.

	Bert

> 
> Consider a working copy with three nodes:
> 
>    A@6
>    A/B@6
>    A/C@6
> 
> that gets updated to
> 
>    A@8
>    A/D@8
>    A/E@8
> 
> That's two adds, two deletes and an alter and the update editor can
> handle them in any order, even this order:
> 
>    add_dir A/D
>    delete A/B
>    alter_dir A, children=D,E
>    add_dir A/E
>    delete A/C
> 
> Let's see how NODES would work:
> 
>      relpath  rev  status
>      A         6   normal
>      A/B       6   normal
>      A/C       6   normal
> 
>    add_dir A/D
> 
>      relpath  rev  status
>      A         6   normal
>      A/B       6   normal
>      A/C       6   normal
>      A/D       8   normal
> 
>    delete A/B
> 
>      relpath  rev  status
>      A         6   normal
>      A/B       6   not-present
>      A/C       6   normal
>      A/D       8   normal
> 
>    alter_dir A
> 
>      relpath  rev  status
>      A         8   normal
>      A/C       6   normal
>      A/D       8   normal
>      A/E       8   incomplete
> 
>    add_dir A/E
> 
>      relpath  rev  status
>      A         8   normal
>      A/C       6   normal
>      A/D       8   normal
>      A/E       8   normal
> 
>    delete A/C
> 
>      relpath  rev  status
>      A         8   normal
>      A/D       8   normal
>      A/E       8   normal
> 
> Every intermediate state has NODES rows that reflect repository nodes.
> If interrupted every intermediate state can be correctly updated to
> either r6, r8 or any other revision.
> 
> The delete introduces a not-present node if the parent revision is
> different from the target revision, otherwise it simply removes the
> node.
> 
> The alter removes any not-present children and introduces incomplete for
> any missing children.
> 
> Any children that are replaced, i.e. add with replaces-rev set, do not
> require alter_dir on the parent at all, although some other change to
> the parent may require it.
> 
> > It's not clear where alter_dir should occur w.r.t the moves in my
> > example.  Does alter_dir count as an edit that should occur after move?
> > Do we pass initial state paths:
> >
> >    alter_dir .,     children='A'
> >    alter_dir A,     children=''
> >    alter_dir A/B,   children='C'
> >    alter_dir A/B/C, children='B'
> >
> > or final_state paths:
> >
> >    alter_dir .,     children='A'
> >    alter_dir A,     children='B'
> >    alter_dir A/B,   children='C'
> >    alter_dir A/B/C, children=''
> 
> So we don't necessarily have to do alter_dir on the parent before moving
> children.  What about alter on the moved node itself?  Perhaps we do
> that between move_away and move_here.
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         6   normal       A/B
>      A/B/C       6   normal       A/B/C
> 
>    move_away A/B/C, id=1
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         6   normal       A/B
>      A/B/C       6   not-present  A/B/C
> 
>    move_away A/B, id=2
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         6   not-present  A/B
> 
>    move_away A, id=3
> 
>      relpath    rev  status       repo
>      A           6   not-present  A
> 
>    alter_dir id=1, children=B
>    move_here id=1, A
> 
>      relpath    rev  status      repo
>      A           8   normal      A
>      A/B         8   incomplete  A/B
> 
>    alter_dir id=2, children=C
>    move_here id=2, A/B
> 
>      relpath    rev  status      repo
>      A           8   normal      A
>      A/B         8   normal      A/B
>      A/B/C       8   incomplete  A/B/C
> 
>    alter_dir id=3, children=
>    move_here id=3, A/B/C
> 
>      relpath    rev  status      repo
>      A           8   normal      A
>      A/B         8   normal      A/B
>      A/B/C       8   normal      A/B/C
> 
> That looks like the set of NODES tables that we want. At each stage the
> NODES rows reflect nodes in the repository and if interrupted an update
> to any revision is possible.
> 
> This also means that NODES.repos_path and NODES.revision in the NODES
> table always reflect nodes in the repository.  If we try to do alter
> before or after move we end up with things that look switched or nodes
> that are not valid.  Consider an update that moves A/B to A/C:
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         6   normal       A/B
> 
> If we move before alter we either get
> 
>    move A/B A/C
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/C         6   normal       A/C
> 
> or
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/C         6   normal       A/B
> 
> The first has an invalid row A/C@6 the second has A/C switched.  If we
> alter before move we either get
> 
>    alter_dir C, children=
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         8   normal       A/B
> 
> or
> 
>      relpath    rev  status       repo
>      A           6   normal       A
>      A/B         8   normal       A/C
> 
> again either invalid or switched. This implies that if we want to
> combine
> 
>      move_away A, id=1
>      move_here id=1, B
> 
> into a single
> 
>      move A, B
> 
> then move and alter need to be combined:
> 
>      move_dir  A, B, children=, props=
>      move_file A, B, checksum=, props=
> 
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Branko Čibej <br...@wandisco.com>.

On 06.09.2013 18:56, Johan Corveleyn wrote:
> I don't fully grok the Ev2 ideas and discussions, so not sure this is
> relevant, but please do remember this little detail: for working
> copies on case-insensitive filesystems, it's important that deletes
> are executed before adds (for handling case-only renames: 'svn mv foo
> Foo'). 

Just note that a case-only rename in Ev2 does not involve a delete at
all. But yes, this is important not just for editor driver
implementations, but also for the backwards compatibility shims which
have to replace a move with a copy+delete.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Johan Corveleyn <jc...@gmail.com> writes:

> I don't fully grok the Ev2 ideas and discussions, so not sure this is
> relevant, but please do remember this little detail: for working
> copies on case-insensitive filesystems, it's important that deletes
> are executed before adds (for handling case-only renames: 'svn mv foo
> Foo').

That's not a problem, Ev2 gives the same freedom as Ev1 to send deletes
before adds.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Johan Corveleyn <jc...@gmail.com>.

On Fri, Sep 6, 2013 at 5:50 PM, Philip Martin
<ph...@wandisco.com> wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> What about alter_dir?  I think the rule is that alter_dir on a directory
>> should occur before add or delete affects the children of the directory.
>> There is also a rule:
>>
>>  * - The ancestor of an added, copied-here, moved-here, or
>>  *   modified node may not be deleted. The ancestor may not be moved
>>  *   (instead: perform the move, *then* the edits).
>
> I've been thinking about alter_dir and I see no reason, in the update
> editor at least, for a rule that requires alter_dir before adding or
> removing children.  The Ev2 "once" rule is designed to ensure that Ev2
> actions can be applied to the nodes in the working copy as the actions
> are received and that the working copy nodes will always reflect
> repository nodes.  This doesn't require alter_dir on the parent before
> add/delete of children.
>
> Consider a working copy with three nodes:
>
>    A@6
>    A/B@6
>    A/C@6
>
> that gets updated to
>
>    A@8
>    A/D@8
>    A/E@8
>
> That's two adds, two deletes and an alter and the update editor can
> handle them in any order, even this order:
>
>    add_dir A/D
>    delete A/B
>    alter_dir A, children=D,E
>    add_dir A/E
>    delete A/C

I don't fully grok the Ev2 ideas and discussions, so not sure this is
relevant, but please do remember this little detail: for working
copies on case-insensitive filesystems, it's important that deletes
are executed before adds (for handling case-only renames: 'svn mv foo
Foo').

-- 
Johan

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Philip Martin <ph...@wandisco.com> writes:

> What about alter_dir?  I think the rule is that alter_dir on a directory
> should occur before add or delete affects the children of the directory.
> There is also a rule:
>
>  * - The ancestor of an added, copied-here, moved-here, or
>  *   modified node may not be deleted. The ancestor may not be moved
>  *   (instead: perform the move, *then* the edits).

I've been thinking about alter_dir and I see no reason, in the update
editor at least, for a rule that requires alter_dir before adding or
removing children.  The Ev2 "once" rule is designed to ensure that Ev2
actions can be applied to the nodes in the working copy as the actions
are received and that the working copy nodes will always reflect
repository nodes.  This doesn't require alter_dir on the parent before
add/delete of children.

Consider a working copy with three nodes:

   A@6
   A/B@6
   A/C@6

that gets updated to

   A@8
   A/D@8
   A/E@8

That's two adds, two deletes and an alter and the update editor can
handle them in any order, even this order:

   add_dir A/D
   delete A/B
   alter_dir A, children=D,E
   add_dir A/E
   delete A/C

Let's see how NODES would work:

     relpath  rev  status
     A         6   normal
     A/B       6   normal
     A/C       6   normal

   add_dir A/D

     relpath  rev  status
     A         6   normal
     A/B       6   normal
     A/C       6   normal
     A/D       8   normal

   delete A/B

     relpath  rev  status
     A         6   normal
     A/B       6   not-present
     A/C       6   normal
     A/D       8   normal

   alter_dir A

     relpath  rev  status
     A         8   normal
     A/C       6   normal
     A/D       8   normal
     A/E       8   incomplete

   add_dir A/E

     relpath  rev  status
     A         8   normal
     A/C       6   normal
     A/D       8   normal
     A/E       8   normal

   delete A/C

     relpath  rev  status
     A         8   normal
     A/D       8   normal
     A/E       8   normal

Every intermediate state has NODES rows that reflect repository nodes.
If interrupted every intermediate state can be correctly updated to
either r6, r8 or any other revision.

The delete introduces a not-present node if the parent revision is
different from the target revision, otherwise it simply removes the
node.

The alter removes any not-present children and introduces incomplete for
any missing children.

Any children that are replaced, i.e. add with replaces-rev set, do not
require alter_dir on the parent at all, although some other change to
the parent may require it.
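
The walkthrough and the three rules above can be sketched as a small Python simulation. It is only an illustration under stated assumptions: the NODES table is reduced to a dict of relpath -> (revision, status), the function names mirror the Ev2 calls but are hypothetical stand-ins rather than the real editor API, and the target revision is fixed at 8.

```python
TARGET_REV = 8  # revision the update is heading to (from the example)


def parent_of(path):
    """Return the parent relpath, or '' for a top-level node."""
    return path.rsplit("/", 1)[0] if "/" in path else ""


def add_dir(nodes, path):
    # An add creates the row at the target revision, or completes a row
    # that an earlier alter_dir left as 'incomplete'.
    nodes[path] = (TARGET_REV, "normal")


def delete(nodes, path):
    # Rule: leave a not-present row if the parent is not yet at the
    # target revision, otherwise simply remove the row.
    parent_rev = nodes[parent_of(path)][0]
    if parent_rev != TARGET_REV:
        nodes[path] = (nodes[path][0], "not-present")
    else:
        del nodes[path]


def alter_dir(nodes, path, children):
    # Rule: bump the directory, drop not-present children, and add an
    # 'incomplete' placeholder for each listed child that has no row yet.
    nodes[path] = (TARGET_REV, "normal")
    for child in [p for p in nodes if parent_of(p) == path]:
        if nodes[child][1] == "not-present":
            del nodes[child]
    for name in children:
        nodes.setdefault(f"{path}/{name}", (TARGET_REV, "incomplete"))


OPS = [
    (add_dir, "A/D"),
    (delete, "A/B"),
    (alter_dir, "A", ["D", "E"]),
    (add_dir, "A/E"),
    (delete, "A/C"),
]

FINAL = {"A": (8, "normal"), "A/D": (8, "normal"), "A/E": (8, "normal")}


def run(ops):
    """Apply ops, in the given order, to the r6 working copy of the example."""
    nodes = {"A": (6, "normal"), "A/B": (6, "normal"), "A/C": (6, "normal")}
    for op, *args in ops:
        op(nodes, *args)
    return nodes
```

In this model `run(OPS)` ends at `FINAL`, and so does every other ordering of the same five operations, which is the property claimed above.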

> It's not clear where alter_dir should occur w.r.t the moves in my
> example.  Does alter_dir count as an edit that should occur after move?
> Do we pass initial state paths:
>
>    alter_dir .,     children='A'
>    alter_dir A,     children=''
>    alter_dir A/B,   children='C'
>    alter_dir A/B/C, children='B'
>
> or final_state paths:
>
>    alter_dir .,     children='A'
>    alter_dir A,     children='B'
>    alter_dir A/B,   children='C'
>    alter_dir A/B/C, children=''

So we don't necessarily have to do alter_dir on the parent before moving
children.  What about alter on the moved node itself?  Perhaps we do
that between move_away and move_here.

     relpath    rev  status       repo
     A           6   normal       A
     A/B         6   normal       A/B
     A/B/C       6   normal       A/B/C

   move_away A/B/C, id=1

     relpath    rev  status       repo
     A           6   normal       A
     A/B         6   normal       A/B
     A/B/C       6   not-present  A/B/C

   move_away A/B, id=2

     relpath    rev  status       repo
     A           6   normal       A
     A/B         6   not-present  A/B

   move_away A, id=3

     relpath    rev  status       repo
     A           6   not-present  A

   alter_dir id=1, children=B
   move_here id=1, A

     relpath    rev  status      repo
     A           8   normal      A
     A/B         8   incomplete  A/B

   alter_dir id=2, children=C
   move_here id=2, A/B

     relpath    rev  status      repo
     A           8   normal      A
     A/B         8   normal      A/B
     A/B/C       8   incomplete  A/B/C

   alter_dir id=3, children=
   move_here id=3, A/B/C

     relpath    rev  status      repo
     A           8   normal      A
     A/B         8   normal      A/B
     A/B/C       8   normal      A/B/C

That looks like the set of NODES tables that we want. At each stage the
NODES rows reflect nodes in the repository and if interrupted an update
to any revision is possible.
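
The split-move sequence above can be sketched the same way. This is a minimal Python sketch under stated assumptions: rows are relpath -> (revision, status, repos_path), a moved-away node is parked in a hypothetical `stash` keyed by move id, and the placeholder row for a not-yet-arrived child is assumed to carry the child's own path as its repos_path. The function names are stand-ins, not the Ev2 API.

```python
TARGET_REV = 8  # revision the update is heading to (from the example)


def move_away(nodes, stash, path, move_id):
    # Rows below the moved node leave the table with it; the node itself
    # stays behind as a not-present marker.
    for p in [p for p in nodes if p.startswith(path + "/")]:
        del nodes[p]
    rev, _status, repos_path = nodes[path]
    nodes[path] = (rev, "not-present", repos_path)
    stash[move_id] = {"source": path, "children": []}


def alter_dir(stash, move_id, children):
    # The alter is applied to the parked node, between move_away and move_here.
    stash[move_id]["children"] = list(children)


def move_here(nodes, stash, move_id, dst):
    entry = stash.pop(move_id)
    nodes[dst] = (TARGET_REV, "normal", dst)
    for name in entry["children"]:
        child = f"{dst}/{name}"
        # Assumption: the placeholder is recorded at the child's own path.
        nodes.setdefault(child, (TARGET_REV, "incomplete", child))


def run():
    """Replay the example and return the NODES table after each step."""
    nodes = {p: (6, "normal", p) for p in ("A", "A/B", "A/B/C")}
    stash, history = {}, []

    for path, move_id in (("A/B/C", 1), ("A/B", 2), ("A", 3)):
        move_away(nodes, stash, path, move_id)
        history.append(dict(nodes))

    for move_id, children, dst in ((1, ["B"], "A"),
                                   (2, ["C"], "A/B"),
                                   (3, [], "A/B/C")):
        alter_dir(stash, move_id, children)
        move_here(nodes, stash, move_id, dst)
        history.append(dict(nodes))

    return history
```

In this model no table in the returned history contains a row whose repos_path differs from its relpath, and the last table has A, A/B and A/B/C all normal at r8.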

This also means that NODES.repos_path and NODES.revision in the NODES
table always reflect nodes in the repository.  If we try to do alter
before or after move we end up with things that look switched or nodes
that are not valid.  Consider an update that moves A/B to A/C:

     relpath    rev  status       repo
     A           6   normal       A
     A/B         6   normal       A/B

If we move before alter we either get

   move A/B A/C

     relpath    rev  status       repo
     A           6   normal       A
     A/C         6   normal       A/C

or

     relpath    rev  status       repo
     A           6   normal       A
     A/C         6   normal       A/B

The first has an invalid row A/C@6; the second has A/C switched.  If we
alter before move we either get

   alter_dir C, children=

     relpath    rev  status       repo
     A           6   normal       A
     A/B         8   normal       A/B

or

     relpath    rev  status       repo
     A           6   normal       A
     A/B         8   normal       A/C

again either invalid or switched. This implies that if we want to
combine

     move_away A, id=1
     move_here id=1, B

into a single

     move A, B

then move and alter need to be combined:

     move_dir  A, B, children=, props=
     move_file A, B, checksum=, props=

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state

Posted by Philip Martin <ph...@wandisco.com>.

Greg Stein <gs...@gmail.com> writes:

> On Wed, Sep 4, 2013 at 10:43 AM, Apache subversion Wiki
> <co...@subversion.apache.org> wrote:
>>...
>> Given these constraints, not all combinations of moves can be expressed using a “move source to destination” operation, with or without a “rotate” operation, without using temporary paths.
>
> I'm not buying that you need two operations. The "move uses initial
> state" tweak seems fully adequate and gets us the single-op/atomicity
> principle that Ev2 was designed under.

At some point we have to create temporary locations and I suppose it
might be possible for the receiver to generate them as necessary.
However I'm still struggling to understand the ordering of moves and
alter_dirs so I can't determine whether that is practical or sensible.

Given this example:

   svn mv A     X
   svn mv X/B/C A
   svn mv X/B   A/B
   svn mv X     A/B/C
   svn ci

or the equivalent:

   svn mv A/B/C X
   svn mv A/B   X/B
   svn mv A     X/B/C
   svn mv X     A
   svn ci

we have this set of Ev2 moves in some order:

   move A, A/B/C
   move A/B, A/B
   move A/B/C, A

What is the correct order for these operations?  I guess there may be
more than one valid order since I showed two possible temporaries,
perhaps both

   move A, A/B/C
   move A/B, A/B
   move A/B/C, A

and

   move A/B/C, A
   move A/B, A/B
   move A, A/B/C

are valid. Or perhaps the alter_dir ordering rules exclude one?
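
One way to read the ordering question is to ask whether the result depends on order at all when every source path is resolved against the initial state. This is a minimal Python sketch, assuming a flat map from path to a node label, so a move relocates exactly one node and does not carry descendants (that fits this example only because every node is moved explicitly). `apply_initial_state` and `apply_in_place` are hypothetical names, not Ev2 functions.

```python
INITIAL = {"A": "node-a", "A/B": "node-b", "A/B/C": "node-c"}

# The two candidate orders from the text, as (source, destination) pairs.
ORDER_1 = [("A", "A/B/C"), ("A/B", "A/B"), ("A/B/C", "A")]
ORDER_2 = [("A/B/C", "A"), ("A/B", "A/B"), ("A", "A/B/C")]


def apply_initial_state(initial, moves):
    """Resolve every source against `initial`, never against partial results."""
    result = dict(initial)
    for src, _dst in moves:
        del result[src]              # every moved node leaves its old path
    for src, dst in moves:
        result[dst] = initial[src]   # and arrives looked up in the initial state
    return result


def apply_in_place(tree, moves):
    """Naive alternative: apply each move to the current state, no temporaries."""
    tree = dict(tree)
    for src, dst in moves:
        tree[dst] = tree.pop(src)    # silently overwrites whatever was at dst
    return tree


if __name__ == "__main__":
    print(apply_initial_state(INITIAL, ORDER_1))
    print(apply_initial_state(INITIAL, ORDER_2))
    print(apply_in_place(INITIAL, ORDER_1))
    print(apply_in_place(INITIAL, ORDER_2))
```

In this model both orders give the same result under initial-state resolution, while naive in-place application overwrites a node in either order, which is where temporary locations come in. It says nothing about where alter_dir has to go.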

What about alter_dir?  I think the rule is that alter_dir on a directory
should occur before add or delete affects the children of the directory.
There is also a rule:

 * - The ancestor of an added, copied-here, moved-here, or
 *   modified node may not be deleted. The ancestor may not be moved
 *   (instead: perform the move, *then* the edits).

It's not clear where alter_dir should occur w.r.t the moves in my
example.  Does alter_dir count as an edit that should occur after move?
Do we pass initial state paths:

   alter_dir .,     children='A'
   alter_dir A,     children=''
   alter_dir A/B,   children='C'
   alter_dir A/B/C, children='B'

or final_state paths:

   alter_dir .,     children='A'
   alter_dir A,     children='B'
   alter_dir A/B,   children='C'
   alter_dir A/B/C, children=''

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Move using initial state (was: Update of "MoveDev/Ev2MovesDesign" ...)

Posted by Greg Stein <gs...@gmail.com>.

On Sat, Sep 7, 2013 at 6:30 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 07.09.2013 12:47, Greg Stein wrote:
>...
>> I'm curious why a move() would be sent to a client.
>
> Off the top of my head:
>
>   * It would help find better solutions to a category of tree conflicts
>     that we currently do not handle very well.

Hmm. Yeah, I could see this. A local-edit could (easily) be re-applied
to an incoming-move.

>   * It allows the client to optimize working copy changes, issuing
>     filesystem-level moves of files and directories instead of rewriting
>     and deleting them. In large working copies, a rename of a directory
>     can be very expensive in the current copy+delete implementation.

Dunno about this. But it isn't really here/there. There is a lot of
work in between :-)

> But I'm astounded that you'd even consider having an asymmetric editor
> API. After all, it's not constrained to client<->server communication.

Nope. Just being pragmatic. The server is never mixed-rev. The client
is. You've been making that abundantly clear, and I think a reasonable
answer is constraint.

> It seems obvious to me in hindsight that Ev2 was designed mostly with
> the client->server direction in mind, and kind of ignored the issues on
> the working-copy side.

Look at the notes. And my emails. It was designed for atomicity, for
random-access, and to move away from a bare vtable. I could probably
list others. Whatever.

Ev1 is stone age. Ev2 is industrial age. If you think we can make a
further jump... fine.

It didn't ignore "issues". It simply tries to make a positive move
forwards. Was wc-ng perfect? FSX? Ev2? ... nope.

With that in mind, I suggested that a move() to the client isn't an
appropriate operation. It is *way* too complex. We have a hojillion
other problems that can be solved Right Now.

>...
> But high-level features such as mixed-revision working
> copies, switched subtrees, sparse trees etc. do have a non-trivial impact.

Yeah. That crap scares me :-)

But I return to: can we make forward progress without painting
ourselves into a corner? Can we *also* do that without analysis
paralysis? Can we solve the 90%? (without waiting for the 100%?)

Cheers,
-g

Re: Move using initial state (was: Update of "MoveDev/Ev2MovesDesign" ...)

Posted by Branko Čibej <br...@wandisco.com>.

On 07.09.2013 12:47, Greg Stein wrote:
> On Thu, Sep 5, 2013 at 6:51 AM, Branko Čibej <br...@wandisco.com> wrote:
>> ...
>> For the server->client-mixed-revision
>> scenario, I now believe this is not the case.
> I'm curious why a move() would be sent to a client.

Off the top of my head:

  * It would help find better solutions to a category of tree conflicts
    that we currently do not handle very well.
  * It allows the client to optimize working copy changes, issuing
    filesystem-level moves of files and directories instead of rewriting
    and deleting them. In large working copies, a rename of a directory
    can be very expensive in the current copy+delete implementation.

But I'm astounded that you'd even consider having an asymmetric editor
API. After all, it's not constrained to client<->server communication.

It seems obvious to me in hindsight that Ev2 was designed mostly with
the client->server direction in mind, and kind of ignored the issues on
the working-copy side. I don't mean implementation details such as the
NODES table; those can't be used as valid arguments for defining API
semantics. But high-level features such as mixed-revision working
copies, switched subtrees, sparse trees etc. do have a non-trivial impact.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Move using initial state (was: Update of "MoveDev/Ev2MovesDesign" ...)

Posted by Greg Stein <gs...@gmail.com>.

On Thu, Sep 5, 2013 at 6:51 AM, Branko Čibej <br...@wandisco.com> wrote:
>...
> For the server->client-mixed-revision
> scenario, I now believe this is not the case.

I'm curious why a move() would be sent to a client.

Cheers,
-g

Re: Move using initial state (was: Update of "MoveDev/Ev2MovesDesign" ...)

Posted by Branko Čibej <br...@wandisco.com>.

On 05.09.2013 09:13, Greg Stein wrote:
> On Wed, Sep 4, 2013 at 10:43 AM, Apache subversion Wiki
> <co...@subversion.apache.org> wrote:
>> ...
>> Given these constraints, not all combinations of moves can be expressed using a “move source to destination” operation, with or without a “rotate” operation, without using temporary paths.
> I'm not buying that you need two operations. The "move uses initial
> state" tweak seems fully adequate and gets us the single-op/atomicity
> principle that Ev2 was designed under.

From the last couple months of design discussions, mostly driven by
Julian and Philip, I got the impression that this would work fine for a
client->server drive, but falls on its face in a server->client drive
where the client has a mixed-revision working copy.

I believe that in the client->server scenario, in most cases, the
temporary locations required by the initial-state model could be
auto-generated by the receiver. For the server->client-mixed-revision
scenario, I now believe this is not the case. I'd love to be convinced
otherwise, but I'm afraid that a single paragraph stating an opinion
falls short of that.
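
As an illustration of what "auto-generated by the receiver" could mean
in the simple case, here is a rough sketch in Python. Everything in it
is an assumption made for the example: the tree is a flat dict of
path -> content at a single revision, the function name and the
".tmp-move-N" naming are made up, and none of it is Ev2 API. It does
not model a mixed-revision working copy, which is exactly the case in
doubt.

```python
def apply_initial_state_moves(tree, moves):
    """Apply (src, dst) moves whose src paths all name the initial state.

    tree:  dict mapping path -> content (flat, single revision; an
           assumption of this sketch).
    moves: list of (src, dst) pairs.
    Returns a new dict; the input tree is left unmodified.
    """
    result = dict(tree)
    staged = {}
    # Phase 1: park every source under a receiver-generated temporary
    # name, so later moves cannot clobber a node not yet picked up.
    for i, (src, dst) in enumerate(moves):
        tmp = ".tmp-move-%d" % i
        staged[tmp] = (result.pop(src), dst)
    # Phase 2: place each parked node at its destination.
    for tmp, (content, dst) in staged.items():
        if dst in result:
            raise ValueError("destination already occupied: %s" % dst)
        result[dst] = content
    return result
```

With this, a swap can be sent as [("A", "B"), ("B", "A")] without the
sender naming any temporary path: the receiver parks both nodes first
and only then fills the destinations.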

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com