Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2014/05/20 16:25:39 UTC

[RFC] Move tracking - design summary and mock-up proposal

Hi, folks.

At last I have a plan for move tracking. I want to share this plan and get
your feedback.

I want to start implementing it, but first Philip suggested, and I agree,
we should do a mock-up and see how the high-level behaviour pans out in
typical scenarios involving moves, especially merging with moves. There's a
lot of work here and implementing it would be risky unless we first check
whether it will deliver useful results.

The mock 'svn' client will implement the essence of the design, but with
the move info being initially just an explicit user input whenever the code
requires it. The next advancement can be that the move info is stored and
retrieved in any form whatsoever (revprops, versioned props, even off-line
text files). Then we can try out the scenarios that we imagine users will
expect to work, and see whether the design does in fact make these things
easy. An example:

  0 make branches A and B with some content
  1 in branch A, rename a directory and all the files inside it
  2 in branch B, modify some files
  3 merge "automatically" from A to B
  4 in branch A, make some text mods
  5 merge "automatically" from A to B -- check that mods from A go to the
    right files in B with no user intervention
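
The scenario above can be sketched in a few lines of Python. This is only a
rough illustration, assuming each file and directory carries a stable element
id as described under DESIGN below. The names 'path_of' and 'merge' and the
dict layout are hypothetical, not part of any Subversion API, and the merge is
a naive per-element three-way merge with no conflict handling.

```python
# Hypothetical mock-up: a branch state maps element id -> (parent eid, name, text).
# Text is None for directories. Nothing here is a real Subversion API.

def path_of(state, eid):
    """Build the path of an element by walking up its parent links."""
    parent, name, _ = state[eid]
    return name if parent is None else path_of(state, parent) + "/" + name

def merge(base, source, target):
    """Naive three-way merge keyed by element id (no conflict handling).

    For each element, whichever of parent/name/text changed between
    base and source is applied to target; everything else is kept.
    """
    result = dict(target)
    for eid, src in source.items():
        if eid not in base or eid not in target:
            continue  # additions and deletions are out of scope for this sketch
        result[eid] = tuple(
            s if s != b else t
            for b, s, t in zip(base[eid], src, target[eid])
        )
    return result

# Step 0: branches A and B start out identical.
r0 = {
    0: (None, "", None),
    1: (0, "lib", None),
    2: (1, "foo.c", "foo v1"),
    3: (1, "baz.c", "baz v1"),
}
a, b = dict(r0), dict(r0)

# Step 1: in A, rename the directory and all the files inside it.
a[1] = (0, "libnew", None)
a[2] = (1, "foo2.c", "foo v1")
a[3] = (1, "baz2.c", "baz v1")
# Step 2: in B, modify a file.
b[2] = (1, "foo.c", "foo v1 + B mod")
# Step 3: merge A -> B; B picks up the renames and keeps its own text.
b = merge(r0, a, b)
last_merged = dict(a)  # state of A that B has already seen
# Step 4: in A, make a text mod.
a[3] = (1, "baz2.c", "baz v2")
# Step 5: merge A -> B again; the mod lands on the renamed file in B.
b = merge(last_merged, a, b)

print(path_of(b, 3), "=", b[3][2])
```

Because the merge matches elements by id rather than by path, the text mod
from step 4 reaches the right file in B without any user input.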

I'd prefer to do the mock-up in Python if we can. Does that seem
reasonable, or should I branch and hack the C code instead?


== DESIGN ==

I had an opportunity to discuss the design in person with Ben, Brane,
Philip and Stefan these last couple of weeks, which resulted in a good few
steps forward. I have been going through some variations, but here is more
or less the design in summary.


a BRANCH ELEMENT
  - is, immutably, a file or a directory (or a symlink in future);
  - has a separate but related life-line in each branch in the same branch
    family;
  - has, in each rev in each branch, a parent directory element within the
    same branch (except for the branch root element);
  - has, in each rev in each branch, a name within its parent element.

An instance of an element in a particular branch can, at each revision, be
moved to a new parent element and/or given a new name, as long as it stays
within its branch. It cannot be moved outside its branch or into a nested
branch.
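
As a rough sketch of this model (all names here are hypothetical and chosen
only for illustration), an element's identity and kind can be kept immutable
while its parent and name are recorded separately for each revision of each
branch:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Element:
    """Identity of an element; fixed for its whole life (hypothetical model)."""
    eid: int
    kind: str  # "file" or "dir"; never changes

@dataclass(frozen=True)
class Placement:
    """Where an element sits in one revision of one branch."""
    parent: Optional[int]  # eid of the parent directory; None for the branch root
    name: str

# One revision of one branch: eid -> placement.
BranchRev = Dict[int, Placement]

def path_in_branch(rev: BranchRev, eid: int) -> str:
    """Branch-relative path, built by following parent links up to the root."""
    parts = []
    while rev[eid].parent is not None:
        parts.append(rev[eid].name)
        eid = rev[eid].parent
    return "/".join(reversed(parts))

def moved(elements: Dict[int, Element], rev: BranchRev,
          eid: int, parent: int, name: str) -> BranchRev:
    """Return the next revision with one element re-parented and/or renamed."""
    if parent not in rev:
        raise ValueError("new parent is not an element of this branch")
    if elements[parent].kind != "dir":
        raise ValueError("new parent is not a directory")
    new_rev = dict(rev)
    new_rev[eid] = Placement(parent, name)
    return new_rev

elements = {0: Element(0, "dir"), 1: Element(1, "dir"), 2: Element(2, "file")}
r1 = {0: Placement(None, ""), 1: Placement(0, "doc"), 2: Placement(1, "README")}
r2 = moved(elements, r1, 2, 0, "README.txt")
print(path_in_branch(r1, 2), "->", path_in_branch(r2, 2))
```

The element id is what ties 'doc/README' in one revision to 'README.txt' in
the next; the path is derived, not stored.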


a BRANCH
  - is rooted at exactly one element (usually a directory);
  - is the set of trees rooted at its root element in all revisions at
    which the root element exists;
  - is a member of exactly one branch family.

The branch root element is distinguishable from non-branch-root elements
(at the repository level and at the client level).

The repository root directory is implicitly the root element of a singleton
top-level branch. (The top-level branch is not particularly interesting to
the user.) Every other branch root element is a non-root element of a
branch at the next higher branching level.

A branch is created in one of two ways:
  - creating a new element and explicitly designating it as the root
    element of a (first) branch of a new branch family; or
  - branching (see below) the root element of an existing branch to create
    a new branch of an existing branch family.


a BRANCH FAMILY
  - is a set of branches of the "same" tree;
  - results from initially creating a new branch and then directly and
    transitively branching it, including by branching a higher-level branch;
  - can be nested inside another branch family, in a strict hierarchy.

When branches are nested, the nesting topology is uniform: if one branch of
family F is inside a branch of family G, then every branch of family F is
inside a branch of family G.

e.g. the outer family contains branches {trunk, br1, ...} and the nested
family contains branches {fs_fs, fs_x}:

^/.../trunk/
    +--- subversion/libsvn_fs_fs/  ...
    +--- subversion/libsvn_fs_x/  ...

^/.../branches/br1/
    +--- subversion/libsvn_fs_fs/  ...
    +--- subversion/experimental/libsvn_fs_x/  ...
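
The uniform-nesting rule can be expressed as a small check. The following is
a sketch only: the branch and family labels ('TOP', 'PROJECT', 'FS') and the
two lookup tables are assumptions made up to mirror the example above.

```python
def nesting_is_uniform(family_of, outer_of):
    """Check that all branches of a family sit inside branches of one family.

    family_of: branch -> family it belongs to
    outer_of:  branch -> enclosing branch, or None for the top-level branch
    """
    enclosing_family = {}
    for branch, family in family_of.items():
        outer = outer_of[branch]
        outer_family = family_of[outer] if outer is not None else None
        # The first branch seen for a family fixes the expected outer family.
        seen = enclosing_family.setdefault(family, outer_family)
        if seen != outer_family:
            return False
    return True

# Hypothetical labels mirroring the example layout.
family_of = {
    "top": "TOP",
    "trunk": "PROJECT", "br1": "PROJECT",
    "trunk/fs_fs": "FS", "trunk/fs_x": "FS",
    "br1/fs_fs": "FS", "br1/fs_x": "FS",
}
outer_of = {
    "top": None,
    "trunk": "top", "br1": "top",
    "trunk/fs_fs": "trunk", "trunk/fs_x": "trunk",
    "br1/fs_fs": "br1", "br1/fs_x": "br1",
}
print(nesting_is_uniform(family_of, outer_of))
```

Note that moving libsvn_fs_x under 'experimental/' in br1 does not affect the
check, since the nested branch is still inside a branch of the outer family.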

There is no fundamental distinction at the repository level between the
"original" branch and any other branch in the same family, nor a
directional relationship implied by the order in which subsequent branches
are added to the family. The "original", and the order of subsequent
branching, are only distinguishable by looking at client-level metadata
(the "copy-from" metadata and/or mergeinfo; see "copying").


MOVING

An element of a branch can be moved to a new parent element and/or a new
name within its own branch, but cannot be moved into a nested branch or out
of its own branch. In other words, a move cannot cross a branch boundary.

The same applies to moving a whole branch, which is achieved by moving its
root element to a new parent and/or name within the outer branch.
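
One way a mock-up might check this rule, as a sketch under assumed names:
each element instance records the branch it is an element of ('branch') and,
if it is a branch root, the branch it roots ('roots'). A move is then allowed
only if the children of the new parent live in the moved element's own
branch. None of these names come from the existing code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    name: str
    branch: Optional[str]        # branch this element is an element of
    roots: Optional[str] = None  # branch rooted at this element, if any

def container_branch(node: Node) -> Optional[str]:
    """Branch that the children of this directory belong to."""
    return node.roots if node.roots is not None else node.branch

def can_move(tree: Dict[int, Node], eid: int, new_parent: int) -> bool:
    """A move is allowed only if it stays inside the element's own branch."""
    return container_branch(tree[new_parent]) == tree[eid].branch

tree = {
    0: Node("", None, roots="top"),                    # repository root
    1: Node("trunk", "top", roots="trunk"),            # root of branch 'trunk'
    2: Node("subversion", "trunk"),
    3: Node("libsvn_fs_x", "trunk", roots="fs_x"),     # nested branch root
    4: Node("tree.c", "fs_x"),
    5: Node("experimental", "trunk"),
    6: Node("README", "trunk"),
}

print(can_move(tree, 6, 2))  # plain move within trunk
print(can_move(tree, 6, 3))  # would enter the nested branch
print(can_move(tree, 4, 2))  # would leave the nested branch
print(can_move(tree, 3, 5))  # moves the whole nested branch within trunk
```

Treating a nested branch root as an ordinary element of the outer branch
means that moving a whole branch needs no special case.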


COPYING versus BRANCHING

At the repository level, what we have previously called the "copy"
relationship is only useful as a branching relationship. The "copy-from"
information is client-level metadata, not needed by the repository itself,
and so shall be stored elsewhere. Thus, at the repository level, copying is
branching.

At the client level, branches are conceptually distinguished, and so the
terms "copying" and "branching" are naturally distinguished: copying makes
a new element (or tree) that is not a branch, and branching makes a new
branch in the same family as an existing branch.

Normally a user would only ever branch a branch, and copy a non-branch. In
practice it would be unusual to want to "copy" a branch so as to make a new
non-branch element or tree, or to create a new branch as a copy of a
non-branch, but these variants could be provided for advanced users.

A suggested user interface would have
  - "branch" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a copy
    of the source;
  - "branch" applied to a non-branch or "copy" to a branch is an error,
    except in advanced usage.

An alternative user interface could have just one operation distinguished
by the specified source:
  - "copy" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a copy
    of the source.

In any case, the new branch or non-branch shall be tagged with client-level
metadata saying where it was copied from.
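
The two candidate interfaces could be mocked up roughly as follows. This is a
sketch only: the function names, the 'advanced' flag and the returned dict
are invented for illustration and do not correspond to any existing 'svn'
subcommand or option.

```python
def create_from(command, source_is_branch, source_path, advanced=False):
    """Suggested UI: 'branch' a branch, 'copy' a non-branch (hypothetical)."""
    if command not in ("branch", "copy"):
        raise ValueError("unknown command: " + command)
    natural = "branch" if source_is_branch else "copy"
    if command != natural and not advanced:
        raise ValueError(
            "'%s' is not allowed on this source except in advanced usage"
            % command)
    return {
        "is_branch": command == "branch",
        # Client-level metadata recording where the new tree came from.
        "copied_from": source_path,
    }

def copy_only(source_is_branch, source_path):
    """Alternative UI: a single 'copy' whose outcome depends on the source."""
    command = "branch" if source_is_branch else "copy"
    return create_from(command, source_is_branch, source_path)

print(create_from("branch", True, "^/trunk"))
print(copy_only(False, "^/trunk/README"))
```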


LIFE-LINES and RESURRECTION

The life-line of an element on a given branch is a subset of the life-line
of its branch. At each revision in the life-line of its branch, the element
can exist (at an arbitrary name and parent element within the branch) or
not exist.

An element can be resurrected: that is, after being deleted, it can be
brought back into existence on the same branch, as the same element, and
not just as a new element copied from it. This is essentially the same as
what must happen when we merge the creation of this element to any other
branch on which it is not currently alive, and so it is a natural part of
branching and merging. (Resurrection is a necessary consequence of the fact
that, for self-consistency, it must be possible to reverse-merge any
change. Reverse-merging a deletion, whether onto the same branch or onto
another branch, must resurrect an instance of the same element that was
deleted, and not just create a copy of it.)

As a branch root is an element of an outer branch, it follows that each
branch itself can be deleted and resurrected. A branch family can therefore
have zero or more of its branches in existence at any one time.
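
To illustrate why reverse-merging a deletion amounts to resurrection, here is
a minimal sketch, assuming a revision is modelled as a dict from element id
to (parent eid, name). The function name and data layout are hypothetical and
conflict handling is left out.

```python
def reverse_merge(left, right, target):
    """Apply the inverse of the change left -> right to target.

    Each argument maps element id -> (parent eid, name).
    No conflict handling; this only illustrates the identity question.
    """
    result = dict(target)
    for eid in left.keys() | right.keys():
        if left.get(eid) == right.get(eid):
            continue  # element untouched by this change
        if eid in left:
            # Undo a deletion or a move: the same element id comes back.
            result[eid] = left[eid]
        else:
            # Undo an addition.
            result.pop(eid, None)
    return result

r1 = {0: (None, ""), 7: (0, "notes.txt")}
r2 = {0: (None, "")}              # element 7 deleted in r2
r3 = reverse_merge(r1, r2, r2)    # undo the deletion on the same branch

print(7 in r2, 7 in r3)
```

The restored entry has the same element id, 7, as the deleted one, which is
what distinguishes resurrection from adding a copy under a fresh id.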


IDs

Stefan2 and Brane and I have ideas about the repository schema and IDs. One
core point, to the best of my understanding so far, is that in the
<node-id.copy-id.txn-id> scheme, the <node-id> shall identify an element
uniquely across all elements of all branch families (but repeated for each
branch within a family); and the <copy-id> shall identify the branch
uniquely among all branches in all branch families in the repository. The
<copy-id> shall no longer be used when the user wants a simple
non-branching copy; instead, in that case a new node-id (or tree of node
ids) shall be assigned.
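
A toy illustration of that reading of the ID scheme follows. It reflects only
the paragraph above, not the actual FSFS implementation: integer ids, the
counters and the function names are all simplifying assumptions, and the
plain copy is assumed to stay within the source's branch.

```python
import itertools
from typing import NamedTuple

class NodeRevId(NamedTuple):
    node_id: int  # identifies the element, shared by all branches in a family
    copy_id: int  # identifies the branch
    txn_id: int   # identifies the transaction that created this node-rev

    def __str__(self):
        return "%d.%d.%d" % (self.node_id, self.copy_id, self.txn_id)

# Hypothetical id allocators.
_node_ids = itertools.count(100)
_copy_ids = itertools.count(1)

def branch_tree(tree, txn_id):
    """Branching: keep every node-id, assign one new copy-id to the branch."""
    copy_id = next(_copy_ids)
    return [NodeRevId(n.node_id, copy_id, txn_id) for n in tree]

def copy_tree(tree, txn_id):
    """Plain copy: assign a new node-id to every node, no new copy-id."""
    return [NodeRevId(next(_node_ids), n.copy_id, txn_id) for n in tree]

trunk = [NodeRevId(1, 0, 5), NodeRevId(2, 0, 5)]
br = branch_tree(trunk, 6)
cp = copy_tree(trunk, 7)

print([str(n) for n in br])
print([str(n) for n in cp])
```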

We'll write separately more about the repository schema and IDs.


FWIW, I realize it's a very big development, but it no longer seems
impossibly far off. The pieces are coming together.

Comments? Help?

- Julian

Re: [RFC] Move tracking - design summary and mock-up proposal

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, May 20, 2014 at 4:25 PM, Julian Foad <ju...@btopenworld.com> wrote:
> Hi, folks.
>
> At last I have a plan for move tracking. I want to share this plan and get
> your feedback.

Awesome!

> == DESIGN ==
>
> I had an opportunity to discuss the design in person with Ben, Brane, Philip
> and Stefan these last couple of weeks, which resulted in a good few steps
> forward. I have been going through some variations, but here is more or less
> the design in summary.

I have read through it and it sounds/looks very good to me, at a high
level. Kudos for refining the multitude of ideas into a consistent and
reasonable design, and writing it down in such a concise and clear
manner.

-- 
Johan

Re: [RFC] Move tracking - design summary and mock-up proposal

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> At last I have a plan for move tracking. I want to share this plan and get your feedback.
> [...]


I have written lots more about various parts of move tracking at

http://www.foad.me.uk/tmp/svn/moves/

with an emphasis on trying to explain why it needs to work this way in terms of the theoretical basis. My writing there is a bit scattered and out of date, but you might nevertheless find some of it interesting and informative.

The docs listed first are the most readable, coherent and up to date, and are supposed to make sense top-down in the order given. Don't put too much weight on their titles.

  rationale
  theory
  merge
  design spec
  algebra

One out-of-date concept in the docs there is that I was talking about the definition of branches and branch families being mutable: that any directory (or file) could become a branch root element on demand, which led to problems of how or whether a branch could later be converted back to a non-branch, and how to decide whether it currently is or isn't, and so on. From discussions with the others I've changed my mind: a directory (or file) either is or isn't a branch root element, immutably for its whole lifetime.

- Julian

Re: [RFC] Move tracking - design summary and mock-up proposal

Posted by Stefan Sperling <st...@elego.de>.
On Tue, May 20, 2014 at 03:25:39PM +0100, Julian Foad wrote:
> Hi, folks.
> 
> At last I have a plan for move tracking. I want to share this plan and get
> your feedback.

I'm delighted to hear this! :)

> I want to start implementing it, but first Philip suggested, and I agree,
> we should do a mock-up and see how the high-level behaviour pans out in
> typical scenarios involving moves, especially merging with moves. There's a
> lot of work here and implementing it would be risky unless we first check
> whether it will deliver useful results.
 
> I'd prefer to do the mock-up in Python if we can. Does that seem
> reasonable, or should I branch and hack the C code instead?

I would branch and hack the C code. Most of the actual behaviour
changes will have to be in libsvn_client/libsvn_wc. The client
only controls prompting and can only offer meaningful choices
if the library provides enough information and does the right thing
when asked to do something. I expect we'll have to make changes
in the conflict resolver, and perhaps the conflict store as well.

Also, we're already familiar with the C code and can collaborate
on the branch.

> == DESIGN ==

I'll review the design later.