Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2011/10/07 12:59:09 UTC

Re: Identifying branch roots

On Fri, 2011-10-07 at 11:29 +0100, Julian Foad wrote:
> Stefan Sperling wrote:
> > julianfoad wrote:
> > > +/* This property marks a branch root. Branches with the same value of this
> > > + * property are mergeable. */
> > > +#define SVN_PROP_BRANCHING_ROOT "svn:ignore" /* ### should be "svn:branching-root" */
> 
> Hi Stefan. Thanks for picking up on this.
> 
> > I think your addition of a 'branch root' property is quite a significant
> > step. Is this really necessary in order to improve the output of
> > 'svn mergeinfo' or do you have additional steps planned that go beyond
> > tuning output?
> 
> Both.  I think knowing whether the (requested) merge source and target
> are branch roots (and indeed branches of the *same* "project" or tree)
> is important for improving the output and diagnostics of "svn mergeinfo"
> and "svn merge" commands.
> 
> It could of course enable other new behaviours relating to branches, and
> I don't know what those are yet (apart from trivial UI things like
> answering "is this a branch?").
> 
> So I'm working on the idea that it would be useful to have branch roots
> identifiable by some mechanism, so I'll add "some mechanism" (currently
> this property, but I'm totally open to a different mechanism such as
> branch points being defined in a config file) and see what useful
> behaviours I can come up with.
> 
> 
> > There has been some discussion about adding a property for this
> > and similar purposes in the past, see
> > http://svn.haxx.se/dev/archive-2009-09/0156.shtml
> > (there are probably more threads about this topic)
> 
> Yes, and it's time to figure out what we can usefully do with such
> information and then we'll know exactly what branch configuration
> information we need and what's a good way to store it.
> 
> I'll reply to the rest in a further email.
> 
> - Julian
> 
> 
> > One idea I've had in mind which relates to the notion of a branch root
> > is to have a property which specifies the merge policy for the branch,
> > [...]
> > 
> > Simple sketch:
> >  - feature-branch (can only receive catch-up merges from its direct parent
> >                    branch (copyfrom-source), or be reintegrated)
> >     => svn:mergepolicy on branch root: 'sync ^/trunk
> >                                         reintegrate ^/trunk'
> > 
> >  - release-branch (cannot be reintegrated into its parent)
> >     => svn:mergepolicy on branch root: 'cherry-pick ^/trunk'

Yes, this could be useful, and is the kind of policy configuration I'm
talking about here:
<http://wiki.apache.org/subversion/BranchingMergingTerminology>.

> > I haven't thought this through yet.
> > I think your approach of "Branches with the same value of this
> > property are mergeable" is a design decision which has a similar
> > kind of impact on how a branch-root property would be used.
> > How do we really want it to be used?
> > 
> > I think this is an exciting feature with lots of potential but
> > it has a lot more inherent complexity than improving 'svn mergeinfo'
> > output. Could we split improving the output of 'svn mergeinfo' and
> > identifying branch roots into two distinct feature branches?

Splitting it wouldn't work for me, because the branch marker information
has no purpose and no defined requirements without the mergeinfo user
interface work; the existence and meaning of the info goes hand-in-hand
with developing some ways in which that info can be used.

When we come to integrate some of this work to trunk, by then the
specification of the branching info will have been worked out, and so
then it may be best to create a feature branch specifically for adding
that (without the mergeinfo changes) if it is complex enough to be
worthwhile doing so.

> > > +/* Set *MARKER to the project-root marker that is common to SOURCE and
> > > + * TARGET, or to NULL if neither has such a marker.
> > 
> > Why do you need to introduce a new term "project"?

That's not a very good term, and I started by calling it a "branch-root
marker", because it does indeed mark a branch root, but I wanted a name
that says the value of this property is the name of the "code base" or
"project tree" that has been branched, rather than the name of the
branch.  I'm open to better ideas.

[...]
> > Or are you suggesting Subversion should have a "project" concept?
> > If so, what is your definition of this term?

It's the "code base" or "tree of files" that gets branched.  I expect
(but am not yet certain) that if we introduce "branch" as a concept then
we will need a name for the generic "thing" that is being branched,
otherwise we will never be able to talk clearly about branching.
"Tree" (as in a tree of files) is one possible suggestion, but a rather
poor choice because the "branches" are not branches in that tree but are
rather branches *of* that tree in (conceptually) another dimension.

If I were to drop the idea of identifying the "project" and just have a
property "svn:is-branch-root" with value "*" meaning "yes", then the
need to find a term for the "thing that is branched" would be postponed
for a short time more.

- Julian

Re: Identifying branch roots

Posted by Trent Nelson <tr...@snakebite.org>.

On 07-Oct-11 6:59 AM, Julian Foad wrote:
> On Fri, 2011-10-07 at 11:29 +0100, Julian Foad wrote:
>> Stefan Sperling wrote:
>>> julianfoad wrote:
>>>> +/* This property marks a branch root. Branches with the same value of this
>>>> + * property are mergeable. */
>>>> +#define SVN_PROP_BRANCHING_ROOT "svn:ignore" /* ### should be "svn:branching-root" */
>>
>> Hi Stefan. Thanks for picking up on this.
>>
>>> I think your addition of a 'branch root' property is quite a significant
>>> step. Is this really necessary in order to improve the output of
>>> 'svn mergeinfo' or do you have additional steps planned that go beyond
>>> tuning output?
>>
>> Both.  I think knowing whether the (requested) merge source and target
>> are branch roots (and indeed branches of the *same* "project" or tree)
>> is important for improving the output and diagnostics of "svn mergeinfo"
>> and "svn merge" commands.
>>
>> It could of course enable other new behaviours relating to branches, and
>> I don't know what those are yet (apart from trivial UI things like
>> answering "is this a branch?").
>>
>> So I'm working on the idea that it would be useful to have branch roots
>> identifiable by some mechanism, so I'll add "some mechanism" (currently
>> this property, but I'm totally open to a different mechanism such as
>> branch points being defined in a config file) and see what useful
>> behaviours I can come up with.
>>
>>> There has been some discussion about adding a property for this
>>> and similar purposes in the past, see
>>> http://svn.haxx.se/dev/archive-2009-09/0156.shtml
>>> (there are probably more threads about this topic)
>>
>> Yes, and it's time to figure out what we can usefully do with such
>> information and then we'll know exactly what branch configuration
>> information we need and what's a good way to store it.
>>
>> I'll reply to the rest in a further email.
>>
>> - Julian

Welp, I'm never going to get a better lead-in than that, so, hi, folks!
Freelance SCM consultant here; used to specialise in ClearQuest, of all
things, but my last two gigs ended up revolving around Subversion.

Specifically, Subversion merges, in the enterprise, and the, uh, quirks
involved.  Each client had different requirements, and thus, the
solution I ended up delivering to each one differed a bit.  The first
solution was neat, and did all kinds of funky ClearQuest integration
and merge validation, but the second one is more applicable to this
discussion, so I'll describe that first.

In essence, it's a hook framework that attempts to enforce Subversion
best-practices by blocking* incoming commits if it detects one or more
of the following:

         (*) Sometimes it'll block, but phrase the error message
             along the lines of "if you *really* want to do this,
             re-try your commit with the phrase 'CONFIRM MULTI-ROOT
             RENAME' somewhere in your commit message".

     TagCopied
     TagRenamed
     TagRemoved
     TagModified
     TagReplaced
     TagSubtreeCopied
     TagSubtreeRenamed
     TagSubtreeRemoved
     TagSubtreeModified
     TagSubtreeReplaced
     MultipleUnknownAndKnownRootsModified
     MixedRootNamesInMultiRootCommit
     MixedRootTypesInMultiRootCommit
     SubversionRepositoryCheckedIn
     MergeinfoAddedToRepositoryRoot
     MergeinfoModifiedOnRepositoryRoot
     SubtreeMergeinfoAdded
     RootMergeinfoRemoved
     DirectoryReplacedDuringMerge
     EmptyMergeinfoCreated
     TagDirectoryCreatedManually
     BranchDirectoryCreatedManually
     BranchRenamedToTrunk
     TrunkRenamedToBranch
     TrunkRenamedToTag
     BranchRenamedToTag
     BranchRenamedOutsideRootBaseDir
     TagSubtreePathRemoved
     RenameAffectsMultipleRoots
     UncleanRenameAffectsMultipleRoots
     MultipleRootsCopied
     UncleanCopy
     FileRemovedFromTag
     CopyKnownRootSubtreeToValidAbsRootPath
     MixedRootsNotClarifiedByExternals
     CopyKnownRootToIncorrectlyNamedRootPath
     CopyKnownRootSubtreeToIncorrectlyNamedRootPath
     RenamedKnownRootToIncorrectlyNamedRootPath
     MixedChangeTypesInMultiRootCommit
     CopyKnownRootToKnownRootSubtree
     UnknownPathCopiedToIncorrectlyNamedNewRootPath
     RenamedKnownRootToKnownRootSubtree
     FileUnchangedAndNoParentCopyOrRename
     DirUnchangedAndNoParentCopyOrRename
     EmptyChangeSet
     CopyKnownRootToUnknownPath
     CopyKnownRootSubtreeToInvalidRootPath
     NewRootCreatedByRenamingUnknownPath
     UnknownPathCopiedToKnownRootSubtree
     NewRootCreatedByCopyingUnknownPath
     PathCopiedFromOutsideRootDuringNonMerge
     UnknownDirReplacedViaCopyDuringNonMerge
     DirReplacedViaCopyDuringNonMerge
     DirectoryReplacedDuringNonMerge
     PreviousPathNotMatchedToPathsInMergeinfo
     PreviousRevDiffersFromParentCopiedFromRev
     PreviousPathDiffersFromParentCopiedFromPath
     PreviousRevDiffersFromParentRenamedFromRev
     PreviousPathDiffersFromParentRenamedFromPath
     KnownRootPathReplacedViaCopy
     BranchesDirShouldBeCreatedManuallyNotCopied
     TagsDirShouldBeCreatedManuallyNotCopied
     CopiedFromPathNotMatchedToPathsInMergeinfo
     InvariantViolatedModifyContainsMismatchedPreviousPath
     InvariantViolatedModifyContainsMismatchedPreviousRev
     InvariantViolatedCopyNewPathInRootsButNotReplace
     MultipleRootsAffectedByRemove
     AbsoluteRootOfRepositoryCopied
     PropertyChangedButOldAndNewValuesAreSame
     CopiedOrRenamedUnknownPathToIncorrectlyNamedNewRootPath
     UnknownPathRenamedViaReplaceToExistingKnownRoot
     UnknownPathCopiedViaReplaceToExistingKnownRoot
     UnknownPathRenamedToKnownRootSubtree
     UnknownPathCopiedToKnownRootSubtree
     KnownRootSubtreeRenamedViaReplaceToExistingKnownRoot
     UncleanRenameOfRootAncestorPath
     RenamedKnownRootViaReplaceToExistingKnownRoot
     RootPathAncestorRenamedViaReplaceToExistingKnownRoot
     RenamedKnownRootViaReplaceToRootAncestorPath
     RootPathAncestorRenamedToValidAbsoluteRootPath
     RootPathAncestorRenamedToValidRootPathSubtree
     RootPathAncestorRenamedToKnownRootSubtree
     RootPathAncestorRenamedViaReplaceToRootAncestorPath
     RenamedKnownRootToUnknownPath
     RenamedKnownRootSubtreeToUnknownPath
     RenamedKnownRootSubtreeToValidRootPath
     RenamedKnownRootSubtreeToIncorrectlyNamedRootPath
     UncleanRename
     RenameRelocatedPathOutsideKnownRoot

         (There's probably room for another e-mail thread just
          discussing all of these conditions; let's just say,
          Subversion repositories in the enterprise rarely look
          like their usually-well-laid out open source repository
          brethren.  What was the Blade Runner line?  "I've seen
          things you people wouldn't believe."? ;-)  My personal
          favorite: 'SubversionRepositoryCheckedIn'.)

So, as you can see, most of these conditions involve the concept of a
root.  Thus, the ability to accurately discern what constitutes a root
took up a large portion of my time.

Hard-coding regexes and forcing all repositories to conform to a
predefined repository layout worked like a charm for my first client, as
I was coming in before they had any Subversion repositories rolled out
into production.  (Well, sort of.)

That unfortunately wasn't feasible for my second client.  They were a
*huge* Subversion shop.  At the time I came in they had something like
960 production repositories, and I wouldn't be surprised if they were
well over 1,000 by now.  There was no standard layout between repos,
and a lot of repos used non-standard branches/tags/trunks paths so
trying to manage 'root detection' via regexes was a non-starter.

For example, a number of repos had layouts like this:

     /foo/trunk
     /foo/branches/1.0.x
     /foo/branches/bugzilla/1081

i.e. 'bugzilla' was just some random directory they created to hold
developer branches related to bugs.  A regex approach would have
matched 'bugzilla' as the branch root, whereas, in fact, the branch
root would have been 1081.
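
To make the mismatch concrete, here is a minimal sketch of the kind of fixed-layout regex that fails on this repo.  The pattern and the function name are hypothetical, made up for illustration; they are not taken from the hook framework.

```python
import re

# Hypothetical layout rule: "the branch root is whatever directory sits
# directly under branches/".  This is an assumed example, not the
# framework's actual pattern.
NAIVE_BRANCH_ROOT = re.compile(r"^(/.+?/branches/[^/]+)/")

def naive_branch_root(path):
    """Return the branch root a fixed-layout regex would report, or None."""
    match = NAIVE_BRANCH_ROOT.match(path)
    return match.group(1) if match else None

# Conventional layout: the regex finds the real branch root.
print(naive_branch_root("/foo/branches/1.0.x/src/main.c"))
# Extra grouping directory: the regex reports 'bugzilla', not '1081'.
print(naive_branch_root("/foo/branches/bugzilla/1081/src/main.c"))
```

The second call returns /foo/branches/bugzilla, one level too high, which is exactly the failure described above.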

The other non-starter was requiring the admin staff to go in and
manually specify what constituted a branch, i.e. setting a 'branch
root' property on relevant paths.  The overhead of doing that for
~1,000 repositories, with hundreds if not thousands of differently
named branches/roots (i.e. not particularly easy to automate
reliably), was not acceptable, for many enterprisey reasons mainly
surrounding cost.

So, I needed to design the branch detection logic in such a way that
it didn't require any hand-holding from the admins or support staff.

It took two attempts.

For the first attempt, I played around with the notion of a root *base*
directory, i.e. /branches and /tags.  The first thing the framework
would do when processing a pre-commit was create a 'RepositoryRoots'
class (the framework was written in Python FWIW), which would recurse
through the repo up to N-levels deep in order to determine the valid
root base directories.  Except for trunk, which was special, if a
directory had subdirectories that were created by copying another path
(i.e. how tags or branches are created), then the directory would be
considered a root base dir.

That lasted... about a day or two.  It was a leaky abstraction at best,
and broke when I encountered repos with the more non-standard layouts.
(I'm not even sure if I've described it accurately above; but eh, who
cares, it's gone now.)

The problem with the regex and base-root-dir discovery approaches was
that they were essentially heuristic-based.  "This directory features
lots of subdirectories that were copies of other paths, therefore,
there's a good chance it's a valid root base directory."

In most cases, yes, that was a valid assumption, but not always.  The
root detection logic was the most critical piece of my solution -- I
wasn't getting paid to correctly detect roots 70% of the time in 60% of
the repos.  It needed to be 100% in 100%.

So, I thought to myself, how can I correctly and autonomously identify
a root with 100% accuracy?  What one property did valid roots share
that I could interrogate?  Heck, what even constitutes a root? A branch
is a root, so is a tag, so is trunk.

...and then it dawned on me.  It seems so simple now, in retrospect:

     In the beginning, there was one root: trunk.  Then it was copied
     elsewhere, and became a branch, or maybe a tag.  These copies are
     also roots, and copies of them should also be considered roots.

Ah, so simple!  I just need to start at revision 0 and work my way up to
HEAD, whilst keeping a record of roots I encounter along the way.  And
that's pretty much it ;-)
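
The replay idea can be sketched roughly as follows.  This is a simplified illustration, not the framework's code: the change-tuple format, the function name and the way trunk is recognised (a caller-supplied set of paths) are all assumptions made for the example.

```python
def replay_roots(changes, trunk_paths):
    """Replay history oldest-first and return {root_path: created_rev}.

    `changes` is a list of (rev, action, path, copyfrom_path) tuples, where
    `copyfrom_path` is None unless the path was created by a copy.
    `trunk_paths` (an assumption of this sketch) names the paths that count
    as original roots when they are first created.
    """
    roots = {}
    for rev, action, path, copyfrom_path in sorted(changes, key=lambda c: c[0]):
        if action == "delete":
            roots.pop(path, None)
        elif action == "add":
            if copyfrom_path is None and path in trunk_paths:
                roots[path] = rev          # the first root: trunk
            elif copyfrom_path in roots:
                roots[path] = rev          # a copy of a root is a root

    return roots

history = [
    (1, "add", "/trunk/", None),
    (2, "add", "/branches/bugzilla/", None),            # plain mkdir
    (3, "add", "/branches/bugzilla/1081/", "/trunk/"),  # copy of a root
    (4, "add", "/tags/1081-fixed/", "/branches/bugzilla/1081/"),
]
print(replay_roots(history, {"/trunk/"}))
```

Note that /branches/bugzilla/ never becomes a root here, because it was created by mkdir rather than by copying an existing root.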

Turns out, that approach has worked surprisingly well.  It's been in
production at the second client's site for nearly a year now.  They just
run the 'repo analysis' part of the code against new repositories before
enabling the hooks, and voilà, they get instant root detection and
prevention of some 80-something erroneous conditions.

Here are some techie details about the implementation.  So, the script
stores root information in a revision property called 'evn:roots' (set
against the root of the repository).  The value of evn:roots at any
given revision will list all of the known roots in the repo at that
revision:

% svn pg --revprop -r26503 evn:roots svn://client.com/repos/foo
{'/build/branches/3.0.1/': {'created': 22323},
  '/build/branches/3.0.2/': {'created': 23129},
  '/build/branches/3.1.0/': {'created': 25804},
  '/build/branches/cvs/0.0.1/': {'created': 26389},
  '/build/branches/bugzilla/4144/': {'created': 22121},
  '/build/branches/bugzilla/6952/': {'created': 17661},
  '/build/release//3.0.0/': {'created': 20774},
  '/build/release/paris/3.0.0/': {'created': 20307},
  '/build/release/rome/3.0.1/': {'created': 22473},
  '/build/trunk/': {'created': 2919},
  '/src/trunk/': {'created': 9353},
  ...
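
As an illustration of how such a value might be consumed, here is a small sketch.  It assumes the revprop value is a plain Python dict literal, as the output above suggests; the function names are hypothetical, and the sample value is a shortened copy of the listing.

```python
import ast

def parse_roots(revprop_value):
    """Parse an evn:roots value, assumed here to be a Python dict literal."""
    return ast.literal_eval(revprop_value)

def find_root(roots, path):
    """Return the longest known root that contains `path`, or None."""
    candidates = [root for root in roots if path.startswith(root)]
    return max(candidates, key=len) if candidates else None

value = """{'/build/branches/3.0.1/': {'created': 22323},
 '/build/branches/bugzilla/4144/': {'created': 22121},
 '/build/trunk/': {'created': 2919}}"""

roots = parse_roots(value)
root = find_root(roots, "/build/branches/bugzilla/4144/Makefile")
print(root, roots[root]["created"])
```

Because every root path ends in a slash, a simple prefix test cannot confuse a root with a sibling directory whose name merely starts the same way.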

The 'created' revision refers to the revision that the root was created
in.  That's important, 'cause we store special metadata against the root
in the revprop for the revision it was created in:

% svn pg --revprop -r9353 svn://client.com/repos/foo
  ...
  '/src/trunk/': {
     'copies': {
         9834:  [('/src/branches/2.1/', 9835)],
         9997:  [('/src/branches/bugzilla/2800/', 9998)],
         10211: [('/src/branches/bugzilla/3326/', 10212)],
         10252: [('/src/branches/bugzilla/2160/', 10253)],
         10468: [('/src/branches/2.2/', 10469)],
         11148: [('/src/branches/2.3/', 11149)],
         11420: [('/src/branches/bugzilla/3720/', 11421)]},
     'created': 9353,
     'creation_method': 'created'},
  ...

i.e. we store all the subsequent forward-copies of this root, as well as
details of how it was created (which isn't very interesting in this
example, as it's trunk and was created via mkdir, but if it were a
branch or tag, it would contain details about where it was copied from).

Let's say I delete /src/trunk in r26504.  The entry for it in evn:roots
in that revision will be gone; but a note will be made against the r9353
creation revprop to indicate which rev it was deleted in.
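
A rough sketch of that bookkeeping, under stated assumptions: the 'copies' keys are read as the revision copied from and each tuple as (new root, revision it was created in), which is my reading of the output above rather than something the thread spells out.  The 'removed' key and both function names are hypothetical.

```python
def record_copy(root_meta, source_rev, dest_path, dest_rev):
    """Note a forward copy of this root on its creation record."""
    copies = root_meta.setdefault("copies", {})
    copies.setdefault(source_rev, []).append((dest_path, dest_rev))

def record_removal(root_meta, removed_rev):
    """Note the revision in which the root was deleted.

    The key name 'removed' is hypothetical; the thread does not name it.
    """
    root_meta["removed"] = removed_rev

trunk = {"created": 9353, "creation_method": "created"}
record_copy(trunk, 9834, "/src/branches/2.1/", 9835)
record_removal(trunk, 26504)
print(trunk)
```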

The importance of storing data like this becomes apparent when you deal
with situations like this:

  *hooks are turned off*
     r2:     svn cp ^/trunk ^/branches/foo
     r3:     svn rm ^/branches/foo
     r4:     svn mkdir ^/branches/foo
  *repo is analysed, evn:roots are set, hooks are turned on*

An attempt to do the following would be blocked, because r4/HEAD of
/branches/foo was not created correctly (i.e. wasn't copied from an
existing root), and thus, isn't considered a root either:

     svn cp /branches/foo /branches/bar

However, the following *would* work, because /branches/foo *was* a valid
root in r2:

     svn cp -r2 /branches/foo /branches/bar
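
The two outcomes above can be reproduced with a small history-aware lookup.  This is only a sketch: the lifetimes table, its (created, deleted) pairs and the function names are illustrative assumptions, not the framework's actual data model.

```python
# Hypothetical record of root lifetimes: path -> list of (created, deleted)
# revision pairs, with deleted=None while the root still exists.
lifetimes = {
    "/trunk/": [(1, None)],
    "/branches/foo/": [(2, 3)],  # copied from trunk in r2, removed in r3
    # The r4 mkdir of /branches/foo/ adds nothing: it was not a copy of a root.
}

def was_root_at(lifetimes, path, rev):
    """Return True if `path` was a valid root in revision `rev`."""
    for created, deleted in lifetimes.get(path, []):
        if created <= rev and (deleted is None or rev < deleted):
            return True
    return False

def copy_allowed(lifetimes, source_path, source_rev):
    """A copy may create a new root only if its source was a root then."""
    return was_root_at(lifetimes, source_path, source_rev)

print(copy_allowed(lifetimes, "/branches/foo/", 4))  # copy from HEAD (r4)
print(copy_allowed(lifetimes, "/branches/foo/", 2))  # copy with -r2
```

The first call is refused and the second is allowed, matching the two svn cp examples.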

Thoughts?

     Trent.

Re: Identifying branch roots

Posted by Julian Foad <ju...@wandisco.com>.

C. Michael Pilato wrote:
[...]
> The thing you are describing is a branch root, so just call it what it is.
> ("svn:branch-root" is what I suggested a couple of years ago for this.)
[...]

I guess, name "svn:branch-root" => value "subversion" can make sense
when we get used to the semantics being "this is *a* branch-root of the
<thing> called 'subversion'" rather than "the branch-root is named ...".
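
For illustration only, the "same value means mergeable" rule could be sketched like this.  The property name is the one proposed in this thread and is not an existing Subversion property; the function name and the dict-of-properties representation are assumptions made for the example.

```python
# Proposed name from this thread; not an existing Subversion property.
BRANCH_ROOT_PROP = "svn:branch-root"

def common_marker(source_props, target_props):
    """Return the marker shared by source and target, or None.

    Each argument is a dict of property name -> value for a branch root.
    None is returned when either side lacks the marker or the values differ.
    """
    source = source_props.get(BRANCH_ROOT_PROP)
    target = target_props.get(BRANCH_ROOT_PROP)
    if source is not None and source == target:
        return source
    return None

trunk = {BRANCH_ROOT_PROP: "subversion"}
branch = {BRANCH_ROOT_PROP: "subversion"}
other = {BRANCH_ROOT_PROP: "serf"}
print(common_marker(trunk, branch))
print(common_marker(trunk, other))
```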

- Julian

Re: Identifying branch roots

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

C. Michael Pilato wrote on Fri, Oct 07, 2011 at 11:08:20 -0400:
> On 10/07/2011 06:59 AM, Julian Foad wrote:
> >>> I think this is an exciting feature with lots of potential but
> >>> it has a lot more inherent complexity than improving 'svn mergeinfo'
> >>> output. Could we split improving the output of 'svn mergeinfo' and
> >>> identifying branch roots into two distinct feature branches?
> > 
> > Splitting it wouldn't work for me, because the branch marker information
> > has no purpose and no defined requirements without the mergeinfo user
> > interface work; the existence and meaning of the info goes hand-in-hand
> > with developing some ways in which that info can be used.
> > 
> > When we come to integrate some of this work to trunk, by then the
> > specification of the branching info will have been worked out, and so
> > then it may be best to create a feature branch specifically for adding
> > that (without the mergeinfo changes) if it is complex enough to be
> > worthwhile doing so.
> 
> Seems sane to me.  If all you need is a property whose mere presence
> matters, we can add meaning to specific values later.

Will the meaning be added before or after the code is released?  There
are compatibility implications in the latter case.

Re: Identifying branch roots

Posted by "C. Michael Pilato" <cm...@collab.net>.

On 10/07/2011 06:59 AM, Julian Foad wrote:
>>> I think this is an exciting feature with lots of potential but
>>> it has a lot more inherent complexity than improving 'svn mergeinfo'
>>> output. Could we split improving the output of 'svn mergeinfo' and
>>> identifying branch roots into two distinct feature branches?
> 
> Splitting it wouldn't work for me, because the branch marker information
> has no purpose and no defined requirements without the mergeinfo user
> interface work; the existence and meaning of the info goes hand-in-hand
> with developing some ways in which that info can be used.
> 
> When we come to integrate some of this work to trunk, by then the
> specification of the branching info will have been worked out, and so
> then it may be best to create a feature branch specifically for adding
> that (without the mergeinfo changes) if it is complex enough to be
> worthwhile doing so.

Seems sane to me.  If all you need is a property whose mere presence
matters, we can add meaning to specific values later.

>>>> +/* Set *MARKER to the project-root marker that is common to SOURCE and
>>>> + * TARGET, or to NULL if neither has such a marker.
>>>
>>> Why do you need to introduce a new term "project"?
> 
> That's not a very good term, and I started by calling it a "branch-root
> marker", because it does indeed mark a branch root, but I wanted a name
> that says the value of this property is the name of the "code base" or
> "project tree" that has been branched, rather than the name of the
> branch.  I'm open to better ideas.

Yeah, it's not a very good term at all because "project root" is an
established term in Subversion already (and has been for a decade) -- and
not the same thing you're trying to describe here.

The thing you are describing is a branch root, so just call it what it is.
("svn:branch-root" is what I suggested a couple of years ago for this.)
Even when you set the property the first time on, say, your trunk, you are
declaring the existence of a conceptual branch of the
codebase-as-a-tree-metaphor, even if it is the only such branch extant
at the time.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand