Posted to dev@subversion.apache.org by Talden <ta...@gmail.com> on 2008/03/21 12:37:12 UTC

Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

> (2) there is no API call "whereis", defined as follows:
>   whereis(URL:rev, targetbranchURL:rev), tells you where a file
> identified by URL:rev (on a source branch, for example), can be found
> in the targetbranch (note: whereis may return [0:N] URLs because of
> possible cloning on the target branch). (Note that this requires
> the ability to search "forward" through the logs efficiently, a
> feature that Subversion does not provide right now AFAIK.)

I'd been thinking about this Copied-To information recently and how it
might be stored in an append-only system in a manner that is cheap to
store, but saves some of the cost of a complete forward scan of all
revisions.

If we think of paths in the repository as currently all being in the
'File-System' name-space then a path from the repository root of
"ant.txt" is actually "FS:/ant.txt"

We could use the same skip-delta logic to build up content in another
name-space, 'Copied-To', with content consumed by clients during log
operations.

EG let's say we performed the copy...

    FS:/ant.txt@r1 --> FS:/bat.txt@r3

...then "CT:/ant.txt/r1" could have the following added to its content...

    bat.txt@r3

...if we then did...

    FS:/ant.txt@r1 --> FS:/cat.txt@r4
    FS:/bat.txt@r3 --> FS:/dog.txt@r4

...we would change "CT:/ant.txt/r1" to...

    bat.txt@r3
    cat.txt@r4

...and add the following to "CT:/bat.txt/r3"

    dog.txt@r4

Copied-To information is now available by looking at the HEAD
revision of the relevant file in the Copied-To name-space.  It should
also be cheap to ask "at which revisions was this file copied, and
when and where was it copied to?".
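
A minimal sketch of the bookkeeping described above, in Python rather
than anything resembling Subversion's actual storage.  The
CopiedToIndex class, its method names and the in-memory dictionary are
all hypothetical; they only mirror the ant/bat/cat/dog example.

```python
from collections import defaultdict


class CopiedToIndex:
    """Hypothetical in-memory stand-in for the 'CT:' name-space."""

    def __init__(self):
        # Maps (source_path, source_rev) -> list of (dest_path, dest_rev).
        self._entries = defaultdict(list)

    def record_copy(self, src_path, src_rev, dst_path, dst_rev):
        """Append one copy destination; existing entries are never rewritten."""
        self._entries[(src_path, src_rev)].append((dst_path, dst_rev))

    def copied_to(self, src_path, src_rev):
        """Return the destinations recorded for src_path@src_rev, sorted."""
        return sorted(self._entries.get((src_path, src_rev), []))

    def copy_revisions(self, src_path):
        """Return every revision of src_path that was used as a copy source."""
        return sorted(rev for (path, rev) in self._entries if path == src_path)


index = CopiedToIndex()
index.record_copy("ant.txt", 1, "bat.txt", 3)
index.record_copy("ant.txt", 1, "cat.txt", 4)
index.record_copy("bat.txt", 3, "dog.txt", 4)

print(index.copied_to("ant.txt", 1))
print(index.copied_to("bat.txt", 3))
```

Keying on (path, revision) is only one possible layout; it matches the
"CT:/ant.txt/r1" naming used in the example and nothing more.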

Now naturally the file need not be textual, should probably be sorted
and should probably use internal identifiers rather than textual
filenames as source and destination.

Also, merge-logic for this name-space is different, as I think
conflicts are always resolvable with a 'keep both' approach.
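
As an illustration of what 'keep both' could mean here, the sketch
below merges two versions of a Copied-To entry list by taking their
union.  The function name and the (path, revision) tuple format are
assumptions made for this example, not an existing Subversion routine.

```python
def merge_keep_both(ours, theirs):
    """Merge two Copied-To entry lists by taking their sorted union."""
    return sorted(set(ours) | set(theirs))


ours = [("bat.txt", 3), ("cat.txt", 4)]
theirs = [("bat.txt", 3), ("emu.txt", 5)]  # "emu.txt" is a made-up entry
merged = merge_keep_both(ours, theirs)
print(merged)
```

Because entries are only ever added, a union never discards
information from either side, which is why no conflict needs manual
resolution under this assumption.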

Note that this probably doesn't make looking for copied-to information
cheap, but I think that for many use-cases it will make it cheaper.
Benefiting from binary diffs and skip-deltas this shouldn't be a huge
additional burden.  You add the delta in the same revision that the
copy is performed making no change to the atomic nature of operations
or the append-only nature necessary for syncing mirrors.

I'm curious whether anyone else sees this as a solution with any merit
within the scope of the existing SVN architecture.  Is this at least
thought provoking enough to hear some discussion on how we might make
copied-to information cheaper? Would this be useful in building
revision graphs that trace tagging and branching as well as
modification?

--
Talden

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Mar 21, 2008 at 7:51 AM, Stefan Sperling <st...@elego.de> wrote:
> On Fri, Mar 21, 2008 at 09:46:55AM -0400, C. Michael Pilato wrote:
>  > But I actually question the conclusion you provide for your own use-case.
>  > Yes, an issue tracker can tell you which branches have had a bugfix
>  > applied.  But that's rather dependent on your processes involving a
>  > tracker, and on human ability to keep said tracker up to date, and must be
>  > cross-referenced with the version control system anyway to answer the
>  > similar question, "In which branches has this bugfix *not* been applied?"
>
>  The question is "How much of a process-management tool vs. a version control
>  system (which provides a subset of the functionality of the former) does
>  Subversion want to be?" Or maybe it is "How much does Subversion want to help
>  process-management tools with organizing their data?"
>
>  The line has to be drawn somewhere.

I want the line to be within a revision graph :)

Seriously, this feature is asked for a lot.  I was just at EclipseCon
this week and it comes up over and over again when talking about SVN.
The hacks you have to do today to create one of these are crazy and it
would be great if the repository could just provide the information
needed for this to become a first-class feature of SVN.

We also have this feature in Subclipse that could probably use this
information if it existed:

http://subclipse.tigris.org/branch_tag.html

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Mar 21, 2008 at 09:46:55AM -0400, C. Michael Pilato wrote:
> But I actually question the conclusion you provide for your own use-case. 
> Yes, an issue tracker can tell you which branches have had a bugfix 
> applied.  But that's rather dependent on your processes involving a 
> tracker, and on human ability to keep said tracker up to date, and must be 
> cross-referenced with the version control system anyway to answer the 
> similar question, "In which branches has this bugfix *not* been applied?"

The question is "How much of a process-management tool vs. a version control
system (which provides a subset of the functionality of the former) does
Subversion want to be?" Or maybe it is "How much does Subversion want to help
process-management tools with organizing their data?"

The line has to be drawn somewhere.

> I began work some time ago on successor link support on a branch with a 
> related name (fs-successor-ids, perhaps?).  I don't recall the status of 
> that code -- probably something like "BDB stores it, but you can't use it; 
> FSFS lacks a design altogether."

Now I remember you mentioning this branch in the past. Quite interesting.
I guess Talden might want to take a look at that branch :)

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

Posted by "C. Michael Pilato" <cm...@collab.net>.
Stefan Sperling wrote:
> On Sat, Mar 22, 2008 at 01:37:12AM +1300, Talden wrote:
>> Note that this probably doesn't make looking for copied-to information
>> cheap, but I think that for many use-cases it will make it cheaper.
> 
> I'd like to see concrete use cases to support a copied-to implementation.
> As stated in another mail in the thread this one has spawned off from,
> I've implemented this once for a client, using properties to store the
> copied-to info (which isn't a good design but works as a proof-of-concept).
> The use case this was going to support was "given a fix for a bug, how can I
> find out on which branches this fix has already been applied?".
> This use case is invalid IMHO because it can be answered more effectively
> by an issue tracker than Subversion itself. The client even ended up not
> using the code.
> 
> Do you have other use cases that would benefit from having copied-to
> information available?

As noted, one admittedly superficial use-case for the ability to trace 
forward in an item's history is revision graphing scenarios.

But I actually question the conclusion you provide for your own use-case. 
Yes, an issue tracker can tell you which branches have had a bugfix applied. 
  But that's rather dependent on your processes involving a tracker, and on 
human ability to keep said tracker up to date, and must be cross-referenced 
with the version control system anyway to answer the similar question, "In 
which branches has this bugfix *not* been applied?"

Note that I'm talking about forward tracing in general, not copy-to.  That 
stems from an awareness of our backends.  That is, if we had a general 
successor link from a node-revision to all its successors, "copy to" is a 
subset of those relationships (namely, the ones where the successor has a 
"copy from" the original node).

I began work some time ago on successor link support on a branch with a 
related name (fs-successor-ids, perhaps?).  I don't recall the status of 
that code -- probably something like "BDB stores it, but you can't use it; 
FSFS lacks a design altogether."
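
To make the "copy to is a subset of successors" point above concrete,
here is a rough sketch.  The NodeRev record, its fields and the
identifier strings are invented for illustration and do not reflect
how BDB or FSFS actually store node-revisions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeRev:
    """Hypothetical node-revision record; not Subversion's real structure."""
    node_id: str
    copy_from: Optional[str] = None  # id of the node-revision copied from, if any


def copied_to(origin_id, successors):
    """Keep only the successors that record origin_id as their copy source."""
    return [s for s in successors.get(origin_id, []) if s.copy_from == origin_id]


# General successor links: every node-revision that follows "ant@1".
successors = {
    "ant@1": [
        NodeRev("ant@2"),                     # ordinary modification
        NodeRev("bat@3", copy_from="ant@1"),  # copy
        NodeRev("cat@4", copy_from="ant@1"),  # copy
    ],
}

copies = copied_to("ant@1", successors)
print([s.node_id for s in copies])
```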

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

Posted by Talden <ta...@gmail.com>.
On Sat, Mar 22, 2008 at 2:22 AM, Stefan Sperling <st...@elego.de> wrote:
> On Sat, Mar 22, 2008 at 01:37:12AM +1300, Talden wrote:
>  > Note that this probably doesn't make looking for copied-to information
>  > cheap, but I think that for many use-cases it will make it cheaper.
>
>  I'd like to see concrete use cases to support a copied-to implementation.
>  As stated in another mail in the thread this one has spawned off from,
>  I've implemented this once for a client, using properties to store the
>  copied-to info (which isn't a good design but works as a proof-of-concept).
>  The use case this was going to support was "given a fix for a bug, how can I
>  find out on which branches this fix has already been applied?".
>  This use case is invalid IMHO because it can be answered more effectively
>  by an issue tracker than Subversion itself. The client even ended up not
>  using the code.

I would like Subversion to be able to be authoritative on copy
information.  Requiring a side process to capture this information
just begs for human error issues.

>  Do you have other use cases that would benefit from having copied-to
>  information available?

Though I don't consider it of high value, I could see someone
asking 'where has this file been copied to' in an obliterate task...
The user would need to know all clones of sensitive information to be
purged to purge those paths as well (and the Obliterate feature could
use this information to do that follow operation more cheaply).
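
A sketch of that follow operation, assuming a copied-to mapping of the
kind proposed earlier in the thread.  The dictionary layout and the
all_clones function are hypothetical, and the walk only follows copies
made from the exact (path, revision) pairs recorded; it ignores later
revisions of a copied path.

```python
from collections import deque


def all_clones(copied_to, start):
    """Breadth-first walk over copy links, returning every transitive copy."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dest in copied_to.get(current, []):
            if dest not in seen:
                seen.add(dest)
                queue.append(dest)
    return sorted(seen)


# Hypothetical copied-to data: (path, rev) -> list of (path, rev) copies.
copied_to = {
    ("ant.txt", 1): [("bat.txt", 3), ("cat.txt", 4)],
    ("bat.txt", 3): [("dog.txt", 4)],
}

clones = all_clones(copied_to, ("ant.txt", 1))
print(clones)
```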

>  > Would this be useful in building
>  > revision graphs that trace tagging and branching as well as
>  > modification?
>
>  Yes, I think it would. This is one valid, albeit exotic, use case already.

Tags, tag collections, branches and branch collections are other
potentially useful pieces of meta-data that, combined with copied-to,
would make intelligent graph generation much, much easier.

I don't see the revision graph as an exotic use-case in a VCS... it is
for Subversion but then that's because support for it sucks.

--
Talden


Re: Storing Copied-To info (was: Tree conflicts - thoughts on use cases, merging, and tests)

Posted by Stefan Sperling <st...@elego.de>.
On Sat, Mar 22, 2008 at 01:37:12AM +1300, Talden wrote:
> Note that this probably doesn't make looking for copied-to information
> cheap, but I think that for many use-cases it will make it cheaper.

I'd like to see concrete use cases to support a copied-to implementation.
As stated in another mail in the thread this one has spawned off from,
I've implemented this once for a client, using properties to store the
copied-to info (which isn't a good design but works as a proof-of-concept).
The use case this was going to support was "given a fix for a bug, how can I
find out on which branches this fix has already been applied?".
This use case is invalid IMHO because it can be answered more effectively
by an issue tracker than Subversion itself. The client even ended up not
using the code.

Do you have other use cases that would benefit from having copied-to
information available?
 
> Would this be useful in building
> revision graphs that trace tagging and branching as well as
> modification?

Yes, I think it would. This is one valid, albeit exotic, use case already.

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner