Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2011/09/30 11:40:31 UTC

Thought experiment - follow logs back before r1 into previous repository

I want to share with you an idea that came to me from a customer.  I'm
not at all proposing that anybody should do this, I'm just curious what
you think.

Imagine, if you will, that we are coders working in a Subversion
repository that has grown very large and that for IT reasons a decision
has been made to freeze the repository -- make it read-only -- and a new
repository has been created, taking a snapshot of the old HEAD and
importing that as the new r1.  We are to continue our development work
in the new repository.

Those of you who are "old" enough svn devs, think back to when
Subversion became self-hosting, starting with a snapshot of the head of
the CVS repository.  All the prior history was back before r1,
inaccessible via Subversion.  Was that a big problem?  No, it wasn't,
and I know that the snapshot approach is often recommended as a
pragmatic and perfectly reasonable way to migrate from one VCS to
another.  But maybe this time there will be hundreds of developers
working in dozens of projects[1].

As Subversion devs today we might like to say "no, don't do that, let's
find a better solution to whatever problem was forcing us to re-start
with an imported snapshot".  But imagine that's already been discussed
and this is the best way forward and now we simply have to get on with
using the new repository.

Q:  What simple modifications could "we" (anybody) make to our
Subversion clients that would help us to work more effectively in this
scenario?  The customer I got this idea from is more interested in
TortoiseSVN than in "svn" and asked me a somewhat different question,
but I think this is the general idea that's of wider interest.

A:  What do you think?

    Maybe one of the most useful things we could do is teach "svn
log" (when running in the usual 'backwards' direction) to run a
follow-on log in the old repo if and when reaching r1.  Perhaps we'd set
a revprop on (new) r0 or r1 pointing to the old repo URL so that this
info is configured in a single place.  The two sets of revision numbers
in the output would be confusing so we may want to consider tagging the
old and/or the new revnums with some marker as well as inserting an "And
now from the old repository:" message.
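
    As a rough illustration of the chained-log idea (a sketch only, not
existing Subversion behaviour): the revprop name "example:predecessor-url",
the "oldN:" revision tag, and the two callback parameters below are all
made up for the example.  The callbacks stand in for whatever actually
fetches the log and reads the revprop.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical revprop name; nothing in Subversion defines it.
PREDECESSOR_PROP = "example:predecessor-url"

LogEntry = Tuple[int, str]  # (revision number, log message)


def chained_log(
    url: str,
    fetch_log: Callable[[str], List[LogEntry]],
    fetch_predecessor: Callable[[str], Optional[str]],
) -> Iterator[str]:
    """Yield log lines newest-first, continuing into predecessor repos."""
    seen = set()
    depth = 0
    while url is not None and url not in seen:
        seen.add(url)
        if depth > 0:
            yield "--- And now from the old repository: %s ---" % url
        for rev, msg in sorted(fetch_log(url), reverse=True):
            # Tag old revisions so the two numbering schemes stay distinct.
            tag = "r%d" % rev if depth == 0 else "old%d:r%d" % (depth, rev)
            yield "%s | %s" % (tag, msg)
        # Assumed to return the value of PREDECESSOR_PROP, or None if unset.
        url = fetch_predecessor(url)
        depth += 1
```

In practice fetch_log could wrap "svn log --xml URL" and
fetch_predecessor could wrap "svn propget --revprop -r 0 PROPNAME URL",
but that wiring is left out here to keep the sketch self-contained.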

    I think teaching "svn blame" to view the old repo would be harder:
it would require more intrusive code changes in svn_client_blame().
It's not theoretically difficult to do, of course, but perhaps the
code-to-value ratio would not be worth having in libsvn_client ... hmm,
unless we re-architect the blame code so that it's fed diffs from the
client layer instead of fetching them itself, then it could be done
really cleanly.  The output format would just need a minor tweak to
distinguish old from new revs.

    I think teaching "svn diff" to do general cross-repo diffs would not
be feasible with the current diff implementation.  However, one of my
goals is to generalize the diff code further so it could support such
things (cross-repo, unversioned local tree, etc.).  That would be useful
in theory, but in practice I can't see it really being used very often
in this start-again scenario.  But any single-rev diff is easily
supported because the cut-over revision is present in both repos.  (We
can assume that the tree in old@OLD_HEAD is identical to new@1.)  So
maybe we'd want to make single-rev diffs and all same-repo diffs easier
by tweaking "svn diff" to follow the specified path back into a revision
in the old repo, a bit like what I said above for "svn log", if some
special switch is specified.

    Any other commands or work flows that might be really useful?  I
wouldn't dream of trying to make "svn up" go back to the old repo, that
would certainly be over the top.  And I wouldn't expect "svn cat", "svn
proplist" etc. to be worth bothering with, unless all such simple
read-only commands get the same functionality "for free".


Mad or genius?  (And I know it wouldn't be worth bothering in a small
repository; let's assume it's a big and busy project with lots of
interesting history.)

- Julian


[1] I'm just making up numbers here; I don't know what sort of numbers
the customer that brought up this idea has.

Re: Thought experiment - follow logs back before r1 into previous repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Fri, Sep 30, 2011 at 10:40:31 +0100:
>     I think teaching "svn blame" to view the old repo would be harder:
> it would require more intrusive code changes in svn_client_blame().
> It's not theoretically difficult to do, of course, but perhaps the
> code-to-value ratio would not be worth having in libsvn_client ... hmm,
> unless we re-architect the blame code so that it's fed diffs from the
> client layer instead of fetching them itself, then it could be done
> really cleanly.  The output format would just need a minor tweak to
> distinguish old from new revs.
> 

How?

Perhaps some sort of N-ary identifier --- "%d.%d" % (repos_number,
revision-number) for the chain case, or "%s.%d" % (repos_path_from_root,
revision-number) for the tree case.
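
A minimal sketch of those two identifier forms, using the same format
strings.  The function names, and the use of a "/"-separated path such as
"0/2" for the tree case, are assumptions made for illustration only.

```python
from typing import Tuple


def chain_id(repos_number: int, revision: int) -> str:
    """Identifier for the chain case: position in the chain, then revision."""
    return "%d.%d" % (repos_number, revision)


def tree_id(repos_path_from_root: str, revision: int) -> str:
    """Identifier for the tree case: path from the root repository, then revision."""
    return "%s.%d" % (repos_path_from_root, revision)


def parse_chain_id(ident: str) -> Tuple[int, int]:
    """Split a chain identifier back into (repos_number, revision)."""
    repos, rev = ident.split(".")
    return int(repos), int(rev)
```

Parsing back into an integer pair means identifiers can be compared as
tuples, so revisions sort by repository first and then by revision
number within it.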

>     I think teaching "svn diff" to do general cross-repo diffs would not
> be feasible with the current diff implementation.  However, one of my

Why?  If old-repos@rHEAD == new-repos@r0, then you could construct
a delta between old-repos@rM and new-repos@rN by combining the deltas
[rM, rHEAD] and [r0, rN], which then would allow the diff...?
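
To make the delta-combining argument concrete, here is a toy model (not
how libsvn represents deltas): a tree is a dict from path to content,
and a delta maps each changed path to its new content, or to None for a
deletion.  All names are hypothetical.

```python
from typing import Dict, Optional

Tree = Dict[str, str]
Delta = Dict[str, Optional[str]]  # None marks a deleted path


def diff_trees(old: Tree, new: Tree) -> Delta:
    """Return the delta that turns 'old' into 'new'."""
    delta: Delta = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            delta[path] = new.get(path)
    return delta


def apply_delta(tree: Tree, delta: Delta) -> Tree:
    """Apply a delta to a tree, returning a new tree."""
    result = dict(tree)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def compose(first: Delta, second: Delta) -> Delta:
    """Combine two consecutive deltas; later changes override earlier ones."""
    combined = dict(first)
    combined.update(second)
    return combined
```

This only works because the tree at the end of the first delta is
identical to the tree at the start of the second, which is exactly the
cut-over assumption.  The composed delta may also touch paths that end
up unchanged overall; a real implementation would want to filter those.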

> goals is to generalize the diff code further so it could support such
> things (cross-repo, unversioned local tree, etc.).  That would be useful
> in theory, but in practice I can't see it really being used very often
> in this start-again scenario.  But any single-rev diff is easily
> supported because the cut-over revision is present in both repos.  (We
> can assume that the tree in old@OLD_HEAD is identical to new@1.)  So
> maybe we'd want to make single-rev diffs and all same-repo diffs easier
> by tweaking "svn diff" to follow the specified path back into a revision
> in the old repo, a bit like what I said above for "svn log", if some
> special switch is specified.
> 
>     Any other commands or work flows that might be really useful?  I
> wouldn't dream of trying to make "svn up" go back to the old repo, that
> would certainly be over the top.  And I wouldn't expect "svn cat", "svn

Yeah, probably overkill, especially in the mixed-revisions case.  (We
could somehow signal, via the UUIDs, that the two repositories are
related... but whatever; that's Future Work.)

You mention using 'svn up' for backdating.  What about using it for
updating?  i.e., in a working copy of the 'old' repository, to make 'svn
up' print an advisory message saying "Oops; the history has been
restarted; checkout a new working copy from %s" % URL?

> proplist" etc. to be worth bothering with, unless all such simple
> read-only commands get the same functionality "for free".
> 
> 
> Mad or genius?  (And I know it wouldn't be worth bothering in a small

Yes :-)

> repository; let's assume it's a big and busy project with lots of
> interesting history.)
> 
> - Julian
> 
> 
> [1] I'm just making up numbers here; I don't know what sort of numbers
> the customer that brought up this idea has.
> 
>

Re: Thought experiment - follow logs back before r1 into previous repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Fri, Sep 30, 2011 at 12:17:09 +0100:
> To record my own opinion: I think it's a fine idea that users in that
> situation should be able to do that sort of thing but I don't think that
> functionality belongs in "svn" as I think it's an uncommon use case and
> can't be cleanly and generally supported -- it's rather a hack.  If we
> supported third-party client-side plug-ins that's where it would
> belong ... but we don't have any plans to do so.
> 

Could this be implemented via a wrapper libsvn_fs_* or libsvn_ra_*
module?

libsvn_fs_raid0 or libsvn_ra_raid0.
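
One possible reading of the "raid0" idea is a wrapper that presents the
two repositories as a single concatenated revision range.  The sketch
below shows only the revision-number mapping such a wrapper would need;
the class and method names are invented, and it assumes new r1 holds the
same tree as the old HEAD so the two share one global number.

```python
from typing import Tuple


class ConcatenatedRevisions:
    """Map a single global revision range onto (repository, local revision)."""

    def __init__(self, old_head: int) -> None:
        self.old_head = old_head

    def locate(self, global_rev: int) -> Tuple[str, int]:
        """Return which repository serves a global revision, and as what."""
        if global_rev <= self.old_head:
            return ("old", global_rev)
        # new r1 duplicates old HEAD, so new r2 is the first extra revision.
        return ("new", global_rev - self.old_head + 1)

    def to_global(self, repo: str, local_rev: int) -> int:
        """Inverse mapping from a repository-local revision."""
        if repo == "old":
            return local_rev
        return self.old_head + local_rev - 1
```

With that mapping in place, a wrapper module could forward each request
to whichever underlying repository owns the requested revision.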

> - Julian

Re: Thought experiment - follow logs back before r1 into previous repository

Posted by Julian Foad <ju...@wandisco.com>.

To record my own opinion: I think it's a fine idea that users in that
situation should be able to do that sort of thing but I don't think that
functionality belongs in "svn" as I think it's an uncommon use case and
can't be cleanly and generally supported -- it's rather a hack.  If we
supported third-party client-side plug-ins that's where it would
belong ... but we don't have any plans to do so.

- Julian



Re: Thought experiment - follow logs back before r1 into previous repository

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Konstantin Kolinko wrote on Fri, Sep 30, 2011 at 14:15:48 +0400:
> 2011/9/30 Julian Foad <ju...@wandisco.com>:
> > Perhaps we'd set
> > a revprop on (new) r0 or r1 pointing to the old repo URL so that this
> > info is configured in a single place.  The two sets of revision numbers
> > in the output would be confusing so we may want to consider tagging the
> > old and/or the new revnums with some marker as well as inserting an "And
> > now from the old repository:" message.

Of course, this should nest to an arbitrary-length chain of repositories.  :-)

Another problem: when you chain repositories this way, I could see how
it would be more natural for the "second in the chain" repository to
have a non-empty r0; but the assumption that the contents of r0 are
fixed --- precisely the root fspath and nothing else --- is ubiquitous
in the codebase.

Thirdly, once you allow a chain of repositories, you can as easily allow
a tree of repositories (where *two* repositories state that repos5 is
their predecessor).  From here the distance is short to asking if DAGs
can also be supported, and --- if that's a yes --- then to further ask
if more complex graph topologies (that DVCS's support) can also be
supported :-).
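
To show how little changes between the chain, tree and DAG cases on the
client side, here is a small sketch that collects every predecessor of a
repository.  The "predecessors" mapping is hypothetical; it stands in for
whatever per-repository setting would name the older repositories.

```python
from typing import Dict, List


def ancestors(repo: str, predecessors: Dict[str, List[str]]) -> List[str]:
    """Return all repositories reachable via predecessor links, nearest first."""
    order: List[str] = []
    seen = {repo}
    queue = list(predecessors.get(repo, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited: shared ancestor or a cycle
        seen.add(current)
        order.append(current)
        queue.extend(predecessors.get(current, []))
    return order
```

A chain has one predecessor per repository, a tree lets several
repositories name the same predecessor, and a DAG lets one repository
name several.  The same breadth-first walk covers all three, and the
"seen" set keeps it from looping if the links are misconfigured.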

Re: Thought experiment - follow logs back before r1 into previous repository

Posted by Konstantin Kolinko <kn...@gmail.com>.

2011/9/30 Julian Foad <ju...@wandisco.com>:
> Perhaps we'd set
> a revprop on (new) r0 or r1 pointing to the old repo URL so that this
> info is configured in a single place.  The two sets of revision numbers
> in the output would be confusing so we may want to consider tagging the
> old and/or the new revnums with some marker as well as inserting an "And
> now from the old repository:" message.
>

Just a couple of other possible use cases:

1) Consider a project that was developed outside of the ASF, and then
imported into the ASF repository using a "snapshot" of the sources on that date.
Can we link to the old repository somehow?

2) Tomcat 6.0 source code
http://svn.apache.org/viewvc?view=revision&revision=389140
http://svn.apache.org/viewvc?view=revision&revision=389146

It was imported as a snapshot, without proper links to the Tomcat 5.5
sources. There was a reason for that:
 Tomcat 5.5 used a different project layout:
module/(trunk|branch|tags)/...
while in Tomcat 6.0 it is just a single /trunk. All source code is now
in a single "/trunk/java" tree, whereas before the packages were
split across several modules.

Sometimes I regret that viewvc cannot show the history of a certain line
of code earlier than r389146, so I have to manually switch to some
other code tree and continue my search there.


I agree that if this were implemented, it could be a revision property
so that an administrator could change it at any time if the server
configuration changes.

There is that "server-side config which 'broadcasts' to clients"
[1974] enhancement request, and how is that configuration stored on
the server?

Maybe some external configuration file? Or a configuration file stored
in this or another repository that is announced using some svn property
set on r0?

[1974]
http://subversion.tigris.org/issues/show_bug.cgi?id=1974


Best regards,
Konstantin Kolinko