Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2014/10/01 11:52:42 UTC

Official way to create an empty revision

Daniel Shahaf wrote in the thread "No no-op changes":
> Should we provide an "official" way to create an empty revision?  That
> is, a revision whose changed-paths list is empty?
> 
> Use-cases:
> 
> 1. Suppose last backup is r100 and revisions r101:r105 were lost; then
> after restoring the backup, the admin would create 5 empty revisions.
> 
> 2. Force an empty revision for whatever reason, such as to make the
> revnums sync to something:
> 2.1. See r3 of the regression test merge_tests.py#125 svnmucc_abuse_1().
> 2.2. When loading our repository to the ASF repository, if Joe had
> created 26 empty revisions, then The Offset would have been 840100
> rather than 840074, which would make our mental math easier.

Hi Daniel. It seems a reasonable tool to have in the svn admin's tool kit. Perhaps not often, but people do sometimes want this. I found two web pages where people discussed this. One wrote a script that spits out the appropriate few lines of dump file text to represent an empty rev, N times [1]; the other is worse, committing N changes to a temporary repo, dumping it and filtering everything out [2].
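
As a rough illustration of the first approach (emitting dump-file text for N empty revisions), here is a minimal Python sketch. It is not the script from [1]. The function names are hypothetical, the record layout is the version 2 dump format as I understand it, and it is an assumption that the loader accepts a stream with no UUID header and no r0 record.

```python
def empty_revision_record(revnum, props=None):
    """Return dump-stream bytes for one revision that has no node records."""
    body = b""
    for name, value in (props or {}).items():
        k = name.encode("utf-8")
        v = value.encode("utf-8")
        # Each property is a length-prefixed key line and value line.
        body += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    body += b"PROPS-END\n"
    header = (
        b"Revision-number: %d\n"
        b"Prop-content-length: %d\n"
        b"Content-length: %d\n"
        b"\n"
    ) % (revnum, len(body), len(body))
    # A blank line separates this record from the next one.
    return header + body + b"\n"


def empty_revisions_dump(first, count, props=None):
    """Return a dump stream holding `count` empty revisions from `first`."""
    out = b"SVN-fs-dump-format-version: 2\n\n"
    for rev in range(first, first + count):
        out += empty_revision_record(rev, props)
    return out
```

Writing empty_revisions_dump(1, 5) to a file and feeding that file to 'svnadmin load' would be the intended use. The revision numbers in the stream are only labels, since the loader assigns the real numbers.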

I'm assuming this proposal is restricted to the admin side. Your use cases 1. and 2.2 are both admin use cases. Your use case 2.1 is a test which uses a client-side commit to make an uninteresting revision, in order to make the subsequent revision numbers match (modulo 10) those in the original use case. While people no doubt do this sort of thing sometimes in real life, I can't think of a general behaviour that would make sense from the client side. In a shared repository, you never know what revision number your next commit will have.

For a UI, I can envisage two useful ways to expose this functionality: commit N empty revisions as a stand-alone operation, and commit N empty revisions before the first revision loaded from a dump stream. Obviously the former is sufficient; the latter is convenient but is insufficient on its own unless we also give 'svnadmin load' a convenient way to specify there is no dump stream to load.

Stand-alone:

  svnadmin/svnlook commit-empty-revs N
    Commit N empty revisions.

As an option to 'svnadmin load':

  svnadmin load --prefix-empty-revs N
    First commit N empty revisions.

or, expressing a similar behaviour in a different way:

  svnadmin load --commit-first-loaded-rev-as X
    First commit enough empty revisions to make the first loaded revision
    be committed as revision number X.
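
The relationship between the two spellings is simple arithmetic. A small Python sketch, with a hypothetical function name, of how the N for '--prefix-empty-revs' would follow from the X given to '--commit-first-loaded-rev-as':

```python
def empty_revs_needed(head, first_loaded_as):
    """Number of empty revisions to commit on top of `head` so that the
    next revision committed is numbered `first_loaded_as`."""
    padding = first_loaded_as - head - 1
    if padding < 0:
        raise ValueError(
            "repository is already at r%d; cannot commit as r%d"
            % (head, first_loaded_as)
        )
    return padding
```

For a freshly created repository (HEAD is r0), empty_revs_needed(0, 101) gives 100. With HEAD at r840074 and a target of r840101 it gives 26, the figure from use case 2.2.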

What should the author and log message be on the empty revs? I suppose these need to be optionally specified, defaulting to blank?

What should the date stamps be on the empty revs? A thought: it seems cleaner to specify that they should all have the same date stamp than that they do/don't/may all have different date stamps. (Imagine a future back-end in which we can create millions of 'virtual' empty revs in O(1) time and space as long as their rev-props are all identical.) The default for 'svnadmin load --prefix-empty-revs' without '--ignore-dates' ("ignore revision date stamps found in the stream") should, I suppose, be that all the prefix empty revs have the same date stamp as the first revision loaded.

- Julian

[1] <http://www.timj.co.uk/2011/09/generating-emptypadding-revisions-in-an-svn-dump/>
[2] <http://stackoverflow.com/questions/7030041/can-i-create-a-subversion-repository-starting-at-another-number>

Re: Official way to create an empty revision

Posted by Julian Foad <ju...@btopenworld.com>.
I should have said, too: I'm not planning on taking this any further, myself. Anybody that wants to... feel free. (Discuss on this list first, as usual, of course.)

- Julian

Julian Foad wrote:
> Daniel Shahaf wrote in the thread "No no-op changes":
>>  Should we provide an "official" way to create an empty revision? That
>>  is, a revision whose changed-paths list is empty?
>> 
>>  Use-cases:
>> 
>>  1. Suppose last backup is r100 and revisions r101:r105 were lost; then
>>  after restoring the backup, the admin would create 5 empty revisions.
>> 
>>  2. Force an empty revision for whatever reason, such as to make the
>>  revnums sync to something:
>>  2.1. See r3 of the regression test merge_tests.py#125 svnmucc_abuse_1().
>>  2.2. When loading our repository to the ASF repository, if Joe had
>>  created 26 empty revisions, then The Offset would have been 840100
>>  rather than 840074, which would make our mental math easier.
> 
> Hi Daniel. It seems a reasonable tool to have in the svn admin's tool kit. 
> Perhaps not often, but people do sometimes want this. I found two web pages 
> where people discussed this. One wrote a script that spits out the appropriate 
> few lines of dump file text to represent an empty rev, N times [1]; the other is 
> worse, committing N changes to a temporary repo, dumping it and filtering 
> everything out [2].
> 
> I'm assuming this proposal is restricted to the admin side. Your use cases 
> 1. and 2.2 are both admin use cases. Your use case 2.1 is a test which uses a 
> client-side commit to make an uninteresting revision, in order to make the 
> subsequent revision numbers match (modulo 10) those in the original use case. 
> While people no doubt do this sort of thing sometimes in real life, I can't 
> think of a general behaviour that would make sense from the client side. In a 
> shared repository, you never know what revision number your next commit will 
> have.
> 
> For a UI, I can envisage two useful ways to expose this functionality: commit N 
> empty revisions as a stand-alone operation, and commit N empty revisions before 
> the first revision loaded from a dump stream. Obviously the former is 
> sufficient; the latter is convenient but is insufficient on its own unless we 
> also give 'svnadmin load' a convenient way to specify there is no dump 
> stream to load.
> 
> Stand-alone:
> 
>   svnadmin/svnlook commit-empty-revs N
>     Commit N empty revisions.
> 
> As an option to 'svnadmin load':
> 
>   svnadmin load --prefix-empty-revs N
>     First commit N empty revisions.
> 
> or, expressing a similar behaviour in a different way:
> 
>   svnadmin load --commit-first-loaded-rev-as X
>     First commit enough empty revisions to make the first loaded revision
>     be committed as revision number X.
> 
> What should the author and log message be on the empty revs? I suppose these 
> need to be optionally specified, defaulting to blank?
> 
> What should the date stamps be on the empty revs? A thought: it seems cleaner to 
> specify that they should all have the same date stamp than that they 
> do/don't/may all have different date stamps. (Imagine a future back-end in 
> which we can create millions of 'virtual' empty revs in O(1) time and 
> space as long as their rev-props are all identical.) The default for 
> 'svnadmin load --prefix-empty-revs' without '--ignore-dates' 
> ("ignore revision date stamps found in the stream") should, I suppose, 
> be that all the prefix empty revs have the same date stamp as the first revision 
> loaded.
> 
> - Julian
> 
> [1] 
> <http://www.timj.co.uk/2011/09/generating-emptypadding-revisions-in-an-svn-dump/>
> [2] 
> <http://stackoverflow.com/questions/7030041/can-i-create-a-subversion-repository-starting-at-another-number>
>


Re: Official way to create an empty revision

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:

> If we include the svnrdump change (do we have an explicit use-case for
> doing 'bump' from the client side?), we should also document a way for
> a pre-commit hook¹ to reject 'svnmucc bump' commits.

We don't have a good use case for 'bump' from the client side in general. In relation to "svnrdump load", a user who has exclusive access to the repository (so can guarantee no simultaneous commits) should be provided with a way to start the loaded revisions at a particular revision number, the same as for "svnadmin load". However, we should think carefully about how this relates to using svnrdump with non-exclusive commit access. For example, it might make more sense to design a way for the client to specify to the server:

   "Here's a commit with changes in it;
    please commit it as revision number R,
    inserting empty revs as needed before it;
    or fail if you have already passed R."

instead of simply "insert N empty revs".
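
To make the intended semantics concrete, here is a toy in-memory model in Python. It is only a sketch of the request described above: the class and method names are hypothetical and nothing here corresponds to an existing Subversion API.

```python
class RevisionAlreadyPassed(Exception):
    """Raised when the repository has already reached the requested number."""


class ToyRepository:
    def __init__(self):
        # Index = revision number; value = that revision's changed-paths list.
        # r0 is the initial empty revision.
        self.revisions = [[]]

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit_as(self, target, changed_paths):
        """Commit `changed_paths` as revision `target`, padding with empty
        revisions first, or fail if HEAD is already at or beyond `target`."""
        if target <= self.head:
            raise RevisionAlreadyPassed(
                "HEAD is r%d, cannot commit as r%d" % (self.head, target)
            )
        while self.head < target - 1:
            self.revisions.append([])  # inserted empty revision
        self.revisions.append(list(changed_paths))
        return self.head
```

In a real server the check and the padding would have to happen atomically with the commit. That atomicity is what makes this form safe with non-exclusive commit access, where a bare "insert N empty revs" is not.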

- Julian

Efficiently mirroring a subtree (was: Re: Official way to create an empty revision)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Mon, Oct 06, 2014 at 08:41:44 +0000:
> Julian Foad wrote on Mon, Oct 06, 2014 at 08:30:10 +0100:
> > Daniel Shahaf wrote:
> > 
> > > Konstantin Kolinko wrote on Thu, Oct 02, 2014 at 03:40:51 +0400:
> > >>  My thought:
> > >> 
> > >>  svnadmin bump -m "message" REPOS_PATH
> > >>  svnrdump bump -m "message" URL
> > >> 
> > >>  The command creates 1 empty revision and thus bumps the repository
> > >>  revision number. It can be repeated in a loop as necessary.
> > > 
> > > Two proof-of-concept patches implementing this are attached. [...]
> > 
> > The Subversion project history 
> > starts at revision 836420 in the ASF repository. If I want to clone it, 
> > one use case for this feature would be to initialize my new repository 
> > with 836420 empty revisions. An external loop is going to be slow on 
> > this scale. On my machine with SSD disk, "svnmucc mkdir file://..." 
> > takes 1/8 sec and even "svnadmin delrevprop" takes 1/25 sec, so that's 
> > looking like taking a substantial proportion of a *day* to complete 836420
> >  commits.
> > 
> > That's one reason why I think the UI should allow specifying how many
> > revisions to create. Even if an initial implementation with an
> > internal loop is currently no faster, at least it opens the
> > possibility of changing the implementation later.
> 
> For this particular use-case, a way to make the FS layer treat some
> revisions as empty without physically storing 836420 files containing
> only a root noderev each would be even better.

For example, we could write an FS wrapper provider that, given
a strictly increasing sequence of positive integers a₁, a₂, …,
stores revision $a_i$ as revision $i$ in a wrapped filesystem.  Requests
for intermediate revisions would be computed on-the-fly as though all
revisions between revision $a_i$ and revision $a_j$ had been "bump"
revisions.

The simplest implementation would confine itself to sequences of the
form { a_i = i + CONSTANT }, but the generalization to arbitrary
strictly-increasing sequences also avoids storing intermediate
revisions (commits to subtrees other than ^/subversion) in the mirror.
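
A minimal Python sketch of the revision-number mapping such a wrapper would need. The class name and return convention are hypothetical, and it covers only the lookup, not the FS API itself.

```python
import bisect


class SparseRevisionMap:
    """Maps visible revision numbers onto a densely numbered wrapped FS.

    `published` is the sequence a_1, a_2, ...; visible revision a_i is
    stored as revision i, and every other visible revision is treated as
    an empty ("bump") revision.
    """

    def __init__(self, published):
        a = list(published)
        if a and a[0] < 1:
            raise ValueError("sequence must consist of positive integers")
        if any(x >= y for x, y in zip(a, a[1:])):
            raise ValueError("sequence must be strictly increasing")
        self._a = a

    def resolve(self, revnum):
        """Return (stored_revision, is_empty) for a visible revision."""
        youngest = self._a[-1] if self._a else 0
        if not 0 <= revnum <= youngest:
            raise ValueError("no such revision: r%d" % revnum)
        # Count of a_k <= revnum: the stored revision whose tree is visible.
        stored = bisect.bisect_right(self._a, revnum)
        is_empty = stored == 0 or self._a[stored - 1] != revnum
        return stored, is_empty
```

With SparseRevisionMap([836420, 836421, 836500]), r836450 resolves to stored revision 2 and is reported as empty, and anything below r836420 resolves to stored revision 0. The { a_i = i + CONSTANT } case is the same lookup without the bisection.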

Daniel

Re: Official way to create an empty revision

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Mon, Oct 06, 2014 at 08:30:10 +0100:
> Daniel Shahaf wrote:
> 
> > Konstantin Kolinko wrote on Thu, Oct 02, 2014 at 03:40:51 +0400:
> >>  My thought:
> >> 
> >>  svnadmin bump -m "message" REPOS_PATH
> >>  svnrdump bump -m "message" URL
> >> 
> >>  The command creates 1 empty revision and thus bumps the repository
> >>  revision number. It can be repeated in a loop as necessary.
> > 
> > Two proof-of-concept patches implementing this are attached. [...]
> 
> The Subversion project history 
> starts at revision 836420 in the ASF repository. If I want to clone it, 
> one use case for this feature would be to initialize my new repository 
> with 836420 empty revisions. An external loop is going to be slow on 
> this scale. On my machine with SSD disk, "svnmucc mkdir file://..." 
> takes 1/8 sec and even "svnadmin delrevprop" takes 1/25 sec, so that's 
> looking like taking a substantial proportion of a *day* to complete 836420
>  commits.
> 
> That's one reason why I think the UI should allow specifying how many
> revisions to create. Even if an initial implementation with an
> internal loop is currently no faster, at least it opens the
> possibility of changing the implementation later.

For this particular use-case, a way to make the FS layer treat some
revisions as empty without physically storing 836420 files containing
only a root noderev each would be even better.

That said, though, I don't disagree with your analysis; pushing the loop
downwards into svnadmin.c/svnmucc.c is one of the things that could
change between the mockup and the final implementation.

If we include the svnrdump change (do we have an explicit use-case for
doing 'bump' from the client side?), we should also document a way for
a pre-commit hook¹ to reject 'svnmucc bump' commits.  (similar to the
hook script in <http://subversion.apache.org/docs/release-notes/1.7#svnrdump-race>)
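
A rough Python sketch of such a pre-commit hook follows. It assumes that a 'bump' commit can be recognised by 'svnlook changed' printing nothing for the transaction. That is my guess at what these commits would look like, not something the proof-of-concept patches are known to guarantee.

```python
import subprocess
import sys


def has_no_changed_paths(svnlook_changed_output):
    """True if 'svnlook changed' reported no paths for the transaction."""
    return not svnlook_changed_output.strip()


def main(argv):
    # A pre-commit hook is invoked as: pre-commit REPOS-PATH TXN-NAME
    repos, txn = argv[1], argv[2]
    result = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True,
        capture_output=True,
        text=True,
    )
    if has_no_changed_paths(result.stdout):
        sys.stderr.write("Empty revisions are not accepted here.\n")
        return 1  # non-zero exit rejects the commit
    return 0


# Run only when invoked as a hook with the two expected arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```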

Daniel

¹ Or start-commit hook, if it gets the intended changed-paths list by
then (I believe Ev2 will do that).

Re: Official way to create an empty revision

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:

> Konstantin Kolinko wrote on Thu, Oct 02, 2014 at 03:40:51 +0400:
>>  My thought:
>> 
>>  svnadmin bump -m "message" REPOS_PATH
>>  svnrdump bump -m "message" URL
>> 
>>  The command creates 1 empty revision and thus bumps the repository
>>  revision number. It can be repeated in a loop as necessary.
> 
> Two proof-of-concept patches implementing this are attached. [...]

The Subversion project history 
starts at revision 836420 in the ASF repository. If I want to clone it, 
one use case for this feature would be to initialize my new repository 
with 836420 empty revisions. An external loop is going to be slow on 
this scale. On my machine with SSD disk, "svnmucc mkdir file://..." 
takes 1/8 sec and even "svnadmin delrevprop" takes 1/25 sec, so that's 
looking like taking a substantial proportion of a *day* to complete 836420
 commits.

That's one reason why I think the UI should allow specifying how many revisions to create. Even if an initial implementation with an internal loop is currently no faster, at least it opens the possibility of changing the implementation later.
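
For reference, the estimate can be worked out from the two per-operation timings quoted above (a small Python sketch; the function name is just for illustration):

```python
from fractions import Fraction

REVISIONS = 836420  # number of empty revisions needed for the clone


def loop_hours(seconds_per_commit, count=REVISIONS):
    """Wall-clock hours for `count` sequential operations."""
    return float(count * seconds_per_commit / 3600)


# "svnmucc mkdir" at 1/8 sec each: about 29 hours.
mucc_hours = loop_hours(Fraction(1, 8))

# "svnadmin delrevprop" at 1/25 sec each: about 9.3 hours.
delrevprop_hours = loop_hours(Fraction(1, 25))
```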

- Julian


Re: Official way to create an empty revision

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Konstantin Kolinko wrote on Thu, Oct 02, 2014 at 03:40:51 +0400:
> 2014-10-02 2:59 GMT+04:00 Daniel Shahaf <d....@daniel.shahaf.name>:
> > Julian Foad wrote on Wed, Oct 01, 2014 at 10:52:42 +0100:
> >> Daniel Shahaf wrote in the thread "No no-op changes":
> >> > Should we provide an "official" way to create an empty revision?  That
> >> > is, a revision whose changed-paths list is empty?
> >> >
> >> > Use-cases:
> >> >
> >> > 1. Suppose last backup is r100 and revisions r101:r105 were lost; then
> >> > after restoring the backup, the admin would create 5 empty revisions.
> >> >
> >> > 2. Force an empty revision for whatever reason, such as to make the
> >> > revnums sync to something:
> >> > 2.1. See r3 of the regression test merge_tests.py#125 svnmucc_abuse_1().
> >> > 2.2. When loading our repository to the ASF repository, if Joe had
> >> > created 26 empty revisions, then The Offset would have been 840100
> >> > rather than 840074, which would make our mental math easier.
> >>
> >> What should the author and log message be on the empty revs? I suppose
> >> these need to be optionally specified, defaulting to blank?
> >>
> 
> My thought:
> 
> svnadmin bump -m "message" REPOS_PATH
> svnrdump bump -m "message" URL
> 
> The command creates 1 empty revision and thus bumps the repository
> revision number. It can be repeated in a loop as necessary.
> 

Two proof-of-concept patches implementing this are attached.  They work,
but some bells and whistles are missing (e.g., setting the log message
from 'svnadmin bump' to a boilerplate value or to a user-provided value).

I'm not going to commit them just yet --- I'd rather let the idea soak
for a bit.  (In part because the 1.9 branch point is around the corner.)

> > The author of a revision created by svnadmin is "the administrator"; we
> > have not defined a way to represent this value in an svn:author revprop.
> >
> 
> If dependent tools can deal with a missing svn:author property then it
> is OK not to have one.
> 
> Looking at authors.txt file used to configure svn-git mirroring for
> ASF [1], there is the following line at the end of the file:
> 
> (no author) = No Author <de...@apache.org>
> 
> So I think git-svn can deal with revisions that do not have svn:author
> property, and thus it is safe to create such revisions.
> 

And even if git-svn didn't implement this, it would _still_ be okay to
create revisions with no svn:author revprops, since we stipulated that
was permissible a decade+ ago when we defined our API.  Naturally, if
that were the case, we'd heads-up the git-svn folks and work with them
to minimize the impact on users.

Re: Official way to create an empty revision

Posted by Konstantin Kolinko <kn...@gmail.com>.
2014-10-02 2:59 GMT+04:00 Daniel Shahaf <d....@daniel.shahaf.name>:
> Julian Foad wrote on Wed, Oct 01, 2014 at 10:52:42 +0100:
>> Daniel Shahaf wrote in the thread "No no-op changes":
>> > Should we provide an "official" way to create an empty revision?  That
>> > is, a revision whose changed-paths list is empty?
>> >
>> > Use-cases:
>> >
>> > 1. Suppose last backup is r100 and revisions r101:r105 were lost; then
>> > after restoring the backup, the admin would create 5 empty revisions.
>> >
>> > 2. Force an empty revision for whatever reason, such as to make the
>> > revnums sync to something:
>> > 2.1. See r3 of the regression test merge_tests.py#125 svnmucc_abuse_1().
>> > 2.2. When loading our repository to the ASF repository, if Joe had
>> > created 26 empty revisions, then The Offset would have been 840100
>> > rather than 840074, which would make our mental math easier.
>>
>> What should the author and log message be on the empty revs? I suppose
>> these need to be optionally specified, defaulting to blank?
>>

My thought:

svnadmin bump -m "message" REPOS_PATH
svnrdump bump -m "message" URL

The command creates 1 empty revision and thus bumps the repository
revision number. It can be repeated in a loop as necessary.

> The log message should default to a stock log message (like
> SVNAutoversioning and 'svndumpfilter exclude' without --renumber-revs),
> not to an empty one.  As in SVNAutoversioning, we might want an
> svn:empty-revision boolean revprop.

+1

> The author of a revision created by svnadmin is "the administrator"; we
> have not defined a way to represent this value in an svn:author revprop.
>

If dependent tools can deal with a missing svn:author property then it
is OK not to have one.

Looking at authors.txt file used to configure svn-git mirroring for
ASF [1], there is the following line at the end of the file:

(no author) = No Author <de...@apache.org>

So I think git-svn can deal with revisions that do not have svn:author
property, and thus it is safe to create such revisions.

>> What should the date stamps be on the empty revs? A thought: it seems
>> cleaner to specify that they should all have the same date stamp than
>> that they do/don't/may all have different date stamps. (Imagine a
>> future back-end in which we can create millions of 'virtual' empty
>> revs in O(1) time and space as long as their rev-props are all
>> identical.) The default for 'svnadmin load --prefix-empty-revs'
>> without '--ignore-dates' ("ignore revision date stamps found in the
>> stream") should, I suppose, be that all the prefix empty revs have the
>> same date stamp as the first revision loaded.
>>
>
> I don't see what API-consumer-level purpose having the same svn:date
> would serve; it seems to me it would suffice to guarantee
>
>     r100 < r101 ≤ r102 ≤ r103 ≤ r104 ≤ r105 < r106
>
>     (where "x < y" ⇔ "svn:date value of y is younger than that of x",
>     and the promise regarding r106 is conditional upon that revision
>     being a "normal" commit ( load operation)).
>
> If the purpose of the same-dateness was to make detecting the range
> easier, we could achieve that explicitly by setting
> svn:empty-revisions-start=101 and svn:empty-revisions-end=105 revprops
> on each revision in the range (r101:r105).
>

Use the current date on an empty repository,
use date of the previous (HEAD) revision on a non-empty repository?

Use date of the first commit in the loaded dump file if this feature
is implemented as an option to svnadmin load, svnrdump load?

I think just using the current date is more straightforward. An
administrator can change it later with svn propset. If one loads
several disjoint dumps from different sources it is likely that the
dates are already messed up.


[1] http://wiki.apache.org/general/GitAtApache

Best regards,
Konstantin Kolinko

Re: Official way to create an empty revision

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Wed, Oct 01, 2014 at 22:59:53 +0000:
> I don't see what API-consumer-level purpose having the same svn:date
> would serve; it seems to me it would suffice to guarantee
> 
>     r100 < r101 ≤ r102 ≤ r103 ≤ r104 ≤ r105 < r106
> 
>     (where "x < y" ⇔ "svn:date value of y is younger than that of x",
>     and the promise regarding r106 is conditional upon that revision
>     being a "normal" commit ( load operation)).

That should have said: 
      being a "normal" commit (not part of a load operation)).

Re: Official way to create an empty revision

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Wed, Oct 01, 2014 at 10:52:42 +0100:
> Daniel Shahaf wrote in the thread "No no-op changes":
> > Should we provide an "official" way to create an empty revision?  That
> > is, a revision whose changed-paths list is empty?
> > 
> > Use-cases:
> > 
> > 1. Suppose last backup is r100 and revisions r101:r105 were lost; then
> > after restoring the backup, the admin would create 5 empty revisions.
> > 
> > 2. Force an empty revision for whatever reason, such as to make the
> > revnums sync to something:
> > 2.1. See r3 of the regression test merge_tests.py#125 svnmucc_abuse_1().
> > 2.2. When loading our repository to the ASF repository, if Joe had
> > created 26 empty revisions, then The Offset would have been 840100
> > rather than 840074, which would make our mental math easier.
> 
> What should the author and log message be on the empty revs? I suppose
> these need to be optionally specified, defaulting to blank?
> 

The log message should default to a stock log message (like
SVNAutoversioning and 'svndumpfilter exclude' without --renumber-revs),
not to an empty one.  As in SVNAutoversioning, we might want an
svn:empty-revision boolean revprop.

The author of a revision created by svnadmin is "the administrator"; we
have not defined a way to represent this value in an svn:author revprop.

> What should the date stamps be on the empty revs? A thought: it seems
> cleaner to specify that they should all have the same date stamp than
> that they do/don't/may all have different date stamps. (Imagine a
> future back-end in which we can create millions of 'virtual' empty
> revs in O(1) time and space as long as their rev-props are all
> identical.) The default for 'svnadmin load --prefix-empty-revs'
> without '--ignore-dates' ("ignore revision date stamps found in the
> stream") should, I suppose, be that all the prefix empty revs have the
> same date stamp as the first revision loaded.
> 

I don't see what API-consumer-level purpose having the same svn:date
would serve; it seems to me it would suffice to guarantee

    r100 < r101 ≤ r102 ≤ r103 ≤ r104 ≤ r105 < r106

    (where "x < y" ⇔ "svn:date value of y is younger than that of x",
    and the promise regarding r106 is conditional upon that revision
    being a "normal" commit ( load operation)).

If the purpose of the same-dateness was to make detecting the range
easier, we could achieve that explicitly by setting
svn:empty-revisions-start=101 and svn:empty-revisions-end=105 revprops
on each revision in the range (r101:r105).
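
The ordering guarantee above can be written as a small check. This is a Python sketch with hypothetical function names, and it assumes svn:date values in the usual "YYYY-MM-DDTHH:MM:SS.uuuuuuZ" form.

```python
from datetime import datetime


def parse_svn_date(value):
    """Parse an svn:date value such as 2014-10-01T22:59:53.000000Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def dates_satisfy_guarantee(before, empties, after=None):
    """Check  before < e_1 <= e_2 <= ... <= e_n < after.

    `before` is the svn:date of the last revision preceding the range
    (r100 in the example), `empties` holds the svn:dates of the empty
    revisions in order, and `after` is the svn:date of the next normal
    commit (r106), or None if there is none yet.
    """
    stamps = [parse_svn_date(d) for d in empties]
    if not stamps:
        return True
    if not parse_svn_date(before) < stamps[0]:
        return False
    if any(a > b for a, b in zip(stamps, stamps[1:])):
        return False
    if after is not None and not stamps[-1] < parse_svn_date(after):
        return False
    return True
```

Giving all the empty revisions one identical date, strictly later than r100's, satisfies this check, so the same-date proposal is a special case of the guarantee.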

Thanks for your feedback, Julian.  I've filed issue #4521 to track this:
http://subversion.tigris.org/issues/show_bug.cgi?id=4521 "svnadmin should provide a way to create empty revisions"

Daniel

> - Julian
> 
> [1] <http://www.timj.co.uk/2011/09/generating-emptypadding-revisions-in-an-svn-dump/>
> [2] <http://stackoverflow.com/questions/7030041/can-i-create-a-subversion-repository-starting-at-another-number>