You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2014/12/02 17:33:05 UTC

Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

http://subversion.tigris.org/issues/show_bug.cgi?id=4476

There's another part to this issue: 'svnadmin load'.

When mergeinfo refers to r0, not only does 'svnadmin dump' fail (fixed in r1574868) and 'svnsync' fail, but 'svnadmin load' also fails, as I found when I started writing a test for the 'svnsync' problem.

I think 'svnadmin load' should always be able to load a dump file in order to restore a backup. There is a 'validate properties' option, so I would say the loading should fail if that is enabled, and succeed if not enabled. When loading 'succeeds' it should load exactly what was in the dump file, and not try to 'correct' it, for any data that could in principle have been dumped out of a repository that was not utterly broken.

(If we want to consider automatically 'correcting' properties during loading, that should be a separate development and should be optional, in other words there should always be a way to load exactly what is in the dump file. That feature should probably be implemented at a higher level than libsvn_repos, too.)

But 'svnadmin load' can do more than just restore revisions exactly how they were: it can also renumber revisions and move all loaded paths into a subdirectory, and when doing either or both of these it attempts to adjust mergeinfo accordingly. In that case if mergeinfo is invalid then we have three options:

  - libsvn_repos throws an error
  - libsvn_repos warns and does not adjust that mergeinfo property
  - svnadmin 'corrects' the mergeinfo, and then libsvn_repos adjusts it

I think warning and not adjusting is better than throwing an error, but I am not sure if 'svnadmin load' should be in the business of correcting bad mergeinfo or if that would be better left to an external tool. I would say this mode of operation (adjusting revision numbers and/or paths) is a lower priority.


PROPOSAL

In the presence of a mergeinfo property that refers to r0, at the libsvn_repos API:

'dump'
  shall dump the property verbatim (with nowarning). (Done.)

'load', with 'validate properties' OFF and not adjusting revision numbers or paths,
  shall load theproperty verbatim, and warn.

'load', with 'validate properties' OFF and adjusting revision numbers or paths,
  shall load the property verbatim (unadjusted), and warn.

'load', with 'validate properties' ON,
  shall fail.

And I would say the same rules should apply to any property that fails the validation rules.

At the application layer:

'svnadmin dump' and 'svnadmin load'
  should behave the same as the repos layer.

Does that sound good as a fix that I can do now and back-port to 1.8.x and 1.7.x?

I'll come to 'svnsync' later, but basically my current thought is it should 'correct' the mergeinfo by removing the r0 reference.

- Julian

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Branko Čibej <br...@wandisco.com>.
On 04.12.2014 19:40, Julian Foad wrote:
> I (Julian Foad) wrote:
>
>>>  I'll come to 'svnsync' later, but basically my current thought is it 
>>>  should 'correct' the mergeinfo by removing the r0 reference.
>> I have written a test for this, and hacked up code to do this by textual 
>> substitution. It isn't quite right -- for example it would go wrong if a 
>> path contained the character sequence ":0", among other issues. I 
>> ought to rewrite it in the form of a proper mergeinfo parser but one that is 
>> more lenient than the regular parser.
> Here's the patch in progress...

... no it isn't! :)

-- Brane


Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:

>>  I'll come to 'svnsync' later, but basically my current thought is it 
>>  should 'correct' the mergeinfo by removing the r0 reference.
> 
> I have written a test for this, and hacked up code to do this by textual 
> substitution. It isn't quite right -- for example it would go wrong if a 
> path contained the character sequence ":0", among other issues. I 
> ought to rewrite it in the form of a proper mergeinfo parser but one that is 
> more lenient than the regular parser.

Here's the patch in progress...

- Julian

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
> On 17.12.2014 17:13, Julian Foad wrote:
>>   * 'svnadmin load' always rewrites mergeinfo, which can change its form
>> even if it doesn't need to change it semantically. (Except, since 
>> r1643074, if it's unparsable, it loads it unchanged instead of bombing 
>> out.)
>> 
>>   * svnsync always 'normalizes' properties (encoding to UTF-8, EOL style to LF).
>>  
>> That's what we have. For 'svnadmin load' I think that's not 
>> ideal: there should be a way to force it to load exactly what's in the dump 
>> file, for data that *could* have been dumped from this version of repository and 
>> server.

That's now http://subversion.tigris.org/issues/show_bug.cgi?id=4539 .

> For 'svnsync' I think it's reasonable that it should try to 
> normalize metadata that may be considered invalid by the target repository, 
> although other approaches may be possible.
> 
> 
> Ack ... sounds reasonable, given current behaviour in general.

Thanks.

- Julian

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Branko Čibej <br...@wandisco.com>.
On 17.12.2014 17:13, Julian Foad wrote:
> Branko Čibej wrote:
>> On 17.12.2014 15:52, Julian Foad wrote:
>>> In http://svn.apache.org/r1646250 I committed a version that also works by 
>>> textual substitution, but (hopefully) without the above-mentioned issues. This 
>>> makes svnsync remove r0 references before trying to commit.
>> Does this mean that a dump/load or sync from a repository with r0 in
>> mergeinfo will not create a new repository with identical contents?
> dump/load will preserve the mergeinfo property value exactly in this case.
>
> svnsync will change the mergeinfo property value in this case.
>
> This is not the only case where 'load' or 'sync' changes repo contents even when the revnums and paths are staying the same:
>
>   * 'svnadmin load' always rewrites mergeinfo, which can change its form
> even if it doesn't need to change it semantically. (Except, since 
> r1643074, if it's unparsable, it loads it unchanged instead of bombing 
> out.)
>
>   * svnsync always 'normalizes' properties (encoding to UTF-8, EOL style to LF).
>
> That's what we have. For 'svnadmin load' I think that's not ideal: there should be a way to force it to load exactly what's in the dump file, for data that *could* have been dumped from this version of repository and server. For 'svnsync' I think it's reasonable that it should try to normalize metadata that may be considered invalid by the target repository, although other approaches may be possible.


Ack ... sounds reasonable, given current behaviour in general.

-- Brane

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
> On 17.12.2014 15:52, Julian Foad wrote:
>> In http://svn.apache.org/r1646250 I committed a version that also works by 
>> textual substitution, but (hopefully) without the above-mentioned issues. This 
>> makes svnsync remove r0 references before trying to commit.
> 
> Does this mean that a dump/load or sync from a repository with r0 in
> mergeinfo will not create a new repository with identical contents?

dump/load will preserve the mergeinfo property value exactly in this case.

svnsync will change the mergeinfo property value in this case.

This is not the only case where 'load' or 'sync' changes repo contents even when the revnums and paths are staying the same:

  * 'svnadmin load' always rewrites mergeinfo, which can change its form
even if it doesn't need to change it semantically. (Except, since 
r1643074, if it's unparsable, it loads it unchanged instead of bombing 
out.)

  * svnsync always 'normalizes' properties (encoding to UTF-8, EOL style to LF).

That's what we have. For 'svnadmin load' I think that's not ideal: there should be a way to force it to load exactly what's in the dump file, for data that *could* have been dumped from this version of repository and server. For 'svnsync' I think it's reasonable that it should try to normalize metadata that may be considered invalid by the target repository, although other approaches may be possible.

- Julian

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Branko Čibej <br...@wandisco.com>.
On 17.12.2014 15:52, Julian Foad wrote:
> I (Julian Foad) wrote on 4th December:
>> Julian Foad wrote:
>>> ... 'svnsync' ... my current thought is it 
>>>   should 'correct' the mergeinfo by removing the r0 reference.
>> I have written a test for this, and hacked up code to do this by textual 
>> substitution. It isn't quite right -- for example it would go wrong if a 
>> path contained the character sequence ":0", among other issues. I 
>> ought to rewrite it in the form of a proper mergeinfo parser but one that is 
>> more lenient than the regular parser.
> In http://svn.apache.org/r1646250 I committed a version that also works by textual substitution, but (hopefully) without the above-mentioned issues. This makes svnsync remove r0 references before trying to
> commit.

Does this mean that a dump/load or sync from a repository with r0 in
mergeinfo will not create a new repository with identical contents?

-- Brane


Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote on 4th December:
> Julian Foad wrote:
>> ... 'svnsync' ... my current thought is it 
>>  should 'correct' the mergeinfo by removing the r0 reference.
> 
> I have written a test for this, and hacked up code to do this by textual 
> substitution. It isn't quite right -- for example it would go wrong if a 
> path contained the character sequence ":0", among other issues. I 
> ought to rewrite it in the form of a proper mergeinfo parser but one that is 
> more lenient than the regular parser.

In http://svn.apache.org/r1646250 I committed a version that also works by textual substitution, but (hopefully) without the above-mentioned issues. This makes svnsync remove r0 references before trying to
commit.

I did look into fixing this a different way, by making a more lenient mergeinfo parser. However, that opens up questions about how to define the syntax for mergeinfo in terms of hard requirements, optional variations, and a canonical form. That might have been a good approach to take when mergeinfo was originally invented, but it seems rather too late to go down that road now.

So is the issue fixed?

  * Yes: it should now be possible to svnsync the repository.

  * Yes: it should now be possible to dump and load the repository.

  * Maybe: 'svnrdump dump' and 'svnrdump load' may have the same problem and should be checked.

  * No: the underlying issue has not been addressed, that old mergeinfo that was previously 'working' is still present in repositories but is now being considered 'invalid' with no in-place way to upgrade it.

- Julian


Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
[Second attempt to attach the patch.]

I (Julian Foad) wrote:
>> I'll come to 'svnsync' later, but basically my current thought is it 
>> should 'correct' the mergeinfo by removing the r0 reference.
> 
> I have written a test for this, and hacked up code to do this by textual 
> substitution. It isn't quite right -- for example it would go wrong if a 
> path contained the character sequence ":0", among other issues. I 
> ought to rewrite it in the form of a proper mergeinfo parser but one that is 
> more lenient than the regular parser.

Here's the patch in progress...

- Julian

Re: Issue #4476 - Mergeinfo containing r0 makes svnsync and svnadmin dump fail

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> http://subversion.tigris.org/issues/show_bug.cgi?id=4476
[...]
> PROPOSAL
> 
> In the presence of a mergeinfo property that refers to r0, at the libsvn_repos 
> API:
> 
> 'dump'
>   shall dump the property verbatim (with nowarning). (Done.)
> 
> 'load', with 'validate properties' OFF and not adjusting 
> revision numbers or paths,
>   shall load theproperty verbatim, and warn.
> 
> 'load', with 'validate properties' OFF and adjusting revision 
> numbers or paths,
>   shall load the property verbatim (unadjusted), and warn.
> 
> 'load', with 'validate properties' ON,
>   shall fail.
> 
> And I would say the same rules should apply to any property that fails the 
> validation rules.
> 
> At the application layer:
> 
> 'svnadmin dump' and 'svnadmin load'
>   should behave the same as the repos layer.
> 
> Does that sound good as a fix that I can do now and back-port to 1.8.x and 
> 1.7.x?

I have done this, and proposed it for backport to 1.7 and 1.8.

(My fix adds a new enumeration constant to an enumeration type, for the warning. For back-porting, I renamed this constant to a double-underscore name to indicate it's not for public use in the 1.7/1.8 APIs.)

> I'll come to 'svnsync' later, but basically my current thought is it 
> should 'correct' the mergeinfo by removing the r0 reference.

I have written a test for this, and hacked up code to do this by textual substitution. It isn't quite right -- for example it would go wrong if a path contained the character sequence ":0", among other issues. I ought to rewrite it in the form of a proper mergeinfo parser but one that is more lenient than the regular parser.

I also noticed that loading from a dump file also modifies mergeinfo in other ways such as removing references to r1. I don't think is a good idea: I think there should be a way to load exactly what is in the dump. But that's a different, though related, issue.

- Julian