You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2018/07/31 11:04:23 UTC

Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

https://issues.apache.org/jira/browse/SVN-4767

Running tests with the '--dump-load-cross-check' option revealed a minor discrepancy between the dump files produced by 'svnadmin dump' and 'svnrdump dump' in some test cases.

[...]
 K
 svn:date
 V
- 2015-01-01T00:00:00.000000Z
+ 2015-01-01T00:00:00.0Z
[...]

"svnadmin dump" "canonicalizes" each svn:date revprop while dumping, in the function write_revision_record().

This seems to have been done in r842390 in order to upgrade from pre-0.14 repository format to the new timestamp format introduced in 0.14 -- see issue SVN-614 "DAV:creationdate needs to be an ISO8601 date". svn_time_from_cstring() reads either new or old format, and then svn_time_to_cstring() writes the new format.

However, this does not only convert old to new format, but could also make textual changes to the string if the revprop value is not already canonical. Dump should carefully output exactly what is in the repository and not gratuitously change it. In retrospect, such a transformation should have been done during "svnadmin load" instead of in "dump".

While "svnadmin dump" makes this transformation, "svnrdump dump" and "svndumpfilter" do not. This could lead to unintended differences in dump output depending on which tool is used. (I made some progress in unifying the output logic for those three dump producers a couple of years ago, but I left this part alone because I did not know what to do with it.)

The discrepancy I noticed above comes from prop_tests.py 41 and 42 which explicitly propset svn:date to a value such as '2015-01-01T00:00:00.0Z'. "svnadmin dump" was munging this non-canonical value on output, which meant it was not a true dump. (Whether a non-canonical format should have been allowed into the repository in the first place is a separate issue.)

Therefore I will remove this code path. It even has a comment saying "### Remove this when it is no longer needed for sure" which referred to being needed for pre-0.14 upgrades. We definitely no longer need that.

-- 
- Julian

Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Julian Foad <ju...@apache.org>.
Branko Čibej wrote:
> You've changed my mind: +1

OK, thanks for the review.

Committed, with a test: http://svn.apache.org/r1837151

-- 
- Julian

Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Branko Čibej <br...@apache.org>.
On 31.07.2018 15:14, Julian Foad wrote:
> Before r871212 there was no validation of svn:date value entering the repo, I think, so old-format dates could have been put in.

Right. That's the edge-case I was thinking about.

> But if anyone put old-format dates into their repo, and even if users do currently have some tools that expect the conversion to happen, I still don't see that as a good enough reason why future 'svnadmin dump' should not read out exactly what's in there.
>
> It also happens to be a data bug that's extremely easy to fix -- just enable revprop changes and script the propedits.
>
>>  2. Our dumpfile format is documented. With your proposed change, there
>>     is a slight chance that "svnadmin dump" could produce invalid dump
>>     files.
> Invalid how? I don't see the contents of svn:date documented in 'notes/dump-load-format.txt'.

Ha, indeed, that document doesn't say anything concrete about the values
of reserved properties.

Anyway, I was just trying to pedantically think my way through the
possible edge cases. In retrospect, it's a pity we didn't make the
0.14->1.0 dump/load cycle work the way you're proposing now.

Also, you make a good point about scripting the fix before the dump;
these are revision properties, after all.

You've changed my mind: +1

-- Brane

Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Julian Foad <ju...@apache.org>.
Branko Čibej wrote on 2018-07-31:
> On 31.07.2018 13:04, Julian Foad wrote:
> > https://issues.apache.org/jira/browse/SVN-4767
> >
> > Running tests with the '--dump-load-cross-check' option revealed a minor discrepancy between the dump files produced by 'svnadmin dump' and 'svnrdump dump' in some test cases.
> > - 2015-01-01T00:00:00.000000Z
> > + 2015-01-01T00:00:00.0Z

I wasn't clear but the repository contains ".0Z" in the revprop value, explicitly set by a "propset"; the ".000000Z" line is produced by "svnadmin dump" after going through from_cstring() and to_cstring(); and the ".0Z" line is produced by "svnrdump dump" which prints the property value sent by the server (after normalizing EOLs).

> > "svnadmin dump" "canonicalizes" each svn:date revprop while dumping, in the function write_revision_record().
> >
> > This seems to have been done in r842390 in order to upgrade from pre-0.14 repository format to the new timestamp format introduced in 0.14 -- see issue SVN-614 "DAV:creationdate needs to be an ISO8601 date". svn_time_from_cstring() reads either new or old format, and then svn_time_to_cstring() writes the new format.
> 
> There's another possible interpretation here: that the millisecond field
> should be 6 digits wide, but for some reason svn_time_to_cstring doesn't
> format it that way when the value is zero. [...]

That's not what's happening.

> > [...] such a transformation should have been done during "svnadmin load" [...]
> 
> Agreed, this fits with what "svnadmin load" already does with the
> --normalize-props option. But if the timestamp is *not* in the right
> format, "load" should fix it. Maybe "except with
> --bypass-prop-validation"? [...]
> 
> In any case I think the logic should be changed a bit: Only reformat the
> svn:date property if it is not already in ISO8601 format; otherwise,
> leave it alone.

I'm not yet sure what "load" should do.

> > While "svnadmin dump" makes this transformation, "svnrdump dump" and "svndumpfilter" do not. This could lead to unintended differences in dump output depending on which tool is used.
> 
> "svnadmin dump" should always define the canonical behaviour. Both
> "svnrdump" and "svndumpfilter" are newer; they should behave the way
> "svnadmin dump" does, not the other way around. One _could_ argue that
> this is an omission in svnrdump.

In general I agree that's the right way round, but in this case I think the overriding decision is that this is a bug in "svnadmin dump" that the other utilities happen to have not copied.

> > Therefore I will remove this code path. It even has a comment saying "### Remove this when it is no longer needed for sure" which referred to being needed for pre-0.14 upgrades. We definitely no longer need that.
> 
> +0, and I'll explain why:
> 
>  1. This is not a trivial change and should be documented in 1.11
>     release notes (hence, it stays on trunk until then).

Makes sense.

>     It's possible
>     that some 3rd-party tools rely on "svnadmin dump" producing
>     timestamps in the ISO format.

There are two kinds of change in question: changing old format to new (ISO) format, and canonicalizing details (such as number of decimal places) in an ISO format date. The only case where dump would stop producing ISO format is where the repository contains old format dates. That can't happen "naturally" because a dump-load was required between v0.14 and v1.0 (and the dump was programmed to convert the format). It could only happen if users have loaded those old-format dates in again.

Before r871212 there was no validation of svn:date value entering the repo, I think, so old-format dates could have been put in.

But if anyone put old-format dates into their repo, and even if users do currently have some tools that expect the conversion to happen, I still don't see that as a good enough reason why future 'svnadmin dump' should not read out exactly what's in there.

It also happens to be a data bug that's extremely easy to fix -- just enable revprop changes and script the propedits.

>  2. Our dumpfile format is documented. With your proposed change, there
>     is a slight chance that "svnadmin dump" could produce invalid dump
>     files.

Invalid how? I don't see the contents of svn:date documented in 'notes/dump-load-format.txt'.

>     This is in effect a corollary of point 1., above. Note that
>     "svndrump dump" is already "infected" by this potential bug, whereas
>     "svndumpfilter" is not because it starts with a presumably valid
>     dumpfile.


-- 
- Julian

Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Tue, Jul 31, 2018 at 13:43:57 +0200:
> On 31.07.2018 13:04, Julian Foad wrote:
> > However, this does not only convert old to new format, but could also make textual changes to the string if the revprop value is not already canonical. Dump should carefully output exactly what is in the repository and not gratuitously change it. In retrospect, such a transformation should have been done during "svnadmin load" instead of in "dump".
> 
> Agreed, this fits with what "svnadmin load" already does with the
> --normalize-props option. But if the timestamp is *not* in the right
> format, "load" should fix it. Maybe "except with
> --bypass-prop-validation"? Another option that controls "load" behaviour
> WRT dates is --ignore-dates, though I can't remember why we introduced that.
> 

Presumably in order not to break the binary search algorithm underlying the
@{DATE} peg revision syntax.

Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Branko Čibej <br...@apache.org>.
On 31.07.2018 13:04, Julian Foad wrote:
> https://issues.apache.org/jira/browse/SVN-4767
>
> Running tests with the '--dump-load-cross-check' option revealed a minor discrepancy between the dump files produced by 'svnadmin dump' and 'svnrdump dump' in some test cases.
>
> [...]
>  K
>  svn:date
>  V
> - 2015-01-01T00:00:00.000000Z
> + 2015-01-01T00:00:00.0Z
> [...]
>
> "svnadmin dump" "canonicalizes" each svn:date revprop while dumping, in the function write_revision_record().
>
> This seems to have been done in r842390 in order to upgrade from pre-0.14 repository format to the new timestamp format introduced in 0.14 -- see issue SVN-614 "DAV:creationdate needs to be an ISO8601 date". svn_time_from_cstring() reads either new or old format, and then svn_time_to_cstring() writes the new format.

There's another possible interpretation here: that the millisecond field
should be 6 digits wide, but for some reason svn_time_to_cstring doesn't
format it that way when the value is zero. I don't have the actual
standard, so I can't check what it says ... and, of course, in the case
of 0 there is no semantic difference.


> However, this does not only convert old to new format, but could also make textual changes to the string if the revprop value is not already canonical. Dump should carefully output exactly what is in the repository and not gratuitously change it. In retrospect, such a transformation should have been done during "svnadmin load" instead of in "dump".

Agreed, this fits with what "svnadmin load" already does with the
--normalize-props option. But if the timestamp is *not* in the right
format, "load" should fix it. Maybe "except with
--bypass-prop-validation"? Another option that controls "load" behaviour
WRT dates is --ignore-dates, though I can't remember why we introduced that.

In any case I think the logic should be changed a bit: Only reformat the
svn:date property if it is not already in ISO8601 format; otherwise,
leave it alone.

> While "svnadmin dump" makes this transformation, "svnrdump dump" and "svndumpfilter" do not. This could lead to unintended differences in dump output depending on which tool is used.

"svnadmin dump" should always define the canonical behaviour. Both
"svnrdump" and "svndumpfilter" are newer; they should behave the way
"svnadmin dump" does, not the other way around. One _could_ argue that
this is an omission in svnrdump.

> Therefore I will remove this code path. It even has a comment saying "### Remove this when it is no longer needed for sure" which referred to being needed for pre-0.14 upgrades. We definitely no longer need that.

+0, and I'll explain why:

 1. This is not a trivial change and should be documented in 1.11
    release notes (hence, it stays on trunk until then). It's possible
    that some 3rd-party tools rely on "svnadmin dump" producing
    timestamps in the ISO format.
 2. Our dumpfile format is documented. With your proposed change, there
    is a slight chance that "svnadmin dump" could produce invalid dump
    files. This is in effect a corollary of point 1., above. Note that
    "svndrump dump" is already "infected" by this potential bug, whereas
    "svndumpfilter" is not because it starts with a presumably valid
    dumpfile.


-- Brane


Re: Issue SVN-4767: svnadmin dump shouldn't canonicalize svn:date

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Tue, Jul 31, 2018 at 12:04:23 +0100:
> https://issues.apache.org/jira/browse/SVN-4767
> 
> Running tests with the '--dump-load-cross-check' option revealed a minor discrepancy between the dump files produced by 'svnadmin dump' and 'svnrdump dump' in some test cases.
> 
> [...]
>  K
>  svn:date
>  V
> - 2015-01-01T00:00:00.000000Z
> + 2015-01-01T00:00:00.0Z
> [...]
> 
> "svnadmin dump" "canonicalizes" each svn:date revprop while dumping, in the function write_revision_record().
> 
> This seems to have been done in r842390 in order to upgrade from pre-0.14 repository format to the new timestamp format introduced in 0.14 -- see issue SVN-614 "DAV:creationdate needs to be an ISO8601 date". svn_time_from_cstring() reads either new or old format, and then svn_time_to_cstring() writes the new format.
> 

One clarification:

Both "2015-01-01T00:00:00.000000Z" and "2015-01-01T00:00:00.0Z" are "new
format" timestamps.  (I assume they are handled identically, but perhaps some
piece of code somewhere malfunctions when the number of decimal places is
!= 6.)  The "old format" which that issue and comment refer to is this one:

[[[
/* Our old timestamp strings looked like this:
 * 
 *    "Tue 3 Oct 2000 HH:MM:SS.UUU (day 277, dst 1, gmt_off -18000)"
 *
 * The idea is that they are conventionally human-readable for the
 * first part, and then in parentheses comes everything else required
 * to completely fill in an apr_time_exp_t: tm_yday, tm_isdst,
 * and tm_gmtoff.
 *
 * This format is still recognized on input, for backward
 * compatibility, but no longer generated.
 */
static const char * const old_timestamp_format =
"%s %d %s %d %02d:%02d:%02d.%06d (day %03d, dst %d, gmt_off %06d)";
]]]

That string constant is still present in HEAD, but it's been converted to a macro
named OLD_TIMESTAMP_FORMAT.

> However, this does not only convert old to new format, but could also make textual changes to the string if the revprop value is not already canonical. Dump should carefully output exactly what is in the repository and not gratuitously change it. In retrospect, such a transformation should have been done during "svnadmin load" instead of in "dump".
> 
> While "svnadmin dump" makes this transformation, "svnrdump dump" and "svndumpfilter" do not. This could lead to unintended differences in dump output depending on which tool is used. (I made some progress in unifying the output logic for those three dump producers a couple of years ago, but I left this part alone because I did not know what to do with it.)
> 
> The discrepancy I noticed above comes from prop_tests.py 41 and 42 which explicitly propset svn:date to a value such as '2015-01-01T00:00:00.0Z'. "svnadmin dump" was munging this non-canonical value on output, which meant it was not a true dump. (Whether a non-canonical format should have been allowed into the repository in the first place is a separate issue.)
> 
> Therefore I will remove this code path. It even has a comment saying "### Remove this when it is no longer needed for sure" which referred to being needed for pre-0.14 upgrades. We definitely no longer need that.

+1 to having dump write the data verbatim.

Cheers,

Daniel