You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2017/10/31 16:14:25 UTC

svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Author: stsp
Date: Tue Oct 31 16:14:25 2017
New Revision: 1813898

URL: http://svn.apache.org/viewvc?rev=1813898&view=rev
Log:
Fix issue #4700 for property changes.
Coincidentally, this commit also fixes issue #4623 for FSFS.

If rep-sharing is disabled, switching from svn_fs_props_different()
to svn_fs_props_changed() fixes issue #4700 for properties, similar
to how r1813794 fixed it for file content.

However, if rep-sharing is enabled, svn_fs_props_changed() does not work
as advertised because properties do not carry a SHA1 checksum with a
"uniquifier" which identifies the transaction they were created in.
The uniquifier is used by svn_fs_props_changed() to tell apart property
representations which share content but were created in different revisions.

To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
for file properties, just as it is already done for file content.

A source code comment indicates that SHA1 checksums for property reps
were not written due to concerns over disk space. In hindsight, this was
a bad trade-off because it affected correctness of svn_fs_props_changed().

* subversion/libsvn_fs_fs/transaction.c
  (svn_fs_fs__set_proplist, write_final_rev): Create uniquifiers on prop reps.

* subversion/libsvn_repos/reporter.c
  (update_entry): Use svn_fs_props_changed() instead of using
   svn_fs_props_different(). The reasoning is the same as for the
   file content fix committed in r1813794.

* basic_tests.py
  (null_prop_update_last_changed_revision): This test now passes for FSFS.
   It still fails for BDB, though.

* subversion/tests/cmdline/svnadmin_tests.py
  (verify_non_utf8_paths): Update test expectations. This test messes about
   with revision file data and thus needs an update to keep passing.
  (dump_no_op_prop_change): This test now passes as well.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/transaction.c
    subversion/trunk/subversion/libsvn_repos/reporter.c
    subversion/trunk/subversion/tests/cmdline/basic_tests.py
    subversion/trunk/subversion/tests/cmdline/svnadmin_tests.py

Modified: subversion/trunk/subversion/libsvn_fs_fs/transaction.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/transaction.c?rev=1813898&r1=1813897&r2=1813898&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/transaction.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/transaction.c Tue Oct 31 16:14:25 2017
@@ -2697,6 +2697,8 @@ svn_fs_fs__set_proplist(svn_fs_t *fs,
     {
       noderev->prop_rep = apr_pcalloc(pool, sizeof(*noderev->prop_rep));
       noderev->prop_rep->txn_id = *svn_fs_fs__id_txn_id(noderev->id);
+      SVN_ERR(set_uniquifier(fs, noderev->prop_rep, pool));
+      noderev->prop_rep->revision = SVN_INVALID_REVNUM;
       SVN_ERR(svn_fs_fs__put_node_revision(fs, noderev->id, noderev, FALSE,
                                            pool));
     }
@@ -3259,7 +3261,8 @@ write_final_rev(const svn_fs_id_t **new_
                              ? SVN_FS_FS__ITEM_TYPE_DIR_PROPS
                              : SVN_FS_FS__ITEM_TYPE_FILE_PROPS;
       SVN_ERR(svn_fs_fs__get_proplist(&proplist, fs, noderev, pool));
-
+      noderev->prop_rep->txn_id = *txn_id;
+      SVN_ERR(set_uniquifier(fs, noderev->prop_rep, pool));
       noderev->prop_rep->revision = rev;
 
       if (ffd->deltify_properties)
@@ -3328,14 +3331,11 @@ write_final_rev(const svn_fs_id_t **new_
         }
     }
 
-  /* don't serialize SHA1 for dirs to disk (waste of space) */
+  /* don't serialize SHA1 for dir content to disk (waste of space) */
+  /* ### Could clients record bogus last-changed-revisions (issue #4700)? */
   if (noderev->data_rep && noderev->kind == svn_node_dir)
     noderev->data_rep->has_sha1 = FALSE;
 
-  /* don't serialize SHA1 for props to disk (waste of space) */
-  if (noderev->prop_rep)
-    noderev->prop_rep->has_sha1 = FALSE;
-
   /* Workaround issue #4031: is-fresh-txn-root in revision files. */
   noderev->is_fresh_txn_root = FALSE;
 

Modified: subversion/trunk/subversion/libsvn_repos/reporter.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_repos/reporter.c?rev=1813898&r1=1813897&r2=1813898&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_repos/reporter.c (original)
+++ subversion/trunk/subversion/libsvn_repos/reporter.c Tue Oct 31 16:14:25 2017
@@ -973,8 +973,8 @@ update_entry(report_baton_t *b, svn_revn
           if (s_root == NULL)
             SVN_ERR(get_source_root(b, &s_root, s_rev));
 
-          SVN_ERR(svn_fs_props_different(&changed, s_root, s_path,
-                                         b->t_root, t_path, pool));
+          SVN_ERR(svn_fs_props_changed(&changed, s_root, s_path,
+                                       b->t_root, t_path, pool));
           if (!changed)
             SVN_ERR(svn_fs_contents_changed(&changed, s_root, s_path,
                                             b->t_root, t_path, pool));

Modified: subversion/trunk/subversion/tests/cmdline/basic_tests.py
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/basic_tests.py?rev=1813898&r1=1813897&r2=1813898&view=diff
==============================================================================
--- subversion/trunk/subversion/tests/cmdline/basic_tests.py (original)
+++ subversion/trunk/subversion/tests/cmdline/basic_tests.py Tue Oct 31 16:14:25 2017
@@ -3168,7 +3168,7 @@ def null_update_last_changed_revision(sb
                                      '--show-item', 'last-changed-revision')
 
 @Issue(4700)
-@XFail()
+@XFail(svntest.main.is_fs_type_bdb)
 def null_prop_update_last_changed_revision(sbox):
   "null 'property update' updates last changed rev"
 

Modified: subversion/trunk/subversion/tests/cmdline/svnadmin_tests.py
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/svnadmin_tests.py?rev=1813898&r1=1813897&r2=1813898&view=diff
==============================================================================
--- subversion/trunk/subversion/tests/cmdline/svnadmin_tests.py (original)
+++ subversion/trunk/subversion/tests/cmdline/svnadmin_tests.py Tue Oct 31 16:14:25 2017
@@ -1676,12 +1676,12 @@ def verify_non_utf8_paths(sbox):
     if line == b"A\n":
       # replace 'A' with a latin1 character -- the new path is not valid UTF-8
       fp_new.write(b"\xE6\n")
-    elif line == b"text: 1 279 32 32 d63ecce65d8c428b86f4f8b0920921fe\n":
+    elif line == b"text: 1 340 32 32 a6be7b4cf075fd39e6a99eb69a31232b\n":
       # phys, PLAIN directories: fix up the representation checksum
-      fp_new.write(b"text: 1 279 32 32 b50b1d5ed64075b5f632f3b8c30cd6b2\n")
-    elif line == b"text: 1 292 44 32 a6be7b4cf075fd39e6a99eb69a31232b\n":
+      fp_new.write(b"text: 1 340 32 32 f2e93e73272cac0f18fccf16f224eb93\n")
+    elif line == b"text: 1 340 44 32 a6be7b4cf075fd39e6a99eb69a31232b\n":
       # phys, deltified directories: fix up the representation checksum
-      fp_new.write(b"text: 1 292 44 32 f2e93e73272cac0f18fccf16f224eb93\n")
+      fp_new.write(b"text: 1 340 44 32 f2e93e73272cac0f18fccf16f224eb93\n")
     elif line == b"text: 1 6 31 31 90f306aa9bfd72f456072076a2bd94f7\n":
       # log addressing: fix up the representation checksum
       fp_new.write(b"text: 1 6 31 31 db2d4a0bad5dff0aea9a288dec02f1fb\n")
@@ -3350,7 +3350,7 @@ def dump_no_op_change(sbox):
   svntest.actions.run_and_verify_svn(expected, [], 'log',  '-v',
                                      sbox2.repo_url + '/bar')
 
-@XFail() # This test will XPASS on FSFS if rep-caching is disabled.
+@XFail(svntest.main.is_fs_type_bdb)
 @Issue(4623)
 def dump_no_op_prop_change(sbox):
   "svnadmin dump with no-op property change"



Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Posted by Stefan Fuhrmann <st...@apache.org>.

On 25.11.2017 21:27, Evgeny Kotkov wrote:
> Stefan Fuhrmann <st...@apache.org> writes:
> 
>>> An alternative approach that might be worth considering here would be:
>>>
>>>    (1) Extend the on-disk format and allow representation strings without
>>>        SHA1, but with the uniquifier, something like this (where "-" stands
>>>        for "no SHA1"):
>>>
>>>        15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>>>
>>>        (The new format would be allowed starting from FSFS 8.)
>>
>> Yes, that would be one option. If you are willing to provide a patch,
>> I would probably +1 it. Not bothering with the space savings would
>> be a valid choice as well, IMO.
>>>
>>>    (2) Use the new format to allow rep sharing for properties that writes
>>>        the uniquifier so that svn_fs_props_changed() would work correctly,
>>>        and doesn't introduce the overhead of writing SHA1 in the
>>>        representation string for every property.
> 
> [...]
> 
>>> Barring objections and alternative suggestions, I could give a shot at
>>> implementing this.
>>
>> Go for it. Maybe notify me once you are done b/c currently, I don't
>> monitor the commit activity closely.
> 
> I committed the implementations of (1) and (2) in
> 
>      https://svn.apache.org/r1816347 and
>      https://svn.apache.org/r1816348
> 
> respectively.

Reviewed, tested and seems to work fine. Thank you!

Test expectations may need to be adapted.
With v7 repos (see r1816402), I get two
test failures while v4 and v6 pass:

[[[
W: EXPECTED STDERR (regexp):
W: | .*Found malformed header '[^']*' in revision file|.*Missing id field in 
node-rev.*
W: ACTUAL STDERR:
W: | * Error verifying repository metadata.
W: | subversion/svnadmin/svnadmin.c:2191,
W: | subversion/libsvn_repos/dump.c:2511,
W: | subversion/libsvn_repos/dump.c:2425,
W: | subversion/svnadmin/svnadmin.c:965,
W: | subversion/libsvn_fs/fs-loader.c:619,
W: | subversion/libsvn_fs_fs/verify.c:914,
W: | subversion/libsvn_fs_fs/verify.c:876,
W: | subversion/libsvn_fs_fs/verify.c:233,
W: | subversion/libsvn_fs_fs/rev_file.c:263,
W: | subversion/libsvn_fs_fs/low_level.c:237,
W: | subversion/libsvn_subr/checksum.c:432: (apr_err=SVN_ERR_BAD_CHECKSUM_PARSE)
W: | svnadmin: E125012: Invalid character in hex checksum
W: CWD: /dev/shm/trunk/subversion/tests/cmdline
W: EXCEPTION: SVNUnmatchedError
Traceback (most recent call last):
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/main.py", line 1872, in run
     rc = self.pred.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258, 
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 178, 
in run
     result = self.func(sandbox)
   File "/run/shm/trunk/subversion/tests/cmdline/svnadmin_tests.py", line 923, 
in verify_incremental_fsfs
     expected_stderr=".*Found malformed header '[^']*' in revision file"
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 455, 
in verify_outputs
     compare_and_display_lines(message, label, expected, actual, raisable)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 428, 
in compare_and_display_lines
     raise raisable
SVNUnmatchedError
FAIL:  svnadmin_tests.py 12: svnadmin verify detects corruption dump can't
]]]

[[[
W: Unexpected error while running 'svnadmin verify'.
W: EXPECTED STDERR (regexp):
W: | .*Path '.*' is not in UTF-8.*
W: ACTUAL STDERR:
W: | * Error verifying repository metadata.
W: | subversion/svnadmin/svnadmin.c:2191,
W: | subversion/libsvn_repos/dump.c:2511,
W: | subversion/libsvn_repos/dump.c:2425,
W: | subversion/svnadmin/svnadmin.c:965,
W: | subversion/libsvn_fs/fs-loader.c:619,
W: | subversion/libsvn_fs_fs/verify.c:914,
W: | subversion/libsvn_fs_fs/verify.c:876,
W: | subversion/libsvn_fs_fs/verify.c:717,
W: | subversion/libsvn_fs_fs/verify.c:553,
W: | subversion/libsvn_fs_fs/verify.c:527: (apr_err=SVN_ERR_FS_CORRUPT)
W: | svnadmin: E160004: Checksum mismatch in item at offset 84 of length 254 
bytes in file svn-test-work/repositories/svnadmin_tests-24/db/revs/0/1
W: CWD: /dev/shm/trunk/subversion/tests/cmdline
W: EXCEPTION: SVNUnmatchedError
Traceback (most recent call last):
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/main.py", line 1872, in run
     rc = self.pred.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258, 
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258, 
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 178, 
in run
     result = self.func(sandbox)
   File "/run/shm/trunk/subversion/tests/cmdline/svnadmin_tests.py", line 1706, 
in verify_non_utf8_paths
     [], errput, None, ".*Path '.*' is not in UTF-8.*")
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 455, 
in verify_outputs
     compare_and_display_lines(message, label, expected, actual, raisable)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 428, 
in compare_and_display_lines
     raise raisable
SVNUnmatchedError
FAIL:  svnadmin_tests.py 24: svnadmin verify with non-UTF-8 paths
]]]

-- Stefan^2.

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@apache.org> writes:

>> An alternative approach that might be worth considering here would be:
>>
>>   (1) Extend the on-disk format and allow representation strings without
>>       SHA1, but with the uniquifier, something like this (where "-" stands
>>       for "no SHA1"):
>>
>>       15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>>
>>       (The new format would be allowed starting from FSFS 8.)
>
> Yes, that would be one option. If you are willing to provide a patch,
> I would probably +1 it. Not bothering with the space savings would
> be a valid choice as well, IMO.
>>
>>   (2) Use the new format to allow rep sharing for properties that writes
>>       the uniquifier so that svn_fs_props_changed() would work correctly,
>>       and doesn't introduce the overhead of writing SHA1 in the
>>       representation string for every property.

[...]

>> Barring objections and alternative suggestions, I could give a shot at
>> implementing this.
>
> Go for it. Maybe notify me once you are done b/c currently, I don't
> monitor the commit activity closely.

I committed the implementations of (1) and (2) in

    https://svn.apache.org/r1816347 and
    https://svn.apache.org/r1816348

respectively.


Thanks,
Evgeny Kotkov

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Posted by Stefan Fuhrmann <st...@apache.org>.
On 22.11.2017 22:08, Evgeny Kotkov wrote:
> Stefan Sperling <st...@apache.org> writes:
>
>> However, if rep-sharing is enabled, svn_fs_props_changed() does not work
>> as advertised because properties do not carry a SHA1 checksum with a
>> "uniquifier" which identifies the transaction they were created in.
>> The uniquifier is used by svn_fs_props_changed() to tell apart property
>> representations which share content but were created in different revisions.
>>
>> To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
>> for file properties, just as it is already done for file content.
>>
>> A source code comment indicates that SHA1 checksums for property reps
>> were not written due to concerns over disk space. In hindsight, this was
>> a bad trade-off because it affected correctness of svn_fs_props_changed().
> Thinking about this change, it could be that writing the additional 40-byte
> SHA1 for the property representations is going to eliminate the benefit of
> sharing them in the first place.
Here some more background after digging through the commit history.

The original comment about space saving was ill-worded. It should have
said something like "storing sha1 etc. is pointless because nobody uses
it for props & dirs". Hence, it is a waste of space.

In 1.8, svn_fs_fs__dag_things_different got broken for props due to the
missing uniquifier but that went unnoticed. This has not been a deliberate
trade-off but a mere oversight of the potential implication of shared reps.
It needs to be fixed.

One reason for rep sharing is that rep changes tend to be rare and often
not "in sync" with text changes. That means, they must be pulled from
a different repo location than both, the node tree (dirs and noderevs)
and the file contents. Eliminating most of these accesses by reading
only a few shared proplists should give some speedup.

Space savings on disk are not the main focus. Those might happen for
commits with mass prop changes and a longer list of standard-props
at each node. Savings in the membuffer cache may be more significant,
though, because its entry directory has a limited size and storage may
remain unused if the average cached item size drops below 1k.
> If I recall correctly, rep sharing for properties is mostly there to gain
> from deduplicating small properties, such as svn:eol-style or svn:keywords.
> But with the additional SHA1 written in every representation string, this
> overhead is likely to take more space than the property itself.
>
> An alternative approach that might be worth considering here would be:
>
>   (1) Extend the on-disk format and allow representation strings without
>       SHA1, but with the uniquifier, something like this (where "-" stands
>       for "no SHA1"):
>
>       15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>
>       (The new format would be allowed starting from FSFS 8.)
Yes, that would be one option. If you are willing to provide a patch,
I would probably +1 it. Not bothering with the space savings would
be a valid choice as well, IMO.
>   (2) Use the new format to allow rep sharing for properties that writes
>       the uniquifier so that svn_fs_props_changed() would work correctly,
>       and doesn't introduce the overhead of writing SHA1 in the representation
>       string for every property.
>
>   (3) Disable rep sharing for properties in FSFS formats < 8 that cannot
>       read and write such representation strings without SHA1, but with an
>       uniquifier.
I'd rather have the pre fsfs-8 repos use the full sha1+uniquifier.
That is not worse than storing the props themselves and will
preserve the benefits mentioned above.
> Barring objections and alternative suggestions, I could give a shot at
> implementing this.
Go for it. Maybe notify me once you are done b/c currently, I don't
monitor the commit activity closely.

-- Stefan^2.

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Sperling <st...@apache.org> writes:

> However, if rep-sharing is enabled, svn_fs_props_changed() does not work
> as advertised because properties do not carry a SHA1 checksum with a
> "uniquifier" which identifies the transaction they were created in.
> The uniquifier is used by svn_fs_props_changed() to tell apart property
> representations which share content but were created in different revisions.
>
> To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
> for file properties, just as it is already done for file content.
>
> A source code comment indicates that SHA1 checksums for property reps
> were not written due to concerns over disk space. In hindsight, this was
> a bad trade-off because it affected correctness of svn_fs_props_changed().

Thinking about this change, it could be that writing the additional 40-byte
SHA1 for the property representations is going to eliminate the benefit of
sharing them in the first place.

If I recall correctly, rep sharing for properties is mostly there to gain
from deduplicating small properties, such as svn:eol-style or svn:keywords.
But with the additional SHA1 written in every representation string, this
overhead is likely to take more space than the property itself.

An alternative approach that might be worth considering here would be:

 (1) Extend the on-disk format and allow representation strings without
     SHA1, but with the uniquifier, something like this (where "-" stands
     for "no SHA1"):

     15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5

     (The new format would be allowed starting from FSFS 8.)

 (2) Use the new format to allow rep sharing for properties that writes
     the uniquifier so that svn_fs_props_changed() would work correctly,
     and doesn't introduce the overhead of writing SHA1 in the representation
     string for every property.

 (3) Disable rep sharing for properties in FSFS formats < 8 that cannot
     read and write such representation strings without SHA1, but with an
     uniquifier.

Barring objections and alternative suggestions, I could give a shot at
implementing this.


Regards,
Evgeny Kotkov