You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2015/05/18 01:20:47 UTC
fsfs7 structure-indexes - questions
Good morning Stefan,
A couple of questions on the fsfs7 indexes doc:
[[[
--- subversion/libsvn_fs_fs/structure-indexes
+++ subversion/libsvn_fs_fs/structure-indexes
@@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such, an implementation
detail subject to change. A proto index basically aggregates all the
information that must later be transformed into the final index.
+### "Subject to change?" The format must be stable & documented since
+### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
+
General design concerns
-----------------------
@@ -346,7 +349,10 @@ For performance reasons we use a modified version:
* combine the big endian representation of these checksums plus the
remnant of the original stream into a 12 to 15 byte long intermediate
[i0 .. iK], 12 <= K+1 <= 15
+### "combine" is unclear. Combine how? Concatenate? Interleave? Add? Multiply?
+
* FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
+### Why do we checksum the output of a checksum algorithm? (Bytes i0..iK are themselves FNV output)
]]]
Also, I suggest to change "zero to many" to "zero or more" to avoid confusion
with the term "one-to-many" of relational databases.
Cheers,
Daniel
Re: fsfs7 structure-indexes - questions
Posted by Stefan Fuhrmann <st...@wandisco.com>.
Hi Daniel,
I'll fly back to Europe tomorrow & the day after.
Once there, I'll take time to answer your questions
and review your patch (elsethread).
-- Stefan^2.
On Mon, May 18, 2015 at 1:20 AM, Daniel Shahaf <d....@daniel.shahaf.name>
wrote:
> Good morning Stefan,
>
> A couple of questions on the fsfs7 indexes doc:
>
> [[[
> --- subversion/libsvn_fs_fs/structure-indexes
> +++ subversion/libsvn_fs_fs/structure-indexes
> @@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such,
> an implementation
> detail subject to change. A proto index basically aggregates all the
> information that must later be transformed into the final index.
>
> +### "Subject to change?" The format must be stable & documented since
> +### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
> +
>
> General design concerns
> -----------------------
> @@ -346,7 +349,10 @@ For performance reasons we use a modified version:
> * combine the big endian representation of these checksums plus the
> remnant of the original stream into a 12 to 15 byte long intermediate
>
> [i0 .. iK], 12 <= K+1 <= 15
>
> +### "combine" is unclear. Combine how? Concatenate? Interleave? Add?
> Multiply?
> +
> * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
>
> +### Why do we checksum the output of a checksum algorithm? (Bytes i0..iK
> are themselves FNV output)
> ]]]
>
> Also, I suggest to change "zero to many" to "zero or more" to avoid
> confusion
> with the term "one-to-many" of relational databases.
>
> Cheers,
>
> Daniel
>
Re: fsfs7 structure-indexes - questions
Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, May 18, 2015 at 1:20 AM, Daniel Shahaf <d....@daniel.shahaf.name>
wrote:
> Good morning Stefan,
>
> A couple of questions on the fsfs7 indexes doc:
>
> [[[
> --- subversion/libsvn_fs_fs/structure-indexes
> +++ subversion/libsvn_fs_fs/structure-indexes
> @@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such,
> an implementation
> detail subject to change. A proto index basically aggregates all the
> information that must later be transformed into the final index.
>
> +### "Subject to change?" The format must be stable & documented since
> +### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
> +
>
That's an outdated section. All index structures,
including the proto-indexes are now well-defined
and platform-independent.
> General design concerns
> -----------------------
> @@ -346,7 +349,10 @@ For performance reasons we use a modified version:
> * combine the big endian representation of these checksums plus the
> remnant of the original stream into a 12 to 15 byte long intermediate
>
> [i0 .. iK], 12 <= K+1 <= 15
>
> +### "combine" is unclear. Combine how? Concatenate? Interleave? Add?
> Multiply?
> +
>
Concatenated into a 16 to 19 byte sequence
(so there is also a factual error).
> * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
>
> +### Why do we checksum the output of a checksum algorithm? (Bytes i0..iK
> are themselves FNV output)
>
Not all are FNV output, they also include up to
3 "odd" bytes from the end of the original sequence.
But the main reason is that we want a short / space
efficient checksum to keep the index size small.
A 1-in-a-billion false negative for a random bit corruption
is good enough for a quick low-level check. It is on par
with what e.g. BTRFS does.
]]]
>
> Also, I suggest to change "zero to many" to "zero or more" to avoid
> confusion
> with the term "one-to-many" of relational databases.
>
Yeah, that should be changed as well.
Thanks for the review! r1692739 addresses the first
3 points and r1692964 the last one.
-- Stefan^2.