You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2015/05/18 01:20:47 UTC

fsfs7 structure-indexes - questions

Good morning Stefan,

A couple of questions on the fsfs7 indexes doc:

[[[
--- subversion/libsvn_fs_fs/structure-indexes
+++ subversion/libsvn_fs_fs/structure-indexes
@@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such, an implementation
 detail subject to change.  A proto index basically aggregates all the
 information that must later be transformed into the final index.
 
+### "Subject to change?"  The format must be stable & documented since
+### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
+
 
 General design concerns
 -----------------------
@@ -346,7 +349,10 @@ For performance reasons we use a modified version:
 * combine the big endian representation of these checksums plus the
   remnant of the original stream into a 12 to 15 byte long intermediate
 
   [i0 .. iK], 12 <= K+1 <= 15
 
+### "combine" is unclear.  Combine how?  Concatenate?  Interleave?  Add?  Multiply?
+
 * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
 
+### Why do we checksum the output of a checksum algorithm?  (Bytes i0..iK are themselves FNV output)
]]]

Also, I suggest to change "zero to many" to "zero or more" to avoid confusion
with the term "one-to-many" of relational databases.

Cheers,

Daniel

Re: fsfs7 structure-indexes - questions

Posted by Stefan Fuhrmann <st...@wandisco.com>.
Hi Daniel,

I'll fly back to Europe tomorrow & the day after.
Once there, I'll take time to answer your questions
and review your patch (elsethread).

-- Stefan^2.


On Mon, May 18, 2015 at 1:20 AM, Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Good morning Stefan,
>
> A couple of questions on the fsfs7 indexes doc:
>
> [[[
> --- subversion/libsvn_fs_fs/structure-indexes
> +++ subversion/libsvn_fs_fs/structure-indexes
> @@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such,
> an implementation
>  detail subject to change.  A proto index basically aggregates all the
>  information that must later be transformed into the final index.
>
> +### "Subject to change?"  The format must be stable & documented since
> +### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
> +
>
>  General design concerns
>  -----------------------
> @@ -346,7 +349,10 @@ For performance reasons we use a modified version:
>  * combine the big endian representation of these checksums plus the
>    remnant of the original stream into a 12 to 15 byte long intermediate
>
>    [i0 .. iK], 12 <= K+1 <= 15
>
> +### "combine" is unclear.  Combine how?  Concatenate?  Interleave?  Add?
> Multiply?
> +
>  * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
>
> +### Why do we checksum the output of a checksum algorithm?  (Bytes i0..iK
> are themselves FNV output)
> ]]]
>
> Also, I suggest to change "zero to many" to "zero or more" to avoid
> confusion
> with the term "one-to-many" of relational databases.
>
> Cheers,
>
> Daniel
>

Re: fsfs7 structure-indexes - questions

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, May 18, 2015 at 1:20 AM, Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Good morning Stefan,
>
> A couple of questions on the fsfs7 indexes doc:
>
> [[[
> --- subversion/libsvn_fs_fs/structure-indexes
> +++ subversion/libsvn_fs_fs/structure-indexes
> @@ -18,6 +18,9 @@ a simple concatenation of runtime structs and as such,
> an implementation
>  detail subject to change.  A proto index basically aggregates all the
>  information that must later be transformed into the final index.
>
> +### "Subject to change?"  The format must be stable & documented since
> +### "begin txn, modify txn, reboot, upgrade svn, commit txn" should work.
> +
>

That's an outdated section. All index structures,
including the proto-indexes are now well-defined
and platform-independent.


>  General design concerns
>  -----------------------
> @@ -346,7 +349,10 @@ For performance reasons we use a modified version:
>  * combine the big endian representation of these checksums plus the
>    remnant of the original stream into a 12 to 15 byte long intermediate
>
>    [i0 .. iK], 12 <= K+1 <= 15
>
> +### "combine" is unclear.  Combine how?  Concatenate?  Interleave?  Add?
> Multiply?
> +
>

Concatenated into a 16 to 19 byte sequence
(so there is also a factual error).


>  * FNV checksum = fnv_1a([i0 .. iK]) in big endian representation
>
> +### Why do we checksum the output of a checksum algorithm?  (Bytes i0..iK
> are themselves FNV output)
>

Not all are FNV output, they also include up to
3 "odd" bytes from the end of the original sequence.
But the main reason is that we want a short / space
efficient checksum to keep the index size small.

A 1-in-a-billion false negative for a random bit corruption
is good enough for a quick low-level check. It is on par
with what e.g. BTRFS does.

]]]
>
> Also, I suggest to change "zero to many" to "zero or more" to avoid
> confusion
> with the term "one-to-many" of relational databases.
>

Yeah, that should be changed as well.

Thanks for the review! r1692739 addresses the first
3 points and r1692964 the last one.

-- Stefan^2.