You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2017/03/14 13:49:41 UTC

[jira] [Comment Edited] (CASSANDRA-13304) Add checksumming to the native protocol

    [ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924210#comment-15924210 ] 

Sam Tunnicliffe edited comment on CASSANDRA-13304 at 3/14/17 1:49 PM:
----------------------------------------------------------------------

[~mkjellman] thanks for that, responses below:

bq.I wanted to make it easy if we wanted to switch in an LZFSE implementation or something. Do we really want to throw everything into a single class and then have if conditions all over the place?

I'm not sure I follow you, which single class & if conditions are you referring to? To use another compression algo, you'd just add a {{ChunkCompressor}} implementation and create a new {{ChecksummedTransformer}} singleton that uses it (e.g. {{LZFSEChunkCompression}} & {{FrameBodyTransformer.LZFSE_CHECKSUMMED_TRANSFORMER}}). Basically, this is identical to what you were doing with {{ChecksumedCompressor}}, but making the classes composable instead of inheriting from an abstract base. You would need an additional switch condition in {{StartupMessage::execute}} but I'm not sure that's a problem, is it?

bq.I didn't want to push us into sticking with CRC32 forever

But it's non-trivial to switch because it requires coordination with clients. So unless you allow clients to negotiate the checksum implementation per connection, you'd have to gate that with a protocol version bump anyway. We can use {{o.a.c.utils.ChecksumType}} though, so we don't have to redo the thread local stuff (thanks [~jasobrown] for the pointer).

bq.why would V5 be marked as "beta"?

The beta status was added so that new protocol features could be integrated to trunk early, without needing to maintain a separate protocol_vX branch to collate and then integrate them in one hit when a release is cut. The {{CURRENT}} version is the released and supported protocol version, but clients can implement features from the upcoming/beta version against trunk. If the client requests a version currently marked as beta, it also has to set the {{USE_BETA}} flag in the startup message as an additional safety valve. 

bq.I think this should be an official change that goes into 4.0

Well that's more or less dependent on when v5 moves from beta to current/released, but I'd be surprised if that wasn't part of the 4.0 release. Even if it isn't though, this is totally an issue of protocol, rather than C*, versioning. 

Of course, this is much more invasive than any of the other v5 features that have already landed, as clients will be forced to implement checksumming if they want to connect at v5 at all (i.e. currently a client could pick and choose which of the v5 features to implement and just steer clear of any others). I don't really see a way around this though if we want to make checksumming mandatory. If we chose to make it optional, we could use a flag in startup just like we do for compression. Although we're close to running out of spare bits in the header flags, we could extend those using vints like was done for certain message level flags in CASSANDRA-12838

bq.I feel we've now made it just as unclear for the other case that the code will do decompression/compression... no?
You say potato :) I actually feel it's less unclear now, because the operation is not (never was) simply a compression, so it's the original naming that was kind of erroneous. The job of the class now called {{ChecksummedTransformer}} (previously {{ChecksumedCompressor}}) is to apply between 1 & n (currently up to a maximum of 2) transformations to the frame body. I'm definitely open to suggestions on the naming, but personally I don't think {{compress/decompress}} is an improvement. 

I would have prefered to do away with {{ChecksummedFrameCompressor}} and just have everything be a frame body transformation, but given that we do need to keep the legacy LZ4/Snappy impls around for compatibility that felt like too much change in one hit. 






was (Author: beobal):
[~mkjellman] thanks for that, responses below:

bq.I wanted to make it easy if we wanted to switch in an LZFSE implementation or something. Do we really want to throw everything into a single class and then have if conditions all over the place?

I'm not sure I follow you, which single class & if conditions are you referring to? To use another compression algo, you'd just add a {{ChunkCompressor}} implementation and create a new {{ChecksummedTransformer}} singleton that uses it (e.g. {{LZFSEChunkCompression}} & {{FrameBodyTransformer.LZFSE_CHECKSUMMED_TRANSFORMER}}). Basically, this is identical to what you were doing with {{ChecksumedCompressor}}, but making the classes composable instead of inheriting from an abstract base. You would need an additional switch condition in {{StartupMessage::execute}} but I'm not sure that's a problem, is it?

bq.I didn't want to push us into sticking with CRC32 forever

But it's non-trivial to switch because it requires coordination with clients. So unless you allow clients to negotiate the checksum implementation per connection, you'd have to gate that with a protocol version bump anyway. We can use {{o.a.c.utils.ChecksumType}} though, so we don't have to redo the thread local stuff (thanks @jasobrown for the pointer).

bq.why would V5 be marked as "beta"?

The beta status was added so that new protocol features could be integrated to trunk early, without needing to maintain a separate protocol_vX branch to collate and then integrate them in one hit when a release is cut. The {{CURRENT}} version is the released and supported protocol version, but clients can implement features from the upcoming/beta version against trunk. If the client requests a version currently marked as beta, it also has to set the {{USE_BETA}} flag in the startup message as an additional safety valve. 

bq.I think this should be an official change that goes into 4.0

Well that's more or less dependent on when v5 moves from beta to current/released, but I'd be surprised if that wasn't part of the 4.0 release. Even if it isn't though, this is totally an issue of protocol, rather than C*, versioning. 

Of course, this is much more invasive than any of the other v5 features that have already landed, as clients will be forced to implement checksumming if they want to connect at v5 at all (i.e. currently a client could pick and choose which of the v5 features to implement and just steer clear of any others). I don't really see a way around this though if we want to make checksumming mandatory. If we chose to make it optional, we could use a flag in startup just like we do for compression. Although we're close to running out of spare bits in the header flags, we could extend those using vints like was done for certain message level flags in CASSANDRA-12838

bq.I feel we've now made it just as unclear for the other case that the code will do decompression/compression... no?
You say potato :) I actually feel it's less unclear now, because the operation is not (never was) simply a compression, so it's the original naming that was kind of erroneous. The job of the class now called {{ChecksummedTransformer}} (previously {{ChecksumedCompressor}}) is to apply between 1 & n (currently up to a maximum of 2) transformations to the frame body. I'm definitely open to suggestions on the naming, but personally I don't think {{compress/decompress}} is an improvement. 

I would have prefered to do away with {{ChecksummedFrameCompressor}} and just have everything be a frame body transformation, but given that we do need to keep the legacy LZ4/Snappy impls around for compatibility that felt like too much change in one hit. 





> Add checksumming to the native protocol
> ---------------------------------------
>
>                 Key: CASSANDRA-13304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>              Labels: client-impacting
>         Attachments: 13304_v1.diff
>
>
> The native binary transport implementation doesn't include checksums. This makes it highly susceptible to silently inserting corrupted data either due to hardware issues causing bit flips on the sender/client side, C*/receiver side, or network in between.
> Attaching an implementation that makes checksum'ing mandatory (assuming both client and server know about a protocol version that supports checksums) -- and also adds checksumming to clients that request compression.
> The serialized format looks something like this:
> {noformat}
>  *                      1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
>  *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |  Number of Compressed Chunks  |     Compressed Length (e1)    /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * /  Compressed Length cont. (e1) |    Uncompressed Length (e1)   /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)|
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Checksum of Lengths cont. (e1)|    Compressed Bytes (e1)    +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e1)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (e2)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (e2)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (e2)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                     Compressed Bytes (e2)                   +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e2)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (en)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (en)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (en)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      Compressed Bytes (en)                  +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (en)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
> {noformat}
> The first pass here adds checksums only to the actual contents of the frame body itself (and doesn't actually checksum lengths and headers). While it would be great to fully add checksuming across the entire protocol, the proposed implementation will ensure we at least catch corrupted data and likely protect ourselves pretty well anyways.
> I didn't go to the trouble of implementing a Snappy Checksum'ed Compressor implementation as it's been deprecated for a while -- is really slow and crappy compared to LZ4 -- and we should do everything in our power to make sure no one in the community is still using it. I left it in (for obvious backwards compatibility aspects) old for clients that don't know about the new protocol.
> The current protocol has a 256MB (max) frame body -- where the serialized contents are simply written in to the frame body.
> If the client sends a compression option in the startup, we will install a FrameCompressor inline. Unfortunately, we went with a decision to treat the frame body separately from the header bits etc in a given message. So, instead we put a compressor implementation in the options and then if it's not null, we push the serialized bytes for the frame body *only* thru the given FrameCompressor implementation. The existing implementations simply provide all the bytes for the frame body in one go to the compressor implementation and then serialize it with the length of the compressed bytes up front.
> Unfortunately, this won't work for checksum'ing for obvious reasons as we can't naively just checksum the entire (potentially) 256MB frame body and slap it at the end... so,
> The best place to start with the changes is in {{ChecksumedCompressor}}. I implemented one single place to perform the checksuming (and to support checksuming) the actual required chunking logic. Implementations of ChecksumedCompressor only implement the actual calls to the given compression algorithm for the provided bytes.
> Although the interface takes a {{Checksum}}, right now the attached patch uses CRC32 everywhere. As of right now, given JDK8+ has support for doing the calculation with the Intel instruction set, CRC32 is about as fast as we can get right now.
> I went with a 32kb "default" for the chunk size -- meaning we will chunk the entire frame body into 32kb chunks, compress each one of those chunks, and checksum the chunk. Upon discussing with a bunch of people and researching how checksums actually work and how much data they will protect etc -- if we use 32kb chunks with CRC32 we can catch up to 32 bits flipped in a row (but more importantly catch the more likely corruption where a single bit is flipped) with pretty high certainty. 64kb seems to introduce too much of a probability of missing corruption.
> The maximum block size LZ4 operates on is a 64kb chunk -- so this combined with the need to make sure the CRC32 checksums are actually going to catch stuff -- chunking at 32kb seemed like a good reasonable value to use when weighing both checksums and compression (to ensure we don't kill our compression ratio etc).
> I'm not including client changes here -- I asked around and I'm not really sure what the policy there is -- do we update the python driver? java driver? how has the timing of this stuff been handled in the past?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)