You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (Jira)" <ji...@apache.org> on 2020/09/18 17:16:00 UTC

[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

    [ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198465#comment-17198465 ] 

Sam Tunnicliffe commented on CASSANDRA-15299:
---------------------------------------------

Sorry it's been a while without any visible movement here, but I've just pushed some more commits to address the latest comments from [~ifesdjeen] and [~omichallat]. I've added some tests for protocol negotiation, correct handling of corrupt messages and resource management.
 
{quote} * In CQLMessageHandler#processOneContainedMessage, when we can't acquire capacity and, subsequently, we're not passing the frame further down the line. Shouold we release the frame in this case, since usually we're releasing the source frame after flush.
{quote}

Done, though we only need to do this when {{throwOnOverload == true}} as otherwise we process the inflight request before applying backpressure.

{quote} * ReusableBuffer is unused. {quote}

Ah yes, removed

{quote} * Server has a few unused imports and eventExecutorGroup which is unused.{quote}

Cleaned up the imports and removed eventExecutorGroup

{quote} * I'm not sure if we currently handle releasing corrupted frames.{quote}

For self-contained frames, there's nothing to do here as no resources have been acquired before the corruption is detected, hence {{CorruptFrame::release}} is a no-op. For frames which are part of a large message, there may be some resource allocated before we discover corruption. This is ok though, as we consume the frame, supplying it to the large message state machine, which handles releasing the bufffers of the previous frames (if any). I've added a test for this scenario which includes a check that everything allocated has been freed.

{quote}  Shouold we maybe make FrameSet auto-closeable and make sure we always release buffers in finally? I've also made a similar change to processItem which would add item to flushed to make sure it's released. That makes flushed variable name not quite right though. {quote}

I've pulled in some of your change to {{processItem}} as it removes some duplication around polling the queue. I've removed the condition in the {{finally}} of {{LegacyFlusher}} though, since if we throw from {{processQueue}} then {{doneWork}} will be false anyway, but there may have been some items processed and waiting to flush. The trade off is calling {{flushWrittenChannels}} even if there's no work to do, but that seems both cheap and unlikely, what do you think?
As far as making {{FrameSet}} autoclosable, I don't think that's feasible, given how they are created and accessed. Instead, I've moved the release of source frame buffers out of {{finish}} and added a {{try..finally}} to ensure we release them in the case that {{finish}} fails. I've also tried to address one of your comment re: the memory management here by adding some comments. They're probably not yet enough, but let me know if they are helpful at all.

{quote} We can (and probably should) open a separate ticket that could aim at performance improvements around native protocol. {quote}

Agreed, I'd like to do some further perf testing, but the results from your initial tests makes a follow-up ticket seem a reasonable option.

{quote} I've noticed an issue when the client starts protocol negotiation with an unsupported version. {quote}

Fixed, thanks.

[~ifesdjeen], I haven't pulled in your burn test or changes to {{SimpleClient}} yet, I'll try to do that next week.  I also haven't done any automated renaming yet, I'll hold off on that so as not to add to the cognitive burden until we're pretty much done with review. 

||branch||CI||
|[15299-trunk|https://github.com/beobal/cassandra/tree/15299-trunk]|[circle|https://app.circleci.com/pipelines/github/beobal/cassandra?branch=15299-trunk]|


> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Alex Petrov
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32 to request and response bodies. It’s an important step forward, but it doesn’t cover the entire stream. In particular, the message header is not covered by a checksum or a crc, which poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of computing the CRC32 on the bytes written on the wire - losing the properties of the CRC32. In some cases, due to this sequencing, attempting to decompress a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since *way* before CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message bodies rather than frames of multiple complete messages. In reality, this has several important additional downsides. To name a couple:
> # For compression, we are getting poor compression ratios for smaller messages - when operating on tiny sequences of bytes. In reality, for most small requests and responses we are discarding the compressed value as it’d be smaller than the uncompressed one - incurring both redundant allocations and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before native protocol v5 graduates from beta. See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org