Posted to dev@kafka.apache.org by "Dana Powers (JIRA)" <ji...@apache.org> on 2016/04/10 19:45:25 UTC
[jira] [Commented] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum
[ https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234221#comment-15234221 ]
Dana Powers commented on KAFKA-3160:
------------------------------------
Magnus: have you made any progress on this? The more I think about it, the more I think this needs to be included with KIP-31. If the goal of KIP-31 is to avoid recompression, the goal of this JIRA is to fix the compression format, and in all cases we need to maintain compatibility with old clients, then I think the only way to satisfy all of these constraints is to have the pre-KIP-31 FetchRequest / ProduceRequest versions keep using the broken LZ4 format, and to require the fixed format in the new FetchRequest / ProduceRequest versions:
Old 0.8/0.9 clients (current behavior): produce messages w/ broken checksum; consume messages w/ incorrect checksum only
New 0.10 clients (proposed behavior): produce messages in "new KIP-31 format" w/ correct checksum; consume messages in "new KIP-31 format" w/ correct checksum only
Proposed behavior for 0.10 broker:
- convert all "old format" messages to "new KIP-31 format" + fix checksum to correct value
- require incoming "new KIP-31 format" messages to have correct checksum, otherwise throw error
- when serving requests for "old format", fixup checksum to be incorrect when converting "new KIP-31 format" messages to old format
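To make the conversion step concrete, here is a rough sketch of how a frame's header checksum byte could be rewritten in either direction. It is illustrative only, not broker code: `rewrite_header_checksum`, `descriptor_length`, and the `hash32` parameter are hypothetical names, `hash32` is assumed to be an XXH32 implementation with seed 0, and the frame is assumed to carry no dictionary ID.

```python
LZ4_MAGIC = b"\x04\x22\x4d\x18"   # 0x184D2204, little-endian
FLG_CONTENT_SIZE = 0x08           # FLG bit 3: an 8-byte content size follows BD


def descriptor_length(flg):
    """Length of the frame descriptor (FLG, BD, optional content size),
    excluding the checksum byte. Assumes no dictionary ID is present."""
    return 2 + (8 if flg & FLG_CONTENT_SIZE else 0)


def rewrite_header_checksum(frame, legacy, hash32):
    """Return a copy of `frame` with its header checksum byte recomputed.

    legacy=True  -> hash covers magic + descriptor (old, broken behavior)
    legacy=False -> hash covers the descriptor only (per the LZ4 frame spec)
    `hash32` is a caller-supplied 32-bit hash, assumed to be XXH32 (seed 0).
    """
    if frame[:4] != LZ4_MAGIC:
        raise ValueError("not an LZ4 frame")
    end = 4 + descriptor_length(frame[4])   # index of the checksum byte
    covered = frame[:end] if legacy else frame[4:end]
    hc = (hash32(covered) >> 8) & 0xFF      # second byte of the 32-bit hash
    return frame[:end] + bytes([hc]) + frame[end + 1:]
```

Only one byte of the header changes, so the compressed blocks themselves never need to be touched, which fits the KIP-31 goal of avoiding recompression.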
Thoughts?
> Kafka LZ4 framing code miscalculates header checksum
> ----------------------------------------------------
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1
> Reporter: Dana Powers
> Assignee: Magnus Edenhill
> Labels: compatibility, compression, lz4
>
> KAFKA-1493 partially implements the LZ4 framing specification, but it incorrectly calculates the header checksum. This causes KafkaLZ4BlockInputStream to raise an error [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly framed LZ4 data, which means clients decoding LZ4 messages from kafka will always receive incorrectly framed data.
> Specifically, the current implementation includes the 4-byte MagicNumber in the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that brokers reject messages as having a corrupt checksum. So currently non-java clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same problem.
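The difference described above can be sketched in a few lines of Python. This is a minimal illustration, not the Kafka code: `xxh32` is a small pure-Python XXH32 written for this example, `correct_hc` and `legacy_kafka_hc` are hypothetical names, and the sample descriptor bytes (FLG=0x60, BD=0x40) are an assumed example. Per the LZ4 frame format, the header checksum is the second byte of XXH32 (seed 0) over the frame descriptor only; the behavior reported here hashes the 4-byte magic number as well.

```python
import struct

MASK = 0xFFFFFFFF
P1, P2, P3, P4, P5 = 2654435761, 2246822519, 3266489917, 668265263, 374761393


def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc, lane):
    return (_rotl((acc + lane * P2) & MASK, 13) * P1) & MASK


def xxh32(data, seed=0):
    """Pure-Python XXH32, written out so the example has no dependencies."""
    n, i = len(data), 0
    if n >= 16:
        v1 = (seed + P1 + P2) & MASK
        v2 = (seed + P2) & MASK
        v3 = seed & MASK
        v4 = (seed - P1) & MASK
        while i <= n - 16:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1, v2 = _round(v1, a), _round(v2, b)
            v3, v4 = _round(v3, c), _round(v4, d)
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i <= n - 4:                      # remaining 4-byte words
        (k,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + k * P3) & MASK, 17) * P4) & MASK
        i += 4
    while i < n:                           # remaining single bytes
        h = (_rotl((h + data[i] * P5) & MASK, 11) * P1) & MASK
        i += 1
    h ^= h >> 15                           # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


LZ4_MAGIC = struct.pack("<I", 0x184D2204)  # bytes 04 22 4D 18


def correct_hc(descriptor):
    """Header checksum per the LZ4 frame spec: descriptor bytes only."""
    return (xxh32(descriptor) >> 8) & 0xFF


def legacy_kafka_hc(descriptor):
    """Header checksum as described in this issue: magic number included."""
    return (xxh32(LZ4_MAGIC + descriptor) >> 8) & 0xFF


descriptor = bytes([0x60, 0x40])           # assumed sample: FLG, BD
print(hex(correct_hc(descriptor)), hex(legacy_kafka_hc(descriptor)))
```

A client that has to interoperate with the affected brokers would write the `legacy_kafka_hc` value into the header in place of the spec-compliant one, which is the 'fixup' mentioned above.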