Posted to jira@kafka.apache.org by "Fabian Lange (Jira)" <ji...@apache.org> on 2020/01/08 16:57:00 UTC

[jira] [Created] (KAFKA-9387) LZ4 Compression creates significant unnecessary CPU usage

Fabian Lange created KAFKA-9387:
-----------------------------------

             Summary: LZ4 Compression creates significant unnecessary CPU usage
                 Key: KAFKA-9387
                 URL: https://issues.apache.org/jira/browse/KAFKA-9387
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 2.4.0
            Reporter: Fabian Lange
         Attachments: Screenshot 2020-01-08 at 16.52.38.png

KafkaLZ4BlockOutputStream and KafkaLZ4BlockInputStream compute an xxhash32 checksum over just a few header bytes every time a stream is created. This work is unnecessarily expensive, and this ticket proposes a change that makes it roughly 10x faster.

{{kafka-downstream-0 id=152 state=RUNNABLE
    at net.jpountz.xxhash.XXHashJNI.XXH32(Native Method)
    at net.jpountz.xxhash.XXHash32JNI.hash(XXHash32JNI.java:30)
    at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.writeHeader(KafkaLZ4BlockOutputStream.java:156)
    at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:85)
    at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:125)
    at org.apache.kafka.common.record.CompressionType$4.wrapForOutput(CompressionType.java:101)
    at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:130)
    at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:166)
    at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:534)
    at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:516)
    at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:464)
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.recordsBuilder(RecordAccumulator.java:245)
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:222)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:917)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:856)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:743)}}

By default, Kafka does not checksum the compressed blocks (blockChecksum=false),
but it does checksum the header.
The header, however, is static, so it is checksumming the same 6 or 2 bytes over and over again.

Currently it uses {{XXHashFactory.fastestInstance().hash32()}}, which returns a JNI-backed implementation when the native library is available.
For an input of only a few bytes, however, the JNI call overhead makes this about 10x slower than the pure Java implementation, so we should replace it with {{XXHashFactory.fastestJavaInstance()}}.
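
The proposed change can be sketched as follows. This is a hedged illustration, not the actual Kafka patch: the class and method names here are made up, and only the lz4-java {{XXHashFactory}}/{{XXHash32}} API is taken from the real library.

{{{
import net.jpountz.xxhash.XXHash32;
import net.jpountz.xxhash.XXHashFactory;

// Hypothetical sketch of the proposed fix: use the pure-Java xxhash32
// instead of fastestInstance(), which may return a JNI-backed hash.
// For an input of only a few bytes, the per-call JNI overhead dominates,
// so the Java implementation is faster here.
public class Lz4HeaderChecksum {

    // fastestJavaInstance() never crosses the JNI boundary.
    private static final XXHash32 CHECKSUM =
            XXHashFactory.fastestJavaInstance().hash32();

    // LZ4 frame format: the header checksum byte is the second byte of
    // xxhash32(frame descriptor bytes, seed = 0), i.e. (hash >> 8) & 0xFF.
    static int headerChecksum(byte[] descriptor, int off, int len) {
        return (CHECKSUM.hash(descriptor, off, len, 0) >> 8) & 0xFF;
    }
}
}}}

Since the descriptor bytes are fixed for a given configuration, a further (also hypothetical) option would be to compute this checksum once per configuration and reuse it, rather than re-hashing on every stream construction.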


--
This message was sent by Atlassian Jira
(v8.3.4#803005)