Posted to jira@kafka.apache.org by "Fabian Lange (Jira)" <ji...@apache.org> on 2020/01/08 16:57:00 UTC
[jira] [Created] (KAFKA-9387) LZ4 Compression creates significant unnecessary CPU usage
Fabian Lange created KAFKA-9387:
-----------------------------------
Summary: LZ4 Compression creates significant unnecessary CPU usage
Key: KAFKA-9387
URL: https://issues.apache.org/jira/browse/KAFKA-9387
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 2.4.0
Reporter: Fabian Lange
Attachments: Screenshot 2020-01-08 at 16.52.38.png
KafkaLZ4BlockOutputStream and KafkaLZ4BlockInputStream perform checksumming on 3 bytes in the header. This work is largely unnecessary, and this ticket proposes a change that makes that hashing roughly 10x faster.
{{kafka-downstream-0 id=152 state=RUNNABLE
at net.jpountz.xxhash.XXHashJNI.XXH32(Native Method)
at net.jpountz.xxhash.XXHash32JNI.hash(XXHash32JNI.java:30)
at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.writeHeader(KafkaLZ4BlockOutputStream.java:156)
at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:85)
at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:125)
at org.apache.kafka.common.record.CompressionType$4.wrapForOutput(CompressionType.java:101)
at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:130)
at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:166)
at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:534)
at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:516)
at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:464)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.recordsBuilder(RecordAccumulator.java:245)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:222)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:917)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:856)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:743)}}
By default Kafka does not checksum blocks (blockChecksum=false), but it does checksum the header. The header, however, is static, so it's checksumming the same 6 or 2 bytes over and over again.
Currently it uses {{XXHashFactory.fastestInstance().hash32()}}, which resolves to a JNI-backed implementation when the native library is available. For a 2-byte input, however, the JNI call overhead makes it 10x slower than the pure-Java implementation, so we should replace it with {{fastestJavaInstance}}.
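A minimal sketch of the proposed swap, using the lz4-java API directly (the header bytes below are made up for illustration; they are not the actual LZ4 frame descriptor Kafka writes). Both factory methods return xxHash32 implementations that produce identical checksums for the same input and seed, so only the call overhead differs:

```java
import net.jpountz.xxhash.XXHash32;
import net.jpountz.xxhash.XXHashFactory;

public class HeaderHashSketch {
    public static void main(String[] args) {
        // Hypothetical short header bytes, standing in for the static
        // frame descriptor that Kafka checksums on every stream.
        byte[] header = new byte[] {0x04, 0x60, 0x40};

        // Current approach: may pick a JNI implementation, whose
        // per-call overhead dominates for tiny inputs.
        XXHash32 fastest = XXHashFactory.fastestInstance().hash32();

        // Proposed approach: pure-Java implementation, no JNI overhead.
        XXHash32 fastestJava = XXHashFactory.fastestJavaInstance().hash32();

        int a = fastest.hash(header, 0, header.length, 0);
        int b = fastestJava.hash(header, 0, header.length, 0);

        // Both implementations compute the same xxHash32 value.
        System.out.println(a == b);
    }
}
```

Since the checksum values are identical, the swap is behavior-preserving on the wire; only the CPU cost of computing the header checksum changes.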
--
This message was sent by Atlassian Jira
(v8.3.4#803005)