Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2012/06/27 18:42:43 UTC
[jira] [Commented] (KAFKA-374) Move to java CRC32 implementation
[ https://issues.apache.org/jira/browse/KAFKA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402344#comment-13402344 ]
Jay Kreps commented on KAFKA-374:
---------------------------------
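Here are the full performance results.
Size is the message size in bytes. The native and java columns give the time in nanoseconds per message, averaged over a large number of messages. Improvement is the time saved relative to native, i.e. (native - java) / native: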
size (bytes)  native (ns)  java (ns)  improvement
16            149.47       108.11     27.7%
32            197.8        149.78     24.3%
64            291.01       219.89     24.4%
128           487.36       357.64     26.6%
256           892.78       631.15     29.3%
512           1774.22      1251.4     29.5%
1024          3412.79      2470.58    27.6%
2048          6594.28      4421.38    33.0%
4096          13121.85     8751.19    33.3%
8192          25689.03     18173.61   29.3%
16384         51258.21     36278.3    29.2%
32768         103584.61    73240.5    29.3%
65536         207569.05    146748.51  29.3%
131072        415893.86    292083.12  29.8%
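For readers who want to reproduce numbers of this shape, here is a minimal sketch of a per-message timing harness. It is not the harness used for the table; the class CrcBench and its methods nanosPerMessage and improvement are hypothetical names. It takes any java.util.zip.Checksum, so the native and pure-Java columns can be measured with the same loop. On newer JDKs java.util.zip.CRC32 is accelerated by the JVM, so results from this sketch should not be expected to match the table.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical harness (not the one used for the table above): average
// nanoseconds to checksum one message of a given size with any Checksum.
final class CrcBench {
    // Written after each run so the JIT cannot treat the CRC work as dead code.
    static volatile long sink;

    static double nanosPerMessage(Checksum crc, int size, int messages) {
        byte[] msg = new byte[size];
        new Random(42).nextBytes(msg);
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < messages; i++) {
            crc.reset();
            crc.update(msg, 0, msg.length);
            acc += crc.getValue();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / messages;
    }

    // Improvement as reported in the table: time saved relative to native.
    static double improvement(double nativeNs, double javaNs) {
        return (nativeNs - javaNs) / nativeNs;
    }

    public static void main(String[] args) {
        Checksum crc = new CRC32();
        nanosPerMessage(crc, 1024, 200_000); // JIT warm-up, result discarded
        for (int size = 16; size <= 131072; size *= 2) {
            int messages = Math.max(1_000, (1 << 24) / size);
            System.out.printf("%7d %12.2f%n", size, nanosPerMessage(crc, size, messages));
        }
    }
}
```

As a check of the formula against the first row, improvement(149.47, 108.11) evaluates to about 0.277, i.e. 27.7%.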
I suspect some Scala numeric boxing is still happening here; it would be good to get rid of it.
> Move to java CRC32 implementation
> ---------------------------------
>
> Key: KAFKA-374
> URL: https://issues.apache.org/jira/browse/KAFKA-374
> Project: Kafka
> Issue Type: New Feature
> Components: core
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Priority: Minor
> Labels: newbie
> Attachments: KAFKA-374-draft.patch
>
>
> We keep a per-record crc32. This is a fairly cheap algorithm, but the Java implementation uses JNI, which seems to be a bit expensive for small records. I have seen this before in Kafka profiles, and I noticed it in another application I was working on. Basically, with small records the native implementation can only checksum < 100MB/sec. Hadoop has analyzed this and replaced it with a pure Java implementation that is 2x faster for large values and 5-10x faster for small values. Details are in HADOOP-6148.
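A minimal sketch of what a pure-Java (no JNI) CRC-32 looks like, assuming the classic one-table, byte-at-a-time algorithm. The class name TableCrc32 is hypothetical. It implements java.util.zip.Checksum and uses the same reflected polynomial (0xEDB88320) as java.util.zip.CRC32, so stored checksums stay compatible. The Hadoop implementation referenced in HADOOP-6148 is more elaborate, so this simpler variant should not be assumed to reach the speedups quoted above.

```java
import java.util.zip.Checksum;

// Hypothetical pure-Java CRC-32: one 256-entry lookup table, one byte per step.
// Produces the same values as java.util.zip.CRC32 (reflected poly 0xEDB88320).
final class TableCrc32 implements Checksum {
    private static final int[] TABLE = new int[256];

    static {
        // Precompute the CRC of every possible byte value.
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Running CRC register, initialised to all ones as in standard CRC-32.
    private int crc = 0xFFFFFFFF;

    @Override
    public void update(int b) {
        crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
    }

    @Override
    public void update(byte[] b, int off, int len) {
        int c = crc; // local copy keeps the hot loop on a primitive int
        for (int i = off; i < off + len; i++) {
            c = (c >>> 8) ^ TABLE[(c ^ b[i]) & 0xFF];
        }
        crc = c;
    }

    @Override
    public long getValue() {
        // Final inversion, widened to an unsigned 32-bit value.
        return (~crc) & 0xFFFFFFFFL;
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }
}
```

Because it is a drop-in Checksum, it can replace new CRC32() wherever a Checksum is expected. A quick sanity check is the standard CRC-32 check value: the ASCII bytes of "123456789" should give 0xCBF43926.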
> We should do a quick read/write benchmark on log and message set iteration and see if this improves things.
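One way to sketch the suggested read/write benchmark, assuming a hypothetical record layout of [4-byte CRC][4-byte length][payload] that is NOT Kafka's actual message format. The class CrcRecordSketch and its methods are hypothetical. The write path stores a per-record CRC; the read path iterates over the buffer and re-verifies every record, which is where a cheaper checksum would show up for small records. Any Checksum can be passed in, so the JNI-backed and pure-Java versions can be timed on identical data.

```java
import java.nio.ByteBuffer;
import java.util.zip.Checksum;

// Hypothetical record layout for illustration only, NOT Kafka's message format:
// [4-byte CRC of payload][4-byte payload length][payload bytes]
final class CrcRecordSketch {
    // "Write" path: append each payload with its CRC into one heap buffer.
    static ByteBuffer write(byte[][] payloads, Checksum crc) {
        int total = 0;
        for (byte[] p : payloads) total += 8 + p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : payloads) {
            crc.reset();
            crc.update(p, 0, p.length);
            buf.putInt((int) crc.getValue()); // store the low 32 bits
            buf.putInt(p.length);
            buf.put(p);
        }
        buf.flip();
        return buf;
    }

    // "Read" path: iterate over the records, recomputing and checking each CRC
    // directly on the backing array (heap buffers only, no per-record copy).
    // Returns the number of records verified.
    static int verifyAll(ByteBuffer buf, Checksum crc) {
        ByteBuffer b = buf.duplicate(); // leave the caller's position untouched
        int count = 0;
        while (b.remaining() >= 8) {
            int stored = b.getInt();
            int len = b.getInt();
            if (len < 0 || len > b.remaining()) {
                throw new IllegalStateException("Truncated record " + count);
            }
            crc.reset();
            crc.update(b.array(), b.arrayOffset() + b.position(), len);
            if ((int) crc.getValue() != stored) {
                throw new IllegalStateException("CRC mismatch in record " + count);
            }
            b.position(b.position() + len);
            count++;
        }
        return count;
    }

    // Average nanoseconds per record for one verification pass.
    static double nanosPerRecord(ByteBuffer buf, Checksum crc) {
        long start = System.nanoTime();
        int n = verifyAll(buf, crc);
        return (double) (System.nanoTime() - start) / Math.max(n, 1);
    }
}
```

Running nanosPerRecord repeatedly over many small payloads, once with java.util.zip.CRC32 and once with a pure-Java Checksum, would give a rough iteration-level comparison; a corrupted payload byte makes verifyAll throw.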
--
This message is automatically generated by JIRA.