Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/01/26 21:03:34 UTC

[jira] [Comment Edited] (CASSANDRA-8684) Replace usage of Adler32 with CRC32

    [ https://issues.apache.org/jira/browse/CASSANDRA-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292316#comment-14292316 ] 

Ariel Weisberg edited comment on CASSANDRA-8684 at 1/26/15 8:02 PM:
--------------------------------------------------------------------

!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1480884345&format=image!
!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1206341035&format=image!

The ever-inscrutable results for OS X. I am very skeptical that it is doing 13 gigabytes/second on a single core. I don't understand why the speed gradually increases as the size of the data being checksummed increases; that speedup doesn't exist on Linux. The CPU monitor gives no sign that the application is using multiple cores.

!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1911364989&format=image!

I think the real speed is 3 gigabytes/second, which is what I have seen in the past and on Linux. There are faster hashes to consider, like xxhash or MurmurHash3, that operate in the 5-6 gigabytes/second range.

However, a Java implementation of xxhash might not hit those numbers. The murmur3 implementation certainly doesn't. A native implementation incurs JNI overhead, and there is nothing packaged at the moment.
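
For readers who want to sanity-check throughput figures like the ones discussed above, the following is a minimal sketch of how gigabytes/second can be derived from a timed checksum loop. It is not the attached CRCBenchmark.java; the class name ChecksumThroughput, the buffer sizes, and the iteration counts are assumptions chosen for illustration, and a simple loop like this is subject to JIT warm-up and timer noise.

```java
import java.util.Random;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper class; not part of Cassandra or of the attached benchmark.
final class ChecksumThroughput {

    /** Bytes per nanosecond is numerically equal to gigabytes (1e9 bytes) per second. */
    static double gigabytesPerSecond(long bytes, long nanos) {
        return (double) bytes / (double) nanos;
    }

    /** Checksums the whole buffer `iterations` times and returns the measured rate in GB/s. */
    static double measure(Checksum checksum, byte[] buffer, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum.reset();
            checksum.update(buffer, 0, buffer.length);
            sink += checksum.getValue();
        }
        // Guard against a zero reading from a coarse timer.
        long elapsed = Math.max(1L, System.nanoTime() - start);
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the checksum result observable
        }
        return gigabytesPerSecond((long) buffer.length * iterations, elapsed);
    }

    public static void main(String[] args) {
        int[] sizes = {64, 128, 256, 1024, 1048576};
        for (int size : sizes) {
            byte[] buffer = new byte[size];
            new Random(0).nextBytes(buffer);
            // Process roughly 64 MiB per measurement regardless of buffer size.
            int iterations = Math.max(1, (64 * 1024 * 1024) / size);
            // One untimed pass per implementation as a crude warm-up.
            measure(new CRC32(), buffer, iterations);
            measure(new Adler32(), buffer, iterations);
            System.out.printf("%8d bytes: CRC32 %.2f GB/s, Adler32 %.2f GB/s%n",
                    size,
                    measure(new CRC32(), buffer, iterations),
                    measure(new Adler32(), buffer, iterations));
        }
    }
}
```

A harness such as JMH would give more trustworthy numbers; the sketch only shows how the rate is computed from bytes processed and elapsed time.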


was (Author: aweisberg):
!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1480884345&format=image!
!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1206341035&format=image!

The ever inscrutable results for OS X. I don't buy for a second that it is doing 13 gigabytes/second.

!https://docs.google.com/spreadsheets/d/1cxf-V4b8dXdz1vLb5ySUNxK09bukDHHpq79a09xHw20/pubchart?oid=1911364989&format=image!

I think the real speed is 3 gigabytes/second, which is what I have seen in the past and on Linux. There are faster hashes to consider, like xxhash or MurmurHash3, that operate in the 5-6 gigabytes/second range.

However, a Java implementation of xxhash might not hit those numbers. The murmur3 implementation certainly doesn't. A native implementation incurs JNI overhead, and there is nothing packaged at the moment.

> Replace usage of Adler32 with CRC32
> -----------------------------------
>
>                 Key: CASSANDRA-8684
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8684
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>         Attachments: CRCBenchmark.java, PureJavaCrc32.java, Sample.java
>
>
> I could not find a situation in which Adler32 outperformed PureJavaCrc32, much less the intrinsic from Java 8. For small allocations PureJavaCrc32 was much faster, probably due to the JNI overhead of invoking the native Adler32 implementation, where the array has to be allocated and copied.
> I tested on a 65 W Sandy Bridge i5 running Ubuntu 14.04 with JDK 1.7.0_71, as well as on a c3.8xlarge running Ubuntu 14.04.
> I think it makes sense to stop using Adler32 when generating new checksums.
> c3.8xlarge (results are times in milliseconds; lower is better)
> ||Allocation size||Adler32||CRC32||PureJavaCrc32||
> |64|47636|46075|25782|
> |128|36755|36712|23782|
> |256|31194|32211|22731|
> |1024|27194|28792|22010|
> |1048576|25941|27807|21808|
> |536870912|25957|27840|21836|
> i5
> ||Allocation size||Adler32||CRC32||PureJavaCrc32||
> |64|50539|50466|26826|
> |128|37092|38533|24553|
> |256|30630|32938|23459|
> |1024|26064|29079|22592|
> |1048576|24357|27911|22481|
> |536870912|24838|28360|22853|
> Another fun fact: performance of the CRC32 intrinsic appears to double from Sandy Bridge to Haswell, unless I am measuring something different when going from Linux/Sandy Bridge to OS X/Haswell.
> The intrinsic/JDK 8 implementation also performs better against DirectByteBuffers, and code written against the wrapper will get that boost when run with Java 8.
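
As an illustration of the DirectByteBuffer point in the description, here is a minimal sketch that checksums a direct buffer through the `update(ByteBuffer)` overload that `java.util.zip.CRC32` and `java.util.zip.Adler32` gained in Java 8. It assumes "the wrapper" means the JDK's `java.util.zip.CRC32` class, which the description does not spell out; the class name DirectBufferChecksum and its helper methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration; not taken from the Cassandra code base.
final class DirectBufferChecksum {

    /** CRC32 over the buffer's remaining bytes, without an intermediate byte[] copy in caller code. */
    static long crc32(ByteBuffer buffer) {
        CRC32 crc = new CRC32();
        // update(ByteBuffer) advances the position, so work on a duplicate
        // to leave the caller's position and limit untouched.
        crc.update(buffer.duplicate());
        return crc.getValue();
    }

    /** Same as crc32, using Adler32 for comparison. */
    static long adler32(ByteBuffer buffer) {
        Adler32 adler = new Adler32();
        adler.update(buffer.duplicate());
        return adler.getValue();
    }

    /** Copies the given bytes into a freshly allocated direct buffer, ready for reading. */
    static ByteBuffer directCopyOf(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        return direct;
    }

    public static void main(String[] args) {
        ByteBuffer direct = directCopyOf("123456789".getBytes(StandardCharsets.US_ASCII));
        // "123456789" is the standard check input for CRC-32; the expected value is cbf43926.
        System.out.printf("CRC32   = %08x%n", crc32(direct));
        System.out.printf("Adler32 = %08x%n", adler32(direct));
    }
}
```

Because both classes implement `java.util.zip.Checksum`, code that takes a `Checksum` parameter for the array-based `update` calls can switch between implementations without other changes.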



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)