Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/02/08 13:46:45 UTC
[GitHub] [ozone] sodonnel opened a new pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
sodonnel opened a new pull request #1910:
URL: https://github.com/apache/ozone/pull/1910
## What changes were proposed in this pull request?
Add a Genesis benchmark to compare the performance of various CRC32 implementations.
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4808
## How was this patch tested?
Benchmarks were executed manually. One new test was added to validate that all CRC implementations give the same result.
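As a rough illustration of that kind of agreement check, the sketch below compares `java.util.zip.CRC32` against a simple bitwise CRC32 over fixed-size chunks. It is not the test from this PR: the class name `CrcAgreementSketch`, the helper `bitwiseCrc32`, the seed and the chunk sizes are all assumptions made for the example, and the bitwise routine only stands in for the Ozone and Hadoop implementations.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class CrcAgreementSketch {

  // Hypothetical stand-in for "another implementation": a bit-at-a-time
  // CRC32 using the reflected polynomial 0xEDB88320.
  static long bitwiseCrc32(byte[] data, int off, int len) {
    int crc = 0xFFFFFFFF;
    for (int i = off; i < off + len; i++) {
      crc ^= (data[i] & 0xFF);
      for (int k = 0; k < 8; k++) {
        // -(crc & 1) is all ones when the low bit is set, otherwise zero.
        crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
      }
    }
    return (~crc) & 0xFFFFFFFFL;
  }

  // Checksums each chunk with both implementations and compares the values.
  static boolean implsMatch(byte[] data, int bytesPerChecksum) {
    CRC32 reference = new CRC32();
    for (int off = 0; off < data.length; off += bytesPerChecksum) {
      int len = Math.min(bytesPerChecksum, data.length - off);
      reference.reset();
      reference.update(data, off, len);
      if (reference.getValue() != bitwiseCrc32(data, off, len)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 16];
    new Random(42).nextBytes(data); // fixed seed so runs are repeatable
    for (int bpc : new int[] {512, 1024, 4096}) {
      System.out.println(bpc + " bytes per checksum, match = "
          + implsMatch(data, bpc));
    }
  }
}
```

Comparing per chunk rather than over the whole buffer mirrors how a bytes-per-checksum setting is normally applied, so a mismatch would point at a specific chunk size.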
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org
[GitHub] [ozone] swagle commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
swagle commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577026104
##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/util/NativeCRC32Wrapper.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.util;
+
+import org.apache.hadoop.fs.ChecksumException;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This class wraps the NativeCRC32 class in hadoop-common, because the class
+ * is package private there. The intention of making this class available
+ * in Ozone is to allow the native libraries to be benchmarked alongside other
+ * implementations. At the current time, the hadoop native CRC is not used
+ * anywhere in Ozone except for benchmarks.
Review comment:
Important to call this out in the jira description as well as the PR. With the changes in this patch could Ozone start making use of the native CRC implementation?
[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577118464
##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCBatch.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.openjdk.jmh.annotations.Benchmark;
Review comment:
I don't believe so. JMH is pulled in as a dependency in the pom.xml and other existing benchmarks have these same imports.
[GitHub] [ozone] szetszwo commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577552115
##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ * BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+ private static int dataSize = 64 * 1024 * 1024;
+
+ @State(Scope.Thread)
+ public static class BenchmarkState {
+
+ private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+ @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+ private int checksumSize;
+
+ @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+ "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+ private String crcImpl;
+
+ private ChecksumByteBuffer checksum;
+
+ public ChecksumByteBuffer checksum() {
+ return checksum;
+ }
+
+ public String crcImpl() {
+ return crcImpl;
+ }
+
+ public int checksumSize() {
+ return checksumSize;
+ }
+
+ @Setup(Level.Trial)
+ public void setUp() {
+ switch (crcImpl) {
+ case "pureCRC32":
+ checksum = new PureJavaCrc32ByteBuffer();
+ break;
+ case "pureCRC32C":
+ checksum = new PureJavaCrc32CByteBuffer();
+ break;
+ case "hadoopCRC32":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+ break;
+ case "hadoopCRC32C":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+ break;
+ case "zipCRC32":
+ checksum = new ChecksumByteBufferImpl(new CRC32());
+ break;
+ case "zipCRC32C":
+ try {
+ checksum = new ChecksumByteBufferImpl(
+ ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+ } catch (Throwable e) {
+ throw new RuntimeException("zipCRC32C is not available pre Java 9");
+ }
+ break;
+ case "nativeCRC32":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ case "nativeCRC32C":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ default:
+ }
+ data.clear();
+ data.put(RandomUtils.nextBytes(data.remaining()));
+ }
+ }
+
+ @Benchmark
+ @Threads(1)
+ @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+ @Fork(value = 1, warmups = 0)
+ @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+ @BenchmarkMode(Mode.Throughput)
+ public void runCRC(Blackhole blackhole, BenchmarkState state) {
+ ByteBuffer data = state.data;
+ data.clear();
Review comment:
You are right -- clear() does not really clear the buffer. Thanks.
[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577117169
##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/util/NativeCRC32Wrapper.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.util;
+
+import org.apache.hadoop.fs.ChecksumException;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This class wraps the NativeCRC32 class in hadoop-common, because the class
+ * is package private there. The intention of making this class available
+ * in Ozone is to allow the native libraries to be benchmarked alongside other
+ * implementations. At the current time, the hadoop native CRC is not used
+ * anywhere in Ozone except for benchmarks.
Review comment:
Not unless you get the compiled shared library from a Hadoop build and then add it to the java.library.path. However, to be able to benchmark the native libs, we need this code here. The classes inside Hadoop common are package private, which is why I needed to wrap them.
[GitHub] [ozone] sodonnel merged pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #1910:
URL: https://github.com/apache/ozone/pull/1910
[GitHub] [ozone] swagle commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
swagle commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577041259
##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCBatch.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.openjdk.jmh.annotations.Benchmark;
Review comment:
Does this add an OpenJDK compile-time dependency?
[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577487531
##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ * BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+ private static int dataSize = 64 * 1024 * 1024;
+
+ @State(Scope.Thread)
+ public static class BenchmarkState {
+
+ private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+ @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+ private int checksumSize;
+
+ @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+ "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+ private String crcImpl;
+
+ private ChecksumByteBuffer checksum;
+
+ public ChecksumByteBuffer checksum() {
+ return checksum;
+ }
+
+ public String crcImpl() {
+ return crcImpl;
+ }
+
+ public int checksumSize() {
+ return checksumSize;
+ }
+
+ @Setup(Level.Trial)
+ public void setUp() {
+ switch (crcImpl) {
+ case "pureCRC32":
+ checksum = new PureJavaCrc32ByteBuffer();
+ break;
+ case "pureCRC32C":
+ checksum = new PureJavaCrc32CByteBuffer();
+ break;
+ case "hadoopCRC32":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+ break;
+ case "hadoopCRC32C":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+ break;
+ case "zipCRC32":
+ checksum = new ChecksumByteBufferImpl(new CRC32());
+ break;
+ case "zipCRC32C":
+ try {
+ checksum = new ChecksumByteBufferImpl(
+ ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+ } catch (Throwable e) {
+ throw new RuntimeException("zipCRC32C is not available pre Java 9");
+ }
+ break;
+ case "nativeCRC32":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ case "nativeCRC32C":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ default:
+ }
+ data.clear();
+ data.put(RandomUtils.nextBytes(data.remaining()));
+ }
+ }
+
+ @Benchmark
+ @Threads(1)
+ @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+ @Fork(value = 1, warmups = 0)
+ @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+ @BenchmarkMode(Mode.Throughput)
+ public void runCRC(Blackhole blackhole, BenchmarkState state) {
+ ByteBuffer data = state.data;
+ data.clear();
Review comment:
clear() does not actually alter the buffer contents; it only sets the position to zero and the limit to the capacity, getting the buffer ready for a new read / write. I guess I don't need the clear() here, as I set the position and limit on each pass around the loop, so I think I can remove the line safely.
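A small standalone sketch of that behaviour, for anyone reading along. It is not code from the PR: the class name `ByteBufferClearDemo` and both helper methods are made up for illustration. The first method shows that clear() leaves the bytes in place, and the second shows how setting limit and position on each pass selects a chunk without any clear() call.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ByteBufferClearDemo {

  // Writes three bytes, calls clear(), then reports the buffer state.
  static String describeAfterClear() {
    ByteBuffer buf = ByteBuffer.allocate(8);
    buf.put((byte) 1).put((byte) 2).put((byte) 3);
    buf.clear(); // position -> 0, limit -> capacity; contents untouched
    return "position=" + buf.position()
        + " limit=" + buf.limit()
        + " first=" + buf.get(0);
  }

  // Feeds the buffer to a CRC32 one chunk at a time by moving the
  // [position, limit) window; no clear() is needed between passes.
  static long checksumInChunks(ByteBuffer data, int chunkSize) {
    CRC32 crc = new CRC32();
    for (int start = 0; start < data.capacity(); start += chunkSize) {
      int end = Math.min(start + chunkSize, data.capacity());
      data.limit(end);      // set the limit first so position <= limit holds
      data.position(start);
      crc.update(data);     // consumes the bytes in [position, limit)
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    System.out.println(describeAfterClear()); // position=0 limit=8 first=1
    ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7});
    System.out.println(Long.toHexString(checksumInChunks(data, 3)));
  }
}
```

Setting the limit before the position matters: position(int) throws if the new position is larger than the current limit.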
[GitHub] [ozone] szetszwo commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577397045
##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.common;
+
+import java.lang.invoke.MethodHandle;
+import java.lang.invoke.MethodHandles;
+import java.lang.invoke.MethodType;
+import java.nio.ByteBuffer;
+import java.util.zip.Checksum;
+
+public class ChecksumByteBufferImpl implements ChecksumByteBuffer {
+
+ public static class Java9Crc32CFactory {
+ private static final MethodHandle NEW_CRC32C_MH;
+
+ static {
+ MethodHandle newCRC32C = null;
+ try {
+ newCRC32C = MethodHandles.publicLookup()
+ .findConstructor(
+ Class.forName("java.util.zip.CRC32C"),
+ MethodType.methodType(void.class)
+ );
+ } catch (ReflectiveOperationException e) {
+ // Should not reach here.
+ throw new RuntimeException(e);
+ }
+ NEW_CRC32C_MH = newCRC32C;
+ }
+
+ public static java.util.zip.Checksum createChecksum() {
+ try {
+ // Should throw nothing
+ return (Checksum) NEW_CRC32C_MH.invoke();
+ } catch (Throwable t) {
+ throw (t instanceof RuntimeException) ? (RuntimeException) t
+ : new RuntimeException(t);
+ }
+ }
+ };
+
+ private Checksum checksum;
+
+ public ChecksumByteBufferImpl(Checksum impl) {
+ this.checksum = impl;
+ }
+
+ @Override
+ public void update(ByteBuffer buffer) {
+ if (buffer.hasArray()) {
+ checksum.update(buffer.array(), buffer.position() + buffer.arrayOffset(),
+ buffer.remaining());
+ } else {
+ byte[] b = new byte[buffer.remaining()];
+ buffer.get(b);
+ checksum.update(b, 0, b.length);
+ }
+ }
Review comment:
Since Java 9, Checksum supports `update(ByteBuffer)` (https://docs.oracle.com/javase/9/docs/api/java/util/zip/Checksum.html#update-java.nio.ByteBuffer-), so this method should call it when `checksum` is a Java 9 Checksum object.
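To make the suggestion concrete, here is a minimal sketch that assumes a Java 9+ runtime. The class `ChecksumUpdateDemo` and its methods are hypothetical names, not part of the patch. It computes a CRC32C once through the array-based path, similar in spirit to the method quoted above, and once through `Checksum.update(ByteBuffer)`, for both heap and direct buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumUpdateDemo {

  // Array-based path: uses the backing array when there is one,
  // otherwise copies the remaining bytes out first.
  static long viaArray(Checksum checksum, ByteBuffer buffer) {
    checksum.reset();
    if (buffer.hasArray()) {
      checksum.update(buffer.array(),
          buffer.arrayOffset() + buffer.position(), buffer.remaining());
    } else {
      byte[] copy = new byte[buffer.remaining()];
      buffer.duplicate().get(copy); // duplicate() keeps the caller's position
      checksum.update(copy, 0, copy.length);
    }
    return checksum.getValue();
  }

  // Java 9+ path: hand the buffer straight to the Checksum.
  static long viaByteBuffer(Checksum checksum, ByteBuffer buffer) {
    checksum.reset();
    checksum.update(buffer.duplicate());
    return checksum.getValue();
  }

  // Returns true when both paths give the same value for the same bytes.
  static boolean pathsAgree(boolean direct) {
    byte[] payload = "123456789".getBytes(StandardCharsets.US_ASCII);
    ByteBuffer buffer = direct
        ? ByteBuffer.allocateDirect(payload.length)
        : ByteBuffer.allocate(payload.length);
    buffer.put(payload);
    buffer.flip();
    Checksum crc = new CRC32C();
    return viaArray(crc, buffer) == viaByteBuffer(crc, buffer);
  }

  public static void main(String[] args) {
    System.out.println("heap buffer agrees:   " + pathsAgree(false));
    System.out.println("direct buffer agrees: " + pathsAgree(true));
  }
}
```

The direct-buffer case is where the difference is most visible, since the array-based path has to copy the bytes onto the heap before it can checksum them.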
##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ * BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+ private static int dataSize = 64 * 1024 * 1024;
+
+ @State(Scope.Thread)
+ public static class BenchmarkState {
+
+ private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+ @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+ private int checksumSize;
+
+ @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+ "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+ private String crcImpl;
+
+ private ChecksumByteBuffer checksum;
+
+ public ChecksumByteBuffer checksum() {
+ return checksum;
+ }
+
+ public String crcImpl() {
+ return crcImpl;
+ }
+
+ public int checksumSize() {
+ return checksumSize;
+ }
+
+ @Setup(Level.Trial)
+ public void setUp() {
+ switch (crcImpl) {
+ case "pureCRC32":
+ checksum = new PureJavaCrc32ByteBuffer();
+ break;
+ case "pureCRC32C":
+ checksum = new PureJavaCrc32CByteBuffer();
+ break;
+ case "hadoopCRC32":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+ break;
+ case "hadoopCRC32C":
+ checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+ break;
+ case "zipCRC32":
+ checksum = new ChecksumByteBufferImpl(new CRC32());
+ break;
+ case "zipCRC32C":
+ try {
+ checksum = new ChecksumByteBufferImpl(
+ ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+ } catch (Throwable e) {
+ throw new RuntimeException("zipCRC32C is not available pre Java 9");
+ }
+ break;
+ case "nativeCRC32":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ case "nativeCRC32C":
+ if (NativeCRC32Wrapper.isAvailable()) {
+ checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+ NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+ } else {
+ throw new RuntimeException("Native library is not available");
+ }
+ break;
+ default:
+ }
+ data.clear();
+ data.put(RandomUtils.nextBytes(data.remaining()));
+ }
+ }
+
+ @Benchmark
+ @Threads(1)
+ @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+ @Fork(value = 1, warmups = 0)
+ @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+ @BenchmarkMode(Mode.Throughput)
+ public void runCRC(Blackhole blackhole, BenchmarkState state) {
+ ByteBuffer data = state.data;
+ data.clear();
Review comment:
Why is clear() called on the data here? Is it a typo?
##########
File path: hadoop-hdds/common/src/test/java/org/apache/hadoop/ozone/common/TestChecksumImplsComputeSameValues.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.common;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.zip.CRC32;
+
+import static junit.framework.TestCase.assertEquals;
+
+public class TestChecksumImplsComputeSameValues {
+
+ private int dataSize = 1024 * 1024 * 64;
+ private ByteBuffer data = ByteBuffer.allocate(dataSize);
+ private int[] bytesPerChecksum = {512, 1024, 2048, 4096, 32768, 1048576};
+
+ @Test
+ public void testCRC32ImplsMatch() {
+ data.clear();
+ data.put(RandomUtils.nextBytes(data.remaining()));
+ for (int bpc : bytesPerChecksum) {
+ List<ChecksumByteBuffer> impls = new ArrayList<>();
+ impls.add(new PureJavaCrc32ByteBuffer());
+ impls.add(new ChecksumByteBufferImpl(new PureJavaCrc32()));
+ impls.add(new ChecksumByteBufferImpl(new CRC32()));
+ if (NativeCRC32Wrapper.isAvailable()) {
+ impls.add(new ChecksumByteBufferImpl(new NativeCheckSumCRC32(1, bpc)));
+ }
+ assertEquals(true, validateImpls(data, impls, bpc));
+ }
+ }
+
+ @Test
+ public void testCRC32CImplsMatch() {
+ data.clear();
+ data.put(RandomUtils.nextBytes(data.remaining()));
+ for (int bpc : bytesPerChecksum) {
+ List<ChecksumByteBuffer> impls = new ArrayList<>();
+ impls.add(new PureJavaCrc32CByteBuffer());
+ impls.add(new ChecksumByteBufferImpl(new PureJavaCrc32C()));
+ // TODO - optional loaded java.util.zip.CRC32C if >= Java 9
+ // impls.add(new ChecksumByteBufferImpl(new CRC32C())));
Review comment:
How about doing try-catch Java9Crc32CFactory.createChecksum()? Ignore the exception if it is unavailable.
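A minimal sketch of that idea, for illustration only. It does not use the PR's `Java9Crc32CFactory`; instead it assumes plain reflection on `java.util.zip.CRC32C`, and the class name `OptionalCrc32C` and method `tryCreate` are hypothetical. The point is that the test can add the implementation when the JVM provides it and skip it otherwise.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.Checksum;

// Hypothetical helper, not part of the PR: loads java.util.zip.CRC32C only
// when the running JVM provides it (Java 9 and later).
final class OptionalCrc32C {
  private OptionalCrc32C() { }

  /** Returns a CRC32C instance, or empty if the class cannot be loaded. */
  static Optional<Checksum> tryCreate() {
    try {
      Class<?> clazz = Class.forName("java.util.zip.CRC32C");
      return Optional.of(
          (Checksum) clazz.getDeclaredConstructor().newInstance());
    } catch (ReflectiveOperationException | LinkageError e) {
      // Pre-Java 9 the class is missing; the caller simply skips it.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
    Optional<Checksum> crc = tryCreate();
    if (crc.isPresent()) {
      crc.get().update(input, 0, input.length);
      System.out.printf("CRC32C = 0x%08x%n", crc.get().getValue());
    } else {
      System.out.println("java.util.zip.CRC32C not available, skipping");
    }
  }
}
```

In the test, the returned value could be wrapped in `ChecksumByteBufferImpl` and added to `impls` only when present.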
[GitHub] [ozone] sodonnel commented on pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#issuecomment-775165462
Running the new benchmarks gives the following results. I have posted the conclusion at the start as this comment is quite long.
TLDR:
## Conclusion:
* For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
* On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
* Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue there.
## Recommendation:
* Switch Ozone to use java.util.zip.CRC32 by default.
* Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
# Benchmarks
There are several implementations of CRC available:
* Ozone Java CRC32
* Ozone Java CRC32C
* Hadoop Java CRC32
* Hadoop Java CRC32C
* Java util.zip.CRC32
* Java util.zip.CRC32C
* Hadoop Native CRC32
* Hadoop Native CRC32C
The performance of the algorithm can also depend on the number of data bytes used for each checksum - Bytes Per Checksum (BPC).
HDFS has a default BPC of 512, generating 1MB of checksum data per 128MB block.
Ozone has a default BPC of 1MB, generating 512 bytes of checksum data per 128MB block.
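The checksum sizes quoted above follow from one 4-byte CRC value per BPC-sized chunk. A small sketch of that arithmetic (the class and method names are illustrative only, not from the PR):

```java
// Illustrative arithmetic only: checksum bytes generated per block.
final class ChecksumOverhead {
  // A CRC32 or CRC32C value is 32 bits wide.
  static final int CRC_BYTES = 4;

  private ChecksumOverhead() { }

  static long checksumBytes(long blockBytes, long bytesPerChecksum) {
    // One checksum per chunk, rounding up for a partial final chunk.
    long chunks = (blockBytes + bytesPerChecksum - 1) / bytesPerChecksum;
    return chunks * CRC_BYTES;
  }

  public static void main(String[] args) {
    long block = 128L * 1024 * 1024;
    // HDFS default BPC of 512: prints 1048576 (1MB)
    System.out.println(checksumBytes(block, 512));
    // Ozone default BPC of 1MB: prints 512
    System.out.println(checksumBytes(block, 1024 * 1024));
  }
}
```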
There is a benchmark class in Hadoop, called Crc32PerformanceTest.java which produces results like the following for varying BPC:
```
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 512 | 1 | 1736.2 | 1706.4 | -1.7% | 875.4 | -48.7% | 855.3 | -2.3% | 937.2 | 9.6% | 5289.9 | 464.5% |
| 512 | 2 | 2257.5 | 1978.2 | -12.4% | 949.8 | -52.0% | 911.5 | -4.0% | 1089.9 | 19.6% | 6475.0 | 494.1% |
| 512 | 4 | 2257.9 | 1879.4 | -16.8% | 1000.6 | -46.8% | 877.6 | -12.3% | 1087.2 | 23.9% | 6128.5 | 463.7% |
| 512 | 8 | 2322.2 | 1930.3 | -16.9% | 984.7 | -49.0% | 812.1 | -17.5% | 1101.8 | 35.7% | 5508.6 | 400.0% |
| 512 | 16 | 2208.6 | 1876.9 | -15.0% | 932.4 | -50.3% | 753.2 | -19.2% | 1078.2 | 43.1% | 4830.6 | 348.0% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 1024 | 1 | 2252.8 | 2710.8 | 20.3% | 1019.4 | -62.4% | 879.0 | -13.8% | 966.7 | 10.0% | 4535.2 | 369.2% |
| 1024 | 2 | 2411.5 | 2470.6 | 2.5% | 992.0 | -59.8% | 857.7 | -13.5% | 1039.9 | 21.2% | 4181.8 | 302.2% |
| 1024 | 4 | 2656.1 | 2839.8 | 6.9% | 991.9 | -65.1% | 868.1 | -12.5% | 1034.0 | 19.1% | 5473.8 | 429.4% |
| 1024 | 8 | 2391.7 | 2472.1 | 3.4% | 958.6 | -61.2% | 864.1 | -9.9% | 1060.8 | 22.8% | 5314.1 | 400.9% |
| 1024 | 16 | 2545.7 | 2722.7 | 7.0% | 959.3 | -64.8% | 682.5 | -28.9% | 1095.7 | 60.5% | 4814.3 | 339.4% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 2048 | 1 | 1928.7 | 3257.2 | 68.9% | 867.9 | -73.4% | 819.5 | -5.6% | 1035.0 | 26.3% | 4017.9 | 288.2% |
| 2048 | 2 | 2237.2 | 3413.9 | 52.6% | 967.3 | -71.7% | 870.2 | -10.0% | 1011.1 | 16.2% | 5656.2 | 459.4% |
| 2048 | 4 | 2529.7 | 3860.5 | 52.6% | 969.4 | -74.9% | 855.8 | -11.7% | 1108.2 | 29.5% | 5976.6 | 439.3% |
| 2048 | 8 | 2615.2 | 3554.2 | 35.9% | 914.0 | -74.3% | 818.2 | -10.5% | 1071.4 | 31.0% | 5289.9 | 393.7% |
| 2048 | 16 | 2659.1 | 3246.8 | 22.1% | 935.8 | -71.2% | 777.0 | -17.0% | 1111.1 | 43.0% | 4433.7 | 299.0% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 4096 | 1 | 2619.0 | 3460.2 | 32.1% | 1052.1 | -69.6% | 823.9 | -21.7% | 925.4 | 12.3% | 7221.2 | 680.3% |
| 4096 | 2 | 2686.4 | 3518.6 | 31.0% | 1013.6 | -71.2% | 855.1 | -15.6% | 982.6 | 14.9% | 7522.6 | 665.6% |
| 4096 | 4 | 2722.8 | 3225.1 | 18.4% | 973.9 | -69.8% | 881.6 | -9.5% | 1039.8 | 17.9% | 7346.7 | 606.6% |
| 4096 | 8 | 3336.5 | 3680.6 | 10.3% | 1025.9 | -72.1% | 928.4 | -9.5% | 1108.9 | 19.4% | 7394.3 | 566.8% |
| 4096 | 16 | 2924.1 | 3604.2 | 23.3% | 907.3 | -74.8% | 882.7 | -2.7% | 1106.3 | 25.3% | 4543.4 | 310.7% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 8192 | 1 | 2867.8 | 3373.2 | 17.6% | 892.6 | -73.5% | 938.7 | 5.2% | 980.8 | 4.5% | 8047.5 | 720.5% |
| 8192 | 2 | 3022.8 | 3704.8 | 22.6% | 898.6 | -75.7% | 855.2 | -4.8% | 1010.0 | 18.1% | 7174.6 | 610.4% |
| 8192 | 4 | 3196.3 | 4309.5 | 34.8% | 913.8 | -78.8% | 882.2 | -3.5% | 1027.9 | 16.5% | 8071.9 | 685.3% |
| 8192 | 8 | 3135.9 | 4542.4 | 44.9% | 1027.2 | -77.4% | 864.1 | -15.9% | 1072.4 | 24.1% | 5925.5 | 452.5% |
| 8192 | 16 | 2961.7 | 3570.6 | 20.6% | 983.4 | -72.5% | 711.2 | -27.7% | 1119.0 | 57.4% | 4282.8 | 282.7% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 16384 | 1 | 2836.1 | 3645.2 | 28.5% | 1052.0 | -71.1% | 973.8 | -7.4% | 984.4 | 1.1% | 7577.3 | 669.7% |
| 16384 | 2 | 2967.9 | 3705.1 | 24.8% | 942.0 | -74.6% | 881.0 | -6.5% | 1026.9 | 16.6% | 9675.0 | 842.2% |
| 16384 | 4 | 3218.5 | 4501.9 | 39.9% | 980.7 | -78.2% | 885.4 | -9.7% | 1058.8 | 19.6% | 7105.5 | 571.1% |
| 16384 | 8 | 2827.4 | 4076.0 | 44.2% | 1012.6 | -75.2% | 876.7 | -13.4% | 1011.1 | 15.3% | 5649.8 | 458.8% |
| 16384 | 16 | 2423.0 | 3314.9 | 36.8% | 824.5 | -75.1% | 802.4 | -2.7% | 1079.0 | 34.5% | 4112.8 | 281.1% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 32768 | 1 | 1998.8 | 3483.5 | 74.3% | 904.5 | -74.0% | 784.0 | -13.3% | 965.7 | 23.2% | 7445.7 | 671.0% |
| 32768 | 2 | 2526.4 | 3826.7 | 51.5% | 922.5 | -75.9% | 859.4 | -6.8% | 1101.4 | 28.2% | 9013.3 | 718.4% |
| 32768 | 4 | 3076.2 | 4535.3 | 47.4% | 972.3 | -78.6% | 897.1 | -7.7% | 1088.1 | 21.3% | 7682.5 | 606.0% |
| 32768 | 8 | 3127.8 | 3966.8 | 26.8% | 1021.3 | -74.3% | 894.9 | -12.4% | 1103.7 | 23.3% | 6305.4 | 471.3% |
| 32768 | 16 | 3122.3 | 3480.9 | 11.5% | 1030.6 | -70.4% | 842.5 | -18.3% | 1117.5 | 32.6% | 3663.3 | 227.8% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 65536 | 1 | 3129.3 | 3846.4 | 22.9% | 1050.2 | -72.7% | 804.7 | -23.4% | 1175.7 | 46.1% | 7242.8 | 516.1% |
| 65536 | 2 | 3235.7 | 4088.4 | 26.4% | 1051.4 | -74.3% | 852.6 | -18.9% | 1049.2 | 23.1% | 7805.4 | 643.9% |
| 65536 | 4 | 3061.9 | 4777.9 | 56.0% | 1037.6 | -78.3% | 822.7 | -20.7% | 1092.4 | 32.8% | 7706.1 | 605.5% |
| 65536 | 8 | 3239.2 | 4242.4 | 31.0% | 1016.3 | -76.0% | 821.1 | -19.2% | 1078.5 | 31.4% | 5994.4 | 455.8% |
| 65536 | 16 | 2949.5 | 3480.7 | 18.0% | 770.1 | -77.9% | 825.6 | 7.2% | 1081.9 | 31.0% | 3349.3 | 209.6% |
```
Here:
* Zip(C) is java.util.zip.CRC32(C)
* PureJava(C) is the hadoop implementation
* Native(C) is the native hadoop implementation
The numbers in the table show throughput in MB/s, so a higher number is better. With only this data, it is easy to conclude that NativeC is the clear winner for all BPC. However, that may not be the case.
In the hadoop benchmark, the logic creates a 64MB byte buffer. Then it calculates the expected checksum. Then it benchmarks a "validate checksums" routine, where it generates the checksums for the new data and compares that with the expected.
For the native calls, the code is like this:
```
public void verifyChunked(ByteBuffer data, int bytesPerSum,
    ByteBuffer sums, String fileName, long basePos)
    throws ChecksumException {
  NativeCrc32.verifyChunkedSums(bytesPerSum, DataChecksum.Type.CRC32.id,
      sums, data, fileName, basePos);
}
```
I.e., it calls NativeCrc32.verifyChunkedSums, which takes the entire data set (64MB) and runs the complete validation in a single native call.
The pure Java and java.util.zip implementations cannot do this. They must loop over the data and make multiple calls to the Checksum implementation to checksum at each BPC boundary. It's also worth noting that the java.util.zip CRC classes make native calls too.
The above does not test real world use. We don't buffer 64MB of data and then calculate / verify all the CRCs in a batch. Rather, we stream the data and calculate the CRCs on demand. It is important to test the streaming case to get more realistic results.
Using the following simple loop in a JMH benchmark, we can get a more realistic test. First populate a 64MB ByteBuffer with random bytes. Then using the following loop, calculate the checksums for the 64MB at BPC intervals:
```
for (int i = 0; i < data.capacity(); i += bytesPerCheckSum) {
  data.position(i);
  data.limit(i + bytesPerCheckSum);
  csum.update(data);
  blackhole.consume(csum.getValue());
  csum.reset();
}
```
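For anyone who wants to try the streaming pattern outside JMH, here is a self-contained sketch of the same loop. It uses `java.util.zip.CRC32` directly rather than the Ozone `ChecksumByteBuffer` wrappers, and a 1MB buffer instead of 64MB to keep it light. The class name, the fixed seed and the returned array are assumptions made for illustration, not part of the benchmark.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

// Illustrative sketch: one CRC32 per bytesPerChecksum-sized chunk.
final class StreamingCrcSketch {
  private StreamingCrcSketch() { }

  static long[] checksumPerChunk(ByteBuffer data, int bytesPerChecksum) {
    int chunks = (data.capacity() + bytesPerChecksum - 1) / bytesPerChecksum;
    long[] sums = new long[chunks];
    CRC32 crc = new CRC32();
    for (int i = 0, n = 0; i < data.capacity(); i += bytesPerChecksum, n++) {
      // Widen the limit first so the new position is always valid, and
      // cap it at capacity in case the last chunk is short.
      data.limit(Math.min(i + bytesPerChecksum, data.capacity()));
      data.position(i);
      crc.update(data);        // consumes the bytes from position to limit
      sums[n] = crc.getValue();
      crc.reset();             // start fresh for the next chunk
    }
    data.clear();              // restore position and limit for reuse
    return sums;
  }

  public static void main(String[] args) {
    byte[] bytes = new byte[1024 * 1024];
    new Random(42).nextBytes(bytes);
    long[] sums = checksumPerChunk(ByteBuffer.wrap(bytes), 4096);
    System.out.println("chunks checksummed: " + sums.length);
  }
}
```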
The performance at 512 BPC:
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
512 pureCRC32 10.105 9.5 10.346 11.221
512 pureCRC32C 9.519 9.646 11.111 10.72
512 hadoopCRC32C 16.817 17.183 19.908 19.83
512 hadoopCRC32 19.897 19.345 19.089 17.645
512 zipCRC32 72.795 80.716 59.145 52.792
512 zipCRC32C 56.321 49.921 0 0
512 nativeCRC32 14.316 15.352 15.873 16.697
512 nativeCRC32C 35.651 29.765 39.491 41.885
```
The numbers above are JMH throughput, i.e. how many times per second we can calculate the checksums on 64MB of data.
* pureCRC* - Ozone implementations in pure Java.
* hadoopCRC32* - Hadoop implementation in pure Java.
* zip* - Java util zip implementations. Note CRC32C is only available from Java 9 and later.
I ran twice on Java 11 and twice on Java 8.
PureCRC32(C), as used in Ozone is the slowest.
The pure Java Hadoop implementation is significantly faster, but still not great.
java.util.zip is best, beating the native Hadoop implementation by quite a margin.
Also notable, and reproducible in all test runs - java.util.zip.CRC32 is improved significantly in Java 11 over Java 8.
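Since each JMH operation checksums the full 64MB buffer, a score can be converted to MB/s by multiplying by 64. A small sketch of that conversion (the helper name is illustrative only). Note that these figures measure checksum generation in a streaming loop, so they are not directly comparable with the "validate checksums" numbers from the Hadoop benchmark above.

```java
// Illustrative conversion from JMH ops/s to MB/s for a 64MB buffer.
final class ThroughputConversion {
  static final int BUFFER_MB = 64;

  private ThroughputConversion() { }

  static double toMegabytesPerSecond(double opsPerSecond) {
    return opsPerSecond * BUFFER_MB;
  }

  public static void main(String[] args) {
    // zipCRC32 at 512 BPC, first Java 11 run: 72.795 ops/s
    System.out.printf("%.1f MB/s%n", toMegabytesPerSecond(72.795));
    // pureCRC32 at 512 BPC, first Java 11 run: 10.105 ops/s
    System.out.printf("%.1f MB/s%n", toMegabytesPerSecond(10.105));
  }
}
```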
If we also test the Hadoop native implementation, calculating all checksums in a single call (as the hadoop benchmark did), we can see it is fastest as the earlier Hadoop test showed:
```
BPC Impl J11-1 J11-2
---------------------------------------
512 nativeCRC32B 22.977 23.343
512 nativeCRC32CB 108.674 102.923
```
I don't have an explanation as to why CRC32CB is so much faster than CRC32B, but this is consistently so.
Moving on to a higher BPC:
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
4096 pureCRC32 10.334 9.607 11.694 11.682
4096 pureCRC32C 10.365 9.212 11.771 11.818
4096 hadoopCRC32C 17.076 17.235 19.934 20.519
4096 hadoopCRC32 18.789 21.042 18.243 16.353
4096 zipCRC32 100.413 120.215 88.079 109.794
4096 zipCRC32C 108.522 129.197 0 0
4096 nativeCRC32 21.318 21.508 22.177 20.481
4096 nativeCRC32C 77.365 87.459 90.591 89.689
4096 nativeCRC32B 22.651 23.884 0 0
4096 nativeCRC32CB 191.301 175.54 0 0
```
The pure Java implementations have not benefited at all. The zip implementations are significantly faster and still best. The Hadoop native implementations have improved too. There does appear to be something wrong with nativeCRC32, as it lags nativeCRC32C by a large margin.
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
32768 pureCRC32 11.278 11.837 12.284 11.557
32768 pureCRC32C 10.875 11.794 12.006 11.893
32768 hadoopCRC32C 16.477 15.856 19.722 20.599
32768 hadoopCRC32 18.444 20.055 17.601 18.992
32768 zipCRC32 127.591 114.87 104.169 117.778
32768 zipCRC32C 100.77 126.446 0 0
32768 nativeCRC32 23.488 23.934 22.74 23.594
32768 nativeCRC32C 106.726 104.538 106.031 105.871
32768 nativeCRC32B 20.225 23.161 0 0
32768 nativeCRC32CB 167.656 202.245 0 0
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
1048576 pureCRC32 11.469 11.03 11.673 11.27
1048576 pureCRC32C 11.111 10.98 11.955 11.395
1048576 hadoopCRC32C 15.926 16.686 17.126 18.884
1048576 hadoopCRC32 21.064 20.656 19.65 19.343
1048576 zipCRC32 118.338 116.067 113.645 111.888
1048576 zipCRC32C 117.705 131.284 0 0
1048576 nativeCRC32 21.727 23.414 22.14 22.923
1048576 nativeCRC32C 108.098 109.05 107.373 91.435
1048576 nativeCRC32B 21.134 23.279 0 0
1048576 nativeCRC32CB 108.972 100.259 0 0
```
The numbers have more variance at the higher BPC, but the trend remains.
## Conclusion:
* For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
* On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
* Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue there.
## Recommendation:
* Switch Ozone to use java.util.zip.CRC32 by default.
* Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.