Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/02/08 13:46:45 UTC

[GitHub] [ozone] sodonnel opened a new pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

sodonnel opened a new pull request #1910:
URL: https://github.com/apache/ozone/pull/1910


   ## What changes were proposed in this pull request?
   
   Add a Genesis benchmark to compare the performance of various CRC32 implementations.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4808
   
   ## How was this patch tested?
   
   Benchmarks were executed manually. One new test was added to validate that all CRC implementations give the same result.
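
   As a rough illustration of what a "same result" cross-check can look like (a minimal sketch only, not the test added in this PR; the class name `CrcAgreementSketch` and the bit-at-a-time helper are hypothetical and exist only for this example), the JDK's `java.util.zip.CRC32` can be compared with an independent CRC32 computation over the same bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: two independent CRC32 computations must agree.
final class CrcAgreementSketch {

  // Bit-at-a-time CRC32 using the reflected polynomial 0xEDB88320.
  static long bitwiseCrc32(byte[] data) {
    int crc = 0xFFFFFFFF;
    for (byte b : data) {
      crc ^= (b & 0xFF);
      for (int i = 0; i < 8; i++) {
        // -(crc & 1) is all ones when the low bit is set, otherwise zero.
        crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
      }
    }
    return (~crc) & 0xFFFFFFFFL;
  }

  // Reference value from the JDK implementation.
  static long jdkCrc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    long expected = jdkCrc32(data);
    long actual = bitwiseCrc32(data);
    if (expected != actual) {
      throw new AssertionError("CRC32 implementations disagree");
    }
    System.out.println(Long.toHexString(actual)); // prints cbf43926
  }
}
```

   The same pattern extends to any number of implementations: compute each checksum over identical input and fail on the first mismatch.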
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] swagle commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
swagle commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577026104



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/util/NativeCRC32Wrapper.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.util;
+
+import org.apache.hadoop.fs.ChecksumException;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This class wraps the NativeCRC32 class in hadoop-common, because the class
+ * is package private there. The intention of making this class available
+ * in Ozone is to allow the native libraries to be benchmarked alongside other
+ * implementations. At the current time, the hadoop native CRC is not used
+ * anywhere in Ozone except for benchmarks.

Review comment:
       It is important to call this out in the JIRA description as well as the PR. With the changes in this patch, could Ozone start making use of the native CRC implementation?






[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577118464



##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCBatch.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.openjdk.jmh.annotations.Benchmark;

Review comment:
       I don't believe so. JMH is pulled in as a dependency in the pom.xml and other existing benchmarks have these same imports.






[GitHub] [ozone] szetszwo commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577552115



##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ *     BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+  private static int dataSize = 64 * 1024 * 1024;
+
+  @State(Scope.Thread)
+  public static class BenchmarkState {
+
+    private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+    @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+    private int checksumSize;
+
+    @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+        "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+    private String crcImpl;
+
+    private ChecksumByteBuffer checksum;
+
+    public ChecksumByteBuffer checksum() {
+      return checksum;
+    }
+
+    public String crcImpl() {
+      return crcImpl;
+    }
+
+    public int checksumSize() {
+      return checksumSize;
+    }
+
+    @Setup(Level.Trial)
+    public void setUp() {
+      switch (crcImpl) {
+      case "pureCRC32":
+        checksum = new PureJavaCrc32ByteBuffer();
+        break;
+      case "pureCRC32C":
+        checksum = new PureJavaCrc32CByteBuffer();
+        break;
+      case "hadoopCRC32":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+        break;
+      case "hadoopCRC32C":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+        break;
+      case "zipCRC32":
+        checksum = new ChecksumByteBufferImpl(new CRC32());
+        break;
+      case "zipCRC32C":
+        try {
+          checksum = new ChecksumByteBufferImpl(
+              ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+        } catch (Throwable e) {
+          throw new RuntimeException("zipCRC32C is not available pre Java 9");
+        }
+        break;
+      case "nativeCRC32":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      case "nativeCRC32C":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      default:
+      }
+      data.clear();
+      data.put(RandomUtils.nextBytes(data.remaining()));
+    }
+  }
+
+  @Benchmark
+  @Threads(1)
+  @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+  @Fork(value = 1, warmups = 0)
+  @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+  @BenchmarkMode(Mode.Throughput)
+  public void runCRC(Blackhole blackhole, BenchmarkState state) {
+    ByteBuffer data = state.data;
+    data.clear();

Review comment:
       You are right -- clear() does not really clear the buffer.  Thanks.






[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577117169



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/util/NativeCRC32Wrapper.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.util;
+
+import org.apache.hadoop.fs.ChecksumException;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This class wraps the NativeCRC32 class in hadoop-common, because the class
+ * is package private there. The intention of making this class available
+ * in Ozone is to allow the native libraries to be benchmarked alongside other
+ * implementations. At the current time, the hadoop native CRC is not used
+ * anywhere in Ozone except for benchmarks.

Review comment:
       Not unless you get the compiled shared library from a Hadoop build and then add it to the java.library.path. However, to be able to benchmark the native libs, we need this code here. The classes inside Hadoop common are package-private, which is why I needed to wrap them.






[GitHub] [ozone] sodonnel merged pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #1910:
URL: https://github.com/apache/ozone/pull/1910


   




[GitHub] [ozone] swagle commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
swagle commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577041259



##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCBatch.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.openjdk.jmh.annotations.Benchmark;

Review comment:
       Does this add an OpenJDK compile-time dependency?






[GitHub] [ozone] sodonnel commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577487531



##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ *     BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+  private static int dataSize = 64 * 1024 * 1024;
+
+  @State(Scope.Thread)
+  public static class BenchmarkState {
+
+    private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+    @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+    private int checksumSize;
+
+    @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+        "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+    private String crcImpl;
+
+    private ChecksumByteBuffer checksum;
+
+    public ChecksumByteBuffer checksum() {
+      return checksum;
+    }
+
+    public String crcImpl() {
+      return crcImpl;
+    }
+
+    public int checksumSize() {
+      return checksumSize;
+    }
+
+    @Setup(Level.Trial)
+    public void setUp() {
+      switch (crcImpl) {
+      case "pureCRC32":
+        checksum = new PureJavaCrc32ByteBuffer();
+        break;
+      case "pureCRC32C":
+        checksum = new PureJavaCrc32CByteBuffer();
+        break;
+      case "hadoopCRC32":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+        break;
+      case "hadoopCRC32C":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+        break;
+      case "zipCRC32":
+        checksum = new ChecksumByteBufferImpl(new CRC32());
+        break;
+      case "zipCRC32C":
+        try {
+          checksum = new ChecksumByteBufferImpl(
+              ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+        } catch (Throwable e) {
+          throw new RuntimeException("zipCRC32C is not available pre Java 9");
+        }
+        break;
+      case "nativeCRC32":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      case "nativeCRC32C":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      default:
+      }
+      data.clear();
+      data.put(RandomUtils.nextBytes(data.remaining()));
+    }
+  }
+
+  @Benchmark
+  @Threads(1)
+  @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+  @Fork(value = 1, warmups = 0)
+  @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+  @BenchmarkMode(Mode.Throughput)
+  public void runCRC(Blackhole blackhole, BenchmarkState state) {
+    ByteBuffer data = state.data;
+    data.clear();

Review comment:
       clear() does not actually alter the buffer contents; it only sets the position to zero and the limit to the capacity, getting the buffer ready for a new read / write. I guess I don't need the clear() here, as I set the position and limit on each pass around the loop, so I think I can remove the line safely.
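
       The behaviour described above can be checked with a small standalone sketch (the class name `ByteBufferClearSketch` is hypothetical and not part of this PR): after `clear()`, the position and limit are reset but the previously written bytes are still in the buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ByteBuffer.clear() resets indices, not contents.
final class ByteBufferClearSketch {

  // Returns {position, limit, first byte} after writing, flipping and clearing.
  static int[] stateAfterClear() {
    ByteBuffer buf = ByteBuffer.allocate(8);
    buf.put((byte) 42).put((byte) 7); // position = 2
    buf.flip();                       // position = 0, limit = 2
    buf.clear();                      // position = 0, limit = capacity (8)
    // Absolute get: reads index 0 without moving the position.
    return new int[] {buf.position(), buf.limit(), buf.get(0)};
  }

  public static void main(String[] args) {
    int[] s = stateAfterClear();
    // prints position=0 limit=8 byte0=42
    System.out.println("position=" + s[0] + " limit=" + s[1] + " byte0=" + s[2]);
  }
}
```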






[GitHub] [ozone] szetszwo commented on a change in pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#discussion_r577397045



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.common;
+
+import java.lang.invoke.MethodHandle;
+import java.lang.invoke.MethodHandles;
+import java.lang.invoke.MethodType;
+import java.nio.ByteBuffer;
+import java.util.zip.Checksum;
+
+public class ChecksumByteBufferImpl implements ChecksumByteBuffer {
+
+  public static class Java9Crc32CFactory {
+    private static final MethodHandle NEW_CRC32C_MH;
+
+    static {
+      MethodHandle newCRC32C = null;
+      try {
+        newCRC32C = MethodHandles.publicLookup()
+            .findConstructor(
+                Class.forName("java.util.zip.CRC32C"),
+                MethodType.methodType(void.class)
+            );
+      } catch (ReflectiveOperationException e) {
+        // Should not reach here.
+        throw new RuntimeException(e);
+      }
+      NEW_CRC32C_MH = newCRC32C;
+    }
+
+    public static java.util.zip.Checksum createChecksum() {
+      try {
+        // Should throw nothing
+        return (Checksum) NEW_CRC32C_MH.invoke();
+      } catch (Throwable t) {
+        throw (t instanceof RuntimeException) ? (RuntimeException) t
+            : new RuntimeException(t);
+      }
+    }
+  };
+
+  private Checksum checksum;
+
+  public ChecksumByteBufferImpl(Checksum impl) {
+    this.checksum = impl;
+  }
+
+  @Override
+  public void update(ByteBuffer buffer) {
+    if (buffer.hasArray()) {
+      checksum.update(buffer.array(), buffer.position() + buffer.arrayOffset(),
+          buffer.remaining());
+    } else {
+      byte[] b = new byte[buffer.remaining()];
+      buffer.get(b);
+      checksum.update(b, 0, b.length);
+    }
+  }

Review comment:
       Since Java 9, Checksum supports `update(ByteBuffer)` (https://docs.oracle.com/javase/9/docs/api/java/util/zip/Checksum.html#update-java.nio.ByteBuffer-), so this method should call it when `checksum` is a Java 9 Checksum object.
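
       A minimal sketch of the suggestion, assuming the code runs on Java 9 or later (the class `ChecksumByteBufferSketch` and its helper names are hypothetical, and this is not the change that was made in the PR): passing the buffer straight to `Checksum.update(ByteBuffer)` avoids the temporary array copy for direct buffers and should give the same value as the array-based call.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch: Checksum.update(ByteBuffer) versus the array overload.
final class ChecksumByteBufferSketch {

  // Java 9+: consumes buffer.remaining() bytes and advances the position
  // to the limit.
  static long viaByteBuffer(Checksum checksum, ByteBuffer buffer) {
    checksum.reset();
    checksum.update(buffer);
    return checksum.getValue();
  }

  // Array-based update, available on all Java versions.
  static long viaArray(Checksum checksum, byte[] bytes) {
    checksum.reset();
    checksum.update(bytes, 0, bytes.length);
    return checksum.getValue();
  }

  public static void main(String[] args) {
    byte[] bytes = {1, 2, 3, 4, 5, 6, 7, 8};
    // A direct buffer has no backing array, so hasArray() is false.
    ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
    direct.put(bytes);
    direct.flip();

    long fromBuffer = viaByteBuffer(new CRC32(), direct);
    long fromArray = viaArray(new CRC32(), bytes);
    if (fromBuffer != fromArray) {
      throw new AssertionError("update(ByteBuffer) and update(byte[]) disagree");
    }
    System.out.println("checksums match, remaining=" + direct.remaining());
  }
}
```

       On Java 8 the `Checksum` interface has no `update(ByteBuffer)` method, so code that must still compile there would need a reflective or method-handle lookup, similar to the `Java9Crc32CFactory` shown in the diff above.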

##########
File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.genesis;
+
+import java.nio.ByteBuffer;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.ozone.common.ChecksumByteBuffer;
+import org.apache.hadoop.ozone.common.ChecksumByteBufferImpl;
+import org.apache.hadoop.ozone.common.NativeCheckSumCRC32;
+import org.apache.hadoop.ozone.common.PureJavaCrc32ByteBuffer;
+import org.apache.hadoop.ozone.common.PureJavaCrc32CByteBuffer;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Threads;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.zip.CRC32;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+
+/**
+ * Class to benchmark various CRC implementations. This can be executed via
+ *
+ * ozone genesis -b BenchmarkCRC
+ *
+ * However there are some points to keep in mind. java.util.zip.CRC32C is not
+ * available until Java 9, therefore if the JVM has a lower version than 9, that
+ * implementation will not be tested.
+ *
+ * The hadoop native libraries will only be tested if libhadoop.so is found on
+ * the "-Djava.library.path". libhadoop.so is not currently bundled with Ozone,
+ * so it needs to be obtained from a Hadoop build and the test needs to be
+ * executed on a compatible OS (ie Linux x86):
+ *
+ * ozone --jvmargs -Djava.library.path=/home/sodonnell/native genesis -b
+ *     BenchmarkCRC
+ */
+public class BenchMarkCRCStreaming {
+
+  private static int dataSize = 64 * 1024 * 1024;
+
+  @State(Scope.Thread)
+  public static class BenchmarkState {
+
+    private final ByteBuffer data = ByteBuffer.allocate(dataSize);
+
+    @Param({"512", "1024", "2048", "4096", "32768", "1048576"})
+    private int checksumSize;
+
+    @Param({"pureCRC32", "pureCRC32C", "hadoopCRC32C", "hadoopCRC32",
+        "zipCRC32", "zipCRC32C", "nativeCRC32", "nativeCRC32C"})
+    private String crcImpl;
+
+    private ChecksumByteBuffer checksum;
+
+    public ChecksumByteBuffer checksum() {
+      return checksum;
+    }
+
+    public String crcImpl() {
+      return crcImpl;
+    }
+
+    public int checksumSize() {
+      return checksumSize;
+    }
+
+    @Setup(Level.Trial)
+    public void setUp() {
+      switch (crcImpl) {
+      case "pureCRC32":
+        checksum = new PureJavaCrc32ByteBuffer();
+        break;
+      case "pureCRC32C":
+        checksum = new PureJavaCrc32CByteBuffer();
+        break;
+      case "hadoopCRC32":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32());
+        break;
+      case "hadoopCRC32C":
+        checksum = new ChecksumByteBufferImpl(new PureJavaCrc32C());
+        break;
+      case "zipCRC32":
+        checksum = new ChecksumByteBufferImpl(new CRC32());
+        break;
+      case "zipCRC32C":
+        try {
+          checksum = new ChecksumByteBufferImpl(
+              ChecksumByteBufferImpl.Java9Crc32CFactory.createChecksum());
+        } catch (Throwable e) {
+          throw new RuntimeException("zipCRC32C is not available pre Java 9");
+        }
+        break;
+      case "nativeCRC32":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      case "nativeCRC32C":
+        if (NativeCRC32Wrapper.isAvailable()) {
+          checksum = new ChecksumByteBufferImpl(new NativeCheckSumCRC32(
+              NativeCRC32Wrapper.CHECKSUM_CRC32C, checksumSize));
+        } else {
+          throw new RuntimeException("Native library is not available");
+        }
+        break;
+      default:
+      }
+      data.clear();
+      data.put(RandomUtils.nextBytes(data.remaining()));
+    }
+  }
+
+  @Benchmark
+  @Threads(1)
+  @Warmup(iterations = 3, time = 1000, timeUnit = MILLISECONDS)
+  @Fork(value = 1, warmups = 0)
+  @Measurement(iterations = 5, time = 2000, timeUnit = MILLISECONDS)
+  @BenchmarkMode(Mode.Throughput)
+  public void runCRC(Blackhole blackhole, BenchmarkState state) {
+    ByteBuffer data = state.data;
+    data.clear();

Review comment:
       Why is the data being cleared here? Is this a typo?

##########
File path: hadoop-hdds/common/src/test/java/org/apache/hadoop/ozone/common/TestChecksumImplsComputeSameValues.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.common;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.util.NativeCRC32Wrapper;
+import org.apache.hadoop.util.PureJavaCrc32;
+import org.apache.hadoop.util.PureJavaCrc32C;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.zip.CRC32;
+
+import static junit.framework.TestCase.assertEquals;
+
+public class TestChecksumImplsComputeSameValues {
+
+  private int dataSize = 1024 * 1024 * 64;
+  private ByteBuffer data = ByteBuffer.allocate(dataSize);
+  private int[] bytesPerChecksum = {512, 1024, 2048, 4096, 32768, 1048576};
+
+  @Test
+  public void testCRC32ImplsMatch() {
+    data.clear();
+    data.put(RandomUtils.nextBytes(data.remaining()));
+    for (int bpc : bytesPerChecksum) {
+      List<ChecksumByteBuffer> impls = new ArrayList<>();
+      impls.add(new PureJavaCrc32ByteBuffer());
+      impls.add(new ChecksumByteBufferImpl(new PureJavaCrc32()));
+      impls.add(new ChecksumByteBufferImpl(new CRC32()));
+      if (NativeCRC32Wrapper.isAvailable()) {
+        impls.add(new ChecksumByteBufferImpl(new NativeCheckSumCRC32(1, bpc)));
+      }
+      assertEquals(true, validateImpls(data, impls, bpc));
+    }
+  }
+
+  @Test
+  public void testCRC32CImplsMatch() {
+    data.clear();
+    data.put(RandomUtils.nextBytes(data.remaining()));
+    for (int bpc : bytesPerChecksum) {
+      List<ChecksumByteBuffer> impls = new ArrayList<>();
+      impls.add(new PureJavaCrc32CByteBuffer());
+      impls.add(new ChecksumByteBufferImpl(new PureJavaCrc32C()));
+      // TODO - optional loaded java.util.zip.CRC32C if >= Java 9
+      // impls.add(new ChecksumByteBufferImpl(new CRC32C())));

Review comment:
       How about wrapping Java9Crc32CFactory.createChecksum() in a try-catch and ignoring the exception if it is unavailable?
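
       A minimal sketch of that idea, for illustration only. It loads java.util.zip.CRC32C reflectively as a stand-in for Java9Crc32CFactory.createChecksum(), and the names OptionalCrc32C and tryCreateCrc32C are hypothetical rather than anything in the patch:

       ```java
       import java.nio.charset.StandardCharsets;
       import java.util.Optional;
       import java.util.zip.Checksum;

       public class OptionalCrc32C {

         /** Returns java.util.zip.CRC32C if the running JVM provides it (Java 9+). */
         static Optional<Checksum> tryCreateCrc32C() {
           try {
             Class<?> clazz = Class.forName("java.util.zip.CRC32C");
             return Optional.of(
                 (Checksum) clazz.getDeclaredConstructor().newInstance());
           } catch (ReflectiveOperationException | LinkageError e) {
             // Class not present (pre Java 9): report "unavailable", do not fail.
             return Optional.empty();
           }
         }

         public static void main(String[] args) {
           Optional<Checksum> crc32c = tryCreateCrc32C();
           if (crc32c.isPresent()) {
             byte[] bytes = "123456789".getBytes(StandardCharsets.US_ASCII);
             crc32c.get().update(bytes, 0, bytes.length);
             System.out.printf("CRC32C available, value 0x%08X%n",
                 crc32c.get().getValue());
           } else {
             System.out.println("CRC32C not available on this JVM, skipping");
           }
         }
       }
       ```

       In the test, the CRC32C implementation would then be added to the list only when the Optional is present.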






[GitHub] [ozone] sodonnel commented on pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#issuecomment-775165462


   Running the new benchmarks gives the following results. I have posted the conclusion at the start as this comment is quite long.
   
   TLDR:
   
   ## Conclusion:
   
    * For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
    * On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
    * Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue with the native CRC32 implementation.
   
   ## Recommendation:
   
    * Switch Ozone to use java.util.zip.CRC32 by default.
    * Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
   
   # Benchmarks
   
   There are several implementations of CRC available:
   
    * Ozone Java CRC32 
    * Ozone Java CRC32C
    * Hadoop Java CRC32
    * Hadoop Java CRC32C
    * Java util.zip.CRC32
    * Java util.zip.CRC32C
    * Hadoop Native CRC32
    * Hadoop Native CRC32C
   
   The performance of the algorithm can also depend on the number of data bytes used for each checksum - bytes Per Checksum (BPC).
   
   HDFS has a default BPC of 512, generating 1MB of checksum data per 128MB block.
   
   Ozone has a default BPC of 1MB, generating 512 bytes of checksum data per 128MB block.
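   
   As a rough check of those figures, assuming each CRC32 or CRC32C value occupies 4 bytes (the class and method names here are made up for illustration):
   
   ```java
   public class ChecksumOverhead {

     // Assumption: a CRC32 / CRC32C value is 32 bits, i.e. 4 bytes per checksum.
     static final int CRC_BYTES = 4;

     /** Bytes of checksum data needed for one block at the given BPC. */
     static long checksumBytes(long blockSize, long bytesPerChecksum) {
       // Round up so a trailing partial chunk still gets its own checksum.
       long chunks = (blockSize + bytesPerChecksum - 1) / bytesPerChecksum;
       return chunks * CRC_BYTES;
     }

     public static void main(String[] args) {
       long block = 128L * 1024 * 1024;
       // HDFS default BPC of 512
       System.out.println(checksumBytes(block, 512));
       // Ozone default BPC of 1MB
       System.out.println(checksumBytes(block, 1024 * 1024));
     }
   }
   ```
   
   This prints 1048576 (1MB) for the HDFS default and 512 for the Ozone default.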
   
   There is a benchmark class in Hadoop, called Crc32PerformanceTest.java which produces results like the following for varying BPC:
   
   
   ```
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |   512 |  1 |    1736.2 |    1706.4 |  -1.7% |     875.4 | -48.7% |      855.3 |  -2.3% |     937.2 |   9.6% |    5289.9 | 464.5% |
   |   512 |  2 |    2257.5 |    1978.2 | -12.4% |     949.8 | -52.0% |      911.5 |  -4.0% |    1089.9 |  19.6% |    6475.0 | 494.1% |
   |   512 |  4 |    2257.9 |    1879.4 | -16.8% |    1000.6 | -46.8% |      877.6 | -12.3% |    1087.2 |  23.9% |    6128.5 | 463.7% |
   |   512 |  8 |    2322.2 |    1930.3 | -16.9% |     984.7 | -49.0% |      812.1 | -17.5% |    1101.8 |  35.7% |    5508.6 | 400.0% |
   |   512 | 16 |    2208.6 |    1876.9 | -15.0% |     932.4 | -50.3% |      753.2 | -19.2% |    1078.2 |  43.1% |    4830.6 | 348.0% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  1024 |  1 |    2252.8 |    2710.8 |  20.3% |    1019.4 | -62.4% |      879.0 | -13.8% |     966.7 |  10.0% |    4535.2 | 369.2% |
   |  1024 |  2 |    2411.5 |    2470.6 |   2.5% |     992.0 | -59.8% |      857.7 | -13.5% |    1039.9 |  21.2% |    4181.8 | 302.2% |
   |  1024 |  4 |    2656.1 |    2839.8 |   6.9% |     991.9 | -65.1% |      868.1 | -12.5% |    1034.0 |  19.1% |    5473.8 | 429.4% |
   |  1024 |  8 |    2391.7 |    2472.1 |   3.4% |     958.6 | -61.2% |      864.1 |  -9.9% |    1060.8 |  22.8% |    5314.1 | 400.9% |
   |  1024 | 16 |    2545.7 |    2722.7 |   7.0% |     959.3 | -64.8% |      682.5 | -28.9% |    1095.7 |  60.5% |    4814.3 | 339.4% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  2048 |  1 |    1928.7 |    3257.2 |  68.9% |     867.9 | -73.4% |      819.5 |  -5.6% |    1035.0 |  26.3% |    4017.9 | 288.2% |
   |  2048 |  2 |    2237.2 |    3413.9 |  52.6% |     967.3 | -71.7% |      870.2 | -10.0% |    1011.1 |  16.2% |    5656.2 | 459.4% |
   |  2048 |  4 |    2529.7 |    3860.5 |  52.6% |     969.4 | -74.9% |      855.8 | -11.7% |    1108.2 |  29.5% |    5976.6 | 439.3% |
   |  2048 |  8 |    2615.2 |    3554.2 |  35.9% |     914.0 | -74.3% |      818.2 | -10.5% |    1071.4 |  31.0% |    5289.9 | 393.7% |
   |  2048 | 16 |    2659.1 |    3246.8 |  22.1% |     935.8 | -71.2% |      777.0 | -17.0% |    1111.1 |  43.0% |    4433.7 | 299.0% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  4096 |  1 |    2619.0 |    3460.2 |  32.1% |    1052.1 | -69.6% |      823.9 | -21.7% |     925.4 |  12.3% |    7221.2 | 680.3% |
   |  4096 |  2 |    2686.4 |    3518.6 |  31.0% |    1013.6 | -71.2% |      855.1 | -15.6% |     982.6 |  14.9% |    7522.6 | 665.6% |
   |  4096 |  4 |    2722.8 |    3225.1 |  18.4% |     973.9 | -69.8% |      881.6 |  -9.5% |    1039.8 |  17.9% |    7346.7 | 606.6% |
   |  4096 |  8 |    3336.5 |    3680.6 |  10.3% |    1025.9 | -72.1% |      928.4 |  -9.5% |    1108.9 |  19.4% |    7394.3 | 566.8% |
   |  4096 | 16 |    2924.1 |    3604.2 |  23.3% |     907.3 | -74.8% |      882.7 |  -2.7% |    1106.3 |  25.3% |    4543.4 | 310.7% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  8192 |  1 |    2867.8 |    3373.2 |  17.6% |     892.6 | -73.5% |      938.7 |   5.2% |     980.8 |   4.5% |    8047.5 | 720.5% |
   |  8192 |  2 |    3022.8 |    3704.8 |  22.6% |     898.6 | -75.7% |      855.2 |  -4.8% |    1010.0 |  18.1% |    7174.6 | 610.4% |
   |  8192 |  4 |    3196.3 |    4309.5 |  34.8% |     913.8 | -78.8% |      882.2 |  -3.5% |    1027.9 |  16.5% |    8071.9 | 685.3% |
   |  8192 |  8 |    3135.9 |    4542.4 |  44.9% |    1027.2 | -77.4% |      864.1 | -15.9% |    1072.4 |  24.1% |    5925.5 | 452.5% |
   |  8192 | 16 |    2961.7 |    3570.6 |  20.6% |     983.4 | -72.5% |      711.2 | -27.7% |    1119.0 |  57.4% |    4282.8 | 282.7% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 16384 |  1 |    2836.1 |    3645.2 |  28.5% |    1052.0 | -71.1% |      973.8 |  -7.4% |     984.4 |   1.1% |    7577.3 | 669.7% |
   | 16384 |  2 |    2967.9 |    3705.1 |  24.8% |     942.0 | -74.6% |      881.0 |  -6.5% |    1026.9 |  16.6% |    9675.0 | 842.2% |
   | 16384 |  4 |    3218.5 |    4501.9 |  39.9% |     980.7 | -78.2% |      885.4 |  -9.7% |    1058.8 |  19.6% |    7105.5 | 571.1% |
   | 16384 |  8 |    2827.4 |    4076.0 |  44.2% |    1012.6 | -75.2% |      876.7 | -13.4% |    1011.1 |  15.3% |    5649.8 | 458.8% |
   | 16384 | 16 |    2423.0 |    3314.9 |  36.8% |     824.5 | -75.1% |      802.4 |  -2.7% |    1079.0 |  34.5% |    4112.8 | 281.1% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 32768 |  1 |    1998.8 |    3483.5 |  74.3% |     904.5 | -74.0% |      784.0 | -13.3% |     965.7 |  23.2% |    7445.7 | 671.0% |
   | 32768 |  2 |    2526.4 |    3826.7 |  51.5% |     922.5 | -75.9% |      859.4 |  -6.8% |    1101.4 |  28.2% |    9013.3 | 718.4% |
   | 32768 |  4 |    3076.2 |    4535.3 |  47.4% |     972.3 | -78.6% |      897.1 |  -7.7% |    1088.1 |  21.3% |    7682.5 | 606.0% |
   | 32768 |  8 |    3127.8 |    3966.8 |  26.8% |    1021.3 | -74.3% |      894.9 | -12.4% |    1103.7 |  23.3% |    6305.4 | 471.3% |
   | 32768 | 16 |    3122.3 |    3480.9 |  11.5% |    1030.6 | -70.4% |      842.5 | -18.3% |    1117.5 |  32.6% |    3663.3 | 227.8% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 65536 |  1 |    3129.3 |    3846.4 |  22.9% |    1050.2 | -72.7% |      804.7 | -23.4% |    1175.7 |  46.1% |    7242.8 | 516.1% |
   | 65536 |  2 |    3235.7 |    4088.4 |  26.4% |    1051.4 | -74.3% |      852.6 | -18.9% |    1049.2 |  23.1% |    7805.4 | 643.9% |
   | 65536 |  4 |    3061.9 |    4777.9 |  56.0% |    1037.6 | -78.3% |      822.7 | -20.7% |    1092.4 |  32.8% |    7706.1 | 605.5% |
   | 65536 |  8 |    3239.2 |    4242.4 |  31.0% |    1016.3 | -76.0% |      821.1 | -19.2% |    1078.5 |  31.4% |    5994.4 | 455.8% |
   | 65536 | 16 |    2949.5 |    3480.7 |  18.0% |     770.1 | -77.9% |      825.6 |   7.2% |    1081.9 |  31.0% |    3349.3 | 209.6% |
   ```
   
   Here:
   
    * Zip(C) is java.util.zip.CRC32(C)
    * PureJava(C) is the hadoop implementation
    * Native(C) is the native hadoop implementation
   
   The numbers in the table show throughput in MB/s, so a higher number is better. With only this data, it is easy to conclude that NativeC is the clear winner for all BPC. However, that may not be the case.
   
   In the hadoop benchmark, the logic creates a 64MB byte buffer. Then it calculates the expected checksum. Then it benchmarks a "validate checksums" routine, where it generates the checksums for the new data and compares that with the expected.
   
   For the native calls, the code is like this:
   
   ```
         public void verifyChunked(ByteBuffer data, int bytesPerSum,
             ByteBuffer sums, String fileName, long basePos)
                 throws ChecksumException {
           NativeCrc32.verifyChunkedSums(bytesPerSum, DataChecksum.Type.CRC32.id,
               sums, data, fileName, basePos);
         }
   ```
   
   I.e., it calls NativeCrc32.verifyChunkedSums, which takes the entire data set (64MB) and runs the complete validation in a single native call.
   
   The pure Java and java.util.zip implementations cannot do this. They must loop over the data and make multiple calls to the Checksum implementation to checksum at each BPC boundary. It's also worth noting that the java.util.zip CRC classes make native calls too.
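   
   To illustrate the difference, here is a minimal sketch of the per-chunk loop such an implementation has to run to verify chunked sums. This is not the Hadoop benchmark code: the class and method names are hypothetical, and it uses java.util.zip.CRC32 over a byte array for simplicity.
   
   ```java
   import java.util.Random;
   import java.util.zip.CRC32;

   public class ChunkedCrcLoop {

     /** Computes one CRC per bytesPerSum-sized chunk, one update call per chunk. */
     static long[] computeChunkedSums(byte[] data, int bytesPerSum) {
       int chunks = (data.length + bytesPerSum - 1) / bytesPerSum;
       long[] sums = new long[chunks];
       CRC32 crc = new CRC32();
       for (int c = 0; c < chunks; c++) {
         int off = c * bytesPerSum;
         int len = Math.min(bytesPerSum, data.length - off);
         crc.reset();
         crc.update(data, off, len);
         sums[c] = crc.getValue();
       }
       return sums;
     }

     /** Returns the index of the first mismatching chunk, or -1 if all match. */
     static int firstMismatch(byte[] data, int bytesPerSum, long[] expected) {
       long[] actual = computeChunkedSums(data, bytesPerSum);
       if (actual.length != expected.length) {
         throw new IllegalArgumentException("checksum count mismatch");
       }
       for (int c = 0; c < actual.length; c++) {
         if (actual[c] != expected[c]) {
           return c;
         }
       }
       return -1;
     }

     public static void main(String[] args) {
       byte[] data = new byte[1024 * 1024];
       new Random(42).nextBytes(data);
       long[] sums = computeChunkedSums(data, 4096);
       System.out.println(firstMismatch(data, 4096, sums));
       data[5000] ^= 0x01; // corrupt one bit inside the second chunk
       System.out.println(firstMismatch(data, 4096, sums));
     }
   }
   ```
   
   Every chunk costs at least one call into the checksum implementation, whereas the native verifyChunkedSums path crosses into native code once for the whole buffer.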
   
   The above does not test real world use. We don't buffer 64MB of data and then calculate / verify all the CRCs in a batch. Rather, we stream the data and calculate the CRCs on demand. It is important to test the streaming case to get more realistic results.
   
   Using the following simple loop in a JMH benchmark, we can get a more realistic test. First populate a 64MB ByteBuffer with random bytes. Then using the following loop, calculate the checksums for the 64MB at BPC intervals:
   
   ```
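       // Walk the 64MB buffer one BPC-sized chunk at a time.
       for (int i = 0; i < data.capacity(); i += bytesPerCheckSum) {
         data.position(i);
         data.limit(i + bytesPerCheckSum);
         // Only the bytes between position and limit are checksummed.
         csum.update(data);
         // Consume the value so the JIT cannot eliminate the calculation.
         blackhole.consume(csum.getValue());
         // Each chunk gets its own independent checksum.
         csum.reset();
       }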
   ```
   
   The performance at 512 BPC:
   
   ```
   BPC     Impl            J11-1   J11-2   J8-1    J8-2
   ------------------------------------------------------
   512	pureCRC32	10.105	9.5	10.346	11.221
   512	pureCRC32C	9.519	9.646	11.111	10.72
   512	hadoopCRC32C	16.817	17.183	19.908	19.83
   512	hadoopCRC32	19.897	19.345	19.089	17.645
   512	zipCRC32	72.795	80.716	59.145	52.792
   512	zipCRC32C	56.321	49.921	0	0
   512	nativeCRC32	14.316	15.352	15.873	16.697
   512	nativeCRC32C	35.651	29.765	39.491	41.885
   ```
   
   The numbers above are JMH throughput, i.e. how many times per second we can calculate the checksums on 64MB of data.
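   
   Since each operation covers 64MB, an ops/s figure can be converted to an approximate MB/s. A small sketch, with a hypothetical helper name, using the zipCRC32 J11-1 figure from the table above as input:
   
   ```java
   public class JmhThroughput {

     // Each benchmark operation checksums the full 64MB buffer.
     static final double DATA_MB = 64.0;

     /** Converts JMH operations per second into approximate MB per second. */
     static double toMegabytesPerSecond(double opsPerSecond) {
       return opsPerSecond * DATA_MB;
     }

     public static void main(String[] args) {
       double zipCrc32Java11 = 72.795; // ops/s at 512 BPC, run J11-1
       System.out.printf("zipCRC32 @ 512 BPC: %.1f MB/s%n",
           toMegabytesPerSecond(zipCrc32Java11));
     }
   }
   ```
   
   This only gives a ballpark figure. The Hadoop benchmark measures checksum verification rather than streaming calculation, so the two sets of numbers are not directly comparable.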
   
    * pureCRC* - Ozone implementations in pure Java.
    * hadoopCRC32* - Hadoop implementation in pure Java.
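    * zip* - Java util zip implementations. Note CRC32C is only available from Java 9 and later, hence the 0 entries on Java 8.
    * native* - Hadoop native implementations, called once per BPC chunk.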
   
   I ran twice on Java 11 and twice on Java 8.
   
   pureCRC32(C), as used in Ozone, is the slowest.
   
   The pure Java Hadoop implementation is significantly faster, but still not great.
   
   java.util.zip is best, beating the native Hadoop implementation by quite a margin.
   
   Also notable, and reproducible in all test runs - java.util.zip.CRC32 is improved significantly in Java 11 over Java 8.
   
   If we also test the Hadoop native implementation, calculating all checksums in a single call (as the hadoop benchmark did), we can see it is fastest as the earlier Hadoop test showed:
   
   ```
   BPC     Impl            J11-1   J11-2  
   ---------------------------------------
   512	nativeCRC32B	22.977	23.343
   512	nativeCRC32CB	108.674	102.923
   ```
   
   I don't have an explanation as to why CRC32CB is so much faster than CRC32B, but this is consistently so.
   
   Moving on to a higher BPC:
   
   ```
   BPC     Impl            J11-1   J11-2   J8-1    J8-2
   ------------------------------------------------------
   4096	pureCRC32	10.334	9.607	11.694	11.682
   4096	pureCRC32C	10.365	9.212	11.771	11.818
   4096	hadoopCRC32C	17.076	17.235	19.934	20.519
   4096	hadoopCRC32	18.789	21.042	18.243	16.353
   4096	zipCRC32	100.413	120.215	88.079	109.794
   4096	zipCRC32C	108.522	129.197	0	0
   4096	nativeCRC32	21.318	21.508	22.177	20.481
   4096	nativeCRC32C	77.365	87.459	90.591	89.689
   4096	nativeCRC32B	22.651	23.884	0	0
   4096	nativeCRC32CB	191.301	175.54	0	0
   ```
   
   The pure Java implementations have not benefited at all. The zip implementations are significantly faster and still best. The Hadoop native implementations have improved too. There does appear to be something wrong with nativeCRC32, as it lags nativeCRC32C by a large margin.
   
   ```
   BPC     Impl            J11-1   J11-2   J8-1    J8-2
   ------------------------------------------------------
   32768	pureCRC32	11.278	11.837	12.284	11.557
   32768	pureCRC32C	10.875	11.794	12.006	11.893
   32768	hadoopCRC32C	16.477	15.856	19.722	20.599
   32768	hadoopCRC32	18.444	20.055	17.601	18.992
   32768	zipCRC32	127.591	114.87	104.169	117.778
   32768	zipCRC32C	100.77	126.446	0	0
   32768	nativeCRC32	23.488	23.934	22.74	23.594
   32768	nativeCRC32C	106.726	104.538	106.031	105.871
   32768	nativeCRC32B	20.225	23.161	0	0
   32768	nativeCRC32CB	167.656	202.245	0	0
   
   BPC     Impl            J11-1   J11-2   J8-1    J8-2
   ------------------------------------------------------
   1048576	pureCRC32	11.469	11.03	11.673	11.27
   1048576	pureCRC32C	11.111	10.98	11.955	11.395
   1048576	hadoopCRC32C	15.926	16.686	17.126	18.884
   1048576	hadoopCRC32	21.064	20.656	19.65	19.343
   1048576	zipCRC32	118.338	116.067	113.645	111.888
   1048576	zipCRC32C	117.705	131.284	0	0
   1048576	nativeCRC32	21.727	23.414	22.14	22.923
   1048576	nativeCRC32C	108.098	109.05	107.373	91.435
   1048576	nativeCRC32B	21.134	23.279	0	0
   1048576	nativeCRC32CB	108.972	100.259	0	0
   ```
   
   The numbers have more variance at the higher BPC, but the trend remains.
   
   ## Conclusion:
   
    * For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
    * On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
    * Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue with the native CRC32 implementation.
   
   ## Recommendation:
   
    * Switch Ozone to use java.util.zip.CRC32 by default.
    * Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.

