Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/02/08 13:53:04 UTC
[GitHub] [ozone] sodonnel commented on pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations
sodonnel commented on pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#issuecomment-775165462
Running the new benchmarks gives the following results. I have posted the conclusion at the start because this comment is quite long.
TLDR:
## Conclusion:
* For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
* On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
* Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue there.
## Recommendation:
* Switch Ozone to use java.util.zip.CRC32 by default.
* Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
# Benchmarks
There are several implementations of CRC available:
* Ozone Java CRC32
* Ozone Java CRC32C
* Hadoop Java CRC32
* Hadoop Java CRC32C
* java.util.zip.CRC32
* java.util.zip.CRC32C
* Hadoop Native CRC32
* Hadoop Native CRC32C
The performance of each algorithm can also depend on the number of data bytes covered by each checksum, known as bytes per checksum (BPC).
HDFS has a default BPC of 512, generating 1MB of checksum data per 128MB block.
Ozone has a default BPC of 1MB, generating 512 bytes of checksum data per 128MB block.
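As a quick check of the arithmetic behind those two defaults, here is a minimal sketch. The class and method names are made up for illustration, and it assumes each CRC32/CRC32C value is stored as 4 bytes with nothing else stored per checksum.

```java
final class ChecksumOverhead {
    // Assumption: each CRC32 / CRC32C value occupies 4 bytes (32 bits).
    static final int CRC_BYTES = 4;

    // Bytes of checksum data needed for one block at a given BPC.
    static long checksumBytes(long blockSize, long bytesPerChecksum) {
        // Round up so that a trailing partial chunk still gets a checksum.
        long chunks = (blockSize + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * CRC_BYTES;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024;                        // 128MB block
        System.out.println(checksumBytes(block, 512));          // 1048576 (1MB)
        System.out.println(checksumBytes(block, 1024 * 1024));  // 512
    }
}
```

So a BPC of 512 means 262144 checksum calls per block, while a BPC of 1MB means only 128, which is why per-call overhead matters so much at low BPC.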
There is a benchmark class in Hadoop, called Crc32PerformanceTest.java which produces results like the following for varying BPC:
```
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 512 | 1 | 1736.2 | 1706.4 | -1.7% | 875.4 | -48.7% | 855.3 | -2.3% | 937.2 | 9.6% | 5289.9 | 464.5% |
| 512 | 2 | 2257.5 | 1978.2 | -12.4% | 949.8 | -52.0% | 911.5 | -4.0% | 1089.9 | 19.6% | 6475.0 | 494.1% |
| 512 | 4 | 2257.9 | 1879.4 | -16.8% | 1000.6 | -46.8% | 877.6 | -12.3% | 1087.2 | 23.9% | 6128.5 | 463.7% |
| 512 | 8 | 2322.2 | 1930.3 | -16.9% | 984.7 | -49.0% | 812.1 | -17.5% | 1101.8 | 35.7% | 5508.6 | 400.0% |
| 512 | 16 | 2208.6 | 1876.9 | -15.0% | 932.4 | -50.3% | 753.2 | -19.2% | 1078.2 | 43.1% | 4830.6 | 348.0% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 1024 | 1 | 2252.8 | 2710.8 | 20.3% | 1019.4 | -62.4% | 879.0 | -13.8% | 966.7 | 10.0% | 4535.2 | 369.2% |
| 1024 | 2 | 2411.5 | 2470.6 | 2.5% | 992.0 | -59.8% | 857.7 | -13.5% | 1039.9 | 21.2% | 4181.8 | 302.2% |
| 1024 | 4 | 2656.1 | 2839.8 | 6.9% | 991.9 | -65.1% | 868.1 | -12.5% | 1034.0 | 19.1% | 5473.8 | 429.4% |
| 1024 | 8 | 2391.7 | 2472.1 | 3.4% | 958.6 | -61.2% | 864.1 | -9.9% | 1060.8 | 22.8% | 5314.1 | 400.9% |
| 1024 | 16 | 2545.7 | 2722.7 | 7.0% | 959.3 | -64.8% | 682.5 | -28.9% | 1095.7 | 60.5% | 4814.3 | 339.4% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 2048 | 1 | 1928.7 | 3257.2 | 68.9% | 867.9 | -73.4% | 819.5 | -5.6% | 1035.0 | 26.3% | 4017.9 | 288.2% |
| 2048 | 2 | 2237.2 | 3413.9 | 52.6% | 967.3 | -71.7% | 870.2 | -10.0% | 1011.1 | 16.2% | 5656.2 | 459.4% |
| 2048 | 4 | 2529.7 | 3860.5 | 52.6% | 969.4 | -74.9% | 855.8 | -11.7% | 1108.2 | 29.5% | 5976.6 | 439.3% |
| 2048 | 8 | 2615.2 | 3554.2 | 35.9% | 914.0 | -74.3% | 818.2 | -10.5% | 1071.4 | 31.0% | 5289.9 | 393.7% |
| 2048 | 16 | 2659.1 | 3246.8 | 22.1% | 935.8 | -71.2% | 777.0 | -17.0% | 1111.1 | 43.0% | 4433.7 | 299.0% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 4096 | 1 | 2619.0 | 3460.2 | 32.1% | 1052.1 | -69.6% | 823.9 | -21.7% | 925.4 | 12.3% | 7221.2 | 680.3% |
| 4096 | 2 | 2686.4 | 3518.6 | 31.0% | 1013.6 | -71.2% | 855.1 | -15.6% | 982.6 | 14.9% | 7522.6 | 665.6% |
| 4096 | 4 | 2722.8 | 3225.1 | 18.4% | 973.9 | -69.8% | 881.6 | -9.5% | 1039.8 | 17.9% | 7346.7 | 606.6% |
| 4096 | 8 | 3336.5 | 3680.6 | 10.3% | 1025.9 | -72.1% | 928.4 | -9.5% | 1108.9 | 19.4% | 7394.3 | 566.8% |
| 4096 | 16 | 2924.1 | 3604.2 | 23.3% | 907.3 | -74.8% | 882.7 | -2.7% | 1106.3 | 25.3% | 4543.4 | 310.7% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 8192 | 1 | 2867.8 | 3373.2 | 17.6% | 892.6 | -73.5% | 938.7 | 5.2% | 980.8 | 4.5% | 8047.5 | 720.5% |
| 8192 | 2 | 3022.8 | 3704.8 | 22.6% | 898.6 | -75.7% | 855.2 | -4.8% | 1010.0 | 18.1% | 7174.6 | 610.4% |
| 8192 | 4 | 3196.3 | 4309.5 | 34.8% | 913.8 | -78.8% | 882.2 | -3.5% | 1027.9 | 16.5% | 8071.9 | 685.3% |
| 8192 | 8 | 3135.9 | 4542.4 | 44.9% | 1027.2 | -77.4% | 864.1 | -15.9% | 1072.4 | 24.1% | 5925.5 | 452.5% |
| 8192 | 16 | 2961.7 | 3570.6 | 20.6% | 983.4 | -72.5% | 711.2 | -27.7% | 1119.0 | 57.4% | 4282.8 | 282.7% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 16384 | 1 | 2836.1 | 3645.2 | 28.5% | 1052.0 | -71.1% | 973.8 | -7.4% | 984.4 | 1.1% | 7577.3 | 669.7% |
| 16384 | 2 | 2967.9 | 3705.1 | 24.8% | 942.0 | -74.6% | 881.0 | -6.5% | 1026.9 | 16.6% | 9675.0 | 842.2% |
| 16384 | 4 | 3218.5 | 4501.9 | 39.9% | 980.7 | -78.2% | 885.4 | -9.7% | 1058.8 | 19.6% | 7105.5 | 571.1% |
| 16384 | 8 | 2827.4 | 4076.0 | 44.2% | 1012.6 | -75.2% | 876.7 | -13.4% | 1011.1 | 15.3% | 5649.8 | 458.8% |
| 16384 | 16 | 2423.0 | 3314.9 | 36.8% | 824.5 | -75.1% | 802.4 | -2.7% | 1079.0 | 34.5% | 4112.8 | 281.1% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 32768 | 1 | 1998.8 | 3483.5 | 74.3% | 904.5 | -74.0% | 784.0 | -13.3% | 965.7 | 23.2% | 7445.7 | 671.0% |
| 32768 | 2 | 2526.4 | 3826.7 | 51.5% | 922.5 | -75.9% | 859.4 | -6.8% | 1101.4 | 28.2% | 9013.3 | 718.4% |
| 32768 | 4 | 3076.2 | 4535.3 | 47.4% | 972.3 | -78.6% | 897.1 | -7.7% | 1088.1 | 21.3% | 7682.5 | 606.0% |
| 32768 | 8 | 3127.8 | 3966.8 | 26.8% | 1021.3 | -74.3% | 894.9 | -12.4% | 1103.7 | 23.3% | 6305.4 | 471.3% |
| 32768 | 16 | 3122.3 | 3480.9 | 11.5% | 1030.6 | -70.4% | 842.5 | -18.3% | 1117.5 | 32.6% | 3663.3 | 227.8% |
| bpc | #T || Zip || ZipC | % diff || PureJava | % diff || PureJavaC | % diff || Native | % diff || NativeC | % diff |
| 65536 | 1 | 3129.3 | 3846.4 | 22.9% | 1050.2 | -72.7% | 804.7 | -23.4% | 1175.7 | 46.1% | 7242.8 | 516.1% |
| 65536 | 2 | 3235.7 | 4088.4 | 26.4% | 1051.4 | -74.3% | 852.6 | -18.9% | 1049.2 | 23.1% | 7805.4 | 643.9% |
| 65536 | 4 | 3061.9 | 4777.9 | 56.0% | 1037.6 | -78.3% | 822.7 | -20.7% | 1092.4 | 32.8% | 7706.1 | 605.5% |
| 65536 | 8 | 3239.2 | 4242.4 | 31.0% | 1016.3 | -76.0% | 821.1 | -19.2% | 1078.5 | 31.4% | 5994.4 | 455.8% |
| 65536 | 16 | 2949.5 | 3480.7 | 18.0% | 770.1 | -77.9% | 825.6 | 7.2% | 1081.9 | 31.0% | 3349.3 | 209.6% |
```
Here:
* Zip(C) is java.util.zip.CRC32(C)
* PureJava(C) is the hadoop implementation
* Native(C) is the native hadoop implementation
The numbers in the table show throughput in MB/s, so a higher number is better. With only this data, it is easy to conclude that NativeC is the clear winner for all BPC. However, that may not be the case.
In the hadoop benchmark, the logic creates a 64MB byte buffer. Then it calculates the expected checksum. Then it benchmarks a "validate checksums" routine, where it generates the checksums for the new data and compares that with the expected.
For the native calls, the code is like this:
```
public void verifyChunked(ByteBuffer data, int bytesPerSum,
    ByteBuffer sums, String fileName, long basePos)
    throws ChecksumException {
  // One native call verifies every bytesPerSum chunk in the whole buffer.
  NativeCrc32.verifyChunkedSums(bytesPerSum, DataChecksum.Type.CRC32.id,
      sums, data, fileName, basePos);
}
```
I.e., it calls NativeCrc32.verifyChunkedSums, which takes the entire data set (64MB) and runs the complete validation in a single native call.
The pure Java and java.util.zip implementations cannot do this. They must loop over the data and make a separate call to the Checksum implementation for each BPC-sized chunk. It's also worth noting that the java.util.zip CRC classes make native calls too.
The above does not test real world use. We don't buffer 64MB of data and then calculate / verify all the CRCs in a batch. Rather, we stream the data and calculate the CRCs on demand. It is important to test the streaming case to get more realistic results.
Using the following simple loop in a JMH benchmark, we can get a more realistic test. First populate a 64MB ByteBuffer with random bytes. Then using the following loop, calculate the checksums for the 64MB at BPC intervals:
```
for (int i = 0; i < data.capacity(); i += bytesPerCheckSum) {
  data.position(i);
  data.limit(i + bytesPerCheckSum);
  csum.update(data);
  blackhole.consume(csum.getValue());
  csum.reset();
}
```
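For anyone who wants to try the same pattern outside JMH, here is a self-contained sketch of the loop. It is not the benchmark code from the PR: the class and method names are made up, it uses java.util.zip.CRC32 only, and it collects the checksums in an array rather than feeding them to a Blackhole.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

final class StreamingCrc {
    // Returns one CRC32 value per bytesPerChecksum-sized chunk of data.
    static long[] chunkedChecksums(ByteBuffer data, int bytesPerChecksum) {
        CRC32 csum = new CRC32();
        int total = data.capacity();
        long[] sums = new long[(total + bytesPerChecksum - 1) / bytesPerChecksum];
        // Work on a duplicate so the caller's position and limit are untouched.
        ByteBuffer view = data.duplicate();
        view.clear();
        for (int i = 0, n = 0; i < total; i += bytesPerChecksum, n++) {
            // Raise the limit first so the new position is always valid.
            view.limit(Math.min(i + bytesPerChecksum, total));
            view.position(i);
            csum.reset();
            csum.update(view);          // one call per chunk, as in the streaming case
            sums[n] = csum.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        // 1MB of random bytes here, rather than the 64MB used in the benchmark.
        byte[] bytes = new byte[1024 * 1024];
        new Random(42).nextBytes(bytes);
        long[] sums = chunkedChecksums(ByteBuffer.wrap(bytes), 4096);
        System.out.println(sums.length + " checksums calculated");
    }
}
```

This only shows the shape of the per-chunk calls. Timing it by hand would not give trustworthy numbers, which is the reason for using JMH in the first place.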
The performance at 512 BPC:
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
512 pureCRC32 10.105 9.5 10.346 11.221
512 pureCRC32C 9.519 9.646 11.111 10.72
512 hadoopCRC32C 16.817 17.183 19.908 19.83
512 hadoopCRC32 19.897 19.345 19.089 17.645
512 zipCRC32 72.795 80.716 59.145 52.792
512 zipCRC32C 56.321 49.921 0 0
512 nativeCRC32 14.316 15.352 15.873 16.697
512 nativeCRC32C 35.651 29.765 39.491 41.885
```
The numbers above are JMH throughput, i.e. how many times per second we can calculate the checksums over the 64MB of data.
* pureCRC* - Ozone implementations in pure Java.
* hadoopCRC32* - Hadoop implementations in pure Java.
* zip* - java.util.zip implementations. Note that CRC32C is only available from Java 9 onward, so it shows as 0 in the Java 8 columns.
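To relate these operations-per-second figures to data rates, each operation covers the full 64MB buffer, so multiplying by 64 gives MB/s. A small sketch of that conversion follows; the class and method names are made up for illustration.

```java
final class JmhThroughput {
    // Each benchmark operation checksums the whole 64MB buffer once.
    static final double BUFFER_MB = 64.0;

    // Converts JMH ops/s into MB/s of data checksummed.
    static double toMbPerSecond(double opsPerSecond) {
        return opsPerSecond * BUFFER_MB;
    }

    public static void main(String[] args) {
        // Values taken from the J11-1 column of the 512 BPC table.
        System.out.printf("zipCRC32     %.1f MB/s%n", toMbPerSecond(72.795));
        System.out.printf("nativeCRC32C %.1f MB/s%n", toMbPerSecond(35.651));
    }
}
```

These rates come from a different harness than the Hadoop Crc32PerformanceTest table, so they are best compared with each other rather than with the MB/s figures there.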
I ran twice on Java 11 and twice on Java 8.
PureCRC32(C), as used in Ozone, is the slowest.
The pure Java Hadoop implementation is significantly faster, but still not great.
java.util.zip is best, beating the native Hadoop implementation by quite a margin.
Also notable, and reproducible in all test runs - java.util.zip.CRC32 is improved significantly in Java 11 over Java 8.
If we also test the Hadoop native implementation, calculating all checksums in a single call (as the hadoop benchmark did), we can see it is fastest as the earlier Hadoop test showed:
```
BPC Impl J11-1 J11-2
---------------------------------------
512 nativeCRC32B 22.977 23.343
512 nativeCRC32CB 108.674 102.923
```
I don't have an explanation as to why CRC32CB is so much faster than CRC32B, but this is consistently so.
Moving on to a higher BPC:
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
4096 pureCRC32 10.334 9.607 11.694 11.682
4096 pureCRC32C 10.365 9.212 11.771 11.818
4096 hadoopCRC32C 17.076 17.235 19.934 20.519
4096 hadoopCRC32 18.789 21.042 18.243 16.353
4096 zipCRC32 100.413 120.215 88.079 109.794
4096 zipCRC32C 108.522 129.197 0 0
4096 nativeCRC32 21.318 21.508 22.177 20.481
4096 nativeCRC32C 77.365 87.459 90.591 89.689
4096 nativeCRC32B 22.651 23.884 0 0
4096 nativeCRC32CB 191.301 175.54 0 0
```
The pure Java implementations have not benefited at all. The zip implementations are significantly faster and still best. The Hadoop native implementations have improved too. There does appear to be something wrong with nativeCRC32, as it lags nativeCRC32C by a large margin.
```
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
32768 pureCRC32 11.278 11.837 12.284 11.557
32768 pureCRC32C 10.875 11.794 12.006 11.893
32768 hadoopCRC32C 16.477 15.856 19.722 20.599
32768 hadoopCRC32 18.444 20.055 17.601 18.992
32768 zipCRC32 127.591 114.87 104.169 117.778
32768 zipCRC32C 100.77 126.446 0 0
32768 nativeCRC32 23.488 23.934 22.74 23.594
32768 nativeCRC32C 106.726 104.538 106.031 105.871
32768 nativeCRC32B 20.225 23.161 0 0
32768 nativeCRC32CB 167.656 202.245 0 0
BPC Impl J11-1 J11-2 J8-1 J8-2
------------------------------------------------------
1048576 pureCRC32 11.469 11.03 11.673 11.27
1048576 pureCRC32C 11.111 10.98 11.955 11.395
1048576 hadoopCRC32C 15.926 16.686 17.126 18.884
1048576 hadoopCRC32 21.064 20.656 19.65 19.343
1048576 zipCRC32 118.338 116.067 113.645 111.888
1048576 zipCRC32C 117.705 131.284 0 0
1048576 nativeCRC32 21.727 23.414 22.14 22.923
1048576 nativeCRC32C 108.098 109.05 107.373 91.435
1048576 nativeCRC32B 21.134 23.279 0 0
1048576 nativeCRC32CB 108.972 100.259 0 0
```
The numbers have more variance at the higher BPC, but the trend remains.
## Conclusion:
* For real world streaming CRC calculation, the java.util.zip implementations are best on Java 11.
* On Java 8 - CRC32C performance of hadoop native is close to, or slightly better than java.util.zip for higher BPC.
* Hadoop native for CRC32 is a lot slower than CRC32C. Hadoop uses CRC32C by default, but there appears to be an issue there.
## Recommendation:
* Switch Ozone to use java.util.zip.CRC32 by default.
* Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
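One possible way to do the "use java.util.zip.CRC32C if available" part is a reflective lookup with a caller-supplied fallback. This is only a sketch and not what the PR implements: Crc32cFactory and its methods are hypothetical names, and the fallback supplier stands in for whichever pure Java CRC32C class gets chosen.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.Checksum;

final class Crc32cFactory {
    // Resolved once; null on runtimes that lack the class (it was added in Java 9).
    private static final Class<?> ZIP_CRC32C = lookup();

    private static Class<?> lookup() {
        try {
            return Class.forName("java.util.zip.CRC32C");
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    static boolean zipCrc32cAvailable() {
        return ZIP_CRC32C != null;
    }

    // The fallback would supply a pure Java CRC32C (e.g. Hadoop's PureJavaCrc32C).
    static Checksum create(Supplier<Checksum> fallback) {
        if (ZIP_CRC32C != null) {
            try {
                return (Checksum) ZIP_CRC32C.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                // Fall through to the pure Java implementation.
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        // No pure Java CRC32C is wired in here, so this demo fails on Java 8.
        Checksum crc = create(() -> {
            throw new UnsupportedOperationException("no fallback CRC32C configured");
        });
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        crc.update(data, 0, data.length);
        System.out.println(crc.getClass().getName() + " " + Long.toHexString(crc.getValue()));
    }
}
```

Doing the lookup once in a static field keeps the reflection cost out of the per-checksum path, so only object construction goes through reflection.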