Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/02/08 13:53:04 UTC

[GitHub] [ozone] sodonnel commented on pull request #1910: HDDS-4808. Add Genesis benchmark for various CRC implementations

sodonnel commented on pull request #1910:
URL: https://github.com/apache/ozone/pull/1910#issuecomment-775165462


   Running the new benchmarks gives the following results. I have posted the conclusion at the start, as this comment is quite long.
   
   TL;DR:
   
   ## Conclusion:
   
    * For real-world streaming CRC calculation, the java.util.zip implementations are the fastest on Java 11.
    * On Java 8, where java.util.zip.CRC32C is unavailable, Hadoop native CRC32C performance is close to, or slightly better than, java.util.zip.CRC32 at higher BPC.
    * Hadoop native CRC32 is much slower than Hadoop native CRC32C. Hadoop uses CRC32C by default, but there appears to be a problem with the native CRC32 path.
   
   ## Recommendation:
   
    * Switch Ozone to use java.util.zip.CRC32 by default.
    * Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
   
   # Benchmarks
   
   There are several implementations of CRC available:
   
    * Ozone Java CRC32 
    * Ozone Java CRC32C
    * Hadoop Java CRC32
    * Hadoop Java CRC32C
    * Java util.zip.CRC32
    * Java util.zip.CRC32C
    * Hadoop Native CRC32
    * Hadoop Native CRC32C
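   These implementations compute only two distinct checksums: CRC32 (the IEEE polynomial used by java.util.zip.CRC32) and CRC32C (the Castagnoli polynomial). Implementations of the same checksum should differ only in speed, so any replacement must produce identical values. As a sketch, a hypothetical bit-at-a-time reference, `BitwiseCrc32c` (not part of Ozone or Hadoop), can be cross-checked against java.util.zip.CRC32C (Java 9+) using the standard check input "123456789":

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Random;
   import java.util.zip.CRC32;
   import java.util.zip.CRC32C;
   import java.util.zip.Checksum;

   // Hypothetical reference CRC32C (Castagnoli polynomial), one bit at a time.
   // Far too slow for real use; only meant to cross-check other implementations.
   final class BitwiseCrc32c implements Checksum {
     private static final int POLY = 0x82F63B78; // Castagnoli polynomial, bit-reflected
     private int crc = 0xFFFFFFFF;               // standard initial value

     @Override
     public void update(int b) {
       crc ^= b & 0xFF;
       for (int bit = 0; bit < 8; bit++) {
         crc = (crc & 1) != 0 ? (crc >>> 1) ^ POLY : crc >>> 1;
       }
     }

     @Override
     public void update(byte[] buf, int off, int len) {
       for (int i = off; i < off + len; i++) {
         update(buf[i]);
       }
     }

     @Override
     public long getValue() {
       return ~crc & 0xFFFFFFFFL; // final XOR, returned as an unsigned 32-bit value
     }

     @Override
     public void reset() {
       crc = 0xFFFFFFFF;
     }
   }

   final class CrcCrossCheck {
     static long checksum(Checksum c, byte[] data) {
       c.reset();
       c.update(data, 0, data.length);
       return c.getValue();
     }

     public static void main(String[] args) {
       byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
       System.out.printf("CRC32  %08x%n", checksum(new CRC32(), check));         // cbf43926
       System.out.printf("CRC32C %08x%n", checksum(new CRC32C(), check));        // e3069283
       System.out.printf("ref    %08x%n", checksum(new BitwiseCrc32c(), check)); // e3069283

       // Same comparison on random data: both CRC32C implementations must agree
       byte[] random = new byte[64 * 1024];
       new Random(1).nextBytes(random);
       System.out.println(checksum(new CRC32C(), random) == checksum(new BitwiseCrc32c(), random)); // true
     }
   }
   ```

   A comparison like this is a cheap sanity check before swapping one CRC32C implementation for another.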
   
   The performance of each algorithm also depends on the number of data bytes covered by each checksum: the bytes per checksum (BPC).
   
   HDFS has a default BPC of 512, generating 1MB of checksum data per 128MB block.
   
   Ozone has a default BPC of 1MB, generating 512 bytes of checksum data per 128MB block.
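   These overhead figures follow from each CRC32 or CRC32C value being 4 bytes: 128MB / 512 = 262,144 checksums × 4 bytes = 1MB, and 128MB / 1MB = 128 checksums × 4 bytes = 512 bytes. A minimal sketch of that arithmetic, using a hypothetical `ChecksumOverhead` helper:

   ```java
   // Hypothetical helper: checksum bytes stored per block, assuming one
   // 4-byte CRC (CRC32 or CRC32C) for every bytesPerChecksum bytes of data.
   final class ChecksumOverhead {
     static final int CRC_BYTES = 4;

     static long checksumBytes(long blockSize, int bytesPerChecksum) {
       // round up so a trailing partial chunk still gets its own checksum
       long chunks = (blockSize + bytesPerChecksum - 1) / bytesPerChecksum;
       return chunks * CRC_BYTES;
     }

     public static void main(String[] args) {
       long block = 128L * 1024 * 1024;                       // 128MB block
       System.out.println(checksumBytes(block, 512));         // HDFS default BPC: 1048576 (1MB)
       System.out.println(checksumBytes(block, 1024 * 1024)); // Ozone default BPC: 512
     }
   }
   ```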
   
   There is a benchmark class in Hadoop called Crc32PerformanceTest.java, which produces results like the following for varying BPC:
   
   
   ```
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |   512 |  1 |    1736.2 |    1706.4 |  -1.7% |     875.4 | -48.7% |      855.3 |  -2.3% |     937.2 |   9.6% |    5289.9 | 464.5% |
   |   512 |  2 |    2257.5 |    1978.2 | -12.4% |     949.8 | -52.0% |      911.5 |  -4.0% |    1089.9 |  19.6% |    6475.0 | 494.1% |
   |   512 |  4 |    2257.9 |    1879.4 | -16.8% |    1000.6 | -46.8% |      877.6 | -12.3% |    1087.2 |  23.9% |    6128.5 | 463.7% |
   |   512 |  8 |    2322.2 |    1930.3 | -16.9% |     984.7 | -49.0% |      812.1 | -17.5% |    1101.8 |  35.7% |    5508.6 | 400.0% |
   |   512 | 16 |    2208.6 |    1876.9 | -15.0% |     932.4 | -50.3% |      753.2 | -19.2% |    1078.2 |  43.1% |    4830.6 | 348.0% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  1024 |  1 |    2252.8 |    2710.8 |  20.3% |    1019.4 | -62.4% |      879.0 | -13.8% |     966.7 |  10.0% |    4535.2 | 369.2% |
   |  1024 |  2 |    2411.5 |    2470.6 |   2.5% |     992.0 | -59.8% |      857.7 | -13.5% |    1039.9 |  21.2% |    4181.8 | 302.2% |
   |  1024 |  4 |    2656.1 |    2839.8 |   6.9% |     991.9 | -65.1% |      868.1 | -12.5% |    1034.0 |  19.1% |    5473.8 | 429.4% |
   |  1024 |  8 |    2391.7 |    2472.1 |   3.4% |     958.6 | -61.2% |      864.1 |  -9.9% |    1060.8 |  22.8% |    5314.1 | 400.9% |
   |  1024 | 16 |    2545.7 |    2722.7 |   7.0% |     959.3 | -64.8% |      682.5 | -28.9% |    1095.7 |  60.5% |    4814.3 | 339.4% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  2048 |  1 |    1928.7 |    3257.2 |  68.9% |     867.9 | -73.4% |      819.5 |  -5.6% |    1035.0 |  26.3% |    4017.9 | 288.2% |
   |  2048 |  2 |    2237.2 |    3413.9 |  52.6% |     967.3 | -71.7% |      870.2 | -10.0% |    1011.1 |  16.2% |    5656.2 | 459.4% |
   |  2048 |  4 |    2529.7 |    3860.5 |  52.6% |     969.4 | -74.9% |      855.8 | -11.7% |    1108.2 |  29.5% |    5976.6 | 439.3% |
   |  2048 |  8 |    2615.2 |    3554.2 |  35.9% |     914.0 | -74.3% |      818.2 | -10.5% |    1071.4 |  31.0% |    5289.9 | 393.7% |
   |  2048 | 16 |    2659.1 |    3246.8 |  22.1% |     935.8 | -71.2% |      777.0 | -17.0% |    1111.1 |  43.0% |    4433.7 | 299.0% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  4096 |  1 |    2619.0 |    3460.2 |  32.1% |    1052.1 | -69.6% |      823.9 | -21.7% |     925.4 |  12.3% |    7221.2 | 680.3% |
   |  4096 |  2 |    2686.4 |    3518.6 |  31.0% |    1013.6 | -71.2% |      855.1 | -15.6% |     982.6 |  14.9% |    7522.6 | 665.6% |
   |  4096 |  4 |    2722.8 |    3225.1 |  18.4% |     973.9 | -69.8% |      881.6 |  -9.5% |    1039.8 |  17.9% |    7346.7 | 606.6% |
   |  4096 |  8 |    3336.5 |    3680.6 |  10.3% |    1025.9 | -72.1% |      928.4 |  -9.5% |    1108.9 |  19.4% |    7394.3 | 566.8% |
   |  4096 | 16 |    2924.1 |    3604.2 |  23.3% |     907.3 | -74.8% |      882.7 |  -2.7% |    1106.3 |  25.3% |    4543.4 | 310.7% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   |  8192 |  1 |    2867.8 |    3373.2 |  17.6% |     892.6 | -73.5% |      938.7 |   5.2% |     980.8 |   4.5% |    8047.5 | 720.5% |
   |  8192 |  2 |    3022.8 |    3704.8 |  22.6% |     898.6 | -75.7% |      855.2 |  -4.8% |    1010.0 |  18.1% |    7174.6 | 610.4% |
   |  8192 |  4 |    3196.3 |    4309.5 |  34.8% |     913.8 | -78.8% |      882.2 |  -3.5% |    1027.9 |  16.5% |    8071.9 | 685.3% |
   |  8192 |  8 |    3135.9 |    4542.4 |  44.9% |    1027.2 | -77.4% |      864.1 | -15.9% |    1072.4 |  24.1% |    5925.5 | 452.5% |
   |  8192 | 16 |    2961.7 |    3570.6 |  20.6% |     983.4 | -72.5% |      711.2 | -27.7% |    1119.0 |  57.4% |    4282.8 | 282.7% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 16384 |  1 |    2836.1 |    3645.2 |  28.5% |    1052.0 | -71.1% |      973.8 |  -7.4% |     984.4 |   1.1% |    7577.3 | 669.7% |
   | 16384 |  2 |    2967.9 |    3705.1 |  24.8% |     942.0 | -74.6% |      881.0 |  -6.5% |    1026.9 |  16.6% |    9675.0 | 842.2% |
   | 16384 |  4 |    3218.5 |    4501.9 |  39.9% |     980.7 | -78.2% |      885.4 |  -9.7% |    1058.8 |  19.6% |    7105.5 | 571.1% |
   | 16384 |  8 |    2827.4 |    4076.0 |  44.2% |    1012.6 | -75.2% |      876.7 | -13.4% |    1011.1 |  15.3% |    5649.8 | 458.8% |
   | 16384 | 16 |    2423.0 |    3314.9 |  36.8% |     824.5 | -75.1% |      802.4 |  -2.7% |    1079.0 |  34.5% |    4112.8 | 281.1% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 32768 |  1 |    1998.8 |    3483.5 |  74.3% |     904.5 | -74.0% |      784.0 | -13.3% |     965.7 |  23.2% |    7445.7 | 671.0% |
   | 32768 |  2 |    2526.4 |    3826.7 |  51.5% |     922.5 | -75.9% |      859.4 |  -6.8% |    1101.4 |  28.2% |    9013.3 | 718.4% |
   | 32768 |  4 |    3076.2 |    4535.3 |  47.4% |     972.3 | -78.6% |      897.1 |  -7.7% |    1088.1 |  21.3% |    7682.5 | 606.0% |
   | 32768 |  8 |    3127.8 |    3966.8 |  26.8% |    1021.3 | -74.3% |      894.9 | -12.4% |    1103.7 |  23.3% |    6305.4 | 471.3% |
   | 32768 | 16 |    3122.3 |    3480.9 |  11.5% |    1030.6 | -70.4% |      842.5 | -18.3% |    1117.5 |  32.6% |    3663.3 | 227.8% |
   |  bpc  | #T ||      Zip ||     ZipC | % diff || PureJava | % diff || PureJavaC | % diff ||   Native | % diff ||  NativeC | % diff |
   | 65536 |  1 |    3129.3 |    3846.4 |  22.9% |    1050.2 | -72.7% |      804.7 | -23.4% |    1175.7 |  46.1% |    7242.8 | 516.1% |
   | 65536 |  2 |    3235.7 |    4088.4 |  26.4% |    1051.4 | -74.3% |      852.6 | -18.9% |    1049.2 |  23.1% |    7805.4 | 643.9% |
   | 65536 |  4 |    3061.9 |    4777.9 |  56.0% |    1037.6 | -78.3% |      822.7 | -20.7% |    1092.4 |  32.8% |    7706.1 | 605.5% |
   | 65536 |  8 |    3239.2 |    4242.4 |  31.0% |    1016.3 | -76.0% |      821.1 | -19.2% |    1078.5 |  31.4% |    5994.4 | 455.8% |
   | 65536 | 16 |    2949.5 |    3480.7 |  18.0% |     770.1 | -77.9% |      825.6 |   7.2% |    1081.9 |  31.0% |    3349.3 | 209.6% |
   ```
   
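   Here:
   
    * Zip(C) is java.util.zip.CRC32(C)
    * PureJava(C) is the Hadoop pure Java implementation
    * Native(C) is the Hadoop native implementation
    * #T is the number of threads
    * each % diff compares a column with the column immediately to its left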
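   A quick way to confirm how the % diff columns are derived is to recompute a few of them from the first row (bpc 512, one thread). A minimal sketch with a hypothetical `PercentDiff` helper, assuming each % diff compares a column with the one immediately to its left:

   ```java
   // Hypothetical helper, assuming "% diff" = (value / left neighbour - 1) * 100.
   final class PercentDiff {
     static double percentDiff(double value, double leftNeighbour) {
       return (value / leftNeighbour - 1.0) * 100.0;
     }

     public static void main(String[] args) {
       // First row of the Hadoop table (bpc 512, #T 1), throughput in MB/s
       double zip = 1736.2, zipC = 1706.4, pureJava = 875.4, pureJavaC = 855.3, nativeCrc = 937.2;
       System.out.printf("ZipC vs Zip:           %.1f%%%n", percentDiff(zipC, zip));           // -1.7%
       System.out.printf("PureJava vs ZipC:      %.1f%%%n", percentDiff(pureJava, zipC));      // -48.7%
       System.out.printf("PureJavaC vs PureJava: %.1f%%%n", percentDiff(pureJavaC, pureJava)); // -2.3%
       System.out.printf("Native vs PureJavaC:   %.1f%%%n", percentDiff(nativeCrc, pureJavaC)); // 9.6%
     }
   }
   ```

   Recomputing from the rounded table values can differ from the printed % diff in the last digit; for example, NativeC vs Native gives 464.4% rather than the 464.5% shown.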
   
   The numbers in the table show throughput in MB/s, so higher is better. From this data alone, it is easy to conclude that NativeC is the clear winner at every BPC. However, that may not be the case.
   
   In the Hadoop benchmark, the code creates a 64MB byte buffer and calculates the expected checksums. It then benchmarks a "verify checksums" routine, which generates the checksums for the data and compares them with the expected values.
   
   For the native implementation, the code looks like this:
   
   ```java
   // From Hadoop's Crc32PerformanceTest: all chunks of the buffer are
   // verified in a single native call.
   public void verifyChunked(ByteBuffer data, int bytesPerSum,
       ByteBuffer sums, String fileName, long basePos)
           throws ChecksumException {
     NativeCrc32.verifyChunkedSums(bytesPerSum, DataChecksum.Type.CRC32.id,
         sums, data, fileName, basePos);
   }
   ```
   
   That is, it calls NativeCrc32.verifyChunkedSums, which takes the entire data set (64MB) and runs the complete validation in a single native call.
   
   The pure Java and java.util.zip implementations cannot do this. They must loop over the data, calling the Checksum implementation once for each BPC-sized chunk. It's also worth noting that the java.util.zip CRC classes make native calls too.
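   As a rough illustration of that per-chunk pattern, here is a minimal sketch that generates and verifies one 4-byte CRC per chunk with java.util.zip.CRC32, making one update() call per chunk instead of one native call for the whole buffer. The `ChunkedCrc` class and its methods are hypothetical, not Hadoop or Ozone APIs, and the sketch assumes Java 9+ for the `Checksum.update(ByteBuffer)` default method.

   ```java
   import java.nio.ByteBuffer;
   import java.util.Random;
   import java.util.zip.CRC32;
   import java.util.zip.Checksum;

   // Hypothetical sketch: per-chunk CRC generation and verification.
   final class ChunkedCrc {
     // One CRC per bytesPerSum chunk of data[position, limit), stored as 4-byte ints.
     static ByteBuffer computeSums(ByteBuffer data, int bytesPerSum, Checksum csum) {
       ByteBuffer view = data.duplicate(); // leave the caller's position/limit alone
       int start = view.position(), end = view.limit();
       ByteBuffer sums = ByteBuffer.allocate((end - start + bytesPerSum - 1) / bytesPerSum * 4);
       for (int off = start; off < end; off += bytesPerSum) {
         view.limit(Math.min(off + bytesPerSum, end));
         view.position(off);
         csum.reset();
         csum.update(view); // one call per chunk
         sums.putInt((int) csum.getValue());
       }
       sums.flip();
       return sums;
     }

     // Returns the index of the first chunk whose CRC does not match, or -1.
     static int verify(ByteBuffer data, int bytesPerSum, ByteBuffer sums, Checksum csum) {
       ByteBuffer view = data.duplicate();
       int start = view.position(), end = view.limit();
       for (int off = start, chunk = 0; off < end; off += bytesPerSum, chunk++) {
         view.limit(Math.min(off + bytesPerSum, end));
         view.position(off);
         csum.reset();
         csum.update(view);
         if ((int) csum.getValue() != sums.getInt(sums.position() + 4 * chunk)) {
           return chunk;
         }
       }
       return -1;
     }

     public static void main(String[] args) {
       ByteBuffer data = ByteBuffer.allocate(64 * 1024);
       new Random(42).nextBytes(data.array());
       ByteBuffer sums = computeSums(data, 512, new CRC32());
       System.out.println(verify(data, 512, sums, new CRC32())); // -1: all chunks match
       data.put(1000, (byte) (data.get(1000) ^ 1));              // flip one bit in chunk 1
       System.out.println(verify(data, 512, sums, new CRC32())); // 1
     }
   }
   ```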
   
   The Hadoop benchmark does not reflect real-world use. We don't buffer 64MB of data and then calculate or verify all the CRCs in a batch; rather, we stream the data and calculate the CRCs on demand. It is important to test the streaming case to get more realistic results.
   
   A simple loop in a JMH benchmark gives a more realistic test. First, populate a 64MB ByteBuffer with random bytes. Then use the following loop to calculate the checksums for the 64MB at BPC intervals:
   
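   ```java
       // data: 64MB ByteBuffer of random bytes; csum: the checksum implementation under test
       for (int i = 0; i < data.capacity(); i += bytesPerCheckSum) {
         data.position(i);
         // clamp the limit in case the capacity is not a multiple of bytesPerCheckSum
         data.limit(Math.min(i + bytesPerCheckSum, data.capacity()));
         csum.update(data);                  // checksum one BPC-sized chunk
         blackhole.consume(csum.getValue()); // keep JMH from eliminating the work
         csum.reset();                       // start fresh for the next chunk
       }
   ```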
   
   The performance at 512 BPC:
   
   ```
   BPC     Impl            J11-1    J11-2    J8-1     J8-2
   ---------------------------------------------------------
   512     pureCRC32       10.105   9.5      10.346   11.221
   512     pureCRC32C      9.519    9.646    11.111   10.72
   512     hadoopCRC32C    16.817   17.183   19.908   19.83
   512     hadoopCRC32     19.897   19.345   19.089   17.645
   512     zipCRC32        72.795   80.716   59.145   52.792
   512     zipCRC32C       56.321   49.921   n/a      n/a
   512     nativeCRC32     14.316   15.352   15.873   16.697
   512     nativeCRC32C    35.651   29.765   39.491   41.885
   ```
   
   The numbers above are JMH throughput scores, i.e. how many times per second the checksums for the full 64MB of data can be calculated.
   
    * pureCRC* - Ozone implementations in pure Java.
    * hadoopCRC32* - Hadoop implementations in pure Java.
    * zip* - java.util.zip implementations. Note that java.util.zip.CRC32C is only available in Java 9 and later.
    * native* - Hadoop native implementations.
   
   I ran each benchmark twice on Java 11 (J11-1, J11-2) and twice on Java 8 (J8-1, J8-2).
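   Assuming each JMH operation checksums the full 64MB buffer exactly once, as in the loop described earlier, multiplying a score by 64 gives an approximate throughput in MB/s. That allows only a rough comparison with the MB/s figures in the Hadoop table, since the two harnesses measure different things. A minimal sketch with a hypothetical `JmhThroughput` helper, using a few Java 11 (J11-1) values from the 512 BPC results:

   ```java
   // Hypothetical helper: converts a JMH score (operations per second, one
   // operation = checksumming the whole 64MB buffer) into MB/s.
   final class JmhThroughput {
     static final double BUFFER_MB = 64.0;

     static double toMBps(double opsPerSecond) {
       return opsPerSecond * BUFFER_MB;
     }

     public static void main(String[] args) {
       System.out.printf("zipCRC32     %.0f MB/s%n", toMBps(72.795)); // 4659 MB/s
       System.out.printf("nativeCRC32C %.0f MB/s%n", toMBps(35.651)); // 2282 MB/s
       System.out.printf("pureCRC32    %.0f MB/s%n", toMBps(10.105)); // 647 MB/s
     }
   }
   ```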
   
   pureCRC32(C), as used in Ozone, is the slowest.
   
   The Hadoop pure Java implementation is significantly faster, but still not great.
   
   java.util.zip is the best, beating the Hadoop native implementation by quite a margin.
   
   Also notable, and reproducible in all test runs: java.util.zip.CRC32 is significantly improved in Java 11 over Java 8.
   
   If we also test the Hadoop native implementation calculating all checksums in a single call, as the Hadoop benchmark did (marked with a B suffix below), we can see that for CRC32C it is the fastest, as the earlier Hadoop test showed:
   
   ```
   BPC     Impl            J11-1    J11-2
   ---------------------------------------
   512     nativeCRC32B    22.977   23.343
   512     nativeCRC32CB   108.674  102.923
   ```
   
   I don't have an explanation for why nativeCRC32CB is so much faster than nativeCRC32B, but the difference is consistent.
   
   Moving on to a higher BPC:
   
   ```
   BPC     Impl            J11-1    J11-2    J8-1     J8-2
   ---------------------------------------------------------
   4096    pureCRC32       10.334   9.607    11.694   11.682
   4096    pureCRC32C      10.365   9.212    11.771   11.818
   4096    hadoopCRC32C    17.076   17.235   19.934   20.519
   4096    hadoopCRC32     18.789   21.042   18.243   16.353
   4096    zipCRC32        100.413  120.215  88.079   109.794
   4096    zipCRC32C       108.522  129.197  n/a      n/a
   4096    nativeCRC32     21.318   21.508   22.177   20.481
   4096    nativeCRC32C    77.365   87.459   90.591   89.689
   4096    nativeCRC32B    22.651   23.884   n/a      n/a
   4096    nativeCRC32CB   191.301  175.54   n/a      n/a
   ```
   
   The pure Java implementations have not benefited at all from the higher BPC. The zip implementations are significantly faster and still the best of the streaming implementations. The Hadoop native implementations have improved too. There does appear to be something wrong with nativeCRC32, as it lags nativeCRC32C by a large margin.
   
   ```
   BPC     Impl            J11-1    J11-2    J8-1     J8-2
   ---------------------------------------------------------
   32768   pureCRC32       11.278   11.837   12.284   11.557
   32768   pureCRC32C      10.875   11.794   12.006   11.893
   32768   hadoopCRC32C    16.477   15.856   19.722   20.599
   32768   hadoopCRC32     18.444   20.055   17.601   18.992
   32768   zipCRC32        127.591  114.87   104.169  117.778
   32768   zipCRC32C       100.77   126.446  n/a      n/a
   32768   nativeCRC32     23.488   23.934   22.74    23.594
   32768   nativeCRC32C    106.726  104.538  106.031  105.871
   32768   nativeCRC32B    20.225   23.161   n/a      n/a
   32768   nativeCRC32CB   167.656  202.245  n/a      n/a

   BPC     Impl            J11-1    J11-2    J8-1     J8-2
   ---------------------------------------------------------
   1048576 pureCRC32       11.469   11.03    11.673   11.27
   1048576 pureCRC32C      11.111   10.98    11.955   11.395
   1048576 hadoopCRC32C    15.926   16.686   17.126   18.884
   1048576 hadoopCRC32     21.064   20.656   19.65    19.343
   1048576 zipCRC32        118.338  116.067  113.645  111.888
   1048576 zipCRC32C       117.705  131.284  n/a      n/a
   1048576 nativeCRC32     21.727   23.414   22.14    22.923
   1048576 nativeCRC32C    108.098  109.05   107.373  91.435
   1048576 nativeCRC32B    21.134   23.279   n/a      n/a
   1048576 nativeCRC32CB   108.972  100.259  n/a      n/a
   ```
   
   The numbers have more variance at the higher BPC, but the trend remains.
   
   ## Conclusion:
   
    * For real-world streaming CRC calculation, the java.util.zip implementations are the fastest on Java 11.
    * On Java 8, where java.util.zip.CRC32C is unavailable, Hadoop native CRC32C performance is close to, or slightly better than, java.util.zip.CRC32 at higher BPC.
    * Hadoop native CRC32 is much slower than Hadoop native CRC32C. Hadoop uses CRC32C by default, but there appears to be a problem with the native CRC32 path.
   
   ## Recommendation:
   
    * Switch Ozone to use java.util.zip.CRC32 by default.
    * Switch the non-default CRC32C implementation in Ozone to the Hadoop pure Java implementation, but use java.util.zip.CRC32C if available.
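   A minimal sketch of how the second recommendation could look, assuming a hypothetical `ChecksumFactory` class (not Ozone's actual code). It loads java.util.zip.CRC32C reflectively, so the class still compiles and runs on Java 8, and otherwise falls back to a caller-supplied pure Java CRC32C:

   ```java
   import java.lang.reflect.Constructor;
   import java.util.function.Supplier;
   import java.util.zip.CRC32;
   import java.util.zip.Checksum;

   // Hypothetical factory: java.util.zip.CRC32 for CRC32, java.util.zip.CRC32C
   // when the JDK provides it (Java 9+), otherwise a caller-supplied fallback.
   final class ChecksumFactory {
     // null when java.util.zip.CRC32C is missing (Java 8)
     private static final Supplier<Checksum> JDK_CRC32C = findJdkCrc32c();

     private static Supplier<Checksum> findJdkCrc32c() {
       try {
         Constructor<? extends Checksum> ctor = Class.forName("java.util.zip.CRC32C")
             .asSubclass(Checksum.class)
             .getConstructor();
         return () -> {
           try {
             return ctor.newInstance();
           } catch (ReflectiveOperationException e) {
             throw new IllegalStateException("Cannot create CRC32C", e);
           }
         };
       } catch (ReflectiveOperationException e) {
         return null; // class not found: running on Java 8
       }
     }

     static boolean jdkCrc32cAvailable() {
       return JDK_CRC32C != null;
     }

     static Checksum newCrc32() {
       return new CRC32();
     }

     // fallback supplies a pure Java CRC32C for JVMs without java.util.zip.CRC32C
     static Checksum newCrc32c(Supplier<Checksum> fallback) {
       return jdkCrc32cAvailable() ? JDK_CRC32C.get() : fallback.get();
     }
   }
   ```

   With Hadoop on the classpath, a caller might pass Hadoop's pure Java class as the fallback, e.g. `ChecksumFactory.newCrc32c(PureJavaCrc32C::new)` using org.apache.hadoop.util.PureJavaCrc32C.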

