Posted to dev@geode.apache.org by Jacob Barrett <jb...@pivotal.io> on 2018/05/08 03:46:47 UTC

Geode Native Benchmarking

I got sucked into Google Benchmark [1], a C++ benchmark framework, and
decided to put together a POC benchmark project for Geode Native based on
it. You can find it in apache/geode-native PR 293 [2]. There are two new
projects, cpp-benchmark and cpp-integration-benchmark.

The cpp-benchmark project is for unit or micro benchmarks that, like unit
tests, do not require a Geode server.

The cpp-integration-benchmark project is then obviously for integration
style benchmarks that require a Geode server. It shares the new
single-process integration framework classes with the new integration
tests. The integration tests, benchmarks and framework have been refactored
and moved to make this relationship clear.

Please consider giving this some review and your blessing if you think
there is benefit in integrating it into the trunk.

Skip this part if you aren't interested in some benchmark results from each
platform.

As warned, here are some quick benchmarks from the sample included in this
PR, which measures the time spent computing a Java String compatible hash
code of a C++ std::string in UTF-8 vs. a std::u16string in UTF-16. Since the
Java String hash code is computed over the individual UTF-16 code units
encapsulated in a Java String, we must convert the UTF-8 std::string to
UTF-16 before hashing it. We have not yet optimized this to calculate the
hash inline with the conversion; instead we take the brute force approach
of converting to UTF-16 and then hashing the UTF-16. This benchmark shows
that optimizing this method could bring the UTF-8 hash time down to
somewhere between the current UTF-8 and UTF-16 benchmark values. The
benchmark name is formatted as [fixture]/[test]/[variant]; for example,
GeodeHashBM/std_string/8 is the GeodeHashBM fixture (think test fixture),
the std_string test, with 8 characters. This is achieved by defining the
test like this:

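#include <benchmark/benchmark.h>
#include <string>

// The GeodeHashBM fixture and geode_hash come from the PR's sources.
BENCHMARK_DEFINE_F(GeodeHashBM, std_string)(benchmark::State& state) {
  // Build a string of state.range(0) 'x' characters for this variant.
  std::string x(state.range(0), 'x');
  for (auto _ : state) {
    // Only this loop body is timed; DoNotOptimize keeps the compiler
    // from discarding the otherwise unused hash result.
    int hashcode;
    benchmark::DoNotOptimize(hashcode = geode_hash<std::string>{}(x));
  }
}
// Register variants with lengths from 8 up to 8 << 10 (8192) characters.
BENCHMARK_REGISTER_F(GeodeHashBM, std_string)->Range(8, 8 << 10);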
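
The variant suffixes in the results come from that Range(8, 8 << 10) call.
As I understand Google Benchmark's default behavior (range multiplier 8),
it registers the low bound, every power of the multiplier strictly between
the bounds, and the high bound. The standalone sketch below mimics that
expansion and the [fixture]/[test]/[variant] naming; expandRange and
benchmarkNames are hypothetical helpers, not part of Google Benchmark's API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper mimicking how Range(lo, hi) expands with the default
// multiplier of 8: lo, each power of mult strictly between lo and hi, then hi.
std::vector<std::int64_t> expandRange(std::int64_t lo, std::int64_t hi,
                                      std::int64_t mult = 8) {
  std::vector<std::int64_t> values{lo};
  for (std::int64_t i = 1; i < hi; i *= mult) {
    if (i > lo) values.push_back(i);
  }
  if (hi != lo) values.push_back(hi);
  return values;
}

// Hypothetical helper building "[fixture]/[test]/[variant]" names.
std::vector<std::string> benchmarkNames(const std::string& fixture,
                                        const std::string& test,
                                        std::int64_t lo, std::int64_t hi) {
  std::vector<std::string> names;
  for (std::int64_t v : expandRange(lo, hi)) {
    names.push_back(fixture + "/" + test + "/" + std::to_string(v));
  }
  return names;
}
```

For Range(8, 8 << 10) this yields the five variants 8, 64, 512, 4096 and
8192 seen in every table below.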

Please read the details of Google Benchmark to understand how it does what
it does, or accept that it does some magic to warm up the code and then
runs the bit of code inside the range-based for loop in batches of many
iterations (the Iterations column in the results below) to calculate a mean
time for each individual execution of that block.
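
To make the "mean time per execution" idea concrete, here is a deliberately
simplified sketch using only std::chrono: it repeats a block with a growing
iteration count until one batch takes at least a minimum time, then divides
the elapsed time by the iteration count. This is not how Google Benchmark is
implemented (among other things it scales the next iteration count from the
previous run's timing and reports CPU time alongside wall time);
meanNanosPerIteration and its parameters are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

struct TimingResult {
  std::int64_t iterations;  // how many times the block ran in the final batch
  double meanNanos;         // elapsed wall time divided by iterations
};

// Hypothetical, simplified timing loop: grow the batch size by 10x until one
// batch takes at least minTime (or a safety cap is hit), then report the
// per-iteration mean of that final batch.
TimingResult meanNanosPerIteration(
    const std::function<void()>& block,
    std::chrono::nanoseconds minTime = std::chrono::milliseconds(100)) {
  using Clock = std::chrono::steady_clock;
  std::int64_t iterations = 1;
  while (true) {
    auto start = Clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
      block();
    }
    auto elapsed = Clock::now() - start;
    if (elapsed >= minTime || iterations >= 1000000000) {
      auto nanos =
          std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
      return {iterations, static_cast<double>(nanos.count()) / iterations};
    }
    iterations *= 10;
  }
}
```

Fast blocks end up with large iteration counts and slow blocks with small
ones, which is the same pattern visible in the Iterations column below.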

The results are...

*Windows - AWS c5.2xlarge*
------------------------
Run on (8 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 1048K (x4)
  L3 Unified 25952K (x1)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8            28353 ns      28250 ns      24889
GeodeHashBM/std_string/64           27747 ns      27623 ns      24889
GeodeHashBM/std_string/512          31941 ns      32087 ns      22400
GeodeHashBM/std_string/4096         62409 ns      61384 ns      11200
GeodeHashBM/std_string/8192         98233 ns      97656 ns       6400
GeodeHashBM/std_u16string/8             7 ns          7 ns   89600000
GeodeHashBM/std_u16string/64           59 ns         59 ns   11200000
GeodeHashBM/std_u16string/512         593 ns        600 ns    1120000
GeodeHashBM/std_u16string/4096       4816 ns       4757 ns     144516
GeodeHashBM/std_u16string/8192       9641 ns       9835 ns      74667

We can see here that Windows has really poor performance converting from
UTF-8 to UTF-16 compared to the other results below: 28353 ns for an
8-character std::string versus 116 ns on Linux on the same instance type.
This is likely due to bugs in the C++ runtime Microsoft ships, which are
supposed to be addressed in the 2019 release.
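
The inline optimization mentioned earlier amounts to folding each UTF-16
code unit into the hash as it is decoded, so no intermediate std::u16string
is built and the runtime's converter is never called. The sketch below
assumes the Java String.hashCode recurrence h = 31 * h + c over UTF-16 code
units with 32-bit wraparound, and assumes well-formed UTF-8 input (no
validation). javaHash and javaHashUtf8 are hypothetical names, not the
geode_hash from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Java String.hashCode(): h = 31 * h + c over UTF-16 code units, with 32-bit
// wraparound (computed in uint32_t to avoid signed overflow).
std::int32_t javaHash(const std::u16string& s) {
  std::uint32_t h = 0;
  for (char16_t c : s) {
    h = 31 * h + c;
  }
  return static_cast<std::int32_t>(h);
}

// Fused version: decode UTF-8 and hash each resulting UTF-16 code unit
// immediately. Assumes well-formed UTF-8; no validation is performed.
std::int32_t javaHashUtf8(const std::string& s) {
  std::uint32_t h = 0;
  auto fold = [&h](std::uint32_t unit) { h = 31 * h + unit; };
  for (std::size_t i = 0; i < s.size();) {
    auto b0 = static_cast<unsigned char>(s[i]);
    std::uint32_t cp;
    std::size_t len;
    if (b0 < 0x80) {
      cp = b0; len = 1;
    } else if ((b0 & 0xE0) == 0xC0) {
      cp = b0 & 0x1F; len = 2;
    } else if ((b0 & 0xF0) == 0xE0) {
      cp = b0 & 0x0F; len = 3;
    } else {
      cp = b0 & 0x07; len = 4;
    }
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }
    i += len;
    if (cp < 0x10000) {
      fold(cp);  // BMP character: a single UTF-16 code unit
    } else {
      cp -= 0x10000;  // supplementary character: a surrogate pair
      fold(0xD800 + (cp >> 10));
      fold(0xDC00 + (cp & 0x3FF));
    }
  }
  return static_cast<std::int32_t>(h);
}
```

For example, both functions return 99162322 for "hello", matching Java's
"hello".hashCode(), and agree on strings containing multi-byte and
supplementary characters.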


*Linux - AWS c5.2xlarge*
----------------------
Run on (8 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 1024K (x4)
  L3 Unified 25344K (x1)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8              116 ns        116 ns    6042733
GeodeHashBM/std_string/64             488 ns        488 ns    1434877
GeodeHashBM/std_string/512           2612 ns       2612 ns     268202
GeodeHashBM/std_string/4096         19639 ns      19639 ns      35640
GeodeHashBM/std_string/8192         39025 ns      39025 ns      17930
GeodeHashBM/std_u16string/8             5 ns          5 ns  133394926
GeodeHashBM/std_u16string/64           47 ns         47 ns   14862333
GeodeHashBM/std_u16string/512         449 ns        449 ns    1558067
GeodeHashBM/std_u16string/4096       3628 ns       3628 ns     192958
GeodeHashBM/std_u16string/8192       7260 ns       7261 ns      96415


*2013 MacBook Pro*
-------------------
Run on (8 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 6291K (x1)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8              251 ns        250 ns    2941424
GeodeHashBM/std_string/64             420 ns        420 ns    1528121
GeodeHashBM/std_string/512           1618 ns       1618 ns     423334
GeodeHashBM/std_string/4096         11548 ns      11546 ns      59040
GeodeHashBM/std_string/8192         23448 ns      23433 ns      29394
GeodeHashBM/std_u16string/8             5 ns          5 ns  100000000
GeodeHashBM/std_u16string/64           50 ns         50 ns   13873749
GeodeHashBM/std_u16string/512         430 ns        430 ns    1639302
GeodeHashBM/std_u16string/4096       3441 ns       3441 ns     198727
GeodeHashBM/std_u16string/8192       6856 ns       6854 ns      92481


*Solaris x86*
-----------
Run on (40 X 1200 MHz CPU s)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8              276 ns        276 ns    2651445
GeodeHashBM/std_string/64             622 ns        622 ns    1126144
GeodeHashBM/std_string/512           3140 ns       3139 ns     222032
GeodeHashBM/std_string/4096         23368 ns      23352 ns      30304
GeodeHashBM/std_string/8192         46586 ns      46559 ns      15309
GeodeHashBM/std_u16string/8             9 ns          9 ns   77951870
GeodeHashBM/std_u16string/64           54 ns         54 ns   12807846
GeodeHashBM/std_u16string/512         467 ns        467 ns    1531799
GeodeHashBM/std_u16string/4096       3764 ns       3763 ns     183405
GeodeHashBM/std_u16string/8192       7407 ns       7405 ns      95226


*Solaris SPARC*
-------------
Run on (128 X 4267 MHz CPU s)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8              771 ns        771 ns     844136
GeodeHashBM/std_string/64            1446 ns       1446 ns     492403
GeodeHashBM/std_string/512           6058 ns       6058 ns     116836
GeodeHashBM/std_string/4096         43661 ns      43656 ns      16377
GeodeHashBM/std_string/8192         85366 ns      85367 ns       8299
GeodeHashBM/std_u16string/8             8 ns          8 ns   86601509
GeodeHashBM/std_u16string/64           47 ns         47 ns   14928524
GeodeHashBM/std_u16string/512         374 ns        374 ns    1892485
GeodeHashBM/std_u16string/4096       2950 ns       2950 ns     242241
GeodeHashBM/std_u16string/8192       5837 ns       5837 ns     121206

Again we see that Solaris, especially SPARC, has rather poor UTF-8 to
UTF-16 conversion performance in its C++ runtime.
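
One rough way to quantify the conversion cost is to subtract the
std_u16string time (hash only) from the std_string time (convert plus hash)
at 8192 characters and divide by the length. The sketch below does that
arithmetic on the wall-clock Time column from the runs above; it assumes the
whole difference is conversion overhead, which is only an approximation.
PlatformRun and conversionNsPerChar are hypothetical names.

```cpp
#include <string>
#include <vector>

// Wall-clock Time column at 8192 characters, copied from the runs above.
struct PlatformRun {
  std::string platform;
  double stdStringNs;     // GeodeHashBM/std_string/8192 (convert + hash)
  double stdU16StringNs;  // GeodeHashBM/std_u16string/8192 (hash only)
};

// Approximate UTF-8 -> UTF-16 conversion cost per character, assuming the
// difference between the two benchmarks is entirely conversion.
double conversionNsPerChar(const PlatformRun& run, double chars = 8192) {
  return (run.stdStringNs - run.stdU16StringNs) / chars;
}

const std::vector<PlatformRun> kRuns = {
    {"Windows", 98233, 9641},       {"Linux", 39025, 7260},
    {"MacBook Pro", 23448, 6856},   {"Solaris x86", 46586, 7405},
    {"Solaris SPARC", 85366, 5837},
};
```

By this rough measure the overhead is about 10.8 ns per character on
Windows, 3.9 on Linux, 2.0 on the MacBook Pro, 4.8 on Solaris x86 and 9.7
on SPARC. On Windows the 8-character std::string case already takes about
28 microseconds, so much of its cost there appears to be per call rather
than per character.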


And just in case you ever doubted that a debug build and release build
could be that different...

*Windows - AWS c5.2xlarge*
------------------------
Run on (8 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 1048K (x4)
  L3 Unified 25952K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
GeodeHashBM/std_string/8           142896 ns     141246 ns       4978
GeodeHashBM/std_string/64          160902 ns     161122 ns       4073
GeodeHashBM/std_string/512         258949 ns     260911 ns       2635
GeodeHashBM/std_string/4096       1034958 ns    1045850 ns        747
GeodeHashBM/std_string/8192       1941581 ns    1902174 ns        345
GeodeHashBM/std_u16string/8          1533 ns       1535 ns     448000
GeodeHashBM/std_u16string/64         8286 ns       8371 ns      89600
GeodeHashBM/std_u16string/512       62726 ns      62779 ns      11200
GeodeHashBM/std_u16string/4096     498803 ns     488281 ns       1120
GeodeHashBM/std_u16string/8192     997531 ns     976563 ns        640

Compare this debug run's 1535 ns of CPU time to the release run's 7 ns for
a UTF-16 string of 8 characters, roughly 200 times slower. Ouch!
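
As far as I know, Google Benchmark prints that warning based on how the
benchmark library itself was built. If you also want a benchmark binary to
flag its own debug build, checking the standard NDEBUG macro is one simple
option, assuming release builds define NDEBUG (as CMake's default Release
flags do). This is a hypothetical sketch; buildTypeWarning is not part of
the PR or of Google Benchmark.

```cpp
#include <string>

// Hypothetical helper: returns a warning when this translation unit was
// compiled without NDEBUG (assertions enabled, typical of a debug build),
// and an empty string otherwise.
std::string buildTypeWarning() {
#ifdef NDEBUG
  return "";
#else
  return "WARNING: benchmark binary built without NDEBUG; "
         "timings may not reflect a release build.";
#endif
}
```

A benchmark main could print this before running so that debug numbers
like the ones above are never mistaken for release results.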


-Jake


[1] https://github.com/google/benchmark
[2] https://github.com/apache/geode-native/pull/293