Posted to commits@arrow.apache.org by yi...@apache.org on 2022/08/05 03:00:53 UTC

[arrow] branch master updated: ARROW-17305: [C++] Avoid spending time in popcount in BitmapAnd benchmark (#13794)

This is an automated email from the ASF dual-hosted git repository.

yibocai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 56e6caf07d ARROW-17305: [C++] Avoid spending time in popcount in BitmapAnd benchmark (#13794)
56e6caf07d is described below

commit 56e6caf07d77a4d4c79a20c558c2618efe7de830
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Fri Aug 5 05:00:46 2022 +0200

    ARROW-17305: [C++] Avoid spending time in popcount in BitmapAnd benchmark (#13794)
    
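    The benchmark loop called CountSetBits on the output bitmap after every
    BitmapAnd call, so the popcount was timed along with the AND itself. This
    was artificially limiting the reported performance of BitmapAnd.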
    
    Before:
    ```
    --------------------------------------------------------------------------------------
    Benchmark                            Time             CPU   Iterations UserCounters...
    --------------------------------------------------------------------------------------
    BenchmarkBitmapAnd/32768/0        1708 ns         1708 ns       408579 bytes_per_second=17.8726G/s
    BenchmarkBitmapAnd/131072/0       6968 ns         6965 ns       102223 bytes_per_second=17.5262G/s
    BenchmarkBitmapAnd/32768/1        3982 ns         3981 ns       175136 bytes_per_second=7.66574G/s
    BenchmarkBitmapAnd/131072/1      15574 ns        15569 ns        44988 bytes_per_second=7.8404G/s
    BenchmarkBitmapAnd/32768/2        3999 ns         3998 ns       175021 bytes_per_second=7.63248G/s
    BenchmarkBitmapAnd/131072/2      15589 ns        15585 ns        44844 bytes_per_second=7.83234G/s
    ```
    
    After:
    ```
    --------------------------------------------------------------------------------------
    Benchmark                            Time             CPU   Iterations UserCounters...
    --------------------------------------------------------------------------------------
    BenchmarkBitmapAnd/32768/0         732 ns          732 ns       967465 bytes_per_second=41.6736G/s
    BenchmarkBitmapAnd/131072/0       3105 ns         3105 ns       229726 bytes_per_second=39.3198G/s
    BenchmarkBitmapAnd/32768/1        2913 ns         2913 ns       240233 bytes_per_second=10.4774G/s
    BenchmarkBitmapAnd/131072/1      11528 ns        11526 ns        60865 bytes_per_second=10.5912G/s
    BenchmarkBitmapAnd/32768/2        2924 ns         2924 ns       236873 bytes_per_second=10.4378G/s
    BenchmarkBitmapAnd/131072/2      11552 ns        11550 ns        60619 bytes_per_second=10.5691G/s
    ```
    
    (I didn't check, but the compiler here probably auto-vectorizes the aligned
    code path, which would explain why the offset-0 cases gain the most.)
    
    Authored-by: Antoine Pitrou <an...@python.org>
    Signed-off-by: Yibo Cai <yi...@arm.com>
---
 cpp/src/arrow/util/bit_util_benchmark.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/cpp/src/arrow/util/bit_util_benchmark.cc b/cpp/src/arrow/util/bit_util_benchmark.cc
index 8e95d01462..3bcb4ceea6 100644
--- a/cpp/src/arrow/util/bit_util_benchmark.cc
+++ b/cpp/src/arrow/util/bit_util_benchmark.cc
@@ -150,9 +150,7 @@ static void BenchmarkAndImpl(benchmark::State& state, DoAnd&& do_and) {
 
   for (auto _ : state) {
     do_and({bitmap_1, bitmap_2}, &bitmap_3);
-    auto total =
-        internal::CountSetBits(bitmap_3.data(), bitmap_3.offset(), bitmap_3.length());
-    benchmark::DoNotOptimize(total);
+    benchmark::ClobberMemory();
   }
   state.SetBytesProcessed(state.iterations() * nbytes);
 }
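
The change above can be illustrated outside of Arrow and Google Benchmark with a minimal sketch. Every name here (`AndBytes`, `CountSetBitsNaive`, `EscapeAndClobber`, `TimeAnd`) is hypothetical and is not part of the Arrow code base. The barrier is only an approximation of what `benchmark::ClobberMemory()` is meant to achieve, written under the assumption of a GCC or Clang toolchain, with a portable fallback that may be weaker. The point is that a consumer which reads the whole output costs time proportional to the buffer size, while a compiler barrier keeps the stores alive without a second pass over the data.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the operation under test: byte-wise AND.
void AndBytes(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b,
              std::vector<uint8_t>* out) {
  assert(a.size() == b.size());
  out->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    (*out)[i] = static_cast<uint8_t>(a[i] & b[i]);
  }
}

// Hypothetical stand-in for the popcount pass that the old loop timed.
int64_t CountSetBitsNaive(const std::vector<uint8_t>& buf) {
  int64_t total = 0;
  for (uint8_t byte : buf) {
    while (byte != 0) {
      total += byte & 1;
      byte >>= 1;
    }
  }
  return total;
}

// Assumption: approximates a memory clobber. The pointer is made visible to
// the compiler barrier so that stores through it cannot be discarded.
inline void EscapeAndClobber(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" : : "g"(p) : "memory");
#else
  static const void* volatile escaped = nullptr;
  escaped = p;
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Returns the average nanoseconds per iteration. With `with_popcount` set,
// each iteration also pays for a full pass over the output, as the old
// benchmark loop did.
double TimeAnd(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b,
               int iters, bool with_popcount) {
  std::vector<uint8_t> out;
  volatile int64_t sink = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    AndBytes(a, b, &out);
    if (with_popcount) {
      sink = CountSetBitsNaive(out);  // extra O(n) work inside the timed loop
    } else {
      EscapeAndClobber(out.data());   // no per-byte cost
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}
```

Calling `TimeAnd(a, b, iters, true)` and `TimeAnd(a, b, iters, false)` on the same inputs gives a rough idea of how much of the measured time belongs to the consumer rather than to the AND. The absolute numbers depend on the compiler, flags, and hardware, so they should not be compared with the figures in the commit message.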