Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/27 04:07:51 UTC

[GitHub] [arrow] wesm edited a comment on pull request #7267: ARROW-8772: [C++] Unrolled aggregate dense for better speculative execution

wesm edited a comment on pull request #7267:
URL: https://github.com/apache/arrow/pull/7267#issuecomment-633600133


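   This shows a small benefit on ARM64 (ThunderX1), mostly in the no-nulls case: Int64 at null_percent=0 goes from 19.5 us to 18.0 us, while the other null percentages are essentially unchanged: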
   
   before:
   
   ```
   -----------------------------------------------------------------------------
   Benchmark                   Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------
   SumKernel/32768/0        19.5 us         19.5 us        35883 bytes_per_second=1.56853G/s null_percent=0 size=32.768k
   SumKernel/32768/1        19.2 us         19.2 us        36172 bytes_per_second=1.58548G/s null_percent=1 size=32.768k
   SumKernel/32768/10       21.7 us         21.7 us        32230 bytes_per_second=1.40723G/s null_percent=10 size=32.768k
   SumKernel/32768/50       22.4 us         22.4 us        31172 bytes_per_second=1.36246G/s null_percent=50 size=32.768k
   ```
   
   after (see the Int64 benchmarks):
   
   ```
   -----------------------------------------------------------------------------------
   Benchmark                         Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------
   SumKernelFloat/32768/0         25.4 us         25.4 us        27590 bytes_per_second=1.20019G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         41.2 us         41.2 us        16973 bytes_per_second=758.715M/s null_percent=1 size=32.768k
   SumKernelFloat/32768/10        49.0 us         49.0 us        14285 bytes_per_second=638.254M/s null_percent=10 size=32.768k
   SumKernelFloat/32768/50        51.5 us         51.5 us        13625 bytes_per_second=606.657M/s null_percent=50 size=32.768k
   SumKernelDouble/32768/0        18.5 us         18.5 us        37953 bytes_per_second=1.65038G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        26.2 us         26.2 us        26852 bytes_per_second=1.16631G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/10       27.3 us         27.3 us        25685 bytes_per_second=1.11778G/s null_percent=10 size=32.768k
   SumKernelDouble/32768/50       27.7 us         27.7 us        25298 bytes_per_second=1.10219G/s null_percent=50 size=32.768k
   SumKernelInt8/32768/0          64.4 us         64.4 us        10857 bytes_per_second=485.21M/s null_percent=0 size=32.768k
   SumKernelInt8/32768/1          69.8 us         69.8 us        10025 bytes_per_second=448.008M/s null_percent=1 size=32.768k
   SumKernelInt8/32768/10         92.7 us         92.7 us         7554 bytes_per_second=337.084M/s null_percent=10 size=32.768k
   SumKernelInt8/32768/50          100 us          100 us         6974 bytes_per_second=311.424M/s null_percent=50 size=32.768k
   SumKernelInt16/32768/0         41.0 us         41.0 us        17076 bytes_per_second=762.469M/s null_percent=0 size=32.768k
   SumKernelInt16/32768/1         42.5 us         42.5 us        16513 bytes_per_second=735.553M/s null_percent=1 size=32.768k
   SumKernelInt16/32768/10        53.7 us         53.7 us        12977 bytes_per_second=581.673M/s null_percent=10 size=32.768k
   SumKernelInt16/32768/50        57.7 us         57.7 us        12103 bytes_per_second=541.264M/s null_percent=50 size=32.768k
   SumKernelInt32/32768/0         26.0 us         26.0 us        26859 bytes_per_second=1.17154G/s null_percent=0 size=32.768k
   SumKernelInt32/32768/1         26.5 us         26.5 us        26555 bytes_per_second=1.15362G/s null_percent=1 size=32.768k
   SumKernelInt32/32768/10        32.3 us         32.3 us        21653 bytes_per_second=967.29M/s null_percent=10 size=32.768k
   SumKernelInt32/32768/50        34.2 us         34.2 us        20513 bytes_per_second=914.616M/s null_percent=50 size=32.768k
   SumKernelInt64/32768/0         18.0 us         18.0 us        39059 bytes_per_second=1.69949G/s null_percent=0 size=32.768k
   SumKernelInt64/32768/1         19.3 us         19.3 us        36220 bytes_per_second=1.58067G/s null_percent=1 size=32.768k
   SumKernelInt64/32768/10        21.8 us         21.8 us        32171 bytes_per_second=1.40112G/s null_percent=10 size=32.768k
   SumKernelInt64/32768/50        22.5 us         22.5 us        31062 bytes_per_second=1.35471G/s null_percent=50 size=32.768k
   ```
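
To put numbers on the difference, here is a minimal sketch that computes the relative change in CPU time per null percentage. The timings are copied from the tables above; it assumes the "before" `SumKernel` rows are directly comparable to the "after" `SumKernelInt64` rows, and the names `before_us`, `after_us`, and `percent_change` are made up for this illustration.

```python
# CPU times in microseconds, copied from the two benchmark tables above.
# Keys are the null_percent argument of each benchmark.
# Assumption: the "before" SumKernel rows measure the same Int64 case.
before_us = {0: 19.5, 1: 19.2, 10: 21.7, 50: 22.4}  # SumKernel/32768/*
after_us = {0: 18.0, 1: 19.3, 10: 21.8, 50: 22.5}   # SumKernelInt64/32768/*


def percent_change(before, after):
    """Relative change in time; negative means the 'after' run is faster."""
    return (after - before) / before * 100.0


changes = {p: percent_change(before_us[p], after_us[p]) for p in before_us}

for null_percent, change in changes.items():
    print(f"null_percent={null_percent:>2}: {change:+.1f}%")
```

On these figures the no-nulls case is roughly 8% faster, and the other three cases differ by about half a percent, which is likely within run-to-run noise for a single benchmark run.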

