Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 13:23:45 UTC

[GitHub] [arrow] lidavidm commented on pull request #10813: ARROW-13451: [C++] WIP: POC: benchmark using hash aggregate kernels for scalar aggregation

lidavidm commented on pull request #10813:
URL: https://github.com/apache/arrow/pull/10813#issuecomment-887509020


   This is just to see how the hash aggregate kernels perform compared to the dedicated scalar aggregation kernels when there is only one group.
   
   Unfortunately, it's rather terrible. For count:
   
   <details>
   
   ```
   ---------------------------------------------------------------------------------------------------
   Benchmark                                         Time             CPU   Iterations UserCounters...
   ---------------------------------------------------------------------------------------------------
   CountKernelBenchInt64/1048576/2                3067 ns         3067 ns       457331 bytes_per_second=318.4G/s null_percent=50 size=1048.58k
   CountKernelBenchInt64Aggregate/1048576/2     412162 ns       412138 ns         3238 bytes_per_second=2.3695G/s null_percent=50 size=1048.58k
   ```
   
   </details>
   
   At roughly 134x slower (about two orders of magnitude), the hash aggregate kernel isn't anywhere near the dedicated scalar one. The scalar kernel essentially just calls CountSetBits, while the hash aggregate kernel must use VisitSetBitRuns and index into a length-1 array of counts. Also, a good amount of time (~10% of the runtime according to perf) is spent just allocating and filling the array of group IDs up front.
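
To make the difference concrete, here is a minimal standalone sketch of the two strategies. It does not use the Arrow API: `Bitmap`, `CountScalarStyle`, and `CountGroupedStyle` are hypothetical names, and the grouped version walks rows one at a time instead of visiting runs of set bits, so it only illustrates the shape of the extra work (materializing group IDs, then indexing into a length-1 counts array).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a validity bitmap: bit i set means row i is non-null.
using Bitmap = std::vector<std::uint64_t>;

// Scalar-style count: popcount whole 64-bit words, then mask the partial tail.
std::int64_t CountScalarStyle(const Bitmap& bitmap, std::size_t length) {
  assert(bitmap.size() * 64 >= length);
  std::int64_t count = 0;
  const std::size_t full_words = length / 64;
  for (std::size_t w = 0; w < full_words; ++w) {
    count += static_cast<std::int64_t>(std::bitset<64>(bitmap[w]).count());
  }
  const std::size_t tail_bits = length % 64;
  if (tail_bits != 0) {
    const std::uint64_t mask = (std::uint64_t{1} << tail_bits) - 1;
    count += static_cast<std::int64_t>(
        std::bitset<64>(bitmap[full_words] & mask).count());
  }
  return count;
}

// Grouped-style count with every row assigned to group 0 (assumption: this
// mirrors the single-group case being benchmarked, not Arrow's actual code).
std::vector<std::int64_t> CountGroupedStyle(const Bitmap& bitmap,
                                            std::size_t length,
                                            std::size_t num_groups = 1) {
  assert(bitmap.size() * 64 >= length);
  assert(num_groups >= 1);
  // Extra work 1: allocate and fill one group ID per row.
  std::vector<std::uint32_t> group_ids(length, 0);
  std::vector<std::int64_t> counts(num_groups, 0);
  for (std::size_t i = 0; i < length; ++i) {
    const bool valid = (bitmap[i / 64] >> (i % 64)) & 1u;
    // Extra work 2: an indexed update per valid row instead of a popcount.
    if (valid) counts[group_ids[i]] += 1;
  }
  return counts;
}
```

Both functions return the same count for one group; the grouped one simply touches far more memory per row to get there.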
   
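   For min_max the story is not so clear. The hash aggregate kernel actually wins for floats and doubles (except on all-null input, where the scalar kernel finishes in a few microseconds), but loses for integers: roughly 4-5x slower at low null rates, narrowing to about 1.3x at 50% nulls. That is bad, though not as bad as with count.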
   
   <details>
   
   ```
   ----------------------------------------------------------------------------------------------------
   Benchmark                                          Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------------------
   MinMaxKernelFloat/1048576/10000                  981 us          981 us          638 bytes_per_second=1018.91M/s null_percent=0.01 size=1048.58k
   MinMaxKernelFloat/1048576/100                   1008 us         1008 us          723 bytes_per_second=992.523M/s null_percent=1 size=1048.58k
   MinMaxKernelFloat/1048576/10                    1062 us         1062 us          561 bytes_per_second=941.703M/s null_percent=10 size=1048.58k
   MinMaxKernelFloat/1048576/2                     1424 us         1424 us          456 bytes_per_second=702.401M/s null_percent=50 size=1048.58k
   MinMaxKernelFloat/1048576/1                     6.92 us         6.92 us       105816 bytes_per_second=141.155G/s null_percent=100 size=1048.58k
   MinMaxKernelFloat/1048576/0                      900 us          900 us          815 bytes_per_second=1111.18M/s null_percent=0 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/10000         667 us          667 us         1103 bytes_per_second=1.46325G/s null_percent=0.01 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/100           654 us          654 us          924 bytes_per_second=1.49389G/s null_percent=1 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/10            765 us          765 us          965 bytes_per_second=1.27599G/s null_percent=10 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/2            1267 us         1267 us          585 bytes_per_second=789.267M/s null_percent=50 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/1             421 us          421 us         1693 bytes_per_second=2.32129G/s null_percent=100 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/0             668 us          668 us         1107 bytes_per_second=1.46147G/s null_percent=0 size=1048.58k
   MinMaxKernelDouble/1048576/10000                 420 us          420 us         1712 bytes_per_second=2.32776G/s null_percent=0.01 size=1048.58k
   MinMaxKernelDouble/1048576/100                   465 us          465 us         1412 bytes_per_second=2.10164G/s null_percent=1 size=1048.58k
   MinMaxKernelDouble/1048576/10                    592 us          592 us         1168 bytes_per_second=1.64947G/s null_percent=10 size=1048.58k
   MinMaxKernelDouble/1048576/2                     730 us          730 us         1008 bytes_per_second=1.33826G/s null_percent=50 size=1048.58k
   MinMaxKernelDouble/1048576/1                    4.10 us         4.10 us       177426 bytes_per_second=238.21G/s null_percent=100 size=1048.58k
   MinMaxKernelDouble/1048576/0                     540 us          540 us         1000 bytes_per_second=1.80829G/s null_percent=0 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/10000        342 us          342 us         2106 bytes_per_second=2.85799G/s null_percent=0.01 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/100          346 us          346 us         2136 bytes_per_second=2.82629G/s null_percent=1 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/10           385 us          385 us         1911 bytes_per_second=2.53959G/s null_percent=10 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/2            631 us          631 us         1163 bytes_per_second=1.54829G/s null_percent=50 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/1            218 us          218 us         3413 bytes_per_second=4.48758G/s null_percent=100 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/0            334 us          334 us         2193 bytes_per_second=2.92247G/s null_percent=0 size=1048.58k
   MinMaxKernelInt8/1048576/10000                   571 us          571 us         1293 bytes_per_second=1.71088G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8/1048576/100                     986 us          986 us          742 bytes_per_second=1014.35M/s null_percent=1 size=1048.58k
   MinMaxKernelInt8/1048576/10                     1818 us         1818 us          402 bytes_per_second=550.013M/s null_percent=10 size=1048.58k
   MinMaxKernelInt8/1048576/2                      4039 us         4039 us          182 bytes_per_second=247.588M/s null_percent=50 size=1048.58k
   MinMaxKernelInt8/1048576/1                      22.9 us         22.9 us        31922 bytes_per_second=42.701G/s null_percent=100 size=1048.58k
   MinMaxKernelInt8/1048576/0                       546 us          546 us         1368 bytes_per_second=1.78943G/s null_percent=0 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/10000         2241 us         2241 us          325 bytes_per_second=446.241M/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/100           2497 us         2497 us          299 bytes_per_second=400.494M/s null_percent=1 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/10            3144 us         3143 us          237 bytes_per_second=318.129M/s null_percent=10 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/2             5107 us         5107 us          100 bytes_per_second=195.815M/s null_percent=50 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/1             1581 us         1581 us          386 bytes_per_second=632.705M/s null_percent=100 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/0             2093 us         2093 us          274 bytes_per_second=477.747M/s null_percent=0 size=1048.58k
   MinMaxKernelInt16/1048576/10000                  274 us          274 us         2329 bytes_per_second=3.56594G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16/1048576/100                    459 us          459 us         1429 bytes_per_second=2.12904G/s null_percent=1 size=1048.58k
   MinMaxKernelInt16/1048576/10                     842 us          842 us          698 bytes_per_second=1.1593G/s null_percent=10 size=1048.58k
   MinMaxKernelInt16/1048576/2                     1997 us         1997 us          370 bytes_per_second=500.784M/s null_percent=50 size=1048.58k
   MinMaxKernelInt16/1048576/1                     12.1 us         12.1 us        60436 bytes_per_second=80.6769G/s null_percent=100 size=1048.58k
   MinMaxKernelInt16/1048576/0                      271 us          271 us         2713 bytes_per_second=3.60785G/s null_percent=0 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/10000        1226 us         1226 us          615 bytes_per_second=815.951M/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/100          1326 us         1326 us          564 bytes_per_second=753.999M/s null_percent=1 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/10           1608 us         1608 us          462 bytes_per_second=622M/s null_percent=10 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/2            2700 us         2700 us          275 bytes_per_second=370.316M/s null_percent=50 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/1             867 us          866 us          868 bytes_per_second=1.12704G/s null_percent=100 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/0            1190 us         1190 us          620 bytes_per_second=840.448M/s null_percent=0 size=1048.58k
   MinMaxKernelInt32/1048576/10000                  139 us          139 us         4389 bytes_per_second=7.03596G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32/1048576/100                    238 us          238 us         2483 bytes_per_second=4.10169G/s null_percent=1 size=1048.58k
   MinMaxKernelInt32/1048576/10                     515 us          515 us         1000 bytes_per_second=1.89702G/s null_percent=10 size=1048.58k
   MinMaxKernelInt32/1048576/2                     1021 us         1021 us          722 bytes_per_second=979.116M/s null_percent=50 size=1048.58k
   MinMaxKernelInt32/1048576/1                     6.55 us         6.55 us       109640 bytes_per_second=149.127G/s null_percent=100 size=1048.58k
   MinMaxKernelInt32/1048576/0                      132 us          132 us         4723 bytes_per_second=7.4224G/s null_percent=0 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/10000         631 us          631 us         1171 bytes_per_second=1.54789G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/100           703 us          703 us         1096 bytes_per_second=1.38954G/s null_percent=1 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/10            808 us          808 us          911 bytes_per_second=1.20892G/s null_percent=10 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/2            1304 us         1304 us          564 bytes_per_second=766.908M/s null_percent=50 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/1             420 us          420 us         1758 bytes_per_second=2.32421G/s null_percent=100 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/0             624 us          624 us         1183 bytes_per_second=1.56476G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64/1048576/10000                 73.8 us         73.8 us         9720 bytes_per_second=13.2297G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64/1048576/100                    113 us          113 us         5369 bytes_per_second=8.65305G/s null_percent=1 size=1048.58k
   MinMaxKernelInt64/1048576/10                     206 us          206 us         3134 bytes_per_second=4.74316G/s null_percent=10 size=1048.58k
   MinMaxKernelInt64/1048576/2                      516 us          516 us         1000 bytes_per_second=1.89125G/s null_percent=50 size=1048.58k
   MinMaxKernelInt64/1048576/1                     3.89 us         3.89 us       187314 bytes_per_second=250.863G/s null_percent=100 size=1048.58k
   MinMaxKernelInt64/1048576/0                     71.5 us         71.5 us        10264 bytes_per_second=13.6533G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/10000         305 us          305 us         2414 bytes_per_second=3.19811G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/100           334 us          334 us         2210 bytes_per_second=2.92146G/s null_percent=1 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/10            407 us          407 us         1832 bytes_per_second=2.40181G/s null_percent=10 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/2             654 us          654 us         1117 bytes_per_second=1.49329G/s null_percent=50 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/1             217 us          217 us         3416 bytes_per_second=4.50848G/s null_percent=100 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/0             302 us          302 us         2430 bytes_per_second=3.23243G/s null_percent=0 size=1048.58k
   ```
   
   </details>
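
A note on reading the tables: the reported counters look consistent with `bytes_per_second` being `size` (taken as bytes) divided by CPU time, with `G` meaning GiB, and the last component of each benchmark name being the inverse null proportion (`/2` is 50% nulls, `/0` is no nulls). That reading is inferred from the numbers above rather than from the benchmark source. The helpers below are a small sketch for checking throughput and slowdown ratios under that assumption; `GiBPerSecond` and `Slowdown` are hypothetical names.

```cpp
#include <cassert>

// Throughput in GiB/s for `bytes` processed in `nanos` nanoseconds.
double GiBPerSecond(double bytes, double nanos) {
  assert(nanos > 0);
  const double seconds = nanos * 1e-9;
  return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}

// How many times slower the candidate is than the baseline.
// Values below 1 mean the candidate is faster. Both times must use the same unit.
double Slowdown(double candidate_time, double baseline_time) {
  assert(baseline_time > 0);
  return candidate_time / baseline_time;
}
```

For example, `GiBPerSecond(1048576, 3067)` lands near the 318.4G/s reported for CountKernelBenchInt64, `Slowdown(412138, 3067)` is about 134 for count, `Slowdown(302, 71.5)` is about 4.2 for Int64 min_max with no nulls, and `Slowdown(668, 900)` is below 1 for Float min_max with no nulls, where the hash aggregate kernel is the faster one.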

