Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 13:23:45 UTC
[GitHub] [arrow] lidavidm commented on pull request #10813: ARROW-13451: [C++] WIP: POC: benchmark using hash aggregate kernels for scalar aggregation
lidavidm commented on pull request #10813:
URL: https://github.com/apache/arrow/pull/10813#issuecomment-887509020
This is just to see how the hash aggregate kernels perform compared to the dedicated scalar aggregation kernels when there is only one group.
Unfortunately, it's rather terrible. For count:
<details>
```
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
CountKernelBenchInt64/1048576/2 3067 ns 3067 ns 457331 bytes_per_second=318.4G/s null_percent=50 size=1048.58k
CountKernelBenchInt64Aggregate/1048576/2 412162 ns 412138 ns 3238 bytes_per_second=2.3695G/s null_percent=50 size=1048.58k
```
</details>
At roughly two orders of magnitude slower (about 412 us versus 3 us per iteration), the hash aggregate kernel isn't anywhere near the dedicated scalar one. The scalar kernel essentially just calls CountSetBits, while the hash aggregate kernel must use VisitSetBitRuns and index into a length-1 array of counts. Also, a good amount of time (~10% of the runtime according to perf) is spent up front just allocating and filling the array of group IDs.
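To make the difference concrete, here is a minimal sketch of the two strategies in plain C++. It is not the Arrow implementation: `CountValidScalar` and `CountValidGrouped` are hypothetical names, the validity bitmap is a bare `std::vector<uint8_t>` (least-significant bit first), the length is assumed to be a multiple of 8, and the grouped version walks bits one at a time rather than in runs as VisitSetBitRuns would.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the scalar count path: popcount the validity
// bitmap one byte at a time. No per-row work and no extra allocation.
int64_t CountValidScalar(const std::vector<uint8_t>& validity, int64_t length) {
  assert(length % 8 == 0);
  assert(static_cast<int64_t>(validity.size()) * 8 >= length);
  int64_t count = 0;
  for (int64_t i = 0; i < length / 8; ++i) {
    count += static_cast<int64_t>(std::bitset<8>(validity[i]).count());
  }
  return count;
}

// Hypothetical stand-in for the hash aggregate path with a single group:
// first allocate and fill one group ID per row, then visit each valid row
// and increment counts[group_id], even though the only group is 0.
int64_t CountValidGrouped(const std::vector<uint8_t>& validity, int64_t length) {
  assert(length % 8 == 0);
  assert(static_cast<int64_t>(validity.size()) * 8 >= length);
  // Up-front cost: an array of group IDs as long as the input.
  std::vector<uint32_t> group_ids(static_cast<std::size_t>(length), 0u);
  std::vector<int64_t> counts(1, 0);
  for (int64_t i = 0; i < length; ++i) {
    const bool valid = ((validity[i / 8] >> (i % 8)) & 1) != 0;
    if (valid) {
      ++counts[group_ids[i]];  // indirect update through the group ID
    }
  }
  return counts[0];
}
```

Both functions return the same count for the same bitmap. The second does per-row work and a length-sized allocation that the first avoids, which is the kind of overhead the benchmark above is picking up.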
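For min_max the story is not so clear. The hash aggregate kernel actually wins for floats and doubles, but loses for integers: it is roughly 4-5x slower with few or no nulls, narrowing to about 1.3x at 50% nulls, which is still not as bad as with count. The exception in every case is all-null input, where the scalar kernel finishes in a few microseconds while the hash aggregate kernel still takes hundreds.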
<details>
```
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------
MinMaxKernelFloat/1048576/10000 981 us 981 us 638 bytes_per_second=1018.91M/s null_percent=0.01 size=1048.58k
MinMaxKernelFloat/1048576/100 1008 us 1008 us 723 bytes_per_second=992.523M/s null_percent=1 size=1048.58k
MinMaxKernelFloat/1048576/10 1062 us 1062 us 561 bytes_per_second=941.703M/s null_percent=10 size=1048.58k
MinMaxKernelFloat/1048576/2 1424 us 1424 us 456 bytes_per_second=702.401M/s null_percent=50 size=1048.58k
MinMaxKernelFloat/1048576/1 6.92 us 6.92 us 105816 bytes_per_second=141.155G/s null_percent=100 size=1048.58k
MinMaxKernelFloat/1048576/0 900 us 900 us 815 bytes_per_second=1111.18M/s null_percent=0 size=1048.58k
MinMaxKernelFloatAggregate/1048576/10000 667 us 667 us 1103 bytes_per_second=1.46325G/s null_percent=0.01 size=1048.58k
MinMaxKernelFloatAggregate/1048576/100 654 us 654 us 924 bytes_per_second=1.49389G/s null_percent=1 size=1048.58k
MinMaxKernelFloatAggregate/1048576/10 765 us 765 us 965 bytes_per_second=1.27599G/s null_percent=10 size=1048.58k
MinMaxKernelFloatAggregate/1048576/2 1267 us 1267 us 585 bytes_per_second=789.267M/s null_percent=50 size=1048.58k
MinMaxKernelFloatAggregate/1048576/1 421 us 421 us 1693 bytes_per_second=2.32129G/s null_percent=100 size=1048.58k
MinMaxKernelFloatAggregate/1048576/0 668 us 668 us 1107 bytes_per_second=1.46147G/s null_percent=0 size=1048.58k
MinMaxKernelDouble/1048576/10000 420 us 420 us 1712 bytes_per_second=2.32776G/s null_percent=0.01 size=1048.58k
MinMaxKernelDouble/1048576/100 465 us 465 us 1412 bytes_per_second=2.10164G/s null_percent=1 size=1048.58k
MinMaxKernelDouble/1048576/10 592 us 592 us 1168 bytes_per_second=1.64947G/s null_percent=10 size=1048.58k
MinMaxKernelDouble/1048576/2 730 us 730 us 1008 bytes_per_second=1.33826G/s null_percent=50 size=1048.58k
MinMaxKernelDouble/1048576/1 4.10 us 4.10 us 177426 bytes_per_second=238.21G/s null_percent=100 size=1048.58k
MinMaxKernelDouble/1048576/0 540 us 540 us 1000 bytes_per_second=1.80829G/s null_percent=0 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/10000 342 us 342 us 2106 bytes_per_second=2.85799G/s null_percent=0.01 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/100 346 us 346 us 2136 bytes_per_second=2.82629G/s null_percent=1 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/10 385 us 385 us 1911 bytes_per_second=2.53959G/s null_percent=10 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/2 631 us 631 us 1163 bytes_per_second=1.54829G/s null_percent=50 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/1 218 us 218 us 3413 bytes_per_second=4.48758G/s null_percent=100 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/0 334 us 334 us 2193 bytes_per_second=2.92247G/s null_percent=0 size=1048.58k
MinMaxKernelInt8/1048576/10000 571 us 571 us 1293 bytes_per_second=1.71088G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8/1048576/100 986 us 986 us 742 bytes_per_second=1014.35M/s null_percent=1 size=1048.58k
MinMaxKernelInt8/1048576/10 1818 us 1818 us 402 bytes_per_second=550.013M/s null_percent=10 size=1048.58k
MinMaxKernelInt8/1048576/2 4039 us 4039 us 182 bytes_per_second=247.588M/s null_percent=50 size=1048.58k
MinMaxKernelInt8/1048576/1 22.9 us 22.9 us 31922 bytes_per_second=42.701G/s null_percent=100 size=1048.58k
MinMaxKernelInt8/1048576/0 546 us 546 us 1368 bytes_per_second=1.78943G/s null_percent=0 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/10000 2241 us 2241 us 325 bytes_per_second=446.241M/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/100 2497 us 2497 us 299 bytes_per_second=400.494M/s null_percent=1 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/10 3144 us 3143 us 237 bytes_per_second=318.129M/s null_percent=10 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/2 5107 us 5107 us 100 bytes_per_second=195.815M/s null_percent=50 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/1 1581 us 1581 us 386 bytes_per_second=632.705M/s null_percent=100 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/0 2093 us 2093 us 274 bytes_per_second=477.747M/s null_percent=0 size=1048.58k
MinMaxKernelInt16/1048576/10000 274 us 274 us 2329 bytes_per_second=3.56594G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16/1048576/100 459 us 459 us 1429 bytes_per_second=2.12904G/s null_percent=1 size=1048.58k
MinMaxKernelInt16/1048576/10 842 us 842 us 698 bytes_per_second=1.1593G/s null_percent=10 size=1048.58k
MinMaxKernelInt16/1048576/2 1997 us 1997 us 370 bytes_per_second=500.784M/s null_percent=50 size=1048.58k
MinMaxKernelInt16/1048576/1 12.1 us 12.1 us 60436 bytes_per_second=80.6769G/s null_percent=100 size=1048.58k
MinMaxKernelInt16/1048576/0 271 us 271 us 2713 bytes_per_second=3.60785G/s null_percent=0 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/10000 1226 us 1226 us 615 bytes_per_second=815.951M/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/100 1326 us 1326 us 564 bytes_per_second=753.999M/s null_percent=1 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/10 1608 us 1608 us 462 bytes_per_second=622M/s null_percent=10 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/2 2700 us 2700 us 275 bytes_per_second=370.316M/s null_percent=50 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/1 867 us 866 us 868 bytes_per_second=1.12704G/s null_percent=100 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/0 1190 us 1190 us 620 bytes_per_second=840.448M/s null_percent=0 size=1048.58k
MinMaxKernelInt32/1048576/10000 139 us 139 us 4389 bytes_per_second=7.03596G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32/1048576/100 238 us 238 us 2483 bytes_per_second=4.10169G/s null_percent=1 size=1048.58k
MinMaxKernelInt32/1048576/10 515 us 515 us 1000 bytes_per_second=1.89702G/s null_percent=10 size=1048.58k
MinMaxKernelInt32/1048576/2 1021 us 1021 us 722 bytes_per_second=979.116M/s null_percent=50 size=1048.58k
MinMaxKernelInt32/1048576/1 6.55 us 6.55 us 109640 bytes_per_second=149.127G/s null_percent=100 size=1048.58k
MinMaxKernelInt32/1048576/0 132 us 132 us 4723 bytes_per_second=7.4224G/s null_percent=0 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/10000 631 us 631 us 1171 bytes_per_second=1.54789G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/100 703 us 703 us 1096 bytes_per_second=1.38954G/s null_percent=1 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/10 808 us 808 us 911 bytes_per_second=1.20892G/s null_percent=10 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/2 1304 us 1304 us 564 bytes_per_second=766.908M/s null_percent=50 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/1 420 us 420 us 1758 bytes_per_second=2.32421G/s null_percent=100 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/0 624 us 624 us 1183 bytes_per_second=1.56476G/s null_percent=0 size=1048.58k
MinMaxKernelInt64/1048576/10000 73.8 us 73.8 us 9720 bytes_per_second=13.2297G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64/1048576/100 113 us 113 us 5369 bytes_per_second=8.65305G/s null_percent=1 size=1048.58k
MinMaxKernelInt64/1048576/10 206 us 206 us 3134 bytes_per_second=4.74316G/s null_percent=10 size=1048.58k
MinMaxKernelInt64/1048576/2 516 us 516 us 1000 bytes_per_second=1.89125G/s null_percent=50 size=1048.58k
MinMaxKernelInt64/1048576/1 3.89 us 3.89 us 187314 bytes_per_second=250.863G/s null_percent=100 size=1048.58k
MinMaxKernelInt64/1048576/0 71.5 us 71.5 us 10264 bytes_per_second=13.6533G/s null_percent=0 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/10000 305 us 305 us 2414 bytes_per_second=3.19811G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/100 334 us 334 us 2210 bytes_per_second=2.92146G/s null_percent=1 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/10 407 us 407 us 1832 bytes_per_second=2.40181G/s null_percent=10 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/2 654 us 654 us 1117 bytes_per_second=1.49329G/s null_percent=50 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/1 217 us 217 us 3416 bytes_per_second=4.50848G/s null_percent=100 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/0 302 us 302 us 2430 bytes_per_second=3.23243G/s null_percent=0 size=1048.58k
```
</details>