You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/01 09:32:37 UTC
[GitHub] [arrow] cyb70289 opened a new pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small…
cyb70289 opened a new pull request #8091:
URL: https://github.com/apache/arrow/pull/8091
… value range
For int16/32/64 arrays with reasonable length, scan the array to find
min/max values first. If (max-min) is within some threshold, instead
of general hashmap, using a value indexed array can improve performance
significantly.
To be compatible with chunked array, value count array is transferred to
hashmap before merging with others. This is an overhead for short array.
Finding min/max may also introduce performance penalty in some cases.
Please note it's hard to benefit all use cases. By applying this patch:
- about 2x performance uplift for integers in small value range
- no obvious performance drop for normal cases
- non-trivial performance drop in some cases
* 40% drop for short int8 array (8k length)
* 10% drop for sparse array (few distinct values, big value gap)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 removed a comment on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
cyb70289 removed a comment on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-684666816
Tested on skylake (knight landing).
Removed unrelated tests for clarity.
100% null benchmark is not stable, can be ignored.
```
$ archery benchmark diff --suite-filter="arrow-compute-aggregate-benchmark" --cc=clang-9 --cxx=clang++-9
benchmark baseline contender change %
1 ModeKernelInt16/1048576/10 122.148 MiB/sec 364.273 MiB/sec 198.222 {'null_percent': 10.0}
24 ModeKernelInt32/1048576/10000 227.845 MiB/sec 676.265 MiB/sec 196.809 {'null_percent': 0.01}
14 ModeKernelInt16/1048576/100 116.265 MiB/sec 342.747 MiB/sec 194.798 {'null_percent': 1.0}
23 ModeKernelInt32/1048576/100 229.216 MiB/sec 669.534 MiB/sec 192.098 {'null_percent': 1.0}
17 ModeKernelInt32/1048576/10 241.768 MiB/sec 700.500 MiB/sec 189.740 {'null_percent': 10.0}
16 ModeKernelInt16/1048576/10000 115.484 MiB/sec 324.841 MiB/sec 181.287 {'null_percent': 0.01}
8 ModeKernelInt64/1048576/10000 481.374 MiB/sec 1.308 GiB/sec 178.214 {'null_percent': 0.01}
10 ModeKernelInt64/1048576/100 485.696 MiB/sec 1.313 GiB/sec 176.807 {'null_percent': 1.0}
20 ModeKernelInt64/1048576/10 527.967 MiB/sec 1.383 GiB/sec 168.292 {'null_percent': 10.0}
12 ModeKernelInt32/1048576/0 257.092 MiB/sec 687.083 MiB/sec 167.251 {'null_percent': 0.0}
15 ModeKernelInt16/1048576/0 128.398 MiB/sec 342.670 MiB/sec 166.880 {'null_percent': 0.0}
11 ModeKernelInt64/1048576/0 513.891 MiB/sec 1.325 GiB/sec 164.088 {'null_percent': 0.0}
4 ModeKernelInt16/1048576/2 162.258 MiB/sec 384.148 MiB/sec 136.752 {'null_percent': 50.0}
25 ModeKernelInt32/1048576/2 320.094 MiB/sec 650.958 MiB/sec 103.365 {'null_percent': 50.0}
2 ModeKernelInt64/1048576/2 811.335 MiB/sec 1.403 GiB/sec 77.111 {'null_percent': 50.0}
0 ModeKernelInt64/1048576/1 7.906 GiB/sec 7.888 GiB/sec -0.230 {'null_percent': 100.0}
9 ModeKernelBoolean/1048576/0 26.502 MiB/sec 26.148 MiB/sec -1.337 {'null_percent': 0.0}
19 ModeKernelBoolean/1048576/2 27.081 MiB/sec 26.518 MiB/sec -2.078 {'null_percent': 50.0}
18 ModeKernelBoolean/1048576/10000 26.126 MiB/sec 25.361 MiB/sec -2.927 {'null_percent': 0.01}
5 ModeKernelBoolean/1048576/10 27.987 MiB/sec 27.111 MiB/sec -3.131 {'null_percent': 10.0}
27 ModeKernelBoolean/1048576/100 25.729 MiB/sec 24.911 MiB/sec -3.181 {'null_percent': 1.0}
28 ModeKernelBoolean/1048576/1 123.582 MiB/sec 118.548 MiB/sec -4.074 {'null_percent': 100.0}
6 ModeKernelInt8/1048576/10000 248.084 MiB/sec 232.688 MiB/sec -6.206 {'null_percent': 0.01}
21 ModeKernelInt8/1048576/100 253.614 MiB/sec 230.012 MiB/sec -9.306 {'null_percent': 1.0}
26 ModeKernelInt8/1048576/0 270.337 MiB/sec 244.879 MiB/sec -9.417 {'null_percent': 0.0}
3 ModeKernelInt8/1048576/10 247.946 MiB/sec 223.306 MiB/sec -9.938 {'null_percent': 10.0}
13 ModeKernelInt8/1048576/2 198.628 MiB/sec 178.205 MiB/sec -10.282 {'null_percent': 50.0}
29 ModeKernelInt8/1048576/1 948.757 MiB/sec 828.408 MiB/sec -12.685 {'null_percent': 100.0}
22 ModeKernelInt16/1048576/1 2.852 GiB/sec 2.265 GiB/sec -20.587 {'null_percent': 100.0}
7 ModeKernelInt32/1048576/1 5.714 GiB/sec 3.430 GiB/sec -39.976 {'null_percent': 100.0}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8091:
URL: https://github.com/apache/arrow/pull/8091
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small…
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-684681790
https://issues.apache.org/jira/browse/ARROW-9873
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#discussion_r481275863
##########
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##########
@@ -26,95 +26,187 @@ namespace aggregate {
namespace {
+// Count values and return value:count map
+template <typename T>
+struct ValueCounter {
+ virtual void CountOne(T value) = 0;
Review comment:
I don't think polymorphism is a good idea for performance. Instead, you should probably use templated code.
Also, when `null_count == length`, the implementation can be trivial.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small…
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-684666816
Tested on skylake (knight landing).
Removed unrelated tests for clarity.
100% null benchmark is not stable, can be ignored.
```
$ archery benchmark diff --suite-filter="arrow-compute-aggregate-benchmark" --cc=clang-9 --cxx=clang++-9
benchmark baseline contender change %
1 ModeKernelInt16/1048576/10 122.148 MiB/sec 364.273 MiB/sec 198.222 {'null_percent': 10.0}
24 ModeKernelInt32/1048576/10000 227.845 MiB/sec 676.265 MiB/sec 196.809 {'null_percent': 0.01}
14 ModeKernelInt16/1048576/100 116.265 MiB/sec 342.747 MiB/sec 194.798 {'null_percent': 1.0}
23 ModeKernelInt32/1048576/100 229.216 MiB/sec 669.534 MiB/sec 192.098 {'null_percent': 1.0}
17 ModeKernelInt32/1048576/10 241.768 MiB/sec 700.500 MiB/sec 189.740 {'null_percent': 10.0}
16 ModeKernelInt16/1048576/10000 115.484 MiB/sec 324.841 MiB/sec 181.287 {'null_percent': 0.01}
8 ModeKernelInt64/1048576/10000 481.374 MiB/sec 1.308 GiB/sec 178.214 {'null_percent': 0.01}
10 ModeKernelInt64/1048576/100 485.696 MiB/sec 1.313 GiB/sec 176.807 {'null_percent': 1.0}
20 ModeKernelInt64/1048576/10 527.967 MiB/sec 1.383 GiB/sec 168.292 {'null_percent': 10.0}
12 ModeKernelInt32/1048576/0 257.092 MiB/sec 687.083 MiB/sec 167.251 {'null_percent': 0.0}
15 ModeKernelInt16/1048576/0 128.398 MiB/sec 342.670 MiB/sec 166.880 {'null_percent': 0.0}
11 ModeKernelInt64/1048576/0 513.891 MiB/sec 1.325 GiB/sec 164.088 {'null_percent': 0.0}
4 ModeKernelInt16/1048576/2 162.258 MiB/sec 384.148 MiB/sec 136.752 {'null_percent': 50.0}
25 ModeKernelInt32/1048576/2 320.094 MiB/sec 650.958 MiB/sec 103.365 {'null_percent': 50.0}
2 ModeKernelInt64/1048576/2 811.335 MiB/sec 1.403 GiB/sec 77.111 {'null_percent': 50.0}
0 ModeKernelInt64/1048576/1 7.906 GiB/sec 7.888 GiB/sec -0.230 {'null_percent': 100.0}
9 ModeKernelBoolean/1048576/0 26.502 MiB/sec 26.148 MiB/sec -1.337 {'null_percent': 0.0}
19 ModeKernelBoolean/1048576/2 27.081 MiB/sec 26.518 MiB/sec -2.078 {'null_percent': 50.0}
18 ModeKernelBoolean/1048576/10000 26.126 MiB/sec 25.361 MiB/sec -2.927 {'null_percent': 0.01}
5 ModeKernelBoolean/1048576/10 27.987 MiB/sec 27.111 MiB/sec -3.131 {'null_percent': 10.0}
27 ModeKernelBoolean/1048576/100 25.729 MiB/sec 24.911 MiB/sec -3.181 {'null_percent': 1.0}
28 ModeKernelBoolean/1048576/1 123.582 MiB/sec 118.548 MiB/sec -4.074 {'null_percent': 100.0}
6 ModeKernelInt8/1048576/10000 248.084 MiB/sec 232.688 MiB/sec -6.206 {'null_percent': 0.01}
21 ModeKernelInt8/1048576/100 253.614 MiB/sec 230.012 MiB/sec -9.306 {'null_percent': 1.0}
26 ModeKernelInt8/1048576/0 270.337 MiB/sec 244.879 MiB/sec -9.417 {'null_percent': 0.0}
3 ModeKernelInt8/1048576/10 247.946 MiB/sec 223.306 MiB/sec -9.938 {'null_percent': 10.0}
13 ModeKernelInt8/1048576/2 198.628 MiB/sec 178.205 MiB/sec -10.282 {'null_percent': 50.0}
29 ModeKernelInt8/1048576/1 948.757 MiB/sec 828.408 MiB/sec -12.685 {'null_percent': 100.0}
22 ModeKernelInt16/1048576/1 2.852 GiB/sec 2.265 GiB/sec -20.587 {'null_percent': 100.0}
7 ModeKernelInt32/1048576/1 5.714 GiB/sec 3.430 GiB/sec -39.976 {'null_percent': 100.0}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#discussion_r481789852
##########
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##########
@@ -26,95 +26,187 @@ namespace aggregate {
namespace {
+// Count values and return value:count map
+template <typename T>
+struct ValueCounter {
+ virtual void CountOne(T value) = 0;
Review comment:
Changed to template. Did get better performance. Thanks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-685539220
I get similar speedups on AMD Ryzen.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-685472515
"RTools 35" CI failure is not related
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8091: ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8091:
URL: https://github.com/apache/arrow/pull/8091#issuecomment-685348952
Latest benchmark result after re-implementation.
```
# Tested on skylake (knight landing)
$ archery benchmark diff --suite-filter="arrow-compute-aggregate-benchmark" --benchmark-filter="^Mode" --cc=clang-9 --cxx=clang++-9
benchmark baseline contender change %
// igonre these 100% cases, huge boost due to a simple trick, not very useful
ModeKernelBoolean/1048576/1 123.216 MiB/sec 847.125 GiB/sec 703915.356 null_percent': 100.0
ModeKernelInt8/1048576/1 896.330 MiB/sec 617.997 GiB/sec 70502.192 null_percent': 100.0
ModeKernelInt16/1048576/1 2.886 GiB/sec 965.237 GiB/sec 33340.541 null_percent': 100.0
ModeKernelInt32/1048576/1 5.732 GiB/sec 960.476 GiB/sec 16657.027 null_percent': 100.0
ModeKernelInt64/1048576/1 7.925 GiB/sec 974.705 GiB/sec 12198.487 null_percent': 100.0
// big improvement for int16/32/64 with limited value range
ModeKernelInt16/1048576/0 128.522 MiB/sec 495.771 MiB/sec 285.749 'null_percent': 0.0
ModeKernelInt32/1048576/0 257.694 MiB/sec 953.232 MiB/sec 269.909 'null_percent': 0.0
ModeKernelInt64/1048576/0 516.624 MiB/sec 1.715 GiB/sec 240.027 'null_percent': 0.0
ModeKernelInt32/1048576/10000 227.404 MiB/sec 690.032 MiB/sec 203.439 'null_percent': 0.01
ModeKernelInt16/1048576/10000 115.419 MiB/sec 349.055 MiB/sec 202.425 'null_percent': 0.01
ModeKernelInt32/1048576/100 229.661 MiB/sec 684.149 MiB/sec 197.895 'null_percent': 1.0
ModeKernelInt16/1048576/100 116.084 MiB/sec 342.620 MiB/sec 195.148 'null_percent': 1.0
ModeKernelInt64/1048576/10000 481.409 MiB/sec 1.302 GiB/sec 176.913 'null_percent': 0.01
ModeKernelInt64/1048576/100 486.266 MiB/sec 1.297 GiB/sec 173.114 'null_percent': 1.0
ModeKernelInt16/1048576/10 121.865 MiB/sec 315.932 MiB/sec 159.247 'null_percent': 10.0
ModeKernelInt32/1048576/10 242.074 MiB/sec 625.162 MiB/sec 158.252 'null_percent': 10.0
ModeKernelInt64/1048576/10 527.976 MiB/sec 1.199 GiB/sec 132.580 'null_percent': 10.0
ModeKernelInt32/1048576/2 320.156 MiB/sec 429.196 MiB/sec 34.058 'null_percent': 50.0
ModeKernelInt16/1048576/2 162.121 MiB/sec 196.310 MiB/sec 21.089 'null_percent': 50.0
// no obvious difference for bool/int8
ModeKernelInt8/1048576/100 234.422 MiB/sec 251.464 MiB/sec 7.270 'null_percent': 1.0
ModeKernelInt8/1048576/10 246.324 MiB/sec 258.110 MiB/sec 4.785 'null_percent': 10.0
ModeKernelInt8/1048576/10000 239.496 MiB/sec 250.469 MiB/sec 4.582 'null_percent': 0.01
ModeKernelInt64/1048576/2 812.020 MiB/sec 832.610 MiB/sec 2.536 'null_percent': 50.0
ModeKernelBoolean/1048576/10000 26.318 MiB/sec 26.509 MiB/sec 0.728 'null_percent': 0.01
ModeKernelBoolean/1048576/100 26.510 MiB/sec 26.597 MiB/sec 0.327 'null_percent': 1.0
ModeKernelBoolean/1048576/0 28.271 MiB/sec 28.274 MiB/sec 0.009 'null_percent': 0.0
ModeKernelInt8/1048576/0 270.401 MiB/sec 269.025 MiB/sec -0.509 'null_percent': 0.0
ModeKernelInt8/1048576/2 190.410 MiB/sec 187.876 MiB/sec -1.331 'null_percent': 50.0
ModeKernelBoolean/1048576/10 28.007 MiB/sec 27.599 MiB/sec -1.455 'null_percent': 10.0
ModeKernelBoolean/1048576/2 27.157 MiB/sec 24.209 MiB/sec -10.857 'null_percent': 50.0
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org