Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/22 23:29:03 UTC
[GitHub] [arrow] wesm opened a new pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
wesm opened a new pull request #7521:
URL: https://github.com/apache/arrow/pull/7521
This significantly speeds up processing of mostly-not-null or mostly-null data, while adding almost no overhead in the other scenarios, where a word-sized run of all-not-null or all-null data is rare. For data with null_count 0, values are processed in blocks of INT16_MAX at a time, so this adds no meaningful overhead in that case either.
I modified the hash benchmarks where this code is used to exhibit both the cases that benefit from this optimization and the ones that don't.
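For readers unfamiliar with the idea, here is a minimal standalone sketch of block-wise validity visitation. It is not the Arrow implementation: `VisitBitBlocks`, `GetBit`, `SlotCounts` and `CountSlots` are hypothetical names, the bitmap is assumed to start at bit 0 (no offset), bits are assumed to be stored least-significant-bit first, and the INT16_MAX fast path for null_count 0 is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the Arrow API): test validity bit `i`,
// least-significant bit first within each byte.
inline bool GetBit(const uint8_t* bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Hypothetical sketch: call valid_func(i) for each non-null slot and
// null_func() for each null slot, inspecting 64 validity bits at a time.
template <typename ValidFunc, typename NullFunc>
void VisitBitBlocks(const uint8_t* bitmap, int64_t length,
                    ValidFunc&& valid_func, NullFunc&& null_func) {
  assert(length >= 0);
  int64_t i = 0;
  // Full 64-slot blocks.
  for (; i + 64 <= length; i += 64) {
    uint64_t word;
    std::memcpy(&word, bitmap + i / 8, sizeof(word));
    if (word == ~uint64_t{0}) {
      // All 64 slots are valid: no per-bit checks needed.
      for (int64_t j = 0; j < 64; ++j) valid_func(i + j);
    } else if (word == 0) {
      // All 64 slots are null.
      for (int64_t j = 0; j < 64; ++j) null_func();
    } else {
      // Mixed block: fall back to checking each validity bit.
      for (int64_t j = 0; j < 64; ++j) {
        if (GetBit(bitmap, i + j)) {
          valid_func(i + j);
        } else {
          null_func();
        }
      }
    }
  }
  // Trailing slots that do not fill a whole 64-bit word.
  for (; i < length; ++i) {
    if (GetBit(bitmap, i)) {
      valid_func(i);
    } else {
      null_func();
    }
  }
}

// Example use: count valid and null slots.
struct SlotCounts {
  int64_t valid = 0;
  int64_t nulls = 0;
};

inline SlotCounts CountSlots(const uint8_t* bitmap, int64_t length) {
  SlotCounts counts;
  VisitBitBlocks(bitmap, length,
                 [&](int64_t) { ++counts.valid; },
                 [&]() { ++counts.nulls; });
  return counts;
}
```

In this sketch the all-valid and all-null branches are the only places where the per-bit test is skipped, which is where the speedup described above would come from; a mixed block costs one extra word comparison on top of the per-bit work.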
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8
```
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
BuildDictionary 3219443 ns 3219440 ns 218 1.21215GB/s
BuildStringDictionary 3692881 ns 3692881 ns 192 81.7532MB/s
UniqueInt64/0 14413456 ns 14413251 ns 48 null_percent=0 2.16814GB/s
UniqueInt64/1 15516052 ns 15515737 ns 45 null_percent=0.1 2.01408GB/s
UniqueInt64/2 17031282 ns 17031266 ns 41 null_percent=1 1.83486GB/s
UniqueInt64/3 20680114 ns 20680064 ns 34 null_percent=10 1.51112GB/s
UniqueInt64/4 12018069 ns 12017844 ns 57 null_percent=99 2.6003GB/s
UniqueInt64/5 9179953 ns 9179946 ns 77 null_percent=100 3.40416GB/s
UniqueInt64/6 15501523 ns 15501496 ns 45 null_percent=0 2.01593GB/s
UniqueInt64/7 16482935 ns 16482300 ns 41 null_percent=0.1 1.89597GB/s
UniqueInt64/8 18349988 ns 18349317 ns 38 null_percent=1 1.70306GB/s
UniqueInt64/9 21439268 ns 21439244 ns 32 null_percent=10 1.45761GB/s
UniqueInt64/10 12530067 ns 12529871 ns 55 null_percent=99 2.49404GB/s
UniqueInt64/11 9167314 ns 9167365 ns 75 null_percent=100 3.40883GB/s
UniqueString10bytes/0 43535899 ns 43535846 ns 16 null_percent=0 918.783MB/s
UniqueString10bytes/1 45130595 ns 45129634 ns 16 null_percent=0.1 886.336MB/s
UniqueString10bytes/2 45249034 ns 45247983 ns 15 null_percent=1 884.017MB/s
UniqueString10bytes/3 45101533 ns 45100209 ns 16 null_percent=10 886.914MB/s
UniqueString10bytes/4 4316048 ns 4316019 ns 163 null_percent=99 9.05059GB/s
UniqueString10bytes/5 1435781 ns 1435763 ns 485 null_percent=100 27.2068GB/s
UniqueString10bytes/6 59100344 ns 59098817 ns 12 null_percent=0 676.832MB/s
UniqueString10bytes/7 59797544 ns 59795857 ns 12 null_percent=0.1 668.943MB/s
UniqueString10bytes/8 61024697 ns 61023090 ns 11 null_percent=1 655.49MB/s
UniqueString10bytes/9 59817211 ns 59816339 ns 12 null_percent=10 668.714MB/s
UniqueString10bytes/10 4950387 ns 4950242 ns 134 null_percent=99 7.89103GB/s
UniqueString10bytes/11 1443482 ns 1443434 ns 446 null_percent=100 27.0622GB/s
UniqueString100bytes/0 95609006 ns 95606132 ns 7 null_percent=0 4.08577GB/s
UniqueString100bytes/1 96850582 ns 96849441 ns 7 null_percent=0.1 4.03332GB/s
UniqueString100bytes/2 95404742 ns 95404634 ns 7 null_percent=1 4.0944GB/s
UniqueString100bytes/3 89401775 ns 89401006 ns 8 null_percent=10 4.36936GB/s
UniqueString100bytes/4 4705868 ns 4705746 ns 148 null_percent=99 83.0102GB/s
UniqueString100bytes/5 1434077 ns 1434055 ns 486 null_percent=100 272.392GB/s
UniqueString100bytes/6 206155133 ns 206148425 ns 3 null_percent=0 1.89487GB/s
UniqueString100bytes/7 204661287 ns 204653659 ns 3 null_percent=0.1 1.90871GB/s
UniqueString100bytes/8 205941884 ns 205941271 ns 3 null_percent=1 1.89678GB/s
UniqueString100bytes/9 192074501 ns 192073431 ns 4 null_percent=10 2.03373GB/s
UniqueString100bytes/10 6180349 ns 6180227 ns 111 null_percent=99 63.2056GB/s
UniqueString100bytes/11 1474565 ns 1474564 ns 482 null_percent=100 264.909GB/s
UniqueUInt8/0 1990025 ns 1990023 ns 348 null_percent=0 1.96292GB/s
UniqueUInt8/1 2594146 ns 2594089 ns 272 null_percent=0.1 1.50583GB/s
UniqueUInt8/2 4726027 ns 4726053 ns 145 null_percent=1 846.372MB/s
UniqueUInt8/3 9465222 ns 9465126 ns 75 null_percent=10 422.604MB/s
UniqueUInt8/4 3557141 ns 3557135 ns 195 null_percent=99 1124.5MB/s
UniqueUInt8/5 2259664 ns 2259664 ns 314 null_percent=100 1.72869GB/s
```
(I need to add "num_unique" to `state.counters` -- there are two different cardinality cases represented here)
Here is the % diff versus the baseline.
* Cases 1 and 7 are the mostly-not-null cases. This shows a 15-20% perf improvement
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99% null cases
* The "BuildDictionary" case at the bottom with the perf regression is one of the "worst case scenarios". 89% of the values are null, so we almost never observe an all-null or all-not-null block. The use of `BitUtil::GetBit` over BitmapReader causes this slight regression since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case since the others are more empirically representative of real-world data
```
benchmark baseline contender change % regression
8 UniqueString100bytes/5 40.668 GiB/sec 272.392 GiB/sec 569.787 False
37 UniqueString10bytes/5 4.064 GiB/sec 27.207 GiB/sec 569.456 False
33 UniqueString10bytes/11 4.065 GiB/sec 27.062 GiB/sec 565.751 False
12 UniqueString100bytes/11 40.578 GiB/sec 264.909 GiB/sec 552.841 False
0 UniqueString10bytes/4 3.568 GiB/sec 9.051 GiB/sec 153.692 False
36 UniqueString100bytes/4 34.408 GiB/sec 83.010 GiB/sec 141.252 False
19 UniqueString10bytes/10 3.375 GiB/sec 7.891 GiB/sec 133.794 False
24 UniqueUInt8/1 677.981 MiB/sec 1.506 GiB/sec 127.435 False
5 UniqueString100bytes/10 30.775 GiB/sec 63.206 GiB/sec 105.381 False
27 UniqueUInt8/5 1000.163 MiB/sec 1.729 GiB/sec 76.989 False
13 UniqueUInt8/2 650.819 MiB/sec 846.372 MiB/sec 30.047 False
29 UniqueInt64/11 2.703 GiB/sec 3.409 GiB/sec 26.126 False
7 UniqueInt64/5 2.704 GiB/sec 3.404 GiB/sec 25.903 False
18 UniqueUInt8/4 932.926 MiB/sec 1.098 GiB/sec 20.535 False
23 UniqueInt64/1 1.681 GiB/sec 2.014 GiB/sec 19.840 False
21 UniqueInt64/7 1.628 GiB/sec 1.896 GiB/sec 16.476 False
31 UniqueInt64/2 1.658 GiB/sec 1.835 GiB/sec 10.651 False
20 UniqueString10bytes/7 612.647 MiB/sec 668.943 MiB/sec 9.189 False
16 UniqueInt64/3 1.386 GiB/sec 1.511 GiB/sec 9.053 False
38 UniqueString10bytes/8 601.259 MiB/sec 655.490 MiB/sec 9.019 False
1 UniqueUInt8/0 1.808 GiB/sec 1.963 GiB/sec 8.588 False
41 UniqueInt64/9 1.355 GiB/sec 1.458 GiB/sec 7.562 False
14 UniqueString10bytes/1 830.614 MiB/sec 886.336 MiB/sec 6.709 False
4 UniqueInt64/8 1.603 GiB/sec 1.703 GiB/sec 6.260 False
32 UniqueString10bytes/2 847.018 MiB/sec 884.017 MiB/sec 4.368 False
42 UniqueInt64/4 2.508 GiB/sec 2.600 GiB/sec 3.701 False
39 UniqueString10bytes/3 855.985 MiB/sec 886.914 MiB/sec 3.613 False
28 UniqueInt64/10 2.413 GiB/sec 2.494 GiB/sec 3.360 False
34 UniqueString100bytes/3 4.254 GiB/sec 4.369 GiB/sec 2.722 False
11 UniqueString100bytes/2 3.993 GiB/sec 4.094 GiB/sec 2.544 False
9 UniqueString10bytes/9 654.257 MiB/sec 668.714 MiB/sec 2.210 False
35 UniqueString10bytes/6 662.915 MiB/sec 676.832 MiB/sec 2.099 False
6 BuildStringDictionary 80.971 MiB/sec 81.753 MiB/sec 0.966 False
22 UniqueString100bytes/1 4.002 GiB/sec 4.033 GiB/sec 0.783 False
25 UniqueInt64/0 2.153 GiB/sec 2.168 GiB/sec 0.697 False
17 UniqueString10bytes/0 917.726 MiB/sec 918.783 MiB/sec 0.115 False
43 UniqueInt64/6 2.017 GiB/sec 2.016 GiB/sec -0.071 False
40 UniqueString100bytes/0 4.091 GiB/sec 4.086 GiB/sec -0.130 False
3 UniqueString100bytes/7 1.938 GiB/sec 1.909 GiB/sec -1.519 False
26 UniqueString100bytes/8 1.954 GiB/sec 1.897 GiB/sec -2.935 False
2 UniqueString100bytes/9 2.114 GiB/sec 2.034 GiB/sec -3.782 False
30 UniqueString100bytes/6 2.008 GiB/sec 1.895 GiB/sec -5.649 True
10 UniqueUInt8/3 474.468 MiB/sec 422.604 MiB/sec -10.931 True
15 BuildDictionary 1.776 GiB/sec 1.212 GiB/sec -31.742 True
```
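A back-of-the-envelope way to see why the 89%-null "BuildDictionary" case rarely hits a fast path: if each slot is null independently with probability p (an assumption; the benchmark's actual null placement is not specified here) and a block is 64 slots (also an assumption about the word size), the chance that a block is entirely null or entirely valid is p^64 + (1 - p)^64. The helper name `UniformBlockProbability` below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that a block of `block_size` slots is all-null or all-valid,
// assuming each slot is null independently with probability `null_fraction`.
inline double UniformBlockProbability(double null_fraction,
                                      int block_size = 64) {
  assert(null_fraction >= 0.0 && null_fraction <= 1.0);
  return std::pow(null_fraction, block_size) +
         std::pow(1.0 - null_fraction, block_size);
}
```

Under these assumptions the result is roughly 0.94 at 0.1% nulls, roughly 0.53 at 1% nulls (and, by symmetry, at 99%), about 0.001 at 10% nulls, and below 0.001 at 89% nulls, so in the last two cases nearly every block would take the bit-by-bit path.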
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8
```
-------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------
BuildDictionary 2625315 ns 2625247 ns 271 null_percent=0.88889 1.4865GB/s
BuildStringDictionary 3475855 ns 3475854 ns 200 86.8577MB/s
UniqueInt64/0 9842842 ns 9842834 ns 71 null_percent=0 num_unique=1024 3.1749GB/s
UniqueInt64/1 10617685 ns 10617360 ns 66 null_percent=0.1 num_unique=1024 2.94329GB/s
UniqueInt64/2 12648447 ns 12648430 ns 59 null_percent=1 num_unique=1024 2.47066GB/s
UniqueInt64/3 15365202 ns 15365113 ns 43 null_percent=10 num_unique=1024 2.03383GB/s
UniqueInt64/4 5126936 ns 5126851 ns 128 null_percent=99 num_unique=1024 6.09536GB/s
UniqueInt64/5 1763829 ns 1763809 ns 400 null_percent=100 num_unique=1024 17.7173GB/s
UniqueInt64/6 10545960 ns 10545841 ns 67 null_percent=0 num_unique=10.24k 2.96325GB/s
UniqueInt64/7 11478529 ns 11478403 ns 61 null_percent=0.1 num_unique=10.24k 2.7225GB/s
UniqueInt64/8 12792912 ns 12792429 ns 54 null_percent=1 num_unique=10.24k 2.44285GB/s
UniqueInt64/9 16805938 ns 16805535 ns 44 null_percent=10 num_unique=10.24k 1.85951GB/s
UniqueInt64/10 5503266 ns 5503108 ns 114 null_percent=99 num_unique=10.24k 5.67861GB/s
UniqueInt64/11 1763742 ns 1763699 ns 392 null_percent=100 num_unique=10.24k 17.7184GB/s
UniqueString10bytes/0 44193582 ns 44191679 ns 16 null_percent=0 num_unique=1024 905.148MB/s
UniqueString10bytes/1 45022703 ns 45022263 ns 15 null_percent=0.1 num_unique=1024 888.449MB/s
UniqueString10bytes/2 47131705 ns 47130800 ns 15 null_percent=1 num_unique=1024 848.702MB/s
UniqueString10bytes/3 50106213 ns 50105455 ns 14 null_percent=10 num_unique=1024 798.316MB/s
UniqueString10bytes/4 15905586 ns 15905158 ns 43 null_percent=99 num_unique=1024 2.45596GB/s
UniqueString10bytes/5 12983446 ns 12983327 ns 55 null_percent=100 num_unique=1024 3.00867GB/s
UniqueString10bytes/6 62149404 ns 62148971 ns 11 null_percent=0 num_unique=10.24k 643.615MB/s
UniqueString10bytes/7 62707969 ns 62705282 ns 11 null_percent=0.1 num_unique=10.24k 637.905MB/s
UniqueString10bytes/8 65508665 ns 65508532 ns 10 null_percent=1 num_unique=10.24k 610.607MB/s
UniqueString10bytes/9 65766803 ns 65766094 ns 11 null_percent=10 num_unique=10.24k 608.216MB/s
UniqueString10bytes/10 16297990 ns 16298076 ns 43 null_percent=99 num_unique=10.24k 2.39676GB/s
UniqueString10bytes/11 13298987 ns 13298798 ns 54 null_percent=100 num_unique=10.24k 2.9373GB/s
UniqueString100bytes/0 94204048 ns 94200614 ns 7 null_percent=0 num_unique=1024 4.14674GB/s
UniqueString100bytes/1 95631478 ns 95630838 ns 7 null_percent=0.1 num_unique=1024 4.08472GB/s
UniqueString100bytes/2 96547756 ns 96546348 ns 7 null_percent=1 num_unique=1024 4.04598GB/s
UniqueString100bytes/3 91950796 ns 91949032 ns 8 null_percent=10 num_unique=1024 4.24828GB/s
UniqueString100bytes/4 17292562 ns 17291979 ns 42 null_percent=99 num_unique=1024 22.59GB/s
UniqueString100bytes/5 13096944 ns 13096809 ns 55 null_percent=100 num_unique=1024 29.826GB/s
UniqueString100bytes/6 196165738 ns 196161451 ns 4 null_percent=0 num_unique=10.24k 1.99134GB/s
UniqueString100bytes/7 198475556 ns 198475456 ns 4 null_percent=0.1 num_unique=10.24k 1.96813GB/s
UniqueString100bytes/8 199273625 ns 199270358 ns 3 null_percent=1 num_unique=10.24k 1.96028GB/s
UniqueString100bytes/9 189235180 ns 189232925 ns 4 null_percent=10 num_unique=10.24k 2.06425GB/s
UniqueString100bytes/10 18381309 ns 18381409 ns 36 null_percent=99 num_unique=10.24k 21.2511GB/s
UniqueString100bytes/11 13426102 ns 13426072 ns 51 null_percent=100 num_unique=10.24k 29.0945GB/s
UniqueUInt8/0 2239549 ns 2239561 ns 309 null_percent=0 num_unique=200 1.7442GB/s
UniqueUInt8/1 2687371 ns 2687349 ns 248 null_percent=0.1 num_unique=200 1.45357GB/s
UniqueUInt8/2 4244052 ns 4244058 ns 166 null_percent=1 num_unique=200 942.494MB/s
UniqueUInt8/3 7563076 ns 7563066 ns 94 null_percent=10 num_unique=200 528.886MB/s
UniqueUInt8/4 3313484 ns 3313447 ns 214 null_percent=99 num_unique=200 1.17891GB/s
UniqueUInt8/5 1711948 ns 1711947 ns 415 null_percent=100 num_unique=200 2.28176GB/s
```
Here is the % diff versus the baseline.
* Cases 1 and 7 are the mostly-not-null cases. This shows a 15-20% perf improvement
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99% null cases
* The "BuildDictionary" case at the bottom with the perf regression is one of the "worst case scenarios". 89% of the values are null, so we almost never observe an all-null or all-not-null block. The use of `BitUtil::GetBit` over BitmapReader causes this slight regression since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case since the others are more empirically representative of real-world data
```
benchmark baseline contender change % regression
8 UniqueString100bytes/5 40.668 GiB/sec 272.392 GiB/sec 569.787 False
37 UniqueString10bytes/5 4.064 GiB/sec 27.207 GiB/sec 569.456 False
33 UniqueString10bytes/11 4.065 GiB/sec 27.062 GiB/sec 565.751 False
12 UniqueString100bytes/11 40.578 GiB/sec 264.909 GiB/sec 552.841 False
0 UniqueString10bytes/4 3.568 GiB/sec 9.051 GiB/sec 153.692 False
36 UniqueString100bytes/4 34.408 GiB/sec 83.010 GiB/sec 141.252 False
19 UniqueString10bytes/10 3.375 GiB/sec 7.891 GiB/sec 133.794 False
24 UniqueUInt8/1 677.981 MiB/sec 1.506 GiB/sec 127.435 False
5 UniqueString100bytes/10 30.775 GiB/sec 63.206 GiB/sec 105.381 False
27 UniqueUInt8/5 1000.163 MiB/sec 1.729 GiB/sec 76.989 False
13 UniqueUInt8/2 650.819 MiB/sec 846.372 MiB/sec 30.047 False
29 UniqueInt64/11 2.703 GiB/sec 3.409 GiB/sec 26.126 False
7 UniqueInt64/5 2.704 GiB/sec 3.404 GiB/sec 25.903 False
18 UniqueUInt8/4 932.926 MiB/sec 1.098 GiB/sec 20.535 False
23 UniqueInt64/1 1.681 GiB/sec 2.014 GiB/sec 19.840 False
21 UniqueInt64/7 1.628 GiB/sec 1.896 GiB/sec 16.476 False
31 UniqueInt64/2 1.658 GiB/sec 1.835 GiB/sec 10.651 False
20 UniqueString10bytes/7 612.647 MiB/sec 668.943 MiB/sec 9.189 False
16 UniqueInt64/3 1.386 GiB/sec 1.511 GiB/sec 9.053 False
38 UniqueString10bytes/8 601.259 MiB/sec 655.490 MiB/sec 9.019 False
1 UniqueUInt8/0 1.808 GiB/sec 1.963 GiB/sec 8.588 False
41 UniqueInt64/9 1.355 GiB/sec 1.458 GiB/sec 7.562 False
14 UniqueString10bytes/1 830.614 MiB/sec 886.336 MiB/sec 6.709 False
4 UniqueInt64/8 1.603 GiB/sec 1.703 GiB/sec 6.260 False
32 UniqueString10bytes/2 847.018 MiB/sec 884.017 MiB/sec 4.368 False
42 UniqueInt64/4 2.508 GiB/sec 2.600 GiB/sec 3.701 False
39 UniqueString10bytes/3 855.985 MiB/sec 886.914 MiB/sec 3.613 False
28 UniqueInt64/10 2.413 GiB/sec 2.494 GiB/sec 3.360 False
34 UniqueString100bytes/3 4.254 GiB/sec 4.369 GiB/sec 2.722 False
11 UniqueString100bytes/2 3.993 GiB/sec 4.094 GiB/sec 2.544 False
9 UniqueString10bytes/9 654.257 MiB/sec 668.714 MiB/sec 2.210 False
35 UniqueString10bytes/6 662.915 MiB/sec 676.832 MiB/sec 2.099 False
6 BuildStringDictionary 80.971 MiB/sec 81.753 MiB/sec 0.966 False
22 UniqueString100bytes/1 4.002 GiB/sec 4.033 GiB/sec 0.783 False
25 UniqueInt64/0 2.153 GiB/sec 2.168 GiB/sec 0.697 False
17 UniqueString10bytes/0 917.726 MiB/sec 918.783 MiB/sec 0.115 False
43 UniqueInt64/6 2.017 GiB/sec 2.016 GiB/sec -0.071 False
40 UniqueString100bytes/0 4.091 GiB/sec 4.086 GiB/sec -0.130 False
3 UniqueString100bytes/7 1.938 GiB/sec 1.909 GiB/sec -1.519 False
26 UniqueString100bytes/8 1.954 GiB/sec 1.897 GiB/sec -2.935 False
2 UniqueString100bytes/9 2.114 GiB/sec 2.034 GiB/sec -3.782 False
30 UniqueString100bytes/6 2.008 GiB/sec 1.895 GiB/sec -5.649 True
10 UniqueUInt8/3 474.468 MiB/sec 422.604 MiB/sec -10.931 True
15 BuildDictionary 1.776 GiB/sec 1.212 GiB/sec -31.742 True
```
[GitHub] [arrow] cyb70289 edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768
> I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.
~~Very likely I'm wrong. I remember util::optional is added due to CI failure https://github.com/apache/arrow/pull/6495#issuecomment-593732821~~
I think this patch is okay.
The sorting regression can be fixed (maybe even improved on). I'm happy to do the follow-up changes.
I refined util::optional (not sure if it is the same as what @wesm had in mind). This change makes no performance difference; it is still much slower than the original code. Diff attached at https://pastebin.com/ywbPxyLL, hope it can save some time for @wesm :)
[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135
I'm refactoring to nix util::optional.
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135
I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning.
[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884195
I see a big performance drop in some counting sort cases, which I also tested on my local machine. It should be related to this visitor code: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L133-L155
[GitHub] [arrow] pitrou closed pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7521:
URL: https://github.com/apache/arrow/pull/7521
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135
I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.
[GitHub] [arrow] pitrou commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-648019411
Let's leave sorting optimizations for another PR. I'll review this one.
[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256
FWIW on the "gcc/clang perf discussion", clang also shows performance benefits and limited downside
```
benchmark baseline contender change % regression
2 UniqueInt64/11 6.444 GiB/sec 18.511 GiB/sec 187.240 False
31 UniqueInt64/5 6.470 GiB/sec 18.390 GiB/sec 184.244 False
39 UniqueUInt8/5 810.180 MiB/sec 1.747 GiB/sec 120.867 False
26 UniqueUInt8/1 683.475 MiB/sec 1.430 GiB/sec 114.196 False
42 UniqueInt64/4 5.424 GiB/sec 6.965 GiB/sec 28.397 False
18 UniqueInt64/1 2.672 GiB/sec 3.411 GiB/sec 27.627 False
40 UniqueUInt8/2 654.320 MiB/sec 826.916 MiB/sec 26.378 False
33 UniqueUInt8/4 758.115 MiB/sec 947.360 MiB/sec 24.962 False
25 UniqueInt64/10 5.248 GiB/sec 6.426 GiB/sec 22.460 False
9 UniqueString100bytes/5 26.923 GiB/sec 32.142 GiB/sec 19.384 False
35 UniqueString10bytes/11 2.691 GiB/sec 3.207 GiB/sec 19.173 False
3 UniqueString10bytes/5 2.695 GiB/sec 3.200 GiB/sec 18.731 False
20 UniqueString100bytes/11 26.909 GiB/sec 31.831 GiB/sec 18.291 False
30 UniqueInt64/7 2.514 GiB/sec 2.890 GiB/sec 14.960 False
37 UniqueInt64/2 2.619 GiB/sec 2.975 GiB/sec 13.578 False
11 UniqueString10bytes/4 2.487 GiB/sec 2.700 GiB/sec 8.596 False
32 UniqueString10bytes/10 2.386 GiB/sec 2.589 GiB/sec 8.481 False
0 UniqueString100bytes/4 24.419 GiB/sec 26.365 GiB/sec 7.966 False
38 UniqueString100bytes/10 22.463 GiB/sec 24.128 GiB/sec 7.411 False
34 UniqueInt64/8 2.392 GiB/sec 2.563 GiB/sec 7.157 False
19 UniqueString10bytes/1 781.817 MiB/sec 835.760 MiB/sec 6.900 False
43 UniqueInt64/3 2.184 GiB/sec 2.331 GiB/sec 6.721 False
24 UniqueString10bytes/7 583.523 MiB/sec 621.007 MiB/sec 6.424 False
15 UniqueString100bytes/7 1.936 GiB/sec 2.024 GiB/sec 4.538 False
6 UniqueString10bytes/2 780.337 MiB/sec 805.686 MiB/sec 3.248 False
27 UniqueString100bytes/2 3.934 GiB/sec 4.059 GiB/sec 3.197 False
13 UniqueString100bytes/1 3.898 GiB/sec 3.995 GiB/sec 2.485 False
7 UniqueString10bytes/8 592.115 MiB/sec 604.865 MiB/sec 2.153 False
29 UniqueString100bytes/8 1.969 GiB/sec 2.011 GiB/sec 2.111 False
21 UniqueInt64/9 2.034 GiB/sec 2.048 GiB/sec 0.676 False
1 BuildStringDictionary 85.937 MiB/sec 85.928 MiB/sec -0.010 False
41 UniqueUInt8/3 449.171 MiB/sec 448.844 MiB/sec -0.073 False
28 UniqueString100bytes/0 4.084 GiB/sec 4.077 GiB/sec -0.161 False
4 UniqueString100bytes/3 4.255 GiB/sec 4.235 GiB/sec -0.450 False
5 UniqueString100bytes/6 2.054 GiB/sec 2.033 GiB/sec -1.041 False
14 UniqueString100bytes/9 2.138 GiB/sec 2.107 GiB/sec -1.449 False
8 UniqueUInt8/0 1.777 GiB/sec 1.750 GiB/sec -1.487 False
23 UniqueInt64/0 3.860 GiB/sec 3.799 GiB/sec -1.560 False
10 UniqueString10bytes/9 616.458 MiB/sec 605.470 MiB/sec -1.782 False
22 UniqueString10bytes/3 799.494 MiB/sec 783.825 MiB/sec -1.960 False
17 UniqueString10bytes/6 647.921 MiB/sec 631.631 MiB/sec -2.514 False
36 BuildDictionary 1.539 GiB/sec 1.498 GiB/sec -2.694 False
16 UniqueInt64/6 3.193 GiB/sec 3.077 GiB/sec -3.634 False
12 UniqueString10bytes/0 881.975 MiB/sec 839.487 MiB/sec -4.817 False
```
[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647823397
Also on the binary size, these changes add about 75KB to libarrow.so. My guess is the difference is mostly coming from code inlining for the all-null case (which wasn't split out before)
[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647886669
Also, I don't really understand the use of `util::optional` in these templates. The user should pass separate lambdas for the not-null and null cases
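To illustrate the two calling styles being contrasted, here is a minimal sketch. It does not reproduce the Arrow templates: `OptionalInt64` is a hypothetical stand-in for `util::optional`, and `VisitWithOptional`, `VisitSplit`, `Totals`, `SumWithOptional` and `SumSplit` are made-up names operating on plain vectors rather than Arrow arrays.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an optional value (empty means null).
struct OptionalInt64 {
  bool has_value;
  int64_t value;
};

// Style A: a single callback that receives an optional and must branch on it.
template <typename Func>
void VisitWithOptional(const std::vector<int64_t>& values,
                       const std::vector<bool>& is_valid, Func&& func) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    func(is_valid[i] ? OptionalInt64{true, values[i]}
                     : OptionalInt64{false, 0});
  }
}

// Style B: separate callbacks for the not-null and null cases, so the
// callbacks themselves contain no validity branch.
template <typename ValidFunc, typename NullFunc>
void VisitSplit(const std::vector<int64_t>& values,
                const std::vector<bool>& is_valid, ValidFunc&& valid_func,
                NullFunc&& null_func) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (is_valid[i]) {
      valid_func(values[i]);
    } else {
      null_func();
    }
  }
}

struct Totals {
  int64_t sum = 0;
  int64_t nulls = 0;
};

// The same reduction written against each style.
inline Totals SumWithOptional(const std::vector<int64_t>& values,
                              const std::vector<bool>& is_valid) {
  Totals t;
  VisitWithOptional(values, is_valid, [&](OptionalInt64 v) {
    if (v.has_value) {
      t.sum += v.value;
    } else {
      ++t.nulls;
    }
  });
  return t;
}

inline Totals SumSplit(const std::vector<int64_t>& values,
                       const std::vector<bool>& is_valid) {
  Totals t;
  VisitSplit(values, is_valid,
             [&](int64_t v) { t.sum += v; },
             [&]() { ++t.nulls; });
  return t;
}
```

With the split style, a visitor that already knows a whole block is valid (or null) can call one callback in a tight loop without constructing or inspecting an optional per element; whether that accounts for the sort regression discussed in this thread is not established by this sketch.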
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8
```
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
BuildDictionary 3219443 ns 3219440 ns 218 1.21215GB/s
BuildStringDictionary 3692881 ns 3692881 ns 192 81.7532MB/s
UniqueInt64/0 14413456 ns 14413251 ns 48 null_percent=0 2.16814GB/s
UniqueInt64/1 15516052 ns 15515737 ns 45 null_percent=0.1 2.01408GB/s
UniqueInt64/2 17031282 ns 17031266 ns 41 null_percent=1 1.83486GB/s
UniqueInt64/3 20680114 ns 20680064 ns 34 null_percent=10 1.51112GB/s
UniqueInt64/4 12018069 ns 12017844 ns 57 null_percent=99 2.6003GB/s
UniqueInt64/5 9179953 ns 9179946 ns 77 null_percent=100 3.40416GB/s
UniqueInt64/6 15501523 ns 15501496 ns 45 null_percent=0 2.01593GB/s
UniqueInt64/7 16482935 ns 16482300 ns 41 null_percent=0.1 1.89597GB/s
UniqueInt64/8 18349988 ns 18349317 ns 38 null_percent=1 1.70306GB/s
UniqueInt64/9 21439268 ns 21439244 ns 32 null_percent=10 1.45761GB/s
UniqueInt64/10 12530067 ns 12529871 ns 55 null_percent=99 2.49404GB/s
UniqueInt64/11 9167314 ns 9167365 ns 75 null_percent=100 3.40883GB/s
UniqueString10bytes/0 43535899 ns 43535846 ns 16 null_percent=0 918.783MB/s
UniqueString10bytes/1 45130595 ns 45129634 ns 16 null_percent=0.1 886.336MB/s
UniqueString10bytes/2 45249034 ns 45247983 ns 15 null_percent=1 884.017MB/s
UniqueString10bytes/3 45101533 ns 45100209 ns 16 null_percent=10 886.914MB/s
UniqueString10bytes/4 4316048 ns 4316019 ns 163 null_percent=99 9.05059GB/s
UniqueString10bytes/5 1435781 ns 1435763 ns 485 null_percent=100 27.2068GB/s
UniqueString10bytes/6 59100344 ns 59098817 ns 12 null_percent=0 676.832MB/s
UniqueString10bytes/7 59797544 ns 59795857 ns 12 null_percent=0.1 668.943MB/s
UniqueString10bytes/8 61024697 ns 61023090 ns 11 null_percent=1 655.49MB/s
UniqueString10bytes/9 59817211 ns 59816339 ns 12 null_percent=10 668.714MB/s
UniqueString10bytes/10 4950387 ns 4950242 ns 134 null_percent=99 7.89103GB/s
UniqueString10bytes/11 1443482 ns 1443434 ns 446 null_percent=100 27.0622GB/s
UniqueString100bytes/0 95609006 ns 95606132 ns 7 null_percent=0 4.08577GB/s
UniqueString100bytes/1 96850582 ns 96849441 ns 7 null_percent=0.1 4.03332GB/s
UniqueString100bytes/2 95404742 ns 95404634 ns 7 null_percent=1 4.0944GB/s
UniqueString100bytes/3 89401775 ns 89401006 ns 8 null_percent=10 4.36936GB/s
UniqueString100bytes/4 4705868 ns 4705746 ns 148 null_percent=99 83.0102GB/s
UniqueString100bytes/5 1434077 ns 1434055 ns 486 null_percent=100 272.392GB/s
UniqueString100bytes/6 206155133 ns 206148425 ns 3 null_percent=0 1.89487GB/s
UniqueString100bytes/7 204661287 ns 204653659 ns 3 null_percent=0.1 1.90871GB/s
UniqueString100bytes/8 205941884 ns 205941271 ns 3 null_percent=1 1.89678GB/s
UniqueString100bytes/9 192074501 ns 192073431 ns 4 null_percent=10 2.03373GB/s
UniqueString100bytes/10 6180349 ns 6180227 ns 111 null_percent=99 63.2056GB/s
UniqueString100bytes/11 1474565 ns 1474564 ns 482 null_percent=100 264.909GB/s
UniqueUInt8/0 1990025 ns 1990023 ns 348 null_percent=0 1.96292GB/s
UniqueUInt8/1 2594146 ns 2594089 ns 272 null_percent=0.1 1.50583GB/s
UniqueUInt8/2 4726027 ns 4726053 ns 145 null_percent=1 846.372MB/s
UniqueUInt8/3 9465222 ns 9465126 ns 75 null_percent=10 422.604MB/s
UniqueUInt8/4 3557141 ns 3557135 ns 195 null_percent=99 1124.5MB/s
UniqueUInt8/5 2259664 ns 2259664 ns 314 null_percent=100 1.72869GB/s
```
Here is the % diff versus the baseline.
* Cases 1 and 7 are the mostly-not-null cases. This shows a 15-20% perf improvement
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99% null cases
* The "BuildDictionary" case at the bottom with the perf regression is one of the "worst case scenarios". 89% of the values are null, so we almost never observe an all-null or all-not-null block. The use of `BitUtil::GetBit` over BitmapReader causes this slight regression since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case since the others are more empirically representative of real-world data
```
benchmark baseline contender change % regression
8 UniqueString100bytes/5 40.668 GiB/sec 272.392 GiB/sec 569.787 False
37 UniqueString10bytes/5 4.064 GiB/sec 27.207 GiB/sec 569.456 False
33 UniqueString10bytes/11 4.065 GiB/sec 27.062 GiB/sec 565.751 False
12 UniqueString100bytes/11 40.578 GiB/sec 264.909 GiB/sec 552.841 False
0 UniqueString10bytes/4 3.568 GiB/sec 9.051 GiB/sec 153.692 False
36 UniqueString100bytes/4 34.408 GiB/sec 83.010 GiB/sec 141.252 False
19 UniqueString10bytes/10 3.375 GiB/sec 7.891 GiB/sec 133.794 False
24 UniqueUInt8/1 677.981 MiB/sec 1.506 GiB/sec 127.435 False
5 UniqueString100bytes/10 30.775 GiB/sec 63.206 GiB/sec 105.381 False
27 UniqueUInt8/5 1000.163 MiB/sec 1.729 GiB/sec 76.989 False
13 UniqueUInt8/2 650.819 MiB/sec 846.372 MiB/sec 30.047 False
29 UniqueInt64/11 2.703 GiB/sec 3.409 GiB/sec 26.126 False
7 UniqueInt64/5 2.704 GiB/sec 3.404 GiB/sec 25.903 False
18 UniqueUInt8/4 932.926 MiB/sec 1.098 GiB/sec 20.535 False
23 UniqueInt64/1 1.681 GiB/sec 2.014 GiB/sec 19.840 False
21 UniqueInt64/7 1.628 GiB/sec 1.896 GiB/sec 16.476 False
31 UniqueInt64/2 1.658 GiB/sec 1.835 GiB/sec 10.651 False
20 UniqueString10bytes/7 612.647 MiB/sec 668.943 MiB/sec 9.189 False
16 UniqueInt64/3 1.386 GiB/sec 1.511 GiB/sec 9.053 False
38 UniqueString10bytes/8 601.259 MiB/sec 655.490 MiB/sec 9.019 False
1 UniqueUInt8/0 1.808 GiB/sec 1.963 GiB/sec 8.588 False
41 UniqueInt64/9 1.355 GiB/sec 1.458 GiB/sec 7.562 False
14 UniqueString10bytes/1 830.614 MiB/sec 886.336 MiB/sec 6.709 False
4 UniqueInt64/8 1.603 GiB/sec 1.703 GiB/sec 6.260 False
32 UniqueString10bytes/2 847.018 MiB/sec 884.017 MiB/sec 4.368 False
42 UniqueInt64/4 2.508 GiB/sec 2.600 GiB/sec 3.701 False
39 UniqueString10bytes/3 855.985 MiB/sec 886.914 MiB/sec 3.613 False
28 UniqueInt64/10 2.413 GiB/sec 2.494 GiB/sec 3.360 False
34 UniqueString100bytes/3 4.254 GiB/sec 4.369 GiB/sec 2.722 False
11 UniqueString100bytes/2 3.993 GiB/sec 4.094 GiB/sec 2.544 False
9 UniqueString10bytes/9 654.257 MiB/sec 668.714 MiB/sec 2.210 False
35 UniqueString10bytes/6 662.915 MiB/sec 676.832 MiB/sec 2.099 False
6 BuildStringDictionary 80.971 MiB/sec 81.753 MiB/sec 0.966 False
22 UniqueString100bytes/1 4.002 GiB/sec 4.033 GiB/sec 0.783 False
25 UniqueInt64/0 2.153 GiB/sec 2.168 GiB/sec 0.697 False
17 UniqueString10bytes/0 917.726 MiB/sec 918.783 MiB/sec 0.115 False
43 UniqueInt64/6 2.017 GiB/sec 2.016 GiB/sec -0.071 False
40 UniqueString100bytes/0 4.091 GiB/sec 4.086 GiB/sec -0.130 False
3 UniqueString100bytes/7 1.938 GiB/sec 1.909 GiB/sec -1.519 False
26 UniqueString100bytes/8 1.954 GiB/sec 1.897 GiB/sec -2.935 False
2 UniqueString100bytes/9 2.114 GiB/sec 2.034 GiB/sec -3.782 False
30 UniqueString100bytes/6 2.008 GiB/sec 1.895 GiB/sec -5.649 True
10 UniqueUInt8/3 474.468 MiB/sec 422.604 MiB/sec -10.931 True
15 BuildDictionary 1.776 GiB/sec 1.212 GiB/sec -31.742 True
```
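For readers unfamiliar with the technique, the block-wise validity scan discussed above can be sketched roughly as follows. This is not Arrow's `BitBlockCounter` or the code in `visitor_inline.h`: `GetBit`, `VisitBitmapBlocks`, and `CountValid` are hypothetical names, and the fixed 64-bit block size and LSB-first bit order are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Arrow's API): read bit i of an LSB-first bitmap.
inline bool GetBit(const uint8_t* bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Visit `length` validity bits, 64 at a time. All-valid and all-null blocks
// take a fast path with no per-bit test; mixed blocks fall back to GetBit.
template <typename ValidFunc, typename NullFunc>
void VisitBitmapBlocks(const uint8_t* bitmap, int64_t length,
                       ValidFunc&& on_valid, NullFunc&& on_null) {
  int64_t pos = 0;
  for (; length - pos >= 64; pos += 64) {
    uint64_t word;
    std::memcpy(&word, bitmap + pos / 8, sizeof(word));
    if (word == ~uint64_t{0}) {
      // Every value in the block is valid.
      for (int64_t i = pos; i < pos + 64; ++i) on_valid(i);
    } else if (word == 0) {
      // Every value in the block is null.
      for (int64_t i = pos; i < pos + 64; ++i) on_null(i);
    } else {
      // Mixed block: each validity bit has to be checked separately.
      for (int64_t i = pos; i < pos + 64; ++i) {
        if (GetBit(bitmap, i)) on_valid(i); else on_null(i);
      }
    }
  }
  // Trailing bits that do not fill a whole 64-bit block.
  for (; pos < length; ++pos) {
    if (GetBit(bitmap, pos)) on_valid(pos); else on_null(pos);
  }
}

// Example use: count the valid entries.
inline int64_t CountValid(const uint8_t* bitmap, int64_t length) {
  int64_t valid = 0;
  VisitBitmapBlocks(bitmap, length,
                    [&](int64_t) { ++valid; }, [](int64_t) {});
  return valid;
}
```

With data that is around 89% null, almost every block lands in the mixed branch, which is consistent with the "BuildDictionary" result: the fast paths are rarely taken and the per-bit check dominates.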
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647888437
FWIW the performance issue seems to be more pronounced on gcc than on clang. Here is the benchmark comparison on my machine with clang-8:
```
benchmark baseline contender change % counters
1 SortToIndicesInt64Count/32768/10000/min_time:1.000 1.560 GiB/sec 2.000 GiB/sec 28.163 {'iterations': 70030, 'null_percent': 0.01}
15 SortToIndicesInt64Compare/32768/10000/min_time:1.000 145.735 MiB/sec 158.918 MiB/sec 9.046 {'iterations': 6654, 'null_percent': 0.01}
5 SortToIndicesInt64Compare/32768/100/min_time:1.000 149.117 MiB/sec 159.609 MiB/sec 7.036 {'iterations': 6545, 'null_percent': 1.0}
7 SortToIndicesInt64Compare/32768/0/min_time:1.000 153.027 MiB/sec 162.227 MiB/sec 6.012 {'iterations': 6862, 'null_percent': 0.0}
4 SortToIndicesInt64Compare/32768/10/min_time:1.000 160.419 MiB/sec 167.725 MiB/sec 4.554 {'iterations': 6934, 'null_percent': 10.0}
2 SortToIndicesInt64Compare/32768/2/min_time:1.000 255.024 MiB/sec 260.284 MiB/sec 2.063 {'iterations': 11390, 'null_percent': 50.0}
9 SortToIndicesInt64Count/32768/100/min_time:1.000 1.486 GiB/sec 1.458 GiB/sec -1.912 {'iterations': 66757, 'null_percent': 1.0}
10 SortToIndicesInt64Count/32768/0/min_time:1.000 2.143 GiB/sec 2.067 GiB/sec -3.568 {'iterations': 98191, 'null_percent': 0.0}
13 SortToIndicesInt64Count/8388608/1/min_time:1.000 4.215 GiB/sec 3.813 GiB/sec -9.531 {'iterations': 762, 'null_percent': 100.0}
11 SortToIndicesInt64Count/32768/2/min_time:1.000 679.023 MiB/sec 609.379 MiB/sec -10.256 {'iterations': 29602, 'null_percent': 50.0}
0 SortToIndicesInt64Count/1048576/1/min_time:1.000 4.487 GiB/sec 4.021 GiB/sec -10.400 {'iterations': 6550, 'null_percent': 100.0}
12 SortToIndicesInt64Compare/8388608/1/min_time:1.000 4.250 GiB/sec 3.762 GiB/sec -11.476 {'iterations': 766, 'null_percent': 100.0}
6 SortToIndicesInt64Count/32768/1/min_time:1.000 4.758 GiB/sec 4.185 GiB/sec -12.040 {'iterations': 217705, 'null_percent': 100.0}
8 SortToIndicesInt64Compare/32768/1/min_time:1.000 4.730 GiB/sec 4.125 GiB/sec -12.780 {'iterations': 213908, 'null_percent': 100.0}
3 SortToIndicesInt64Compare/1048576/1/min_time:1.000 4.556 GiB/sec 3.953 GiB/sec -13.228 {'iterations': 6539, 'null_percent': 100.0}
14 SortToIndicesInt64Count/32768/10/min_time:1.000 1.316 GiB/sec 1.051 GiB/sec -20.108 {'iterations': 59539, 'null_percent': 10.0}
```
The perf regression with 100% null data is an artifact of the improper implementation.
[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768
> I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.
Very likely I'm wrong, but I remember util::optional was added due to a CI failure: https://github.com/apache/arrow/pull/6495#issuecomment-593732821
I think this patch is okay.
The sorting regression can be fixed (maybe even improved). I'm happy to do the follow-up changes.
[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647878260
@ursabot benchmark --suite-filter=arrow-compute-vector-sort-benchmark
[GitHub] [arrow] github-actions[bot] commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647820320
https://issues.apache.org/jira/browse/ARROW-9210
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747
Sorting seems too important to leave to these relatively complex templates. For example, just after determining that a value is not null, the `optional` value is checked for null again! I would suggest implementing the counting sort without using `VisitArrayDataInline`. I'm happy to help with this.
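As a rough illustration of what a counting sort written as a plain loop, without the visitor templates, might look like, here is a minimal sketch. It is not the Arrow sort kernel: `CountSortToIndices` is a hypothetical name, the validity information is passed as a `std::vector<bool>` rather than a bitmap for brevity, the value range `[min, max]` is assumed to be known up front, and placing nulls last is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: stable counting sort that returns the indices that
// would sort `values`, with null entries placed after all valid ones.
std::vector<int64_t> CountSortToIndices(const std::vector<int64_t>& values,
                                        const std::vector<bool>& is_valid,
                                        int64_t min, int64_t max) {
  const int64_t n = static_cast<int64_t>(values.size());
  const size_t range = static_cast<size_t>(max - min + 1);

  // counts[k + 1] holds the number of valid values equal to min + k.
  std::vector<int64_t> counts(range + 1, 0);
  for (int64_t i = 0; i < n; ++i) {
    if (is_valid[i]) ++counts[values[i] - min + 1];
  }

  // Prefix sum: counts[k] becomes the first output slot for value min + k,
  // and counts[range] becomes the total number of valid values.
  for (size_t k = 1; k <= range; ++k) counts[k] += counts[k - 1];
  int64_t null_pos = counts[range];

  // Each element is tested for validity exactly once in this pass.
  std::vector<int64_t> indices(n);
  for (int64_t i = 0; i < n; ++i) {
    if (is_valid[i]) {
      indices[counts[values[i] - min]++] = i;
    } else {
      indices[null_pos++] = i;
    }
  }
  return indices;
}
```

The validity check appears once per element per pass, so there is no second null test of the kind described above.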
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256
FWIW on the "gcc/clang perf discussion", clang-8 also shows performance benefits and limited downside:
```
benchmark baseline contender change % regression
2 UniqueInt64/11 6.444 GiB/sec 18.511 GiB/sec 187.240 False
31 UniqueInt64/5 6.470 GiB/sec 18.390 GiB/sec 184.244 False
39 UniqueUInt8/5 810.180 MiB/sec 1.747 GiB/sec 120.867 False
26 UniqueUInt8/1 683.475 MiB/sec 1.430 GiB/sec 114.196 False
42 UniqueInt64/4 5.424 GiB/sec 6.965 GiB/sec 28.397 False
18 UniqueInt64/1 2.672 GiB/sec 3.411 GiB/sec 27.627 False
40 UniqueUInt8/2 654.320 MiB/sec 826.916 MiB/sec 26.378 False
33 UniqueUInt8/4 758.115 MiB/sec 947.360 MiB/sec 24.962 False
25 UniqueInt64/10 5.248 GiB/sec 6.426 GiB/sec 22.460 False
9 UniqueString100bytes/5 26.923 GiB/sec 32.142 GiB/sec 19.384 False
35 UniqueString10bytes/11 2.691 GiB/sec 3.207 GiB/sec 19.173 False
3 UniqueString10bytes/5 2.695 GiB/sec 3.200 GiB/sec 18.731 False
20 UniqueString100bytes/11 26.909 GiB/sec 31.831 GiB/sec 18.291 False
30 UniqueInt64/7 2.514 GiB/sec 2.890 GiB/sec 14.960 False
37 UniqueInt64/2 2.619 GiB/sec 2.975 GiB/sec 13.578 False
11 UniqueString10bytes/4 2.487 GiB/sec 2.700 GiB/sec 8.596 False
32 UniqueString10bytes/10 2.386 GiB/sec 2.589 GiB/sec 8.481 False
0 UniqueString100bytes/4 24.419 GiB/sec 26.365 GiB/sec 7.966 False
38 UniqueString100bytes/10 22.463 GiB/sec 24.128 GiB/sec 7.411 False
34 UniqueInt64/8 2.392 GiB/sec 2.563 GiB/sec 7.157 False
19 UniqueString10bytes/1 781.817 MiB/sec 835.760 MiB/sec 6.900 False
43 UniqueInt64/3 2.184 GiB/sec 2.331 GiB/sec 6.721 False
24 UniqueString10bytes/7 583.523 MiB/sec 621.007 MiB/sec 6.424 False
15 UniqueString100bytes/7 1.936 GiB/sec 2.024 GiB/sec 4.538 False
6 UniqueString10bytes/2 780.337 MiB/sec 805.686 MiB/sec 3.248 False
27 UniqueString100bytes/2 3.934 GiB/sec 4.059 GiB/sec 3.197 False
13 UniqueString100bytes/1 3.898 GiB/sec 3.995 GiB/sec 2.485 False
7 UniqueString10bytes/8 592.115 MiB/sec 604.865 MiB/sec 2.153 False
29 UniqueString100bytes/8 1.969 GiB/sec 2.011 GiB/sec 2.111 False
21 UniqueInt64/9 2.034 GiB/sec 2.048 GiB/sec 0.676 False
1 BuildStringDictionary 85.937 MiB/sec 85.928 MiB/sec -0.010 False
41 UniqueUInt8/3 449.171 MiB/sec 448.844 MiB/sec -0.073 False
28 UniqueString100bytes/0 4.084 GiB/sec 4.077 GiB/sec -0.161 False
4 UniqueString100bytes/3 4.255 GiB/sec 4.235 GiB/sec -0.450 False
5 UniqueString100bytes/6 2.054 GiB/sec 2.033 GiB/sec -1.041 False
14 UniqueString100bytes/9 2.138 GiB/sec 2.107 GiB/sec -1.449 False
8 UniqueUInt8/0 1.777 GiB/sec 1.750 GiB/sec -1.487 False
23 UniqueInt64/0 3.860 GiB/sec 3.799 GiB/sec -1.560 False
10 UniqueString10bytes/9 616.458 MiB/sec 605.470 MiB/sec -1.782 False
22 UniqueString10bytes/3 799.494 MiB/sec 783.825 MiB/sec -1.960 False
17 UniqueString10bytes/6 647.921 MiB/sec 631.631 MiB/sec -2.514 False
36 BuildDictionary 1.539 GiB/sec 1.498 GiB/sec -2.694 False
16 UniqueInt64/6 3.193 GiB/sec 3.077 GiB/sec -3.634 False
12 UniqueString10bytes/0 881.975 MiB/sec 839.487 MiB/sec -4.817 False
```
[GitHub] [arrow] ursabot commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647883517
[AMD64 Ubuntu 18.04 C++ Benchmark (#114347)](https://ci.ursalabs.org/#builders/73/builds/89) builder has succeeded.
Revision: dbd166df749e73cbf7c1ec0c6cfa5837280aa32d
```diff
  ====================================================  ===============  ===============  ========
  benchmark                                             baseline         contender        change
  ====================================================  ===============  ===============  ========
  SortToIndicesInt64Count/32768/1/min_time:1.000        2.690 GiB/sec    2.646 GiB/sec    -1.654%
  SortToIndicesInt64Count/1048576/1/min_time:1.000      3.244 GiB/sec    3.198 GiB/sec    -1.423%
  SortToIndicesInt64Compare/32768/10000/min_time:1.000  103.004 MiB/sec  101.724 MiB/sec  -1.243%
  SortToIndicesInt64Compare/32768/1/min_time:1.000      2.685 GiB/sec    2.612 GiB/sec    -2.707%
  SortToIndicesInt64Compare/32768/0/min_time:1.000      105.027 MiB/sec  103.783 MiB/sec  -1.184%
- SortToIndicesInt64Compare/32768/10/min_time:1.000     109.648 MiB/sec  102.376 MiB/sec  -6.633%
- SortToIndicesInt64Count/32768/10/min_time:1.000       701.425 MiB/sec  286.420 MiB/sec  -59.166%
- SortToIndicesInt64Count/32768/100/min_time:1.000      686.441 MiB/sec  386.614 MiB/sec  -43.678%
  SortToIndicesInt64Compare/8388608/1/min_time:1.000    3.162 GiB/sec    3.201 GiB/sec    1.242%
- SortToIndicesInt64Count/32768/2/min_time:1.000        526.866 MiB/sec  259.139 MiB/sec  -50.815%
- SortToIndicesInt64Count/32768/10000/min_time:1.000    683.857 MiB/sec  599.732 MiB/sec  -12.301%
  SortToIndicesInt64Compare/32768/100/min_time:1.000    103.157 MiB/sec  98.649 MiB/sec   -4.370%
  SortToIndicesInt64Count/8388608/1/min_time:1.000      3.259 GiB/sec    3.211 GiB/sec    -1.495%
  SortToIndicesInt64Count/32768/0/min_time:1.000        647.629 MiB/sec  627.171 MiB/sec  -3.159%
  SortToIndicesInt64Compare/1048576/1/min_time:1.000    3.197 GiB/sec    3.198 GiB/sec    0.035%
- SortToIndicesInt64Compare/32768/2/min_time:1.000      171.750 MiB/sec  162.637 MiB/sec  -5.306%
  ====================================================  ===============  ===============  ========
```
[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-648147535
thanks @pitrou and @cyb70289 -- I will spend a little time on the count-sort implementation and post a new patch