Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/22 23:29:03 UTC

[GitHub] [arrow] wesm opened a new pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

wesm opened a new pull request #7521:
URL: https://github.com/apache/arrow/pull/7521


   This significantly speeds up processing of mostly-not-null or mostly-null data, while adding almost no overhead in the other scenarios, where a word-sized run of all-not-null or all-null data is rare. For data with null_count 0, values are processed in blocks of INT16_MAX values at a time, so this adds no meaningful overhead for that case either.
   
   I modified the hash benchmarks where this code is used so that they exhibit both the cases that benefit from this optimization and the ones that don't.
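
To illustrate the block-wise idea described above, here is a minimal, self-contained sketch. It is not Arrow's `BitBlockCounter` or the code in `array/visitor_inline.h`; the function name `VisitValidityBlocks`, the `std::vector<uint64_t>` bitmap, and the LSB-first bit order are assumptions made for the example. It shows only the word-sized fast paths, not the INT16_MAX-sized blocks used when null_count is 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Arrow's API: visits `length` validity bits stored
// LSB-first in 64-bit words, calling valid_func(i) or null_func(i) per slot.
template <typename ValidFunc, typename NullFunc>
void VisitValidityBlocks(const std::vector<uint64_t>& bitmap, int64_t length,
                         ValidFunc&& valid_func, NullFunc&& null_func) {
  assert(length >= 0);
  assert(static_cast<int64_t>(bitmap.size()) * 64 >= length);
  int64_t i = 0;
  // Full 64-bit words: a single comparison classifies a uniform block.
  for (; i + 64 <= length; i += 64) {
    const uint64_t word = bitmap[i / 64];
    if (word == ~uint64_t{0}) {
      // All not-null: no per-bit checks needed.
      for (int64_t j = 0; j < 64; ++j) valid_func(i + j);
    } else if (word == 0) {
      // All null: no per-bit checks needed.
      for (int64_t j = 0; j < 64; ++j) null_func(i + j);
    } else {
      // Mixed block: fall back to testing each bit.
      for (int64_t j = 0; j < 64; ++j) {
        if ((word >> j) & 1) {
          valid_func(i + j);
        } else {
          null_func(i + j);
        }
      }
    }
  }
  // Trailing partial word: test each remaining bit.
  for (; i < length; ++i) {
    if ((bitmap[i / 64] >> (i % 64)) & 1) {
      valid_func(i);
    } else {
      null_func(i);
    }
  }
}
```

A caller would pass two lambdas, for example one that hashes the value at index `i` and one that records a null. The fast paths only pay off when uniform words are common, which matches the mostly-null and mostly-not-null cases mentioned above.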


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487


   Here's a benchmark run with gcc-8 
   
   ```
   ---------------------------------------------------------------
   Benchmark                        Time           CPU Iterations
   ---------------------------------------------------------------
   BuildDictionary            3219443 ns    3219440 ns        218   1.21215GB/s
   BuildStringDictionary      3692881 ns    3692881 ns        192   81.7532MB/s
   UniqueInt64/0             14413456 ns   14413251 ns         48 null_percent=0   2.16814GB/s
   UniqueInt64/1             15516052 ns   15515737 ns         45 null_percent=0.1   2.01408GB/s
   UniqueInt64/2             17031282 ns   17031266 ns         41 null_percent=1   1.83486GB/s
   UniqueInt64/3             20680114 ns   20680064 ns         34 null_percent=10   1.51112GB/s
   UniqueInt64/4             12018069 ns   12017844 ns         57 null_percent=99    2.6003GB/s
   UniqueInt64/5              9179953 ns    9179946 ns         77 null_percent=100   3.40416GB/s
   UniqueInt64/6             15501523 ns   15501496 ns         45 null_percent=0   2.01593GB/s
   UniqueInt64/7             16482935 ns   16482300 ns         41 null_percent=0.1   1.89597GB/s
   UniqueInt64/8             18349988 ns   18349317 ns         38 null_percent=1   1.70306GB/s
   UniqueInt64/9             21439268 ns   21439244 ns         32 null_percent=10   1.45761GB/s
   UniqueInt64/10            12530067 ns   12529871 ns         55 null_percent=99   2.49404GB/s
   UniqueInt64/11             9167314 ns    9167365 ns         75 null_percent=100   3.40883GB/s
   UniqueString10bytes/0     43535899 ns   43535846 ns         16 null_percent=0   918.783MB/s
   UniqueString10bytes/1     45130595 ns   45129634 ns         16 null_percent=0.1   886.336MB/s
   UniqueString10bytes/2     45249034 ns   45247983 ns         15 null_percent=1   884.017MB/s
   UniqueString10bytes/3     45101533 ns   45100209 ns         16 null_percent=10   886.914MB/s
   UniqueString10bytes/4      4316048 ns    4316019 ns        163 null_percent=99   9.05059GB/s
   UniqueString10bytes/5      1435781 ns    1435763 ns        485 null_percent=100   27.2068GB/s
   UniqueString10bytes/6     59100344 ns   59098817 ns         12 null_percent=0   676.832MB/s
   UniqueString10bytes/7     59797544 ns   59795857 ns         12 null_percent=0.1   668.943MB/s
   UniqueString10bytes/8     61024697 ns   61023090 ns         11 null_percent=1    655.49MB/s
   UniqueString10bytes/9     59817211 ns   59816339 ns         12 null_percent=10   668.714MB/s
   UniqueString10bytes/10     4950387 ns    4950242 ns        134 null_percent=99   7.89103GB/s
   UniqueString10bytes/11     1443482 ns    1443434 ns        446 null_percent=100   27.0622GB/s
   UniqueString100bytes/0    95609006 ns   95606132 ns          7 null_percent=0   4.08577GB/s
   UniqueString100bytes/1    96850582 ns   96849441 ns          7 null_percent=0.1   4.03332GB/s
   UniqueString100bytes/2    95404742 ns   95404634 ns          7 null_percent=1    4.0944GB/s
   UniqueString100bytes/3    89401775 ns   89401006 ns          8 null_percent=10   4.36936GB/s
   UniqueString100bytes/4     4705868 ns    4705746 ns        148 null_percent=99   83.0102GB/s
   UniqueString100bytes/5     1434077 ns    1434055 ns        486 null_percent=100   272.392GB/s
   UniqueString100bytes/6   206155133 ns  206148425 ns          3 null_percent=0   1.89487GB/s
   UniqueString100bytes/7   204661287 ns  204653659 ns          3 null_percent=0.1   1.90871GB/s
   UniqueString100bytes/8   205941884 ns  205941271 ns          3 null_percent=1   1.89678GB/s
   UniqueString100bytes/9   192074501 ns  192073431 ns          4 null_percent=10   2.03373GB/s
   UniqueString100bytes/10    6180349 ns    6180227 ns        111 null_percent=99   63.2056GB/s
   UniqueString100bytes/11    1474565 ns    1474564 ns        482 null_percent=100   264.909GB/s
   UniqueUInt8/0              1990025 ns    1990023 ns        348 null_percent=0   1.96292GB/s
   UniqueUInt8/1              2594146 ns    2594089 ns        272 null_percent=0.1   1.50583GB/s
   UniqueUInt8/2              4726027 ns    4726053 ns        145 null_percent=1   846.372MB/s
   UniqueUInt8/3              9465222 ns    9465126 ns         75 null_percent=10   422.604MB/s
   UniqueUInt8/4              3557141 ns    3557135 ns        195 null_percent=99    1124.5MB/s
   UniqueUInt8/5              2259664 ns    2259664 ns        314 null_percent=100   1.72869GB/s
   ```
   
   (I need to add "num_unique" to `state.counters` -- there are two different cardinality cases represented here)
   
   Here is the % diff versus the baseline. 
   
   * Cases 1 and 7 are the mostly-not-null cases. These show a 15-20% perf improvement.
   * Cases 5 and 11 are the all-null cases.
   * Cases 4 and 10 are the 99% null cases.
   * The "BuildDictionary" case at the bottom, with the perf regression, is one of the "worst case scenarios": 89% of the values are null, so we almost never observe an all-null or all-not-null block. Using `BitUtil::GetBit` instead of BitmapReader causes this slight regression, since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case, since the others are more empirically representative of real-world data.
   
   ```
                     benchmark          baseline        contender  change %  regression
   8    UniqueString100bytes/5    40.668 GiB/sec  272.392 GiB/sec   569.787       False
   37    UniqueString10bytes/5     4.064 GiB/sec   27.207 GiB/sec   569.456       False
   33   UniqueString10bytes/11     4.065 GiB/sec   27.062 GiB/sec   565.751       False
   12  UniqueString100bytes/11    40.578 GiB/sec  264.909 GiB/sec   552.841       False
   0     UniqueString10bytes/4     3.568 GiB/sec    9.051 GiB/sec   153.692       False
   36   UniqueString100bytes/4    34.408 GiB/sec   83.010 GiB/sec   141.252       False
   19   UniqueString10bytes/10     3.375 GiB/sec    7.891 GiB/sec   133.794       False
   24            UniqueUInt8/1   677.981 MiB/sec    1.506 GiB/sec   127.435       False
   5   UniqueString100bytes/10    30.775 GiB/sec   63.206 GiB/sec   105.381       False
   27            UniqueUInt8/5  1000.163 MiB/sec    1.729 GiB/sec    76.989       False
   13            UniqueUInt8/2   650.819 MiB/sec  846.372 MiB/sec    30.047       False
   29           UniqueInt64/11     2.703 GiB/sec    3.409 GiB/sec    26.126       False
   7             UniqueInt64/5     2.704 GiB/sec    3.404 GiB/sec    25.903       False
   18            UniqueUInt8/4   932.926 MiB/sec    1.098 GiB/sec    20.535       False
   23            UniqueInt64/1     1.681 GiB/sec    2.014 GiB/sec    19.840       False
   21            UniqueInt64/7     1.628 GiB/sec    1.896 GiB/sec    16.476       False
   31            UniqueInt64/2     1.658 GiB/sec    1.835 GiB/sec    10.651       False
   20    UniqueString10bytes/7   612.647 MiB/sec  668.943 MiB/sec     9.189       False
   16            UniqueInt64/3     1.386 GiB/sec    1.511 GiB/sec     9.053       False
   38    UniqueString10bytes/8   601.259 MiB/sec  655.490 MiB/sec     9.019       False
   1             UniqueUInt8/0     1.808 GiB/sec    1.963 GiB/sec     8.588       False
   41            UniqueInt64/9     1.355 GiB/sec    1.458 GiB/sec     7.562       False
   14    UniqueString10bytes/1   830.614 MiB/sec  886.336 MiB/sec     6.709       False
   4             UniqueInt64/8     1.603 GiB/sec    1.703 GiB/sec     6.260       False
   32    UniqueString10bytes/2   847.018 MiB/sec  884.017 MiB/sec     4.368       False
   42            UniqueInt64/4     2.508 GiB/sec    2.600 GiB/sec     3.701       False
   39    UniqueString10bytes/3   855.985 MiB/sec  886.914 MiB/sec     3.613       False
   28           UniqueInt64/10     2.413 GiB/sec    2.494 GiB/sec     3.360       False
   34   UniqueString100bytes/3     4.254 GiB/sec    4.369 GiB/sec     2.722       False
   11   UniqueString100bytes/2     3.993 GiB/sec    4.094 GiB/sec     2.544       False
   9     UniqueString10bytes/9   654.257 MiB/sec  668.714 MiB/sec     2.210       False
   35    UniqueString10bytes/6   662.915 MiB/sec  676.832 MiB/sec     2.099       False
   6     BuildStringDictionary    80.971 MiB/sec   81.753 MiB/sec     0.966       False
   22   UniqueString100bytes/1     4.002 GiB/sec    4.033 GiB/sec     0.783       False
   25            UniqueInt64/0     2.153 GiB/sec    2.168 GiB/sec     0.697       False
   17    UniqueString10bytes/0   917.726 MiB/sec  918.783 MiB/sec     0.115       False
   43            UniqueInt64/6     2.017 GiB/sec    2.016 GiB/sec    -0.071       False
   40   UniqueString100bytes/0     4.091 GiB/sec    4.086 GiB/sec    -0.130       False
   3    UniqueString100bytes/7     1.938 GiB/sec    1.909 GiB/sec    -1.519       False
   26   UniqueString100bytes/8     1.954 GiB/sec    1.897 GiB/sec    -2.935       False
   2    UniqueString100bytes/9     2.114 GiB/sec    2.034 GiB/sec    -3.782       False
   30   UniqueString100bytes/6     2.008 GiB/sec    1.895 GiB/sec    -5.649        True
   10            UniqueUInt8/3   474.468 MiB/sec  422.604 MiB/sec   -10.931        True
   15          BuildDictionary     1.776 GiB/sec    1.212 GiB/sec   -31.742        True
   ```
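
As a rough check on the "worst case" reasoning for BuildDictionary, the sketch below estimates how often a block of validity bits is uniform. It assumes nulls are independent with a fixed probability and that a block is 64 bits wide; both are assumptions for illustration, and `UniformBlockProbability` is a hypothetical name, not part of Arrow.

```cpp
#include <cassert>
#include <cmath>

// Probability that a block of `block_size` validity bits is uniform (all null
// or all not-null), ASSUMING each slot is null independently with probability
// `null_prob`. Real data may be clustered, which would raise this number.
double UniformBlockProbability(double null_prob, int block_size) {
  assert(null_prob >= 0.0 && null_prob <= 1.0);
  assert(block_size > 0);
  return std::pow(null_prob, block_size) +
         std::pow(1.0 - null_prob, block_size);
}
```

Under these assumptions, `UniformBlockProbability(0.89, 64)` comes out well below 0.1%, so nearly every block falls through to the per-bit path, whereas `UniformBlockProbability(0.001, 64)` is above 90%, which is in line with the mostly-not-null cases benefiting.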





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487


   Here's a benchmark run with gcc-8 
   
   ```
   -------------------------------------------------------------------------------
   Benchmark                        Time           CPU Iterations UserCounters...
   -------------------------------------------------------------------------------
   BuildDictionary            2625315 ns    2625247 ns        271 null_percent=0.88889    1.4865GB/s
   BuildStringDictionary      3475855 ns    3475854 ns        200   86.8577MB/s
   UniqueInt64/0              9842842 ns    9842834 ns         71 null_percent=0 num_unique=1024    3.1749GB/s
   UniqueInt64/1             10617685 ns   10617360 ns         66 null_percent=0.1 num_unique=1024   2.94329GB/s
   UniqueInt64/2             12648447 ns   12648430 ns         59 null_percent=1 num_unique=1024   2.47066GB/s
   UniqueInt64/3             15365202 ns   15365113 ns         43 null_percent=10 num_unique=1024   2.03383GB/s
   UniqueInt64/4              5126936 ns    5126851 ns        128 null_percent=99 num_unique=1024   6.09536GB/s
   UniqueInt64/5              1763829 ns    1763809 ns        400 null_percent=100 num_unique=1024   17.7173GB/s
   UniqueInt64/6             10545960 ns   10545841 ns         67 null_percent=0 num_unique=10.24k   2.96325GB/s
   UniqueInt64/7             11478529 ns   11478403 ns         61 null_percent=0.1 num_unique=10.24k    2.7225GB/s
   UniqueInt64/8             12792912 ns   12792429 ns         54 null_percent=1 num_unique=10.24k   2.44285GB/s
   UniqueInt64/9             16805938 ns   16805535 ns         44 null_percent=10 num_unique=10.24k   1.85951GB/s
   UniqueInt64/10             5503266 ns    5503108 ns        114 null_percent=99 num_unique=10.24k   5.67861GB/s
   UniqueInt64/11             1763742 ns    1763699 ns        392 null_percent=100 num_unique=10.24k   17.7184GB/s
   UniqueString10bytes/0     44193582 ns   44191679 ns         16 null_percent=0 num_unique=1024   905.148MB/s
   UniqueString10bytes/1     45022703 ns   45022263 ns         15 null_percent=0.1 num_unique=1024   888.449MB/s
   UniqueString10bytes/2     47131705 ns   47130800 ns         15 null_percent=1 num_unique=1024   848.702MB/s
   UniqueString10bytes/3     50106213 ns   50105455 ns         14 null_percent=10 num_unique=1024   798.316MB/s
   UniqueString10bytes/4     15905586 ns   15905158 ns         43 null_percent=99 num_unique=1024   2.45596GB/s
   UniqueString10bytes/5     12983446 ns   12983327 ns         55 null_percent=100 num_unique=1024   3.00867GB/s
   UniqueString10bytes/6     62149404 ns   62148971 ns         11 null_percent=0 num_unique=10.24k   643.615MB/s
   UniqueString10bytes/7     62707969 ns   62705282 ns         11 null_percent=0.1 num_unique=10.24k   637.905MB/s
   UniqueString10bytes/8     65508665 ns   65508532 ns         10 null_percent=1 num_unique=10.24k   610.607MB/s
   UniqueString10bytes/9     65766803 ns   65766094 ns         11 null_percent=10 num_unique=10.24k   608.216MB/s
   UniqueString10bytes/10    16297990 ns   16298076 ns         43 null_percent=99 num_unique=10.24k   2.39676GB/s
   UniqueString10bytes/11    13298987 ns   13298798 ns         54 null_percent=100 num_unique=10.24k    2.9373GB/s
   UniqueString100bytes/0    94204048 ns   94200614 ns          7 null_percent=0 num_unique=1024   4.14674GB/s
   UniqueString100bytes/1    95631478 ns   95630838 ns          7 null_percent=0.1 num_unique=1024   4.08472GB/s
   UniqueString100bytes/2    96547756 ns   96546348 ns          7 null_percent=1 num_unique=1024   4.04598GB/s
   UniqueString100bytes/3    91950796 ns   91949032 ns          8 null_percent=10 num_unique=1024   4.24828GB/s
   UniqueString100bytes/4    17292562 ns   17291979 ns         42 null_percent=99 num_unique=1024     22.59GB/s
   UniqueString100bytes/5    13096944 ns   13096809 ns         55 null_percent=100 num_unique=1024    29.826GB/s
   UniqueString100bytes/6   196165738 ns  196161451 ns          4 null_percent=0 num_unique=10.24k   1.99134GB/s
   UniqueString100bytes/7   198475556 ns  198475456 ns          4 null_percent=0.1 num_unique=10.24k   1.96813GB/s
   UniqueString100bytes/8   199273625 ns  199270358 ns          3 null_percent=1 num_unique=10.24k   1.96028GB/s
   UniqueString100bytes/9   189235180 ns  189232925 ns          4 null_percent=10 num_unique=10.24k   2.06425GB/s
   UniqueString100bytes/10   18381309 ns   18381409 ns         36 null_percent=99 num_unique=10.24k   21.2511GB/s
   UniqueString100bytes/11   13426102 ns   13426072 ns         51 null_percent=100 num_unique=10.24k   29.0945GB/s
   UniqueUInt8/0              2239549 ns    2239561 ns        309 null_percent=0 num_unique=200    1.7442GB/s
   UniqueUInt8/1              2687371 ns    2687349 ns        248 null_percent=0.1 num_unique=200   1.45357GB/s
   UniqueUInt8/2              4244052 ns    4244058 ns        166 null_percent=1 num_unique=200   942.494MB/s
   UniqueUInt8/3              7563076 ns    7563066 ns         94 null_percent=10 num_unique=200   528.886MB/s
   UniqueUInt8/4              3313484 ns    3313447 ns        214 null_percent=99 num_unique=200   1.17891GB/s
   UniqueUInt8/5              1711948 ns    1711947 ns        415 null_percent=100 num_unique=200   2.28176GB/s
   ```
   
   Here is the % diff versus the baseline. 
   
   * Cases 1 and 7 are the mostly-not-null cases. These show a 15-20% perf improvement.
   * Cases 5 and 11 are the all-null cases.
   * Cases 4 and 10 are the 99% null cases.
   * The "BuildDictionary" case at the bottom, with the perf regression, is one of the "worst case scenarios": 89% of the values are null, so we almost never observe an all-null or all-not-null block. Using `BitUtil::GetBit` instead of BitmapReader causes this slight regression, since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case, since the others are more empirically representative of real-world data.
   
   ```
                     benchmark          baseline        contender  change %  regression
   8    UniqueString100bytes/5    40.668 GiB/sec  272.392 GiB/sec   569.787       False
   37    UniqueString10bytes/5     4.064 GiB/sec   27.207 GiB/sec   569.456       False
   33   UniqueString10bytes/11     4.065 GiB/sec   27.062 GiB/sec   565.751       False
   12  UniqueString100bytes/11    40.578 GiB/sec  264.909 GiB/sec   552.841       False
   0     UniqueString10bytes/4     3.568 GiB/sec    9.051 GiB/sec   153.692       False
   36   UniqueString100bytes/4    34.408 GiB/sec   83.010 GiB/sec   141.252       False
   19   UniqueString10bytes/10     3.375 GiB/sec    7.891 GiB/sec   133.794       False
   24            UniqueUInt8/1   677.981 MiB/sec    1.506 GiB/sec   127.435       False
   5   UniqueString100bytes/10    30.775 GiB/sec   63.206 GiB/sec   105.381       False
   27            UniqueUInt8/5  1000.163 MiB/sec    1.729 GiB/sec    76.989       False
   13            UniqueUInt8/2   650.819 MiB/sec  846.372 MiB/sec    30.047       False
   29           UniqueInt64/11     2.703 GiB/sec    3.409 GiB/sec    26.126       False
   7             UniqueInt64/5     2.704 GiB/sec    3.404 GiB/sec    25.903       False
   18            UniqueUInt8/4   932.926 MiB/sec    1.098 GiB/sec    20.535       False
   23            UniqueInt64/1     1.681 GiB/sec    2.014 GiB/sec    19.840       False
   21            UniqueInt64/7     1.628 GiB/sec    1.896 GiB/sec    16.476       False
   31            UniqueInt64/2     1.658 GiB/sec    1.835 GiB/sec    10.651       False
   20    UniqueString10bytes/7   612.647 MiB/sec  668.943 MiB/sec     9.189       False
   16            UniqueInt64/3     1.386 GiB/sec    1.511 GiB/sec     9.053       False
   38    UniqueString10bytes/8   601.259 MiB/sec  655.490 MiB/sec     9.019       False
   1             UniqueUInt8/0     1.808 GiB/sec    1.963 GiB/sec     8.588       False
   41            UniqueInt64/9     1.355 GiB/sec    1.458 GiB/sec     7.562       False
   14    UniqueString10bytes/1   830.614 MiB/sec  886.336 MiB/sec     6.709       False
   4             UniqueInt64/8     1.603 GiB/sec    1.703 GiB/sec     6.260       False
   32    UniqueString10bytes/2   847.018 MiB/sec  884.017 MiB/sec     4.368       False
   42            UniqueInt64/4     2.508 GiB/sec    2.600 GiB/sec     3.701       False
   39    UniqueString10bytes/3   855.985 MiB/sec  886.914 MiB/sec     3.613       False
   28           UniqueInt64/10     2.413 GiB/sec    2.494 GiB/sec     3.360       False
   34   UniqueString100bytes/3     4.254 GiB/sec    4.369 GiB/sec     2.722       False
   11   UniqueString100bytes/2     3.993 GiB/sec    4.094 GiB/sec     2.544       False
   9     UniqueString10bytes/9   654.257 MiB/sec  668.714 MiB/sec     2.210       False
   35    UniqueString10bytes/6   662.915 MiB/sec  676.832 MiB/sec     2.099       False
   6     BuildStringDictionary    80.971 MiB/sec   81.753 MiB/sec     0.966       False
   22   UniqueString100bytes/1     4.002 GiB/sec    4.033 GiB/sec     0.783       False
   25            UniqueInt64/0     2.153 GiB/sec    2.168 GiB/sec     0.697       False
   17    UniqueString10bytes/0   917.726 MiB/sec  918.783 MiB/sec     0.115       False
   43            UniqueInt64/6     2.017 GiB/sec    2.016 GiB/sec    -0.071       False
   40   UniqueString100bytes/0     4.091 GiB/sec    4.086 GiB/sec    -0.130       False
   3    UniqueString100bytes/7     1.938 GiB/sec    1.909 GiB/sec    -1.519       False
   26   UniqueString100bytes/8     1.954 GiB/sec    1.897 GiB/sec    -2.935       False
   2    UniqueString100bytes/9     2.114 GiB/sec    2.034 GiB/sec    -3.782       False
   30   UniqueString100bytes/6     2.008 GiB/sec    1.895 GiB/sec    -5.649        True
   10            UniqueUInt8/3   474.468 MiB/sec  422.604 MiB/sec   -10.931        True
   15          BuildDictionary     1.776 GiB/sec    1.212 GiB/sec   -31.742        True
   ```





[GitHub] [arrow] cyb70289 edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768


   > I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.
   
   ~~Very likely I'm wrong. I remember util::optional is added due to CI failure https://github.com/apache/arrow/pull/6495#issuecomment-593732821~~
   
   I think this patch is okay.
   The sorting regression can be fixed (and maybe performance improved). I'm happy to do the follow-up changes.
   
   I refined util::optional (not sure if this is the same as what @wesm had in mind). This change makes no performance difference; it is still much slower than the original code. Diff attached at https://pastebin.com/ywbPxyLL, hope it can save some time for @wesm :)





[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135


   I'm refactoring to nix util::optional. 





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135


   I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. 





[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884195


   I see a big performance drop in some counting sort cases, which I also reproduced on my local machine. It is probably related to this visitor code: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L133-L155





[GitHub] [arrow] pitrou closed pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7521:
URL: https://github.com/apache/arrow/pull/7521


   





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135


   I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels. 





[GitHub] [arrow] pitrou commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-648019411


   Let's leave sorting optimizations for another PR. I'll review this one.





[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256


   FWIW on the "gcc/clang perf discussion", clang also shows performance benefits and limited downside:
   
   ```
                     benchmark         baseline        contender  change %  regression
   2            UniqueInt64/11    6.444 GiB/sec   18.511 GiB/sec   187.240       False
   31            UniqueInt64/5    6.470 GiB/sec   18.390 GiB/sec   184.244       False
   39            UniqueUInt8/5  810.180 MiB/sec    1.747 GiB/sec   120.867       False
   26            UniqueUInt8/1  683.475 MiB/sec    1.430 GiB/sec   114.196       False
   42            UniqueInt64/4    5.424 GiB/sec    6.965 GiB/sec    28.397       False
   18            UniqueInt64/1    2.672 GiB/sec    3.411 GiB/sec    27.627       False
   40            UniqueUInt8/2  654.320 MiB/sec  826.916 MiB/sec    26.378       False
   33            UniqueUInt8/4  758.115 MiB/sec  947.360 MiB/sec    24.962       False
   25           UniqueInt64/10    5.248 GiB/sec    6.426 GiB/sec    22.460       False
   9    UniqueString100bytes/5   26.923 GiB/sec   32.142 GiB/sec    19.384       False
   35   UniqueString10bytes/11    2.691 GiB/sec    3.207 GiB/sec    19.173       False
   3     UniqueString10bytes/5    2.695 GiB/sec    3.200 GiB/sec    18.731       False
   20  UniqueString100bytes/11   26.909 GiB/sec   31.831 GiB/sec    18.291       False
   30            UniqueInt64/7    2.514 GiB/sec    2.890 GiB/sec    14.960       False
   37            UniqueInt64/2    2.619 GiB/sec    2.975 GiB/sec    13.578       False
   11    UniqueString10bytes/4    2.487 GiB/sec    2.700 GiB/sec     8.596       False
   32   UniqueString10bytes/10    2.386 GiB/sec    2.589 GiB/sec     8.481       False
   0    UniqueString100bytes/4   24.419 GiB/sec   26.365 GiB/sec     7.966       False
   38  UniqueString100bytes/10   22.463 GiB/sec   24.128 GiB/sec     7.411       False
   34            UniqueInt64/8    2.392 GiB/sec    2.563 GiB/sec     7.157       False
   19    UniqueString10bytes/1  781.817 MiB/sec  835.760 MiB/sec     6.900       False
   43            UniqueInt64/3    2.184 GiB/sec    2.331 GiB/sec     6.721       False
   24    UniqueString10bytes/7  583.523 MiB/sec  621.007 MiB/sec     6.424       False
   15   UniqueString100bytes/7    1.936 GiB/sec    2.024 GiB/sec     4.538       False
   6     UniqueString10bytes/2  780.337 MiB/sec  805.686 MiB/sec     3.248       False
   27   UniqueString100bytes/2    3.934 GiB/sec    4.059 GiB/sec     3.197       False
   13   UniqueString100bytes/1    3.898 GiB/sec    3.995 GiB/sec     2.485       False
   7     UniqueString10bytes/8  592.115 MiB/sec  604.865 MiB/sec     2.153       False
   29   UniqueString100bytes/8    1.969 GiB/sec    2.011 GiB/sec     2.111       False
   21            UniqueInt64/9    2.034 GiB/sec    2.048 GiB/sec     0.676       False
   1     BuildStringDictionary   85.937 MiB/sec   85.928 MiB/sec    -0.010       False
   41            UniqueUInt8/3  449.171 MiB/sec  448.844 MiB/sec    -0.073       False
   28   UniqueString100bytes/0    4.084 GiB/sec    4.077 GiB/sec    -0.161       False
   4    UniqueString100bytes/3    4.255 GiB/sec    4.235 GiB/sec    -0.450       False
   5    UniqueString100bytes/6    2.054 GiB/sec    2.033 GiB/sec    -1.041       False
   14   UniqueString100bytes/9    2.138 GiB/sec    2.107 GiB/sec    -1.449       False
   8             UniqueUInt8/0    1.777 GiB/sec    1.750 GiB/sec    -1.487       False
   23            UniqueInt64/0    3.860 GiB/sec    3.799 GiB/sec    -1.560       False
   10    UniqueString10bytes/9  616.458 MiB/sec  605.470 MiB/sec    -1.782       False
   22    UniqueString10bytes/3  799.494 MiB/sec  783.825 MiB/sec    -1.960       False
   17    UniqueString10bytes/6  647.921 MiB/sec  631.631 MiB/sec    -2.514       False
   36          BuildDictionary    1.539 GiB/sec    1.498 GiB/sec    -2.694       False
   16            UniqueInt64/6    3.193 GiB/sec    3.077 GiB/sec    -3.634       False
   12    UniqueString10bytes/0  881.975 MiB/sec  839.487 MiB/sec    -4.817       False
   ```
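
For readers checking the tables by hand, the "change %" column is consistent with the relative difference between contender and baseline throughput. The sketch below reproduces that arithmetic. The function names are hypothetical, and the 5% regression threshold is an assumption inferred from which rows are flagged `True` in the tables above, not a value confirmed in this thread.

```cpp
#include <cassert>
#include <cmath>

// Relative change of contender vs. baseline throughput, in percent.
// Both arguments must be in the same unit (e.g. both GiB/sec).
double ChangePercent(double baseline, double contender) {
  assert(baseline > 0.0);
  return (contender - baseline) / baseline * 100.0;
}

// ASSUMPTION: a row is flagged when throughput drops by more than
// `threshold_percent`. The default of 5.0 is a guess that happens to match
// the flagged rows in the tables; it is not taken from the tool's docs.
bool IsRegression(double baseline, double contender,
                  double threshold_percent = 5.0) {
  return ChangePercent(baseline, contender) < -threshold_percent;
}
```

For example, `ChangePercent(6.444, 18.511)` gives roughly 187.26, close to the 187.240 reported for UniqueInt64/11 (the table was presumably computed from unrounded values). Rows that mix MiB/sec and GiB/sec need converting to a common unit first.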





[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647823397


   Also, on the binary size: these changes add about 75KB to libarrow.so. My guess is that the difference mostly comes from code inlining for the all-null case (which wasn't split out before).





[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647886669


   Also, I don't really understand the use of `util::optional` in these templates. The user should pass separate lambdas for the not-null and null cases.
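
To make the "separate lambdas" suggestion concrete, here is a minimal sketch. It is not Arrow's visitor API: `VisitValues`, `SumNonNull`, and `SumAndNulls` are hypothetical names, and the validity bitmap is assumed to be LSB-first and to start at bit 0. The point is only that the not-null callback receives a plain value and never has to test for null a second time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): visits `length` slots, calling
// `valid_func(value)` for non-null slots and `null_func()` for null slots.
// `validity` is assumed to be an LSB-first bitmap starting at bit 0
// (bit i set means slot i is not null).
template <typename T, typename ValidFunc, typename NullFunc>
void VisitValues(const T* values, const uint8_t* validity, int64_t length,
                 ValidFunc&& valid_func, NullFunc&& null_func) {
  assert(length >= 0);
  for (int64_t i = 0; i < length; ++i) {
    const bool is_valid = ((validity[i / 8] >> (i % 8)) & 1) != 0;
    if (is_valid) {
      valid_func(values[i]);  // the callee gets a plain value, no re-check
    } else {
      null_func();
    }
  }
}

// Illustrative result type: sum of non-null values and number of nulls.
struct SumAndNulls {
  int64_t sum = 0;
  int64_t nulls = 0;
};

// Example caller: one lambda per case instead of one optional-taking lambda.
inline SumAndNulls SumNonNull(const std::vector<int64_t>& values,
                              const std::vector<uint8_t>& validity) {
  SumAndNulls out;
  VisitValues(values.data(), validity.data(),
              static_cast<int64_t>(values.size()),
              [&](int64_t v) { out.sum += v; },  // not-null case
              [&]() { ++out.nulls; });           // null case
  return out;
}
```

With an `optional`-based callback, the caller would typically have to branch on the optional again inside the lambda; splitting the two cases removes that second branch.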





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487


   Here's a benchmark run with gcc-8 
   
   ```
   ---------------------------------------------------------------
   Benchmark                        Time           CPU Iterations
   ---------------------------------------------------------------
   BuildDictionary            3219443 ns    3219440 ns        218   1.21215GB/s
   BuildStringDictionary      3692881 ns    3692881 ns        192   81.7532MB/s
   UniqueInt64/0             14413456 ns   14413251 ns         48 null_percent=0   2.16814GB/s
   UniqueInt64/1             15516052 ns   15515737 ns         45 null_percent=0.1   2.01408GB/s
   UniqueInt64/2             17031282 ns   17031266 ns         41 null_percent=1   1.83486GB/s
   UniqueInt64/3             20680114 ns   20680064 ns         34 null_percent=10   1.51112GB/s
   UniqueInt64/4             12018069 ns   12017844 ns         57 null_percent=99    2.6003GB/s
   UniqueInt64/5              9179953 ns    9179946 ns         77 null_percent=100   3.40416GB/s
   UniqueInt64/6             15501523 ns   15501496 ns         45 null_percent=0   2.01593GB/s
   UniqueInt64/7             16482935 ns   16482300 ns         41 null_percent=0.1   1.89597GB/s
   UniqueInt64/8             18349988 ns   18349317 ns         38 null_percent=1   1.70306GB/s
   UniqueInt64/9             21439268 ns   21439244 ns         32 null_percent=10   1.45761GB/s
   UniqueInt64/10            12530067 ns   12529871 ns         55 null_percent=99   2.49404GB/s
   UniqueInt64/11             9167314 ns    9167365 ns         75 null_percent=100   3.40883GB/s
   UniqueString10bytes/0     43535899 ns   43535846 ns         16 null_percent=0   918.783MB/s
   UniqueString10bytes/1     45130595 ns   45129634 ns         16 null_percent=0.1   886.336MB/s
   UniqueString10bytes/2     45249034 ns   45247983 ns         15 null_percent=1   884.017MB/s
   UniqueString10bytes/3     45101533 ns   45100209 ns         16 null_percent=10   886.914MB/s
   UniqueString10bytes/4      4316048 ns    4316019 ns        163 null_percent=99   9.05059GB/s
   UniqueString10bytes/5      1435781 ns    1435763 ns        485 null_percent=100   27.2068GB/s
   UniqueString10bytes/6     59100344 ns   59098817 ns         12 null_percent=0   676.832MB/s
   UniqueString10bytes/7     59797544 ns   59795857 ns         12 null_percent=0.1   668.943MB/s
   UniqueString10bytes/8     61024697 ns   61023090 ns         11 null_percent=1    655.49MB/s
   UniqueString10bytes/9     59817211 ns   59816339 ns         12 null_percent=10   668.714MB/s
   UniqueString10bytes/10     4950387 ns    4950242 ns        134 null_percent=99   7.89103GB/s
   UniqueString10bytes/11     1443482 ns    1443434 ns        446 null_percent=100   27.0622GB/s
   UniqueString100bytes/0    95609006 ns   95606132 ns          7 null_percent=0   4.08577GB/s
   UniqueString100bytes/1    96850582 ns   96849441 ns          7 null_percent=0.1   4.03332GB/s
   UniqueString100bytes/2    95404742 ns   95404634 ns          7 null_percent=1    4.0944GB/s
   UniqueString100bytes/3    89401775 ns   89401006 ns          8 null_percent=10   4.36936GB/s
   UniqueString100bytes/4     4705868 ns    4705746 ns        148 null_percent=99   83.0102GB/s
   UniqueString100bytes/5     1434077 ns    1434055 ns        486 null_percent=100   272.392GB/s
   UniqueString100bytes/6   206155133 ns  206148425 ns          3 null_percent=0   1.89487GB/s
   UniqueString100bytes/7   204661287 ns  204653659 ns          3 null_percent=0.1   1.90871GB/s
   UniqueString100bytes/8   205941884 ns  205941271 ns          3 null_percent=1   1.89678GB/s
   UniqueString100bytes/9   192074501 ns  192073431 ns          4 null_percent=10   2.03373GB/s
   UniqueString100bytes/10    6180349 ns    6180227 ns        111 null_percent=99   63.2056GB/s
   UniqueString100bytes/11    1474565 ns    1474564 ns        482 null_percent=100   264.909GB/s
   UniqueUInt8/0              1990025 ns    1990023 ns        348 null_percent=0   1.96292GB/s
   UniqueUInt8/1              2594146 ns    2594089 ns        272 null_percent=0.1   1.50583GB/s
   UniqueUInt8/2              4726027 ns    4726053 ns        145 null_percent=1   846.372MB/s
   UniqueUInt8/3              9465222 ns    9465126 ns         75 null_percent=10   422.604MB/s
   UniqueUInt8/4              3557141 ns    3557135 ns        195 null_percent=99    1124.5MB/s
   UniqueUInt8/5              2259664 ns    2259664 ns        314 null_percent=100   1.72869GB/s
   ```
   
   Here is the % diff versus the baseline. 
   
   * Cases 1 and 7 are the mostly-not-null cases. This shows a 15-20% perf improvement
   * Cases 5 and 11 are the all-null cases.
   * Cases 4 and 10 are the 99% null cases.
   * The "BuildDictionary" case at the bottom with the perf regression is one of the "worst case scenarios": 89% of the values are null, so we almost never observe an all-null or all-not-null block. Using `BitUtil::GetBit` instead of BitmapReader causes this slight regression, since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case, since the others are empirically more representative of real-world data.
   
   ```
                     benchmark          baseline        contender  change %  regression
   8    UniqueString100bytes/5    40.668 GiB/sec  272.392 GiB/sec   569.787       False
   37    UniqueString10bytes/5     4.064 GiB/sec   27.207 GiB/sec   569.456       False
   33   UniqueString10bytes/11     4.065 GiB/sec   27.062 GiB/sec   565.751       False
   12  UniqueString100bytes/11    40.578 GiB/sec  264.909 GiB/sec   552.841       False
   0     UniqueString10bytes/4     3.568 GiB/sec    9.051 GiB/sec   153.692       False
   36   UniqueString100bytes/4    34.408 GiB/sec   83.010 GiB/sec   141.252       False
   19   UniqueString10bytes/10     3.375 GiB/sec    7.891 GiB/sec   133.794       False
   24            UniqueUInt8/1   677.981 MiB/sec    1.506 GiB/sec   127.435       False
   5   UniqueString100bytes/10    30.775 GiB/sec   63.206 GiB/sec   105.381       False
   27            UniqueUInt8/5  1000.163 MiB/sec    1.729 GiB/sec    76.989       False
   13            UniqueUInt8/2   650.819 MiB/sec  846.372 MiB/sec    30.047       False
   29           UniqueInt64/11     2.703 GiB/sec    3.409 GiB/sec    26.126       False
   7             UniqueInt64/5     2.704 GiB/sec    3.404 GiB/sec    25.903       False
   18            UniqueUInt8/4   932.926 MiB/sec    1.098 GiB/sec    20.535       False
   23            UniqueInt64/1     1.681 GiB/sec    2.014 GiB/sec    19.840       False
   21            UniqueInt64/7     1.628 GiB/sec    1.896 GiB/sec    16.476       False
   31            UniqueInt64/2     1.658 GiB/sec    1.835 GiB/sec    10.651       False
   20    UniqueString10bytes/7   612.647 MiB/sec  668.943 MiB/sec     9.189       False
   16            UniqueInt64/3     1.386 GiB/sec    1.511 GiB/sec     9.053       False
   38    UniqueString10bytes/8   601.259 MiB/sec  655.490 MiB/sec     9.019       False
   1             UniqueUInt8/0     1.808 GiB/sec    1.963 GiB/sec     8.588       False
   41            UniqueInt64/9     1.355 GiB/sec    1.458 GiB/sec     7.562       False
   14    UniqueString10bytes/1   830.614 MiB/sec  886.336 MiB/sec     6.709       False
   4             UniqueInt64/8     1.603 GiB/sec    1.703 GiB/sec     6.260       False
   32    UniqueString10bytes/2   847.018 MiB/sec  884.017 MiB/sec     4.368       False
   42            UniqueInt64/4     2.508 GiB/sec    2.600 GiB/sec     3.701       False
   39    UniqueString10bytes/3   855.985 MiB/sec  886.914 MiB/sec     3.613       False
   28           UniqueInt64/10     2.413 GiB/sec    2.494 GiB/sec     3.360       False
   34   UniqueString100bytes/3     4.254 GiB/sec    4.369 GiB/sec     2.722       False
   11   UniqueString100bytes/2     3.993 GiB/sec    4.094 GiB/sec     2.544       False
   9     UniqueString10bytes/9   654.257 MiB/sec  668.714 MiB/sec     2.210       False
   35    UniqueString10bytes/6   662.915 MiB/sec  676.832 MiB/sec     2.099       False
   6     BuildStringDictionary    80.971 MiB/sec   81.753 MiB/sec     0.966       False
   22   UniqueString100bytes/1     4.002 GiB/sec    4.033 GiB/sec     0.783       False
   25            UniqueInt64/0     2.153 GiB/sec    2.168 GiB/sec     0.697       False
   17    UniqueString10bytes/0   917.726 MiB/sec  918.783 MiB/sec     0.115       False
   43            UniqueInt64/6     2.017 GiB/sec    2.016 GiB/sec    -0.071       False
   40   UniqueString100bytes/0     4.091 GiB/sec    4.086 GiB/sec    -0.130       False
   3    UniqueString100bytes/7     1.938 GiB/sec    1.909 GiB/sec    -1.519       False
   26   UniqueString100bytes/8     1.954 GiB/sec    1.897 GiB/sec    -2.935       False
   2    UniqueString100bytes/9     2.114 GiB/sec    2.034 GiB/sec    -3.782       False
   30   UniqueString100bytes/6     2.008 GiB/sec    1.895 GiB/sec    -5.649        True
   10            UniqueUInt8/3   474.468 MiB/sec  422.604 MiB/sec   -10.931        True
   15          BuildDictionary     1.776 GiB/sec    1.212 GiB/sec   -31.742        True
   ```
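
For readers who want to see why all-null and all-not-null blocks are cheap while mixed blocks are not, the block-at-a-time idea can be sketched roughly as follows. This is an illustration only, not Arrow's `BitBlockCounter`: the 64-bit block size, the function names, and the assumption that the bitmap is LSB-first and starts at bit 0 are all choices made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable popcount (Kernighan's method); real code would use an intrinsic.
inline int PopCount64(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Assemble 8 bitmap bytes into one word so that bit i of the word is slot i.
inline uint64_t LoadWord(const uint8_t* bytes) {
  uint64_t word = 0;
  for (int b = 0; b < 8; ++b) {
    word |= static_cast<uint64_t>(bytes[b]) << (8 * b);
  }
  return word;
}

inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical visitor: calls valid_func(index) or null_func() per slot.
template <typename ValidFunc, typename NullFunc>
void VisitBlocks(const uint8_t* validity, int64_t length,
                 ValidFunc&& valid_func, NullFunc&& null_func) {
  assert(length >= 0);
  int64_t pos = 0;
  while (length - pos >= 64) {
    const int set_bits = PopCount64(LoadWord(validity + pos / 8));
    if (set_bits == 64) {
      // All 64 slots are valid: no per-bit checks.
      for (int64_t i = 0; i < 64; ++i) valid_func(pos + i);
    } else if (set_bits == 0) {
      // All 64 slots are null: no per-bit checks.
      for (int64_t i = 0; i < 64; ++i) null_func();
    } else {
      // Mixed block: fall back to checking every bit.
      for (int64_t i = 0; i < 64; ++i) {
        if (GetBit(validity, pos + i)) {
          valid_func(pos + i);
        } else {
          null_func();
        }
      }
    }
    pos += 64;
  }
  // Tail shorter than one block: check every bit.
  for (; pos < length; ++pos) {
    if (GetBit(validity, pos)) {
      valid_func(pos);
    } else {
      null_func();
    }
  }
}

// Illustrative caller: count valid and null slots.
struct Counts {
  int64_t valid = 0;
  int64_t null = 0;
};

inline Counts CountValidity(const std::vector<uint8_t>& validity,
                            int64_t length) {
  Counts c;
  VisitBlocks(validity.data(), length,
              [&](int64_t) { ++c.valid; }, [&]() { ++c.null; });
  return c;
}
```

When nearly every block is mixed, as in the "BuildDictionary" case described above, the sketch pays for the popcount and then still checks each bit, which is one plausible way to picture the overhead discussed there.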





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647888437


   FWIW, the performance issue seems to be more pronounced on gcc than on clang. Here is the benchmark comparison on my machine with clang-8:
   
   ```
                                                 benchmark         baseline        contender  change %                                       counters
   1     SortToIndicesInt64Count/32768/10000/min_time:1.000    1.560 GiB/sec    2.000 GiB/sec    28.163    {'iterations': 70030, 'null_percent': 0.01}
   15  SortToIndicesInt64Compare/32768/10000/min_time:1.000  145.735 MiB/sec  158.918 MiB/sec     9.046     {'iterations': 6654, 'null_percent': 0.01}
   5     SortToIndicesInt64Compare/32768/100/min_time:1.000  149.117 MiB/sec  159.609 MiB/sec     7.036      {'iterations': 6545, 'null_percent': 1.0}
   7       SortToIndicesInt64Compare/32768/0/min_time:1.000  153.027 MiB/sec  162.227 MiB/sec     6.012      {'iterations': 6862, 'null_percent': 0.0}
   4      SortToIndicesInt64Compare/32768/10/min_time:1.000  160.419 MiB/sec  167.725 MiB/sec     4.554     {'iterations': 6934, 'null_percent': 10.0}
   2       SortToIndicesInt64Compare/32768/2/min_time:1.000  255.024 MiB/sec  260.284 MiB/sec     2.063    {'iterations': 11390, 'null_percent': 50.0}
   9       SortToIndicesInt64Count/32768/100/min_time:1.000    1.486 GiB/sec    1.458 GiB/sec    -1.912     {'iterations': 66757, 'null_percent': 1.0}
   10        SortToIndicesInt64Count/32768/0/min_time:1.000    2.143 GiB/sec    2.067 GiB/sec    -3.568     {'iterations': 98191, 'null_percent': 0.0}
   13      SortToIndicesInt64Count/8388608/1/min_time:1.000    4.215 GiB/sec    3.813 GiB/sec    -9.531     {'iterations': 762, 'null_percent': 100.0}
   11        SortToIndicesInt64Count/32768/2/min_time:1.000  679.023 MiB/sec  609.379 MiB/sec   -10.256    {'iterations': 29602, 'null_percent': 50.0}
   0       SortToIndicesInt64Count/1048576/1/min_time:1.000    4.487 GiB/sec    4.021 GiB/sec   -10.400    {'iterations': 6550, 'null_percent': 100.0}
   12    SortToIndicesInt64Compare/8388608/1/min_time:1.000    4.250 GiB/sec    3.762 GiB/sec   -11.476     {'iterations': 766, 'null_percent': 100.0}
   6         SortToIndicesInt64Count/32768/1/min_time:1.000    4.758 GiB/sec    4.185 GiB/sec   -12.040  {'iterations': 217705, 'null_percent': 100.0}
   8       SortToIndicesInt64Compare/32768/1/min_time:1.000    4.730 GiB/sec    4.125 GiB/sec   -12.780  {'iterations': 213908, 'null_percent': 100.0}
   3     SortToIndicesInt64Compare/1048576/1/min_time:1.000    4.556 GiB/sec    3.953 GiB/sec   -13.228    {'iterations': 6539, 'null_percent': 100.0}
   14       SortToIndicesInt64Count/32768/10/min_time:1.000    1.316 GiB/sec    1.051 GiB/sec   -20.108    {'iterations': 59539, 'null_percent': 10.0}
   ```
   
   The perf regression with 100% null data is an artifact of the improper implementation.





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747


   Sorting seems too important to leave to these relatively complex templates. I would suggest implementing the counting sort without using `VisitArrayDataInline`. I'm happy to help with this.














[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768


   > I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.
   
   I may well be wrong, but I recall that util::optional was added because of a CI failure: https://github.com/apache/arrow/pull/6495#issuecomment-593732821
   
   I think this patch is okay.
   The sorting regression can be fixed (and maybe sorting can be improved). I'm happy to do the follow-up changes.





[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647878260


   @ursabot benchmark --suite-filter=arrow-compute-vector-sort-benchmark





[GitHub] [arrow] github-actions[bot] commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647820320


   https://issues.apache.org/jira/browse/ARROW-9210





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747


   Sorting seems too important to leave to these relatively complex templates (for example, just after determining that a value is not null, the `optional` value is checked for null again!). I would suggest implementing the counting sort without using `VisitArrayDataInline`. I'm happy to help with this.





[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256


   FWIW on the "gcc/clang perf discussion", clang-8 also shows performance benefits and limited downside:
   
   ```
                     benchmark         baseline        contender  change %  regression
   2            UniqueInt64/11    6.444 GiB/sec   18.511 GiB/sec   187.240       False
   31            UniqueInt64/5    6.470 GiB/sec   18.390 GiB/sec   184.244       False
   39            UniqueUInt8/5  810.180 MiB/sec    1.747 GiB/sec   120.867       False
   26            UniqueUInt8/1  683.475 MiB/sec    1.430 GiB/sec   114.196       False
   42            UniqueInt64/4    5.424 GiB/sec    6.965 GiB/sec    28.397       False
   18            UniqueInt64/1    2.672 GiB/sec    3.411 GiB/sec    27.627       False
   40            UniqueUInt8/2  654.320 MiB/sec  826.916 MiB/sec    26.378       False
   33            UniqueUInt8/4  758.115 MiB/sec  947.360 MiB/sec    24.962       False
   25           UniqueInt64/10    5.248 GiB/sec    6.426 GiB/sec    22.460       False
   9    UniqueString100bytes/5   26.923 GiB/sec   32.142 GiB/sec    19.384       False
   35   UniqueString10bytes/11    2.691 GiB/sec    3.207 GiB/sec    19.173       False
   3     UniqueString10bytes/5    2.695 GiB/sec    3.200 GiB/sec    18.731       False
   20  UniqueString100bytes/11   26.909 GiB/sec   31.831 GiB/sec    18.291       False
   30            UniqueInt64/7    2.514 GiB/sec    2.890 GiB/sec    14.960       False
   37            UniqueInt64/2    2.619 GiB/sec    2.975 GiB/sec    13.578       False
   11    UniqueString10bytes/4    2.487 GiB/sec    2.700 GiB/sec     8.596       False
   32   UniqueString10bytes/10    2.386 GiB/sec    2.589 GiB/sec     8.481       False
   0    UniqueString100bytes/4   24.419 GiB/sec   26.365 GiB/sec     7.966       False
   38  UniqueString100bytes/10   22.463 GiB/sec   24.128 GiB/sec     7.411       False
   34            UniqueInt64/8    2.392 GiB/sec    2.563 GiB/sec     7.157       False
   19    UniqueString10bytes/1  781.817 MiB/sec  835.760 MiB/sec     6.900       False
   43            UniqueInt64/3    2.184 GiB/sec    2.331 GiB/sec     6.721       False
   24    UniqueString10bytes/7  583.523 MiB/sec  621.007 MiB/sec     6.424       False
   15   UniqueString100bytes/7    1.936 GiB/sec    2.024 GiB/sec     4.538       False
   6     UniqueString10bytes/2  780.337 MiB/sec  805.686 MiB/sec     3.248       False
   27   UniqueString100bytes/2    3.934 GiB/sec    4.059 GiB/sec     3.197       False
   13   UniqueString100bytes/1    3.898 GiB/sec    3.995 GiB/sec     2.485       False
   7     UniqueString10bytes/8  592.115 MiB/sec  604.865 MiB/sec     2.153       False
   29   UniqueString100bytes/8    1.969 GiB/sec    2.011 GiB/sec     2.111       False
   21            UniqueInt64/9    2.034 GiB/sec    2.048 GiB/sec     0.676       False
   1     BuildStringDictionary   85.937 MiB/sec   85.928 MiB/sec    -0.010       False
   41            UniqueUInt8/3  449.171 MiB/sec  448.844 MiB/sec    -0.073       False
   28   UniqueString100bytes/0    4.084 GiB/sec    4.077 GiB/sec    -0.161       False
   4    UniqueString100bytes/3    4.255 GiB/sec    4.235 GiB/sec    -0.450       False
   5    UniqueString100bytes/6    2.054 GiB/sec    2.033 GiB/sec    -1.041       False
   14   UniqueString100bytes/9    2.138 GiB/sec    2.107 GiB/sec    -1.449       False
   8             UniqueUInt8/0    1.777 GiB/sec    1.750 GiB/sec    -1.487       False
   23            UniqueInt64/0    3.860 GiB/sec    3.799 GiB/sec    -1.560       False
   10    UniqueString10bytes/9  616.458 MiB/sec  605.470 MiB/sec    -1.782       False
   22    UniqueString10bytes/3  799.494 MiB/sec  783.825 MiB/sec    -1.960       False
   17    UniqueString10bytes/6  647.921 MiB/sec  631.631 MiB/sec    -2.514       False
   36          BuildDictionary    1.539 GiB/sec    1.498 GiB/sec    -2.694       False
   16            UniqueInt64/6    3.193 GiB/sec    3.077 GiB/sec    -3.634       False
   12    UniqueString10bytes/0  881.975 MiB/sec  839.487 MiB/sec    -4.817       False
   ```





[GitHub] [arrow] ursabot commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647883517


   [AMD64 Ubuntu 18.04 C++ Benchmark (#114347)](https://ci.ursalabs.org/#builders/73/builds/89) builder has succeeded.
   
   Revision: dbd166df749e73cbf7c1ec0c6cfa5837280aa32d
   
   ```diff
     ====================================================  ===============  ===============  ========
     benchmark                                             baseline         contender        change
     ====================================================  ===============  ===============  ========
     SortToIndicesInt64Count/32768/1/min_time:1.000        2.690 GiB/sec    2.646 GiB/sec    -1.654%
     SortToIndicesInt64Count/1048576/1/min_time:1.000      3.244 GiB/sec    3.198 GiB/sec    -1.423%
     SortToIndicesInt64Compare/32768/10000/min_time:1.000  103.004 MiB/sec  101.724 MiB/sec  -1.243%
     SortToIndicesInt64Compare/32768/1/min_time:1.000      2.685 GiB/sec    2.612 GiB/sec    -2.707%
     SortToIndicesInt64Compare/32768/0/min_time:1.000      105.027 MiB/sec  103.783 MiB/sec  -1.184%
   - SortToIndicesInt64Compare/32768/10/min_time:1.000     109.648 MiB/sec  102.376 MiB/sec  -6.633%
   - SortToIndicesInt64Count/32768/10/min_time:1.000       701.425 MiB/sec  286.420 MiB/sec  -59.166%
   - SortToIndicesInt64Count/32768/100/min_time:1.000      686.441 MiB/sec  386.614 MiB/sec  -43.678%
     SortToIndicesInt64Compare/8388608/1/min_time:1.000    3.162 GiB/sec    3.201 GiB/sec    1.242%
   - SortToIndicesInt64Count/32768/2/min_time:1.000        526.866 MiB/sec  259.139 MiB/sec  -50.815%
   - SortToIndicesInt64Count/32768/10000/min_time:1.000    683.857 MiB/sec  599.732 MiB/sec  -12.301%
     SortToIndicesInt64Compare/32768/100/min_time:1.000    103.157 MiB/sec  98.649 MiB/sec   -4.370%
     SortToIndicesInt64Count/8388608/1/min_time:1.000      3.259 GiB/sec    3.211 GiB/sec    -1.495%
     SortToIndicesInt64Count/32768/0/min_time:1.000        647.629 MiB/sec  627.171 MiB/sec  -3.159%
     SortToIndicesInt64Compare/1048576/1/min_time:1.000    3.197 GiB/sec    3.198 GiB/sec    0.035%
   - SortToIndicesInt64Compare/32768/2/min_time:1.000      171.750 MiB/sec  162.637 MiB/sec  -5.306%
     ====================================================  ===============  ===============  ========
   ```





[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-648147535


   Thanks @pitrou and @cyb70289. I will spend a little time on the count-sort implementation and post a new patch.

