Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/07 20:38:47 UTC

[GitHub] [arrow] nirandaperera opened a new pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

nirandaperera opened a new pull request #10679:
URL: https://github.com/apache/arrow/pull/10679


   This PR adds the changes discussed in ARROW-13170.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-876226830


   I guess branch prediction works really well on modern processors: when the filter values are mostly false, the if block is rarely executed. In the 50%-selected case, branch prediction doesn't help, so this method yields speedups.
   
   Not sure if it's worth investigating, but moving `out_position_` from a class member to a stack variable could have some performance impact. Another thought is removing some of the offset arithmetic.
   
   Lastly, since we're using `SetBitTo` a lot, it might be worth checking the superscalar variant described at
   
   https://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
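   
   For reference, a minimal sketch of the trick from that page (illustrative only, not Arrow's actual `SetBitTo`); the second form is the one the page describes as friendlier to superscalar CPUs:
   
   ```cpp
   #include <cstdint>
   
   // Hypothetical helper: equivalent to  if (bit_is_set) w |= m; else w &= ~m;
   inline void SetBitToBranchless(uint8_t* bits, int64_t i, bool bit_is_set) {
     uint8_t& w = bits[i / 8];
     const uint8_t m = static_cast<uint8_t>(1u << (i % 8));
     // Variant 1 (fewer ops):      w ^= (-f ^ w) & m
     // Variant 2 ("superscalar"):  both halves can be computed independently
     w = static_cast<uint8_t>((w & ~m) | (-static_cast<int>(bit_is_set) & m));
   }
   ```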





[GitHub] [arrow] pitrou commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-884312893


   I get these benchmark results here (AMD Zen 2 CPU):
   ```
   --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (20)
   --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              benchmark       baseline      contender  change %                                                                                                                                                                                counters
   FilterInt64FilterWithNulls/524288/10  1.656 GiB/sec  3.296 GiB/sec    99.035 {'run_name': 'FilterInt64FilterWithNulls/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2369, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
    FilterInt64FilterWithNulls/524288/7  1.746 GiB/sec  3.446 GiB/sec    97.382   {'run_name': 'FilterInt64FilterWithNulls/524288/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2488, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
   FilterInt64FilterWithNulls/524288/13  1.658 GiB/sec  3.255 GiB/sec    96.242 {'run_name': 'FilterInt64FilterWithNulls/524288/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2375, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
    FilterInt64FilterWithNulls/524288/1  1.829 GiB/sec  3.570 GiB/sec    95.193   {'run_name': 'FilterInt64FilterWithNulls/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2608, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
    FilterInt64FilterWithNulls/524288/4  1.834 GiB/sec  3.556 GiB/sec    93.834   {'run_name': 'FilterInt64FilterWithNulls/524288/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2620, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
      FilterInt64FilterNoNulls/524288/7  1.790 GiB/sec  3.102 GiB/sec    73.262     {'run_name': 'FilterInt64FilterNoNulls/524288/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2563, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
      FilterInt64FilterNoNulls/524288/4  1.865 GiB/sec  2.713 GiB/sec    45.508     {'run_name': 'FilterInt64FilterNoNulls/524288/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2666, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
     FilterInt64FilterNoNulls/524288/13  1.741 GiB/sec  2.500 GiB/sec    43.569   {'run_name': 'FilterInt64FilterNoNulls/524288/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2503, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
     FilterInt64FilterNoNulls/524288/10  1.747 GiB/sec  2.458 GiB/sec    40.644   {'run_name': 'FilterInt64FilterNoNulls/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2503, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterWithNulls/524288/9  2.782 GiB/sec  3.264 GiB/sec    17.322  {'run_name': 'FilterInt64FilterWithNulls/524288/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3930, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
    FilterInt64FilterWithNulls/524288/6  3.131 GiB/sec  3.621 GiB/sec    15.660   {'run_name': 'FilterInt64FilterWithNulls/524288/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4484, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
   FilterInt64FilterWithNulls/524288/12  2.792 GiB/sec  3.200 GiB/sec    14.600 {'run_name': 'FilterInt64FilterWithNulls/524288/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3997, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
     FilterInt64FilterNoNulls/524288/12 10.371 GiB/sec 11.380 GiB/sec     9.728  {'run_name': 'FilterInt64FilterNoNulls/524288/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 14327, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
      FilterInt64FilterNoNulls/524288/9 10.397 GiB/sec 11.059 GiB/sec     6.363   {'run_name': 'FilterInt64FilterNoNulls/524288/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 14889, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
      FilterInt64FilterNoNulls/524288/0 31.150 GiB/sec 33.025 GiB/sec     6.021    {'run_name': 'FilterInt64FilterNoNulls/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 44550, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
      FilterInt64FilterNoNulls/524288/6 12.010 GiB/sec 12.518 GiB/sec     4.228    {'run_name': 'FilterInt64FilterNoNulls/524288/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 17148, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
      FilterInt64FilterNoNulls/524288/1  3.233 GiB/sec  3.314 GiB/sec     2.478     {'run_name': 'FilterInt64FilterNoNulls/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4619, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
      FilterInt64FilterNoNulls/524288/2 55.380 GiB/sec 55.998 GiB/sec     1.116     {'run_name': 'FilterInt64FilterNoNulls/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 79290, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
    FilterInt64FilterWithNulls/524288/3  3.614 GiB/sec  3.621 GiB/sec     0.191   {'run_name': 'FilterInt64FilterWithNulls/524288/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5171, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
      FilterInt64FilterNoNulls/524288/3 16.487 GiB/sec 16.002 GiB/sec    -2.945    {'run_name': 'FilterInt64FilterNoNulls/524288/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 24008, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
   
   -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (10)
   -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              benchmark       baseline     contender  change %                                                                                                                                                                                counters
    FilterInt64FilterWithNulls/524288/0  3.703 GiB/sec 2.591 GiB/sec   -30.008   {'run_name': 'FilterInt64FilterWithNulls/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5297, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
    FilterInt64FilterWithNulls/524288/8  9.559 GiB/sec 6.414 GiB/sec   -32.902   {'run_name': 'FilterInt64FilterWithNulls/524288/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 13527, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 1.0}
   FilterInt64FilterWithNulls/524288/14  9.474 GiB/sec 6.310 GiB/sec   -33.396 {'run_name': 'FilterInt64FilterWithNulls/524288/14', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 13647, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
    FilterInt64FilterWithNulls/524288/2 10.062 GiB/sec 6.623 GiB/sec   -34.171   {'run_name': 'FilterInt64FilterWithNulls/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 14448, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}
   FilterInt64FilterWithNulls/524288/11  9.583 GiB/sec 6.202 GiB/sec   -35.279 {'run_name': 'FilterInt64FilterWithNulls/524288/11', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 13649, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
    FilterInt64FilterWithNulls/524288/5 10.277 GiB/sec 6.542 GiB/sec   -36.339   {'run_name': 'FilterInt64FilterWithNulls/524288/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 14650, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
      FilterInt64FilterNoNulls/524288/5 12.943 GiB/sec 6.610 GiB/sec   -48.933     {'run_name': 'FilterInt64FilterNoNulls/524288/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 18519, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
      FilterInt64FilterNoNulls/524288/8 12.834 GiB/sec 6.503 GiB/sec   -49.332     {'run_name': 'FilterInt64FilterNoNulls/524288/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 18022, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
     FilterInt64FilterNoNulls/524288/11 12.924 GiB/sec 6.505 GiB/sec   -49.671   {'run_name': 'FilterInt64FilterNoNulls/524288/11', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 18604, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
     FilterInt64FilterNoNulls/524288/14 12.991 GiB/sec 6.454 GiB/sec   -50.321   {'run_name': 'FilterInt64FilterNoNulls/524288/14', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 18607, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
   ```





[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-899163047


   I'm closing this PR. Feel free to reopen if there are new findings.
   IMO, the branchless implementation unconditionally introduces extra instructions. Given that a predictable branch is essentially free on a modern CPU, the current branching code looks optimal for the low-selectivity case, which is common in practice.
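   
   For context, a minimal sketch (not the PR's actual kernel) of the two styles being compared, with the filter shown as a byte-per-value mask rather than Arrow's bitmap to keep it short:
   
   ```cpp
   #include <cstdint>
   
   // Branching style: cheap when the filter is predictable (mostly 0s or mostly 1s).
   int64_t FilterBranching(const int64_t* values, const uint8_t* filter, int64_t n,
                           int64_t* out) {
     int64_t out_position = 0;
     for (int64_t i = 0; i < n; ++i) {
       if (filter[i]) {                     // rarely taken at 1% selectivity
         out[out_position++] = values[i];
       }
     }
     return out_position;
   }
   
   // Branchless style: always writes, advances the output position by 0 or 1.
   // Extra instructions every iteration, but no mispredictions near 50% selectivity.
   // Note: `out` must have room for n values, since every iteration writes.
   int64_t FilterBranchless(const int64_t* values, const uint8_t* filter, int64_t n,
                            int64_t* out) {
     int64_t out_position = 0;
     for (int64_t i = 0; i < n; ++i) {
       out[out_position] = values[i];       // unconditional write
       out_position += (filter[i] != 0);    // conditional advance, no branch
     }
     return out_position;
   }
   ```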





[GitHub] [arrow] ursabot edited a comment on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875950075


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 5a14b94046288938b22e1db1e7d0a5f6cc5d2fd1. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...3465b868d70e425e8af9141550f9450f/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...87610e39461a4309accf9c5ce9f7f2f9/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...ec5b59d942eb4e7794b7c7808006e223/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] ursabot commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-879314016


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 38110e8e7ee598ddb0e8a3465d81ea7e24bafebc. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...fdfaab8d2dc84fbd81318b663afe1e44/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...a08372ce96df46c48f6874d84c74035c/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...9d55178f98434a64b8398f25458f5408/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] ursabot edited a comment on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875917401


   Benchmark runs are scheduled for baseline = 6c8d30ea82222fd2750b999840872d3f6cbdc8f8 and contender = c2d694b17596b13007bd54804d382808c60066aa. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/ead005de138847cdb35b900308a8716e...91280e77e83149fb857bf3153ece3a4e/)
   [Failed] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5d26982bd71e45878dd5387f80c8d0f4...9475f85ca9e14856868ea301f4c6e7ea/)
   [Failed] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/00761e2d135d41859abe0f44fca6ee61...d7af8eaebb1149308ccf7b8870fe84c9/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-879313446


   @ursabot please benchmark lang=C++





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-887987917


   > It seems like breaking the superscalar variant of SetBitTo out into a separate PR would be a good thing regardless.
   
   I will open a separate JIRA for this.





[GitHub] [arrow] ursabot edited a comment on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875950075


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 5a14b94046288938b22e1db1e7d0a5f6cc5d2fd1. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...3465b868d70e425e8af9141550f9450f/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...87610e39461a4309accf9c5ce9f7f2f9/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...ec5b59d942eb4e7794b7c7808006e223/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] github-actions[bot] commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875916614


   https://issues.apache.org/jira/browse/ARROW-13170





[GitHub] [arrow] wesm commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-887773427


   It seems like breaking the superscalar variant of SetBitTo out into a separate PR would be a good thing regardless.





[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-876164351


   Thanks @nirandaperera for doing this!
   
   There is a big improvement for the 50% selection case, where CPU branch prediction works worst. This is great.
   I'm a bit concerned about the _big drop in the 1% selection_ case (which looks useful in the real world, IMO).
   
   I tested on my Xeon Gold 5218 server and got similar results to yours. To ease debugging, I only listed the FilterInt64FilterNoNulls tests.
   (NOTE: the FilterRecordBatchXXX/100/X tests are pretty noisy in my experience; you may ignore them.)
   
   ```
   $ archery benchmark diff --suite-filter=arrow-compute-vector-selection-benchmark \
                            --benchmark-filter="^FilterInt64FilterNoNulls" \
                            --cc=clang-10 --cxx=clang++-10
   
   -----------------------------------------------------------------------------------------------------------------
   Non-regressions: (11)
   -----------------------------------------------------------------------------------------------------------------
                             benchmark       baseline      contender  change 
   // XXX: big improvement for selection = 50%
    FilterInt64FilterNoNulls/1048576/4  1.033 GiB/sec  2.123 GiB/sec   105.545 {'data null%': 0.1,  'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/7  1.031 GiB/sec  1.921 GiB/sec    86.369 {'data null%': 1.0,  'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/10  1.055 GiB/sec  1.778 GiB/sec    68.505 {'data null%': 10.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/13  1.054 GiB/sec  1.772 GiB/sec    68.161 {'data null%': 90.0, 'select%': 50.0}
   
   // XXX: no difference for selection = 99.9%
    FilterInt64FilterNoNulls/1048576/9  5.495 GiB/sec  5.744 GiB/sec     4.530 {'data null%': 10.0, 'select%': 99.9}
   FilterInt64FilterNoNulls/1048576/12  5.572 GiB/sec  5.693 GiB/sec     2.176 {'data null%': 90.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/3  8.387 GiB/sec  8.431 GiB/sec     0.521 {'data null%': 0.1,  'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/0 12.422 GiB/sec 12.417 GiB/sec    -0.040 {'data null%': 0.0,  'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/6  6.787 GiB/sec  6.717 GiB/sec    -1.030 {'data null%': 1.0,  'select%': 99.9}
   
   // XXX: no difference if no nulls, regardless selection
    FilterInt64FilterNoNulls/1048576/1  1.927 GiB/sec  1.955 GiB/sec     1.470 {'data null%': 0.0,  'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/2 31.374 GiB/sec 31.808 GiB/sec     1.383 {'data null%': 0.0,  'select%': 1.0}
   
   -----------------------------------------------------------------------------------------------------------------
   Regressions: (4)
   -----------------------------------------------------------------------------------------------------------------
                             benchmark      baseline     contender  change
   // XXX: big regression for selection = 1%
    FilterInt64FilterNoNulls/1048576/5 6.755 GiB/sec 3.766 GiB/sec   -44.239 {'data null%': 0.1,  'select%': 1.0}
    FilterInt64FilterNoNulls/1048576/8 6.766 GiB/sec 3.500 GiB/sec   -48.265 {'data null%': 1.0,  'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/14 7.182 GiB/sec 3.271 GiB/sec   -54.453 {'data null%': 90.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/11 7.183 GiB/sec 3.271 GiB/sec   -54.470 {'data null%': 10.0, 'select%': 1.0}
   ```





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-884323959


   > I'm not fond of this PR. The fact that the results are rather mixed while it adds significant complexity to the implementation doesn't make it extremely desirable IMHO.
   
   I agree. I also have a problem with these regressions. I am planning to add some SIMD to this, to see if we could get a better outcome in the low-selectivity cases.





[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-880342643


   > @wesm @cyb70289 @bkietz Is there anything else we could do for the low selectivity cases (1% select)?
   
   I don't have a satisfying suggestion.
   A possible workaround, I guess, is to choose between the branching and branchless code paths based on selectivity. Smells like "benchmark-oriented optimization"?
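   
   A hypothetical sketch of that per-selectivity dispatch (function names and thresholds are illustrative, not Arrow APIs):
   
   ```cpp
   #include <cstdint>
   
   // Hypothetical kernels, e.g. a branching and a branchless inner loop.
   int64_t FilterBranching(const int64_t*, const uint8_t*, int64_t, int64_t*);
   int64_t FilterBranchless(const int64_t*, const uint8_t*, int64_t, int64_t*);
   
   int64_t FilterDispatch(const int64_t* values, const uint8_t* filter, int64_t n,
                          int64_t* out) {
     // Estimate selectivity with a cheap counting pass over the byte mask.
     int64_t selected = 0;
     for (int64_t i = 0; i < n; ++i) selected += (filter[i] != 0);
     const double selectivity = n > 0 ? static_cast<double>(selected) / n : 0.0;
     // Thresholds would need tuning against benchmarks.
     if (selectivity < 0.10 || selectivity > 0.90) {
       return FilterBranching(values, filter, n, out);   // branches well predicted
     }
     return FilterBranchless(values, filter, n, out);    // avoid mispredictions
   }
   ```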





[GitHub] [arrow] ursabot edited a comment on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-879314016


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 38110e8e7ee598ddb0e8a3465d81ea7e24bafebc. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...fdfaab8d2dc84fbd81318b663afe1e44/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...a08372ce96df46c48f6874d84c74035c/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...9d55178f98434a64b8398f25458f5408/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-876664929


   @wesm with the superscalar variant, I get the following:
   ```
   BEFORE:
   -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (10)
   -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                   benchmark         baseline        contender  change %                                    counters
        FilterInt64FilterNoNulls/1048576/4    1.423 GiB/sec    2.658 GiB/sec    86.837  {'run_name': 'FilterInt64FilterNoNulls/1048576/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1021, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
        FilterInt64FilterNoNulls/1048576/7    1.367 GiB/sec    2.398 GiB/sec    75.409   {'run_name': 'FilterInt64FilterNoNulls/1048576/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 978, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
       FilterInt64FilterNoNulls/1048576/13    1.318 GiB/sec    2.223 GiB/sec    68.611 {'run_name': 'FilterInt64FilterNoNulls/1048576/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 936, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
       FilterInt64FilterNoNulls/1048576/10    1.323 GiB/sec    2.220 GiB/sec    67.824 {'run_name': 'FilterInt64FilterNoNulls/1048576/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 950, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
       FilterInt64FilterNoNulls/1048576/12    6.890 GiB/sec    7.092 GiB/sec     2.938{'run_name': 'FilterInt64FilterNoNulls/1048576/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4949, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
        FilterInt64FilterNoNulls/1048576/1    2.358 GiB/sec    2.386 GiB/sec     1.166  {'run_name': 'FilterInt64FilterNoNulls/1048576/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1676, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
        FilterInt64FilterNoNulls/1048576/6    8.110 GiB/sec    8.034 GiB/sec    -0.940  {'run_name': 'FilterInt64FilterNoNulls/1048576/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5779, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
        FilterInt64FilterNoNulls/1048576/2   39.687 GiB/sec   39.170 GiB/sec    -1.301  {'run_name': 'FilterInt64FilterNoNulls/1048576/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 28002, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
        FilterInt64FilterNoNulls/1048576/0   14.103 GiB/sec   13.782 GiB/sec    -2.278  {'run_name': 'FilterInt64FilterNoNulls/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 9845, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
        FilterInt64FilterNoNulls/1048576/9    7.006 GiB/sec    6.800 GiB/sec    -2.951 {'run_name': 'FilterInt64FilterNoNulls/1048576/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5061, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
   
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (5)
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                benchmark        baseline       contender  change %                                     counters
       FilterInt64FilterNoNulls/1048576/3   9.850 GiB/sec   9.107 GiB/sec    -7.538   {'run_name': 'FilterInt64FilterNoNulls/1048576/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 7039, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
       FilterInt64FilterNoNulls/1048576/5  10.072 GiB/sec   4.863 GiB/sec   -51.715    {'run_name': 'FilterInt64FilterNoNulls/1048576/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 7212, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
       FilterInt64FilterNoNulls/1048576/8   9.578 GiB/sec   4.409 GiB/sec   -53.966    {'run_name': 'FilterInt64FilterNoNulls/1048576/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6873, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
      FilterInt64FilterNoNulls/1048576/11   9.509 GiB/sec   4.080 GiB/sec   -57.096  {'run_name': 'FilterInt64FilterNoNulls/1048576/11', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6842, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
      FilterInt64FilterNoNulls/1048576/14   9.532 GiB/sec   4.084 GiB/sec   -57.154  {'run_name': 'FilterInt64FilterNoNulls/1048576/14', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6748, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
   
   
   AFTER:
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (9)
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                             benchmark       baseline      contender  change %                                     counters
    FilterInt64FilterNoNulls/1048576/4  1.422 GiB/sec  2.976 GiB/sec   109.349   {'run_name': 'FilterInt64FilterNoNulls/1048576/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1017, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/7  1.363 GiB/sec  2.500 GiB/sec    83.386    {'run_name': 'FilterInt64FilterNoNulls/1048576/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 978, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/13  1.320 GiB/sec  2.216 GiB/sec    67.878  {'run_name': 'FilterInt64FilterNoNulls/1048576/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 945, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/10  1.322 GiB/sec  2.201 GiB/sec    66.533  {'run_name': 'FilterInt64FilterNoNulls/1048576/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 947, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/1  2.351 GiB/sec  2.627 GiB/sec    11.739   {'run_name': 'FilterInt64FilterNoNulls/1048576/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1683, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/0 13.413 GiB/sec 13.631 GiB/sec     1.628   {'run_name': 'FilterInt64FilterNoNulls/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 9605, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/6  7.672 GiB/sec  7.593 GiB/sec    -1.027   {'run_name': 'FilterInt64FilterNoNulls/1048576/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5715, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/2 39.856 GiB/sec 38.819 GiB/sec    -2.600   {'run_name': 'FilterInt64FilterNoNulls/1048576/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 28575, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/12  6.806 GiB/sec  6.558 GiB/sec    -3.653 {'run_name': 'FilterInt64FilterNoNulls/1048576/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4785, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
   
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (6)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                             benchmark       baseline     contender  change %                                    counters
    FilterInt64FilterNoNulls/1048576/9  6.843 GiB/sec 6.472 GiB/sec    -5.426 {'run_name': 'FilterInt64FilterNoNulls/1048576/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4689, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/3  9.378 GiB/sec 7.717 GiB/sec   -17.710  {'run_name': 'FilterInt64FilterNoNulls/1048576/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6703, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/5 10.064 GiB/sec 5.390 GiB/sec   -46.442   {'run_name': 'FilterInt64FilterNoNulls/1048576/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 7216, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
    FilterInt64FilterNoNulls/1048576/8  9.600 GiB/sec 4.489 GiB/sec   -53.233   {'run_name': 'FilterInt64FilterNoNulls/1048576/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6865, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/11  9.529 GiB/sec 4.105 GiB/sec   -56.926 {'run_name': 'FilterInt64FilterNoNulls/1048576/11', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6825, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/14  9.537 GiB/sec 4.098 GiB/sec   -57.027 {'run_name': 'FilterInt64FilterNoNulls/1048576/14', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6825, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
   ```





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-878574623


   @wesm @cyb70289 @bkietz Is there anything else we could do for the low selectivity cases (1% select)? 





[GitHub] [arrow] ursabot edited a comment on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875950075


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 5a14b94046288938b22e1db1e7d0a5f6cc5d2fd1. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...3465b868d70e425e8af9141550f9450f/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...87610e39461a4309accf9c5ce9f7f2f9/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...ec5b59d942eb4e7794b7c7808006e223/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] ursabot commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875917401


   Benchmark runs are scheduled for baseline = 6c8d30ea82222fd2750b999840872d3f6cbdc8f8 and contender = c2d694b17596b13007bd54804d382808c60066aa. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/ead005de138847cdb35b900308a8716e...91280e77e83149fb857bf3153ece3a4e/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5d26982bd71e45878dd5387f80c8d0f4...9475f85ca9e14856868ea301f4c6e7ea/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/00761e2d135d41859abe0f44fca6ee61...d7af8eaebb1149308ccf7b8870fe84c9/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-879096754


   I believe AVX instructions like `mask_store` could also help in this use case.
   https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=mask&cats=Load,Store
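   
   For illustration, a rough sketch of what `_mm256_maskstore_epi64` (AVX2) does. Note that it stores selected lanes in place rather than compacting them, so a filter kernel would still need separate output positioning:
   
   ```cpp
   #include <immintrin.h>
   #include <cstdint>
   
   // Store the four int64 lanes of `src` into `dst` only where the corresponding
   // mask lane has its most significant bit set; other lanes of `dst` are left
   // untouched. Requires AVX2 (compile with -mavx2).
   void MaskStore4xInt64(const int64_t* src, const int64_t* mask, int64_t* dst) {
     const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
     const __m256i m = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(mask));
     _mm256_maskstore_epi64(reinterpret_cast<long long*>(dst), m, v);
   }
   ```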





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875917243


   @ursabot please benchmark





[GitHub] [arrow] cyb70289 closed pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
cyb70289 closed pull request #10679:
URL: https://github.com/apache/arrow/pull/10679


   





[GitHub] [arrow] pitrou commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-897739852


   Should we close this? It was an interesting experiment but doesn't seem to give very convincing results.





[GitHub] [arrow] ursabot commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875950075


   Benchmark runs are scheduled for baseline = cf6a7ff65f4e2920641d116a3ba1f578b2bd8a9e and contender = 5a14b94046288938b22e1db1e7d0a5f6cc5d2fd1. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/96acb784c87842b2bfd28e17a9a90432...3465b868d70e425e8af9141550f9450f/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/5744b64d14d448198229be1dbb5265e7...87610e39461a4309accf9c5ce9f7f2f9/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/cd0a0e80ad2c4de0b60cda38c58b64a4...ec5b59d942eb4e7794b7c7808006e223/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] pitrou commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-884317890


   I'm not fond of this PR. The fact that the results are rather mixed while it adds significant complexity to the implementation doesn't make it extremely desirable IMHO.





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-897787608


   Yes, I think so. I'd like to do some experiments with some AVX instructions, but I don't think I'd do that immediately.
   
   





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875949918


   @ursabot please benchmark





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-878573510


   I tested with a `VisitWords` implementation for this, and it seems to give better results (a rough sketch of the word-at-a-time idea follows the table below).
   ```
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (11)
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                             benchmark       baseline      contender  change %                                                                                                                                                                               counters
    FilterInt64FilterNoNulls/1048576/4  1.421 GiB/sec  2.958 GiB/sec   108.167   {'run_name': 'FilterInt64FilterNoNulls/1048576/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1019, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/13  1.318 GiB/sec  2.708 GiB/sec   105.439  {'run_name': 'FilterInt64FilterNoNulls/1048576/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 945, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/10  1.318 GiB/sec  2.704 GiB/sec   105.207  {'run_name': 'FilterInt64FilterNoNulls/1048576/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 944, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/7  1.364 GiB/sec  2.754 GiB/sec   101.937    {'run_name': 'FilterInt64FilterNoNulls/1048576/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 977, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
    FilterInt64FilterNoNulls/1048576/1  2.353 GiB/sec  2.621 GiB/sec    11.389   {'run_name': 'FilterInt64FilterNoNulls/1048576/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1689, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
   FilterInt64FilterNoNulls/1048576/12  6.524 GiB/sec  7.231 GiB/sec    10.830 {'run_name': 'FilterInt64FilterNoNulls/1048576/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4783, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/2 39.394 GiB/sec 42.267 GiB/sec     7.293   {'run_name': 'FilterInt64FilterNoNulls/1048576/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 28183, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
    FilterInt64FilterNoNulls/1048576/9  6.925 GiB/sec  7.207 GiB/sec     4.077  {'run_name': 'FilterInt64FilterNoNulls/1048576/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4905, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/6  7.873 GiB/sec  8.009 GiB/sec     1.730   {'run_name': 'FilterInt64FilterNoNulls/1048576/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5643, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/3  9.167 GiB/sec  9.225 GiB/sec     0.637   {'run_name': 'FilterInt64FilterNoNulls/1048576/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6530, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
    FilterInt64FilterNoNulls/1048576/0 13.827 GiB/sec 13.744 GiB/sec    -0.597   {'run_name': 'FilterInt64FilterNoNulls/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 9834, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
   
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (4)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                             benchmark       baseline     contender  change %                                                                                                                                                                              counters
    FilterInt64FilterNoNulls/1048576/5 10.049 GiB/sec 5.493 GiB/sec   -45.344   {'run_name': 'FilterInt64FilterNoNulls/1048576/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 7199, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
    FilterInt64FilterNoNulls/1048576/8  9.571 GiB/sec 5.223 GiB/sec   -45.423   {'run_name': 'FilterInt64FilterNoNulls/1048576/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6869, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/11  9.494 GiB/sec 5.073 GiB/sec   -46.560 {'run_name': 'FilterInt64FilterNoNulls/1048576/11', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6836, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
   FilterInt64FilterNoNulls/1048576/14  9.517 GiB/sec 5.075 GiB/sec   -46.674 {'run_name': 'FilterInt64FilterNoNulls/1048576/14', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6829, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
   ```
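   
   For reference, a rough sketch of the word-at-a-time idea (assumed, not the actual `VisitWords`-based patch): whole-zero filter words are skipped almost for free, and set bits are located with count-trailing-zeros (`__builtin_ctzll`, GCC/Clang):
   
   ```cpp
   #include <cstdint>
   
   // `filter_words` holds num_words 64-bit words of the filter bitmap; any bits
   // past the logical length are assumed to be zero.
   int64_t FilterByWords(const int64_t* values, const uint64_t* filter_words,
                         int64_t num_words, int64_t* out) {
     int64_t out_position = 0;
     for (int64_t w = 0; w < num_words; ++w) {
       uint64_t word = filter_words[w];
       while (word != 0) {
         const int bit = __builtin_ctzll(word);       // index of lowest set bit
         out[out_position++] = values[w * 64 + bit];  // copy the selected value
         word &= word - 1;                            // clear that bit
       }
     }
     return out_position;
   }
   ```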





[GitHub] [arrow] nirandaperera commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10679:
URL: https://github.com/apache/arrow/pull/10679#issuecomment-875997414


   These were the perf results from my local desktop. 
   https://gist.github.com/nirandaperera/dfafb77865e948514ca520162be10558 

