Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/25 18:51:35 UTC
[GitHub] [arrow] yordan-pavlov edited a comment on pull request #7798: ARROW-9523 [Rust] Improve filter kernel performance

yordan-pavlov edited a comment on pull request #7798:
URL: https://github.com/apache/arrow/pull/7798#issuecomment-663888189


   @jorgecarleitao thanks for the feedback
   
   I did some more profiling, specifically around the unsafe parts of the code, and found that the safe version of `copy_null_bit` is just as fast, so I have removed that unsafe section. Here are some benchmark results:
   
   copy_null_bit unsafe:
   filter context u8 w NULLs low selectivity  time:   [142.05 us 142.35 us 142.68 us]
   filter context u8 w NULLs high selectivity time:   [2.0915 us 2.1015 us 2.1127 us]
   
   copy_null_bit safe:
   filter context u8 w NULLs low selectivity  time:   [134.74 us 134.86 us 134.98 us]
   filter context u8 w NULLs high selectivity time:   [2.0536 us 2.0613 us 2.0707 us]
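
For readers unfamiliar with the operation being measured, here is a minimal sketch of what a safe, bounds-checked null-bit copy might look like. It is not the code from this PR: the name `copy_bit_safe` and its signature are hypothetical, and it assumes plain byte slices using the least-significant-bit-first bit order that Arrow validity bitmaps use.

```rust
/// Hypothetical helper (not the PR's `copy_null_bit`): copies one validity
/// bit from `src` to `dst` using ordinary bounds-checked slice indexing.
/// Bits are numbered least-significant-bit first within each byte.
/// Assumes `dst` starts zeroed, so only set bits need to be written.
fn copy_bit_safe(src: &[u8], src_index: usize, dst: &mut [u8], dst_index: usize) {
    // Read the source bit; indexing panics instead of reading out of bounds.
    let bit_is_set = src[src_index / 8] & (1u8 << (src_index % 8)) != 0;
    if bit_is_set {
        // Set the corresponding bit in the destination bitmap.
        dst[dst_index / 8] |= 1u8 << (dst_index % 8);
    }
}
```

Because the index arithmetic is the same as in a raw-pointer version, the only added cost is the bounds check, which is consistent with the two sets of timings above being nearly identical.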
   
   I also benchmarked replacing the unsafe section in the `filter_array_impl` method with `value_buffer.write()`, but this makes filtering with sparse filter arrays roughly 17% to 20% slower, as can be seen from the benchmark results below:
   
   filter u8 low selectivity
                           time:   [131.08 us 132.46 us 134.27 us]
                           change: [+13.141% +17.189% +22.115%] (p = 0.00 < 0.05)
   
   filter context u8 low selectivity
                           time:   [127.47 us 129.27 us 131.56 us]
                           change: [+12.008% +19.674% +27.939%] (p = 0.00 < 0.05)
   
   filter context u8 w NULLs low selectivity
                           time:   [154.32 us 155.27 us 156.79 us]
                           change: [+15.444% +17.846% +22.268%] (p = 0.00 < 0.05)
   
   filter context f32 low selectivity
                           time:   [137.62 us 138.01 us 138.52 us]
                           change: [+12.495% +18.180% +23.088%] (p = 0.00 < 0.05)
   
   Finally, looking at the C++ implementation inspired me to change the `filter_array_impl` method to add a special case for when the 64-bit filter batch is all 1s. This doesn't appear to reduce performance in other cases, but it improves the performance of filtering with very dense filter arrays (almost all 1s) by about 20 times. Here are the latest benchmark results:
   
   filter u8 low selectivity                      time:   [109.75 us 110.30 us 111.02 us]
   filter u8 high selectivity                     time:   [4.8372 us 4.8433 us 4.8502 us]
   filter u8 very low selectivity                 time:   [11.782 us 11.798 us 11.816 us]
   filter context u8 low selectivity              time:   [109.07 us 109.65 us 110.66 us]
   filter context u8 high selectivity             time:   [1.4704 us 1.4762 us 1.4842 us]
   filter context u8 very low selectivity         time:   [8.8455 us 9.0530 us 9.3171 us]
   filter context u8 w NULLs low selectivity      time:   [135.32 us 135.49 us 135.66 us]
   filter context u8 w NULLs high selectivity     time:   [2.0579 us 2.0680 us 2.0796 us]
   filter context u8 w NULLs very low selectivity time:   [11.583 us 11.668 us 11.780 us]
   filter context f32 low selectivity             time:   [138.71 us 139.83 us 141.41 us]
   filter context f32 high selectivity            time:   [1.6111 us 1.6342 us 1.6605 us]
   filter context f32 very low selectivity        time:   [19.719 us 19.865 us 20.045 us]
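
The all-1s special case described above can be illustrated with a simplified sketch. This is not the actual `filter_array_impl` code: the function `filter_u8` is hypothetical, it works on a plain `&[u8]` rather than an Arrow array, it ignores NULLs, and it assumes the filter is already packed into 64-bit batches with bit `i` of batch `b` covering value `b * 64 + i`.

```rust
/// Hypothetical, simplified filter over `u8` values (not the PR's code).
/// `filter_batches[b]` holds the filter bits for values `b * 64 .. b * 64 + 64`,
/// least-significant bit first.
fn filter_u8(values: &[u8], filter_batches: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len());
    for (batch_index, &batch) in filter_batches.iter().enumerate() {
        let start = batch_index * 64;
        if start >= values.len() {
            break;
        }
        let end = usize::min(start + 64, values.len());
        if batch == u64::MAX {
            // Fast path: every bit in the batch is set, so copy the whole
            // run of values at once instead of testing 64 bits one by one.
            out.extend_from_slice(&values[start..end]);
            continue;
        }
        // General path: test each bit and copy the selected values.
        for (bit, &value) in values[start..end].iter().enumerate() {
            if batch & (1u64 << bit) != 0 {
                out.push(value);
            }
        }
    }
    out
}
```

With a filter that is almost all 1s, nearly every batch takes the fast path and becomes a single slice copy, which is the kind of case where the "very low selectivity" benchmarks above would benefit.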
   
   @andygrove  any thoughts?

