Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/26 10:59:48 UTC

[GitHub] [arrow-rs] tustvold commented on pull request #1225: Improve MutableArrayData Null Handling (#1224) (#1230)

tustvold commented on pull request #1225:
URL: https://github.com/apache/arrow-rs/pull/1225#issuecomment-1022089589


   ```
   cargo criterion --bench filter_kernels
      Compiling arrow v8.0.0 (/home/raphael/repos/external/arrow-rs/arrow)
       Finished bench [optimized] target(s) in 18.23s
   filter u8               time:   [291.13 us 293.93 us 298.49 us]                      
                           change: [-40.935% -40.686% -40.239%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 high selectivity                                                                             
                           time:   [5.8296 us 5.8316 us 5.8336 us]
                           change: [-54.079% -53.954% -53.829%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 low selectivity                                                                             
                           time:   [3.7740 us 3.7783 us 3.7829 us]
                           change: [-12.217% -11.997% -11.788%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8       time:   [105.74 us 105.76 us 105.80 us]                              
                           change: [-63.643% -63.614% -63.586%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 high selectivity                                                                             
                           time:   [1.3801 us 1.3816 us 1.3829 us]
                           change: [-82.396% -82.359% -82.319%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 low selectivity                                                                            
                           time:   [401.67 ns 401.79 ns 401.92 ns]
                           change: [-58.196% -58.112% -58.047%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 w NULLs                                                                            
                           time:   [427.53 us 427.66 us 427.80 us]
                           change: [+13.449% +13.527% +13.598%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context u8 w NULLs high selectivity                                                                             
                           time:   [6.8897 us 6.8919 us 6.8946 us]
                           change: [+0.2869% +0.3711% +0.4592%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   
   filter context u8 w NULLs low selectivity                                                                             
                           time:   [1.0082 us 1.0085 us 1.0088 us]
                           change: [+6.1612% +6.4041% +6.5859%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter f32              time:   [606.18 us 607.55 us 608.93 us]                       
                           change: [+6.1214% +6.3825% +6.6391%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context f32      time:   [427.36 us 428.01 us 429.08 us]                               
                           change: [+12.435% +12.609% +12.799%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context f32 high selectivity                                                                             
                           time:   [12.375 us 12.907 us 13.357 us]
                           change: [+1.5047% +4.5855% +7.1816%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context f32 low selectivity                                                                             
                           time:   [1.0550 us 1.0552 us 1.0554 us]
                           change: [+8.6226% +9.4838% +10.093%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context string   time:   [534.98 us 535.16 us 535.32 us]                                  
                           change: [+9.7285% +9.8604% +10.001%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context string high selectivity                                                                            
                           time:   [402.80 us 402.92 us 403.03 us]
                           change: [-2.6457% -2.5796% -2.5140%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context string low selectivity                                                                             
                           time:   [1.3243 us 1.3246 us 1.3249 us]
                           change: [+3.4003% +3.7378% +4.0158%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter single record batch                                                                            
                           time:   [286.47 us 286.77 us 287.09 us]
                           change: [-41.765% -41.668% -41.572%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   So this makes filtering arrays without nulls about 50% faster; however, it does seem to make filtering arrays with nulls ~10% slower. This is likely down to the issue in #1229: the `extend_bits` function is ludicrously "hot" for these benchmarks, where the runs are typically 1 or 2 elements long.
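   
   To illustrate why short runs hurt, here is a minimal sketch under stated assumptions. It is not the arrow-rs code: `append_bits`, `selected_runs` and `filter_validity` are hypothetical names, and the real `extend_bits` may differ in signature and behaviour. The only assumption is that validity bits are copied once per contiguous run of selected elements, so the per-call overhead is paid once per run.
   
   ```rust
   /// Hypothetical stand-in for a bit-copy helper (NOT the arrow-rs `extend_bits`).
   /// Appends `len` bits of `src`, starting at bit `offset`, to `dst` at bit
   /// position `dst_len` (LSB-first), and returns how many copied bits were unset.
   fn append_bits(dst: &mut Vec<u8>, dst_len: usize, src: &[u8], offset: usize, len: usize) -> usize {
       // Grow the destination so that every target bit has a backing byte.
       let needed_bytes = (dst_len + len + 7) / 8;
       if dst.len() < needed_bytes {
           dst.resize(needed_bytes, 0);
       }
       let mut nulls = 0;
       for i in 0..len {
           let (s, d) = (offset + i, dst_len + i);
           if ((src[s / 8] >> (s % 8)) & 1) == 1 {
               dst[d / 8] |= 1u8 << (d % 8);
           } else {
               nulls += 1;
           }
       }
       nulls
   }
   
   /// Splits a selection mask into maximal `[start, end)` runs of selected elements.
   fn selected_runs(mask: &[bool]) -> Vec<(usize, usize)> {
       let mut runs = Vec::new();
       let mut start = None;
       for (i, &keep) in mask.iter().enumerate() {
           match (keep, start) {
               (true, None) => start = Some(i),
               (false, Some(s)) => {
                   runs.push((s, i));
                   start = None;
               }
               _ => {}
           }
       }
       if let Some(s) = start {
           runs.push((s, mask.len()));
       }
       runs
   }
   
   /// Filters a validity bitmap run by run.
   /// Returns (filtered bitmap, null count, number of `append_bits` calls).
   fn filter_validity(mask: &[bool], validity: &[u8]) -> (Vec<u8>, usize, usize) {
       let (mut out, mut out_len, mut nulls, mut calls) = (Vec::new(), 0, 0, 0);
       for (start, end) in selected_runs(mask) {
           nulls += append_bits(&mut out, out_len, validity, start, end - start);
           out_len += end - start;
           calls += 1;
       }
       (out, nulls, calls)
   }
   
   fn main() {
       // Alternating mask: every run is exactly one element long.
       let mask: Vec<bool> = (0..16).map(|i| i % 2 == 0).collect();
       let validity = [0b1111_0000u8, 0b0000_1111];
       let (out, nulls, calls) = filter_validity(&mask, &validity);
       println!("bits={:08b} nulls={} calls={}", out[0], nulls, calls);
   }
   ```
   
   With the alternating mask in `main`, eight selected elements cost eight calls, each of which pays the resize check and loop setup to copy a single bit. Arrays without a validity bitmap skip this path entirely.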
   
   I'd personally prefer to merge this as is and keep pushing forward, but I can also hold off until I've fixed #1229.

