Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/30 22:55:07 UTC

[GitHub] [arrow-rs] tustvold commented on pull request #1248: POC: Specialized filter kernels

tustvold commented on pull request #1248:
URL: https://github.com/apache/arrow-rs/pull/1248#issuecomment-1025251778


   I found some time this afternoon, so bashed out porting the filter context abstraction (caching the selection vector) and fixing up the null buffer construction.
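   
   As a rough sketch of the idea (illustrative only, with made-up names; this is not the abstraction or API actually added in this PR), "caching the selection vector" amounts to hydrating the boolean predicate into a list of selected row indices once, and then reusing that list for every column being filtered:
   
   ```rust
   /// Hypothetical cached selection vector built once from a boolean mask.
   struct SelectionVector {
       /// Indices of the rows where the predicate is `true`.
       selected: Vec<usize>,
   }
   
   impl SelectionVector {
       /// Scan the mask a single time and remember which rows survive.
       fn new(mask: &[bool]) -> Self {
           let selected = mask
               .iter()
               .enumerate()
               .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
               .collect();
           Self { selected }
       }
   
       /// Apply the cached selection to one column; reusable across columns.
       fn filter<T: Copy>(&self, values: &[T]) -> Vec<T> {
           self.selected.iter().map(|&i| values[i]).collect()
       }
   }
   
   fn main() {
       let mask = [true, false, true, true, false];
       let ctx = SelectionVector::new(&mask);
       assert_eq!(ctx.filter(&[1, 2, 3, 4, 5]), vec![1, 3, 4]);
       assert_eq!(ctx.filter(&[1.0_f32, 2.0, 3.0, 4.0, 5.0]), vec![1.0_f32, 3.0, 4.0]);
   }
   ```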
   
   Here's where we stand now: roughly a 10x performance uplift across the board, except for high selectivity with nulls, where the bottleneck of copying ranges of null bitmasks is unchanged.
   
   ```
   filter u8               time:   [48.733 us 48.795 us 48.889 us]                       
                           change: [-90.138% -90.127% -90.116%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 high selectivity                                                                             
                           time:   [2.4004 us 2.4048 us 2.4099 us]
                           change: [-81.016% -80.967% -80.915%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter u8 low selectivity                                                                             
                           time:   [1.4000 us 1.4015 us 1.4032 us]
                           change: [-67.378% -67.309% -67.246%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8       time:   [14.783 us 14.798 us 14.816 us]                               
                           change: [-95.157% -95.150% -95.143%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 high selectivity                                                                             
                           time:   [1.1631 us 1.1636 us 1.1642 us]
                           change: [-85.209% -85.201% -85.192%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 low selectivity                                                                            
                           time:   [150.47 ns 150.64 ns 150.83 ns]
                           change: [-84.434% -84.380% -84.295%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 w NULLs                                                                             
                           time:   [40.762 us 40.771 us 40.781 us]
                           change: [-89.275% -89.267% -89.259%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 w NULLs high selectivity                                                                             
                           time:   [7.0784 us 7.0828 us 7.0876 us]
                           change: [+2.4840% +2.5549% +2.6258%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context u8 w NULLs low selectivity                                                                            
                           time:   [267.79 ns 267.96 ns 268.12 ns]
                           change: [-72.091% -72.023% -71.946%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter f32              time:   [117.24 us 117.27 us 117.32 us]                       
                           change: [-79.651% -79.637% -79.623%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context f32      time:   [41.474 us 41.486 us 41.499 us]                                
                           change: [-88.917% -88.911% -88.905%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context f32 high selectivity                                                                             
                           time:   [11.289 us 11.292 us 11.294 us]
                           change: [+3.9848% +4.0461% +4.1142%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   filter context f32 low selectivity                                                                            
                           time:   [295.75 ns 296.19 ns 296.65 ns]
                           change: [-69.321% -69.181% -69.008%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter single record batch                                                                            
                           time:   [69.716 us 69.749 us 69.783 us]
                           change: [-86.024% -86.009% -85.998%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   What is interesting, at least to me, is the performance tax imposed by the packed bitmask representation of BooleanArray: even with non-trivial bit-twiddling shenanigans, it still appears to be more performant to hydrate the filter into an array of indices or slices when filtering multiple arrays.
   
   ```
   filter optimize         time:   [45.865 us 45.990 us 46.096 us]                             
   
   filter optimize high selectivity                                                                             
                           time:   [1.6900 us 1.6910 us 1.6920 us]
   
   filter optimize low selectivity                                                                             
                           time:   [1.4087 us 1.4116 us 1.4149 us]
   ```
   
   Perhaps the compiler just struggles to auto-vectorise bitmask loops or something, not sure :sweat_smile: 
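   
   For a concrete picture of what such a bitmask loop looks like versus a hydrated selection vector (a sketch under my own assumptions, not the kernel's actual code): the packed-bitmask path has a data-dependent inner loop over each 64-bit word, whereas the hydrated selection is just a straight gather over indices:
   
   ```rust
   /// Visit the index of every set bit in a packed little-endian bitmask.
   fn for_each_set_bit(mask: &[u64], mut visit: impl FnMut(usize)) {
       for (word_idx, &word) in mask.iter().enumerate() {
           let mut bits = word;
           while bits != 0 {
               let bit = bits.trailing_zeros() as usize;
               visit(word_idx * 64 + bit);
               bits &= bits - 1; // clear the lowest set bit
           }
       }
   }
   
   fn main() {
       // Packed mask selecting rows 0, 2 and 65.
       let mask = [0b101u64, 0b10u64];
       let values: Vec<u32> = (0..128).collect();
   
       // Bitmask traversal: branchy and data-dependent, harder to auto-vectorise.
       let mut via_bitmask = Vec::new();
       for_each_set_bit(&mask, |i| via_bitmask.push(values[i]));
   
       // Hydrated selection vector: a plain gather loop over contiguous indices.
       let selection = [0usize, 2, 65];
       let via_indices: Vec<u32> = selection.iter().map(|&i| values[i]).collect();
   
       assert_eq!(via_bitmask, via_indices);
   }
   ```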

