You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/15 01:39:01 UTC

[GitHub] [arrow] vertexclique opened a new pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

vertexclique opened a new pull request #8664:
URL: https://github.com/apache/arrow/pull/8664


   * Implements safe bit operations for Arrow
   * Implements `typed_bits` to get bits as Vec<bool>
   * Implements various bit operations for the use with arrow arrays
   * Adjusts parquet array reader to use Arrow bit operations
   
   
   
   <details>
   <summary>Benchmarks</summary>
   <br>
   <p>
   
   ```
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/aggregate_kernels-c31fbf405e8694a5
   sum 512                 time:   [632.58 ns 632.79 ns 633.01 ns]                     
                           change: [-0.1888% -0.1141% -0.0410%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     9 (9.00%) high mild
   
   min 512                 time:   [918.14 ns 918.20 ns 918.27 ns]                     
                           change: [-0.2613% -0.0871% +0.0407%] (p = 0.32 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   sum nulls 512           time:   [369.66 ns 369.75 ns 369.84 ns]                          
                           change: [-1.4167% -1.3636% -1.3233%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     3 (3.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   min nulls 512           time:   [1.9885 us 1.9900 us 1.9917 us]                           
                           change: [-2.6032% -2.3645% -2.1248%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     10 (10.00%) high mild
     5 (5.00%) high severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/arithmetic_kernels-1984fccdba1970da
   add 512                 time:   [1.2587 us 1.2590 us 1.2593 us]                     
                           change: [-0.6426% -0.5921% -0.5463%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) low mild
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   subtract 512            time:   [1.4423 us 1.4424 us 1.4425 us]                          
                           change: [-1.9454% -1.9192% -1.8910%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   multiply 512            time:   [1.3578 us 1.3579 us 1.3580 us]                          
                           change: [-4.5031% -4.4893% -4.4759%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   divide 512              time:   [1.8784 us 1.8793 us 1.8803 us]                        
                           change: [-5.6385% -5.5871% -5.5370%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   limit 512, 512          time:   [123.77 ns 123.78 ns 123.80 ns]                           
                           change: [-0.5449% -0.5320% -0.5173%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     6 (6.00%) high mild
     1 (1.00%) high severe
   
   add_nulls_512           time:   [1.3148 us 1.3150 us 1.3151 us]                           
                           change: [-8.4170% -8.4016% -8.3843%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   divide_nulls_512        time:   [1.8536 us 1.8546 us 1.8556 us]                              
                           change: [-0.0625% -0.0094% +0.0469%] (p = 0.74 > 0.05)
                           No change in performance detected.
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/array_from_vec-61dde2e42601af4d
   array_from_vec 128      time:   [460.91 ns 461.72 ns 462.49 ns]                               
                           change: [-1.5182% -1.2296% -0.9445%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   
   array_from_vec 256      time:   [700.00 ns 701.51 ns 703.03 ns]                                
                           change: [-2.4928% -2.2326% -1.9810%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   array_from_vec 512      time:   [1.1866 us 1.1882 us 1.1898 us]                                
                           change: [-5.8191% -5.6620% -5.4929%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   array_string_from_vec 128                                                                             
                           time:   [2.0925 us 2.0934 us 2.0943 us]
                           change: [-6.1524% -6.1040% -6.0558%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   array_string_from_vec 256                                                                             
                           time:   [3.1637 us 3.1662 us 3.1686 us]
                           change: [-5.2043% -5.1092% -5.0175%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   array_string_from_vec 512                                                                             
                           time:   [5.4878 us 5.4891 us 5.4903 us]
                           change: [-2.7271% -2.6299% -2.5362%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   struct_array_from_vec 128                                                                             
                           time:   [3.5905 us 3.6011 us 3.6156 us]
                           change: [+0.5452% +0.7197% +0.9826%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
   struct_array_from_vec 256                                                                             
                           time:   [5.2740 us 5.2753 us 5.2767 us]
                           change: [+7.1818% +7.2231% +7.2645%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   struct_array_from_vec 512                                                                             
                           time:   [7.7544 us 7.7561 us 7.7580 us]
                           change: [+1.1725% +1.2180% +1.2635%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   struct_array_from_vec 1024                                                                             
                           time:   [12.600 us 12.605 us 12.609 us]
                           change: [+1.0007% +1.0620% +1.1174%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/boolean_kernels-44967fe6790f65f3
   and                     time:   [43.433 us 43.448 us 43.462 us]                 
                           change: [+11.734% +11.776% +11.817%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   or                      time:   [43.599 us 43.609 us 43.621 us]                
                           change: [+11.665% +11.716% +11.764%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   not                     time:   [21.861 us 21.869 us 21.876 us]                 
                           change: [+11.350% +11.485% +11.611%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/buffer_bit_ops-db62104d61fab165
   buffer_bit_ops and      time:   [1.4752 us 1.4753 us 1.4754 us]                                
                           change: [+883.28% +883.79% +884.27%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/builder-f619aaa37114abcd
   Benchmarking bench_primitive: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.8s, enable flat sampling, or reduce sample count to 50.
   bench_primitive         time:   [1.5379 ms 1.5381 ms 1.5383 ms]                             
                           thrpt:  [2.5393 GiB/s 2.5397 GiB/s 2.5400 GiB/s]
                    change:
                           time:   [+18.813% +18.858% +18.906%] (p = 0.00 < 0.05)
                           thrpt:  [-15.900% -15.866% -15.834%]
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   bench_bool              time:   [3.4393 ms 3.4419 ms 3.4445 ms]                        
                           thrpt:  [145.16 MiB/s 145.27 MiB/s 145.38 MiB/s]
                    change:
                           time:   [+33.543% +33.660% +33.765%] (p = 0.00 < 0.05)
                           thrpt:  [-25.242% -25.184% -25.118%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/cast_kernels-32b526d43dab4d1d
   cast int32 to int32 512 time:   [30.416 ns 30.426 ns 30.435 ns]                                     
                           change: [+2.9310% +2.9847% +3.0390%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   
   cast int32 to uint32 512                                                                             
                           time:   [10.468 us 10.470 us 10.471 us]
                           change: [-9.9982% -9.9821% -9.9668%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high severe
   
   cast int32 to float32 512                                                                             
                           time:   [10.071 us 10.073 us 10.075 us]
                           change: [-12.479% -12.462% -12.444%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast int32 to float64 512                                                                             
                           time:   [10.028 us 10.030 us 10.033 us]
                           change: [-12.811% -12.788% -12.766%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast int32 to int64 512 time:   [9.8698 us 9.8718 us 9.8741 us]                                     
                           change: [-14.016% -13.999% -13.982%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   cast float32 to int32 512                                                                             
                           time:   [10.274 us 10.276 us 10.277 us]
                           change: [-12.382% -12.355% -12.317%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast float64 to float32 512                                                                             
                           time:   [10.114 us 10.116 us 10.117 us]
                           change: [-2.8473% -2.8224% -2.7997%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast float64 to uint64 512                                                                             
                           time:   [10.694 us 10.697 us 10.700 us]
                           change: [-11.879% -11.851% -11.826%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast int64 to int32 512 time:   [9.7565 us 9.7573 us 9.7583 us]                                     
                           change: [-17.410% -17.400% -17.391%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   cast date64 to date32 512                                                                             
                           time:   [26.983 us 26.986 us 26.989 us]
                           change: [+25.366% +25.546% +25.663%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   cast date32 to date64 512                                                                             
                           time:   [26.888 us 26.891 us 26.894 us]
                           change: [+27.476% +27.552% +27.602%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast time32s to time32ms 512                                                                             
                           time:   [1.5742 us 1.5745 us 1.5749 us]
                           change: [-16.374% -16.343% -16.311%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   cast time32s to time64us 512                                                                             
                           time:   [11.926 us 11.928 us 11.931 us]
                           change: [-13.407% -13.386% -13.368%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast time64ns to time32s 512                                                                             
                           time:   [29.751 us 29.754 us 29.756 us]
                           change: [+24.045% +24.068% +24.088%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast timestamp_ns to timestamp_s 512                                                                             
                           time:   [30.128 ns 30.132 ns 30.136 ns]
                           change: [-5.2715% -5.2598% -5.2480%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   cast timestamp_ms to timestamp_ns 512                                                                             
                           time:   [1.8053 us 1.8057 us 1.8061 us]
                           change: [-17.522% -17.470% -17.424%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   cast timestamp_ms to i64 512                                                                            
                           time:   [178.12 ns 178.18 ns 178.24 ns]
                           change: [+0.9006% +0.9526% +0.9972%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) low severe
     2 (2.00%) low mild
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/comparison_kernels-b7d9a0aa5c84846a
   Benchmarking eq Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
   eq Float32              time:   [1.5118 ms 1.5120 ms 1.5123 ms]                        
                           change: [+21.289% +21.316% +21.347%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   
   Benchmarking eq scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
   eq scalar Float32       time:   [1.4161 ms 1.4162 ms 1.4162 ms]                               
                           change: [+26.675% +26.691% +26.707%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking neq Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
   neq Float32             time:   [1.3700 ms 1.3702 ms 1.3703 ms]                         
                           change: [+16.109% +16.168% +16.232%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     8 (8.00%) high mild
     4 (4.00%) high severe
   
   Benchmarking neq scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
   neq scalar Float32      time:   [1.4162 ms 1.4163 ms 1.4164 ms]                                
                           change: [+26.163% +26.209% +26.253%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     6 (6.00%) high severe
   
   Benchmarking lt Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
   lt Float32              time:   [1.3708 ms 1.3710 ms 1.3713 ms]                        
                           change: [+16.056% +16.113% +16.168%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking lt scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
   lt scalar Float32       time:   [1.4173 ms 1.4174 ms 1.4175 ms]                               
                           change: [+21.190% +21.234% +21.267%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low severe
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking lt_eq Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
   lt_eq Float32           time:   [1.5143 ms 1.5145 ms 1.5147 ms]                           
                           change: [+24.324% +24.349% +24.371%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 14 outliers among 100 measurements (14.00%)
     6 (6.00%) low mild
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   Benchmarking lt_eq scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
   lt_eq scalar Float32    time:   [1.4861 ms 1.4862 ms 1.4862 ms]                                  
                           change: [+29.290% +29.303% +29.316%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   Benchmarking gt Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
   gt Float32              time:   [1.3714 ms 1.3715 ms 1.3715 ms]                        
                           change: [+16.004% +16.055% +16.104%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) low severe
     2 (2.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking gt scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
   gt scalar Float32       time:   [1.3463 ms 1.3465 ms 1.3469 ms]                               
                           change: [+23.254% +23.285% +23.316%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking gt_eq Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
   gt_eq Float32           time:   [1.5085 ms 1.5085 ms 1.5086 ms]                           
                           change: [+23.864% +23.878% +23.891%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 14 outliers among 100 measurements (14.00%)
     1 (1.00%) low mild
     9 (9.00%) high mild
     4 (4.00%) high severe
   
   Benchmarking gt_eq scalar Float32: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
   gt_eq scalar Float32    time:   [1.4184 ms 1.4184 ms 1.4185 ms]                                  
                           change: [+22.218% +22.243% +22.268%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/csv_writer-4bae0c5626a4bf18
   record_batches_to_csv   time:   [97.536 us 97.731 us 97.920 us]                                  
                           change: [+20.400% +20.644% +20.907%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 17 outliers among 100 measurements (17.00%)
     9 (9.00%) low severe
     7 (7.00%) low mild
     1 (1.00%) high mild
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/equal-03b760e11e404a08
   equal_512               time:   [44.182 ns 44.187 ns 44.193 ns]                       
                           change: [+3.9306% +3.9590% +3.9908%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     4 (4.00%) high severe
   
   equal_nulls_512         time:   [3.1297 us 3.1298 us 3.1299 us]                             
                           change: [-4.3580% -4.3531% -4.3480%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   equal_string_512        time:   [62.525 ns 62.527 ns 62.528 ns]                             
                           change: [-0.5929% -0.5775% -0.5642%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   equal_string_nulls_512  time:   [6.2088 us 6.2252 us 6.2409 us]                                    
                           change: [+44.524% +44.911% +45.229%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/filter_kernels-637318e1617ce1c9
   filter u8 low selectivity                                                                            
                           time:   [147.97 us 148.00 us 148.03 us]
                           change: [+24.688% +24.733% +24.774%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 15 outliers among 100 measurements (15.00%)
     2 (2.00%) low mild
     2 (2.00%) high mild
     11 (11.00%) high severe
   
   filter u8 high selectivity                                                                             
                           time:   [18.523 us 18.529 us 18.535 us]
                           change: [+122.09% +122.22% +122.33%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   filter u8 very low selectivity                                                                             
                           time:   [25.126 us 25.129 us 25.132 us]
                           change: [+72.083% +72.108% +72.134%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   filter context u8 low selectivity                                                                            
                           time:   [131.77 us 131.78 us 131.78 us]
                           change: [+15.022% +15.428% +15.674%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     3 (3.00%) low mild
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   filter context u8 high selectivity                                                                             
                           time:   [2.5485 us 2.5490 us 2.5496 us]
                           change: [-19.434% -19.341% -19.256%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) low severe
     1 (1.00%) low mild
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   filter context u8 very low selectivity                                                                             
                           time:   [9.0383 us 9.0390 us 9.0400 us]
                           change: [-2.9885% -2.9482% -2.9090%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 w NULLs low selectivity                                                                            
                           time:   [165.77 us 165.77 us 165.78 us]
                           change: [+6.0512% +6.1032% +6.1319%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   filter context u8 w NULLs high selectivity                                                                             
                           time:   [2.9709 us 2.9731 us 2.9752 us]
                           change: [-16.600% -16.534% -16.477%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   filter context u8 w NULLs very low selectivity                                                                            
                           time:   [170.21 us 170.37 us 170.55 us]
                           change: [+16.476% +16.558% +16.648%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   filter context f32 low selectivity                                                                            
                           time:   [176.88 us 176.99 us 177.10 us]
                           change: [+12.562% +12.647% +12.729%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 18 outliers among 100 measurements (18.00%)
     18 (18.00%) low mild
   
   filter context f32 high selectivity                                                                             
                           time:   [3.1933 us 3.1961 us 3.2004 us]
                           change: [-14.557% -14.363% -14.096%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 17 outliers among 100 measurements (17.00%)
     12 (12.00%) low mild
     1 (1.00%) high mild
     4 (4.00%) high severe
   
   filter context f32 very low selectivity                                                                             
                           time:   [21.602 us 21.605 us 21.608 us]
                           change: [+1.4682% +1.4827% +1.4966%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/length_kernel-c61545d424c00334
   length                  time:   [26.844 us 26.852 us 26.861 us]                    
                           change: [-26.430% -26.400% -26.372%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/sort_kernel-d16c55a8008e1feb
   sort 2^10               time:   [432.77 us 432.79 us 432.81 us]                      
                           change: [+2.5326% +2.5395% +2.5485%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high severe
   
   sort 2^12               time:   [2.0424 ms 2.0426 ms 2.0428 ms]                       
                           change: [+2.1088% +2.1227% +2.1353%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   sort nulls 2^10         time:   [475.22 us 475.26 us 475.31 us]                            
                           change: [+3.5253% +3.5358% +3.5463%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     2 (2.00%) low mild
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   sort nulls 2^12         time:   [2.2075 ms 2.2077 ms 2.2079 ms]                             
                           change: [+1.4792% +1.4929% +1.5076%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/take_kernels-82f6f6cc00b32b54
   take i32 512            time:   [8.4713 us 8.4723 us 8.4734 us]                          
                           change: [+3.8200% +3.8378% +3.8541%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   take i32 1024           time:   [16.320 us 16.321 us 16.322 us]                           
                           change: [+3.2287% +3.2425% +3.2557%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   take bool 512           time:   [8.7281 us 8.7294 us 8.7309 us]                           
                           change: [+9.3812% +9.4491% +9.5113%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 16 outliers among 100 measurements (16.00%)
     5 (5.00%) low severe
     2 (2.00%) low mild
     6 (6.00%) high mild
     3 (3.00%) high severe
   
   take bool 1024          time:   [16.007 us 16.008 us 16.010 us]                            
                           change: [+3.1302% +3.1785% +3.2161%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   take str 512            time:   [17.927 us 17.929 us 17.932 us]                          
                           change: [-2.8282% -2.8003% -2.7792%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   take str 1024           time:   [34.721 us 34.729 us 34.739 us]                           
                           change: [-3.4705% -3.4539% -3.4346%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   ```
   
   </p>
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733276442


   Ok, so crucial operations are improved. I've updated the benchmarks. Since benchmarks with 512 elements fit most caches it creates unstable benchmarks. Kernels only can get better after this PR got merged and they are rewritten with parallel iterators. Feel free to benchmark this PR.
   
   @nevi-me @jorgecarleitao @alamb


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r525980049



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],

Review comment:
       It was my thinking about it, I was skeptical about dispensing a new shared slice struct or carry on with the underlying one. Seems like carrying on with the underlying one makes life way easier.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526122010



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]

Review comment:
       Addressed, I've added a doctest that is also explaining the rationale behind this. I hope you like it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r525984128



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       Ok, I was thinking that you will tell me something about the explanation since it is not that clear. I know it is not clear totally but by byte-aligned I meant(also everyone meant) byte-alignment on structure level:
   https://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
   and well-aligned I meant pointer to the memory location:
   https://en.wikipedia.org/wiki/Data_structure_alignment#Hardware_significance_of_alignment_requirements 
   
   I am ok with any wording that will explain this briefly. I am totally ok with removing it too, but IMO it is important to tell to people that both things are aligned in Arrow Rust impl.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530329221



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       `rayon` is pretty smart in this respect, as its execution model starts the first part of the iterator immediately. 
   
   I am thankful that @vertexclique introduced this, as I also believe that we greatly benefit from multi-threading at this level.
   
   What I am a bit unsure is whether there is any issue in nested spawning: we may want to fork in other places that are nested with this op. But I am confident that someone knows the answer (👀  to @vertexclique and @alamb).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733773942


   > Things to do after this pr:
   > * Other kernels can be improved by different prs
   
   I have already mentioned that here I think: https://github.com/apache/arrow/pull/8664#issuecomment-733276442


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733276442


   Ok, so crucial operations are improved. I've updated the benchmarks. Since benchmarks with 512 elements fit most caches it creates unstable benchmarks. Kernels only can get better after this PR got merged and they are rewritten with parallel iterators. Feel free to benchmark this PR.
   
   @nevi-me @jorgecarleitao @alamb
   
   Because bit ops performance has been improved these benchmarks have been improved significantly:
   ```
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/comparison_kernels-f8939ec12975f45e
   eq Float32              time:   [36.413 us 36.429 us 36.447 us]                        
                           change: [-95.357% -95.353% -95.349%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   eq scalar Float32       time:   [33.540 us 33.551 us 33.562 us]                               
                           change: [-94.698% -94.690% -94.684%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   neq Float32             time:   [36.756 us 36.768 us 36.781 us]                         
                           change: [-94.049% -94.047% -94.045%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   neq scalar Float32      time:   [30.706 us 30.718 us 30.734 us]                                
                           change: [-95.092% -95.089% -95.086%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   lt Float32              time:   [36.489 us 36.498 us 36.509 us]                        
                           change: [-94.514% -94.504% -94.495%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   lt scalar Float32       time:   [30.855 us 30.871 us 30.892 us]                               
                           change: [-94.996% -94.993% -94.990%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lt_eq Float32           time:   [36.478 us 36.492 us 36.508 us]                           
                           change: [-94.700% -94.697% -94.695%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   lt_eq scalar Float32    time:   [32.641 us 32.653 us 32.668 us]                                  
                           change: [-95.305% -95.299% -95.292%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   gt Float32              time:   [36.647 us 36.658 us 36.672 us]                        
                           change: [-94.121% -94.119% -94.116%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   gt scalar Float32       time:   [36.541 us 36.562 us 36.583 us]                               
                           change: [-94.288% -94.280% -94.273%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   gt_eq Float32           time:   [36.510 us 36.524 us 36.540 us]                           
                           change: [-95.402% -95.396% -95.390%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   gt_eq scalar Float32    time:   [32.121 us 32.141 us 32.163 us]                                  
                           change: [-94.877% -94.875% -94.873%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     12 (12.00%) low mild
     1 (1.00%) high mild
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530293315



##########
File path: rust/arrow/Cargo.toml
##########
@@ -48,11 +49,12 @@ lazy_static = "1.4"
 packed_simd = { version = "0.3.4", optional = true, package = "packed_simd_2" }
 chrono = "0.4"
 flatbuffers = "0.6"
+bitvec = "0.19"
 hex = "0.4"
 prettytable-rs = { version = "0.8.0", optional = true }
 
 [features]
-default = []
+default = ["simd"]

Review comment:
       we probably shouldn't change the default to include simd, as we'd like the default features to allow users to compile with stable.

##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       It's interesting that this is yielding better results. I would have thought that `rayon` being introduced at thsi level, would incur enough overhead to slow the kernels down. I've previously applied parallelism at an `Array` level instead.

##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -180,9 +194,12 @@ where
 ///
 /// Returns `None` if the array is empty or only contains null values.
 #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), feature = "simd"))]
-pub fn sum<T: ArrowNumericType>(array: &PrimitiveArray<T>) -> Option<T::Native>
+pub fn sum<T>(array: &PrimitiveArray<T>) -> Option<T::Native>
 where
-    T::Native: Add<Output = T::Native>,
+    T: ArrowNumericType,
+    // T::Native: Add<Output = T::Native> + Sum + Sum<T::Simd>,

Review comment:
       nit: should the comment be removed?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526121662



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {

Review comment:
       Addressed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-729933868


   Finished, I have addressed all the comments. @jorgecarleitao I saw that it is because of the unaligned bit checks and boundary checks in byte slices. Yet another addition comes from `as_buffer` method. Since we are allocating a buffer for it now to make it well aligned this is creating an overhead. I have tried to overcome that problem by writing unowned buffer but one test failed. I am also unsure that test is actually testing correctly or not.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526371716



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],

Review comment:
       There is also now a nice symmetry with `BufferBitSliceMut` as well




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526143753



##########
File path: rust/arrow/src/util/utils.rs
##########
@@ -0,0 +1,119 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Utils for working with bits
+
+#[cfg(feature = "simd")]
+use packed_simd::u8x64;
+
+/// Returns the nearest number that is `>=` than `num` and is a multiple of 64
+#[inline]
+pub fn round_upto_multiple_of_64(num: usize) -> usize {

Review comment:
       Forgot to remove it :man_facepalming: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530327927



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       This is what MLK suggests, Array level non-primitive parallelism shared pointer overhead, that's the reason. I had so much bad time before with arc overhead in my projects.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530331929



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,588 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use rayon::iter::plumbing::*;
+use rayon::prelude::*;
+use std::fmt::Debug;
+use std::marker::PhantomData as marker;
+
+///
+/// Immutable bit slice view of `Buffer` data.
+///
+/// `BufferBitSlice` does not own any underlying data, but rather wraps references
+/// to the underlying data in a `Buffer` and has methods for addressing and interacting with
+/// individual bits
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn slicing(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in given byte width.
+    /// This can be u64(native Arrow byte representation size) or any other unsigned primitive like:
+    /// u8, u16, u32, u128 and usize.
+    ///
+    /// This method is generic over the given primitives to enable user to filter out
+    /// any upper/lower nibble/s which is not used like:
+    ///
+    /// # Example
+    ///
+    /// ```
+    /// # use arrow::buffer::Buffer;
+    /// let input: &[u8] = &[
+    ///     0b11111111, 0b00000000, 0b11111111, 0b00000000,
+    ///     0b11111111, 0b00000000, 0b11111111, 0b00000000,
+    /// ];
+    ///
+    /// let buffer: Buffer = Buffer::from(input);
+    /// let bit_slice = buffer.bit_slice();
+    /// // Interpret bit slice as u8
+    /// let chunks = bit_slice.chunks::<u8>();
+    ///
+    /// // Filter out null bytes for compression
+    /// let bytes = chunks.into_native_iter().filter(|e| *e != 0x00_u8).collect::<Vec<u8>>();
+    /// assert_eq!(bytes.len(), 4);
+    /// ```
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.is_empty() {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    #[inline]
+    pub fn par_chunks<T>(&self) -> ParallelChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.is_empty() {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        ParallelChunksExact {
+            bit_slice: self.bit_slice,
+            chunk_size: offset_size_in_bits,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into a Buffer.
+    /// Buffer is always byte-aligned and it's pointer is aligned to size of u64.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().copied().collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice view of buffer data
+///
+/// `BufferBitSliceMut` does not own any underlying data, but rather
+/// has methods for addressing and interacting with individual bits.
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn slicing(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Sets all bits in this slice to the given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into a Buffer.
+    /// Buffer is always byte-aligned and it's pointer is aligned to size of u64.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {

Review comment:
       can't this go out of bounds? What do you think about making this unsafe and expose a safe version with a check?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-729965696


   I ran the filtering selectivity benchmark on my laptop again and this time it shows a ~400% slowdown... So I wonder if that unsafe code is actually giving us a performance gain?
   
   results on master @ 0e2681fe52ea48cade10171a018554da05f95285
   ```
   cargo bench -- "filter u8 high selectivity"
   
   filter u8 high selectivity
                           time:   [3.5905 us 3.6379 us 3.6942 us]
                           time:   [3.3929 us 3.4179 us 3.4457 us]
                           time:   [3.3876 us 3.4167 us 3.4470 us]
                           time:   [3.4520 us 3.4747 us 3.4981 us]
                           time:   [3.4288 us 3.4538 us 3.4820 us]
   
   ```
   
   Then on this PR at @457d8c7303864673c25bcb4e95e002b9585e9c39
   
   ```
   filter u8 high selectivity
                           time:   [16.951 us 17.164 us 17.395 us]
                           time:   [16.649 us 16.886 us 17.136 us]
                           time:   [17.175 us 17.751 us 18.520 us]
                           time:   [16.028 us 16.109 us 16.196 us]
                           time:   [16.224 us 16.341 us 16.456 us]
   ```
   
   Which does seem like a substantial and significant performance regression.
   
   I double checked that I could reproduce the faster master numbers and I can
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r525984998



##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -570,27 +570,26 @@ where
         ));
     }
 
-    let num_bytes = bit_util::ceil(left_len, 8);
+    let num_bytes = utils::ceil(left_len, 8);
 
     let not_both_null_bit_buffer =
         match combine_option_bitmap(left.data_ref(), right.data_ref(), left_len)? {
             Some(buff) => buff,
             None => new_all_set_buffer(num_bytes),
         };
-    let not_both_null_bitmap = not_both_null_bit_buffer.data();
+    let _not_both_null_bitmap = not_both_null_bit_buffer.data();

Review comment:
       Things that I forgot along the lines.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-729991100


   Weird, on my machine when I pushed the initial implementation of this PR I got the numbers above. Seems like it is regressed for me too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530326379



##########
File path: rust/arrow/Cargo.toml
##########
@@ -48,11 +49,12 @@ lazy_static = "1.4"
 packed_simd = { version = "0.3.4", optional = true, package = "packed_simd_2" }
 chrono = "0.4"
 flatbuffers = "0.6"
+bitvec = "0.19"
 hex = "0.4"
 prettytable-rs = { version = "0.8.0", optional = true }
 
 [features]
-default = []
+default = ["simd"]

Review comment:
       Oh forgot to set it back.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526161079



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -219,24 +223,27 @@ where
             let data_chunks = data.chunks_exact(64);
             let remainder = data_chunks.remainder();
 
-            let bit_chunks = buffer.bit_chunks(array.offset(), array.len());
+            let bit_slice = buffer.bit_slice().view(array.offset(), array.len());
+            let bit_chunks = bit_slice.chunks::<u64>();
             let remainder_bits = bit_chunks.remainder_bits();
 
-            data_chunks.zip(bit_chunks).for_each(|(chunk, mut mask)| {
-                // split chunks further into slices corresponding to the vector length
-                // the compiler is able to unroll this inner loop and remove bounds checks
-                // since the outer chunk size (64) is always a multiple of the number of lanes
-                chunk.chunks_exact(T::lanes()).for_each(|chunk| {
-                    let zero = T::init(T::default_value());
-                    let vecmask = T::mask_from_u64(mask);
-                    let chunk = T::load(&chunk);
-                    let blended = T::mask_select(vecmask, chunk, zero);
-
-                    vector_sum = vector_sum + blended;
-
-                    mask = mask >> T::lanes();
+            data_chunks
+                .zip(bit_chunks.interpret())

Review comment:
       Addressed.

##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -570,27 +570,26 @@ where
         ));
     }
 
-    let num_bytes = bit_util::ceil(left_len, 8);
+    let num_bytes = utils::ceil(left_len, 8);
 
     let not_both_null_bit_buffer =
         match combine_option_bitmap(left.data_ref(), right.data_ref(), left_len)? {
             Some(buff) => buff,
             None => new_all_set_buffer(num_bytes),
         };
-    let not_both_null_bitmap = not_both_null_bit_buffer.data();
+    let _not_both_null_bitmap = not_both_null_bit_buffer.data();

Review comment:
       Addressed too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-729965696


   I ran the filtering selectivity benchmark on my laptop again and this time it shows a ~400% slowdown... So I wonder if that unsafe code is actually giving us a performance gain?
   
   results on master @ 0e2681fe52ea48cade10171a018554da05f95285
   ```
   cargo bench -- "filter u8 high selectivity"
   
   filter u8 high selectivity
                           time:   [3.5905 us 3.6379 us 3.6942 us]
                           time:   [3.3929 us 3.4179 us 3.4457 us]
                           time:   [3.3876 us 3.4167 us 3.4470 us]
                           time:   [3.4520 us 3.4747 us 3.4981 us]
                           time:   [3.4288 us 3.4538 us 3.4820 us]
   
   ```
   
   Then on this PR at @ 457d8c7303864673c25bcb4e95e002b9585e9c39
   
   ```
   filter u8 high selectivity
                           time:   [16.951 us 17.164 us 17.395 us]
                           time:   [16.649 us 16.886 us 17.136 us]
                           time:   [17.175 us 17.751 us 18.520 us]
                           time:   [16.028 us 16.109 us 16.196 us]
                           time:   [16.224 us 16.341 us 16.456 us]
   ```
   
   Which does seem like a substantial and significant performance regression.
   
   I double checked that I could reproduce the faster master numbers and I can
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526309397



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       Responded to the previous comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530343892



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       Rayon is initializing global pool with 1.5 * logical cores. Since they are all spawned but not joined threads, that will easily work from direct users of arrow. Closures that spawn on though, are executing like fork-join. Coming to the process forking point of view, while using this library as is, won't cause problems. In the nested operations case, they will be queued and spawned to any free slot, all operations are going through global producer-consumer hub inside rayon (take a gaze into bit ops code in this pr). Sometime later if a contributor comes and says that, "I want to configure tlp" then we can just expose the pool config by a method. But since it configures itself on the machine it's running we can directly skip that part.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733276442


   Ok, so crucial operations are improved. I've updated the benchmarks. Since benchmarks with 512 elements fit most caches it creates unstable benchmarks. Kernels only can get better after this PR got merged and they are rewritten with parallel iterators. Feel free to benchmark this PR.
   
   @nevi-me @jorgecarleitao @alamb
   
   Because bit ops performance has been improved these benchmarks have been improved significantly. Rest of the improvements are in the pr description:
   ```
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/comparison_kernels-f8939ec12975f45e
   eq Float32              time:   [36.413 us 36.429 us 36.447 us]                        
                           change: [-95.357% -95.353% -95.349%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   eq scalar Float32       time:   [33.540 us 33.551 us 33.562 us]                               
                           change: [-94.698% -94.690% -94.684%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   neq Float32             time:   [36.756 us 36.768 us 36.781 us]                         
                           change: [-94.049% -94.047% -94.045%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   neq scalar Float32      time:   [30.706 us 30.718 us 30.734 us]                                
                           change: [-95.092% -95.089% -95.086%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   lt Float32              time:   [36.489 us 36.498 us 36.509 us]                        
                           change: [-94.514% -94.504% -94.495%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   lt scalar Float32       time:   [30.855 us 30.871 us 30.892 us]                               
                           change: [-94.996% -94.993% -94.990%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lt_eq Float32           time:   [36.478 us 36.492 us 36.508 us]                           
                           change: [-94.700% -94.697% -94.695%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   lt_eq scalar Float32    time:   [32.641 us 32.653 us 32.668 us]                                  
                           change: [-95.305% -95.299% -95.292%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   gt Float32              time:   [36.647 us 36.658 us 36.672 us]                        
                           change: [-94.121% -94.119% -94.116%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   gt scalar Float32       time:   [36.541 us 36.562 us 36.583 us]                               
                           change: [-94.288% -94.280% -94.273%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   gt_eq Float32           time:   [36.510 us 36.524 us 36.540 us]                           
                           change: [-95.402% -95.396% -95.390%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   gt_eq scalar Float32    time:   [32.121 us 32.141 us 32.163 us]                                  
                           change: [-94.877% -94.875% -94.873%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     12 (12.00%) low mild
     1 (1.00%) high mild
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733966928


   Personally, I think violation against me also refers to the code of conduct even in two occurrences:
   
   > Be empathetic, welcoming, friendly, and patient. We work together to resolve conflict, assume good intentions, and do our best to act in an empathetic fashion. We may all experience some frustration from time to time, but we do not allow frustration to turn into a personal attack. A community where people feel uncomfortable or threatened is not a productive one. We should be respectful when dealing with other community members as well as with people outside our community.
   
   I tried my best to assume good intentions. Even there is no problem in PRs I have received comments which are not meant to bring any productivity but changing the entire aim of the PR.
   https://github.com/apache/arrow/pull/8609#discussion_r523458622
   https://github.com/apache/arrow/pull/8665#issuecomment-727847232
   But preventing PRs and keep raising debates(which are not related to the aim of the PR) in every PR is not what I expect as neither empathetic nor welcoming behavior.
   
   > Be collaborative. Our work will be used by other people, and in turn we will depend on the work of others. When we make something for the benefit of the project, we are willing to explain to others how it works, so that they can build on the work to make it even better. Any decision we make will affect users and colleagues, and we take those consequences seriously when making decisions.
   
   Which I did:
   * explained the safety of the bit operations
   * the other features that previous pr (https://github.com/apache/arrow/pull/8598) and this pr (https://github.com/apache/arrow/pull/8664) brought in. They come intrinsically with the change.
   * And in turn, it turned into a benchmarking problem:
   https://github.com/apache/arrow/pull/8598#issuecomment-723326224
   and the actual point of all these has been missed. I have tried to explain it in various ways but some got me, some didn't.
   
   I was thinking 3 months ago the project was going in a nice direction, I think it got prevented somehow and now it is not going very well from my point of view.
   
   Moreover in general also Apache Foundation's values described here not quite held in my opinion. https://www.apache.org/foundation/how-it-works.html#meritocracy 
   
   So, finally, I would like to say that, I am totally ok with not committing to this project. Feel free to close this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530327927



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       This is what MLK suggests, Array level non-primitive parallelism has shared pointer overhead, that's the reason. I had so much bad time before with arc overhead in my projects.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526122655



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSliceMut<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to mutable bit slice
+impl<'a> From<&'a mut [u8]> for BufferBitSliceMut<'a> {
+    fn from(data: &'a mut [u8]) -> Self {
+        BufferBitSliceMut::new(data)
+    }
+}
+
+///
+/// Exact chunk view over the bit slice
+#[derive(Clone, Debug)]
+pub struct BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,

Review comment:
       Wrote a doccomment and doctest there to make understanding and design rationale a little bit easier.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526421965



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       Reworded both occurrences with that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-736096684


   I agree with @Dandandan 's comments 👍 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique closed pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique closed pull request #8664:
URL: https://github.com/apache/arrow/pull/8664


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526122170



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],

Review comment:
       Addressed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018


   This is using safe bitvec interface to manage bits. I worked on top of my existing PR. Since that didn't change all code to use safe operations, now this does. I squashed all my work into a single commit and closing PR: https://github.com/apache/arrow/pull/8598
   
   Closed that with comment:  https://github.com/apache/arrow/pull/8598#issuecomment-727294219


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526364599



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]

Review comment:
       Looks good. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733737357


   > Conceptually this is not a small change. Personally I think parallelizing on the datafusion level and keeping kernels single-threaded is the better model.
   
   I don't think personally, I am experimenting and working on that based on proofs. It is not a better model, even Intel says that. So there is still no conflict for further parallelizing in the DataFusion (which is already done via reactor). We are using arrow in the company. and yet still, we haven't parallelized chunks which is also wrong in many directions.
   
   > The benchmark is now also testing a much larger array (2^20 elements) than what is usually used as a chunk size, so in reality the speedup due to parallelism would be much smaller.
   
   There are no predetermined chunk sizes and benchmarks shouldn't work on cache fittable data at all. That's against the benchmarking rule. Otherwise, you don't see the actual processing time, at all. Moreover, if this data format can't process large data on demand, there is no point in having this data format. (Users will think like that even I don't, which is also the most valid thing out there.) Also please don't alter my comments/words or use them against the pr.
   
   Please also stop undermining others' work and be honest about what has been done. I have already told you to do this way but you preferred sharing raw pointers in an unsafe context, which won't and never ever going to be parallelized without a full rewrite. This is that full rewrite.
   
   Finally, It is entirely unethical and dishonest to open PRs that I told/taught to you. I am not your rival, I don't see myself as a rival to anyone in any form, because there is no rivalry. This is an open-source environment. I don't see goodwill from your comments too. Also, some of your comments are giving false information (which I stopped giving feedback). I prefer instead of having counterproductive comments, productive comments from you. Even I told this plenty of times seems like that's not going to happen. So, please don't comment and review my PRs. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526145990



##########
File path: rust/arrow/src/array/array_list.rs
##########
@@ -711,8 +722,9 @@ mod tests {
         assert_eq!(1, sliced_array.offset());
         assert_eq!(2, sliced_array.null_count());
 
+        let null_bit_slice = BufferBitSliceMut::new(&mut null_bits);

Review comment:
       Nice catch :+1: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-759441787


   @alamb Thanks for reaching out! I don't have time to work on these PRs. Closing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727293470


   https://issues.apache.org/jira/browse/ARROW-10588


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530327927



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -141,32 +146,41 @@ where
 
     match array.data().null_buffer() {
         None => {
-            let sum = data.iter().fold(T::default_value(), |accumulator, value| {
-                accumulator + *value
-            });
+            let total = data
+                .par_iter()

Review comment:
       This is what MKL suggests, Array level non-primitive parallelism has shared pointer overhead, that's the reason. I had so much bad time before with arc overhead in my projects.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727551719


   @vertexclique did you work on this on top of #8663 from @jhorstmann (or at least both PRs remove the popcnt table)?
   The changes here look reasonable to me, so I can review after a rebase, as I've just merged #8663


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-759438909


   @vertexclique  --  Given the imminent Arrow 3.0 release, I am trying to clean up older Rust PRs and see if the authors have plans to move them forward. 
   
   Do you plan on working on this PR in the near future? If not, should we close this PR until there is time to make progress? Thanks again for your contributions so far. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-732676657


   Thanks @vertexclique, that looks great. I'm here with the approach


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733710712


   > Introducing thread parallelism at a compute kernel level
   
   Conceptually this is not a small change. Personally I think parallelizing on the datafusion level and keeping kernels single-threaded is the better model.
   
   The benchmark is now also testing a much larger array (2^20 elements) than what is usually used as a chunk size, so in reality the speedup due to parallelism would be much smaller.
   
   I'm totally fine with a small slowdown if that leads to cleaner and safer code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526371341



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       To me, `byte-aligned` means a pointer whose address points to the start of some byte in memory.
   
   I just haven't run across the term "well-aligned pointer" before and it isn't immediately google searchable -- https://www.google.com/search?q=well+aligned+pointer&oq=well+aligned+pointer
   
   Maybe "well-aligned" means "the pointer is aligned to the size of `T`"? 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r530328033



##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -180,9 +194,12 @@ where
 ///
 /// Returns `None` if the array is empty or only contains null values.
 #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), feature = "simd"))]
-pub fn sum<T: ArrowNumericType>(array: &PrimitiveArray<T>) -> Option<T::Native>
+pub fn sum<T>(array: &PrimitiveArray<T>) -> Option<T::Native>
 where
-    T::Native: Add<Output = T::Native>,
+    T: ArrowNumericType,
+    // T::Native: Add<Output = T::Native> + Sum + Sum<T::Simd>,

Review comment:
       Yeah, eagle eyes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-730066083


   So any idea how to proceed with this? Based on that I will close all related PRs. Though, seems like https://github.com/apache/arrow/pull/8688 can use these changes to build slice alignment on top.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733276442


   Ok, so crucial operations are improved. I've updated the benchmarks. Since benchmarks with 512 elements fit most caches it creates unstable benchmarks. Kernels only can get better after this PR got merged and they are rewritten with parallel iterators. Feel free to benchmark this PR. 
   Things to do after this pr:
   * Other kernels can be improved by different prs
   * Some code can be removed. e.g. `mask_from_u64`. These are not needed and will improve performance.
   * Most of the operations are not operating on the larger data so benchmarks are kind of not reliable, these also can be changed by yet another PR.
   * We can start writing parallel code.
   
   @nevi-me @jorgecarleitao @alamb
   
   Because bit ops performance has been improved these benchmarks have been improved significantly. The rest of the improvements are in the pr description:
   ```
   
        Running /home/vertexclique/projects/arrow/rust/target/release/deps/comparison_kernels-f8939ec12975f45e
   eq Float32              time:   [36.413 us 36.429 us 36.447 us]                        
                           change: [-95.357% -95.353% -95.349%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   eq scalar Float32       time:   [33.540 us 33.551 us 33.562 us]                               
                           change: [-94.698% -94.690% -94.684%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   neq Float32             time:   [36.756 us 36.768 us 36.781 us]                         
                           change: [-94.049% -94.047% -94.045%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   neq scalar Float32      time:   [30.706 us 30.718 us 30.734 us]                                
                           change: [-95.092% -95.089% -95.086%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   lt Float32              time:   [36.489 us 36.498 us 36.509 us]                        
                           change: [-94.514% -94.504% -94.495%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   lt scalar Float32       time:   [30.855 us 30.871 us 30.892 us]                               
                           change: [-94.996% -94.993% -94.990%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lt_eq Float32           time:   [36.478 us 36.492 us 36.508 us]                           
                           change: [-94.700% -94.697% -94.695%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   lt_eq scalar Float32    time:   [32.641 us 32.653 us 32.668 us]                                  
                           change: [-95.305% -95.299% -95.292%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   gt Float32              time:   [36.647 us 36.658 us 36.672 us]                        
                           change: [-94.121% -94.119% -94.116%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   gt scalar Float32       time:   [36.541 us 36.562 us 36.583 us]                               
                           change: [-94.288% -94.280% -94.273%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   gt_eq Float32           time:   [36.510 us 36.524 us 36.540 us]                           
                           change: [-95.402% -95.396% -95.390%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   gt_eq scalar Float32    time:   [32.121 us 32.141 us 32.163 us]                                  
                           change: [-94.877% -94.875% -94.873%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     12 (12.00%) low mild
     1 (1.00%) high mild
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526358664



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -258,39 +259,52 @@ impl Buffer {
     /// Returns a slice of this buffer starting at a certain bit offset.
     /// If the offset is byte-aligned the returned buffer is a shallow clone,
     /// otherwise a new buffer is allocated and filled with a copy of the bits in the range.
-    pub fn bit_slice(&self, offset: usize, len: usize) -> Self {
-        if offset % 8 == 0 && len % 8 == 0 {
-            return self.slice(offset / 8);
+    #[inline]
+    pub fn bit_view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {

Review comment:
       I am unsure about this suggestion tbh. Since it is giving a range and it is not important to the user how it gives. It feels like it is important to give a subview from the user's point of view rather than how it gives. Since most of our code is relying on aligned access. Feels like the second paragraph of the doc is mostly for us, arrow committers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733668680


   > @nevi-me , can I have 12h to review it?
   
   Yup, I more meant that if someone else picks up things they'd like addressed, we could open JIRAs for them instead of trying to address them as part of this PR (unless they're clear blockers). In any case, we're still weeks or at least 2 months before the next release, so we still have time even for potential blockers.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526137607



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSliceMut<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to mutable bit slice
+impl<'a> From<&'a mut [u8]> for BufferBitSliceMut<'a> {
+    fn from(data: &'a mut [u8]) -> Self {
+        BufferBitSliceMut::new(data)
+    }
+}
+
+///
+/// Exact chunk view over the bit slice
+#[derive(Clone, Debug)]
+pub struct BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    chunks_exact: ChunksExact<'a, LocalBits, u8>,
+    remainder: T,
+    remainder_len_in_bits: usize,
+}
+
+impl<'a, T> BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    ///
+    /// Returns remainder bit length from the exact chunk iterator
+    #[inline(always)]
+    pub fn remainder_bit_len(&self) -> usize {
+        self.remainder_len_in_bits
+    }
+
+    ///
+    /// Returns the remainder bits interpreted as given type.
+    #[inline(always)]
+    pub fn remainder_bits(&self) -> T {
+        self.remainder
+    }
+
+    ///
+    /// Interprets underlying chunk's view's bits as a given type.
+    #[inline(always)]
+    pub fn interpret(self) -> impl Iterator<Item = T> + 'a
+    where
+        T: BitMemory,
+    {
+        self.chunks_exact.map(|e| e.load::<T>())
+    }
+
+    ///
+    /// Returns underlying iterator as it is
+    #[inline(always)]
+    pub fn iter(&self) -> &ChunksExact<'a, LocalBits, u8> {
+        &self.chunks_exact
+    }
+}
+
+///
+/// Implements consuming iterator for exact chunk iterator
+impl<'a, T> IntoIterator for BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    type Item = &'a BitSlice<LocalBits, u8>;
+    type IntoIter = ChunksExact<'a, LocalBits, u8>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        self.chunks_exact
+    }
+}
+
+#[cfg(all(test, target_endian = "little"))]
+mod tests_bit_slices_little_endian {
+    use super::*;
+    use crate::datatypes::ToByteSlice;
+
+    #[test]
+    fn test_bit_slice_iter_aligned() {
+        let input: &[u8] = &[0, 1, 2, 3, 4, 5, 6, 7];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice();
+        let result = bit_slice.chunks().interpret().collect::<Vec<u64>>();
+
+        assert_eq!(vec![0x0706050403020100], result);
+    }
+
+    #[test]
+    fn test_bit_slice_iter_unaligned() {
+        let input: &[u8] = &[
+            0b00000000, 0b00000001, 0b00000010, 0b00000100, 0b00001000, 0b00010000,
+            0b00100000, 0b01000000, 0b11111111,
+        ];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice().view(4, 64);
+        let chunks = bit_slice.chunks::<u64>();
+
+        assert_eq!(0, chunks.remainder_bit_len());
+        assert_eq!(0, chunks.remainder_bits());
+
+        let result = chunks.interpret().collect::<Vec<u64>>();
+
+        assert_eq!(
+            vec![0b1111_01000000_00100000_00010000_00001000_00000100_00000010_00000001_0000],
+            result
+        );
+    }
+
+    #[test]
+    fn test_bit_slice_iter_unaligned_remainder_1_byte() {
+        let input: &[u8] = &[
+            0b00000000, 0b00000001, 0b00000010, 0b00000100, 0b00001000, 0b00010000,
+            0b00100000, 0b01000000, 0b11111111,
+        ];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice().view(4, 66);
+        let chunks = bit_slice.chunks::<u64>();
+
+        assert_eq!(2, chunks.remainder_bit_len());
+        assert_eq!(0b00000011, chunks.remainder_bits());
+
+        let result = chunks.interpret().collect::<Vec<u64>>();
+
+        assert_eq!(
+            vec![0b1111_01000000_00100000_00010000_00001000_00000100_00000010_00000001_0000],

Review comment:
       Addressed like how you said.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r526390939



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       Yes that's what I meant. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paddyhoran commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
paddyhoran commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-734413544


   Hi @vertexclique.  
   
   All your contributions are very much appreciated.  You are one of the most advanced contributors to the project meaning that it's important for the other members of the community to be able to review and ask questions.  This may be frustrating for you but it is necessary.
   
   In general, smaller more focused PR's will be easier to merge.  Larger PR's or ones that are low level in nature need to be reviewed in more depth and we should be glad to have @jhorstmann and others to provide constructive feedback.  *Also, you may be asked to make changes to your PR's and you may not always agree.  This can always happen in a community run project.* 
   
   In general, you need to not take feedback so personally.  I have re-read the interactions you mentioned and I'm sorry I can't see the issues you are referring to and I think you were wrong to call out @jhorstmann.  On this PR I agreed with @jhorstmann's perspective, as did @alamb.  I at least thought the question was a good one to ask.
   
   On the other PR's you linked to it's clear that @jhorstmann is doing his best to be polite while providing feedback, for example:
    - `Just an idea, ...`
    - `Nice performance improvement! I'm a bit surprised... ` 
   
   Hopefully, you can move past this and keep contributing to the project.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#discussion_r525228463



##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]

Review comment:
       I am a little confused about this function -- specifically, all instantiations  of `chunks<_>` that I could find use `u64` and the documentation on this function says it returns "native 64-bit alignment size", however, it is generic over  `T: BitMemory` suggesting it could be used for 
   
   I suggest  we  document what `T` type logically means / reasons someone would use something other than `u64` when invoking it in arrow, or perhaps just make `chunks` non generic and use `u64` directly in the implementation
   

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {

Review comment:
       Stylistically if this were  called `slice` it would keep the terminology more consistent and be easier to read. This function is effectively "slicing" the slice

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data

Review comment:
       ```suggestion
   ///
   /// Mutable bit slice view of buffer data
   ///
   /// `BufferBitSliceMut` does not own any underlying data, but rather 
   /// has methods for addressing and interacting with individual bits.
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.

Review comment:
       ```suggestion
       /// Converts the bit view into a Buffer.
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value

Review comment:
       ```suggestion
       /// Sets all bits in this slice to the given value
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSliceMut<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to mutable bit slice
+impl<'a> From<&'a mut [u8]> for BufferBitSliceMut<'a> {
+    fn from(data: &'a mut [u8]) -> Self {
+        BufferBitSliceMut::new(data)
+    }
+}
+
+///
+/// Exact chunk view over the bit slice
+#[derive(Clone, Debug)]

Review comment:
       ```suggestion
   ///
   /// The view is represented as some number of aligned T-sized chunks, 
   /// followed by some number of remainder bits
   #[derive(Clone, Debug)]
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],

Review comment:
       It looks to me like `buffer_data` is not read -- I wonder if we could omit it and just keep `bit_slice`

##########
File path: rust/arrow/src/util/utils.rs
##########
@@ -0,0 +1,119 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Utils for working with bits
+
+#[cfg(feature = "simd")]
+use packed_simd::u8x64;
+
+/// Returns the nearest number that is `>=` than `num` and is a multiple of 64
+#[inline]
+pub fn round_upto_multiple_of_64(num: usize) -> usize {

Review comment:
       This code seems ~ the same as what is in https://github.com/apache/arrow/blob/master/rust/arrow/src/util/bit_util.rs and this PR removes that module [here](https://github.com/apache/arrow/pull/8664/files#diff-10923f5e6e9ac82323735492c134734372787553fee0801d6bb12afccdb3c147L19) but this PR doesn't seem to chagne bit_util.rs -- maybe I am missing something. 
   
   Or perhaps you meant to delete `bit_util.rs`?

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       I am a little confused about this comment. What is the difference between `byte-aligned` and `well-aligned`? 

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view

Review comment:
       ```suggestion
       /// Set given bit at the position to a given value
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data

Review comment:
       ```suggestion
   ///
   /// Immutable bit slice view of `Buffer` data. 
   ///
   /// `BufferBitSlice` does not own any underlying data, but rather wraps references 
   /// to the underlying data in a `Buffer` and has methods for addressing and interacting with 
   /// individual bits
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.

Review comment:
       same question as above about "is there a difference between `byte-aligned` and `well-aligned`"?

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.

Review comment:
       ```suggestion
       /// Converts the bit view into a Buffer.
   ```

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSliceMut<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to mutable bit slice
+impl<'a> From<&'a mut [u8]> for BufferBitSliceMut<'a> {
+    fn from(data: &'a mut [u8]) -> Self {
+        BufferBitSliceMut::new(data)
+    }
+}
+
+///
+/// Exact chunk view over the bit slice
+#[derive(Clone, Debug)]
+pub struct BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,

Review comment:
       In general my questions about keeping this generic over `T` vs hard coding `u64` remains here too

##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -570,27 +570,26 @@ where
         ));
     }
 
-    let num_bytes = bit_util::ceil(left_len, 8);
+    let num_bytes = utils::ceil(left_len, 8);
 
     let not_both_null_bit_buffer =
         match combine_option_bitmap(left.data_ref(), right.data_ref(), left_len)? {
             Some(buff) => buff,
             None => new_all_set_buffer(num_bytes),
         };
-    let not_both_null_bitmap = not_both_null_bit_buffer.data();
+    let _not_both_null_bitmap = not_both_null_bit_buffer.data();

Review comment:
       I wonder if there is some reason to keep this statement? It seems unused

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -258,39 +259,52 @@ impl Buffer {
     /// Returns a slice of this buffer starting at a certain bit offset.
     /// If the offset is byte-aligned the returned buffer is a shallow clone,
     /// otherwise a new buffer is allocated and filled with a copy of the bits in the range.
-    pub fn bit_slice(&self, offset: usize, len: usize) -> Self {
-        if offset % 8 == 0 && len % 8 == 0 {
-            return self.slice(offset / 8);
+    #[inline]
+    pub fn bit_view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {

Review comment:
       the fact that it can *copy* the underlying data even though it is called `bit_view` is confusing to me
   
   Maybe we could call it `as_aligned()` or something 

##########
File path: rust/arrow/src/array/array_list.rs
##########
@@ -711,8 +722,9 @@ mod tests {
         assert_eq!(1, sliced_array.offset());
         assert_eq!(2, sliced_array.null_count());
 
+        let null_bit_slice = BufferBitSliceMut::new(&mut null_bits);

Review comment:
       this might be able to be `BitBufferSlice` as it doesn't need to be mutable

##########
File path: rust/arrow/src/util/bit_ops.rs
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::buffer::Buffer;
+
+use bitvec::prelude::*;
+use bitvec::slice::ChunksExact;
+
+use std::fmt::Debug;
+
+///
+/// Immutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSlice<'a> {
+    buffer_data: &'a [u8],
+    bit_slice: &'a BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSlice<'a> {
+    ///
+    /// Creates a immutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice(buffer_data).unwrap();
+
+        BufferBitSlice {
+            buffer_data,
+            bit_slice: &bit_slice,
+        }
+    }
+
+    ///
+    /// Returns immutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            buffer_data: self.buffer_data,
+            bit_slice: &self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Returns bit chunks in native 64-bit allocation size.
+    /// Native representations in Arrow follows 64-bit convention.
+    /// Chunks can still be reinterpreted in any primitive type lower than u64.
+    #[inline]
+    pub fn chunks<T>(&self) -> BufferBitChunksExact<T>
+    where
+        T: BitMemory,
+    {
+        let offset_size_in_bits = 8 * std::mem::size_of::<T>();
+        let chunks_exact = self.bit_slice.chunks_exact(offset_size_in_bits);
+        let remainder_bits = chunks_exact.remainder();
+        let remainder: T = if remainder_bits.len() == 0 {
+            T::default()
+        } else {
+            remainder_bits.load::<T>()
+        };
+        BufferBitChunksExact {
+            chunks_exact,
+            remainder,
+            remainder_len_in_bits: remainder_bits.len(),
+        }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSlice<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to immutable bit slice
+impl<'a> From<&'a [u8]> for BufferBitSlice<'a> {
+    fn from(data: &'a [u8]) -> Self {
+        BufferBitSlice::new(data)
+    }
+}
+
+///
+/// Mutable bit slice representation of buffer data
+#[derive(Debug)]
+pub struct BufferBitSliceMut<'a> {
+    bit_slice: &'a mut BitSlice<LocalBits, u8>,
+}
+
+impl<'a> BufferBitSliceMut<'a> {
+    ///
+    /// Creates a mutable bit slice over the given data
+    #[inline]
+    pub fn new(buffer_data: &'a mut [u8]) -> Self {
+        let bit_slice = BitSlice::<LocalBits, _>::from_slice_mut(buffer_data).unwrap();
+
+        BufferBitSliceMut { bit_slice }
+    }
+
+    ///
+    /// Returns mutable view with the given offset in bits and length in bits.
+    /// This view have zero-copy representation over the actual data.
+    #[inline]
+    pub fn view(&'a mut self, offset_in_bits: usize, len_in_bits: usize) -> Self {
+        Self {
+            bit_slice: &mut self.bit_slice[offset_in_bits..offset_in_bits + len_in_bits],
+        }
+    }
+
+    ///
+    /// Set given bit at the position to a given value
+    #[inline]
+    pub fn set_bit_all(&mut self, value: bool) {
+        self.bit_slice.set_all(value)
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn set_bit(&mut self, index: usize, value: bool) {
+        unsafe { self.bit_slice.set_unchecked(index, value) }
+    }
+
+    ///
+    /// Converts the bit view into the Buffer.
+    /// Buffer is always byte-aligned and well-aligned.
+    #[inline]
+    pub fn as_buffer(&self) -> Buffer {
+        Buffer::from(self.bit_slice.as_slice())
+    }
+
+    ///
+    /// Count ones in the given bit view
+    #[inline]
+    pub fn count_ones(&self) -> usize {
+        self.bit_slice.count_ones()
+    }
+
+    ///
+    /// Count zeros in the given bit view
+    #[inline]
+    pub fn count_zeros(&self) -> usize {
+        self.bit_slice.count_zeros()
+    }
+
+    ///
+    /// Get bit value at the given index in this bit view
+    #[inline]
+    pub fn get_bit(&self, index: usize) -> bool {
+        *unsafe { self.bit_slice.get_unchecked(index) }
+    }
+
+    ///
+    /// Get bits in this view as vector of booleans
+    #[inline]
+    pub fn typed_bits(&self) -> Vec<bool> {
+        self.bit_slice.iter().map(|e| *e).collect()
+    }
+
+    ///
+    /// Get manipulated data as byte slice
+    #[inline]
+    pub fn to_slice(&self) -> &[u8] {
+        self.bit_slice.as_slice()
+    }
+}
+
+impl<'a> PartialEq for BufferBitSliceMut<'a> {
+    fn eq(&self, other: &Self) -> bool {
+        self.bit_slice == other.bit_slice
+    }
+}
+
+///
+/// Conversion from mutable slice to mutable bit slice
+impl<'a> From<&'a mut [u8]> for BufferBitSliceMut<'a> {
+    fn from(data: &'a mut [u8]) -> Self {
+        BufferBitSliceMut::new(data)
+    }
+}
+
+///
+/// Exact chunk view over the bit slice
+#[derive(Clone, Debug)]
+pub struct BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    chunks_exact: ChunksExact<'a, LocalBits, u8>,
+    remainder: T,
+    remainder_len_in_bits: usize,
+}
+
+impl<'a, T> BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    ///
+    /// Returns remainder bit length from the exact chunk iterator
+    #[inline(always)]
+    pub fn remainder_bit_len(&self) -> usize {
+        self.remainder_len_in_bits
+    }
+
+    ///
+    /// Returns the remainder bits interpreted as given type.
+    #[inline(always)]
+    pub fn remainder_bits(&self) -> T {
+        self.remainder
+    }
+
+    ///
+    /// Interprets underlying chunk's view's bits as a given type.
+    #[inline(always)]
+    pub fn interpret(self) -> impl Iterator<Item = T> + 'a
+    where
+        T: BitMemory,
+    {
+        self.chunks_exact.map(|e| e.load::<T>())
+    }
+
+    ///
+    /// Returns underlying iterator as it is
+    #[inline(always)]
+    pub fn iter(&self) -> &ChunksExact<'a, LocalBits, u8> {
+        &self.chunks_exact
+    }
+}
+
+///
+/// Implements consuming iterator for exact chunk iterator
+impl<'a, T> IntoIterator for BufferBitChunksExact<'a, T>
+where
+    T: BitMemory,
+{
+    type Item = &'a BitSlice<LocalBits, u8>;
+    type IntoIter = ChunksExact<'a, LocalBits, u8>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        self.chunks_exact
+    }
+}
+
+#[cfg(all(test, target_endian = "little"))]
+mod tests_bit_slices_little_endian {
+    use super::*;
+    use crate::datatypes::ToByteSlice;
+
+    #[test]
+    fn test_bit_slice_iter_aligned() {
+        let input: &[u8] = &[0, 1, 2, 3, 4, 5, 6, 7];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice();
+        let result = bit_slice.chunks().interpret().collect::<Vec<u64>>();
+
+        assert_eq!(vec![0x0706050403020100], result);
+    }
+
+    #[test]
+    fn test_bit_slice_iter_unaligned() {
+        let input: &[u8] = &[
+            0b00000000, 0b00000001, 0b00000010, 0b00000100, 0b00001000, 0b00010000,
+            0b00100000, 0b01000000, 0b11111111,
+        ];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice().view(4, 64);
+        let chunks = bit_slice.chunks::<u64>();
+
+        assert_eq!(0, chunks.remainder_bit_len());
+        assert_eq!(0, chunks.remainder_bits());
+
+        let result = chunks.interpret().collect::<Vec<u64>>();
+
+        assert_eq!(
+            vec![0b1111_01000000_00100000_00010000_00001000_00000100_00000010_00000001_0000],
+            result
+        );
+    }
+
+    #[test]
+    fn test_bit_slice_iter_unaligned_remainder_1_byte() {
+        let input: &[u8] = &[
+            0b00000000, 0b00000001, 0b00000010, 0b00000100, 0b00001000, 0b00010000,
+            0b00100000, 0b01000000, 0b11111111,
+        ];
+        let buffer: Buffer = Buffer::from(input);
+
+        let bit_slice = buffer.bit_slice().view(4, 66);
+        let chunks = bit_slice.chunks::<u64>();
+
+        assert_eq!(2, chunks.remainder_bit_len());
+        assert_eq!(0b00000011, chunks.remainder_bits());
+
+        let result = chunks.interpret().collect::<Vec<u64>>();
+
+        assert_eq!(
+            vec![0b1111_01000000_00100000_00010000_00001000_00000100_00000010_00000001_0000],

Review comment:
       I don't know if it is intended, but it is strange to me that the resoult 
   
   can you explain why the expected answer here is the same as when the view is `view(4, 64): https://github.com/apache/arrow/pull/8664/files#diff-715ddfbc534281523a73117d3bf4986d69d8a375dd320bc19f83298d3ba7ebd6R333
   
   Maybe you can change the bit pattern so there the extra two bits result in an additional `11` in the high place

##########
File path: rust/arrow/src/compute/kernels/aggregate.rs
##########
@@ -219,24 +223,27 @@ where
             let data_chunks = data.chunks_exact(64);
             let remainder = data_chunks.remainder();
 
-            let bit_chunks = buffer.bit_chunks(array.offset(), array.len());
+            let bit_slice = buffer.bit_slice().view(array.offset(), array.len());
+            let bit_chunks = bit_slice.chunks::<u64>();
             let remainder_bits = bit_chunks.remainder_bits();
 
-            data_chunks.zip(bit_chunks).for_each(|(chunk, mut mask)| {
-                // split chunks further into slices corresponding to the vector length
-                // the compiler is able to unroll this inner loop and remove bounds checks
-                // since the outer chunk size (64) is always a multiple of the number of lanes
-                chunk.chunks_exact(T::lanes()).for_each(|chunk| {
-                    let zero = T::init(T::default_value());
-                    let vecmask = T::mask_from_u64(mask);
-                    let chunk = T::load(&chunk);
-                    let blended = T::mask_select(vecmask, chunk, zero);
-
-                    vector_sum = vector_sum + blended;
-
-                    mask = mask >> T::lanes();
+            data_chunks
+                .zip(bit_chunks.interpret())

Review comment:
       `interpret` was confusing to me - it is making an `iter` over the underlying type `T` (`u64` in this case) -- maybe calling `interpret` like `to_native_iter` or something would be clearer. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733811489


   > Conceptually this is not a small change. Personally I think parallelizing on the datafusion level and keeping kernels single-threaded is the better model.
   
   I agree with @jhorstmann 's opinion here -- I think we should keep arrow single threaded (parallelizing their invocation across record batches can be done at a higher level).
   
   One challenge of adding parallelism like rayon at a lower level such as the kernel is that the higher level libraries or applications lose control of the resources (for example, if some app wanted a particular background operation with only  a single core on a multi core, they couldn't do that easily with the code in this PR anymore)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-730076790


   If this is not stalling any other PRs, you can leave it open. I've started looking into what's causing the regression, but I'll only finish on the weekend (I'm mostly working on the Parquet writer with my evening time).
   
   I'm also seeing big regressions of 400%+


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733667373


   Really impressive improvement, @vertexclique . 
   
   @nevi-me , can I have 12h to review it?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-735416076


   I think there are some very interesting things in this PR:
   
   * Usage of bitvec / new structure for null buffer. I think it makes sense to use this library here rather than reinvent it.
   * For the benchmarks it makes sense to have some bigger / more realistic ones as well. 2 ^ 20 maybe is a bit big. We also have some benchmarks in datafusion / benchmarks directory which can be extended to cover more realistic scenario's.
   * For parallelism,  I am also not convinced that it's a good idea to introduce rayon without being able to turn it off / control it. For big arrays it can be a good idea, but for smaller arrays, projects like datafusion and libraries, it can actually slow it down and/or use more resources overall. I think it would maybe be nice to revisit this sometime and see if we can make it an optional dependency (it's pretty big) and you could opt-in to use it for some kernels / ops?
   
   Would love if this PR would be continued, maybe in a slimmed down form?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-729400987


   Super cool. Great work @vertexclique ! 💯  I am all in in removing that `unsafe` code. 👍 
   
   Just curious, do you know why it has a 120% hit in performance on a filtering op?
   
   ```
   filter u8 high selectivity                                                                             
                           time:   [18.523 us 18.529 us 18.535 us]
                           change: [+122.09% +122.22% +122.33%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018


   This is using safe bitvec interface to manage bits. I worked on top of my existing PR. Since that didn't change all code to use safe operations, now this does. I squashed all my work into a single commit and closing PR: https://github.com/apache/arrow/pull/8598
   
   Closed that with comment:  https://github.com/apache/arrow/pull/8598#issuecomment-727294219
   
   Bitvec is also issuing popcnt instruction like the other pr which just got merged. So I didn't put effort to fix them instead dependency does them safely.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-733956565


   > Also, some of your comments are giving false information (which I stopped giving feedback). I prefer instead of having counterproductive comments, productive comments from you. Even I told this plenty of times seems like that's not going to happen. So, please don't comment and review my PRs. Thanks.
   
   @vertexclique This really isn't acceptable behavior. It is against the project's code of conduct [1] to insult or harass other contributors. We don't do that here.
   
   [1] https://www.apache.org/foundation/policies/conduct


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-732497888


   @nevi-me @alamb @jorgecarleitao 
   The good news, I have found a solution for the performance related considerations. I have experimented on the `sum` and my roofline analysis brought some good results. Also, criterion benches are here:
   
   Before (current master):
   
   ```
   sum 2^20                time:   [900.13 us 902.01 us 904.02 us]                     
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   sum nulls 2^20          time:   [2.5859 ms 2.5909 ms 2.5967 ms]                            
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   ```
   
   After:
   
   ```
   sum 2^20                time:   [236.61 us 238.02 us 239.58 us]                     
                           change: [-73.888% -73.699% -73.493%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   
   sum nulls 2^20          time:   [549.14 us 551.39 us 554.07 us]                           
                           change: [-78.784% -78.671% -78.548%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     7 (7.00%) high mild
     5 (5.00%) high severe
   ```
   
   Since it is a time-consuming task, I am not going to perform a full rewrite until we agree that this performance improvement is enough. Looking forward to receiving your feedback.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-732497888


   @nevi-me @alamb @jorgecarleitao 
   The good news, I have found a solution for the performance related considerations. I have experimented on the `sum` and my roofline analysis brought some good results. Also, criterion benches are here:
   
   Before:
   
   ```
   sum 2^20                time:   [900.13 us 902.01 us 904.02 us]                     
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   sum nulls 2^20          time:   [2.5859 ms 2.5909 ms 2.5967 ms]                            
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   ```
   
   After:
   
   ```
   sum 2^20                time:   [236.61 us 238.02 us 239.58 us]                     
                           change: [-73.888% -73.699% -73.493%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   
   sum nulls 2^20          time:   [549.14 us 551.39 us 554.07 us]                           
                           change: [-78.784% -78.671% -78.548%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     7 (7.00%) high mild
     5 (5.00%) high severe
   ```
   
   Since it is a time-consuming task, I am not going to perform a full rewrite until we agree that this performance improvement is enough.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

Posted by GitBox <gi...@apache.org>.
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-732497888


   @nevi-me @alamb @jorgecarleitao 
   The good news, I have found a solution for the performance related considerations. I have experimented on the `sum` and my roofline analysis brought some good results. Also, criterion benches are here:
   
   Before (current master):
   
   ```
   sum 2^20                time:   [900.13 us 902.01 us 904.02 us]                     
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   sum nulls 2^20          time:   [2.5859 ms 2.5909 ms 2.5967 ms]                            
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   ```
   
   After:
   
   ```
   sum 2^20                time:   [236.61 us 238.02 us 239.58 us]                     
                           change: [-73.888% -73.699% -73.493%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   
   sum nulls 2^20          time:   [549.14 us 551.39 us 554.07 us]                           
                           change: [-78.784% -78.671% -78.548%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     7 (7.00%) high mild
     5 (5.00%) high severe
   ```
   
   Since it is a time-consuming task, I am not going to perform a full rewrite until we agree that this performance improvement is enough.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org