Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/26 17:56:22 UTC
[GitHub] [arrow-rs] jhorstmann commented on pull request #716: Optimize array::transform::utils::set_bits

jhorstmann commented on pull request #716:
URL: https://github.com/apache/arrow-rs/pull/716#issuecomment-906620829


   Code looks good to me. The chained loop over `bits_to_align` and the remainder could be split into two loops so that the writes are more sequential, but the code looks a bit simpler with one loop. I benchmarked this locally by adding the following to `concatenate_kernels`:
   
   ```
       let v1 = create_boolean_array(1024, 0.5, 0.0);
       let v2 = create_boolean_array(1024, 0.5, 0.0);
       c.bench_function("concat bool 1024", |b| {
           b.iter(|| bench_concat(&v1, &v2))
       });
   
       let v1 = create_boolean_array(1024, 0.5, 0.5);
       let v2 = create_boolean_array(1024, 0.5, 0.5);
       c.bench_function("concat bool nulls 1024", |b| {
           b.iter(|| bench_concat(&v1, &v2))
       });
   ```
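
   For readers without the diff at hand, the loop structure discussed above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: it copies the aligned middle section one byte at a time rather than in larger chunks, the helpers `get_bit`, `set_bit` and `read_byte` are written out here only to keep the example self-contained, and it assumes the destination bits start out as zero. The single chained loop at the end handles both the leading bits needed to align the write offset and the trailing remainder.

   ```rust
   /// Returns true if bit `i` of `data` is set (least-significant bit first).
   fn get_bit(data: &[u8], i: usize) -> bool {
       data[i / 8] & (1u8 << (i % 8)) != 0
   }

   /// Sets bit `i` of `data` (least-significant bit first).
   fn set_bit(data: &mut [u8], i: usize) {
       data[i / 8] |= 1u8 << (i % 8);
   }

   /// Reads 8 bits starting at bit offset `i`.
   /// The caller must guarantee that `i + 8 <= data.len() * 8`.
   fn read_byte(data: &[u8], i: usize) -> u8 {
       let (byte, shift) = (i / 8, i % 8);
       if shift == 0 {
           data[byte]
       } else {
           (data[byte] >> shift) | (data[byte + 1] << (8 - shift))
       }
   }

   /// Copies `len` bits from `data` (starting at `offset_read`) into
   /// `write_data` (starting at `offset_write`) and returns the number of
   /// unset bits that were copied. Assumes the target bits are initially zero.
   fn set_bits(
       write_data: &mut [u8],
       data: &[u8],
       offset_write: usize,
       offset_read: usize,
       len: usize,
   ) -> usize {
       let mut null_count = 0;

       // Number of leading bits to copy before the write position is byte-aligned.
       let bits_to_align = match offset_write % 8 {
           0 => 0,
           r => len.min(8 - r),
       };
       // Whole bytes that can be written once the write position is aligned.
       let full_bytes = (len - bits_to_align) / 8;
       let remainder_offset = bits_to_align + full_bytes * 8;

       // Aligned middle section: one full byte per iteration.
       let mut write_byte = (offset_write + bits_to_align) / 8;
       for k in 0..full_bytes {
           let b = read_byte(data, offset_read + bits_to_align + k * 8);
           null_count += b.count_zeros() as usize;
           write_data[write_byte] = b;
           write_byte += 1;
       }

       // One chained loop for the alignment bits and the trailing remainder.
       for i in (0..bits_to_align).chain(remainder_offset..len) {
           if get_bit(data, offset_read + i) {
               set_bit(write_data, offset_write + i);
           } else {
               null_count += 1;
           }
       }

       null_count
   }
   ```

   Splitting the chained loop would mean handling `0..bits_to_align` before the byte loop and `remainder_offset..len` after it, so that writes proceed strictly front to back.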
   
   The results are very good: speedups of about 3x to 4x on the non-null i32 and the bool benchmarks, and the improvement on bigger batches could be even better. Interestingly, the benchmark setup seems to always create a null bitmap, even for the tests that are supposed to be non-null. Otherwise I can't explain why those benches also see a big speedup.
   
   There is a small regression (about 6%) in "concat 1024 arrays i32 4", but that is probably the worst case: concatenating 1024 arrays of length 4.
   
   ```
   concat i32 1024         time:   [1.4334 us 1.4357 us 1.4382 us]                             
                           change: [-67.795% -67.484% -67.188%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     7 (7.00%) high mild
     5 (5.00%) high severe
   
   concat i32 nulls 1024   time:   [1.6528 us 1.6549 us 1.6572 us]                                   
                           change: [-58.407% -57.885% -57.194%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     7 (7.00%) high mild
     8 (8.00%) high severe
   
   concat 1024 arrays i32 4                                                                            
                           time:   [162.80 us 162.99 us 163.23 us]
                           change: [+4.3373% +6.1785% +7.9774%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) high mild
     7 (7.00%) high severe
   
   concat str 1024         time:   [4.1305 us 4.1378 us 4.1471 us]                             
                           change: [-40.416% -40.067% -39.739%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     5 (5.00%) high mild
     9 (9.00%) high severe
   
   concat str nulls 1024   time:   [21.156 us 21.181 us 21.208 us]                                   
                           change: [-3.1958% -2.3516% -1.5638%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   concat bool 1024        time:   [1.4137 us 1.4203 us 1.4281 us]                              
                           change: [-74.572% -74.403% -74.216%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   concat bool nulls 1024  time:   [1.4999 us 1.5033 us 1.5070 us]                                    
                           change: [-74.566% -74.398% -74.230%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   ```
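
   To relate the reported changes to speedup factors: a relative change `c` in run time corresponds to a speedup of `1 / (1 + c)`. Below is a small sketch of that arithmetic, using the point estimates (the middle value of each `change:` line) from the output above. The function and constant names are only illustrative.

   ```rust
   /// Converts a relative change in run time (e.g. -0.744 for -74.4%)
   /// into a speedup factor; values below 1.0 indicate a slowdown.
   fn speedup_factor(relative_change: f64) -> f64 {
       1.0 / (1.0 + relative_change)
   }

   /// Point estimates of the change (in percent), copied from the output above.
   const CHANGES: [(&str, f64); 7] = [
       ("concat i32 1024", -67.484),
       ("concat i32 nulls 1024", -57.885),
       ("concat 1024 arrays i32 4", 6.1785),
       ("concat str 1024", -40.067),
       ("concat str nulls 1024", -2.3516),
       ("concat bool 1024", -74.403),
       ("concat bool nulls 1024", -74.398),
   ];

   /// Formats one line per benchmark with its speedup factor.
   fn report() -> Vec<String> {
       CHANGES
           .iter()
           .map(|&(name, pct)| format!("{}: {:.2}x", name, speedup_factor(pct / 100.0)))
           .collect()
   }
   ```

   For example, the -74.4% change for the bool benchmarks works out to roughly 3.9x, and the -67.5% change for `concat i32 1024` to roughly 3.1x.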
   
   

