Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/28 09:59:01 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator

jorgecarleitao opened a new pull request #9027:
URL: https://github.com/apache/arrow/pull/9027


   This PR addresses a performance issue in how we allocate and reallocate `MutableBuffer` by migrating the relevant parts of Rust's std `alloc` library into this crate.
   
   # Problem
   
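   The following are the results of 4 benchmarks (criterion reports each time as [lower bound, estimate, upper bound]):
   
   | benchmark | lower bound | estimate | upper bound |
   |-------------- | -------------- | -------------- | -------------- |
   | mutable | 929.26 µs | 931.88 µs | 935.42 µs |
   | mutable prepared | 1.0682 ms | 1.0693 ms | 1.0709 ms |
   | from_slice | 4.4857 ms | 4.5043 ms | 4.5247 ms |
   | from_slice prepared | 1.4358 ms | 1.4406 ms | 1.4467 ms |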
   
   1. `mutable`: start with an empty `MutableBuffer` and grow it (`realloc + memcopy`)
   2. `mutable prepared`: start with a `MutableBuffer` that already has the correct capacity and grow it (i.e. no `realloc + memcopy`)
   3. `from_slice`: do the same as 1 with a `Vec<u8>` (`realloc + memcopy`) and, at the end, use `Buffer::from` (a `memcopy`)
   4. `from_slice prepared`: do the same as 2 with a `Vec<u8>` and, at the end, use `Buffer::from` (`memcopy to vec + memcopy to Buffer`)
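   To illustrate why pre-allocating should matter, here is a minimal sketch using only `Vec<u8>` (it is not the benchmark code of this PR). It counts how many times the capacity changes while pushing `n` bytes, i.e. how many (re)allocations, each potentially followed by a `memcopy`, the growth triggers. `count_capacity_changes` is a hypothetical helper:
   
   ```rust
   /// Hypothetical helper: pushes `n` bytes into `buf` and counts how many
   /// times its capacity changed, i.e. how many (re)allocations growth caused.
   fn count_capacity_changes(mut buf: Vec<u8>, n: usize) -> usize {
       let mut changes = 0;
       let mut capacity = buf.capacity();
       for i in 0..n {
           buf.push(i as u8);
           if buf.capacity() != capacity {
               capacity = buf.capacity();
               changes += 1;
           }
       }
       changes
   }
   
   fn main() {
       let n = 1 << 20;
       // Like scenario 3: start empty, so the vector grows step by step.
       let grown = count_capacity_changes(Vec::new(), n);
       // Like scenario 4: reserve everything up front, so pushing never reallocates.
       let prepared = count_capacity_changes(Vec::with_capacity(n), n);
       println!("capacity changes: grown = {grown}, prepared = {prepared}");
   }
   ```
   
   The exact number of growth steps for the empty vector depends on `Vec`'s growth strategy, which std does not guarantee; the pre-allocated vector never reallocates.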
   
   The fact that pre-allocating brings no speedup for `MutableBuffer` (2 is in fact slightly slower than 1: 1.07 ms vs 0.93 ms) while it brings a roughly 3x speedup for `Vec<u8>` (3 takes 4.50 ms, 4 takes 1.44 ms) shows that we are doing something wrong.
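   The ratios can be checked directly from the point estimates (the middle value of each criterion interval, converted to milliseconds). A small sketch; the constant names are only labels for the table rows:
   
   ```rust
   // Point estimates from the table above, in milliseconds.
   const MUTABLE: f64 = 0.93188;
   const MUTABLE_PREPARED: f64 = 1.0693;
   const FROM_SLICE: f64 = 4.5043;
   const FROM_SLICE_PREPARED: f64 = 1.4406;
   
   /// How many times faster `after` is than `before` (values below 1.0 mean slower).
   fn speedup(before: f64, after: f64) -> f64 {
       before / after
   }
   
   fn main() {
       // prints "MutableBuffer: 0.87x"
       println!("MutableBuffer: {:.2}x", speedup(MUTABLE, MUTABLE_PREPARED));
       // prints "Vec<u8>: 3.13x"
       println!("Vec<u8>: {:.2}x", speedup(FROM_SLICE, FROM_SLICE_PREPARED));
   }
   ```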
   
   # This PR
   
   This PR rewrites our allocator code to closely follow the code used by the `std` allocator. We do this because we benefit from cache-line-aligned buffers [ref](https://github.com/apache/arrow/pull/8796#issuecomment-748470192), but Rust's custom allocator API is `unstable` (and thus only available on nightly).
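   What makes this possible on stable Rust is that the global allocation functions in `std::alloc` already accept a `Layout` with a custom alignment; only the pluggable allocator API (the `Allocator` trait, `Vec::new_in`) is unstable. A minimal sketch, assuming a 64-byte cache line; `ALIGNMENT` and `allocate_aligned_zeroed` are hypothetical names, not this crate's API:
   
   ```rust
   use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
   
   /// Assumed cache-line size (hypothetical constant for this sketch).
   const ALIGNMENT: usize = 64;
   
   /// Allocates `size` zeroed bytes aligned to `ALIGNMENT` using only stable std APIs.
   /// Returns the pointer together with the `Layout` needed to free it.
   fn allocate_aligned_zeroed(size: usize) -> (*mut u8, Layout) {
       assert!(size > 0, "`alloc_zeroed` must not be called with a zero-sized layout");
       let layout = Layout::from_size_align(size, ALIGNMENT).expect("ALIGNMENT is a power of two");
       // SAFETY: `layout` has a non-zero size.
       let ptr = unsafe { alloc_zeroed(layout) };
       if ptr.is_null() {
           handle_alloc_error(layout);
       }
       (ptr, layout)
   }
   
   fn main() {
       let (ptr, layout) = allocate_aligned_zeroed(1000);
       assert_eq!(ptr as usize % ALIGNMENT, 0);
       // SAFETY: `ptr` points to `layout.size()` initialized (zeroed) bytes.
       let bytes = unsafe { std::slice::from_raw_parts(ptr, layout.size()) };
       assert!(bytes.iter().all(|&b| b == 0));
       // SAFETY: `ptr` was allocated above with exactly this `layout`.
       unsafe { dealloc(ptr, layout) };
   }
   ```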
   
   The code in this PR is not very complex, and I assume it was already well thought through by Rust's std team. I made the modifications necessary for our use case:
   * always allocate cache-line aligned memory
   * always allocate in chunks of 64 bytes
   * always allocate zero-initialized memory (`std::alloc::alloc_zeroed`)
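   The three rules above can be combined into a small growable byte buffer. This is a sketch under stated assumptions, not the code of this PR: `AlignedBuffer` and `round_up_to_64` are hypothetical names, the 64-byte alignment is assumed, and the doubling growth policy is an assumption modeled on `std`'s amortized growth. Note that `realloc` does not zero newly added memory, so the sketch zeroes the new tail explicitly.
   
   ```rust
   use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, realloc, Layout};
   use std::ptr::NonNull;
   
   /// Assumed cache-line alignment (hypothetical constant).
   const ALIGNMENT: usize = 64;
   
   /// Rounds `n` up to the next multiple of 64 (allocate in 64-byte chunks).
   fn round_up_to_64(n: usize) -> usize {
       (n + 63) & !63
   }
   
   /// Hypothetical growable buffer: aligned, 64-byte chunks, always zeroed.
   struct AlignedBuffer {
       ptr: NonNull<u8>,
       len: usize,
       capacity: usize,
   }
   
   impl AlignedBuffer {
       fn with_capacity(capacity: usize) -> Self {
           let capacity = round_up_to_64(capacity);
           if capacity == 0 {
               // Dangling but aligned pointer: nothing is allocated yet.
               let ptr = NonNull::new(ALIGNMENT as *mut u8).expect("non-null");
               return Self { ptr, len: 0, capacity: 0 };
           }
           let layout = Layout::from_size_align(capacity, ALIGNMENT).unwrap();
           let raw = unsafe { alloc_zeroed(layout) };
           let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
           Self { ptr, len: 0, capacity }
       }
   
       /// Ensures room for `additional` more bytes, growing at least 2x (assumption).
       fn reserve(&mut self, additional: usize) {
           let required = self.len + additional;
           if required <= self.capacity {
               return;
           }
           let new_capacity = round_up_to_64(required.max(self.capacity * 2));
           let new_layout = Layout::from_size_align(new_capacity, ALIGNMENT).unwrap();
           let raw = unsafe {
               if self.capacity == 0 {
                   alloc_zeroed(new_layout)
               } else {
                   let old = Layout::from_size_align_unchecked(self.capacity, ALIGNMENT);
                   let raw = realloc(self.ptr.as_ptr(), old, new_capacity);
                   if !raw.is_null() {
                       // `realloc` leaves the new tail uninitialized: zero it.
                       raw.add(self.capacity).write_bytes(0, new_capacity - self.capacity);
                   }
                   raw
               }
           };
           self.ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(new_layout));
           self.capacity = new_capacity;
       }
   
       fn extend_from_slice(&mut self, bytes: &[u8]) {
           self.reserve(bytes.len());
           unsafe {
               let dst = self.ptr.as_ptr().add(self.len);
               std::ptr::copy_nonoverlapping(bytes.as_ptr(), dst, bytes.len());
           }
           self.len += bytes.len();
       }
   
       fn as_slice(&self) -> &[u8] {
           unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
       }
   }
   
   impl Drop for AlignedBuffer {
       fn drop(&mut self) {
           if self.capacity > 0 {
               unsafe {
                   let layout = Layout::from_size_align_unchecked(self.capacity, ALIGNMENT);
                   dealloc(self.ptr.as_ptr(), layout);
               }
           }
       }
   }
   
   fn main() {
       let mut buf = AlignedBuffer::with_capacity(0);
       buf.extend_from_slice(&[1, 2, 3]);
       buf.extend_from_slice(&[7; 100]);
       // prints "len = 103, capacity = 128"
       println!("len = {}, capacity = {}", buf.len, buf.capacity);
   }
   ```
   
   Because `GlobalAlloc::realloc` keeps the alignment of the original layout, the buffer stays 64-byte aligned after every growth step.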
   
   Benchmarks for `take`:
   
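   ```bash
   # baseline: run the `take` benchmarks on master
   git checkout master
   cargo bench --bench take_kernels --features simd
   # this PR's branch; criterion reports the change relative to the previous run
   git checkout alloc
   cargo bench --bench take_kernels --features simd
   ```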
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | take i32 nulls 1024 | 5.9 | 
   | take i32 1024 | 4.4 | 
   | take i32 512 | 2.1 | 
   | take str 512 | 2.1 | 
   | take i32 nulls 512 | 1.6 | 
   | take str 1024 | 1.0 | 
   | take bool nulls 1024 | -1.5 | 
   | take bool nulls 512 | -2.4 | 
   | take bool 512 | -6.8 | 
   | take bool 1024 | -8.3 | 
   | take str null values 1024 | -10.6 | 
   | take str null values null indices 1024 | -18.9 | 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9027:
URL: https://github.com/apache/arrow/pull/9027


   





[GitHub] [arrow] github-actions[bot] commented on pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9027:
URL: https://github.com/apache/arrow/pull/9027#issuecomment-751658624


   https://issues.apache.org/jira/browse/ARROW-11045

