Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/28 09:59:01 UTC
[GitHub] [arrow] jorgecarleitao opened a new pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator
jorgecarleitao opened a new pull request #9027:
URL: https://github.com/apache/arrow/pull/9027
This PR addresses a performance issue in how we allocate and reallocate the `MutableBuffer` by migrating the relevant parts of Rust's std `alloc` library into this crate.
# Problem
The following are the results of 4 benchmark runs:
| benchmark | time |
|-------------- | -------------- |
| mutable | [929.26 us 931.88 us 935.42 us] |
| mutable prepared | [1.0682 ms 1.0693 ms 1.0709 ms] |
| from_slice | [4.4857 ms 4.5043 ms 4.5247 ms] |
| from_slice prepared | [1.4358 ms 1.4406 ms 1.4467 ms] |
The runs are:
1. `mutable`: start with an empty `MutableBuffer` and grow it (`realloc + memcopy`)
2. `mutable prepared`: start with a `MutableBuffer` that already has the correct capacity and grow it (i.e. no `realloc + memcopy`)
3. `from_slice`: do the same as 1 with a `Vec<u8>` (`realloc + memcopy`) and, once it is filled, use `Buffer::from` (a `memcopy`)
4. `from_slice prepared`: do the same as 2 with a `Vec<u8>` and, once it is filled, use `Buffer::from` (`memcopy to vec + memcopy to Buffer`)
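Run 2 is no faster than run 1 (it is in fact slightly slower), while run 4 is roughly 3x faster than run 3 (1.44 ms vs 4.50 ms). Since pre-allocating the correct capacity removes the `realloc + memcopy` in both cases, we would expect 2 to be faster than 1 as well. The fact that it is not shows that we are doing something wrong.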
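For readers who want to reproduce the shape of runs 3 and 4 without the arrow crate, here is a minimal std-only sketch. It is not the benchmark used in this PR: the function names are hypothetical, and `copy_into_new_buffer` is only a stand-in for the final `Buffer::from` copy.

```rust
use std::time::{Duration, Instant};

/// Analogue of run 3: start empty, so the Vec reallocates (and may copy) as it grows.
fn grow_from_empty(chunk: &[u8], repeats: usize) -> Vec<u8> {
    let mut v = Vec::new();
    for _ in 0..repeats {
        v.extend_from_slice(chunk);
    }
    v
}

/// Analogue of run 4: reserve the final size up front, so growing never reallocates.
fn grow_prepared(chunk: &[u8], repeats: usize) -> Vec<u8> {
    let mut v = Vec::with_capacity(chunk.len() * repeats);
    for _ in 0..repeats {
        v.extend_from_slice(chunk);
    }
    v
}

/// Stand-in (assumption) for the final `Buffer::from`: one copy into a fresh allocation.
fn copy_into_new_buffer(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec()
}

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Calling `time_it(|| copy_into_new_buffer(&grow_from_empty(&chunk, n)))` and the same with `grow_prepared` gives a rough comparison. A single timed call is noisy, so a harness that repeats the measurement is preferable for real numbers.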
# This PR
This PR rewrites our current allocator code so that it closely follows the code used by the `std` allocator. The core reason we do this ourselves is that we benefit from cache-line aligned buffers [ref](https://github.com/apache/arrow/pull/8796#issuecomment-748470192), but Rust's custom allocator API is `unstable` (and thus only available in nightly).
The code in this PR is not very complex, and I assume that it was already well thought through by Rust's std team. I made the modifications necessary for our use case:
* always allocate aligned
* always allocate in chunks of 64 bytes
* always allocate initialized to zero (`std::alloc::alloc_zeroed`)
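The three rules above can be sketched on stable Rust with `std::alloc`. This is an illustration under assumptions, not the code in this PR: the `ALIGNMENT` constant and all function names are hypothetical, and the sketch only grows allocations.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::NonNull;

/// Alignment and chunk size in bytes (hypothetical constant for this sketch).
const ALIGNMENT: usize = 64;

/// Rounds `size` up to a multiple of `ALIGNMENT`, with a minimum of one chunk.
fn round_up_to_chunk(size: usize) -> usize {
    let size = size.max(1);
    (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT
}

/// Builds the layout shared by every allocation in this sketch.
fn layout_for(capacity: usize) -> Layout {
    Layout::from_size_align(capacity, ALIGNMENT).expect("capacity does not form a valid layout")
}

/// Allocates at least `size` bytes: 64-byte aligned, a multiple of 64, zeroed.
/// Returns the pointer and the actual capacity.
fn allocate_aligned_zeroed(size: usize) -> (NonNull<u8>, usize) {
    let capacity = round_up_to_chunk(size);
    let layout = layout_for(capacity);
    // SAFETY: `layout` has a non-zero size.
    let raw = unsafe { alloc_zeroed(layout) };
    let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
    (ptr, capacity)
}

/// Grows an allocation to at least `new_size` bytes and zeroes the new tail.
///
/// # Safety
/// `ptr` must come from `allocate_aligned_zeroed` or `reallocate`, and
/// `old_capacity` must be the capacity returned alongside it.
unsafe fn reallocate(ptr: NonNull<u8>, old_capacity: usize, new_size: usize) -> (NonNull<u8>, usize) {
    let new_capacity = round_up_to_chunk(new_size);
    if new_capacity <= old_capacity {
        return (ptr, old_capacity); // this sketch never shrinks
    }
    let new_layout = layout_for(new_capacity); // validates the new size first
    let raw = unsafe { realloc(ptr.as_ptr(), layout_for(old_capacity), new_capacity) };
    let new_ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(new_layout));
    // `realloc` keeps the old bytes but does not zero the newly added ones.
    unsafe { std::ptr::write_bytes(new_ptr.as_ptr().add(old_capacity), 0, new_capacity - old_capacity) };
    (new_ptr, new_capacity)
}

/// Frees an allocation made by the functions above.
///
/// # Safety
/// Same requirements as `reallocate`; `ptr` must not be used afterwards.
unsafe fn free(ptr: NonNull<u8>, capacity: usize) {
    unsafe { dealloc(ptr.as_ptr(), layout_for(capacity)) }
}
```

Requesting 10 bytes yields a capacity of 64, and growing that allocation to 100 bytes yields 128. Keeping the capacity next to the pointer matters because `realloc` and `dealloc` must be given the same layout the block was allocated with.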
Benchmarks for `take`:
```bash
git checkout master
cargo bench --bench take_kernels --features simd
git checkout alloc
cargo bench --bench take_kernels --features simd
```
| benchmark | variation (%) |
|-------------- | -------------- |
| take i32 nulls 1024 | 5.9 |
| take i32 1024 | 4.4 |
| take i32 512 | 2.1 |
| take str 512 | 2.1 |
| take i32 nulls 512 | 1.6 |
| take str 1024 | 1.0 |
| take bool nulls 1024 | -1.5 |
| take bool nulls 512 | -2.4 |
| take bool 512 | -6.8 |
| take bool 1024 | -8.3 |
| take str null values 1024 | -10.6 |
| take str null values null indices 1024 | -18.9 |
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9027:
URL: https://github.com/apache/arrow/pull/9027
[GitHub] [arrow] github-actions[bot] commented on pull request #9027: ARROW-11045: [Rust] Fix performance issues of allocator
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9027:
URL: https://github.com/apache/arrow/pull/9027#issuecomment-751658624
https://issues.apache.org/jira/browse/ARROW-11045