Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/04 10:04:26 UTC

[GitHub] [arrow-rs] jhorstmann opened a new issue #397: Optimize MutableArrayData::extend for null buffers

jhorstmann opened a new issue #397:
URL: https://github.com/apache/arrow-rs/issues/397


   In one of our benchmarks, the `concat` kernel was identified as a major performance bottleneck while sorting, specifically the closures inside `build_extend_null_bits`. That code currently sets bits one at a time and branches on every bit:
   
   ```rust
   if bit_util::get_bit(...) {
       bit_util::set_bit(...);
   }
   ```
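   
   For reference, the per-bit pattern can be written as a small self-contained sketch. It assumes Arrow's least-significant-bit-first validity bitmap layout and a destination whose target bits start out zeroed; `get_bit`, `set_bit` and `extend_bits_naive` are hypothetical stand-ins written for illustration, not the arrow-rs functions:
   
   ```rust
   /// Returns bit `i` of a bitmap stored least-significant-bit first.
   fn get_bit(data: &[u8], i: usize) -> bool {
       data[i / 8] & (1u8 << (i % 8)) != 0
   }
   
   /// Sets bit `i` of a least-significant-bit-first bitmap to 1.
   fn set_bit(data: &mut [u8], i: usize) {
       data[i / 8] |= 1u8 << (i % 8);
   }
   
   /// Copies `len` bits from `src` (starting at bit `src_offset`) into `dst`
   /// (starting at bit `dst_offset`), one bit and one branch at a time.
   fn extend_bits_naive(dst: &mut [u8], dst_offset: usize, src: &[u8], src_offset: usize, len: usize) {
       for i in 0..len {
           if get_bit(src, src_offset + i) {
               set_bit(dst, dst_offset + i);
           }
       }
   }
   ```
   
   For example, copying 10 bits starting at bit 2 of `[0b1010_1100, 0b0000_0011]` into a zeroed 2-byte buffer at bit 1 yields `[0b1101_0110, 0b0000_0001]`. Every copied bit costs a load, a mask and a data-dependent branch.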
   
   I think it should be possible to rewrite this to set multiple bits at the same time and remove most of the branch overhead. The general idea would look like this:
   
   - append individual bits until the destination bit offset is a multiple of 8 (byte-aligned)
   - start a BitChunk iterator on the source buffer and then append a u8 or u64 at a time
   - append the remainder a u8 at a time
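   
   The three steps above can be sketched with `u8` chunks only (a `u64` body would follow the same pattern with 8-byte reads). This sketch assumes LSB-first bit order, destination bits that start out zeroed, and buffers large enough for `offset + len` bits. `read_u8_at` and `extend_bits_chunked` are hypothetical names; `read_u8_at` stands in for a bit-chunk iterator over an unaligned source:
   
   ```rust
   /// Returns bit `i` of a bitmap stored least-significant-bit first.
   fn get_bit(data: &[u8], i: usize) -> bool {
       data[i / 8] & (1u8 << (i % 8)) != 0
   }
   
   /// Sets bit `i` of a least-significant-bit-first bitmap to 1.
   fn set_bit(data: &mut [u8], i: usize) {
       data[i / 8] |= 1u8 << (i % 8);
   }
   
   /// Reads the 8 bits starting at an arbitrary bit offset of `src`,
   /// stitching them together from at most two source bytes.
   fn read_u8_at(src: &[u8], bit_offset: usize) -> u8 {
       let byte = bit_offset / 8;
       let shift = bit_offset % 8;
       if shift == 0 {
           src[byte]
       } else {
           (src[byte] >> shift) | (src[byte + 1] << (8 - shift))
       }
   }
   
   /// Copies `len` bits from `src` (at bit `src_offset`) into `dst` (at bit `dst_offset`).
   fn extend_bits_chunked(dst: &mut [u8], dst_offset: usize, src: &[u8], src_offset: usize, len: usize) {
       let mut i = 0;
       // 1. Head: single bits until the destination position is byte-aligned.
       while i < len && (dst_offset + i) % 8 != 0 {
           if get_bit(src, src_offset + i) {
               set_bit(dst, dst_offset + i);
           }
           i += 1;
       }
       // 2. Body: one whole destination byte per iteration, no per-bit branch.
       while len - i >= 8 {
           dst[(dst_offset + i) / 8] = read_u8_at(src, src_offset + i);
           i += 8;
       }
       // 3. Tail: the remaining (fewer than 8) bits.
       while i < len {
           if get_bit(src, src_offset + i) {
               set_bit(dst, dst_offset + i);
           }
           i += 1;
       }
   }
   ```
   
   Only the head and tail still branch per bit, so at most 14 bits take the slow path regardless of `len`.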
   
   Similar logic would apply to setting all bits to valid, appending chunks of `u8::MAX` or `u64::MAX` at a time.
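   
   Setting every bit to valid needs no source reads at all. A minimal sketch under the same LSB-first assumption, with `set_bits_valid` as a hypothetical name; the byte-aligned middle is filled with `u8::MAX` via `slice::fill`:
   
   ```rust
   /// Sets bits `dst_offset .. dst_offset + len` of an LSB-first bitmap to 1.
   fn set_bits_valid(dst: &mut [u8], dst_offset: usize, len: usize) {
       let end = dst_offset + len;
       let mut i = dst_offset;
       // Head: single bits up to the first byte boundary.
       while i < end && i % 8 != 0 {
           dst[i / 8] |= 1u8 << (i % 8);
           i += 1;
       }
       // Body: whole bytes set to u8::MAX in one call.
       let full_bytes = (end - i) / 8;
       dst[i / 8..i / 8 + full_bytes].fill(u8::MAX);
       i += full_bytes * 8;
       // Tail: remaining bits in the last, partially covered byte.
       while i < end {
           dst[i / 8] |= 1u8 << (i % 8);
           i += 1;
       }
   }
   ```
   
   For example, `set_bits_valid(&mut buf, 3, 15)` on a zeroed 3-byte buffer yields `[0xF8, 0xFF, 0x03]`.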
   
   The `get_bit` / `set_bit` functions themselves could probably also be sped up a little: I think that on modern processors, computing the bit masks instead of reading them from a lookup table should be faster. After the above changes, though, those functions would no longer be used in the hot path.
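   
   The two mask strategies side by side, as a sketch assuming LSB-first bit order. The table name `BIT_MASK` and the function names are hypothetical illustrations of the idea, not the arrow-rs code; which variant is faster on a given CPU is something to benchmark rather than assume:
   
   ```rust
   /// Lookup-table variant: the mask for bit `k` of a byte is `BIT_MASK[k]`.
   const BIT_MASK: [u8; 8] = [1, 2, 4, 8, 16, 32, 64, 128];
   
   fn get_bit_lut(data: &[u8], i: usize) -> bool {
       data[i >> 3] & BIT_MASK[i & 7] != 0
   }
   
   /// Computed-mask variant: a shift replaces the table load.
   fn get_bit_mask(data: &[u8], i: usize) -> bool {
       data[i >> 3] & (1u8 << (i & 7)) != 0
   }
   
   fn set_bit_mask(data: &mut [u8], i: usize) {
       data[i >> 3] |= 1u8 << (i & 7);
   }
   ```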


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed issue #397: Optimize MutableArrayData::extend for null buffers

alamb closed issue #397:
URL: https://github.com/apache/arrow-rs/issues/397


   


To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org



[GitHub] [arrow-rs] mathiaspeters-sig commented on issue #397: Optimize MutableArrayData::extend for null buffers

mathiaspeters-sig commented on issue #397:
URL: https://github.com/apache/arrow-rs/issues/397#issuecomment-902529475


   Since I can't assign myself, I'll comment instead: I'm starting to work on this now.





[GitHub] [arrow-rs] Dandandan commented on issue #397: Optimize MutableArrayData::extend for null buffers

Dandandan commented on issue #397:
URL: https://github.com/apache/arrow-rs/issues/397#issuecomment-902628995


   @mathiaspeters-sig 
   
   There might be some inspiration to draw from the work in arrow2 by @jorgecarleitao, e.g. see https://github.com/jorgecarleitao/arrow2/pull/291

