Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/09 08:23:59 UTC

[GitHub] [arrow-rs] jorgecarleitao commented on pull request #269: Aligned vec

jorgecarleitao commented on pull request #269:
URL: https://github.com/apache/arrow-rs/pull/269#issuecomment-835741997


   I agree with you on all of the above 👍
   
   So, the background is as follows:
   
   * `ArrayData::buffer: Vec<Buffer>` exists; thus `Buffer` must be untyped.
   * Some APIs to build arrays also have untyped `Vec<MutableBuffer>`, and thus `MutableBuffer` must also be untyped.
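
To illustrate the two points above, here is a minimal sketch of why a single `Vec` of buffers forces the buffer type to be untyped. `RawBuffer` and `utf8_buffers` are hypothetical names invented for this example, not the arrow-rs API, and the sketch copies bytes instead of sharing aligned memory as the real `Buffer` does.

```rust
/// Hypothetical stand-in for an untyped buffer: only bytes, no element type.
#[derive(Debug, Clone, PartialEq)]
pub struct RawBuffer {
    bytes: Vec<u8>,
}

impl RawBuffer {
    /// Copies `i32` values into an untyped buffer (native endianness).
    pub fn from_i32(values: &[i32]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * 4);
        for v in values {
            bytes.extend_from_slice(&v.to_ne_bytes());
        }
        Self { bytes }
    }

    /// Copies raw bytes into an untyped buffer.
    pub fn from_u8(values: &[u8]) -> Self {
        Self { bytes: values.to_vec() }
    }

    /// Length in bytes, not in elements.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }

    /// Reads the bytes back as `i32`; the caller has to know the element type.
    pub fn to_i32(&self) -> Vec<i32> {
        self.bytes
            .chunks_exact(4)
            .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

/// A string array needs `i32` offsets and `u8` values side by side.
/// Both can live in one `Vec` only because the element type is erased.
pub fn utf8_buffers(strings: &[&str]) -> Vec<RawBuffer> {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for s in strings {
        values.extend_from_slice(s.as_bytes());
        offsets.push(values.len() as i32);
    }
    vec![RawBuffer::from_i32(&offsets), RawBuffer::from_u8(&values)]
}
```

For the input `["ab", "cde"]`, the first buffer holds the offsets `0, 2, 5` and the second holds the five value bytes. A typed `Vec<Vec<T>>` could not hold both.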
   
   The last time I measured, `Vec` performed worse than `MutableBuffer`, i.e. it was faster to use `MutableBuffer::from_trusted_len_iter` than to use `.collect::<Vec<_>>()` (`Vec` uses trait specialization and the unstable `TrustedLen`, so the two are comparable). I do not fully understand why, but maybe the fact that `MutableBuffer` only holds types that do not need to be dropped enables some optimizations?
   
   In light of this, my understanding is that `Vec`'s API is really the only thing we need, as everyone is very used to it. I tried to align `MutableBuffer` with `Vec`'s API about 2 months ago by introducing some methods, such as `push`, that are generic over `T`. This is what I meant by: maybe we could try to patch `MutableBuffer` with what is missing from `Vec`?
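
As a rough sketch of what a `Vec`-like API on top of an untyped byte buffer could look like, assuming a hypothetical `Native` trait and `ByteBuffer` type (neither is the arrow-rs definition, and the real `MutableBuffer` signatures may differ):

```rust
/// Hypothetical stand-in for a native-type bound: plain `Copy` values
/// that know how to append their own bytes.
pub trait Native: Copy {
    fn extend_bytes(self, out: &mut Vec<u8>);
}

impl Native for u8 {
    fn extend_bytes(self, out: &mut Vec<u8>) {
        out.push(self);
    }
}

impl Native for i32 {
    fn extend_bytes(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_ne_bytes());
    }
}

impl Native for f64 {
    fn extend_bytes(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_ne_bytes());
    }
}

/// Hypothetical untyped, growable buffer with `Vec`-style methods.
/// Backed by `Vec<u8>` for brevity, so it ignores the alignment
/// requirements that a real implementation has to handle.
#[derive(Default)]
pub struct ByteBuffer {
    data: Vec<u8>,
}

impl ByteBuffer {
    pub fn new() -> Self {
        Self::default()
    }

    /// The buffer is untyped; only the method is generic over `T`.
    pub fn push<T: Native>(&mut self, item: T) {
        item.extend_bytes(&mut self.data);
    }

    /// Reserves once from the iterator's lower size bound, then pushes.
    pub fn extend<T: Native, I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let iter = iter.into_iter();
        let (lower, _) = iter.size_hint();
        self.data.reserve(lower * std::mem::size_of::<T>());
        for item in iter {
            self.push(item);
        }
    }

    /// Length in bytes.
    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data
    }
}
```

The design choice shown here is that the type parameter lives on each method instead of on the struct, which keeps the buffer storable in an untyped collection while still offering familiar calls such as `push` and `extend`.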
   
   I agree that this whole thing arises because Rust's custom allocator API is unstable, which means that we can't build a `Vec` with a custom allocator. Still, I think that the constraint `T: NativeType` makes things simpler, as it ensures that we do not need to call `drop` or handle the rest of the element cleanup that `Vec` has to take care of.
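
The "no need to call drop" point can be checked with `std::mem::needs_drop`. This is a small sketch that assumes the native types are all `Copy`; the `Native` trait below is a hypothetical stand-in, not the arrow-rs `NativeType` definition.

```rust
use std::mem::needs_drop;

/// Hypothetical stand-in for a native-type bound.
/// A `Copy` type cannot implement `Drop`, so it never has a destructor.
pub trait Native: Copy {}
impl Native for i32 {}
impl Native for f64 {}

/// Whether discarding a container of `T` may have to run per-element destructors.
/// This is the general case that `Vec<T>` must be prepared for.
pub fn must_drop_elements<T>() -> bool {
    needs_drop::<T>()
}

/// The same question, restricted to the hypothetical native types.
pub fn native_needs_drop<T: Native>() -> bool {
    needs_drop::<T>()
}
```

With this bound, freeing a buffer reduces to deallocating its memory, whereas a general `Vec<T>` also has to account for element types such as `String` that own resources.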


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org