Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/10 13:55:14 UTC

[GitHub] [arrow] jorgecarleitao commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

jorgecarleitao commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-706552752


   Thanks a lot for driving this and CCing. This is definitely important.
   
   I have hit that `capacity` problem myself multiple times! The first was when I was trying to simplify `equal.rs`, for the reason you describe: buffers compare as different when their capacities differ. The second was on #8401, since when a buffer receives its data via the C data interface, we do not even know (or care about) its capacity.
   
   AFAICT, we have two use-cases for `capacity` atm:
   
   * deallocating the region when it is no longer used
   * computing the total size in bytes of arrays
   
   Because arrays share buffers, the total size of an array is currently misleading. For example, when an array is computed from `is_not_null` of another array, the null buffer (buffer 0) and the `value` buffer (buffer 1) share the same memory region, and thus IMO a total size computation based on `capacity` is incorrect. The same holds for complex structs in which buffers are shared within the same array.
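
   To illustrate the double counting, here is a minimal sketch. It does not use the Arrow crate: `SharedBuffer` and `ToyArray` are hypothetical stand-ins I made up, with `Arc<Vec<u8>>` playing the role of a shared buffer. It only shows that summing `capacity` over every referenced buffer counts a shared region once per reference, while keying on the allocation address counts it once.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a buffer: a shared, immutable allocation.
type SharedBuffer = Arc<Vec<u8>>;

/// Hypothetical stand-in for an array: only the buffers it references.
struct ToyArray {
    buffers: Vec<SharedBuffer>,
}

/// Naive total: adds the capacity of every referenced buffer,
/// so a region shared by two arrays is counted twice.
fn naive_size(arrays: &[ToyArray]) -> usize {
    let mut total = 0;
    for array in arrays {
        for buffer in &array.buffers {
            total += buffer.capacity();
        }
    }
    total
}

/// Counts each distinct allocation once, identified by its address.
fn deduplicated_size(arrays: &[ToyArray]) -> usize {
    let mut seen: HashSet<*const Vec<u8>> = HashSet::new();
    let mut total = 0;
    for array in arrays {
        for buffer in &array.buffers {
            // `insert` returns false if this allocation was already counted.
            if seen.insert(Arc::as_ptr(buffer)) {
                total += buffer.capacity();
            }
        }
    }
    total
}
```

   With one array holding a validity buffer plus a values buffer, and a second array that reuses the same validity buffer, `naive_size` exceeds `deduplicated_size` by exactly the capacity of the shared buffer.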
   
   Thus, IMO the main use-case of `capacity` is bookkeeping: knowing how to deallocate the region. #8401 systematizes that idea: there, `BufferData` (renamed `Bytes`) no longer has a `capacity`, but an `enum` describing how to deallocate itself.
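
   A rough sketch of that idea, not the actual code in #8401: the variant names, the `ForeignOwner` type, the constructors and the 64-byte `ALIGNMENT` are all assumptions for illustration. The point is only that `capacity` is needed by the variant that frees memory we allocated ourselves, while foreign memory is released by dropping whoever owns it.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;
use std::sync::Arc;

/// Assumed alignment for natively allocated regions.
const ALIGNMENT: usize = 64;

/// Hypothetical owner of memory produced elsewhere (e.g. received
/// through the C data interface). A `Vec` simulates that memory here.
struct ForeignOwner {
    data: Vec<u8>,
}

/// How a region must be released.
enum Deallocation {
    /// We allocated it: the capacity is needed to rebuild the `Layout`.
    Native(usize),
    /// Someone else owns it: keep the owner alive and drop it when done.
    Foreign(Arc<ForeignOwner>),
}

struct Bytes {
    ptr: NonNull<u8>,
    len: usize,
    deallocation: Deallocation,
}

impl Bytes {
    /// Allocates `len` zeroed bytes owned by this `Bytes`.
    fn new_native(len: usize) -> Self {
        // A zero-sized layout must not be passed to the allocator.
        let capacity = len.max(1);
        let layout = Layout::from_size_align(capacity, ALIGNMENT).expect("valid layout");
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).expect("allocation failed");
        Bytes { ptr, len, deallocation: Deallocation::Native(capacity) }
    }

    /// Wraps memory owned by `owner`; no capacity is known or needed.
    fn from_foreign(owner: Arc<ForeignOwner>) -> Self {
        let len = owner.data.len();
        let ptr = NonNull::new(owner.data.as_ptr() as *mut u8).expect("non-null pointer");
        Bytes { ptr, len, deallocation: Deallocation::Foreign(owner) }
    }

    fn as_slice(&self) -> &[u8] {
        // Safe to read: `ptr` is valid for `len` bytes for as long as `self` lives.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        match &self.deallocation {
            Deallocation::Native(capacity) => {
                let layout = Layout::from_size_align(*capacity, ALIGNMENT).expect("valid layout");
                unsafe { dealloc(self.ptr.as_ptr(), layout) };
            }
            // Nothing to free here: dropping the `Arc` releases our share of the owner.
            Deallocation::Foreign(_owner) => {}
        }
    }
}
```

   In this shape, nothing outside `Drop` ever looks at the capacity, so it cannot leak into equality or size computations.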
   
   Regardless, I would say that in the vast majority of use-cases where we want to compare buffers, we want to compare their contents, irrespective of their capacity or how they should be deallocated. Therefore, I would be happy to have buffer comparison be based on their actual content (in bytes). We just need to be careful about bitmaps, for which the comparison should be made in bits.
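
   Something along these lines is what I have in mind; this is only a sketch on plain byte slices, and the function names and the bit offsets are assumptions of mine, not existing APIs. `get_bit` assumes the least-significant-bit-first order used by Arrow bitmaps.

```rust
/// Compares the first `len` bytes of two buffers, ignoring capacity
/// and whatever lies beyond `len`.
fn equal_bytes(lhs: &[u8], rhs: &[u8], len: usize) -> bool {
    lhs[..len] == rhs[..len]
}

/// Reads bit `i`, assuming least-significant-bit-first order within each byte.
fn get_bit(data: &[u8], i: usize) -> bool {
    (data[i / 8] & (1 << (i % 8))) != 0
}

/// Compares `len` bits, starting at the given bit offsets.
/// Padding bits outside the compared ranges are ignored.
fn equal_bits(
    lhs: &[u8],
    lhs_offset: usize,
    rhs: &[u8],
    rhs_offset: usize,
    len: usize,
) -> bool {
    (0..len).all(|i| get_bit(lhs, lhs_offset + i) == get_bit(rhs, rhs_offset + i))
}
```

   Two bitmaps with the same 4 meaningful bits but different padding in the rest of the byte are equal under `equal_bits` and different under `equal_bytes`, which is why bitmaps need the bit-level comparison.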

