You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/24 05:25:59 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8960: ARROW-10540: [Rust] Extended filter kernel to all types and improved performance

jorgecarleitao commented on a change in pull request #8960:
URL: https://github.com/apache/arrow/pull/8960#discussion_r548387772



##########
File path: rust/arrow/src/array/transform/variable_size.rs
##########
@@ -43,33 +47,33 @@ pub(super) fn build_extend<T: OffsetSizeTrait>(array: &ArrayData) -> Extend {
         // fast case where we can copy regions without null issues
         Box::new(
             move |mutable: &mut _MutableArrayData, _, start: usize, len: usize| {
-                let mutable_offsets = mutable.buffer::<T>(0);
-                let last_offset = mutable_offsets[mutable_offsets.len() - 1];
-                // offsets
-                let buffer = &mut mutable.buffers[0];
+                let offset_buffer = &mut mutable.buffer1;
+                let values_buffer = &mut mutable.buffer2;
+
+                // this is safe due to how offset is built. See details on `get_last_offset`
+                let last_offset = unsafe { get_last_offset(offset_buffer) };

Review comment:
       I get your point.
   
   `get_last_offset` must only be used in offset buffers, i.e. buffers whose bytes represent a i32 or i64 and were specifically built from those types. Doing so in other buffers is undefined behavior (even with the safeguards of using `align_to`). We can remove the `unsafe` mark from `get_last_offset`, though.
   
   My proposal is that we refactor the `src/transform` code so that it has a struct specific for each array type (that implements some trait for `dyn` support). This will allow `last_offset` to be stored in the array-specific struct, thereby avoiding this problem altogether (of having to read bytes written to the `MutableBuffer`).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org