You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/09 11:02:14 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1825: Replace RawPtrBox with ScalarBuffer, reduce `unsafe` usage (#1811)

alamb commented on code in PR #1825:
URL: https://github.com/apache/arrow-rs/pull/1825#discussion_r893357988


##########
arrow/src/array/array_binary.rs:
##########
@@ -64,67 +64,40 @@ impl<OffsetSize: OffsetSizeTrait> GenericBinaryArray<OffsetSize> {
 
     /// Returns a clone of the value data buffer
     pub fn value_data(&self) -> Buffer {
-        self.data.buffers()[1].clone()
+        self.value_data.clone()
     }
 
     /// Returns the offset values in the offsets buffer
     #[inline]
     pub fn value_offsets(&self) -> &[OffsetSize] {

Review Comment:
   ❤️ 



##########
arrow/src/array/array_binary.rs:
##########
@@ -39,8 +39,8 @@ use crate::{buffer::MutableBuffer, datatypes::DataType};
 /// binary data.
 pub struct GenericBinaryArray<OffsetSize: OffsetSizeTrait> {
     data: ArrayData,
-    value_offsets: RawPtrBox<OffsetSize>,
-    value_data: RawPtrBox<u8>,
+    value_offsets: ScalarBuffer<OffsetSize>,

Review Comment:
   I wonder if calling `ScalarBuffer` something like `TypedBuffer` would make its purpose slightly clearer 🤔 



##########
arrow/src/array/array_string.rs:
##########
@@ -89,30 +81,28 @@ impl<OffsetSize: OffsetSizeTrait> GenericStringArray<OffsetSize> {
     /// caller is responsible for ensuring that index is within the array bounds
     #[inline]
     pub unsafe fn value_unchecked(&self, i: usize) -> &str {
-        let end = self.value_offsets().get_unchecked(i + 1);
-        let start = self.value_offsets().get_unchecked(i);
-
-        // Soundness
-        // pointer alignment & location is ensured by RawPtrBox
-        // buffer bounds/offset is ensured by the value_offset invariants
-        // ISSUE: utf-8 well formedness is not checked
-
         // Safety of `to_isize().unwrap()`
         // `start` and `end` are &OffsetSize, which is a generic type that implements the
         // OffsetSizeTrait. Currently, only i32 and i64 implement OffsetSizeTrait,
         // both of which should cleanly cast to isize on an architecture that supports
         // 32/64-bit offsets
-        let slice = std::slice::from_raw_parts(
-            self.value_data.as_ptr().offset(start.to_isize().unwrap()),
-            (*end - *start).to_usize().unwrap(),
-        );
+        let start = self.value_offsets.get_unchecked(i).to_isize().unwrap();
+        let end = self.value_offsets.get_unchecked(i + 1).to_isize().unwrap();
+
+        // buffer bounds/offset is ensured by the value_offset invariants

Review Comment:
   and to be clear, this method is `unsafe` so getting unchecked values seems like it would be ok...



##########
arrow/src/array/array_string.rs:
##########
@@ -573,6 +566,18 @@ mod tests {
             .expect("All null array has valid array data");
     }
 
+    #[test]
+    fn test_string_array_empty_offsets() {

Review Comment:
   👍 



##########
arrow/src/array/raw_pointer.rs:
##########
@@ -1,67 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   🔥 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org