You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/15 20:48:01 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

tustvold commented on code in PR #3115:
URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023233519


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -227,6 +227,28 @@ impl Buffer {
     pub fn count_set_bits_offset(&self, offset: usize, len: usize) -> usize {
         UnalignedBitChunk::new(self.as_slice(), offset, len).count_ones()
     }
+
+    /// Returns `MutableBuffer` for mutating the buffer if this buffer is not shared.
+    pub fn into_mutable(self, len: usize) -> Result<MutableBuffer, Self> {
+        let offset_ptr = self.as_ptr();
+        let offset = self.offset;
+        let length = self.length;
+        Arc::try_unwrap(self.data)
+            .map(|bytes| {

Review Comment:
   We need to verify that `bytes` has `deallocation` of `Deallocation::Arrow`, otherwise this is not well formed



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -227,6 +227,28 @@ impl Buffer {
     pub fn count_set_bits_offset(&self, offset: usize, len: usize) -> usize {
         UnalignedBitChunk::new(self.as_slice(), offset, len).count_ones()
     }
+
+    /// Returns `MutableBuffer` for mutating the buffer if this buffer is not shared.
+    pub fn into_mutable(self, len: usize) -> Result<MutableBuffer, Self> {
+        let offset_ptr = self.as_ptr();
+        let offset = self.offset;
+        let length = self.length;
+        Arc::try_unwrap(self.data)
+            .map(|bytes| {
+                // The pointer of underlying buffer should not be offset.

Review Comment:
   This is a somewhat annoying limitation, I wonder if there is some way to avoid it :thinking: 
   
   Perhaps we could push the offset into Bytes :thinking: 



##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -92,6 +92,15 @@ impl MutableBuffer {
         }
     }
 
+    /// Allocates a new [MutableBuffer] from given pointer `ptr`, `capacity`.

Review Comment:
   I think it would be "safer" for the to be `from_bytes` and contain the necessary logic to make that well-formed, i.e. check the deallocation is as expected, etc...



##########
arrow-array/src/builder/null_buffer_builder.rs:
##########
@@ -42,6 +42,15 @@ impl NullBufferBuilder {
         }
     }
 
+    pub fn new_from_buffer(buffer: Option<MutableBuffer>, capacity: usize) -> Self {

Review Comment:
   I don't think the option here is necessary, as if there is no MutableBuffer, callers can just call the regular constructor?



##########
arrow-data/src/data.rs:
##########
@@ -387,6 +387,10 @@ impl ArrayData {
         &self.buffers[..]
     }
 
+    pub fn get_buffers(self) -> Vec<Buffer> {

Review Comment:
   What about child_data, null_buffers, etc... ?
   
   I wonder if a more general pattern would be to use `ArrayData::buffers`, clone the desired buffers and then drop the ArrayData?
   
   



##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -397,6 +397,54 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         unsafe { build_primitive_array(len, buffer, null_count, null_buffer) }
     }
 
+    /// Applies an unary and infallible function to a mutable primitive array.
+    /// Mutable primitive array means that the buffer is not shared with other arrays.
+    /// As a result, this mutates the buffer directly without allocating new buffer.
+    ///
+    /// # Implementation
+    ///
+    /// This will apply the function for all values, including those on null slots.
+    /// This implies that the operation must be infallible for any value of the corresponding type
+    /// or this function may panic.
+    /// # Example
+    /// ```rust
+    /// # use arrow_array::{Int32Array, types::Int32Type};
+    /// # fn main() {
+    /// let array = Int32Array::from(vec![Some(5), Some(7), None]);
+    /// let c = array.unary_mut(|x| x * 2 + 1).unwrap();
+    /// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None]));
+    /// # }
+    /// ```
+    pub fn unary_mut<F>(self, op: F) -> Result<PrimitiveArray<T>, ArrowError>

Review Comment:
   I wonder if this should return `Self` as an error to match the other similar methods?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org