You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/06/07 17:08:14 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #4380: Add NullBuffer and BooleanBuffer From conversions

tustvold opened a new pull request, #4380:
URL: https://github.com/apache/arrow-rs/pull/4380

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   Follow up to #4379 and #4338
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223262802


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223007188


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```

Review Comment:
   Giving a little context about when to use this constructor I think might help
   
   ```suggestion
       ///
       /// # Example
       ///
       /// Creating a [`PrimitiveArray`] directly from a [`ScalarBuffer`] and [`NullBuffer`]  ls generally
       /// the most performant approach and avoids any additional allocations.
       ///
       /// ```
   ```



##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   I agree -- it should be called the `ValidBuffer` or else have defined `1` to mean null. As you say 🤷 



##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   BTW could this be 
   
   ```suggestion
       /// let nulls = NullBuffer::from(&[true, false, true, true]);
   ```
   
   And potentially avoid the allocation? I see you added that `From` impl 🤔 



##########
arrow-buffer/src/buffer/boolean.rs:
##########
@@ -265,6 +265,30 @@ impl<'a> IntoIterator for &'a BooleanBuffer {
     }
 }
 
+impl From<&[bool]> for BooleanBuffer {
+    fn from(value: &[bool]) -> Self {
+        let mut builder = BooleanBufferBuilder::new(value.len());
+        builder.append_slice(value);
+        builder.finish()
+    }
+}
+
+impl From<Vec<bool>> for BooleanBuffer {
+    fn from(value: Vec<bool>) -> Self {
+        value.as_slice().into()
+    }
+}
+
+impl FromIterator<bool> for BooleanBuffer {
+    fn from_iter<T: IntoIterator<Item = bool>>(iter: T) -> Self {
+        let iter = iter.into_iter();
+        let (hint, _) = iter.size_hint();

Review Comment:
   given the hint is used for allocation I wonder if we should bound it (like avoid crash due to allocation error with usize::MAX)
   
   The docs say it should return `0` but isn't required
   
   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint
   
   Maybe this is just a theoretical concern. If the same pattern exists elsewhere in the code I think this is fine oo



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1221930063


##########
arrow-buffer/src/buffer/null.rs:
##########
@@ -218,6 +218,30 @@ impl<'a> IntoIterator for &'a NullBuffer {
     }
 }
 
+impl From<BooleanBuffer> for NullBuffer {
+    fn from(value: BooleanBuffer) -> Self {
+        Self::new(value)
+    }
+}
+
+impl From<&[bool]> for NullBuffer {

Review Comment:
   I debated adding custom conversion that would count the nulls, but I wasn't entirely convinced it would be faster - as the bit counting is very fast, so I left it for now. We can easily optimise later



##########
arrow-buffer/src/buffer/null.rs:
##########
@@ -218,6 +218,30 @@ impl<'a> IntoIterator for &'a NullBuffer {
     }
 }
 
+impl From<BooleanBuffer> for NullBuffer {
+    fn from(value: BooleanBuffer) -> Self {
+        Self::new(value)
+    }
+}
+
+impl From<&[bool]> for NullBuffer {

Review Comment:
   I debated adding custom conversion that would count the nulls, but I wasn't entirely convinced it would be faster - as the bit counting is very fast, so I left it for now. We can easily optimise later



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223266193


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   For `From<&[bool]>`, looks like it still does allocation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1221929177


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   It is unfortunate that arrow went with the name null buffer, and then made true mean not null. But hey ho



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223314766


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   It would eliminate the allocation for the Vec, although the compiler might be smart enough to do that anyway, will update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223320796


##########
arrow-buffer/src/buffer/boolean.rs:
##########
@@ -265,6 +265,30 @@ impl<'a> IntoIterator for &'a BooleanBuffer {
     }
 }
 
+impl From<&[bool]> for BooleanBuffer {
+    fn from(value: &[bool]) -> Self {
+        let mut builder = BooleanBufferBuilder::new(value.len());
+        builder.append_slice(value);
+        builder.finish()
+    }
+}
+
+impl From<Vec<bool>> for BooleanBuffer {
+    fn from(value: Vec<bool>) -> Self {
+        value.as_slice().into()
+    }
+}
+
+impl FromIterator<bool> for BooleanBuffer {
+    fn from_iter<T: IntoIterator<Item = bool>>(iter: T) -> Self {
+        let iter = iter.into_iter();
+        let (hint, _) = iter.size_hint();

Review Comment:
   Given the docs for size_hint state
   
   > size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator
   
   I think it is reasonable to use the lower bound to pre-allocate, we also do this in other places



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4380: Add NullBuffer and BooleanBuffer From conversions

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4380:
URL: https://github.com/apache/arrow-rs/pull/4380#discussion_r1223314766


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -299,6 +299,17 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     /// # Panics
     ///
     /// Panics if [`Self::try_new`] returns an error
+    ///
+    /// ```
+    /// # use arrow_array::Int32Array;
+    /// # use arrow_array::types::Int32Type;
+    /// # use arrow_buffer::NullBuffer;
+    /// // [1, 2, 3, 4]
+    /// let array = Int32Array::new(vec![1, 2, 3, 4].into(), None);
+    /// // [1, null, 3, 4]
+    /// let nulls = NullBuffer::from(vec![true, false, true, true]);

Review Comment:
   It would eliminate the allocation for the Vec, although the compiler might be smart enough to do that anyway, will update.
   
   Edit this actually causes issues because it infers the type as `&[bool; 4]`, will leave as `vec`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org