You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 08:18:17 UTC

[GitHub] [arrow-rs] viirya opened a new pull request, #3134: Add add_scalar_mut and add_scalar_checked_mut

viirya opened a new pull request, #3134:
URL: https://github.com/apache/arrow-rs/pull/3134

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #3133.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1328366171

   I ran a simple benchmark which calls `add` or `add_mut` on two primitive arrays, e.g.,
   
   ```rust
   let mut arr_a = create_primitive_array::<Float32Type>(5120000, 0.0);
   
   for _ in 0..10 {
       let arr_b = create_primitive_array::<Float32Type>(5120000, 0.0);
       // arr_a = add(&arr_a, &arr_b).unwrap();
       arr_a = add_mut(arr_a, &arr_b).unwrap().unwrap();
   }
   ```
   
   ```
   array add               time:   [675.93 ms 676.09 ms 676.28 ms]
                           change: [-3.2262% -3.0312% -2.8816%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1329417024

   Yeah, thought about this option too. I think it makes sense. I can remove `_mut` arithmetic kernels and only keep `try_unary_mut` so we can implement these kernels.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3134: Add try_unary_mut

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1033841464


##########
arrow-array/src/builder/primitive_builder.rs:
##########
@@ -285,6 +285,26 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
     pub fn values_slice_mut(&mut self) -> &mut [T::Native] {
         self.values_builder.as_slice_mut()
     }
+
+    /// Returns the current values buffer as a slice
+    #[allow(dead_code)]

Review Comment:
   I think this shouldn't be necessary, as they are public



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1324606017

   Simply updated slice related functions.
   
   I will update the result type later or tomorrow. (occupied occasionally by other matters in recent days, will be a bit slow response)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1323309409

   > Thank you for working on this, I wonder if you have some benchmarks of how this compares to the allocating version, ideally using a modern allocator like jemalloc.
   
   Not yet on benchmarking. I think it should be better and hard to think that it could be slower. I will revise this based on above comments and run a benchmark later. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3134: Add try_unary_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1033844414


##########
arrow-array/src/builder/primitive_builder.rs:
##########
@@ -285,6 +285,26 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
     pub fn values_slice_mut(&mut self) -> &mut [T::Native] {
         self.values_builder.as_slice_mut()
     }
+
+    /// Returns the current values buffer as a slice
+    #[allow(dead_code)]

Review Comment:
   Yeah, removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1328988681

   This seems like quite a lot of additional complexity for only a 3% performance improvement. What do you think about just providing `try_unary_mut` and leave it at that, just thinking about the number of additional `_mut` arithmetic kernels this is going to produce? Especially given that most of these kernels would be trivial for a user to implement themselves, should they wish to?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1028142772


##########
arrow-array/src/builder/primitive_builder.rs:
##########
@@ -228,6 +228,14 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
     pub fn values_slice_mut(&mut self) -> &mut [T::Native] {
         self.values_builder.as_slice_mut()
     }
+
+    /// Returns the current values buffer and null buffer as a slice
+    pub fn as_slice(&mut self) -> (&mut [T::Native], Option<&[u8]>) {
+        (
+            self.values_builder.as_slice_mut(),
+            self.null_buffer_builder.as_slice(),
+        )

Review Comment:
   ```suggestion
       pub fn slices_mut(&mut self) -> (&mut [T::Native], Option<&mut [u8]>) {
           (
               self.values_builder.as_slice_mut(),
               self.null_buffer_builder.as_slice_mut(),
           )
   ```
   Or something, it seems a bit unusual for both to not be mutable.
   
   We could then add `validity_slice` and `validity_slice_mut` for completeness



##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -469,6 +469,42 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         })
     }
 
+    /// Applies an unary and fallible function to all valid values in a mutable primitive array.
+    /// Mutable primitive array means that the buffer is not shared with other arrays.
+    /// As a result, this mutates the buffer directly without allocating new buffer.
+    ///
+    /// This is unlike [`Self::unary_mut`] which will apply an infallible function to all rows
+    /// regardless of validity, in many cases this will be significantly faster and should
+    /// be preferred if `op` is infallible.
+    ///
+    /// This returns an `Err` for two cases. First is input array is shared buffer with other
+    /// array. In the case, returned `Err` wraps a `Ok` of input array. Second, if the function
+    /// encounters an error during applying on values. In the case, returned `Err` wraps an
+    /// `Err` of the actual error.
+    ///
+    /// Note: LLVM is currently unable to effectively vectorize fallible operations
+    pub fn try_unary_mut<F, E>(
+        self,
+        op: F,
+    ) -> Result<PrimitiveArray<T>, Result<PrimitiveArray<T>, E>>

Review Comment:
   ```suggestion
       ) -> Result<Result<PrimitiveArray<T>, E>, PrimitiveArray<T>>
   ```
   
   I think the reverse result order makes more sense, as it represents the order of fallibility. If we can't convert to a builder is the first error case, then within that we have the error case of ArrowError.
   
   I think it will also make it easier to implement a fallback, e.g.
   
   ```
   arr.try_unary_mut(&mut op).unwrap_or_else(|arr| arr.try_unary(&mut op))?
   ```



##########
arrow-array/src/builder/null_buffer_builder.rs:
##########
@@ -150,6 +150,11 @@ impl NullBufferBuilder {
             self.bitmap_builder = Some(b);
         }
     }
+
+    #[inline]
+    pub fn as_slice(&self) -> Option<&[u8]> {

Review Comment:
   :+1:
   
   Could also add an `as_slice_mut` for completeness



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1029027250


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -469,6 +469,42 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         })
     }
 
+    /// Applies an unary and fallible function to all valid values in a mutable primitive array.
+    /// Mutable primitive array means that the buffer is not shared with other arrays.
+    /// As a result, this mutates the buffer directly without allocating new buffer.
+    ///
+    /// This is unlike [`Self::unary_mut`] which will apply an infallible function to all rows
+    /// regardless of validity, in many cases this will be significantly faster and should
+    /// be preferred if `op` is infallible.
+    ///
+    /// This returns an `Err` for two cases. First is input array is shared buffer with other
+    /// array. In the case, returned `Err` wraps a `Ok` of input array. Second, if the function
+    /// encounters an error during applying on values. In the case, returned `Err` wraps an
+    /// `Err` of the actual error.
+    ///
+    /// Note: LLVM is currently unable to effectively vectorize fallible operations
+    pub fn try_unary_mut<F, E>(
+        self,
+        op: F,
+    ) -> Result<PrimitiveArray<T>, Result<PrimitiveArray<T>, E>>

Review Comment:
   Makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1323311176

   > I think it should be better and hard to think that it could be slower
   
   Yeah, I'm just curious as it is a non-trivial additional complexity and so I'm curious what difference it makes :sweat_smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1029028695


##########
arrow-array/src/builder/primitive_builder.rs:
##########
@@ -228,6 +228,14 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
     pub fn values_slice_mut(&mut self) -> &mut [T::Native] {
         self.values_builder.as_slice_mut()
     }
+
+    /// Returns the current values buffer and null buffer as a slice
+    pub fn as_slice(&mut self) -> (&mut [T::Native], Option<&[u8]>) {
+        (
+            self.values_builder.as_slice_mut(),
+            self.null_buffer_builder.as_slice(),
+        )

Review Comment:
   Okay, it is because for this I don't need null buffer slice to be mutable. But yes, it looks a bit strange to have one mutable with one non-mutable. 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3134: Add add_scalar_mut and add_scalar_checked_mut

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#discussion_r1029030589


##########
arrow-array/src/builder/null_buffer_builder.rs:
##########
@@ -150,6 +150,11 @@ impl NullBufferBuilder {
             self.bitmap_builder = Some(b);
         }
     }
+
+    #[inline]
+    pub fn as_slice(&self) -> Option<&[u8]> {

Review Comment:
   Yea



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3134: Add try_unary_mut

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134#issuecomment-1329781422

   Benchmark runs are scheduled for baseline = 6f41b95de4a0ac33319b9e96e179b47fcd71fdbf and contender = 5d84746cfdfe3ae9a2678f10b4dbb2e9385dc479. 5d84746cfdfe3ae9a2678f10b4dbb2e9385dc479 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3eafce19a33d45bb81f184c18fee393c...15004d5b525a43008bb61bafa1981c58/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/55fed41ee8ca49c5bbef1bd825e4f474...02f9b1a9151a4043b04b44e9823de3f4/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/733e3d0fef224c9ea7b2a8913c500199...4c220577950448d98d8aa80b189618cb/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/88f846162d9642e8bacd38008c8825a3...36885bfe31fd43beb0a98956236214b2/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3134: Add try_unary_mut

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #3134:
URL: https://github.com/apache/arrow-rs/pull/3134


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org