You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/20 03:36:49 UTC

[GitHub] [arrow] tyrelr opened a new pull request #8973: Iterate primitive buffers by slice

tyrelr opened a new pull request #8973:
URL: https://github.com/apache/arrow/pull/8973


   Iterating slices instead of indexes seems to improve performance of non-simd arithmetic operations.  
   
   This adds a new method raw_values_slice to PrimitiveArray (so named to pun off of the raw_values function which returns a pointer).  A few of the functions in PrimitiveArray rely on the caller for safety guarantees (bounds-checks for value_slice, value), so should probably be unsafe?  But they are used widely, so I thought it simpler to start by adding a safer alternative...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938


   I ran out of time today (and probably the next few days), but just as an update... I hit a speedbump looking at removing the .value(...) function.
   
   At a first pass-through at dropping the PrimitiveArray.value() function, I hit a few usecases which are not trivially handled by a typed-slice in a performant way.
   1) filter kernel does a batched indexing-like operation based on bits being set in a u64.
   This can probably be re-arranged to minimize/eliminate bounds checks.
   2) sort & take kernels appear to cherrypick indexes based on another index array
   These are tricky.  We may be able to minimize the need bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?) but all are adding overhead at a different spot.
   3) csv writing & display try to iterate N columns in lock-step
   This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column built itself an iterator and map itself to String...  The naive & slow approach of editing the current macro to use values()[$i] causes a compilation when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547687817



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         self.data.is_empty()
     }
 
-    /// Returns a raw pointer to the values of this array.
-    pub fn raw_values(&self) -> *const T::Native {
-        unsafe { self.raw_values.get().add(self.data.offset()) }
-    }
-
     /// Returns a slice for the given offset and length
     ///
     /// Note this doesn't do any bound checking, for performance reason.
-    pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
-        let raw =
-            unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+    /// # Safety
+    /// caller must ensure that the passed in offset + len are less than the array len()
+    #[deprecated(note = "Please use values() instead")]
+    pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+        let raw = std::slice::from_raw_parts(
+            self.raw_values.get().add(self.data.offset()).add(offset),
+            len,
+        );
+        &raw[..]
+    }
+
+    /// Returns a slice of the values of this array
+    pub fn values(&self) -> &[T::Native] {

Review comment:
       `inline`?

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         self.data.is_empty()
     }
 
-    /// Returns a raw pointer to the values of this array.
-    pub fn raw_values(&self) -> *const T::Native {
-        unsafe { self.raw_values.get().add(self.data.offset()) }
-    }
-
     /// Returns a slice for the given offset and length
     ///
     /// Note this doesn't do any bound checking, for performance reason.
-    pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
-        let raw =
-            unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+    /// # Safety
+    /// caller must ensure that the passed in offset + len are less than the array len()
+    #[deprecated(note = "Please use values() instead")]
+    pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+        let raw = std::slice::from_raw_parts(
+            self.raw_values.get().add(self.data.offset()).add(offset),
+            len,
+        );
+        &raw[..]
+    }
+
+    /// Returns a slice of the values of this array
+    pub fn values(&self) -> &[T::Native] {
+        let raw = unsafe {

Review comment:
       IMO we should justify that this is safe because `from(ArrayDataRef)` asserts that the pointer is aligned with T, etc, as we describe in the `README`.

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         self.data.is_empty()
     }
 
-    /// Returns a raw pointer to the values of this array.
-    pub fn raw_values(&self) -> *const T::Native {
-        unsafe { self.raw_values.get().add(self.data.offset()) }
-    }
-
     /// Returns a slice for the given offset and length
     ///
     /// Note this doesn't do any bound checking, for performance reason.
-    pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
-        let raw =
-            unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+    /// # Safety
+    /// caller must ensure that the passed in offset + len are less than the array len()
+    #[deprecated(note = "Please use values() instead")]
+    pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+        let raw = std::slice::from_raw_parts(
+            self.raw_values.get().add(self.data.offset()).add(offset),
+            len,
+        );
+        &raw[..]
+    }
+
+    /// Returns a slice of the values of this array
+    pub fn values(&self) -> &[T::Native] {
+        let raw = unsafe {
+            std::slice::from_raw_parts(
+                self.raw_values.get().add(self.data.offset()),
+                self.len(),
+            )
+        };
         &raw[..]

Review comment:
       Is this needed here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748791491


   Looks like a nice simplification to me and if it compiles makes sense to integrate it. Do you have any benchmark results @tyrelr ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
tyrelr edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938


   I ran out of time today (and probably the next few days), but just as an update... I hit a speedbump looking at removing the .value(...) function.
   
   At a first pass-through at dropping the PrimitiveArray.value() function, I hit a few usecases which are not trivially handled by a typed-slice in a performant way.
   1) filter kernel does a batched indexing-like operation based on bits being set in a u64.
   This can probably be re-arranged to minimize/eliminate bounds checks.
   2) sort & take kernels appear to cherrypick indexes based on another index array
   These are tricky.  We may be able to minimize the need bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?) but all are adding overhead at a different spot.
   3) csv writing & display try to iterate N columns in lock-step
   This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column built itself an iterator and map itself to String...  The naive & slow approach of editing the current macro to use values()[$i] causes a compilation when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
   
   [Edit: to be clear, I have not actually attempted any of these edits yet, so I do not have real performance numbers, just a gut feeling]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (2e0aa18) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.52%`.
   > The diff coverage is `82.35%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   82.66%   -0.53%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    49767    +1106     
   ==========================================
   + Hits        40482    41139     +657     
   - Misses       8179     8628     +449     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <66.66%> (-0.59%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | ... and [39 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...2e0aa18](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (2c823df) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.02%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   83.17%   -0.03%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    48939     +278     
   ==========================================
   + Hits        40482    40703     +221     
   - Misses       8179     8236      +57     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.36% <100.00%> (+0.09%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.17% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.93% <0.00%> (-0.44%)` | :arrow_down: |
   | [rust/arrow/src/array/cast.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvY2FzdC5ycw==) | `100.00% <0.00%> (ø)` | |
   | [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <0.00%> (ø)` | |
   | ... and [7 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...2c823df](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-749581718


   Yep, I forgot that simd wasn't a default feature so I didn't test that locally.  I'll take a look at that tonight, hopefully.  This will touch a few lines already modified by PR #8975, but this change is simple enough it shouldn't cause concern.
   
   @Dandandan I ran master vs. current head over night (twice to try to distinguish environmental performance from code performance).  Using critcmp to filter to out anything smaller than a 10% difference, it looks like this:
   ```
   group                                    head-2c823df03-ARROW-10989             head-2c823df03-ARROW-10989-run2        master-ARROW-10989                     master-ARROW-10989-run2
   -----                                    --------------------------             -------------------------------        ------------------                     -----------------------
   add 512                                  1.30    408.5±3.37ns        ? B/sec    1.00    315.0±1.97ns        ? B/sec    3.46   1088.8±5.62ns        ? B/sec    3.64   1146.1±4.42ns        ? B/sec
   add_nulls_512                            1.09    423.8±3.18ns        ? B/sec    1.00    389.3±2.58ns        ? B/sec    2.98   1158.3±6.43ns        ? B/sec    2.95   1149.9±6.95ns        ? B/sec
   array_from_vec 128                       1.08    444.3±3.26ns        ? B/sec    1.11    458.3±1.97ns        ? B/sec    1.00    411.4±2.09ns        ? B/sec    1.05    432.2±2.82ns        ? B/sec
   bench_primitive                          1.00   1109.8±9.80µs     3.5 GB/sec    1.03   1146.2±6.90µs     3.4 GB/sec    3.01      3.3±0.01ms  1198.7 MB/sec    2.94      3.3±0.01ms  1225.9 MB/sec
   cast float64 to float32 512              1.01      2.8±0.03µs        ? B/sec    1.00      2.8±0.02µs        ? B/sec    1.11      3.1±0.03µs        ? B/sec    1.00      2.8±0.02µs        ? B/sec
   cast int32 to int32 512                  1.00     26.9±0.38ns        ? B/sec    1.00     26.8±0.19ns        ? B/sec    0.99     26.8±0.15ns        ? B/sec    1.11     30.0±0.16ns        ? B/sec
   cast time32s to time32ms 512             1.00   965.9±11.12ns        ? B/sec    1.05   1012.6±9.16ns        ? B/sec    1.75   1687.8±8.08ns        ? B/sec    1.68   1621.7±7.75ns        ? B/sec
   cast time64ns to time32s 512             1.10     11.1±0.12µs        ? B/sec    1.00     10.1±0.16µs        ? B/sec    1.00     10.1±0.04µs        ? B/sec    1.00     10.1±0.11µs        ? B/sec
   cast timestamp_ms to timestamp_ns 512    1.14  1481.4±11.07ns        ? B/sec    1.00  1304.2±10.03ns        ? B/sec    1.41   1840.6±9.37ns        ? B/sec    1.45  1894.2±10.44ns        ? B/sec
   divide 512                               1.04  1830.0±12.34ns        ? B/sec    1.00   1752.3±9.17ns        ? B/sec    1.03   1797.1±9.00ns        ? B/sec    1.27      2.2±0.01µs        ? B/sec
   eq scalar Float32                        1.00     64.2±0.52µs        ? B/sec    1.01     64.9±0.26µs        ? B/sec    1.07     68.9±0.29µs        ? B/sec    1.11     71.3±0.26µs        ? B/sec
   filter context f32 low selectivity       1.10    129.4±2.24µs        ? B/sec    1.00    117.1±0.63µs        ? B/sec    1.01    118.7±0.56µs        ? B/sec    1.01    118.7±0.51µs        ? B/sec
   min nulls 512                            1.13      2.1±0.02µs        ? B/sec    1.01  1872.1±12.28ns        ? B/sec    1.00  1858.1±25.37ns        ? B/sec    1.12      2.1±0.02µs        ? B/sec
   multiply 512                             1.00    403.0±2.35ns        ? B/sec    1.21    487.6±4.20ns        ? B/sec    2.86   1152.5±9.71ns        ? B/sec    2.86   1151.0±6.00ns        ? B/sec
   subtract 512                             1.00    385.3±4.18ns        ? B/sec    1.09    418.9±2.99ns        ? B/sec    3.24   1246.6±6.37ns        ? B/sec    3.00   1157.5±6.40ns        ? B/sec
   take bool nulls 1024                     1.02      2.6±0.02µs        ? B/sec    1.00      2.6±0.02µs        ? B/sec    1.91      4.9±0.05µs        ? B/sec    1.91      4.9±0.05µs        ? B/sec
   take bool nulls 512                      1.03  1449.2±18.90ns        ? B/sec    1.00  1412.8±11.94ns        ? B/sec    1.27  1788.1±14.12ns        ? B/sec    1.34  1899.4±32.11ns        ? B/sec
   take i32 512                             1.00    924.2±5.01ns        ? B/sec    1.00    922.8±6.28ns        ? B/sec    1.11   1022.4±5.66ns        ? B/sec    1.00    925.5±9.49ns        ? B/sec
   take i32 nulls 512                       1.09   1067.0±6.65ns        ? B/sec    1.00    975.3±5.77ns        ? B/sec    1.09   1065.0±5.21ns        ? B/sec    1.11   1080.0±6.80ns        ? B/sec
   take str 1024                            1.00      4.6±0.02µs        ? B/sec    1.04      4.8±0.03µs        ? B/sec    1.14      5.3±0.04µs        ? B/sec    1.06      4.9±0.03µs        ? B/sec
   take str 512                             1.00      2.8±0.01µs        ? B/sec    1.04      2.9±0.02µs        ? B/sec    1.14      3.2±0.03µs        ? B/sec    1.01      2.8±0.02µs        ? B/sec
   take str null indices 512                1.00      2.8±0.01µs        ? B/sec    1.04      2.9±0.02µs        ? B/sec    1.16      3.2±0.03µs        ? B/sec    1.03      2.9±0.02µs        ? B/sec
   ```
   A few sum benchmarks are significantly faster (finished in half to a third the time).
   I am surprised by a performance increase in take bool nulls 1025/512 and take str null indices 512...  I wouldn't expect those to use primitive arrays at all.  I'll look into why that changes at the same time I look into fixing the simd compilation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (a8e4f8b) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
   > The diff coverage is `81.25%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   82.63%   -0.56%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    49726    +1065     
   ==========================================
   + Hits        40482    41093     +611     
   - Misses       8179     8633     +454     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <62.50%> (-0.59%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...be2ea6f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (56c9f3b) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
   > The diff coverage is `77.77%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   82.63%   -0.56%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    49728    +1067     
   ==========================================
   + Hits        40482    41094     +612     
   - Misses       8179     8634     +455     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.48% <60.00%> (-0.79%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...56c9f3b](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8973: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748558702


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (b94c46f) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.52%`.
   > The diff coverage is `82.35%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   82.66%   -0.53%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    49767    +1106     
   ==========================================
   + Hits        40482    41139     +657     
   - Misses       8179     8628     +449     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <66.66%> (-0.59%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | ... and [39 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...b94c46f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748787802


   IMO this is already a very good improvement, it is fine to keep the `value` for now. I think that the SIMD is breaking, probably because that also contains some accesses using the old API.
   
   cc @Dandandan and @jhorstmann , that have been touching these APIs


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (4e06899) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   83.19%   -0.01%     
   ==========================================
     Files         199      199              
     Lines       48661    48665       +4     
   ==========================================
   + Hits        40482    40485       +3     
   - Misses       8179     8180       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.32% <100.00%> (+0.05%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...4e06899](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r548092554



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         self.data.is_empty()
     }
 
-    /// Returns a raw pointer to the values of this array.
-    pub fn raw_values(&self) -> *const T::Native {
-        unsafe { self.raw_values.get().add(self.data.offset()) }
-    }
-
     /// Returns a slice for the given offset and length
     ///
     /// Note this doesn't do any bound checking, for performance reason.
-    pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
-        let raw =
-            unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+    /// # Safety
+    /// caller must ensure that the passed in offset + len are less than the array len()
+    #[deprecated(note = "Please use values() instead")]
+    pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+        let raw = std::slice::from_raw_parts(
+            self.raw_values.get().add(self.data.offset()).add(offset),
+            len,
+        );
+        &raw[..]
+    }
+
+    /// Returns a slice of the values of this array
+    pub fn values(&self) -> &[T::Native] {
+        let raw = unsafe {

Review comment:
       I also added a note to the raw_values pointer itself, since any future constructor should also make the same guarantees before constructing a struct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547839892



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         self.data.is_empty()
     }
 
-    /// Returns a raw pointer to the values of this array.
-    pub fn raw_values(&self) -> *const T::Native {
-        unsafe { self.raw_values.get().add(self.data.offset()) }
-    }
-
     /// Returns a slice for the given offset and length
     ///
     /// Note this doesn't do any bound checking, for performance reason.
-    pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
-        let raw =
-            unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+    /// # Safety
+    /// caller must ensure that the passed in offset + len are less than the array len()
+    #[deprecated(note = "Please use values() instead")]
+    pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {

Review comment:
       Since `value_slice` is now only used in the simd comparison kernels, I can remove it later in [ARROW-10990][1].
   
   I think this PR is a nice simplification and also makes the code more safe, thanks!
   
    [1]: https://github.com/apache/arrow/pull/8975




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-750754917


   Thank you for your contribution, @tyrelr . FYI, I have not assigned you to the issue on Jira because I can't (@andygrove , how do we do this?), but I think we should.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748571786


   Hi @tyrelr , that is a good idea.
   
   I am a reticent in introducing yet another public method to the `PrimitiveArray`, as it is already a bit confusing which one to use (we have `value_slice`, `raw_values`, `values` and `value`).
   
   Originally, I was thinking in solving this mess via an iterator of values, however, you have a better idea: just offer a slice of `T`.
   
   In which case, this PR should be bolder: replace `value_slice`, `raw_values`, `values` and (if you feel confident) `value` by a single method, `values`, whose implementation is what you wrote for `raw_values_slice`.
   
   In other words, instead of introducing yet another endpoint, we simplify the existing API with something simpler (and safer, as many of the methods are actually `unsafe` and cause undefined behavior).
   
   How would that sound to you?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748677658


   That makes sense.  I'll take a look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559518


   https://issues.apache.org/jira/browse/ARROW-10989


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8973:
URL: https://github.com/apache/arrow/pull/8973


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] removed a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
github-actions[bot] removed a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748558702


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547954350



##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -451,15 +453,17 @@ where
     let rem = len % lanes;
 
     for i in (0..len - rem).step_by(lanes) {
-        let simd_left = T::load(left.value_slice(i, lanes));
+        let simd_left = T::load(unsafe { left.value_slice(i, lanes) });
         let simd_result = op(simd_left, simd_right);
         T::bitmask(&simd_result, |b| {
             result.extend_from_slice(b);
         });
     }
 
     if rem > 0 {
-        let simd_left = T::load(left.value_slice(len - rem, lanes));
+        //Soundness
+        //	This is not sound because it can read past the end of PrimitiveArray buffer (lanes is always greater than rem), see ARROW-10990

Review comment:
       🤣 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
   > Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (be2ea6f) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
   > The diff coverage is `77.77%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8973      +/-   ##
   ==========================================
   - Coverage   83.19%   82.63%   -0.56%     
   ==========================================
     Files         199      200       +1     
     Lines       48661    49728    +1067     
   ==========================================
   + Hits        40482    41094     +612     
   - Misses       8179     8634     +455     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.48% <60.00%> (-0.79%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
   | [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
   | ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...be2ea6f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org