You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 19:17:02 UTC

[GitHub] [arrow] tyrelr opened a new pull request #9215: ARROW-11270: [Rust] Array slice accessors

tyrelr opened a new pull request #9215:
URL: https://github.com/apache/arrow/pull/9215


   Using an approach similar to ARROW-10989, migrate typed array API's to use slices where they can.
   
   This impacts the API of GenericBinaryArray<>,  GenericListArray<>,  GenericStringArray<>
   
   This also enables bounds checking in every value() function on each of the above arrays, as well as PrimitiveArray<>.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563153376



##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
 }
 
 impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
-    /// Returns the offset for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    #[inline]
-    pub fn value_offset(&self, i: usize) -> OffsetSize {
-        self.value_offset_at(self.data.offset() + i)
-    }
-
     /// Returns the length for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
     #[inline]
-    pub fn value_length(&self, mut i: usize) -> OffsetSize {
-        i += self.data.offset();
-        self.value_offset_at(i + 1) - self.value_offset_at(i)
+    pub fn value_length(&self, i: usize) -> OffsetSize {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
     }
 
-    /// Returns a clone of the value offset buffer
-    pub fn value_offsets(&self) -> Buffer {
-        self.data.buffers()[0].clone()
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[OffsetSize] {

Review comment:
       what do you think of naming it simply `offsets`? I can't see the value of `value_` on the name (same for the other types). Since we are touching the API in backward incompatible ways... xD




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (e18e319) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.23%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.64%   81.88%   +0.23%     
   ==========================================
     Files         215      215              
     Lines       52419    53031     +612     
   ==========================================
   + Hits        42798    43423     +625     
   + Misses       9621     9608      -13     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (+1.62%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.68% <0.00%> (-2.51%)` | :arrow_down: |
   | ... and [22 more](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...e18e319](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (dda0545) into [master](https://codecov.io/gh/apache/arrow/commit/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9?el=desc) (ab5fc97) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.89%   81.92%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52988    53070      +82     
   ==========================================
   + Hits        43392    43478      +86     
   + Misses       9596     9592       -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (ø)` | |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.72% <100.00%> (+0.24%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `95.37% <100.00%> (+1.25%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [ab5fc97...dda0545](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761773323


   I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and are planning to minimize entropy by postponing merging  changes not critical for the release until the process was complete. I hope the process is complete in the next few days. There is more [discussion](https://lists.apache.org/list.html?dev@arrow.apache.org) in the mailing list 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763561370


   Roger -- I'll plan to merge it later today (after the current round of merges have gotten through CI)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9215:
URL: https://github.com/apache/arrow/pull/9215


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766288246


   I've rebased locally and will be running benchmarks overnight.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-764164139


   Rebased.  But I didn't hit the merge conflicts during...  I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to that function just to be safe.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766153344


   The 10% performance impact looks like mostly improvements:
   ```
   critcmp master-499b6d0c array-slice-78f360b7c -t 10
   group                      array-slice-78f360b7c                  master-499b6d0c
   -----                      ---------------------                  ---------------
   add 512                    1.00    282.9±0.41ns        ? B/sec    1.37    387.7±0.71ns        ? B/sec
   add_nulls_512              1.00    344.7±0.54ns        ? B/sec    1.30    449.3±1.08ns        ? B/sec
   bench_primitive            1.42      3.1±0.00ms  1279.3 MB/sec    1.00      2.2±0.01ms  1815.1 MB/sec
   buffer_bit_ops or          1.00    590.9±0.95ns        ? B/sec    1.14    673.4±0.93ns        ? B/sec
   cast int64 to int32 512    1.15      2.5±0.02µs        ? B/sec    1.00      2.2±0.01µs        ? B/sec
   equal_bool_512             1.11     43.1±0.08ns        ? B/sec    1.00     38.8±0.06ns        ? B/sec
   like_utf8 scalar equals    1.00     72.5±0.88µs        ? B/sec    1.22     88.4±0.18µs        ? B/sec
   max nulls 512              1.00   1758.9±6.23ns        ? B/sec    1.31      2.3±0.01µs        ? B/sec
   min string 512             1.00      3.4±0.00µs        ? B/sec    1.32      4.5±0.07µs        ? B/sec
   multiply 512               1.00    335.6±0.67ns        ? B/sec    1.19    398.6±5.74ns        ? B/sec
   mutable str 1024           1.00      2.7±0.00ms        ? B/sec    1.14      3.0±0.01ms        ? B/sec
   subtract 512               1.00    345.3±1.26ns        ? B/sec    1.12    385.0±0.77ns        ? B/sec
   take bool nulls 1024       1.15      4.2±0.03µs        ? B/sec    1.00      3.7±0.02µs        ? B/sec
   ```
   I don't believe that the list/binary/string array would be involved in most of the benchmarks  listed here? (multiply, subtract, take bool, add, add_nulls, bench_primitive should be operations on boolean/primitive arrays, which have no functional changes)...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I agree, that we want to guide API users to switch to values().  My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit).  That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)",  it could be a papercut for macro-style invocation.  
   
   (There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with)
   
   I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len().  The pretty printing logic probably would do additional bounds checks now, as it iterates based on  len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
   
   I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I agree, that we want to guide API users to switch to values().  My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit).  That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)".  Making macro style invocation sometimes-need-unsafe is a bit unfortunate.
   
   (There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with... most of which somehow relate to removing direct calls to __Array.value(i) )
   
   I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len().  The pretty printing logic probably would do additional bounds checks now, as it iterates based on  len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
   
   I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562999845



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       #9291 is good progress towards eliminating use of the function.
   
   And certainly, we could 'split' the macro for primitives as a quick fix to get rid of the call to the function.  I've been experimenting with an alternative approach that might be a bit more flexible to multiple use cases, described at the bottom of this comment.
   
   I am quite torn about whether I think value should or should not be in the interface.
   # Reasons to drop value(i) -> T::Native
   I think that even if value(i) was dropped from the PrimitiveArray impl's, efficient random access to items without a bounds check can still be achieved through ```unsafe{*primitive_array.values().get_unchecked(i)}```  (the extra * because get_unchecked() returns a ref to the value).
   
   I'm not sure I have any example code or measurements to demonstrate it on hand, but I am certain I saw the silently-unsafe implementation```x.values().iter().zip(y.values().iter())``` did (slightly) outperform ```(0..x.len()).map(|i|{x.value(i),y.value(i)}```.  I believe it was when I was playing with non-simd arithmetic kernels....  So that is the root of my hesitancy, is I'm worried it doesn't actually escape any overhead, and unintentionally lead people away from a more reliable/performant way.  IF there is a context where unsafe{x.value(i)} beats the performance of unsafe{*x.values().get_unchecked(i)}
   
   # Reasons to keep value(i) -> T::Native
   All other array implementations have value functions as far as I recall, so it is a nice 'consistency'.  
   
   In the back of my mind, the biggest argument to keep value(i) is for api consistency... so long term, a 'trait' may be the place where it might fit best?  Very roughly, I'm thinking:
   ```
   trait TypedArrowArray : ArrowArray {
      type RefType;
      fn is_valid(i:usize) -> bool; //bounds check
      unsafe fn is_valid(i:usize) -> bool; //no bounds check
      fn value(i:usize) -> RefType;  //bounds check
      unsafe fn value_unchecked(i:usize) -> RefType; //no bounds checked
      fn iter() -> impl Iterator<Option<RefType>>;
      fn iter_values() -> impl Iterator<RefType>;
   }
   impl <T: ArrowPrimitiveType> TypedArrowArray<&T::Native> for PrimitiveArray<T> { ... }
   impl TypedArrowArray<ArrayRef> for GenericListArray<T> { ... }
   //and similar for string/binary. ... I am not sure whether struct arrays could fit... Dictionary would not give access to 'keys', only to the values referenced by each key?  Union would require some kind of RefType that can downcast into the actual value?
   ```
   Of course, I am uncertain how much overhead the 'standarization' such a trait impl implies would bring...  would any kernels actually benefit from using generic implementations against such an api, or will they always go down to the concrete type to squeeze little short-cuts out that don't fit in the generic interface?  I'm unsure, so very (very, very) slowly experimenting...
   
   # Summary
   
   So in short, my thoughts are:
   * I think that leaving value(i) safety consideration out of this PR makes sense.  I've rebased to drop that out - although I did leave the additional values() test code.
   * Marking it unsafe in the near future is absolutely better than being silently-unsafe.  The argument that adding bounds-checks could silently impact external users is reasonable, taking unsafe has the larger 'warning' so that the change isn't missed.
   * Longer term, the options of deprecating it, or explicitly moving it into an trait impl are both contenders in my mind... but neither option is directly relevant to this PR.
   
   Let me know if that seems reasonable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562406187



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       That makes sense.  I'll rebase dropping that commit.

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       safety overhead on a few benchmarks seems to be 15-30%
   ```
   group                           master-                                safe-prim-                             unsafe-prim-
   -----                           -------                                ----------                             ------------
   add 512                         1.00    307.4±4.79ns        ? B/sec    1.07    327.7±6.48ns        ? B/sec    1.30    400.5±7.09ns        ? B/sec
   add_nulls_512                   1.20    447.8±8.09ns        ? B/sec    1.32    490.9±3.41ns        ? B/sec    1.00    372.2±5.96ns        ? B/sec
   buffer_bit_ops or               1.17   731.0±20.02ns        ? B/sec    1.19    743.7±5.43ns        ? B/sec    1.00   624.0±10.86ns        ? B/sec
   cast date64 to date32 512       1.00      7.4±0.22µs        ? B/sec    1.10      8.2±0.07µs        ? B/sec    1.03      7.7±0.22µs        ? B/sec
   cast i64 to string 512          1.00     24.8±0.33µs        ? B/sec    1.09     27.1±0.79µs        ? B/sec    1.10     27.3±0.41µs        ? B/sec
   cast time32s to time32ms 512    1.00   917.0±19.20ns        ? B/sec    1.21  1113.0±22.04ns        ? B/sec    1.07    984.5±7.53ns        ? B/sec
   equal_512                       1.00     44.1±0.81ns        ? B/sec    1.13     49.9±0.65ns        ? B/sec    1.01     44.4±0.27ns        ? B/sec
   min nulls string 512            1.04      6.4±0.12µs        ? B/sec    1.14      7.0±0.15µs        ? B/sec    1.00      6.2±0.09µs        ? B/sec
   multiply 512                    1.14   471.5±10.15ns        ? B/sec    1.18    487.0±4.72ns        ? B/sec    1.00    413.4±6.16ns        ? B/sec
   not                             1.00  1102.3±14.55ns        ? B/sec    1.12  1235.9±19.91ns        ? B/sec    1.02  1120.3±19.83ns        ? B/sec
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```rust
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   
   For that, we need to mark `array.value(i)` as `unsafe` to indicate that yes, you can use that method, yes, it is fast, _but_ you need to be careful about `i`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563168268



##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
 }
 
 impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
-    /// Returns the offset for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    #[inline]
-    pub fn value_offset(&self, i: usize) -> OffsetSize {
-        self.value_offset_at(self.data.offset() + i)
-    }
-
     /// Returns the length for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
     #[inline]
-    pub fn value_length(&self, mut i: usize) -> OffsetSize {
-        i += self.data.offset();
-        self.value_offset_at(i + 1) - self.value_offset_at(i)
+    pub fn value_length(&self, i: usize) -> OffsetSize {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
     }
 
-    /// Returns a clone of the value offset buffer
-    pub fn value_offsets(&self) -> Buffer {
-        self.data.buffers()[0].clone()
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[OffsetSize] {
+        // Soundness
+        //     pointer alignment & location is ensured by RawPtrBox
+        //     buffer bounds/offset is ensured by the ArrayData instance.
+        unsafe {
+            std::slice::from_raw_parts(
+                self.value_offsets.as_ptr().add(self.data.offset()),
+                self.len() + 1,
+            )
+        }
     }
 
     /// Returns a clone of the value data buffer
     pub fn value_data(&self) -> Buffer {
         self.data.buffers()[1].clone()
     }
 
-    #[inline]
-    fn value_offset_at(&self, i: usize) -> OffsetSize {
-        unsafe { *self.value_offsets.as_ptr().add(i) }
+    /// Returns the element at index
+    /// # Safety
+    /// caller is responsible for ensuring that index is within the array bounds
+    pub unsafe fn value_unchecked(&self, i: usize) -> &str {
+        let end = self.value_offsets().get_unchecked(i + 1);
+        let start = self.value_offsets().get_unchecked(i);
+
+        // Soundness
+        // pointer alignment & location is ensured by RawPtrBox
+        // buffer bounds/offset is ensured by the value_offset invariants
+        // ISSUE: utf-8 well formedness is not checked
+        let slice = std::slice::from_raw_parts(
+            self.value_data.as_ptr().offset(start.to_isize()),
+            (*end - *start).to_usize().unwrap(),
+        );
+        std::str::from_utf8_unchecked(slice)
     }
 
     /// Returns the element at index `i` as &str
     pub fn value(&self, i: usize) -> &str {
         assert!(i < self.data.len(), "StringArray out of bounds access");
-        let offset = i.checked_add(self.data.offset()).unwrap();
+        let end = self.value_offsets()[i + 1];
+        let start = self.value_offsets()[i];

Review comment:
       switching to unsafe unsafe access here would help the string benchmarks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563166348



##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
 }
 
 impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
-    /// Returns the offset for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    #[inline]
-    pub fn value_offset(&self, i: usize) -> OffsetSize {
-        self.value_offset_at(self.data.offset() + i)
-    }
-
     /// Returns the length for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
     #[inline]
-    pub fn value_length(&self, mut i: usize) -> OffsetSize {
-        i += self.data.offset();
-        self.value_offset_at(i + 1) - self.value_offset_at(i)
+    pub fn value_length(&self, i: usize) -> OffsetSize {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
     }
 
-    /// Returns a clone of the value offset buffer
-    pub fn value_offsets(&self) -> Buffer {
-        self.data.buffers()[0].clone()
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[OffsetSize] {

Review comment:
       I had considered that, but was concerned that it may be confused with the buffer.offset... my hope was that it'd make it clear that it's an "offset into values()" (not into the buffer, which has the additional array-slicing offset).
   
   I could go either way with it though?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (47086b8) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.61%   81.64%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       51867    51950      +83     
   ==========================================
   + Hits        42329    42415      +86     
   + Misses       9538     9535       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.90% <100.00%> (+0.80%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.89% <100.00%> (+0.31%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `90.62% <100.00%> (+1.73%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.89% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.97% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.93% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [eaa7b7a...47086b8](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763555725


   Ready to merge. Note that it has some backward incompatible changes, so PRs on top need to be merged carefully.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```rust
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```rust
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   
   For that, we need to mark `array.value(i)` as `unsafe` to indicate that yes, you can use that method, yes, it is fast, _but_ you need to be careful about `i`.

##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I think that we may need to park this change until we migrate our code base to use `values()` whenever possible. 
   
   If we merge this one, we get a major performance degradation. If we add `unsafe`, we need to add `unsafe` in a lot of places. Neither are great options :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-767712832


   @tyrelr , could you rebase against master?
   
   @kszucs , something may have happened to the force push: the commits seem funny.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (5cd8075) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.64%   81.68%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52489    52572      +83     
   ==========================================
   + Hits        42857    42945      +88     
   + Misses       9632     9627       -5     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `84.84% <100.00%> (+1.62%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.76% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.51% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [a0e1244...9ca08d7](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-767753398


   Could you please rebase on top of master?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562862070



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I think that I covered all of those in #9291 . Good that there is no impact on `sort`, which was the only case where I could not optimize (i.e. we really needed the bound check).
   
   The larger task atm is to convert all unit-tests that use `value(i)` to `values()[i]`.
   
   Fyi, the reason I propose keeping `value(i)` as `unsafe` instead of un-performant is that there is no way to efficiently access the value `i` without knowing the internals of the primitiveArray (i.e. pointer and offset). imo this method does serve the purpose of abstracting out those details. It just happens to be `unsafe` and we currently not marking it as such. However, please say if you disagree here.
   
   wrt to the `comparison kernel`, we can create a generic for primitives alone for that use-case, no?
   
   Let me know what do you think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766158069


   @tyrelr , I believe that you may have to run the benches against the latest master vs your branch rebased, for consistency: 499b6d0c is before a significant change to the mutable buffer's performance. E.g. this code does not touch `add`, but there is a 40% difference in the benches (consistent with the PR that optimizes that kernel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563210211



##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
 }
 
 impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
-    /// Returns the offset for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    #[inline]
-    pub fn value_offset(&self, i: usize) -> OffsetSize {
-        self.value_offset_at(self.data.offset() + i)
-    }
-
     /// Returns the length for the element at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
     #[inline]
-    pub fn value_length(&self, mut i: usize) -> OffsetSize {
-        i += self.data.offset();
-        self.value_offset_at(i + 1) - self.value_offset_at(i)
+    pub fn value_length(&self, i: usize) -> OffsetSize {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
     }
 
-    /// Returns a clone of the value offset buffer
-    pub fn value_offsets(&self) -> Buffer {
-        self.data.buffers()[0].clone()
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[OffsetSize] {

Review comment:
       I'm fine with removing the `value` part - we should do it in another PR though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763554846


   @jorgecarleitao  do you think this PR is ready to merge? Does it need any additional review or rebasing?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (5cd8075) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.64%   81.68%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52489    52572      +83     
   ==========================================
   + Hits        42857    42945      +88     
   + Misses       9632     9627       -5     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `84.84% <100.00%> (+1.62%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.76% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.51% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [a0e1244...9ca08d7](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (78f360b) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.23%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.64%   81.88%   +0.23%     
   ==========================================
     Files         215      215              
     Lines       52419    53031     +612     
   ==========================================
   + Hits        42798    43422     +624     
   + Misses       9621     9609      -12     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (+1.62%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.68% <0.00%> (-2.51%)` | :arrow_down: |
   | ... and [21 more](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...e18e319](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766361051


   I'm still seeing a mix of inconsistent performance hits/bumps after the rebase.
   ```
   critcmp master-67d0c2e38 array-slice-83b8938af -t 10
   group                        array-slice-83b8938af                  master-67d0c2e38
   -----                        ---------------------                  ----------------
   array_slice 512              1.11    127.8±0.36ns        ? B/sec    1.00    115.4±0.15ns        ? B/sec
   cast int64 to int32 512      1.00      2.3±0.00µs        ? B/sec    1.11      2.5±0.01µs        ? B/sec
   concat i32 1024              1.00      2.5±0.00µs        ? B/sec    1.19      3.0±0.01µs        ? B/sec
   equal_512                    1.14     46.7±0.18ns        ? B/sec    1.00     41.1±0.05ns        ? B/sec
   like_utf8 scalar complex     1.00   1075.9±2.54µs        ? B/sec    1.16   1249.6±2.29µs        ? B/sec
   like_utf8 scalar equals      1.00     70.8±0.08µs        ? B/sec    1.25     88.2±0.12µs        ? B/sec
   min nulls string 512         1.14      6.5±0.04µs        ? B/sec    1.00      5.7±0.03µs        ? B/sec
   min string 512               1.00      3.4±0.00µs        ? B/sec    1.28      4.3±0.01µs        ? B/sec
   multiply 512                 1.38    346.6±0.37ns        ? B/sec    1.00    251.2±0.49ns        ? B/sec
   nlike_utf8 scalar complex    1.00   1162.1±1.30µs        ? B/sec    1.12   1300.7±1.45µs        ? B/sec
   take bool nulls 1024         1.00      3.7±0.03µs        ? B/sec    1.38      5.1±0.03µs        ? B/sec
   take bool nulls 512          1.00   1720.3±8.69ns        ? B/sec    1.42      2.5±0.02µs        ? B/sec
   take i32 512                 1.11   1023.5±1.61ns        ? B/sec    1.00    918.6±1.07ns        ? B/sec
   take i32 nulls 512           1.00    989.6±1.31ns        ? B/sec    1.10   1089.6±2.06ns        ? B/sec
   ```
   I won't push my rebase up unless we decide on some further tweaks to make, as it doesn't seem worth forcing a re-review since there were no conflicts.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I agree, that we want to guide API users to switch to values().  My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit).  That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)",  it could be a papercut for macro-style invocation.  
   
   (There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with)
   
   I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len().  The pretty printing logic probably would do additional bounds checks now, as it iterates based on  len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
   
   I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563020240



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Just looking over the benchmarks again, one of the biggest changes is actually a performance improvement while doing safe access... which makes very little sense... so I think I'll run the benchmarks again overnight.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-764164139


   Rebased.  But I didn't hit the merge conflicts during...  I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to that function just to be safe.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
   
   My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
   
   I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
   
   Think 
   
   ```rust
   for i in 0..array.len() {
       if array.value(i) > 2 {
           // do x
       };
   }
   ```
   this loop will suffer a lot from this change.
   
   we would like users to change it to 
   
   ```rust
   array.values().for_each(|value| {
       if value > 2 {
           // do x
       };
   });
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766105862


   current 10% perf impact looks like:
   ```
   critcmp master-499b6d0c array-slice-41c29bc4 -t 10
   group                            array-slice-41c29bc4                   master-499b6d0c
   -----                            --------------------                   ---------------
   add_nulls_512                    1.00    349.6±1.41ns        ? B/sec    1.30    452.9±1.68ns        ? B/sec
   array_from_vec 128               1.17    466.8±0.76ns        ? B/sec    1.00    399.4±1.21ns        ? B/sec
   bench_primitive                  1.38      3.1±0.01ms  1277.6 MB/sec    1.00      2.3±0.02ms  1766.1 MB/sec
   concat i32 1024                  1.21      3.0±0.00µs        ? B/sec    1.00      2.5±0.01µs        ? B/sec
   equal_string_512                 1.00    111.7±0.34ns        ? B/sec    1.17    131.2±0.77ns        ? B/sec
   from_slice                       1.64    835.2±1.80µs        ? B/sec    1.00    510.7±0.77µs        ? B/sec
   like_utf8 scalar ends with       1.11    366.5±1.52µs        ? B/sec    1.00    331.3±0.62µs        ? B/sec
   like_utf8 scalar equals          1.20    105.6±0.19µs        ? B/sec    1.00     88.2±0.17µs        ? B/sec
   nlike_utf8 scalar starts with    1.13    455.3±2.89µs        ? B/sec    1.00    403.1±3.25µs        ? B/sec
   subtract 512                     1.20    453.0±0.88ns        ? B/sec    1.00    376.1±6.67ns        ? B/sec
   take i32 nulls 512               1.13   1120.3±3.11ns        ? B/sec    1.00    993.1±1.18ns        ? B/sec
   ```
   
   from_slice should just be noise (no changes to buffer)
   the operations on primitives should just be noise (no changes to primitive array)
   But the 'string' functions in particular stand out to me...
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761618600


   https://issues.apache.org/jira/browse/ARROW-11270


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
     }
 
     /// Returns the primitive value at index `i`.
-    ///
-    /// Note this doesn't do any bound checking, for performance reason.
-    /// # Safety
-    /// caller must ensure that the passed in offset is less than the array len()
+    #[inline]
     pub fn value(&self, i: usize) -> T::Native {
-        let offset = i + self.offset();
-        unsafe { *self.raw_values.as_ptr().add(offset) }
+        self.values()[i]

Review comment:
       I agree, that we want to guide API users to switch to values().  My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit).  That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)".  Making macro style invocation sometimes-need-unsafe is a bit unfortunate.
   
   (There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with... most of which somehow relate to removing direct calls to __Array.value(i) )
   
   I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len().  The pretty printing logic probably would do additional bounds checks now, as it iterates based on  len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
   
   I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
   > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (41c29bc) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9215      +/-   ##
   ==========================================
   + Coverage   81.64%   81.68%   +0.03%     
   ==========================================
     Files         215      215              
     Lines       52419    52501      +82     
   ==========================================
   + Hits        42798    42885      +87     
   + Misses       9621     9616       -5     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.99% <100.00%> (ø)` | |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...41c29bc](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org