You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 19:17:02 UTC
[GitHub] [arrow] tyrelr opened a new pull request #9215: ARROW-11270: [Rust] Array slice accessors
tyrelr opened a new pull request #9215:
URL: https://github.com/apache/arrow/pull/9215
Using an approach similar to ARROW-10989, migrate typed array API's to use slices where they can.
This impacts the API of GenericBinaryArray<>, GenericListArray<>, GenericStringArray<>
This also enables bounds checking in every value() function on each of the above arrays, as well as PrimitiveArray<>.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563153376
##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
}
impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
- /// Returns the offset for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- #[inline]
- pub fn value_offset(&self, i: usize) -> OffsetSize {
- self.value_offset_at(self.data.offset() + i)
- }
-
/// Returns the length for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
#[inline]
- pub fn value_length(&self, mut i: usize) -> OffsetSize {
- i += self.data.offset();
- self.value_offset_at(i + 1) - self.value_offset_at(i)
+ pub fn value_length(&self, i: usize) -> OffsetSize {
+ let offsets = self.value_offsets();
+ offsets[i + 1] - offsets[i]
}
- /// Returns a clone of the value offset buffer
- pub fn value_offsets(&self) -> Buffer {
- self.data.buffers()[0].clone()
+ /// Returns the offset values in the offsets buffer
+ #[inline]
+ pub fn value_offsets(&self) -> &[OffsetSize] {
Review comment:
what do you think of naming it simply `offsets`? I can't see the value of `value_` on the name (same for the other types). Since we are touching the API in backward incompatible ways... xD
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (e18e319) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.23%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.64% 81.88% +0.23%
==========================================
Files 215 215
Lines 52419 53031 +612
==========================================
+ Hits 42798 43423 +625
+ Misses 9621 9608 -13
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (+1.62%)` | :arrow_up: |
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.68% <0.00%> (-2.51%)` | :arrow_down: |
| ... and [22 more](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...e18e319](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (dda0545) into [master](https://codecov.io/gh/apache/arrow/commit/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9?el=desc) (ab5fc97) will **increase** coverage by `0.03%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.89% 81.92% +0.03%
==========================================
Files 215 215
Lines 52988 53070 +82
==========================================
+ Hits 43392 43478 +86
+ Misses 9596 9592 -4
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (ø)` | |
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.72% <100.00%> (+0.24%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `95.37% <100.00%> (+1.25%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [ab5fc97...dda0545](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761773323
I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and are planning to minimize entropy by postponing merging changes not critical for the release until the process was complete. I hope the process is complete in the next few days. There is more [discussion](https://lists.apache.org/list.html?dev@arrow.apache.org) in the mailing list
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763561370
Roger -- I'll plan to merge it later today (after the current round of merges have gotten through CI)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9215:
URL: https://github.com/apache/arrow/pull/9215
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766288246
I've rebased locally and will be running benchmarks overnight.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-764164139
Rebased. But I didn't hit the merge conflicts during... I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to that function just to be safe.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766153344
The 10% performance impact looks like mostly improvements:
```
critcmp master-499b6d0c array-slice-78f360b7c -t 10
group array-slice-78f360b7c master-499b6d0c
----- --------------------- ---------------
add 512 1.00 282.9±0.41ns ? B/sec 1.37 387.7±0.71ns ? B/sec
add_nulls_512 1.00 344.7±0.54ns ? B/sec 1.30 449.3±1.08ns ? B/sec
bench_primitive 1.42 3.1±0.00ms 1279.3 MB/sec 1.00 2.2±0.01ms 1815.1 MB/sec
buffer_bit_ops or 1.00 590.9±0.95ns ? B/sec 1.14 673.4±0.93ns ? B/sec
cast int64 to int32 512 1.15 2.5±0.02µs ? B/sec 1.00 2.2±0.01µs ? B/sec
equal_bool_512 1.11 43.1±0.08ns ? B/sec 1.00 38.8±0.06ns ? B/sec
like_utf8 scalar equals 1.00 72.5±0.88µs ? B/sec 1.22 88.4±0.18µs ? B/sec
max nulls 512 1.00 1758.9±6.23ns ? B/sec 1.31 2.3±0.01µs ? B/sec
min string 512 1.00 3.4±0.00µs ? B/sec 1.32 4.5±0.07µs ? B/sec
multiply 512 1.00 335.6±0.67ns ? B/sec 1.19 398.6±5.74ns ? B/sec
mutable str 1024 1.00 2.7±0.00ms ? B/sec 1.14 3.0±0.01ms ? B/sec
subtract 512 1.00 345.3±1.26ns ? B/sec 1.12 385.0±0.77ns ? B/sec
take bool nulls 1024 1.15 4.2±0.03µs ? B/sec 1.00 3.7±0.02µs ? B/sec
```
I don't believe that the list/binary/string array would be involved in most of the benchmarks listed here? (multiply, subtract, take bool, add, add_nulls, bench_primitive should be operations on boolean/primitive arrays, which have no functional changes)...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I agree, that we want to guide API users to switch to values(). My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit). That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)", it could be a papercut for macro-style invocation.
(There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with)
I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len(). The pretty printing logic probably would do additional bounds checks now, as it iterates based on len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I agree, that we want to guide API users to switch to values(). My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit). That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)". Making macro style invocation sometimes-need-unsafe is a bit unfortunate.
(There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with... most of which somehow relate to removing direct calls to __Array.value(i) )
I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len(). The pretty printing logic probably would do additional bounds checks now, as it iterates based on len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562999845
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
#9291 is good progress towards eliminating use of the function.
And certainly, we could 'split' the macro for primitives as a quick fix to get rid of the call to the function. I've been experimenting with an alternative approach that might be a bit more flexible to multiple use cases, described at the bottom of this comment.
I am quite torn about whether I think value should or should not be in the interface.
# Reasons to drop value(i) -> T::Native
I think that even if value(i) was dropped from the PrimitiveArray impl's, efficient random access to items without a bounds check can still be achieved through ```unsafe{*primitive_array.values().get_unchecked(i)}``` (the extra * because get_unchecked() returns a ref to the value).
I'm not sure I have any example code or measurements to demonstrate it on hand, but I am certain I saw the silently-unsafe implementation```x.values().iter().zip(y.values().iter())``` did (slightly) outperform ```(0..x.len()).map(|i|{x.value(i),y.value(i)}```. I believe it was when I was playing with non-simd arithmetic kernels.... So that is the root of my hesitancy, is I'm worried it doesn't actually escape any overhead, and unintentionally lead people away from a more reliable/performant way. IF there is a context where unsafe{x.value(i)} beats the performance of unsafe{*x.values().get_unchecked(i)}
# Reasons to keep value(i) -> T::Native
All other array implementations have value functions as far as I recall, so it is a nice 'consistency'.
In the back of my mind, the biggest argument to keep value(i) is for api consistency... so long term, a 'trait' may be the place where it might fit best? Very roughly, I'm thinking:
```
trait TypedArrowArray : ArrowArray {
type RefType;
fn is_valid(i:usize) -> bool; //bounds check
unsafe fn is_valid(i:usize) -> bool; //no bounds check
fn value(i:usize) -> RefType; //bounds check
unsafe fn value_unchecked(i:usize) -> RefType; //no bounds checked
fn iter() -> impl Iterator<Option<RefType>>;
fn iter_values() -> impl Iterator<RefType>;
}
impl <T: ArrowPrimitiveType> TypedArrowArray<&T::Native> for PrimitiveArray<T> { ... }
impl TypedArrowArray<ArrayRef> for GenericListArray<T> { ... }
//and similar for string/binary. ... I am not sure whether struct arrays could fit... Dictionary would not give access to 'keys', only to the values referenced by each key? Union would require some kind of RefType that can downcast into the actual value?
```
Of course, I am uncertain how much overhead the 'standarization' such a trait impl implies would bring... would any kernels actually benefit from using generic implementations against such an api, or will they always go down to the concrete type to squeeze little short-cuts out that don't fit in the generic interface? I'm unsure, so very (very, very) slowly experimenting...
# Summary
So in short, my thoughts are:
* I think that leaving value(i) safety consideration out of this PR makes sense. I've rebased to drop that out - although I did leave the additional values() test code.
* Marking it unsafe in the near future is absolutely better than being silently-unsafe. The argument that adding bounds-checks could silently impact external users is reasonable, taking unsafe has the larger 'warning' so that the change isn't missed.
* Longer term, the options of deprecating it, or explicitly moving it into an trait impl are both contenders in my mind... but neither option is directly relevant to this PR.
Let me know if that seems reasonable.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562406187
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
That makes sense. I'll rebase dropping that commit.
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
safety overhead on a few benchmarks seems to be 15-30%
```
group master- safe-prim- unsafe-prim-
----- ------- ---------- ------------
add 512 1.00 307.4±4.79ns ? B/sec 1.07 327.7±6.48ns ? B/sec 1.30 400.5±7.09ns ? B/sec
add_nulls_512 1.20 447.8±8.09ns ? B/sec 1.32 490.9±3.41ns ? B/sec 1.00 372.2±5.96ns ? B/sec
buffer_bit_ops or 1.17 731.0±20.02ns ? B/sec 1.19 743.7±5.43ns ? B/sec 1.00 624.0±10.86ns ? B/sec
cast date64 to date32 512 1.00 7.4±0.22µs ? B/sec 1.10 8.2±0.07µs ? B/sec 1.03 7.7±0.22µs ? B/sec
cast i64 to string 512 1.00 24.8±0.33µs ? B/sec 1.09 27.1±0.79µs ? B/sec 1.10 27.3±0.41µs ? B/sec
cast time32s to time32ms 512 1.00 917.0±19.20ns ? B/sec 1.21 1113.0±22.04ns ? B/sec 1.07 984.5±7.53ns ? B/sec
equal_512 1.00 44.1±0.81ns ? B/sec 1.13 49.9±0.65ns ? B/sec 1.01 44.4±0.27ns ? B/sec
min nulls string 512 1.04 6.4±0.12µs ? B/sec 1.14 7.0±0.15µs ? B/sec 1.00 6.2±0.09µs ? B/sec
multiply 512 1.14 471.5±10.15ns ? B/sec 1.18 487.0±4.72ns ? B/sec 1.00 413.4±6.16ns ? B/sec
not 1.00 1102.3±14.55ns ? B/sec 1.12 1235.9±19.91ns ? B/sec 1.02 1120.3±19.83ns ? B/sec
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```rust
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
For that, we need to mark `array.value(i)` as `unsafe` to indicate that yes, you can use that method, yes, it is fast, _but_ you need to be careful about `i`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563168268
##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
}
impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
- /// Returns the offset for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- #[inline]
- pub fn value_offset(&self, i: usize) -> OffsetSize {
- self.value_offset_at(self.data.offset() + i)
- }
-
/// Returns the length for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
#[inline]
- pub fn value_length(&self, mut i: usize) -> OffsetSize {
- i += self.data.offset();
- self.value_offset_at(i + 1) - self.value_offset_at(i)
+ pub fn value_length(&self, i: usize) -> OffsetSize {
+ let offsets = self.value_offsets();
+ offsets[i + 1] - offsets[i]
}
- /// Returns a clone of the value offset buffer
- pub fn value_offsets(&self) -> Buffer {
- self.data.buffers()[0].clone()
+ /// Returns the offset values in the offsets buffer
+ #[inline]
+ pub fn value_offsets(&self) -> &[OffsetSize] {
+ // Soundness
+ // pointer alignment & location is ensured by RawPtrBox
+ // buffer bounds/offset is ensured by the ArrayData instance.
+ unsafe {
+ std::slice::from_raw_parts(
+ self.value_offsets.as_ptr().add(self.data.offset()),
+ self.len() + 1,
+ )
+ }
}
/// Returns a clone of the value data buffer
pub fn value_data(&self) -> Buffer {
self.data.buffers()[1].clone()
}
- #[inline]
- fn value_offset_at(&self, i: usize) -> OffsetSize {
- unsafe { *self.value_offsets.as_ptr().add(i) }
+ /// Returns the element at index
+ /// # Safety
+ /// caller is responsible for ensuring that index is within the array bounds
+ pub unsafe fn value_unchecked(&self, i: usize) -> &str {
+ let end = self.value_offsets().get_unchecked(i + 1);
+ let start = self.value_offsets().get_unchecked(i);
+
+ // Soundness
+ // pointer alignment & location is ensured by RawPtrBox
+ // buffer bounds/offset is ensured by the value_offset invariants
+ // ISSUE: utf-8 well formedness is not checked
+ let slice = std::slice::from_raw_parts(
+ self.value_data.as_ptr().offset(start.to_isize()),
+ (*end - *start).to_usize().unwrap(),
+ );
+ std::str::from_utf8_unchecked(slice)
}
/// Returns the element at index `i` as &str
pub fn value(&self, i: usize) -> &str {
assert!(i < self.data.len(), "StringArray out of bounds access");
- let offset = i.checked_add(self.data.offset()).unwrap();
+ let end = self.value_offsets()[i + 1];
+ let start = self.value_offsets()[i];
Review comment:
switching to unsafe unsafe access here would help the string benchmarks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563166348
##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
}
impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
- /// Returns the offset for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- #[inline]
- pub fn value_offset(&self, i: usize) -> OffsetSize {
- self.value_offset_at(self.data.offset() + i)
- }
-
/// Returns the length for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
#[inline]
- pub fn value_length(&self, mut i: usize) -> OffsetSize {
- i += self.data.offset();
- self.value_offset_at(i + 1) - self.value_offset_at(i)
+ pub fn value_length(&self, i: usize) -> OffsetSize {
+ let offsets = self.value_offsets();
+ offsets[i + 1] - offsets[i]
}
- /// Returns a clone of the value offset buffer
- pub fn value_offsets(&self) -> Buffer {
- self.data.buffers()[0].clone()
+ /// Returns the offset values in the offsets buffer
+ #[inline]
+ pub fn value_offsets(&self) -> &[OffsetSize] {
Review comment:
I had considered that, but was concerned that it may be confused with the buffer.offset... my hope was that it'd make it clear that it's an "offset into values()" (not into the buffer, which has the additional array-slicing offset).
I could go either way with it though?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (47086b8) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **increase** coverage by `0.03%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.61% 81.64% +0.03%
==========================================
Files 215 215
Lines 51867 51950 +83
==========================================
+ Hits 42329 42415 +86
+ Misses 9538 9535 -3
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.90% <100.00%> (+0.80%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.89% <100.00%> (+0.31%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `90.62% <100.00%> (+1.73%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.89% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.97% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.93% <100.00%> (ø)` | |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [eaa7b7a...47086b8](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763555725
Ready to merge. Note that it has some backward incompatible changes, so PRs on top need to be merged carefully.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```rust
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```rust
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
For that, we need to mark `array.value(i)` as `unsafe` to indicate that yes, you can use that method, yes, it is fast, _but_ you need to be careful about `i`.
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I think that we may need to park this change until we migrate our code base to use `values()` whenever possible.
If we merge this one, we get a major performance degradation. If we add `unsafe`, we need to add `unsafe` in a lot of places. Neither are great options :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-767712832
@tyrelr , could you rebase against master?
@kszucs , something may have happened to the force push: the commits seem funny.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (5cd8075) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.03%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.64% 81.68% +0.03%
==========================================
Files 215 215
Lines 52489 52572 +83
==========================================
+ Hits 42857 42945 +88
+ Misses 9632 9627 -5
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `84.84% <100.00%> (+1.62%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.76% <100.00%> (+0.23%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.51% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [a0e1244...9ca08d7](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-767753398
Could you please rebase on top of master?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r562862070
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I think that I covered all of those in #9291 . Good that there is no impact on `sort`, which was the only case where I could not optimize (i.e. we really needed the bound check).
The larger task atm is to convert all unit-tests that use `value(i)` to `values()[i]`.
Fyi, the reason I propose keeping `value(i)` as `unsafe` instead of un-performant is that there is no way to efficiently access the value `i` without knowing the internals of the primitiveArray (i.e. pointer and offset). imo this method does serve the purpose of abstracting out those details. It just happens to be `unsafe` and we currently not marking it as such. However, please say if you disagree here.
wrt to the `comparison kernel`, we can create a generic for primitives alone for that use-case, no?
Let me know what do you think
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766158069
@tyrelr , I believe that you may have to run the benches against the latest master vs your branch rebased, for consistency: 499b6d0c is before a significant change to the mutable buffer's performance. E.g. this code does not touch `add`, but there is a 40% difference in the benches (consistent with the PR that optimizes that kernel.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] sunchao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563210211
##########
File path: rust/arrow/src/array/array_string.rs
##########
@@ -50,49 +50,65 @@ pub struct GenericStringArray<OffsetSize: StringOffsetSizeTrait> {
}
impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
- /// Returns the offset for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- #[inline]
- pub fn value_offset(&self, i: usize) -> OffsetSize {
- self.value_offset_at(self.data.offset() + i)
- }
-
/// Returns the length for the element at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
#[inline]
- pub fn value_length(&self, mut i: usize) -> OffsetSize {
- i += self.data.offset();
- self.value_offset_at(i + 1) - self.value_offset_at(i)
+ pub fn value_length(&self, i: usize) -> OffsetSize {
+ let offsets = self.value_offsets();
+ offsets[i + 1] - offsets[i]
}
- /// Returns a clone of the value offset buffer
- pub fn value_offsets(&self) -> Buffer {
- self.data.buffers()[0].clone()
+ /// Returns the offset values in the offsets buffer
+ #[inline]
+ pub fn value_offsets(&self) -> &[OffsetSize] {
Review comment:
I'm fine with removing the `value` part - we should do it in another PR though.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-763554846
@jorgecarleitao do you think this PR is ready to merge? Does it need any additional review or rebasing?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (5cd8075) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.03%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.64% 81.68% +0.03%
==========================================
Files 215 215
Lines 52489 52572 +83
==========================================
+ Hits 42857 42945 +88
+ Misses 9632 9627 -5
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `84.84% <100.00%> (+1.62%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.76% <100.00%> (+0.23%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.51% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [a0e1244...9ca08d7](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (78f360b) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.23%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.64% 81.88% +0.23%
==========================================
Files 215 215
Lines 52419 53031 +612
==========================================
+ Hits 42798 43422 +624
+ Misses 9621 9609 -12
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `88.61% <100.00%> (+1.62%)` | :arrow_up: |
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.68% <0.00%> (-2.51%)` | :arrow_down: |
| ... and [21 more](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...e18e319](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766361051
I'm still seeing a mix of inconsistent performance hits/bumps after the rebase.
```
critcmp master-67d0c2e38 array-slice-83b8938af -t 10
group array-slice-83b8938af master-67d0c2e38
----- --------------------- ----------------
array_slice 512 1.11 127.8±0.36ns ? B/sec 1.00 115.4±0.15ns ? B/sec
cast int64 to int32 512 1.00 2.3±0.00µs ? B/sec 1.11 2.5±0.01µs ? B/sec
concat i32 1024 1.00 2.5±0.00µs ? B/sec 1.19 3.0±0.01µs ? B/sec
equal_512 1.14 46.7±0.18ns ? B/sec 1.00 41.1±0.05ns ? B/sec
like_utf8 scalar complex 1.00 1075.9±2.54µs ? B/sec 1.16 1249.6±2.29µs ? B/sec
like_utf8 scalar equals 1.00 70.8±0.08µs ? B/sec 1.25 88.2±0.12µs ? B/sec
min nulls string 512 1.14 6.5±0.04µs ? B/sec 1.00 5.7±0.03µs ? B/sec
min string 512 1.00 3.4±0.00µs ? B/sec 1.28 4.3±0.01µs ? B/sec
multiply 512 1.38 346.6±0.37ns ? B/sec 1.00 251.2±0.49ns ? B/sec
nlike_utf8 scalar complex 1.00 1162.1±1.30µs ? B/sec 1.12 1300.7±1.45µs ? B/sec
take bool nulls 1024 1.00 3.7±0.03µs ? B/sec 1.38 5.1±0.03µs ? B/sec
take bool nulls 512 1.00 1720.3±8.69ns ? B/sec 1.42 2.5±0.02µs ? B/sec
take i32 512 1.11 1023.5±1.61ns ? B/sec 1.00 918.6±1.07ns ? B/sec
take i32 nulls 512 1.00 989.6±1.31ns ? B/sec 1.10 1089.6±2.06ns ? B/sec
```
I won't push my rebase up unless we decide on some further tweaks to make, as it doesn't seem worth forcing a re-review since there were no conflicts.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I agree, that we want to guide API users to switch to values(). My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit). That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)", it could be a papercut for macro-style invocation.
(There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with)
I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len(). The pretty printing logic probably would do additional bounds checks now, as it iterates based on len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r563020240
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Just looking over the benchmarks again, one of the biggest changes is actually a performance improvement while doing safe access... which makes very little sense... so I think I'll run the benchmarks again overnight.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-764164139
Rebased. But I didn't hit the merge conflicts during... I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to that function just to be safe.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
Thinking about it, I am not sure we want this: this will have severe implications to almost all uses of `value()`. Could you run the benches before and after this PR?
My suggestion is that we mark this method as `unsafe`, so that it is clear that people need to be conscious about it. Since this will impact a large part of our code base (and likely others), I suggest that we do this change on a separate PR.
I think that we would like to instruct users of this API to switch to `values()`. This change will not trigger that change (as the code continues to run), but will have an amazing performance implication.
Think
```rust
for i in 0..array.len() {
if array.value(i) > 2 {
// do x
};
}
```
this loop will suffer a lot from this change.
we would like users to change it to
```rust
array.values().for_each(|value| {
if value > 2 {
// do x
};
});
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-766105862
current 10% perf impact looks like:
```
critcmp master-499b6d0c array-slice-41c29bc4 -t 10
group array-slice-41c29bc4 master-499b6d0c
----- -------------------- ---------------
add_nulls_512 1.00 349.6±1.41ns ? B/sec 1.30 452.9±1.68ns ? B/sec
array_from_vec 128 1.17 466.8±0.76ns ? B/sec 1.00 399.4±1.21ns ? B/sec
bench_primitive 1.38 3.1±0.01ms 1277.6 MB/sec 1.00 2.3±0.02ms 1766.1 MB/sec
concat i32 1024 1.21 3.0±0.00µs ? B/sec 1.00 2.5±0.01µs ? B/sec
equal_string_512 1.00 111.7±0.34ns ? B/sec 1.17 131.2±0.77ns ? B/sec
from_slice 1.64 835.2±1.80µs ? B/sec 1.00 510.7±0.77µs ? B/sec
like_utf8 scalar ends with 1.11 366.5±1.52µs ? B/sec 1.00 331.3±0.62µs ? B/sec
like_utf8 scalar equals 1.20 105.6±0.19µs ? B/sec 1.00 88.2±0.17µs ? B/sec
nlike_utf8 scalar starts with 1.13 455.3±2.89µs ? B/sec 1.00 403.1±3.25µs ? B/sec
subtract 512 1.20 453.0±0.88ns ? B/sec 1.00 376.1±6.67ns ? B/sec
take i32 nulls 512 1.13 1120.3±3.11ns ? B/sec 1.00 993.1±1.18ns ? B/sec
```
from_slice should just be noise (no changes to buffer)
the operations on primitives should just be noise (no changes to primitive array)
But the 'string' functions in particular stand out to me...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761618600
https://issues.apache.org/jira/browse/ARROW-11270
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#discussion_r561918229
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -86,13 +86,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
}
/// Returns the primitive value at index `i`.
- ///
- /// Note this doesn't do any bound checking, for performance reason.
- /// # Safety
- /// caller must ensure that the passed in offset is less than the array len()
+ #[inline]
pub fn value(&self, i: usize) -> T::Native {
- let offset = i + self.offset();
- unsafe { *self.raw_values.as_ptr().add(offset) }
+ self.values()[i]
Review comment:
I agree, that we want to guide API users to switch to values(). My hesitancy with making/leaving it unsafe is that I think it's the only Array value(x) method when combined with the other changes in this PR (which is why it got added as a followon commit). That leads to awkwardness in macros, such as the comparison kernel, where it effectively calls "$left.value(x)". Making macro style invocation sometimes-need-unsafe is a bit unfortunate.
(There may be an argument that a trait could abstract away the need for macros altogether... but my experiments to rework the comparison kernel to that effect are incomplete - and partly sidelined with various other experiments I'm toying with... most of which somehow relate to removing direct calls to __Array.value(i) )
I would expect that in the exact for loop you mention, the compiler could recognize the satisfied loop bound since both len() and .value(i) are inlinable and use the same underlying field, but I expect it is also easy to complicate on that loop to defeat that by having a less direct relationship between i & array.len(). The pretty printing logic probably would do additional bounds checks now, as it iterates based on len() from the record batch, which is semantically equal, but nothing in the code enforces that in a way I would expect the compiler to understand.
I'll launch some benchmarks of head, master, and the commit just before the PrimitiveArray safety change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9215:
URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report
> Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (41c29bc) into [master](https://codecov.io/gh/apache/arrow/commit/499b6d0c5fbcf7d3aabf25e286cb8c300e6de52a?el=desc) (499b6d0) will **increase** coverage by `0.03%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9215/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9215 +/- ##
==========================================
+ Coverage 81.64% 81.68% +0.03%
==========================================
Files 215 215
Lines 52419 52501 +82
==========================================
+ Hits 42798 42885 +87
+ Misses 9621 9616 -5
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.99% <100.00%> (ø)` | |
| [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `91.75% <100.00%> (+0.76%)` | :arrow_up: |
| [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `93.84% <100.00%> (+0.78%)` | :arrow_up: |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.77% <100.00%> (+0.24%)` | :arrow_up: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `91.66% <100.00%> (+1.66%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `85.10% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.96% <100.00%> (-0.03%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/limit.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xpbWl0LnJz) | `100.00% <100.00%> (ø)` | |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.92% <100.00%> (ø)` | |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9215/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=footer). Last update [499b6d0...41c29bc](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org