You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/20 03:36:49 UTC
[GitHub] [arrow] tyrelr opened a new pull request #8973: Iterate primitive buffers by slice
tyrelr opened a new pull request #8973:
URL: https://github.com/apache/arrow/pull/8973
Iterating slices instead of indexes seems to improve performance of non-simd arithmetic operations.
This adds a new method raw_values_slice to PrimitiveArray (so named to pun off of the raw_values function which returns a pointer). A few of the functions in PrimitiveArray rely on the caller for safety guarantees (bounds-checks for value_slice, value), so should probably be unsafe? But they are used widely, so I thought it simpler to start by adding a safer alternative...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938
I ran out of time today (and probably the next few days), but just as an update... I hit a speedbump looking at removing the .value(...) function.
At a first pass-through at dropping the PrimitiveArray.value() function, I hit a few usecases which are not trivially handled by a typed-slice in a performant way.
1) filter kernel does a batched indexing-like operation based on bits being set in a u64.
This can probably be re-arranged to minimize/eliminate bounds checks.
2) sort & take kernels appear to cherrypick indexes based on another index array
These are tricky. We may be able to minimize the need bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?) but all are adding overhead at a different spot.
3) csv writing & display try to iterate N columns in lock-step
This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column built itself an iterator and map itself to String... The naive & slow approach of editing the current macro to use values()[$i] causes a compilation when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547687817
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}
- /// Returns a raw pointer to the values of this array.
- pub fn raw_values(&self) -> *const T::Native {
- unsafe { self.raw_values.get().add(self.data.offset()) }
- }
-
/// Returns a slice for the given offset and length
///
/// Note this doesn't do any bound checking, for performance reason.
- pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
- let raw =
- unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+ /// # Safety
+ /// caller must ensure that the passed in offset + len are less than the array len()
+ #[deprecated(note = "Please use values() instead")]
+ pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+ let raw = std::slice::from_raw_parts(
+ self.raw_values.get().add(self.data.offset()).add(offset),
+ len,
+ );
+ &raw[..]
+ }
+
+ /// Returns a slice of the values of this array
+ pub fn values(&self) -> &[T::Native] {
Review comment:
`inline`?
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}
- /// Returns a raw pointer to the values of this array.
- pub fn raw_values(&self) -> *const T::Native {
- unsafe { self.raw_values.get().add(self.data.offset()) }
- }
-
/// Returns a slice for the given offset and length
///
/// Note this doesn't do any bound checking, for performance reason.
- pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
- let raw =
- unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+ /// # Safety
+ /// caller must ensure that the passed in offset + len are less than the array len()
+ #[deprecated(note = "Please use values() instead")]
+ pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+ let raw = std::slice::from_raw_parts(
+ self.raw_values.get().add(self.data.offset()).add(offset),
+ len,
+ );
+ &raw[..]
+ }
+
+ /// Returns a slice of the values of this array
+ pub fn values(&self) -> &[T::Native] {
+ let raw = unsafe {
Review comment:
IMO we should justify that this is safe because `from(ArrayDataRef)` asserts that the pointer is aligned with T, etc, as we describe in the `README`.
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}
- /// Returns a raw pointer to the values of this array.
- pub fn raw_values(&self) -> *const T::Native {
- unsafe { self.raw_values.get().add(self.data.offset()) }
- }
-
/// Returns a slice for the given offset and length
///
/// Note this doesn't do any bound checking, for performance reason.
- pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
- let raw =
- unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+ /// # Safety
+ /// caller must ensure that the passed in offset + len are less than the array len()
+ #[deprecated(note = "Please use values() instead")]
+ pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+ let raw = std::slice::from_raw_parts(
+ self.raw_values.get().add(self.data.offset()).add(offset),
+ len,
+ );
+ &raw[..]
+ }
+
+ /// Returns a slice of the values of this array
+ pub fn values(&self) -> &[T::Native] {
+ let raw = unsafe {
+ std::slice::from_raw_parts(
+ self.raw_values.get().add(self.data.offset()),
+ self.len(),
+ )
+ };
&raw[..]
Review comment:
Is this needed here?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748791491
Looks like a nice simplification to me and if it compiles makes sense to integrate it. Do you have any benchmark results @tyrelr ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
tyrelr edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938
I ran out of time today (and probably the next few days), but just as an update... I hit a speedbump looking at removing the .value(...) function.
At a first pass-through at dropping the PrimitiveArray.value() function, I hit a few usecases which are not trivially handled by a typed-slice in a performant way.
1) filter kernel does a batched indexing-like operation based on bits being set in a u64.
This can probably be re-arranged to minimize/eliminate bounds checks.
2) sort & take kernels appear to cherrypick indexes based on another index array
These are tricky. We may be able to minimize the need bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?) but all are adding overhead at a different spot.
3) csv writing & display try to iterate N columns in lock-step
This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column built itself an iterator and map itself to String... The naive & slow approach of editing the current macro to use values()[$i] causes a compilation when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
[Edit: to be clear, I have not actually attempted any of these edits yet, so I do not have real performance numbers, just a gut feeling]
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (2e0aa18) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.52%`.
> The diff coverage is `82.35%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 82.66% -0.53%
==========================================
Files 199 200 +1
Lines 48661 49767 +1106
==========================================
+ Hits 40482 41139 +657
- Misses 8179 8628 +449
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <66.66%> (-0.59%)` | :arrow_down: |
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
| [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
| ... and [39 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...2e0aa18](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (2c823df) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.02%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 83.17% -0.03%
==========================================
Files 199 200 +1
Lines 48661 48939 +278
==========================================
+ Hits 40482 40703 +221
- Misses 8179 8236 +57
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.36% <100.00%> (+0.09%)` | :arrow_up: |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.17% <100.00%> (ø)` | |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/compute/util.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS91dGlsLnJz) | `98.93% <0.00%> (-0.44%)` | :arrow_down: |
| [rust/arrow/src/array/cast.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvY2FzdC5ycw==) | `100.00% <0.00%> (ø)` | |
| [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `0.00% <0.00%> (ø)` | |
| [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `80.45% <0.00%> (ø)` | |
| ... and [7 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...2c823df](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-749581718
Yep, I forgot that simd wasn't a default feature so I didn't test that locally. I'll take a look at that tonight, hopefully. This will touch a few lines already modified by PR #8975, but this change is simple enough it shouldn't cause concern.
@Dandandan I ran master vs. current head over night (twice to try to distinguish environmental performance from code performance). Using critcmp to filter to out anything smaller than a 10% difference, it looks like this:
```
group head-2c823df03-ARROW-10989 head-2c823df03-ARROW-10989-run2 master-ARROW-10989 master-ARROW-10989-run2
----- -------------------------- ------------------------------- ------------------ -----------------------
add 512 1.30 408.5±3.37ns ? B/sec 1.00 315.0±1.97ns ? B/sec 3.46 1088.8±5.62ns ? B/sec 3.64 1146.1±4.42ns ? B/sec
add_nulls_512 1.09 423.8±3.18ns ? B/sec 1.00 389.3±2.58ns ? B/sec 2.98 1158.3±6.43ns ? B/sec 2.95 1149.9±6.95ns ? B/sec
array_from_vec 128 1.08 444.3±3.26ns ? B/sec 1.11 458.3±1.97ns ? B/sec 1.00 411.4±2.09ns ? B/sec 1.05 432.2±2.82ns ? B/sec
bench_primitive 1.00 1109.8±9.80µs 3.5 GB/sec 1.03 1146.2±6.90µs 3.4 GB/sec 3.01 3.3±0.01ms 1198.7 MB/sec 2.94 3.3±0.01ms 1225.9 MB/sec
cast float64 to float32 512 1.01 2.8±0.03µs ? B/sec 1.00 2.8±0.02µs ? B/sec 1.11 3.1±0.03µs ? B/sec 1.00 2.8±0.02µs ? B/sec
cast int32 to int32 512 1.00 26.9±0.38ns ? B/sec 1.00 26.8±0.19ns ? B/sec 0.99 26.8±0.15ns ? B/sec 1.11 30.0±0.16ns ? B/sec
cast time32s to time32ms 512 1.00 965.9±11.12ns ? B/sec 1.05 1012.6±9.16ns ? B/sec 1.75 1687.8±8.08ns ? B/sec 1.68 1621.7±7.75ns ? B/sec
cast time64ns to time32s 512 1.10 11.1±0.12µs ? B/sec 1.00 10.1±0.16µs ? B/sec 1.00 10.1±0.04µs ? B/sec 1.00 10.1±0.11µs ? B/sec
cast timestamp_ms to timestamp_ns 512 1.14 1481.4±11.07ns ? B/sec 1.00 1304.2±10.03ns ? B/sec 1.41 1840.6±9.37ns ? B/sec 1.45 1894.2±10.44ns ? B/sec
divide 512 1.04 1830.0±12.34ns ? B/sec 1.00 1752.3±9.17ns ? B/sec 1.03 1797.1±9.00ns ? B/sec 1.27 2.2±0.01µs ? B/sec
eq scalar Float32 1.00 64.2±0.52µs ? B/sec 1.01 64.9±0.26µs ? B/sec 1.07 68.9±0.29µs ? B/sec 1.11 71.3±0.26µs ? B/sec
filter context f32 low selectivity 1.10 129.4±2.24µs ? B/sec 1.00 117.1±0.63µs ? B/sec 1.01 118.7±0.56µs ? B/sec 1.01 118.7±0.51µs ? B/sec
min nulls 512 1.13 2.1±0.02µs ? B/sec 1.01 1872.1±12.28ns ? B/sec 1.00 1858.1±25.37ns ? B/sec 1.12 2.1±0.02µs ? B/sec
multiply 512 1.00 403.0±2.35ns ? B/sec 1.21 487.6±4.20ns ? B/sec 2.86 1152.5±9.71ns ? B/sec 2.86 1151.0±6.00ns ? B/sec
subtract 512 1.00 385.3±4.18ns ? B/sec 1.09 418.9±2.99ns ? B/sec 3.24 1246.6±6.37ns ? B/sec 3.00 1157.5±6.40ns ? B/sec
take bool nulls 1024 1.02 2.6±0.02µs ? B/sec 1.00 2.6±0.02µs ? B/sec 1.91 4.9±0.05µs ? B/sec 1.91 4.9±0.05µs ? B/sec
take bool nulls 512 1.03 1449.2±18.90ns ? B/sec 1.00 1412.8±11.94ns ? B/sec 1.27 1788.1±14.12ns ? B/sec 1.34 1899.4±32.11ns ? B/sec
take i32 512 1.00 924.2±5.01ns ? B/sec 1.00 922.8±6.28ns ? B/sec 1.11 1022.4±5.66ns ? B/sec 1.00 925.5±9.49ns ? B/sec
take i32 nulls 512 1.09 1067.0±6.65ns ? B/sec 1.00 975.3±5.77ns ? B/sec 1.09 1065.0±5.21ns ? B/sec 1.11 1080.0±6.80ns ? B/sec
take str 1024 1.00 4.6±0.02µs ? B/sec 1.04 4.8±0.03µs ? B/sec 1.14 5.3±0.04µs ? B/sec 1.06 4.9±0.03µs ? B/sec
take str 512 1.00 2.8±0.01µs ? B/sec 1.04 2.9±0.02µs ? B/sec 1.14 3.2±0.03µs ? B/sec 1.01 2.8±0.02µs ? B/sec
take str null indices 512 1.00 2.8±0.01µs ? B/sec 1.04 2.9±0.02µs ? B/sec 1.16 3.2±0.03µs ? B/sec 1.03 2.9±0.02µs ? B/sec
```
A few sum benchmarks are significantly faster (finished in half to a third the time).
I am surprised by a performance increase in take bool nulls 1025/512 and take str null indices 512... I wouldn't expect those to use primitive arrays at all. I'll look into why that changes at the same time I look into fixing the simd compilation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (a8e4f8b) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
> The diff coverage is `81.25%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 82.63% -0.56%
==========================================
Files 199 200 +1
Lines 48661 49726 +1065
==========================================
+ Hits 40482 41093 +611
- Misses 8179 8633 +454
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <62.50%> (-0.59%)` | :arrow_down: |
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
| [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
| ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...be2ea6f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (56c9f3b) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
> The diff coverage is `77.77%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 82.63% -0.56%
==========================================
Files 199 200 +1
Lines 48661 49728 +1067
==========================================
+ Hits 40482 41094 +612
- Misses 8179 8634 +455
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.48% <60.00%> (-0.79%)` | :arrow_down: |
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
| [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
| ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...56c9f3b](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8973: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748558702
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then could you also rename pull request title in the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (b94c46f) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.52%`.
> The diff coverage is `82.35%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 82.66% -0.53%
==========================================
Files 199 200 +1
Lines 48661 49767 +1106
==========================================
+ Hits 40482 41139 +657
- Misses 8179 8628 +449
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.68% <66.66%> (-0.59%)` | :arrow_down: |
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
| [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
| ... and [39 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...b94c46f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748787802
IMO this is already a very good improvement, it is fine to keep the `value` for now. I think that the SIMD is breaking, probably because that also contains some accesses using the old API.
cc @Dandandan and @jhorstmann , that have been touching these APIs
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (4e06899) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.00%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 83.19% -0.01%
==========================================
Files 199 199
Lines 48661 48665 +4
==========================================
+ Hits 40482 40485 +3
- Misses 8179 8180 +1
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `92.32% <100.00%> (+0.05%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...4e06899](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r548092554
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}
- /// Returns a raw pointer to the values of this array.
- pub fn raw_values(&self) -> *const T::Native {
- unsafe { self.raw_values.get().add(self.data.offset()) }
- }
-
/// Returns a slice for the given offset and length
///
/// Note this doesn't do any bound checking, for performance reason.
- pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
- let raw =
- unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+ /// # Safety
+ /// caller must ensure that the passed in offset + len are less than the array len()
+ #[deprecated(note = "Please use values() instead")]
+ pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
+ let raw = std::slice::from_raw_parts(
+ self.raw_values.get().add(self.data.offset()).add(offset),
+ len,
+ );
+ &raw[..]
+ }
+
+ /// Returns a slice of the values of this array
+ pub fn values(&self) -> &[T::Native] {
+ let raw = unsafe {
Review comment:
I also added a note to the raw_values pointer itself, since any future constructor should also make the same guarantees before constructing a struct.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jhorstmann commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547839892
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -63,17 +63,28 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
self.data.is_empty()
}
- /// Returns a raw pointer to the values of this array.
- pub fn raw_values(&self) -> *const T::Native {
- unsafe { self.raw_values.get().add(self.data.offset()) }
- }
-
/// Returns a slice for the given offset and length
///
/// Note this doesn't do any bound checking, for performance reason.
- pub fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
- let raw =
- unsafe { std::slice::from_raw_parts(self.raw_values().add(offset), len) };
+ /// # Safety
+ /// caller must ensure that the passed in offset + len are less than the array len()
+ #[deprecated(note = "Please use values() instead")]
+ pub unsafe fn value_slice(&self, offset: usize, len: usize) -> &[T::Native] {
Review comment:
Since `value_slice` is now only used in the simd comparison kernels, I can remove it later in [ARROW-10990][1].
I think this PR is a nice simplification and also makes the code more safe, thanks!
[1]: https://github.com/apache/arrow/pull/8975
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-750754917
Thank you for your contribution, @tyrelr . FYI, I have not assigned you to the issue on Jira because I can't (@andygrove , how do we do this?), but I think we should.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748571786
Hi @tyrelr , that is a good idea.
I am a reticent in introducing yet another public method to the `PrimitiveArray`, as it is already a bit confusing which one to use (we have `value_slice`, `raw_values`, `values` and `value`).
Originally, I was thinking in solving this mess via an iterator of values, however, you have a better idea: just offer a slice of `T`.
In which case, this PR should be bolder: replace `value_slice`, `raw_values`, `values` and (if you feel confident) `value` by a single method, `values`, whose implementation is what you wrote for `raw_values_slice`.
In other words, instead of introducing yet another endpoint, we simplify the existing API with something simpler (and safer, as many of the methods are actually `unsafe` and cause undefined behavior).
How would that sound to you?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748677658
That makes sense. I'll take a look.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559518
https://issues.apache.org/jira/browse/ARROW-10989
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8973:
URL: https://github.com/apache/arrow/pull/8973
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] removed a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
github-actions[bot] removed a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748558702
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then could you also rename pull request title in the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on a change in pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#discussion_r547954350
##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -451,15 +453,17 @@ where
let rem = len % lanes;
for i in (0..len - rem).step_by(lanes) {
- let simd_left = T::load(left.value_slice(i, lanes));
+ let simd_left = T::load(unsafe { left.value_slice(i, lanes) });
let simd_result = op(simd_left, simd_right);
T::bitmask(&simd_result, |b| {
result.extend_from_slice(b);
});
}
if rem > 0 {
- let simd_left = T::load(left.value_slice(len - rem, lanes));
+ //Soundness
+ // This is not sound because it can read past the end of PrimitiveArray buffer (lanes is always greater than rem), see ARROW-10990
Review comment:
🤣
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #8973: ARROW-10989: [Rust] Iterate primitive buffers by slice
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748559191
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=h1) Report
> Merging [#8973](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=desc) (be2ea6f) into [master](https://codecov.io/gh/apache/arrow/commit/5eb6ce18e110fe6534ab08f158dc0b8a2744e59e?el=desc) (5eb6ce1) will **decrease** coverage by `0.55%`.
> The diff coverage is `77.77%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8973/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #8973 +/- ##
==========================================
- Coverage 83.19% 82.63% -0.56%
==========================================
Files 199 200 +1
Lines 48661 49728 +1067
==========================================
+ Hits 40482 41094 +612
- Misses 8179 8634 +455
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `91.48% <60.00%> (-0.79%)` | :arrow_down: |
| [rust/arrow/src/array/array\_dictionary.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfZGljdGlvbmFyeS5ycw==) | `85.05% <100.00%> (ø)` | |
| [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.11% <100.00%> (-0.07%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `75.00% <100.00%> (+3.57%)` | :arrow_up: |
| [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.83% <100.00%> (+0.04%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `79.34% <0.00%> (-2.68%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/sort.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3NvcnQucnM=) | `93.42% <0.00%> (-2.34%)` | :arrow_down: |
| [rust/arrow/src/ipc/gen/File.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2dlbi9GaWxlLnJz) | `40.94% <0.00%> (-2.20%)` | :arrow_down: |
| [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.58% <0.00%> (-1.76%)` | :arrow_down: |
| ... and [37 more](https://codecov.io/gh/apache/arrow/pull/8973/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=footer). Last update [5eb6ce1...be2ea6f](https://codecov.io/gh/apache/arrow/pull/8973?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org