You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 17:07:30 UTC
[GitHub] [arrow] Dandandan opened a new pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Dandandan opened a new pull request #9293:
URL: https://github.com/apache/arrow/pull/9293
The idea of this PR is to have a function `from_iter_values` that (just like `from_iter`) creates an array based on an iterator, but from `T` instead of `Option<T>`.
I have seen some places in DataFusion (especially `to_array_of_size`) where an `Array` is generated from a `Vec` of items, which could be replaced by this.
The other iterators have some memory / time overhead in both creating and manipulating the null buffer.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563147945
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
@jorgecarleitao is the new usage with `extend` what you meant?
We also have to know the final count for creating the array, are you thinking of calculating that with `val_buf.len() / mem::size_of::<<T as ArrowPrimitiveType>::Native>()` or is there an other way?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563150837
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
Ah with just `Buffer` and `collect` this is even more clean. I adapted the code with the new API.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=h1) Report
> Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=desc) (a37941c) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.04%`.
> The diff coverage is `94.87%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9293/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9293 +/- ##
==========================================
+ Coverage 81.84% 81.89% +0.04%
==========================================
Files 215 215
Lines 52949 52988 +39
==========================================
+ Hits 43336 43392 +56
+ Misses 9613 9596 -17
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <93.33%> (-0.05%)` | :arrow_down: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `96.21% <0.00%> (+2.52%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=footer). Last update [67d0c2e...a37941c](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765619033
@jorgecarleitao yes, that would make more sense for the particular use case I mentioned, and probably would be more performant as well (it could even use `memset` / `slice.fill` when that's stabilized).
But I think this would cover other use cases as well?
I'm happy to contribute the "from constant" method as well in a different PR for the use case.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563139376
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
#9235 is merged
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r562868988
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
Yes, sure 👍 sounds like a good idea
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765558530
https://issues.apache.org/jira/browse/ARROW-11349
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563148896
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
Yes, that is what I meant: it should now be possible to do
```rust
let values = vec![0i32, 1];
let values = values.iter().map(|x| x + 1);
let buffer: Buffer: values.collect();
```
wrt to the `len`: that is a good point that I have had not thought about. 👍
One option is to do what you wrote. Another could be (I haven't tried to compile this, as `Fn` could become `FnMut`):
```
let mut count = 0;
let iter = iter.map(|x| {
count += 1;
x
});
```
I would probably have taken your idea, though :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r562853687
##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}
+
+ /// Creates a PrimitiveArray based on an iterator of values without nulls
+ pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+ let iter = iter.into_iter();
+ let (_, data_len) = iter.size_hint();
+ let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+ let mut val_buf = MutableBuffer::new(
+ data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+ );
+
+ iter.for_each(|item| {
+ val_buf.push(item);
+ });
Review comment:
Could we wait for #9235? It introduces a method to extend a `MutableBuffer` out of an iterator of Native types. It will also allow to drop the `.expect("Iterator must be sized")`, since it will be possible to extend it out of unsized iterators.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=h1) Report
> Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=desc) (941ee5d) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.02%`.
> The diff coverage is `93.02%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9293/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9293 +/- ##
==========================================
+ Coverage 81.84% 81.86% +0.02%
==========================================
Files 215 215
Lines 52949 52992 +43
==========================================
+ Hits 43336 43382 +46
+ Misses 9613 9610 -3
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.29% <89.47%> (-0.23%)` | :arrow_down: |
| [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=footer). Last update [67d0c2e...941ee5d](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9293:
URL: https://github.com/apache/arrow/pull/9293
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765605635
@Dandandan , thanks a lot for this.
Looking at the use-case, couldn't it make more sense to offer a method that creates a constant non-null array and a constant null array?
I am asking this because the common use-case I see that justifies an iterator (instead of a constant) is to perform an infalible operation over the values while keeping the null buffer untouched (e.g. the typical unary operator).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765628986
Sorry for the noise, you are of course right. Let me just review it :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9293:
URL: https://github.com/apache/arrow/pull/9293
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org