You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 17:07:30 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Dandandan opened a new pull request #9293:
URL: https://github.com/apache/arrow/pull/9293


   The idea of this PR is to have a function `from_iter_values` that (just like `from_iter`) creates an array based on an iterator, but from `T` instead of `Option<T>`.
   
   I have seen some places in DataFusion (especially `to_array_of_size`) where an `Array` is generated from a `Vec` of items, which could be replaced by this.
   The other iterators have some memory / time overhead in both creating and manipulating the null buffer. 
   
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563147945



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       @jorgecarleitao is the new usage with `extend` what you meant?
   
   We also have to know the final count for creating the array, are you thinking of calculating that with `val_buf.len() / mem::size_of::<<T as ArrowPrimitiveType>::Native>()` or is there an other way?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563150837



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       Ah with just `Buffer` and `collect` this is even more clean. I adapted the code with the new API.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=h1) Report
   > Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=desc) (a37941c) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.04%`.
   > The diff coverage is `94.87%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9293/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9293      +/-   ##
   ==========================================
   + Coverage   81.84%   81.89%   +0.04%     
   ==========================================
     Files         215      215              
     Lines       52949    52988      +39     
   ==========================================
   + Hits        43336    43392      +56     
   + Misses       9613     9596      -17     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <93.33%> (-0.05%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `96.21% <0.00%> (+2.52%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=footer). Last update [67d0c2e...a37941c](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765619033


   @jorgecarleitao yes, that would make more sense for the particular use case I mentioned, and probably would be more performant as well (it could even use `memset` / `slice.fill` when that's stabilized).
   
   But I think this would cover other use cases as well?
   
   I'm happy to contribute the "from constant" method as well in a different PR for the use case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563139376



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       #9235 is merged
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r562868988



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       Yes, sure 👍  sounds like a good idea




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765558530


   https://issues.apache.org/jira/browse/ARROW-11349


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r563148896



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       Yes, that is what I meant: it should now be possible to do
   
   ```rust
   let values = vec![0i32, 1];
   
   let values = values.iter().map(|x| x + 1);
   
   let buffer: Buffer: values.collect();
   ```
   
   wrt to the `len`: that is a good point that I have had not thought about. 👍 
   
   One option is to do what you wrote. Another could be (I haven't tried to compile this, as `Fn` could become `FnMut`):
   
   ```
   let mut count = 0;
   let iter = iter.map(|x| {
      count += 1;
   x
   });
   ```
   
   I would probably have taken your idea, though :)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#discussion_r562853687



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
         let offset = i + self.offset();
         unsafe { *self.raw_values.as_ptr().add(offset) }
     }
+
+    /// Creates a PrimitiveArray based on an iterator of values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
+        let iter = iter.into_iter();
+        let (_, data_len) = iter.size_hint();
+        let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
+
+        let mut val_buf = MutableBuffer::new(
+            data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
+        );
+
+        iter.for_each(|item| {
+            val_buf.push(item);
+        });

Review comment:
       Could we wait for #9235? It introduces a method to extend a `MutableBuffer` out of an iterator of Native types. It will also allow to drop the `.expect("Iterator must be sized")`, since it will be possible to extend it out of unsized iterators.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765978775


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=h1) Report
   > Merging [#9293](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=desc) (941ee5d) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.02%`.
   > The diff coverage is `93.02%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9293/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9293      +/-   ##
   ==========================================
   + Coverage   81.84%   81.86%   +0.02%     
   ==========================================
     Files         215      215              
     Lines       52949    52992      +43     
   ==========================================
   + Hits        43336    43382      +46     
   + Misses       9613     9610       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.29% <89.47%> (-0.23%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9293/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=footer). Last update [67d0c2e...941ee5d](https://codecov.io/gh/apache/arrow/pull/9293?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9293:
URL: https://github.com/apache/arrow/pull/9293


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765605635


   @Dandandan , thanks a lot for this.
   
   Looking at the use-case, couldn't it make more sense to offer a method that creates a constant non-null array and a constant null array?
   
   I am asking this because the common use-case I see that justifies an iterator (instead of a constant) is to perform an infalible operation over the values while keeping the null buffer untouched (e.g. the typical unary operator).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9293: ARROW-11349: [Rust[ Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9293:
URL: https://github.com/apache/arrow/pull/9293#issuecomment-765628986


   Sorry for the noise, you are of course right. Let me just review it :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9293: ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9293:
URL: https://github.com/apache/arrow/pull/9293


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org