You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/23 22:56:48 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Dandandan opened a new pull request #9305:
URL: https://github.com/apache/arrow/pull/9305


   This function `to_array_of_size` is about 8.3% of instructions in the db-benchmark queries.
   
   This uses the PR https://github.com/apache/arrow/pull/9293
   
   The case of converting an int32 to an array improved by ~5x according to the microbenchmark:
   
   ```
   to_array_of_size 100000 time:   [55.501 us 55.627 us 55.809 us]                                    
                           change: [-82.457% -82.384% -82.299%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-769958232


   I am really sorry, @wesm . What were the consequences? 
   
   From what I see, the main problem is that the list of co-authors is rather large on the commit message.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-769551141


   @Dandandan there is a missing license in `rust/datafusion/benches/scalar.rs`. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-767499594


   The clippy failures were fixed in https://github.com/apache/arrow/pull/9314 - a quick rebase can probably get this PR green


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#discussion_r563564784



##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       I couldn't get the `None` compiling here. 

##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       Updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-767773870


   PR is updated against master @jorgecarleitao @alamb 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-769974794


   I think it was still the recent rebase after 3.0 on master (which I think introduced some weirdness on the master branch?) causing this, I missed that it still had those commits in here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766195307


   https://issues.apache.org/jira/browse/ARROW-11362


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=h1) Report
   > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=desc) (d612b0f) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.01%`.
   > The diff coverage is `62.85%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9305/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9305      +/-   ##
   ==========================================
   + Coverage   81.84%   81.85%   +0.01%     
   ==========================================
     Files         215      215              
     Lines       52949    53035      +86     
   ==========================================
   + Hits        43336    43412      +76     
   - Misses       9613     9623      +10     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `56.04% <43.93%> (-2.93%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <93.33%> (-0.05%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `96.21% <0.00%> (+2.52%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `84.21% <0.00%> (+5.26%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=footer). Last update [67d0c2e...6144a23](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-769955409


   Whoa. This PR needed to be rebased


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#discussion_r563510962



##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       any reason to use `e.as_deref` instead of `None`? Just curious.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#discussion_r563566682



##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       Updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#discussion_r563564784



##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       I couldn't get the `None` compiling here. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766329191


   I noticed that this is also being used in `hash_aggregate::create_batch_from_map`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766343030


   CI failures don't seem to be related to the changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-770001718


   All the more reason not to be rebasing master going forward I would say


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-769955409


   Whoa. This PR needed to be rebased (https://github.com/apache/arrow/pull/9305/commits)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9305:
URL: https://github.com/apache/arrow/pull/9305


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=h1) Report
   > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=desc) (69c298e) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.01%`.
   > The diff coverage is `63.46%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9305/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9305      +/-   ##
   ==========================================
   + Coverage   81.84%   81.85%   +0.01%     
   ==========================================
     Files         215      215              
     Lines       52949    53034      +85     
   ==========================================
   + Hits        43336    43412      +76     
   - Misses       9613     9622       +9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `56.22% <44.61%> (-2.74%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <93.33%> (-0.05%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `96.21% <0.00%> (+2.52%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `84.21% <0.00%> (+5.26%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=footer). Last update [67d0c2e...69c298e](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=h1) Report
   > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=desc) (e07f7e5) into [master](https://codecov.io/gh/apache/arrow/commit/cf7638fd61a9371630db63ebe866fbc4693d482f?el=desc) (cf7638f) will **decrease** coverage by `0.04%`.
   > The diff coverage is `43.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9305/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9305      +/-   ##
   ==========================================
   - Coverage   81.89%   81.85%   -0.05%     
   ==========================================
     Files         215      215              
     Lines       52988    53036      +48     
   ==========================================
   + Hits        43392    43410      +18     
   - Misses       9596     9626      +30     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `55.85% <43.28%> (-3.12%)` | :arrow_down: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=footer). Last update [cf7638f...e07f7e5](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#discussion_r563510962



##########
File path: rust/datafusion/src/scalar.rs
##########
@@ -205,28 +205,104 @@ impl ScalarValue {
             ScalarValue::Boolean(e) => {
                 Arc::new(BooleanArray::from(vec![*e; size])) as ArrayRef
             }
-            ScalarValue::Float64(e) => {
-                Arc::new(Float64Array::from(vec![*e; size])) as ArrayRef
-            }
-            ScalarValue::Float32(e) => Arc::new(Float32Array::from(vec![*e; size])),
-            ScalarValue::Int8(e) => Arc::new(Int8Array::from(vec![*e; size])),
-            ScalarValue::Int16(e) => Arc::new(Int16Array::from(vec![*e; size])),
-            ScalarValue::Int32(e) => Arc::new(Int32Array::from(vec![*e; size])),
-            ScalarValue::Int64(e) => Arc::new(Int64Array::from(vec![*e; size])),
-            ScalarValue::UInt8(e) => Arc::new(UInt8Array::from(vec![*e; size])),
-            ScalarValue::UInt16(e) => Arc::new(UInt16Array::from(vec![*e; size])),
-            ScalarValue::UInt32(e) => Arc::new(UInt32Array::from(vec![*e; size])),
-            ScalarValue::UInt64(e) => Arc::new(UInt64Array::from(vec![*e; size])),
-            ScalarValue::TimeMicrosecond(e) => {
-                Arc::new(TimestampMicrosecondArray::from(vec![*e]))
-            }
-            ScalarValue::TimeNanosecond(e) => {
-                Arc::new(TimestampNanosecondArray::from_opt_vec(vec![*e], None))
-            }
-            ScalarValue::Utf8(e) => Arc::new(StringArray::from(vec![e.as_deref(); size])),
-            ScalarValue::LargeUtf8(e) => {
-                Arc::new(LargeStringArray::from(vec![e.as_deref(); size]))
-            }
+            ScalarValue::Float64(e) => match e {
+                Some(value) => {
+                    Arc::new(Float64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float64Array>()),
+            },
+            ScalarValue::Float32(e) => match e {
+                Some(value) => {
+                    Arc::new(Float32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Float32Array>()),
+            },
+            ScalarValue::Int8(e) => match e {
+                Some(value) => {
+                    Arc::new(Int8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int8Array>()),
+            },
+            ScalarValue::Int16(e) => match e {
+                Some(value) => {
+                    Arc::new(Int16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int16Array>()),
+            },
+            ScalarValue::Int32(e) => match e {
+                Some(value) => {
+                    Arc::new(Int32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int32Array>()),
+            },
+            ScalarValue::Int64(e) => match e {
+                Some(value) => {
+                    Arc::new(Int64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<Int64Array>()),
+            },
+            ScalarValue::UInt8(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt8Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt8Array>()),
+            },
+            ScalarValue::UInt16(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt16Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt16Array>()),
+            },
+            ScalarValue::UInt32(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt32Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt32Array>()),
+            },
+            ScalarValue::UInt64(e) => match e {
+                Some(value) => {
+                    Arc::new(UInt64Array::from_iter_values(repeat(*value).take(size)))
+                }
+                None => Arc::new(repeat(None).take(size).collect::<UInt64Array>()),
+            },
+            ScalarValue::TimeMicrosecond(e) => match e {
+                Some(value) => Arc::new(TimestampMicrosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampMicrosecondArray>(),
+                ),
+            },
+            ScalarValue::TimeNanosecond(e) => match e {
+                Some(value) => Arc::new(TimestampNanosecondArray::from_iter_values(
+                    repeat(*value).take(size),
+                )),
+                None => Arc::new(
+                    repeat(None)
+                        .take(size)
+                        .collect::<TimestampNanosecondArray>(),
+                ),
+            },
+            ScalarValue::Utf8(e) => match e {
+                Some(value) => {
+                    Arc::new(StringArray::from_iter_values(repeat(value).take(size)))
+                }
+                None => {
+                    Arc::new(repeat(e.as_deref()).take(size).collect::<StringArray>())

Review comment:
       any reason to use `e.as_deref` instead of `None`? Just curious.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=h1) Report
   > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=desc) (555eb1d) into [master](https://codecov.io/gh/apache/arrow/commit/cf7638fd61a9371630db63ebe866fbc4693d482f?el=desc) (cf7638f) will **decrease** coverage by `0.04%`.
   > The diff coverage is `43.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9305/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9305      +/-   ##
   ==========================================
   - Coverage   81.89%   81.85%   -0.05%     
   ==========================================
     Files         215      215              
     Lines       52988    53036      +48     
   ==========================================
   + Hits        43392    43410      +18     
   - Misses       9596     9626      +30     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `55.85% <43.28%> (-3.12%)` | :arrow_down: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=footer). Last update [cf7638f...555eb1d](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766197876


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=h1) Report
   > Merging [#9305](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=desc) (6144a23) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.00%`.
   > The diff coverage is `62.26%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9305/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9305   +/-   ##
   =======================================
     Coverage   81.84%   81.85%           
   =======================================
     Files         215      215           
     Lines       52949    53036   +87     
   =======================================
   + Hits        43336    43412   +76     
   - Misses       9613     9624   +11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `55.85% <43.28%> (-3.12%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `94.48% <93.33%> (-0.05%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `94.11% <95.83%> (+4.11%)` | :arrow_up: |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `96.21% <0.00%> (+2.52%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9305/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `84.21% <0.00%> (+5.26%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=footer). Last update [67d0c2e...6144a23](https://codecov.io/gh/apache/arrow/pull/9305?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9305: ARROW-11362:[Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9305:
URL: https://github.com/apache/arrow/pull/9305#issuecomment-766329608


   @jorgecarleitao yes, but those are on `GroupByScalar` instead of `ScalarValue`. I think it might clean up some code by having just one  `ScalarValue` enum defined and reusing functions like this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org