Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/26 00:05:52 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

jorgecarleitao opened a new pull request #9014:
URL: https://github.com/apache/arrow/pull/9014


   cast i64 to string 512  time:   [92.618 us 92.839 us 93.097 us]                                   
                           change: [-14.915% -14.287% -13.743%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
   > Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (dcd08eb) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9014   +/-   ##
   =======================================
     Coverage   82.87%   82.87%           
   =======================================
     Files         201      201           
     Lines       49739    49728   -11     
   =======================================
   - Hits        41220    41213    -7     
   + Misses       8519     8515    -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...6f9ca05](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
   > Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (5d3fc8e) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **decrease** coverage by `0.00%`.
   > The diff coverage is `91.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9014      +/-   ##
   ==========================================
   - Coverage   82.87%   82.87%   -0.01%     
   ==========================================
     Files         201      201              
     Lines       49739    49718      -21     
   ==========================================
   - Hits        41220    41202      -18     
   + Misses       8519     8516       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `81.39% <71.42%> (-0.11%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `94.48% <100.00%> (+0.15%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...5d3fc8e](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] Dandandan commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548945814



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -351,17 +351,13 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
             Float32 => cast_bool_to_numeric::<Float32Type>(array),
             Float64 => cast_bool_to_numeric::<Float64Type>(array),
             Utf8 => {
-                let from = array.as_any().downcast_ref::<BooleanArray>().unwrap();
-                let mut b = StringBuilder::new(array.len());
-                for i in 0..array.len() {
-                    if array.is_null(i) {
-                        b.append(false)?;
-                    } else {
-                        b.append_value(if from.value(i) { "1" } else { "0" })?;
-                    }
-                }
-
-                Ok(Arc::new(b.finish()) as ArrayRef)
+                let array = array.as_any().downcast_ref::<BooleanArray>().unwrap();
+                Ok(Arc::new(
+                    array
+                        .iter()
+                        .map(|value| value.map(|value| if value { "1" } else { "0" }))

Review comment:
       Does the speedup here come from dropping the string builder, or is the iterator itself also faster?
   Either way the new code reads better, so even if the iterator makes no difference to speed this is an improvement 👍







[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
   > Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (52e1ecb) into [master](https://codecov.io/gh/apache/arrow/commit/51672b28e97f19f70de0f0a8800c40ee9bb939d3?el=desc) (51672b2) will **decrease** coverage by `0.00%`.
   > The diff coverage is `91.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9014      +/-   ##
   ==========================================
   - Coverage   82.61%   82.61%   -0.01%     
   ==========================================
     Files         202      202              
     Lines       50048    50027      -21     
   ==========================================
   - Hits        41347    41328      -19     
   + Misses       8701     8699       -2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `81.39% <71.42%> (-0.11%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.99% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `94.48% <100.00%> (+0.15%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [51672b2...52e1ecb](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
   > Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (2efd7e5) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9014   +/-   ##
   =======================================
     Coverage   82.87%   82.87%           
   =======================================
     Files         201      201           
     Lines       49739    49724   -15     
   =======================================
   - Hits        41220    41209   -11     
   + Misses       8519     8515    -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...5d3fc8e](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] github-actions[bot] commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751307172


   https://issues.apache.org/jira/browse/ARROW-11035





[GitHub] [arrow] alamb commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-752956722


   The full set of Rust CI tests did not run on this PR :(
   
   Can you please rebase this PR against [apache/master](https://github.com/apache/arrow) to pick up the changes in https://github.com/apache/arrow/pull/9056 so that they do? 
   
   I apologize for the inconvenience. 





[GitHub] [arrow] Dandandan commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548944938



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -895,31 +886,22 @@ where
     FROM: ArrowNumericType,
     FROM::Native: std::string::ToString,
 {
-    numeric_to_string_cast::<FROM>(
+    Ok(Arc::new(numeric_to_string_cast::<FROM>(
         array
             .as_any()
             .downcast_ref::<PrimitiveArray<FROM>>()
             .unwrap(),
-    )
-    .map(|to| Arc::new(to) as ArrayRef)
+    )))
 }
 
-fn numeric_to_string_cast<T>(from: &PrimitiveArray<T>) -> Result<StringArray>
+fn numeric_to_string_cast<T>(from: &PrimitiveArray<T>) -> StringArray
 where
     T: ArrowPrimitiveType + ArrowNumericType,
     T::Native: std::string::ToString,
 {
-    let mut b = StringBuilder::new(from.len());
-
-    for i in 0..from.len() {
-        if from.is_null(i) {
-            b.append(false)?;
-        } else {
-            b.append_value(&from.value(i).to_string())?;
-        }
-    }
-
-    Ok(b.finish())
+    from.iter()
+        .map(|maybe_value| maybe_value.map(|value| value.to_string()))

Review comment:
       Note: we can probably use `lexical` here later, as in https://github.com/apache/arrow/pull/9010
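
A rough, std-only sketch of why a dedicated number formatter could help here. The diff above calls `value.to_string()`, which allocates one `String` per element. The sketch below instead writes digits straight into one shared byte buffer with offsets, which is roughly how a Utf8 array is laid out. `Utf8Column`, `write_i64`, and `cast_i64_to_utf8` are hypothetical names invented for illustration; this is neither the arrow API nor the `lexical` API, and whether it is faster in practice would have to be benchmarked.

```rust
// Std-only model of a Utf8 column: one contiguous byte buffer plus offsets.
// All names here are hypothetical; nothing below is arrow's or lexical's API.
struct Utf8Column {
    offsets: Vec<usize>, // offsets[i]..offsets[i + 1] delimits value i
    values: Vec<u8>,     // all string bytes, back to back
    validity: Vec<bool>, // false marks a null slot
}

impl Utf8Column {
    /// Returns the string in slot `i`, or `None` if the slot is null.
    fn get(&self, i: usize) -> Option<&str> {
        if !self.validity[i] {
            return None;
        }
        std::str::from_utf8(&self.values[self.offsets[i]..self.offsets[i + 1]]).ok()
    }
}

/// Appends the decimal form of `n` to `buf` without building a `String`.
fn write_i64(n: i64, buf: &mut Vec<u8>) {
    let mut digits = [0u8; 20]; // the magnitude of an i64 has at most 19 digits
    let mut pos = digits.len();
    let mut rest = n.unsigned_abs(); // also well-defined for i64::MIN
    loop {
        pos -= 1;
        digits[pos] = b'0' + (rest % 10) as u8;
        rest /= 10;
        if rest == 0 {
            break;
        }
    }
    if n < 0 {
        buf.push(b'-');
    }
    buf.extend_from_slice(&digits[pos..]);
}

/// Casts nullable integers to strings; a null input yields a null, empty slot.
fn cast_i64_to_utf8(input: &[Option<i64>]) -> Utf8Column {
    let mut col = Utf8Column {
        offsets: vec![0],
        values: Vec::new(),
        validity: Vec::with_capacity(input.len()),
    };
    for value in input {
        if let Some(n) = value {
            write_i64(*n, &mut col.values);
        }
        col.validity.push(value.is_some());
        col.offsets.push(col.values.len());
    }
    col
}

fn main() {
    let input = [Some(42), None, Some(-7), Some(i64::MIN)];
    let col = cast_i64_to_utf8(&input);
    for (i, value) in input.iter().enumerate() {
        // Cross-check every slot against the standard `to_string` result.
        assert_eq!(col.get(i).map(str::to_owned), value.map(|n| n.to_string()));
    }
    println!("{:?}", (0..input.len()).map(|i| col.get(i)).collect::<Vec<_>>());
}
```

The cross-check in `main` compares every slot with `to_string`, so the sketch stays a drop-in replacement for the per-element formatting step only; null handling is unchanged.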







[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
   > Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (6f9ca05) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9014   +/-   ##
   =======================================
     Coverage   82.87%   82.87%           
   =======================================
     Files         201      201           
     Lines       49739    49728   -11     
   =======================================
   - Hits        41220    41213    -7     
   + Misses       8519     8515    -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...6f9ca05](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] alamb closed pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9014:
URL: https://github.com/apache/arrow/pull/9014


   





[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548946031



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -351,17 +351,13 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
             Float32 => cast_bool_to_numeric::<Float32Type>(array),
             Float64 => cast_bool_to_numeric::<Float64Type>(array),
             Utf8 => {
-                let from = array.as_any().downcast_ref::<BooleanArray>().unwrap();
-                let mut b = StringBuilder::new(array.len());
-                for i in 0..array.len() {
-                    if array.is_null(i) {
-                        b.append(false)?;
-                    } else {
-                        b.append_value(if from.value(i) { "1" } else { "0" })?;
-                    }
-                }
-
-                Ok(Arc::new(b.finish()) as ArrayRef)
+                let array = array.as_any().downcast_ref::<BooleanArray>().unwrap();
+                Ok(Arc::new(
+                    array
+                        .iter()
+                        .map(|value| value.map(|value| if value { "1" } else { "0" }))

Review comment:
       Good question. I suspect the speedup comes from dropping the builder, because the iterator currently does the same work as the old loop (i.e. the same bounds checks).
   
   IMO the builders are inefficient at the moment. Since they are also less idiomatic, I do not see any issue in replacing them whenever we can ^_^
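
To make the comparison in this exchange concrete, here is a minimal std-only sketch of the two shapes of code in the diff. It assumes `Option<bool>` as a stand-in for a nullable `BooleanArray` slot and `Option<&str>` for a nullable Utf8 slot; the function names are hypothetical and no arrow types are involved, so it illustrates the equivalence of the two forms rather than their relative speed.

```rust
// Std-only model: Option<bool> stands in for a nullable boolean slot,
// Option<&str> for a nullable Utf8 slot. Function names are hypothetical.

/// Index-based loop, mirroring the shape of the removed builder code.
fn cast_bools_with_loop(values: &[Option<bool>]) -> Vec<Option<&'static str>> {
    let mut out = Vec::with_capacity(values.len());
    for i in 0..values.len() {
        match values[i] {
            None => out.push(None), // a null input stays null
            Some(v) => out.push(Some(if v { "1" } else { "0" })),
        }
    }
    out
}

/// Iterator pipeline, mirroring the shape of the added code.
fn cast_bools_with_iter(values: &[Option<bool>]) -> Vec<Option<&'static str>> {
    values
        .iter()
        .map(|value| value.map(|value| if value { "1" } else { "0" }))
        .collect()
}

fn main() {
    let input = [Some(true), None, Some(false)];
    let from_loop = cast_bools_with_loop(&input);
    let from_iter = cast_bools_with_iter(&input);
    // Both forms produce the same values and preserve nulls.
    assert_eq!(from_loop, from_iter);
    println!("{:?}", from_iter);
}
```

Both functions visit each slot once and do the same per-element work, which is consistent with the suspicion above that the measured gain comes from avoiding the builder rather than from the iterator.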



