Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/26 00:05:52 UTC
[GitHub] [arrow] jorgecarleitao opened a new pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
jorgecarleitao opened a new pull request #9014:
URL: https://github.com/apache/arrow/pull/9014
cast i64 to string 512  time:   [92.618 us 92.839 us 93.097 us]
                        change: [-14.915% -14.287% -13.743%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
> Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (dcd08eb) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9014/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree)
```diff
@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
  Coverage   82.87%   82.87%
==========================================
  Files         201      201
  Lines       49739    49728      -11
==========================================
- Hits        41220    41213       -7
+ Misses       8519     8515       -4
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...6f9ca05](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
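The percentages in this report follow from the hit and line counts: coverage is hits divided by lines. Rounding down to two decimal places reproduces every total in these reports (for example 41213 / 49728 ≈ 82.8768% is shown as 82.87%). The exact values before and after this PR (≈82.8726% and ≈82.8768%) differ by well under 0.01%, which is why the change is reported as `0.00%`. `coverage_floor` and `as_percent` are illustrative helpers written for this note; they are not part of any Codecov API.

```rust
/// Illustrative helper: coverage in hundredths of a percent, rounded down.
/// Integer arithmetic avoids floating-point rounding surprises.
pub fn coverage_floor(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Formats hundredths of a percent, e.g. 8287 -> "82.87%".
pub fn as_percent(hundredths: u64) -> String {
    format!("{}.{:02}%", hundredths / 100, hundredths % 100)
}
```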
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
> Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (5d3fc8e) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **decrease** coverage by `0.00%`.
> The diff coverage is `91.30%`.
```diff
@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
- Coverage   82.87%   82.87%   -0.01%
==========================================
  Files         201      201
  Lines       49739    49718      -21
==========================================
- Hits        41220    41202      -18
+ Misses       8519     8516       -3
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `81.39% <71.42%> (-0.11%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
| [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `94.48% <100.00%> (+0.15%)` | :arrow_up: |
------
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...5d3fc8e](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548945814
##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -351,17 +351,13 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
Float32 => cast_bool_to_numeric::<Float32Type>(array),
Float64 => cast_bool_to_numeric::<Float64Type>(array),
Utf8 => {
- let from = array.as_any().downcast_ref::<BooleanArray>().unwrap();
- let mut b = StringBuilder::new(array.len());
- for i in 0..array.len() {
- if array.is_null(i) {
- b.append(false)?;
- } else {
- b.append_value(if from.value(i) { "1" } else { "0" })?;
- }
- }
-
- Ok(Arc::new(b.finish()) as ArrayRef)
+ let array = array.as_any().downcast_ref::<BooleanArray>().unwrap();
+ Ok(Arc::new(
+ array
+ .iter()
+ .map(|value| value.map(|value| if value { "1" } else { "0" }))
Review comment:
Does the speedup come from not using the string builder, or is this iterator also faster?
At the very least it reads better, so even if there is no difference, this is an improvement 👍
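To make the question concrete, here is a std-only sketch of what collecting an iterator of `Option<&str>` into an Arrow-style UTF-8 column involves: an offsets buffer, one contiguous values buffer, and a validity mask, filled in a single pass. `Utf8Column` is a hypothetical stand-in, not arrow's `StringArray`, and a plain `Vec<bool>` replaces the packed validity bitmap; the mapping closure is the one from the diff above.

```rust
/// Hypothetical Arrow-like UTF-8 column (not arrow's `StringArray`).
#[derive(Debug, PartialEq)]
pub struct Utf8Column {
    /// `offsets[i]..offsets[i + 1]` delimits slot `i` inside `values`.
    pub offsets: Vec<i32>,
    /// All string bytes, concatenated.
    pub values: Vec<u8>,
    /// `false` marks a null slot (a `Vec<bool>` instead of a packed bitmap).
    pub validity: Vec<bool>,
}

impl Utf8Column {
    /// Returns slot `i`, or `None` if it is null.
    pub fn get(&self, i: usize) -> Option<&str> {
        if !self.validity[i] {
            return None;
        }
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        // Only whole `&str`s are ever appended, so the slice is valid UTF-8.
        Some(std::str::from_utf8(&self.values[start..end]).unwrap())
    }
}

impl<S: AsRef<str>> std::iter::FromIterator<Option<S>> for Utf8Column {
    fn from_iter<I: IntoIterator<Item = Option<S>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Reserve from the iterator's lower size bound; nothing here can fail.
        let (lower, _) = iter.size_hint();
        let mut offsets = Vec::with_capacity(lower + 1);
        let mut validity = Vec::with_capacity(lower);
        let mut values = Vec::new();
        offsets.push(0i32);
        for item in iter {
            match item {
                Some(s) => {
                    values.extend_from_slice(s.as_ref().as_bytes());
                    validity.push(true);
                }
                // A null slot adds no bytes, so its two offsets are equal.
                None => validity.push(false),
            }
            offsets.push(values.len() as i32);
        }
        Utf8Column { offsets, values, validity }
    }
}

/// Casts nullable booleans to "1"/"0" with the same closure as the diff.
pub fn bool_to_utf8(input: &[Option<bool>]) -> Utf8Column {
    input
        .iter()
        .copied()
        .map(|value| value.map(|value| if value { "1" } else { "0" }))
        .collect()
}
```

The sketch does not settle the performance question; it only shows that the collect path is a single infallible pass with no per-slot `Result`.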
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
> Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (52e1ecb) into [master](https://codecov.io/gh/apache/arrow/commit/51672b28e97f19f70de0f0a8800c40ee9bb939d3?el=desc) (51672b2) will **decrease** coverage by `0.00%`.
> The diff coverage is `91.30%`.
```diff
@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
- Coverage   82.61%   82.61%   -0.01%
==========================================
  Files         202      202
  Lines       50048    50027      -21
==========================================
- Hits        41347    41328      -19
+ Misses       8701     8699       -2
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/json/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvanNvbi9yZWFkZXIucnM=) | `81.39% <71.42%> (-0.11%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.99% <100.00%> (+0.16%)` | :arrow_up: |
| [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `94.48% <100.00%> (+0.15%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
------
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [51672b2...52e1ecb](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
> Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (2efd7e5) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
> The diff coverage is `100.00%`.
```diff
@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
  Coverage   82.87%   82.87%
==========================================
  Files         201      201
  Lines       49739    49724      -15
==========================================
- Hits        41220    41209      -11
+ Misses       8519     8515       -4
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...5d3fc8e](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751307172
https://issues.apache.org/jira/browse/ARROW-11035
----------------------------------------------------------------
[GitHub] [arrow] alamb commented on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-752956722
The full set of Rust CI tests did not run on this PR :(
Can you please rebase this PR against [apache/master](https://github.com/apache/arrow) to pick up the changes in https://github.com/apache/arrow/pull/9056 so that they do?
I apologize for the inconvenience.
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548944938
##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -895,31 +886,22 @@ where
FROM: ArrowNumericType,
FROM::Native: std::string::ToString,
{
- numeric_to_string_cast::<FROM>(
+ Ok(Arc::new(numeric_to_string_cast::<FROM>(
array
.as_any()
.downcast_ref::<PrimitiveArray<FROM>>()
.unwrap(),
- )
- .map(|to| Arc::new(to) as ArrayRef)
+ )))
}
-fn numeric_to_string_cast<T>(from: &PrimitiveArray<T>) -> Result<StringArray>
+fn numeric_to_string_cast<T>(from: &PrimitiveArray<T>) -> StringArray
where
T: ArrowPrimitiveType + ArrowNumericType,
T::Native: std::string::ToString,
{
- let mut b = StringBuilder::new(from.len());
-
- for i in 0..from.len() {
- if from.is_null(i) {
- b.append(false)?;
- } else {
- b.append_value(&from.value(i).to_string())?;
- }
- }
-
- Ok(b.finish())
+ from.iter()
+ .map(|maybe_value| maybe_value.map(|value| value.to_string()))
Review comment:
Note: we can probably use lexical here later, as in https://github.com/apache/arrow/pull/9010
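The `to_string()` call in the new `numeric_to_string_cast` allocates a fresh `String` for every value, whose bytes then have to be copied into the array's contiguous buffer. `lexical` (the crate referenced in #9010) is not used below; as a std-only illustration of the same direction, this sketch formats each number straight into one shared byte buffer with `write!`, recording Arrow-style offsets as it goes. `format_into_buffer` is a hypothetical helper, not part of arrow.

```rust
use std::fmt::Display;
use std::io::Write;

/// Hypothetical helper: formats nullable numbers into one contiguous UTF-8
/// buffer plus offsets and validity, with no intermediate `String` per value.
pub fn format_into_buffer<T: Display>(input: &[Option<T>]) -> (Vec<i32>, Vec<u8>, Vec<bool>) {
    let mut offsets = Vec::with_capacity(input.len() + 1);
    let mut values: Vec<u8> = Vec::new();
    let mut validity = Vec::with_capacity(input.len());
    offsets.push(0i32);
    for item in input {
        match item {
            Some(v) => {
                // `Vec<u8>` implements `io::Write`, so `write!` appends in place.
                write!(values, "{}", v).expect("writing to a Vec<u8> cannot fail");
                validity.push(true);
            }
            None => validity.push(false),
        }
        offsets.push(values.len() as i32);
    }
    (offsets, values, validity)
}
```

This only removes the per-value allocation; the formatting cost itself is unchanged, and that cost is what a specialised crate such as `lexical` targets. Whether either helps here would need a benchmark like the one in the PR description.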
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#issuecomment-751302721
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=h1) Report
> Merging [#9014](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=desc) (6f9ca05) into [master](https://codecov.io/gh/apache/arrow/commit/a4f7c4a2acda874b3d6eb2eb4c986e7c3267c755?el=desc) (a4f7c4a) will **increase** coverage by `0.00%`.
> The diff coverage is `100.00%`.
```diff
@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
  Coverage   82.87%   82.87%
==========================================
  Files         201      201
  Lines       49739    49728      -11
==========================================
- Hits        41220    41213       -7
+ Misses       8519     8515       -4
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `97.00% <100.00%> (+0.16%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9014/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
------
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=footer). Last update [a4f7c4a...6f9ca05](https://codecov.io/gh/apache/arrow/pull/9014?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] alamb closed pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9014:
URL: https://github.com/apache/arrow/pull/9014
----------------------------------------------------------------
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9014: ARROW-11035: [Rust] Improved performance of casting to utf8
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9014:
URL: https://github.com/apache/arrow/pull/9014#discussion_r548946031
##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -351,17 +351,13 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
Float32 => cast_bool_to_numeric::<Float32Type>(array),
Float64 => cast_bool_to_numeric::<Float64Type>(array),
Utf8 => {
- let from = array.as_any().downcast_ref::<BooleanArray>().unwrap();
- let mut b = StringBuilder::new(array.len());
- for i in 0..array.len() {
- if array.is_null(i) {
- b.append(false)?;
- } else {
- b.append_value(if from.value(i) { "1" } else { "0" })?;
- }
- }
-
- Ok(Arc::new(b.finish()) as ArrayRef)
+ let array = array.as_any().downcast_ref::<BooleanArray>().unwrap();
+ Ok(Arc::new(
+ array
+ .iter()
+ .map(|value| value.map(|value| if value { "1" } else { "0" }))
Review comment:
Good question. I suspect it is the builder, because the iterator currently does the same work as before (i.e. the same bounds checks).
IMO the builders are currently inefficient. Since they are also less idiomatic, I see no issue in replacing them whenever we can ^_^
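The structural difference can be sketched without arrow. The removed code used a builder whose appends return `Result`, hence the per-slot `?` and the `Result<StringArray>` return type that this PR drops from `numeric_to_string_cast`. The replacement collects an iterator that has no failure path. `FallibleBuilder`, `append_null`, `cast_with_builder` and `cast_with_iter` are hypothetical names loosely modelled on the removed loop, not arrow's actual builder API, and the sketch says nothing about which shape is faster.

```rust
/// Hypothetical builder, loosely modelled on the removed `StringBuilder` loop:
/// every append is fallible, so every call site needs `?`.
#[derive(Default)]
pub struct FallibleBuilder {
    values: Vec<String>,
    validity: Vec<bool>,
}

impl FallibleBuilder {
    pub fn append_value(&mut self, v: &str) -> Result<(), String> {
        self.values.push(v.to_owned());
        self.validity.push(true);
        Ok(())
    }

    /// Hypothetical name; the removed code called `append(false)` for nulls.
    pub fn append_null(&mut self) -> Result<(), String> {
        self.values.push(String::new());
        self.validity.push(false);
        Ok(())
    }

    pub fn finish(self) -> Vec<Option<String>> {
        self.values
            .into_iter()
            .zip(self.validity)
            .map(|(v, is_valid)| if is_valid { Some(v) } else { None })
            .collect()
    }
}

/// Old shape: indexed loop, explicit null check, `?` on every append.
pub fn cast_with_builder(input: &[Option<i64>]) -> Result<Vec<Option<String>>, String> {
    let mut b = FallibleBuilder::default();
    for i in 0..input.len() {
        match input[i] {
            Some(v) => b.append_value(&v.to_string())?,
            None => b.append_null()?,
        }
    }
    Ok(b.finish())
}

/// New shape: one iterator chain with no error path, so no `Result`.
pub fn cast_with_iter(input: &[Option<i64>]) -> Vec<Option<String>> {
    input
        .iter()
        .copied()
        .map(|maybe_value| maybe_value.map(|value| value.to_string()))
        .collect()
}
```

Both produce the same values; the iterator version simply has nothing to propagate, which is what lets the cast's signature become infallible.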
----------------------------------------------------------------