You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/27 03:39:47 UTC
[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #2181: Let the `StringBuilder` use `BinaryBuilder`
HaoYang670 opened a new pull request, #2181:
URL: https://github.com/apache/arrow-rs/pull/2181
# Which issue does this PR close?
<!---
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
-->
Closes #2156.
# Rationale for this change
Using less memory and faster.
## Benchmark
# What changes are included in this PR?
1. Update the String Builder to use Binary Builder
2. Add some methods to Binary Builder because we use them in the String Builder.
3. Update tests.
4. Add from<binary_array> for string array.
# Are there any user-facing changes?
Delete the pub method `StringBuilder::append(bool)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930589443
##########
arrow/src/array/array_string.rs:
##########
@@ -385,12 +406,6 @@ pub type StringArray = GenericStringArray<i32>;
/// ```
pub type LargeStringArray = GenericStringArray<i64>;
-impl<T: OffsetSizeTrait> From<GenericListArray<T>> for GenericStringArray<T> {
- fn from(v: GenericListArray<T>) -> Self {
Review Comment:
Move, not remove
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] ursabot commented on pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#issuecomment-1197932103
Benchmark runs are scheduled for baseline = 82e0512b08b11fc03f9366c22d3a747d9bd76663 and contender = 9e47779e3f4667e676d00db108e725d576930fcc. 9e47779e3f4667e676d00db108e725d576930fcc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/29af5a40fd6c4ca08fea670786048276...64cdd9788f38439eb8d945aed3bc8305/)
[Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/815b4cdecf97437fa14290be84137206...63c94203325b44338529f41ce6d64450/)
[Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bf43d89fb2774405b3b532faef966401...c3615623cfcd451f8c9ba57e3fc8e333/)
[Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2be2d79db43148ae9ae4833b1cd17e5c...f546a54a0b6e414ebd3e7df68ca8e0ac/)
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r931666996
##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
}
}
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericListArray<OffsetSize>) -> Self {
+ GenericStringArray::<OffsetSize>::from_list(v)
+ }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericBinaryArray<OffsetSize>) -> Self {
Review Comment:
Filed #2205 to track this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter commented on pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#issuecomment-1196247049
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#2181](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7c843ca) into [master](https://codecov.io/gh/apache/arrow-rs/commit/37dd03756953d92bd303549fb0f7610a6d3c5c56?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (37dd037) will **decrease** coverage by `0.49%`.
> The diff coverage is `47.29%`.
```diff
@@ Coverage Diff @@
## master #2181 +/- ##
==========================================
- Coverage 82.87% 82.38% -0.50%
==========================================
Files 237 239 +2
Lines 61465 62096 +631
==========================================
+ Hits 50940 51155 +215
- Misses 10525 10941 +416
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [object\_store/src/aws.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9hd3MucnM=) | `0.00% <0.00%> (ø)` | |
| [object\_store/src/azure.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9henVyZS5ycw==) | `0.00% <0.00%> (ø)` | |
| [object\_store/src/gcp.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9nY3AucnM=) | `0.00% <0.00%> (ø)` | |
| [object\_store/src/multipart.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9tdWx0aXBhcnQucnM=) | `0.00% <0.00%> (ø)` | |
| [object\_store/src/throttle.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy90aHJvdHRsZS5ycw==) | `94.63% <0.00%> (-1.89%)` | :arrow_down: |
| [parquet/src/arrow/async\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXN5bmNfcmVhZGVyLnJz) | `0.00% <0.00%> (ø)` | |
| [parquet/src/column/page.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvY29sdW1uL3BhZ2UucnM=) | `83.33% <0.00%> (-15.36%)` | :arrow_down: |
| [arrow/src/array/builder/boolean\_builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIvYm9vbGVhbl9idWlsZGVyLnJz) | `84.26% <66.66%> (-2.47%)` | :arrow_down: |
| [arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X3N0cmluZy5ycw==) | `92.05% <71.42%> (-5.01%)` | :arrow_down: |
| [object\_store/src/local.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9sb2NhbC5ycw==) | `79.47% <72.78%> (-3.42%)` | :arrow_down: |
| ... and [21 more](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold merged pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930862409
##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
}
}
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericListArray<OffsetSize>) -> Self {
+ GenericStringArray::<OffsetSize>::from_list(v)
+ }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericBinaryArray<OffsetSize>) -> Self {
Review Comment:
You are right. But this should not be done in this PR. Because it follows the style of the current`StringArray::from_list`(https://github.com/apache/arrow-rs/blob/master/arrow/src/array/array_string.rs#L122-L143) which is also unsafe.
We could file a follow-up issue to track this. Maybe implementing both `safe` and `unsafe` styles is a good choice:
```rust
impl StringArray {
unsafe fn from_list_unchecked(...) {...}
unsafe fn from_binary_unchecked (...) {...}
}
impl From<ListArray> for StringArray {
/// safe method with utf-8 checking
fn from(...) {...}
}
impl From<BinaryArray> for StringArray {
/// safe method with utf-8 checking
fn from(...) {...}
}
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`
Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930782833
##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
}
}
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericListArray<OffsetSize>) -> Self {
+ GenericStringArray::<OffsetSize>::from_list(v)
+ }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+ for GenericStringArray<OffsetSize>
+{
+ fn from(v: GenericBinaryArray<OffsetSize>) -> Self {
Review Comment:
This would allow non-utf8 data within a StringArray using safe APIs, which would break our safety guarantees
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org