You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/27 03:39:47 UTC

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #2181: Let the `StringBuilder` use `BinaryBuilder`

HaoYang670 opened a new pull request, #2181:
URL: https://github.com/apache/arrow-rs/pull/2181

   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #2156.
   
   # Rationale for this change
   Using less memory and faster.
   
   ## Benchmark
   
   # What changes are included in this PR?
   1. Update the String Builder to use Binary Builder
   2. Add some methods to Binary Builder because we use them in the String Builder.
   3. Update tests.
   4. Add from<binary_array> for string array.
   
   # Are there any user-facing changes?
   Delete the pub method `StringBuilder::append(bool)`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930589443


##########
arrow/src/array/array_string.rs:
##########
@@ -385,12 +406,6 @@ pub type StringArray = GenericStringArray<i32>;
 /// ```
 pub type LargeStringArray = GenericStringArray<i64>;
 
-impl<T: OffsetSizeTrait> From<GenericListArray<T>> for GenericStringArray<T> {
-    fn from(v: GenericListArray<T>) -> Self {

Review Comment:
   Move, not remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#issuecomment-1197932103

   Benchmark runs are scheduled for baseline = 82e0512b08b11fc03f9366c22d3a747d9bd76663 and contender = 9e47779e3f4667e676d00db108e725d576930fcc. 9e47779e3f4667e676d00db108e725d576930fcc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/29af5a40fd6c4ca08fea670786048276...64cdd9788f38439eb8d945aed3bc8305/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/815b4cdecf97437fa14290be84137206...63c94203325b44338529f41ce6d64450/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bf43d89fb2774405b3b532faef966401...c3615623cfcd451f8c9ba57e3fc8e333/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2be2d79db43148ae9ae4833b1cd17e5c...f546a54a0b6e414ebd3e7df68ca8e0ac/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r931666996


##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
     }
 }
 
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericListArray<OffsetSize>) -> Self {
+        GenericStringArray::<OffsetSize>::from_list(v)
+    }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericBinaryArray<OffsetSize>) -> Self {

Review Comment:
   Filed #2205 to track this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#issuecomment-1196247049

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2181](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7c843ca) into [master](https://codecov.io/gh/apache/arrow-rs/commit/37dd03756953d92bd303549fb0f7610a6d3c5c56?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (37dd037) will **decrease** coverage by `0.49%`.
   > The diff coverage is `47.29%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2181      +/-   ##
   ==========================================
   - Coverage   82.87%   82.38%   -0.50%     
   ==========================================
     Files         237      239       +2     
     Lines       61465    62096     +631     
   ==========================================
   + Hits        50940    51155     +215     
   - Misses      10525    10941     +416     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/2181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [object\_store/src/aws.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9hd3MucnM=) | `0.00% <0.00%> (ø)` | |
   | [object\_store/src/azure.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9henVyZS5ycw==) | `0.00% <0.00%> (ø)` | |
   | [object\_store/src/gcp.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9nY3AucnM=) | `0.00% <0.00%> (ø)` | |
   | [object\_store/src/multipart.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9tdWx0aXBhcnQucnM=) | `0.00% <0.00%> (ø)` | |
   | [object\_store/src/throttle.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy90aHJvdHRsZS5ycw==) | `94.63% <0.00%> (-1.89%)` | :arrow_down: |
   | [parquet/src/arrow/async\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXN5bmNfcmVhZGVyLnJz) | `0.00% <0.00%> (ø)` | |
   | [parquet/src/column/page.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvY29sdW1uL3BhZ2UucnM=) | `83.33% <0.00%> (-15.36%)` | :arrow_down: |
   | [arrow/src/array/builder/boolean\_builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIvYm9vbGVhbl9idWlsZGVyLnJz) | `84.26% <66.66%> (-2.47%)` | :arrow_down: |
   | [arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X3N0cmluZy5ycw==) | `92.05% <71.42%> (-5.01%)` | :arrow_down: |
   | [object\_store/src/local.rs](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-b2JqZWN0X3N0b3JlL3NyYy9sb2NhbC5ycw==) | `79.47% <72.78%> (-3.42%)` | :arrow_down: |
   | ... and [21 more](https://codecov.io/gh/apache/arrow-rs/pull/2181/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930862409


##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
     }
 }
 
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericListArray<OffsetSize>) -> Self {
+        GenericStringArray::<OffsetSize>::from_list(v)
+    }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericBinaryArray<OffsetSize>) -> Self {

Review Comment:
   You are right. But this should not be done in this PR. Because it follows the style of the current`StringArray::from_list`(https://github.com/apache/arrow-rs/blob/master/arrow/src/array/array_string.rs#L122-L143) which is also unsafe.
   
   We could file a follow-up issue to track this. Maybe implementing both `safe` and `unsafe` styles is a good choice:
   ```rust
   impl StringArray {
       unsafe fn from_list_unchecked(...) {...}
       unsafe fn from_binary_unchecked (...) {...}
   }
   
   impl From<ListArray> for StringArray {
       /// safe method with utf-8 checking
       fn from(...) {...}
   }
   
   impl From<BinaryArray> for StringArray {
       /// safe method with utf-8 checking
       fn from(...) {...}
   }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2181: Let the `StringBuilder` use `BinaryBuilder`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2181:
URL: https://github.com/apache/arrow-rs/pull/2181#discussion_r930782833


##########
arrow/src/array/array_string.rs:
##########
@@ -313,6 +313,27 @@ impl<'a, OffsetSize: OffsetSizeTrait> ArrayAccessor
     }
 }
 
+impl<OffsetSize: OffsetSizeTrait> From<GenericListArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericListArray<OffsetSize>) -> Self {
+        GenericStringArray::<OffsetSize>::from_list(v)
+    }
+}
+
+impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
+    for GenericStringArray<OffsetSize>
+{
+    fn from(v: GenericBinaryArray<OffsetSize>) -> Self {

Review Comment:
   This would allow non-utf8 data within a StringArray using safe APIs, which would break our safety guarantees



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org