Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/23 02:06:40 UTC

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #1608: Add `substring` support for binary

HaoYang670 opened a new pull request, #1608:
URL: https://github.com/apache/arrow-rs/pull/1608

   # Which issue does this PR close?
   
   
   Closes #1593 .
   
   # Rationale for this change
   
   
   # What changes are included in this PR?
   1. Add `substring` support for `BinaryArray` and `LargeBinaryArray`
   2. Add tests for the new binary support
   3. Fix a bug in the string array test
   4. Update the docs
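
   For readers unfamiliar with the kernel, here is a minimal, self-contained sketch of a byte-level substring over an offset-encoded column. It does not use the arrow crate: `BinaryColumn`, `from_slices`, `get`, and `byte_substring` are hypothetical names made up for illustration, and the sketch assumes the semantics documented for `substring` (a negative `start` counts from the end of each element, and `length: None` means "to the end"). Null handling is left out.

   ```rust
   /// Hypothetical stand-in for a variable-length binary array:
   /// element i occupies values[offsets[i]..offsets[i + 1]].
   struct BinaryColumn {
       offsets: Vec<usize>,
       values: Vec<u8>,
   }

   impl BinaryColumn {
       /// Packs the given byte slices into one values buffer plus offsets.
       fn from_slices(items: &[&[u8]]) -> Self {
           let mut offsets: Vec<usize> = vec![0];
           let mut values: Vec<u8> = Vec::new();
           for &item in items {
               values.extend_from_slice(item);
               offsets.push(values.len());
           }
           BinaryColumn { offsets, values }
       }

       /// Number of elements in the column.
       fn len(&self) -> usize {
           self.offsets.len() - 1
       }

       /// Bytes of element `i`.
       fn get(&self, i: usize) -> &[u8] {
           &self.values[self.offsets[i]..self.offsets[i + 1]]
       }
   }

   /// Takes a byte range out of every element. No char-boundary check is
   /// needed because the elements are arbitrary bytes, not UTF-8 text.
   fn byte_substring(col: &BinaryColumn, start: i64, length: Option<u64>) -> BinaryColumn {
       let mut offsets: Vec<usize> = vec![0];
       let mut values: Vec<u8> = Vec::new();
       for i in 0..col.len() {
           let item = col.get(i);
           let len = item.len() as i64;
           // A negative start counts back from the end; both cases are clamped
           // so that the range never leaves the element.
           let begin = if start >= 0 { start.min(len) } else { (len + start).max(0) };
           let remaining = (len - begin) as u64;
           let take = length.map_or(remaining, |l| l.min(remaining)) as usize;
           let begin = begin as usize;
           values.extend_from_slice(&item[begin..begin + take]);
           offsets.push(values.len());
       }
       BinaryColumn { offsets, values }
   }
   ```

   With this model, `byte_substring(&col, 1, Some(2))` keeps at most two bytes of each element starting at byte 1, and `byte_substring(&col, -2, None)` keeps the last two bytes of each element.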
   
   # Are there any user-facing changes?
   Update some docs
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856778546


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -291,11 +575,14 @@ mod tests {
 
         cases.into_iter().try_for_each::<_, Result<()>>(
             |(array, start, length, expected)| {
-                let array = StringArray::from(array);

Review Comment:
   We did not cover `LargeStringArray` in the past. 





[GitHub] [arrow-rs] alamb commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857527231


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
 };
 use std::cmp::Ordering;
 
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(

Review Comment:
   Rather than replicate quite so much code, I wonder if it would be possible to make `generic_substring` take a function that checks char boundaries, and then pass in a function that does nothing for the binary case.
   
   Maybe something like this (untested):
   
   ```rust
   fn generic_substring<OffsetSize, F>(
       array: &GenericBinaryArray<OffsetSize>,
       start: OffsetSize,
       length: Option<OffsetSize>,
       check_char_boundary: F,
   )
   where
       OffsetSize: StringOffsetSizeTrait,
       F: Fn(OffsetSize) -> Result<OffsetSize>,
   {
       ...
   }
   ```





[GitHub] [arrow-rs] alamb merged pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
alamb merged PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608




[GitHub] [arrow-rs] alamb commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857609305


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
 };
 use std::cmp::Ordering;
 
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(

Review Comment:
   Got it -- makes sense -- duplication makes sense to me then





[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857554322


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
 };
 use std::cmp::Ordering;
 
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(

Review Comment:
   I am afraid that `BinaryArray` and `StringArray` cannot share one API, because `GenericBinaryArray` and `GenericStringArray` are two different types.
   Maybe we could use a **macro** to extract the common code.
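
   As a rough illustration of the macro idea, the sketch below generates the same function body for two unrelated types and lets only the boundary check differ. Everything in it is hypothetical and independent of arrow-rs: `BinaryColumn` and `StringColumn` are simplified stand-ins that merely share a layout, and `impl_skip_prefix`, `any_position`, and `char_boundary` are names invented for this example.

   ```rust
   /// Hypothetical stand-ins for two unrelated array types with the same
   /// layout: element i is values[offsets[i]..offsets[i + 1]].
   struct BinaryColumn {
       offsets: Vec<usize>,
       values: Vec<u8>,
   }

   /// Same layout, but `values` is expected to hold valid UTF-8.
   struct StringColumn {
       offsets: Vec<usize>,
       values: Vec<u8>,
   }

   /// Expands to a function `$name` that drops the first `start` bytes of
   /// every element of a `$column`. `$check` receives the element bytes and
   /// the cut position, and may reject the cut.
   macro_rules! impl_skip_prefix {
       ($name:ident, $column:ident, $check:expr) => {
           fn $name(col: &$column, start: usize) -> Result<$column, String> {
               let mut offsets: Vec<usize> = vec![0];
               let mut values: Vec<u8> = Vec::new();
               for window in col.offsets.windows(2) {
                   let item = &col.values[window[0]..window[1]];
                   let begin = start.min(item.len());
                   ($check)(item, begin)?;
                   values.extend_from_slice(&item[begin..]);
                   offsets.push(values.len());
               }
               Ok($column { offsets, values })
           }
       };
   }

   /// Binary data may be cut at any byte.
   fn any_position(_item: &[u8], _pos: usize) -> Result<(), String> {
       Ok(())
   }

   /// UTF-8 data may only be cut where a character starts.
   fn char_boundary(item: &[u8], pos: usize) -> Result<(), String> {
       // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
       if pos < item.len() && (item[pos] & 0xC0) == 0x80 {
           Err(format!("byte offset {} is not a char boundary", pos))
       } else {
           Ok(())
       }
   }

   impl_skip_prefix!(binary_skip_prefix, BinaryColumn, any_position);
   impl_skip_prefix!(string_skip_prefix, StringColumn, char_boundary);
   ```

   The two generated functions have different argument and return types, so neither column type has to know about the other; the cost is that the shared logic lives inside a macro body, which is harder to read and debug than an ordinary generic function.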





[GitHub] [arrow-rs] viirya commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857837664


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -168,8 +246,214 @@ pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<A
 mod tests {
     use super::*;
 
-    fn with_nulls<T: 'static + Array + PartialEq + From<Vec<Option<&'static str>>>>(
-    ) -> Result<()> {
+    #[allow(clippy::type_complexity)]
+    fn with_nulls_generic_binary<O: BinaryOffsetSizeTrait>() -> Result<()> {

Review Comment:
   I know this follows the string version, `with_nulls`, but I wonder why `with_nulls_...` only tests edge cases. `without_nulls_...` has the normal test cases, but it doesn't include nulls.





[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856785189


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
 };
 use std::cmp::Ordering;
 
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(

Review Comment:
   The implementation is similar to `utf8_substring`, except that there is no char boundary checking.
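
   To make the difference concrete, here is a small sketch in which the boundary check is the only thing that separates the two cases. It operates on a single element rather than an Arrow array, and borrows the "pass in a check function" shape suggested elsewhere in this review thread; `slice_with_check`, `binary_slice`, and `utf8_slice` are hypothetical names, not functions from arrow-rs.

   ```rust
   /// Slices `bytes` from `start` for at most `length` bytes. Each cut
   /// position is passed through `check_boundary`, which may reject it.
   fn slice_with_check<F>(
       bytes: &[u8],
       start: usize,
       length: Option<usize>,
       check_boundary: F,
   ) -> Result<&[u8], String>
   where
       F: Fn(usize) -> Result<usize, String>,
   {
       let begin = check_boundary(start.min(bytes.len()))?;
       let end = match length {
           Some(l) => check_boundary(begin.saturating_add(l).min(bytes.len()))?,
           None => bytes.len(),
       };
       Ok(&bytes[begin..end])
   }

   /// Binary case: every byte position is an acceptable cut point.
   fn binary_slice(bytes: &[u8], start: usize, length: Option<usize>) -> Result<&[u8], String> {
       slice_with_check(bytes, start, length, |pos| Ok(pos))
   }

   /// UTF-8 case: a cut in the middle of a multi-byte character is an error.
   fn utf8_slice(text: &str, start: usize, length: Option<usize>) -> Result<&str, String> {
       let checked = slice_with_check(text.as_bytes(), start, length, |pos| {
           if text.is_char_boundary(pos) {
               Ok(pos)
           } else {
               Err(format!("byte offset {} is not a char boundary", pos))
           }
       })?;
       // Both ends were accepted as char boundaries, so the bytes are valid UTF-8.
       std::str::from_utf8(checked).map_err(|e| e.to_string())
   }
   ```

   For the string `"héllo"`, where `é` occupies bytes 1 and 2, `utf8_slice` rejects a start of 2 while `binary_slice` on the same bytes simply returns the tail of the character.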





[GitHub] [arrow-rs] codecov-commenter commented on pull request #1608: Add `substring` support for binary

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#issuecomment-1107298299

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1608](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c503c53) into [master](https://codecov.io/gh/apache/arrow-rs/commit/fd9cb23f12ccfbf7422df5535c2602a925cc89dd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fd9cb23) will **increase** coverage by `0.04%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1608      +/-   ##
   ==========================================
   + Coverage   82.95%   83.00%   +0.04%     
   ==========================================
     Files         193      193              
     Lines       55435    55571     +136     
   ==========================================
   + Hits        45988    46124     +136     
     Misses       9447     9447              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/substring.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9zdWJzdHJpbmcucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.40% <0.00%> (-0.40%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (+0.22%)` | :arrow_up: |
   
   

