Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/23 02:06:40 UTC
[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #1608: Add `substring` support for binary
HaoYang670 opened a new pull request, #1608:
URL: https://github.com/apache/arrow-rs/pull/1608
# Which issue does this PR close?
Closes #1593.
# Rationale for this change
# What changes are included in this PR?
1. Add `substring` support for `(Large)BinaryArray`
2. Add some tests
3. Fix a bug in the string array test
4. Update the docs
# Are there any user-facing changes?
Update some docs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856778546
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -291,11 +575,14 @@ mod tests {
cases.into_iter().try_for_each::<_, Result<()>>(
|(array, start, length, expected)| {
- let array = StringArray::from(array);
Review Comment:
We did not cover `LargeStringArray` in the past.
[GitHub] [arrow-rs] alamb commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857527231
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
};
use std::cmp::Ordering;
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
Review Comment:
Rather than replicate quite so much code, I wonder if it would be possible to make `generic_substring` take a function to check char boundaries and then pass in a function that does nothing.
Maybe something like this (untested):
```rust
fn generic_substring<OffsetSize, F>(
    array: &GenericBinaryArray<OffsetSize>,
    start: OffsetSize,
    length: Option<OffsetSize>,
    check_char_boundary: F,
)
where
    OffsetSize: StringOffsetSizeTrait,
    F: Fn(OffsetSize) -> Result<OffsetSize>
{
    ...
}
```
[GitHub] [arrow-rs] alamb merged pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
alamb merged PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608
[GitHub] [arrow-rs] alamb commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857609305
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
};
use std::cmp::Ordering;
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
Review Comment:
Got it -- makes sense -- duplication makes sense to me then
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857554322
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
};
use std::cmp::Ordering;
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
Review Comment:
I am afraid that `BinaryArray` and `StringArray` cannot share one API because `GenericBinaryArray` and `GenericStringArray` are two different types.
Maybe we could use a **macro** to extract some of the common code.
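A rough sketch of what the macro idea could look like, using only the standard library. `BinaryValues`, `StringValues`, `impl_prefix!`, `no_check`, and `char_boundary_check` are all hypothetical names invented for illustration; they are not arrow-rs types or APIs, and the real kernel works on offset buffers rather than on `Vec`s. The point is only that one macro body can be expanded for two unrelated types, with the boundary check swapped per type:

```rust
// Hypothetical stand-ins for two unrelated array types (not arrow-rs types).
struct BinaryValues(Vec<Vec<u8>>);
struct StringValues(Vec<String>);

// One shared body, expanded once per array type. `$check` names the
// per-type validation function that runs before slicing.
macro_rules! impl_prefix {
    ($fn_name:ident, $array:ty, $item:ty, $check:ident) => {
        fn $fn_name(array: &$array, n: usize) -> Result<Vec<$item>, String> {
            let mut out = Vec::with_capacity(array.0.len());
            for v in &array.0 {
                // Clamp the requested prefix length to the value length.
                let end = n.min(v.len());
                $check(v, end)?;
                out.push(v[..end].to_owned());
            }
            Ok(out)
        }
    };
}

// Binary data: any byte offset is a valid cut point.
fn no_check(_v: &[u8], _end: usize) -> Result<(), String> {
    Ok(())
}

// UTF-8 data: the cut point must not split a multi-byte character.
fn char_boundary_check(v: &str, end: usize) -> Result<(), String> {
    if v.is_char_boundary(end) {
        Ok(())
    } else {
        Err(format!("byte offset {} is not a char boundary", end))
    }
}

impl_prefix!(binary_prefix, BinaryValues, Vec<u8>, no_check);
impl_prefix!(string_prefix, StringValues, String, char_boundary_check);
```

With this, `binary_prefix` and `string_prefix` stay separate functions over separate types, but the loop is written once.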
[GitHub] [arrow-rs] viirya commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r857837664
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -168,8 +246,214 @@ pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<A
mod tests {
use super::*;
- fn with_nulls<T: 'static + Array + PartialEq + From<Vec<Option<&'static str>>>>(
- ) -> Result<()> {
+ #[allow(clippy::type_complexity)]
+ fn with_nulls_generic_binary<O: BinaryOffsetSizeTrait>() -> Result<()> {
Review Comment:
I know this follows the string version, `with_nulls`; I am just wondering why `with_nulls_...` only tests edge cases. `without_nulls_...` has the normal test cases, but it doesn't include nulls.
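For illustration, a case that mixes nulls with ordinary values might look roughly like the following. This is a std-only sketch: `substring_nullable` is a hypothetical stand-in that models a nullable binary array as a slice of `Option<&[u8]>`; it is not the kernel under review and it only handles non-negative `start`.

```rust
/// Hypothetical stand-in for the kernel: `None` models a null slot.
/// Only non-negative `start` is handled in this sketch.
fn substring_nullable(
    values: &[Option<&[u8]>],
    start: usize,
    length: Option<usize>,
) -> Vec<Option<Vec<u8>>> {
    values
        .iter()
        .map(|v| {
            // Null slots stay null; non-null slots are sliced by byte offset.
            v.map(|bytes| {
                let begin = start.min(bytes.len());
                let end = match length {
                    Some(l) => begin.saturating_add(l).min(bytes.len()),
                    None => bytes.len(),
                };
                bytes[begin..end].to_vec()
            })
        })
        .collect()
}
```

A single input such as `[Some("hello"), None, Some("hi"), Some("")]` then exercises a normal value, a null, a value shorter than the requested range, and an empty value in one assertion.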
[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856785189
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -25,7 +26,68 @@ use crate::{
};
use std::cmp::Ordering;
-fn generic_substring<OffsetSize: StringOffsetSizeTrait>(
+fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
Review Comment:
The implementation is similar to `utf8_substring` except that there is no char boundary checking.
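To make the difference concrete, here is a minimal std-only sketch operating on a single value rather than on an Arrow array. The helper names (`byte_range`, `binary_substring_one`, `utf8_substring_one`) are hypothetical, and the handling of negative `start` (counting from the end) is an assumption about the intended semantics, not a copy of the PR's code:

```rust
/// Hypothetical helper: resolve `start`/`length` to a clamped [begin, end) byte range.
/// A negative `start` is assumed to count back from the end of the value.
fn byte_range(len: usize, start: i64, length: Option<u64>) -> (usize, usize) {
    let begin = if start >= 0 {
        (start as usize).min(len)
    } else {
        len.saturating_sub(start.unsigned_abs() as usize)
    };
    let end = match length {
        Some(l) => begin.saturating_add(l as usize).min(len),
        None => len,
    };
    (begin, end)
}

/// Binary case: every byte offset is a valid cut point, so no validation is needed.
fn binary_substring_one(value: &[u8], start: i64, length: Option<u64>) -> &[u8] {
    let (begin, end) = byte_range(value.len(), start, length);
    &value[begin..end]
}

/// UTF-8 case: same range computation, plus a check that neither end of the
/// range falls inside a multi-byte character.
fn utf8_substring_one(value: &str, start: i64, length: Option<u64>) -> Result<&str, String> {
    let (begin, end) = byte_range(value.len(), start, length);
    if !value.is_char_boundary(begin) || !value.is_char_boundary(end) {
        return Err(format!("range {}..{} splits a UTF-8 character", begin, end));
    }
    Ok(&value[begin..end])
}
```

For the input `"héllo"`, where `é` occupies two bytes, a start offset of 2 is rejected by the UTF-8 version but accepted by the binary version, which simply returns the trailing four bytes.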
[GitHub] [arrow-rs] codecov-commenter commented on pull request #1608: Add `substring` support for binary
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1608:
URL: https://github.com/apache/arrow-rs/pull/1608#issuecomment-1107298299
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#1608](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c503c53) into [master](https://codecov.io/gh/apache/arrow-rs/commit/fd9cb23f12ccfbf7422df5535c2602a925cc89dd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fd9cb23) will **increase** coverage by `0.04%`.
> The diff coverage is `100.00%`.
```diff
@@ Coverage Diff @@
## master #1608 +/- ##
==========================================
+ Coverage 82.95% 83.00% +0.04%
==========================================
Files 193 193
Lines 55435 55571 +136
==========================================
+ Hits 45988 46124 +136
Misses 9447 9447
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [arrow/src/compute/kernels/substring.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9zdWJzdHJpbmcucnM=) | `100.00% <100.00%> (ø)` | |
| [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.40% <0.00%> (-0.40%)` | :arrow_down: |
| [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1608/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (+0.22%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [fd9cb23...c503c53](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).