You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/25 21:06:50 UTC

[GitHub] [arrow-rs] alippai opened a new pull request #353: Add append_slice for GenericStringBuilder

alippai opened a new pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353


   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #352.
   
    # Rationale for this change
   
   Developer experience and perhaps performance. 
    <!---
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   A new function `append_slice(&mut self, value: &[u8])` is added to `GenericStringBuilder`
   
   <!---
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   Only the new function, it's a non-breaking addition.
   A test and simple docs are included.
   
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-851701049


   @ritchie46 updated the API with correct name + unsafe, also added benchmark.
   ```
   bench_string/bench_string                                                                            
                           time:   [10.194 ms 10.259 ms 10.333 ms]
                           thrpt:  [1.1341 GiB/s 1.1423 GiB/s 1.1496 GiB/s]
   bench_string/bench_string_slice                                                                            
                           time:   [19.182 ms 19.303 ms 19.418 ms]
                           thrpt:  [617.98 MiB/s 621.68 MiB/s 625.59 MiB/s]
   bench_string/bench_string_slice_unchecked                                                                            
                           time:   [7.2051 ms 7.2217 ms 7.2388 ms]
                           thrpt:  [1.6189 GiB/s 1.6227 GiB/s 1.6265 GiB/s]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-849791863


   This is `unsafe` without an `unsafe` keyword. Rust assumes valid utf8 for all `str` and `String` values, consequently arrow assumes this as well.
   
   If you want to create a `&str` from a slice without checking validity, you should use [https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html](https://doc.rust-lang.org/std/str/fn.from_utf8_unchecked.html) which requires unsafe code.
   
   That `&str` can then be used in `GenericStringBuilder::append`. I don't think you should add an `unsafe` API for something that's already in the std library.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-849821395


   @ritchie46 thanks, it makes sense. I'll rework it a little bit. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-851943386


   Checking with godbolt and criterion this is basically just a wrapper around `from_utf8` and `from_utf8_unchecked`. In light of this I'm closing this PR, maybe the docs can be extended if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848275425


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#353](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (51b0c78) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f41cb17066146552701bb7eb67bc13b2ef9ff1b6?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f41cb17) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/353/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #353      +/-   ##
   ==========================================
   + Coverage   82.61%   82.62%   +0.01%     
   ==========================================
     Files         162      162              
     Lines       44228    44245      +17     
   ==========================================
   + Hits        36538    36558      +20     
   + Misses       7690     7687       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/353/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `85.46% <100.00%> (+0.16%)` | :arrow_up: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/353/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `95.04% <0.00%> (+0.19%)` | :arrow_up: |
   | [arrow/src/error.rs](https://codecov.io/gh/apache/arrow-rs/pull/353/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2Vycm9yLnJz) | `13.33% <0.00%> (+4.44%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f41cb17...51b0c78](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Dandandan commented on a change in pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#discussion_r642837631



##########
File path: arrow/src/array/builder.rs
##########
@@ -1246,6 +1247,33 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringBuilder<OffsetSize> {
         Ok(())
     }
 
+    /// Appends a UTF-8 slice into the builder.
+    ///
+    /// Automatically calls the `append` method to delimit the slice appended in as a
+    /// distinct array element.
+    #[inline]
+    pub fn append_slice(&mut self, value: &[u8]) -> Result<()> {
+        // Do not append invalid UTF-8
+        from_utf8(value).or(Err(ArrowError::InvalidArgumentError(

Review comment:
       I think this `or` should be using `or_else` here otherwise the error path is always evaluated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848702806


   Alternatively I can turn this into append_slice_unchecked and create an append_slice which adds `run_utf8_validation(...) -> Result<>` like seen [here](https://doc.rust-lang.org/src/core/str/converts.rs.html#86), but since this is a builder, I'd prefer keeping it this way and adding an utf8 validation kernel later. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848275425


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#353](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (48ff2e5) into [master](https://codecov.io/gh/apache/arrow-rs/commit/488f82663d507baef6eb75e7bbbb9708ced3f9a2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (488f826) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/353/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #353   +/-   ##
   =======================================
     Coverage   82.56%   82.56%           
   =======================================
     Files         162      162           
     Lines       44063    44069    +6     
   =======================================
   + Hits        36379    36386    +7     
   + Misses       7684     7683    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/353/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `85.35% <100.00%> (+0.06%)` | :arrow_up: |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/353/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [488f826...48ff2e5](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai edited a comment on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848702806


   Alternatively I can turn this into `append_slice_unchecked` and create an `append_slice(...) -> Result<>` which adds `run_utf8_validation` like seen [here](https://doc.rust-lang.org/src/core/str/converts.rs.html#86), but since this is a builder, I'd prefer keeping it this way and adding an utf8 validation kernel later.
   
   Related discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1622017000132700?thread_ts=1621973962.114400&cid=C01QUFS30TD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai edited a comment on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848702806


   Alternatively I can turn this into append_slice_unchecked and create an append_slice which adds `run_utf8_validation(...) -> Result<>` like seen [here](https://doc.rust-lang.org/src/core/str/converts.rs.html#86), but since this is a builder, I'd prefer keeping it this way and adding an utf8 validation kernel later.
   
   Related discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1622017000132700?thread_ts=1621973962.114400&cid=C01QUFS30TD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai edited a comment on pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848702806


   Alternatively I can turn this into` append_slice_unchecked` and create an `append_slice(...) -> Result<>` which adds `run_utf8_validation` like seen [here](https://doc.rust-lang.org/src/core/str/converts.rs.html#86), but since this is a builder, I'd prefer keeping it this way and adding an utf8 validation kernel later.
   
   Related discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1622017000132700?thread_ts=1621973962.114400&cid=C01QUFS30TD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alippai closed pull request #353: Add append_slice for GenericStringBuilder

Posted by GitBox <gi...@apache.org>.
alippai closed pull request #353:
URL: https://github.com/apache/arrow-rs/pull/353


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org