You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "sgilmore10 (via GitHub)" <gi...@apache.org> on 2023/06/29 13:33:57 UTC

[GitHub] [arrow] sgilmore10 opened a new pull request, #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

sgilmore10 opened a new pull request, #36383:
URL: https://github.com/apache/arrow/pull/36383

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   This is a followup PR to #36167 that addresses feedback left after the PR was merged.
   
   ### What changes are included in this PR?
   
   1. Added a test point verifying `UTF8StringToUTF16` returns an `Invalid` status if given a UTF-8 encoded string that contains a lone high or low code point.
   2. Removed `ARROW_EXPORT` from definitions of `UTF8StringToUTF16` and `UTF16StringToUTF18`.
   
   
   ### Are these changes tested?
   
   Yes.
   
   ### Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on a diff in pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on code in PR #36383:
URL: https://github.com/apache/arrow/pull/36383#discussion_r1246839563


##########
cpp/src/arrow/util/utf8_util_test.cc:
##########
@@ -416,6 +416,12 @@ TEST(UTF8StringToUTF16, Basics) {
 
   CheckInvalid("\xff");
   CheckInvalid("h\xc3");
+
+  // lone high-code point
+  CheckInvalid("\xed\xa0\x80");
+
+  // lone low-code point
+  CheckInvalid("\xed\0xb0\x81");

Review Comment:
   Good catch! Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36383:
URL: https://github.com/apache/arrow/pull/36383


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36383:
URL: https://github.com/apache/arrow/pull/36383#discussion_r1246813056


##########
cpp/src/arrow/util/utf8_util_test.cc:
##########
@@ -416,6 +416,12 @@ TEST(UTF8StringToUTF16, Basics) {
 
   CheckInvalid("\xff");
   CheckInvalid("h\xc3");
+
+  // lone high-code point
+  CheckInvalid("\xed\xa0\x80");
+
+  // lone low-code point
+  CheckInvalid("\xed\0xb0\x81");

Review Comment:
   I think this is incorrect
   ```suggestion
     CheckInvalid("\xed\xb0\x81");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36383:
URL: https://github.com/apache/arrow/pull/36383#issuecomment-1613563149

   Seems like Homebrew is having issues today, I'll merge anyway if it persists since those failures don't seem related.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36383:
URL: https://github.com/apache/arrow/pull/36383#issuecomment-1621641443

   Conbench analyzed the 6 benchmark runs on commit `52d830e6`.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-29 21:36:21Z](http://conbench.ursa.dev/compare/runs/3074207cef0342348ea3f7a6477bf54d...8bb4b3eccaff4c9c9581f604ee8579eb/)
     - [params=num_cols:64/is_partial:1/real_time, source=cpp-micro, suite=arrow-ipc-read-write-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649de233f71768080006b8e0ed5c4ad...0649df9aa67c7c3f80006c9f0b6d316f)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14792914159) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #36383: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36383:
URL: https://github.com/apache/arrow/pull/36383#issuecomment-1613567725

   Thank you for your contributions @sgilmore10 !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org