Posted to commits@arrow.apache.org by ap...@apache.org on 2023/06/29 17:26:12 UTC

[arrow] branch main updated: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 52d830e64a GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
52d830e64a is described below

commit 52d830e64a0a02a6b0f68b1af35590f05d139471
Author: sgilmore10 <74...@users.noreply.github.com>
AuthorDate: Thu Jun 29 13:26:05 2023 -0400

    GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
    
    ### Rationale for this change
    
    This is a followup PR to #36167 that addresses feedback left after the PR was merged.
    
    ### What changes are included in this PR?
    
    1. Added a test case verifying that `UTF8StringToUTF16` returns an `Invalid` status when given a UTF-8 encoded string that contains a lone high or low surrogate code point.
    2. Removed `ARROW_EXPORT` from definitions of `UTF8StringToUTF16` and `UTF16StringToUTF8`.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No.
    
    * Closes: #36173
    
    Lead-authored-by: Sarah Gilmore <sg...@mathworks.com>
    Co-authored-by: sgilmore10 <74...@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <pi...@free.fr>
    Signed-off-by: Antoine Pitrou <an...@python.org>
---
 cpp/src/arrow/util/utf8.cc           | 4 ++--
 cpp/src/arrow/util/utf8_util_test.cc | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/cpp/src/arrow/util/utf8.cc b/cpp/src/arrow/util/utf8.cc
index 3aa46347ba..7f3e87a86e 100644
--- a/cpp/src/arrow/util/utf8.cc
+++ b/cpp/src/arrow/util/utf8.cc
@@ -176,7 +176,7 @@ ARROW_EXPORT Result<std::string> WideStringToUTF8(const std::wstring& source) {
   }
 }
 
-ARROW_EXPORT Result<std::string> UTF16StringToUTF8(const std::u16string& source) {
+Result<std::string> UTF16StringToUTF8(const std::u16string& source) {
   try {
     return UTF16StringToUTF8Internal(source);
   } catch (std::exception& e) {
@@ -184,7 +184,7 @@ ARROW_EXPORT Result<std::string> UTF16StringToUTF8(const std::u16string& source)
   }
 }
 
-ARROW_EXPORT Result<std::u16string> UTF8StringToUTF16(const std::string& source) {
+Result<std::u16string> UTF8StringToUTF16(const std::string& source) {
   try {
     return UTF8StringToUTF16Internal(source);
   } catch (std::exception& e) {
diff --git a/cpp/src/arrow/util/utf8_util_test.cc b/cpp/src/arrow/util/utf8_util_test.cc
index cb59ba9be0..bada5e59d8 100644
--- a/cpp/src/arrow/util/utf8_util_test.cc
+++ b/cpp/src/arrow/util/utf8_util_test.cc
@@ -416,6 +416,12 @@ TEST(UTF8StringToUTF16, Basics) {
 
   CheckInvalid("\xff");
   CheckInvalid("h\xc3");
+
+  // lone high-code point
+  CheckInvalid("\xed\xa0\x80");
+
+  // lone low-code point
+  CheckInvalid("\xed\xb0\x81");
 }
 
 TEST(UTF16StringToUTF8, Basics) {
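
Note on the two new test inputs: the byte sequences `"\xed\xa0\x80"` and `"\xed\xb0\x81"` follow the three-byte UTF-8 bit pattern but decode to code points in the surrogate range (U+D800..U+DFFF), which well-formed UTF-8 must not contain. The sketch below only illustrates that arithmetic. `DecodeThreeByteSequence`, `IsHighSurrogate`, and `IsLowSurrogate` are hypothetical helpers written for this note, not Arrow functions, and they do not reflect how `UTF8StringToUTF16` is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Arrow): decode a three-byte sequence of
// the form 1110xxxx 10xxxxxx 10xxxxxx into the code point it spells out.
// It deliberately performs no validity checks beyond the length.
std::uint32_t DecodeThreeByteSequence(const std::string& s) {
  assert(s.size() == 3);
  const auto b0 = static_cast<unsigned char>(s[0]);
  const auto b1 = static_cast<unsigned char>(s[1]);
  const auto b2 = static_cast<unsigned char>(s[2]);
  return (static_cast<std::uint32_t>(b0 & 0x0F) << 12) |
         (static_cast<std::uint32_t>(b1 & 0x3F) << 6) |
         static_cast<std::uint32_t>(b2 & 0x3F);
}

// High (leading) surrogates occupy U+D800..U+DBFF.
bool IsHighSurrogate(std::uint32_t cp) { return cp >= 0xD800 && cp <= 0xDBFF; }

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
bool IsLowSurrogate(std::uint32_t cp) { return cp >= 0xDC00 && cp <= 0xDFFF; }
```

Under these assumptions, the first test input decodes to U+D800 (a high surrogate) and the second to U+DC01 (a low surrogate), which is why the test expects both to be rejected.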