Posted to commits@arrow.apache.org by ap...@apache.org on 2023/06/29 17:26:12 UTC
[arrow] branch main updated: GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 52d830e64a GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
52d830e64a is described below
commit 52d830e64a0a02a6b0f68b1af35590f05d139471
Author: sgilmore10 <74...@users.noreply.github.com>
AuthorDate: Thu Jun 29 13:26:05 2023 -0400
GH-36173: [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
### Rationale for this change
This is a follow-up PR to #36167 that addresses feedback left after that PR was merged.
### What changes are included in this PR?
1. Added a test case verifying that `UTF8StringToUTF16` returns an `Invalid` status when given a UTF-8 encoded string that contains a lone high or low surrogate code point.
2. Removed `ARROW_EXPORT` from the definitions of `UTF8StringToUTF16` and `UTF16StringToUTF8`.
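
For context, the two byte sequences used in the new test decode to surrogate code points, which well-formed UTF-8 may not encode. The sketch below illustrates the arithmetic only. `DecodeThreeByteSequence`, `IsHighSurrogate`, and `IsLowSurrogate` are hypothetical helpers written for this note; they are not part of Arrow and say nothing about how `UTF8StringToUTF16` is implemented.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not an Arrow API): decode a 3-byte UTF-8 sequence
// of the form 1110xxxx 10xxxxxx 10xxxxxx into its code point.
// No validation is performed; the caller must pass at least three bytes.
inline uint32_t DecodeThreeByteSequence(const std::string& s) {
  const uint32_t b0 = static_cast<unsigned char>(s[0]);
  const uint32_t b1 = static_cast<unsigned char>(s[1]);
  const uint32_t b2 = static_cast<unsigned char>(s[2]);
  return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}

// High (leading) surrogates occupy U+D800..U+DBFF.
inline bool IsHighSurrogate(uint32_t cp) { return cp >= 0xD800u && cp <= 0xDBFFu; }

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
inline bool IsLowSurrogate(uint32_t cp) { return cp >= 0xDC00u && cp <= 0xDFFFu; }
```

Under these assumptions, `"\xed\xa0\x80"` decodes to U+D800 (a high surrogate) and `"\xed\xb0\x81"` decodes to U+DC01 (a low surrogate), which is why a conforming decoder is expected to reject both inputs.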
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* Closes: #36173
Lead-authored-by: Sarah Gilmore <sg...@mathworks.com>
Co-authored-by: sgilmore10 <74...@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <pi...@free.fr>
Signed-off-by: Antoine Pitrou <an...@python.org>
---
cpp/src/arrow/util/utf8.cc | 4 ++--
cpp/src/arrow/util/utf8_util_test.cc | 6 ++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/cpp/src/arrow/util/utf8.cc b/cpp/src/arrow/util/utf8.cc
index 3aa46347ba..7f3e87a86e 100644
--- a/cpp/src/arrow/util/utf8.cc
+++ b/cpp/src/arrow/util/utf8.cc
@@ -176,7 +176,7 @@ ARROW_EXPORT Result<std::string> WideStringToUTF8(const std::wstring& source) {
}
}
-ARROW_EXPORT Result<std::string> UTF16StringToUTF8(const std::u16string& source) {
+Result<std::string> UTF16StringToUTF8(const std::u16string& source) {
try {
return UTF16StringToUTF8Internal(source);
} catch (std::exception& e) {
@@ -184,7 +184,7 @@ ARROW_EXPORT Result<std::string> UTF16StringToUTF8(const std::u16string& source)
}
}
-ARROW_EXPORT Result<std::u16string> UTF8StringToUTF16(const std::string& source) {
+Result<std::u16string> UTF8StringToUTF16(const std::string& source) {
try {
return UTF8StringToUTF16Internal(source);
} catch (std::exception& e) {
diff --git a/cpp/src/arrow/util/utf8_util_test.cc b/cpp/src/arrow/util/utf8_util_test.cc
index cb59ba9be0..bada5e59d8 100644
--- a/cpp/src/arrow/util/utf8_util_test.cc
+++ b/cpp/src/arrow/util/utf8_util_test.cc
@@ -416,6 +416,12 @@ TEST(UTF8StringToUTF16, Basics) {
CheckInvalid("\xff");
CheckInvalid("h\xc3");
+
+ // lone high-code point
+ CheckInvalid("\xed\xa0\x80");
+
+ // lone low-code point
+ CheckInvalid("\xed\xb0\x81");
}
TEST(UTF16StringToUTF8, Basics) {