Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/06 20:29:47 UTC

[GitHub] [arrow] projjal commented on a change in pull request #10604: ARROW-13190: [C++] [Gandiva] Change behavior of INITCAP function

projjal commented on a change in pull request #10604:
URL: https://github.com/apache/arrow/pull/10604#discussion_r664857191



##########
File path: cpp/src/gandiva/gdv_function_stubs.cc
##########
@@ -635,30 +638,31 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t data_
   int32_t out_char_len = 0;
   int32_t out_idx = 0;
   uint32_t char_codepoint;
+
+  // Any character is considered as space, except if it is alphanumeric
   bool last_char_was_space = true;
 
   for (int32_t i = 0; i < data_len; i += char_len) {
     char_len = gdv_fn_utf8_char_length(data[i]);
-    // For single byte characters:
-    // If it is a lowercase ASCII character, set the output to its corresponding uppercase
-    // character; else, set the output to the read character
+    // An optimization for single byte characters:
     if (char_len == 1) {

Review comment:
       What happens if the input contains invalid UTF-8 bytes?
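
To make the concern concrete, here is a minimal sketch of a structural UTF-8 check that could run before the per-character branch. Both helper names are hypothetical and are not Gandiva's `gdv_fn_utf8_char_length`. The sketch only validates lead bytes, sequence length, and continuation bytes; it does not reject every overlong or surrogate encoding.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption, not the Gandiva implementation):
// returns the expected sequence length for a UTF-8 lead byte, or 0 when the
// byte cannot start a sequence.
inline int32_t utf8_lead_byte_length_sketch(unsigned char lead) {
  if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 0xC0 and 0xC1 would be overlong
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;  // continuation byte (0x80-0xBF) or a byte that is never valid
}

// Hypothetical helper: true when every sequence in [data, data + len) has a
// valid lead byte, fits inside the buffer, and is followed by the right number
// of continuation bytes (10xxxxxx).
inline bool utf8_is_well_formed_sketch(const char* data, int32_t len) {
  for (int32_t i = 0; i < len;) {
    const int32_t n =
        utf8_lead_byte_length_sketch(static_cast<unsigned char>(data[i]));
    if (n == 0 || i + n > len) return false;  // bad lead byte or truncated input
    for (int32_t j = 1; j < n; ++j) {
      if ((static_cast<unsigned char>(data[i + j]) & 0xC0) != 0x80) return false;
    }
    i += n;
  }
  return true;
}
```

With a check like this, a truncated or malformed sequence could be reported through the context as an error instead of being read past the end of the buffer. Whether that is the right behavior for INITCAP is a design choice for the PR author.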

##########
File path: cpp/src/gandiva/gdv_function_stubs_test.cc
##########
@@ -514,7 +514,31 @@ TEST(TestGdvFnStubs, TestInitCap) {
   EXPECT_FALSE(ctx.has_error());
 
   out_str = gdv_fn_initcap_utf8(ctx_ptr, "{ÕHP,pqśv}Ń+", 15, &out_len);
-  EXPECT_EQ(std::string(out_str, out_len), "{ÕHP,pqśv}Ń+");
+  EXPECT_EQ(std::string(out_str, out_len), "{Õhp,Pqśv}Ń+");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = gdv_fn_initcap_utf8(ctx_ptr, "sɦasasdsɦsd\"sdsdɦ", 19, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "Sɦasasdsɦsd\"Sdsdɦ");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = gdv_fn_initcap_utf8(ctx_ptr, "mysuperscipt@number²isfine", 27, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "Mysuperscipt@Number²Isfine");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = gdv_fn_initcap_utf8(ctx_ptr, "Ő<tŵas̓老ƕɱ¢vIYwށ", 25, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "Ő<Tŵas̓老Ƕɱ¢Viywށ");

Review comment:
       Why did the 25 disappear in the output?
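
One possible reading, stated as an assumption: the 25 is the `data_len` argument, which is the input length in bytes, so it would not appear in the output string. The sketch below uses a hypothetical helper to show how byte length and code-point count differ for UTF-8 text. It assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Gandiva): counts code points by skipping
// UTF-8 continuation bytes (10xxxxxx). Assumes the input is well-formed.
inline std::size_t utf8_code_point_count_sketch(const std::string& s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++count;  // every non-continuation byte starts a code point
  }
  return count;
}
```

For example, `std::string("\xC3\x95" "hp")` (the text "Õhp") has a `size()` of 4 bytes but 3 code points, because "Õ" takes two bytes.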



