Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/22 13:46:18 UTC

[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #12333: ARROW-15568: [C++][Gandiva] Implement Translate Function

siddhantrao23 commented on code in PR #12333:
URL: https://github.com/apache/arrow/pull/12333#discussion_r903765420


##########
cpp/src/gandiva/gdv_string_function_stubs.cc:
##########
@@ -449,6 +450,218 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t data_
   *out_len = out_idx;
   return out;
 }
+GANDIVA_EXPORT
+const char* translate_utf8_utf8_utf8(int64_t context, const char* in, int32_t in_len,
+                                     const char* from, int32_t from_len, const char* to,
+                                     int32_t to_len, int32_t* out_len) {
+  if (in_len <= 0) {
+    *out_len = 0;
+    return "";
+  }
+
+  if (from_len <= 0) {
+    *out_len = in_len;
+    return in;
+  }
+
+  // This variable is to control if there are multi-byte utf8 entries
+  bool has_multi_byte = false;
+
+  // This variable is to store the final result
+  char* result;
+  int result_len;
+
+  // Searching multi-bytes in In
+  for (int i = 0; i < in_len; i++) {
+    unsigned char char_single_byte = in[i];
+    if (char_single_byte > 127) {
+      // found a multi-byte utf-8 char
+      has_multi_byte = true;
+      break;
+    }
+  }
+
+  // Searching multi-bytes in From
+  for (int i = 0; i < from_len; i++) {
+    unsigned char char_single_byte = from[i];
+    if (char_single_byte > 127) {
+      // found a multi-byte utf-8 char
+      has_multi_byte = true;
+      break;
+    }
+  }
+
+  // Searching multi-bytes in To
+  for (int i = 0; i < to_len; i++) {
+    unsigned char char_single_byte = to[i];
+    if (char_single_byte > 127) {
+      // found a multi-byte utf-8 char
+      has_multi_byte = true;
+      break;
+    }
+  }

Review Comment:
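   Each of these for loops could also check whether `has_multi_byte` has already been set to true. Each loop already breaks once it finds a multi-byte character, but the later loops still scan `from` and `to` even when an earlier loop has found one. Checking the flag first would skip that extra computation.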
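   A minimal sketch of the suggestion, assuming the three scans are pulled into a standalone function (the name `DetectMultiByte` is hypothetical and not part of Gandiva). Adding `!has_multi_byte` to each loop condition replaces the `break` and makes the later loops exit immediately once an earlier buffer contains a byte above 127:

```cpp
#include <cstdint>

// Hypothetical standalone version of the three scans in translate_utf8_utf8_utf8.
// A byte > 127 has its high bit set, so it belongs to a multi-byte UTF-8 sequence.
// Each loop condition also tests !has_multi_byte, so once one buffer is found to
// contain such a byte, the remaining loops do not scan their buffers at all.
bool DetectMultiByte(const char* in, int32_t in_len, const char* from,
                     int32_t from_len, const char* to, int32_t to_len) {
  bool has_multi_byte = false;

  // Scan `in`; the flag check ends the loop as soon as a match is found.
  for (int32_t i = 0; !has_multi_byte && i < in_len; i++) {
    if (static_cast<unsigned char>(in[i]) > 127) has_multi_byte = true;
  }
  // Skipped entirely if `in` already contained a multi-byte character.
  for (int32_t i = 0; !has_multi_byte && i < from_len; i++) {
    if (static_cast<unsigned char>(from[i]) > 127) has_multi_byte = true;
  }
  // Skipped entirely if either earlier buffer contained one.
  for (int32_t i = 0; !has_multi_byte && i < to_len; i++) {
    if (static_cast<unsigned char>(to[i]) > 127) has_multi_byte = true;
  }
  return has_multi_byte;
}
```

   An equivalent option is a small helper that scans one buffer, with the three calls combined using `||`. Short-circuit evaluation then gives the same early exit without repeating the loop three times.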



-- 
This is an automated message from the Apache Git Service.