Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/10 04:15:12 UTC

[GitHub] [arrow] projjal commented on a change in pull request #7641: ARROW-9328: [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string

projjal commented on a change in pull request #7641:
URL: https://github.com/apache/arrow/pull/7641#discussion_r452613034



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -320,6 +385,143 @@ const char* trim_utf8(gdv_int64 context, const char* data, gdv_int32 data_len,
   return data + start;
 }
 
+// Trims characters present in the trim text from the left end of the base text
+FORCE_INLINE
+const char* ltrim_utf8_utf8(gdv_int64 context, const char* basetext,
+                            gdv_int32 basetext_len, const char* trimtext,
+                            gdv_int32 trimtext_len, int32_t* out_len) {

Review comment:
       The UTF-8 handling seems incorrect. You need to decode each UTF-8 character and match whole characters against the target string instead of matching individual bytes. As written, a byte of a multibyte character in trim_text might match a byte of a different multibyte character in the target string.
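
To illustrate the point, here is a minimal sketch of a left trim that compares whole UTF-8 characters rather than bytes. It is not the code from this pull request: the names `utf8_char_len`, `char_in_set`, and `ltrim_by_char` are hypothetical, the `context` argument and error reporting are omitted, and malformed bytes are simply treated as one-byte units, which is an assumption rather than Gandiva's behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: byte length of the UTF-8 sequence introduced by lead
// byte `c`, or 0 if `c` cannot start a sequence (e.g. a continuation byte).
static int32_t utf8_char_len(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 0;
}

// Hypothetical helper: true when the character ch[0, ch_len) occurs in `set`
// as a whole character. `set` is walked one character at a time, so a match
// can never start or end in the middle of a multibyte sequence.
static bool char_in_set(const char* ch, int32_t ch_len, const char* set,
                        int32_t set_len) {
  int32_t i = 0;
  while (i < set_len) {
    int32_t n = utf8_char_len(static_cast<unsigned char>(set[i]));
    // Assumption: a malformed or truncated sequence counts as a 1-byte unit.
    if (n == 0 || i + n > set_len) n = 1;
    if (n == ch_len && std::memcmp(set + i, ch, n) == 0) return true;
    i += n;
  }
  return false;
}

// Sketch of a character-aware left trim: skips leading characters of `text`
// that appear in `trim`, and returns a pointer into `text` plus the new length.
const char* ltrim_by_char(const char* text, int32_t text_len, const char* trim,
                          int32_t trim_len, int32_t* out_len) {
  assert(text_len >= 0 && trim_len >= 0);
  int32_t start = 0;
  while (start < text_len) {
    int32_t n = utf8_char_len(static_cast<unsigned char>(text[start]));
    if (n == 0 || start + n > text_len) n = 1;
    if (!char_in_set(text + start, n, trim, trim_len)) break;
    start += n;
  }
  *out_len = text_len - start;
  return text + start;
}
```

For example, "é" encodes as 0xC3 0xA9 and "ã" as 0xC3 0xA3. A byte-wise trim of "ãbc" with trim text "é" would strip the shared 0xC3 byte and leave an invalid sequence, whereas the sketch above compares "ã" with "é" as whole characters and leaves the input untouched.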



