Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/06 13:48:10 UTC

[GitHub] [arrow] naman1996 opened a new pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

naman1996 opened a new pull request #8231:
URL: https://github.com/apache/arrow/pull/8231


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] praveenbingo closed pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
praveenbingo closed pull request #8231:
URL: https://github.com/apache/arrow/pull/8231


   





[GitHub] [arrow] naman1996 commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r495902001



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +845,59 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  char* ret;
+  if (index < 1) {
+    gdv_fn_context_set_error_msg(context, "Index should be >= 1");
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    return text;
+  }
+
+  // converting both c style arrays to string for easy processing
+  std::string input = std::string(text);

Review comment:
       This works for now, but it won't after the refactor that removes std::string. I will need to handle that separately once std::string is gone.







[GitHub] [arrow] naman1996 commented on pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#issuecomment-697189413


   Functionality: takes three arguments (string, split_string, and index). It splits the given string using split_string as the delimiter and returns the piece at the 1-based index position. For example:
   
   split_part("A,B,C", ",", 2) will output B
   
   "A,B,C" -> [A, B, C] -> (return 2nd split element i.e. B)

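The described behaviour can be sketched in standalone C++ as follows. This is only an illustration of the semantics, not the code in the PR: the name split_part_sketch is hypothetical, it uses std::string_view for brevity (the precompiled Gandiva code avoids the std string types), and it returns an empty view for an invalid index instead of reporting an error through the context.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical standalone helper (not the Gandiva function): returns the
// 1-based `index`-th field of `text` split on `delim`, or an empty view
// when the index is out of range.
std::string_view split_part_sketch(std::string_view text, std::string_view delim,
                                   int index) {
  if (index < 1) return {};         // assumption: invalid index yields empty
  if (delim.empty()) return text;   // no delimiter: the whole text is one field
  std::size_t start = 0;
  for (int field = 1;; ++field) {
    const std::size_t pos = text.find(delim, start);
    if (field == index) {
      // Last field runs to the end of the text when no delimiter follows.
      return pos == std::string_view::npos ? text.substr(start)
                                           : text.substr(start, pos - start);
    }
    if (pos == std::string_view::npos) return {};  // fewer fields than index
    start = pos + delim.size();
  }
}
```

With this sketch, split_part_sketch("A,B,C", ",", 2) yields "B", matching the example above.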




[GitHub] [arrow] github-actions[bot] commented on pull request #8231: Arrow 10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#issuecomment-695912228


   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   





[GitHub] [arrow] vvellanki commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
vvellanki commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r495801708



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +845,59 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  char* ret;
+  if (index < 1) {
+    gdv_fn_context_set_error_msg(context, "Index should be >= 1");
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    return text;
+  }
+
+  // converting both c style arrays to string for easy processing
+  std::string input = std::string(text);

Review comment:
       Does this work with utf-8 strings and utf-8 delimiters?







[GitHub] [arrow] naman1996 commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r501579527



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +847,66 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  if (index < 1) {
+    char error_message[100];
+    snprintf(error_message, sizeof(error_message),
+             "Index in split_part must be positive, value provided was %d", index);
+    gdv_fn_context_set_error_msg(context, error_message);
+    *out_len = 0;
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    *out_len = text_len;
+    return text;
+  }
+
+  int i = 0, match_no = 1;
+
+  while (i < text_len) {
+    // find the position where delimiter matched for the first time
+    int match_pos = match_string(text, text_len, i, delimiter, delim_len);
+    if (match_pos == -1 && match_no != index) {
+      // reached the end without finding a match.
+      *out_len = 0;
+      return "";
+    } else {
+      // Found a match. If the match number is index then return this match
+      if (match_no == index) {
+        int end_pos = match_pos - delim_len;
+
+        if (match_pos == -1) {
+          // end position should be last position of the string as we have the last
+          // delimiter
+          end_pos = text_len;
+        }
+
+        *out_len = end_pos - i;
+        char* out_str =
+            reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+        if (out_str == nullptr) {
+          gdv_fn_context_set_error_msg(context,
+                                       "Could not allocate memory for output string");
+          *out_len = 0;

Review comment:
       I cannot move this assignment out, because *out_len is set before we try to allocate memory. Everywhere else in the PR I have removed *out_len = 0 and added it once at the beginning.







[GitHub] [arrow] praveenbingo commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
praveenbingo commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r498103645



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -40,6 +40,18 @@ gdv_int32 bit_length_binary(const gdv_binary input, gdv_int32 length) {
   return length * 8;
 }
 
+int match_string(const char* input, gdv_int32 input_len, gdv_int32 start_pos,

Review comment:
       Please force-inline this.

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -40,6 +40,18 @@ gdv_int32 bit_length_binary(const gdv_binary input, gdv_int32 length) {
   return length * 8;
 }
 
+int match_string(const char* input, gdv_int32 input_len, gdv_int32 start_pos,

Review comment:
       Are we intentionally not inlining this?

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +847,66 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  if (index < 1) {
+    char error_message[100];
+    snprintf(error_message, sizeof(error_message),
+             "Index in split_part must be positive, value provided was %d", index);
+    gdv_fn_context_set_error_msg(context, error_message);
+    *out_len = 0;
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    *out_len = text_len;
+    return text;
+  }
+
+  int i = 0, match_no = 1;
+
+  while (i < text_len) {
+    // find the position where delimiter matched for the first time
+    int match_pos = match_string(text, text_len, i, delimiter, delim_len);
+    if (match_pos == -1 && match_no != index) {
+      // reached the end without finding a match.
+      *out_len = 0;
+      return "";
+    } else {
+      // Found a match. If the match number is index then return this match
+      if (match_no == index) {
+        int end_pos = match_pos - delim_len;
+
+        if (match_pos == -1) {
+          // end position should be last position of the string as we have the last
+          // delimiter
+          end_pos = text_len;
+        }
+
+        *out_len = end_pos - i;
+        char* out_str =
+            reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+        if (out_str == nullptr) {
+          gdv_fn_context_set_error_msg(context,
+                                       "Could not allocate memory for output string");
+          *out_len = 0;

Review comment:
       Initialize this to 0 at the beginning and only override it in the success case? That avoids forgetting to set the value on some path; for example, I think the last return misses it.
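
A minimal sketch of the pattern being suggested, using a hypothetical helper (first_field_sketch is not part of the PR): the output length is zeroed once on entry, and only the success path overrides it, so no early return can leave it unset.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper illustrating the suggestion: zero the output length
// once, up front, so every early return leaves it in a defined state.
const char* first_field_sketch(const char* text, int32_t text_len, char delim,
                               int32_t* out_len) {
  *out_len = 0;  // default for all failure / empty paths
  if (text == nullptr || text_len <= 0) {
    return "";
  }
  const void* hit = std::memchr(text, delim, static_cast<std::size_t>(text_len));
  if (hit == nullptr) {
    return "";  // no delimiter found: out_len is already 0
  }
  // The success path is the only place that overrides the default.
  *out_len = static_cast<int32_t>(static_cast<const char*>(hit) - text);
  return text;
}
```

The trade-off is the one raised in the replies: if the length is assigned before a fallible step such as allocation, that failure path still has to reset it explicitly.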







[GitHub] [arrow] naman1996 commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r496091512



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +845,59 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  char* ret;
+  if (index < 1) {
+    gdv_fn_context_set_error_msg(context, "Index should be >= 1");
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    return text;
+  }
+
+  // converting both c style arrays to string for easy processing
+  std::string input = std::string(text);

Review comment:
       Have removed usages of std::string and have also added some unit tests for utf-8 strings and utf-8 delimiters.







[GitHub] [arrow] github-actions[bot] commented on pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#issuecomment-697188782


   https://issues.apache.org/jira/browse/ARROW-10023





[GitHub] [arrow] naman1996 commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r501558799



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +847,66 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  if (index < 1) {
+    char error_message[100];
+    snprintf(error_message, sizeof(error_message),
+             "Index in split_part must be positive, value provided was %d", index);
+    gdv_fn_context_set_error_msg(context, error_message);
+    *out_len = 0;
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    *out_len = text_len;
+    return text;
+  }
+
+  int i = 0, match_no = 1;
+
+  while (i < text_len) {
+    // find the position where delimiter matched for the first time
+    int match_pos = match_string(text, text_len, i, delimiter, delim_len);
+    if (match_pos == -1 && match_no != index) {
+      // reached the end without finding a match.
+      *out_len = 0;
+      return "";
+    } else {
+      // Found a match. If the match number is index then return this match
+      if (match_no == index) {
+        int end_pos = match_pos - delim_len;
+
+        if (match_pos == -1) {
+          // end position should be last position of the string as we have the last
+          // delimiter
+          end_pos = text_len;
+        }
+
+        *out_len = end_pos - i;
+        char* out_str =
+            reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *out_len));
+        if (out_str == nullptr) {
+          gdv_fn_context_set_error_msg(context,
+                                       "Could not allocate memory for output string");
+          *out_len = 0;

Review comment:
       Added *out_len = 0 at the beginning.










[GitHub] [arrow] naman1996 commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r501578740



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -40,6 +40,18 @@ gdv_int32 bit_length_binary(const gdv_binary input, gdv_int32 length) {
   return length * 8;
 }
 
+int match_string(const char* input, gdv_int32 input_len, gdv_int32 start_pos,

Review comment:
       added







[GitHub] [arrow] praveenbingo commented on a change in pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
praveenbingo commented on a change in pull request #8231:
URL: https://github.com/apache/arrow/pull/8231#discussion_r495825647



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +845,59 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  char* ret;
+  if (index < 1) {
+    gdv_fn_context_set_error_msg(context, "Index should be >= 1");

Review comment:
       need to also set NativeFunction::kCanReturnErrors in the function definition

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -40,6 +40,16 @@ gdv_int32 bit_length_binary(const gdv_binary input, gdv_int32 length) {
   return length * 8;
 }
 
+int match_string(std::string str, int startPos, std::string splitter) {

Review comment:
       We avoid std::string since it causes linking issues in some environments. Please use const char* instead.
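
One way to do the search without std::string is to work on (pointer, length) pairs. The sketch below is an assumption, not the PR's match_string: the name find_bytes_sketch and its return convention (start offset of the match, or -1) are hypothetical. Because it compares raw bytes, it also handles multi-byte UTF-8 delimiters in valid UTF-8 text.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical byte-wise search over (pointer, length) pairs; returns the
// start offset of the first match at or after start_pos, or -1 if none.
int32_t find_bytes_sketch(const char* input, int32_t input_len, int32_t start_pos,
                          const char* delim, int32_t delim_len) {
  if (delim_len <= 0 || start_pos < 0) return -1;
  for (int32_t i = start_pos; i + delim_len <= input_len; ++i) {
    if (std::memcmp(input + i, delim, static_cast<std::size_t>(delim_len)) == 0) {
      return i;
    }
  }
  return -1;
}
```

Neither buffer needs to be NUL-terminated here, which matters because the text and delimiter arrive as pointers with explicit lengths.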

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -835,6 +845,59 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char* text,
                                              out_len);
 }
 
+FORCE_INLINE
+const char* split_part(gdv_int64 context, const char* text, gdv_int32 text_len,
+                       const char* delimiter, gdv_int32 delim_len, gdv_int32 index,
+                       gdv_int32* out_len) {
+  char* ret;
+  if (index < 1) {
+    gdv_fn_context_set_error_msg(context, "Index should be >= 1");
+    return "";
+  }
+
+  if (delim_len == 0 || text_len == 0) {
+    // output will just be text if no delimiter is provided
+    return text;
+  }
+
+  // converting both c style arrays to string for easy processing
+  std::string input = std::string(text);
+  std::string splitter = std::string(delimiter);
+  std::string out_str = "";

Review comment:
       Please avoid std::string throughout.













[GitHub] [arrow] naman1996 closed pull request #8231: ARROW-10023: [C++][Gandiva] Implement split_part function in gandiva

Posted by GitBox <gi...@apache.org>.
naman1996 closed pull request #8231:
URL: https://github.com/apache/arrow/pull/8231


   

